Partition Issue

Shruthi
Participant
Posts: 74
Joined: Sun Oct 05, 2008 10:59 pm
Location: Bangalore

Partition Issue

Post by Shruthi »

Hi. I have a scenario where we are doing an inner join on 7 to 8 million records. My job design looks like this:

Ref_File
|
Sort2
|
Source --> Sort1 --> Join--> Target

Sort1 and Sort2 are partitioned on the same keys and the partition type is Hash. The data types of the keys are also the same. In the Join stage, the partition type of both input links is "Same". The join type is "Inner Join".

4 nodes are configured.
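
An illustrative aside on why the partitioning of the two Join inputs matters: a parallel join matches rows only within a partition, so a key that lands on different nodes on the two links is silently dropped by an inner join. The sketch below is plain Python, not DataStage code. The function names, the sample rows, and the round-robin case are assumptions made for illustration, and it is not a diagnosis of the job described here.

```python
from collections import defaultdict

NUM_NODES = 4  # matches the 4-node configuration mentioned in the post


def hash_partition(rows, key_fn, num_nodes=NUM_NODES):
    """Send each row to the partition chosen by hashing its join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(key_fn(row)) % num_nodes].append(row)
    return parts


def round_robin_partition(rows, num_nodes=NUM_NODES):
    """Send rows to partitions in arrival order, ignoring the key."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts


def partitioned_inner_join(left_parts, right_parts, key_fn, num_nodes=NUM_NODES):
    """Inner-join each partition independently; rows never cross partitions."""
    out = []
    for node in range(num_nodes):
        right_by_key = defaultdict(list)
        for r in right_parts.get(node, []):
            right_by_key[key_fn(r)].append(r)
        for l in left_parts.get(node, []):
            for r in right_by_key.get(key_fn(l), []):
                out.append((l, r))
    return out


def key(row):
    return row[0]


# Hypothetical sample data. Integer keys keep the sketch deterministic,
# because Python randomises string hashes per process.
source = [(k, f"src{k}") for k in range(100)]
ref = [(k, f"ref{k}") for k in range(100)]
ref_shifted = ref[1:] + ref[:1]  # same rows, different arrival order

# Both links hash partitioned on the same key: every match is found.
good = partitioned_inner_join(
    hash_partition(source, key), hash_partition(ref, key), key
)

# One link partitioned without regard to the key: matching rows sit on
# different nodes, so the inner join drops them.
bad = partitioned_inner_join(
    hash_partition(source, key), round_robin_partition(ref_shifted), key
)

print(len(good))  # 100
print(len(bad))   # 0
```

If the output count changes between runs on several nodes but is stable on one node, checking that both links really arrive at the Join stage partitioned on the same keys is one place to start.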

Each time I run the job on the same source records, the number of output records is different.

The output from the join differs by about 100 records. It would be great if someone could point out the mistake here.
Shruthi
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Can you embed your design in code tags? I am not able to follow it.
Shruthi
Participant
Posts: 74
Joined: Sun Oct 05, 2008 10:59 pm
Location: Bangalore

Re: Partition Issue

Post by Shruthi »

[code]
                   Ref_File
                       |
                     Sort2
                       |
Source --> Sort1 --> Join --> Target
[/code]
Shruthi
Participant
Posts: 74
Joined: Sun Oct 05, 2008 10:59 pm
Location: Bangalore

Post by Shruthi »

I'm not able to show my design clearly.

Let me explain the issue.

I have two links into the Join stage.
The data on these links is sorted and hash partitioned on the same keys. The order of the columns on both links is the same.

In the Join stage, the partition type for both input links is "Same".
The join type is "Inner Join".

4 nodes are configured.

Each time I run this job, the number of records in the output is different.
As the number of records is huge, I am not able to compare the values. Is there any mistake here?
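
When the output is too large to compare by eye, one option is to compare the join keys of two runs as multisets and list only the keys that differ. The following is a minimal Python sketch, assuming each run's output has been exported to a delimited text file with a header row. The column names `cust_id` and `order_id` and the helper names are hypothetical, not taken from the job in this thread.

```python
import csv
from collections import Counter


def read_csv_rows(path):
    """Yield each row of a delimited file with a header as a dict."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def key_counts(rows, key_columns):
    """Count how many times each join-key value occurs in one run."""
    return Counter(tuple(row[c] for c in key_columns) for row in rows)


def diff_runs(rows_a, rows_b, key_columns):
    """Return (keys over-represented in A, keys over-represented in B)."""
    a = key_counts(rows_a, key_columns)
    b = key_counts(rows_b, key_columns)
    # Counter subtraction keeps only positive differences.
    return a - b, b - a


# Small in-memory example with hypothetical key columns.
run1 = [
    {"cust_id": "1", "order_id": "A"},
    {"cust_id": "2", "order_id": "B"},
    {"cust_id": "3", "order_id": "C"},
]
run2 = [
    {"cust_id": "1", "order_id": "A"},
    {"cust_id": "2", "order_id": "B"},
]

only_in_run1, only_in_run2 = diff_runs(run1, run2, ["cust_id", "order_id"])
print(only_in_run1)  # Counter({('3', 'C'): 1})
print(only_in_run2)  # Counter()
```

For real files, `diff_runs(read_csv_rows(path_a), read_csv_rows(path_b), key_columns)` reads both exports row by row and keeps only the key counts in memory. The keys it reports can then be traced back through the Source and Ref_File inputs.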
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

Try running on a single node and verify the results. Are they the same each time?
Shruthi
Participant
Posts: 74
Joined: Sun Oct 05, 2008 10:59 pm
Location: Bangalore

Post by Shruthi »

Thanks so much for your reply. I have started a run on a single node. It will take about 2 hours. I will post the result.
Shruthi
Participant
Posts: 74
Joined: Sun Oct 05, 2008 10:59 pm
Location: Bangalore

Partition Issue

Post by Shruthi »

Hi Balaji. It worked fine on a single node. I am re-creating the jobs to check whether something was missed. Is there any other way to find the issue?
Shruthi
mithun.mg
Participant
Posts: 11
Joined: Thu Feb 22, 2007 6:04 am
Location: banglore

Re: Partition Issue

Post by mithun.mg »

Shruthi wrote: Hi Balaji. It worked fine on a single node. I am re-creating the jobs to check whether something was missed. Is there any other way to find the issue?

Hi,
I am facing the same issue with the CDC (Change Data Capture) stage. Each time I run the job with a 4-node config file, I get different output. When I checked it with a single-node config file, it worked fine.

The job design is like this:

              Sort_Stage (Reference link to CDC stage)
                  |
                  |
Sort_Stage--->CDC Stage----->Output


I have taken the precaution of sorting and hash partitioning on the keys on both the source and the reference link, and the column order and data types are the same on both.

Can anyone please help me understand what the problem might be?
Thanks & Regards
MITHUN M G
ETL Developer
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is a different question and therefore requires a new thread. Click "Post New Topic" to begin a new thread.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.