
Partition Issue

Posted: Tue Dec 09, 2008 4:00 am
by Shruthi
Hi, I have a scenario where we are doing an inner join on 7 to 8 million records. My job design looks like this:

Ref_File
|
Sort2
|
Source --> Sort1 --> Join--> Target

Sort1 and Sort2 are partitioned on the same keys, and the partition type is Hash. The data types of the keys are also the same. In the Join stage, the partition type of both input links is "Same". The join type is "Inner Join".

4 nodes are configured.

Each time I run the job on the same source records, the number of output records is different.

The output from the join differs by about 100 records. It would be great if someone could point out the mistake here.
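
For readers following along, here is a minimal Python sketch of why the partitioning on both input links matters for a partition-wise inner join. It is a toy model, not DataStage code and not a diagnosis of this job: the function names, the `id` key and the sample data are all assumptions made for illustration. Each partition of the left input is joined only against the same-numbered partition of the right input, which is roughly what "Same" partitioning on the join inputs implies.

```python
from collections import defaultdict

def hash_partition(rows, key, n):
    """Send each row to partition hash(key) % n (stand-in for hash partitioning)."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Deal rows out to partitions in turn, ignoring the key."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def partitioned_inner_join(left_parts, right_parts, key, n):
    """Join partition p of the left only against partition p of the right."""
    out = []
    for p in range(n):
        lookup = defaultdict(list)
        for r in right_parts.get(p, []):
            lookup[r[key]].append(r)
        for l in left_parts.get(p, []):
            for r in lookup.get(l[key], []):
                out.append({**l, **r})
    return out

# Hypothetical sample data: every second source row has a reference match.
source = [{"id": i, "src": f"s{i}"} for i in range(1000)]
ref = [{"id": i, "ref": f"r{i}"} for i in range(0, 1000, 2)]

single = partitioned_inner_join({0: source}, {0: ref}, "id", 1)
same_hash = partitioned_inner_join(
    hash_partition(source, "id", 4), hash_partition(ref, "id", 4), "id", 4)
mismatched = partitioned_inner_join(
    hash_partition(source, "id", 4), round_robin_partition(ref, 4), "id", 4)

print(len(single), len(same_hash), len(mismatched))
```

In this model the single-partition run and the run with both inputs hash partitioned on the same key give the same row count, while the run where one input is distributed differently loses matches. Whether anything similar is happening in the real job would have to be confirmed separately.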

Posted: Tue Dec 09, 2008 4:51 am
by balajisr
Can you post your design inside code tags? I am not able to follow it.

Re: Partition Issue

Posted: Tue Dec 09, 2008 4:59 am
by Shruthi
[code]
                  Ref_File
                      |
                    Sort2
                      |
Source --> Sort1 --> Join --> Target
[/code]

Posted: Tue Dec 09, 2008 5:04 am
by Shruthi
I'm not able to show my design clearly.

Let me explain the issue

I have two links to the Join stage.
The data on these links is sorted and hash partitioned on the same keys. The order of the columns on both links is the same.

In the Join stage, the partition type for both input links is "Same".
The join type is "Inner Join".

4 nodes are configured.

Each time I run this job, the number of records in the output is different.
As the number of records is huge, I am not able to compare the values. Is there any mistake here?
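
One way to compare two large outputs without eyeballing them is to count rows per join key in each run and report only the keys whose counts differ. The following is a rough Python sketch under stated assumptions: the outputs are available as delimited text with a header row, the key column name `id` is hypothetical, and the set of keys fits in memory (plausible for a few million keys, but not guaranteed).

```python
import csv
import io
from collections import Counter

def key_counts(lines, key_columns):
    """Count how many rows carry each key tuple in one run's output."""
    reader = csv.DictReader(lines)
    return Counter(tuple(row[c] for c in key_columns) for row in reader)

def diff_key_counts(run_a, run_b):
    """Return {key: (count_in_a, count_in_b)} for keys whose counts differ."""
    return {
        k: (run_a[k], run_b[k])
        for k in run_a.keys() | run_b.keys()
        if run_a[k] != run_b[k]
    }

# Tiny in-memory stand-ins for two runs; in practice pass open file objects.
run1 = io.StringIO("id,val\n1,a\n2,b\n3,c\n")
run2 = io.StringIO("id,val\n1,a\n3,c\n")

diff = diff_key_counts(key_counts(run1, ["id"]), key_counts(run2, ["id"]))
print(diff)
```

The keys reported this way are the ones worth tracing back through the sort and join inputs.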

Posted: Tue Dec 09, 2008 5:10 am
by balajisr
Try running with a single node and verify the results. Is the output the same each time?

Posted: Tue Dec 09, 2008 5:13 am
by Shruthi
Thanks so much for your reply. I have started a run with a single node. It will take about 2 hours. I will post the result.

Partition Issue

Posted: Tue Dec 09, 2008 8:05 am
by Shruthi
Hi Balaji, it worked fine with a single node. I am re-creating the jobs to check whether something was missed. Is there any other method to find the issue?
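
Since the single-node run is correct, one further check is whether every key really sits on the same partition on both input links. Below is a small Python sketch of that check. It assumes the keys seen on each partition of each link can be dumped somehow (for example to one file per node). The dictionaries and the function names are hypothetical and only illustrate the idea.

```python
def key_locations(parts):
    """Map each key to the set of partition numbers it appears on."""
    loc = {}
    for p, keys in parts.items():
        for k in keys:
            loc.setdefault(k, set()).add(p)
    return loc

def misplaced_keys(left_parts, right_parts):
    """Keys present on both links that are split or on different partitions."""
    left = key_locations(left_parts)
    right = key_locations(right_parts)
    bad = {}
    for k in left.keys() & right.keys():
        if left[k] != right[k] or len(left[k]) > 1:
            bad[k] = (sorted(left[k]), sorted(right[k]))
    return bad

# Hypothetical per-partition key dumps for the two join inputs.
left_parts = {0: ["A", "B"], 1: ["C"]}
right_parts = {0: ["A"], 1: ["B", "C"]}

print(misplaced_keys(left_parts, right_parts))
```

Any key reported here would be joined on one node against data that lives on another, so its matches could be dropped in a multi-node run.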

Re: Partition Issue

Posted: Wed Dec 10, 2008 9:27 am
by mithun.mg
Shruthi wrote: Hi Balaji, it worked fine with a single node. I am re-creating the jobs to check whether something was missed. Is there any other method to find the issue?

Hi,
I am facing the same issue with the CDC (Change Data Capture) stage. Each time I run the job with a 4-node config file, I get different output. When I checked with a single-node config file, it worked fine.

The job design is like this:

              Sort_Stage (Reference link to CDC stage)
                  |
                  |
Sort_Stage--->CDC Stage----->Output


I have taken the precaution of sorting on the keys with hash partitioning on both the source link and the reference link, and the column order and data types are the same.

Can anyone please help me understand what might be the problem?

Posted: Wed Dec 10, 2008 1:45 pm
by ray.wurlod
This is a different question and therefore requires a new thread. Click "Post New Topic" to begin a new thread.