Partition Issue

Shruthi · Post by **Shruthi** » Tue Dec 09, 2008 4:00 am

Hi, I have a scenario where we are doing an inner join on 7 to 8 million records. My job design looks like this:

Ref_File
|
Sort2
|
Source --> Sort1 --> Join--> Target

Sort1 and Sort2 are partitioned on the same keys and the partition type is Hash. The data types of the keys are also the same. In the Join stage, the partition type of both input links is "Same". The join type is "Inner Join".

4 nodes are configured.

Each time I run the job on the same source records, the number of output records is different.

The output from the join differs by about 100 records. It would be great if someone could point out the mistake here.
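For readers following along, here is a minimal Python sketch of what the design above relies on. It is not DataStage code and it is not a diagnosis of this job. It assumes a simple CRC-based hash (`zlib.crc32`), which is only a stand-in for whatever hash the engine uses, and the helper names `hash_partition`, `inner_join` and `partitioned_join` are made up for illustration. It shows that when both inputs are hash partitioned on the same key columns and then joined partition by partition (the "Same" setting), the result matches a single-node join, whereas partitioning the two inputs on different columns typically drops rows.

```python
from collections import defaultdict
import zlib

NODES = 4  # matches the 4-node configuration mentioned in the post

def hash_partition(rows, part_cols, nodes=NODES):
    """Assign each row to a partition by hashing the chosen columns.

    zlib.crc32 is used because it is deterministic across runs; it is an
    assumption for illustration, not the hash DataStage actually uses.
    """
    parts = [[] for _ in range(nodes)]
    for row in rows:
        key = "|".join(str(row[c]) for c in part_cols)
        parts[zlib.crc32(key.encode()) % nodes].append(row)
    return parts

def inner_join(left, right, join_cols):
    """Plain inner join of two lists of dicts on join_cols."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[c] for c in join_cols)].append(r)
    out = []
    for l in left:
        for r in index.get(tuple(l[c] for c in join_cols), []):
            out.append({**r, **l})
    return out

def partitioned_join(left, right, left_part_cols, right_part_cols, join_cols):
    """Partition each input, then join partition i only with partition i."""
    lparts = hash_partition(left, left_part_cols)
    rparts = hash_partition(right, right_part_cols)
    out = []
    for lp, rp in zip(lparts, rparts):
        out.extend(inner_join(lp, rp, join_cols))
    return out

# Made-up sample data: every second source row has a match in the reference.
source = [{"id": i, "region": i % 3, "amount": i * 10} for i in range(1000)]
ref = [{"id": i, "region": i % 3, "label": f"r{i}"} for i in range(0, 1000, 2)]
keys = ["id", "region"]

single_node = inner_join(source, ref, keys)
same_keys = partitioned_join(source, ref, keys, keys, keys)
mismatched = partitioned_join(source, ref, keys, ["id"], keys)

print(len(single_node), len(same_keys), len(mismatched))
```

The first two counts agree because matching rows always land in the same partition. The third is smaller because a matching pair can end up on different nodes and never meet. Whether anything like this applies to the job in this thread is not established here.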

balajisr · Post by **balajisr** » Tue Dec 09, 2008 4:51 am

Can you embed your design with code tags? Not able to follow your design.

Shruthi · Post by **Shruthi** » Tue Dec 09, 2008 4:59 am

[code]
Ref_File
|
Sort2
|
Source --> Sort1 --> Join--> Target
[/code]

Shruthi · Post by **Shruthi** » Tue Dec 09, 2008 5:04 am

I'm not able to show my design clearly.

Let me explain the issue

I have two links into the Join stage.
The data on these links is sorted and hash partitioned on the same keys. The order of the columns on both links is the same.

In the Join stage, the partition type for both input links is "Same".
The join type is "Inner Join".

4 nodes are configured.

Each time I run this job, the number of records in the output is different.
As the number of records is huge, I am not able to compare the values. Is there any mistake here?

balajisr · Post by **balajisr** » Tue Dec 09, 2008 5:10 am

Try with a single node and verify the results. Are they the same?

Shruthi · Post by **Shruthi** » Tue Dec 09, 2008 5:13 am

Thanks so much for your reply. I have started a run with a single node. It will take about 2 hours. I will post the result.

Shruthi · Post by **Shruthi** » Tue Dec 09, 2008 8:05 am

Hi Balaji, it worked fine with a single node. I am re-creating the jobs to check whether something was missed. Is there any other method to find the issue?
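One possible way to narrow this down, sketched in Python rather than DataStage: export the output of the single-node run and of a 4-node run, then compare them by join key instead of row by row. The function name `diff_by_key` and the tiny sample lists are hypothetical; in practice the rows would come from the exported files.

```python
from collections import Counter

def diff_by_key(run_a, run_b, key_cols):
    """Return the keys whose row counts differ between two job outputs."""
    count_a = Counter(tuple(row[c] for c in key_cols) for row in run_a)
    count_b = Counter(tuple(row[c] for c in key_cols) for row in run_b)
    only_a = count_a - count_b  # keys (with multiplicity) missing from run B
    only_b = count_b - count_a  # keys (with multiplicity) missing from run A
    return only_a, only_b

# Made-up stand-ins for the two exported outputs.
run_1 = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]
run_2 = [{"id": 1}, {"id": 3}, {"id": 4}]

missing_in_2, missing_in_1 = diff_by_key(run_1, run_2, ["id"])
print(dict(missing_in_2))  # {(2,): 1, (3,): 1}
print(dict(missing_in_1))  # {(4,): 1}
```

The keys that show up in the difference are the ones to look at in the source and reference data, for example to see whether they share some property that could affect which node they land on.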

mithun.mg · Post by **mithun.mg** » Wed Dec 10, 2008 9:27 am

Shruthi wrote: Hi Balaji, it worked fine with a single node. I am re-creating the jobs to check whether something was missed. Is there any other method to find the issue?

Hi,
I am facing the same issue with the CDC (Change Data Capture) stage. Each time I run the job I get different output when using a 4-node config file. When I checked it using a single-node config file, it worked fine.

The job design is like this:

Sort_Stage (Reference link to CDC stage)
|
|
Sort_Stage--->CDC Stage----->Output

I have taken the precaution of sorting and hash partitioning on the keys for both the source and the reference link, and the column order and data types are the same.

Can anyone please help me understand what the problem might be?

ray.wurlod · Post by **ray.wurlod** » Wed Dec 10, 2008 1:45 pm

This is a different question and therefore requires a new thread. Click "Post New Topic" to begin a new thread.

DSXchange
