Join stage partition

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gowrishankar_h
Participant
Posts: 42
Joined: Wed Dec 26, 2012 1:13 pm

Join stage partition

Post by gowrishankar_h »

Hi All,

For join stage i have 2 input links.Say example as below.

Input 1: input -2
A B C D T U P D
--------- --------
1 2 4 5 t u 8 5
6 7 h 5 8 i 7 7
3 5 6 t 6 8 5 t
t 7 5 d 5 y t u

Input -1 is hash partitioned with column D and sorted with column D.
Input -2 is hash partitioned with column T and sorted with column P and D.

is it possible to do join with key column as D.Will it produce correct out put.
I read in document tat input link of join has be to partitioned and sorted with key column of join stage.Here i sorted the key column but it is with different partitioned.

Looking forward the answer frm expert's guys.
Please help me to understand.


Thanks,
Gowri
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Re: Join stage partition

Post by ArndW »

Code: Select all

Input 1:      input -2
A B C D        T U P D 
---------        --------    
1 2 4 5        t u 8 5
6 7 h 5        8 i 7 7
3 5 6 t         6 8 5 t
t 7 5 d         5  y t u
No, if the left link is partitioned on "D" and the right link partitioned on "T" then your join will not work correctly unless you are running a 1-node configuration.

For example if on a 4-node configuration your left link will hash the value 5 to node 0, the value "t" to node 1 and "d" to node 2.
The right link will hash the "T" column value 5 to node 0, "t" to node 1, 8 to node 2 and 6 to node 3.

So the "D" column value 5 on the left link will be in node 0 but on the right hand column it would be on node 1. The join wouldn't match any value and on an inner join would not output anything and on an outer would return null values for the unmatched columns.
gowrishankar_h
Participant
Posts: 42
Joined: Wed Dec 26, 2012 1:13 pm

Re: Join stage partition

Post by gowrishankar_h »

Thanks AndrW,
One basic question..So you mean to say only sort on the same key colum alone wont work for join.even partition also should be on the same key column for both the input.
In join both input will be compared like Left link input node 0 to right link input node 0 and it wont be like left input node 0 to all other node of right link?iam i right?


Thanks,
Gowri
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Yes you are correct on that. Data fron left link on Node 0 will be compared with data from right link on node 0 only. And partioning should be done on join keys and should be same for both the links.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

One needs to ensure that records with the same keys are on the same partitions on the join. When joining on multiple keys it is sufficient to use the first key for partitioning. Note that the data going into a join must be sorted on the join keys as well, if the job design doesn't include such a sort then DataStage will insert a tsort operator at runtime (unless switched off using $APT_NO_SORT_INSERTION, in which case badly sorted data causes the job to abort).
gowrishankar_h
Participant
Posts: 42
Joined: Wed Dec 26, 2012 1:13 pm

Post by gowrishankar_h »

Thanks priyadarshikunal and Arndrw
Post Reply