Join stage partition

gowrishankar_h · Post by **gowrishankar_h** » Thu Aug 01, 2013 2:52 am

Hi All,

For join stage i have 2 input links.Say example as below.

Input 1: input -2
A B C D T U P D
--------- --------
1 2 4 5 t u 8 5
6 7 h 5 8 i 7 7
3 5 6 t 6 8 5 t
t 7 5 d 5 y t u

Input -1 is hash partitioned with column D and sorted with column D.
Input -2 is hash partitioned with column T and sorted with column P and D.

is it possible to do join with key column as D.Will it produce correct out put.
I read in document tat input link of join has be to partitioned and sorted with key column of join stage.Here i sorted the key column but it is with different partitioned.

Looking forward the answer frm expert's guys.
Please help me to understand.

Thanks,
Gowri

ArndW · Post by **ArndW** » Thu Aug 01, 2013 4:02 am

Code: Select all

Input 1:      input -2
A B C D        T U P D 
---------        --------    
1 2 4 5        t u 8 5
6 7 h 5        8 i 7 7
3 5 6 t         6 8 5 t
t 7 5 d         5  y t u

No, if the left link is partitioned on "D" and the right link partitioned on "T" then your join will not work correctly unless you are running a 1-node configuration.

For example if on a 4-node configuration your left link will hash the value 5 to node 0, the value "t" to node 1 and "d" to node 2.
The right link will hash the "T" column value 5 to node 0, "t" to node 1, 8 to node 2 and 6 to node 3.

So the "D" column value 5 on the left link will be in node 0 but on the right hand column it would be on node 1. The join wouldn't match any value and on an inner join would not output anything and on an outer would return null values for the unmatched columns.

gowrishankar_h · Post by **gowrishankar_h** » Thu Aug 01, 2013 4:14 am

Thanks AndrW,
One basic question..So you mean to say only sort on the same key colum alone wont work for join.even partition also should be on the same key column for both the input.
In join both input will be compared like Left link input node 0 to right link input node 0 and it wont be like left input node 0 to all other node of right link?iam i right?

Thanks,
Gowri

priyadarshikunal · Post by **priyadarshikunal** » Thu Aug 01, 2013 5:20 am

Yes you are correct on that. Data fron left link on Node 0 will be compared with data from right link on node 0 only. And partioning should be done on join keys and should be same for both the links.

ArndW · Post by **ArndW** » Thu Aug 01, 2013 6:42 am

One needs to ensure that records with the same keys are on the same partitions on the join. When joining on multiple keys it is sufficient to use the first key for partitioning. Note that the data going into a join must be sorted on the join keys as well, if the job design doesn't include such a sort then DataStage will insert a tsort operator at runtime (unless switched off using $APT_NO_SORT_INSERTION, in which case badly sorted data causes the job to abort).

gowrishankar_h · Post by **gowrishankar_h** » Thu Aug 01, 2013 7:02 am

Thanks priyadarshikunal and Arndrw

DSXchange

Join stage partition

Join stage partition

Re: Join stage partition

Re: Join stage partition