Hi All,
For join stage i have 2 input links.Say example as below.
Input 1: input -2
A B C D T U P D
--------- --------
1 2 4 5 t u 8 5
6 7 h 5 8 i 7 7
3 5 6 t 6 8 5 t
t 7 5 d 5 y t u
Input -1 is hash partitioned with column D and sorted with column D.
Input -2 is hash partitioned with column T and sorted with column P and D.
is it possible to do join with key column as D.Will it produce correct out put.
I read in document tat input link of join has be to partitioned and sorted with key column of join stage.Here i sorted the key column but it is with different partitioned.
Looking forward the answer frm expert's guys.
Please help me to understand.
Thanks,
Gowri
Join stage partition
Moderators: chulett, rschirm, roy
Re: Join stage partition
Code: Select all
Input 1: input -2
A B C D T U P D
--------- --------
1 2 4 5 t u 8 5
6 7 h 5 8 i 7 7
3 5 6 t 6 8 5 t
t 7 5 d 5 y t u
For example if on a 4-node configuration your left link will hash the value 5 to node 0, the value "t" to node 1 and "d" to node 2.
The right link will hash the "T" column value 5 to node 0, "t" to node 1, 8 to node 2 and 6 to node 3.
So the "D" column value 5 on the left link will be in node 0 but on the right hand column it would be on node 1. The join wouldn't match any value and on an inner join would not output anything and on an outer would return null values for the unmatched columns.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 42
- Joined: Wed Dec 26, 2012 1:13 pm
Re: Join stage partition
Thanks AndrW,
One basic question..So you mean to say only sort on the same key colum alone wont work for join.even partition also should be on the same key column for both the input.
In join both input will be compared like Left link input node 0 to right link input node 0 and it wont be like left input node 0 to all other node of right link?iam i right?
Thanks,
Gowri
One basic question..So you mean to say only sort on the same key colum alone wont work for join.even partition also should be on the same key column for both the input.
In join both input will be compared like Left link input node 0 to right link input node 0 and it wont be like left input node 0 to all other node of right link?iam i right?
Thanks,
Gowri
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
One needs to ensure that records with the same keys are on the same partitions on the join. When joining on multiple keys it is sufficient to use the first key for partitioning. Note that the data going into a join must be sorted on the join keys as well, if the job design doesn't include such a sort then DataStage will insert a tsort operator at runtime (unless switched off using $APT_NO_SORT_INSERTION, in which case badly sorted data causes the job to abort).
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 42
- Joined: Wed Dec 26, 2012 1:13 pm