Help on JOIN stage
Posted: Fri Feb 08, 2013 5:24 am
Let's us consider that I need to implement the following database logic in Datastage . How would I go for it ?
The requirement is to JOIN two data sources based on TWO KEY COLUMNS
soemthing like :
select A.COL1 , A.COL2 , B.COL1, B.COL2
from
table1 A
JOIN table2 B
on A.COL3=B.COL3 and A.COL4=B.COl4 ;
Question 1. Do I need to specify TWO DIFFERENT KEY's in the JOIN STAGE , one for COL3 and the other for COL4 ? Is that possible ?
Question 2 : If multiple key's are specified within a SINGLE Join stage , does the inputs needs to be HASH PARTITIONED on all the key columns ?
If yes , will that not create erroneous results , because so far as I know , for a single stage , partitioning could be done on ONLY ONE COLUMN of the INPUT LINK.
If multiple keys are specified as partitioning keys , results may be incorrect.
My workaround:
Create a SINGLE join key field by using COL3|COL4 and partitioning the inputs based on this surrogate join column . The results remian same as the database
join .
Thanks
Kumarjit.
The requirement is to JOIN two data sources based on TWO KEY COLUMNS
soemthing like :
select A.COL1 , A.COL2 , B.COL1, B.COL2
from
table1 A
JOIN table2 B
on A.COL3=B.COL3 and A.COL4=B.COl4 ;
Question 1. Do I need to specify TWO DIFFERENT KEY's in the JOIN STAGE , one for COL3 and the other for COL4 ? Is that possible ?
Question 2 : If multiple key's are specified within a SINGLE Join stage , does the inputs needs to be HASH PARTITIONED on all the key columns ?
If yes , will that not create erroneous results , because so far as I know , for a single stage , partitioning could be done on ONLY ONE COLUMN of the INPUT LINK.
If multiple keys are specified as partitioning keys , results may be incorrect.
My workaround:
Create a SINGLE join key field by using COL3|COL4 and partitioning the inputs based on this surrogate join column . The results remian same as the database
join .
Thanks
Kumarjit.