Let's us consider that I need to implement the following database logic in Datastage . How would I go for it ?
The requirement is to JOIN two data sources based on TWO KEY COLUMNS
soemthing like :
select A.COL1 , A.COL2 , B.COL1, B.COL2
from
table1 A
JOIN table2 B
on A.COL3=B.COL3 and A.COL4=B.COl4 ;
Question 1. Do I need to specify TWO DIFFERENT KEY's in the JOIN STAGE , one for COL3 and the other for COL4 ? Is that possible ?
Question 2 : If multiple key's are specified within a SINGLE Join stage , does the inputs needs to be HASH PARTITIONED on all the key columns ?
If yes , will that not create erroneous results , because so far as I know , for a single stage , partitioning could be done on ONLY ONE COLUMN of the INPUT LINK.
If multiple keys are specified as partitioning keys , results may be incorrect.
My workaround:
Create a SINGLE join key field by using COL3|COL4 and partitioning the inputs based on this surrogate join column . The results remian same as the database
join .
Thanks
Kumarjit.
Help on JOIN stage
Moderators: chulett, rschirm, roy
Help on JOIN stage
Pain is the best teacher, but very few attend his class..
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Hi Kumar,
Answer1:-Yes if you are joining on Two columns you need to specify both the columns into a join stage.
Answer 2:- It depends upon your data.If you define Hash partition on only one key column and it has few groupings,may be all your data will land up to one or two nodes and other nodes will be useless as a result performance will be effected.We should always try to split the records to all the partitions.
Answer1:-Yes if you are joining on Two columns you need to specify both the columns into a join stage.
Answer 2:- It depends upon your data.If you define Hash partition on only one key column and it has few groupings,may be all your data will land up to one or two nodes and other nodes will be useless as a result performance will be effected.We should always try to split the records to all the partitions.
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
THE ONE thing I forgot to mention in my post,
If the fields on which join is performed is Nullable, and if they actually contain NULL values , will the output of the database join match the output of the Datastage JOIN stage ( using multiple join keys ) ?
I often faced this problem , that if the fields on which the data sources are joined contain NULL values, the database query output does not match with the datastage's output.
Can anyone please guide me on this ?
Thanks
Kumarjit.
If the fields on which join is performed is Nullable, and if they actually contain NULL values , will the output of the database join match the output of the Datastage JOIN stage ( using multiple join keys ) ?
I often faced this problem , that if the fields on which the data sources are joined contain NULL values, the database query output does not match with the datastage's output.
Can anyone please guide me on this ?
Thanks
Kumarjit.
Pain is the best teacher, but very few attend his class..
-
- Premium Member
- Posts: 536
- Joined: Thu Oct 11, 2007 1:48 am
- Location: Bangalore
Hi Kumar,
Conceptually it should match,since once you apply HASH partition on the input of key columns,it generates hash value and it will make sure that it lands up to same partition.With partition you should sort the inputs with null first.In case all joining columns are null,it is pointless to perform join.
Conceptually it should match,since once you apply HASH partition on the input of key columns,it generates hash value and it will make sure that it lands up to same partition.With partition you should sort the inputs with null first.In case all joining columns are null,it is pointless to perform join.
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore