Page 1 of 1

Datastage Join

Posted: Thu Apr 10, 2014 12:03 am
by sreesuku2
Hi,

I think its a simple one..

I have 2 files as listed below.

File1

Code: Select all

Project	Practice
ABC	     AD
ABC	     Cloud
File2

Code: Select all

Project	ID
ABC	    ID1
ABC	    ID2
I need to create a file3 as below

Code: Select all

Practise	ID
AD	      ID1
AD	      ID2
Cloud      ID1
Cloud      ID2
Can somebody help

Posted: Thu Apr 10, 2014 1:18 am
by prasson_ibm
Hi,

It is cross join.

Job design will look like below:-

Code: Select all

                     c-----Lnk1---->LKP1------>f
       File2-----> Copy(c)      File1----->Filter            Funnel(f)----->Dataset   
                             c----Lnk2----->LKP2------->f 
Use file2 as a source.In Copy stage pass both columns to link1 and link2.

File1 will be input to filter stage which has two output links. Filter the records based on "AD" and "Cloud" and pass to respective links which will be reference to the lookup stage.

Funnel the outputs from lookup stage.

Reply

Posted: Thu Apr 10, 2014 1:52 am
by ssnegi
you can create a dummy column in both files hard coded to 1 in transformer. Then join the dummy column from both links.

file1 --> transformer Dummy 1-->
-----------------------------------------Join Inner Dummy--> output
file2 --> transformer Dummy 1-->

Posted: Thu Apr 10, 2014 3:17 am
by ray.wurlod
Investigate the possibilities in a Merge stage.

Ignore ssnegi's response, which is not applicable for server job.

Reply

Posted: Thu Apr 10, 2014 3:27 am
by ssnegi
There is transformer in server job and you can output that to sequential files and then do inner join using merge stage

Posted: Thu Apr 10, 2014 7:00 am
by chulett
I think Ray meant Prasson, his is the PX specific one. And it just looks like a cross join from the very small sample data posted but I doubt it is - so the Merge stage should be able to handle the two files directly without any shenanigans.

Posted: Thu Apr 10, 2014 5:10 pm
by ray.wurlod
ssnegi specifies a Join. There is no Join stage in server jobs.

prasson's is also a parallel solution.