Page 1 of 1

outer join thru hash file

Posted: Thu Oct 06, 2005 11:49 am
by kollurianu
Hi all,

how to implement outer join with datastage , i have two files f1 and f2

f1 is the current file and f2 is the old file ,

f2 is a dump from the table , iam comparing those two files to get

records , that are 1. records are in f1 but not in f2 (new records)

2. changed records

3. records that are in f2 but not in f1


how to accomplish it ? any ideas

Thank you,

Posted: Thu Oct 06, 2005 12:08 pm
by kcbland
One possible solution:
Put f1 into a hash file, stream f2 and reference f1 hash file, put constraint that compares all relevant columns and output the new or differences.

Put f2 into a hash file, stream f1 and reference f2 hash file, put constraint that compares to see if found and output the new.


Second possible solution:
Put f1 and f2 into two tables and use 3 SQL queries to give you new to f1 (outer join not found), new to f2 (outer join not found), and different between f1 and f2 inner join.

Third solution:
Use unix diff statement twice switching order of files and pop out new or different rows. Dedup the delta rows because both statements will produce deltas as well as new.

Posted: Thu Oct 06, 2005 12:19 pm
by kollurianu
Thank you so much for your suggestions Kenneth Bland.

Posted: Thu Oct 06, 2005 12:41 pm
by clshore
I've used the first method Ken suggests with good results for Left, Right, and Inner joins (don't even think about the Merge stage; it works, but is a real PITA to build and maintain).
How do you wish to handle f1 and f2 records with the same keys?
Do you want survivorship, or do you want to preserve the attributes from both?
For a full Outer join, create a single hash file H using the common key metadata. For survivorship, use a single set of attribute columns, else attributes from f1 and f2. Write all of f1, then all of f2 into H, (or vice versa if you wish f1 records with same key to survive instead of f2).
The result in H will be a full outer join of f1 and f2.

Carter

Posted: Thu Oct 06, 2005 5:09 pm
by ray.wurlod
If you think about it, the standard lookup in a Transformer stage is a left outer join, in that it returns NULL for every column from the reference input link if the lookup does not succeed.

You typically turn it into an inner join using an output link constraint expression.