outer join thru hash file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kollurianu
Premium Member
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

outer join thru hash file

Post by kollurianu »

Hi all,

how to implement outer join with datastage , i have two files f1 and f2

f1 is the current file and f2 is the old file ,

f2 is a dump from the table , iam comparing those two files to get

records , that are 1. records are in f1 but not in f2 (new records)

2. changed records

3. records that are in f2 but not in f1


how to accomplish it ? any ideas

Thank you,
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

One possible solution:
Put f1 into a hash file, stream f2 and reference f1 hash file, put constraint that compares all relevant columns and output the new or differences.

Put f2 into a hash file, stream f1 and reference f2 hash file, put constraint that compares to see if found and output the new.


Second possible solution:
Put f1 and f2 into two tables and use 3 SQL queries to give you new to f1 (outer join not found), new to f2 (outer join not found), and different between f1 and f2 inner join.

Third solution:
Use unix diff statement twice switching order of files and pop out new or different rows. Dedup the delta rows because both statements will produce deltas as well as new.
Last edited by kcbland on Thu Oct 06, 2005 12:45 pm, edited 1 time in total.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kollurianu
Premium Member
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Thank you so much for your suggestions Kenneth Bland.
clshore
Charter Member
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

I've used the first method Ken suggests with good results for Left, Right, and Inner joins (don't even think about the Merge stage; it works, but is a real PITA to build and maintain).
How do you wish to handle f1 and f2 records with the same keys?
Do you want survivorship, or do you want to preserve the attributes from both?
For a full Outer join, create a single hash file H using the common key metadata. For survivorship, use a single set of attribute columns, else attributes from f1 and f2. Write all of f1, then all of f2 into H, (or vice versa if you wish f1 records with same key to survive instead of f2).
The result in H will be a full outer join of f1 and f2.

Carter
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you think about it, the standard lookup in a Transformer stage is a left outer join, in that it returns NULL for every column from the reference input link if the lookup does not succeed.

You typically turn it into an inner join using an output link constraint expression.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply