How best to merge two datasets to get difference

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
bowen_tim
Premium Member
Premium Member
Posts: 6
Joined: Thu Jul 22, 2004 10:37 am
Location: Takapuna, New Zealand

How best to merge two datasets to get difference

Post by bowen_tim »

I have two tables with the same keys and I would like to calculate the cumulative difference between them.

From table1 I select the YTD balance up until the current month, from table2 I select the YTD balance up until the previous month. The difference is what I post into the current period in my target table.

The way I have attempted to do this is by selecting from the two tables separately, reverse the signs on the amount columns on table2, then use a link collector stage to merge the two datasets. I then sort the output using a sort stage, then use an aggregator stage to come up with the net difference.

Although this approach should in theory come up with the correct result, it seems to perform poorly - which leads me to believe there is probably a better way to do it.

I have read various posts that discourage the use of the link collector stage. Wondering if anyone can suggest a better approach for this job.

Much appreciated.
Tim
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If the tables are in the same database instance I can't see why your technique wouldn't work keeping the entire process inside the database.

Use a union, negate one table's values and simply SQL group and sum, no sorting required. I would guess that with a parallel query database and a vast row reduction you would get a faster turn-around staying within the database. Your currently extracting 2X as many rows as the target result requires, there may be a lot of overhead there.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
paddu
Premium Member
Premium Member
Posts: 232
Joined: Tue Feb 22, 2005 11:14 am
Location: California

Post by paddu »

Can you post an example . Probably makes easy to understand

Something like this

Table1

assume Jan-March

Col1,Col2,YTD_balance
1,123,400.00


Table 2
assume Jan_feb

Col1,Col2,YTD_balance
1,123,300.00

Result something like this ???

Col1,Col2,YTD_balance
1,123,100.00

If this is you are looking for then why using link collector and aggregator stages. All you need to do is in our DB query. Join the two tables and subtract the YTD _balance column .

I hope i did not misunderstand your requirement.
Thanks
Paddu
bowen_tim
Premium Member
Premium Member
Posts: 6
Joined: Thu Jul 22, 2004 10:37 am
Location: Takapuna, New Zealand

Post by bowen_tim »

OK - changed the initial selection to use user-defined SQL to select the net balance from the two tables as suggested by Kenneth. This achieves the exact same result but performs alot better.
Using user defined SQL just didn't feel like the purist way to do things - but I guess at the end of the day it is all about performance and result.
Thanks for your help.
Post Reply