JOIN tables in Datastage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Billy
Participant
Posts: 9
Joined: Fri Aug 22, 2003 3:53 am
Location: Malaysia

JOIN tables in Datastage

Post by Billy »

Hi,

I have a table transaction (IDNo, Date, Tran amount) which I have to UNION back again based on the IDNo to get the cummulative tran amount.

Something like :
SELECT a.IDNo, a.Date, (SELECT SUM(b.Tran amount) from table b where b.Date < a.Date and b.IDNo = a.IDNo group by b.IDNo) from table a

Can a MERGE stage or a Hash File be useful in this situation?

Would appreciate any urgent help, need to resolve this issue [?]


thanks in advance
Billy
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi Billy,

Merge stage is to join two Sequential File. If you are talking about the table then any database should have the functionality to do the union based on your select. Try the user definied query to solve your problem.

Cheers
Rasi
Billy
Participant
Posts: 9
Joined: Fri Aug 22, 2003 3:53 am
Location: Malaysia

Post by Billy »

Hi Rasi,
Yes, I have thot of that option unfortunately the DBMS I am using does not support sub-select statements. FYI, the DBMS is Progress.

cheers
Billy
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi Billy,

there are a lot of options within DataStage. One is for example sort your table by IdNo and Date and work with Stagevariables, that should be one of many possible solutions.



Wolfgang Huerter
=====================
Cologne, Germany
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Billy

looks like you are trying to get the maximum Date and the sum of Tran Amount per IDNo. If that's the case, couldn't you try:

select IDNo, max(Date), sum(Tran amount)
from table a
group by IDNo

[?]

Stephan
Billy
Participant
Posts: 9
Joined: Fri Aug 22, 2003 3:53 am
Location: Malaysia

Post by Billy »

Hi Stephan

The objective is not to get the max date but a cummulative amount. Let me give an example

Table A
-------
IDNo Tran Date Tran Amount
AAA 1/1/01 100
BBB 2/1/01 100
CCC 3/1/01 100
AAA 1/1/02 100
BBB 2/1/02 100
CCC 3/1/02 100
AAA 1/1/03 100
BBB 2/1/03 100
CCC 3/1/03 100
AAA 1/4/03 100
BBB 2/4/03 100
CCC 3/4/03 100

SELECT a.IDNo, a.Date, (SELECT SUM(b.Tran amount) from table b where b.Date < a.Date and b.IDNo = a.IDNo group by b.IDNo) from table a

The result I would expect to achieve is :

IDNo Tran Date Tran Amount Cumm Amount
AAA 1/1/01 100 0 (nothing to cummulate 1st tran)
BBB 2/1/01 100 0
CCC 3/1/01 100 0
AAA 1/1/02 100 100
BBB 2/1/02 100 100
CCC 3/1/02 100 100
AAA 1/1/03 100 200
BBB 2/1/03 100 200
CCC 3/1/03 100 200
AAA 1/4/03 100 300
BBB 2/4/03 100 300
CCC 3/4/03 100 300

This is just one part of the calculation, another part is I have to cummulate the figures starting from the week before.

Welcome all and any ideas that would help resolve these problems [:)]


cheers
Bernard
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Does this fit better? You would possibly have to translate null values to 0.

SELECT a.IDNo, a.Date, a.Tran amount, SUM(b.Tran amount)
from table a, outer table b
where a.IDNo=b.IDNo
and b.Date < a.Date
group by a.IDNo, a.Date, a.Tran amount


Stephan
Billy
Participant
Posts: 9
Joined: Fri Aug 22, 2003 3:53 am
Location: Malaysia

Post by Billy »

Hi Stephan

Looks workable and good. However I will still have to perform an additional SELECT statement to get the cummulative figures from the previous week.

All these are part of a pretty complex subset of the final table we wish to get. We need to accumulate based on a certain range and the full range.

In addition to this we are trying to move the processing away from the server into the staging area. Trying to maximize the potential of DS [:)]


cheers
Billy
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

As for the cumulation as of the week before, does that mean you would always only sum it up starting Sunday (or Monday) last week up to yesterday?

In most DBMS there should be a function like WEEKDAY, returning 0-6 (or 1-7) for days of the week Sunday to Saturday or Monday to Sunday.

Provided that the internal representation of a date in your DBMS is an integer (number of days passed since '1900-01-01' or whatever), you could do something similar as:

SELECT a.IDNo, a.Date, a.Tran amount, SUM(b.Tran amount)
from table a, outer table b
where a.IDNo=b.IDNo
and b.Date between a.Date - (Weekday(a.Date)+7) and a.Date - 1
group by a.IDNo, a.Date, a.Tran amount

You would have to check the exact syntax and values to be applied for your DBMS.

Stephan
Billy
Participant
Posts: 9
Joined: Fri Aug 22, 2003 3:53 am
Location: Malaysia

Post by Billy »

Stephan
Yes, this works.

We also have another similiar problem but this is even more complex.

TranNo TranCnt TranCo Amt TranDt
000001 0000000 ABCCom 100 1/1/01
000002 0000000 DEFCom 100 1/1/01
000003 0000000 GHICom 100 1/1/01
000001 0000001 ABCCom 100 3/1/01
000002 0000001 DEFCom 100 3/1/01
000003 0000001 GHICom 100 3/1/01
000001 0000002 ABCCom 100 7/1/01
000002 0000002 DEFCom 100 7/1/01
000003 0000002 GHICom 100 7/1/01
000001 0000004 ABCCom 100 8/1/01
000002 0000004 DEFCom 100 8/1/01
000003 0000004 GHICom 100 8/1/01
100001 0000005 ABCCom 505 8/1/01
100002 0000005 DEFCom 505 8/1/01
100003 0000005 GHICom 505 8/1/01

For this case I need to get the result as shown below and the formula for the calculation are

1. Previous Date
For each TranCompany, accumulate the Amount which has the largest TranCount

2. To Date
For each TranCompany, accumulate the Amount which has the largest TranCount


TranNo TranCnt TranCo Amt TranDt ToDt PrevDt
100001 0000001 ABCCom 100 1/1/01 100 0
100002 0000001 DEFCom 100 1/1/01 100 0
100003 0000001 GHICom 100 1/1/01 100 0

100001 0000002 ABCCom 202 3/1/01 202 100
100002 0000002 DEFCom 202 3/1/01 202 100
100003 0000002 GHICom 202 3/1/01 202 100

100001 0000003 ABCCom 303 7/1/01 303 202
100002 0000003 DEFCom 303 7/1/01 303 202
100003 0000003 GHICom 303 7/1/01 303 202

100001 0000004 ABCCom 404 8/1/01 505 303
100002 0000004 DEFCom 404 8/1/01 505 303
100003 0000004 GHICom 404 8/1/01 505 303
100001 0000005 ABCCom 505 8/1/01 505 303
100002 0000005 DEFCom 505 8/1/01 505 303
100003 0000005 GHICom 505 8/1/01 505 303

100001 0000006 ABCCom 606 11/1/01 606 505
100002 0000006 DEFCom 606 11/1/01 606 505
100003 0000006 GHICom 606 11/1/01 606 505

This case is much more complex than the previous one as we need to get the max TranCnt record and also the previous Transaction MAX Trancnt



rgds
Billy
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Billy

I don't understand the requirement (maybe a had one beer too much [:)]). Please try to explain it with more detail.

Stephan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

[?] I really can't see what any of your requirements has to do with DataStage. You're really just after help constructing SQL statements.
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi Billy,

The solution is to break your jobs into steps. As I already mentioned you need to try with User Defined SQL and split this into number of steps based on your requirement.

My advise is not to try and do the whole thing in one single SQL. Break it into serious of steps you will get the result.

Cheers
Rasi
Billy
Participant
Posts: 9
Joined: Fri Aug 22, 2003 3:53 am
Location: Malaysia

Post by Billy »

Hi Stephan & Rasi,

Thanks for the advise, I am extracting from the source into the a temp DBMS which supports sub-select statements and then I extract it out into the staging area again for JOINs

I'm new to the ETL and perhaps this may sound like a naive question - from your experiences is it normal to pump data into a temp DBMS to perform certain complex aggregation and then extract it out again during the ETL process?

I can't see any other way than to do it in the temp DBMS?


cheers
Bernard
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Yes Billy,


Creating a Temp table and doing aggregation is one of the smart way. You need to have index on your temp tables before doing the aggregate.

Cheers
Rasi
Post Reply