JOIN tables in Datastage
Moderators: chulett, rschirm, roy
JOIN tables in Datastage
Hi,
I have a table transaction (IDNo, Date, Tran amount) which I have to UNION back again based on the IDNo to get the cummulative tran amount.
Something like :
SELECT a.IDNo, a.Date, (SELECT SUM(b.Tran amount) from table b where b.Date < a.Date and b.IDNo = a.IDNo group by b.IDNo) from table a
Can a MERGE stage or a Hash File be useful in this situation?
Would appreciate any urgent help, need to resolve this issue [?]
thanks in advance
Billy
I have a table transaction (IDNo, Date, Tran amount) which I have to UNION back again based on the IDNo to get the cummulative tran amount.
Something like :
SELECT a.IDNo, a.Date, (SELECT SUM(b.Tran amount) from table b where b.Date < a.Date and b.IDNo = a.IDNo group by b.IDNo) from table a
Can a MERGE stage or a Hash File be useful in this situation?
Would appreciate any urgent help, need to resolve this issue [?]
thanks in advance
Billy
Hi Stephan
The objective is not to get the max date but a cummulative amount. Let me give an example
Table A
-------
IDNo Tran Date Tran Amount
AAA 1/1/01 100
BBB 2/1/01 100
CCC 3/1/01 100
AAA 1/1/02 100
BBB 2/1/02 100
CCC 3/1/02 100
AAA 1/1/03 100
BBB 2/1/03 100
CCC 3/1/03 100
AAA 1/4/03 100
BBB 2/4/03 100
CCC 3/4/03 100
SELECT a.IDNo, a.Date, (SELECT SUM(b.Tran amount) from table b where b.Date < a.Date and b.IDNo = a.IDNo group by b.IDNo) from table a
The result I would expect to achieve is :
IDNo Tran Date Tran Amount Cumm Amount
AAA 1/1/01 100 0 (nothing to cummulate 1st tran)
BBB 2/1/01 100 0
CCC 3/1/01 100 0
AAA 1/1/02 100 100
BBB 2/1/02 100 100
CCC 3/1/02 100 100
AAA 1/1/03 100 200
BBB 2/1/03 100 200
CCC 3/1/03 100 200
AAA 1/4/03 100 300
BBB 2/4/03 100 300
CCC 3/4/03 100 300
This is just one part of the calculation, another part is I have to cummulate the figures starting from the week before.
Welcome all and any ideas that would help resolve these problems [:)]
cheers
Bernard
The objective is not to get the max date but a cummulative amount. Let me give an example
Table A
-------
IDNo Tran Date Tran Amount
AAA 1/1/01 100
BBB 2/1/01 100
CCC 3/1/01 100
AAA 1/1/02 100
BBB 2/1/02 100
CCC 3/1/02 100
AAA 1/1/03 100
BBB 2/1/03 100
CCC 3/1/03 100
AAA 1/4/03 100
BBB 2/4/03 100
CCC 3/4/03 100
SELECT a.IDNo, a.Date, (SELECT SUM(b.Tran amount) from table b where b.Date < a.Date and b.IDNo = a.IDNo group by b.IDNo) from table a
The result I would expect to achieve is :
IDNo Tran Date Tran Amount Cumm Amount
AAA 1/1/01 100 0 (nothing to cummulate 1st tran)
BBB 2/1/01 100 0
CCC 3/1/01 100 0
AAA 1/1/02 100 100
BBB 2/1/02 100 100
CCC 3/1/02 100 100
AAA 1/1/03 100 200
BBB 2/1/03 100 200
CCC 3/1/03 100 200
AAA 1/4/03 100 300
BBB 2/4/03 100 300
CCC 3/4/03 100 300
This is just one part of the calculation, another part is I have to cummulate the figures starting from the week before.
Welcome all and any ideas that would help resolve these problems [:)]
cheers
Bernard
Hi Stephan
Looks workable and good. However I will still have to perform an additional SELECT statement to get the cummulative figures from the previous week.
All these are part of a pretty complex subset of the final table we wish to get. We need to accumulate based on a certain range and the full range.
In addition to this we are trying to move the processing away from the server into the staging area. Trying to maximize the potential of DS [:)]
cheers
Billy
Looks workable and good. However I will still have to perform an additional SELECT statement to get the cummulative figures from the previous week.
All these are part of a pretty complex subset of the final table we wish to get. We need to accumulate based on a certain range and the full range.
In addition to this we are trying to move the processing away from the server into the staging area. Trying to maximize the potential of DS [:)]
cheers
Billy
As for the cumulation as of the week before, does that mean you would always only sum it up starting Sunday (or Monday) last week up to yesterday?
In most DBMS there should be a function like WEEKDAY, returning 0-6 (or 1-7) for days of the week Sunday to Saturday or Monday to Sunday.
Provided that the internal representation of a date in your DBMS is an integer (number of days passed since '1900-01-01' or whatever), you could do something similar as:
SELECT a.IDNo, a.Date, a.Tran amount, SUM(b.Tran amount)
from table a, outer table b
where a.IDNo=b.IDNo
and b.Date between a.Date - (Weekday(a.Date)+7) and a.Date - 1
group by a.IDNo, a.Date, a.Tran amount
You would have to check the exact syntax and values to be applied for your DBMS.
Stephan
In most DBMS there should be a function like WEEKDAY, returning 0-6 (or 1-7) for days of the week Sunday to Saturday or Monday to Sunday.
Provided that the internal representation of a date in your DBMS is an integer (number of days passed since '1900-01-01' or whatever), you could do something similar as:
SELECT a.IDNo, a.Date, a.Tran amount, SUM(b.Tran amount)
from table a, outer table b
where a.IDNo=b.IDNo
and b.Date between a.Date - (Weekday(a.Date)+7) and a.Date - 1
group by a.IDNo, a.Date, a.Tran amount
You would have to check the exact syntax and values to be applied for your DBMS.
Stephan
Stephan
Yes, this works.
We also have another similiar problem but this is even more complex.
TranNo TranCnt TranCo Amt TranDt
000001 0000000 ABCCom 100 1/1/01
000002 0000000 DEFCom 100 1/1/01
000003 0000000 GHICom 100 1/1/01
000001 0000001 ABCCom 100 3/1/01
000002 0000001 DEFCom 100 3/1/01
000003 0000001 GHICom 100 3/1/01
000001 0000002 ABCCom 100 7/1/01
000002 0000002 DEFCom 100 7/1/01
000003 0000002 GHICom 100 7/1/01
000001 0000004 ABCCom 100 8/1/01
000002 0000004 DEFCom 100 8/1/01
000003 0000004 GHICom 100 8/1/01
100001 0000005 ABCCom 505 8/1/01
100002 0000005 DEFCom 505 8/1/01
100003 0000005 GHICom 505 8/1/01
For this case I need to get the result as shown below and the formula for the calculation are
1. Previous Date
For each TranCompany, accumulate the Amount which has the largest TranCount
2. To Date
For each TranCompany, accumulate the Amount which has the largest TranCount
TranNo TranCnt TranCo Amt TranDt ToDt PrevDt
100001 0000001 ABCCom 100 1/1/01 100 0
100002 0000001 DEFCom 100 1/1/01 100 0
100003 0000001 GHICom 100 1/1/01 100 0
100001 0000002 ABCCom 202 3/1/01 202 100
100002 0000002 DEFCom 202 3/1/01 202 100
100003 0000002 GHICom 202 3/1/01 202 100
100001 0000003 ABCCom 303 7/1/01 303 202
100002 0000003 DEFCom 303 7/1/01 303 202
100003 0000003 GHICom 303 7/1/01 303 202
100001 0000004 ABCCom 404 8/1/01 505 303
100002 0000004 DEFCom 404 8/1/01 505 303
100003 0000004 GHICom 404 8/1/01 505 303
100001 0000005 ABCCom 505 8/1/01 505 303
100002 0000005 DEFCom 505 8/1/01 505 303
100003 0000005 GHICom 505 8/1/01 505 303
100001 0000006 ABCCom 606 11/1/01 606 505
100002 0000006 DEFCom 606 11/1/01 606 505
100003 0000006 GHICom 606 11/1/01 606 505
This case is much more complex than the previous one as we need to get the max TranCnt record and also the previous Transaction MAX Trancnt
rgds
Billy
Yes, this works.
We also have another similiar problem but this is even more complex.
TranNo TranCnt TranCo Amt TranDt
000001 0000000 ABCCom 100 1/1/01
000002 0000000 DEFCom 100 1/1/01
000003 0000000 GHICom 100 1/1/01
000001 0000001 ABCCom 100 3/1/01
000002 0000001 DEFCom 100 3/1/01
000003 0000001 GHICom 100 3/1/01
000001 0000002 ABCCom 100 7/1/01
000002 0000002 DEFCom 100 7/1/01
000003 0000002 GHICom 100 7/1/01
000001 0000004 ABCCom 100 8/1/01
000002 0000004 DEFCom 100 8/1/01
000003 0000004 GHICom 100 8/1/01
100001 0000005 ABCCom 505 8/1/01
100002 0000005 DEFCom 505 8/1/01
100003 0000005 GHICom 505 8/1/01
For this case I need to get the result as shown below and the formula for the calculation are
1. Previous Date
For each TranCompany, accumulate the Amount which has the largest TranCount
2. To Date
For each TranCompany, accumulate the Amount which has the largest TranCount
TranNo TranCnt TranCo Amt TranDt ToDt PrevDt
100001 0000001 ABCCom 100 1/1/01 100 0
100002 0000001 DEFCom 100 1/1/01 100 0
100003 0000001 GHICom 100 1/1/01 100 0
100001 0000002 ABCCom 202 3/1/01 202 100
100002 0000002 DEFCom 202 3/1/01 202 100
100003 0000002 GHICom 202 3/1/01 202 100
100001 0000003 ABCCom 303 7/1/01 303 202
100002 0000003 DEFCom 303 7/1/01 303 202
100003 0000003 GHICom 303 7/1/01 303 202
100001 0000004 ABCCom 404 8/1/01 505 303
100002 0000004 DEFCom 404 8/1/01 505 303
100003 0000004 GHICom 404 8/1/01 505 303
100001 0000005 ABCCom 505 8/1/01 505 303
100002 0000005 DEFCom 505 8/1/01 505 303
100003 0000005 GHICom 505 8/1/01 505 303
100001 0000006 ABCCom 606 11/1/01 606 505
100002 0000006 DEFCom 606 11/1/01 606 505
100003 0000006 GHICom 606 11/1/01 606 505
This case is much more complex than the previous one as we need to get the max TranCnt record and also the previous Transaction MAX Trancnt
rgds
Billy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Hi Billy,
The solution is to break your jobs into steps. As I already mentioned you need to try with User Defined SQL and split this into number of steps based on your requirement.
My advise is not to try and do the whole thing in one single SQL. Break it into serious of steps you will get the result.
Cheers
Rasi
The solution is to break your jobs into steps. As I already mentioned you need to try with User Defined SQL and split this into number of steps based on your requirement.
My advise is not to try and do the whole thing in one single SQL. Break it into serious of steps you will get the result.
Cheers
Rasi
Hi Stephan & Rasi,
Thanks for the advise, I am extracting from the source into the a temp DBMS which supports sub-select statements and then I extract it out into the staging area again for JOINs
I'm new to the ETL and perhaps this may sound like a naive question - from your experiences is it normal to pump data into a temp DBMS to perform certain complex aggregation and then extract it out again during the ETL process?
I can't see any other way than to do it in the temp DBMS?
cheers
Bernard
Thanks for the advise, I am extracting from the source into the a temp DBMS which supports sub-select statements and then I extract it out into the staging area again for JOINs
I'm new to the ETL and perhaps this may sound like a naive question - from your experiences is it normal to pump data into a temp DBMS to perform certain complex aggregation and then extract it out again during the ETL process?
I can't see any other way than to do it in the temp DBMS?
cheers
Bernard