Datastage Looping Aggregation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
hhh
Participant
Posts: 86
Joined: Tue Aug 02, 2005 7:39 am

Datastage Looping Aggregation

Post by hhh »

Hello Friends,

Need your input to create better datastage design.

Well let me tell you my input data and what i want to achieve from input data.
Input Data
Installment AcctNumber Earned Interest Total Interest
0 abc 17.09066667 1237.2
1 abc 37.88133333 1237.2
2 abc 39.86433333 1237.2
3 abc 36.71166667 1237.2
4 abc 38.60966667 1237.2

Expected OutPut :
Installment AcctNumber UnEarnedInterest
0 abc 1220.109333(Total Interest - ( sum(Earned Interest) for 0 installement)
1 abc 1159.454333(Total Interest - ( sum(Earned Interest) for 0 & 1 installements)
2 abc 1142.363667(Total Interest - ( sum(Earned Interest) for 0,1 & 2 installements)
3 abc 1105.652(Total Interest - ( sum(Earned Interest) for 0,1,2 & 3 installements)
4 abc 1067.042333(Total Interest - ( sum(Earned Interest) for 0,1,2,3 & 4 installements)

I thought that i will achieve with below design :

InputData
|
Copy Stage
|
Join Stage(This will have two link(let us say x and y links) which has same data so i will produce cross product based AcctNumber key column)
|
Transfomer/Fileter Stage( Filter the record based on x.Installment No >= y.Installment No )
|
Aggregator Stage( Will do sum( Earned Interest) based on x.Installment No , AcctNumber,Total Interest)
|
Transformer Stage( Final output will be produced based on : sum( Earned Interest)/Total Interest
|
Output Stage

All over i feel that this will be costly process as this will eat lots of scratch area as we have billions of records.
Apreciate if you could provide better idea or job design for this requirement.
Thanks in advance.
prasson_ibm
Premium Member
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
Installment number is fixed between 0 to 4 or it may increase/decrease?
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

What's the purpose of the join stage? What will it provide?

Based upon your examples, I think a simple key-change-based calculation using stage variables within a transformer is all that's necessary:

Code: Select all

Input--->Sort--->Transformer--->Output
Sort: Partition on Account, Sort on account/Installment#, generate a keyChange column
Transformer: Sum the Earned Interest in a stage variable, resetting on each keyChange=1, then subtract that summed value from the Total Interest value for the output record.

Code: Select all

svSumEI:  if keyChange=1 then input.EI else svSumEI + input.EI
Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
hhh
Participant
Posts: 86
Joined: Tue Aug 02, 2005 7:39 am

Post by hhh »

As number of installements will be veried on each account therefore join stage will merge the records based on account number and later on calculation will be applied in transformer stage as per my example.

Actually, i like your solution more with sort stage and tested the data it works for this requirement.

Lots of thanks ! Let us close this thread.
Post Reply