agregation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

agregation

Post by bart12872 »

Hello,

I want to agregate a fact table (DB2 table). Basically, it will be that :

Agregate Key (6 columns)
sum(champ1)
sum(champ2)
max(champ3)
max(champ4)
min(champ3)
min(champ4)
count (distinct champ7)
count (distinct champ8)

In fact, there is 50 SUM, 10 MIN, 10 MAX and 2 COUNT DISTINCT.
and the table contains 1,000,000,000 rows
My question is : what is the best way to do this agregation in Datastage EE, the most optimal?
Or a DB2 SQL request is preferable ?


thanks,
bart
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You need to specify the expected row count and the destination of the results before we can give a best recommendation for your SPECIFIC example.

If the result is going to load into a table within the same database instance and reduce from 1 billion rows to 1 million rows, then the argument would be strong to do it as SQL within the database.

If the result is going to load into a remote database or file and the expected row count is 1/10th the current row count, then the argument could be made to extract and summarize on the fly.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If the grouping columns are indexed that is another argument for performing the summary in the database; it can leverage the indexes to establish the groupings.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply