Page 1 of 1

agregation

Posted: Tue Jun 24, 2008 8:23 am
by bart12872
Hello,

I want to agregate a fact table (DB2 table). Basically, it will be that :

Agregate Key (6 columns)
sum(champ1)
sum(champ2)
max(champ3)
max(champ4)
min(champ3)
min(champ4)
count (distinct champ7)
count (distinct champ8)

In fact, there is 50 SUM, 10 MIN, 10 MAX and 2 COUNT DISTINCT.
and the table contains 1,000,000,000 rows
My question is : what is the best way to do this agregation in Datastage EE, the most optimal?
Or a DB2 SQL request is preferable ?


thanks,
bart

Posted: Tue Jun 24, 2008 9:27 am
by kcbland
You need to specify the expected row count and the destination of the results before we can give a best recommendation for your SPECIFIC example.

If the result is going to load into a table within the same database instance and reduce from 1 billion rows to 1 million rows, then the argument would be strong to do it as SQL within the database.

If the result is going to load into a remote database or file and the expected row count is 1/10th the current row count, then the argument could be made to extract and summarize on the fly.

Posted: Tue Jun 24, 2008 4:22 pm
by ray.wurlod
If the grouping columns are indexed that is another argument for performing the summary in the database; it can leverage the indexes to establish the groupings.