Aggregation

Post by **bart12872** » Tue Aug 19, 2008 7:15 am

Hi,

I have a problem with an aggregation. I need to aggregate 400 billion rows down to 2 billion rows and compute 20 indicators.
Performance is poor in DataStage because I must use the sort mode (sorting 400 billion rows, hmm!).
So I decided to extract and transform my data, load the rows into DB2, and let DB2 do the aggregation. It's better, but still too slow.

Additional info: each indicator is defined at row level, so I can't pre-aggregate the data during the DB2 extraction. I need to extract all the rows, compute the indicators, and only then aggregate them with sum functions.
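A minimal Python sketch of the two steps described above: compute the indicators for each row, then sum them per group. The row layout `(group_key, amount, quantity)`, the function names and the three indicators (standing in for the real 20) are hypothetical; the post does not give the actual schema or indicator definitions. This version aggregates hash-style, keeping one accumulator per group key in memory.

```python
from typing import Iterable

# Hypothetical row layout: (group_key, amount, quantity).
Row = tuple[str, float, int]


def indicators(row: Row) -> list[float]:
    """Compute per-row indicators (three illustrative ones standing in for 20)."""
    _, amount, quantity = row
    return [
        amount,                        # indicator 1: the raw amount
        float(quantity),               # indicator 2: the quantity
        1.0 if amount > 100 else 0.0,  # indicator 3: 1 for "large" rows, else 0
    ]


def aggregate(rows: Iterable[Row]) -> dict[str, list[float]]:
    """Sum every indicator per group key, keeping one accumulator per key in memory."""
    totals: dict[str, list[float]] = {}
    for row in rows:
        values = indicators(row)
        acc = totals.setdefault(row[0], [0.0] * len(values))
        for i, value in enumerate(values):
            acc[i] += value
    return totals


rows = [("a", 50.0, 1), ("a", 150.0, 2), ("b", 200.0, 3)]
print(aggregate(rows))
# {'a': [200.0, 3.0, 1.0], 'b': [200.0, 3.0, 1.0]}
```

With around 2 billion output groups, a hash-style accumulator like this would have to hold every group in memory at once, which is presumably why the sort mode is needed here.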

Has anyone faced this problem? Do you have any ideas for improving performance in this situation?

Can the Restructure stages help me? I mean, for example, if I create vectors of indicators and use the Combine Records stage?

Thanks,
Martin.

Post by **ArndW** » Tue Aug 19, 2008 7:20 am

I would pre-sort the data before the Aggregator stage in DataStage and tell the stage that the data is already sorted; the job will then just fly through the data. If the same data is to be used more than once, sort it into a dataset once and reuse that.
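Why pre-sorted input makes the aggregation cheap can be sketched in Python: when rows arrive ordered by the grouping key, each group can be summed and emitted as soon as the key changes, so only one group's totals are ever in memory. The function name and the `(key, indicator_vector)` row shape are assumptions for illustration; this is not DataStage code, just the same streaming idea.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator

Row = tuple[str, list[float]]  # hypothetical shape: (group_key, indicator values)


def sorted_aggregate(rows: Iterable[Row]) -> Iterator[Row]:
    """Sum indicator vectors per key, assuming rows are already sorted by key.

    Only the current group's totals are held in memory, and each group is
    yielded as soon as the key changes, so the number of groups does not
    affect memory use.
    """
    previous = None
    for key, group in groupby(rows, key=itemgetter(0)):
        # groupby only merges *adjacent* equal keys, so unsorted input would
        # silently produce split groups; raise if a key goes backwards instead.
        if previous is not None and key < previous:
            raise ValueError(f"input is not sorted by key (saw {key!r} after {previous!r})")
        previous = key
        totals = None
        for _, values in group:
            totals = list(values) if totals is None else [t + v for t, v in zip(totals, values)]
        yield key, totals


rows = [("a", [1.0, 2.0]), ("a", [3.0, 4.0]), ("b", [5.0, 6.0])]
for key, totals in sorted_aggregate(rows):
    print(key, totals)
# a [4.0, 6.0]
# b [5.0, 6.0]
```

The sort itself remains the expensive step, which is why sorting once into a dataset pays off when the same data feeds several jobs.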

The more nodes you sort into and use at runtime, the more throughput you will get (this, of course, depends upon your hardware).
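A sketch of why more nodes help, again in Python with hypothetical names: rows are hash-partitioned on the grouping key so that every group lands on exactly one partition, and each partition is then sorted and aggregated independently. The partitions run one after another here; in a parallel job they would run concurrently, one per node. `zlib.crc32` is used only as a stable hash and is an assumption; the partitioner DataStage actually uses may differ.

```python
import zlib
from itertools import groupby
from operator import itemgetter

Row = tuple[str, list[float]]  # hypothetical shape: (group_key, indicator values)


def partition(rows: list[Row], num_nodes: int) -> list[list[Row]]:
    """Hash-partition on the key so all rows of one group land on the same node."""
    parts: list[list[Row]] = [[] for _ in range(num_nodes)]
    for key, values in rows:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        parts[node].append((key, values))
    return parts


def aggregate_partition(part: list[Row]) -> list[Row]:
    """Sort one partition by key, then sum the indicator vectors of each group."""
    part = sorted(part, key=itemgetter(0))
    result = []
    for key, group in groupby(part, key=itemgetter(0)):
        vectors = [values for _, values in group]
        result.append((key, [sum(column) for column in zip(*vectors)]))
    return result


def parallel_aggregate(rows: list[Row], num_nodes: int = 4) -> list[Row]:
    """Aggregate each partition independently and concatenate the results.

    The partitions share no groups, so they could run concurrently (one per
    node); here they run sequentially for simplicity.
    """
    results: list[Row] = []
    for part in partition(rows, num_nodes):
        results.extend(aggregate_partition(part))
    return sorted(results, key=itemgetter(0))  # sorted only for stable output


rows = [("a", [1.0, 2.0]), ("b", [5.0, 6.0]), ("a", [3.0, 4.0])]
print(parallel_aggregate(rows, num_nodes=3))
# [('a', [4.0, 6.0]), ('b', [5.0, 6.0])]
```

Because each group is confined to one partition, no cross-node merge step is needed, and each node sorts only its own share of the rows.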

If execution time is really important, look into a product such as SyncSort. I would be surprised if a database could do this relatively straightforward aggregation faster than a parallel (PX) job.