Datastage Aggregator operation Vs Stored Procedures(PL/SQL)

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsedi
Participant
Posts: 220
Joined: Wed Jun 02, 2004 12:38 am

Datastage Aggregator operation Vs Stored Procedures(PL/SQL)

Post by dsedi »

Currently,we have an application that is bulit using Datastage.Most of the jobs do the aggregation on different levels(grouping elements) to the incoming data and loads it to the tables.The whole of the process takes around 45 mins for 120,000 records.
We are thinking of moving the aggregation operation out of etl and use PL/SQL so as to improve the performance.Pls let me know the procs/Cons involved in shifting the aggregation logic from ETL to PL/SQL with respect to performance
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

May be this helps
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Re: Datastage Aggregator operation Vs Stored Procedures(PL/S

Post by Ultramundane »

I wonder what the current application/job designs look like and if there is way to change what you have to increase the performance of the jobs?

Here are my ideas.

- If you are going to group by A and A,B and A,B,C you could create a dataset that is sorted and partitioned on A,B,C to perform all three of these operations and specifying sort method in aggregation.

- If your system can handle it, maybe some of the jobs can run concurrently to increase parallelism. Such as, run several jobs to build the sorted datasets (if shared) concurrently. And, running several of the aggregation jobs concurrently.

- If your database has indexes that cover the sort keys consider pulling the data out sorted, use an order by and pull the data out sorted.

- If your database has indexes that cover the grouping (if so, it has indexes to suffice ordering as well), consider writing your SQL statement with the grouping.
Post Reply