Currently,we have an application that is bulit using Datastage.Most of the jobs do the aggregation on different levels(grouping elements) to the incoming data and loads it to the tables.The whole of the process takes around 45 mins for 120,000 records.
We are thinking of moving the aggregation operation out of etl and use PL/SQL so as to improve the performance.Pls let me know the procs/Cons involved in shifting the aggregation logic from ETL to PL/SQL with respect to performance
Datastage Aggregator operation Vs Stored Procedures(PL/SQL)
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 407
- Joined: Mon Jun 27, 2005 8:54 am
- Location: Walker, Michigan
- Contact:
Re: Datastage Aggregator operation Vs Stored Procedures(PL/S
I wonder what the current application/job designs look like and if there is way to change what you have to increase the performance of the jobs?
Here are my ideas.
- If you are going to group by A and A,B and A,B,C you could create a dataset that is sorted and partitioned on A,B,C to perform all three of these operations and specifying sort method in aggregation.
- If your system can handle it, maybe some of the jobs can run concurrently to increase parallelism. Such as, run several jobs to build the sorted datasets (if shared) concurrently. And, running several of the aggregation jobs concurrently.
- If your database has indexes that cover the sort keys consider pulling the data out sorted, use an order by and pull the data out sorted.
- If your database has indexes that cover the grouping (if so, it has indexes to suffice ordering as well), consider writing your SQL statement with the grouping.
Here are my ideas.
- If you are going to group by A and A,B and A,B,C you could create a dataset that is sorted and partitioned on A,B,C to perform all three of these operations and specifying sort method in aggregation.
- If your system can handle it, maybe some of the jobs can run concurrently to increase parallelism. Such as, run several jobs to build the sorted datasets (if shared) concurrently. And, running several of the aggregation jobs concurrently.
- If your database has indexes that cover the sort keys consider pulling the data out sorted, use an order by and pull the data out sorted.
- If your database has indexes that cover the grouping (if so, it has indexes to suffice ordering as well), consider writing your SQL statement with the grouping.