Performance tuning

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
DSGuru79
Charter Member
Charter Member
Posts: 12
Joined: Thu Nov 09, 2006 8:57 pm

Performance tuning

Post by DSGuru79 »

Hi Gurus,

I need some advice on performance tuning in datastage Parallel extender.
This is the scenario.

1. I am using teradata enterprise stage to read a table and do some processing using lookup stage, sort stage and aggregator stage and finally loading 30 million records into the teradata table using teradata multiload stage. (runtime is 40 min)

2. To reduce run time, I just tried to do the sorting and aggregation in teradata enterprise stage itself and then do the same lookups, then loading into teradata table using teradata multiload stage - 30 million records. (runtime is 39 min).

My assumption was the scenario 2 should run very fast when compared to scenario 1. But it was not. Why is that ? Can some one give an explanation about this ? Is my assumption wrong? ( actually I expected the scenario 2 will run in 20 or 25 min, since it avoids sort and aggregate stage in datastage)

Thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

But it still has to DO the sorting and aggregation, and the DataStage clock is still running once that request has been sent.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If this were server then you would see a considerable amount of difference, but px sorting is pretty sleek. As Ray noted, sorting and agregations are still being done and hence your results.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSGuru79
Charter Member
Charter Member
Posts: 12
Joined: Thu Nov 09, 2006 8:57 pm

Post by DSGuru79 »

Thanks.

Is there any other way I can improve the performance in parallel for this scenario?
DSGuru79
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

Which is the stage that is taking relatively more time? is it extraction, laod??
DSGuru79
Charter Member
Charter Member
Posts: 12
Joined: Thu Nov 09, 2006 8:57 pm

Post by DSGuru79 »

Extraction takes more time than load (25 min) and load is nearly 10 minutes.
DSGuru79
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Thats time is for extraction only or for extrction + sorting + aggregations? If its just extractioin then there isnt much you can do. Execpt extract in chunks using the Multiple instances, load it to sepererate file, do a cat and sort and then in your second job, do the aggregations and load.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Is there are a reason why you are using the multiload stage vs. the enterprise stage to load the target table? We typically have seen better load performance using enterprise.

Also, have your DBA check out the queries you are using to pull the data (both the straight read and the one where the sorting and aggregation are handled within the query). Make sure that is as tuned as possible.

How many Teradata nodes? How many DataStage nodes are defined in your config file? What values are you setting for the requestedsessions and sessionsperplayer options?

Brad.
Post Reply