Page 1 of 1

Performance tuning

Posted: Fri Jan 26, 2007 1:13 am
by DSGuru79
Hi Gurus,

I need some advice on performance tuning in datastage Parallel extender.
This is the scenario.

1. I am using teradata enterprise stage to read a table and do some processing using lookup stage, sort stage and aggregator stage and finally loading 30 million records into the teradata table using teradata multiload stage. (runtime is 40 min)

2. To reduce run time, I just tried to do the sorting and aggregation in teradata enterprise stage itself and then do the same lookups, then loading into teradata table using teradata multiload stage - 30 million records. (runtime is 39 min).

My assumption was the scenario 2 should run very fast when compared to scenario 1. But it was not. Why is that ? Can some one give an explanation about this ? Is my assumption wrong? ( actually I expected the scenario 2 will run in 20 or 25 min, since it avoids sort and aggregate stage in datastage)

Thanks in advance

Posted: Fri Jan 26, 2007 5:25 am
by ray.wurlod
But it still has to DO the sorting and aggregation, and the DataStage clock is still running once that request has been sent.

Posted: Fri Jan 26, 2007 7:47 am
by DSguru2B
If this were server then you would see a considerable amount of difference, but px sorting is pretty sleek. As Ray noted, sorting and agregations are still being done and hence your results.

Posted: Fri Jan 26, 2007 9:17 am
by DSGuru79
Thanks.

Is there any other way I can improve the performance in parallel for this scenario?

Posted: Fri Jan 26, 2007 10:58 am
by Krazykoolrohit
Which is the stage that is taking relatively more time? is it extraction, laod??

Posted: Fri Jan 26, 2007 1:58 pm
by DSGuru79
Extraction takes more time than load (25 min) and load is nearly 10 minutes.

Posted: Fri Jan 26, 2007 2:06 pm
by DSguru2B
Thats time is for extraction only or for extrction + sorting + aggregations? If its just extractioin then there isnt much you can do. Execpt extract in chunks using the Multiple instances, load it to sepererate file, do a cat and sort and then in your second job, do the aggregations and load.

Posted: Mon Jan 29, 2007 5:19 pm
by bcarlson
Is there are a reason why you are using the multiload stage vs. the enterprise stage to load the target table? We typically have seen better load performance using enterprise.

Also, have your DBA check out the queries you are using to pull the data (both the straight read and the one where the sorting and aggregation are handled within the query). Make sure that is as tuned as possible.

How many Teradata nodes? How many DataStage nodes are defined in your config file? What values are you setting for the requestedsessions and sessionsperplayer options?

Brad.