Performance tuning

DSGuru79 · Post by **DSGuru79** » Fri Jan 26, 2007 1:13 am

Hi Gurus,

I need some advice on performance tuning in datastage Parallel extender.
This is the scenario.

1. I am using teradata enterprise stage to read a table and do some processing using lookup stage, sort stage and aggregator stage and finally loading 30 million records into the teradata table using teradata multiload stage. (runtime is 40 min)

2. To reduce run time, I just tried to do the sorting and aggregation in teradata enterprise stage itself and then do the same lookups, then loading into teradata table using teradata multiload stage - 30 million records. (runtime is 39 min).

My assumption was the scenario 2 should run very fast when compared to scenario 1. But it was not. Why is that ? Can some one give an explanation about this ? Is my assumption wrong? ( actually I expected the scenario 2 will run in 20 or 25 min, since it avoids sort and aggregate stage in datastage)

Thanks in advance

ray.wurlod · Post by **ray.wurlod** » Fri Jan 26, 2007 5:25 am

But it still has to DO the sorting and aggregation, and the DataStage clock is still running once that request has been sent.

DSguru2B · Post by **DSguru2B** » Fri Jan 26, 2007 7:47 am

If this were server then you would see a considerable amount of difference, but px sorting is pretty sleek. As Ray noted, sorting and agregations are still being done and hence your results.

DSGuru79 · Post by **DSGuru79** » Fri Jan 26, 2007 9:17 am

Thanks.

Is there any other way I can improve the performance in parallel for this scenario?

Krazykoolrohit · Post by **Krazykoolrohit** » Fri Jan 26, 2007 10:58 am

Which is the stage that is taking relatively more time? is it extraction, laod??

DSGuru79 · Post by **DSGuru79** » Fri Jan 26, 2007 1:58 pm

Extraction takes more time than load (25 min) and load is nearly 10 minutes.

DSguru2B · Post by **DSguru2B** » Fri Jan 26, 2007 2:06 pm

Thats time is for extraction only or for extrction + sorting + aggregations? If its just extractioin then there isnt much you can do. Execpt extract in chunks using the Multiple instances, load it to sepererate file, do a cat and sort and then in your second job, do the aggregations and load.

bcarlson · Post by **bcarlson** » Mon Jan 29, 2007 5:19 pm

Is there are a reason why you are using the multiload stage vs. the enterprise stage to load the target table? We typically have seen better load performance using enterprise.

Also, have your DBA check out the queries you are using to pull the data (both the straight read and the one where the sorting and aggregation are handled within the query). Make sure that is as tuned as possible.

How many Teradata nodes? How many DataStage nodes are defined in your config file? What values are you setting for the requestedsessions and sessionsperplayer options?

Brad.