
Datastage Job performance

Posted: Tue Mar 31, 2015 11:45 pm
by adi_2_chaos
Hi All,

There is a DataStage job we use to fetch records from a source DB via the Teradata Connector stage (Bulk mode, invoking FastExport) and write them to a dataset. The volume is close to 32 million records (roughly 25 GB of data), and the job takes about 1 hour and 15 minutes daily. When I check the explain plan, the estimated time shown is less than a minute.

I also tried running the query via BTEQ on the Unix command line - the query itself took less than 3 minutes to execute, but writing the results to a data file took close to 6 hours to complete.
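For context, the BTEQ run was roughly along these lines (logon details, export path, and query are placeholders, not the real values):

    /* Placeholder logon, export path, and query */
    .LOGON tdpid/username,password;
    .EXPORT DATA FILE = /path/to/extract.dat;
    SELECT col1, col2 FROM source_db.source_table;
    .EXPORT RESET;
    .LOGOFF;
    .QUIT;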

I even tried enabling parallel synchronization with the maximum sessions set to 6, but it hasn't shown any considerable improvement either.

So if I had to guess, the query itself is performing well; what needs to be looked into is how the writing of the data can be optimized.

Is there any way this job can be optimized further?

With best regards
Sriram

Posted: Wed Apr 01, 2015 6:02 am
by Mike
I don't think there is much you can accomplish with the job design... it sounds like you're already taking advantage of the connector's bulk read capability.

An investment in faster network and faster disk might be necessary...

Mike

Posted: Wed Apr 01, 2015 7:36 am
by AshishDevassy
What's the throughput that you see with your current design?
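If you have command-line access to the engine tier, dsjob can give you the per-link row counts from the last run (the project and job names below are placeholders):

    # DETAIL report includes row counts per link;
    # divide by elapsed time to get rows/sec
    dsjob -report YourProject YourJob DETAIL

The job monitor in Director also shows rows/sec per link directly.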

Posted: Wed Apr 01, 2015 7:56 am
by PaulVL
Clone the job and replace the dataset with a Peek stage. Look at the performance and compare it to the original where you write to disk. That will at least indicate whether your disk access is the bottleneck, rather than Teradata + network.

Posted: Wed Apr 01, 2015 8:00 am
by qt_ky
In the job that writes to the Data Set stage, experiment with different numbers of nodes in the config file. How many nodes is it currently running on?
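For reference, a minimal two-node configuration file looks something like this (the fastname and resource paths are placeholders; point APT_CONFIG_FILE at whichever file you want a given run to use):

    {
        node "node1"
        {
            fastname "etl_host"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/data/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "etl_host"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/data/scratch" {pools ""}
        }
    }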

You may also want to check with your server administrator and/or storage team, if you have one, and find out exactly how fast those disks actually write.
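A quick way to get a rough number yourself is a dd write test against the file system where the dataset files land (the path is a placeholder, and conv=fdatasync is a GNU dd option on Linux, so adjust for your platform):

    # Write ~1 GB and report throughput; fdatasync forces the data to disk
    # before dd reports, so the number reflects the disk, not the page cache
    dd if=/dev/zero of=/data/datasets/ddtest.tmp bs=1M count=1024 conv=fdatasync
    rm /data/datasets/ddtest.tmp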