Datastage Job performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

adi_2_chaos
Participant
Posts: 96
Joined: Fri Apr 17, 2009 5:58 am
Location: Delhi

Datastage Job performance

Post by adi_2_chaos »

Hi All,

We have a DataStage job that fetches records from a source DB through the Teradata Connector stage (Bulk mode, which invokes FastExport) and writes them to a dataset. The volume is close to 32 million records (roughly 25 GB of data), and the job takes about 1 hour and 15 minutes daily. The explain plan, however, estimates the query at under a minute.

I also ran the query via BTEQ on the UNIX command line: the query itself executed in under 3 minutes, but writing the output to a data file took close to 6 hours to complete.

I even tried enabling Parallel Synchronization with Max sessions set to 6, but it hasn't shown any considerable improvement either.

So, if I can take a guess, the query itself is performing well. What needs to be looked into is how the writing of the data can be optimized.

Is there a way this job can be further optimized?
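For what it's worth, the figures above imply a fairly low effective rate. A quick back-of-the-envelope check (assuming 25 GB moved in the 75-minute runtime):

```python
# Effective throughput implied by the figures above: 25 GB in 1 h 15 min
gb = 25
seconds = 75 * 60
mb_per_s = gb * 1024 / seconds
print(f"{mb_per_s:.1f} MB/s")  # about 5.7 MB/s
```

That is far below what FastExport or a modern disk should sustain, which is why the write side looks suspicious.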

With best regards
Sriram
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

I don't think there is much you can accomplish with the job design... it sounds like you're already taking advantage of the connector's bulk read capability.

An investment in faster network and faster disk might be necessary...

Mike
AshishDevassy
Participant
Posts: 32
Joined: Fri Nov 07, 2014 11:18 am

Post by AshishDevassy »

What's the throughput you see with your current design? That slowness is weird.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Clone the job and replace the dataset with a Peek stage. Look at the performance and compare it to the original job that writes to disk. That will at least indicate whether it's your disk access that is the bottleneck rather than Teradata plus the network.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

In the job that writes to the Data Set stage, experiment with using different numbers of nodes in the config file. How many nodes is it currently running on?
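For reference, the node count comes from the parallel configuration file pointed to by APT_CONFIG_FILE. A minimal two-node example (the hostname and paths are placeholders; use your own server name and resource disk mounts) looks like:

```
{
  node "node1"
  {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etl_host"
    pools ""
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/node2" {pools ""}
  }
}
```

Adding nodes spreads the Data Set segment files across the resource disks, which only helps if those disks can actually sustain parallel writes.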

You may also want to check with your server administrator and/or storage team, if you have one. Find out exactly how slow those disks are writing.
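A quick way to sanity-check raw sequential write speed yourself is a dd test on the dataset's resource disk (the path below is a placeholder; point it at the actual mount the Data Set stage writes to):

```shell
# Write 256 MB sequentially. conv=fdatasync forces a flush to disk
# before dd reports its rate, so the number reflects the disk itself
# rather than the page cache. /tmp is a placeholder path.
dd if=/dev/zero of=/tmp/ds_write_test bs=1M count=256 conv=fdatasync
rm -f /tmp/ds_write_test
```

If dd reports well above the ~6 MB/s the job is achieving, the disk itself is probably not the bottleneck.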
Choose a job you love, and you will never have to work a day in your life. - Confucius