Performance Tuning

Post questions here related to DataStage Server Edition topics such as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

venkatesan
Premium Member
Posts: 10
Joined: Sat Dec 25, 2010 11:35 pm
Location: Bangalore

Performance Tuning

Post by venkatesan »

All,

My jobs are running fine; the nightly ETL refresh currently takes 6 hours to complete, but soon we are going to get 30 times more data.
However, we have only an 8-hour window for the nightly refresh.

My Question:

How can I make DataStage use more resources so jobs run faster, for example running a large job across multiple processors, without redesigning the job?

How can I find out the DataStage system utilization?

Below are my environment details:

Quad Core Processor.

40 Ghz RAM.


Thanks
Venkatesan
dsisbank
Premium Member
Posts: 50
Joined: Fri Aug 11, 2006 4:53 am

Post by dsisbank »

You can increase the parallelism in your configuration file ($APT_CONFIG_FILE).
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The poster is on Server edition, so there is no config file.
There is no straightforward answer to your question. Someone really has to see/analyze the code and find where the bottleneck is. It's a process, not a quick fix.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Nice to know the speed of your memory, but it's much more important to know the capacity of your memory. You can use Link Partitioner and Link Collector stages to split your data stream into parallel streams that can be processed in separate processes, using (mandatorily) inter-process row buffering to establish communication between processes. You may even be able to design parallelism completely using multi-instance jobs, assuming there is no necessarily sequential stage (such as writing to a text file) in the design.
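If you do go the multi-instance route, the instances can be started from a small job-control routine. Here is a minimal DataStage BASIC sketch, assuming a hypothetical multi-instance job called "LoadFact" with a "PartitionNumber" parameter that each instance uses to pick its own slice of the data; the job name, parameter name and instance count are placeholders.

Code:

      * Minimal sketch: start several invocations of a multi-instance job
      * in parallel and wait for them all to finish.
      * "LoadFact" and "PartitionNumber" are placeholder names.
      NumInstances = 4
      Dim hJob(NumInstances)

      For i = 1 To NumInstances
         * Attach each invocation as JobName.InvocationId
         hJob(i) = DSAttachJob("LoadFact.":i, DSJ.ERRFATAL)
         ErrCode = DSSetParam(hJob(i), "PartitionNumber", i)
         ErrCode = DSRunJob(hJob(i), DSJ.RUNNORMAL)
      Next i

      For i = 1 To NumInstances
         ErrCode = DSWaitForJob(hJob(i))
         Status = DSGetJobInfo(hJob(i), DSJ.JOBSTATUS)
         If Status <> DSJS.RUNOK Then
            Call DSLogWarn("Instance ":i:" finished with status ":Status, "RunInstances")
         End
         ErrCode = DSDetachJob(hJob(i))
      Next i

Each instance would then restrict its database query, or take its own Link Partitioner output, so that the instances do not overlap on the same rows.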
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
venkatesan
Premium Member
Posts: 10
Joined: Sat Dec 25, 2010 11:35 pm
Location: Bangalore

Post by venkatesan »

We have more than 1,000 jobs. Is there any other way to increase the performance of these jobs without redesigning them?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Run fewer jobs at once? Really, one cannot offer advice because we have no information at all about the resources demanded by your jobs, the resources available on your system, where any bottlenecks might be occurring, which jobs are scheduled to run simultaneously, and so on. The only information we have is your unwillingness to effect changes to jobs that might help. You could try simply enabling inter-process row buffering (yes, that's a change).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
venkatesan
Premium Member
Posts: 10
Joined: Sat Dec 25, 2010 11:35 pm
Location: Bangalore

Post by venkatesan »

Thanks, Ray. We are using EPM for our PeopleSoft ERP. This is a custom warehouse delivered by PeopleSoft, and it uses DataStage 7.x for all ETL. Since it is a delivered mart with all the jobs built in, it will be hard for us to change all the jobs, and there are more than 1,000 of them. The only changes we can make are at the project level, not at the job level.

How do I calculate the resource demand for jobs?

How do I increase the number of records processed per second? Right now a simple one-to-one copy takes 40 minutes to process 3 million rows.
The job design is simple: source DRS stage --> Transformer --> target DRS stage.
The database is Oracle 11g on both ends.
India2000
Participant
Posts: 274
Joined: Sun Aug 22, 2010 11:07 am

Post by India2000 »

Use Link Partitioner and Link Collector stages.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Your unwillingness to change is severely limiting your options. Did you try simply enabling inter-process row buffering as I suggested?

Create some jobs to determine where the bottleneck is. For example:

Code:

DRS --->  Transformer --->  SeqFile
With a constraint of @FALSE in the Transformer stage, this will show you the potential read speed you could get if you were not writing to a target at all. Remove the constraint to learn how fast the job can run writing to a text file rather than to Oracle. Experiment.
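To put numbers against each experiment, a small job-control routine run after the test job finishes can pull the row count and timestamps out of it and log the throughput. A rough sketch, assuming a hypothetical test job "TestReadSpeed" whose Transformer stage "Transformer_1" has an input link "DSLink3"; substitute your own job, stage and link names.

Code:

      * Rough sketch: report rows/sec for a finished test job.
      * "TestReadSpeed", "Transformer_1" and "DSLink3" are placeholder names.
      hJob = DSAttachJob("TestReadSpeed", DSJ.ERRFATAL)

      RowCount   = DSGetLinkInfo(hJob, "Transformer_1", "DSLink3", DSJ.LINKROWCOUNT)
      StartStamp = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)   ;* "YYYY-MM-DD HH:MM:SS"
      EndStamp   = DSGetJobInfo(hJob, DSJ.JOBLASTTIMESTAMP)

      * Convert each timestamp to seconds (internal date * 86400 + seconds past midnight)
      StartSecs = Iconv(StartStamp[1,10], "D-YMD[4,2,2]") * 86400 + Iconv(StartStamp[12,8], "MTS")
      EndSecs   = Iconv(EndStamp[1,10], "D-YMD[4,2,2]") * 86400 + Iconv(EndStamp[12,8], "MTS")

      Elapsed = EndSecs - StartSecs
      If Elapsed > 0 Then
         Call DSLogInfo(RowCount:" rows in ":Elapsed:" seconds = ":Int(RowCount / Elapsed):" rows/sec", "Throughput")
      End

      ErrCode = DSDetachJob(hJob)

Comparing the rows/sec figure from the read-only test, the write-to-text-file test and the full Oracle-to-Oracle job tells you which end of the job is the bottleneck.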
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

If you cannot make modifications to your jobs, you must identify the underlying bottleneck in your situation.


A) Look at the performance of your server during the slow period (see the sketch after this list).
- This will indicate whether you are limited on actual CPU or memory on your engine tier.
- Increasing the CPU cores on your server may help (remember that you have to increase your DS licensing).
- Increasing the memory on your server does not cost you extra product licensing.


B) Identify the sources of your data and check whether they are saturated.
- Are you limited in your ability to extract from or insert data into your databases?
- If your jobs land data to disk and you have slow disk technology, seek to improve it.


C) Look to improve your scheduling.
- Delaying a job by ten or twenty minutes may increase your job performance. You don't want to submit 1000 jobs all at once.


D) What other processes could be executing on the server that could be disabled?
- Hopefully this is a dedicated DataStage Server and no other tasks are executing that may chew up your precious CPU cycles.
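For point A, one low-effort option is to snapshot operating-system statistics into the job log while the nightly run is in flight, so you can see afterwards whether CPU, memory or I/O was the constraint. A sketch of a before/after-job subroutine follows, assuming a UNIX engine tier; vmstat and the sampling interval are just examples, so use whatever your platform provides (sar, iostat, or Windows performance counters).

Code:

      Subroutine LogServerLoad(InputArg, ErrorCode)
      * Sketch: snapshot server load into the job log.
      * Assumes a UNIX engine tier; the command is an example only.
      ErrorCode = 0

      * Three samples, five seconds apart
      Call DSExecute("UNIX", "vmstat 5 3", Output, SystemReturnCode)

      If SystemReturnCode = 0 Then
         Call DSLogInfo("Server load snapshot: ":Output, "LogServerLoad")
      End Else
         Call DSLogWarn("vmstat returned ":SystemReturnCode, "LogServerLoad")
      End

      Return

Attach it as a before- and after-job subroutine on a few representative jobs and compare the snapshots taken during a quiet period with those taken at the peak of the nightly run.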