Performance Tuning

Post questions here related to DataStage Server Edition topics such as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

venkatesan
Premium Member
Posts: 10
Joined: Sat Dec 25, 2010 11:35 pm
Location: Bangalore

Performance Tuning

Post by venkatesan »

All,

My jobs are running fine; the nightly ETL refresh currently takes 6 hours to complete, but soon we are going to get 30 times more data.
However, we have only an 8-hour window for the nightly refresh.

My Question:

How can I make DataStage use more resources so jobs run faster, for example running a large job across multiple processors, without redesigning the job?

How can I find out the DataStage system utilization?

Below are my environment details:

Quad Core Processor.

40 Ghz RAM.


Thanks
Venkatesan
dsisbank
Premium Member
Posts: 50
Joined: Fri Aug 11, 2006 4:53 am

Post by dsisbank »

You can increase the parallelism in your configuration file ($APT_CONFIG_FILE).
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The poster is on Server edition, so there is no config file.
There is no straightforward answer to your question. Someone really has to see/analyze the code and find where the bottleneck is. It's a process, not a quick fix.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Nice to know the speed of your memory, but it's much more important to know the capacity of your memory. You can use Link Partitioner and Link Collector stages to split your data stream into parallel streams that can be processed in separate processes, using (mandatorily) inter-process row buffering to establish communication between processes. You may even be able to design parallelism completely using multi-instance jobs, assuming there is no necessarily sequential stage (such as writing to a text file) in the design.
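If you do go the multi-instance route, the instances can be started from a small job-control routine. Here is a minimal DataStage BASIC sketch, assuming a hypothetical multi-instance job called "LoadFact" with a "PartitionNumber" parameter that each instance uses to pick its own slice of the data; the job name, parameter name and instance count are placeholders.

Code:

      * Minimal sketch: start several invocations of a multi-instance job
      * in parallel and wait for them all to finish.
      * "LoadFact" and "PartitionNumber" are placeholder names.
      NumInstances = 4
      Dim hJob(NumInstances)

      For i = 1 To NumInstances
         * Attach each invocation as JobName.InvocationId
         hJob(i) = DSAttachJob("LoadFact.":i, DSJ.ERRFATAL)
         ErrCode = DSSetParam(hJob(i), "PartitionNumber", i)
         ErrCode = DSRunJob(hJob(i), DSJ.RUNNORMAL)
      Next i

      For i = 1 To NumInstances
         ErrCode = DSWaitForJob(hJob(i))
         Status = DSGetJobInfo(hJob(i), DSJ.JOBSTATUS)
         If Status <> DSJS.RUNOK Then
            Call DSLogWarn("Instance ":i:" finished with status ":Status, "RunInstances")
         End
         ErrCode = DSDetachJob(hJob(i))
      Next i

Each instance would then restrict its database query, or take its own Link Partitioner output, so that the instances do not overlap on the same rows.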
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
venkatesan
Premium Member
Posts: 10
Joined: Sat Dec 25, 2010 11:35 pm
Location: Bangalore

Post by venkatesan »

We have more than 1,000 jobs. Is there any other way to increase the performance of these jobs without redesigning them?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Run fewer jobs at once? Really, one cannot offer advice because we have no information at all about the resources demanded by your jobs, the resources available on your system, where any bottlenecks might be occurring, which jobs are scheduled to run simultaneously, and so on. The only information we have is your unwillingness to effect changes to jobs that might help. You could try simply enabling inter-process row buffering (yes, that's a change).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
venkatesan
Premium Member
Posts: 10
Joined: Sat Dec 25, 2010 11:35 pm
Location: Bangalore

Post by venkatesan »

Thanks, Ray. We are using EPM for our PeopleSoft ERP. This is a custom warehouse delivered by PeopleSoft, and it uses DataStage 7.x for all ETL. Since it is a delivered mart with all the jobs built in, it will be hard for us to change all the jobs, and there are more than 1,000 of them. The only changes we can make are at the project level, not at the job level.

How do I calculate the resource demand for jobs?

How do I increase the number of records processed per second? Right now a simple one-to-one copy takes 40 minutes to process 3 million rows.
The job design is simple: source DRS stage --> Transformer --> target DRS stage.
The database is Oracle 11g on both ends.
India2000
Participant
Posts: 274
Joined: Sun Aug 22, 2010 11:07 am

Post by India2000 »

Use Link Partitioner and Link Collector stages.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Your unwillingness to change is severely limiting your options. Did you try simply enabling inter-process row buffering as I suggested?

Create some jobs to determine where the bottleneck is. For example:

Code:

DRS --->  Transformer --->  SeqFile
With a constraint of @FALSE in the Transformer stage, this will show you the potential read speed you could get if you were not writing to a target at all. Remove the constraint to learn how fast the job can run writing to a text file rather than to Oracle. Experiment.
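To put numbers against each experiment, a small job-control routine run after the test job finishes can pull the row count and timestamps out of it and log the throughput. A rough sketch, assuming a hypothetical test job "TestReadSpeed" whose Transformer stage "Transformer_1" has an input link "DSLink3"; substitute your own job, stage and link names.

Code:

      * Rough sketch: report rows/sec for a finished test job.
      * "TestReadSpeed", "Transformer_1" and "DSLink3" are placeholder names.
      hJob = DSAttachJob("TestReadSpeed", DSJ.ERRFATAL)

      RowCount   = DSGetLinkInfo(hJob, "Transformer_1", "DSLink3", DSJ.LINKROWCOUNT)
      StartStamp = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)   ;* "YYYY-MM-DD HH:MM:SS"
      EndStamp   = DSGetJobInfo(hJob, DSJ.JOBLASTTIMESTAMP)

      * Convert each timestamp to seconds (internal date * 86400 + seconds past midnight)
      StartSecs = Iconv(StartStamp[1,10], "D-YMD[4,2,2]") * 86400 + Iconv(StartStamp[12,8], "MTS")
      EndSecs   = Iconv(EndStamp[1,10], "D-YMD[4,2,2]") * 86400 + Iconv(EndStamp[12,8], "MTS")

      Elapsed = EndSecs - StartSecs
      If Elapsed > 0 Then
         Call DSLogInfo(RowCount:" rows in ":Elapsed:" seconds = ":Int(RowCount / Elapsed):" rows/sec", "Throughput")
      End

      ErrCode = DSDetachJob(hJob)

Comparing the rows/sec figure from the read-only test, the write-to-text-file test and the full Oracle-to-Oracle job tells you which end of the job is the bottleneck.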
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

If you cannot make modifications to your jobs, you must identify the underlying bottleneck in your situation.


A) Look at the performance of your server during the slow period (see the sketch after this list).
- This will indicate whether you are limited on actual CPU or memory on your engine tier.
- Increasing the CPU cores on your server may help (remember that you have to increase your DS licensing).
- Increasing the memory on your server does not cost you extra product licensing.


B) Identify the sources of your data and check whether they are saturated.
- Are you limited in your ability to extract from or insert data into your databases?
- If your jobs land data to disk and you have slow disk technology, seek to improve it.


C) Look to improve your scheduling.
- Delaying a job by ten or twenty minutes may increase your job performance. You don't want to submit 1000 jobs all at once.


D) What other processes could be executing on the server that could be disabled?
- Hopefully this is a dedicated DataStage Server and no other tasks are executing that may chew up your precious CPU cycles.
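For point A, one low-effort option is to snapshot operating-system statistics into the job log while the nightly run is in flight, so you can see afterwards whether CPU, memory or I/O was the constraint. A sketch of a before/after-job subroutine follows, assuming a UNIX engine tier; vmstat and the sampling interval are just examples, so use whatever your platform provides (sar, iostat, or Windows performance counters).

Code:

      Subroutine LogServerLoad(InputArg, ErrorCode)
      * Sketch: snapshot server load into the job log.
      * Assumes a UNIX engine tier; the command is an example only.
      ErrorCode = 0

      * Three samples, five seconds apart
      Call DSExecute("UNIX", "vmstat 5 3", Output, SystemReturnCode)

      If SystemReturnCode = 0 Then
         Call DSLogInfo("Server load snapshot: ":Output, "LogServerLoad")
      End Else
         Call DSLogWarn("vmstat returned ":SystemReturnCode, "LogServerLoad")
      End

      Return

Attach it as a before- and after-job subroutine on a few representative jobs and compare the snapshots taken during a quiet period with those taken at the peak of the nightly run.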