Server Job Optimization for multi-processor systems

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


psluser
Premium Member
Posts: 39
Joined: Tue Apr 22, 2008 7:00 am
Location: Pune, India


Post by psluser »

hi,

In a project, my client has an 8-processor high-end Windows-based machine running DataStage v7.5.1 Server. I would like to know whether there are any configuration parameters or job design considerations to take into account in order to use the full capability of a multiprocessor system.

Regards,
Manish
Rubu
Premium Member
Posts: 82
Joined: Sun Feb 27, 2005 9:09 pm
Location: Bangalore

Post by Rubu »

In Server Edition you may not be able to improve performance merely by changing parameters.

To take advantage of the number of processors, you may decide to partition the data using the Link Partitioner stage.
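The Link Partitioner stage offers several partitioning algorithms (round robin and modulus among them). The Python below is not DataStage code; it is a minimal sketch, assuming rows are plain dicts and a hypothetical integer key column `cust_id`, of how those two rules spread rows across N output links so each downstream copy of the logic gets its own share.

```python
from typing import Dict, List, Sequence

# Hypothetical row representation: column name -> value.
Row = Dict[str, object]


def round_robin(rows: Sequence[Row], n: int) -> List[List[Row]]:
    """Deal rows to n partitions in turn; gives even sizes, ignores key values."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def modulus(rows: Sequence[Row], n: int, key: str) -> List[List[Row]]:
    """Send each row to partition (integer key mod n), so equal keys stay together."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[int(row[key]) % n].append(row)
    return parts


data = [{"cust_id": k, "amount": k * 10} for k in range(1, 11)]
for p, part in enumerate(modulus(data, 4, "cust_id")):
    print(p, [r["cust_id"] for r in part])
```

Round robin balances row counts best; modulus is the choice when rows with the same key must land on the same link (for example before an aggregation).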

By enabling inter-process (IPC) row buffering and increasing the buffer size, you can take advantage of pipeline processing.
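With inter-process row buffering, each active stage runs as its own process and rows flow between stages through a fixed-size buffer, so stages work on different rows at the same time. The sketch below only mimics that structure in Python: threads stand in for stage processes, `queue.Queue(maxsize=...)` stands in for the row buffer, and the stage names and the doubling "transform" are hypothetical. Python threads share one interpreter lock, so this shows the buffering pattern, not real multi-CPU speed-up.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # sentinel marking the end of the row stream


def extract(rows: Iterable[int], out_q: queue.Queue) -> None:
    """Upstream stage: put() blocks whenever the bounded buffer is full."""
    for row in rows:
        out_q.put(row)
    out_q.put(_END)


def transform(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Middle stage: can work on one row while extract is producing the next."""
    while True:
        row = in_q.get()
        if row is _END:
            out_q.put(_END)
            return
        out_q.put(row * 2)  # stand-in for real derivation logic


def run_pipeline(rows: Iterable[int], buffer_size: int = 128) -> List[int]:
    """Wire extract -> transform -> load with a bounded buffer between each pair."""
    q1: queue.Queue = queue.Queue(maxsize=buffer_size)
    q2: queue.Queue = queue.Queue(maxsize=buffer_size)
    stages = [
        threading.Thread(target=extract, args=(rows, q1)),
        threading.Thread(target=transform, args=(q1, q2)),
    ]
    for t in stages:
        t.start()
    loaded: List[int] = []  # the "load" stage runs in the calling thread
    while True:
        row = q2.get()
        if row is _END:
            break
        loaded.append(row)
    for t in stages:
        t.join()
    return loaded
```

A larger buffer lets a fast upstream stage run further ahead of a slow downstream one before it has to wait, which is the effect of raising the buffer size in the job.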
Regards
Palas
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard. Sometimes the best optimization is not to be found within one job but in the balance of jobs that you are running at the same time. Investigate which jobs are independent of each other and run these in parallel. In combination with inter-process row buffering in jobs that have more than one active stage you should be able to find some combination that fully utilizes the available processing power of your eight CPUs.

Be careful not to increase the demand beyond what can be supplied, however, since you then end up degrading machine performance as too many processes compete for time slices and need to spend unproductive time paging their working sets in and out of memory.
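Finding "which jobs are independent of each other" and capping how many run at once can be planned with a simple dependency walk. The sketch below is an assumption-laden illustration, not a DataStage feature: the job names are hypothetical, `deps` maps each job to the jobs it must wait for, and `max_parallel` is a cap you would tune (for example near the CPU count) so the machine is not oversubscribed. Actually launching each batch (from a Job Sequence, for instance) is outside its scope.

```python
from typing import Dict, List, Set


def waves(deps: Dict[str, Set[str]], max_parallel: int) -> List[List[str]]:
    """Group jobs into batches whose members do not depend on each other.

    Each batch holds at most max_parallel jobs and may start once every
    earlier batch has finished.
    """
    remaining = {job: set(pre) for job, pre in deps.items()}
    # Prerequisites that are not listed as keys still have to be scheduled.
    for pre in deps.values():
        for p in pre:
            remaining.setdefault(p, set())
    done: Set[str] = set()
    batches: List[List[str]] = []
    while remaining:
        ready = sorted(j for j, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        batch = ready[:max_parallel]  # respect the concurrency cap
        batches.append(batch)
        done.update(batch)
        for j in batch:
            del remaining[j]
    return batches


deps = {
    "LoadCustomers": set(),
    "LoadProducts": set(),
    "LoadRegions": set(),
    "LoadOrders": {"LoadCustomers", "LoadProducts"},
    "BuildSales": {"LoadOrders"},
}
for i, batch in enumerate(waves(deps, max_parallel=2)):
    print(i, batch)
```

This greedy "wave" plan is deliberately simple: a job waits for its whole previous batch rather than just its own prerequisites, which can leave CPUs idle, but it makes the upper bound on concurrent jobs explicit.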
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

I'm being asked for sizing advice for a new box just now.

Our current system is an IBM pSeries P640 server with 4x375MHz processors and 8GB of RAM. How well would a 2-core 4.2GHz Power6 processor with 8GB of RAM stack up against it? We will be running perhaps 4 or 5 jobs in parallel. Would more processors or more RAM be the bigger benefit?
Phil Hibbs | Capgemini
Technical Consultant
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Question: what is your current overall CPU load while running jobs? If it isn't consistently high (i.e. over 80%), then your hardware question shouldn't revolve around CPU and memory alone but needs to include your I/O subsystem as well. If both machines have local disks, are the disks the same type, and do you have identical numbers of controllers with the same bandwidth?
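"Consistently high" can be made concrete once you have CPU-utilisation samples taken while the jobs run, however you collect them (Windows Performance Monitor, `vmstat` on AIX, and so on). The helper below is a hypothetical sketch: the 80% threshold comes from the post, while requiring 90% of samples to exceed it is an assumed reading of "consistently" that you would adjust.

```python
from typing import Sequence


def cpu_consistently_high(samples: Sequence[float], threshold: float = 80.0,
                          min_fraction: float = 0.9) -> bool:
    """Return True if at least min_fraction of the CPU-utilisation samples
    (in percent) are above threshold, i.e. the run looks CPU-bound."""
    if not samples:
        raise ValueError("need at least one sample")
    above = sum(1 for s in samples if s > threshold)
    return above / len(samples) >= min_fraction


busy = [85, 90, 95, 88, 92]
spiky = [85, 40, 95, 30, 92]
print(cpu_consistently_high(busy), cpu_consistently_high(spiky))
```

If the answer is False, more or faster CPUs are unlikely to be the fix, and the disk controllers and I/O bandwidth are the next thing to compare.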
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You need to measure the benefit of running your job with partitioned parallelism designs versus one single job processing all of the data. More partitions translate into more parallel pipelines, which benefit greatly from having more CPUs. If your job runs at 10K rows/sec and finishes in 10 minutes, then 4 instances each running at 8K rows/sec may finish in 5 minutes. Weigh the added system overhead against the reduced runtime and judge whether the time saved is worth the overhead. Alternatively, you may be able to run other jobs doing other things at the same time and live with the 10-minute runtime, because you're getting more work done overall.
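The arithmetic behind that example can be checked directly. This sketch computes only the ideal elapsed time, assuming the rows split evenly across instances and ignoring partitioning, instance startup and contention overhead; the function name is hypothetical.

```python
def runtime_minutes(total_rows: int, rows_per_sec_per_instance: float,
                    instances: int = 1) -> float:
    """Ideal elapsed minutes: rows / aggregate throughput, with zero overhead."""
    return total_rows / (rows_per_sec_per_instance * instances) / 60.0


total = 10_000 * 60 * 10  # 10K rows/sec for 10 minutes = 6,000,000 rows
single = runtime_minutes(total, 10_000)
four = runtime_minutes(total, 8_000, instances=4)
print(f"single job: {single:.1f} min, 4 instances: {four:.2f} min")
```

Four instances at 8K rows/sec give 32K rows/sec in aggregate, so the ideal is a little over 3 minutes; the 5 minutes quoted above leaves room for the real overhead that the measurement is meant to expose.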

If you design jobs that aren't easily tunable or measurable, you can't see where your opportunities to tune lie. Arnd's suggestion of looking at your CPU utilization is the best approach in an environment where you can't measure anything well. If your CPUs are not bound, look at your disk utilization: maybe the disk subsystem can't page data and memory quickly enough, so the CPUs are underutilized. Or maybe your CPUs are idle because of network waits or remote database bottlenecks.

If you're not using your CPUs, then either something else is preventing them from running flat out or you're not running enough things simultaneously. Both are solved through job design.
Kenneth Bland

muruganr117
Participant
Posts: 40
Joined: Sun Jan 21, 2007 1:52 pm
Location: Chennai


Post by muruganr117 »

Hi,

I am working with DataStage 7.1 Server Edition. We have implemented a similar multiprocessing technique using the Allow Multiple Instance option in Job Properties, and we utilise 4 CPUs at around 80-90%.

To implement the technique, however, we need to partition the source data, as Rubu said. There may be a lot of design changes, depending on the complexity involved.
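One common way to partition the source for a multi-instance job is to give each instance its own slice in the source query. The sketch below assumes two hypothetical job parameters (an instance number and an instance count) that would be substituted into the source stage's WHERE clause, a hypothetical non-negative integer key `ORDER_ID`, and Oracle-style `MOD` syntax; other databases spell the modulo differently, and for negative keys SQL `MOD` and Python `%` can disagree.

```python
from typing import List


def instance_predicates(key_column: str, instance_count: int) -> List[str]:
    """One WHERE-clause predicate per job instance, so every source row with a
    non-negative integer key is read by exactly one instance (Oracle-style MOD)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return [f"MOD({key_column}, {instance_count}) = {i}" for i in range(instance_count)]


def owner(key: int, instance_count: int) -> int:
    """Which instance reads a given non-negative key under the same rule."""
    return key % instance_count


for i, pred in enumerate(instance_predicates("ORDER_ID", 4)):
    print(f"instance {i}: SELECT ... WHERE {pred}")
```

Because the split happens in the database, each instance reads only its own rows, and the instances can then run side by side on different CPUs; any step that needs all the data together (a final aggregate or load) still has to be designed separately.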