Parallel Vs Server Performance Test - Unexpected Results

Post questions here relating to DataStage Enterprise/PX Edition, in areas such as parallel job design, parallel datasets, BuildOps, wrappers, etc.

Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

An in-link sort in the target stage could be one reason. Also, when both the source and the target are stages that run only in sequential mode, the benefit of parallelism is lost. Moreover, startup times are always longer for parallel jobs and increase with the number of stages and the number of nodes, as your results show. Even so, the increase in your run times looks abnormal to me. Check the job log to compare the startup times with the run times, and try a Data Set as both source and target in the parallel job.
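The point about sequential source and target stages is the idea behind Amdahl's law: the part of a job that runs sequentially caps the speedup no matter how many nodes are added, while the startup cost keeps growing. Below is a minimal Python sketch, assuming a hypothetical cost model in which startup time is a fixed cost per stage per node. The function names and parameter values are illustrative only, not DataStage measurements or APIs.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Best-case speedup when serial_fraction of the work cannot be parallelised."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


def job_runtime(work_s: float, serial_fraction: float, nodes: int,
                stages: int, startup_per_process_s: float) -> float:
    """Hypothetical model: one process per stage per node, each adding a fixed
    startup cost, plus the processing time reduced by the Amdahl speedup."""
    startup_s = startup_per_process_s * stages * nodes
    return startup_s + work_s / amdahl_speedup(serial_fraction, nodes)


if __name__ == "__main__":
    # Assumed job: 80% of the work is sequential, 5 stages, 0.5 s startup per process.
    for n in (1, 2, 4):
        print(n, "node(s):", round(job_runtime(10.0, 0.8, n, 5, 0.5), 2), "s")
```

Under these assumed numbers the runtime rises as nodes are added. With a small serial fraction and a large workload, the same model favours more nodes.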
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

What is the mode of operation of the Aggregator stage: Hash or Sort? Try the Sort mode.
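The difference between the two modes can be sketched in Python. In Hash mode the Aggregator keeps a running result for every group in memory. In Sort mode it expects input already sorted on the grouping keys and can emit each group as soon as the key changes, so it holds only one group at a time. The functions below are hypothetical stand-ins for those two strategies, not the stage's actual implementation.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows, key, value):
    """Hash-style: one in-memory running total per distinct key; input order does not matter."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def sort_aggregate(rows, key, value):
    """Sort-style: assumes rows are already sorted on `key`; emits each group
    when the key changes, so only the current group is held in memory."""
    for group_key, group in groupby(rows, key=itemgetter(key)):
        yield group_key, sum(row[value] for row in group)


if __name__ == "__main__":
    rows = [{"cust": "A", "amt": 5}, {"cust": "B", "amt": 3}, {"cust": "A", "amt": 2}]
    print(hash_aggregate(rows, "cust", "amt"))
    print(list(sort_aggregate(sorted(rows, key=itemgetter("cust")), "cust", "amt")))
    # Unsorted input splits group "A" into two partial results.
    print(list(sort_aggregate(rows, "cust", "amt")))
```

The last call shows why Sort mode depends on correctly sorted input: without it, one logical group comes out as several partial groups.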

Remove the 'Perform sort' option in the target Sequential File stage, use the sorted merge collection method, and verify the results.
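A sorted merge collector combines the partitions into one stream by repeatedly taking the record with the smallest collection key from the heads of all partitions. That only yields sorted output when every partition is already sorted on that key, which may be why the output reported later in the thread is still unsorted once 'Perform sort' is removed. Below is a minimal Python sketch using the standard library's `heapq.merge` as a stand-in for the collector; `sort_merge_collect` is a hypothetical name, not DataStage code.

```python
import heapq


def sort_merge_collect(partitions, key=None):
    """Merge partitions into a single list, taking the smallest head record each time.
    The result is fully sorted only if each partition is already sorted on `key`."""
    return list(heapq.merge(*partitions, key=key))


if __name__ == "__main__":
    sorted_parts = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
    print(sort_merge_collect(sorted_parts))    # sorted partitions give sorted output

    unsorted_parts = [[7, 1, 4], [2, 5, 8]]
    print(sort_merge_collect(unsorted_parts))  # 7 still precedes 1, so the output is not sorted
```

The merge never reorders records within a partition, so an unsorted partition carries its disorder straight into the collected stream.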

What is your server load while executing these tests?
basav_ds
Participant
Posts: 24
Joined: Sun Nov 11, 2007 11:19 pm
Location: Mumbai

Post by basav_ds »

The Aggregator stage is running in parallel execution mode, and the partition type is 'SAME'.

I tried removing the 'Perform sort' option in the target Sequential File stage and used the sorted merge collection method, but the output is not sorted.

The server load is zero during the tests.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

What you are seeing is the overhead of partitioning and repartitioning your data, which is why four nodes take longer than two. I am surprised that a single node is slower than a server job; my own benchmark shows parallel jobs sorting and aggregating many times faster than server jobs. During the parallel sort, some data is written out to temporary files (I think in the temp directory), and it looks like you have inefficient file I/O. If you are on version 8, I would switch on some of the job monitoring features to see what is slowing your jobs down.
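The temporary-file point can be illustrated with a textbook external merge sort: when the data does not fit in the sort's memory budget, sorted runs are written to temporary files and then merged, so every spilled record is written to disk and read back. Below is a minimal Python sketch, assuming a hypothetical `max_in_memory` row limit in place of the sort's real memory setting. It shows the general technique, not how the DataStage sort is implemented, and for simplicity it collects the final merged result in a list.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory):
    """Sort integers by building sorted runs of at most max_in_memory values,
    spilling each run to a temporary file, then merging the run files.
    Returns (sorted_values, number_of_run_files_written)."""
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it out as one run file.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as run_file:
            run_file.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    # Merge phase: every spilled value is read back from disk.
    run_files = []
    try:
        for path in run_paths:
            run_files.append(open(path))
        merged = [int(line) for line in heapq.merge(*run_files, key=int)]
    finally:
        for f in run_files:
            f.close()
        for path in run_paths:
            os.remove(path)
    return merged, len(run_paths)


if __name__ == "__main__":
    print(external_sort([5, 3, 9, 1, 7, 2, 8], 3))
```

Lowering `max_in_memory` produces more run files for the same input, and each record in them costs a write and a read, which is where slow file I/O on the temporary directory would show up.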