Any strategy for deciding whether to use a parallel job or a server job?

Post questions here related to DataStage Enterprise/PX Edition, in areas such as parallel job design, parallel datasets, BuildOps, Wrappers, etc.


umamahes
Premium Member
Posts: 110
Joined: Tue Jul 04, 2006 9:08 pm


Post by umamahes

Are there any criteria for deciding whether to use a parallel job or a server job?
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s

Welcome Aboard :D

Available resources (money) and the amount of data to be processed within the batch window should be the main criteria you look at.
Impossible doesn't mean 'it is not possible'; it actually means... 'NOBODY HAS DONE IT SO FAR'
sri1dhar
Charter Member
Posts: 54
Joined: Mon Nov 03, 2003 3:57 pm

Post by sri1dhar

In my personal experience, parallel jobs may perform better, but they take longer to develop. There are several bugs, and the issues just keep coming. Server Edition is much more stable. Still, we decided to stick with parallel jobs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod

For single-row jobs (such as select max(col) from table), a server job is probably faster, because its startup time is shorter.

Using parallelism techniques, server jobs can handle surprisingly large volumes of data. However, the fact that parallel jobs scale automatically is a big plus.
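The startup-versus-scaling trade-off above can be made concrete with a back-of-the-envelope model. This is only an illustrative sketch, not anything DataStage provides: it assumes a job's runtime is a fixed startup cost plus rows divided by aggregate throughput, with perfectly even partitioning, and every number in it is hypothetical rather than a measured figure. The names JobProfile, break_even_rows and fits_batch_window are made up for this example. It estimates the row count above which a parallel job beats a server job, and whether a job fits a batch window (kumar_s's criterion).

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Hypothetical linear cost model for one job: fixed startup + per-row work."""
    startup_s: float            # seconds spent before the first row is processed
    rows_per_s_per_node: float  # assumed throughput of a single node/partition
    nodes: int = 1              # degree of parallelism (1 for a plain server job)

    def throughput(self) -> float:
        # Assumes perfectly even partitioning across nodes (an idealisation).
        return self.rows_per_s_per_node * self.nodes

    def runtime_s(self, rows: float) -> float:
        # Total wall-clock time = startup overhead + time to push all rows through.
        return self.startup_s + rows / self.throughput()


def break_even_rows(fast_start: JobProfile, high_volume: JobProfile) -> float:
    """Row count above which high_volume finishes sooner than fast_start."""
    rate_a, rate_b = fast_start.throughput(), high_volume.throughput()
    if rate_b <= rate_a:
        return float("inf")  # the higher-throughput side never overtakes
    # Solve startup_a + n / rate_a == startup_b + n / rate_b for n.
    n = (high_volume.startup_s - fast_start.startup_s) / (1 / rate_a - 1 / rate_b)
    return max(0.0, n)


def fits_batch_window(job: JobProfile, rows: float, window_s: float) -> bool:
    """True if the modelled runtime for `rows` fits inside the batch window."""
    return job.runtime_s(rows) <= window_s


if __name__ == "__main__":
    # All figures below are invented for illustration only.
    server = JobProfile(startup_s=2, rows_per_s_per_node=5_000)
    parallel = JobProfile(startup_s=20, rows_per_s_per_node=8_000, nodes=4)

    print(f"1 row: server {server.runtime_s(1):.2f}s, parallel {parallel.runtime_s(1):.2f}s")
    print(f"break-even: ~{break_even_rows(server, parallel):,.0f} rows")
    for job, name in ((server, "server"), (parallel, "parallel")):
        print(name, "fits 30-minute window for 10M rows:",
              fits_batch_window(job, 10_000_000, 30 * 60))
```

With these made-up figures the server job wins easily on a single row, the parallel job only pulls ahead above roughly 107,000 rows, and only the parallel job fits 10 million rows into a 30-minute window. Real startup and throughput numbers would have to come from timing your own jobs on your own hardware.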

I'm still guided by gut feel rather than by quantified metrics. Even a large volume of data with a lot of date/time manipulation may give me reason to pause and at least consider using a server job. Then again, that's what I grew up with, so it's probably a biased view.

Budget would, of course, be a consideration also.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney

Parallel jobs are faster on large volumes of data thanks to their parallel partitioning capabilities, and they are also faster at sorting and aggregation, even when run in non-parallel mode. The C++ stages of parallel jobs seem to be more efficient. However, parallel jobs have a slower startup time (something they are trying to fix in Hawk), they are fussier about metadata, and when you first start using them it takes time to get used to all the warnings.

Server Edition is cheaper and somewhat easier to use initially. However, parallel has the edge in some areas: very large data volumes, the Change Data Capture stage, more join/lookup/merge functionality and flexibility, and easier building of custom stages.

Since server jobs and parallel jobs can run from the same job sequences, it is quite easy to move from server to parallel gradually: convert just the jobs that handle the highest volumes and leave the bulk as server jobs.

You can read my blog posts on the subject:
Process in parallel or take up folk dancing
DataStage server v enterprise: some performance stats
Hawk overview, screenshots and questionnaires!