Need to develop parallel job

mydsworld
Participant
Posts: 321
Joined: Thu Sep 07, 2006 3:55 am

Post by mydsworld »

If a server job can run on both single-processor and multi-processor systems, why should we move to parallel jobs? Is it only because they offer more stages to work with?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No reason.

Parallel jobs do give you the future flexibility to spread your processing over multiple CPUs in multiple machines (e.g. in a cluster or grid configuration) and to be able to change the number of partitions without needing to recompile or provide different parameter values, which you would need to do if using server jobs.

And to take advantage of many of the new features in IBM Information Server, you must be running on the parallel architecture.

But if you're happy with what you're doing, and it's performing adequately, then it's perfectly OK to stay there. Server jobs will be supported by IBM for a very long time yet.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Ray, isn't the Orchestrate engine faster than the DsEngine (Server)? Let's just talk about a single CPU and the same transformation logic.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

For small jobs, server jobs are faster (they finish more quickly). The startup cost of parallel jobs (even just starting the conductor process, composing the score, starting the section leader processes, distributing the score and starting the player processes, plus license checking) is an overhead that server jobs don't have.

I have no quantified results about where the break-even point would be, since this would be hardware-specific in any case. However, as a rule of thumb, with a local database I'd opt for a server job for anything up to 1000 rows.
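
The startup-overhead argument can be illustrated with a minimal sketch, assuming a very simple model in which elapsed time is a fixed startup cost plus rows divided by a steady throughput. The function names and every number below are hypothetical placeholders, not measurements of any DataStage installation; the real break-even point is hardware-specific and would have to be measured.

```python
def elapsed_seconds(rows, startup_seconds, rows_per_second):
    """Elapsed time under the assumed model: fixed startup plus per-row work."""
    return startup_seconds + rows / rows_per_second


def break_even_rows(startup_a, rate_a, startup_b, rate_b):
    """Row count at which job B becomes as fast as job A.

    Job A stands for the low-startup job (e.g. a server job) and job B for
    the high-startup job (e.g. a parallel job). Returns None when B is not
    faster per row, because then it never catches up.
    """
    per_row_a = 1.0 / rate_a
    per_row_b = 1.0 / rate_b
    if per_row_a <= per_row_b:
        return None
    # Solve startup_a + n * per_row_a == startup_b + n * per_row_b for n.
    rows = (startup_b - startup_a) / (per_row_a - per_row_b)
    return max(0.0, rows)


# Hypothetical figures, chosen only to show the shape of the trade-off.
SERVER_STARTUP, SERVER_RATE = 1.0, 1000.0      # seconds, rows per second
PARALLEL_STARTUP, PARALLEL_RATE = 5.0, 5000.0  # seconds, rows per second

if __name__ == "__main__":
    n = break_even_rows(SERVER_STARTUP, SERVER_RATE,
                        PARALLEL_STARTUP, PARALLEL_RATE)
    print(f"break-even at roughly {n:.0f} rows under these assumed figures")
```

Below the break-even row count the low-startup job finishes first; above it the higher throughput wins. Substituting timings measured on your own hardware gives a site-specific estimate.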
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I've got two blog posts showing how a parallel job can be a lot faster than a server job: the just-posted "DataStage Tip: Extracting database data 250% faster", which reports on a developerWorks article, and "DataStage server v enterprise: some performance stats". You can still get great performance out of server jobs with techniques such as Unix sorting, CRC32, hash files and multiple instance jobs.