Server jobs vs Parallel jobs

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

dsqspro
Premium Member
Posts: 20
Joined: Wed Apr 15, 2009 7:01 am

Server jobs vs Parallel jobs

Post by dsqspro »

For a data warehouse, which is the better approach: server jobs or parallel jobs?

In our version 8.1 development I have recommended both, on a case-by-case basis.

Can someone help me with a pros-and-cons fact sheet?

Or is it a good strategy to use only parallel jobs?

Thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is a matter of personal philosophy and office politics rather than a technology argument. My personal philosophy is to use the right tool for the job, which is often a mix of server and parallel jobs. The fact that the target is a data warehouse is moot.

Server jobs are especially good at low volumes (say up to 10,000 rows), particularly those jobs that process exactly one row, such as retrieving the next available surrogate key value. It's also easy to park that value in a server job's user status area for retrieval by the controlling sequence.
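
As a rough sketch of that user status technique (the routine name and job activity name below are made up for illustration): a small server routine, called from a Transformer derivation or an after-job subroutine once the next key has been fetched, can park the value like this:

Code:

      * Hypothetical routine SetUserStatus(Arg1).
      * DSSetUserStatus writes the supplied value into the job's
      * user status area, where the controlling sequence can see it.
      Call DSSetUserStatus(Arg1)
      Ans = Arg1

The controlling sequence can then pick the value up as GetNextKey.$UserStatus (where GetNextKey is the name of the job activity) and pass it as a parameter to downstream jobs.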

Parallel jobs have substantial startup overheads compared to server jobs but, once running (assuming that they're well written), can process huge volumes of data very quickly.

Do not believe any FUD (fear, uncertainty and doubt) about server job support being withdrawn. This is not going to happen.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I've been leaning towards going all PX for new development at new sites for several reasons:

1) Though Server jobs are still faster for small amounts of data, PX jobs can be run single-threaded (sequential operation; see the one-node configuration sketch after this list) to significantly reduce overhead and startup times. Most of my customers don't have lots of jobs that handle small numbers of records anyway, so the overall impact of these very fast jobs is negligible between server and PX.

2) Many of the new features and products are PX-only going forward. If you ever need to add one of these later (like QualityStage, Oracle Hyperion Connectors, etc.), it is easier if you are already on PX.

3) Reduced support requirements: one code base to support instead of two.
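
A minimal sketch of the one-node configuration file mentioned in point 1 (the host name and resource paths are placeholders; you would point APT_CONFIG_FILE at something like this for the small jobs):

Code:

{
  node "node1"
  {
    fastname "your_host_name"
    pools ""
    resource disk "/data/datasets" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
}

With only one node defined, the job runs with a single partition, which is what trims the process and startup overhead described above.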

Please note - with that said - in cases where there's pre-existing server expertise and none of the above factors come into play, I quite happily write server jobs.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

There are also some very special situations/techniques/approaches that require Server.....

Some of this is a factor of job design, and other methods are possible with Enterprise Edition, but here are two that only Server can easily do (I'm sure there are others, and, in the context of this discussion, many techniques can only be done, or are best done, in EE).

a) Process a single row through an ENTIRE set of stages before processing the next row. Imagine a scenario where a lookup done early in the job is tested for null; if the lookup failed, certain logic takes place that (at the far end of the job) results in an INSERT with Commit=1. In Server, unless you play games with inter-process row buffering, the next row with the same key value will "find" the value in the lookup. Not so with EE, because rows are (for valid performance reasons) buffered together. This is, of course, a low-volume exercise in transaction control, which fits in with the other comments above.

b) Closely related: being able to create a job that writes to "n" ODBC tables on multiple links (some delete, some insert, some update, etc.) and choose your own time to commit (or roll back). This takes some effort in Server, but is doable in any ODBC-based scenario.

Bottom line? As Ray noted, a mix is best. EE ought to be your primary, with Server reserved for quick-running jobs, especially utility functions, and for specialized scenarios. I also like Server for low-to-mid-volume scenarios that are 100% variable text (often true with XML). Server throws around variable text with no difficulty, no screaming about formats or too-short or too-long strings, and with a slew of easy-to-use transformation functions. I've finished the job, tested it, and QA'd it before the same effort with EE gets past null-handling issues.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)