any strategy whether to go for parallel job or server job
Are there any criteria for deciding whether to use a parallel job or a server job?
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
For single-row jobs (such as select max(col) from table), a server job is probably faster because its startup time is shorter.
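One way to reason about the startup-time trade-off is a simple cost model: total runtime = fixed startup cost + rows / throughput. The sketch below is only an illustration of that reasoning. The function names are hypothetical, and every number is a made-up placeholder, not a measured DataStage figure. It assumes the parallel job has the larger startup cost and the higher throughput, and computes the row count at which the two break even.

```python
# Hypothetical cost model: runtime = fixed startup cost + rows / throughput.
# All numbers are made-up placeholders, not measured DataStage figures.


def runtime(rows: int, startup_s: float, rows_per_s: float) -> float:
    """Estimated wall-clock seconds for a job that processes `rows` rows."""
    return startup_s + rows / rows_per_s


def break_even_rows(s_server: float, r_server: float,
                    s_parallel: float, r_parallel: float) -> float:
    """Row count above which the parallel job becomes the faster option.

    Solves s_server + n / r_server == s_parallel + n / r_parallel for n,
    assuming s_parallel > s_server and r_parallel > r_server.
    """
    return (s_parallel - s_server) / (1 / r_server - 1 / r_parallel)


if __name__ == "__main__":
    # Hypothetical: server job starts in 2 s and handles 10,000 rows/s;
    # parallel job starts in 20 s and handles 50,000 rows/s.
    S_SERVER, R_SERVER = 2.0, 10_000.0
    S_PARALLEL, R_PARALLEL = 20.0, 50_000.0

    n_star = break_even_rows(S_SERVER, R_SERVER, S_PARALLEL, R_PARALLEL)
    print(f"break-even at about {n_star:,.0f} rows")
    for rows in (1, 100_000, 1_000_000):
        s = runtime(rows, S_SERVER, R_SERVER)
        p = runtime(rows, S_PARALLEL, R_PARALLEL)
        print(f"{rows:>9,} rows: server {s:7.1f} s, parallel {p:7.1f} s")
```

With these placeholder figures, a one-row select is dominated by startup cost, so the server job wins. Past the break-even point, the higher throughput of the parallel job takes over. Real numbers would have to come from timing your own jobs.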
Using parallelism techniques in server jobs can handle surprisingly large volumes of data. However, the fact that parallel jobs can scale automatically is a big plus.
I'm still guided by gut feel rather than by quantified metrics. Even a large volume of data with a lot of date/time manipulation may give me reason to pause and at least consider using a server job. On the other hand, that's what I grew up with, so it's probably a biased view.
Budget would, of course, be a consideration also.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
Parallel jobs are faster on large volumes of data thanks to their partitioning capabilities, and they are also faster at sorting and aggregation, even when run in non-parallel mode. The C++ stages of parallel jobs seem to be more efficient. However, parallel jobs have a slower startup time (something they are trying to fix in Hawk), they are fussier about metadata, and when you first start using them it takes time to get used to all the warnings.
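To illustrate why partitioning helps with sorting and aggregation, here is a generic key-based hash-partitioning sketch. It is not DataStage's internal implementation, and the function names are hypothetical. Rows are split into buckets by hashing the grouping key. Because every row with a given key lands in the same bucket, each bucket can be aggregated on its own and the results merged without conflicts. For simplicity the partitions are processed one after another here. A parallel engine would run them concurrently on separate processes or nodes.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, key, num_partitions):
    """Split rows into num_partitions lists by hashing the key column.

    crc32 is used because it is deterministic across runs, unlike Python's
    built-in hash() for strings, which is salted per process.
    """
    partitions = defaultdict(list)
    for row in rows:
        p = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[p].append(row)
    return [partitions[i] for i in range(num_partitions)]


def aggregate_partition(rows, key, value):
    """Sum `value` per `key` within a single partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def partitioned_sum(rows, key, value, num_partitions=4):
    """Aggregate each partition independently, then merge the results.

    Partitioning on the grouping key means no key spans two partitions,
    so merging is a plain dictionary union.
    """
    result = {}
    for part in hash_partition(rows, key, num_partitions):
        result.update(aggregate_partition(part, key, value))
    return result
```

For example, partitioned_sum([{"cust": "a", "amt": 1}, {"cust": "a", "amt": 3}, {"cust": "b", "amt": 2}], "cust", "amt") gives {"a": 4, "b": 2}, the same answer as a single-stream aggregation.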
Server edition is cheaper and somewhat easier to use initially. However, parallel is easier in some areas: very large data volumes, the change data capture stage, more join/lookup/merge functionality and flexibility, and building custom stages.
Since server jobs and parallel jobs can run from the same job sequences, it is quite easy to move from server to parallel by converting just those jobs that handle the highest volumes and leaving the bulk as server jobs.
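The incremental migration described above amounts to ranking jobs by volume and converting only those above some threshold. Here is a minimal sketch under stated assumptions. The job names, row counts and threshold are all hypothetical, and in practice the volumes would come from your own run history.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str          # hypothetical job name
    rows_per_run: int  # typical volume per run (assumed to come from run history)


def plan_migration(jobs, row_threshold):
    """Return (convert_to_parallel, keep_as_server), highest volume first."""
    ranked = sorted(jobs, key=lambda j: j.rows_per_run, reverse=True)
    convert = [j.name for j in ranked if j.rows_per_run >= row_threshold]
    keep = [j.name for j in ranked if j.rows_per_run < row_threshold]
    return convert, keep


if __name__ == "__main__":
    jobs = [
        Job("LoadCustomers", 50_000),
        Job("LoadSalesFact", 40_000_000),
        Job("GetMaxKey", 1),
        Job("LoadOrders", 2_000_000),
    ]
    convert, keep = plan_migration(jobs, row_threshold=1_000_000)
    print("convert to parallel:", convert)
    print("keep as server:", keep)
```

Because both job types can be called from the same sequence, the "keep" list can stay untouched while the "convert" list is rebuilt as parallel jobs one at a time.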
You can read my blogs on the subject:
Process in parallel or take up folk dancing
DataStage server v enterprise: some performance stats
Hawk overview, screenshots and questionnaires!
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn