IPC versus multi instance jobs

pajj
Participant
Posts: 11
Joined: Fri Jun 16, 2006 12:27 pm

IPC versus multi instance jobs

Post by pajj »

Is there a benefit to running multi-instance jobs that process dynamically partitioned data without IPC enabled, versus running a single job with IPC enabled and using link partitioning?
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Re: IPC versus multi instance jobs

Post by kris »

pajj wrote: Is there a benefit to running multi-instance jobs that process dynamically partitioned data without IPC enabled, versus running a single job with IPC enabled and using link partitioning?
There is a significant difference between the two approaches.

Enabling IPC for a job (through row buffering or an IPC stage in the job) lets the job run each active stage in a separate process, which gives some performance boost (how much depends on whether the buffering is in-process or inter-process).

Running multiple instances of a job with partitioned input is more of a divide-and-conquer approach. Depending on your server configuration, the instances run in parallel, much like multiple threads of one process, and use more system resources, so the work finishes a lot faster than a single job running as one single-threaded process.

Depending on the type of requirement, you choose one of these two approaches.

Kris~
~Kris
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: IPC versus multi instance jobs

Post by JoshGeorge »

When you do it in a single job with IPC, the advantage is in the connectivity you make (especially to a database, if you are doing bulk loading). It is also easier to investigate from a maintenance point of view.
Joshy George
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

More on how they are vastly different... multi-instancing means launching a "whole new job", and you control the degree of "parallel activity" (words chosen carefully there) by your source definition. For instance, you could launch two instances with MQSeries as the source, each getting a different QueueName as a job parameter, or have Oracle as the source, with each instance getting a different value or range in a WHERE clause. Each "job instance" that you launch may itself be running in one or more processes depending on its topology. Certainly each instance could then have (for Server) its own settings for IPC.
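As a rough illustration of the "different range in a WHERE clause per instance" idea, here is a minimal Python sketch that splits a key range and builds one `dsjob -run` command line per instance. The project name, job name, invocation suffixes, and the `LOW_ID`/`HIGH_ID` parameter names are all made up for the example, and the job is assumed to use those parameters in its source WHERE clause. The commands are only built, not executed; check the `dsjob` syntax against your own release before relying on it.

```python
def split_range(low, high, parts):
    """Split the inclusive integer range [low, high] into contiguous sub-ranges."""
    total = high - low + 1
    if parts < 1 or parts > total:
        raise ValueError("parts must be between 1 and the number of keys")
    base, extra = divmod(total, parts)
    ranges = []
    start = low
    for i in range(parts):
        # The first `extra` partitions take one additional key each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_instance_commands(project, job, ranges,
                            param_low="LOW_ID", param_high="HIGH_ID"):
    """Build one dsjob argument list per instance.

    The parameter names are hypothetical; the job itself would reference
    them in its source stage, e.g. WHERE id BETWEEN #LOW_ID# AND #HIGH_ID#.
    """
    commands = []
    for n, (low, high) in enumerate(ranges, start=1):
        # A multi-instance job is addressed as jobname.invocationid.
        invocation = f"{job}.part{n}"
        commands.append([
            "dsjob", "-run",
            "-param", f"{param_low}={low}",
            "-param", f"{param_high}={high}",
            project, invocation,
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_instance_commands("MyProject", "LoadCustomers",
                                       split_range(1, 10, 3)):
        print(" ".join(cmd))
```

Each command could then be started in parallel from a script or scheduler; the number of instances is the knob that controls the degree of parallel activity.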

The features for IPC (intra and inter process, or using the IPC stage itself) result in separate processes WITHIN THE SAME JOB for each of the stages. There are some rules as to where the boundaries are placed, but basically IPC is giving you a certain level of "pipelining" (moving chunks of data through the job, each stage working concurrently). That is fine, but don't ever try to just "turn it on" without thinking about it. It also alters the "row by row" behavior that you may be depending on in your job. Imagine a job that does a lookup near the source, and if that lookup fails, a flag is set, and then 10 stages later, towards the end of the job, a row is inserted into the original lookup table. If you want to ensure that the VERY NEXT row from the source FINDS the newly inserted row, then you CANNOT use IPC. With IPC turned on, the second row will likely never find the lookup, because it will have been through the lookup before the first row gets inserted (one way to think of it is that the buffers are following each other more closely with IPC). So... it could be a great performance boost, or it could kill your job logic. Use it wisely and carefully, even in conjunction with multi-instancing, once you understand each of the concepts.
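The lookup-then-insert hazard described above can be shown with a toy Python model. This is only a sketch of the ordering effect, not of how DataStage implements buffering: `buffer_size` is an invented stand-in for "how many rows can sit between the lookup stage and the final insert stage", with 1 meaning strict row-by-row processing.

```python
from collections import deque


def run_job(keys, buffer_size):
    """Simulate a lookup near the source and an insert at the end of the job.

    Returns a list of (key, found_in_lookup) in the order rows reach the end.
    """
    lookup_table = set()   # the table that is both looked up and inserted into
    in_flight = deque()    # rows that passed the lookup but not yet the insert
    results = []

    def finish_one_row():
        key, found = in_flight.popleft()
        if not found:
            # End of the job: insert the missing key into the lookup table.
            lookup_table.add(key)
        results.append((key, found))

    for key in keys:
        # Lookup happens near the source, against the table as it is right now.
        found = key in lookup_table
        in_flight.append((key, found))
        # Rows only reach the insert stage once the buffer is full.
        if len(in_flight) >= buffer_size:
            finish_one_row()

    # Flush whatever is still in the pipeline at end of data.
    while in_flight:
        finish_one_row()
    return results


if __name__ == "__main__":
    rows = ["A", "A"]
    print("row by row:", run_job(rows, buffer_size=1))
    print("buffered:  ", run_job(rows, buffer_size=2))
```

With `buffer_size=1` the second "A" finds the row inserted for the first one; with `buffer_size=2` both rows miss the lookup, which is the case where turning buffering on changes the job's logic.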

Ernie
pajj
Participant
Posts: 11
Joined: Fri Jun 16, 2006 12:27 pm

Post by pajj »

eostic wrote: Imagine a job that does a lookup near the source, and if that lookup fails, a flag is set, and then 10 stages later, towards the end of the job, a row is inserted into the original lookup table.
Ernie
Using the job design you describe, would the same problem not occur in a PX job stream due to pipelining?
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Yes... a very interesting topic. A job that requires that exact type of functionality and flow requires Server. Of course, there may be better ways in EE to "skin the cat" (i.e., an alternative technique overall).

Ernie