IPC STAGE

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Saama
Premium Member
Posts: 83
Joined: Wed Nov 22, 2006 6:42 pm
Location: Pune

IPC STAGE

Post by Saama »

Hi,

As per our design standards, we keep an IPC stage between 2 passive stages, and this improves performance. Why can't we use a Transformer stage instead? What is the advantage of choosing an IPC stage over a Transformer stage between 2 passive stages?

Cheers,
saama
georgesebastian
Participant
Posts: 83
Joined: Tue Dec 19, 2006 8:38 am

Re: IPC STAGE

Post by georgesebastian »

Hi Saama,

Consider a simple job where the source is a flat file and the target is a database.
If you put an IPC stage between the source and the target and you have multiple processors, one processor can be reading data from the file while another is writing the rows already read to the database. This improves the job's performance. The Transformer stage does not have this functionality.

In a nutshell, the IPC stage allows us to use multiple CPUs at the same time.
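
The overlap described above can be illustrated with a small timing model. This is only a sketch under stated assumptions, not a description of DataStage internals: the function names and the per-row costs are hypothetical, each process is assumed to always be able to make progress (for example on its own CPU), and the buffer between reader and writer is assumed never to fill up.

```python
def single_process_time(rows, read_cost, write_cost):
    """One process reads a row, then writes it, before touching the next row."""
    return rows * (read_cost + write_cost)


def two_process_time(rows, read_cost, write_cost):
    """A reader process and a writer process whose work overlaps.

    Hypothetical model: each process can always make progress and the
    buffer between them never fills up.
    """
    read_done = 0    # time at which the reader finished its latest row
    write_done = 0   # time at which the writer finished its latest row
    for _ in range(rows):
        read_done += read_cost
        # The writer starts a row once it is free AND the row has been read.
        write_done = max(write_done, read_done) + write_cost
    return write_done
```

With 100 rows, a read cost of 2 and a write cost of 3 (arbitrary units), the first function gives 500 and the second gives 302: the slower side sets the pace and the faster side is hidden behind it.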

Thanks
George
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

IPC alters the "row by row" behavior in your job to a certain level of "pipelining" (moving chunks of data thru the job, each stage working concurrently).
Joshy George
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I have to partially agree and disagree with the previous 2 posters regarding how an IPC stage affects processing.
georgesebastian wrote: In a nutshell, the IPC stage allows us to use multiple CPUs at the same time.
The important differentiation there is not the CPU usage, but that by using an IPC you have created distinct processes. This is what actually affects the processing speed. Even on a single-CPU system the performance can be improved with IPC, particularly if the source or target object processes in blocks (as can happen with various devices), so that one process can be busy working while another might be waiting on a peripheral device.
JoshGeorge wrote: ...IPC alters the "row by row" behavior ...
Actually, the IPC stage does nothing to alter the row-by-row behaviour of DS processing, but there is an element of pipelining introduced.

Let us take a hypothetical DS server job:

sequential --> transform1 --> transform2 --> sequential
where transform1 and transform2 both do heavy processing, i.e. computing several CRC32 numbers apiece. Each transform needs 10 CPU seconds to process the input/output data.

If we simplified the model of a computer system, then

a) 1-CPU, no other busy users. The job will get 100% of the CPU and will complete in 20 seconds
b) 1-CPU, 2 other busy users. The job will get 1/3 of the total CPU and run in 60 seconds.
c) 4-CPU, no other busy users. The job will get 100% of one CPU and will complete in 20 seconds
d) 4-CPU, 2 other busy users. The system has 2 "free" processors, so the job will get 100% of one CPU and complete in 20 seconds

This changes when an IPC stage is put between the 2 transform stages. Now the job will run as 2 separate OS processes: one process is everything before the IPC stage, and the other is everything after it. The IPC stage itself is passive and has no process of its own. Using the simple model, the runtimes will change to:

a) 1-CPU, no other busy users. The job will get 100% of the CPU and will complete in 20 seconds (actually, it will complete a bit faster than before, but only marginally - or hardly noticeably in this CPU bound example)
b) 1-CPU, 2 other busy users. The job will now get 2/4 of the total CPU [4 total system processes, each getting 1/4; 2 of them are this job's] and will complete in 40 seconds [20 CPU seconds at half of the CPU].
c) 4-CPU, no other busy users. Each process gets 100% CPU so it will complete in 10 seconds.
d) 4-CPU, 2 other busy users. The system has 4 busy processes, each on 1 CPU, and both transforms are running concurrently, so the job will finish in 10 seconds.
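
The simple model used in both lists can be written down as a few lines of arithmetic. The sketch below is illustrative only: `job_runtime` and its parameters are hypothetical names, and it assumes that every busy process gets an equal share of the CPUs, that no process can use more than one CPU, and that the job's processes split the 20 CPU seconds evenly and run fully concurrently. Note that under these assumptions case b) with IPC comes out at 40 seconds, because 20 CPU seconds at half of one CPU take 40 seconds of elapsed time.

```python
from fractions import Fraction


def job_runtime(cpus, other_busy, job_processes, total_cpu_seconds=20):
    """Elapsed seconds for the job under a simple fair-share model.

    Assumptions (illustrative only): every busy process gets an equal
    share of the CPUs, no process can use more than one CPU, and the job's
    processes split the work evenly and run fully concurrently.
    """
    runnable = other_busy + job_processes
    share = min(Fraction(1), Fraction(cpus, runnable))   # CPU per process
    work_per_process = Fraction(total_cpu_seconds, job_processes)
    return work_per_process / share


# (cpus, other busy users) for cases a) to d)
scenarios = {"a": (1, 0), "b": (1, 2), "c": (4, 0), "d": (4, 2)}
for label, (cpus, other_busy) in scenarios.items():
    without_ipc = job_runtime(cpus, other_busy, job_processes=1)
    with_ipc = job_runtime(cpus, other_busy, job_processes=2)
    print(f"{label}) without IPC: {without_ipc}s, with IPC: {with_ipc}s")
```

The only thing that changes between the two lists is `job_processes`: one process without the IPC stage, two with it.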
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

When you say that using IPC results in separate processes within the same job for separate stages, that does imply a behavioural change in how records are processed in server jobs. If a particular stage is used for a lookup and is simultaneously updated in the same job, introducing an IPC stage would definitely make a difference to the job's behaviour.
ArndW wrote:
...IPC alters the "row by row" behavior ...
Actually, the IPC stage does nothing to alter the row-row behaviour of DS processing; but there is an element of pipelining introduced...

"...but that by using an IPC you have created distinct processes..."
Joshy George
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

JoshGeorge wrote:...If a particular stage is used for look up and simultaneously updated in the same job...
That is a very contrived example, since turning on any type of buffering would force the same thing to happen without use of IPC. In addition, if you make the IPC buffer size less than double the record size the job would run as before.

The IPC stage does not change the row-by-row processing order in a job. It does add a buffering and blocking aspect, but the order of rows on the reading side of an IPC stage is identical to that on the writing side.
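
To make the buffering and blocking point concrete, here is a minimal sketch, assuming rows are handed over in fixed-size blocks. The function names and the block size are hypothetical and do not reflect how the IPC stage is actually implemented; the point is only that grouping rows into blocks does not reorder them.

```python
def write_side(rows, block_size):
    """Group incoming rows into blocks, as a buffer handing data over in chunks."""
    block = []
    for row in rows:
        block.append(row)
        if len(block) == block_size:
            yield block
            block = []
    if block:            # hand over the final, partially filled block
        yield block


def read_side(blocks):
    """Unpack each block and pass its rows on one at a time, in arrival order."""
    for block in blocks:
        yield from block


rows = list(range(10))
blocks = list(write_side(rows, block_size=3))
rows_out = list(read_side(blocks))
```

Here `blocks` holds the rows in groups of three (with one row left over in the last block), while `rows_out` is the same sequence as `rows`.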
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

You would use an IPC stage to explicitly enable inter-process buffering between 2 active stages. In that case, in-process buffering will apply to all other links where IPC is not used. If you enable inter-process buffering at job level through the job properties, then you don't have to add an IPC stage to the job.

The IPC stage enables pipeline parallelism between the 2 stages where it is used (reader and writer processes); in-process buffering will apply between all other stages. The advantages are best realized on a multiprocessor system.
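
The reader/writer arrangement can be sketched in a few lines of Python. This is an analogy, not DataStage code: Python threads stand in for the two OS processes, a bounded `queue.Queue` stands in for the inter-process buffer, and `run_pipeline`, `transform` and `buffer_rows` are hypothetical names chosen for the example.

```python
import queue
import threading


def run_pipeline(rows, transform, buffer_rows=8):
    """Read rows in one thread and transform/write them in another."""
    buf = queue.Queue(maxsize=buffer_rows)   # bounded buffer between the two sides
    written = []
    end_of_data = object()                   # sentinel marking the end of the stream

    def reader():
        for row in rows:
            buf.put(row)                     # blocks while the buffer is full
        buf.put(end_of_data)

    def writer():
        while True:
            row = buf.get()                  # blocks while the buffer is empty
            if row is end_of_data:
                break
            written.append(transform(row))

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

For example, `run_pipeline(range(5), lambda r: r * 2)` returns the doubled rows in their original order; the reader can run ahead of the writer by at most `buffer_rows` rows.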