
Inter-process

Posted: Mon Nov 08, 2004 8:38 am
by admiral69
I have a very simple job: SeqFile->Transformer->SeqFile.
I want to improve its performance.
It runs on a multi-processor system.
When I use the IPC stage explicitly, performance is great.
But when I enable it implicitly by turning on inter-process
row buffering in the DataStage Administrator,
the data is read at about the same speed as in a single-process job,
even though I can see that all of the processors are working :?:
Thanks,
George

Re: Inter-process

Posted: Mon Nov 08, 2004 10:46 am
by ogmios
From sequential file to sequential file there's not much you can do to parallelize it... it's named sequential file for a reason. What you could do is split the job to output to several files and then recombine them afterwards, but it's doubtful whether this would really be worth it.

Ogmios

Re: Inter-process

Posted: Mon Nov 08, 2004 12:21 pm
by shawn_ramsey
If I recall correctly, turning on the automatic buffering in DataStage will only insert the IPC between two active stages. In your case the only active stage you have is the one Transformer, so no IPC stage was inserted.

Re: Inter-process

Posted: Mon Nov 08, 2004 12:29 pm
by shawn_ramsey
Ogmios,

We use the IPC stage quite frequently in this type of scenario and have seen some significant performance benefits. It has less to do with parallelization of the processing of rows and more to do with splitting the processing of the single stream across multiple processors. The biggest benefit we have seen is where the source is a complex flat file and the destination is sequential: CFF -> IPC -> Xfrm -> Sequential.
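
To illustrate the idea of handing a single stream from one worker to another through a buffer, here is a minimal Python sketch. It is not DataStage code: the names `reader`, `transformer` and `run_pipeline` are made up for this example, the "transformation" is just an upper-casing of each row, and threads stand in for the separate processes that an IPC stage would create.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-data marker for this sketch


def reader(rows, buffer):
    """Push rows into the bounded buffer; put() blocks while it is full."""
    for row in rows:
        buffer.put(row)
    buffer.put(SENTINEL)


def transformer(buffer, results):
    """Pull rows from the buffer and apply a stand-in transformation."""
    while True:
        row = buffer.get()
        if row is SENTINEL:
            break
        results.append(row.upper())


def run_pipeline(rows, buffer_size=128):
    """Run reader and transformer concurrently with a row buffer between them."""
    buffer = queue.Queue(maxsize=buffer_size)
    results = []
    t_read = threading.Thread(target=reader, args=(rows, buffer))
    t_xfm = threading.Thread(target=transformer, args=(buffer, results))
    t_read.start()
    t_xfm.start()
    t_read.join()
    t_xfm.join()
    return results


print(run_pipeline(["a,1", "b,2", "c,3"]))  # ['A,1', 'B,2', 'C,3']
```

The point of the sketch is only the shape: the reader does not wait for each row to be transformed, so reading and transforming can overlap instead of alternating.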

Re: Inter-process

Posted: Tue Nov 09, 2004 1:51 am
by admiral69
We also use it in more complicated jobs, but the question is: why is the performance different when I enable it implicitly via the DataStage Administrator option?

Posted: Tue Nov 09, 2004 2:40 pm
by ray.wurlod
Because, as Shawn rightly said, implicit row buffering only occurs on a link that joins two active stages. Your job design only has one active stage (the Transformer stage).
Explicit IPC stages force a process boundary to exist.
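
The rule described above can be sketched in a few lines of Python. This is a toy model only, not a DataStage API: the stage names, the `ACTIVE`/`PASSIVE` labels and the function `implicitly_buffered_links` are all hypothetical, and the classification of each stage is taken from the posts in this thread.

```python
ACTIVE = "active"
PASSIVE = "passive"


def implicitly_buffered_links(stage_kinds, links):
    """Return the links that join two active stages.

    stage_kinds maps a stage name to ACTIVE or PASSIVE;
    links is a list of (source, target) stage-name pairs.
    """
    return [
        (src, dst)
        for src, dst in links
        if stage_kinds[src] == ACTIVE and stage_kinds[dst] == ACTIVE
    ]


# The job from the original post: only one active stage.
kinds = {"SeqIn": PASSIVE, "Xfm": ACTIVE, "SeqOut": PASSIVE}
links = [("SeqIn", "Xfm"), ("Xfm", "SeqOut")]
print(implicitly_buffered_links(kinds, links))  # []

# A hypothetical variant with two Transformers in a row.
kinds2 = {"SeqIn": PASSIVE, "Xfm1": ACTIVE, "Xfm2": ACTIVE, "SeqOut": PASSIVE}
links2 = [("SeqIn", "Xfm1"), ("Xfm1", "Xfm2"), ("Xfm2", "SeqOut")]
print(implicitly_buffered_links(kinds2, links2))  # [('Xfm1', 'Xfm2')]
```

In the first job no link qualifies, which matches the behaviour George saw: with no eligible link, the implicit option has nothing to buffer.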

Posted: Fri Nov 12, 2004 6:11 am
by ewartpm
Hi George

Have you tried using the Link Partitioner stage? It allows a single input stream to be split up to 64 ways, thereby utilising the SMP architecture (you need to have the Inter-Process option selected).
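
As a rough illustration of splitting one stream several ways, here is a small Python sketch. It is not DataStage code: `partition_round_robin` is a hypothetical helper, round-robin is assumed purely for the example, and the limit of 64 is simply the figure quoted in this post.

```python
MAX_WAYS = 64  # limit quoted in the post above


def partition_round_robin(rows, ways):
    """Deal rows out to `ways` output lists in turn."""
    if not 1 <= ways <= MAX_WAYS:
        raise ValueError(f"ways must be between 1 and {MAX_WAYS}")
    outputs = [[] for _ in range(ways)]
    for index, row in enumerate(rows):
        outputs[index % ways].append(row)
    return outputs


print(partition_round_robin(range(10), 4))
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each output list could then be handled by its own downstream worker, which is where the extra processors would come into play.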

Posted: Sun Nov 14, 2004 1:40 am
by admiral69
Yep, I also use this stage.
Thank you all for your posts :)