I have a very simple job: SeqFile -> Transformer -> SeqFile.
I want to improve its performance.
It runs on a multi-processor system.
When I use an explicit IPC stage, I get great performance.
But when I specify it implicitly by turning inter-process
row buffering on via DataStage Administrator,
the data is read at about the same speed as in a single-process job,
even though I can see that all of the processors are working.
Thanks,
George
From sequential file to sequential file there's not much you can do to parallelize it... it's named sequential file for a reason. What you could do is split the job to output to several files and then recombine them afterwards, but whether this would really be worth it is another question.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
If I recall correctly, turning on automatic row buffering in DataStage will only insert the IPC between two active stages. In your case the only active stage you have is the one Transformer, so no IPC stage was inserted.
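To make that rule concrete, here is a minimal sketch in Python. It is only a toy model of the behaviour described above, not DataStage's actual logic; the function name, the stage names and the "active"/"passive" labels are all made up for illustration.

```python
# Hypothetical model of the rule described above; not DataStage code.
ACTIVE = "active"
PASSIVE = "passive"


def implicitly_buffered_links(stages):
    """Return (upstream, downstream) pairs for links that join two active stages.

    `stages` is an ordered list of (name, kind) tuples describing a linear job.
    """
    links = []
    for (up_name, up_kind), (down_name, down_kind) in zip(stages, stages[1:]):
        # Under the rule as described, only an active -> active link qualifies.
        if up_kind == ACTIVE and down_kind == ACTIVE:
            links.append((up_name, down_name))
    return links


# The job from the question: one active stage between two passive ones.
simple_job = [("SeqFile_in", PASSIVE), ("Transformer", ACTIVE), ("SeqFile_out", PASSIVE)]
# An assumed variant with two Transformers back to back.
two_xfm_job = [("SeqFile_in", PASSIVE), ("Xfm1", ACTIVE), ("Xfm2", ACTIVE), ("SeqFile_out", PASSIVE)]

print(implicitly_buffered_links(simple_job))   # []
print(implicitly_buffered_links(two_xfm_job))  # [('Xfm1', 'Xfm2')]
```

In this model the SeqFile -> Transformer -> SeqFile job has no qualifying link at all, which matches what was observed.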
Shawn Ramsey
"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
We use the IPC stage quite frequently in this type of scenario and have seen some significant performance benefits. It has less to do with parallelization of the processing of rows and more to do with splitting the processing of the single stream across multiple processors. The biggest benefit we have seen is where the source is a complex flat file and the destination is sequential: CFF -> IPC -> Xfrm -> Sequential.
Shawn Ramsey
We also use it in more complicated jobs, but the question is: why is the performance different when I use the implicit option via DataStage Administrator?
Because, as Shawn rightly said, implicit row buffering only occurs on a link that joins two active stages. Your job design only has one active stage (the Transformer stage).
Explicit IPC stages force a process boundary to exist.
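As a rough illustration of what a forced boundary buys you, the sketch below puts a bounded buffer between a "reader" and a "transformer" so that reading and transforming can overlap. It is only an analogy: Python threads stand in for the separate processes, and the function name, the buffer size of 128 rows and the sample rows are assumptions, not anything taken from DataStage.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the row stream


def run_with_boundary(rows, transform, buffer_rows=128):
    """Read rows in one worker and transform them in another.

    The two workers are joined by a bounded buffer, so the reader can run
    ahead of the transformer by at most `buffer_rows` rows.
    """
    buf = queue.Queue(maxsize=buffer_rows)
    out = []

    def reader():
        for row in rows:
            buf.put(row)  # blocks while the buffer is full
        buf.put(_DONE)

    def transformer():
        while True:
            row = buf.get()
            if row is _DONE:
                break
            out.append(transform(row))

    workers = [threading.Thread(target=reader), threading.Thread(target=transformer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out


result = run_with_boundary(["a,1", "b,2"], str.upper)
print(result)  # ['A,1', 'B,2']
```

Row order is preserved here because there is a single consumer on the buffer.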
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Have you tried using the Link Partitioner stage? It allows a single input stream to be split up to 64 ways, thereby utilising the SMP architecture (you need to have the inter-process option selected).
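The idea of splitting one stream several ways can be sketched in Python as below. This is a hedged illustration only: round-robin is assumed as the splitting rule, a thread pool stands in for the separate processes, and the function names are hypothetical. The only detail taken from this thread is the upper limit of 64 ways.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, ways):
    """Deal rows out to `ways` partitions in turn."""
    parts = [[] for _ in range(ways)]
    for i, row in enumerate(rows):
        parts[i % ways].append(row)
    return parts


def process_partitioned(rows, transform, ways=4):
    """Split the stream, transform each partition in its own worker, then collect."""
    if not 1 <= ways <= 64:  # 64 is the limit mentioned above
        raise ValueError("ways must be between 1 and 64")
    parts = partition_round_robin(rows, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        results = list(pool.map(lambda part: [transform(r) for r in part], parts))
    # Collect partition by partition; the original row order is NOT preserved.
    return [row for part in results for row in part]


print(partition_round_robin(list(range(5)), 2))  # [[0, 2, 4], [1, 3]]
```

Note that after collecting, the rows come back grouped by partition, so anything downstream that depends on the original order would need a sort or an order-preserving collector.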
Yep, I also use this stage.
Thank you all for your posts