IPC Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

IPC Stage

Post by metlsas »

I have a job pulling data from Oracle tables (50 million records). It just transfers the data into another table with small modifications (e.g. the date format is changed in the Xfr). I used the IPC stage as follows:

oci --+-- Xfr ---- IPC --+
      +-- Xfr ---- IPC --+-- collector ---- IPC ---- oci
      +-- Xfr ---- IPC --+

1) In Job Properties > Performance, 'Enable row buffering' is set to 'In process'.
What is the difference between 'In process' and 'Inter process'?

2) What is the buffer size? I left it at the default value.

This is a server job.
Will there be any change if I use 'Inter process'?

Thanks in Advance
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

'In process' tells DataStage to move one or more buffers' worth of data from one transform to the next within a single process. This is useful when you are running on a single-processor machine.

'Inter process' breaks the job up so that each transform runs as its own process, on a separate processor where one is available, with its own buffer memory. This is useful when you are running on a multi-processor machine.

The buffer size is the amount of memory you expect DataStage to allocate for each transform in either of the above cases.
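
If it helps, here is a rough Python analogy (not DataStage internals; BUFFER_ROWS and all the names are made up for illustration): in-process buffering is like handing batches between functions inside one program, while inter-process buffering is like connecting separate operating-system processes with a bounded queue so they can run at the same time.

    import multiprocessing as mp

    BUFFER_ROWS = 1000  # stand-in for the row-buffer size setting

    def transform(row):
        # stand-in for the Xfr derivation (e.g. a date reformat)
        return row

    # 'In process': one OS process; batches are handed from stage to
    # stage by ordinary function calls, so only one stage runs at a time.
    def run_in_process(rows):
        batch = []
        for row in rows:
            batch.append(transform(row))
            if len(batch) == BUFFER_ROWS:
                yield batch
                batch = []
        if batch:
            yield batch

    # 'Inter process': each stage is its own OS process; a bounded queue
    # plays the role of the IPC buffer, so the reader and the transform
    # can run on different processors at the same time.
    def producer(q, rows):
        for row in rows:
            q.put(row)
        q.put(None)  # end-of-data marker

    def consumer(q):
        while (row := q.get()) is not None:
            transform(row)

    if __name__ == "__main__":
        rows = range(10_000)
        for _ in run_in_process(rows):     # in-process: sequential
            pass
        q = mp.Queue(maxsize=BUFFER_ROWS)  # inter-process: overlapped
        p = mp.Process(target=producer, args=(q, rows))
        p.start()
        consumer(q)
        p.join()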
metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

Post by metlsas »

As all the Xfrs are doing the same thing, how will DataStage split the load across each of the Xfrs in the job shown? Does the buffer size have anything to do with it?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

It is done by passing multiple rows (depending on the buffer size) through to each transform in sets, so that the next transform starts processing even before the previous one has finished. This way, if you have multiple processors, you may have three processes working on small logical units in parallel.

Please note that referencing rows committed in the target (i.e. committing every row and referencing it in a lookup within the same job) will not function as expected, because rows move forward along the chain in groups and may skip the reference altogether.
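
A toy Python simulation of that caveat (all names invented; the commit delay is artificial): because rows leave the buffer in groups and are committed some time later, a lookup inside the same job can fire before the row it needs has reached the target.

    import queue, threading, time

    committed = set()               # rows already committed in the target
    buf = queue.Queue(maxsize=100)  # stand-in for the IPC/row buffer

    def writer():
        # commits land in the target some time after the row left the buffer
        while (row := buf.get()) is not None:
            time.sleep(0.001)       # artificial commit latency
            committed.add(row)

    t = threading.Thread(target=writer)
    t.start()
    misses = 0
    for row in range(1000):
        buf.put(row)
        # a within-job lookup against the target for the previous row
        if row > 0 and (row - 1) not in committed:
            misses += 1             # row still in flight: the lookup misses
    buf.put(None)
    t.join()
    print(misses, "lookups fired before their row was committed")

The miss counter will normally be non-zero, which is the skipped-reference effect described above.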
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

Since the source and target are both Oracle, is there some reason why you are extracting from Oracle and loading back into Oracle via DataStage rather than doing it in Oracle directly? If the ETL server is a different machine from the source and/or target database, then your design is really in question. Why would one take data out of a database, move it around the network, and then put it back into the same database? One reason is that you've got big hairy transforms that are best documented and written not in PL/SQL, but in DataStage. However, your argument for this would have to be very strong to justify the network performance hit you are probably incurring.
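
For comparison, the stay-in-Oracle route is one set-based statement. A minimal sketch using the cx_Oracle driver (the connection details, table, and column names are all invented; the TO_CHAR mirrors the kind of date reformat the Xfr was doing):

    import cx_Oracle

    # all connection, table, and column names here are placeholders
    conn = cx_Oracle.connect(user="etl", password="secret", dsn="dbhost/orcl")
    cur = conn.cursor()

    # one set-based INSERT ... SELECT keeps the 50 million rows inside
    # Oracle; the Xfr's date reformat is pushed down into TO_CHAR
    cur.execute("""
        INSERT /*+ APPEND */ INTO target_table (id, load_date)
        SELECT id, TO_CHAR(load_date, 'YYYY-MM-DD')
        FROM   source_table
    """)
    conn.commit()
    conn.close()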

Given that your reasons for using DataStage are valid, have you thought about making the job multi-instance and, instead of using duplicated transformers in a single job via IPC, using a single transformer multiple times, as in a multi-instance job? A sketch of driving the instances follows.
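
One way to drive such a multi-instance job is with dsjob, one invocation id per instance. A hedged sketch via Python's subprocess (the project, job, and parameter names are invented, and the exact dsjob switches should be checked against your DataStage documentation):

    import subprocess

    # three invocations of the same multi-instance job; a made-up
    # MOD_SLICE parameter would drive a WHERE clause (e.g. MOD(id, 3))
    # so the instances split the 50 million rows between them
    for slice_no in range(3):
        subprocess.run([
            "dsjob", "-run",
            "-mode", "NORMAL",
            "-param", f"MOD_SLICE={slice_no}",
            "myproject", f"LoadTarget.{slice_no}",
        ], check=True)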