IPC Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

IPC Stage

Post by metlsas »

I have a job pulling data from Oracle tables (50 million records). It just transfers the data into another table with small modifications (e.g. the date format is changed in the Xfr). I used the IPC stage as follows:

oci --+-- Xfr ---- IPC --+
      +-- Xfr ---- IPC --+-- collector ---- IPC ---- oci
      +-- Xfr ---- IPC --+

1) In Job Properties > Performance, 'Enable row buffering' is set to 'In process'.
What is the difference between 'In process' and 'Inter process'?

2) What is the buffer size? I left it at the default value.

This is a server job.
Will there be any change if I use 'Inter process'?

Thanks in Advance
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

'In process' tells DataStage to move one or more buffers' worth of data from one transform to the next within a single process. This is useful when you are running on a single-processor machine.

'Inter process' breaks the job up so that each transform runs as its own process, on a separate processor where one is available, with its own buffer memory. This is useful when you are running on a multi-processor machine.

The buffer size is the amount of memory you expect DataStage to allocate for each transform in either of the above cases.
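
If it helps, here is a rough Python analogy (not DataStage internals; BUFFER_ROWS and all the names are made up for illustration): in-process buffering is like handing batches between functions inside one program, while inter-process buffering is like connecting separate operating-system processes with a bounded queue so they can run at the same time.

    import multiprocessing as mp

    BUFFER_ROWS = 1000  # stand-in for the row-buffer size setting

    def transform(row):
        # stand-in for the Xfr derivation (e.g. a date reformat)
        return row

    # 'In process': one OS process; batches are handed from stage to
    # stage by ordinary function calls, so only one stage runs at a time.
    def run_in_process(rows):
        batch = []
        for row in rows:
            batch.append(transform(row))
            if len(batch) == BUFFER_ROWS:
                yield batch
                batch = []
        if batch:
            yield batch

    # 'Inter process': each stage is its own OS process; a bounded queue
    # plays the role of the IPC buffer, so the reader and the transform
    # can run on different processors at the same time.
    def producer(q, rows):
        for row in rows:
            q.put(row)
        q.put(None)  # end-of-data marker

    def consumer(q):
        while (row := q.get()) is not None:
            transform(row)

    if __name__ == "__main__":
        rows = range(10_000)
        for _ in run_in_process(rows):     # in-process: sequential
            pass
        q = mp.Queue(maxsize=BUFFER_ROWS)  # inter-process: overlapped
        p = mp.Process(target=producer, args=(q, rows))
        p.start()
        consumer(q)
        p.join()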
metlsas
Participant
Posts: 8
Joined: Sun May 30, 2004 5:55 pm

Post by metlsas »

As all the Xfrs are doing the same thing, how will DataStage split the load across each of the Xfrs in the job shown? Does the buffer size have anything to do with it?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

It is done by passing multiple rows (depending on the buffer size) through to each transform in sets, so that the next transform starts processing even before the previous one has finished. This way, if you have multiple processors, you may have three processes working on small logical units in parallel.

Please note that referencing rows committed in the target (i.e. committing every row and referencing it in a lookup within the same job) will not function as expected, because rows move forward along the chain in groups and may skip the reference altogether.
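
A toy Python simulation of that caveat (all names invented; the commit delay is artificial): because rows leave the buffer in groups and are committed some time later, a lookup inside the same job can fire before the row it needs has reached the target.

    import queue, threading, time

    committed = set()               # rows already committed in the target
    buf = queue.Queue(maxsize=100)  # stand-in for the IPC/row buffer

    def writer():
        # commits land in the target some time after the row left the buffer
        while (row := buf.get()) is not None:
            time.sleep(0.001)       # artificial commit latency
            committed.add(row)

    t = threading.Thread(target=writer)
    t.start()
    misses = 0
    for row in range(1000):
        buf.put(row)
        # a within-job lookup against the target for the previous row
        if row > 0 and (row - 1) not in committed:
            misses += 1             # row still in flight: the lookup misses
    buf.put(None)
    t.join()
    print(misses, "lookups fired before their row was committed")

The miss counter will normally be non-zero, which is the skipped-reference effect described above.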
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

Since the source and target are both Oracle, is there some reason why you are extracting from Oracle and loading back into Oracle via DataStage rather than doing it in Oracle directly? If the ETL server is a different machine from the source and/or target database, then your design is really in question. Why would one take data out of a database, move it around the network, and then put it back into the same database? One reason is that you've got big hairy transforms that are best documented and written not in PL/SQL, but in DataStage. However, your argument for this would have to be very strong to justify the network performance hit you are probably incurring.
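
For comparison, the stay-in-Oracle route is one set-based statement. A minimal sketch using the cx_Oracle driver (the connection details, table, and column names are all invented; the TO_CHAR mirrors the kind of date reformat the Xfr was doing):

    import cx_Oracle

    # all connection, table, and column names here are placeholders
    conn = cx_Oracle.connect(user="etl", password="secret", dsn="dbhost/orcl")
    cur = conn.cursor()

    # one set-based INSERT ... SELECT keeps the 50 million rows inside
    # Oracle; the Xfr's date reformat is pushed down into TO_CHAR
    cur.execute("""
        INSERT /*+ APPEND */ INTO target_table (id, load_date)
        SELECT id, TO_CHAR(load_date, 'YYYY-MM-DD')
        FROM   source_table
    """)
    conn.commit()
    conn.close()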

Given that your reasons for using DataStage are valid, have you thought about making the job multi-instance and, instead of using duplicated transformers in a single job via IPC, using a single transformer multiple times, as in a multi-instance job? A sketch of driving the instances follows.
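
One way to drive such a multi-instance job is with dsjob, one invocation id per instance. A hedged sketch via Python's subprocess (the project, job, and parameter names are invented, and the exact dsjob switches should be checked against your DataStage documentation):

    import subprocess

    # three invocations of the same multi-instance job; a made-up
    # MOD_SLICE parameter would drive a WHERE clause (e.g. MOD(id, 3))
    # so the instances split the 50 million rows between them
    for slice_no in range(3):
        subprocess.run([
            "dsjob", "-run",
            "-mode", "NORMAL",
            "-param", f"MOD_SLICE={slice_no}",
            "myproject", f"LoadTarget.{slice_no}",
        ], check=True)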