using reference and output as the same file

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Hi,

We have a server job in which the same file is used as both the reference and the output.

For example, suppose the file contains the following records:

key1,rectype,update
1111,AA,1
1111,BB,2
1112,BB,3
1113,AA,3

In this file, record type (rectype) AA is an insert record and BB is an update record.

After processing this file, we apply the records to a DB.

The record 1111,BB,2 is preceded by an AA record with the same key (1111,AA,1), so the update from 1111,BB,2 is overlaid on record 1.
The third record (1112,BB,3) is a BB (update) record, so the job checks whether a record with that key is already present in the DB. If it is, the record is updated; otherwise the job errors out.
The fourth record (1113,AA,3) is a straight insert.

So the DB ends up with the following values:

key1,update
1111,2
1112,3 (assuming that a record was already present in the DB)
1113,3

We use a hashed file as both the reference and the output. That is, once an insert record has been written to the output, it is also present in the reference, so the lookup for any following update record with the same key will find it.
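A minimal Python sketch of the row-by-row behaviour described above, for illustration only. It assumes records arrive in file order as `(key, rectype, value)` tuples; the dict `hashed` stands in for the hashed file used as both reference and output, and the dict `db` is a hypothetical in-memory stand-in for the target table. The function name `process` is also hypothetical.

```python
from typing import Dict, List, Tuple

# A record is (key, rectype, update_value), as in the sample file.
Record = Tuple[str, str, int]


def process(records: List[Record], db: Dict[str, int]) -> Tuple[Dict[str, int], List[Record]]:
    """Apply AA (insert) and BB (update) records strictly in file order.

    `hashed` plays the role of the hashed file that is both the lookup
    reference and the output: every row written to it is visible to the
    lookups of all later rows.
    """
    hashed: Dict[str, int] = {}
    errors: List[Record] = []
    for key, rectype, value in records:
        if rectype == "AA":
            # Insert: write straight to the output, which is also the reference.
            hashed[key] = value
        elif rectype == "BB" and (key in hashed or key in db):
            # Update: overlay on an earlier insert from this file or an existing DB row.
            hashed[key] = value
        else:
            # Update with nothing to update (or an unknown rectype): reject.
            errors.append((key, rectype, value))
    # Assumption: the target table is modelled as a dict, so applying the
    # results is a simple merge.
    result = dict(db)
    result.update(hashed)
    return result, errors


if __name__ == "__main__":
    sample = [("1111", "AA", 1), ("1111", "BB", 2), ("1112", "BB", 3), ("1113", "AA", 3)]
    # Assumption: a row with key 1112 already exists in the target table.
    table, rejected = process(sample, {"1112": 0})
    print(sorted(table.items()))
    print(rejected)
```

The point of the sketch is the loop-carried dependency: whether a BB row is accepted depends on writes made by earlier rows, which is why the rows have to be handled one at a time in order.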

Can the same logic be implemented in a parallel job? Is there any way to do it?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is called a "blocking operation". It is entirely forbidden in parallel jobs, as it disrupts pipeline parallelism.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kiran259
Participant
Posts: 48
Joined: Thu Aug 16, 2007 11:17 pm
Location: United States

Post by kiran259 »

How about splitting it into two jobs and running them in a sequence? The Lookup stage reads all the reference data into memory before it starts processing. I am not sure, but this is not recommended for large volumes of data.
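One possible reading of the two-job suggestion, sketched in Python for illustration only: the first job collects the keys of all AA records into a reference (a Python set here), and the second job does a read-only lookup against that reference plus the DB (a hypothetical in-memory dict). The function names are hypothetical. Because the reference is complete before the second job starts, an update that appears in the file before its own insert would also find a match, unlike the row-by-row server job; if the file can be in that order, this variant gives different results.

```python
from typing import Dict, List, Set, Tuple

# A record is (key, rectype, update_value), as in the sample file.
Record = Tuple[str, str, int]


def job1_build_reference(records: List[Record]) -> Set[str]:
    """First job: collect the keys of all insert (AA) records."""
    return {key for key, rectype, _ in records if rectype == "AA"}


def job2_apply(records: List[Record], reference: Set[str],
               db: Dict[str, int]) -> Tuple[Dict[str, int], List[Record]]:
    """Second job: look up each row against a reference that is already
    complete and is never written to during the lookup.

    The accept/reject decision for a row no longer depends on earlier rows;
    the accepted rows are still applied in file order, so a BB value
    overlays the AA value for the same key.
    """
    result = dict(db)
    errors: List[Record] = []
    for key, rectype, value in records:
        if rectype == "AA" or (rectype == "BB" and (key in reference or key in db)):
            result[key] = value
        else:
            # Update with nothing to update (or an unknown rectype): reject.
            errors.append((key, rectype, value))
    return result, errors


if __name__ == "__main__":
    sample = [("1111", "AA", 1), ("1111", "BB", 2), ("1112", "BB", 3), ("1113", "AA", 3)]
    ref = job1_build_reference(sample)
    # Assumption: a row with key 1112 already exists in the target table.
    table, rejected = job2_apply(sample, ref, {"1112": 0})
    print(sorted(table.items()))
    print(rejected)
```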
Kiran Vaduguri

As soon as the fear approaches near, attack and destroy it.
uegodawa
Participant
Posts: 71
Joined: Thu Apr 27, 2006 12:46 pm

Post by uegodawa »

The same business logic can also be achieved with a server job, using the same hashed file for both the reference and the output.


Input ---- Transformer ---- Hash File
               |
               |
           Hash File
Thanks,
Upul
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This fact was mentioned in the original post.