Hash file lookup and simultaneous update

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Hash file lookup and simultaneous update

Post by miamichillin99 »

Good afternoon.

I'm not sure if this is even possible, but I want to perform a lookup against a hash file. The tricky part: for each record in the hash file that gets used by the lookup, I want to update a column named "PROCESSED" to TRUE. That way I can keep track of which records from the hash file were used during the run.

Is this possible?

Many thanks to all that lend a helping hand. :D
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

Yes.

Do NOT enable the Pre-load file into memory option in your reference lookup hash file stage, and do NOT enable Write cache in your output hash file stage.

Remember, the hash file stage does a row-level write. Therefore, your lookup and output stages must read and then write all columns in the hash file row.
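To picture what that means, here is a minimal DataStage BASIC sketch of the row-level read-then-write the stage performs under the covers. The hashed file name, key value, and field position of PROCESSED are all assumptions for illustration.

Code: Select all

* Sketch only -- file name, key, and the field position of
* PROCESSED are assumed, not taken from any real job.
OPEN "LOOKUP_HASH" TO HashFile ELSE STOP "Cannot open hashed file"

Key = "12345"                    ;* example key value
READ Row FROM HashFile, Key THEN
   Row<3> = "TRUE"               ;* assume PROCESSED is field 3
   WRITE Row TO HashFile, Key    ;* the whole row is written back
END ELSE
   NULL                          ;* key not present; nothing to update
END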
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

Chuck, thanks for the response. I'm not sure I understand you, so let me recap. I already have both of those options turned off. My job looks like this:

Code: Select all

OCI1 --> HASH
           |
           |
OCI2 --> XFM --> OCI_BULK
OCI1 loads the hash file, then OCI2 feeds the transformer, which does a lookup against the hash file. I then write the result out to a table using OCI_BULK. But I want the hash file to now reflect which of its records were used by the lookup, for another job that is going to use that same hash file.

Code: Select all

UV1(With HASH) --> XFM --> OCI_BULK
The second job will check for the records that were not used by the lookup in the first job. Something like:

Code: Select all

SELECT * FROM HASH WHERE PROCESSED = 'FALSE'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If they're separate jobs, then they're not simultaneous and you shouldn't have any problem.
(If you wanted to do it all in one job, then you should enable lock for update when reading from the hashed file.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

My big question is: how do I update the hash file during a lookup against that same hash file?

I want to perform the lookup in JOB 1 and update the hash column PROCESSED to TRUE. Then I can use the same hash file in JOB 2 and look for the records that have PROCESSED set to FALSE.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use an output link from the lookup transformer back to the hash file, with the same metadata. Send your updates to the hash file through this link. Include a constraint of NOT(referencelink.NOTFOUND).
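A sketch of that transformer setup, with hypothetical link names (lnk_ref is the reference lookup link, lnk_upd the update link back to the hash file):

Code: Select all

* Constraint on lnk_upd -- write back only the rows the lookup found:
NOT(lnk_ref.NOTFOUND)

* Derivation for the PROCESSED column on lnk_upd:
"TRUE"

* Map the key and every other column straight across from lnk_ref
* so the complete row is rewritten.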
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

Thanks Sainath, Ray, and Chuck. I'll try it in the morning.
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

I have tried updating the hash file as shown below but, as you all know, there is no update feature in the hash file stage. What stage do I use to update a hash file?

Code: Select all

         HASH_UPDATE(HASH Stage) 
                  /
OCI1 --> HASH   / 
           |  /          
           |/
OCI2 -->  XFM --> OCI_BULK
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

I think I have answered my own question by searching this forum, but just to be sure: is it the UV stage?
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

It WORKED!!!!!

Post by miamichillin99 »

Thank you all for your help. Below is the final set-up for JOB 1:

Code: Select all

         HASH_UPDATE(UV Stage) 
                  / 
OCI1 --> HASH   / 
           |  /          
           |/ 
OCI2 -->  XFM --> OCI_BULK
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I don't know why you're using the UV stage. Simply use the hash file stage and map all reference columns to the output hash file. Using the UV stage just hides the fact that, under the covers, DataStage is reading the whole row, updating the column of interest in memory, then writing the whole row back.

Your reference lookup HAS ALREADY READ THIS ROW; simply write it back out. Your current solution DOUBLE-READS THE SAME ROW. Change the UV stage to a hash file stage and all is well. Just don't use any caching if repeating driving rows have to "see" the prior row.
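In rough DataStage BASIC terms, the difference looks like this; the file name, key, and field position of PROCESSED are assumptions for illustration:

Code: Select all

OPEN "LOOKUP_HASH" TO HashFile ELSE STOP "Cannot open hashed file"
Key = "12345"                    ;* example key value

* What the UV stage UPDATE amounts to: a second read of a row
* the reference lookup has already read.
READ Row FROM HashFile, Key THEN
   Row<3> = "TRUE"               ;* assume PROCESSED is field 3
   WRITE Row TO HashFile, Key
END

* Hash file stage write with all reference columns mapped across:
* the row the lookup returned is written straight back -- no second read.
RefRow = Row                     ;* stands in for the lookup result
RefRow<3> = "TRUE"
WRITE RefRow TO HashFile, Key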
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Plan as though you are using two different hash files, but with the same name and column definitions.
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Update to Hash instead of UV

Post by miamichillin99 »

Well, I tried changing it to a hash file stage instead of the UV stage, but now my hash file only contains the records that went through the XFM as Not(NOTFOUND)'s. I wanted the hash file to contain the same number of records it had when it began. Since not all the records in the hash file are used by the lookup, I want to update only those that were used and leave the others as they were. The ones used by the lookup would have PROCESSED = TRUE, while the ones that were not used would be left alone, with PROCESSED = FALSE. The UV stage accomplishes this fine. The hash stage, on the other hand, appears to delete the hash file completely and then load it with the records from the XFM output constraint of Not(NOTFOUND), which is not what I want.

Kc, I'm not sure how a record would be read twice. Does the UV stage's UPDATE perform a read?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Just set the value of the PROCESSED column to either TRUE or FALSE depending on the result of the lookup (the link variable is lnk.NOTFOUND, found in the context menu for derivations). All hash file writes are destructive, so you basically have to build the record the way you want it, but the outcome of using the hash file stage will be the same as if you had used the UV stage.
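A sketch of that derivation for the PROCESSED column, using a hypothetical reference link name lnk_ref:

Code: Select all

If lnk_ref.NOTFOUND Then "FALSE" Else "TRUE"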

Also, make sure you are not constraining your output by .NOTFOUND. Make sure all rows flow through the process.

I think this might help you.

Regards,
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

When you populate from Oracle OCI, set all values in PROCESSED to FALSE. Set it to TRUE when the lookup is found.
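Expressed as derivations (link names, as before, are hypothetical):

Code: Select all

* PROCESSED derivation on the initial OCI1 --> HASH load link:
"FALSE"

* PROCESSED derivation on the update link written when the lookup
* finds the row:
"TRUE"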