Hash file lookup and simultaneous update

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Hash file lookup and simultaneous update

Post by miamichillin99 »

Good afternoon.

I'm not sure if this is even possible, but I want to perform a lookup against a hash file. The tricky part: for each record in the hash file that gets used by the lookup, I want to update a column named "PROCESSED" to TRUE. That way I can keep track of which records from the hash file were used during the run.

Is this possible?

Many thanks to all that lend a helping hand. :D
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

Yes.

Do NOT enable the Pre-load file into memory option in your reference lookup hash file stage, and do NOT enable Write cache in your output hash file stage.

Remember, the hash file stage does a row-level write. Therefore, your lookup and output stages must read and then write all columns in the hash file row.
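To picture what that means, here is a minimal DataStage BASIC sketch of the row-level read-then-write the stage performs under the covers. The hashed file name, key value, and field position of PROCESSED are all assumptions for illustration.

Code: Select all

* Sketch only -- file name, key, and the field position of
* PROCESSED are assumed, not taken from any real job.
OPEN "LOOKUP_HASH" TO HashFile ELSE STOP "Cannot open hashed file"

Key = "12345"                    ;* example key value
READ Row FROM HashFile, Key THEN
   Row<3> = "TRUE"               ;* assume PROCESSED is field 3
   WRITE Row TO HashFile, Key    ;* the whole row is written back
END ELSE
   NULL                          ;* key not present; nothing to update
END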
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

Chuck, thanks for the response. I'm not sure I understand you, so let me recap. I already have both of those options turned off. My job looks like this:

Code: Select all

OCI1 --> HASH
           |
           |
OCI2 --> XFM --> OCI_BULK
OCI1 loads the hash file, then OCI2 feeds the transformer, which does a lookup against the hash file. I then write the result out to a table using OCI_BULK. But I want the hash file to now reflect which of its records were used by the lookup, for another job that is going to use that same hash file.

Code: Select all

UV1(With HASH) --> XFM --> OCI_BULK
The second job will check for the records that were not used by the lookup in the first job. Something like:

Code: Select all

SELECT * FROM HASH WHERE PROCESSED = 'FALSE'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If they're separate jobs, then they're not simultaneous and you shouldn't have any problem.
(If you wanted to do it all in one job, then you should enable lock for update when reading from the hashed file.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

My big question is: how do I update the hash file during a lookup against that same hash file?

I want to perform the lookup in JOB 1 and update the hash column PROCESSED to TRUE. Then I can use the same hash file in JOB 2 and look for the records that have PROCESSED set to FALSE.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use an output link from the lookup transformer back to the hash file, with the same metadata. Send your updates to the hash file through this link. Include a constraint of NOT(referencelink.NOTFOUND).
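A sketch of that transformer setup, with hypothetical link names (lnk_ref is the reference lookup link, lnk_upd the update link back to the hash file):

Code: Select all

* Constraint on lnk_upd -- write back only the rows the lookup found:
NOT(lnk_ref.NOTFOUND)

* Derivation for the PROCESSED column on lnk_upd:
"TRUE"

* Map the key and every other column straight across from lnk_ref
* so the complete row is rewritten.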
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

Thanks Sainath, Ray, and Chuck. I'll try it in the morning.
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

I have tried updating the hash file as shown below but, as you all know, there is no update feature in the hash file stage. What stage do I use to update a hash file?

Code: Select all

         HASH_UPDATE(HASH Stage) 
                  /
OCI1 --> HASH   / 
           |  /          
           |/
OCI2 -->  XFM --> OCI_BULK
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Post by miamichillin99 »

I think I have answered my own question by searching this forum, but just to be sure: is it the UV stage?
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

It WORKED!!!!!

Post by miamichillin99 »

Thank you all for your help. Below is the final set-up for JOB 1:

Code: Select all

         HASH_UPDATE(UV Stage) 
                  / 
OCI1 --> HASH   / 
           |  /          
           |/ 
OCI2 -->  XFM --> OCI_BULK
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I don't know why you're using the UV stage. Simply use the hash file stage and map all reference columns to the output hash file. Using the UV stage just hides the fact that, under the covers, DataStage is reading the whole row, updating the column of interest in memory, then writing the whole row back.

Your reference lookup HAS ALREADY READ THIS ROW; simply write it back out. Your current solution DOUBLE-READS THE SAME ROW. Change the UV stage to a hash file stage and all is well. Just don't use any caching if repeating driving rows have to "see" the prior row.
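In rough DataStage BASIC terms, the difference looks like this; the file name, key, and field position of PROCESSED are assumptions for illustration:

Code: Select all

OPEN "LOOKUP_HASH" TO HashFile ELSE STOP "Cannot open hashed file"
Key = "12345"                    ;* example key value

* What the UV stage UPDATE amounts to: a second read of a row
* the reference lookup has already read.
READ Row FROM HashFile, Key THEN
   Row<3> = "TRUE"               ;* assume PROCESSED is field 3
   WRITE Row TO HashFile, Key
END

* Hash file stage write with all reference columns mapped across:
* the row the lookup returned is written straight back -- no second read.
RefRow = Row                     ;* stands in for the lookup result
RefRow<3> = "TRUE"
WRITE RefRow TO HashFile, Key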
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Plan as though you are using two different hash files, but with the same name and column definitions.
miamichillin99
Participant
Posts: 24
Joined: Wed Dec 01, 2004 11:12 am

Update to Hash instead of UV

Post by miamichillin99 »

Well, I tried changing it to a hash file stage instead of the UV stage, but now my hash file only contains the records that went through the XFM as Not(NOTFOUND)'s. I wanted the hash file to contain the same number of records it had when it began. Since not all the records in the hash file are used by the lookup, I want to update only those that were used and leave the others as they were. The ones used by the lookup would have PROCESSED = TRUE, while the ones that were not used would be left alone, with PROCESSED = FALSE. The UV stage accomplishes this fine. The hash stage, on the other hand, appears to delete the hash file completely and then load it with the records from the XFM output constraint of Not(NOTFOUND), which is not what I want.

Kc, I'm not sure how a record would be read twice. Does the UV stage's UPDATE perform a read?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Just set the value of the PROCESSED column to either TRUE or FALSE depending on the result of the lookup (the link variable is lnk.NOTFOUND, found in the context menu for derivations). All hash file writes are destructive, so you basically have to build the record the way you want it, but the outcome of using the hash file stage will be the same as if you had used the UV stage.
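A sketch of that derivation for the PROCESSED column, using a hypothetical reference link name lnk_ref:

Code: Select all

If lnk_ref.NOTFOUND Then "FALSE" Else "TRUE"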

Also, make sure you are not constraining your output by .NOTFOUND. Make sure all rows flow through the process.

I think this might help you.

Regards,
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

When you populate from Oracle OCI, set all values in PROCESSED to FALSE. Set it to TRUE when the lookup is found.
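Expressed as derivations (link names, as before, are hypothetical):

Code: Select all

* PROCESSED derivation on the initial OCI1 --> HASH load link:
"FALSE"

* PROCESSED derivation on the update link written when the lookup
* finds the row:
"TRUE"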