How to check existing records in Hash File

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

niteshjha
Participant
Posts: 2
Joined: Fri Aug 17, 2007 12:51 am
Location: Bangalore


Post by niteshjha »

Hi,
This is my first post. I am using a hashed file for a lookup (DataStage Server Edition). Suppose in the first run I load some records into the hashed file used for the lookup (e.g. empid 1, 2, 3). Then in the second run the source flat file has empid 1, 2, 3, 4, 5, 6, and we have to load it into the hashed file for lookup purposes.
Question: I want to load only empid 4, 5, 6 into the hashed file, because 1, 2, 3 already exist, and I want to reuse the same hashed file.
So, is there any option in the Hashed File stage to check for existing records in the hashed file before loading? Otherwise, how can this be done without using any extra stage?

Thanks in advance
Nitesh
Raghavendra
Participant
Posts: 147
Joined: Sat Apr 30, 2005 1:23 am
Location: Bangalore,India

Post by Raghavendra »

One way is to do a lookup on the same hashed file for existing records. If you try to load the records without doing a lookup, you will get a warning for duplicate records.
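
For illustration only, here is roughly what that check-then-write logic looks like in DataStage BASIC. The hashed file name EMP_LOOKUP, the routine name CheckEmp and the variables EmpId/EmpName are all invented for this sketch:

    * Sketch only - skip keys that already exist in the hashed file.
    OPEN "EMP_LOOKUP" TO EmpFile ELSE
       Call DSLogFatal("Cannot open hashed file EMP_LOOKUP", "CheckEmp")
    END
    READ Rec FROM EmpFile, EmpId THEN
       * Key already exists - skip it (or update the record here)
    END ELSE
       * New key - build the record and write it
       Rec = ""
       Rec<1> = EmpName          ;* non-key data
       WRITE Rec ON EmpFile, EmpId
    END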
niteshjha
Participant
Posts: 2
Joined: Fri Aug 17, 2007 12:51 am
Location: Bangalore

Post by niteshjha »

Raghavendra wrote: One way is to do a lookup on the same hashed file for existing records. If you try to load the records without doing a lookup, you will get a warning for duplicate records.
Is there an option in the Hashed File stage to do a lookup on the same hashed file? Or do we have to add another Hashed File stage and do the lookup there? Please elaborate.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You require two hashed file stages. A passive stage cannot open its output link(s) until its input link(s) are closed. So you need one stage for performing lookups (specify cache disabled, lock for updates) and one stage for writing to the hashed file (specify write cache disabled).
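
A rough picture of that design (all stage and link names are invented):

    SeqFile ----> Transformer ----> HshEmp_Write  (write cache disabled)
                      |
                 (reference)
                      |
                 HshEmp_Read   (same hashed file; pre-load disabled,
                                lock for updates)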
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted... you will need an 'extra' stage if you want to both read from and write to a hashed file in the same job. And then you would check the status of the lookup to know whether the value being looked up existed - i.e. whether the lookup was successful. There is a 'Link Variable' available for that - LinkName.NOTFOUND - which when true indicates 'failure' and when false, success.

You could then constrain your write link to only send 'new' values to the hashed file. You would also need to ensure your lookup stage has caching disabled (or 'Enabled, Locked for Update') if you need those new keys to be immediately available in the current job run.
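
For example, if the reference link were named LkpEmp (an invented name), the constraint on the write link would simply be:

    LkpEmp.NOTFOUND

That expression is true only when the key was not found on the reference link, so only new keys flow down the write link.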

Lastly, because hashed files are 'destructive overwrite' a.k.a. 'Last one in wins' when it comes to key handling, you could write all of your input key values to the hashed file and the end result would be the same as if you had only written the new ones.
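
In BASIC terms, using the illustrative EmpFile from the sketch earlier in the thread, writing the same key twice just replaces the record:

    WRITE "First" ON EmpFile, "1"
    WRITE "Second" ON EmpFile, "1"   ;* record "1" now holds "Second"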

Speaking from a key-values standpoint only, of course; any associated (non-key) data values are what would drive your decision to send all keys or only the new ones back into the hashed file. You could also send existing records back if their data elements need to be updated as well.

After all that, I'm not sure we're answering your question. It will depend on your exact job design: how you are getting 'some records' into the hashed file in the first place, how you are loading it, and then how it is being leveraged during the job run. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers