How to check existing records in Hash File

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

niteshjha
Participant
Posts: 2
Joined: Fri Aug 17, 2007 12:51 am
Location: Bangalore


Post by niteshjha »

Hi,
This is my first post. I am using a hashed file for a lookup (DataStage Server Edition). Suppose in the first run I load some records into the hashed file used for the lookup (e.g. empid 1, 2, 3). Then in the second run the source flat file has empid 1, 2, 3, 4, 5, 6, and we have to load it into the hashed file for lookup purposes.
Question: I want to load only empid 4, 5, 6 into the hashed file, because 1, 2, 3 already exist, and I want to reuse the same hashed file.
So, is there any option in the Hashed File stage to check for existing records in the hashed file before loading? Otherwise, how can this be done without using any extra stage?

Thanks in advance
Nitesh
Raghavendra
Participant
Posts: 147
Joined: Sat Apr 30, 2005 1:23 am
Location: Bangalore,India

Post by Raghavendra »

One way is to do a lookup on the same hashed file for existing records. If you try to load the records without doing a lookup, you will get a warning for duplicate records.
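
For illustration only, here is roughly what that check-then-write logic looks like in DataStage BASIC. The hashed file name EMP_LOOKUP, the routine name CheckEmp and the variables EmpId/EmpName are all invented for this sketch:

    * Sketch only - skip keys that already exist in the hashed file.
    OPEN "EMP_LOOKUP" TO EmpFile ELSE
       Call DSLogFatal("Cannot open hashed file EMP_LOOKUP", "CheckEmp")
    END
    READ Rec FROM EmpFile, EmpId THEN
       * Key already exists - skip it (or update the record here)
    END ELSE
       * New key - build the record and write it
       Rec = ""
       Rec<1> = EmpName          ;* non-key data
       WRITE Rec ON EmpFile, EmpId
    END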
niteshjha
Participant
Posts: 2
Joined: Fri Aug 17, 2007 12:51 am
Location: Bangalore

Post by niteshjha »

Raghavendra wrote: One way is to do a lookup on the same hashed file for existing records. If you try to load the records without doing a lookup, you will get a warning for duplicate records.
Is there an option in the Hashed File stage to do a lookup on the same hashed file? Or do we have to add another Hashed File stage and do the lookup there? Please elaborate.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You require two hashed file stages. A passive stage cannot open its output link(s) until its input link(s) are closed. So you need one stage for performing lookups (specify cache disabled, lock for updates) and one stage for writing to the hashed file (specify write cache disabled).
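
A rough picture of that design (all stage and link names are invented):

    SeqFile ----> Transformer ----> HshEmp_Write  (write cache disabled)
                      |
                 (reference)
                      |
                 HshEmp_Read   (same hashed file; pre-load disabled,
                                lock for updates)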
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted... you will need an 'extra' stage if you want to both read from and write to a hashed file in the same job. And then you would check the status of the lookup to know whether the value being looked up existed - i.e. whether the lookup was successful. There is a 'Link Variable' available for that - LinkName.NOTFOUND - which when true indicates 'failure' and when false, success.

You could then constrain your write link to only send 'new' values to the hashed file. You would also need to ensure your lookup stage has caching disabled (or 'Enabled, Locked for Update') if you need those new keys to be immediately available in the current job run.
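
For example, if the reference link were named LkpEmp (an invented name), the constraint on the write link would simply be:

    LkpEmp.NOTFOUND

That expression is true only when the key was not found on the reference link, so only new keys flow down the write link.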

Lastly, because hashed files are 'destructive overwrite' a.k.a. 'Last one in wins' when it comes to key handling, you could write all of your input key values to the hashed file and the end result would be the same as if you had only written the new ones.
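
In BASIC terms, using the illustrative EmpFile from the sketch earlier in the thread, writing the same key twice just replaces the record:

    WRITE "First" ON EmpFile, "1"
    WRITE "Second" ON EmpFile, "1"   ;* record "1" now holds "Second"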

Speaking from a key-values standpoint only, of course; any associated (non-key) data values are what would drive your decision to send all keys or only the new ones back into the hashed file. You could also send existing records back if their data elements need to be updated as well.

After all that, I'm not sure we're answering your question. It will depend on your exact job design: how you are getting 'some records' into the hashed file in the first place, how you are loading it, and then how it is being leveraged during the job run. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers