
Updating HASH Files

Posted: Thu Jan 17, 2008 10:39 am
by rmcclure
I have a simple staging job that pulls data from a source and puts it in a target table. The job then deletes and creates a hash file for lookups.
There are about 20K records to pull each day, but 9 million total in the table. It takes about 10 minutes to update the table and about 20 minutes to re-create the hash file.
I would like to update the hash file rather than re-create it. The 20K records could include both inserts and updates. I changed the job to pass the 20K records to the hash file instead of re-creating the complete file, and I also removed the create file options.
The problem is that even though there is a key on the hash file, it adds all 20K records, including the updates. This creates duplicates in the hash file.
Is there something I am missing to easily update the hash file?

Re: Updating HASH Files

Posted: Thu Jan 17, 2008 10:45 am
by gateleys
Have you chosen to "Clear file before writing" in the hashed file input property?
rmcclure wrote:The problem is that even though there is a key on the hash file, it adds all 20K records, including the updates. This creates duplicates in the hash file.
Now, that sounds odd. You sure they are duplicates, them keys??

Re: Updating HASH Files

Posted: Thu Jan 17, 2008 10:57 am
by rmcclure
"Clear file before writing" is NOT checked.

The same 6 fields make up the unique key in both the source table and the hash file. I checked and all the properties are the same for each field (type, length, display...) in both the DB table and the hash file.
gateleys wrote:Have you chosen to "Clear file before writing" in the hashed file input property?

Now, that sounds odd. You sure they are duplicates, them keys??

Posted: Thu Jan 17, 2008 11:16 am
by chulett
No such thing as duplicate keys in a hashed file. Honest. You need to check your data / process a little closer it seems to me.

Posted: Thu Jan 17, 2008 12:03 pm
by rmcclure
chulett wrote:No such thing as duplicate keys in a hashed file. Honest. You need to check your data / process a little closer it seems to me. ...
Here is what I get for the 6 columns that make up the key in the Hash file:

1807839 143208 A1 P - JDEPD
1807839 143208 A1 P - JDEPD
1807839 143208 A2 P - JDEPD
1807839 143208 A2 P - JDEPD
1807839 143208 B1 P - JDEPD
1807839 143208 B1 P - JDEPD
1807839 143208 C1 P - JDEPD
1807839 143208 C1 P - JDEPD
1807839 143208 XA1 P - JDEPD
1807839 143208 XA1 P - JDEPD

Posted: Thu Jan 17, 2008 12:19 pm
by chulett
All it takes is a trailing space (for example) for one field to not equal another, something we can't tell from here. What is the dash supposed to represent?

All it takes to 'update' a hashed file is to do exactly what you are doing - send the complete / replacement records in again. As long as you don't expect deletes to happen, the destructive overwrite that happens automatically over the key fields will ensure that you have no duplicates. If you've double-checked the data and are convinced that things are exactly duplicated across the combination of all key fields, then you need to involve your Support provider as that shouldn't be happening in a hashed file.
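
To illustrate the destructive-overwrite behaviour described above, here is a minimal Python sketch (purely illustrative, not DataStage code; a dict stands in for the hashed file and a tuple of the six key columns stands in for the record key, with hypothetical column names col1..col6):

    # A dict models the hashed file, keyed on the six key columns.
    hashed_file = {}

    def write_record(row):
        # Writes are destructive overwrites: re-sending a record with the
        # same key simply replaces the earlier copy.
        key = (row["col1"], row["col2"], row["col3"],
               row["col4"], row["col5"], row["col6"])
        hashed_file[key] = row

    rec = {"col1": "1807839", "col2": "143208", "col3": "A1",
           "col4": "P", "col5": "-", "col6": "JDEPD"}
    write_record(rec)
    write_record(dict(rec))                 # same key -> overwrite
    print(len(hashed_file))                 # 1

    rec_with_space = dict(rec, col3="A1 ")  # trailing space -> different key
    write_record(rec_with_space)
    print(len(hashed_file))                 # 2 -- looks like a "duplicate"

The point is only that an invisible trailing blank produces a genuinely different key, which is what makes the rows look like duplicates.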

Posted: Thu Jan 17, 2008 12:22 pm
by Minhajuddin
Try to dump all your data into a sequential file and analyze it. There is no way a hashed file can have duplicates on keys. Maybe you have spaces in your data. We'll know only if you take a dump of some records and look at them closely.
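
If it helps, one quick way to analyze such a dump outside DataStage (assuming a simple pipe-delimited sequential file with the six key columns first; the file name here is made up) is to print the key columns through repr(), which makes trailing blanks visible:

    import csv

    # Hypothetical dump file: pipe-delimited, six key columns first.
    with open("hashed_file_dump.txt") as f:
        for row in csv.reader(f, delimiter="|"):
            # repr() shows hidden whitespace, e.g. 'A1 ' versus 'A1'
            print([repr(col) for col in row[:6]])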

Re: Updating HASH Files

Posted: Thu Jan 17, 2008 12:24 pm
by paddu
rmcclure wrote:I changed the job to pass the 20K records to the hash file instead of re-creating the complete file, and I also removed the create file options.

Did you change the keys?

I have no issues in my job, which does inserts and updates to a hashed file.

Posted: Thu Jan 17, 2008 3:50 pm
by ray.wurlod
It's "hashed" file, not "hash" file.

Because of the mechanism that locates records in a hashed file it is impossible (unless the hashed file is damaged) to have duplicate keys in a hashed file. Please check that your key is defined only on these six columns. What you are seeing is possible if there are more than six columns in the key.

All writes to a hashed file via a Hashed File stage are destructive overwrites. There is no concept of "insert" or "update", only "replace". If you want insert/update, use a UniVerse stage.
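
As a rough illustration of that mechanism (a deliberately simplified Python sketch, not the actual hashing algorithm a hashed file uses): the key hashes to a group, and within the group the key is matched exactly, so a second write with the same key can only land on top of the first record.

    # Simplified model: hash of the key selects a group, and an exact key
    # match within the group means the write overwrites any existing record.
    NUM_GROUPS = 7                          # arbitrary modulus for illustration
    groups = [dict() for _ in range(NUM_GROUPS)]

    def write(key, record):
        group = groups[hash(key) % NUM_GROUPS]
        group[key] = record                 # destructive overwrite on the key

    k = ("1807839", "143208", "A1", "P", "-", "JDEPD")
    write(k, "first version")
    write(k, "second version")
    print(sum(len(g) for g in groups))      # 1 -- duplicates cannot occur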

Posted: Fri Jan 18, 2008 11:20 am
by rmcclure
chulett wrote:All it takes is a trailing space (for example) for one field to not equal another, something we can't tell from here. What is the dash supposed to represent?
Problem solved. It must have been spaces.
The original job had records updating a table and then the hashed file being completely rebuilt from the table.
The job was modified to send the exact same records to the hashed file as to the DB table, thus updating the hashed file. That is where I started getting duplicates. I added trims to all character fields before passing them to the hashed file, and now I have no more duplicates.
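
For what it's worth, the same idea expressed as a small Python sketch (illustrative only; in the job itself it was a trim in the Transformer derivations, and the column names here are hypothetical): trimming the character key columns means 'A1 ' and 'A1' resolve to the same hashed-file key, so the later write overwrites rather than adds.

    KEY_COLS = ["col1", "col2", "col3", "col4", "col5", "col6"]  # hypothetical names

    def normalise_key(row):
        # Strip trailing blanks so padded and unpadded values hash alike.
        return tuple(row[c].rstrip() for c in KEY_COLS)

    a = normalise_key({"col1": "1807839", "col2": "143208", "col3": "A1 ",
                       "col4": "P", "col5": "-", "col6": "JDEPD"})
    b = normalise_key({"col1": "1807839", "col2": "143208", "col3": "A1",
                       "col4": "P", "col5": "-", "col6": "JDEPD"})
    print(a == b)   # True -- the second write now overwrites the first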

thanks