Updating HASH Files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

Updating HASH Files

Post by rmcclure »

I have a simple staging job that pulls data from a source and puts it in a target table. The job then deletes and creates a hash file for lookups.
There are 20K records to pull each day, but 9 million total in the table. It takes about 10 minutes to update the table and about 20 minutes to re-create the hash file.
I would like to update the hash file rather than re-create it. The 20K records could include both inserts and updates. I changed the job to pass the 20K records to the hash file instead of re-creating the complete file, and I also removed the create file options.
The problem is that even though there is a key on the hash file, it adds all 20K records, including the updates. This creates duplicates in the hash file.
Is there something I am missing to easily update the hash file?
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: Updating HASH Files

Post by gateleys »

Have you chosen to "Clear file before writing" in the hashed file input property?
rmcclure wrote:The problem is that even though there is a key on the hash file, it adds all 20K records, including the updates. This creates duplicates in the hash file.
Now, that sounds odd. You sure they are duplicates, them keys??
gateleys
rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

Re: Updating HASH Files

Post by rmcclure »

Clear file before writing is NOT checked.

The same 6 fields make up the unique key in both the source table and the hash file. I checked and all the properties are the same for each field (type, length, display...) in both the DB table and the hash file.
gateleys wrote:Have you chosen to "Clear file before writing" in the hashed file input property?

Now, that sounds odd. You sure they are duplicates, them keys??
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No such thing as duplicate keys in a hashed file. Honest. You need to check your data / process a little closer it seems to me.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

Post by rmcclure »

chulett wrote:No such thing as duplicate keys in a hashed file. Honest. You need to check your data / process a little closer it seems to me. ...
Here is what I get for the 6 columns that make up the key in the Hash file:

1807839 143208 A1 P - JDEPD
1807839 143208 A1 P - JDEPD
1807839 143208 A2 P - JDEPD
1807839 143208 A2 P - JDEPD
1807839 143208 B1 P - JDEPD
1807839 143208 B1 P - JDEPD
1807839 143208 C1 P - JDEPD
1807839 143208 C1 P - JDEPD
1807839 143208 XA1 P - JDEPD
1807839 143208 XA1 P - JDEPD
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

All it takes is a trailing space (for example) for one field to not equal another, something we can't tell from here. What is the dash supposed to represent?

All it takes to 'update' a hashed file is to do exactly what you are doing - send the complete / replacement records in again. As long as you don't expect deletes to happen, the destructive overwrite that happens automatically over the key fields will ensure that you have no duplicates. If you've double-checked the data and are convinced that things are exactly duplicated across the combination of all key fields, then you need to involve your Support provider as that shouldn't be happening in a hashed file.
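
If it helps to picture that destructive overwrite, think of the hashed file as a key-to-record map. A rough Python sketch of the idea (not DataStage code; the column names are invented to mirror the six-part key in this thread):

# Python sketch (not DataStage code) of a hashed file behaving like a
# key -> record map: writing a record with an existing key replaces it.
hashed_file = {}  # stands in for the hashed file

def write_record(record):
    """Destructive overwrite: the last write for a given key wins."""
    key = (record["order_no"], record["line_no"], record["op_code"],
           record["status"], record["flag"], record["system"])
    hashed_file[key] = record  # replaces any existing record with this key

# Initial load of a record, then a "delta" update with the same key:
write_record({"order_no": 1807839, "line_no": 143208, "op_code": "A1",
              "status": "P", "flag": "-", "system": "JDEPD", "qty": 10})
write_record({"order_no": 1807839, "line_no": 143208, "op_code": "A1",
              "status": "P", "flag": "-", "system": "JDEPD", "qty": 12})

print(len(hashed_file))  # 1 -- no duplicate, the update replaced the insert

If the keys really matched, the second write would have replaced the first, which is why the duplicates point to the key values not actually being equal.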
-craig

"You can never have too many knives" -- Logan Nine Fingers
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

Try to dump all your data into a sequential file and analyze it. There is no way a hashed file can have duplicates on keys. Maybe you have spaces in your data. We'll know only if you take a dump of some records and look at them closely.
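
Once you have the dump, something along these lines will show whether padding is the culprit (a Python sketch; the file name, the pipe delimiter and the "first six fields are the key" layout are only assumptions, adjust them to your dump):

# Flag key values that collide once whitespace is stripped, i.e.
# "duplicates" caused only by padding in the key columns.
from collections import defaultdict

seen = defaultdict(list)

with open("hashfile_dump.txt") as dump:          # hypothetical dump file
    for line in dump:
        fields = line.rstrip("\n").split("|")    # assumed pipe-delimited
        raw_key = tuple(fields[:6])              # assume first six fields = key
        trimmed_key = tuple(f.strip() for f in raw_key)
        seen[trimmed_key].append(raw_key)

for trimmed_key, variants in seen.items():
    if len(set(variants)) > 1:                   # same key after trimming,
        print(trimmed_key, "->", variants)       # different raw values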
Minhajuddin

<a href="http://feeds.feedburner.com/~r/MyExperi ... ~6/2"><img src="http://feeds.feedburner.com/MyExperienc ... lrow.3.gif" alt="My experiences with this DLROW" border="0"></a>
paddu
Premium Member
Posts: 232
Joined: Tue Feb 22, 2005 11:14 am
Location: California

Re: Updating HASH Files

Post by paddu »

rmcclure wrote:I changed the job to pass the 20K records to the hash file instead of re-creating the complete file, and I also removed the create file options.

Did you change the keys?

I have no issue in my job, which does inserts and updates to a hashed file.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's "hashed" file, not "hash" file.

Because of the mechanism that locates records in a hashed file it is impossible (unless the hashed file is damaged) to have duplicate keys in a hashed file. Please check that your key is defined only on these six columns. What you are seeing is possible if there are more than six columns in the key.

All writes to a hashed file via a Hashed File stage are destructive overwrites. There is no concept of "insert" or "update", only "replace". If you want insert/update, use a UniVerse stage.
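
A toy model of that mechanism, purely illustrative (Python, and nothing like UniVerse's real hashing or group sizing): the key determines exactly one group, and a write replaces whatever record already holds that key within the group, so a second copy of the same key has nowhere to go.

MODULUS = 7                                    # number of groups (made up)
groups = [dict() for _ in range(MODULUS)]      # each group maps key -> record

def write(record_id, record):
    group = groups[hash(record_id) % MODULUS]  # the key locates exactly one group
    group[record_id] = record                  # replace, never append

write("1807839*143208*A1*P*-*JDEPD", {"qty": 10})
write("1807839*143208*A1*P*-*JDEPD", {"qty": 12})   # same key: overwrites
print(sum(len(g) for g in groups))                  # 1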
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

Post by rmcclure »

chulett wrote:All it takes is a trailing space (for example) for one field to not equal another, something we can't tell from here. What is the dash supposed to represent?
Problem solved. It must have been spaces.
The original job had records updating a table and then the hashed file being completely rebuilt from the table.
The job was modified to send the exact same records to the hashed file as to the DB table, thus updating the hashed file in place. That is where I started getting duplicates. I added trims to all character fields before passing them to the hashed file, and now I have no more duplicates.
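
In case it helps anyone else, the fix amounts to something like this (a Python sketch of the idea, not the actual job; the field names are invented, and in the job itself it is just a Trim() on each character field's derivation before the hashed file stage):

def clean_keys(record, key_fields):
    """Return a copy of the record with whitespace trimmed off the key fields."""
    cleaned = dict(record)
    for field in key_fields:
        if isinstance(cleaned[field], str):
            cleaned[field] = cleaned[field].strip()
    return cleaned

key_fields = ["order_no", "line_no", "op_code", "status", "flag", "system"]

raw = {"order_no": "1807839", "line_no": "143208", "op_code": "A1 ",  # trailing space
       "status": "P", "flag": "-", "system": "JDEPD", "qty": 12}

print(clean_keys(raw, key_fields)["op_code"])   # "A1" -- now matches the existing key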

thanks