Hash File Key Columns

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
nkln@you
Premium Member
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Hash File Key Columns

Post by nkln@you »

I am populating 13 columns into a hash file. Out of which first 6 columns are keys, as these 6 columns make the record unique.

The same hash file is used as a look up in transformer. This hash file is looked up by input source. I need to compare only first 4 columns of hash files with that of source.

So, for that, while reading from hash file, I need to make only first 4 columns of hash file as keys in transforemr because if the 5th & 6th column are made as key, there is no field in source to compare, and it gives me error.


But if i make only 1st 4 fields as keys, the incoming data from hash file is wrong, and so look up is not done properly. Wrong in the sense the values for columns are interchanged in hash file while reading.

If I make only 4 fileds as keys while populating hash file, only latest values for 5th & 6TH columns will be present as it overides the old values.

What is the solution for this. I need to compare only 1st 4 fields, but the hash file has to be populated with 6 fields
Aim high
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Are the key that you lookup in the transformer are first 4 and not 3rd,4th,5th and 6th. If so change the order, maintain the same order while populating as well as lookup operation.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
oacvb
Participant
Posts: 128
Joined: Wed Feb 18, 2004 5:33 am

Post by oacvb »

Could you please explain why you wanted to create hash file with 6 key columns when you are using only 4 for lookup.

I don't think you can create hash file with 6 key columns and using only 4 key columns. Because Hash file work in the following way.

When it stores the data, hash key will be generated for all unique key columns and based on that it will be stored / retrieved from hash file. if you use only a few as key in lookup wrong hash key will be generated which will fetch wrong values.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

It is Hahsed file and not Hash file.
And the hashing mecanism is for the whole keys rather than the keys.
Its basically stores the data based on the order of the field. The position can be still changed while reading the file using the "Position" option. But care should be take while using this. It will totally confuse the order of the fields.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
oacvb
Participant
Posts: 128
Joined: Wed Feb 18, 2004 5:33 am

Post by oacvb »

Well.You are right it's hashed file. I am not talking about the position of the key columns. What if partial key's are used during lookup?he is using only partial key columns.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

As the OP stated that input has many records when considiring first 4 keys and the duplicate rows are getting rejected during the key field mark. And would be able to get the full list only if all the six fields are marked as key.
But its a very good valid point that, if the lookup is done only based on four keys, the same duplicate issue will occur. Lets see whats the requirement of OP.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
nkln@you
Premium Member
Premium Member
Posts: 271
Joined: Wed Nov 17, 2004 5:15 am
Location: US

Post by nkln@you »

Kumar I want to compare first 4 only not 3rd, 4th, 5th, 6th columns. I didnt understand the follwoing sentence.

If so change the order, maintain the same order while populating as well as lookup operation.
Yes, as oacvb said I am using only partial key columns. I am populating hash file with 6 key columns because

Code: Select all

If I make only 4 fileds as keys while populating hash file, only latest values for 5th & 6TH columns will be present as it overides the old values
suppose values for 6 columns are as below for 2 records:

a b c d e f g

a b c d e s t

If I make first 4 columns as keys then in hash file only the second records will be present.

a b c d e s t as it overrides the first record.

So, what should I do to satisfy my requirement
Aim high
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, you cannot build a hashed file with X key fields and then perform a lookup with less than X key fields. Period. You have two choices here:

1) Build the hashed file with 4 key fields. Yes, the data for the 'missing' 5 & 6 key fields will because they are no longer part of the key and the 'destructive overwrite' mechanism will trump the old values. Make sure that the last one that is written to the hashed file is the one that you want back from the lookup.

2) Use a UV stage over the hashed file to do the lookup. That will allow you to use less keys during lookup that it was built with. It will also mean that each lookup could return more than one record.

The latter is a whole 'nuther conversation. You'd need a VOC record for the UV stage, and you'd need to build an index over the 4 key fields or the lookup will be dog slow. This has been discussed quite a number of times, which a search for the terms BUILD.INDEX or CREATE.INDEX would reveal.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

nkln@you wrote:So, what should I do to satisfy my requirement
So, what exactly is your requirement? In your two record example, what are you expecting back when you do a lookup using:

a b c d
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you do a lookup on a b c d, you need the record on "e f g" or "e s t"?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
oacvb
Participant
Posts: 128
Joined: Wed Feb 18, 2004 5:33 am

Post by oacvb »

First it's not possible to use sub-set of key columns while reading hashed file. Even if he is getting only a part, there is no guarantee that he will get the same because of usage of less key columns.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It is possible, but only using a UV stage, as noted. Performance will be helped by indexing the four columns that make up the partial key. The UV stage does support multiple reference row return (find out how by searching the forum or on-line Help).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply