Hash File Key Columns

nkln@you · Post by **nkln@you** » Thu Mar 08, 2007 9:26 pm

I am populating 13 columns into a hash file. Out of which first 6 columns are keys, as these 6 columns make the record unique.

The same hash file is used as a look up in transformer. This hash file is looked up by input source. I need to compare only first 4 columns of hash files with that of source.

So, for that, while reading from hash file, I need to make only first 4 columns of hash file as keys in transforemr because if the 5th & 6th column are made as key, there is no field in source to compare, and it gives me error.

But if i make only 1st 4 fields as keys, the incoming data from hash file is wrong, and so look up is not done properly. Wrong in the sense the values for columns are interchanged in hash file while reading.

If I make only 4 fileds as keys while populating hash file, only latest values for 5th & 6TH columns will be present as it overides the old values.

What is the solution for this. I need to compare only 1st 4 fields, but the hash file has to be populated with 6 fields

kumar_s · Post by **kumar_s** » Thu Mar 08, 2007 9:42 pm

Are the key that you lookup in the transformer are first 4 and not 3rd,4th,5th and 6th. If so change the order, maintain the same order while populating as well as lookup operation.

oacvb · Post by **oacvb** » Thu Mar 08, 2007 9:56 pm

Could you please explain why you wanted to create hash file with 6 key columns when you are using only 4 for lookup.

I don't think you can create hash file with 6 key columns and using only 4 key columns. Because Hash file work in the following way.

When it stores the data, hash key will be generated for all unique key columns and based on that it will be stored / retrieved from hash file. if you use only a few as key in lookup wrong hash key will be generated which will fetch wrong values.

kumar_s · Post by **kumar_s** » Thu Mar 08, 2007 10:03 pm

It is Hahsed file and not Hash file.
And the hashing mecanism is for the whole keys rather than the keys.
Its basically stores the data based on the order of the field. The position can be still changed while reading the file using the "Position" option. But care should be take while using this. It will totally confuse the order of the fields.

oacvb · Post by **oacvb** » Thu Mar 08, 2007 10:08 pm

Well.You are right it's hashed file. I am not talking about the position of the key columns. What if partial key's are used during lookup?he is using only partial key columns.

kumar_s · Post by **kumar_s** » Thu Mar 08, 2007 10:12 pm

As the OP stated that input has many records when considiring first 4 keys and the duplicate rows are getting rejected during the key field mark. And would be able to get the full list only if all the six fields are marked as key.
But its a very good valid point that, if the lookup is done only based on four keys, the same duplicate issue will occur. Lets see whats the requirement of OP.

nkln@you · Post by **nkln@you** » Thu Mar 08, 2007 10:25 pm

Kumar I want to compare first 4 only not 3rd, 4th, 5th, 6th columns. I didnt understand the follwoing sentence.

If so change the order, maintain the same order while populating as well as lookup operation.

Yes, as oacvb said I am using only partial key columns. I am populating hash file with 6 key columns because

Code: Select all

If I make only 4 fileds as keys while populating hash file, only latest values for 5th & 6TH columns will be present as it overides the old values

suppose values for 6 columns are as below for 2 records:

a b c d e f g

a b c d e s t

If I make first 4 columns as keys then in hash file only the second records will be present.

a b c d e s t as it overrides the first record.

So, what should I do to satisfy my requirement

chulett · Post by **chulett** » Thu Mar 08, 2007 10:25 pm

As noted, you cannot build a hashed file with X key fields and then perform a lookup with less than X key fields. Period. You have two choices here:

1) Build the hashed file with 4 key fields. Yes, the data for the 'missing' 5 & 6 key fields will because they are no longer part of the key and the 'destructive overwrite' mechanism will trump the old values. Make sure that the last one that is written to the hashed file is the one that you want back from the lookup.

2) Use a UV stage over the hashed file to do the lookup. That will allow you to use less keys during lookup that it was built with. It will also mean that each lookup could return more than one record.

The latter is a whole 'nuther conversation. You'd need a VOC record for the UV stage, and you'd need to build an index over the 4 key fields or the lookup will be dog slow. This has been discussed quite a number of times, which a search for the terms BUILD.INDEX or CREATE.INDEX would reveal.

chulett · Post by **chulett** » Thu Mar 08, 2007 10:28 pm

nkln@you wrote:So, what should I do to satisfy my requirement

So, what exactly is your requirement? In your two record example, what are you expecting back when you do a lookup using:

a b c d

kumar_s · Post by **kumar_s** » Thu Mar 08, 2007 10:32 pm

If you do a lookup on a b c d, you need the record on "e f g" or "e s t"?

oacvb · Post by **oacvb** » Thu Mar 08, 2007 10:39 pm

First it's not possible to use sub-set of key columns while reading hashed file. Even if he is getting only a part, there is no guarantee that he will get the same because of usage of less key columns.

ray.wurlod · Post by **ray.wurlod** » Thu Mar 08, 2007 11:47 pm

It is possible, but only using a UV stage, as noted. Performance will be helped by indexing the four columns that make up the partial key. The UV stage does support multiple reference row return (find out how by searching the forum or on-line Help).