Page 1 of 1

Hash file lookup with an empty string

Posted: Mon Jun 14, 2004 5:12 pm
by KeithM
I am doing a lookup to a hash file to get a value using a hash key made up of two columns. If both of the columns that I am using for the lookup are an empty string, the result of the lookup is a record not found. If I replace the empty strings with a single space, the lookup will work but I am wondering if this is the best way to handle this?

Posted: Mon Jun 14, 2004 6:00 pm
by mhester
Keith,

First - based on lack of information my assumption here is that you might be loading a warehouse in which case my answers below address this situation and may not be correct if you are doing something else.

I don't believe this would be valid from the traditional warehouse load point of view unless a space is a valid value. Normally when you load a warehouse and are performing lookups, three (3) conditions might occur -

1 - a lookup is performed and the record is not read. At this point a predefined surrogate key/value is assigned (in our case it is a -9 ). This allows the data to be loaded, but will aggregate to the invalid bucket.

2 - a lookup is performed and the record is read but the value you want to retrieve is empty or null. Again, a predefined surrogate key/value is assigned (in our case it is a -8 ). This allows the data to be loaded, but will aggregate to the unknown bucket.

3 - a lookup is performed and the record is read and a valid value is found. In this case the data loads and aggregates correctly.

My examples above represent loading either dimension or fact tables. For example if you were looking up a currency sid and had USD on the input row and this was not found in the lookup then a -9 would be assigned to the currency sid. If the record was read but no valid value exists in the lookup then you would assign a -8 etc...

Does this help?

Regards,

Michael Hester

Posted: Mon Jun 14, 2004 10:09 pm
by chulett
And just to further help understand what you are trying to do...

Under what circumstances would you be looking up anything where both keys are empty strings - in essence null - and expecting a result other than what you received? You can't have a hash record where key fields are null any more than a primary key field in, let's say Oracle, can be null. At least I sure as heck don't think you can... :? Generally seems to result in interesting phantom errors in your log.

Not that I've ever done that, of course. :wink:

Posted: Mon Jun 14, 2004 11:01 pm
by mandyli
When ever you use lookup and hash file stage please use TRIM ( ) for lookup I hope this is best practice when ever u comparing or lookup process.

Posted: Tue Jun 15, 2004 12:25 am
by ray.wurlod
TRIM("") yields "", which doesn't help the original problem.

The OP needs to determine how the "" values came to be in the hashed file's key columns in the first place. It is entirely legal for them to be there (since "" is a known value, unlike NULL), but is clearly causing a problem.

What are the business rules that this job design is striving to address? Knowing the answer to this question will probably focus your mind more closely on what is supposed to be happening (as opposed to what is actually happening).

Posted: Tue Jun 15, 2004 6:15 am
by chulett
ray.wurlod wrote:It is entirely legal for them to be there (since "" is a known value, unlike NULL), but is clearly causing a problem.
Ah well, Mea Culpa. :oops:

However, I do seem to remember a post a little while back where someone had been using 'empty strings' in hash file lookup keys under 5.x but it no longer seemed to work after upgrading to 7.x... something like that. Off to Search! (or actually try if I get some time at work today).

Posted: Tue Jun 15, 2004 6:51 am
by KeithM
The hash file that I am using in this process is used to lookup the surrogate key for a 'status' dimension. In the source system, when a record is initially created the two columns that are my hash file key are not filled in and are assigned an empty string indicating a status of 'unknown.' So having both of the columns contain an empty string is a valid business condition.

Since I can write a record out to the hash file where both of my key values contain an empty string, I was hoping I would be able read that value back in through a lookup and get the surrogate key assigned to the status of 'unknown.' It doesn't appear that I can do this with a hash file. My other option would be to create an condition statement and if both keys are empty, assign a key value through code. The thing I don't like about this is that I would be hard coding key values in my jobs.

Posted: Tue Jun 15, 2004 7:08 am
by mhester
Keith,

You can simply add invalid and unknown rows to the dimensions in the warehouse. Give them some kind of value like ~~~~~ and _____ making one the key value for unknown and one the key value for invalid. Make sure you add these values to the full length of the column. Prior to loading facts you will build the lookup hash and these will be extracted. They are now available for lookup and assignment. As part of your preparation for loading, check these columns for the condition that would trigger either an invalid or unknown and set the key appropriately (again to the full length of the column).

This is not so much "hard coding" a key as having a practice in place for handling fact loads and assigning invalid and unknown surrogates. Something like this is how most companies define these conditions.

This is just one method that may work for you. Every consultant I know does something similar, but usually there are differences and these depend on your business rules, business processes, requirements etc...

Regards,

Michael Hester