Hash file lookup with an empty string

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
KeithM
Participant
Posts: 61
Joined: Thu Apr 22, 2004 11:34 am
Contact:

Hash file lookup with an empty string

Post by KeithM »

I am doing a lookup to a hash file to get a value using a hash key made up of two columns. If both of the columns that I am using for the lookup are an empty string, the result of the lookup is a record not found. If I replace the empty strings with a single space, the lookup will work but I am wondering if this is the best way to handle this?
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Keith,

First - based on lack of information my assumption here is that you might be loading a warehouse in which case my answers below address this situation and may not be correct if you are doing something else.

I don't believe this would be valid from the traditional warehouse load point of view unless a space is a valid value. Normally when you load a warehouse and are performing lookups, three (3) conditions might occur -

1 - a lookup is performed and the record is not read. At this point a predefined surrogate key/value is assigned (in our case it is a -9 ). This allows the data to be loaded, but will aggregate to the invalid bucket.

2 - a lookup is performed and the record is read but the value you want to retrieve is empty or null. Again, a predefined surrogate key/value is assigned (in our case it is a -8 ). This allows the data to be loaded, but will aggregate to the unknown bucket.

3 - a lookup is performed and the record is read and a valid value is found. In this case the data loads and aggregates correctly.

My examples above represent loading either dimension or fact tables. For example if you were looking up a currency sid and had USD on the input row and this was not found in the lookup then a -9 would be assigned to the currency sid. If the record was read but no valid value exists in the lookup then you would assign a -8 etc...

Does this help?

Regards,

Michael Hester
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And just to further help understand what you are trying to do...

Under what circumstances would you be looking up anything where both keys are empty strings - in essence null - and expecting a result other than what you received? You can't have a hash record where key fields are null any more than a primary key field in, let's say Oracle, can be null. At least I sure as heck don't think you can... :? Generally seems to result in interesting phantom errors in your log.

Not that I've ever done that, of course. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
mandyli
Premium Member
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

When ever you use lookup and hash file stage please use TRIM ( ) for lookup I hope this is best practice when ever u comparing or lookup process.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

TRIM("") yields "", which doesn't help the original problem.

The OP needs to determine how the "" values came to be in the hashed file's key columns in the first place. It is entirely legal for them to be there (since "" is a known value, unlike NULL), but is clearly causing a problem.

What are the business rules that this job design is striving to address? Knowing the answer to this question will probably focus your mind more closely on what is supposed to be happening (as opposed to what is actually happening).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:It is entirely legal for them to be there (since "" is a known value, unlike NULL), but is clearly causing a problem.
Ah well, Mea Culpa. :oops:

However, I do seem to remember a post a little while back where someone had been using 'empty strings' in hash file lookup keys under 5.x but it no longer seemed to work after upgrading to 7.x... something like that. Off to Search! (or actually try if I get some time at work today).
-craig

"You can never have too many knives" -- Logan Nine Fingers
KeithM
Participant
Posts: 61
Joined: Thu Apr 22, 2004 11:34 am
Contact:

Post by KeithM »

The hash file that I am using in this process is used to lookup the surrogate key for a 'status' dimension. In the source system, when a record is initially created the two columns that are my hash file key are not filled in and are assigned an empty string indicating a status of 'unknown.' So having both of the columns contain an empty string is a valid business condition.

Since I can write a record out to the hash file where both of my key values contain an empty string, I was hoping I would be able read that value back in through a lookup and get the surrogate key assigned to the status of 'unknown.' It doesn't appear that I can do this with a hash file. My other option would be to create an condition statement and if both keys are empty, assign a key value through code. The thing I don't like about this is that I would be hard coding key values in my jobs.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Keith,

You can simply add invalid and unknown rows to the dimensions in the warehouse. Give them some kind of value like ~~~~~ and _____ making one the key value for unknown and one the key value for invalid. Make sure you add these values to the full length of the column. Prior to loading facts you will build the lookup hash and these will be extracted. They are now available for lookup and assignment. As part of your preparation for loading, check these columns for the condition that would trigger either an invalid or unknown and set the key appropriately (again to the full length of the column).

This is not so much "hard coding" a key as having a practice in place for handling fact loads and assigning invalid and unknown surrogates. Something like this is how most companies define these conditions.

This is just one method that may work for you. Every consultant I know does something similar, but usually there are differences and these depend on your business rules, business processes, requirements etc...

Regards,

Michael Hester
Post Reply