
dup rows in a hash file

Posted: Tue Jun 10, 2003 8:47 am
by tonystark622
I'm trying to speed up a lookup and was thinking about reading the DB table into a hash file. The problem is that I'm using the "reference link with multi row result set" option on the ODBC stage that currently does this lookup. Is there a way to create a hash file that allows multiple rows with the same key? If so, is there a way to get multiple rows back, like I'm doing with the ODBC stage now?

I appreciate your help,
Tony

Posted: Tue Jun 10, 2003 9:00 am
by chulett
Since you can't have duplicates on a hash key, I would think you would need to define the key to include enough columns to make the otherwise-duplicate rows unique, and then do the lookup on a subset of the key.

-craig
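
[Editor's aside: in plain Python terms, not DataStage itself, and with made-up column names (cust_id, order_id, amount), the trade-off Craig describes looks roughly like this: a hash keyed on the lookup column alone keeps only one row per key value, while widening the key preserves every row but then demands the full composite key for an exact lookup.]

rows = [
    {"cust_id": "C1", "order_id": "O1", "amount": 10.0},
    {"cust_id": "C1", "order_id": "O2", "amount": 25.0},
    {"cust_id": "C2", "order_id": "O3", "amount": 7.5},
]

# Keyed on cust_id alone: one of the C1 rows is silently overwritten.
by_cust = {r["cust_id"]: r for r in rows}
print(len(by_cust))        # 2

# Keyed on (cust_id, order_id): nothing is lost...
by_cust_order = {(r["cust_id"], r["order_id"]): r for r in rows}
print(len(by_cust_order))  # 3

# ...but finding "all rows for C1" now needs a scan (or a secondary index),
# since an exact hash lookup requires the full composite key.
c1_rows = [v for k, v in by_cust_order.items() if k[0] == "C1"]
print(len(c1_rows))        # 2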

Posted: Tue Jun 10, 2003 9:08 am
by inter5566
Tony,
AFAIK you can only perform the multi-row lookup using an ODBC or a Universe stage as the reference source. We are planning to raise a question about this hash file functionality at our next user group. I have been experimenting with the multi-row lookup and can understand wanting to increase the performance three- or fourfold.
If you find out a way to resolve this, please let us all know.

Posted: Tue Jun 10, 2003 3:42 pm
by tonystark622
Craig,
I thought that you had to use the whole key that the hash file was created with to access it in a lookup (from a hash file stage).

I recall some really screwy things happening when I inadvertently had the key defined one way in one place and another way somewhere else. It still seems to me that I'm only going to get one row when doing the lookup through a hash file stage and specifying a key that should return multiple rows from the hash file...

Hmmmm. Could you define it as a hash file to load it, then access it for the lookup through a Universe stage? Then you could use the "reference link with multi row result set" option from the Universe stage...

Just thinking out loud. Maybe it will jog someone's memory :)

Tony

Posted: Tue Jun 10, 2003 3:44 pm
by tonystark622
Steve,
I'll post an answer here if I come up with anything.

Tony

Posted: Tue Jun 10, 2003 4:10 pm
by chulett
Tony, I probably answered too quickly. Hash files do not allow the return of multi-row record sets, as others have pointed out. You need to use the ODBC or Universe stages for that.

According to the docs, you don't use a primary key for a multi-row reference lookup; you use a foreign key instead. Search for 'multirow' in the Designer's Help and you'll see what I mean. It sounds like it's simply a matter of changing which columns you mark as keys in the lookup, but I'll bet there's more to it than that. [}:)] I would think you should create an index on the FK fields...

But I think you are on the right track.

-craig
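
[Editor's aside: as a rough analogy only, again in plain Python with invented column names rather than the UniVerse or DataStage API, indexing the non-unique foreign-key columns amounts to building a secondary index that maps each FK value to all of its rows, so a single lookup can return a multi-row result set.]

from collections import defaultdict

rows = [
    {"cust_id": "C1", "order_id": "O1", "amount": 10.0},
    {"cust_id": "C1", "order_id": "O2", "amount": 25.0},
    {"cust_id": "C2", "order_id": "O3", "amount": 7.5},
]

# Build the secondary index once up front (analogous to indexing the FK columns).
fk_index = defaultdict(list)
for r in rows:
    fk_index[r["cust_id"]].append(r)

# One lookup on a single FK value now returns the whole multi-row result set.
for match in fk_index["C1"]:
    print(match["order_id"], match["amount"])  # O1 10.0, then O2 25.0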

Posted: Tue Jun 10, 2003 6:27 pm
by ray.wurlod
If you've already created and populated the hashed file, no problem. You can access it using a UV stage and thereby search on secondary keys. Indexing those secondary keys will provide some performance boost. Note that, by accessing the hashed file through a UV stage, you lose all memory caching capability.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Posted: Wed Jun 11, 2003 6:32 am
by tonystark622
Ray,
Will there be any benefit to doing this, then? I hadn't thought this all the way through. If I lose memory caching, then there probably isn't any benefit to doing it this way compared to just looking it up in the Oracle DB. Is this correct?

Thanks again for your help.

Tony

Posted: Wed Jun 11, 2003 5:44 pm
by ray.wurlod
The usual reason for doing it with a UV stage is to have something local, to avoid doing reference lookups across the network.
Therefore, if Oracle and the DataStage server are on the same machine, what you say is correct (all else being equal, of course; for example, indexing on the secondary key columns). Any difference will be negligible.