dup rows in a hash file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

dup rows in a hash file

Post by tonystark622 »

I'm trying to speed up a lookup and was thinking about reading the DB table into a hash file. The problem is that I'm using the "reference link with multi row result set" option on the ODBC stage that I'm currently using for this lookup stage. Is there a way to create a hash file to allow multiple rows with the same keys? If so, is there a way to get multiple rows back like I'm doing with the ODBC stage now?

I appreciate your help,
Tony
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Since you can't have duplicates on a hash key, I would think you would need to define it to include enough columns to create (in essence) duplicates and then do the lookup on a subset of the key.

-craig
inter5566
Premium Member
Premium Member
Posts: 57
Joined: Tue Jun 10, 2003 1:51 pm
Location: US - Midwest

Post by inter5566 »

Tony,
AFAIK you can only perform the multi-row lookup using an ODBC or a Universe stage for the reference source. We are planning on posing a question concerning this hash file functionality at our next users group. I have been experimenting with the multi-row lookup and can understand wanting to increase the performance three or four fold.
If you find out a way to resolve this, please let us all know.
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Craig,
I thought that you had to use the whole key that that hash file was created with to access it on a lookup (from a hash file stage).

I recall some really screwy things happening when I inadvertently had the key defined one way one place and another way someplace else. It would still seem to me that I'm only going to get one row when doing the lookup through a hash file stage and specifiying a key that should return multiple rows from the hash file...

Hmmmm. Could you define it as a hash file to load it, the access it for lookup as a Universe stage... Then you could use the "reference link with multi row result set" from the Universe stage...

Just thinking out loud. Maybe it will jog someone's memory :)

Tony
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Steve,
I'll post an answer here if I come up with anything.

Tony
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Tony, I probably answered too quickly. Hash files do not allow the return of multirow record sets, as others have pointed out. You need to use the ODBC or Universe stages for that.

You don't use a Primary Key for a multirow reference lookup, you use a foreign key instead according to the docs. Search for 'multirow' in the Designer's Help and you'll see what I mean. It sounds like it's simply a matter of changing which columns you mark as keys in the lookup, but I'll bet there's more to it than that. [}:)] I would think you should create an index on the FK fields...

But I think you are on the right track.

-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you've created and populated the hashed file no problem. You can access it using a UV stage, and thereby search on secondary keys. Indexing those secondary keys will provide some performance boost. Note that, by accessing the hashed file through a UV stage, you lose all memory caching capability.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Ray,
Will there be any benefit to doing this, then? I hadn't thought this all the way through. If I lose memory caching, then there probably isn't any benefit to doing it this way compared to just looking it up in the Oracle DB. Is this correct?

Thanks again for your help.

Tony
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The usual reason for doing it with a UV stage is to have something local, to avoid doing reference lookups across a network.
Therefore, if Oracle and DataStage server are on the same machine, what you say is correct (all else being equal, of course, for example indexing on the secondary key columns). Any difference will be negligible.
Post Reply