Lookup 1 Key in a 2 Key Hash

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
endyl13
Premium Member
Premium Member
Posts: 44
Joined: Tue Jun 25, 2002 12:47 am
Location: Qatar
Contact:

Lookup 1 Key in a 2 Key Hash

Post by endyl13 »

Hi,

My Hash File has 2 Keys (Hash.A and Hash.B), I like to do a lookup from a data file with the only 1 key (Data.A).
How can I get all value from the hash where the Hash.A = Data.A ?
What value should I put in the transformer when looking up to Hash.B ?
I tried to put @TRUE or Not(@NULL) or Not(0) without any success...

Thanks...


Regards

ND
inter5566
Premium Member
Premium Member
Posts: 57
Joined: Tue Jun 10, 2003 1:51 pm
Location: US - Midwest

Post by inter5566 »

ND,
I don't think you will be able to do what your trying with a hash file. I may stand corrected after some of the more enlightened members chime in on this. To resolve your situation you may want to switch to feeding your transformer stage from a UV stage. With the UV stage, you can load it using one set of key columns and then reference it by a different set of keys. You can also obtain multiple key matches to increase your stream output (multirow lookup).

Hope this helps,

Steve
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Depending on exactly how the Hash was created, you may be laboring under a misconception.

If you simply specified two columns as 'Key' when the hash was created, you do not have two separate keys in the hash - you have a hash with a single composite primary key. This should mean that the combination of A & B will be enforced to be unique and this can be easily tested by pushing some data into the hash and seeing what you end up with.

Sounds like what you may need is a Primary Key and an Alternate Index on the hash, and this is something that needs to be manually created, as far as I know. As Steve posted, there are other stages that allow multiple row returns, which is more than likely what you'd get when doing a lookup on a non-unique index. I don't have the syntax off the top of my head but you can search the forum and archive, or wait for more enlightened folks to chime in. [:)]

-craig
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

ND,

If you store the hash file in the account, rather than in a directory, you can access the hash file using a Universe stage and putting the hash file name as the table name. You can specify just the one field as a key field in the meta data in the Universe stage and do the lookup that way.

Good Luck,
Tony
inter5566
Premium Member
Premium Member
Posts: 57
Joined: Tue Jun 10, 2003 1:51 pm
Location: US - Midwest

Post by inter5566 »

Tony,

Thanks for the tip. It would be nice if we could do this from a hash stored in a directory path. We use some fairly large hash tables and prefer to clean them up from the UNIX side using a general cleaning script.

Thanks, it helped

Steve
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm pretty sure you can, you just have to add a VOC entry for the Hash manually. Something like that, anyway. [:I] Ray has detailed this process in the past.

-craig
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Steve,

As Craig said, you can do this with the hash file in a directory other than account, but you have to create a VOC entry for it. Ray explained how to do this one time, but I don't have it handy. And I'm still learning Universe (still learning DataStage for that matter ), so I don't have this off the top of my head either...

Try searching the archives for hash and VOC maybe???

Also, as someone already stated, this lookup method can be VERY slow if you can't convince it to use existing indexes and you may have to create secondary indexes. Curiously enough, Universe requires you to CREATE.INDEX then BUILD.INDEX before you can use an index.

Good Luck,
Tony
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

ND

The VOC is like SYS.ALL_TABLES in Oracle. It controls which hash files are accessable directly by a project. You can create VOC records in the editor called ED or use SETFILE. If you created a hash file in the project for Customer then the VOC record would look like this at TCL:

ED VOC Customer
F
Customer
D_Customer

Below this project you would see these files. There would be a Customer directory with 3 files in it .Type30, DATA.30 and OVER.30. This is where the hash file data is stored. This is because the default hash file is DYNAMIC or type 30. The D_Customer file stores the column names for this hash file. You can view these columns names by typing in "LIST DICT Customer".

To access hash files that were created in a specific directory like "/u2" then the VOC record looks like this:

F
/u2/Customer
/u2/D_Customer

You can get help on ED by typing in "HELP ED" or "?" once you are in ED. The same is true for "HELP SETFILE".

Kim.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is a single command you can use, called SETFILE. It will prompt for any argument not given. Syntax is:

SETFILE pathname localname [OVERWRITING]

pathname is the pathname of the hashed file (data portion)
localname is the unqualified name that goes in the VOC file

Example

SETFILE /data03/hashedfiles/hshPerson hshPerson OVERWRITING


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

To answer the original question, all you have to do is lie. Don't declare Hash.B as a key column (in this job's Transformer stage).

However, this introduces the possibility that there is more than one row in the hashed file that meets your criterion (Data.A = Hash.A). A hashed file stage will only return the first matching row.

If you want all rows, use a UV stage rather than a hashed file stage, and check the multi-row result set check box. This will generate one row of output for each found row.

The other issue is performance. Because you're not accessing the hashed file with its entire key, it will not use hashing but will use a table scan instead (slow). You may need to create an index on the Hash.A column.

If the hashed file was created from a UV stage, a standard SQL statement is used, for example:
CREATE INDEX HASHA_IDX ON hashfile(Hash.A);

If the hashed file was created from a Hashed File stage, a slightly different method is used, involving two non-SQL commands:
CREATE.INDEX hashfile Hash.A
BUILD.INDEX hashfile Hash.A


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
endyl13
Premium Member
Premium Member
Posts: 44
Joined: Tue Jun 25, 2002 12:47 am
Location: Qatar
Contact:

Post by endyl13 »

Thank you all for the detail explanations...
I have never tried the UV stage before, but I am going to try that approach.
Thanks again...

[;)]

Regards

ND
Post Reply