Hash File Lookup

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

donlank
Charter Member
Posts: 24
Joined: Fri Nov 05, 2004 11:30 am

Hash File Lookup

Post by donlank »

I am new to DataStage.

I have a hash file in which 4 columns together make a unique record. When looking up against the hash file with these four columns, I am able to return a surrogate key.

Now the problem....

Two of the columns are common to all records within a group.

Column3 and Column4 will not always be NULL, but for the job I am trying to build they will be, because other jobs that use the same hash file need all 4 columns.

E.g.:

Key  | C1 | C2   | C3   | C4
-----+----+------+------+-----
1234 | 1  | 0000 | NULL | NULL
5222 | 1  | 0001 | NULL | NULL
4421 | 1  | 0002 | NULL | NULL
2421 | 2  | 0000 | NULL | NULL
6314 | 2  | 0001 | NULL | NULL

Is there a way to return a list of rows from a hash file?

I need all the surrogate keys for a group.

i.e.
All surrogate keys where C1 = '1' and C3 is NULL;
I need to get the keys 1234, 5222 and 4421 from the lookup.

All surrogate keys where C1 = '2' and C3 is NULL;
I need to get the keys 2421 and 6314 from the lookup.

Is there a way to accomplish this?

Thanks,
Kevin
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

Perhaps you should make a new hash file with key columns C1 and C3, and a data column containing a delimited string of all the surrogate keys associated with that key-column pair. Use @VM as your delimiter.

Now your lookup will give you what you want. This hash file is easy to build: just create a job that does a reference lookup against, and a write to, the same hash file. DISABLE WRITE CACHE! The lookup gives you the current state of the row, or a NOTFOUND condition. You can then set the key-list derivation to the incoming value when nothing is found, or to the lookup value : @VM : incoming value when a row already exists.
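
As a sketch (the link names lnkIn and lnkHash, and the column names SKEY and SKEY_LIST, are only illustrative), the derivation of the key-list column in the Transformer might be:

Code: Select all

If lnkHash.NOTFOUND Then lnkIn.SKEY Else lnkHash.SKEY_LIST : @VM : lnkIn.SKEY

Here : is the BASIC concatenation operator, so each incoming surrogate key is appended to the stored list with a value mark between.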

If you write the results of the job you described to a hash file after this lookup, you can take advantage of the Hashed File stage's normalize feature if you later need to normalize the relationship back to a single surrogate key per row.
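For instance (using the example data above, and the same illustrative column names), a stored row such as

Code: Select all

C1 = 1, C3 = NULL, SKEY_LIST = 1234 : @VM : 5222 : @VM : 4421

normalized on SKEY_LIST comes out as three rows, one per surrogate key.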
donlank
Charter Member
Posts: 24
Joined: Fri Nov 05, 2004 11:30 am

Post by donlank »

That solution isn't feasible, because the delimited field would need to be extremely long to hold the possible results: the surrogate key is 22 bits long, and we are looking at the possibility of returning up to a few thousand keys per lookup.

More information:
Is there a way to return multiple results from a lookup against a hashed file?

It is a master hashed file used for foreign keying, and it is keyed on 4 columns which make up the natural key (you would need to perform a lookup on all 4 columns to return a unique row). Two different jobs require this hashed file, but one of the jobs will only use 2 of the 4 key columns to perform the lookup. We expect, and want, to receive multiple results from this lookup. Is this possible? If so, how do we see and manage each returned row individually?
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

Hash files support very long rows; they are variable-length records. You may want to try building it and see what you get.

A hash file lookup returns a single row, so you need a join. Look at a UV stage, or at creating a table, to perform this join efficiently. You would load it with your C1 and C3 columns, then join with your dimension table's natural key columns. You will need an index.
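
For example (the table names DRIVER and HASH_DIM are purely illustrative), the SQL behind such a join might read:

Code: Select all

SELECT D.C1, D.C3, H.SKEY
FROM   DRIVER D, HASH_DIM H
WHERE  H.C1 = D.C1
AND    H.C3 = D.C3;

with an index created on HASH_DIM (C1, C3) so the join does not scan the whole table.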
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Using a Hashed File stage you can only perform a key-based lookup.
You can perform a reverse lookup against a hashed file using a UV stage, but you will need some tricky SQL to handle the possibility of NULL (lots of OR conjunctions). And it won't be fast unless you index the non-key columns.
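As a sketch of the kind of predicate involved (the file and column names are only illustrative):

Code: Select all

SELECT SKEY
FROM   hfCustomer
WHERE  C1 = '1'
AND    (C3 IS NULL OR C3 = '');

The (C3 IS NULL OR C3 = '') conjunction is the sort of extra OR needed to catch both true NULLs and empty strings.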
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I was wondering if the following info might be of any help :roll: ?
There is a possibility of using the ODBC stage to return multiple lookup results; this means that each source row is duplicated for every lookup match found. Afterwards you might aggregate, or use stage variables, to flatten all the rows back to one.
This is a somewhat elusive option, since the implementation is actually in the Transformer and not in the ODBC stage: in the Transformer, go to Properties, then to the Inputs tab, select the ODBC lookup link, and check "Reference link with multi row result set".
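
As a rough sketch of the stage-variable approach (the names svNewGroup, svKeyList and svLastC1, and the link names, are invented for illustration; the input must arrive sorted on the grouping column), the stage variables, evaluated top to bottom on each row, could be:

Code: Select all

svNewGroup: lnkIn.C1 <> svLastC1
svKeyList:  If svNewGroup Then lnkOdbc.SKEY Else svKeyList : @VM : lnkOdbc.SKEY
svLastC1:   lnkIn.C1

Because svLastC1 is updated last, svNewGroup still sees the previous row's key when it is evaluated.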

FYI, I do not have the whole picture of what you need but this might be an option.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
branimir.kostic
Participant
Posts: 13
Joined: Thu Nov 04, 2004 4:30 am

Post by branimir.kostic »

What is a UV stage? I have a similar problem: I want to get multiple rows from a lookup against a hash file (analogous to the ODBC stage). How can I solve it?

Thanks for your reply

Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

Hashed files are, originally, the means by which the UniVerse RDBMS implemented its database tables. Instead of using a primary key index, they use a computational method (called a "hashing algorithm") to determine the location of a record. UniVerse is the database on which the original DataStage Repository was created; while the two diverged a couple of years ago, they are still more than 90% compatible with each other.

The UV stage is a stage type that accesses UniVerse (or DataStage) database tables using the ODBC protocol, and therefore SQL.

Hashed File stages can only select a row given its key value; a UV stage can use any SQL. But the Hashed File stage is much faster than the UV stage where it can be used, because it can cache the data in memory.

There is one caveat: the hashed file must be visible from the DataStage project, or "account", in order for a UV stage to be able to access it. If you've created the hashed file in a directory, you will need to use a SETFILE command to create a pointer to it. For example, if you've created a hashed file called hfCustomer in directory /usr3/data/hash, then open your Administrator client Command window for the project and execute the command

Code: Select all

SETFILE /usr3/data/hash/hfCustomer hfCustomer OVERWRITING
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
branimir.kostic
Participant
Posts: 13
Joined: Thu Nov 04, 2004 4:30 am

Post by branimir.kostic »

Thanks for your reply.

Within DataStage I created the hash file hash_POLIZZ1 in the directory D:\Hashfiles. Via the command (SET.FILE D:\Hashfiles\hash_POLIZZ1 hash_POLIZZ1 OVERWRITING) I created a VOC pointer to the file.

Because of a new requirement (new columns have to be added to that hash file), I tried to delete it via the command interface. Doing so, I receive the following error:
"hash_POLIZZ1" is not a file definition record in your VOC file

Starting the job in DataStage with the option "Create File" causes a job abort with the following error:
jobVertragInitialisierung..hashPOLIZZ1.xfm_to_hashPOLIZZ1: DSD.UVOpen An operating system file already exists with the name "D_hash_POLIZZ1".
File name =
File not created.

How can I delete the file, so that there is no pointer to it and no existing definition in the repository?

Thanks for your reply
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There's no "." in SETFILE.

You may have created something called a "Q" pointer in your VOC file. More on that later.

You can delete the hashed file, if there is no VOC entry, using

Code: Select all

DOS /C "DEL /S/Q D:\Hashfiles\hash_POLIZZ1"
DOS /C "DEL /S/Q D:\Hashfiles\D_hash_POLIZZ1"
The SET.FILE command (with a ".") normally creates a different kind of VOC entry, called a "Q" pointer: a logical pointer to a file in another account. Its syntax is different from that of SETFILE. You attempted to create a "Q" pointer called OVERWRITING, which would have failed because OVERWRITING already exists in the VOC file as a keyword.

After you create your hashed file, use SETFILE without a "."

Code: Select all

SETFILE D:\Hashfiles\hash_POLIZZ1 hash_POLIZZ1 OVERWRITING
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
branimir.kostic
Participant
Posts: 13
Joined: Thu Nov 04, 2004 4:30 am

Post by branimir.kostic »

Thanks, it works :D
Wow, DataStage is much more complex than Informatica PowerCenter :lol:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

None of that is DataStage.

What you've been doing is fooling around in a database that supports DataStage. If you hadn't used SETFILE in the first place, you could have removed the hashed file via a check box in a stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA

Post by chucksmith »

Perhaps another option would have been, on the hash file stage, after checking the Create File box on the Output / General tab, to press the Options button and check the box "Delete file before create".

Seems pretty easy to me. :wink:
branimir.kostic
Participant
Posts: 13
Joined: Thu Nov 04, 2004 4:30 am

Post by branimir.kostic »

Sorry, but I have to fool around in the database, because DataStage is not able to delete its own hash file or even just clear the data, although I checked the options offered by DataStage ("Create File" / "Clear before writing" / "Delete file before create"). All these options were checked, but in my case it didn't work.
So I tried to set a pointer, so that DataStage would be able to delete my file.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you don't start fooling in the database (SETFILE), DataStage CAN delete its own hashed files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.