Page 1 of 1

Hash File (mis)behaviour

Posted: Tue Aug 30, 2005 1:27 am
by ar_shriks
Hi,

We have a job which looks as follows:

Ora9i ----> Trans_1 ----> Hash1 -----> (Further Stages)

The output Link of Hash1 is a reference link to further stages in the job.

Output row counts:
---------------------
Ora9i to Trans_1 = 5150800
Trans_1 to Hash1 = 5150800
Hash1 to Further Stages = 5150811

Please note that the input row count from the Transformer_1 to Hash1 is 5150800 whereas it outputs 5150811 rows. This is quite strange because we expect the input and output row counts to be equal. The file names and column definitions in the 'Input' and 'Output' tabs of Hash1 are exactly the same and only the first field (column) of the Hash1 has been defined as the Key in the 'Input' and 'Output' tabs.

We took this hash file (HashFile1), created a new job and output it to a sequential file. In this case the Input Link and Output Link show the row count as 5150800.

Can someone tell us what could be the reason for the HashFile1 to output more rows than input?

Regards
Shrikanth

Posted: Tue Aug 30, 2005 1:35 am
by ray.wurlod
5150811 is the number of times a lookup was successfully performed against the hashed file. It is unrelated to the number of rows in the hashed file.

Your expectations need to be managed better! :lol:

Indeed, it is possible that the hashed file contains fewer than the number of rows that were written to it because, when the same key value occurs, a destructive overwrite occurs. You proved, in your second job, that this was not the case here.

Posted: Tue Aug 30, 2005 1:51 am
by elavenil
Yes. Ray is right(always).

In order to verify the no of rows in the hash file, just execute 'select count(*) from the hashfile' that would give you the no of rows in the hash file. Pls keep in mind that you may need to establish the pointer to the hash file if it the file is created in the folder other than account.

HTWH.

Regards
Saravanan