Hash File (mis)behaviour

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ar_shriks
Participant
Posts: 11
Joined: Sun Feb 03, 2002 9:42 am
Location: United Arab Emirates

Hash File (mis)behaviour

Post by ar_shriks »

Hi,

We have a job which looks as follows:

Ora9i ----> Trans_1 ----> Hash1 -----> (Further Stages)

The output Link of Hash1 is a reference link to further stages in the job.

Output row counts:
---------------------
Ora9i to Trans_1 = 5150800
Trans_1 to Hash1 = 5150800
Hash1 to Further Stages = 5150811

Please note that the input row count from the Transformer_1 to Hash1 is 5150800 whereas it outputs 5150811 rows. This is quite strange because we expect the input and output row counts to be equal. The file names and column definitions in the 'Input' and 'Output' tabs of Hash1 are exactly the same and only the first field (column) of the Hash1 has been defined as the Key in the 'Input' and 'Output' tabs.

We took this hash file (HashFile1), created a new job and output it to a sequential file. In this case the Input Link and Output Link show the row count as 5150800.

Can someone tell us what could be the reason for the HashFile1 to output more rows than input?

Regards
Shrikanth
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

5150811 is the number of times a lookup was successfully performed against the hashed file. It is unrelated to the number of rows in the hashed file.

Your expectations need to be managed better! :lol:

Indeed, it is possible that the hashed file contains fewer than the number of rows that were written to it because, when the same key value occurs, a destructive overwrite occurs. You proved, in your second job, that this was not the case here.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Yes. Ray is right(always).

In order to verify the no of rows in the hash file, just execute 'select count(*) from the hashfile' that would give you the no of rows in the hash file. Pls keep in mind that you may need to establish the pointer to the hash file if it the file is created in the folder other than account.

HTWH.

Regards
Saravanan
Post Reply