
Writing data into two hashed files

Posted: Tue Apr 19, 2005 3:08 am
by Luk
Hi!

I have a strange problem. I have a Transformer stage whose output is connected to two different hashed files.

As input to the Transformer I have a sequence of surrogate keys (made with GetKeyNextValue): colA, colB and colC. I am writing colA and colB as the key columns of the first hashed file, and colB and colC as the key columns of the second one.

When the job finishes, the log shows 2000 rows written to hash1 and 2000 rows written to hash2.

But I have noticed that some integer keys that are in the first hashed file are not in the second.
I used a UniVerse stage to run SQL against the hashed files - there are 2000 rows in the first one and only 1900 in the second!

Do you have any idea why this happens?

Posted: Tue Apr 19, 2005 4:10 am
by ArndW
Luk,

as you stated, both links had 2000 rows go down them. Hashed files work in such a way that a WRITE to an already existing key overwrites the existing record, so you need to look at your PK of colB and colC on the second file.

I would suggest you change the stage to drop and re-create the hashed file, and make sure you are using both colB and colC as keys on the output link to that stage. There is some periodicity to the overwrites, so if you select from your second file ordered by colB and colC you should find a "missing" row within the first couple of pages, which might help debug the problem. Which integer keys are missing? Could you have leading "0"s?
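To see why 2000 writes can leave fewer than 2000 rows, here is a minimal sketch (Python, not DataStage BASIC) of hashed-file write semantics: a write to an existing key silently replaces the prior record, just like a dict assignment. The sample data is made up purely to force duplicate (colB, colC) pairs.

# Sketch of destructive-overwrite semantics on a composite key.
# Column names follow the thread; the data is illustrative only.
rows = [(i, i % 1900, 7) for i in range(2000)]  # (colA, colB, colC)

hash2 = {}
for col_a, col_b, col_c in rows:
    hash2[(col_b, col_c)] = col_a  # duplicate (colB, colC) keys overwrite

print(len(rows))   # 2000 rows went down the link...
print(len(hash2))  # ...but only 1900 distinct keys remain in the file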

Posted: Tue Apr 19, 2005 4:28 am
by Sainath.Srinivasan
Try writing them to sequential files, which you can then compare. A hashed file overwrites duplicate keys by default, and hence may end up with fewer rows than you wrote to it.
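A rough sketch of that comparison, assuming the second link is landed in a comma-delimited flat file (the file name and layout are assumptions for illustration): count how many (colB, colC) pairs occur more than once, since every repeat beyond the first is a row the hashed file would silently overwrite.

from collections import Counter

# Count duplicate composite keys in the landed sequential file.
with open("link2.txt") as f:
    keys = Counter(tuple(line.rstrip("\n").split(",")[:2]) for line in f)

dupes = {k: n for k, n in keys.items() if n > 1}
print("distinct keys:", len(keys))
print("keys written more than once:", len(dupes))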

Posted: Tue Apr 19, 2005 5:08 am
by Luk
Is it possible for a surrogate key column (made by GetKeyNextValue) to be overwritten in the hashed file when the PK is built from two columns? (The key is unique only when you take both columns together; taken individually they are not unique.)

Posted: Tue Apr 19, 2005 5:11 am
by ArndW
Luk,

if you ran the job and created the hashed file with just one column as the PK, and later added the second column, the file will still use the original key definition; that is why I suggested you force a delete and re-create of the file. This is a relatively common source of problems. Does it work now?
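A loose Python analogy for this point (not DataStage code): treat the key layout as frozen when the file is created, so later runs keep hashing on the old key until the file is deleted and re-created with the new definition.

class HashFile:
    def __init__(self, key_cols):
        self.key_cols = key_cols           # frozen at creation time
        self.records = {}

    def write(self, row):
        key = tuple(row[c] for c in self.key_cols)
        self.records[key] = row            # overwrite on duplicate key

old = HashFile(key_cols=("colB",))         # created in an earlier run
old.write({"colB": 1, "colC": 10})
old.write({"colB": 1, "colC": 20})         # collides: file still keys on colB
print(len(old.records))                    # 1 - the second write replaced the first

new = HashFile(key_cols=("colB", "colC"))  # after delete & re-create
new.write({"colB": 1, "colC": 10})
new.write({"colB": 1, "colC": 20})
print(len(new.records))                    # 2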

Posted: Tue Apr 19, 2005 5:22 am
by Luk
Yes, that is true - I have noticed that the pair of PK columns is not 100% unique!

ArndW wrote: "if you ran the job and created the hash file with just one column as the PK and later added the second column the file will only use the original definition"

I am using the "create file" and "delete file before create" checkboxes in the hashed file options. Is that enough to re-create the file with the new definition?

Posted: Tue Apr 19, 2005 6:01 am
by Luk
OK :)

the problem is solved

thank you all!!

Regards

Posted: Tue Apr 19, 2005 6:52 am
by ArndW
Luk,

in order for this forum to work, it would be nice if you told us what the solution and/or the problem was, so that others searching this thread can find the answer.

Posted: Tue Apr 19, 2005 7:22 am
by Luk
:) You already gave the solution - as I mentioned, you were right!!

The set of columns I used as the PK in the hashed file was not 100% unique (a few records in the hashed file were being updated/overwritten). I added one more column to the key; now every key is unique and the number of rows in the hashed files is correct!
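A quick uniqueness check along the lines of this fix, as a Python sketch: if (colB, colC) has duplicates but the widened key does not, the extra column restores a usable PK. The extra column name (colD) and the sample rows are illustrative assumptions only.

from collections import Counter

rows = [
    {"colB": 1, "colC": 5, "colD": 100},
    {"colB": 1, "colC": 5, "colD": 101},   # duplicate on (colB, colC)
    {"colB": 2, "colC": 5, "colD": 102},
]

def duplicates(rows, cols):
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return [k for k, n in counts.items() if n > 1]

print(duplicates(rows, ("colB", "colC")))          # [(1, 5)] - not unique
print(duplicates(rows, ("colB", "colC", "colD")))  # [] - unique again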

Regards