Writing data into two hash files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Writing data into two hash files

Post by Luk »

Hi!

I have a strange problem. I have a Transformer stage whose output is connected to two different hash files.

As input to the transformer I have a sequence of surrogate keys (made with GetKeyNextValue) plus colA, colB and colC. I am writing the key column together with colA and colB into the first hash file, with colA and colB as the PK, and the key column together with colB and colC into the second one, with colB and colC as the PK.

When the job finishes, the log shows 2000 rows written to hash1 and 2000 rows written to hash2.

But I have noticed that some integer keys that are in the first hash file aren't in the second.
I used a UniVerse stage to run SQL against the hash files, and there are 2000 rows in the first hash file but only 1900 in the second!

Do you have any idea why this happens?
LUK
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Luk,

as you stated, both links had 2000 rows go down them. Hashed files work in such a way that a WRITE to an already existing key overwrites the existing record, so you need to look at your PK of ColB and ColC on the second file.

I would suggest you change the stage to drop & re-create the hash file and make sure you are using both ColB and ColC as keys in the output to that stage. You have some periodicity to the overwrites, so if you do a select of your second file and order by ColB and ColC you should, within the first couple of pages, find a "missing" row which might help debug the problem. Which integer keys are missing? Could you have leading "0"s?
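As a sketch of that check through the UniVerse stage (assuming the hashed files are known to the account as HASH1 and HASH2 and that the dictionary column names are COLB and COLC - adjust these to your actual names):

SELECT COLB, COLC FROM HASH2 ORDER BY COLB, COLC;

And, assuming your SQL dialect accepts a subquery here, the keys that made it into the first file but not the second:

SELECT COLB FROM HASH1 WHERE COLB NOT IN (SELECT COLB FROM HASH2);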
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Try writing them into sequential files, which can then be used to compare. A hashed file by default overwrites duplicate keys and hence may contain fewer rows than you wrote to it.
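If the same rows are also available somewhere you can query (for instance loaded into a staging table, called STAGE_ROWS here purely as a placeholder), a duplicate check on the key pair will show exactly which rows collapse into one hashed file record:

SELECT COLB, COLC, COUNT(*) FROM STAGE_ROWS GROUP BY COLB, COLC HAVING COUNT(*) > 1;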
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Post by Luk »

Is it possible for the surrogate key column (made by GetKeyNextValue) to be overwritten in the hash file when my PK is built from two columns (the key is unique only when you take both columns together; if you take only one, they won't be unique)?
LUK
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Luk,

if you ran the job and created the hash file with just one column as the PK, and later added the second column, the file will still use the original definition; that's why I suggested you force delete and re-create the file. This is a relatively common source of problems. Does it work now?
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Post by Luk »

Yes, that is true - I have noticed that the pair of PK columns is not 100% unique!!!
ArndW wrote: "if you ran the job and created the hash file with just one column as the PK, and later added the second column, the file will still use the original definition"
I am using the "create file" checkbox and the "delete file before creation" checkbox in the hash file options. Is that enough to re-create the file with the new definition?
LUK
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Post by Luk »

OK :)

The problem is solved.

Thank you all!!

Regards
LUK
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Luk,

in order for this forum to work, it would be nice if you told us what the problem and/or the solution was, so that others searching this thread can find the answer.
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Post by Luk »

:) You already gave the solution - as I mentioned, you were right!!

The set of columns I used as the PK in the hash file wasn't 100% unique (a few records in the hash were being updated/overwritten). I added one more column to the key, and now everything is unique and the number of rows in the hashes is correct!
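For anyone who finds this thread later, the sanity check was simply re-running the counts through the UniVerse stage (HASH1 and HASH2 standing in for the real file names):

SELECT COUNT(*) FROM HASH1;
SELECT COUNT(*) FROM HASH2;

Both now return the same number of rows as the link counts in the job log (2000 in this case).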

Regards
LUK