
Problem with Hash File

Posted: Thu Dec 07, 2006 2:50 pm
by sshettar
Hi All,

I have a job where I need to do a self-join on one table based on one column, and then join it to another table based on another column.

Here is what I have done:
I get the data from a Complex Flat File (CFF) stage, and the output link goes to the Transformer and also to the hashed file.
In the hashed file I am making the column to be joined the key column and then giving it to the Transformer again. The problem is that the hashed file is removing duplicate records on that key column, which I don't want to happen.

Can anybody help me with this issue?

Thanks

Posted: Thu Dec 07, 2006 2:54 pm
by narasimha
That is what a hashed file does - you cannot get duplicate entries for a particular key. In other words, destructive overwrite.

Posted: Thu Dec 07, 2006 3:21 pm
by DSguru2B
You need a workaround. Either load your file to a temp table and then perform regular SQL, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via the UniVerse stage and then do the SQL join, at the expense of performance.
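
A rough illustration of the dummy-key idea, sketched in Python rather than DataStage (the column names and values are made up): with the join column alone as the key, later rows destructively overwrite earlier ones, but a composite key that includes a running sequence number keeps every record.

    # Illustration only, not DataStage code. 'rows' is hypothetical sample data.
    rows = [("A", 100), ("A", 200), ("B", 300)]

    # Key on the join column alone: the hashed file behaves like a dict,
    # so the second "A" destructively overwrites the first.
    plain = {}
    for join_col, amount in rows:
        plain[join_col] = (join_col, amount)

    # Key on (join column, running sequence number): every key is unique,
    # so each and every record is retained.
    with_seq = {}
    for seq, (join_col, amount) in enumerate(rows, start=1):
        with_seq[(join_col, seq)] = (join_col, amount)

    print(len(plain))     # 2 -- one "A" row was lost
    print(len(with_seq))  # 3 -- all rows survive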

Posted: Thu Dec 07, 2006 9:50 pm
by Sreenivasulu
Hi

You can use the Remove Duplicates stage in the parallel version (if the project architecture allows this)

Regards
Sreeni
DSguru2B wrote: You need a workaround. Either load your file to a temp table and then perform regular SQL, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via the UniVerse stage and then do the SQL join, at the expense of performance.

Posted: Thu Dec 07, 2006 10:07 pm
by thebird
Sreenivasulu wrote: You can use the Remove Duplicates stage in the parallel version (if the project architecture allows this)
Removal of duplicates is not the requirement here!
sshettar wrote: The problem is that the hashed file is removing duplicate records on that key column, which I don't want to happen.

Posted: Thu Dec 07, 2006 10:07 pm
by chulett
Two problems - they want the duplicates and this is a Server discussion. I know you made the 'project architecture' comment, so perhaps only one problem then. :wink:

Posted: Fri Dec 08, 2006 12:00 am
by I_Server_Whale
:lol:

Posted: Fri Dec 08, 2006 3:38 pm
by ray.wurlod
In the Advanced DataStage class from IBM there is an example that does exactly that: it captures the first version into one hashed file and all the duplicates into a text file. Basically, you update the hashed file if the lookup fails, and send the row to the text file if the lookup succeeds. Make sure that the read cache is "disabled, lock for update" and the write cache is not enabled.
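
For what it's worth, the logic Ray describes can be sketched in Python (illustration only, with made-up data; the dict stands in for the hashed file and the list for the text file link in the real job):

    # Conceptual sketch of the lookup pattern, not DataStage BASIC.
    rows = [("A", 100), ("B", 200), ("A", 300)]   # hypothetical input rows

    hashed_file = {}   # stands in for the hashed file keyed on the join column
    duplicates = []    # stands in for the text file link

    for key, amount in rows:
        if key not in hashed_file:        # lookup fails -> first version of this key
            hashed_file[key] = (key, amount)
        else:                             # lookup succeeds -> duplicate goes to the text file
            duplicates.append((key, amount))

    print(hashed_file)   # {'A': ('A', 100), 'B': ('B', 200)}
    print(duplicates)    # [('A', 300)]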