Problem with Hash File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Problem with Hash File

Post by sshettar »

Hi All,

I have a job where I need to do a self-join on one table based on one column, and then join it to another table based on another column.

Here is what I have done: I read the data from a Complex Flat File (CFF) stage, and the output link goes to the Transformer and also to a hashed file. In the hashed file I make the column to be joined the key column, and then feed it back to the Transformer as a reference. The problem is that the hashed file is removing duplicate records on that key column, which I do not want to happen.

Can anybody help me with this issue?

Thanks
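To make the requirement concrete, here is a minimal sketch (plain Python rather than DataStage, with made-up column names and data) of the self-join being described: every row with a matching value in the join column must pair up, so duplicates on that column have to survive on the reference side.

# Minimal sketch of the desired self-join (plain Python, illustrative only;
# "join_col" and the values are invented, not taken from the actual job).

rows = [
    {"id": 1, "join_col": "A"},
    {"id": 2, "join_col": "A"},   # same join value as id 1
    {"id": 3, "join_col": "B"},
]

# Group the reference side by the join column, keeping every row.
by_key = {}
for row in rows:
    by_key.setdefault(row["join_col"], []).append(row)

# Self-join: each row pairs with every row sharing its join value.
pairs = [(left, right)
         for left in rows
         for right in by_key[left["join_col"]]]

for left, right in pairs:
    print(left["id"], "<->", right["id"])
# Rows 1 and 2 both match each other; if duplicates on "A" were dropped
# from the reference side, some of these pairs would be lost.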
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

That is what a hashed file does; you cannot get duplicate entries for a particular key. In other words, destructive overwrite.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
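To picture the destructive overwrite narasimha mentions, here is a rough analogy in Python (not DataStage code; the data is invented): writing records into a structure keyed on one column keeps only the last record per key value.

# Rough analogy for a hashed file's destructive overwrite (plain Python,
# invented data; only the keying behaviour is being illustrated).

records = [
    ("A", "first row for A"),
    ("A", "second row for A"),   # duplicate key
    ("B", "only row for B"),
]

hashed_file = {}
for key, payload in records:
    hashed_file[key] = payload   # a later write on the same key replaces the earlier one

print(hashed_file["A"])   # "second row for A" -- the first A row is gone
print(len(hashed_file))   # 2 keys, even though 3 records were written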
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You need a workaround. Either load your file to a temp table and then perform regular SQL, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via the UniVerse stage and then do the SQL join, at the expense of performance.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
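Here is a rough sketch of the second workaround DSguru2B describes (plain Python, invented column names and data): adding a running sequence number to the key makes every record unique, so nothing is overwritten, but a lookup on the original column alone then has to scan or query the entries, which is why the suggestion is to go through the UniVerse stage and a SQL join, and why it costs performance.

# Sketch of the dummy-key workaround (plain Python, invented data).
# Key = (join column value, running sequence number), so every record is retained.

records = [
    {"acct": "A", "amount": 10},
    {"acct": "A", "amount": 25},
    {"acct": "B", "amount": 50},
]

hashed_file = {}
for seq, rec in enumerate(records):
    hashed_file[(rec["acct"], seq)] = rec   # unique composite key, no overwrite

print(len(hashed_file))   # 3 -- all records kept

# The price: a lookup by "acct" alone can no longer be a single keyed read;
# you have to filter (or, in DataStage, query the hashed file via SQL).
matches_for_a = [rec for (acct, _), rec in hashed_file.items() if acct == "A"]
print(matches_for_a)      # both A records come back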
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi

You can use the Remove Duplicates stage in the parallel version (if the project architecture allows it).

Regards
Sreeni
DSguru2B wrote: You need a workaround. Either load your file to a temp table and then perform regular SQL, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via the UniVerse stage and then do the SQL join, at the expense of performance.
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Post by thebird »

Sreenivasulu wrote: You can use the Remove Duplicates stage in the parallel version (if the project architecture allows it).
Removal of duplicates is not the requirement here!
sshettar wrote: The problem is that the hashed file is removing duplicate records on that key column, which I do not want to happen.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Two problems: they want the duplicates, and this is a Server discussion. I know you made the 'project architecture' comment, so perhaps only one problem then. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

:lol:
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In the Advanced DataStage class from IBM there is an example that does exactly that: it captures the first version into one hashed file and all the duplicates into a text file. Basically, you update the hashed file if the lookup fails, and send the row to the text file if the lookup succeeds. Make sure that the read cache is set to "Disabled, lock for update" and that the write cache is not enabled.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
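For what it is worth, here is a small sketch (plain Python, invented data, not the IBM class example itself) of the logic ray.wurlod describes: if the lookup against the hashed file fails, the row is written to the hashed file; if it succeeds, the row is a duplicate and goes to the text file.

# Sketch of the first-occurrence / duplicates split (plain Python, invented data).
# Lookup fails -> write the row to the "hashed file"; lookup succeeds -> duplicate.

rows = [
    {"key": "A", "val": 1},
    {"key": "B", "val": 2},
    {"key": "A", "val": 3},   # duplicate of A
]

hashed_file = {}   # first occurrence of each key
duplicates = []    # everything after the first occurrence

for row in rows:
    if row["key"] in hashed_file:     # lookup succeeded: already seen this key
        duplicates.append(row)
    else:                             # lookup failed: first time we see this key
        hashed_file[row["key"]] = row

print(list(hashed_file))   # ['A', 'B']
print(duplicates)          # [{'key': 'A', 'val': 3}]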