Problem with Hash File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Problem with Hash File

Post by sshettar »

Hi All,

I have a job where I need to do a self-join on one table based on one column, and then join it to another table based on another column.

Here is what I have done: I read the data from a Complex Flat File (CFF) stage, and the output link goes to the Transformer and also to a hashed file. In the hashed file I make the column to be joined the key column, and then feed it back to the Transformer as a reference. The problem is that the hashed file is removing duplicate records on that key column, which I do not want to happen.

Can anybody help me with this issue?

Thanks
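To make the requirement concrete, here is a minimal sketch (plain Python rather than DataStage, with made-up column names and data) of the self-join being described: every row with a matching value in the join column must pair up, so duplicates on that column have to survive on the reference side.

# Minimal sketch of the desired self-join (plain Python, illustrative only;
# "join_col" and the values are invented, not taken from the actual job).

rows = [
    {"id": 1, "join_col": "A"},
    {"id": 2, "join_col": "A"},   # same join value as id 1
    {"id": 3, "join_col": "B"},
]

# Group the reference side by the join column, keeping every row.
by_key = {}
for row in rows:
    by_key.setdefault(row["join_col"], []).append(row)

# Self-join: each row pairs with every row sharing its join value.
pairs = [(left, right)
         for left in rows
         for right in by_key[left["join_col"]]]

for left, right in pairs:
    print(left["id"], "<->", right["id"])
# Rows 1 and 2 both match each other; if duplicates on "A" were dropped
# from the reference side, some of these pairs would be lost.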
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

That is what a hashed file does; you cannot get duplicate entries for a particular key. In other words, destructive overwrite.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
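To picture the destructive overwrite narasimha mentions, here is a rough analogy in Python (not DataStage code; the data is invented): writing records into a structure keyed on one column keeps only the last record per key value.

# Rough analogy for a hashed file's destructive overwrite (plain Python,
# invented data; only the keying behaviour is being illustrated).

records = [
    ("A", "first row for A"),
    ("A", "second row for A"),   # duplicate key
    ("B", "only row for B"),
]

hashed_file = {}
for key, payload in records:
    hashed_file[key] = payload   # a later write on the same key replaces the earlier one

print(hashed_file["A"])   # "second row for A" -- the first A row is gone
print(len(hashed_file))   # 2 keys, even though 3 records were written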
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You need a workaround. Either load your file to a temp table and then perform regular SQL, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via the UniVerse stage and then do the SQL join, at the expense of performance.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
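Here is a rough sketch of the second workaround DSguru2B describes (plain Python, invented column names and data): adding a running sequence number to the key makes every record unique, so nothing is overwritten, but a lookup on the original column alone then has to scan or query the entries, which is why the suggestion is to go through the UniVerse stage and a SQL join, and why it costs performance.

# Sketch of the dummy-key workaround (plain Python, invented data).
# Key = (join column value, running sequence number), so every record is retained.

records = [
    {"acct": "A", "amount": 10},
    {"acct": "A", "amount": 25},
    {"acct": "B", "amount": 50},
]

hashed_file = {}
for seq, rec in enumerate(records):
    hashed_file[(rec["acct"], seq)] = rec   # unique composite key, no overwrite

print(len(hashed_file))   # 3 -- all records kept

# The price: a lookup by "acct" alone can no longer be a single keyed read;
# you have to filter (or, in DataStage, query the hashed file via SQL).
matches_for_a = [rec for (acct, _), rec in hashed_file.items() if acct == "A"]
print(matches_for_a)      # both A records come back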
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi

You can use the Remove Duplicates stage in the parallel version (if the project architecture allows it).

Regards
Sreeni
DSguru2B wrote: You need a workaround. Either load your file to a temp table and then perform regular SQL, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via the UniVerse stage and then do the SQL join, at the expense of performance.
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Post by thebird »

Sreenivasulu wrote: You can use the Remove Duplicates stage in the parallel version (if the project architecture allows it).
Removal of duplicates is not the requirement here!
sshettar wrote: The problem is that the hashed file is removing duplicate records on that key column, which I do not want to happen.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Two problems: they want the duplicates, and this is a Server discussion. I know you made the 'project architecture' comment, so perhaps only one problem then. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

:lol:
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In the Advanced DataStage class from IBM there is an example that does exactly that: it captures the first version into one hashed file and all the duplicates into a text file. Basically, you update the hashed file if the lookup fails, and send the row to the text file if the lookup succeeds. Make sure that the read cache is set to "Disabled, lock for update" and that the write cache is not enabled.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
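For what it is worth, here is a small sketch (plain Python, invented data, not the IBM class example itself) of the logic ray.wurlod describes: if the lookup against the hashed file fails, the row is written to the hashed file; if it succeeds, the row is a duplicate and goes to the text file.

# Sketch of the first-occurrence / duplicates split (plain Python, invented data).
# Lookup fails -> write the row to the "hashed file"; lookup succeeds -> duplicate.

rows = [
    {"key": "A", "val": 1},
    {"key": "B", "val": 2},
    {"key": "A", "val": 3},   # duplicate of A
]

hashed_file = {}   # first occurrence of each key
duplicates = []    # everything after the first occurrence

for row in rows:
    if row["key"] in hashed_file:     # lookup succeeded: already seen this key
        duplicates.append(row)
    else:                             # lookup failed: first time we see this key
        hashed_file[row["key"]] = row

print(list(hashed_file))   # ['A', 'B']
print(duplicates)          # [{'key': 'A', 'val': 3}]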