Hi all,
I have a job where I need to do a self join on one table based on one column, and then join the result to another table based on a different column.
Here is what I have done:
I read the data from a Complex Flat File (CFF) stage, and the output link goes to a Transformer and also to a hashed file.
In the hashed file I make the column to be joined the key column, and then feed it back to the Transformer as a reference lookup. The problem is that the hashed file is removing duplicate records on that key column, which I don't want to happen.
Can anybody help me with this issue?
Thanks
Problem with Hash File
You need a workaround. Either load your file to a temp table and then perform a regular SQL join, or create a dummy key with a running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via a UniVerse stage and then do the SQL join, at the expense of performance.
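As a sketch of the temp-table idea, here is roughly what the SQL could look like once the flat-file data is loaded into a staging table. All names here are assumptions for illustration (a staging table stg_cff with a surrogate row id rec_id, the self-join column key_col, the second join column ref_col, and a second table other_table), not anything from the actual job:

-- Self join the staged flat-file data on key_col, then join the
-- result to the second table on ref_col. Unlike a keyed hashed file,
-- this keeps every duplicate of key_col.
SELECT a.rec_id,
       a.key_col,
       b.rec_id AS matched_rec_id,
       o.*
FROM   stg_cff a
JOIN   stg_cff b
       ON  b.key_col = a.key_col
       AND b.rec_id <> a.rec_id      -- don't match a row to itself
JOIN   other_table o
       ON  o.ref_col = a.ref_col;

The rec_id column plays the same role as the dummy sequential key suggested above: it makes each staged row unique so that no record is lost.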
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Hi
You can use the Remove Duplicates stage in the parallel version (if the project architecture allows this).
Regards
Sreeni
DSguru2B wrote:You need a work around. Either load your file to a temp table and then perform regular sql. OR create a dummy key with running sequential number while creating the hashed file so that each and every record is retained. Access that hashed file via Universe Stage and them do sql join at the expense of performance.
Removal of duplicates is not the requirement here!Sreenivasulu wrote: You can use the removeduplicate stage in parallel version (if the project architecture allows this)
sshettar wrote: but here the problem is that the hash file is removing duplicate records of that key column which i dont want it to happen .
In the Advanced DataStage class from IBM there is an example that does exactly that: it captures the first version of each key into one hashed file and all the duplicates into a text file. Basically, you update the hashed file if the lookup fails, and send the row to the text file if the lookup succeeds. Make sure that the read cache is set to "disabled, lock for update" and that write cache is not enabled.
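Not the DataStage mechanics themselves, but an SQL analogue of the split this design produces, assuming a hypothetical source table src with the lookup key key_col and an arrival-order column seq_no (both names invented for illustration):

-- "First version" rows: what the hashed file ends up holding.
SELECT *
FROM  (SELECT s.*,
              ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY seq_no) AS rn
       FROM   src s) numbered
WHERE rn = 1;

-- Duplicate rows: what gets routed to the text file.
SELECT *
FROM  (SELECT s.*,
              ROW_NUMBER() OVER (PARTITION BY key_col ORDER BY seq_no) AS rn
       FROM   src s) numbered
WHERE rn > 1;

In the job itself the routing happens row by row via Transformer constraints on whether the lookup found a row, as described above; the queries just show which rows land where.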
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.