Hi All,
I have a Hashed File stage with a few duplicate-key records going in, and because of the way the Hashed File stage works, I am getting the last duplicate-key record as output. Is there a way to get the first record among the duplicates instead? I am currently using a Sort stage and a surrogate key to pick the first one, but I would like to know whether there is a better option using some functionality of the Hashed File stage itself.
Input to Hash File
Col1(Key Field in HF) Col2 Col3
100 ABC C99
100 RXZ G77
100 JKL G77
115 XYZ R33
Normal Output
Col1(Key Field in HF) Col2 Col3
100 JKL G77
115 XYZ R33
Required Output
Col1(Key Field in HF) Col2 Col3
100 ABC C99
115 XYZ R33
Thanks in Advance,
Tom.
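To make the two behaviours in the tables above concrete, here is a minimal Python sketch (my own illustration, not DataStage code) of last-write-wins, which is what the Hashed File stage does, versus keep-first, which is the required output:

```python
rows = [
    ("100", "ABC", "C99"),
    ("100", "RXZ", "G77"),
    ("100", "JKL", "G77"),
    ("115", "XYZ", "R33"),
]

# Hashed File behaviour: every write is a destructive overwrite,
# so the last duplicate for each key wins.
last_wins = {}
for key, col2, col3 in rows:
    last_wins[key] = (col2, col3)

# Required behaviour: only write a key the first time it is seen.
first_wins = {}
for key, col2, col3 in rows:
    if key not in first_wins:
        first_wins[key] = (col2, col3)

print(last_wins)   # {'100': ('JKL', 'G77'), '115': ('XYZ', 'R33')}
print(first_wins)  # {'100': ('ABC', 'C99'), '115': ('XYZ', 'R33')}
```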
To Get the First Duplicate Record from HashFile Output
If you want the 'first' rather than the 'last', you need to sort the input on your key fields in descending order rather than ascending. Then all you'll have in the hashed file when you are done are the first (lowest) values for any duplicate keys.
-craig
"You can never have too many knives" -- Logan Nine Fingers
The Hash file key must be different from what you've stated, but the general command for the hash file SELECT would read
SELECT HF BY Col1 BREAK.ON Col1 DET.SUP
Either sort data to be loaded into the hashed file in reverse order, as Craig suggested, or de-duplicate the data by other means before loading them into the hashed file.
All writes to hashed files via the Hashed File stage are destructive overwrites.
If you use a UV stage to insert rows you will achieve what you want, but generate warnings (row already exists) for each duplicate key value. The UV stage uses SQL.
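The UV-stage approach can be sketched in Python (an illustration, not the actual stage): an insert-only write keeps the first row per key and flags each subsequent duplicate, mirroring the "row already exists" warnings described above:

```python
rows = [
    ("100", "ABC", "C99"),
    ("100", "RXZ", "G77"),
    ("100", "JKL", "G77"),
    ("115", "XYZ", "R33"),
]

table = {}
warnings = []
for key, col2, col3 in rows:
    if key in table:
        # Stands in for the "row already exists" warning on duplicate insert.
        warnings.append(key)
    else:
        table[key] = (col2, col3)

print(table)     # {'100': ('ABC', 'C99'), '115': ('XYZ', 'R33')}
print(warnings)  # ['100', '100'] -- one warning per rejected duplicate
```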
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.