To Get the First Duplicate Record from HashFile Output

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
tombastian
Premium Member
Posts: 41
Joined: Fri Jun 04, 2004 5:52 am
Location: Bangalore

To Get the First Duplicate Record from HashFile Output

Post by tombastian »

Hi All,
I have a Hash File stage with a few duplicate-key records going in, and because of the way the HashFile stage works, the last duplicate written is the record I get as output. Is there a way to get the first record among the duplicates instead? At the moment I am using a Sort stage and a surrogate key to pick the first one in the output, but I would like to know whether there is a better option using some functionality of the Hashfile stage itself.


Input to Hash File

Col1 (Key Field in HF)  Col2  Col3
100                     ABC   C99
100                     RXZ   G77
100                     JKL   G77
115                     XYZ   R33

Normal Output

Col1 (Key Field in HF)  Col2  Col3
100                     JKL   G77
115                     XYZ   R33

Required Output

Col1 (Key Field in HF)  Col2  Col3
100                     ABC   C99
115                     XYZ   R33
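The behaviour Tom describes can be sketched with an ordinary Python dictionary (an analogy only, not DataStage code; the rows are the sample data above):

```python
# Sample input rows: (Col1, Col2, Col3); Col1 is the hash key.
rows = [
    ("100", "ABC", "C99"),
    ("100", "RXZ", "G77"),
    ("100", "JKL", "G77"),
    ("115", "XYZ", "R33"),
]

hashed = {}
for col1, col2, col3 in rows:
    # Like the Hashed File stage, each write destructively overwrites
    # any existing record with the same key, so the last duplicate wins.
    hashed[col1] = (col2, col3)

print(hashed)  # {'100': ('JKL', 'G77'), '115': ('XYZ', 'R33')}
```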

Thanks in Advance,
Tom.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you want the 'first' rather than the 'last', sort the input on your key fields in descending rather than ascending order. Then all you'll have left in the hash when you are done are the first (lowest) values for any duplicate keys.
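A sketch of that idea in Python (an analogy, not DataStage code; it assumes an input-order sequence number as the tiebreaker, since the duplicates share the same key value):

```python
rows = [
    ("100", "ABC", "C99"),
    ("100", "RXZ", "G77"),
    ("100", "JKL", "G77"),
    ("115", "XYZ", "R33"),
]

# Number the rows in input order (the surrogate key Tom mentioned),
# then sort descending: the *first* input row for each key is now
# written *last*, so the destructive overwrite leaves it standing.
hashed = {}
for seq, (col1, col2, col3) in sorted(enumerate(rows), reverse=True):
    hashed[col1] = (col2, col3)

print(hashed)  # {'115': ('XYZ', 'R33'), '100': ('ABC', 'C99')}
```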
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The Hash file key must be different from what you've stated, but the general command for the hash file SELECT would read

SELECT HF BY Col1 BREAK.ON Col1 DET.SUP
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Either sort data to be loaded into the hashed file in reverse order, as Craig suggested, or de-duplicate the data by other means before loading them into the hashed file.
All writes to hashed files via the Hashed File stage are destructive overwrites.
If you use a UV stage to insert rows you will achieve what you want, but generate warnings (row already exists) for each duplicate key value. The UV stage uses SQL.
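The insert-only behaviour Ray describes can be sketched as follows (a Python analogy, not the UV stage itself): an insert is rejected when the key already exists, so the first duplicate survives.

```python
rows = [
    ("100", "ABC", "C99"),
    ("100", "RXZ", "G77"),
    ("100", "JKL", "G77"),
    ("115", "XYZ", "R33"),
]

hashed = {}
for col1, col2, col3 in rows:
    if col1 in hashed:
        # Analogous to the SQL INSERT failing with "row already exists":
        # the duplicate is rejected and the first row is kept.
        continue
    hashed[col1] = (col2, col3)

print(hashed)  # {'100': ('ABC', 'C99'), '115': ('XYZ', 'R33')}
```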
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Pass the data through an Aggregator stage with the 'First' derivation for the non-key fields.
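In Python terms, taking 'First' per key group could be sketched with itertools.groupby (an analogy only; it assumes the input is already grouped on the key, as in the sample):

```python
from itertools import groupby

rows = [
    ("100", "ABC", "C99"),
    ("100", "RXZ", "G77"),
    ("100", "JKL", "G77"),
    ("115", "XYZ", "R33"),
]

# For each run of rows sharing Col1, keep only the first row --
# the equivalent of the Aggregator stage's 'First' derivation.
firsts = [next(group) for _, group in groupby(rows, key=lambda r: r[0])]

print(firsts)  # [('100', 'ABC', 'C99'), ('115', 'XYZ', 'R33')]
```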
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Use "first" in the Aggregator stage to get the first duplicate record,
or "last" to get the last duplicate record.

Regards
Sreenivasulu
Post Reply