retain duplicates out of hash file
Hi everyone, I have a requirement where I need to retain duplicates out of a hashed file.
My primary link is a DB2 stage going into a Transformer. I have a reference link coming out of a hashed file into the Transformer stage, and the output link from the Transformer goes to a sequential file.
Now, in the hashed file I have columnA and columnB, and I have selected columnA as the primary key. I can map this primary key in the Transformer stage to a column from the DB2 stage, but by doing this I will be losing the duplicates from the hashed file, and I want to retain them. If I also select columnB as a primary key, then I don't have anything to map it to from the DB2 stage.
the values are:
123,abc
123,def
please let me know how i can retain these duplicates and pass them on to output link. thanks!
That's just it, there are no duplicates in a hashed file unless you fake it by following the technique that a search here would turn up. Let me make sure we're both on the same page - if you want one input row to hit a reference lookup and return more than one lookup result (your 'duplicates'), that's a "multi-row result set".
If that's not what you meant, you'll need to take another shot at explaining what exactly it is you need.
-craig
"You can never have too many knives" -- Logan Nine Fingers
DSbox61 wrote: No, Craig. I just want a way to pass on all the duplicates also to my output file.

Once the data goes into a hashed file, the duplicates are gone. The second row with the same key will overwrite the data from the first row. You're trying to turn sausages back into pigs.
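To see why the duplicates are gone, here is a small sketch with plain Python standing in for the hashed file (illustrative only, not DataStage API - a hashed file keyed on columnA behaves like a dict):

```python
# Writing two rows with the same key into a hashed file:
# the second write replaces the first, just like a dict assignment.
hashed_file = {}
rows = [("123", "abc"), ("123", "def")]

for col_a, col_b in rows:
    hashed_file[col_a] = col_b  # "abc" is silently overwritten by "def"

print(hashed_file)  # {'123': 'def'} -- the first row is unrecoverable
```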
Phil Hibbs | Capgemini
Technical Consultant
Exactly, which is why one would need to play the game of writing the data out with multiple keys (so as to retain the duplicates) and then reading it back with fewer keys for the "multi-row" part. More complicated than that of course, as a search would reveal, but that's the gist of it.
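The gist of that game can be sketched in plain Python (an illustration of the idea only - in the real technique the second read goes through a UV stage against the hashed file, as a search would show):

```python
from collections import defaultdict

rows = [("123", "abc"), ("123", "def")]

# Step 1: write with a composite key (columnA, columnB) so that
# "duplicates" on columnA alone get distinct keys and nothing overwrites.
hashed_file = {(a, b): (a, b) for a, b in rows}

# Step 2: read back grouped on the fewer-key part (columnA alone),
# giving a multi-row result set per lookup key.
lookup = defaultdict(list)
for a, b in hashed_file.values():
    lookup[a].append(b)

print(lookup["123"])  # both 'abc' and 'def' come back for key 123
```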
-craig
"You can never have too many knives" -- Logan Nine Fingers
Are the values in columnB known and a small list? Like months, weekdays, quarters... If yes, then I will be happy to share a trick to de-normalize data that I fetch once in a while from my "Bag Of Tricks".
Let us know
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
Thanks everyone for the solutions. Craig, you are right, I'm looking for a multi-row result set. I had pretty much hit a dead end since I can't use the UV or ODBC stage, as that requires creating DSNs and a lot more that I would have to go through the DBAs and Admins for. Instead I started re-designing the whole approach, trying to do it with session tables and queries and then joining the session tables to get the data out. Hope it works.
thanks again everyone.
You don't need to set up a 'Universe connection' unless (I suppose) you are connecting to an actual Universe database. Here, the technique involves creating a regular hashed file and then accessing it with a UV stage - and all that UV stage needs to do that is a VOC record. Now, you've already got one if the hashed file is created in an 'account' but you will need to add one for 'pathed' hashed files. A search for SETFILE will get you the syntax.
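For reference, creating that VOC record at the TCL prompt looks roughly like this (the path and VOC name here are illustrative placeholders; confirm the exact syntax with the search mentioned above):

```
SETFILE /project/path/MyHashedFile MYHASHEDFILE OVERWRITING
```

After that, the UV stage can reference MYHASHEDFILE as if it were a local table.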
-craig
"You can never have too many knives" -- Logan Nine Fingers