retain duplicates out of hash file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

DSbox61
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

retain duplicates out of hash file

Post by DSbox61 »

Hi everyone, I have a requirement where I need to retain the duplicates coming out of a hashed file.

My primary link is a DB2 stage going into a Transformer. I have a reference link coming out of a hashed file into the Transformer stage, and the output link from the Transformer goes to a Sequential File stage.

In the hashed file I have ColumnA and ColumnB, and I have selected ColumnA as the primary key. I can map this key in the Transformer stage to a column from the DB2 stage, but by doing that I will be losing the duplicates from the hashed file, and I want to retain them. If I also select ColumnB as a primary key, then I don't have anything to map it to from the DB2 stage.

The values are:
123,abc
123,def

Please let me know how I can retain these duplicates and pass them on to the output link. Thanks!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sounds like what you want is a "multi row result set" which requires a UV or ODBC stage to implement. Search here for all of the gory details on how to set that up.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSbox61
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Post by DSbox61 »

No, Craig. I just want a way to pass all of the duplicates on to my output file as well.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's just it, there are no duplicates in a hashed file unless you fake it by following the suggested technique that the search would turn up. Let me make sure we're both on the same page - if you want one input row to hit a reference lookup and return more than one lookup result (your 'duplicates'), that's a "multi-row result set".

If that's not what you meant, you'll need to take another shot at explaining what exactly it is you need.
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

DSbox61 wrote: No, Craig. I just want a way to pass all of the duplicates on to my output file as well.
Once the data goes into a hashed file, the duplicates are gone. The second row with the same key will overwrite the data from the first row. You're trying to turn sausages back into pigs.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Exactly, which is why one would need to play the game of writing the data out with multiple keys (so as to retain the duplicates) and then reading it back with fewer keys for the "multi-row" part. More complicated than that of course, as a search would reveal, but that's the gist of it.
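To make that concrete (just a rough sketch, not tested against your job - MYLOOKUP is a made-up name for the VOC pointer to the hashed file, and the column names come from your example): write the hashed file keyed on both ColumnA and ColumnB so nothing gets overwritten, then read it back through a UV stage reference keyed on ColumnA only, with the multi-row option enabled and user-defined SQL along these lines:

    SELECT ColumnA, ColumnB
    FROM MYLOOKUP
    WHERE ColumnA = ?

With your sample data, a driving row with 123 would then return both 123/abc and 123/def on the reference link.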
-craig

"You can never have too many knives" -- Logan Nine Fingers
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Are the values in ColumnB known and a small list? Like months, weekdays, quarters...? If yes, then I will be happy to share a trick to denormalize data that I fetch once in a while from my "Bag Of Tricks".


Let us know
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Denormalization requires the creation of metadata.

Better to store the hashed file source in another table and do a join.
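As a sketch of that (table and column names here are only placeholders - use your real DB2 source and whatever work table you land the hashed file data into):

    SELECT drv.KEYCOL, wrk.ColumnA, wrk.ColumnB
    FROM DB2_SOURCE drv
    INNER JOIN WORK_LOOKUP wrk
        ON wrk.ColumnA = drv.KEYCOL

The join returns one row per match, so both 123/abc and 123/def come through.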
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:idea: Rather than continue to go round and round, why don't we wait for DSbox61 to come back and clear things up before we throw out any more 'solutions'?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSbox61
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Post by DSbox61 »

Thanks everyone for the solutions. Craig, you are right, I'm looking for a multi-row result set. I had pretty much hit a dead end, as I can't use a UV or ODBC stage because that requires creating DSNs and a lot more that I would have to go through the DBAs and Admins for. Instead I started redesigning the whole approach, trying to do it with session tables and queries and then joining the session tables to get the data out. Hope it works. :roll:

Thanks again, everyone.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The UV stage doesn't require a DSN, just a VOC record and an index, neither of which requires admin privs, and both can be automated in a job. But if you want to take a different route, I appreciate that and good luck with it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSbox61
Premium Member
Posts: 62
Joined: Mon May 22, 2006 10:18 am

Post by DSbox61 »

Craig, for information's sake, I would like to know how I can set up a UniVerse connection. I read the admin manual and it says that I need to make entries in the uvodbc.config file. Can you please tell me where I can find this file, and is this what I need to be able to use a UV stage?

Thanks!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You don't need to set up a 'Universe connection' unless (I suppose) you are connecting to an actual Universe database. Here, the technique involves creating a regular hashed file and then accessing it with a UV stage - and all that UV stage needs to do that is a VOC record. Now, you've already got one if the hashed file is created in an 'account' but you will need to add one for 'pathed' hashed files. A search for SETFILE will get you the syntax.
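For what it's worth, the VOC pointer is a one-liner - something like this, from memory, so double-check the exact syntax on your release (MYLOOKUP is just a name you pick, and the path is wherever your hashed file lives):

    SETFILE /path/to/YourHashedFile MYLOOKUP OVERWRITING

After that you'd normally CREATE.INDEX and BUILD.INDEX on the column you select by (ColumnA in your example) so the UV stage doesn't have to scan the whole file.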
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The UniVerse connection already exists, and is called localuv.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I guess I should have said 'an actual Universe database other than the local repository'. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers