duplicate issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

India2000
Participant
Posts: 274
Joined: Sun Aug 22, 2010 11:07 am

Post by India2000 »

Duplicate records with the same key value but different field values: how does DS pick up the values? Which record does it take for processing? Is there any order for selecting records from the duplicates?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Many of the stages don't care whether records are duplicates or not and will happily process all of the records.

Certain stages can identify and optionally remove duplicates (Sort, Remove Duplicates). Others may handle duplicates differently depending on the input link and/or the options chosen (Join, Lookup, Merge). Still others may depend on an outside system (database stages, for instance) to handle duplicates.

To learn the specifics of how the various stages handle duplicates, read the Information Server Parallel Job Developer's Guide, available for download from IBM at https://www-304.ibm.com/support/docview ... 0.

Stages such as transformers, custom operators and buildops can work with other stages to identify and handle duplicates as required.
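One common way a Transformer identifies duplicates is on input that has already been hash-partitioned and sorted on the key: each row's key is compared with the previous row's key, which is held in a stage variable. The Python below is only a sketch of that logic outside DataStage. The function name `flag_key_changes` and the (key, value) row layout are hypothetical.

```python
from typing import Iterable, Iterator


def flag_key_changes(rows: Iterable[tuple[str, str]]) -> Iterator[tuple[str, str, bool]]:
    """Yield (key, value, is_first) for rows already sorted by key.

    Mimics a Transformer that remembers the previous key in a stage variable:
    is_first is True whenever the key differs from the previous row's key.
    """
    previous_key = object()  # sentinel that never equals a real key
    for key, value in rows:
        is_first = key != previous_key
        previous_key = key
        yield key, value, is_first


sorted_rows = [("A", "x1"), ("A", "x2"), ("B", "y1"), ("C", "z1"), ("C", "z2")]
flagged = list(flag_key_changes(sorted_rows))

# Keep only the first row of each key group.
first_per_key = [(k, v) for k, v, first in flagged if first]
print(first_per_key)  # [('A', 'x1'), ('B', 'y1'), ('C', 'z1')]
```

The comparison only works if the input really is sorted on the key within each partition. On unsorted input, the same key can reappear later and will be flagged as "first" again.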

You can design your job to meet your requirements with regard to duplicate handling. Therefore, the answer is ultimately: "What do you need it to do?"

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You might also contemplate what you want to do if there are nulls in the data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
India2000
Participant
Posts: 274
Joined: Sun Aug 22, 2010 11:07 am

Post by India2000 »

I have a job where the reference input is a .ds (data set) file with two fields, one of which is the key field. I see around 15 records with the same key value but different values in the other field. This data set is used as the reference for a lookup. During the lookup, on what criteria is a record selected and loaded into the target? How does DataStage handle these duplicates at the Lookup stage and select one record?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Lookup returns the first one found, but still checks for others, unless you enable multiple row return from that reference input link.
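To make the two modes concrete, here is a small Python model of the behaviour described above. It is not how the Lookup stage is implemented. It assumes "first" means the first reference row read for a key, which in a parallel job can depend on how the reference data was partitioned and ordered. The names `build_reference` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_reference(ref_rows, multiple_rows=False):
    """Load (key, value) reference rows into an in-memory table.

    multiple_rows=False: keep only the first row read for each key and skip
    later duplicates. multiple_rows=True: keep every row for the key.
    """
    table = defaultdict(list)
    for key, value in ref_rows:
        if multiple_rows or key not in table:
            table[key].append(value)
    return table


def lookup(stream_rows, table):
    """Yield one output row per matching reference row.

    Unmatched stream rows are simply dropped here; the real stage's
    lookup-failure options are not modelled.
    """
    for key, data in stream_rows:
        for ref_value in table.get(key, []):
            yield key, data, ref_value


# Duplicate reference keys, cut down to three rows for K1.
reference = [("K1", "v1"), ("K1", "v2"), ("K1", "v3"), ("K2", "w1")]
stream = [("K1", "s1"), ("K2", "s2")]

print(list(lookup(stream, build_reference(reference))))
# [('K1', 's1', 'v1'), ('K2', 's2', 'w1')]
print(len(list(lookup(stream, build_reference(reference, multiple_rows=True)))))
# 4
```

With multiple rows enabled, every stream row fans out to one output row per duplicate. That is why 15 duplicate reference keys would multiply the matching stream rows fifteen-fold.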
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

There is no criteria for which duplicate is chosen; it is just the first one encountered that the match will be performed on.

So, unless you want multiple rows in the output, you should remove the duplicates beforehand in either a Sort stage or a Remove Duplicates stage, where you can define the criteria.
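Defining the criteria usually means sorting on the key plus a column that decides which duplicate wins, and then keeping the first or last row of each key group. The Python sketch below shows that idea outside DataStage. The function `remove_duplicates`, its parameters, and the sample date column are hypothetical. The dates are ISO strings, so they sort correctly as text.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, key_index, order_index, retain="first"):
    """Keep one row per key, chosen by an explicit ordering column.

    Rows are sorted on (key, order column). Within each key group the first
    or last row is retained, so the survivor no longer depends on arrival order.
    """
    if retain not in ("first", "last"):
        raise ValueError("retain must be 'first' or 'last'")
    ordered = sorted(rows, key=itemgetter(key_index, order_index))
    result = []
    for _, group in groupby(ordered, key=itemgetter(key_index)):
        members = list(group)
        result.append(members[0] if retain == "first" else members[-1])
    return result


rows = [
    ("K1", "2011-03-01", "b"),
    ("K1", "2011-01-15", "a"),
    ("K2", "2011-02-10", "c"),
    ("K1", "2011-05-20", "d"),
]
print(remove_duplicates(rows, key_index=0, order_index=1))
# [('K1', '2011-01-15', 'a'), ('K2', '2011-02-10', 'c')]
print(remove_duplicates(rows, key_index=0, order_index=1, retain="last"))
# [('K1', '2011-05-20', 'd'), ('K2', '2011-02-10', 'c')]
```

Nulls in the key or ordering column would need explicit handling, because Python cannot compare None with a string. This ties back to the earlier point about nulls.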
_________________
- Susanne
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

suse_dk wrote: There is no criteria for which duplicate is chosen; it is just the first one encountered that the match will be performed on.
That sounds like criteria to me. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

:roll:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Seriously? Sorry, but first you said there's no criteria and then you stated the criteria it uses. I wasn't trying to bust anyone's chops; it just tickled my funny bone a bit. And FWIW, that behaviour matches what an Informatica lookup does when its selection criteria is set to "First".
-craig