duplicate issue
When duplicate records have the same key value but different field values, how does DS pick up values? Which records does it take for processing? Is there any order for selecting records from the duplicates?
Many of the stages don't care whether records are duplicates or not and will happily process all of the records.
Certain stages can identify and optionally remove duplicates (Sort, Remove Duplicates), others may handle duplicates differently depending upon the input link and/or options chosen (join, lookup, merge), still others may depend upon the operation of an outside system (database stages, for instance) to handle duplicates.
To learn the specifics about how the various stages can handle duplicates, read the Information Server Parallel Job Developer's Guide, available for download from <a href="https://www-304.ibm.com/support/docview ... 0">here</a>.
Stages such as transformers, custom operators and buildops can work with other stages to identify and handle duplicates as required.
You can design your job to meet the requirements you have with regards to duplicate handling. Therefore, the answer is ultimately: "What do you need it to do?"
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I have a job where the reference is a .ds file and one of the 2 fields is the key field. I see there are around 15 records with the same key value but different values for the other field. This stage is used as the reference for a lookup. While doing the lookup, on what criteria will the record be selected and loaded into the target? How does DataStage handle these duplicates at the Lookup stage and select one record?
There are no criteria for which duplicate is chosen - it is just the first one encountered that the match will be performed on.
So, unless you want multiple rows in the output, you should remove duplicates in either a Sort or Remove Duplicates stage, where you can define the criteria.
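To make the difference concrete, here is a minimal sketch in plain Python (not DataStage code or any DataStage API) that only simulates the idea. The function names, the two-column row shape, and the "keep the highest value per key" rule are all assumptions picked for illustration; your own criteria would go where the sort key is defined.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical reference row: (key, other_field)


def build_first_match_table(ref_rows: List[Row]) -> Dict[str, int]:
    """Simulate a lookup table that keeps the first row seen per key."""
    table: Dict[str, int] = {}
    for key, value in ref_rows:
        # setdefault stores the value only if the key is not present yet,
        # so later duplicates of the same key are ignored.
        table.setdefault(key, value)
    return table


def dedupe_keep_max(ref_rows: List[Row]) -> List[Row]:
    """Remove duplicates with an explicit rule: keep the highest value per key."""
    # Sort by key, then by value descending, so the row we want comes first.
    ordered = sorted(ref_rows, key=lambda row: (row[0], -row[1]))
    seen = set()
    deduped: List[Row] = []
    for key, value in ordered:
        if key not in seen:
            seen.add(key)
            deduped.append((key, value))
    return deduped


ref = [("A", 10), ("A", 30), ("B", 5), ("A", 20)]

# Without deduplication the result depends on arrival order.
by_arrival = build_first_match_table(ref)
by_reversed_arrival = build_first_match_table(list(reversed(ref)))

# With an explicit rule applied first, arrival order no longer matters.
by_rule = build_first_match_table(dedupe_keep_max(ref))
```

In this sketch, key "A" resolves to a different value when the same rows arrive in a different order, whereas sorting and deduplicating first pins the choice down. That is the reason for defining the criteria upstream of the lookup.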
_________________
- Susanne
Seriously? Sorry but first you said there's no criteria and then you stated the criteria it uses. Wasn't trying to bust anyone's chops, it just tickled my funny bone a bit. And FWIW, that behaviour matches what an Informatica lookup does when its selection criteria is set to "First".
-craig
"You can never have too many knives" -- Logan Nine Fingers