Pick a particular duplicate record
Moderators: chulett, rschirm, roy
Pick a particular duplicate record
Is there any way we can pick a particular duplicate record from two duplicate records?.
For eg: if there are two duplicate records on Field1, and my requirment is to pick the record with field2 as "some particular value"..else pick the first out of the duplicates.
eg:
filed1 field2
123 abc
123 def
I should pick 123 abc. else the first record out of the group.
Thanks
For eg: if there are two duplicate records on Field1, and my requirment is to pick the record with field2 as "some particular value"..else pick the first out of the duplicates.
eg:
filed1 field2
123 abc
123 def
I should pick 123 abc. else the first record out of the group.
Thanks
Is that level of programming possible in datastage?.dsusersaj wrote:Nope. I need to conditionally select a record from duplicate if the duplicate contains a particular value for a non-key field. Else retain the first record.anbu wrote:Use remove duplicate stage and set Duplicate to retain as First.
Also set execution mode to sequential in remove duplicate stage
0001 is not my lowest possible value..This was just an example. My actual values are 07108 and 07107. There are values lower than and higher than these values in the list.chulett wrote:If '0001' is your lowest possible value for that field, sort ascending on it and always pick the first duplicate as noted earlier. ...
Code: Select all
Source --> Transformer --> Remove Duplicate --> Target
Code: Select all
Field2 <> "0001"
You are the creator of your destiny - Swami Vivekananda
I want to remove '0001' only when it is coming as duplicate.anbu wrote:Set the constraint in transformer toCode: Select all
Source --> Transformer --> Remove Duplicate --> Target
Use Remove duplicates as mentioned above.Code: Select all
Field2 <> "0001"
In your design, there is a possibility that '0001' gets retained from the first part while the other records gets removed.
And the second part removes '0001' . SO from that group there is nothing that gets retained.
If "0001' is not always a constant value, you can always create a dummy column something like a flag you can set to based upon the input values within the transformer stage in the design above (from anbu) and then use the Remove Duplicates stage (by sorting the new dummy column along with the keys).anbu wrote:Set the constraint in transformer toCode: Select all
Source --> Transformer --> Remove Duplicate --> Target
Use Remove duplicates as mentioned above.Code: Select all
Field2 <> "0001"
Hope that helps.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
kris007 wrote:If "0001' is not always a constant value, you can always create a dummy column something like a flag you can set to based upon the input values within the transformer stage in the design above (from anbu) and then use the Remove Duplicates stage (by sorting the new dummy column along with the keys).anbu wrote:Set the constraint in transformer toCode: Select all
Source --> Transformer --> Remove Duplicate --> Target
Use Remove duplicates as mentioned above.Code: Select all
Field2 <> "0001"
Hope that helps.
This will also remove '0001' when it is not coming as a dupe.
Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates. Something like
In the Remove Duplicates, while you hash partition and sort the data on the keys, you need to just sort the data on the Dummy column (and not partition it). You can do this by right clicking on the Dummy Column within the Input >> Partitioning tab once you drag the keys onto the right hand side. As long as you sort your data on the Dummy column, you will retain "0001" when it is not coming in as a dupe.
Hope that helps.
Code: Select all
If Field = '0001' Then 1 Else 2
Hope that helps.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
But the next step is to filter out 0001 right?.. Filed1<>'0001'. Then these wil be filtered out even if it is coming as a non-dupe.kris007 wrote:Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates. Something likeIn the Remove Duplicates, while you hash partition and sort the data on the keys, you need to just sort the data on the Dummy column (and not partition it). You can do this by right clicking on the Dummy Column within the Input >> Partitioning tab once you drag the keys onto the right hand side. As long as you sort your data on the Dummy column, you will retain "0001" when it is not coming in as a dupe.Code: Select all
If Field = '0001' Then 1 Else 2
Hope that helps.
I guess the tricky part here is to define that dummy column with a value when there are duplicates on '0001' . May be you can pull out an extra link from source where the non-key column has '0001' and group by on the 'key' column to find the 'counts'. Count more than 1 would tell you that key has duplicates on '0001'. Use the resultant set as a lookup.kris007 wrote:Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates.
Or is there an easier way!!!
Regards,
SriKara
SriKara