Pick a particular duplicate record

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Pick a particular duplicate record

Post by dsusersaj »

Is there any way we can pick a particular duplicate record from two duplicate records?.

For eg: if there are two duplicate records on Field1, and my requirment is to pick the record with field2 as "some particular value"..else pick the first out of the duplicates.

eg:

filed1 field2
123 abc
123 def

I should pick 123 abc. else the first record out of the group.

Thanks
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

Use remove duplicate stage and set Duplicate to retain as First.

Also set execution mode to sequential in remove duplicate stage
You are the creator of your destiny - Swami Vivekananda
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

anbu wrote:Use remove duplicate stage and set Duplicate to retain as First.

Also set execution mode to sequential in remove duplicate stage
Nope. I need to conditionally select a record from duplicate if the duplicate contains a particular value for a non-key field. Else retain the first record.
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

dsusersaj wrote:
anbu wrote:Use remove duplicate stage and set Duplicate to retain as First.

Also set execution mode to sequential in remove duplicate stage
Nope. I need to conditionally select a record from duplicate if the duplicate contains a particular value for a non-key field. Else retain the first record.
Is that level of programming possible in datastage?.
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

What is the requirement to pick the field2?
You are the creator of your destiny - Swami Vivekananda
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

anbu wrote:What is the requirement to pick the field2?
Requirement is that if out of the duplicates, field 2 is "0001" then exclude this record and take the other record. Else if it is a normal duplicate take any.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If '0001' is your lowest possible value for that field, sort ascending on it and always pick the first duplicate as noted earlier.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

chulett wrote:If '0001' is your lowest possible value for that field, sort ascending on it and always pick the first duplicate as noted earlier. ...
0001 is not my lowest possible value..This was just an example. My actual values are 07108 and 07107. There are values lower than and higher than these values in the list.
anbu
Premium Member
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

Code: Select all

Source --> Transformer --> Remove Duplicate --> Target
Set the constraint in transformer to

Code: Select all

Field2 <> "0001"
Use Remove duplicates as mentioned above.
You are the creator of your destiny - Swami Vivekananda
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

anbu wrote:

Code: Select all

Source --> Transformer --> Remove Duplicate --> Target
Set the constraint in transformer to

Code: Select all

Field2 <> "0001"
Use Remove duplicates as mentioned above.
I want to remove '0001' only when it is coming as duplicate.

In your design, there is a possibility that '0001' gets retained from the first part while the other records gets removed.

And the second part removes '0001' . SO from that group there is nothing that gets retained.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

anbu wrote:

Code: Select all

Source --> Transformer --> Remove Duplicate --> Target
Set the constraint in transformer to

Code: Select all

Field2 <> "0001"
Use Remove duplicates as mentioned above.
If "0001' is not always a constant value, you can always create a dummy column something like a flag you can set to based upon the input values within the transformer stage in the design above (from anbu) and then use the Remove Duplicates stage (by sorting the new dummy column along with the keys).

Hope that helps.
Kris

Where's the "Any" key?-Homer Simpson
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

kris007 wrote:
anbu wrote:

Code: Select all

Source --> Transformer --> Remove Duplicate --> Target
Set the constraint in transformer to

Code: Select all

Field2 <> "0001"
Use Remove duplicates as mentioned above.
If "0001' is not always a constant value, you can always create a dummy column something like a flag you can set to based upon the input values within the transformer stage in the design above (from anbu) and then use the Remove Duplicates stage (by sorting the new dummy column along with the keys).

Hope that helps.

This will also remove '0001' when it is not coming as a dupe.
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates. Something like

Code: Select all

If Field = '0001' Then 1 Else 2
In the Remove Duplicates, while you hash partition and sort the data on the keys, you need to just sort the data on the Dummy column (and not partition it). You can do this by right clicking on the Dummy Column within the Input >> Partitioning tab once you drag the keys onto the right hand side. As long as you sort your data on the Dummy column, you will retain "0001" when it is not coming in as a dupe.

Hope that helps.
Kris

Where's the "Any" key?-Homer Simpson
dsusersaj
Premium Member
Premium Member
Posts: 160
Joined: Mon Dec 17, 2007 10:44 am

Post by dsusersaj »

kris007 wrote:Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates. Something like

Code: Select all

If Field = '0001' Then 1 Else 2
In the Remove Duplicates, while you hash partition and sort the data on the keys, you need to just sort the data on the Dummy column (and not partition it). You can do this by right clicking on the Dummy Column within the Input >> Partitioning tab once you drag the keys onto the right hand side. As long as you sort your data on the Dummy column, you will retain "0001" when it is not coming in as a dupe.

Hope that helps.
But the next step is to filter out 0001 right?.. Filed1<>'0001'. Then these wil be filtered out even if it is coming as a non-dupe.
SriKara
Premium Member
Premium Member
Posts: 30
Joined: Wed Jun 01, 2005 8:40 am
Location: UK

Post by SriKara »

kris007 wrote:Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates.
I guess the tricky part here is to define that dummy column with a value when there are duplicates on '0001' . May be you can pull out an extra link from source where the non-key column has '0001' and group by on the 'key' column to find the 'counts'. Count more than 1 would tell you that key has duplicates on '0001'. Use the resultant set as a lookup.

Or is there an easier way!!!
Regards,
SriKara
Post Reply