remove duplicates using Transformer

Posted: Thu Sep 10, 2009 11:21 am
by neena
Hi,

I am trying to remove duplicates using only the Transformer stage. On the Transformer's input tab I selected Hash partitioning and checked Perform sort. One key is used for both partitioning and sorting, a second key is sorted ascending, and a third column is sorted descending.

Key1 (sorting, partitioning)
Key2 (sorting Asc)
column1 (sorting Desc)

Then I checked the Stable and Unique check boxes on that tab, expecting the first record to be retained when there are duplicates, but I don't see any duplicate records being dropped.
Could anyone please explain how Stable and Unique work? The documentation says that if I check both Stable and Unique, the first duplicate record will be retained. Please let me know if I am missing anything, or if there are other posts about this.
Any help would be really appreciated.

Posted: Thu Sep 10, 2009 11:24 am
by ArndW
Are all 3 keys supposed to denote the duplicates or just the first or second keys?

Posted: Thu Sep 10, 2009 11:29 am
by neena
It's the first and second keys, both of them.

Posted: Thu Sep 10, 2009 11:33 am
by ArndW
But since the uniqueness comparison is done on all 3 sorted columns, you won't get any duplicates to drop...
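To illustrate the point, here is a minimal Python sketch. It is not DataStage code: the helper `unique_on` and the sample rows are made up, and it only mimics a unique step that compares rows on whichever columns it is given.

```python
def unique_on(rows, keys):
    """Keep the first row seen for each distinct combination of `keys`."""
    seen = set()
    kept = []
    for row in rows:
        signature = tuple(row[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            kept.append(row)
    return kept


# Hypothetical sample data: the first two rows share key1 and key2
# but differ in column1.
rows = [
    {"key1": "A", "key2": 1, "column1": 30},
    {"key1": "A", "key2": 1, "column1": 20},
    {"key1": "B", "key2": 2, "column1": 10},
]

# Comparing on all three columns: every row is distinct, nothing is dropped.
all_three = unique_on(rows, ["key1", "key2", "column1"])

# Comparing on key1 and key2 only: the second "A" row is a duplicate.
two_keys = unique_on(rows, ["key1", "key2"])

print(len(all_three))  # 3
print(len(two_keys))   # 2
```

With column1 included in the comparison, the two "A" rows count as different records, which matches the behaviour described in the first post.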

Posted: Thu Sep 10, 2009 11:46 am
by neena
Thank you very much, you are right. I tested with only Key1 and Key2 and it worked just fine, removing the duplicates. I guess I have to use a Remove Duplicates stage and retain the first record.
After the Transformer stage I will use the same partitioning in the Remove Duplicates stage and retain the first record. Please let me know if that's not the correct approach.

Posted: Thu Sep 10, 2009 12:10 pm
by betterthanever
neena wrote: Thank you very much, you are right. I tested with only Key1 and Key2 and it worked just fine, removing the duplicates. I guess I have to use a Remove Duplicates stage and retain the first record.
After the Transformer stage I will use the same partitioning in the Remove Duplicates stage and retain the first record. Please let me know if that's not the correct approach.

By default, the Remove Duplicates stage inserts the sort operator again...

Posted: Thu Sep 10, 2009 12:16 pm
by neena
The reason I was trying to avoid the Remove Duplicates stage is that this is existing code and I am trying to avoid adding stages.
What I did was this: in the Transformer I selected Hash partitioning and Perform sort, but did not check the Stable and Unique check boxes.

Key1 (sorting, partitioning)
Key2 (sorting Asc)
column1 (sorting Desc)

The next stage after this Transformer is a Copy stage, so I used "Same" partitioning in the Copy stage, checked the Perform sort, Stable and Unique check boxes, and selected Key1 and Key2.

Key1 (sorting Asc)
Key2 (sorting Asc)

It worked fine, but please let me know if there are any downsides to doing this.
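The two-step approach described above can be sketched in plain Python. This is only an illustration under stated assumptions, not DataStage code: the sample rows are invented, Python's stable `sorted()` stands in for a stable sort, and `groupby` stands in for the Unique option.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows as (key1, key2, column1).
rows = [
    ("A", 1, 20),
    ("B", 2, 10),
    ("A", 1, 30),
    ("B", 2, 15),
]

# Step 1 (Transformer input): sort on key1 asc, key2 asc, column1 desc.
step1 = sorted(rows, key=lambda r: (r[0], r[1], -r[2]))

# Step 2 (Copy input): stable sort on key1 and key2 only.
# Because sorted() is stable, the column1 order from step 1 is preserved
# within each (key1, key2) group.
step2 = sorted(step1, key=itemgetter(0, 1))

# Unique on key1 and key2: keep the first row of each group.
deduped = [next(group) for _, group in groupby(step2, key=itemgetter(0, 1))]

print(deduped)  # [('A', 1, 30), ('B', 2, 15)]
```

In this sketch the retained row per key pair is the one with the highest column1, which depends entirely on the second sort being stable. It says nothing about the runtime cost of sorting twice in an actual job.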