Hi,
I am trying to remove duplicates using only a Transformer. On the input tab of the Transformer I selected hash partitioning and checked Perform Sort. One key is used for both partitioning and sorting, a second key is sorted ascending, and a third column is sorted descending:
Key1 (sorting, partitioning)
Key2 (sorting Asc)
column1 (sorting Desc)
Then I checked the Stable and Unique check boxes on the same tab, expecting the first record to be retained when there are duplicates, but I don't see any duplicate records getting dropped.
Could anyone please let me know how Stable and Unique work? The documentation says that if I check both Stable and Unique, the first duplicate record will be retained. Please let me know if I am missing anything, or if there are any other posts regarding this.
Any help would be really appreciated.
remove duplicates using Transformer
Are all 3 keys supposed to denote the duplicates or just the first or second keys?
But since the comparison is done on all 3 sorted columns you won't get duplicates...
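To illustrate the point, here is a minimal Python sketch. It is only a model of "sort, then keep the first row per distinct sort-key tuple", not DataStage's actual implementation; the helper name sort_unique and the sample rows are made up for the example, and partitioning is ignored (think of it as a single partition).

```python
from itertools import groupby

def sort_unique(rows, keys):
    """Sort rows by (column, ascending) pairs, then keep the first row
    for each distinct combination of ALL the sort keys."""
    ordered = list(rows)
    # Stable sorts applied from the least to the most significant key.
    for col, ascending in reversed(keys):
        ordered.sort(key=lambda r: r[col], reverse=not ascending)
    key_of = lambda r: tuple(r[col] for col, _ in keys)
    return [next(group) for _, group in groupby(ordered, key=key_of)]

# Hypothetical sample data: three rows share Key1/Key2 but differ in column1.
rows = [
    {"Key1": "A", "Key2": 1, "column1": 10},
    {"Key1": "A", "Key2": 1, "column1": 30},
    {"Key1": "A", "Key2": 1, "column1": 20},
    {"Key1": "B", "Key2": 2, "column1": 5},
]

# column1 is part of the comparison, so every row is distinct: nothing is dropped.
all_three = sort_unique(rows, [("Key1", True), ("Key2", True), ("column1", False)])

# Only Key1 and Key2 are compared, so the three "A, 1" rows collapse into one.
two_keys = sort_unique(rows, [("Key1", True), ("Key2", True)])

print(len(all_three), len(two_keys))
```

With column1 included in the sort keys, rows that share Key1 and Key2 still differ on column1, so a uniqueness check over all three columns sees no duplicates at all.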
Thank you very much, you are right. I tested with only Key1 and Key2 and it worked just fine, removing the duplicates. I guess I have to use a Remove Duplicates stage and retain the first record.
After the Transformer stage I will use the same partitioning in the Remove Duplicates stage and retain the first record. Please let me know if that is not the correct approach.
By default, the Remove Duplicates stage inserts the sort operator again...
The reason I was trying to avoid the Remove Duplicates stage is that this is existing code and I am trying to avoid adding stages.
What I did was this: in the Transformer I used hash partitioning and Perform Sort, but did not check the Stable and Unique check boxes.
Key1 (sorting, partitioning)
Key2 (sorting Asc)
column1 (sorting Desc)
The next stage after this Transformer is a Copy stage, so I used "Same" partitioning in the Copy stage, checked the Perform Sort, Stable and Unique check boxes, and selected Key1 and Key2.
Key1 (sorting Asc)
Key2 (sorting Asc)
It worked fine, but please let me know if there are any downsides to doing this.
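For anyone reading along, the two-step idea can be modelled with a short Python sketch. This is an illustration under assumptions, not how DataStage runs it: the helper names sort_rows and keep_first and the sample rows are hypothetical, partitioning is ignored, and the second sort is assumed to be stable so that it preserves the column1 order produced by the first.

```python
def sort_rows(rows, keys):
    """Stable multi-key sort; keys is a list of (column, ascending) pairs."""
    ordered = list(rows)
    # Apply stable sorts from the least to the most significant key.
    for col, ascending in reversed(keys):
        ordered.sort(key=lambda r: r[col], reverse=not ascending)
    return ordered

def keep_first(rows, key_cols):
    """Keep the first row seen for each distinct combination of key_cols."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical sample data.
rows = [
    {"Key1": "A", "Key2": 1, "column1": 10},
    {"Key1": "A", "Key2": 1, "column1": 30},
    {"Key1": "B", "Key2": 2, "column1": 5},
    {"Key1": "A", "Key2": 1, "column1": 20},
]

# Step 1 (Transformer input): sort on Key1 asc, Key2 asc, column1 desc, no Unique.
step1 = sort_rows(rows, [("Key1", True), ("Key2", True), ("column1", False)])

# Step 2 (Copy input): stable sort on Key1, Key2 only, then keep the first
# row per (Key1, Key2). Stability keeps the column1-descending order from step 1.
step2 = sort_rows(step1, [("Key1", True), ("Key2", True)])
result = keep_first(step2, ["Key1", "Key2"])

print(result)
```

In this model the row retained for each Key1/Key2 pair is the one with the highest column1, which only holds because the second sort does not reorder rows that tie on Key1 and Key2.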