Hi all,
When the Sort stage already has a remove-duplicates option, why does PX
also have a separate Remove Duplicates stage, given that it is
recommended to sort the data before using the Remove Duplicates stage
anyway?
If anyone knows, please answer.
difference in Sort w/RD and Remove Duplicate w/Sort
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 48
- Joined: Fri Feb 29, 2008 1:09 am
- Location: Bangalore
Thanks&Regards
S.Swathi
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
Interview question?
PX has a specific stage for almost every transformation commonly needed during migration, so the existence of a Remove Duplicates stage is not in question. If anything, the question is why the Sort stage exists, since you can use an in-link sort, which uses the same tsort operator as the Sort stage. Still, the explicit Sort stage has added functionality and is also easier to maintain.
A unique sort keeps the first record it encounters for each value of the sort key. Now let's say the data carries a date and you want the first or last record by date for each key: in that case you have to sort the data on key + date and then remove duplicates on key. There are other ways to do the same, but in my opinion a combination of Sort and Remove Duplicates is the most suitable solution.
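To make the key + date idea concrete, here is a minimal sketch in plain Python, not DataStage code. The sample rows, the column layout, and the function name `sort_then_remove_duplicates` are all made up for illustration; the `retain` argument only loosely mirrors the first/last choice discussed above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (key, date, payload)
rows = [
    ("A", "2008-03-01", "a2"),
    ("B", "2008-01-15", "b1"),
    ("A", "2008-01-10", "a1"),
    ("A", "2008-05-20", "a3"),
    ("B", "2008-02-02", "b2"),
]

def sort_then_remove_duplicates(rows, retain="first"):
    """Sort on (key, date), then keep one row per key (first or last)."""
    # Step 1: sort on key + date, so each key's rows are adjacent and date-ordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    result = []
    # Step 2: remove duplicates on key only, keeping the first or last row of each group.
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        result.append(group[0] if retain == "first" else group[-1])
    return result

first_rows = sort_then_remove_duplicates(rows, "first")
last_rows = sort_then_remove_duplicates(rows, "last")
```

With this sample data, `first_rows` holds the earliest-dated row per key and `last_rows` the latest. A single unique sort on key alone could not give you that choice, because the date would play no part in the ordering.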
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.![Wink :wink:](./images/smilies/icon_wink.gif)
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
You can avoid the Remove Duplicates stage if you don't care which record from each group is kept; if you want to specify that the first or last record from each group is kept then you need a Remove Duplicates stage.
Remove Duplicates relies on data being sorted and partitioned on the key that identifies duplicates.
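Why the partitioning matters can be shown with a small simulation in plain Python. This is only an illustrative sketch, not how the parallel engine is implemented: the two partitioner functions, the `run` helper, and the use of `zlib.crc32` as the hash are all assumptions made for the example.

```python
import zlib
from itertools import groupby

def dedupe_sorted(partition):
    """Keep the first row of each run of equal keys (input must be sorted on key)."""
    return [next(group) for _, group in groupby(partition, key=lambda r: r[0])]

def round_robin(rows, n):
    """Deal rows out in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_on_key(rows, n):
    """Send every row with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(row[0].encode()) % n].append(row)
    return parts

def run(rows, partitioner, n=2):
    """Partition, then sort and de-duplicate each partition independently."""
    out = []
    for part in partitioner(rows, n):
        out.extend(dedupe_sorted(sorted(part)))
    return sorted(out)

rows = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
kept_round_robin = run(rows, round_robin)  # duplicates survive across partitions
kept_hashed = run(rows, hash_on_key)       # one row per key
```

With round-robin partitioning, the two "A" rows land in different partitions, so neither partition sees a duplicate and all four rows come out. With the data partitioned on the key, each key is handled in exactly one partition and only one row per key remains.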
An explicit Sort stage gives you control over how much memory is allocated for sorting, and a number of other benefits that are not useful in the current scenario. Memory allocated for sorting can also be controlled by setting the APT_TSORT_STRESS_BLOCKSIZE environment variable.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.