I have to replicate the following scenario in DataStage.
There is a sequential file which has been sorted on five different keys in descending order by a SyncSort program. The output of that sort is passed to another SyncSort program, which uses three keys (a subset of the five used earlier), removes the duplicates, and writes the result to another sequential file.
Now, I have tried to replicate this by using a Sort stage followed by a Remove Duplicates stage. But we already know that if the keys used in a Sort stage and those used in a following stage are different, then a warning is shown. But here there is no other option as I have to replicate the existing scenario. I have used hash partitioning for the Sort stage and 'Same' partitioning for the Remove Dups stage.
The number of output records obtained in the PX job is the same as in the Syncsort utility, but the order is jumbled up. Also, is there any way I can remove the warning in this particular scenario? The warning is: When checking operator: User inserted sort "Sort_stage" does not fulfill the sort requirements of the downstream operator "Remove_Dups_Stage".
Replicating Syncsort in PX
Joel Satire
As far as I know, the approach you describe should give the result you are after; just make sure DataStage hasn't inserted any operators of its own (an inserted sort). We use a similar approach at my place, except that we filter duplicates using an extra column. We accepted those warnings after thorough testing.
However, to get rid of the warning you have to remove duplicates on keys in the same order as your sort. For example:
Sort on key1,key2,key3,key4,key5
then remove duplicates on key1,key2,key3
This will eliminate all warnings.
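To make the key ordering concrete, here is a minimal Python sketch of the same idea outside DataStage. The sample rows and the choice to keep the first row of each group are assumptions for illustration only; they are not taken from the actual job.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records laid out as (key1, key2, key3, key4, key5).
rows = [
    (2, 1, 1, 5, 0),
    (1, 1, 1, 9, 3),
    (2, 1, 1, 7, 2),
    (1, 1, 1, 9, 8),
    (2, 2, 1, 1, 1),
]

# Step 1: sort on all five keys, descending (tuples compare key by key).
sorted_rows = sorted(rows, reverse=True)

# Step 2: remove duplicates on key1, key2, key3 only.
# Because the dedup keys are the leading sort keys, equal groups are
# already adjacent, so a single pass is enough. The first row of each
# group is kept (an assumption; keeping the last is equally possible).
dedup_key = itemgetter(0, 1, 2)
deduped = [next(group) for _, group in groupby(sorted_rows, key=dedup_key)]

for row in deduped:
    print(row)
```

The single pass only works because key1,key2,key3 lead the sort order; if the dedup keys were not a prefix of the sort keys, matching rows would not be guaranteed to sit next to each other.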
Stefan, by 'inserted sort' do you mean the 'perform sort' option within the sort stage and the remove dups stage? If that is so, yes I have disabled them.
And yes I have used the keys in the order that you have shown.
I guess the warnings are not an issue. The problem is that the sorted data is jumbled once it gets into the Remove Dups stage, i.e. the output from the Sort stage (which has five keys in descending order) is 5, 4, 3, 2, 1, but the output from the Remove Dups stage is 3, 4, 1, 2, 5.
Can someone provide me with possible reasons as to why this jumbling up occurs?
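One possible cause, offered only as a guess, is that the job runs on more than one partition: each partition is sorted on its own, and the partitions are then collected into the output file without a merge on the sort keys. The Python sketch below is a simplified model of that effect, not DataStage itself; the two-partition setup and the modulo "hash" are assumptions made for illustration.

```python
import heapq

values = [5, 4, 3, 2, 1]
num_partitions = 2  # hypothetical node count

# Hash-partition the rows (modulo stands in for the real hash function).
partitions = [[] for _ in range(num_partitions)]
for v in values:
    partitions[v % num_partitions].append(v)

# Each partition is sorted descending, independently of the others.
partitions = [sorted(p, reverse=True) for p in partitions]

# Collecting one partition after another loses the global order.
concatenated = [v for p in partitions for v in p]

# Merging the sorted partitions on the sort key restores it.
merged = list(heapq.merge(*partitions, reverse=True))

print(concatenated)
print(merged)
```

If this is what is happening, it may be worth checking how the data is collected before it is written to the sequential file, for example whether a Sort Merge collector on the same keys is used, or comparing against a run on a single-node configuration.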