Case Study with SyncSort

joesat · Post by **joesat** » Mon Sep 17, 2007 8:05 am

I have to replicate the following scenario in DataStage.

There is a sequential file that has been sorted on five keys, in descending order, by a SyncSort program. The output is fed to a second SyncSort program, which uses three keys (a subset of the five used earlier), removes the duplicates, and writes the result to another sequential file.
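To make the scenario concrete, here is a rough Python sketch of the two-step logic as I understand it. It is not SyncSort or DataStage code. The column names `k1`..`k5`, the choice of `k1`, `k3`, `k5` as the three duplicate keys, and the helper `sort_then_dedup` are all made up for illustration. It also assumes that the second step re-sorts on its three keys with a stable sort and keeps the first record of each group, which may not match the real job.

```python
from itertools import groupby
from operator import itemgetter

def sort_then_dedup(rows, sort_keys, dedup_keys):
    """Sort descending on sort_keys, then keep one row per dedup_keys group."""
    # Step 1: descending sort on all five keys.
    ordered = sorted(rows, key=itemgetter(*sort_keys), reverse=True)
    # Step 2 (assumption): stable descending re-sort on the three keys, so
    # rows with equal three-key values keep their order from step 1.
    ordered = sorted(ordered, key=itemgetter(*dedup_keys), reverse=True)
    # Keep the first row of every run of equal three-key values.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(*dedup_keys))]

# Hypothetical sample records.
rows = [
    {"id": "A", "k1": 2, "k2": 1, "k3": 1, "k4": 1, "k5": 1},
    {"id": "B", "k1": 2, "k2": 2, "k3": 1, "k4": 1, "k5": 1},
    {"id": "C", "k1": 1, "k2": 9, "k3": 1, "k4": 1, "k5": 1},
    {"id": "D", "k1": 2, "k2": 1, "k3": 3, "k4": 1, "k5": 1},
]

result = sort_then_dedup(
    rows,
    sort_keys=("k1", "k2", "k3", "k4", "k5"),
    dedup_keys=("k1", "k3", "k5"),
)
print([row["id"] for row in result])
```

In this sketch records A and B share the same three-key values, and B survives because the five-key sort placed it ahead of A. Which record survives therefore depends on the first sort, which is why both steps have to be reproduced.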

I have tried to replicate this with a Sort stage followed by a Remove Duplicates stage. As we know, if the keys used in a Sort stage differ from those used in the following stage, a warning is raised. Here I have no other option, since I have to replicate the existing scenario. I have used hash partitioning for the Sort stage and 'Same' partitioning for the Remove Duplicates stage.

The number of output records from the PX job is the same as from the SyncSort utility, but the order is jumbled up. Also, is there any way to remove the warning in this particular scenario? The warning is: When checking operator: User inserted sort "Sort_stage" does not fulfill the sort requirements of the downstream operator "Remove_Dups_Stage".
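One possible explanation for the matching count but jumbled order is sketched below in plain Python. It is not DataStage code. The sketch assumes rows are tuples `(k1, k2, k3, k4, k5)`, that the three duplicate keys are the leading three, and that a simple `sum(...) % n` stands in for the real hash partitioner. Each partition is sorted and de-duplicated on its own, so the order is only correct within a partition. Simply appending the partitions loses the global order, while merging them on the sort key restores it.

```python
import heapq
from itertools import groupby

def partition(rows, key, n):
    """Distribute rows over n partitions; equal keys land in the same one."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # Stand-in for a hash partitioner (assumption, not the real algorithm).
        parts[sum(key(row)) % n].append(row)
    return parts

def sort_and_dedup(part, dedup_key):
    """Descending sort on the whole row, then keep the first row per key."""
    ordered = sorted(part, reverse=True)
    return [next(group) for _, group in groupby(ordered, key=dedup_key)]

def three_keys(row):
    return row[:3]  # hypothetical: the leading three of the five keys

rows = [
    (3, 1, 1, 0, 0),
    (1, 1, 1, 0, 0),
    (2, 1, 1, 0, 0),
    (3, 1, 1, 9, 9),
    (4, 1, 1, 0, 0),
]

# Sequential reference, comparable to what a single-stream sort would give.
sequential = sort_and_dedup(rows, three_keys)

# Parallel version: sort and de-duplicate inside each partition.
parts = [sort_and_dedup(p, three_keys) for p in partition(rows, three_keys, 2)]

# Appending partitions one after another: same rows, different order.
concatenated = [row for part in parts for row in part]

# Merging the already-sorted partitions restores the global order.
merged = list(heapq.merge(*parts, reverse=True))

print(concatenated)
print(merged)
```

If this is what is happening in the PX job, the way the partitions are collected before the output file is written would be the place to look, for example a collection method that merges on the sort keys, or running the final sort sequentially.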