Issue with Sorting in Remove Duplicate Stage

sshettar
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Issue with Sorting in Remove Duplicate Stage

Post by sshettar »

Hi All,

I have a job that joins two tables: I need most of the fields from table1 and just one field from table2 in order to build my table3.
Table2 contains some duplicates on the field used to join the two tables,
and I also had to remove duplicates on one other column from table1.
So here is what I have done:
DB2 Stage (Table2) ....... Remove Duplicates Stage1
                                      .
                                      .
                                      .
DB2 Stage (Table1) ....... Remove Duplicates Stage2 ....... Join Stage ....... Transformer ....... Table3


In Remove Duplicates Stage1 I am removing the duplicates on the column on which we join the two tables. For this I hash partition on that field and also sort it (ascending).
In Remove Duplicates Stage2 I am removing the duplicates on one other field in table1, which is required for building Table3. For this I again hash partition on that field and sort it (ascending).

In the Join stage I hash partition the input link from Remove Duplicates Stage1 on the joining field and sort it too, and for the link from table2 I use Same partitioning, as it has already been partitioned the way it should be in Remove Duplicates Stage1.

But I am getting warnings for the link from Remove Duplicates Stage2 to the Join stage, and I don't quite understand what these warnings mean or why I am getting them. Here is how the warnings look:

Remove_Duplicates_19: When checking operator: User inserted sort "Remove_Duplicates_19.lnk_ARS_extract_Sort" does not fulfill the sort requirements of the downstream operator "Remove_Duplicates_19"

Join_Left_Outer.DSLink20_Sort: When checking operator: Operator of type "APT_TSortOperator": will partition despite the preserve-partitioning flag on the data set on input port 0.


Remove_Duplicates_19: When checking operator: User inserted sort "Remove_Duplicates_19.lnk_ARS_extract_Sort" does not fulfill the sort requirements of the downstream operator "Remove_Duplicates_19"

Any help on this would be highly appreciated

Thanks
haimurali
Participant
Posts: 5
Joined: Mon Nov 07, 2005 10:50 pm

Re: Issue with Sorting in Remove Duplicate Stage

Post by haimurali »

In the Remove Duplicates stage, do the following:
The order of the keys should match the order of your hash partition keys.
Stage -> Advanced -> Preserve partitioning: select Clear.
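
The point about hash partitioning on the duplicate key can be illustrated outside DataStage. The following is only a rough Python sketch, not DataStage code: the column name cust_id, the two-partition setup, and all function names are hypothetical. It shows that when rows are hash partitioned on the key, removing duplicates inside each partition removes them globally, while a partitioning that ignores the key can leave duplicates spread across partitions.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Send each row to a partition chosen from a hash of its key value."""
    parts = defaultdict(list)
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        parts[idx].append(row)
    return parts

def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring the key entirely."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts

def dedup_per_partition(parts, key):
    """Sort each partition on the key, then keep the first row per key value."""
    out = []
    for idx in sorted(parts):
        last = object()  # sentinel that never equals a real key value
        for row in sorted(parts[idx], key=lambda r: r[key]):
            if row[key] != last:
                out.append(row)
                last = row[key]
    return out

# Hypothetical sample data: cust_id 1 and 2 each appear twice.
rows = [
    {"cust_id": 1, "v": "a"},
    {"cust_id": 2, "v": "b"},
    {"cust_id": 2, "v": "c"},
    {"cust_id": 1, "v": "d"},
]

hashed = dedup_per_partition(hash_partition(rows, "cust_id", 2), "cust_id")
round_robin = dedup_per_partition(round_robin_partition(rows, 2), "cust_id")

print(len(hashed))       # duplicates removed across the whole data set
print(len(round_robin))  # duplicates survive because they sit in different partitions
```

In this sketch, rows with the same key always land in the same partition under hash partitioning, which is why the per-partition result is also correct for the whole data set.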
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Re: Issue with Sorting in Remove Duplicate Stage

Post by thebird »

Join_Left_Outer.DSLink20_Sort: When checking operator: Operator of type "APT_TSortOperator": will partition despite the preserve-partitioning flag on the data set on input port 0.
This warning tells you that the sort (TSort operator) on the input link of the Join stage will repartition the data (as you have set it to Hash) even though the Preserve Partitioning flag in the previous stage is set to "Propagate".

As mentioned in the post above:
haimurali wrote: stage -> advanced -> preserve partition - select clear.
Do this in the Remove Duplicates stage to remove the warning.
Remove_Duplicates_19: When checking operator: User inserted sort "Remove_Duplicates_19.lnk_ARS_extract_Sort" does not fulfill the sort requirements of the downstream operator "Remove_Duplicates_19"
This tells you that the sort key, the sort order, or the partition key order is not what the operator requires.

As haimurali mentioned, set it right and that should remove the warning.
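
To see why the sort has to match what the duplicate removal expects, here is a small Python sketch. It is not DataStage code and makes an assumption: that duplicates are dropped by comparing each row with the one before it, so equal keys must arrive next to each other. The column names acct and region are hypothetical.

```python
def remove_adjacent_duplicates(rows, key):
    """Keep a row only if its key differs from the previously kept row."""
    out = []
    for row in rows:
        if not out or out[-1][key] != row[key]:
            out.append(row)
    return out

# Hypothetical sample data: acct "A" appears twice.
rows = [
    {"acct": "A", "region": 1},
    {"acct": "B", "region": 2},
    {"acct": "A", "region": 3},
]

# Sorted on the duplicate key: equal keys are adjacent, so the duplicate is dropped.
sorted_on_key = sorted(rows, key=lambda r: r["acct"])
good = remove_adjacent_duplicates(sorted_on_key, "acct")

# Sorted on a different column: the two "A" rows are separated by "B" and both survive.
sorted_on_other = sorted(rows, key=lambda r: r["region"])
bad = remove_adjacent_duplicates(sorted_on_other, "acct")

print([r["acct"] for r in good])
print([r["acct"] for r in bad])
```

A sort on the wrong key, or with keys in a different order than the stage's duplicate keys, gives the same kind of result as the second case, which is the situation the "does not fulfill the sort requirements" warning points at.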
------------------
Aneesh