Warning message in Sort Stage

Posted: Wed Feb 08, 2006 4:09 pm
by somu_june
Hi,


Please help me solve this problem. I am getting a warning message like this:

Rmv_VFD_Cbl: When checking operator: User inserted sort "Sort_VFD_cbl" does not fulfill the sort requirements of the downstream operator "Rmv_VFD_Cbl"

I have a job in which a Sort stage is connected to a Remove Duplicates stage. In the Sort stage I am sorting on validfromdate, which is char[8]. The Sort stage settings are: Properties, Sorting Keys: Key = validfromdate, Sort Key Mode = Sort, Sort Order = Ascending; Options: Allow Duplicates = True; Advanced tab: Execution Mode = Sequential.

In the Remove Duplicates stage the settings are: Properties, Keys That Define Duplicates: Key = validfromdate; Options: Duplicate To Retain = First; Advanced tab: Execution Mode = Parallel.


Thanks,
Somaraju

Re: Warning message in Sort Stage

Posted: Wed Feb 08, 2006 5:24 pm
by kwwilliams
One of the other developers here got that just the other day. I don't know exactly what caused it, but I did manage to get rid of it. In the Sort stage, under the Stage/Advanced tab, set preserve partitioning to Clear.

In the Remove Duplicates stage, under Input/Partitioning, set the partition type to Hash. Set the key to your validfromdate and check the Perform Sort and Stable boxes. Stable is there to preserve the previous sort order.

Hope that helps,

Re: Warning message in Sort Stage

Posted: Wed Feb 08, 2006 7:37 pm
by somu_june
Hi Williams,


Thanks for solving another problem of mine.

Thanks,
Somaraju

Posted: Wed Feb 08, 2006 9:18 pm
by kumar_s
Hi Somu raj,

Rmv_VFD_Cbl: When checking operator: User inserted sort "Sort_VFD_cbl" does not fulfill the sort requirements of the downstream operator "Rmv_VFD_Cbl" 
This warning appears only if the key definitions in the Sort stage and the Remove Duplicates stage differ (even slightly).

Make sure the partitioning (if hash) is based on the sort key, in your case validfromdate. (There is no need to execute in Sequential mode.)
Maintain the same partitioning into the Remove Duplicates stage so that it inherits the partitioning and sorting information from the preceding stage, Sort_VFD_cbl.
Make sure the data type stays the same as well.
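
To illustrate why hash partitioning on the sort key is enough, here is a minimal Python sketch. It is not DataStage code: the helper names, the CRC32-based partitioner and the sample rows are all made up for illustration. The point is only that rows with the same key always land in the same partition, so a per-partition sort plus "keep first" gives the same survivors as a single sequential pass.

```python
import zlib

def hash_partition(rows, key, n_parts):
    """Hypothetical hash partitioner: equal key values always share a partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode()) % n_parts
        parts[idx].append(row)
    return parts

def sort_and_dedupe(partition, key):
    """Stable sort on the key, then keep the first row of each key value."""
    out = []
    for row in sorted(partition, key=lambda r: r[key]):
        if not out or out[-1][key] != row[key]:
            out.append(row)
    return out

# Made-up sample data; "id" only records the input order.
rows = [
    {"validfromdate": "20060208", "id": 1},
    {"validfromdate": "20060101", "id": 2},
    {"validfromdate": "20060208", "id": 3},
    {"validfromdate": "20051231", "id": 4},
    {"validfromdate": "20060101", "id": 5},
]

parts = hash_partition(rows, "validfromdate", n_parts=3)
deduped = [sort_and_dedupe(p, "validfromdate") for p in parts]
survivors = sorted(r["id"] for p in deduped for r in p)
print(survivors)  # [1, 2, 4]: one row per date, the first seen for each
```

Because the sort is stable, the row kept for each date is the first one in input order, whichever partition it ended up in.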

-Kumar

Posted: Thu Feb 09, 2006 7:12 am
by kwwilliams
What the error probably stems from is that this is a piece of a larger job. The data was already partitioned, but then through a Copy or Transformer just a few columns were pushed into another stream. Now the partitioning that is being propagated through doesn't make any sense. So I believe you need to clear the partitioning and then reset it.

That's my theory about what caused this.

Keith

Posted: Thu Feb 09, 2006 8:08 am
by kumar_s
Yes, repartitioning will always get rid of the warning, but at some cost in performance :wink: .
It is always better to avoid unnecessary repartitioning. If the Sort stage is introduced only for the sake of Remove Duplicates, it is wise to partition once at the Sort stage, probably with hash partitioning on the key, and keep the same partitioning into the Remove Duplicates stage.

-Kumar

Posted: Thu Feb 09, 2006 9:31 am
by kwwilliams
I agree that repartitioning when not necessary should be avoided. But suppose I have a job that starts out with three key columns. I am going to partition based upon that key (A, B and C).

At some later point in the job I need to get the max date to put into a control table. I don't need A, B or C. I need column D. So I send the main stream into a Copy stage and just push column D down a secondary stream. These dates are still in the same partitions based upon A, B and C. Shouldn't I clear the partitioning at this point and repartition based upon the date itself? (This is truly a question and not a rebuttal.) The way I see it, unless I repartition the data, I could have the dates spread across different partitions and therefore would be unable to truly get the min or max for each date. But my understanding could be incorrect. If it is, I truly would appreciate clarification.

Thanks,

Posted: Thu Feb 09, 2006 3:08 pm
by somu_june
Hi Kumar,

I tried your option. I set hash partitioning in the Sort stage with preserve partitioning set to Default, and I gave the same hash partitioning to the Remove Duplicates stage, also with preserve partitioning set to Default. I got a warning message saying:


Rmv_VFD_Cbl.Frm_Sort_cbl_Sort: When checking operator: Operator of type "APT_TSortOperator": will partition despite the preserve-partitioning flag on the data set on input port 0.



Thanks,
Somaraju

Posted: Thu Feb 09, 2006 3:46 pm
by kwwilliams
will partition despite the preserve-partitioning flag on the data set on input port 0.

That is why you need to go to the previous stage and tell it to clear the partitioning. Each stage sends a message to the next stage to clear or propagate the partitioning. If you tell it to propagate and then repartition, it will throw a warning just like the one you received. I am interested in knowing whether there is another way around this entire issue when the job is set up as I outlined in my previous post.
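
The rule as described in this post can be written down as a tiny toy model in Python. This only restates the post's description; it is not how DataStage implements the check, and the function name and string values are assumptions chosen for illustration.

```python
def link_warning(upstream_preserve, downstream_partitioning):
    """Toy model: warn when the downstream stage repartitions data that the
    upstream stage asked to keep as-is. Returns None when there is no conflict."""
    repartitions = downstream_partitioning != "same"
    if upstream_preserve == "propagate" and repartitions:
        return "will partition despite the preserve-partitioning flag"
    return None

print(link_warning("propagate", "hash"))  # conflict: the warning text
print(link_warning("clear", "hash"))      # None: upstream cleared the flag
print(link_warning("propagate", "same"))  # None: nothing is repartitioned
```

In this model there are two ways to make the warning go away: clear the flag upstream, or stop repartitioning downstream.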

Posted: Fri Feb 10, 2006 3:42 am
by ameyvaidya
kwwilliams wrote: I interested in knowing if there is another way around this entire issue, when the job is set up as I outlined in my previous post.
Hi Keith,
Have no access to DS at the moment but this is what we used to do:

Hash Partition before the sort stage
Run the sort stage in Parallel mode
Do not repartition the data between the sort and remove duplicates stages.

What happened in Somu's case is that although the Sort stage sorts the data, it ran in sequential mode. The data was repartitioned after the Sort stage, and it is possible that either the order of the data might have been modified or the Remove Duplicates stage downstream might not have agreed with the partitioning type.
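
A rough Python sketch of how that could go wrong, assuming (purely for illustration) that the repartitioning after the sequential sort is round robin. The helper names and sample dates are made up; this is a simulation of the idea, not DataStage behaviour.

```python
def round_robin(rows, n_parts):
    """Deal rows out to partitions in turn, ignoring their key values."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts

def remove_duplicates(partition):
    """Keep the first of each run of equal adjacent keys (expects sorted input)."""
    out = []
    for key in partition:
        if not out or out[-1] != key:
            out.append(key)
    return out

dates = ["20060208", "20060101", "20060208", "20051231", "20060101"]

sorted_once = sorted(dates)            # sequential sort: one ordered stream
parts = round_robin(sorted_once, 2)    # assumed repartitioning before the parallel stage
result = [d for p in parts for d in remove_duplicates(p)]

print(len(result), len(set(result)))   # 5 3: duplicates survived
```

Each partition is still sorted, but equal dates were split across partitions, so no single partition ever sees both copies.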

IHTH

Posted: Fri Feb 10, 2006 8:16 am
by kumar_s
somu_june wrote:Hi Kumar,

I tried your option . I gave hash partition in sortstage and and set perserve partitioning to default and I gave the same hash partition to the remove duplicate stage and set perseved partition to default. I got a
Hi Somu,
I said SAME partitioning, not the same hash partitioning. :wink:

Hi Keith,

In your case, yes, a repartition is needed. The key on which the data was partitioned (A, B and C) and the key on which the aggregate needs to be done (date) are different, so it certainly needs a repartition.
If your requirement went the other way, say you needed to find the max/min date for each group of key (A, B and C), then a repartition would not be required.
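
Both cases can be seen in a small Python simulation. Everything here is an assumption for illustration: the sample rows are made up, and the partitioner is a toy (sum of the key values modulo the partition count) standing in for a real hash, chosen so the partition assignments are easy to follow by hand.

```python
def partition_by(rows, keys, n_parts):
    """Toy stand-in for hash partitioning: equal key values share a partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[sum(int(row[k]) for k in keys) % n_parts].append(row)
    return parts

def max_per_group(partition, group_keys, value_key):
    """Max of value_key for each group found in this one partition."""
    best = {}
    for row in partition:
        g = tuple(row[k] for k in group_keys)
        if g not in best or row[value_key] > best[g]:
            best[g] = row[value_key]
    return best

def distinct_in_partition(partition, key):
    """Distinct values of key within this one partition, in sorted order."""
    return sorted({row[key] for row in partition})

rows = [
    {"A": 1, "B": 0, "C": 0, "D": "20060201"},
    {"A": 1, "B": 0, "C": 0, "D": "20060209"},
    {"A": 2, "B": 0, "C": 0, "D": "20060209"},
    {"A": 2, "B": 0, "C": 0, "D": "20060105"},
]
by_abc = partition_by(rows, ["A", "B", "C"], n_parts=2)

# Max date per (A, B, C): each group sits in one partition, no repartition needed.
group_max = {}
for p in by_abc:
    group_max.update(max_per_group(p, ["A", "B", "C"], "D"))

# Per-date work on the A/B/C partitioning: the same date shows up in two partitions.
stale = [d for p in by_abc for d in distinct_in_partition(p, "D")]

# Per-date work after repartitioning on D: each date appears exactly once.
by_d = partition_by(rows, ["D"], n_parts=2)
fresh = [d for p in by_d for d in distinct_in_partition(p, "D")]

print(group_max)
print(stale.count("20060209"), fresh.count("20060209"))  # 2 1
```

The group-wise max comes out right on the original partitioning, while anything keyed on the date alone sees 20060209 twice until the data is repartitioned on D.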

-Kumar