Warning message in Sort Stage

somu_june
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Post by somu_june »

Hi,


Please help me solve this problem. I am getting a warning message like this:

Rmv_VFD_Cbl: When checking operator: User inserted sort "Sort_VFD_cbl" does not fulfill the sort requirements of the downstream operator "Rmv_VFD_Cbl"

I have a job with a Sort stage connected to a Remove Duplicates stage. In the Sort stage I am sorting on validfromdate, which is char[8]. Under Properties > Sorting Keys I have set Key = validfromdate, Sort Key Mode = Sort, and Sort Order = Ascending; under Options, Allow Duplicates = True; and on the Advanced tab, Execution Mode = Sequential.

In the Remove Duplicates stage, under Properties > Keys That Define Duplicates, I have set Key = validfromdate; under Options, Duplicate To Retain = First; and on the Advanced tab, Execution Mode = Parallel.


Thanks,
Somaraju
somaraju
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Warning message in Sort Stage

Post by kwwilliams »

One of the other developers here got that just the other day. I don't know exactly what caused it, but I did manage to get rid of it. In the Sort stage, under the Stage/Advanced tab, set Preserve Partitioning to Clear.

In the Remove Duplicates stage, under Input/Partitioning, set the partition type to Hash. Set the key to your validfromdate and check the Perform Sort and Stable boxes. Stable is there to preserve the previous sort order.

Hope that helps,
somu_june
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Re: Warning message in Sort Stage

Post by somu_june »

Hi Williams,


Thanks for solving another problem of mine.

Thanks,
Somaraju
somaraju
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Somu raj,

Rmv_VFD_Cbl: When checking operator: User inserted sort "Sort_VFD_cbl" does not fulfill the sort requirements of the downstream operator "Rmv_VFD_Cbl"

This warning appears only if the key definitions in the Sort and Remove Duplicates stages differ (even slightly).

Make sure the partitioning (if hash) is based on the sort key, in your case validfromdate. (There is no need to execute in sequential mode.)
Maintain the same partitioning into the Remove Duplicates stage so that it inherits the partitioning and sorting information from the preceding stage, Sort_VFD_cbl.
Make sure the data type also remains the same.
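
To make the idea concrete, here is a minimal Python sketch of "hash partition once on the key, sort inside each partition, then remove duplicates without repartitioning". It is only a simulation of the data flow, not DataStage code: the function names, the CRC32-based partitioner, and the sample rows are all assumptions made for illustration.

```python
from itertools import groupby
from zlib import crc32

def hash_partition(rows, key, n_partitions):
    """Route each row to a partition chosen by a stable hash of its key (illustrative partitioner)."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        index = crc32(str(row[key]).encode("utf-8")) % n_partitions
        partitions[index].append(row)
    return partitions

def sort_partition(rows, key):
    """Sort one partition on the key; sorted() is stable, so input order is kept for equal keys."""
    return sorted(rows, key=lambda row: row[key])

def remove_duplicates(sorted_rows, key):
    """Keep the first row of each run of equal keys; the input must already be sorted on the key."""
    return [next(group) for _, group in groupby(sorted_rows, key=lambda row: row[key])]

# Hypothetical sample data: validfromdate is an 8-character string, as in the original post.
rows = [
    {"validfromdate": "20060102", "id": 1},
    {"validfromdate": "20060101", "id": 2},
    {"validfromdate": "20060102", "id": 3},
    {"validfromdate": "20060103", "id": 4},
    {"validfromdate": "20060101", "id": 5},
]

# Partition once on the key, then sort and de-duplicate inside each partition.
partitions = hash_partition(rows, "validfromdate", 4)
result = [
    row
    for part in partitions
    for row in remove_duplicates(sort_partition(part, "validfromdate"), "validfromdate")
]
print(sorted(row["id"] for row in result))
```

Because all rows with the same validfromdate land in the same partition, each partition can be de-duplicated on its own, which is why no second repartitioning step is needed in this sketch.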

-Kumar
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

The error probably stems from this being a piece of a larger job. The data was already partitioned, but then, through a Copy or Transformer stage, just a few columns were pushed into another stream. Now the partitioning that is being propagated through no longer makes any sense. So I believe you need to clear the partitioning and then reset it.

That's my theory about what caused this.

Keith
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Yes, repartitioning will always get rid of the warning, but at a cost in performance :wink: .
It is always better to avoid unnecessary repartitioning. If the Sort stage was introduced for the sake of Remove Duplicates, it is wise to partition once at the Sort stage, probably with hash partitioning on the key, and maintain the same partitioning into the Remove Duplicates stage.

-Kumar
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

I agree that repartitioning should be avoided when it is not necessary. But suppose I have a job that starts out with three key columns, and I partition based upon that key (A, B and C).

At some later point in the job I need to get the max date to put into a control table. I don't need A, B or C; I need column D. So I send the main stream into a Copy stage and push just column D down a secondary stream. These dates are still in the same partitions, based upon A, B and C. Shouldn't I clear the partitioning at this point and repartition based upon the date itself? (This is truly a question and not a rebuttal.) The way I see it, unless I repartition the data, I could have the dates spread across different partitions and therefore would be unable to truly get the min or max for each date. But my understanding could be incorrect. If it is, I would truly appreciate clarification.

Thanks,
somu_june
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Post by somu_june »

Hi Kumar,

I tried your option. I set hash partitioning in the Sort stage with Preserve Partitioning set to Default, and I set the same hash partitioning in the Remove Duplicates stage with Preserve Partitioning set to Default. I got a warning message saying:


Rmv_VFD_Cbl.Frm_Sort_cbl_Sort: When checking operator: Operator of type "APT_TSortOperator": will partition despite the preserve-partitioning flag on the data set on input port 0.



Thanks,
Somaraju
somaraju
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

"will partition despite the preserve-partitioning flag on the data set on input port 0."

That is why you need to go to the previous stage and tell it to clear the partitioning. Each stage sends a message to the next stage to clear or propagate the partitioning. If you tell it to propagate and then repartition, it will throw a warning just like the one you received. I am interested in knowing whether there is another way around this entire issue when the job is set up as I outlined in my previous post.
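
The flag hand-off described above can be pictured with a small toy model in Python. This is a sketch under stated assumptions, not the engine's actual logic: the function names are hypothetical, and treating only "same" and "auto" as partitioners that do not force a repartition is a simplification chosen for the example.

```python
from typing import Optional

# Settings a stage can apply to the flag it passes downstream (toy model).
SET, CLEAR, PROPAGATE = "set", "clear", "propagate"

def effective_flag(incoming_flag: str, stage_setting: str) -> str:
    """Flag seen by the next stage; 'propagate' simply passes the incoming flag through."""
    return incoming_flag if stage_setting == PROPAGATE else stage_setting

def check_link(upstream_flag: str, downstream_partitioner: str) -> Optional[str]:
    """Return a warning when the downstream stage repartitions data flagged as 'set'."""
    # Assumption for this sketch: only 'same' and 'auto' leave the partitioning alone.
    repartitions = downstream_partitioner.lower() not in ("same", "auto")
    if upstream_flag == SET and repartitions:
        return "will partition despite the preserve-partitioning flag on the data set on input port 0"
    return None

# Sort stage left on propagate: the flag stays set, and a hash partitioner downstream triggers the warning.
print(check_link(effective_flag(SET, PROPAGATE), "hash"))

# Sort stage told to clear: the downstream stage may repartition without a warning.
print(check_link(effective_flag(SET, CLEAR), "hash"))

# Downstream stage keeps the incoming partitioning: no warning either.
print(check_link(effective_flag(SET, PROPAGATE), "same"))
```

In this model there are two ways to make the warning go away: clear the flag upstream before repartitioning, or do not repartition downstream at all.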
ameyvaidya
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

kwwilliams wrote: I interested in knowing if there is another way around this entire issue, when the job is set up as I outlined in my previous post.
Hi Keith,
Have no access to DS at the moment but this is what we used to do:

Hash Partition before the sort stage
Run the sort stage in Parallel mode
Do not repartition the data between the sort and remove duplicates stages.

What happened in Somu's case is that although the Sort stage sorted the data, it ran in sequential mode. The data was repartitioned after the Sort stage, and it is possible that either the order of the data was modified or the Remove Duplicates stage downstream did not agree with the partitioning type.

IHTH
Amey Vaidya
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.
- Douglas Adams
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

somu_june wrote:Hi Kumar,

I tried your option . I gave hash partition in sortstage and and set perserve partitioning to default and I gave the same hash partition to the remove duplicate stage and set perseved partition to default. I got a
Hi Somu,
I mentioned SAME partition and not same hash partition. :wink:

Hi Keith,

In your case, yes, a repartition is needed. The key on which the data was partitioned (A, B and C) and the key on which the aggregation needs to be done (the date) are different, so it certainly needs a repartition.
If your requirement went the other way, and you needed to find the max/min date for each group of the key (A, B and C), then a repartition would not be required.
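
The difference between the two cases can be checked with a short Python simulation. It is a sketch only: the modulo partitioner is a toy stand-in for a hash partitioner, and the function names, integer keys, and sample rows are made up for the example.

```python
from collections import defaultdict

def partition_by(rows, key_fields, n_partitions):
    """Toy stand-in for a hash partitioner: integer key fields summed, modulo n."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[sum(row[f] for f in key_fields) % n_partitions].append(row)
    return partitions

def split_groups(partitions, group_fields):
    """Return the groups whose rows live in more than one partition."""
    homes = defaultdict(set)
    for index, partition in enumerate(partitions):
        for row in partition:
            homes[tuple(row[f] for f in group_fields)].add(index)
    return {group for group, indexes in homes.items() if len(indexes) > 1}

def max_per_group(partition, group_fields, value_field):
    """Max of value_field for each group found in a single partition."""
    best = {}
    for row in partition:
        group = tuple(row[f] for f in group_fields)
        best[group] = max(best.get(group, row[value_field]), row[value_field])
    return best

# Hypothetical rows: A, B, C are the original keys, D is a date stored as yyyymmdd.
rows = [
    {"A": 1, "B": 0, "C": 0, "D": 20060101},
    {"A": 2, "B": 0, "C": 0, "D": 20060101},
    {"A": 1, "B": 0, "C": 0, "D": 20060105},
    {"A": 2, "B": 0, "C": 0, "D": 20060103},
]
by_abc = partition_by(rows, ["A", "B", "C"], 2)
by_d = partition_by(rows, ["D"], 2)

print(split_groups(by_abc, ["A", "B", "C"]))  # no (A, B, C) group is split
print(split_groups(by_abc, ["D"]))            # a date shared by two keys is split
print(split_groups(by_d, ["D"]))              # after repartitioning on D, none is split
print([max_per_group(p, ["A", "B", "C"], "D") for p in by_abc])
```

When the grouping key matches the partitioning key, each partition can compute its own max independently. When it does not, rows for one group can sit in several partitions, so a per-partition aggregate would be incomplete without a repartition.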

-Kumar