Help to use correct partitioning

highpoint · Post by **highpoint** » Sat Jan 15, 2011 10:44 pm

Hi,

I have a job with many stages.
This is the part where I need some help.

Sort -> Transformer -> Aggregator --> Transformer --> Pivot

Currently, in the Sort stage I am sorting on the key field "Product_ID".
All the stages use Auto partitioning.

I would like to tune the job for performance.

I tried using hash partitioning on the Sort stage and then Same partitioning all the way up to the Pivot stage.

When I use Same partitioning on the Transformer, I get the warning:
"Input dataset 0 has a partitioning method other than entire specified; disabling memory sharing".

Which partitioning should I use to get the best performance?

ray.wurlod · Post by **ray.wurlod** » Sun Jan 16, 2011 12:39 am

This is an unusual message. What is your Transformer stage doing? Can you please post the exact, and entire, message, so we can be certain before offering advice?

(Auto) will probably achieve optimum partitioning in this job design. I'm only curious about where it thinks that Entire might be appropriate; this is normally only on the reference input of a Lookup stage.

highpoint · Post by **highpoint** » Sun Jan 16, 2011 1:12 pm

ray.wurlod wrote:This is an unusual message. What is your Transformer stage doing? Can you please post the exact, and entire, message, so we can be certain before offering advice?

(Auto) will probably achieve optimum partitioning in this job design. I'm only curious about where it thinks that Entire might be appropriate; this is normally only on the reference input of a Lookup stage.

My key column in this job is "product_id".
My Transformer uses the key change column created in the Sort stage on "product_id" to do its counting logic.

I was under the impression that the Sort and Aggregator stages should NEVER use Auto partitioning, and should instead use hash partitioning on the grouping field. I am using the sort method in the Aggregator.

So, to improve performance, I am using hash partitioning on the Sort stage, and all the other stages use Same partitioning.
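The reason hash partitioning on the grouping key matters can be shown with a small simulation. The sketch below is not DataStage code; it assumes N parallel nodes, each aggregating only its own rows, and compares hash partitioning on product_id with round-robin partitioning. The CRC32-based hash is an illustrative choice, not the hash function DataStage itself uses.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple


def hash_partition(rows: List[Dict], key: str, n: int) -> List[List[Dict]]:
    """Send every row with the same key value to the same node."""
    parts: List[List[Dict]] = [[] for _ in range(n)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's salted hash() for str
        idx = zlib.crc32(str(row[key]).encode()) % n
        parts[idx].append(row)
    return parts


def round_robin_partition(rows: List[Dict], n: int) -> List[List[Dict]]:
    """Deal rows out in turn, ignoring the key."""
    parts: List[List[Dict]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def count_per_partition(parts: List[List[Dict]], key: str) -> List[Tuple[str, int]]:
    """Each node counts only its own rows, as a parallel Aggregator would."""
    results: List[Tuple[str, int]] = []
    for part in parts:
        results.extend(Counter(row[key] for row in part).items())
    return sorted(results)
```

With rows A, A, B, B, B, C on two nodes, hash partitioning produces one count per product (A=2, B=3, C=1). Round-robin splits groups across nodes and emits partial counts such as A=1 twice. Once the data is hash partitioned on the key, Same keeps each group on its node for the downstream stages.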

Then I get this warning:
xTransform: Input dataset 0 has a partitioning method other than entire specified; disabling memory sharing.

I'd appreciate your help in tuning this job for the best performance.

ray.wurlod · Post by **ray.wurlod** » Sun Jan 16, 2011 1:33 pm

Ignore the message. It's only alerting you to the fact that shared memory will not be used because not all keys are on all nodes. Use a message handler to demote it to informational.

The next step is to look at the score to see what partitioning (Auto) is actually giving you. The Sort stage requires its input to be hash partitioned on the first sort key. Assuming that you are grouping by product_id, Same should be used to carry that partitioning through the job.

highpoint · Post by **highpoint** » Sun Jan 16, 2011 2:10 pm

My company doesn't like demoting messages and also does not accept any warnings.

Auto partitioning is working fine. But will the data always be correct with Auto partitioning?

Also, how do I check which partitioning Auto is giving me?

I appreciate your reply.

ray.wurlod · Post by **ray.wurlod** » Sun Jan 16, 2011 7:11 pm

highpoint wrote:My company doesn't like demoting messages and also does not accept any warnings.

You're between a rock and a hard place. Entire partitioning will yield incorrect results in your job design. The response to "doesn't like" therefore has to be "tough".
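One way to see why Entire is wrong for this design: Entire gives every node a full copy of the input, so if each node then aggregates its copy, every group is emitted once per node. The sketch below simulates that under the same simplifying assumption used above (each node aggregates only the rows it receives); it illustrates the effect and is not DataStage's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def entire_partition(rows: List[Dict], n: int) -> List[List[Dict]]:
    """Entire: every node receives a full copy of the input."""
    return [list(rows) for _ in range(n)]


def count_per_partition(parts: List[List[Dict]], key: str) -> List[Tuple[str, int]]:
    """Each node counts the rows it holds; results are collected from all nodes."""
    results: List[Tuple[str, int]] = []
    for part in parts:
        results.extend(Counter(row[key] for row in part).items())
    return sorted(results)
```

With rows A, A, B, B, B, C on two nodes, the output contains each product twice (six rows instead of three), and any total computed downstream is doubled. That is the kind of incorrect result Entire produces here.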

Check partitioning in the score. Partitioning is indicated between pairs of data sets in that section of the score.

XRAY · Post by **XRAY** » Tue Jan 18, 2011 3:37 am

There is another post related to this problem:

viewtopic.php?p=179155&sid=3c15e5709b0c ... 6f43f257e2

I sometimes got the same message on a Transformer, and I set "Preserve partitioning" to "Clear" in the previous stage. In your case the previous stage is a Sort stage, so I am not sure whether that would cause another problem.

DSXchange