Help to use correct partitioning

highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

Hi,

I have a job with many stages.
Here is the part where I need some help:

Sort -> Transformer -> Aggregator -> Transformer -> Pivot

Currently, in the Sort stage, I am sorting on the key field "Product_ID".
All the stages are using Auto partitioning.

I would like to performance-tune the job.

I tried using Hash partitioning on the Sort stage and then Same partitioning all the way up to the Pivot stage.

When I use Same partitioning with the Transformer, I get the warning
"Input dataset 0 has a partitioning method other than entire specified; disabling memory sharing".

Please help me decide which partitioning to use to get the best performance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is an unusual message. What is your Transformer stage doing? Can you please post the exact, and entire, message, so we can be certain before offering advice?

(Auto) will probably achieve optimum partitioning in this job design. I'm only curious about where it thinks that Entire might be appropriate; this is normally only on the reference input of a Lookup stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by highpoint »

My key column in this job is "product_id".
My Transformer uses the key change column defined in the Sort stage on field "product_id" to do counting logic.

I was under the impression that the Sort and Aggregator stages should NEVER use Auto partitioning and should use Hash partitioning on the grouping field. I am using the Sort method in the Aggregator.

So, to improve performance, I am using Hash partitioning on the Sort stage, and all other stages use Same partitioning.

Then I get this warning:
xTransform: Input dataset 0 has a partitioning method other than entire specified; disabling memory sharing.

I would appreciate your help in achieving the best performance.

Post by ray.wurlod »

Ignore the message. It's only alerting you to the fact that shared memory will not be used because not all keys are on all nodes. Use a message handler to demote to informational.

Next step is to look at the score, to see what partitioning (Auto) is actually giving you. The Sort stage requires its input to be hash partitioned on the first sort key. Assuming that you are grouping by product_id, then Same should be used to carry that partitioning through the job.
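
The point about hash partitioning on product_id and then carrying it through with Same can be illustrated outside DataStage. The following is a minimal sketch in plain Python, not DataStage code: the function names are hypothetical, `zlib.crc32` merely stands in for whatever hash the engine uses, and round robin is included only as an example of a non-keyed method. It shows that per-partition sorting and key-change counting gives one row per key only when all rows for a key land in the same partition.

```python
import zlib
from itertools import groupby


def hash_partition(rows, key, num_partitions):
    """Keyed partitioning: rows with equal key values share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is a stand-in hash; it is stable across runs.
        idx = zlib.crc32(str(row[key]).encode()) % num_partitions
        partitions[idx].append(row)
    return partitions


def round_robin_partition(rows, num_partitions):
    """Non-keyed partitioning: rows are dealt out in turn."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def count_per_key(partition, key):
    """Sort one partition, then count rows per key group (key-change logic)."""
    ordered = sorted(partition, key=lambda r: r[key])
    return [(k, sum(1 for _ in grp))
            for k, grp in groupby(ordered, key=lambda r: r[key])]


def run(partitions, key):
    """Process each partition independently (as with Same) and gather results."""
    results = []
    for part in partitions:
        results.extend(count_per_key(part, key))
    return sorted(results)


rows = [{"product_id": p} for p in ["A", "B", "A", "C", "B", "A"]]
hashed = run(hash_partition(rows, "product_id", 3), "product_id")
dealt = run(round_robin_partition(rows, 3), "product_id")
print(hashed)  # [('A', 3), ('B', 2), ('C', 1)]
print(dealt)   # [('A', 1), ('A', 2), ('B', 2), ('C', 1)]
```

With the keyed split each product_id yields a single count. With the non-keyed split, product "A" is spread over two partitions and is counted twice with partial totals.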

Post by highpoint »

My company doesn't like demoting messages and also does not accept any warnings.

Auto partitioning is working fine. But will the data always be correct using Auto partitioning?

Also, how do I check which partitioning Auto is giving me?

I appreciate your reply.

Post by ray.wurlod »

highpoint wrote:My company doesn't like demoting messages and also does not accept any warnings.
You're between a rock and a hard place. Entire partitioning will yield incorrect results in your job design. The response to "doesn't like" therefore has to be "tough".

Check partitioning in the score. Partitioning is indicated between pairs of data sets in that section of the score.
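
One way to see why Entire would give wrong results here is the small Python sketch below. It is an illustration under stated assumptions, not DataStage code: the function names are hypothetical, and it assumes the usual meaning of Entire, where every partition receives a full copy of the data and each partition then aggregates independently.

```python
from collections import Counter


def entire_partition(rows, num_partitions):
    """Entire: every partition gets a complete copy of the input."""
    return [list(rows) for _ in range(num_partitions)]


def aggregate_each_partition(partitions, key):
    """Each partition counts rows per key and emits its own output rows."""
    output = []
    for part in partitions:
        counts = Counter(row[key] for row in part)
        output.extend(sorted(counts.items()))
    return output


rows = [{"product_id": p} for p in ["A", "B", "A"]]
out = aggregate_each_partition(entire_partition(rows, 2), "product_id")
print(out)  # [('A', 2), ('B', 1), ('A', 2), ('B', 1)]
```

Every group is emitted once per partition, so the downstream stages would see duplicated groups.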
XRAY
Participant
Posts: 33
Joined: Mon Apr 03, 2006 12:09 am

Re: Help to use correct partitioning

Post by XRAY »

There is another post related to this problem

viewtopic.php?p=179155&sid=3c15e5709b0c ... 6f43f257e2

I sometimes got the same message on a Transformer, and I set "Preserve partitioning" to "Clear" in the previous stage. In your situation that is a Sort stage, so I am not sure whether it would cause another problem.