Pivot stage - Parallel mode

Kirtikumar · Post by **Kirtikumar** » Wed Mar 19, 2008 11:57 pm

Hi,

I ready many thread on the pivot stage in parallel and the performance issue.

I created one temp job and tested it in parallel mode and seq mode. As expected the PX performance was better.

In PX mode, I kept the partitioning to auto.

Many of the threads mentioned that the pivot stage needs hash on pivot keys. The pivot documentation that I have does not talk abt partitioning for the stage or PX or Seq mode.

Though the pivot stage functionality is simple i.e. convert one row into multiple based on the input cols, just wondering why does it need hash partitioning?

E.g.
If input is:

Code: Select all

EmpNo Error1 Error2 Error3
1001  Er1    Er2    Er3
1002  Er4    Er5    Er6
1001  Er1    Er4    Er5

It has to generate the following output

Code: Select all

EmpNo Error
1001  Er1 
1001  Er2  
1001  Er3
1002  Er4
1002  Er5
1002  Er6
1001  Er1 
1001  Er4
1001  Er5

I also tried running the same job with RR partion and it produced the same no. of rows. The output was also the same.

Any thoughts?

ShaneMuir · Post by **ShaneMuir** » Thu Mar 20, 2008 12:14 am

In the example you have given, without hash partitioning would it be more difficult to remove duplicates? (without repartitioning further down the stream)

ray.wurlod · Post by **ray.wurlod** » Thu Mar 20, 2008 12:34 am

It needs key-based partitioning (whether Hash or Modulus) so that every key value has all its rows on the same partition, so that (necessarily) the pivot will generate the correct number of rows per key.