Pivot stage - Parallel mode

Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India


Post by Kirtikumar »

Hi,

I have read many threads on the Pivot stage in parallel mode and its performance issues.

I created a temporary job and tested it in both parallel (PX) mode and sequential mode. As expected, the PX performance was better.

In PX mode, I left the partitioning set to Auto.

Many of the threads mention that the Pivot stage needs hash partitioning on the pivot keys. The Pivot documentation that I have does not say anything about partitioning for the stage, in either PX or sequential mode.

The Pivot stage's functionality is simple: it converts one row into multiple rows based on the input columns. So I am wondering why it needs hash partitioning.

E.g.
If input is:


EmpNo Error1 Error2 Error3
1001  Er1    Er2    Er3
1002  Er4    Er5    Er6
1001  Er1    Er4    Er5
It has to generate the following output:


EmpNo Error
1001  Er1 
1001  Er2  
1001  Er3
1002  Er4
1002  Er5
1002  Er6
1001  Er1 
1001  Er4
1001  Er5
I also tried running the same job with round robin (RR) partitioning, and it produced the same number of rows. The output content was also the same.
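
The observation above can be reproduced outside DataStage with a small simulation. This is only a sketch under stated assumptions: the function names (`pivot_row`, `round_robin`, `hash_on_key`, `run`) and the choice of two partitions are hypothetical, and it models the pivot as a purely row-by-row operation, which may not match how the actual stage is implemented.

```python
from collections import Counter

# Sample input from the post: (EmpNo, Error1, Error2, Error3)
ROWS = [
    (1001, "Er1", "Er2", "Er3"),
    (1002, "Er4", "Er5", "Er6"),
    (1001, "Er1", "Er4", "Er5"),
]

def pivot_row(row):
    """Horizontal pivot: one input row becomes one output row per error column."""
    emp_no, *errors = row
    return [(emp_no, err) for err in errors]

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_on_key(rows, n):
    """Send each row to the partition chosen by hashing EmpNo (assumed key)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts

def run(partitions):
    """Pivot every partition independently and collect the results."""
    out = []
    for part in partitions:
        for row in part:
            out.extend(pivot_row(row))
    return out

rr_out = run(round_robin(ROWS, 2))
hash_out = run(hash_on_key(ROWS, 2))

# Compare row counts and contents; ordering across partitions is ignored.
print(len(rr_out), len(hash_out))
print(Counter(rr_out) == Counter(hash_out))
```

Because each output row in this model depends on a single input row, the set of rows produced is the same under either scheme; only the partition on which a row lands differs.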

Any thoughts?
Regards,
S. Kirtikumar.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

In the example you have given, wouldn't it be more difficult to remove duplicates without hash partitioning (that is, without repartitioning further downstream)?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It needs key-based partitioning (whether Hash or Modulus) so that all the rows for any given key value are on the same partition; that is what guarantees the pivot will generate the correct number of rows per key.
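
To illustrate the co-location property, here is a minimal sketch using the sample rows from the first post. The helper names and the choice of three partitions are assumptions made for illustration, not DataStage APIs; the sketch only shows where rows land, not how the Pivot stage itself behaves.

```python
from collections import defaultdict

# Sample input from the first post: (EmpNo, Error1, Error2, Error3)
ROWS = [
    (1001, "Er1", "Er2", "Er3"),
    (1002, "Er4", "Er5", "Er6"),
    (1001, "Er1", "Er4", "Er5"),
]

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def modulus_on_key(rows, n):
    """Key-based partitioning: partition number is EmpNo modulo n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts

def partitions_per_key(parts):
    """Map each EmpNo to the set of partition numbers that hold its rows."""
    seen = defaultdict(set)
    for idx, part in enumerate(parts):
        for row in part:
            seen[row[0]].add(idx)
    return dict(seen)

rr = partitions_per_key(round_robin(ROWS, 3))
mod = partitions_per_key(modulus_on_key(ROWS, 3))

print("round robin:", rr)
print("modulus:    ", mod)
```

With this sample and three partitions, round robin spreads the two rows for EmpNo 1001 across two partitions, while modulus keeps them on one. That matters for any later step that works per key, such as the duplicate removal mentioned earlier in the thread.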
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.