Pivot stage too slow

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sarathchandrakt
Participant
Posts: 50
Joined: Fri Aug 29, 2014 1:32 pm
Location: Mumbai

Pivot stage too slow

Post by sarathchandrakt »

My job structure is something like this,

DataSource>Remove Duplicates>Pivot Stage>Transformer>TargetDB

I have 3million records to process. My pivot stage is taking 90mins to do that. I have 8 input columns to Pivot stage out of which 2 need to be pivoted vertically.

In the beginning I grouped by all other 6 columns. Then I reduced it to only 1 column. Still there is no improvment in performance. I used Hash and Roundrobin partitioning. Hash actually decreased the performance where as Roundrobin brought doen the procesing time to 80Mins.

Can you suggest any other steps the improve the performance of Pivot stage?

Thank You.
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Not sure its actually the pivot stage but more likely the sorting that is being done.

Before a remove duplicates and a pivot stage I am pretty sure that there is an sort inserted (as the data needs to be sorted prior to performing these activities).

Is the duplicate key the same as the pivot key? (or is the pivot key at least a subset of the duplicate key fields?)

Try and sort and partition your data as early as possible, either in your data source or using a sort stage, and where possible avoid repartitioning by setting the input links to the remove duplicates and pivot stages to have a partitioning of SAME.
sarathchandrakt
Participant
Posts: 50
Joined: Fri Aug 29, 2014 1:32 pm
Location: Mumbai

Post by sarathchandrakt »

Thank You Shane. I added a Sort between Pivot and DataSource and changed "allow duplicates" to false in Sort stage, so I don't need another remove duplicates. I partitioned the data in sort stage and kept it as same in Pivot. It worked like charm. Thank You again.
Post Reply