Pivot stage giving very poor performance

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


sidhartha_saha
Participant
Posts: 3
Joined: Wed Aug 10, 2005 10:19 pm

Post by sidhartha_saha »

I have about 5 million input rows. Each row has 28 columns in addition to the key columns, and I need to generate 28 output rows from each input row, so the output fed to a Teradata Enterprise stage will be about 140 million (5 million x 28) rows. I am using the Pivot stage to achieve this.

But I'm getting very poor performance out of the Pivot stage. Any idea how we can improve the performance, or use some other stage to achieve the same result?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard. :D

Think about what the Pivot stage is doing. It takes a lot of resources, particularly memory, to process that many rows, and there is nothing you can do about that.

You can create your own Build stage to pivot your data according to your specific requirements, which removes the general-purpose compromises built into the Pivot stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sidhartha_saha
Participant
Posts: 3
Joined: Wed Aug 10, 2005 10:19 pm

Post by sidhartha_saha »

Can you please elaborate on how we can do this ?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Select the Stage Types branch of your Repository and choose New Parallel Stage. Fill in your C++ code on the appropriate tabs, then use the stage in a job design.
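As a rough sketch of what the per-record logic of such a Build stage amounts to (the structs and names here are illustrative stand-ins; a real BuildOp works through its generated input/output port accessors and the framework's record-writing calls, not plain C++ containers):

```cpp
#include <array>
#include <string>
#include <vector>

// Illustrative stand-ins for the BuildOp's input and output records;
// in generated BuildOp code these would be exposed via port accessors.
struct InRecord {
    std::string key;
    std::array<std::string, 28> cols;  // the 28 non-key columns
};
struct OutRecord {
    std::string key;
    int col_no;
    std::string value;
};

// Per-record logic: each input row yields one output row per column.
std::vector<OutRecord> perRecord(const InRecord& in) {
    std::vector<OutRecord> out;
    out.reserve(in.cols.size());
    for (int i = 0; i < static_cast<int>(in.cols.size()); ++i) {
        // In a real BuildOp this would set the output columns and
        // invoke the framework's write-record call once per column.
        out.push_back(OutRecord{in.key, i + 1, in.cols[i]});
    }
    return out;
}
```

Because this is a simple loop with no per-row memory accumulation, a custom stage like this can run fully pipelined and in parallel, which is where the speedup over the generic Pivot stage comes from.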
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Yes, Pivot performance is considerably poor. You can build BuildOp stages in C++ to do whatever you currently use the Pivot stage for.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's a poster by the name of Larry.Griffith who has written pivot BuildOps. One of the posts where he mentions that is in here. It might be worth contacting him to see if he can help, if he doesn't stop by and post something here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Larry.Griffith
Premium Member
Posts: 7
Joined: Wed Nov 02, 2005 12:19 pm

Post by Larry.Griffith »

I have written wrapped stages that pivot in both directions (wide to tall, and back to flat) and run in parallel mode.

I wonder what you consider good performance. By wrapping a simple C routine I can pivot a flat fixed-width file of 1.65 million records with 24 columns into 14.9 million rows (blank columns skipped), each carrying key_id, column number, and value, on an AIX box in 40 seconds. We are running 3 nodes.

Changing the options of the wrapped stage to write out blank columns (for comparison purposes, because the DataStage Pivot stage has no option to skip them), the same job took 1:25. It would be faster if the data were cleaner and the file were delimited instead of fixed-width.

So you can get a benchmark against the DataStage Pivot, I ran the same file through it and it took 19:31.


What kind of performance are you getting/looking for?
Thanks
Larry
sidhartha_saha
Participant
Posts: 3
Joined: Wed Aug 10, 2005 10:19 pm

Post by sidhartha_saha »

Thanks all for your suggestions.

We have a table which has 28 columns in addition to the key columns, and I need to convert every such row into 28 rows. I have about 5 million input rows. We have written a BuildOp stage as per the suggestions, but that doesn't seem to make much of a difference.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That - along with the coding exercise itself - should tell you something about the amount of work that is involved.

It's time to manage expectations better.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.