Pivot stage giving very poor performance

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


sidhartha_saha
Participant
Posts: 3
Joined: Wed Aug 10, 2005 10:19 pm

Post by sidhartha_saha »

I have about 5 million input rows. Each row has 28 columns in addition to the key columns, and I need to generate 28 output rows from each input row, so the output fed to a Teradata Enterprise stage will be about 140 million (5 million x 28) rows. I am using the Pivot stage to achieve this.

But I'm getting very poor performance out of the Pivot stage. Any idea how we can improve the performance, or use some other stage to achieve the same result?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard. :D

Think about what the Pivot stage is doing. It takes a lot of resources, particularly memory, to process that many rows, and there is nothing you can do about that.

You can create your own Build stage to pivot your data according to your specific requirements, which removes the general-purpose compromises built into the Pivot stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sidhartha_saha
Participant
Posts: 3
Joined: Wed Aug 10, 2005 10:19 pm

Post by sidhartha_saha »

Can you please elaborate on how we can do this ?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Select the Stage Types branch of your Repository and choose New Parallel Stage. Fill in your C++ code on the appropriate tabs, then use the stage in a job design.
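As a rough sketch of what the per-record logic of such a Build stage amounts to (the structs and names here are illustrative stand-ins; a real BuildOp works through its generated input/output port accessors and the framework's record-writing calls, not plain C++ containers):

```cpp
#include <array>
#include <string>
#include <vector>

// Illustrative stand-ins for the BuildOp's input and output records;
// in generated BuildOp code these would be exposed via port accessors.
struct InRecord {
    std::string key;
    std::array<std::string, 28> cols;  // the 28 non-key columns
};
struct OutRecord {
    std::string key;
    int col_no;
    std::string value;
};

// Per-record logic: each input row yields one output row per column.
std::vector<OutRecord> perRecord(const InRecord& in) {
    std::vector<OutRecord> out;
    out.reserve(in.cols.size());
    for (int i = 0; i < static_cast<int>(in.cols.size()); ++i) {
        // In a real BuildOp this would set the output columns and
        // invoke the framework's write-record call once per column.
        out.push_back(OutRecord{in.key, i + 1, in.cols[i]});
    }
    return out;
}
```

Because this is a simple loop with no per-row memory accumulation, a custom stage like this can run fully pipelined and in parallel, which is where the speedup over the generic Pivot stage comes from.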
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

Yes, Pivot performance is considerably poor. You can build BuildOp stages in C++ to do whatever you currently use the Pivot stage for.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's a poster by the name of Larry.Griffith who has written pivot BuildOps. One of the posts where he mentions that is in here. It might be worth contacting him to see if he can help, if he doesn't stop by and post something here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Larry.Griffith
Premium Member
Posts: 7
Joined: Wed Nov 02, 2005 12:19 pm

Post by Larry.Griffith »

I have written wrapped stages that pivot in both directions (wide to tall, and back to flat) and run in parallel mode.

I wonder what you consider good performance. By wrapping a simple C routine I can pivot a flat fixed-width file of 1.65 million records with 24 columns into 14.9 million rows (blank columns skipped), each carrying key_id, column number, and value, on an AIX box in 40 seconds. We are running 3 nodes.

Changing the options of the wrapped stage to write out blank columns (for comparison purposes, because the DataStage Pivot stage has no option to skip them), the same job took 1:25. It would be faster if the data were cleaner and the file were delimited instead of fixed-width.

So you can get a benchmark against the DataStage Pivot, I ran the same file through it and it took 19:31.


What kind of performance are you getting/looking for?
Thanks
Larry
sidhartha_saha
Participant
Posts: 3
Joined: Wed Aug 10, 2005 10:19 pm

Post by sidhartha_saha »

Thanks all for your suggestions.

We have a table which has 28 columns in addition to the key columns, and I need to convert every such row into 28 rows. I have about 5 million input rows. We have written a BuildOp stage as per the suggestions, but that doesn't seem to make much of a difference.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That - along with the coding exercise itself - should tell you something about the amount of work that is involved.

It's time to manage expectations better.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.