Pivot stage giving very poor performance
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 3
- Joined: Wed Aug 10, 2005 10:19 pm
I have about 5 million rows coming in on the input. Each row has 28 columns in addition to the key columns, and I need to generate 28 rows from each input row, so the output fed to a Teradata Enterprise stage will be 5 × 28 = 140 million rows. I am using the Pivot stage to achieve this.
But I'm getting very poor performance out of the Pivot stage. Any idea how we can improve the performance, or use some other stage to achieve the same?
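To make the shape of the transformation concrete, here is a minimal sketch in plain C of what a pivot of this kind does: one input row with a key and 28 value columns becomes 28 output rows of (key, column number, value). The function name, the delimiter, and the column count are illustrative only, not a DataStage API.

```c
#include <stdio.h>

#define NCOLS 28  /* column count taken from the post above */

/* Emit one (key, column_no, value) line per input column.
 * Returns the number of output rows written. */
static int pivot_row(FILE *out, const char *key, const char *values[]) {
    int n = 0;
    for (int col = 0; col < NCOLS; col++) {
        fprintf(out, "%s|%d|%s\n", key, col + 1, values[col]);
        n++;
    }
    return n;
}
```

At 5 million input rows this produces 140 million output rows, which is why the downstream load (and the pivot itself) dominates the job's run time.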
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard. :D
Think about what the Pivot stage is doing. It takes a lot of resources, particularly memory, to process that many rows, and there is nothing you can do about that.
You can create your own Build stage to pivot your data according to your specific requirements, which avoids the compromises a generic Pivot stage has to make.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There's a poster by the name of Larry.Griffith who has written pivot BuildOps. One of the posts where he mentions that is in here. It might be worth contacting him and seeing if he can help, if he doesn't stop by and post something here.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 7
- Joined: Wed Nov 02, 2005 12:19 pm
I have written wrapped stages that pivot tall and flat and run in parallel mode.
I wonder what you consider good performance. By wrapping a simple C routine I can pivot a flat fixed file, 1.65MM records, 24 columns into 14.9MM rows (no blank columns) with key_id, column number, and value on an AIX box in 40 seconds. We are running 3 nodes.
Changing the options of the wrapped stage to write out blank columns (for comparison purposes because the DataStage Pivot has no such option) the same job took 1:25. It would be faster if the data was cleaner and the file was delimited instead of fixed.
So you can get a benchmark against the DataStage Pivot, I ran the same file through it and it took 19:31.
What kind of performance are you getting/looking for?
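The "simple C routine" described above is not shown in the thread, but its core might look something like the following sketch: walk a fixed-width record column by column, trim each field, and emit one (key, column number, value) line per column, with an option to skip blank columns. The column width, count, and function names here are assumptions for illustration, not the actual wrapped stage.

```c
#include <stdio.h>
#include <string.h>

#define NCOLS 24   /* column count from the example run above */
#define COLW  10   /* assumed fixed column width */

/* Pivot one fixed-width record into (key, column_no, value) lines.
 * If skip_blanks is nonzero, columns that are all spaces are dropped,
 * as in the wrapped stage's default mode. Returns rows emitted. */
static int pivot_fixed(FILE *out, const char *key,
                       const char *rec, int skip_blanks) {
    int n = 0;
    for (int col = 0; col < NCOLS; col++) {
        char val[COLW + 1];
        memcpy(val, rec + col * COLW, COLW);
        val[COLW] = '\0';
        /* trim trailing spaces of the fixed-width field */
        for (int i = COLW - 1; i >= 0 && val[i] == ' '; i--)
            val[i] = '\0';
        if (skip_blanks && val[0] == '\0')
            continue;
        fprintf(out, "%s|%d|%s\n", key, col + 1, val);
        n++;
    }
    return n;
}
```

Because the routine touches each record once, sequentially, and holds only one record in memory, it parallelizes trivially across nodes when wrapped as a stage, which is consistent with the run times reported above.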
Thanks
Larry