To add my standard reply ...
If the number of rows in the data set is consistent (e.g. there are always "n" rows in the data set that have to be vertically pivoted), I'd lay money on the unix "paste" command doing this quickest. Try "man paste" for more details.
David
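As a rough illustration of the "paste" suggestion above, here is a minimal sketch that joins every n consecutive rows into one row. It assumes a Unix-like system with "paste" on the PATH; the helper name pivot_with_paste, the "|" separator and the sample values are made up for the example and are not from the original post.

```python
import subprocess

def pivot_with_paste(lines, n, sep="|"):
    """Join every n consecutive input lines into one line using paste.

    Each "-" operand tells paste to read one line from standard input,
    so n of them consume n lines per output row.
    """
    args = ["paste", "-d", sep] + ["-"] * n
    proc = subprocess.run(
        args,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout.splitlines()

# Hypothetical data: two rows per group, already in group order.
pivoted = pivot_with_paste(["p1", "p2", "p3", "p4"], n=2)
print(pivoted)
```

This only works when every group really has the same number of rows, which is the condition stated in the post.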
Transpose a row Question in Parallel jobs
I'm not sure whether you have a solution yet, but I had the same situation and handled it a different way. The job processed 400 million rows in less than 20 minutes.
This is how I did it in PX:
1) Sort the data on the key column and hash partition it on the same key column (using the Sort stage). This ensures that all rows for a given key value are processed by the same node.
2) In a Transformer stage, use stage variables to append the data to the previous row's value whenever the row has the same key as the previous row.
Input data:
key , data
A, p1
A, p2
B, p1
Output of the Transformer:
A, p1
A, p1|p2
B, p1
3) Using the Remove Duplicates stage, keep the last row for each key and discard the rest.
Output:
A, p1|p2
B, p1
Now you can handle this string the way you want.
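The three steps can be sketched outside DataStage as follows. This is only an emulation of the logic in plain Python, not DataStage code: the function names append_per_key and keep_last_per_key are invented for the illustration, and the stable sort stands in for the Sort stage (partitioning is ignored because everything runs in one process).

```python
def append_per_key(rows, sep="|"):
    """Emulate the Transformer step: keep a running concatenation per key.

    prev_key and acc play the role of stage variables; rows must already
    be sorted (or at least grouped) by key.
    """
    prev_key = object()  # sentinel that never equals a real key
    acc = ""
    out = []
    for key, data in rows:
        acc = data if key != prev_key else acc + sep + data
        prev_key = key
        out.append((key, acc))
    return out

def keep_last_per_key(rows):
    """Emulate Remove Duplicates retaining the last row of each key group."""
    out = []
    for row in rows:
        if out and out[-1][0] == row[0]:
            out[-1] = row  # a later row for the same key replaces the earlier one
        else:
            out.append(row)
    return out

# Sample input from the post; sorted() is stable, so p1 stays ahead of p2.
rows = sorted([("A", "p1"), ("A", "p2"), ("B", "p1")], key=lambda r: r[0])
running = append_per_key(rows)
result = keep_last_per_key(running)
print(running)  # [('A', 'p1'), ('A', 'p1|p2'), ('B', 'p1')]
print(result)   # [('A', 'p1|p2'), ('B', 'p1')]
```

The last row of each key group carries the full concatenation, which is why keeping the last row is enough.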
Hope this helps.