Transpose a row Question in Parallel jobs

Post by **djm** » Thu Jan 05, 2006 11:12 pm

To add my standard reply ...

If the number of rows to combine is consistent (e.g., there are always "n" rows in the data set that have to be vertically pivoted into one record), I'd lay money on the UNIX "paste" command doing this quickest. Try "man paste" for more details.
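For the fixed-group-size case, a common form is `paste -d'|' - - - < input.txt`. Each `-` reads the next line from standard input, so every three input lines become one output line joined by `|`. The Python sketch below only mimics that grouping to make the behaviour concrete. The file name, the `|` delimiter, the group size of 3 and the helper name `paste_every_n` are illustrative assumptions. Rather than reproducing how paste handles a short final group, the sketch refuses input whose line count is not a multiple of n.

```python
def paste_every_n(lines, n, delim="|"):
    """Join every n consecutive lines into one record, like `paste -d'|' - - -` for n=3.

    Hypothetical helper. Assumes len(lines) is an exact multiple of n
    (the "consistent row count" case); otherwise raises ValueError.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if len(lines) % n != 0:
        raise ValueError("line count is not a multiple of n")
    # Slice the input into consecutive chunks of n lines and join each chunk.
    return [delim.join(lines[i:i + n]) for i in range(0, len(lines), n)]


if __name__ == "__main__":
    rows = ["p1", "p2", "p3", "q1", "q2", "q3"]
    for record in paste_every_n(rows, 3):
        print(record)
    # prints:
    # p1|p2|p3
    # q1|q2|q3
```

The real `paste` streams its input, which is why it is fast on large files. The sketch holds everything in a list purely for readability.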

David

Post by **chulett** » Fri Jan 06, 2006 7:30 am

Mmmmm.... man paste.

Post by **ray.wurlod** » Fri Jan 06, 2006 4:14 pm

Did you ever issue the UNIX command "make love"?

Post by **bsreenu** » Wed Jan 18, 2006 10:17 am

I'm not sure whether you guys have a solution yet, but I had the same situation and solved it a different way. The job processed 400 million rows in less than 20 minutes.

This is how I did it in PX:

1) Sort the data on the key column and "Hash" partition it on the same key column (using a Sort stage). This ensures that all rows with the same key value are processed by the same node and, because the data is sorted, arrive one after another.
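Step 1 can be pictured with a small Python sketch. It is not DataStage code. `zlib.crc32` stands in for the engine's own hash function, and the function name `hash_partition_and_sort` and the node count are hypothetical. The sketch only shows the property the step relies on: equal keys always land in the same partition, and sorting each partition puts them next to each other.

```python
import zlib
from collections import defaultdict


def hash_partition_and_sort(rows, num_nodes):
    """Spread (key, data) rows over num_nodes partitions by hashing the key,
    then sort each partition on the key.

    zlib.crc32 is a stand-in for DataStage's own hash function (assumption).
    """
    partitions = defaultdict(list)
    for key, data in rows:
        # Equal keys give equal hashes, so they always go to the same node.
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        partitions[node].append((key, data))
    # Python's sort is stable: rows sharing a key keep their input order.
    return {node: sorted(part, key=lambda row: row[0])
            for node, part in partitions.items()}


if __name__ == "__main__":
    rows = [("A", "p1"), ("B", "p1"), ("A", "p2")]
    for node, part in sorted(hash_partition_and_sort(rows, 2).items()):
        print(node, part)
```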

2) In a Transformer stage, use stage variables to append each row's data to a running string while the rows belong to the same key value, and start a new string when the key changes.

Input data:
key, data
A, p1
A, p2
B, p1

Output of the Transformer:
A, p1
A, p1|p2
B, p1
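The stage-variable logic in step 2 can be sketched the same way for one sorted partition. The local variables `prev_key` and `acc` play the role of two stage variables (the names are hypothetical, as is the `|` delimiter). This illustrates the idea and is not a Transformer derivation.

```python
def accumulate_by_key(sorted_rows, delim="|"):
    """Emulate the stage-variable logic on one key-sorted partition.

    For every input row, emit (key, data accumulated so far for that key).
    prev_key and acc stand in for two stage variables (hypothetical names).
    """
    out = []
    prev_key = None
    acc = ""
    for key, data in sorted_rows:
        if key == prev_key:
            acc = acc + delim + data  # same key: append to the running string
        else:
            acc = data  # new key: restart the running string
        prev_key = key  # updated only after the comparison above
        out.append((key, acc))
    return out


if __name__ == "__main__":
    rows = [("A", "p1"), ("A", "p2"), ("B", "p1")]
    for key, acc in accumulate_by_key(rows):
        print(f"{key}, {acc}")
    # prints:
    # A, p1
    # A, p1|p2
    # B, p1
```

In the Transformer, stage variables are evaluated top to bottom for each row. The variable holding the running string therefore has to be derived before the one that saves the current key as the "previous" key. The sketch keeps that order by comparing against `prev_key` before overwriting it.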

3) Using a Remove Duplicates stage keyed on the same column, keep the last row of each key and remove the rest.

Output:

A, p1|p2
B, p1
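Step 3 keeps only the last row of each key group, because that row carries the complete string. Here is a minimal Python sketch of that behaviour. It assumes the input is already grouped by key, as the sort guarantees. `keep_last_per_key` is a hypothetical name; in the real job this is the Remove Duplicates stage with its option for which duplicate to retain set to Last.

```python
def keep_last_per_key(rows):
    """Keep only the last row of each run of equal keys.

    Assumes rows are grouped by key (sorted), so duplicates are adjacent.
    Hypothetical stand-in for a Remove Duplicates stage retaining the last row.
    """
    result = []
    for key, data in rows:
        if result and result[-1][0] == key:
            result[-1] = (key, data)  # a later row of the same key replaces the earlier one
        else:
            result.append((key, data))
    return result


if __name__ == "__main__":
    transformer_output = [("A", "p1"), ("A", "p1|p2"), ("B", "p1")]
    for key, data in keep_last_per_key(transformer_output):
        print(f"{key}, {data}")
    # prints:
    # A, p1|p2
    # B, p1
```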

Now you can process the concatenated string however you want.

Hope this helps.