Transpose a row Question in Parallel jobs


djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

To add my standard reply ...

If the number of rows in the data set is consistent (e.g. there are always "n" rows in the data set that have to be vertically pivoted), I'd lay money on the unix "paste" command doing this quickest. Try "man paste" for more details.
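For a fixed group size, `paste -d'|' - - -` joins every three lines of standard input into one line. As a rough illustration of that behaviour, here is a minimal Python sketch; the function name `pivot_fixed` and the sample values are made up for the example and are not part of any tool mentioned in this thread.

```python
from itertools import islice
from typing import Iterable, Iterator


def pivot_fixed(lines: Iterable[str], n: int, sep: str = "|") -> Iterator[str]:
    """Join every n consecutive lines into one output line.

    Hypothetical helper: roughly what `paste -d'|' - - -` does for n=3.
    """
    it = iter(lines)
    while True:
        # Take the next n lines; an empty chunk means the input is exhausted.
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield sep.join(chunk)


if __name__ == "__main__":
    rows = ["p1", "p2", "p3", "q1", "q2", "q3"]  # made-up sample data
    for line in pivot_fixed(rows, 3):
        print(line)
```

One difference to be aware of: if the last group is short, paste still emits the delimiters for the missing fields, whereas this sketch simply joins whatever lines are left.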

David
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Mmmmm.... man paste. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Did you ever issue the UNIX command "make love"? :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bsreenu
Participant
Posts: 22
Joined: Mon Aug 16, 2004 3:57 pm

Post by bsreenu »

I'm not sure whether you have a solution yet, but I had the same situation and handled it a different way. The job processed 400 million rows in less than 20 minutes.

This is how I did it in PX:

1) Sort the data on the key column and hash-partition it on the same key column (using the Sort stage). This ensures that all rows with the same key value are processed by the same node.

2) In a Transformer stage, using stage variables, append each row's data to a running string as long as the row has the same key as the previous row.

Input data:
key , data
A, p1
A, p2
B, p1

Output of the Transformer:
A, p1
A, p1|p2
B, p1

3) Using the Remove Duplicates stage, keep the last row for each key and discard the rest.

Output:

A, p1|p2
B, p1

Now you can handle this string the way you want.
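The three steps above can be sketched outside DataStage as plain Python, just to show the logic. This is not DataStage code: the function names `running_concat`, `keep_last` and `vertical_pivot` are invented for the illustration, the in-memory sort stands in for the Sort stage, and partitioning across nodes is not modelled at all.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (key, data)

_NO_KEY = object()  # sentinel meaning "no previous row yet"


def running_concat(rows: Iterable[Row], sep: str = "|") -> Iterator[Row]:
    """Step 2: mimic the stage variables by carrying a running string per key."""
    prev_key = _NO_KEY
    acc = ""
    for key, data in rows:
        # Same key as the previous row: append. New key: start over.
        acc = acc + sep + data if key == prev_key else data
        prev_key = key
        yield key, acc


def keep_last(rows: Iterable[Row]) -> Iterator[Row]:
    """Step 3: keep only the last row of each run of equal keys."""
    pending = None
    for row in rows:
        if pending is not None and pending[0] != row[0]:
            yield pending
        pending = row
    if pending is not None:
        yield pending


def vertical_pivot(rows: Iterable[Row], sep: str = "|") -> List[Row]:
    # Step 1: sort on the key. sorted() is stable, so rows keep their
    # original order within each key.
    ordered = sorted(rows, key=lambda r: r[0])
    return list(keep_last(running_concat(ordered, sep)))


if __name__ == "__main__":
    sample = [("A", "p1"), ("A", "p2"), ("B", "p1")]
    for key, data in vertical_pivot(sample):
        print(f"{key}, {data}")
```

Because the running string only looks at the previous row, the approach depends on all rows for a key being adjacent, which is what the sort and hash partitioning in step 1 are for.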

Hope this helps.