Vertical Pivot Dynamic Columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dslisa
Participant
Posts: 24
Joined: Mon Oct 27, 2008 6:48 pm

Vertical Pivot Dynamic Columns

Post by dslisa »

Hi everyone,
I need some help to apply vertical pivot logic for dynamic data. I have achieved the vertical pivot and the data output is correct but now the business rule has changed and I will not have constant number of columns. Below is my job design.

Input Data

Key_Col Data_Col
1 name1
1 name2
1 name3
2 name1
2 name2

Seq. File-------->sort stage--------------->transformer------------->column import stage------------>Remove Duplicate------------>Output seq. file.

Output Data
1 name1 name2 name3
2 name1 name2

Transformer stage is using stage variable to concatenate the data and I am using Column Import Stage to split the data coming in a single row to multiple columns based on ",".
I have searched the forum but could not find a concrete resolution for changing number of columns.
Any idea regarding this is appreciated.
mail2hfz
Premium Member
Premium Member
Posts: 92
Joined: Thu Nov 16, 2006 8:51 am

Post by mail2hfz »

This has been discussed many times before. Do a search on the forum for "Vertical pivot".
dslisa
Participant
Posts: 24
Joined: Mon Oct 27, 2008 6:48 pm

Post by dslisa »

True, the vertical pivot has been discussed many times before. But nothing concrete for vertical pivot for changing number of input. Has anybody implemented this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In parallel jobs you need to configure for the maximum possible number of columns, assigning null to each unused column and subsequently filtering out the nulls. Or you might consider using a server job and the multi-value handling capabilities of a UniVerse stage to achieve a vertical pivot with an arbitrary number of columns.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dslisa
Participant
Posts: 24
Joined: Mon Oct 27, 2008 6:48 pm

Post by dslisa »

Thanks Ray.
I am not supposed to use server jobs. So let us say the max number of columns I can have is 500. After the transformer stage in my job design my output would be like follows.

1 name1 name2...........name500.

But for this month let say I only have 200 columns. How to pad the rest 300 with null?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or just a Server job with a hashed file being read from and written to by the same transformer. Easy Peasy.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Make the columns nullable and ensure that the default value is null.

Push back on the arbitrary restriction on your effectiveness. Server jobs are perfectly valid.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dslisa
Participant
Posts: 24
Joined: Mon Oct 27, 2008 6:48 pm

Post by dslisa »

Hi Ray,
I assume from your previous post that I have to read the data as a long fixed length data. Each column is 100 characters long so for 500 columns it will be 500*100. Now the problem is lets say after 200 columns my data is padded with null. Even then after every 100 character column import stage needs to find a ",". How can I achieve this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Nothing I posted implies fixed width.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dslisa
Participant
Posts: 24
Joined: Mon Oct 27, 2008 6:48 pm

Post by dslisa »

Sorry about the confusion.
Let me explain my plan of action.
I thought as I already have the output as

1,name1,name2,name3
2,name1,name2
3,name1,name2,name3,name4

I could read each row as one long fixed width and pad the remaining character with null.
Say for example after name3 for 1st row, after name2 for 2nd row and so on.
Am I headed towards the right direction or towards the ditch?
Post Reply