Dynamically pivot columns & data to become rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Configure the maximum possible (in the data) number of columns and allow them to be NULL if not populated. Run these rows through a Pivot stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ronnie
Premium Member
Premium Member
Posts: 18
Joined: Thu Feb 14, 2008 9:13 pm
Location: Herndon, Va

Post by Ronnie »

Though, I'm still in the lost of how actual column names and its data become rows of data within this stage. It sounds like the pivot stage alone is not going to handle this requirement.

Thanks for responding Ray.
Ronnie B
OddJob
Participant
Posts: 163
Joined: Tue Feb 28, 2006 5:00 am
Location: Sheffield, UK

Post by OddJob »

The following may not be very performant as it uses a Peek stage...

Pass your records into a Peek stage, and select the following properties:
All records=True
Peek Records Output Mode=Output
Show Column Names=True
Delimiter String=|

This will output records like:
Col1:Val1|Col2:Val2|Col3:Val3 etc i.e. each record input to the peek will be turned into a string containing delimited name/value pairs.

To convert each name/value pair into a record, send the Peek output into an External Filter stage, with a command line of:
tr '|' '\n'

This will convert each of the pairs into new records i.e. for every single record coming out of the Peek stage you'll now have (Number of Columns) records. Each record will look like
Col1:Val1
Col2:Val2
...
Coln:Valn

Now use a transformer to split each of the new records into the column name and value. Give the transformer 2 outputs OCOL1 and OCOL2.
For OCOL1 use a derivation of Field(Link.InCol, ':', 1, 1).
For OCOL2 use a derivation of Field(Link.InCol, ':', 2, 1).

Hope this helps!
Ronnie
Premium Member
Premium Member
Posts: 18
Joined: Thu Feb 14, 2008 9:13 pm
Location: Herndon, Va

Post by Ronnie »

Thanks OddJob.

I'll look it over and give it a try. I'll let you know the turnout...may take a few days though. Appreciate it!
Ronnie B
Ronnie
Premium Member
Premium Member
Posts: 18
Joined: Thu Feb 14, 2008 9:13 pm
Location: Herndon, Va

Post by Ronnie »

OddJob,

I started coding your recommendation and hit a few stumbling blocks. How do I declare the output columns in the Output tabs within the Peek & External Filter stages?
- Layout after Peek stage is "Col1:Val1|Col2:Val2 etc". Would I declare one long column to hold "Col1:Val1|Col2:Val2 etc" as Varchar? If so, the column would be holding both character (Col*) and numeric fields (Val*).
- Layout after External Filter stage is
Col1:Val1
Col2:Val2
etc.
How would the output columns be declared here?

Again your help would be much appreciated.
Thanks
Ronnie B
OddJob
Participant
Posts: 163
Joined: Tue Feb 28, 2006 5:00 am
Location: Sheffield, UK

Post by OddJob »

The Peek stage will define its own output format, as follows:
Col Name: rec
SQL Type: VarChar
Nullable: No

This is an unbound character field i.e. no length is given. The peek doesn't know how long the output will be. This isn't the most efficient way of doing things, but we have no choice in this instance.

In my test job, the External Filter takes this single column and passes to the command line specified. The output is again called rec (but you could change the name) and the type is VarChar, again no length specified.

The point about the peek is that it will take any input type and convert it to VarChar - it's doing all the implicit conversions for you i.e. Double to Varchar, Int to VarChar etc.
Peek is really just for reporting a long string of names/values, but in this case we're going to split it up and use to our benefit.

In the transformer I would start to add some more reasonable type information that applies to your output columns OCOL1 and OCOL2. You may want to truncate the outputs to fit to VarChar(256) for example.
It depends upon the types you need for your output.
Post Reply