Pivoting Dynamic number of columns

dsuser7 · Post by **dsuser7** » Wed Mar 09, 2011 10:31 am

Hi,

I'm trying to obtain the output below. The input sample below comes in a CSV file.
I'm using V8.1.

Input

col1,col2,col3,col4,col3,col4
a,b,x1,y1,x2,y2
q,w,r,s

Output
a,b,x1,y1
a,b,x2,y2
q,w,r,s

This is a case of pivoting, but the col3,col4 pair may repeat any number of times, and each repeat uses the same column names (col3 and col4).

If the Pivot stage is used with a defined maximum number of columns (which would be a guess), the Sequential File stage that reads the .csv file will fail on any row that has fewer columns than that maximum.

Any ideas on how to achieve this?
Thanks.

jwiles · Post by **jwiles** » Wed Mar 09, 2011 10:38 am

I think similar requirements have been discussed here just recently, but:

Read the record as a single varchar column
(8.5) Use a transformer with looping to build the pivoted output records
(pre-8.5) Use a transformer with multiple output links to build the pivoted records and funnel them together, or use a buildop/custom operator

Any version: Use awk/perl/C/C++ with an external source stage/SeqFile filter/beforeJob ExecSH to do the pivoting outside of DS.
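As a rough illustration of the "pivot outside of DS" option, here is a minimal sketch in Python rather than awk/perl/C/C++. It assumes the first two fields (col1, col2) are keys, that everything after them comes in pairs, and that no field contains a quoted comma; the names `pivot_line`, `pivot_stream`, `KEY_COLS` and `GROUP_SIZE` are made up for this example.

```python
import io

KEY_COLS = 2    # assumption: col1 and col2 are repeated on every output row
GROUP_SIZE = 2  # assumption: each repeating group is one col3,col4 pair


def pivot_line(line, key_cols=KEY_COLS, group_size=GROUP_SIZE):
    """Turn one comma-separated line into one output line per repeating group."""
    fields = line.rstrip("\r\n").split(",")
    keys, rest = fields[:key_cols], fields[key_cols:]
    if not rest:
        # No repeating groups at all: pass the key columns through unchanged.
        return [",".join(keys)]
    return [
        ",".join(keys + rest[i:i + group_size])
        for i in range(0, len(rest), group_size)
    ]


def pivot_stream(lines, skip_header=True):
    """Yield pivoted lines, skipping the header row and blank lines."""
    for n, line in enumerate(lines):
        if skip_header and n == 0:
            continue
        if not line.strip():
            continue
        yield from pivot_line(line)


# Demo on the sample input from the first post.
sample = io.StringIO("col1,col2,col3,col4,col3,col4\na,b,x1,y1,x2,y2\nq,w,r,s\n")
for row in pivot_stream(sample):
    print(row)
```

On the sample this prints a,b,x1,y1 then a,b,x2,y2 then q,w,r,s. To use it as a filter, `pivot_stream` could be fed `sys.stdin` instead of the in-memory sample.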

Regards,

dsuser7 · Post by **dsuser7** » Wed Mar 09, 2011 10:47 am

Thanks for the reply JWiles.
I have tried using BuildOps but couldn't get very far, as I didn't know what needs to go in the Pre-Loop, Per-Record and Post-Loop sections.
I have read the material that comes with the product but couldn't figure out how to do it.

jwiles · Post by **jwiles** » Wed Mar 09, 2011 11:02 am

The BuildOp interface essentially presents a framework/IDE for coding C++ logic within DS Designer. The three sections--Pre-Loop, Per-Record and Post-Loop--would contain logic to be performed during job initialization, processing records and end-of-job respectively.
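To make the three sections a little more concrete, here is a hedged sketch of the control flow only. It is written in Python, not BuildOp C++, and does not use any DataStage API; the class `PivotOperator`, its methods and the driver `run` are hypothetical names chosen to mirror Pre-Loop, Per-Record and Post-Loop.

```python
class PivotOperator:
    """Hypothetical stand-in for the three BuildOp sections (not the DataStage API)."""

    def pre_loop(self):
        # Runs once, before any record is read: initialise state.
        self.records_in = 0
        self.records_out = 0

    def per_record(self, record):
        # Runs once per input record and may produce several output records.
        # Assumes two key fields followed by repeating pairs, as in the sample.
        self.records_in += 1
        fields = record.split(",")
        keys, rest = fields[:2], fields[2:]
        out = [",".join(keys + rest[i:i + 2]) for i in range(0, len(rest), 2)]
        self.records_out += len(out)
        return out

    def post_loop(self):
        # Runs once, after the last record: report or clean up.
        return {"in": self.records_in, "out": self.records_out}


def run(operator, records):
    """Drive the three phases in the order described above."""
    operator.pre_loop()
    output = []
    for record in records:
        output.extend(operator.per_record(record))
    return output, operator.post_loop()


rows, stats = run(PivotOperator(), ["a,b,x1,y1,x2,y2", "q,w,r,s"])
```

Here `rows` holds the three pivoted records and `stats` the in/out counts. In a real BuildOp the per-record logic would be C++ working on the input and output port columns instead of strings.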

There is far more to it than is appropriate to go into on the forum, though.

Regards,

ray.wurlod · Post by **ray.wurlod** » Wed Mar 09, 2011 3:23 pm

Attend the IBM Advanced DataStage class. You will learn about, and do, construction of a working Build stage.

ray.wurlod · Post by **ray.wurlod** » Wed Mar 09, 2011 3:32 pm

This is more easily accomplished with a server job, in which the Sequential File stage has a "missing column" rule.
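The general idea can be sketched outside DataStage as follows. This is not the server Sequential File stage itself; it assumes the rule effectively fills absent trailing columns with an empty value, so that short rows can be read against a fixed (guessed) maximum width and the empty groups dropped after pivoting. `MAX_WIDTH`, `pad_row` and `pivot_fixed` are hypothetical names.

```python
MAX_WIDTH = 6  # assumption: a guessed maximum number of columns per row


def pad_row(fields, width=MAX_WIDTH, filler=""):
    """Append filler values so every row has at least `width` fields."""
    return fields + [filler] * (width - len(fields))


def pivot_fixed(fields, key_cols=2, group_size=2, filler=""):
    """Pivot a padded row, skipping groups that consist only of filler."""
    keys, rest = fields[:key_cols], fields[key_cols:]
    rows = []
    for i in range(0, len(rest), group_size):
        group = rest[i:i + group_size]
        if all(value == filler for value in group):
            continue  # this group was padding, not data
        rows.append(keys + group)
    return rows


short_row = pad_row("q,w,r,s".split(","))
long_row = pad_row("a,b,x1,y1,x2,y2".split(","))
pivoted = pivot_fixed(short_row) + pivot_fixed(long_row)
```

The guess for the maximum width still has to be large enough for the widest row, which is the limitation raised in the original question.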