Pivot stage question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Karine
Participant
Posts: 37
Joined: Sun Feb 18, 2007 3:33 am

Pivot stage question

Post by Karine »

I have a pipe-delimited file whose columns I need to pivot into rows:

e.g.
key| col1 | col2| col3 | col4| col5
1 |A |B | | |
2 |C | | | |

I want the result to be:

key | col
1 | A
1 | B
2 | C

But I'm getting the following from my pivot stage:

key | col
1 | A
1 | B
1 |
1 |
1 |
2 | C
2 |
2 |
2 |
2 |

Is there any setting in the Pivot stage that would prevent the null values from being output?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, you need to filter them out post-pivot via a constraint or some other mechanism.
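For example, a Transformer stage placed right after the Pivot with an output link constraint along these lines (link and column names are only placeholders) would keep just the populated rows:

Len(Trim(ToTarget.col)) > 0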
-craig

"You can never have too many knives" -- Logan Nine Fingers
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Before the pivot, do a conversion of '||' to '' (a routine is suggested; do this for the whole file, repeating until there are no more '||').
Joshy George
http://www.linkedin.com/in/joshygeorge1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Won't work. The missing columns will still be treated as empty or null and pivoted, as long as five input columns are defined. It has to be post-processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

How about a routine?

$INCLUDE DSINCLUDE JOBCONTROL.H

* Build the full path of the file to be converted
File_Name = Trim(TempFileDir) : Trim(FileName) : Trim(Ext)
Ans = 'Error'
RecVar = ''

* Open the source file for sequential reading
OpenSeq File_Name To FVar_Keys Else
   Call DSLogWarn("Unable to open text file ":File_Name, "FileOpenError")
   ErrorCode = -1   ;* stop the routine
End

* Read every line; field 1 is the key, fields 2..n hold the values
Loop
   ReadSeq FileLine From FVar_Keys
   On Error
      Call DSLogFatal("Error reading from ":File_Name:" FILE_Read_STATUS=":Status(), "FileReadError")
      ErrorCode = -1   ;* stop the routine
   End
   Then
      * Line read successfully - keep one key|value record per populated column
      KeyVal = Trim(Field(FileLine, '|', 1))
      Cnt = Count(FileLine, '|')   ;* number of delimiters = number of value columns
      Q = 2
      Loop While Cnt <> 0
         TempString = Trim(Field(FileLine, '|', Q))
         If Not(TempString = '' Or IsNull(TempString)) Then
            RecVar = RecVar : KeyVal : '|' : TempString : Char(10)
         End
         Q = Q + 1
         Cnt = Cnt - 1
      Repeat
   End Else
      Exit   ;* end of file reached
   End
Repeat
CloseSeq FVar_Keys

* Reopen the file, truncate it, and write the pivoted records back
OpenSeq File_Name To FVar_Keys Then WeofSeq FVar_Keys
WriteBlk RecVar To FVar_Keys Else
   Call DSLogWarn("Unable to write to text file ":File_Name, "FileOpenError")
   ErrorCode = -1   ;* stop the routine
End
Call DSLogInfo("Successful conversion", "FILE_CONVERSION_SUCCESS")
Ans = 'Pivot Success'
CloseSeq FVar_Keys
Joshy George
http://www.linkedin.com/in/joshygeorge1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's a bottleneck - no parallelism. You would need to call a server routine from a parallel job; not impossible, but a bottleneck and a performance inhibitor.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Larry.Griffith
Premium Member
Posts: 7
Joined: Wed Nov 02, 2005 12:19 pm

Pivoting

Post by Larry.Griffith »

I am a newbie at this and this is my first post, so please bear with me.

I do a tremendous amount of pivoting in the work I do, and have not been very impressed with the pivot stage. I have 2000+ input formats that I have to standardize into one of a few standard formats. These input formats range from 3 or 4 columns to several thousand, and the output formats are just as variable. I decided to take a metadata driven approach and have stored my input and output formats in Oracle tables.

The approach I used was to write two parallel plugins: one to pivot vertically, the other to pivot horizontally. I wrote them in C, and they accept parameters for the delimiter, quotes, etc. They are very generic, and I have been able to use them in almost everything I do.

1) Read line and parse and pivot vertically, write out with key, column number, and value (if not null)
2) Oracle lookup by input format and field number, replacing number with a sysname, and getting data format (number, value, string, date, etc)
3) Oracle lookup by output format and fieldname to get new field number, data type
4) pivot horizontally using key and field_number

It seems to work very well, and I was able to preserve performance by making the plugins parallel.
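Purely to illustrate the record format coming out of step 1 (this is not the actual C plugin code - just a rough awk sketch, assuming pipe-delimited input with the key in field 1):

awk -F'|' '{
    for (i = 2; i <= NF; i++)              # one output row per populated column
        if ($i != "")
            print $1 "|" (i - 1) "|" $i    # key | field number | value
}' infile > vertical_pivot.txt

Each output row then carries the field number that drives the Oracle lookup in step 2.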

Hope this helps. DSXchange has helped me on a number of occasions, and I would like to try and return the favor.
Thanks
Larry
kommven
Charter Member
Posts: 125
Joined: Mon Jul 12, 2004 12:37 pm

Post by kommven »

I came across the same situation and solved it in a Transformer using the constraint Len(Trim(somelink.somecolumn)) = 0.

An IsNull() check alone gave me no gain, but the logic above worked fine.

It works and is simple.

Since the Transformer is parallel, you still keep the performance.
chulett wrote: No, you need to filter them out post-pivot via a constraint or some other mechanism.
subhashini
Participant
Posts: 8
Joined: Thu Nov 23, 2006 8:43 am

Post by subhashini »

I am also facing a similar problem.

I have to pivot columns dynamically.

For example

1|a|b|c
2|d|e
3|f|g|h|i|j|k|l etc.

i.e., sometimes I have to pivot only 3 columns, as in the first line of the example, sometimes 2, sometimes n. This has to happen dynamically.

I am able to do it when I assume the values can be a maximum of some 30, but I am not able to do it dynamically.

Seeking help...

Thanks in advance

Subhashini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

So pivot 30 columns then filter out those whose length is 0 or which contain null.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
subhashini
Participant
Posts: 8
Joined: Thu Nov 23, 2006 8:43 am

Post by subhashini »

But my problem is that I won't always get 30 columns. I may get 30/50/100/2000 columns, which is what I mean by dynamic.

The number of columns to pivot is not static; it always differs. I can't set some maximum limit either, because the number of columns keeps changing.

Waiting for suggestions.


Thanks in advance.
Subhashini.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

subhashini wrote: max of some 30
That's why I responded as I did.
Configure as many columns as your max.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Can you live with an awk script to get this done? It's an awesome workaround for dynamic variables, such as dynamic columns. You are sitting on the parallel engine on windoze, so I presume you have MKS Toolkit installed and can run unix scripts, yes?
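For example (only a rough sketch, not a tested script - assuming the key sits in the first pipe-delimited field and every non-empty field after it should become its own key|value row):

awk -F'|' '{
    key = $1
    gsub(/^[ \t]+|[ \t]+$/, "", key)            # trim blanks around the key
    for (i = 2; i <= NF; i++) {
        val = $i
        gsub(/^[ \t]+|[ \t]+$/, "", val)        # trim blanks around the value
        if (val != "")                          # drop the empty columns
            print key "|" val
    }
}' input.txt > pivoted.txt

Because the loop runs to NF, it does not care how many columns each line has, which is what makes it work for the 30/50/100/2000 column case.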
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Re: Pivoting

Post by vijayrc »

Larry.Griffith wrote: I do a tremendous amount of pivoting in the work I do, and have not been very impressed with the pivot stage. ...
Larry,
Thanks for the post. We too have a lot of pivot functionality in our application, and since the Pivot stage performs poorly, we wrote our own BuildOps for each of these [though we haven't done what you did - come up with a generic pivot and customize it for different metadata].
We face more horizontal pivots and some vertical pivots.
Would you mind sharing the C code for it? It would make our lives much simpler. Don't bother if that's too much to ask.
Thanks