Pivot stage question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Karine
Participant
Posts: 37
Joined: Sun Feb 18, 2007 3:33 am

Pivot stage question

Post by Karine »

I have a pipe-delimited file whose columns I need to pivot into rows:

e.g.
key| col1 | col2| col3 | col4| col5
1 |A |B | | |
2 |C | | | |

I want the result to be:

key | col
1 | A
1 | B
2 | C

But I'm getting the following from my pivot stage:

key | col
1 | A
1 | B
1 |
1 |
1 |
2 | C
2 |
2 |
2 |
2 |

Is there any setting in the Pivot stage that would prevent the null values from being output?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, you need to filter them out post-pivot via a constraint or some other mechanism.
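For example, a Transformer stage placed right after the Pivot with an output link constraint along these lines (link and column names are only placeholders) would keep just the populated rows:

Len(Trim(ToTarget.col)) > 0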
-craig

"You can never have too many knives" -- Logan Nine Fingers
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Before the pivot, do a conversion of '||' to '' (a routine is suggested; do this for the whole file, repeating until there are no more '||').
Joshy George
http://www.linkedin.com/in/joshygeorge1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Won't work. The missing columns will still be treated as empty or null and pivoted, as long as five input columns are defined. It has to be post-processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

How about a routine?

$INCLUDE DSINCLUDE JOBCONTROL.H

* Build the full path of the file to be converted
File_Name = Trim(TempFileDir) : Trim(FileName) : Trim(Ext)
Ans = 'Error'
RecVar = ''

* Open the source file for sequential reading
OpenSeq File_Name To FVar_Keys Else
   Call DSLogWarn("Unable to open text file ":File_Name, "FileOpenError")
   ErrorCode = -1   ;* stop the routine
End

* Read every line; field 1 is the key, fields 2..n hold the values
Loop
   ReadSeq FileLine From FVar_Keys
   On Error
      Call DSLogFatal("Error reading from ":File_Name:" FILE_Read_STATUS=":Status(), "FileReadError")
      ErrorCode = -1   ;* stop the routine
   End
   Then
      * Line read successfully - keep one key|value record per populated column
      KeyVal = Trim(Field(FileLine, '|', 1))
      Cnt = Count(FileLine, '|')   ;* number of delimiters = number of value columns
      Q = 2
      Loop While Cnt <> 0
         TempString = Trim(Field(FileLine, '|', Q))
         If Not(TempString = '' Or IsNull(TempString)) Then
            RecVar = RecVar : KeyVal : '|' : TempString : Char(10)
         End
         Q = Q + 1
         Cnt = Cnt - 1
      Repeat
   End Else
      Exit   ;* end of file reached
   End
Repeat
CloseSeq FVar_Keys

* Reopen the file, truncate it, and write the pivoted records back
OpenSeq File_Name To FVar_Keys Then WeofSeq FVar_Keys
WriteBlk RecVar To FVar_Keys Else
   Call DSLogWarn("Unable to write to text file ":File_Name, "FileOpenError")
   ErrorCode = -1   ;* stop the routine
End
Call DSLogInfo("Successful conversion", "FILE_CONVERSION_SUCCESS")
Ans = 'Pivot Success'
CloseSeq FVar_Keys
Joshy George
http://www.linkedin.com/in/joshygeorge1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

That's a bottleneck - no parallelism. You would need to call a server routine from a parallel job; not impossible, but a bottleneck and a performance inhibitor.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Larry.Griffith
Premium Member
Posts: 7
Joined: Wed Nov 02, 2005 12:19 pm

Pivoting

Post by Larry.Griffith »

I am a newbie at this and this is my first post, so please bear with me.

I do a tremendous amount of pivoting in the work I do, and have not been very impressed with the pivot stage. I have 2000+ input formats that I have to standardize into one of a few standard formats. These input formats range from 3 or 4 columns to several thousand, and the output formats are just as variable. I decided to take a metadata driven approach and have stored my input and output formats in Oracle tables.

The approach I used was to write two parallel plugins: one to pivot vertically, the other to pivot horizontally. I wrote them in C, and they accept parameters for the delimiter, quotes, etc. They are very generic, and I have been able to use them in almost everything I do.

1) Read line and parse and pivot vertically, write out with key, column number, and value (if not null)
2) Oracle lookup by input format and field number, replacing number with a sysname, and getting data format (number, value, string, date, etc)
3) Oracle lookup by output format and fieldname to get new field number, data type
4) pivot horizontally using key and field_number

It seems to work very well, and I was able to preserve performance by making the plugins parallel.
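Purely to illustrate the record format coming out of step 1 (this is not the actual C plugin code - just a rough awk sketch, assuming pipe-delimited input with the key in field 1):

awk -F'|' '{
    for (i = 2; i <= NF; i++)              # one output row per populated column
        if ($i != "")
            print $1 "|" (i - 1) "|" $i    # key | field number | value
}' infile > vertical_pivot.txt

Each output row then carries the field number that drives the Oracle lookup in step 2.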

Hope this helps. DSXchange has helped me on a number of occasions, and I would like to try and return the favor.
Thanks
Larry
kommven
Charter Member
Posts: 125
Joined: Mon Jul 12, 2004 12:37 pm

Post by kommven »

I came across the same situation and solved it in a Transformer using the constraint Len(Trim(somelink.somecolumn)) = 0.

An IsNull() check alone gave me no gain, but the logic above worked fine.

It works and is simple.

Since the Transformer is parallel, you still keep the performance.
chulett wrote: No, you need to filter them out post-pivot via a constraint or some other mechanism.
subhashini
Participant
Posts: 8
Joined: Thu Nov 23, 2006 8:43 am

Post by subhashini »

I am also facing a similar problem.

I have to pivot columns dynamically.

For example

1|a|b|c
2|d|e
3|f|g|h|i|j|k|l etc.

i.e., sometimes I have to pivot only 3 columns, as in the first line of the example, sometimes 2, sometimes n. This has to happen dynamically.

I am able to do it when I assume the values can be a maximum of some 30, but I am not able to do it dynamically.

Seeking help...

Thanks in advance

Subhashini
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

So pivot 30 columns then filter out those whose length is 0 or which contain null.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
subhashini
Participant
Posts: 8
Joined: Thu Nov 23, 2006 8:43 am

Post by subhashini »

But my problem is that I won't always get 30 columns. I may get 30/50/100/2000 columns, which is what I mean by dynamic.

The number of columns to pivot is not static; it always differs. I can't set some maximum limit either, because the number of columns keeps changing.

Waiting for suggestions.


Thanks in advance.
Subhashini.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

subhashini wrote: max of some 30
That's why I responded as I did.
Configure as many columns as your max.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Can you live with an awk script to get this done? It's an awesome workaround for dynamic variables, such as dynamic columns. You are sitting on the parallel engine on windoze, so I presume you have MKS Toolkit installed and can run unix scripts, yes?
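For example (only a rough sketch, not a tested script - assuming the key sits in the first pipe-delimited field and every non-empty field after it should become its own key|value row):

awk -F'|' '{
    key = $1
    gsub(/^[ \t]+|[ \t]+$/, "", key)            # trim blanks around the key
    for (i = 2; i <= NF; i++) {
        val = $i
        gsub(/^[ \t]+|[ \t]+$/, "", val)        # trim blanks around the value
        if (val != "")                          # drop the empty columns
            print key "|" val
    }
}' input.txt > pivoted.txt

Because the loop runs to NF, it does not care how many columns each line has, which is what makes it work for the 30/50/100/2000 column case.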
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Re: Pivoting

Post by vijayrc »

Larry.Griffith wrote: I do a tremendous amount of pivoting in the work I do, and have not been very impressed with the pivot stage. ...
Larry,
Thanks for the post. We too have a lot of pivot functionality in our application, and since the Pivot stage performs poorly, we wrote our own BuildOps for each of these [though we haven't done what you did - come up with a generic pivot and customize it for different metadata].
We face more horizontal pivots and some vertical pivots.
Would you mind sharing the C code for it? It would make our lives much simpler. Don't bother if that's too much to ask.
Thanks