Page 1 of 1

vertical pivot for variable number of rows and columns

Posted: Sat Jun 18, 2011 8:14 pm
by mctny
Hi

I have an interesting requirement and I would like to find a systematic solution for it. I would appreciate if you could give me some hints to implement the logic. ( I searched the forum but could not find a good answer)

input file is a sequential file with the format below

zzzzz a b c ...... n
date1 a1 b1 c1 ......n1
date2 a2 b2 c2 ......n2
date3 a3 b3 c3 ......n3
................................
................................
dateX ax bx cx ......nx


I am reading the file as one record as the number of columns is unknown
Number of rows is also unknown as you can guess. in fact the file is comma delimited.

My desired output format will be in a data set of 2 columns
The "zzzz" is garbage and will be thrown out. All the "a" values will be concatenated as well as the "b" and "c" values etc. the date column is same for every row but it is again concatenated with the all the dates in the order of incoming rows.

value ........................ date
--------------------- --------------------------------
a a1 a2 a3 .... an date1 date2 date3 .....datex
b b1 b2 b3 .... bn date1 date2 date3 .....datex
c c1 c2 c3 ...... cn date1 date2 date3 .....datex
......................... ........................................
......................... ........................................
n n1 n2 n3 .....nx date1 date2 date3 .....datex


I am able to handle the date part easily, the value part requires vertical pivot with unknown number of columns which I think is not possible.

Posted: Sat Jun 18, 2011 11:44 pm
by chulett
Actual number unknown perhaps but do you know the maximum possible number of columns? If so, you pivot that many and then filter out any 'empty' rows post-pivot.

Posted: Sun Jun 19, 2011 12:04 am
by mctny
Thanks Craig for the reply, That's the thing, as of now I don't have a safe number that I can assume as the maximum number of columns. ( I can go back to the business Analysts and ask if they can give me such a number)

is it possible to do in a transformer with a loop where the loop limit is the number of delimiter in the first row?

Posted: Sun Jun 19, 2011 12:07 am
by ray.wurlod
Yes. Stage variables are evaluated before loop variables, so you could count the delimiters into a stage variable then build your loop exit condition based on @ITERATION being less than that value.

Posted: Sun Jun 19, 2011 12:27 am
by mctny
Thanks Ray, it still does not help me because in that case, I have to create dynamic number of stage variables (equal to the number of delimiters which will be different for each run). if possible then the logic will be simple, when rows comes, I will keep concatenating the incoming value of the stage variable.

is such a thing possible? can I create n number of stage variables? if not possible, is there any other approach?

Posted: Sun Jun 19, 2011 6:26 am
by ray.wurlod
Not really. Within the loop, use the Field() function to extract the nth field where n = @ITERATION (the current value of the loop counter).

Code: Select all

svDelimitedFieldCount = DCount(InLink.DelimitedField, ",")

Loop 
   While @ITERATION <= svDelimitedFieldCount

lvNthField = Field(InLink.DelimitedField, ",", @ITERATION, 1)

Posted: Sun Jun 19, 2011 7:21 am
by chulett
Yup, the "looping" functionality in 8.5 will greatly simplify this problem... should have noticed that you had that version. :(

Posted: Sun Jun 19, 2011 10:11 am
by mctny
Thanks to you both, Yes I can do that and it will help me extract all the elements of the matrix with the order a,b, c, ..n, a1, b1, c1,... n1,a2,b2,c2,...cn... but I am not sure it will help me to achieve my desired output, it will only help me to concatenate ai, bi, ci ... ni ( where i is 1,2 .. n), however I want to concatenate a1, a2, a3...an. if I have to rephrase I need to skip n elements to append all a's, b's etc.

Yes stage variable is available in loop but note that loop variable is not available in a stage variable derivation. and I can not save the loop variable ( lvNthField) across the incoming rows in n different loop variables