Converting a dynamic array into multiple rows
Posted: Tue May 12, 2009 8:02 pm
Hello Gurus,
(I have gone through the forums and read all about vertical pivots before asking this question. I have also perused the manuals - BASIC reference and Parallel Job Developers Guide)
The requirement is as follows
Input
[code]
Col1 Col2 Col3
A B 1,2,3
D E 4,5,6,7,8
[/code]
The field delimiter is pipe character
Output
[code]
Col1 Col2 Col3
A B 1
A B 2
A B 3
D E 4
D E 5
D E 6
D E 7
D E 8
[/code]
The field delimiter is pipe character
I followed one of Ray's post to use Splice and Reuse UVerse functions (very elegant for the task at end) and ended up with the following
[code]
Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),@VM,Convert(",",@FM,DSLink1.Col3))
[/code]
Please note that I have two key columns and hence I had to concatenate both when using Reuse function.
The result is as follows
[code]
A|B@VM1@FMA|B@VM2@FMA|B@VM3@FMD|E@VM4@FMD|E@VM5@FMD|E@VM6@FMD|E@VM7@FMD|E@VM8@FM
[/code]
I was able to convert the above value into multiple records by replacing @VM and @FM in the function with the "|" character and Char(10) respectively.
[code]
EReplace(Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),"|",Convert("|",@FM,DSLink1.Col3)),@FM,Char(10))
[/code]
This yields results as expected when I write the output to a text file.
However, I only have one column for the entire output data. Although the output file looks "clean" I am not able to use the value as such, when I pass the output to the next stage in the SAME job. Essentially there is only one column whereas I need three columns. Also, in spite of the Char(10) introduced, DataStage thinks that there is only one row because if I use
[code]
DCount(EReplace(Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),"|",Convert("|",@FM,DSLink1.Col3)),@FM,Char(10)),"|")
[/code]
the return value is the total number of delimiter ("|") used in the output data. i.e. the return value for DCount is not
2
2
2
....
rather the value of DCount is 14
Hence I decided to use @VM and @FM at least so that I have an array to start playing with.
Now I need to convert the array into multiple rows and multiple columns such that I could define the columns within the same job in DataStage. The goal is to provide the output to a Join stage within the same job.
I looked into the guides and tried my hands on almost everything that I thought could be a fit, but in vain. For example, I tried to use a Split Vector, but the attempt failed because the input was not a vector. I could not use Make Vector because there is really no vector to create. In my mind, I had already created the vector with the array.
I could use EReplace and with the pipe character and Char(10), I could write this output to a data set or a flat file and use it in another job, but this would be ineffecient.
Many thanks in advance,
(I have gone through the forums and read all about vertical pivots before asking this question. I have also perused the manuals - BASIC reference and Parallel Job Developers Guide)
The requirement is as follows
Input
[code]
Col1 Col2 Col3
A B 1,2,3
D E 4,5,6,7,8
[/code]
The field delimiter is pipe character
Output
[code]
Col1 Col2 Col3
A B 1
A B 2
A B 3
D E 4
D E 5
D E 6
D E 7
D E 8
[/code]
The field delimiter is pipe character
I followed one of Ray's post to use Splice and Reuse UVerse functions (very elegant for the task at end) and ended up with the following
[code]
Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),@VM,Convert(",",@FM,DSLink1.Col3))
[/code]
Please note that I have two key columns and hence I had to concatenate both when using Reuse function.
The result is as follows
[code]
A|B@VM1@FMA|B@VM2@FMA|B@VM3@FMD|E@VM4@FMD|E@VM5@FMD|E@VM6@FMD|E@VM7@FMD|E@VM8@FM
[/code]
I was able to convert the above value into multiple records by replacing @VM and @FM in the function with the "|" character and Char(10) respectively.
[code]
EReplace(Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),"|",Convert("|",@FM,DSLink1.Col3)),@FM,Char(10))
[/code]
This yields results as expected when I write the output to a text file.
However, I only have one column for the entire output data. Although the output file looks "clean" I am not able to use the value as such, when I pass the output to the next stage in the SAME job. Essentially there is only one column whereas I need three columns. Also, in spite of the Char(10) introduced, DataStage thinks that there is only one row because if I use
[code]
DCount(EReplace(Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),"|",Convert("|",@FM,DSLink1.Col3)),@FM,Char(10)),"|")
[/code]
the return value is the total number of delimiter ("|") used in the output data. i.e. the return value for DCount is not
2
2
2
....
rather the value of DCount is 14
Hence I decided to use @VM and @FM at least so that I have an array to start playing with.
Now I need to convert the array into multiple rows and multiple columns such that I could define the columns within the same job in DataStage. The goal is to provide the output to a Join stage within the same job.
I looked into the guides and tried my hands on almost everything that I thought could be a fit, but in vain. For example, I tried to use a Split Vector, but the attempt failed because the input was not a vector. I could not use Make Vector because there is really no vector to create. In my mind, I had already created the vector with the array.
I could use EReplace and with the pipe character and Char(10), I could write this output to a data set or a flat file and use it in another job, but this would be ineffecient.
Many thanks in advance,