Converting a dynamic array into multiple rows

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Converting a dynamic array into multiple rows

Post by shankar_ramanath »

Hello Gurus,

(I have gone through the forums and read all about vertical pivots before asking this question. I have also perused the manuals - BASIC reference and Parallel Job Developers Guide)

The requirement is as follows

Input

[code]
Col1 Col2 Col3
A B 1,2,3
D E 4,5,6,7,8
[/code]

The field delimiter is pipe character

Output

[code]
Col1 Col2 Col3
A B 1
A B 2
A B 3
D E 4
D E 5
D E 6
D E 7
D E 8
[/code]

The field delimiter is pipe character

I followed one of Ray's post to use Splice and Reuse UVerse functions (very elegant for the task at end) and ended up with the following

[code]
Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),@VM,Convert(",",@FM,DSLink1.Col3))
[/code]

Please note that I have two key columns and hence I had to concatenate both when using Reuse function.

The result is as follows

[code]
A|B@VM1@FMA|B@VM2@FMA|B@VM3@FMD|E@VM4@FMD|E@VM5@FMD|E@VM6@FMD|E@VM7@FMD|E@VM8@FM
[/code]

I was able to convert the above value into multiple records by replacing @VM and @FM in the function with the "|" character and Char(10) respectively.

[code]
EReplace(Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),"|",Convert("|",@FM,DSLink1.Col3)),@FM,Char(10))
[/code]

This yields results as expected when I write the output to a text file.

However, I only have one column for the entire output data. Although the output file looks "clean" I am not able to use the value as such, when I pass the output to the next stage in the SAME job. Essentially there is only one column whereas I need three columns. Also, in spite of the Char(10) introduced, DataStage thinks that there is only one row because if I use

[code]
DCount(EReplace(Splice(Reuse(DSLink1.Col1:"|":DSLink2.Col2),"|",Convert("|",@FM,DSLink1.Col3)),@FM,Char(10)),"|")
[/code]

the return value is the total number of delimiter ("|") used in the output data. i.e. the return value for DCount is not

2
2
2
....

rather the value of DCount is 14

Hence I decided to use @VM and @FM at least so that I have an array to start playing with.

Now I need to convert the array into multiple rows and multiple columns such that I could define the columns within the same job in DataStage. The goal is to provide the output to a Join stage within the same job.

I looked into the guides and tried my hands on almost everything that I thought could be a fit, but in vain. For example, I tried to use a Split Vector, but the attempt failed because the input was not a vector. I could not use Make Vector because there is really no vector to create. In my mind, I had already created the vector with the array.

I could use EReplace and with the pipe character and Char(10), I could write this output to a data set or a flat file and use it in another job, but this would be ineffecient.

Many thanks in advance,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Leave the @FM characters - they will be converted to newlines when you write to the text file. Make sure that you specify 000 for delimiter and quote characters in the format and None as the record delimiter style.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Post by shankar_ramanath »

[quote="ray.wurlod"]Leave the @FM characters - they will be converted to newlines when you write to the text file. Make sure that you specify 000 for delimiter and quote characters in the format and None as the record delimiter style.[/quote]

Hi Ray,

Thanks for the reply.

Is there a way to get the output as multiple rows and multiple columns without writing to a text file? I intend to use the output of the function in the same job. In this particular case, the output will be used in a Join stage. Hence I need to get the multi-column output without sending it to an intermediary such as a text file or data set.

Thanks,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Possibly Column Import to parse the record and Pivot. That's just an initial thought, and probably requires you to preserve the newline characters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Post by shankar_ramanath »

ray.wurlod wrote:Possibly Column Import to parse the record and Pivot. That's just an initial thought, and probably requires you to preserve the newline characters.
Thanks Ray!

I think I am pretty close. I did not quite understand what you meant by "preserve the newline characters" initially, but now I am beginning to understand.

I have two output streams from the BASIC transformer both of which use the same function

Code: Select all

Change(Splice(Reuse(DSLink.Col1:":":DSLink.Col2),":",Convert(",",@FM,DSLink.Col3)),@FM,"|")
The first stream sends the output to an external filter that uses the filter command tr '|' '\n'. I use the Field function to filter the output into three columns using the Field function. Works like a charm. For the example in question, the output is of the format

Code: Select all

 
Col1 Col2 Col3 
A B 1 
A B 2 
A B 3 
D E 4 
D E 5 
D E 6 
D E 7 
D E 8 


However, I see that this approach is sub-optimal because using the External Filter needs to collect the data from multiple partitions and is therefore semantically no different from a sequential file/data set.

The second stream uses a parallel transformer that attempts to perform inline conversion of "|" into Char(10) using EReplace function. The result is as follows

Code: Select all

 
Col1 Col2 Col3 
A B 1 
A 
(I have also tried using Convert and Change function with the same result).

I think that the BASIC string conversion functions are unable to interpret the newline character in the correct manner whereas the Unix "tr" function works perfectly (I may be totally wrong). I have gone through the BASIC reference guide to see if there is any other option but I am out of luck.

Could you please suggest?

Many thanks again,
Post Reply