Transpose of a datavalue seperated by Commas

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's usually multiple ways to solve any problem, but the simplest thing for this situation would be to use FIELD. From memory:

FIELD(DATA,",",1) -> Col1
FIELD(DATA,",",2) -> Col2 etc

You tell it the string to parse, what the seperator is and which occurance you want.

-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For very many rows it might be faster to write two columns into a text file with no quoting and "," as the separator, then to read six columns from the text file. (DataStage has some clever read-ahead capabilities within the Sequential File stage.)
cwong
Charter Member
Charter Member
Posts: 26
Joined: Tue Apr 30, 2002 8:02 am
Location: Canada

Post by cwong »

Another suggestion is to use the Pivot stage.
You may assume up to more than 6 columns just in case it will expand in the future.
Then for each column coming out, check if the column is not null then create an output record.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Have you verified this? It's my understanding that the Pivot stage requires the pivoting elements to be in separate columns, not in a comma-delimited list in one column.
Tony
Participant
Posts: 2
Joined: Wed Nov 20, 2002 3:00 am

Post by Tony »

There's usually multiple ways to solve any problem, but the simplest thing for this situation would be to use FIELD. From memory:

FIELD(DATA,",",1) -> Col1
FIELD(DATA,",",2) -> Col2 etc

You tell it the string to parse, what the seperator is and which occurance you want.

-craig

Hi all,

I got a similar problem like this,but i want to seperate values into different rows:
I got 1 row as input,and need 3 rows return:

col1 col2 col3
==== ====== =====
a b,c,d zzz

col1 col2 col3
===== ===== ======
a b zzz
a c zzz
a d zzz

How can i do this?
Thanks for your kindly response.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Split the three columns into five columns using a Transformer stage (derivation expressions using Field() as discussed above), then use a Pivot stage.
Manual for the Pivot stage is Pivot.pdf and is in the Packages folder of your DataStage client CD. It has good examples.

Welcome to the forum, Tony; I suspect you are the first from Taiwan (but would be happy to be proven wrong).

Edited by - ray.wurlod on 11/21/2002 02:05:05
Tony
Participant
Posts: 2
Joined: Wed Nov 20, 2002 3:00 am

Post by Tony »

Split the three columns into five columns using a Transformer stage (derivation expressions using Field() as discussed above), then use a Pivot stage.
Manual for the Pivot stage is Pivot.pdf and is in the Packages folder of your DataStage client CD. It has good examples.

Welcome to the forum, Tony; I suspect you are the first from Taiwan (but would be happy to be proven wrong).

Edited by - ray.wurlod on 11/21/2002 02:05:05
Hi,Ray

Thanks for your reply,
That's a very helpful information for me.
and...i think you got the right answer :)
cwong
Charter Member
Charter Member
Posts: 26
Joined: Tue Apr 30, 2002 8:02 am
Location: Canada

Post by cwong »

Ray,

Use a sequencial file stage in front where the field delimiter will be specify accordingly; in this case will be the "," then inside the pivot stage, will list all the columns. Coming out from the pivot stage will be one column at a time. Use a transformer stage to check if the incoming data is not null, create an output record.

For another scenario, any column of data that needs to repeat in subsequent rows, save in a stage variable in the transformer stage and use it to build the output record.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Nice thought, Cristal, but the source data come from two columns in a database table. The comma-delimited list is in one of the columns.
cwong
Charter Member
Charter Member
Posts: 26
Joined: Tue Apr 30, 2002 8:02 am
Location: Canada

Post by cwong »

I would suggest 2 options:
Option #1:
- bring in all needed columns from database table via the SELECT stmt
- use a transformer stage to concat all columns together with the @FM in between each
- translate delimiter in delimited list into @FM (field mark)
for example:
col1:@FM:col2:@FM:translate(col3,',',@FM)
note: pls check syntax of translate function.

Option #2:
- concatenate the columns into one big column in Select stmt
for example: SELECT col1||','||col2 FROM tab ...
assuming the ',' is the delimiter or use whatever used in the delimited list.
- send the big column into a transformer stage and translate the delimiter into @FM just before passing into the Pivot stage ...

Note: This option seems to be more flexible to handle changing number of incoming columns.



cwong
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In option #1 the function you require is Convert (there is no Translate function), for example:
Convert(",", @FM, col3)

Your option #2 in interesting in theory - have you tested it? As far as I know, the field mark character is not the internal delimited used by DataStage generally, but I'm prepared to learn new stuff; if the Pivot stage uses the field mark internally your option #2 would work.

The most flexible approach to handling a variable number of incoming columns is to use the multi-valued column capability of UniVerse tables; load the table un-normalized then select from it normalized on the association or non-associated multi-valued column. Of course this requires some knowledge to set up the UniVerse table with columns defined to be multi-valued and, perhaps, associated.

By and large, though, Cristal, I believe you are trying to add too much complexity to what is really a simple problem with a simple solution. In DataStage simplicity is the foundation of performance.



Edited by - ray.wurlod on 11/23/2002 15:40:52
cwong
Charter Member
Charter Member
Posts: 26
Joined: Tue Apr 30, 2002 8:02 am
Location: Canada

Post by cwong »

Thanks for correcting the use of function ... the time I typed in my reply here, I just finished coding the transalte func in a SQL stmt to DB2 and got mixed up.

Yes, I am not a UV guru and have never use the multi-valued column capability of UniVerse tables. Your suggestion would be complicated to me.
Post Reply