
CSV column to different columns

Posted: Thu Dec 04, 2008 4:48 am
by mdbatra
I have two columns containing values like:
Col1    Col2 (contains comma-separated values)
value   string1,string2,string3 and so on

I need the output to be:
Col1    Col2
Value   String1
Value   String2
Value   String3

My first thought was to convert the input column Col2 into separate columns, like:
Col1    Col2      Col3      Col4    (as many columns as there are strings)
value   String1   String2   String3

and then use a pivot to get:
Col1 Col2
Value String1
Value String2
Value String3

Any help with the first step, or a suggestion of an easier alternative way to achieve the result, would be appreciated.

Posted: Thu Dec 04, 2008 8:15 am
by chulett
Is the number of comma-delimited fields fixed or variable? Or do you at least know the maximum number of fields there can be?

Posted: Thu Dec 04, 2008 8:51 am
by mdbatra
Those comma-delimited fields are the names of all the stages in a particular job, so the maximum number is not fixed either.

To be more precise, from DSGetJobInfo(variable, DSJ.STAGELIST) I get a comma-delimited column containing all the stage names of a job. The requirement is to have them in separate columns so that later on I can do a pivot to get the stage names as rows instead of columns.
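For reference, this is roughly how the list is being pulled in job control (a minimal sketch; the job name is just a placeholder):

   hJob = DSAttachJob("SomeJob", DSJ.ERRNONE)
   StageList = DSGetJobInfo(hJob, DSJ.STAGELIST)   ;* comma-delimited list of stage names
   ErrCode = DSDetachJob(hJob)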

Or is there any other way of achieving the same result?

Posted: Thu Dec 04, 2008 9:34 am
by Mike
The fact that it is a variable number with no theoretical maximum makes this more challenging.

If potential rejects are acceptable to you, you could choose some arbitrary maximum number of columns (say 100) and use a column import stage or field functions within a transformer to populate up to 100 columns while rejecting any records that have more than 100.
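As a rough sketch of the Transformer approach (the link and column names here are only placeholders), the output derivations and the reject constraint could look like:

   out.Col2_1 derivation:    Field(in.Col2, ",", 1)
   out.Col2_2 derivation:    Field(in.Col2, ",", 2)
   (repeat the pattern up to out.Col2_100)

   Reject link constraint:   Dcount(in.Col2, ",") > 100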

If you want to handle it without imposing some arbitrary maximum, I would look into the possibility of replacing each comma delimiter with a concatenation of newline + Col1 + comma.
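For example, a single output derivation along these lines (link and column names are placeholders) would do that replacement:

   in.Col1 : "," : Ereplace(in.Col2, ",", Char(10) : in.Col1 : ",")

Writing that to a Sequential File stage and then re-reading the file as a plain two-column comma-delimited layout should give one row per string.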

Mike

Posted: Thu Dec 04, 2008 1:22 pm
by mdbatra
I am able to do it using Field functions, but for that I need to know beforehand how many stages there are in the job.
How am I supposed to leave it to DataStage to work out the number of stages, do the same for all the stages in a job, and then do it for all the jobs?

Posted: Thu Dec 04, 2008 1:33 pm
by Mike
A server job with a custom routine would be the way to go for handling a variable number of occurrences with an unknown maximum.
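Something along these lines as the body of a server routine is a minimal sketch (the argument names KeyVal, CsvVal and OutPath, and the append-to-file approach, are assumptions):

      Ans = 0
      OpenSeq OutPath To FileVar Then
         Seek FileVar, 0, 2 Else Null          ;* file exists: position at end to append
      End Else
         Create FileVar Else Ans = -1          ;* file did not exist: create it
      End
      If Ans = 0 Then
         NumVals = Dcount(CsvVal, ",")
         For i = 1 To NumVals
            WriteSeq KeyVal : "," : Field(CsvVal, ",", i) To FileVar Else Ans = -1
         Next i
         CloseSeq FileVar
      End

Called once per input row from a Transformer, it appends one "key,value" line per CSV element, however many there are.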

Mike

Posted: Thu Dec 04, 2008 1:57 pm
by ray.wurlod
Or, even easier, a server job loading a multi-valued field and reading it normalized.

Posted: Thu Dec 04, 2008 3:29 pm
by mdbatra
I searched for the same (reading normalized) but didn't find what I was looking for.
Can you please explain how to read in a normalized manner?

Posted: Thu Dec 04, 2008 3:33 pm
by chulett
Do an exact search for "normalize" for quite a number of discussions on the subject. For example: viewtopic.php?t=121718

Posted: Fri Dec 05, 2008 4:55 am
by mdbatra
Hi Craig,
I browsed the forum as guided by you but didnt have the luck. From the link that you sent in your last post, i saw that somebody tried with:

Convert(";", @VM, FIELD2) In a transformer and load it into a hash file as a non key column.
Can you please explain the @VM functionality, not able to define the syntax properly perhaps.

or as you asked for the Maximum number of stages in the earlier post, even if i have that , how is that going to help me.

Posted: Fri Dec 05, 2008 10:21 am
by chulett
If you know the max, create that many additional fields in a Transformer, populate them with Field(), pivot, and then filter out the null results.
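Building on the Field() derivations sketched earlier in the thread, the post-pivot filter could just be a Transformer constraint along these lines (the column name is a placeholder):

   out.Stage_Name <> ""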

Posted: Fri Dec 05, 2008 10:51 am
by chulett
That Convert() function is turning the field delimiters inside the string to "Value Marks", magical goodies that make a field "multi-valued" and are key to the normalization that can be done.
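For the comma-delimited list in this thread, the derivation for the hashed file's non-key column would be along these lines (the link name is a placeholder):

   Convert(",", @VM, in.Col2)

A UniVerse stage reading that hashed file can then normalize on that column, returning one row per value.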

Posted: Fri Dec 05, 2008 10:58 am
by mdbatra
I am a bit late in reading your post, Craig :D
I just finished doing the same, then thought of checking for a reply here. In fact, your post confirmed that I had done it correctly. Thanks!

Now I am done with getting Job_Name, Job_NO and Stage_Name for a particular job (I hardcoded the job name in the job control).

I would be very thankful if you could guide me in doing it for all jobs. I believe I need to do something like: put all the job names in a file, read the entries in the file sequentially, and pass each job name on as a parameter in the job control... Correct me if I am wrong!
If that is correct, can you please tell me what changes I need to make?
Thanks again!

Posted: Fri Dec 05, 2008 11:14 am
by chulett
Sure, you can use 'dsjob' to dump job names to a file and then drive from that file. Or you could read DS_JOBS directly with a Hashed File or UniVerse stage in a Server job, either to drive this or to populate the file.
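The 'drive from that file' part could look something like this in job control (a minimal sketch; the file path and the DSLogInfo call are just placeholders for whatever you do with each job):

      OpenSeq "/tmp/job_list.txt" To JobsFile Then
         Loop
            ReadSeq JobName From JobsFile Else Exit
            hJob = DSAttachJob(JobName, DSJ.ERRNONE)
            If Not(hJob) Then Continue             ;* skip anything that will not attach
            StageList = DSGetJobInfo(hJob, DSJ.STAGELIST)
            Call DSLogInfo(JobName : ": " : StageList, "StageListLoop")
            ErrCode = DSDetachJob(hJob)
         Repeat
         CloseSeq JobsFile
      End Else
         Call DSLogWarn("Could not open job list file", "StageListLoop")
      End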

Posted: Fri Dec 05, 2008 12:12 pm
by mdbatra
Thanks, Craig, for your valuable assistance!