CSV column to different columns



mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

CSV column to different columns

Post by mdbatra »

I have 2 columns containing values like:
Col1    Col2 (contains comma-separated values)
value   string1,string2,string3... and so on.
I require the output to be:
Col1    Col2
Value   String1
Value   String2
Value   String3

Thought of first converting the input column Col2 into different columns, like
Col1    Col2    Col3    Col4 (need as many columns as there are strings)
value   String1 String2 String3

and then a pivot to get
Col1 Col2
Value String1
Value String2
Value String3

Any help on the first step, or a suggestion of an easier alternative way to achieve the result, would be appreciated.
Rgds,
MB
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Is the number of comma delimited fields fixed or variable? Or do you at least know what the maximum number of fields there can be?
-craig

"You can never have too many knives" -- Logan Nine Fingers
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

Those comma-delimited fields are basically the names of all the stages in a particular job.
So the maximum number is not fixed either!

To be more precise, from DSGetJobInfo(variable, DSJ.STAGELIST) I get a comma-delimited column containing all the stage names of a job. But the requirement is to have them in separate columns so that later on I can do a pivot to get the stage names as rows instead of columns.

Or is there any other way of achieving the same?
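
For reference, this is roughly the call I mean, as it would look in a server routine (a minimal sketch; the job name is just a placeholder and the stage names in the comment are made up):

      $INCLUDE DSINCLUDE JOBCONTROL.H
      * Attach the job and ask for its stage list
      hJob = DSAttachJob("SomeJobName", DSJ.ERRFATAL)
      * Returns a single comma-delimited string, e.g. "Seq_In,Xfm_Split,Seq_Out"
      StageList = DSGetJobInfo(hJob, DSJ.STAGELIST)
      ErrCode = DSDetachJob(hJob)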
Rgds,
MB
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

The fact that it is a variable number with no theoretical maximum makes this more challenging.

If potential rejects are acceptable to you, you could choose some arbitrary maximum number of columns (say 100) and use a column import stage or field functions within a transformer to populate up to 100 columns while rejecting any records that have more than 100.

If you want to handle it without imposing some arbitrary maximum, I would look into the possibility of replacing each comma delimiter with a concatenation of newline + Col1 + comma.
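
As a rough sketch of that second idea in a server Transformer derivation (the column names are only examples):

   InLink.Col1 : "," : Ereplace(InLink.Col2, ",", Char(10) : InLink.Col1 : ",")

The intent is that, written to a Sequential File stage with quoting turned off, each embedded newline starts a new record, so every string lands on its own row already paired with Col1. How cleanly that works will depend on your Sequential File stage settings.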

Mike
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

I am able to do it using Field functions, but for that I need to know beforehand how many stages the job has.
How am I supposed to leave it to DataStage to work out the number of stages and then do the same for all the stages in a job, and further for all the jobs?
Rgds,
MB
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

A server job with a custom routine would be the way to go to handle a variable number of occurrences with an unknown maximum.

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Or, even easier, a server job loading a multi-valued field and reading it normalized.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

I did a search for the same (reading normalized) but didn't find what I was after.
Can you please tell me how to read in a normalized manner?
Rgds,
MB
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do an exact search for "normalize" for quite a number of discussions on the subject. For example: viewtopic.php?t=121718
-craig

"You can never have too many knives" -- Logan Nine Fingers
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

Hi Craig,
I browsed the forum as you suggested but didn't have any luck. From the link you sent in your last post, I saw that somebody tried:

Convert(";", @VM, FIELD2)

in a Transformer, loading the result into a hashed file as a non-key column.
Can you please explain the @VM functionality? I am probably not defining the syntax properly.

Or, regarding the maximum number of stages you asked about in an earlier post: even if I have that, how is it going to help me?
Rgds,
MB
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you know the max, create that many additional fields in a Transformer, populate them with Field(), pivot and then filter out the null results.
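
Something along these lines in the output column derivations, assuming for illustration a max of four and the column names from your first post:

   Col1    derivation:  InLink.Col1
   Field1  derivation:  Field(InLink.Col2, ",", 1)
   Field2  derivation:  Field(InLink.Col2, ",", 2)
   Field3  derivation:  Field(InLink.Col2, ",", 3)
   Field4  derivation:  Field(InLink.Col2, ",", 4)

Field() returns an empty string where there is no Nth value, so those rows are easy to filter out after the pivot.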
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That Convert() function is turning the field delimiters inside the string to "Value Marks", magical goodies that make a field "multi-valued" and are key to the normalization that can be done.
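
In your case the delimiter is a comma rather than a semicolon, so the two columns going into the hashed file would be derived something like this (the column names here are just assumed from your first post):

   KeyCol   derivation:  InLink.Col1
   DataCol  derivation:  Convert(",", @VM, InLink.Col2)

@VM is simply Char(253), the value mark. Read the hashed file back with DataCol defined as a multi-valued column (or through a UniVerse stage normalized on it) and you should get one row per stage name.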
-craig

"You can never have too many knives" -- Logan Nine Fingers
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

I am a bit late in reading your post, Craig :D
I had just finished doing the same, then thought of checking for a reply here. In fact, your post reinforced that I did it correctly. Thanks!

Now I am done with getting
Job_Name Job_NO Stage_Name for a particular job (hardcoded the job name in the job control).

I would be much obliged if you could guide me on doing it for all jobs. I believe I need to do something like: have all the job names in a file, then read the entries in that file sequentially and pass each job name as a parameter through the job control... Correct me if I'm wrong!
If correct, can you please tell me what changes I need to make?
Thanks again!
Rgds,
MB
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sure, you can use 'dsjob' to dump job names to a file and then drive from that file. Or you could actually read DS_JOBS directly with a Hashed File or Universe stage in a Server job to drive this or to populate the file.
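
A rough job-control sketch of driving it from such a file (the path, the worker job name and the parameter name are all made up here, so adjust to suit):

   * Read job names one per line and run the extraction job for each
   OpenSeq "/tmp/job_list.txt" To JobList Else Call DSLogFatal("Cannot open job list", "JobControl")
   Loop
      ReadSeq JobName From JobList Else Exit
      hJob = DSAttachJob("ExtractStageNames", DSJ.ERRFATAL)
      ErrCode = DSSetParam(hJob, "pJobName", JobName)
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob)
      ErrCode = DSDetachJob(hJob)
   Repeat
   CloseSeq JobList

'dsjob -ljobs <project>' is one easy way to create that list in the first place.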
-craig

"You can never have too many knives" -- Logan Nine Fingers
mdbatra
Premium Member
Posts: 175
Joined: Wed Oct 22, 2008 10:01 am
Location: City of London

Post by mdbatra »

Thanks Craig, for your valuable assistance!
Rgds,
MB