how to create multiple rows from a row

gsym · Post by **gsym** » Tue Jan 02, 2007 1:11 pm

Hi Everyone,
I have a file with tilda delimited(12 fields) and field3,field4,field5 can have one or multiple values separated by commas

pepsi~1~1,2,3~001,002~hourly,salaried~200612~quarterly~1~U~50~peer group~pepsi report

output should be:

pepsi,1,1,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report

I need ur suggestions plz...

DSguru2B · Post by **DSguru2B** » Tue Jan 02, 2007 1:52 pm

Pivot stage. Read the pdf help for specific ways to accomplish your task.

ray.wurlod · Post by **ray.wurlod** » Tue Jan 02, 2007 2:19 pm

You may need two cascaded Pivot stages to achieve what you describe.

gsym · Post by **gsym** » Wed Jan 03, 2007 10:26 am

here the fields (3,4,and 5) can have a single value or multiple values. dont know the maximum number of values for these fields.
how to do it?

Thanks

DSguru2B · Post by **DSguru2B** » Wed Jan 03, 2007 10:44 am

It doesnt matter how many values are there. Its horizontal data being pivoted. The final meta data will remain constant. The pivot stage will get this done. I am not saying just one pivot stage. You might need to parse your data a little bit too.

DSguru2B · Post by **DSguru2B** » Thu Jan 04, 2007 12:43 pm

Granted. This will take more work than just a simple use of multiple pivot stages as the number of fields that are comma delimited are unknown. Pivot stage would have worked if the comma delimited columns (3, 4 and 5) were static. You need something like awk. Here is what i came up with. This script will work for your input data.i.e;
-total of 12 columns
-3,4 and 5 are varying only.

Run the following script. Just change the variables inFile and outFile to adapt to your environment.

Code: Select all

#!/usr/bin/ksh

#Change the following two lines
export inFile=/Data/SFDCDEV/temp/gsym.txt
export outFile=/Data/SFDCDEV/temp/Result_gsym.txt

#remove previous output file if exists.
if [ -f $outFile]
then
  rm -f $outFile
fi

#the main body
cat $inFile | while read line
do
  #The next 7 lines are referred as first part in my explanation below
   echo $line | awk 'BEGIN{FS="~"} 
                     {split ($5, c, ",")}
                     {
                      OFS="~"
                      for (z in c)
                      print $1,$2,$3,$4,c[z],$6,$7,$8,$9,$10,$11,$12
                     }' | while read A
 #The next 8 lines are referred as second part in my explanation below
 do
   echo $A |  awk 'BEGIN{FS="~"}
                   {split ($4, b, ",")}
                   {
                    OFS="~"
                    for (y in b)
                    print $1,$2,$3,b[y],$5,$6,$7,$8,$9,$10,$11,$12
                   }' | while read B
 #The next 8 lines are referred as third part in my explanation below  
 do
   echo $B |  awk 'BEGIN{FS="~"}
                   {split ($3, a, ",")}
                   {
                    OFS=","
                    for (x in a)
                    print $1,$2,a[x],$4,$5,$6,$7,$8,$9,$10,$11,$12
                   }' >> $outFile
  done
 done
done

What i am doing is splitting the 5 column into an array called z. Printing all the rest of the columns as it is with every value of array z. The output of first part will result in a similar format as the input file, the only difference will be that column 5 will be split.
Now each row of first part will be sent to the second part which does a similar thing. Then each row of second part will be sent to third part and finally written to the output file.
Run this script and you will get a pivoted,comma delimited file as you want.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 04, 2007 3:55 pm

I'm sure there's a solution possible in DataStage, probably involving Transformer and Remove Duplicates (keeping last) stages.

DSguru2B · Post by **DSguru2B** » Thu Jan 04, 2007 3:59 pm

Well, it would have been if the comma delimited values were static. Then they could have been changed to columns and a pivot stage would do the trick. I dont know how the varying fields would be treated by DataStage. I thought about it, maybe not long enough. Did'nt want to burn any more brain cells, so i wrote out a script. I would love to know of a way for it to be done in DS. Hey i am here to learn after all.

DSguru2B · Post by **DSguru2B** » Fri Jan 05, 2007 4:00 pm

leomauer wrote: 2. In every multi-value field replace commas with @VM (char(253))

You mean @FM

But how would you then read from the hashed file with 12 defined on your output ?

leomauer · Post by **leomauer** » Fri Jan 05, 2007 4:12 pm

DSguru2B wrote:
leomauer wrote: 2. In every multi-value field replace commas with @VM (char(253))
You mean @FM
But how would you then read from the hashed file with 12 defined on your output ?

No. I really mean @VM.

That is how multi-value fields are reconized in hash file functionality. @FM will convert each multi-value field into number of the regular hash file fields.
I wanted to envoke "Normalize on" hash file functionality, not just spread fields out.
There will be 1 record loaded to first hash file, 3 records unloaded from it and loaded to second. 6 records unloaded from second and loaded to third hash file. And 12 records unloaded from the third hash file.

DSguru2B · Post by **DSguru2B** » Fri Jan 05, 2007 5:13 pm

Hmm, pretty cool. I will give that a shot on monday.