how to create multiple rows from a row

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gsym
Charter Member
Charter Member
Posts: 118
Joined: Thu Feb 02, 2006 3:05 pm

how to create multiple rows from a row

Post by gsym »

Hi Everyone,
I have a file with tilda delimited(12 fields) and field3,field4,field5 can have one or multiple values separated by commas

pepsi~1~1,2,3~001,002~hourly,salaried~200612~quarterly~1~U~50~peer group~pepsi report

output should be:

pepsi,1,1,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report

I need ur suggestions plz...
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Pivot stage. Read the pdf help for specific ways to accomplish your task.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You may need two cascaded Pivot stages to achieve what you describe.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gsym
Charter Member
Charter Member
Posts: 118
Joined: Thu Feb 02, 2006 3:05 pm

Post by gsym »

here the fields (3,4,and 5) can have a single value or multiple values. dont know the maximum number of values for these fields.
how to do it?

Thanks
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

It doesnt matter how many values are there. Its horizontal data being pivoted. The final meta data will remain constant. The pivot stage will get this done. I am not saying just one pivot stage. You might need to parse your data a little bit too.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Granted. This will take more work than just a simple use of multiple pivot stages as the number of fields that are comma delimited are unknown. Pivot stage would have worked if the comma delimited columns (3, 4 and 5) were static. You need something like awk. Here is what i came up with. This script will work for your input data.i.e;
-total of 12 columns
-3,4 and 5 are varying only.

Run the following script. Just change the variables inFile and outFile to adapt to your environment.

Code: Select all

#!/usr/bin/ksh

#Change the following two lines
export inFile=/Data/SFDCDEV/temp/gsym.txt
export outFile=/Data/SFDCDEV/temp/Result_gsym.txt

#remove previous output file if exists.
if [ -f $outFile]
then
  rm -f $outFile
fi

#the main body
cat $inFile | while read line
do
  #The next 7 lines are referred as first part in my explanation below
   echo $line | awk 'BEGIN{FS="~"} 
                     {split ($5, c, ",")}
                     {
                      OFS="~"
                      for (z in c)
                      print $1,$2,$3,$4,c[z],$6,$7,$8,$9,$10,$11,$12
                     }' | while read A
 #The next 8 lines are referred as second part in my explanation below
 do
   echo $A |  awk 'BEGIN{FS="~"}
                   {split ($4, b, ",")}
                   {
                    OFS="~"
                    for (y in b)
                    print $1,$2,$3,b[y],$5,$6,$7,$8,$9,$10,$11,$12
                   }' | while read B
 #The next 8 lines are referred as third part in my explanation below  
 do
   echo $B |  awk 'BEGIN{FS="~"}
                   {split ($3, a, ",")}
                   {
                    OFS=","
                    for (x in a)
                    print $1,$2,a[x],$4,$5,$6,$7,$8,$9,$10,$11,$12
                   }' >> $outFile
  done
 done
done
What i am doing is splitting the 5 column into an array called z. Printing all the rest of the columns as it is with every value of array z. The output of first part will result in a similar format as the input file, the only difference will be that column 5 will be split.
Now each row of first part will be sent to the second part which does a similar thing. Then each row of second part will be sent to third part and finally written to the output file.
Run this script and you will get a pivoted,comma delimited file as you want.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'm sure there's a solution possible in DataStage, probably involving Transformer and Remove Duplicates (keeping last) stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well, it would have been if the comma delimited values were static. Then they could have been changed to columns and a pivot stage would do the trick. I dont know how the varying fields would be treated by DataStage. I thought about it, maybe not long enough. Did'nt want to burn any more brain cells, so i wrote out a script. I would love to know of a way for it to be done in DS. Hey i am here to learn after all. :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

leomauer wrote: 2. In every multi-value field replace commas with @VM (char(253))
You mean @FM :wink:
But how would you then read from the hashed file with 12 defined on your output ?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
leomauer
Premium Member
Premium Member
Posts: 100
Joined: Mon Nov 03, 2003 1:33 pm

Post by leomauer »

DSguru2B wrote:
leomauer wrote: 2. In every multi-value field replace commas with @VM (char(253))
You mean @FM :wink:
But how would you then read from the hashed file with 12 defined on your output ?
No. I really mean @VM. :P That is how multi-value fields are reconized in hash file functionality. @FM will convert each multi-value field into number of the regular hash file fields.
I wanted to envoke "Normalize on" hash file functionality, not just spread fields out.
There will be 1 record loaded to first hash file, 3 records unloaded from it and loaded to second. 6 records unloaded from second and loaded to third hash file. And 12 records unloaded from the third hash file.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Hmm, pretty cool. I will give that a shot on monday.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply