how to create multiple rows from a row
Posted: Tue Jan 02, 2007 1:11 pm
by gsym
Hi Everyone,
I have a file with tilda delimited(12 fields) and field3,field4,field5 can have one or multiple values separated by commas
pepsi~1~1,2,3~001,002~hourly,salaried~200612~quarterly~1~U~50~peer group~pepsi report
output should be:
pepsi,1,1,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,1,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,2,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,001,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,002,hourly,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,001,salaried,200612,quarterly,1,U,50,peer group,pepsi report
pepsi,1,3,002,salaried,200612,quarterly,1,U,50,peer group,pepsi report
I need ur suggestions plz...
Posted: Tue Jan 02, 2007 1:52 pm
by DSguru2B
Pivot stage. Read the pdf help for specific ways to accomplish your task.
Posted: Tue Jan 02, 2007 2:19 pm
by ray.wurlod
You may need two cascaded Pivot stages to achieve what you describe.
Posted: Wed Jan 03, 2007 10:26 am
by gsym
here the fields (3,4,and 5) can have a single value or multiple values. dont know the maximum number of values for these fields.
how to do it?
Thanks
Posted: Wed Jan 03, 2007 10:44 am
by DSguru2B
It doesnt matter how many values are there. Its horizontal data being pivoted. The final meta data will remain constant. The pivot stage will get this done. I am not saying just one pivot stage. You might need to parse your data a little bit too.
Posted: Thu Jan 04, 2007 12:43 pm
by DSguru2B
Granted. This will take more work than just a simple use of multiple pivot stages as the number of fields that are comma delimited are unknown. Pivot stage would have worked if the comma delimited columns (3, 4 and 5) were static. You need something like awk. Here is what i came up with. This script will work for your input data.i.e;
-total of 12 columns
-3,4 and 5 are varying only.
Run the following script. Just change the variables inFile and outFile to adapt to your environment.
Code: Select all
#!/usr/bin/ksh
#Change the following two lines
export inFile=/Data/SFDCDEV/temp/gsym.txt
export outFile=/Data/SFDCDEV/temp/Result_gsym.txt
#remove previous output file if exists.
if [ -f $outFile]
then
rm -f $outFile
fi
#the main body
cat $inFile | while read line
do
#The next 7 lines are referred as first part in my explanation below
echo $line | awk 'BEGIN{FS="~"}
{split ($5, c, ",")}
{
OFS="~"
for (z in c)
print $1,$2,$3,$4,c[z],$6,$7,$8,$9,$10,$11,$12
}' | while read A
#The next 8 lines are referred as second part in my explanation below
do
echo $A | awk 'BEGIN{FS="~"}
{split ($4, b, ",")}
{
OFS="~"
for (y in b)
print $1,$2,$3,b[y],$5,$6,$7,$8,$9,$10,$11,$12
}' | while read B
#The next 8 lines are referred as third part in my explanation below
do
echo $B | awk 'BEGIN{FS="~"}
{split ($3, a, ",")}
{
OFS=","
for (x in a)
print $1,$2,a[x],$4,$5,$6,$7,$8,$9,$10,$11,$12
}' >> $outFile
done
done
done
What i am doing is splitting the 5 column into an array called z. Printing all the rest of the columns as it is with every value of array z. The output of first part will result in a similar format as the input file, the only difference will be that column 5 will be split.
Now each row of first part will be sent to the second part which does a similar thing. Then each row of second part will be sent to third part and finally written to the output file.
Run this script and you will get a pivoted,comma delimited file as you want.
Posted: Thu Jan 04, 2007 3:55 pm
by ray.wurlod
I'm sure there's a solution possible in DataStage, probably involving Transformer and Remove Duplicates (keeping last) stages.
Posted: Thu Jan 04, 2007 3:59 pm
by DSguru2B
Well, it would have been if the comma delimited values were static. Then they could have been changed to columns and a pivot stage would do the trick. I dont know how the varying fields would be treated by DataStage. I thought about it, maybe not long enough. Did'nt want to burn any more brain cells, so i wrote out a script. I would love to know of a way for it to be done in DS. Hey i am here to learn after all.
![Wink :wink:](./images/smilies/icon_wink.gif)
Posted: Fri Jan 05, 2007 4:12 pm
by leomauer
DSguru2B wrote:leomauer wrote:
2. In every multi-value field replace commas with @VM (char(253))
You mean @FM
But how would you then read from the hashed file with 12 defined on your output ?
No. I really mean @VM.
![Razz :P](./images/smilies/icon_razz.gif)
That is how multi-value fields are reconized in hash file functionality. @FM will convert each multi-value field into number of the regular hash file fields.
I wanted to envoke "Normalize on" hash file functionality, not just spread fields out.
There will be 1 record loaded to first hash file, 3 records unloaded from it and loaded to second. 6 records unloaded from second and loaded to third hash file. And 12 records unloaded from the third hash file.
Posted: Fri Jan 05, 2007 5:13 pm
by DSguru2B
Hmm, pretty cool. I will give that a shot on monday.