Hi,
I have a big file ('|' delimited). I need to split this file into multiple files based on the value in the first column.
All the rows with the same first-column value should go into one file.
There are 314 different values in the first column, so the big file needs to be split into 314 different files.
Example:
xx|33|55|66|ggg|ddfc|
xx|67|67|89||568|fdsk|
zz|44|67|55||568|fdsk|
yy|456|jdfjd|dfksd|567|67|
xx|45|67|49||588|fdsk|
zz|67|67|89||568|fdsk|
There are 3 output files:
xx.txt
xx|33|55|66|ggg|ddfc|
xx|67|67|89||568|fdsk|
xx|45|67|49||588|fdsk|
yy.txt
yy|456|jdfjd|dfksd|567|67|
zz.txt
zz|67|67|89||568|fdsk|
zz|44|67|55||568|fdsk|
Thanks,
Ram
Splitting a file into multiple files based on first column
My 2 cents,
I would not do this in DataStage, simply because it would be too tedious and the job would be very cumbersome.
I would rather use a shell script (or a Perl script or a Java program) to achieve what you want. A simple algorithm would be as follows:
Preferably (but not necessarily), sort the data on the first column.
Read each line from the master file (the file containing all the records).
Read the value in the first column using the cut command. Assign this to a variable. This will be the file name you want to write to.
Append the row to the file identified by the variable.
This might be time-consuming depending on the number of rows you have, but in my opinion it is elegant.
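The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested production script: the function name split_by_first_column and its two arguments are made up for the example, and it assumes the output directory already exists and starts out empty (rows are appended, so a second run would duplicate them).

```shell
#!/bin/sh
# Sketch of the line-by-line approach: cut out the first column, then
# append the row to the file named after that value.
# split_by_first_column and its arguments are hypothetical names.
split_by_first_column() {
    infile=$1   # the master file containing all the records
    outdir=$2   # directory for the per-value files (assumed to exist)

    # IFS= and -r keep each line exactly as it appears in the file.
    while IFS= read -r line
    do
        # The first '|'-delimited field becomes the file name.
        key=$(printf '%s\n' "$line" | cut -d'|' -f1)
        # Append the whole row to <key>.txt.
        printf '%s\n' "$line" >> "$outdir/$key.txt"
    done < "$infile"
}
```

It would be called as something like split_by_first_column /full/path/to/master.txt /full/path/to/outdir. Because every row is appended to its own file, the sort step is not needed for correctness. Starting a cut process per row is what makes this slow on a large file.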
Hope this helps.
Thanks,
Shekar
Welcome aboard, Ram.
This is something that you want to keep outside DataStage. Write a small shell script; that will get you going.
Off the top of my head, something like the script below. myfile.txt is your "huge" file; provide a fully qualified path to it.
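#!/usr/bin/ksh
# Collect the distinct first-column values.
awk -F"|" '{ print $1 }' myfile.txt | sort -u > uniqfile.txt
# Write one file per value. The pattern is anchored to the start of the line
# and ends with the delimiter, so only the first column is matched.
while read -r filenames
do
    grep "^${filenames}|" myfile.txt > "${filenames}.txt"
done < uniqfile.txt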
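The loop above rescans myfile.txt once per distinct value, which means 314 passes for the file described in the question. As an alternative, here is a rough single-pass sketch that lets awk route each row itself. The function name split_single_pass is made up for the example, the output files are written to the current directory, and it assumes a POSIX awk (nawk or gawk) that supports close().

```shell
#!/bin/sh
# Single-pass sketch: awk sends each row to "<first column>.txt".
# split_single_pass is a hypothetical name; files land in the current directory.
split_single_pass() {
    awk -F'|' '{
        out = $1 ".txt"
        print >> out    # append the whole row to its file
        close(out)      # keep only one output file open at a time
    }' "$1"
}
```

It would be called as something like split_single_pass /full/path/to/myfile.txt. Closing the file after every row is slower than leaving it open, but it avoids hitting the open-file limit that some awk versions have when there are hundreds of output files. Since rows are appended, the target directory should not contain output from an earlier run.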
Last edited by DSguru2B on Thu Jun 29, 2006 9:55 am, edited 1 time in total.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.