Splitting a Large text file into small files

mdtauseefhussain
Participant
Posts: 38
Joined: Mon Feb 27, 2006 10:34 pm
Location: Chennai

Post by mdtauseefhussain »

Hi all,

I have a text file that contains 90,000 records. My requirement is to split this file into four parts, because a file may contain at most 25,000 records. After splitting, I need to concatenate the timestamp with each file name.

Please, can anyone help me?

Thanks in advance

Tausif
Mohammed Tausif Hussain Sheikh
Cognizant technologies,Perungudi
Chennai
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

viewtopic.php?p=196397#196397

Hope this thread can help you.

Sam
mdtauseefhussain
Participant
Posts: 38
Joined: Mon Feb 27, 2006 10:34 pm
Location: Chennai

Post by mdtauseefhussain »

I have used the split command. It splits the file, but the output files are in a different format, not .txt format.

Can anyone help me with this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The split command is incapable of changing the file's format if you split on lines. What - precisely - did you do?
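
One way to confirm this is to split a file and compare the reassembled pieces with the original. A minimal sketch, assuming mktemp is available on your system; the helper name check_split and the part_ prefix are made up for illustration:

```shell
#!/bin/sh
# check_split FILE LINES: split FILE into LINES-line pieces in a scratch
# directory, then confirm the pieces concatenate back to the original.
# (Hypothetical helper; the name and the "part_" prefix are assumptions.)
check_split() {
    scratch=$(mktemp -d) || return 1
    split -l "$2" "$1" "$scratch/part_"
    # The shell expands part_* in sorted order, which matches split's
    # aa, ab, ac ... naming, so cat reassembles the pieces in sequence.
    if cat "$scratch"/part_* | cmp -s - "$1"; then
        echo "identical"
    else
        echo "different"
    fi
    rm -rf "$scratch"
}
```

Running something like "check_split filename.txt 25000" should report "identical", since a split on lines only cuts the file at line boundaries.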
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You are probably confused by the lack of a '.txt' extension on the split files. It doesn't mean that it 'changed the format'; if you open them you'll see they are still fine.

Part of your 'split post-processing' will need to be renaming the files back to your desired pattern, including restoring the extension of record.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mdtauseefhussain
Participant
Posts: 38
Joined: Mon Feb 27, 2006 10:34 pm
Location: Chennai

Post by mdtauseefhussain »

Yes, I have done that. Using the cat command, I was able to achieve the desired results.

Now I am thinking of automating the process, as it is difficult to keep renaming the files.

As per the requirements:

I got a text file of size 62 MB.

I used split -10000 filename.txt Sample

The result was 100 different files.

Then I manually renamed the files, adding the .txt extension and concatenating the timestamp to the file name.

It was pretty tough to do manually.

Can anyone suggest how to automate the process?

For example:

Run split

Count the number of output files

and rename them with the native format
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

Something like this with a Unix shell script:

for file in *
do
# a bare `date` prints spaces, which would break mv; use a compact format
mv "${file}" "${file}.txt.`date '+%Y%m%d'`"
done
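
The loop above renames every file in the current directory, so it is safer to match only the split output, and it is easy to count the files at the same time. A minimal sketch, assuming the pieces were created with the prefix Sample as in the split command quoted earlier; the function name rename_pieces and the timestamp layout are illustrative choices, not something from the thread:

```shell
#!/bin/sh
# rename_pieces PREFIX: append ".<timestamp>.txt" to every file whose name
# starts with PREFIX, then report how many files were renamed.
# (Hypothetical helper; the name and timestamp layout are assumptions.)
rename_pieces() {
    prefix=$1
    # Take the timestamp once so every piece gets the same stamp.
    stamp=$(date '+%Y%m%d_%H%M%S')
    count=0
    for file in "$prefix"*; do
        [ -f "$file" ] || continue              # glob matched nothing
        case $file in *.txt) continue ;; esac   # renamed on an earlier run
        mv "$file" "$file.$stamp.txt"
        count=$((count + 1))
    done
    echo "renamed $count file(s)"
}
```

Calling "rename_pieces Sample" in the directory that holds the pieces would rename Sampleaa, Sampleab and so on, and print the number of files it touched.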
thumsup9
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

Oops, I just realized you are working on Windows.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Doesn't matter. If running parallel jobs on Windows then MKS Toolkit is installed, and you CAN use Unix shell scripts.
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

minor script, one-shot effort

Post by jgreve »

I'd do a KSH script like this... thumsup9's "for file in *" one looks fine, too.
Depending on how I set the loop up, the file list would expand (glob) at
invocation time, so the loop had an empty list to chew on - hence the find
in mine.

I like the date format %Y%m%d, which generates 20061004.
"man date" will show you more tricks w/formats.
You'll want to kick cnt up to what was it, 20000?
Note that "now" uses back-ticks (the other single quote)
around the entire date-command (the param is in normal single
quotes).
I grab the date-marker just once... that way if you run around midnight,
half your files don't get a different prefix.

Code: Select all

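$ cat split.sh
big=foo_seq.dat            # file to split
cnt=20                     # lines per output file
now=`date '+%Y%m%d'`       # date stamp, computed once
prefix=xx_
rm -f "$prefix"*           # clear the output of any previous run
split -l "$cnt" "$big" "$prefix$now"
# find runs after split, so it sees the newly created pieces
find . -name "$prefix*" -print | {
   while read -r file; do
      mv "$file" "$file.txt"
   done
}

$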
If you tweak the first line to say:
big=$1
and put the above into a file named mysplit.sh, you're
looking at something like this:

$ ./mysplit.sh huge_file.txt

Good luck. Output looks like this for my test:
foo_seq.dat
xx_20061004aa.txt
xx_20061004ab.txt
xx_20061004ac.txt
xx_20061004ad.txt
xx_20061004ae.txt
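
If the split on your machine is the GNU coreutils one (an assumption; MKS Toolkit and older Unix versions of split may not have these options), the rename pass can be skipped, because GNU split can add the extension itself with --additional-suffix and number the pieces with -d. A sketch with a made-up function name that gives up with a message when the option is missing:

```shell
#!/bin/sh
# split_with_ext FILE LINES PREFIX: split FILE into LINES-line pieces named
# PREFIX<date>_00.txt, PREFIX<date>_01.txt, ... in a single step.
# (Hypothetical helper name; relies on GNU coreutils 8.16 or later.)
split_with_ext() {
    big=$1
    cnt=$2
    prefix=$3
    now=$(date '+%Y%m%d')      # date stamp, taken once
    if split --help 2>/dev/null | grep -q -e '--additional-suffix'; then
        # -d gives numeric suffixes; --additional-suffix appends ".txt"
        split -l "$cnt" -d --additional-suffix=.txt "$big" "${prefix}${now}_"
    else
        echo "this split has no --additional-suffix; rename afterwards" >&2
        return 1
    fi
}
```

For example, "split_with_ext huge_file.txt 25000 xx_" would be the one-step equivalent of the split plus rename loop above, where the option is supported.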
mdtauseefhussain
Participant
Posts: 38
Joined: Mon Feb 27, 2006 10:34 pm
Location: Chennai

Post by mdtauseefhussain »

The script works when I run the commands individually.

When I run it as a shell script, it gives an error.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

[sigh] Any error in particular or do we have to guess?
mdtauseefhussain
Participant
Posts: 38
Joined: Mon Feb 27, 2006 10:34 pm
Location: Chennai

Post by mdtauseefhussain »

Thanks, and sorry. I wrote the script in Notepad, saved it as a shell script and tried to run it; the script was throwing the error "cannot find Ctrl M".
Then I opened the script in the vi editor, and when I looked, a Ctrl-M character was appearing wherever I had pressed Enter. I deleted those in vi and ran the script, and it is working fine.

Thanks for your help. I am very new to Unix; I apologise if my queries were very simple.

I am grateful to all who have helped me in solving this issue.
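
The Ctrl-M characters are the carriage returns that Windows editors such as Notepad put at the end of every line. Rather than deleting them by hand in vi, they can be stripped with tr. A small sketch; the function name strip_cr and the file names in the usage line are made up for illustration:

```shell
#!/bin/sh
# strip_cr IN OUT: copy IN to OUT with every carriage return (Ctrl-M) removed.
# IN and OUT must be different files, because OUT is truncated first.
# (Hypothetical helper name, not part of any toolkit.)
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

Something like "strip_cr split_notepad.sh split.sh" would produce a copy that the shell can run.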
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

Post by jgreve »

Well, good luck!

My unix introduction was painful :)
I trapped myself in the "cat" command (had to reboot
the machine to get out of it!) I've learned a lot since then.
If you're doing this on Windows, go get a program called VIM
(vim.org) - it is better than the included "vi".

Spend time with vi - learn how to drive it well,
it will pay back your study-energy more than 100 times.
(in other words, quit using NOTEPAD and other "gui" editors,
go 100% vi - it will suck for 2 or 3 weeks, then it will get
better. Just do it. Really).

Depending on what kind of unix you're using, try to get
an administrator handbook. You can't really do anything
in unix until you know a little bit about everything, so
go shallow and wide in your learning. There will be time
to drill into things later as problems come up.