Splitting a huge file into multiple small files

avats
Participant
Posts: 1
Joined: Tue Feb 01, 2011 2:40 am
Location: India

Post by avats »

Hi,

We have a requirement to split the data into multiple small XML files of around 1000 rows each. We can split on the row count, but we are not sure which stage to use to write the rows to multiple files.

Earlier we were writing all the data to a single file, but the job is failing due to a space issue on the server, so we need to split the data.

Please help.

Thanks,
Avats
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Space is space, no matter if it's all in one file or many... perhaps you meant a file size issue? Regardless, what you want to take advantage of is the Trigger Column in the XML Output stage.

Every time the value in the column changes, the output file is closed and a new one is opened. It doesn't need to be a column you pass to the output field, so you could simply count the output rows and change the value in the trigger column every 1000 rows.
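
To illustrate the counting idea, here is a minimal Python sketch (not DataStage code) of a value that stays constant for 1000 rows and then changes. The names trigger_value and group_rows are hypothetical and exist only for this illustration; in the job itself the equivalent value would presumably be derived before the rows reach the XML Output stage.

```python
from itertools import groupby

ROWS_PER_FILE = 1000  # batch size taken from the original question


def trigger_value(row_number, rows_per_file=ROWS_PER_FILE):
    """Return a value that is constant within each block of rows_per_file rows.

    row_number is 1-based, like a running count of output rows.
    """
    return (row_number - 1) // rows_per_file


def group_rows(rows, rows_per_file=ROWS_PER_FILE):
    """Yield (trigger, rows_in_group); a new group starts when the trigger changes.

    This imitates "close the file and open a new one on a value change".
    """
    numbered = enumerate(rows, start=1)
    for trigger, chunk in groupby(
        numbered, key=lambda pair: trigger_value(pair[0], rows_per_file)
    ):
        yield trigger, [row for _, row in chunk]
```

With 2500 input rows this yields three groups: two of 1000 rows and a final one of 500.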
-craig

"You can never have too many knives" -- Logan Nine Fingers
vishal_rastogi
Participant
Posts: 47
Joined: Thu Dec 09, 2010 4:37 am

Post by vishal_rastogi »

You can use a Unix script to split the file with the sed command and later load the split files to the target.
Vish
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'm with Craig. 15GiB of data in one file is still going to occupy 15GiB in multiple files.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
udayk_2007
Participant
Posts: 72
Joined: Wed Dec 12, 2007 2:29 am

Post by udayk_2007 »

The Unix split command can be used for this purpose.
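
For readers without a shell handy, a rough Python sketch of the same line-count split is shown here. The function name split_by_lines, the "part_" prefix and the ".txt" suffix are assumptions made for the example. Note that cutting an XML document at arbitrary line boundaries will generally not leave each piece well-formed, and the pieces add up to exactly the same number of bytes as the original.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(src, out_dir, lines_per_file=1000, prefix="part_"):
    """Write consecutive blocks of lines_per_file lines to numbered files.

    Files are read and written in binary mode so the bytes are preserved
    exactly. Returns the list of paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, "rb") as source:
        index = 0
        while True:
            chunk = list(islice(source, lines_per_file))  # next block of lines
            if not chunk:
                break
            target = out_dir / f"{prefix}{index:04d}.txt"
            target.write_bytes(b"".join(chunk))
            written.append(target)
            index += 1
    return written
```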

Regards
Ulhas
MarkB
Premium Member
Posts: 95
Joined: Fri Oct 27, 2006 9:13 am

Post by MarkB »

I don't think coming up with ways to split the file will accomplish anything other than complicating the job and getting the same results. :roll: The issue the OP has is one of disk space. You can split the file into hundreds of smaller files and it won't matter one iota if you are still writing them out to the same disk - eventually you are going to run out of space.

What the OP needs is another disk to write to or to make enough space on the disk he is currently writing to in order to run the job.
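
If it helps to confirm that space really is the problem before the job runs, a small check along these lines could be used. This is only a sketch: has_room is a hypothetical helper, and the size you pass in is your own estimate of the output volume.

```python
import shutil


def has_room(target_dir, bytes_needed):
    """Return True if the filesystem holding target_dir has at least
    bytes_needed bytes free, according to shutil.disk_usage."""
    usage = shutil.disk_usage(target_dir)  # named tuple: total, used, free
    return usage.free >= bytes_needed
```

For example, has_room("/data/out", 15 * 1024**3) asks whether roughly 15 GiB is free on the disk behind a hypothetical /data/out directory. The answer is the same whether the output is one file or many.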