
split data based on size in datastage

Posted: Thu Jun 12, 2014 9:21 am
by prasson_ibm
Hi,

Is there any way in DataStage I can split an input file (e.g. 1 GB) based on size, say into 1 MB pieces, and create multiple files?

Posted: Thu Jun 12, 2014 9:39 am
by vamsi.4a6
In 9.1 we can split a file based on a key, but otherwise I think we only have the Unix split command option. In one post I read that we can use the Folder stage in server jobs to split a file, but I'm not sure exactly.

Posted: Thu Jun 12, 2014 9:39 am
by chulett
Perhaps, but why not simply do it at the command line?
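For reference, splitting by size at the command line is a one-liner with the standard `split` utility. A minimal sketch (the file names below are illustrative, and a small sample file stands in for the real 1 GB input):

```shell
# Make a ~2.5 MB sample file (stand-in for the real input).
dd if=/dev/zero of=input.dat bs=1024 count=2500 2>/dev/null

# Split it into 1 MB pieces named part_aa, part_ab, ...
split -b 1m input.dat part_

ls part_*
```

This produces three pieces here: two of exactly 1 MB and one holding the remainder. The `k` and `m` suffixes to `-b` are specified by POSIX, so this works with GNU, BSD, and AIX split alike.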

Posted: Thu Jun 12, 2014 10:01 am
by prasson_ibm
Hi,

This is exactly what I suggested, but my client is rigid and wants to check the capability of DataStage. :cry:

Posted: Thu Jun 12, 2014 10:34 am
by chulett
I'm not aware of any "split by size" option. Of course, you could BuildOp whatever you like and still call it "in DataStage". Otherwise you would have to split by row count after you figure out approximately how many records generally equal 1 MB. Using the Folder stage as a target is one option, but it is a bit of a pain in the butt. Other suggestions were made in the past when this question was asked before; a search should turn them up.
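The row-count approach above can be sketched at the command line too. The record size here is an assumption (100 bytes per row, fixed width) chosen purely for illustration; you would substitute your own average:

```shell
# Hypothetical fixed-width sample: 25,000 rows of 100 bytes each (~2.4 MB).
awk 'BEGIN { for (i = 0; i < 25000; i++) printf "%099d\n", i }' > rows.dat

# If the average record is ~100 bytes, about 10,485 records fit in 1 MB.
avg_bytes_per_row=100
rows_per_mb=$(( 1048576 / avg_bytes_per_row ))

# Split by that many lines, so each chunk is roughly 1 MB.
split -l "$rows_per_mb" rows.dat chunk_
```

Unlike `split -b`, the `-l` form never cuts a record in half, which is usually what you want for delimited or fixed-width data.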

Posted: Thu Jun 12, 2014 11:40 am
by ArndW
Since the input file is, by definition, a source, it is not changed in the job. If you wish to use a UNIX utility such as "split" to split the file, you can do so as part of the before-job calls; likewise, you could call a BASIC before-job routine to perform more complex tasks on the file.

Are you sure you didn't mean to split the output file?
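As a concrete sketch of the before-job approach: in the job properties you would select the ExecSH before-job subroutine and pass it a command line such as the one below. The paths and the output prefix are examples only, not anything your project defines:

```shell
# Before-job subroutine: ExecSH
# Input value (one line; directory and prefix are illustrative):
split -b 1m /data/landing/input.dat /data/landing/parts/part_
```

The job itself can then read the resulting `part_*` pieces, e.g. via a file pattern on the Sequential File stage.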

Posted: Thu Jun 12, 2014 3:29 pm
by chulett
Sam Ting. :wink:

Posted: Thu Jun 12, 2014 8:11 pm
by qt_ky
The Big Data File stage in 9.1 has an optional Max File Size property for writing target files. When the max size (in MB) is reached, it generates another target file. I found this in the documentation but have not tested it myself.