split data based on size in datastage

prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

split data based on size in datastage

Post by prasson_ibm »

Hi,

Is there any way in DataStage that I can split an input file (e.g. 1 GB) based on size, let's say 1 MB, and create multiple files?
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am

Post by vamsi.4a6 »

In 9.1 we can split a file based on a key, but otherwise I think the UNIX split command is the only option. In one post I read that we can use the Folder stage in Server jobs to split a file, but I'm not sure exactly how.
Thanks and Regards
Vamsi krishna.v
http://datastage-vamsi.blogspot.in/
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps, but why not simply do it at the command line?
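For instance, a minimal sketch using the standard UNIX split utility (the file names are illustrative):

    # Cut into 1 MB pieces at exact byte offsets (may split a record in two)
    split -b 1m /data/input.dat /data/part_

    # GNU split can instead pack whole lines and stay under 1 MB per piece
    split -C 1m /data/input.dat /data/part_

With -b a record can straddle two output files, so -C is usually the safer choice for delimited data.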
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,

This is exactly what I suggested, but my client is rigid and wants to check the capability of DataStage :cry:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm not aware of any "split by size" option. Of course, you could BuildOp whatever you like and still call it "in DataStage". Otherwise you would have to do something by row count after you figure out approximately how many records will generally equal 1 MB. Using the Folder stage as a target is one option, but it is a bit of a pain in the butt. Other suggestions have been made in the past when this question was asked, and a search should turn them up.
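A sketch of that row-count approach, assuming reasonably uniform record lengths (paths are illustrative):

    # Estimate records per ~1 MB from the average record length, then split by line count
    bytes=$(wc -c < /data/input.dat)
    lines=$(wc -l < /data/input.dat)
    perfile=$(( 1048576 / (bytes / lines) ))
    split -l $perfile /data/input.dat /data/part_

The same calculated count could just as well feed a job design that rolls over to a new output file every N rows.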
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Since the input file is, by definition, a source, it is not changed in the job. If you wish to use a UNIX utility such as "split" to split the file, you can do so as part of the before-job calls; likewise, you could call a BASIC before-job routine to perform more complex tasks on the file.
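As a sketch of that before-job approach (ExecSH is the built-in shell-execution subroutine; the command and paths are illustrative):

    Before-job subroutine: ExecSH
    Input value:           split -C 1m /data/landing/input.dat /data/landing/part_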

Are you sure you didn't mean to split the output file?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sam Ting. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

The Big Data File stage in 9.1 has an optional Max File Size property for writing target files. When the maximum size (in MB) is reached, it generates another target file. I found this in the documentation but have not tested it myself.
Choose a job you love, and you will never have to work a day in your life. - Confucius