
Splitting a huge target file into smaller files

Posted: Tue Oct 03, 2006 4:11 am
by mtechnocrat
Hi all,
I have one huge file named price.txt. I need to split it every 25000 records, and the output files should be named price_timestamp.txt. I have tried the UNIX split command, but it does not meet my requirement. Can anyone suggest how to achieve this?

Thanks in advance,
hari

Posted: Tue Oct 03, 2006 4:29 am
by ArndW
I would not use DataStage to do this, but would use the UNIX split command, which is quite efficient and does exactly what you want.

Posted: Tue Oct 03, 2006 8:02 am
by kduke
The split command has an unusual syntax. If I remember correctly, it can split files by size in bytes or by number of lines. It names the output files oddly as well: each name is a prefix (x by default) followed by a two-letter suffix, so with a prefix of file you end up with fileaa, fileab, fileac. You can control the names a little, but again the syntax is hard to follow.

Once you figure out how it all works, you will need to use mv to rename the files to your own naming convention.
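As a sketch of that rename step, assume the pieces were produced with something like split -l 25000 price.txt price_part_ (where -l sets the number of lines per piece and price_part_ is a hypothetical prefix). The Python below lists the two-letter suffixes split uses by default and prints the mv commands that would give each piece a timestamped name. The price_<timestamp>_<n>.txt pattern is an assumption: a sequence number is added because every piece from one run would otherwise share the same timestamp. It only handles up to 676 pieces (aa through zz), since split implementations differ in what they do beyond that.

```python
import string
from datetime import datetime
from itertools import product


def split_suffixes(count):
    """Return split's default two-letter suffixes: aa, ab, ..., az, ba, ..., zz."""
    if count > 26 * 26:
        raise ValueError("more pieces than two-letter suffixes can name")
    letters = string.ascii_lowercase
    return ["".join(pair) for pair in product(letters, repeat=2)][:count]


def rename_plan(prefix, pieces, stamp):
    """Map split's output names to hypothetical price_<stamp>_<n>.txt names."""
    return {
        prefix + suffix: f"price_{stamp}_{i:03d}.txt"
        for i, suffix in enumerate(split_suffixes(pieces), start=1)
    }


if __name__ == "__main__":
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    # Print the commands for review instead of renaming anything directly.
    for old, new in rename_plan("price_part_", 3, stamp).items():
        print(f"mv {old} {new}")
```

Printing the commands rather than calling a rename function keeps the sketch side-effect free; the output can be checked and then pasted into a shell.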


Posted: Tue Oct 03, 2006 8:16 am
by adams06
Try using the MOD function:

MOD(KeyMgtGetNextValue('your job name'),25000)
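To see what such an expression produces, the Python below stands in for a 1-based row counter; row_num is a hypothetical name for whatever value MOD is applied to (for example @INROWNUM). MOD returns 0 on every 25000th row, which marks the end of a group, while the group (output file) number itself comes from integer division.

```python
CHUNK = 25000  # records per output file, from the question


def mod_position(row_num, chunk=CHUNK):
    """Equivalent of Mod(row_num, chunk) for positive row numbers."""
    return row_num % chunk


def group_number(row_num, chunk=CHUNK):
    """1-based output-file number for a 1-based row counter."""
    return (row_num - 1) // chunk + 1


if __name__ == "__main__":
    for row in (1, 24999, 25000, 25001, 50000, 50001):
        print(row, mod_position(row), group_number(row))
```

Rows 25000 and 50000 give a MOD result of 0 but still belong to groups 1 and 2, which is why the sketch subtracts 1 before dividing.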

Posted: Tue Oct 03, 2006 8:44 am
by ArndW
There is no reason to use any KeyMgt calls; they would slow a job down immensely. MOD can tell you where each group of 25000 rows ends, but it won't split the output files automatically. By far the easiest approach on UNIX is split. If it must be done in a DataStage job, I would put the logic into some short BASIC code, because a server job needs all of its output sequential file links defined at design time; that is, you would have to decide on a maximum number of splits and code that many links into the job.
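The "short program" approach (open a new output file every 25000 rows instead of fixing the number of links at design time) can be sketched as follows. This is Python rather than DataStage BASIC, it assumes one line is one record (so records containing embedded newlines would be miscounted), and the price_<timestamp>_<n>.txt names are a hypothetical reading of the naming requirement.

```python
import os
from datetime import datetime


def split_file(src, chunk=25000, out_dir=".", stamp=None):
    """Copy src into pieces of at most `chunk` lines each; return the paths written.

    Output names follow a hypothetical <base>_<stamp>_<n>.txt pattern, where
    <base> is the source name without its extension ("price" for price.txt).
    """
    if chunk < 1:
        raise ValueError("chunk must be positive")
    stamp = stamp or datetime.now().strftime("%Y%m%d%H%M%S")
    base = os.path.splitext(os.path.basename(src))[0]
    written = []
    out = None
    try:
        # Binary mode copies every byte, including line endings, unchanged.
        with open(src, "rb") as infile:
            for row_num, line in enumerate(infile, start=1):
                if (row_num - 1) % chunk == 0:  # first row of a new piece
                    if out is not None:
                        out.close()
                    name = f"{base}_{stamp}_{len(written) + 1:03d}.txt"
                    path = os.path.join(out_dir, name)
                    out = open(path, "wb")
                    written.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_file("price.txt") would write price_<stamp>_001.txt, price_<stamp>_002.txt and so on next to the script. Only one output file is open at a time, so memory use stays flat however large the input is.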

Posted: Tue Oct 03, 2006 4:26 pm
by ray.wurlod
Mod(@INROWNUM,25000) would be very efficient, but how many output links would you need? This could become unwieldy for millions of rows. And, indeed, there is a limit on the number of links that you can attach to a Transformer stage (stream input plus 127 others, if I recall correctly).
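The arithmetic behind that concern: one output link per 25000-row piece means ceil(rows / 25000) links. If all 127 non-stream links Ray recalls were output links (the limit itself is his recollection and is not verified here), a single Transformer could handle at most 127 × 25000 = 3,175,000 rows, while a file of 10 million rows would need 400 links.

```python
CHUNK = 25000     # records per output file, from the question
LINK_LIMIT = 127  # recalled limit on non-stream Transformer links (unverified)


def links_needed(total_rows, chunk=CHUNK):
    """Output links (one per piece) needed for total_rows rows."""
    return (total_rows + chunk - 1) // chunk  # integer ceiling division


def fits_in_one_transformer(total_rows, chunk=CHUNK, limit=LINK_LIMIT):
    """True if every piece could get its own output link under the limit."""
    return links_needed(total_rows, chunk) <= limit


if __name__ == "__main__":
    for rows in (1_000_000, 3_175_000, 10_000_000):
        print(rows, links_needed(rows), fits_in_one_transformer(rows))
```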