Splitting Huge targetfile into small file

mtechnocrat
Participant
Posts: 38
Joined: Sat Feb 28, 2004 12:11 pm

Splitting Huge targetfile into small file

Post by mtechnocrat »

Hi All,
I have one huge file named price.txt. I need to split it into files of 25000 records each. The output files should be named price_timestamp.txt. I have tried the UNIX split command, but it does not meet my requirement. Can anyone suggest how to achieve this?

Thanks in advance,
hari
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I would not use DataStage to do this, but would use the UNIX split command, which is quite efficient and does exactly what you want.
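
For reference, a minimal sketch of the split approach described above. The scratch directory and the seq-generated price.txt are stand-ins for the real data, and the price_ output prefix is an assumption, not something given in the original question.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Demo data: 60000 one-line records standing in for the real price.txt.
seq 1 60000 > price.txt

# -l 25000 puts at most 25000 lines in each piece.
# The last argument is the output prefix: pieces are named price_aa, price_ab, ...
split -l 25000 price.txt price_

# Show the line count of each piece.
wc -l price_*
```

With 60000 input lines this produces three pieces of 25000, 25000, and 10000 lines. The pieces still carry the aa/ab/ac suffixes, so a rename step is needed if a timestamp has to appear in the name.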
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

The split command has an awkward syntax. It can split a file by line count or by byte size, if I remember correctly. It names the output files oddly as well: with a prefix of file, the names end up like fileaa, fileab, fileac. You can control the names a little, but again the syntax is hard to understand.

Once you have figured out how it works, you will need to use mv to rename the output files to your own naming convention.
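
A rough sketch of that split-then-rename step follows. The timestamp format (YYYYMMDDHHMMSS), the price_part_ prefix, and the trailing sequence number are all assumptions made for illustration; the sequence number is there only because every piece would otherwise get the same timestamp and the names would collide.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Demo data standing in for the real price.txt.
seq 1 60000 > price.txt

# Split first, using a temporary prefix that is easy to match afterwards.
split -l 25000 price.txt price_part_

# Assumed timestamp format: YYYYMMDDHHMMSS.
stamp=$(date +%Y%m%d%H%M%S)

# Rename each piece; the glob expands in alphabetical order (aa, ab, ac, ...).
n=1
for piece in price_part_*; do
    mv "$piece" "price_${stamp}_${n}.txt"
    n=$((n + 1))
done

ls price_*.txt
```

If each file really must be named with nothing but a timestamp, the pieces would have to be created at different times or with a finer-grained clock, which is a separate design question.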
Mamu Kim
adams06
Participant
Posts: 92
Joined: Sun Mar 12, 2006 3:00 pm

Re: Splitting Huge targetfile into small file

Post by adams06 »

Try using the Mod function.

MOD(KeyMgtGetNextValue('your job name'),25000)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There is no reason to use any KeyMgt calls; they would slow the job down immensely. The modulo will give you a group number, but it won't split the output into files automatically. By far the easiest approach on UNIX is split. If it must be done in a DataStage job, I would put the logic into some short BASIC code, since a server job needs all of its output sequential file links defined at design time, i.e. you would need to allow for a maximum number of splits and code that many links into the job.
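
To illustrate the row-counter arithmetic outside DataStage, here is a hedged sketch in awk rather than DS BASIC. It is not the BASIC routine suggested above, only the same idea: derive a chunk number from the running row number and switch output files when it changes. The price_chunk_NNN.txt naming is hypothetical.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Demo data standing in for the real price.txt.
seq 1 60000 > price.txt

awk -v size=25000 '
{
    # Integer division of the row number gives the chunk:
    # rows 1..25000 -> 1, rows 25001..50000 -> 2, and so on.
    chunk = int((NR - 1) / size) + 1
    out = sprintf("price_chunk_%03d.txt", chunk)

    # When the chunk changes, close the previous file before moving on.
    if (out != prev) {
        if (prev != "") close(prev)
        prev = out
    }
    print > out
}' price.txt

wc -l price_chunk_*.txt
```

Because the file name is computed per row, the number of output files does not have to be known in advance, which is the part a fixed set of job links cannot do.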
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Mod(@INROWNUM,25000) would be very efficient, but how many output links would you need? This could become unwieldy for millions of rows. And, indeed, there is a limit on the number of links that you can attach to a Transformer stage (stream input plus 127 others, if I recall correctly).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.