Multiple files based on number

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Multiple files based on number

Post by dodda »

Hello

I have a requirement where, let's say, I have a flat file with 200 records, and after doing some transformations I need to produce one file for every 50 records. So I would have to produce 4 files with 50 records each. If the input file has 60 records, I need to produce 2 files: one with 50 records and a second with 10. Is there a way this can be done through DataStage?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes. Create sufficient output links to handle the worst-case scenario, and filter on row number (ideally the row number from the original source). Run in sequential mode.
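For illustration (not spelled out in the post itself): with four output links off a Transformer running in sequential mode, the link constraints could use the @INROWNUM system variable to route each band of 50 rows. The link names are hypothetical:

```
Link_1: @INROWNUM <= 50
Link_2: @INROWNUM > 50 And @INROWNUM <= 100
Link_3: @INROWNUM > 100 And @INROWNUM <= 150
Link_4: @INROWNUM > 150
```

Links past the end of the data simply receive no rows, e.g. a 60-record input fills only the first two.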
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Pagadrai
Participant
Posts: 111
Joined: Fri Dec 31, 2004 1:16 am
Location: Chennai

Re: Multiple files based on number

Post by Pagadrai »

Hi,
If predicting the number of branches you might need is tough, you can try this: create a Wrapped stage (a type of Custom stage) that calls a Unix script to partition the data and write it to multiple files.

This is just an idea. I will also try this for learning purposes and post the result.
dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

Hi

Thanks for your response. Yes, the input file might have any number of records, but I need to produce a file for every 50 records. I have never created custom stages before. Is there a way other than creating custom stages?

Thanks
Pagadrai
Participant
Posts: 111
Joined: Fri Dec 31, 2004 1:16 am
Location: Chennai

Post by Pagadrai »

dodda wrote: Hi

Thanks for your response. Yes, the input file might have any number of records, but I need to produce a file for every 50 records. I have never created custom stages before. Is there a way other than creating custom stages?

Thanks
Hi,
Once you have the Unix script for the purpose, implementing it is not difficult. Or, instead of a stage, you can land the data in an intermediate sequential file and call the script once the job is complete.
dodda
Premium Member
Posts: 244
Joined: Tue May 29, 2007 11:31 am

Post by dodda »

OK

Thanks for your help
verify
Premium Member
Posts: 99
Joined: Sun Mar 30, 2008 8:35 am

Post by verify »

After doing the transformations for the entire set of records, load the records into a sequential file, then call your script through an after-job routine that will split the file into chunks of 50 records.
RK Raju
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you go that route, the UNIX "split" command can be used to chunk the full file into smaller files, and then you may need to loop through the results and rename the files, unless you can live with the naming convention the command uses.
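A minimal sketch of that approach; the landed file name `output.txt`, the `chunk_` prefix, and the `part_N.txt` naming are all assumptions for illustration:

```shell
#!/bin/sh
# Sample input standing in for the job's landed output: 120 one-line records.
seq 1 120 > output.txt

# Split into pieces of at most 50 lines each; by default split names
# the pieces with alphabetic suffixes: chunk_aa, chunk_ab, chunk_ac, ...
split -l 50 output.txt chunk_

# Rename the pieces to a friendlier numbered convention: part_1.txt, part_2.txt, ...
# The glob expands in sorted order, so the numbering preserves record order.
n=1
for f in chunk_*; do
    mv "$f" "part_${n}.txt"
    n=$((n + 1))
done
```

For 120 input records this yields three files: two of 50 lines and one of 20.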
-craig

"You can never have too many knives" -- Logan Nine Fingers
wahi80
Participant
Posts: 214
Joined: Thu Feb 07, 2008 4:37 pm

Post by wahi80 »

That's right. Just use the csplit command in Unix and you will be able to achieve your objective.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Actually, just the "split" command since the requirement is by row count. There's no need for the "context split" capability (based on file contents) that the csplit command brings to the table.
-craig

"You can never have too many knives" -- Logan Nine Fingers