Routine to split files
Moderators: chulett, rschirm, roy
Hello,
I need to create a DataStage routine that splits my input file into as many output files as there are ID values (a parameter) in the first file.
Is it possible to do that with a routine?
Thank you.
You can do it in a routine, or better yet in a transform stage - but could you post an example of what you want to do, i.e. how that parameter value is defined on the first line and how to split your file.
Thanks for your answer,
For example:
My input file:
NAME|SURNAME|ID_SESSION|...
PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2
In the end, I want 2 files. The first:
PAUL|BOB|TEST
PIT|BRAD|TEST
And the second:
BILLY|JOHN|TEST2
I don't know how to do this with the Transformer; I think I need to write a routine to split my files and loop in my job...
How many output files (ID_SESSION) do you have and are the values for the filenames known beforehand? If the number is limited and the values known, then I would use a transform stage with constraints, one output link per output file.
[Edit] - redacted some advice - missed the fact it was a Server job!
To be quite truthful, if there's no data manipulation being done, this could be done much more efficiently by a UNIX shell script that sorted the data based on the ID_SESSION then split it into its component files. If I had a choice, that's what I'd do. You could still execute the shell script from a job sequence to put it into a job stream.
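A minimal sketch of that shell-script approach, using the sample rows from earlier in the thread (the input.txt and split_* file names are illustrative, not from the original posts):

```shell
#!/bin/sh
# Sample input, matching the example earlier in the thread
cat > input.txt <<'EOF'
PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2
EOF

# Sort on the 3rd pipe-delimited field (ID_SESSION), then let awk
# route each line to a file named after that field's value.
sort -t'|' -k3,3 input.txt |
awk -F'|' '{ out = "split_" $3 ".txt"; print > out }'
```

With the sample above this produces split_TEST.txt (two rows) and split_TEST2.txt (one row). Sorting first isn't strictly required for awk, but it keeps the rows for each ID_SESSION grouped, which matches the "sort then split" idea described.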
asorrell wrote: DataStage doesn't have the ability to select output file names based on data values until Revision 9.1.
Other than the Folder stage in a Server job or Server Shared Container, but it too falls into the 'ugly' camp.
Oh, and technically the XML Output stage. Sorta.
[Edit] So noted! I modified my post... Always keeping me honest! - Andy
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Hi,
In addition to the folder stage in container proposed by Craig, you can use a custom PX routine to do the split (that's what I use successfully).
Check these threads:
viewtopic.php?t=145412
viewtopic.php?t=114531
Eric
If you really want to write a routine, you would pass it the columns; in the routine it would:
1. Open the appropriate file using OPENSEQ
2. Position to the end-of-file using SEEK
3. Write the line using WRITESEQ
4. Close the file using CLOSESEQ
If the files are big or performance is important, you could use a COMMON block in your code to store an array of file units pointing to the corresponding sequential files so that you don't need to OPEN/CLOSE them on each call.
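As a sketch only, those steps plus the COMMON idea might look something like the following DataStage Server (BASIC) routine. The routine name, the arguments Dir/Id/Line, and the 50-file cap are all illustrative assumptions, not something from this thread, and error handling is minimal:

```
      * Hypothetical routine SplitWrite(Dir, Id, Line): appends Line to
      * the file <Dir>/<Id>, caching open file variables in COMMON so
      * each file is opened only once across calls.
      Common /SplitFiles/ FileNames, FileCount, FileVars(50)

      If UnAssigned(FileCount) Then FileCount = 0 ; FileNames = '' End

      FileName = Dir : '/' : Id
      Locate FileName In FileNames Setting Pos Else
         * First time we see this Id: open the file and remember it
         FileCount = FileCount + 1
         Pos = FileCount
         FileNames<Pos> = FileName
         OpenSeq FileName To FileVars(Pos) Then
            Seek FileVars(Pos), 0, 2 Else Null  ;* append: go to end-of-file
         End Else
            * File does not exist yet - create it
            Create FileVars(Pos) Else
               Ans = 'Cannot create ' : FileName
               Return(Ans)
            End
         End
      End

      WriteSeq Line To FileVars(Pos) Then Ans = '' End Else Ans = 'Write failed: ' : FileName
      Return(Ans)
```

Note this never closes the files; with the COMMON cache the files stay open for the life of the job, which is the whole point of avoiding OPEN/CLOSE on every call.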
Well, I think the quickest and easiest way is to use an awk command.
You create your file with all the data, and after that run an awk command that reads every line and writes it to an output file.
The output file name contains the value of the id_session field of the line.
Something like this:
awk -F'|' '{ print $0 > "/directory/your_file_" $3 ".txt" }' your_file
I haven't tested this command line, but I used it a lot in the past. Feel free to test and adapt it.
PAULOM wrote: @ArndW, thanks for your response. Can you please explain step by step how to use this routine in my requirement?
No, I don't wish to do that; that would end up actually writing the BASIC routine for you. If you open the BASIC manual to the description of those statements and give it a go, you should be able to do it, and the people here will certainly assist you with any specific problems you run into.