
Routine to split files

Posted: Mon Nov 25, 2013 3:07 am
by PAULOM
Hello,

I have to create a DataStage routine that splits my input file into as many files as there are distinct IDs (a parameter) in that input file.

Is it possible to make that with a routine?

Thank you.

Posted: Mon Nov 25, 2013 4:42 am
by ArndW
You can do it in a routine, or better yet in a transform stage - but could you post an example of what you want to do, i.e. how that parameter value is defined on the first line and how to split your file.

Posted: Mon Nov 25, 2013 4:49 am
by PAULOM
Thanks for your answer,

for example :

My input file:

NAME|SURNAME|ID_SESSION|...

PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2


In the end, I want 2 files. The first:

PAUL|BOB|TEST
PIT|BRAD|TEST

And the second :

BILLY|JOHN|TEST2

I don't know how to do this with the Transformer. I think I have to write a routine to split my file and run a loop in my job...

Posted: Mon Nov 25, 2013 6:04 am
by ArndW
How many output files (ID_SESSION) do you have and are the values for the filenames known beforehand? If the number is limited and the values known, then I would use a transform stage with constraints, one output link per output file.

Posted: Mon Nov 25, 2013 6:43 am
by PAULOM
No, I don't know the number and the names of ID_SESSION...

Posted: Mon Nov 25, 2013 11:05 am
by asorrell
[Edit] - redacted some advice - missed the fact it was a Server job!

To be quite truthful, if there's no data manipulation being done, this could be done much more efficiently by a UNIX shell script that sorted the data based on the ID_SESSION then split it into its component files. If I had a choice, that's what I'd do. You could still execute the shell script from a job sequence to put it into a job stream.
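
A minimal sketch of that sort-then-split idea, assuming a pipe-delimited file with ID_SESSION in the third field and no header row. The temporary directory, the `split_` file name prefix and the variable names are hypothetical, and the sample rows are the ones posted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical locations: adjust to your environment.
workdir=$(mktemp -d)
infile="$workdir/input.txt"
outdir="$workdir/out"
mkdir -p "$outdir"

# Sample rows from the thread (no header line), deliberately unsorted.
printf '%s\n' 'PAUL|BOB|TEST' 'BILLY|JOHN|TEST2' 'PIT|BRAD|TEST' > "$infile"

# Sort on the third field so rows with the same ID_SESSION are adjacent,
# then start a new output file each time the ID changes.
# Row order inside each file follows sort's ordering, not the input order.
sort -t'|' -k3,3 "$infile" |
awk -F'|' -v dir="$outdir" '
    NR == 1 || $3 != prev {
        if (out != "") close(out)      # finished with the previous ID
        out = dir "/split_" $3 ".txt"
        prev = $3
    }
    { print > out }
'
```

Because the data is sorted first, only one output file is open at any time, so the number of distinct ID_SESSION values does not matter. A script like this could be called from a job sequence, as suggested above.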

Posted: Mon Nov 25, 2013 11:11 am
by chulett
asorrell wrote: DataStage doesn't have the ability to select output file names based on data values until Revision 9.1.
Other than the Folder stage in a Server job or Server Shared Container but it too falls into the 'ugly' camp.

Oh, and technically the XML Output stage. Sorta. :wink:

[Edit] So noted! I modified my post... Always keeping me honest! :-) - Andy

Posted: Tue Nov 26, 2013 4:21 am
by PAULOM
Hello,

Thank you for your answers. I am not going to follow your recommendations because I don't feel technically able to implement them.

I know that it is possible to do it with a routine plus a loop in the job, and that is the point I wanted to know more about.

Thanks to you.

Posted: Tue Nov 26, 2013 4:37 am
by eph
Hi,

In addition to the folder stage in container proposed by Craig, you can use a custom PX routine to do the split (that's what I use successfully).

Check those threads :
viewtopic.php?t=145412
viewtopic.php?t=114531

Eric

Posted: Tue Nov 26, 2013 4:46 am
by ArndW
If you really want to make a routine you would pass it the columns, in the routine it would:

1. Open the appropriate file using OPENSEQ
2. Position to the end-of-file using SEEK
3. Write the line using WRITESEQ
4. Close the file using CLOSESEQ

If the files are big or performance is important, you could use a COMMON block in your code to store an array of file units pointing to the corresponding sequential files so that you don't need to OPEN/CLOSE them on each call.
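
To make the logic of the four numbered steps concrete, here is a shell analogue (not DataStage BASIC, and not the routine itself). Each call opens the file for the given ID_SESSION in append mode, writes one row and closes the file again. The function name `append_row` and the output directory are hypothetical, and the sketch does not show the COMMON block optimisation.

```shell
#!/bin/sh
outdir=$(mktemp -d)    # hypothetical output directory

# append_row NAME SURNAME ID_SESSION
# '>>' opens the target in append mode (created if missing, writes go to
# end-of-file); the file is closed again as soon as printf has finished.
append_row() {
    printf '%s|%s|%s\n' "$1" "$2" "$3" >> "$outdir/$3.txt"
}

# One call per input row, the way a routine would be called once per row.
append_row PAUL  BOB  TEST
append_row BILLY JOHN TEST2
append_row PIT   BRAD TEST
```

Because every write appends, a second run adds rows to the files left by the first one. The same applies to a routine that positions to end-of-file before writing, so the output files need to be cleared before each run.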

Posted: Tue Nov 26, 2013 11:57 am
by PAULOM
@eph, sorry, I'm on DataStage Server 8.5 and not PX.
@ArndW, thanks for your response. Can you please explain step by step how to use this routine for my requirement?

Thanks and Regards,

Posted: Tue Nov 26, 2013 12:08 pm
by Mike
Do you really need a routine? I find Craig's suggestion of using the Folder stage to be the easiest implementation...

Mike

Posted: Tue Nov 26, 2013 3:07 pm
by bart12872
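Well, I think the quickest and easiest way is to use an awk command.

You create your file with all the data. After that, run an awk command that reads all the lines and writes them to output files.
The output file name contains a parameter, which is the value of the id_session of the line.

Something like this (the redirection has to be inside the awk program so that $3 is the third field of the current line; your_input_file is a placeholder for the file with all the data):
awk -F"|" '{ print $0 > ("/directory/your_file_" $3 ".txt") }' /directory/your_input_file

I haven't tested this exact command line, but I have used this approach a lot in the past. Feel free to test and adapt it.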
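
A runnable form of this awk idea, tried against the sample rows from earlier in the thread. It assumes ID_SESSION is the third field and that the first line is a header to be skipped. The temporary directory stands in for /directory, and the input file name `all_data.txt` is hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)                 # stands in for /directory
infile="$dir/all_data.txt"       # hypothetical input file name

cat > "$infile" <<'EOF'
NAME|SURNAME|ID_SESSION
PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2
EOF

# NR > 1 skips the header line; NF >= 3 ignores blank or short lines.
# The output name is built inside awk from $3. '>>' appends and close()
# releases the file after each row, so the number of distinct IDs is not
# limited by how many files awk can keep open at once.
awk -F'|' -v dir="$dir" 'NR > 1 && NF >= 3 {
    out = dir "/your_file_" $3 ".txt"
    print >> out
    close(out)
}' "$infile"
```

Opening and closing the file for every row is slower than keeping it open, but it needs no prior sort and keeps the rows in their original order. Since it appends, remove old output files before rerunning.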

Posted: Wed Nov 27, 2013 5:13 am
by ArndW
PAULOM wrote: @ArndW, thanks for your response, Can u please explain step by step how to use this routine in my requirement.
No, I don't wish to do that. That would end up actually writing the BASIC routine and I think if you open the BASIC manual for the description of those routines and give it a go you should be able to do it, and the people here would certainly assist you in specific problems that you might have.

Posted: Mon Dec 02, 2013 10:46 am
by chulett
I noted that the Folder stage was a bit of a pain to use as a target because you need to send the entire output payload to it at once in a single record. Is that what you ended up doing? Hard for me to tell from what you posted.