Routine to split files

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

PAULOM
Participant
Posts: 33
Joined: Thu Jul 11, 2013 2:03 am


Post by PAULOM »

Hello,

I have to create a DataStage routine that splits my input file into as many files as there are distinct ID values (the parameter) in the input file.

Is it possible to make that with a routine?

Thank you.
Last edited by PAULOM on Tue Nov 26, 2013 11:41 am, edited 1 time in total.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You can do it in a routine, or better yet in a transform stage - but could you post an example of what you want to do, i.e. how that parameter value is defined on the first line and how to split your file.
PAULOM

Post by PAULOM »

Thanks for your answer.

For example, my input file:

NAME|SURNAME|ID_SESSION|...

PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2


In the end, I want 2 files, the first containing:

PAUL|BOB|TEST
PIT|BRAD|TEST

And the second:

BILLY|JOHN|TEST2

I don't know how to do this with the Transformer. I think I need to write a routine to split my file and call it in a loop in my job...
ArndW

Post by ArndW »

How many output files (ID_SESSION) do you have and are the values for the filenames known beforehand? If the number is limited and the values known, then I would use a transform stage with constraints, one output link per output file.
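A minimal Python sketch of the constraint idea (an illustration, not DataStage code): each known ID_SESSION value plays the role of one output link with a constraint such as ID_SESSION = 'TEST', and anything else goes to a reject bucket. The KNOWN_SESSIONS values and the route_rows name are hypothetical, and ID_SESSION is assumed to be the third pipe-delimited column, as in the example above.

```python
from typing import Dict, List

# Hypothetical ID_SESSION values known at design time; each one stands in
# for a Transformer output link with a constraint such as ID_SESSION = 'TEST'.
KNOWN_SESSIONS = ("TEST", "TEST2")


def route_rows(lines: List[str]) -> Dict[str, List[str]]:
    """Route each pipe-delimited row to the bucket whose constraint it matches.

    Rows whose ID_SESSION (third column) matches no known value, or that have
    fewer than three columns, go to a 'REJECT' bucket, like a reject link.
    """
    links: Dict[str, List[str]] = {sid: [] for sid in KNOWN_SESSIONS}
    links["REJECT"] = []
    for line in lines:
        fields = line.split("|")
        session_id = fields[2] if len(fields) > 2 else ""
        target = session_id if session_id in KNOWN_SESSIONS else "REJECT"
        links[target].append(line)
    return links


if __name__ == "__main__":
    rows = ["PAUL|BOB|TEST", "PIT|BRAD|TEST", "BILLY|JOHN|TEST2"]
    for link, routed in route_rows(rows).items():
        print(link, routed)
```

Every new ID_SESSION value means another link (another entry in the tuple here), which is why this only fits when the values are known in advance.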
PAULOM

Post by PAULOM »

No, I don't know the number or the names of the ID_SESSION values...
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

[Edit] - redacted some advice - missed the fact it was a Server job!

To be quite truthful, if there's no data manipulation being done, this could be done much more efficiently by a UNIX shell script that sorted the data based on the ID_SESSION then split it into its component files. If I had a choice, that's what I'd do. You could still execute the shell script from a job sequence to put it into a job stream.
Last edited by asorrell on Tue Nov 26, 2013 2:16 pm, edited 4 times in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
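The sort-then-split approach Andy describes could look roughly like this. It is a sketch in Python rather than a UNIX shell script, and it assumes the layout from PAULOM's example: a pipe-delimited file with one header line, possibly blank lines, and ID_SESSION in the third column. The split_by_session function and the split_<ID>.txt output names are hypothetical.

```python
import itertools
import os
import tempfile
from typing import Dict


def split_by_session(input_path: str, output_dir: str) -> Dict[str, str]:
    """Sort rows on ID_SESSION, then write each group to its own file.

    Assumes a pipe-delimited file with one header line and ID_SESSION in the
    third column; blank lines are skipped. Returns {ID_SESSION: output path}.
    """
    with open(input_path, encoding="utf-8") as src:
        next(src, None)  # skip the header line (NAME|SURNAME|ID_SESSION|...)
        rows = [line.rstrip("\n") for line in src if line.strip()]

    def session_of(row: str) -> str:
        return row.split("|")[2]

    written: Dict[str, str] = {}
    # groupby only merges adjacent rows, hence the sort; sorted() is stable,
    # so rows keep their original order within each ID_SESSION.
    for session_id, group in itertools.groupby(sorted(rows, key=session_of), key=session_of):
        out_path = os.path.join(output_dir, f"split_{session_id}.txt")  # hypothetical naming
        with open(out_path, "w", encoding="utf-8") as dst:
            for row in group:
                dst.write(row + "\n")
        written[session_id] = out_path
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "input.txt")
        with open(src_path, "w", encoding="utf-8") as f:
            f.write("NAME|SURNAME|ID_SESSION\n\nPAUL|BOB|TEST\nPIT|BRAD|TEST\nBILLY|JOHN|TEST2\n")
        for session_id, path in split_by_session(src_path, tmp).items():
            print(session_id, "->", os.path.basename(path))
```

Sorting first means each output file is opened exactly once, however many distinct ID_SESSION values there are.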
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

asorrell wrote: DataStage doesn't have the ability to select output file names based on data values until Revision 9.1.
Other than the Folder stage in a Server job or Server Shared Container but it too falls into the 'ugly' camp.

Oh, and technically the XML Output stage. Sorta. :wink:

[Edit] So noted! I modified my post... Always keeping me honest! :-) - Andy
-craig

"You can never have too many knives" -- Logan Nine Fingers
PAULOM

Post by PAULOM »

Hello,

Thank you for your answers. I am not going to follow your recommendations because I don't feel technically able to implement them.

I know it is possible to do this with a routine plus a loop in the job (and that is the point I wanted to learn more about).

Thank you.
eph
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

In addition to the Folder stage in a container proposed by Craig, you can use a custom PX routine to do the split (that's what I use, successfully).

Check these threads:
viewtopic.php?t=145412
viewtopic.php?t=114531

Eric
ArndW

Post by ArndW »

If you really want to write a routine, you would pass it the columns, and the routine would:

1. Open the appropriate file using OPENSEQ
2. Position to the end-of-file using SEEK
3. Write the line using WRITESEQ
4. Close the file using CLOSESEQ
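As a rough analogue of those four steps, here is a Python sketch of a per-call routine, with Python file I/O standing in for the DataStage BASIC statements (it is not a translation of them). The append_row name and the split_<ID>.txt naming scheme are hypothetical.

```python
import os
import tempfile


def append_row(output_dir: str, session_id: str, row: str) -> str:
    """Append one row to the file for its ID_SESSION, opening and closing it each call.

    Mirrors the four steps above in spirit; the split_<ID>.txt name is hypothetical.
    """
    path = os.path.join(output_dir, f"split_{session_id}.txt")
    # 1. Open the file for this ID_SESSION (created if it does not exist yet).
    with open(path, "a", encoding="utf-8") as f:
        # 2. Position at end-of-file (append mode already writes there; shown for parity).
        f.seek(0, os.SEEK_END)
        # 3. Write the line.
        f.write(row + "\n")
    # 4. Leaving the with-block closes the file.
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for line in ["PAUL|BOB|TEST", "PIT|BRAD|TEST", "BILLY|JOHN|TEST2"]:
            append_row(tmp, line.split("|")[2], line)
        print(sorted(os.listdir(tmp)))
```

Because each call is independent, it can be invoked once per input row without keeping any state, at the cost of an open and close per row.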

If the files are big or performance is important, you could use a COMMON block in your code to store an array of file units pointing to the corresponding sequential files so that you don't need to OPEN/CLOSE them on each call.
PAULOM
Participant
Posts: 33
Joined: Thu Jul 11, 2013 2:03 am

Post by PAULOM »

@eph, sorry, I'm on DataStage Server 8.5, not PX.
@ArndW, thanks for your response. Can you please explain step by step how to use this routine for my requirement?

Thanks and Regards,
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Do you really need a routine? I find Craig's suggestion of using the Folder stage to be the easiest implementation...

Mike
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

Post by bart12872 »

Well, I think the quickest and easiest way is to use an awk command.

You create your file with all the data, then run an awk command that reads every line and writes it to an output file whose name contains the ID_SESSION value of that line.

Something like this (the input path is a placeholder; NR > 1 skips the header line and NF skips blank lines):
awk -F'|' 'NR > 1 && NF { print > ("/directory/your_file_" $3 ".txt") }' /directory/input_file.txt

I haven't tested this command line, but I have used it a lot in the past. Feel free to test and adapt it.
ArndW

Post by ArndW »

PAULOM wrote: @ArndW, thanks for your response. Can you please explain step by step how to use this routine for my requirement?
No, I don't wish to do that. That would end up actually writing the BASIC routine and I think if you open the BASIC manual for the description of those routines and give it a go you should be able to do it, and the people here would certainly assist you in specific problems that you might have.
chulett

Post by chulett »

I noted that the Folder stage was a bit of a pain to use as a target because you need to send the entire output payload to it at once in a single record. Is that what you ended up doing? Hard for me to tell from what you posted.
-craig
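To illustrate what "the entire output payload in a single record" means, this Python sketch collapses all rows for each ID_SESSION into one (file name, file content) record, the kind of row Craig describes sending to the Folder stage. The two-field record shape, the file naming, and the build_folder_payloads name are assumptions for illustration, not the Folder stage's documented interface.

```python
from typing import Dict, List, Tuple


def build_folder_payloads(rows: List[str]) -> List[Tuple[str, str]]:
    """Collapse all rows for each ID_SESSION into one (file name, content) record.

    ID_SESSION is assumed to be the third pipe-delimited column; the
    split_<ID>.txt name is hypothetical.
    """
    grouped: Dict[str, List[str]] = {}  # dicts keep first-seen order
    for row in rows:
        grouped.setdefault(row.split("|")[2], []).append(row)
    # One record per output file, carrying that file's whole content.
    return [(f"split_{sid}.txt", "\n".join(lines) + "\n") for sid, lines in grouped.items()]


if __name__ == "__main__":
    for name, content in build_folder_payloads(["PAUL|BOB|TEST", "PIT|BRAD|TEST", "BILLY|JOHN|TEST2"]):
        print(repr(name), repr(content))
```

The awkward part Craig points at is visible here: every row for a given ID_SESSION has to be gathered before that file's single record can be emitted.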
