Routine to split files
Moderators: chulett, rschirm, roy
Hello,
I need to create a DataStage routine that splits my input file into as many output files as there are ID values (a parameter) in the first file.
Is it possible to do that with a routine?
Thank you.
You can do it in a routine, or better yet in a transform stage - but could you post an example of what you want to do, i.e. how that parameter value is defined on the first line and how to split your file.
Thanks for your answer,
For example:
My input file:
NAME|SURNAME|ID_SESSION|...
PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2
In the end, I want 2 files. The first:
PAUL|BOB|TEST
PIT|BRAD|TEST
And the second:
BILLY|JOHN|TEST2
I don't know how to do this with the Transformer; I think I need to write a routine to split my files and loop in my job...
How many output files (ID_SESSION) do you have and are the values for the filenames known beforehand? If the number is limited and the values known, then I would use a transform stage with constraints, one output link per output file.
[Edit] - redacted some advice - missed the fact it was a Server job!
To be quite truthful, if there's no data manipulation being done, this could be done much more efficiently by a UNIX shell script that sorted the data based on the ID_SESSION then split it into its component files. If I had a choice, that's what I'd do. You could still execute the shell script from a job sequence to put it into a job stream.
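A minimal sketch of that shell-script approach, using the sample rows from earlier in the thread (the input.txt and split_* file names are illustrative, not from the original posts):

```shell
#!/bin/sh
# Sample input, matching the example earlier in the thread
cat > input.txt <<'EOF'
PAUL|BOB|TEST
PIT|BRAD|TEST
BILLY|JOHN|TEST2
EOF

# Sort on the 3rd pipe-delimited field (ID_SESSION), then let awk
# route each line to a file named after that field's value.
sort -t'|' -k3,3 input.txt |
awk -F'|' '{ out = "split_" $3 ".txt"; print > out }'
```

With the sample above this produces split_TEST.txt (two rows) and split_TEST2.txt (one row). Sorting first isn't strictly required for awk, but it keeps the rows for each ID_SESSION grouped, which matches the "sort then split" idea described.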
asorrell wrote: DataStage doesn't have the ability to select output file names based on data values until Revision 9.1.
Other than the Folder stage in a Server job or Server Shared Container, but it too falls into the 'ugly' camp.
Oh, and technically the XML Output stage. Sorta.
[Edit] So noted! I modified my post... Always keeping me honest! - Andy
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Hi,
In addition to the folder stage in container proposed by Craig, you can use a custom PX routine to do the split (that's what I use successfully).
Check these threads:
viewtopic.php?t=145412
viewtopic.php?t=114531
Eric
If you really want to write a routine, you would pass it the columns; in the routine it would:
1. Open the appropriate file using OPENSEQ
2. Position to the end-of-file using SEEK
3. Write the line using WRITESEQ
4. Close the file using CLOSESEQ
If the files are big or performance is important, you could use a COMMON block in your code to store an array of file units pointing to the corresponding sequential files so that you don't need to OPEN/CLOSE them on each call.
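As a sketch only, those steps plus the COMMON idea might look something like the following DataStage Server (BASIC) routine. The routine name, the arguments Dir/Id/Line, and the 50-file cap are all illustrative assumptions, not something from this thread, and error handling is minimal:

```
      * Hypothetical routine SplitWrite(Dir, Id, Line): appends Line to
      * the file <Dir>/<Id>, caching open file variables in COMMON so
      * each file is opened only once across calls.
      Common /SplitFiles/ FileNames, FileCount, FileVars(50)

      If UnAssigned(FileCount) Then FileCount = 0 ; FileNames = '' End

      FileName = Dir : '/' : Id
      Locate FileName In FileNames Setting Pos Else
         * First time we see this Id: open the file and remember it
         FileCount = FileCount + 1
         Pos = FileCount
         FileNames<Pos> = FileName
         OpenSeq FileName To FileVars(Pos) Then
            Seek FileVars(Pos), 0, 2 Else Null  ;* append: go to end-of-file
         End Else
            * File does not exist yet - create it
            Create FileVars(Pos) Else
               Ans = 'Cannot create ' : FileName
               Return(Ans)
            End
         End
      End

      WriteSeq Line To FileVars(Pos) Then Ans = '' End Else Ans = 'Write failed: ' : FileName
      Return(Ans)
```

Note this never closes the files; with the COMMON cache the files stay open for the life of the job, which is the whole point of avoiding OPEN/CLOSE on every call.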
Well, I think the quickest and easiest way is to use an awk command.
You create your file with all the data, and after that run an awk command that reads every line and writes it to an output file.
The output file name contains the value of the id_session field of the line.
Something like this:
awk -F'|' '{ print $0 > "/directory/your_file_" $3 ".txt" }' your_file
I haven't tested this command line, but I used it a lot in the past. Feel free to test and adapt it.
PAULOM wrote: @ArndW, thanks for your response. Can you please explain step by step how to use this routine in my requirement?
No, I don't wish to do that; that would end up actually writing the BASIC routine for you. If you open the BASIC manual to the description of those statements and give it a go, you should be able to do it, and the people here will certainly assist you with any specific problems you run into.