How to split data into files?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 37
- Joined: Thu Nov 25, 2004 8:44 pm
- Location: Bangalore, Karnataka, India
Hi
I have an input file that has to be split into different files depending on the value of the fifth field. The problem is that
the number of files is determined by another sequential file, which contains the values that the fifth field can take.
The sequential file's contents are as follows:
account
address
registration
So, if the fifth field is 'account', the row should go into account.txt.
If it is 'address', the row should go into address.txt.
The data in the sequential file can change.
I'm in a hurry.
Please help!!!
Regards,
Vivek RS
Vivek,
Aha, the plot thickens :D That is a bit more difficult, especially dynamically in a single pass. Offhand, I think you might need to do 2 passes on your main data, with one pass to get a complete list of possible values. I just thought about using PX's built-in partitioning for datasets; I recall that you can use your own (C-language) formula, so you might be able to do a dynamic allocation. Hopefully someone on this forum has done that or knows how...
Vivek,
As far as I can tell, it is impossible to do in DataStage, because each output file you write to has to be determined at job development time. One thing I can suggest is doing it with DOS batch files. Please try that and let us know the result.
Vignesh.
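For anyone who goes the scripting route outside DataStage, here is a minimal sketch of the idea in Python rather than a DOS batch file. It is not a DataStage feature and not anything posted in this thread; the function name `split_by_field`, the comma delimiter, and the `.txt` naming are assumptions made for illustration. It reads the list of allowed values from the control file, then routes each row in a single pass, opening an output file the first time a value is seen.

```python
import csv
from pathlib import Path


def split_by_field(input_path, values_path, out_dir, field_index=4, delimiter=","):
    """Route each row to <value>.txt based on one field; return row counts per value.

    Assumptions (not from the thread): delimited text input, and the fifth
    field is at zero-based index 4.
    """
    # The control file lists one allowed value per line.
    allowed = {
        line.strip()
        for line in Path(values_path).read_text().splitlines()
        if line.strip()
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    handles, writers, counts = {}, {}, {}
    try:
        with open(input_path, newline="") as src:
            for row in csv.reader(src, delimiter=delimiter):
                if len(row) <= field_index:
                    continue  # row too short to have the split field
                key = row[field_index]
                if key not in allowed:
                    continue  # value not listed in the control file
                if key not in writers:
                    # Open the target lazily, the first time this value appears.
                    handles[key] = open(out_dir / f"{key}.txt", "w", newline="")
                    writers[key] = csv.writer(handles[key], delimiter=delimiter)
                writers[key].writerow(row)
                counts[key] = counts.get(key, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Because the output files are opened on demand, nothing in the script has to change when the control file gains or loses values.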
"A conclusion is simply the place where you got tired of thinking."
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Use any normal method (such as an aggregator) to identify the distinct file values.
Use the 'Multiple Instance' property to create multiple files from the same source file, with a parameter for the target file. This parameter can take its value from the distinct values mentioned above.
As you may be using version 7.5 or later, you can use the looping mechanism in the sequencer for this purpose.
But note that Windows may lock the file if multiple processes access it at the same time.
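A rough sketch of the driver side of this suggestion, written in Python: it reads the distinct values and builds one `dsjob` command per value, each targeting its own invocation of a multi-instance job. The project name `MyProject`, job name `SplitByType`, and parameter name `FileValue` are hypothetical, and the `dsjob` flags are written as I recall them, so check them against the documentation for your version. The sketch only builds the command lists; it does not run anything.

```python
from pathlib import Path


def build_dsjob_commands(values_path, project="MyProject", job="SplitByType",
                         param="FileValue"):
    """Return one dsjob command (as an argument list) per distinct value.

    The project, job and parameter names are hypothetical placeholders.
    """
    values = []
    for line in Path(values_path).read_text().splitlines():
        value = line.strip()
        if value and value not in values:  # keep first-seen order, skip repeats
            values.append(value)

    # "<job>.<invocation id>" addresses one instance of a multi-instance job;
    # -jobstatus is assumed to make dsjob wait for the job to finish.
    return [
        ["dsjob", "-run", "-jobstatus", "-param", f"{param}={value}",
         project, f"{job}.{value}"]
        for value in values
    ]
```

Running the resulting commands one after another (for example with `subprocess.run`) rather than in parallel would sidestep the Windows file-locking concern, at the cost of one full read of the source file per value.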
-
- Participant
- Posts: 36
- Joined: Wed Feb 16, 2005 5:20 pm
- Location: IL
File
In case you are using a Transformer, generalise it with a parameter so that it writes to a specified file.
vivek_rs wrote:
This seems to be a good idea.
I'm using 7.1. So, I'll have to write a Job Control that reads a sequential file and calls multiple instances of the job to extract different segments into different files.
Can anyone think of anything better?
TIA
Let me know if I am wrong.
This does not address the "unknown number of files" condition.
mujeebur wrote:
Sort the file and pass it to a Transformer stage. Use a stage variable to compare each row with the previous one; if the value changes, write to a different file, otherwise write to the same file. Likewise, you may have accounts.txt, address.txt, etc. by using the constraints mechanism.
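For comparison, the sort-and-detect-change idea can be sketched outside DataStage, where the number of output files does not have to be fixed in advance. This is a Python illustration only, not a Transformer implementation; the function name `split_sorted`, the comma delimiter, and the `.txt` naming are assumptions. After sorting on the fifth field, each change of value starts a new output file.

```python
import csv
from itertools import groupby
from operator import itemgetter
from pathlib import Path


def split_sorted(input_path, out_dir, field_index=4, delimiter=","):
    """Sort rows on one field, then write each run of equal values to <value>.txt.

    Returns the list of values written, in sorted order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(input_path, newline="") as src:
        rows = [r for r in csv.reader(src, delimiter=delimiter)
                if len(r) > field_index]

    # The sort is stable, so rows keep their original order within each value.
    rows.sort(key=itemgetter(field_index))

    written = []
    # groupby starts a new group whenever the key differs from the previous row,
    # which mirrors the "compare with previous row" stage-variable trick.
    for key, group in groupby(rows, key=itemgetter(field_index)):
        with open(out_dir / f"{key}.txt", "w", newline="") as out:
            csv.writer(out, delimiter=delimiter).writerows(group)
        written.append(key)
    return written
```

This version loads the whole input into memory to sort it, so it suits files of modest size; it also writes a file for every value found in the data, whether or not the control file lists it.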