creating many output files
-
- Participant
- Posts: 11
- Joined: Fri Jan 30, 2009 3:21 am
creating many output files
Hi,
I have one source file (a dataset) and I need to create many output files from it based on different conditions.
There are around 1000 conditions, and I need to create one output file for each condition.
I am thinking of creating 30 jobs, each of which will create 30 or 40 output files.
Is there any other way to do this?
Thanks & Regards,
Veera.
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
Re: creating many output files
The last time I had to do something like this I did it in a Routine in a Server Job. The client pushed back as everything was supposed to be Parallel Jobs, but we persuaded them that a Parallel Job was an inappropriate choice for this operation, and that a Server Job was an acceptable solution. Another option would be to do it in a shell script.
Phil Hibbs | Capgemini
Technical Consultant
-
- Participant
- Posts: 11
- Joined: Fri Jan 30, 2009 3:21 am
Re: creating many output files
Thanks for the replies, Phil and DSGuru. I had also suggested a Server job to my client, but the client did not accept it because the source data is 1.6 billion records. So I am planning to do it in a shell script. Thanks for the suggestion.
-
- Premium Member
- Posts: 258
- Joined: Tue Jul 04, 2006 10:35 pm
- Location: Toronto
Re: creating many output files
You can do this:
Code: Select all
dataset ----- transformer--------seqfile
In the transformer:
Code: Select all
If [condition 1] Then "echo ":inputlink.col1: inputlink.col2:inputlink.col3...:">/directorey/file1.txt"
else If[ condition 2] Then "echo ":inputlink.col1: inputlink.col2:inputlink.col3...:">/directorey/file2.txt"
.
.
.
.
.
.
else ''
Then, in the after-job subroutine, run ExecSH with the command sh < seqfile.txt
This should work ...
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
Re: creating many output files
samyamkrishna wrote:
Code: Select all
If [condition 1] Then "echo ":inputlink.col1: inputlink.col2:inputlink.col3...:">/directorey/file1.txt"
else If[ condition 2] Then "echo ":inputlink.col1: inputlink.col2:inputlink.col3...:">/directorey/file2.txt"
That should be >> not >
Phil Hibbs | Capgemini
Technical Consultant
-
- Premium Member
- Posts: 258
- Joined: Tue Jul 04, 2006 10:35 pm
- Location: Toronto
Re: creating many output files
Yes, sorry, my bad.
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
Re: creating many output files
...although the file will need to be deleted beforehand, unless you can detect the first occurrence of each rule and do a > on the first and a >> on each subsequent. Personally I wouldn't do it by generating a shell script like this, I'd just generate the CSV and then handle the splitting in a shell script - that way you aren't tying the DataStage build to a particular shell.
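For reference, a minimal sketch of the delete-then-append variant of the generated command file. The file names and rows are hypothetical stand-ins for what the transformer would write to seqfile.txt; the real commands would carry the job's own columns and paths.

```shell
#!/bin/sh
# Sketch only: paths and rows are hypothetical, not taken from the job.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A target file left over from a previous run.
echo "stale row" > file1.txt

# Stand-in for the generated command file: one echo per input row,
# appending (>>) so that earlier rows are not overwritten.
cat > seqfile.txt <<'EOF'
echo "a,1" >> file1.txt
echo "b,2" >> file2.txt
echo "c,3" >> file1.txt
EOF

# Delete the targets beforehand so stale rows do not survive the append.
rm -f file1.txt file2.txt

# Same invocation as suggested for the after-job subroutine.
sh < seqfile.txt

file1_content=$(cat file1.txt)
file2_content=$(cat file2.txt)
printf 'file1.txt:\n%s\nfile2.txt:\n%s\n' "$file1_content" "$file2_content"
```

With > instead of >>, file1.txt would end up holding only the last row written to it. Note also that any row containing shell metacharacters (quotes, $, backticks) would need escaping before being wrapped in an echo command.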
Phil Hibbs | Capgemini
Technical Consultant
-
- Participant
- Posts: 6
- Joined: Wed Mar 09, 2005 9:35 am
Re: creating many output files
If you need to do this in DataStage, you can simply use a job with a Filter stage after the file. You can have many output files from the Filter, and each one only needs a where clause like ColumnA = 123 (leave out the word "where"). The trick will be keeping track of which link number goes with each file. I would write them down one at a time as you place the links, starting with 0: the first link you attach will be 0, and then 1, 2, 3, ...
This should be an efficient way of doing it, and the data should be sorted on your key. You don't want to use a Transformer if there are no transformations.
-
- Premium Member
- Posts: 1044
- Joined: Wed Sep 29, 2004 3:30 am
- Location: Nottingham, UK
Re: creating many output files
netgurutoo wrote:If you need to do this in DataStage, you can simply use a job with a Filter stage after the file. You can have many output files from the Filter, and each one only needs a where clause like ColumnA = 123 (leave out the word "where"). The trick will be keeping track of which link number goes with each file. I would write them down one at a time as you place the links, starting with 0: the first link you attach will be 0, and then 1, 2, 3, ...
I think the question was to find out if there is a way of doing it without around 1000 output links in a job.
Phil Hibbs | Capgemini
Technical Consultant
You could use an External Target stage: pass it one column that starts with the fully qualified file name, followed by the rest of your columns (with whatever delimiter your target file needs).
I think you would also want to sort the incoming data by the first column, but I am not sure it's entirely necessary.
In the External Target stage, set the Target method to Specific program and use the following code as the destination program. This example uses a comma as the delimiter.
Code: Select all
awk '{nPosField1=index($0,",");print substr($0,nPosField1+1)>substr($0,1,nPosField1-1)}'
Or you could just do it in Unix directly.
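As a rough sketch of the same idea run directly in a shell, here is a variant that sorts on the first field and closes each output file when the name changes. Sorting may matter in practice because some awk implementations limit how many files can be open at once, and around 1000 targets could exceed that limit. The sample rows and the out/ directory are hypothetical.

```shell
#!/bin/sh
# Sketch only: sample rows and the out/ directory are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir out

# First field is the target file name, the rest is the record.
cat > input.csv <<'EOF'
out/file2.txt,b,2
out/file1.txt,a,1
out/file1.txt,c,3
EOF

# Sort on the first field so rows for one file are contiguous, then close
# each file when the name changes, keeping one output file open at a time.
sort -t, -k1,1 input.csv | awk '{
    pos = index($0, ",")              # position of the first delimiter
    target = substr($0, 1, pos - 1)   # file name from the first field
    if (target != prev) {
        if (prev != "") close(prev)
        prev = target
    }
    print substr($0, pos + 1) > target
}'

printf 'file1.txt:\n%s\nfile2.txt:\n%s\n' "$(cat out/file1.txt)" "$(cat out/file2.txt)"
```

Because each file is opened exactly once, > is safe here: awk truncates only when a file is first opened and keeps writing to it until close(). The sort may reorder rows that share a file name, so this assumes row order within a file does not matter.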