split file
Hi,
I have to split one file into 5 files.
Example file x:
1
1
2
2
3
3
4
4
4
4
I have to split x in such a way that:
1,1 goes to file 1
2,2 goes to file 2
3,3 goes to file 3
4,4,4,4 goes to file 4
Any ideas?
Thanks in advance.
-
- Premium Member
- Posts: 503
- Joined: Wed Jun 29, 2005 8:14 am
Re: split file
I think this is a simple one. My idea is to use the file as the source, then five Transformers with five files as targets. The Transformers would have filter conditions on the field value, e.g. if field.value = 1 then the row goes to Link1, and so on.
I am not aware of a simpler method than this.
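To make the routing idea concrete outside DataStage, here is a minimal Python sketch of "one constraint per output link". It only illustrates the logic and is not DataStage code; the function name `route_rows` and the link names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def route_rows(rows: Iterable[str],
               constraints: Dict[str, Callable[[str], bool]]) -> Dict[str, List[str]]:
    """Send each row to every link whose constraint is true for that row."""
    links: Dict[str, List[str]] = {name: [] for name in constraints}
    for row in rows:
        for name, accepts in constraints.items():
            if accepts(row):
                links[name].append(row)
    return links


# One constraint per output link (hypothetical names Link1..Link5).
constraints = {f"Link{v}": (lambda row, v=v: row == str(v)) for v in range(1, 6)}
rows = ["1", "1", "2", "2", "3", "3", "4", "4", "4", "4"]
links = route_rows(rows, constraints)
```

With the sample data from the question, the fifth link stays empty because no row carries the value 5.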
Re: split
Hi,
Can you try something like this?
Define two stage variables:
Code:
Stagevariable1: ToTX1.Inputcolumn
Stagevariable2: If ToTX1.Inputcolumn = Stagevariable1 Then @TRUE Else @FALSE
Then write a constraint in the Transformer, Stagevariable2 = @TRUE, for all five output streams.
Check this. I am not sure whether it works, but I am just giving you an option to try.
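Stage variables are evaluated in order for each row, so a comparison against the previous row's value has to happen before the variable holding that value is overwritten. The following Python sketch shows that ordering; it is an illustration of the logic only, and the name `flag_same_as_previous` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def flag_same_as_previous(values: Iterable[str]) -> List[Tuple[str, bool]]:
    """Pair each value with a flag that is True when it equals the previous row's value."""
    previous: Optional[str] = None  # plays the role of the "previous value" stage variable
    flagged: List[Tuple[str, bool]] = []
    for value in values:
        same = (value == previous)  # compare first ...
        previous = value            # ... then remember the current value
        flagged.append((value, same))
    return flagged


flags = flag_same_as_previous(["1", "1", "2", "2", "3"])
```

If the two steps inside the loop were swapped, the flag would be true for every row, which is why the order of the stage variables matters.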
There are 25000 records opp...
If you can access Unix from your Windows OS via Samba or some other software, then you can do this easily by writing a shell script. Something like this has been achieved here at DSXchange. Try searching for a relevant script.
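As a rough idea of what such a script does, here is a minimal sketch in Python rather than shell. It assumes one value per line and writes one output file per distinct value; the function name `split_by_value` and the `file_` prefix are hypothetical.

```python
from pathlib import Path
from typing import Dict, IO


def split_by_value(source: Path, out_dir: Path, prefix: str = "file_") -> Dict[str, int]:
    """Write each line of `source` to out_dir/<prefix><value>.txt; return row counts per value."""
    counts: Dict[str, int] = {}
    handles: Dict[str, IO[str]] = {}
    try:
        with source.open() as src:
            for line in src:
                value = line.strip()
                if not value:
                    continue  # skip blank lines
                if value not in handles:
                    # First time this value is seen: open its output file.
                    handles[value] = (out_dir / f"{prefix}{value}.txt").open("w")
                    counts[value] = 0
                handles[value].write(value + "\n")
                counts[value] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Calling `split_by_value(Path("x"), Path("."))` on the sample file would create `file_1.txt` through `file_4.txt`. The sketch keeps one open handle per distinct value, which is fine for a handful of groups but could hit the operating system's open-file limit with thousands.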
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
kumar_s wrote: It is best to write a BASIC routine if you are not sure about the number of groups of x you are about to get.
False.
It is easier to create a Transformer stage with that number of outputs, propagate the rows where the input value changes, and use stage variables to construct the output lines. The technique can be found by searching the forum for an exact match on "vertical pivot".
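For readers unfamiliar with the term, the following Python sketch shows the general shape of a vertical pivot: values are accumulated while the key stays the same, and one output line is emitted when the key changes. It illustrates the idea only, not the forum's DataStage technique; the name `vertical_pivot` is hypothetical and the input is assumed to be sorted or grouped by value.

```python
from typing import Iterable, Iterator, List, Optional


def vertical_pivot(values: Iterable[str], sep: str = ",") -> Iterator[str]:
    """Yield one joined line per run of equal consecutive values."""
    current_key: Optional[str] = None
    buffer: List[str] = []
    for value in values:
        if buffer and value != current_key:
            yield sep.join(buffer)  # value changed: emit the finished group
            buffer = []
        current_key = value
        buffer.append(value)
    if buffer:
        yield sep.join(buffer)      # flush the last group at end of input


lines = list(vertical_pivot(["1", "1", "2", "2", "3", "3", "4", "4", "4", "4"]))
```

Note the explicit flush after the loop: the last group never sees a "value changed" event, so it has to be emitted separately.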
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
The OP did say he only needed to create 5 output files. That is simple enough with the GUI to set up 5 output links with constraints.
Now if he had said the number of output files was unknown and could number in the hundreds or thousands, then I would think Kumar's suggestion would be a good option. It would get really cumbersome to do this with the GUI when the number of outputs gets higher than maybe a few dozen.
Mike
-
- Participant
- Posts: 232
- Joined: Fri Sep 30, 2005 4:52 am
- Contact:
use multi instance
I do not know why you need to split a file into so many files, but you can use a hashed file and a multi-instance job to achieve this. You need to build two jobs. The first job just reads the file, writes the distinct values of the first column into a hashed file, and then writes them to a text file. The second DataStage job is called by a Unix shell script: the script reads one value at a time and passes it to the second job as a parameter. The second job uses the parameter in a Transformer constraint or filter and writes the data to a file with the parameter in the file name, such as myfile_#parameter#.txt.
The Unix shell script is just a simple "for loop". All the job instances can run at the same time (I do not know the maximum number of instances that can run concurrently), so the performance should not be bad.
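The two-job flow described above can be sketched in Python as follows. This is an illustration of the control flow under stated assumptions, not DataStage or shell code: `distinct_values` stands in for the first job, `extract_one` for one instance of the second job, and the thread pool stands in for instances running side by side. All three function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def distinct_values(source: Path) -> List[str]:
    """'First job': collect the distinct values in order of first appearance."""
    with source.open() as src:
        return list(dict.fromkeys(line.strip() for line in src if line.strip()))


def extract_one(source: Path, out_dir: Path, value: str) -> Path:
    """'Second job': keep only rows equal to `value`; the file is named after the parameter."""
    target = out_dir / f"myfile_{value}.txt"
    with source.open() as src, target.open("w") as out:
        for line in src:
            if line.strip() == value:
                out.write(line.strip() + "\n")
    return target


def split_with_instances(source: Path, out_dir: Path) -> List[Path]:
    """Run one 'instance' per distinct value, possibly at the same time."""
    values = distinct_values(source)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda v: extract_one(source, out_dir, v), values))
```

One consequence of this design is that every instance rereads the whole source file, so the total work grows with the number of distinct values.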
Hi
I agree with changming, except for the shell loop, because we are on Windows.
But you can write job control code that calls your job x times, passing the value to process as a parameter.
You should use this parameter in every stage that names the file.
Hope this helps.
Code:
* Set up SplitJOB with Value = 1, run it, wait for it to finish, and test for success
hJob1 = DSAttachJob("SplitJOB", DSJ.ERRFATAL)
If NOT(hJob1) Then
   Call DSLogFatal("Job Attach Failed: SplitJOB", "JobControl")
   Abort
End
ErrCode = DSSetParam(hJob1, "Value", 1)
ErrCode = DSRunJob(hJob1, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob1)
Status = DSGetJobInfo(hJob1, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
   * Fatal Error - No Return
   Call DSLogFatal("Job Failed: SplitJOB", "JobControl")
End
* Set up SplitJOB with Value = 2, run it, wait for it to finish, and test for success
hJob2 = DSAttachJob("SplitJOB", DSJ.ERRFATAL)
If NOT(hJob2) Then
   Call DSLogFatal("Job Attach Failed: SplitJOB", "JobControl")
   Abort
End
ErrCode = DSSetParam(hJob2, "Value", 2)
ErrCode = DSRunJob(hJob2, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob2)
Status = DSGetJobInfo(hJob2, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
   * Fatal Error - No Return
   Call DSLogFatal("Job Failed: SplitJOB", "JobControl")
End