split file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
adams06
Participant
Posts: 92
Joined: Sun Mar 12, 2006 3:00 pm

split file

Post by adams06 »

Hi,

i have to split the one file into 5 file.

ex: x

1
1
2
2
3
3
4
4
4
4

i have to split x in such a way that
1,1 goes to file 1
2,2 goes to file 2
3,3 goes to file 3
4,4,4,4 goes to file 4


Any ideas.

Thanks in advance
DeepakCorning
Premium Member
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: split file

Post by DeepakCorning »

I think this is simple one. The way I m thinking is to have the file as the source and then Five Transformer with Five Files as target. The Transformers will have filter condition determining on the field value for e.g. If fied.value = 1 then it goes to Link1 etc etc.

Not aware of if there is a simler method than this.
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi,
I think you can do this by using single transformer to create five sequential files and use the filter condition in transformer as DeepakCorning asked to do.
adams06
Participant
Posts: 92
Joined: Sun Mar 12, 2006 3:00 pm

split

Post by adams06 »

Hi Meena,

There are 25000 records opp...

Thanks.


meena wrote:Hi,
I think you can do this by using single transformer to create five sequential files and use the filter condition in transformer as DeepakCorning asked to do.
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Re: split

Post by meena »

Hi,
Can you try something like this:
Write a stage variables.

Code: Select all

Stagevariable1:ToTX1.Inputcolumn
Stagevaribale2:If ToTX1.Inputcolumn=Stagevariable1 Then true else false.

And write a constraint in transformer Stagevaribale2=@true for all five output stream.

Check this.I am not sure whether this works or not but just giving you an option to try.
There are 25000 records opp...
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

It is best to write a BASIC routine if you are not sure about the number of group of x you are about to get.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you can access unix from your windows os via samba or some other software then you can do this easily by writing a shell script. Something like this has been achieved over here at dsxchange. Try to search for a relative script.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

kumar_s wrote:It is best to write a BASIC routine if you are not sure about the number of group of x you are about to get.
False.
It is easier to create a Transformer stage with that number of outputs, propagate the rows where the input value changes, and use stage variables to construct the output lines. The technique can be found by searching the forum for an exact match on vertical pivot
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

The OP did say he only needed to create 5 output files. That is simple enough with the GUI to set up 5 output links with constraints.

Now if he had said the number of output files was unknown and could number in the hundreds or thousands, then I would think Kumar's suggestion would be a good option. It would get really cumbersome to do this with the GUI when the number of outputs gets higher than maybe a few dozen.

Mike
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Mike wrote:The OP did say he only needed to create 5 output files. That is simple enough with the GUI to set up 5 output links with constraints.
True.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sb_akarmarkar
Participant
Posts: 232
Joined: Fri Sep 30, 2005 4:52 am
Contact:

Post by sb_akarmarkar »

Hi adam,

We can use folder stage also ...


Thanks,
Anupam
changming
Participant
Posts: 68
Joined: Wed Oct 13, 2004 3:35 am

use multi instance

Post by changming »

I do not know why you need to split a file to so many files. You can use hash file and multi instance do achieve this. you need to built to job, first job just read the file and write only the first one distinct column into one hash file then write into a txt file. seconde datastage job will be called by unix shell scripts, the scripts will read one value per time and pass the datastage job (second) as parameter. the second datastage job will use the parameter in a transform's constraint or filter, write data into a file with the parameter as file name. such as myfile_#paparamet#.txt.
the unix shell scrip just a simple " for loop " scripts. all the instance job can run at same time, (do not know the max instance can run at same time) , so the performance will not be bad.
thurmy34
Premium Member
Premium Member
Posts: 198
Joined: Fri Mar 31, 2006 8:27 am
Location: Paris

Post by thurmy34 »

Hi

I am agree with changming except for the loop because we are in windows.
But you can write a job control who call your job x times with the value to proceed in parameter.
You should include this parameter in all the stage to name the file.

Code: Select all


* Setup SplitJOB, run it, wait for it to finish, and test for success
      hJob1 = DSAttachJob("SplitJOB", DSJ.ERRFATAL)
      If NOT(hJob1) Then
         Call DSLogFatal("Job Attach Failed: SplitJOB", "JobControl")
         Abort
      End
      ErrCode = DSSetParam(hJob1, "Value",1)
      ErrCode = DSRunJob(hJob1, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob1)
      Status = DSGetJobInfo(hJob1, DSJ.JOBSTATUS)
      If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
         * Fatal Error - No Return
         Call DSLogFatal("Job Failed: SplitJOB", "JobControl")
      End

* Setup SplitJOB, run it, wait for it to finish, and test for success
      hJob2 = DSAttachJob("SplitJOB", DSJ.ERRFATAL)
      If NOT(hJob2) Then
         Call DSLogFatal("Job Attach Failed: SplitJOB", "JobControl")
         Abort
      End

      ErrCode = DSSetParam(hJob2, "Value",2)
      ErrCode = DSRunJob(hJob2, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob2)
      Status = DSGetJobInfo(hJob2, DSJ.JOBSTATUS)
      If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
         * Fatal Error - No Return
         Call DSLogFatal("Job Failed: SplitJOB", "JobControl")
      End
Hope this Help
Post Reply