Page 1 of 1

split file

Posted: Fri Sep 08, 2006 10:30 am
by adams06
Hi,

i have to split the one file into 5 file.

ex: x

1
1
2
2
3
3
4
4
4
4

i have to split x in such a way that
1,1 goes to file 1
2,2 goes to file 2
3,3 goes to file 3
4,4,4,4 goes to file 4


Any ideas.

Thanks in advance

Re: split file

Posted: Fri Sep 08, 2006 11:09 am
by DeepakCorning
I think this is simple one. The way I m thinking is to have the file as the source and then Five Transformer with Five Files as target. The Transformers will have filter condition determining on the field value for e.g. If fied.value = 1 then it goes to Link1 etc etc.

Not aware of if there is a simler method than this.

Posted: Fri Sep 08, 2006 11:30 am
by meena
Hi,
I think you can do this by using single transformer to create five sequential files and use the filter condition in transformer as DeepakCorning asked to do.

split

Posted: Fri Sep 08, 2006 12:05 pm
by adams06
Hi Meena,

There are 25000 records opp...

Thanks.


meena wrote:Hi,
I think you can do this by using single transformer to create five sequential files and use the filter condition in transformer as DeepakCorning asked to do.

Re: split

Posted: Fri Sep 08, 2006 12:40 pm
by meena
Hi,
Can you try something like this:
Write a stage variables.

Code: Select all

Stagevariable1:ToTX1.Inputcolumn
Stagevaribale2:If ToTX1.Inputcolumn=Stagevariable1 Then true else false.

And write a constraint in transformer Stagevaribale2=@true for all five output stream.

Check this.I am not sure whether this works or not but just giving you an option to try.
There are 25000 records opp...

Posted: Sat Sep 09, 2006 12:05 am
by kumar_s
It is best to write a BASIC routine if you are not sure about the number of group of x you are about to get.

Posted: Sat Sep 09, 2006 1:27 pm
by DSguru2B
If you can access unix from your windows os via samba or some other software then you can do this easily by writing a shell script. Something like this has been achieved over here at dsxchange. Try to search for a relative script.

Posted: Sun Sep 10, 2006 3:24 am
by ray.wurlod
kumar_s wrote:It is best to write a BASIC routine if you are not sure about the number of group of x you are about to get.
False.
It is easier to create a Transformer stage with that number of outputs, propagate the rows where the input value changes, and use stage variables to construct the output lines. The technique can be found by searching the forum for an exact match on vertical pivot

Posted: Sun Sep 10, 2006 7:11 am
by Mike
The OP did say he only needed to create 5 output files. That is simple enough with the GUI to set up 5 output links with constraints.

Now if he had said the number of output files was unknown and could number in the hundreds or thousands, then I would think Kumar's suggestion would be a good option. It would get really cumbersome to do this with the GUI when the number of outputs gets higher than maybe a few dozen.

Mike

Posted: Sun Sep 10, 2006 6:38 pm
by kumar_s
Mike wrote:The OP did say he only needed to create 5 output files. That is simple enough with the GUI to set up 5 output links with constraints.
True.

Posted: Sun Sep 10, 2006 10:41 pm
by sb_akarmarkar
Hi adam,

We can use folder stage also ...


Thanks,
Anupam

use multi instance

Posted: Sun Sep 10, 2006 10:41 pm
by changming
I do not know why you need to split a file to so many files. You can use hash file and multi instance do achieve this. you need to built to job, first job just read the file and write only the first one distinct column into one hash file then write into a txt file. seconde datastage job will be called by unix shell scripts, the scripts will read one value per time and pass the datastage job (second) as parameter. the second datastage job will use the parameter in a transform's constraint or filter, write data into a file with the parameter as file name. such as myfile_#paparamet#.txt.
the unix shell scrip just a simple " for loop " scripts. all the instance job can run at same time, (do not know the max instance can run at same time) , so the performance will not be bad.

Posted: Mon Sep 11, 2006 5:11 am
by thurmy34
Hi

I am agree with changming except for the loop because we are in windows.
But you can write a job control who call your job x times with the value to proceed in parameter.
You should include this parameter in all the stage to name the file.

Code: Select all


* Setup SplitJOB, run it, wait for it to finish, and test for success
      hJob1 = DSAttachJob("SplitJOB", DSJ.ERRFATAL)
      If NOT(hJob1) Then
         Call DSLogFatal("Job Attach Failed: SplitJOB", "JobControl")
         Abort
      End
      ErrCode = DSSetParam(hJob1, "Value",1)
      ErrCode = DSRunJob(hJob1, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob1)
      Status = DSGetJobInfo(hJob1, DSJ.JOBSTATUS)
      If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
         * Fatal Error - No Return
         Call DSLogFatal("Job Failed: SplitJOB", "JobControl")
      End

* Setup SplitJOB, run it, wait for it to finish, and test for success
      hJob2 = DSAttachJob("SplitJOB", DSJ.ERRFATAL)
      If NOT(hJob2) Then
         Call DSLogFatal("Job Attach Failed: SplitJOB", "JobControl")
         Abort
      End

      ErrCode = DSSetParam(hJob2, "Value",2)
      ErrCode = DSRunJob(hJob2, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob2)
      Status = DSGetJobInfo(hJob2, DSJ.JOBSTATUS)
      If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
         * Fatal Error - No Return
         Call DSLogFatal("Job Failed: SplitJOB", "JobControl")
      End
Hope this Help