
Filtering logic to generate the output files

Posted: Tue Sep 27, 2005 12:44 pm
by arvind
Hi All,

I'm getting a file with millions of records. I have to filter this file by country and generate a separate file for each country. There are 60 countries, so I need to generate 60 files from one source file. Can anybody suggest the logic for this?
I also want to know whether there is any restriction on the number of output files in a single DataStage job.


Thanks in Advance
Arvind

Posted: Tue Sep 27, 2005 1:58 pm
by ArndW
There is no real limit on how many output stages you can have, but 60 output files is a bit much. I would look into using the PX "fileset": you can partition it so that there are 60 dataset files in the fileset, while in the job it just looks as if you are writing one file.
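To illustrate what partitioning on the country column does, here is a small Python sketch. It is not DataStage's actual partitioner: CRC32 is an assumed stand-in for whatever hash the engine uses, and `partition_of` / `hash_partition` are hypothetical names. The property it shows is that every row for a given country lands in the same partition, while nothing prevents several countries from sharing one partition.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    """Map a key to a 0-based partition number.

    CRC32 is an assumed stand-in for the engine's real hash function; what
    matters is that equal keys always map to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(rows, key_index, num_partitions):
    """Group rows into partitions by hashing one column (e.g. country)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row[key_index], num_partitions)].append(row)
    return dict(partitions)


if __name__ == "__main__":
    rows = [("1", "US"), ("2", "DE"), ("3", "US"), ("4", "IN"), ("5", "FR")]
    for part, members in sorted(hash_partition(rows, 1, 4).items()):
        # Each line lists the distinct countries that landed in one partition.
        print(part, sorted({r[1] for r in members}))
```

With fewer partitions than countries, some partition files will necessarily contain more than one country.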

Posted: Tue Sep 27, 2005 5:48 pm
by vmcburney
Another way is a multiple instance job that is run for each country.
- Write a job that reads the file, removes duplicate values of the country field and writes the distinct countries to a sequential file.
- Write a BASIC routine that receives a row number, opens the sequential file and returns the text on that row.
- Write a sequence job with a loop. The first step in the loop is a Routine stage that passes in the loop number and receives back the country name. Pass this to the multiple instance job, using the country name as both the invocation ID and a job parameter, and embed the country name in a filter constraint and in the output file name so that the file is unique for each country.
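The three steps above can be mimicked outside DataStage to show the control flow. This is a Python sketch only, assuming a comma-delimited source with the country in column `key_index`; the function names, the `countries.lst` list file and the `country_<name>.txt` output names are hypothetical. Note that each "instance" re-reads the whole source file, which is where the cost of this approach comes from.

```python
import csv
import os
import tempfile


def write_distinct_countries(src_path, list_path, key_index):
    """Step 1: write each distinct country value once, in first-seen order."""
    seen = []
    seen_set = set()
    with open(src_path, newline="") as src:
        for row in csv.reader(src):
            if row[key_index] not in seen_set:
                seen_set.add(row[key_index])
                seen.append(row[key_index])
    with open(list_path, "w") as out:
        for country in seen:
            out.write(country + "\n")
    return len(seen)


def country_at_row(list_path, row_number):
    """Step 2: return the text on a 1-based row of the list file ("" if absent)."""
    with open(list_path) as f:
        for n, line in enumerate(f, start=1):
            if n == row_number:
                return line.rstrip("\n")
    return ""


def run_instance(src_path, out_dir, key_index, country):
    """One job instance: re-read the whole source and keep only this country's rows.
    The country goes into both the filter constraint and the output file name."""
    out_path = os.path.join(out_dir, f"country_{country}.txt")
    with open(src_path, newline="") as src, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for row in csv.reader(src):
            if row[key_index] == country:  # the per-instance filter constraint
                writer.writerow(row)
    return out_path


def run_sequence(src_path, out_dir, key_index):
    """Step 3: loop over the list, look up each country, run one instance per row."""
    list_path = os.path.join(out_dir, "countries.lst")
    count = write_distinct_countries(src_path, list_path, key_index)
    outputs = []
    for loop_number in range(1, count + 1):
        country = country_at_row(list_path, loop_number)
        outputs.append(run_instance(src_path, out_dir, key_index, country))
    return outputs


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    src = os.path.join(work_dir, "source.csv")
    with open(src, "w") as f:
        f.write("1,US\n2,DE\n3,US\n4,IN\n")
    for path in run_sequence(src, work_dir, key_index=1):
        print(os.path.basename(path))
```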

It might be too slow, since you are starting and stopping 60 parallel jobs. It is certainly a lot slower than partitioned filesets, though I don't know how you would partition so that each fileset file holds just one country without specifying more nodes than countries.
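The two concerns above can be put into numbers. Let C be the number of distinct countries, P the number of partitions (nodes) and N the number of source rows; these symbols are introduced here for illustration. By the pigeonhole principle some partition must hold at least ⌈C/P⌉ countries, so one country per partition needs P ≥ C (and with hash partitioning even that is not sufficient, because two countries can hash to the same partition). Running one job per country reads the whole source C times.

```latex
\begin{aligned}
\max_{p}\ \#\{\text{countries in partition } p\} &\ge \left\lceil \frac{C}{P} \right\rceil,
  &\text{e.g. } C = 60,\ P = 4:\quad \left\lceil \frac{60}{4} \right\rceil &= 15 \\
\text{rows read, one job run per country} &= C \cdot N,
  &\text{rows read, one partitioned pass} &= N
\end{aligned}
```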

Posted: Wed Sep 28, 2005 3:11 pm
by arvind
Hello All,
Thanks, Arnd and vmcburney.

I have created the DataStage job with an FTP stage, a Transformer stage and a FileSet stage.
The job runs fine, and I am getting data in this directory: /detld2/etl/ascential/scratchdatasets/datasets
The data is spread over four nodes, in files with different names:
export.p453112.P000000_F0000
export.p453112.P000001_F0000
export.p453112.P000002_F0000
export.p453112.P000003_F0000
Now I want to know how to generate multiple files from a single file based on a condition.

Thanks in Advance
Arvind
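For comparison with the approaches in this thread, the underlying logic of "one source file in, one file per country out" can be done in a single pass by keeping one open output file per country value. The sketch below is plain Python outside DataStage, under stated assumptions: comma-delimited input, the country in column `key_index`, and hypothetical output names of the form `<country>.txt`; country values are assumed to be safe to use as file names. About 60 open files at once is well within typical operating-system limits.

```python
import csv
import os


def split_by_key(src_path, out_dir, key_index, delimiter=","):
    """Read the source once and write each row to the file for its key value.

    Returns the sorted list of key values (one output file per value).
    Hypothetical naming: <out_dir>/<key>.txt.
    """
    handles = {}  # key value -> open file object
    writers = {}  # key value -> csv.writer on that file
    try:
        with open(src_path, newline="") as src:
            for row in csv.reader(src, delimiter=delimiter):
                key = row[key_index]
                if key not in writers:
                    # First row for this country: open its output file lazily.
                    fh = open(os.path.join(out_dir, f"{key}.txt"), "w", newline="")
                    handles[key] = fh
                    writers[key] = csv.writer(fh, delimiter=delimiter)
                writers[key].writerow(row)
    finally:
        # Close every output file, even if reading the source failed.
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

Usage: `split_by_key("source.csv", "out", key_index=1)` reads `source.csv` once and produces one file per distinct country in `out`. The design trades the repeated full reads of a per-country loop for a dictionary of open file handles.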