Hi All,
I'm receiving a file with millions of records. I have to filter this file by country and generate one output file per country. There are 60 countries, so I need to produce 60 files from a single source file. Can anybody suggest the logic for how to do this?
I also want to know whether there is any restriction on the number of output files in a single DataStage job.
Thanks in advance,
Arvind
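For what it's worth, the core logic is a single pass over the source, opening one output file per distinct country the first time that country is seen. A minimal sketch outside DataStage, assuming a comma-delimited file with a header row and a hypothetical "country" column (the file and column names are illustrative only):

```python
import csv

def split_by_country(src_path, field="country", prefix="split_"):
    # One pass over the source: open a writer for each country the first
    # time it appears, then route every record to that country's file.
    writers, handles = {}, {}
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        for row in reader:
            country = row[field]
            if country not in writers:
                f = open(f"{prefix}{country}.csv", "w", newline="")
                handles[country] = f
                w = csv.DictWriter(f, fieldnames=reader.fieldnames)
                w.writeheader()
                writers[country] = w
            writers[country].writerow(row)
    for f in handles.values():
        f.close()
    return sorted(handles)
```

Note this keeps all 60 handles open at once; typical per-process file descriptor limits (often 1024 on Unix-like systems) should accommodate that comfortably.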
Filtering logic to generate the output file
Another way is a multiple instance job that is run once for each country:
- Write a job that reads the file, removes duplicates on the country field and writes the distinct values to a sequential file.
- Write a BASIC routine that accepts a row number, opens that sequential file and returns the text on that row.
- Write a sequence job with a loop. The first step in the loop is a routine activity that passes in the loop counter and receives back the country name. Pass that name to the multiple instance job as both the invocation id and a job parameter, then embed it in a filter constraint and in the output file name so each run writes a unique file.
It might be too slow, as you are starting and stopping 60 parallel jobs. It is certainly a lot slower than partitioned filesets, though I don't know how you would partition to get exactly one country in each fileset without specifying more nodes than countries.
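Outside DataStage, the three steps above can be sketched like this, assuming a comma-delimited source with a header and a hypothetical "country" column; the per-country function stands in for one run of the multiple instance job with the country as its parameter:

```python
import csv

def distinct_countries(src_path, field="country"):
    # Step 1: the de-duplication job -- collect the distinct country values.
    with open(src_path, newline="") as src:
        return sorted({row[field] for row in csv.DictReader(src)})

def filter_country(src_path, country, field="country"):
    # Steps 2-3: one "job run" per country, with the country name used both
    # in the filter constraint and in the output file name.
    out_path = f"out_{country}.csv"
    with open(src_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[field] == country:
                writer.writerow(row)
    return out_path

def run_all(src_path):
    # The sequence job's loop: one full pass over the source per country,
    # which is why 60 runs can be slow compared with a single-pass split.
    return [filter_country(src_path, c) for c in distinct_countries(src_path)]
```

The sketch makes the cost visible: each loop iteration re-reads the whole source, so with 60 countries the file is scanned 60 times plus once for the distinct-value pass.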
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Hello All,
Thanks Arnd and vmcburney.
I have created the DataStage job with an FTP stage, a Transformer stage and a FileSet stage. The job is running fine and I am getting data in the /detld2/etl/ascential/scratchdatasets/datasets directory.
The data is spread across four nodes, each with a different file name:
export.p453112.P000000_F0000
export.p453112.P000001_F0000
export.p453112.P000002_F0000
export.p453112.P000003_F0000
I still want to know how to generate multiple files, based on a condition, from a single file.
Thanks in advance,
Arvind
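Since the fileset lands as one physical file per node, the split can also start from those node files. A hedged sketch, assuming the node files match a glob pattern, records are comma-delimited, and the country code sits in a hypothetical third field (all of these are assumptions about your layout, not DataStage behaviour):

```python
import glob

def split_fileset(pattern, country_field=2, prefix="country_"):
    # Read every node file matching the pattern in order, routing each
    # record to an output file named after its country value.
    outputs = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as src:
            for line in src:
                country = line.rstrip("\n").split(",")[country_field]
                if country not in outputs:
                    outputs[country] = open(f"{prefix}{country}.txt", "w")
                outputs[country].write(line)
    names = sorted(outputs)
    for f in outputs.values():
        f.close()
    return names
```

Inside DataStage itself, the equivalent is reading the fileset back and fanning out through a Transformer or Filter stage with one constraint per country.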