Hi All,
I'm getting a file with millions of records. I have to filter this file by country and generate one file for each country. There are 60 countries, so I need to generate 60 files from a single source file. Can anybody suggest the logic for doing this?
I want to know whether there is any restriction on the number of output files in a single DataStage job.
Thanks in Advance
Arvind
Filtering logic to generate the output file
There is no real limit on how many output stages you can have, but 60 output files is a bit much for one job. I would look into using the PX "fileset": you can partition it so that you have 60 dataset files in the file set, but in the job it just looks like you are writing one file.
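To make the partitioning idea concrete, here is a minimal Python sketch of key-based partitioning. It is only an illustration, not how DataStage hashes internally: the function name `partition_by_country`, the CRC32 hash and the sample rows are all assumptions made for the example. It shows that every row for a given country lands in the same partition, but also that nothing stops two countries from sharing a partition.

```python
import zlib
from collections import defaultdict


def partition_by_country(rows, num_partitions):
    """Assign each (country, payload) row to a partition by hashing the country key.

    Hypothetical helper for illustration; not a DataStage API.
    """
    partitions = defaultdict(list)
    for country, payload in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        idx = zlib.crc32(country.encode("utf-8")) % num_partitions
        partitions[idx].append((country, payload))
    return dict(partitions)


# Made-up sample rows: (country code, record)
rows = [("AU", "r1"), ("IN", "r2"), ("AU", "r3"), ("DE", "r4"), ("IN", "r5")]
for idx, part in sorted(partition_by_country(rows, 4).items()):
    # Print which countries ended up in each partition
    print(idx, sorted({country for country, _ in part}))
```

With a plain hash on the key, one file per country is only guaranteed if each country maps to its own partition, which the hash alone does not ensure.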
Another way is a multiple instance job that is run for each country.
- Write a job that reads the file, removes duplicates from the country field and writes it to a sequential file.
- Write a BASIC routine that receives a row number and opens the sequential file and returns the text on that row.
- Write a sequence job with a loop. The first step in the loop is a routine stage that passes in the loop number and receives back the country name. Pass this to the multiple instance job with the country name as both the invocation id and a job parameter, then embed the country name in a filter constraint and in your output file name to make it unique for each country.
It might be too slow, as you are starting and stopping 60 parallel jobs. It is certainly a lot slower than partitioned filesets, though I don't know how you would partition to have just one country in each fileset without specifying more nodes than countries.
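The three steps above can be sketched outside DataStage roughly as follows. This is a plain Python illustration under stated assumptions, not the BASIC routine or sequence job itself: it assumes a CSV source with a header column named `country`, and the names `distinct_countries`, `run_instance` and `run_all` are hypothetical. Each call to `run_instance` stands in for one invocation of the multiple instance job.

```python
import csv
import os


def distinct_countries(source_path, key="country"):
    """Step 1: read the source once and return the de-duplicated country values."""
    with open(source_path, newline="") as src:
        return sorted({row[key] for row in csv.DictReader(src)})


def run_instance(source_path, country, out_dir, key="country"):
    """Stand-in for one job instance: filter on one country, write one uniquely named file."""
    out_path = os.path.join(out_dir, f"output_{country}.csv")
    with open(source_path, newline="") as src, open(out_path, "w", newline="") as out:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] == country:  # the "filter constraint"
                writer.writerow(row)
    return out_path


def run_all(source_path, out_dir):
    """Steps 2 and 3: loop over the country list and run one instance per country."""
    return [run_instance(source_path, c, out_dir) for c in distinct_countries(source_path)]
```

Note that the source is re-read once per country, which is the same cost concern raised above about starting 60 jobs.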
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Hello All,
Thanks Arnd and vmcburney
I have created a DataStage job with an FTP stage, a transformation stage and a FileSet stage.
The job runs fine and I am getting data in this directory: /detld2/etl/ascential/scratchdatasets/datasets
The data is spread over four nodes, each with a different file name:
export.p453112.P000000_F0000
export.p453112.P000001_F0000
export.p453112.P000002_F0000
export.p453112.P000003_F0000
I want to know how to generate multiple files, based on a condition, from a single file.
Thanks in Advance
Arvind
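For reference, the general "one input, one output file per condition value" logic can be done in a single pass by keeping one open writer per value seen so far. The sketch below is plain Python rather than a DataStage job, and it rests on assumptions: a CSV source with a header column named `country`, and a hypothetical function name `split_by_country`. With around 60 countries, the number of simultaneously open files should normally be well within operating system limits.

```python
import csv
import os


def split_by_country(source_path, out_dir, key="country"):
    """Read the source once and route each row to a file named after its country.

    Returns a dict of {country: rows written}. Hypothetical helper for illustration.
    """
    writers = {}   # country -> csv.DictWriter
    handles = []   # open file objects, closed in the finally block
    counts = {}
    try:
        with open(source_path, newline="") as src:
            reader = csv.DictReader(src)
            for row in reader:
                country = row[key]
                if country not in writers:
                    # First time this country is seen: open its file and write the header
                    fh = open(os.path.join(out_dir, f"{country}.csv"), "w", newline="")
                    handles.append(fh)
                    writer = csv.DictWriter(fh, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    writers[country] = writer
                    counts[country] = 0
                writers[country].writerow(row)
                counts[country] += 1
    finally:
        for fh in handles:
            fh.close()
    return counts
```

Unlike a loop that re-reads the source for every country, this reads the millions of records only once.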