Filtering logic to generate the output file

arvind
Participant
Posts: 17
Joined: Sun Aug 07, 2005 7:57 am

Post by arvind »

Hi All,

I'm getting a file with millions of records. I have to filter this file by country and generate one file per country. There are 60 countries, so I need to generate 60 files from one single source file. Can anybody suggest the logic for doing this?
I would also like to know whether there is any restriction on the number of output files in a single DataStage job.


Thanks in Advance
Arvind
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There is no real limit on how many output stages you can have, but 60 potential output files is a bit much for one job. I would look into using the PX "fileset": you can partition it so that you have 60 data files in the file set, but in the job it just looks like you are writing one file.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Another way is a multiple instance job that is run for each country.
- Write a job that reads the file, removes duplicates from the country field and writes it to a sequential file.
- Write a BASIC routine that receives a row number and opens the sequential file and returns the text on that row.
- Write a sequence job with a loop. The first step in the loop is a routine stage that passes in the loop number and receives back the country name. Pass this to the multiple instance job as both the invocation id and a job parameter, and embed the country name in a filter constraint and in your output file name to make it unique for each country.
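
The control flow of these three steps can be sketched outside DataStage. The Python below is only an illustration of the logic, not DataStage BASIC or a job design; every function name, the `out_` file name prefix, and the in-memory rows are hypothetical stand-ins for the real jobs and files.

```python
def distinct_countries(rows, country_col):
    """Step 1: remove duplicates from the country field (first-seen order)."""
    return list(dict.fromkeys(row[country_col] for row in rows))


def country_for_row(countries, row_number):
    """Step 2: stand-in for the routine that returns the text on a given
    row of the de-duplicated country file (row numbers start at 1)."""
    return countries[row_number - 1]


def run_instance(rows, country_col, country):
    """Stand-in for one invocation of the multiple instance job: the
    country name is used as the filter constraint."""
    return [row for row in rows if row[country_col] == country]


def run_sequence(rows, country_col, prefix="out_"):
    """Step 3: loop once per country and build a unique output name
    from the country name. Returns {file_name: rows_for_that_country}."""
    countries = distinct_countries(rows, country_col)
    outputs = {}
    for loop_number in range(1, len(countries) + 1):
        country = country_for_row(countries, loop_number)
        outputs[f"{prefix}{country}.txt"] = run_instance(rows, country_col, country)
    return outputs


# Tiny made-up sample, only to exercise the loop.
sample = [
    {"country": "DE", "id": "1"},
    {"country": "AU", "id": "2"},
    {"country": "DE", "id": "3"},
]
result = run_sequence(sample, "country")
```

Note that each pass through the loop re-reads the whole input, which is the same cost the real approach pays by running the job once per country.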

It might be too slow, as you are starting and stopping 60 parallel jobs. It would certainly be a lot slower than partitioned filesets, though I don't know how you would partition to have just one country in each fileset without specifying more nodes than countries.
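
To illustrate the partitioning concern, here is a rough Python sketch of key-based partitioning. It assumes a generic "hash of the key modulo number of nodes" scheme, with `zlib.crc32` as a stand-in hash; this is not the hash DataStage actually uses, and the country codes `C00` to `C59` are made up.

```python
import zlib
from collections import defaultdict


def partition_for(key, nodes):
    """Assign a key to a partition with a generic hash-modulo scheme
    (crc32 is only a stand-in, not the real DataStage hash)."""
    return zlib.crc32(key.encode("utf-8")) % nodes


def countries_per_partition(countries, nodes):
    """Group country codes by the partition their hash maps them to."""
    layout = defaultdict(set)
    for country in countries:
        layout[partition_for(country, nodes)].add(country)
    return dict(layout)


# 60 hypothetical country codes spread over 4 nodes.
countries = [f"C{i:02d}" for i in range(60)]
layout = countries_per_partition(countries, 4)
for partition in sorted(layout):
    print(partition, len(layout[partition]))
```

Every row for a given country lands in the same partition, but with 60 countries and 4 nodes at least one partition has to hold 15 or more countries, so the files in the set are not one per country.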
arvind
Participant
Posts: 17
Joined: Sun Aug 07, 2005 7:57 am

Post by arvind »

Hello All,
Thanks Arnd and vmcburney

I have created a DataStage job with an FTP stage, a transformation stage, and a FileSet stage.
The job runs fine and I am getting data in this directory: /detld2/etl/ascential/scratchdatasets/datasets
The data is spread across four nodes, each with a different file name:
export.p453112.P000000_F0000
export.p453112.P000001_F0000
export.p453112.P000002_F0000
export.p453112.P000003_F0000
I want to know how to generate multiple files from a single file based on a condition.

Thanks in Advance
Arvind