Co Sort functionality using Datastage Generic Job
Posted: Tue Feb 26, 2013 3:32 pm
I am currently working on a DataStage migration project, from 7.5 to 8.5.
As part of the code migration we also have thousands of CoSort scripts that need to be migrated to the new Linux server.
To avoid the CoSort licensing cost, the plan is to convert the CoSort scripts into DataStage parallel jobs.
Strategy:
1) If an existing PX job runs before or after the CoSort script, merge the CoSort logic into that PX job.
2) If no PX job is available before or after the CoSort script, create a new PX job to implement the CoSort logic.
I have analysed the CoSort scripts and found that most of them perform the following operations:
a) Read fixed-width input files
b) Sort the input data
c) Filter data
d) Join data
e) Aggregate data
My plan is to create a generic DataStage job that performs the above operations regardless of the number of input/output files, effectively replacing the CoSort functionality with a DataStage job.
I intend to use schema files, the Filter, Join and Modify stages, and Runtime Column Propagation (RCP) to implement it.
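To sketch the idea, the generic job could read each fixed-width file through a schema file supplied as a job parameter; with RCP enabled, the downstream stages would then propagate whatever columns the schema defines. The column names and widths below are invented for illustration, not taken from any real CoSort script:

```
// Hypothetical schema file for one fixed-width input (delim=none, so
// field boundaries come purely from the declared lengths)
record {record_delim='\n', delim=none}
(
    CUST_ID:  string[10];
    ORDER_DT: string[8];
    AMOUNT:   decimal[9,2];
    REGION:   string[4];
)
```

A Sequential File stage pointed at this schema (with RCP on and no columns hard-coded on the link) would let one job design serve many different file layouts, with only the schema file path and filter/join/aggregation expressions varying per run.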
Before starting I would like input from the DS experts here.
Is this feasible?
If not, what is the best alternative?
Could we do it using custom stages (BuildOp/Wrapper) or the DataStage APIs?
All your inputs are welcome.