CoSort functionality using a DataStage generic job

datisaq · Post by **datisaq** » Tue Feb 26, 2013 3:32 pm

I am currently working on a DataStage migration project, from 7.5 to 8.5.

As part of the code migration we also have thousands of CoSort scripts that need to be migrated to the new Linux server.

To avoid the licensing cost for CoSort, the plan is to convert the CoSort scripts into DataStage parallel (PX) jobs.

Strategy:
1) If a PX job already runs before or after the CoSort script, merge the CoSort logic into that PX job.
2) If no PX job is available before or after the CoSort script, create a new PX job to implement the CoSort script logic.

I have analysed the CoSort scripts and found that most of them perform the following operations:
a) Read fixed-width input files
b) Sort the data from the input files
c) Filter data
d) Join data
e) Aggregate data
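
For illustration only, the five operations above can be sketched outside DataStage in a few lines of Python. This is not CoSort or DataStage code; the record layout, field names (`cust_id`, `region`, `amount`) and the lookup file format are hypothetical and stand in for whatever a real script describes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical fixed-width layout: (field name, start offset, width)
LAYOUT = [("cust_id", 0, 5), ("region", 5, 3), ("amount", 8, 7)]


def parse_fixed_width(line, layout):
    """a) Slice one fixed-width line into a dict of trimmed fields."""
    return {name: line[start:start + width].strip()
            for name, start, width in layout}


def run_pipeline(order_lines, region_lines):
    """Read, sort, filter, join and aggregate; returns totals per region name."""
    # a) read the fixed-width input
    orders = [parse_fixed_width(line, LAYOUT) for line in order_lines]
    # b) sort on the key field
    orders.sort(key=itemgetter("cust_id"))
    # c) filter: keep rows with a positive amount
    orders = [row for row in orders if int(row["amount"]) > 0]
    # d) inner join against a lookup file (3-char code, then the name)
    region_names = {line[0:3].strip(): line[3:].strip() for line in region_lines}
    joined = [dict(row, region_name=region_names[row["region"]])
              for row in orders if row["region"] in region_names]
    # e) aggregate: sum of amount per region name (groupby needs sorted input)
    joined.sort(key=itemgetter("region_name"))
    return {name: sum(int(row["amount"]) for row in group)
            for name, group in groupby(joined, key=itemgetter("region_name"))}
```

In a generic job the layout and the sort, filter, join and aggregate keys would come from parameters or schema files rather than being hard-coded as they are here.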

My plan is to create a generic DataStage job that performs the above operations regardless of the number of input and output files.
It would replace the CoSort functionality with a DataStage job.

I intend to implement it using schema files, the Filter, Join and Modify stages, and runtime column propagation (RCP).

Before starting, I would like input from the DS experts.
Is this feasible?
If not, what is the next best way to do it?
Can we do it using custom stages (buildop/wrapper) or the DataStage APIs?

All input is welcome.

ray.wurlod · Post by **ray.wurlod** » Tue Feb 26, 2013 7:36 pm

None of your approaches achieves the required result of avoiding CoSort licensing, as you're still planning to invoke the CoSort scripts.

Why not try implementing one of them using a Sort stage, perhaps increasing the available memory per node when performing the sort?
Make sure your data are partitioned on the sort keys using a key-based partitioning algorithm (Hash or Modulus).
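
To show why key-based partitioning matters for a parallel sort, here is a minimal Python sketch, not DataStage's own partitioner. The function names and the use of `zlib.crc32` as the hash are assumptions made for the example. Every record with the same key value lands on the same node, so each node can sort its own partition independently.

```python
import zlib


def hash_partition(records, key, num_nodes):
    """Assign each record to a node from a hash of its key value."""
    partitions = [[] for _ in range(num_nodes)]
    for rec in records:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        node = zlib.crc32(str(rec[key]).encode("utf-8")) % num_nodes
        partitions[node].append(rec)
    return partitions


def partitioned_sort(records, key, num_nodes):
    """Hash-partition on the sort key, then sort each partition separately."""
    return [sorted(part, key=lambda rec: rec[key])
            for part in hash_partition(records, key, num_nodes)]
```

Each partition comes back sorted on the key, which is enough for key-based operations such as joins and aggregations that run per partition. The output is not globally ordered across partitions.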

chulett · Post by **chulett** » Wed Feb 27, 2013 12:10 am

Not quite sure how you got that from a reading of their post, Ray; they mention nothing about still planning on invoking the CoSort scripts. They are looking for advice on exactly what you mentioned - converting the work that CoSort does into DataStage.

Off the top of my head, however, I have no idea how feasible it is to do that generically...i.e. via schema files and RCP.