CoSort functionality using a generic DataStage job


datisaq
Participant
Posts: 154
Joined: Wed May 14, 2008 4:34 am


Post by datisaq »

I am currently working on a DataStage migration project, from 7.5 to 8.5.

As part of the code migration we also have thousands of CoSort scripts that need to be migrated to the new Linux server.

To avoid CoSort licensing costs, the plan is to convert the CoSort scripts into DataStage parallel jobs.

Strategy:
1) If an existing PX job runs before or after the CoSort script, merge the CoSort logic into that PX job.
2) If no PX job is available before or after the CoSort script, create a new PX job to implement the CoSort logic.

I have analysed the CoSort scripts and found that most of them perform the following operations:
a) Read fixed-width input files
b) Sort the input data
c) Filter data
d) Join data
e) Aggregate data
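
To make the five operations concrete, here is a minimal sketch in plain Python. It is not DataStage or CoSort code; the field layout, the "XX" filter rule, and the function names are hypothetical and only illustrate the behaviour a generic job would need to reproduce.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical fixed-width layout: (field name, start offset, length).
# In a generic job this is what a schema file would describe.
LAYOUT = [("cust_id", 0, 5), ("region", 5, 3), ("amount", 8, 7)]


def parse_fixed_width(lines, layout):
    """a) Read fixed-width records into dictionaries keyed by field name."""
    return [
        {name: line[start:start + length].strip() for name, start, length in layout}
        for line in lines
    ]


def run_pipeline(lines, layout, lookup):
    """Apply read, sort, filter, join, and aggregate in that order."""
    rows = parse_fixed_width(lines, layout)
    # b) Sort on the key field.
    rows.sort(key=itemgetter("cust_id"))
    # c) Filter: drop rows with a region code of "XX" (an assumed rule).
    rows = [r for r in rows if r["region"] != "XX"]
    # d) Inner join against a reference mapping of cust_id -> name.
    joined = [
        dict(r, name=lookup[r["cust_id"]]) for r in rows if r["cust_id"] in lookup
    ]
    # e) Aggregate: total amount per customer (input is already sorted by key).
    totals = {}
    for cust_id, group in groupby(joined, key=itemgetter("cust_id")):
        totals[cust_id] = sum(int(g["amount"]) for g in group)
    return totals
```

Only the layout, the filter rule, and the keys change from script to script, which is the part a generic job would have to take as parameters.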

My plan is to create a generic DataStage job that performs the above operations regardless of the number of input and output files, replacing the CoSort functionality.

I intend to use schema files, the Filter, Join and Modify stages, and runtime column propagation (RCP) to implement it.

Before starting, I would like input from the DataStage experts.
Is this feasible?
If not, what is the best alternative?
Can it be done with custom stages (BuildOp/Wrapper) or with the DataStage APIs?

All input is welcome.
IBM Certified - Information Server 8.1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

None of your approaches achieves the required result of avoiding CoSort licensing, as you're still planning to invoke the CoSort scripts.

Why not try implementing one of them using a Sort stage, perhaps increasing the available memory per node when performing the sort?
Make sure your data are partitioned on the sort keys using a key-based partitioning algorithm (Hash or Modulus).
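
To illustrate why key-based partitioning matters for a parallel sort, here is a minimal sketch in plain Python. It is not DataStage code; the function names and the CRC32-based hash are assumptions chosen for the example. The point is that every row with a given key lands in the same partition, so sorting each partition independently keeps equal keys together.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition based on a hash of its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def sort_partitions(partitions, key):
    """Sort every partition independently on the same key."""
    return [sorted(part, key=lambda r: r[key]) for part in partitions]
```

With round-robin or random partitioning, rows sharing a key could end up on different nodes, and a later join or aggregation on that key would see incomplete groups.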
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not quite sure how you got that from a reading of their post, Ray. They mention nothing about still planning to invoke the CoSort scripts. They are looking for advice on exactly what you mentioned: converting the work that CoSort does into DataStage.

Off the top of my head, however, I have no idea how feasible it is to do that generically...i.e. via schema files and RCP.
-craig

"You can never have too many knives" -- Logan Nine Fingers