How to assess a DataStage Job

Madhavan VM
Participant
Posts: 33
Joined: Sat Jul 02, 2005 2:27 am
Location: Bangalore

Post by Madhavan VM »

Hi,

How can we assess a DataStage job?

For example, in other languages we classify jobs as simple, medium or complex depending on lines of code, or we use the Capers Jones tables to derive function points.

Along similar lines, do we have a standard for rating DataStage jobs?

How do we calculate metrics such as productivity, quality and cost of quality for DataStage jobs?

Is there a way to classify a DataStage job as simple, medium or complex? For example, if a DataStage job has 5-8 stages, can we classify it as simple, or do we have to take into account the transformations and the complexity of the logic implemented in the job?

In general, what is taken into account in assessing a DataStage job, and how do we do it? Is there an industry-accepted standard?
warm regards,
Ajith GK
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I think you need to gather stats. What is your average number of stages? What is your average number of columns per stage? What are your most used stage types? Concepts like simple, medium and complex should all be based on stats such as the average and standard deviation. We had a lengthy discussion about this recently; you need to find that post.
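To make the idea concrete, here is a minimal Python sketch of rating jobs against a project's own average and standard deviation. It is only an illustration: the function name `classify_jobs`, the job names, the stage counts and the one-standard-deviation cut-offs are all assumptions, not an agreed standard and not anything DataStage provides.

```python
from statistics import mean, pstdev


def classify_jobs(stage_counts):
    """Label each job relative to the project's own statistics.

    stage_counts maps a job name to its number of stages.
    The one-standard-deviation thresholds are an assumption for
    illustration, not an industry standard.
    """
    values = list(stage_counts.values())
    avg = mean(values)
    sd = pstdev(values)  # population standard deviation of the project
    labels = {}
    for job, count in stage_counts.items():
        if count > avg + sd:
            labels[job] = "complex"
        elif count < avg - sd:
            labels[job] = "simple"
        else:
            labels[job] = "medium"
    return labels


# Hypothetical stage counts, invented for this example only.
sample = {
    "load_customers": 4,
    "load_orders": 9,
    "build_fact_sales": 22,
    "extract_ref": 8,
    "conform_dims": 10,
}
labels = classify_jobs(sample)
```

The same approach could be repeated for columns per stage or any other count you can gather, so that a job is rated on several measures and not on stage count alone.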

It would be nice if we had these stats from the whole DSX community. We could compare our averages to the group: do we do things in a more or less complex way than other people? I think we need to know what you are using DataStage for, too. If you are building a datamart or an ODS, or doing a conversion, then these numbers should change. These numbers should be easy to gather, though.

I promised to do the same for row counts. I have not started that yet. Maybe we could do both soon. How many people would post their averages to the forum if I hid where the numbers came from? I would love to know these numbers. I am sure many here would too. We would need to categorize each application as datamart, data conversion, ODS or other. I doubt we could agree on a definition of each of these, but maybe we should try. I think this would be beneficial.

I am sorry. I have not had the time to finish row count stats. I am willing to do this as well if anyone would participate.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You would also need some measure of clarity (lack of obfuscation). For example, does the job have "hidden" ingredients, such as before/after subroutines, about which the future maintainers are not warned with, say, annotations on the design canvas?
Containers can add to or detract from clarity; so quantifying that might be a nightmare. For example, it is good practice to use Shared Containers (reusable componentry is generally a Good Thing). Local containers can aid in understanding the design but, if used unwisely, can add to overall obfuscation. Consider the generic design

Container  --->  Container  --->  Container
extract_data     magic_happens    load_data
It's not really immediately obvious what's going on, is it?
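
As a rough illustration of how such clarity checks might be automated, here is a small Python sketch. Everything in it is an assumption: the `JobDesign` record and its fields are hypothetical and do not correspond to any DataStage export format or API, and the two rules are only examples of the kind of warning one could raise.

```python
from dataclasses import dataclass


@dataclass
class JobDesign:
    # Hypothetical summary of one job; you would have to collect
    # these counts yourself. Not a DataStage data structure.
    name: str
    before_after_subroutines: int
    annotations: int
    stages: int
    container_stages: int


def clarity_warnings(job):
    """Return a list of reasons why a job may be hard to maintain."""
    warnings = []
    # "Hidden" ingredients that nothing on the canvas warns about.
    if job.before_after_subroutines > 0 and job.annotations == 0:
        warnings.append("before/after subroutines with no annotation")
    # A design made up solely of containers hides all of its logic.
    if job.stages > 0 and job.container_stages == job.stages:
        warnings.append("design consists only of containers")
    return warnings


# Invented examples for illustration only.
opaque = JobDesign("nightly_load", 2, 0, 3, 3)
documented = JobDesign("daily_extract", 1, 2, 6, 1)
```

Here `clarity_warnings(opaque)` reports both problems, while `clarity_warnings(documented)` returns an empty list. Whether a container helps or hurts still needs a human judgement, so a check like this can only flag candidates for review.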
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

clarity (lack of obfuscation)
Perfectly clear Professor Ray man.
:)

Definitely a good point. Documentation of any kind adds value. Either a project is overdocumented and nothing gets done, which is usually the work of one of the big accounting firms pretending to be consultants, or there is no documentation at all: no project management, no change control, no source-to-target documents, no business analysis. Both are very bad.

A balance of some forethought, with source-to-target documents and some change management but without documenting every column width change, is the most productive and cost effective. If they are certified to a specific level of PM, then expect it to take forever to get things done.
Mamu Kim
srekant
Premium Member
Posts: 85
Joined: Wed Jan 19, 2005 6:52 am
Location: Detroit

Interested

Post by srekant »

Hi kduke,

I am very interested in participating in what you described, and I am willing to give stats on the DS7.5 projects I have done. It would be useful to have a template that we can use for estimating the complexity of DataStage jobs.
Sree