How to assess a DataStage Job

Madhavan VM · Post by **Madhavan VM** » Wed Jul 06, 2005 10:59 am

Hi,

How can we assess a DataStage job?

For example, in other languages we classify jobs as simple, medium or complex depending on lines of code, or we use the Capers Jones tables to find the function points.

Along similar lines, do we have a standard for rating DataStage jobs?

How do we calculate metrics such as productivity, quality and cost of quality for DataStage jobs?

Is there a way to classify a DataStage job as simple, medium or complex? For example, if a DataStage job has 5-8 stages, can we classify it as simple, or do we have to take into account the transformations and the complexity of the logic implemented in the job?

In general, what is taken into account when assessing a DataStage job, and how do we do it? Is there an industry-accepted standard?

kduke · Post by **kduke** » Wed Jul 06, 2005 11:58 am

I think you need to gather stats. What is your average number of stages? What is your average number of columns per stage? What are the most-used stage types? All concepts like simple, medium and complex should be based on stats like average and standard deviation. We had a lengthy discussion about this recently. You need to find that post.
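As a rough illustration of basing the labels on average and standard deviation, here is a minimal Python sketch. The job names and stage counts are made up, and the cut-offs (more than one standard deviation below or above the mean) are only an assumption for the example, not an agreed standard.

```python
from statistics import mean, stdev

def classify_jobs(stage_counts):
    """Label each job by how far its stage count sits from the project average."""
    counts = list(stage_counts.values())
    avg = mean(counts)
    sd = stdev(counts)  # sample standard deviation; needs at least two jobs
    labels = {}
    for job, n in stage_counts.items():
        if n < avg - sd:
            labels[job] = "simple"    # more than one deviation below average
        elif n > avg + sd:
            labels[job] = "complex"   # more than one deviation above average
        else:
            labels[job] = "medium"
    return labels

# Hypothetical stage counts per job, invented for this example
stage_counts = {
    "load_customers": 3,
    "load_orders": 9,
    "build_sales_fact": 10,
    "conform_products": 11,
    "reconcile_ledger": 25,
}
labels = classify_jobs(stage_counts)
for job, label in labels.items():
    print(job, label)
```

The same function could be run on columns per stage or any other count you gather; the point is that the thresholds come from your own project's numbers rather than a fixed "5-8 stages" rule.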

It would be nice if we had these stats from the whole DSX community. We could compare our averages to the group. Do we do things in a more or less complex way than other people? I think we need to know what you are using DataStage for too. If you are building a datamart or an ODS, or doing a conversion, then these numbers should change. These numbers should be easy to gather though.

I promised to do the same for row counts. I have not started that yet. Maybe we could do both soon. How many people would post their averages to the forum if I hid where the numbers came from? I would love to know these numbers. I am sure many here would too. We would need to categorize your application as datamart, data conversion, ODS or other. I doubt we could agree on a definition of each of these, but maybe we should try. I think this would be beneficial.

I am sorry. I have not had the time to finish row count stats. I am willing to do this as well if anyone would participate.

kduke · Post by **kduke** » Wed Jul 06, 2005 12:04 pm

Check out viewtopic.php?t=93464

ray.wurlod · Post by **ray.wurlod** » Wed Jul 06, 2005 3:14 pm

You would also need some measure of clarity (lack of obfuscation). For example, does the job have "hidden" ingredients, such as before/after subroutines, about which the future maintainers are not warned with, say, annotations on the design canvas?
Containers can add to or detract from clarity; so quantifying that might be a nightmare. For example, it is good practice to use Shared Containers (reusable componentry is generally a Good Thing). Local containers can aid in understanding the design but, if used unwisely, can add to overall obfuscation. Consider the generic design

Container  --->  Container  -->  Container
extract_data     magic_happens   load_data

It's not really immediately obvious what's going on, is it?
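A rough way to flag these two clarity problems is sketched below in Python. The dictionary layout (`before_subroutine`, `after_subroutine`, `annotations`, `stage_types`), the job names and the routine name are all hypothetical, invented for the example; this does not read a real DataStage export.

```python
def clarity_warnings(job):
    """Return a list of clarity warnings for one job description (a plain dict)."""
    warnings = []

    # "Hidden" ingredients: before/after subroutines that nobody is warned about.
    hidden = [job.get(key) for key in ("before_subroutine", "after_subroutine")
              if job.get(key)]
    if hidden and not job.get("annotations"):
        warnings.append("unannotated before/after subroutine: " + ", ".join(hidden))

    # The Container -> Container -> Container design: nothing visible on the canvas.
    stage_types = job.get("stage_types", [])
    if stage_types and all(t == "Container" for t in stage_types):
        warnings.append("every stage on the canvas is a container")

    return warnings

# Hypothetical job descriptions, made up for this example
opaque_job = {
    "name": "nightly_load",
    "before_subroutine": "PrepareStagingArea",
    "after_subroutine": None,
    "annotations": [],
    "stage_types": ["Container", "Container", "Container"],
}
documented_job = {
    "name": "load_customers",
    "before_subroutine": "PrepareStagingArea",
    "after_subroutine": None,
    "annotations": ["Before-job routine clears the staging area"],
    "stage_types": ["Sequential File", "Transformer", "Container"],
}

print(clarity_warnings(opaque_job))
print(clarity_warnings(documented_job))
```

A count of such warnings per job is only a crude proxy; whether a container helps or hurts understanding still needs a human to judge.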

kduke · Post by **kduke** » Wed Jul 06, 2005 3:37 pm

ray.wurlod wrote: clarity (lack of obfuscation)

Perfectly clear Professor Ray man.

Definitely a good point. Documentation of any kind adds value. Either a project is over-documented and nothing gets done (usually the work of one of the big accounting firms pretending to be consultants), or there is no documentation at all: no project management, no change control, no source-to-target documents, no business analysis. Both are very bad.

A balance of some forethought, with source-to-target documents and some change management but without documenting every column-width change, is the most productive and cost-effective. If they are certified to a specific level of PM, then expect it to take forever to get things done.

srekant · Post by **srekant** » Thu Jul 07, 2005 6:36 am

kduke wrote: I think you need to gather stats. What is your average number of stages? What is your average number of columns per stage? What are the most-used stage types? All concepts like simple, medium and complex should be based on stats like average and standard deviation. We had a lengthy discussion about this recently. You need to find that post.

It would be nice if we had these stats from the whole DSX community. We could compare our averages to the group. Do we do things in a more or less complex way than other people? I think we need to know what you are using DataStage for too. If you are building a datamart or an ODS, or doing a conversion, then these numbers should change. These numbers should be easy to gather though.

I promised to do the same for row counts. I have not started that yet. Maybe we could do both soon. How many people would post their averages to the forum if I hid where the numbers came from? I would love to know these numbers. I am sure many here would too. We would need to categorize your application as datamart, data conversion, ODS or other. I doubt we could agree on a definition of each of these, but maybe we should try. I think this would be beneficial.

I am sorry. I have not had the time to finish row count stats. I am willing to do this as well if anyone would participate.

Hi kduke,

I am very interested in participating in what you described, and I am willing to share stats from the DS 7.5 projects I have done. It would be useful to have a template that we can use for estimating the complexity of DataStage jobs.
