How to measure job complexity?
Posted: Wed Jun 15, 2005 6:30 pm
I got an email today asking this question, and I have been thinking about it a lot lately. One of the reports that comes out of DwNav is Design Stats. It counts jobs, links and columns by category, on the assumption that one developer is probably working on each category. It would be nice to compare this month's counts with last month's to see how many things got added. That gives some idea of how many new objects were created. You need to weight these objects, say jobs 1000, links 300 and columns 50, or something like that. You might be able to set the weights by the time it takes to create a job, a link and a column. User-defined SQL may add another level of complexity. If a job has 10 links, that may double the complexity; 20 may quadruple it. The more complex something is, the more time it takes, but not at the same rate. If a thing takes 10 objects to build, then one that takes 100 may take 20 times longer instead of 10 times. It is not one for one.
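Here is a rough sketch of that scoring idea in Python. It is not how DwNav computes anything. The weights are the "something like that" numbers from above, the function names are made up for illustration, and the two growth curves (doubling per 10 links, and 10 times the objects costing 20 times the effort) are just one way to turn those rules of thumb into formulas.

```python
import math

# Illustrative weights from the post; tune them to your own build times.
WEIGHTS = {"jobs": 1000, "links": 300, "columns": 50}

# Exponent chosen so that 10x the objects costs 20x the effort: 10 ** p == 20.
SCALE_EXPONENT = math.log(20) / math.log(10)


def weighted_score(counts):
    """Sum the object counts multiplied by their per-type weights."""
    return sum(WEIGHTS[kind] * counts.get(kind, 0) for kind in WEIGHTS)


def link_multiplier(links_in_job):
    """Double the complexity for every 10 links in one job (10 -> 2x, 20 -> 4x)."""
    return 2 ** (links_in_job / 10)


def relative_effort(object_count, baseline=10):
    """Effort relative to a baseline-sized build, growing faster than linearly."""
    return (object_count / baseline) ** SCALE_EXPONENT


def monthly_delta(last_month, this_month):
    """Weighted score of whatever was added between two Design Stats snapshots."""
    return weighted_score(this_month) - weighted_score(last_month)


if __name__ == "__main__":
    # Hypothetical counts for one category, made up for the example.
    may = {"jobs": 10, "links": 40, "columns": 300}
    june = {"jobs": 12, "links": 50, "columns": 380}
    print("Added this month:", monthly_delta(may, june))
    print("Multiplier for a 20-link job:", link_multiplier(20))
    print("Effort for 100 objects vs 10:", round(relative_effort(100), 1))
```

The point is only that once you pick weights and a growth curve, the month-over-month comparison becomes a number you can track instead of a gut feeling.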
I would think that a developer could consistently develop the same number of jobs or objects, or whatever you want to call them. As these tools do more and more of the work, we need to know the number of steps it took to build the thing, whether you call it a job or something else. People used to count the number of lines of code produced. Now the unit is more obscure, but there is still a task associated with each step.
Tools like DataStage shield you from the underlying complexity of connecting to a database and doing an insert, update or bulk load, but you still need to understand these concepts and what is actually happening. The idea used to be that these tools would be the great equalizer: a bad developer would create work at the same rate as a more intelligent one. But the opposite is happening. These tools let the developer sit at a conceptual level and not worry about the details. Therefore the more complex you can think, the faster you can get your data from the source to the target, mentally and physically.
I thought about writing a book about this. I think it would be cool to teach people how to measure work and therefore separate the good developers from the bad. This may ignore quality. Some guys' work always breaks while others' never breaks. I have noticed that quality is measured differently in different companies. Say you are a lead with 3 or 4 developers. If you are dealing with financial data, then accuracy is more important than speed. Now say you are Google and you measure clicks on ads for your customers. Then you need to process millions of rows. Can you tell who in the household is clicking by what they click, and change the ads while they are still logged in? A child may click on Disney where an adult might click on Ford. This may double or triple their revenue. How do they know? They build a data warehouse with DataStage PX. In the first case, the financial one, a load may run for days but could still save them millions, because month-old data is fine. Their sales cycle is annual. Google's is instant by instant.
How do you measure your work? How do you know you are faster or slower than the developer next to you? I had a guy ask me why I got paid more than he did. I asked him why he wanted to know. He said we both did the same thing, "DataStage". I just said, "Oh really".
I just automated a process which built 80 jobs, because they were straight table copies. It took me 6 hours to write and it ran in 40 seconds. How many of you have written 80 jobs in 6 hours? I did a similar thing at my last job. We snowflaked a dimension to speed up the ETL and MicroStrategy. One dimension became 12. I had the transformation rules in a table used to evaluate metadata. We had 4 jobs per table load, including a sequence. It took me about 4 days to write a process that generates all 4 jobs. 3 out of the 4 jobs compile straight away. The fourth takes a few minutes to fix the constraints. It generated all 48 jobs in a few minutes.
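To give a feel for why this works, here is a minimal sketch of the general idea: one template, one list of tables, one generated job per table. This is not my actual process and it does not produce a real DataStage job export. The template text, the `Copy_` naming and the schema names are all hypothetical, just to show the shape of a metadata-driven generator.

```python
from string import Template

# Hypothetical job template. A real DataStage job definition looks nothing
# like this; it only stands in for "the part that is the same every time".
JOB_TEMPLATE = Template(
    "job: Copy_${table}\n"
    "source: ${source_schema}.${table}\n"
    "target: ${target_schema}.${table}\n"
)


def generate_copy_jobs(tables, source_schema, target_schema):
    """Return a dict mapping each job name to its rendered definition."""
    jobs = {}
    for table in tables:
        jobs[f"Copy_{table}"] = JOB_TEMPLATE.substitute(
            table=table,
            source_schema=source_schema,
            target_schema=target_schema,
        )
    return jobs


if __name__ == "__main__":
    # Made-up table and schema names for the example.
    generated = generate_copy_jobs(["CUSTOMER", "ORDERS"], "SRC", "DW")
    for name, definition in generated.items():
        print(name)
        print(definition)
```

Once the list of tables comes out of a metadata table instead of being typed in, 80 jobs cost about the same as 2.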
I was talking to a guy about the HTML documentation that we generate. It took him 3 days to document one job. The documentation in DwNav and in JobReport was better, and those tools could document the whole project in minutes or seconds. His team was told not to use these tools because the company wanted to bill the hours. I said that if the customer finds out, they should be upset. The customer paid a lot of money for a poor-quality document but never knew how to measure the work being done. Most work is measured from a gut feeling. It feels like Kim does good work, but I am not sure. It seems like Kim is organized, but I am not sure. This is how most businesses run today. They have no idea if I am better or worse than the guy next to me. He seems to get more work done.
Hopefully the whole team is more productive when I am around. That is my goal. If I never explain how or why I do things the way I do, then that is okay. I know the two Craigs from Denver do it the same way. Others get it. Besides, my friends respect me. That is all I need to motivate me.
If you think outside the box and most people are in the box then you are by definition weird.