Balancing input and output record counts

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Bhusan
Premium Member
Premium Member
Posts: 15
Joined: Wed Jul 07, 2004 4:04 pm

Balancing input and output record counts

Post by Bhusan »

I would like to know if any one has a better idea to reconcile input, output, and rejected record counts when the source is a database table? I can think of taking the row count before starting the transformation and comparing that to number of records rejected and written to the target. Any better idea will be appreciated.
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco
Contact:

Post by ketfos »

Hi,
You are having three links - input, output and Reject.
Use Show Performance Statitics option, while running your job.
This will give you records read from source, written to output link and written to Reject link.

Ketfos
dwscblr
Participant
Posts: 19
Joined: Tue May 18, 2004 12:39 am

Post by dwscblr »

If you want job statistics in a file, Call the DSendJobReport with the argument 2;directorypath in the "After routine" of the job for which the statistics are required.
This creates the job log in text format in the directorypath specified.

This will give you number of records read, written and rejected. You could write a shell script to format it if required.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

They posted a job on ADN and I modified it to run all jobs and post the row counts, job run times and parameter values to an Oracle table. This can easily be modifed to post to any database. We currently run this as the last job in a sequence. It will take a sequence, job, job folder or "all" as a parameter and get these stats for all jobs or one job or any of the above.

I think you need all of these. Sequences do not have row counts but it is nice to know how long a sequence runs. Also parameters are important because you might of run a job and posted to the dev version of the database from production. Row counts over time is important because a job may slow down over time because of network problems, database problems or any number of issues.

These jobs are in EtlStats.zip. They are well documented. You can also download them from my web site http://www.duke-consulting.com/DataStage_Tips_plain.htm. The jobs are in Ds7.1 so you may have problems with importing them to older versions of DataStage. So let me know I could probably create an older Ds version if I get requests to do so.

The jobs are straight forward and could easily be modified to do other things like export all log files. I have modified it to generate html documents using JobReport job posted on ADN. I wrote a matching set of html docs for routines. All of this is in KgdGenHtml.zip. Makes DataStage almost self documenting. I have a similar set of html documents in DwNav. This html looks much better. I should pretty up DwNav's version soon. They do not have a where used section which critical. I think you should be able to find all the jobs which reads or writes to a specific table so you can add a column to the jobs. That is a lot of what we do after startup on a EDW. If you did not build the jobs then it is really difficult to figure out what jobs update what tables. Not only that but a column may travel from a table to a hash file to sequential file and then bulk loaded to the EDW table. DwNav helps simpify this process either within the tool or the html documentation. I plan on creating some of this on top of the free stuff KgdGenHtml.zip but don't expect tomorrow. DwNav is almost free anyway.

Most EDWs are seriously lacking in documentation. I want to post an ETL checklist of sample ETL documents like source to target documents, design ideas, how to use metadata to help you design ETL and similar concepts. I just want to help jumpstart this part of our industry. I hate documenting stuff. I created most of this stuff because I do not like spending time on it. It definitely helps organize the ETL side of EDW.
Mamu Kim
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Post by peternolan9 »

kduke wrote: Most EDWs are seriously lacking in documentation. I want to post an ETL checklist of sample ETL documents like source to target documents, design ideas, how to use metadata to help you design ETL and similar concepts. I just want to help jumpstart this part of our industry. I hate documenting stuff. I created most of this stuff because I do not like spending time on it. It definitely helps organize the ETL side of EDW.
Hi Kim,
you are right but it's not because the methodologies, checklists etc are not there....

Iterations (now owned by Ascential) is the best methodology money can buy for the DW space and it has all these things in it....I think it is on the price list for only about USD15K now....

Unfortunately, customers just don't buy methodologies in any great number so it's just not cost justifiable to spend a lot of money developing them.

When I worked at Ardent I had a USD10B revenue client who was spending in excess of USD500K developing 4 separate DW methodologies across the company....I thought I was a 'sure thing' to sell Iterations.....amazingly, I failed on that occasion...

From then, if the customer did not want to take my recommendation of buying the methodology we just took 'all care and no responsibility'. If a customer wanted Ardent to take responsibility for the project, and to act as a long term partner, the methodology was part of the deal, no discussion entered into....and it used to be USD40K back then....

Pesonally, I think anyone building a DW without a methdology should also try to drive from London to Vienna without a map....about the same chance of success....!!!!!
Best Regards
Peter Nolan
www.peternolan.com
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Peter

What is Iterations?
Mamu Kim
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Post by peternolan9 »

kduke wrote:Peter

What is Iterations?
Hi Kim,
my, someone at Ascential is not doing their job, you being a DataStage person should have had someone tell you about Iterations.

It is the development methodology first developed by Prism Solutions and it came into Ardent when Ardent bought Prism.

It is THE best methodology money can buy. It has everything that opens and shuts for an ODS/DW implementation project. Templates, samples etc. When I was the Professional Services Manager in Australia we pushed it quite hard, but alas did not get many new sales in my time. We used to use it at all our largest accounts which was about 15 if memory serves me correctly.....

It was also licensed by a number of major SIs to deliver their DW solustions...I believe CSC and EDS still license it for their projects, but I'm 3.5 years out of Ardent now so anything could have happened... ;-)

The only better methodology around is now owned by IBM and you can't buy it. It was part of the PwC Management Consulting Services. A friend of mine was the primary author and it is a stunning work...but like I said, not for sale.

So, my point was it is not like there is any shortage of guidelines on how to document EDW projects. Simply put, the business lets IT get away with not documenting EDW projects, and the business must pay the price for this indulgence in the end.
Best Regards
Peter Nolan
www.peternolan.com
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Peter

That is my point. I thought if I could get people to do some of these things like row counts then they could see the benefit of doing more and more documentation and DataStage admin duties. I think it is terrible that people do not have source to target documents and even if they do then they start coding before they know the mappings. EDW development is so out of control at so many places. They do not have a clue how to get organized and stay organized.

Maybe you should outline this methodology and explain it to us and why it is so good. Why doesn't ASCL promote it here in the US.

Thanks Kim.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Correctly the name is all upper case (ITERATIONS).

Allegedly it's the methodology used by all Ascential consultants world-wide, but I have seen no evidence of that.

It is a very comprehensive methodology; the first steps, for example, seek to determine whether the enterprise is even ready to adopt DW-type application.

And there's a comprehensive graphic, so that you have a picture to use when explaining to the PHB what you're up to.

But ITERATIONS does not address methods to obtain reconciliation of row counts or other data. It does identify this kind of check/audit as a necessary phase in the process, but does not seek to prescribe how it must be done. The intention is that ITERATIONS is agnostic as to the particular tools used (it has phases for tool selection).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Ray

You don't seem convinced that it is useful.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Au contraire, it's extremely useful, as it includes templates for every piece of documentation (MS Word, MS Excel and MS Project) that you need. But it would have taken too long to explain all this! Get someone at Ascential to send you just the schematic (a single A3 sheet).

If there's enough interest, I'll summarise ITERATIONS on the Forum this weekend, since there seems to be nothing about it on the Ascential web site.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Explain more.
Mamu Kim
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Kind of funny... I ended up sitting through the Tips & Tricks Part 2 webinar this morning and while I'd never heard of it until it was mentioned in this thread, the presenter made an off-hand comment about their methodology - something called 'Iterations'. Err... sorry. :oops: ITERATIONS.

I'd be interested in whatever you'd be willing to post, Ray.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

CRAIG

I HEAR YOU, MAN. ITERATIONS, never heard of it till this week. I also saw the webnar. 7.5 is cool. Bunch of new stuff.

RAY, ENLIGHTEN US.

Thanks Kim.
Mamu Kim
Post Reply