Capture Number of records in a table load

hi_manoj · Post by **hi_manoj** » Fri May 03, 2013 4:26 am

Hi All,

I have around 130 job which writes to table, it may be insert or update.
The database is Oracle and in some job I am using Oracle connector stage and in some jobs I am using Oracle Enterprise stage.

Now my task is to create a common routine/script which will execute (may be after each job completes) and capture the Job name and and how many number of records inserted to a table in to a file and latter on I will insert those information to my audit tables.

I have tried with DSJobreport routine with xml format (2) but I think it will not help me to have a generic job. I have to create different xsd files for each job

Please help.

Regards

ray.wurlod · Post by **ray.wurlod** » Fri May 03, 2013 5:24 am

Why not just use a common naming convention for the link that connects into the Oracle stage and create a simple routine that calls DSGetLinkInfo() function to return the link row count?

kduke · Post by **kduke** » Wed May 08, 2013 9:02 pm

Use EtlStats. I think audit tables like this are silly. True audits are not asking DataStage how many rows did you send down each link then comparing rows coming in to rows going out. That is like asking someone did you do a good job. They always say yes. Runtime metadata is NOT an audit. Most developers leave off the reject links. Then you never know is something got rejected. I am shocked at how many PX jobs have no reject links. Who taught these people?

An audit should be unbiased. It should run simple SQL against the sources and compare to the targets. I wrote several blogs about ETL Completeness. Almost nobody read them. Even fewer people use the "canned" set of jobs I gave away to actually audit your ETL.

Do not call runtime metadata audit tables. It is not accurate.

prasson_ibm · Post by **prasson_ibm** » Wed May 08, 2013 11:55 pm

Hi duke,
I have the same kind of requirement in my project,we have target as oracle and after inserting records to target table we need to call a package with input value as record count of data inserted and reject count.I am getting a reject link on target stage.
We are using DSGetsLinkInfo() function to get the link count and applied outputlinkcount - rejectlink counr to get number of record inserted into table and its working fine for me.

kduke · Post by **kduke** » Thu May 09, 2013 6:11 pm

All I am saying it is not a valid audit. It may be useful. If you ask an auditor to check the books. He does not go up to the accountant and say does your books balance? If a QA person would check the ETL loads then they would not or should not use DataStage. They should run SQL against the source and and a separate SQL statement against the target to see if they match. That is exactly what ETL Completeness will do. You need to load your SQL statements into tables. There are 3 tables. One stores the SQL statements. One stores the results of each SQL statement. Each SQL statement has a name called measure name. One table associates the source measure to the target measure for reporting. The report gives you the percentage of target divided by the source.

This whole process simulates what a QA person would do to check source to target row counts or totals. The SQL can sum a column or count rows. The SQL can get very complex.

I am saying my process is textbook correct. ETL runtime metadata is a history of what the ETL did. Almost a log if you will. Not really a valid textbook "audit". You can do it any way you want. If you ever get truly audited then good luck trying to prove your runtime metadata is a valid audit.

kduke · Post by **kduke** » Thu May 09, 2013 6:16 pm

Do a search for completeness on this site and you find several topics about it.

Also:
http://it.toolbox.com/blogs/object-orie ... orts-35391

http://www.duke-consulting.com/Setting_ ... eports.htm

kduke · Post by **kduke** » Thu May 09, 2013 6:45 pm

A lot of the information on this is crude because it for my use mostly. So I documented for a few customers. Ron is the only other person I know of that has installed it. I have it at a few customers or former employers.

My documentation is not as cool as Andy's or Ken Bland's. It is free so don't complain and I might help you install it.

Connection information is stored in a table. Most databases can encrypt the password column. So it is your responsibility to hide this information. You need information on how to use encrypted columns on a specific database then ask. Maybe I have done it or someone else here has done it on your type of database.

Also these are all server jobs because this was written a long time ago. If you want to convert it then go ahead. Works fine as is. If someone starts to use this then let me know. I have created some pretty cool ways of measuring completeness. I have some concepts you might not of thought of like revenue for previous month or day. We also used the history to see if revenue was growing or how fast our customer base was growing. You can compare revenue breakdown by sales source like internet sales versus direct sales. Lots of clever uses once you get it working. Would be easy to use a dashboard on top of this results history.