To get count(*) from tables

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
suresh.narasimha
Premium Member
Premium Member
Posts: 81
Joined: Mon Nov 21, 2005 4:17 am
Location: Sydney, Australia
Contact:

To get count(*) from tables

Post by suresh.narasimha »

Hi All,

Please help me out with a solution...

I have 50 jobs with one to one mapping from a source flat file to DB2 stage.

I have a table with 2 columns with table name and row count.

I need to create a job to populate the row count for the respective tables after all the 50 jobs complete.

How can i do that..

Regards,
Suresh N
SURESH NARASIMHA
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

At the end of each job, have an after-job subroutine, gather the row count for your target link going into the table. Or maybe even inside the job itself after that particular link finishes, populate the link count and the table name into a delimited file. Do the same in all 50 jobs. The mode of writing will be 'Append'. This way, by the end of all 50 jobs, you will have a single job pick up that delimited file and load that data into a table.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

EtlStats will do this. In the QA jobs it can count both the source and target tables and email you a report of both compared together. This is called a completeness report. You could use EtlStats just to get the row counts of all the jobs ran.

There are jobs within EtlStats which can extract the table names for you. It works on specific stage types. You are going to have to research how it works.

No matter what, there is no easy solution to your problem. You are going to have to invent something.
Mamu Kim
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Did i forget to mention ETLStats. :oops:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Link row counts will only report the number of rows that DataStage sent to the database server/driver along that link. It does not report the number of rows in the table. For example you may have updated the same row 1000 times. The link row count will show 1000 even though there may be only one row in the table. Create separate queries or jobs to fetch the table row counts.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

The QA jobs in EtlStats are designed to count records in source and target and compare them. I have posted links to the reports which are generated from this type of automated QA process.

The way it works is it requires you to out SQL statements you want run in a table called ETL_QA_SQL. There is a sequence which will run all these jobs in the proper order to compare the numbers. Say the source table is MEMBERS and the target table is MEMBER_DIM. Your QA select statements are:

Code: Select all

select count(*) from MEMBERS;
select count(*) from MEMBER_DIM;
Now if you have 100 records in MEMBERS and 90 in MEMBER_DIM then you loaded 90% of the source records. Your MEMBER_DIM is 90% complete.

When you run these jobs then need to write the table which compares these 2 measures. The counts get loaded into a table called ETL_QA_RESULT and a column called QA_STAT. Now say you call the first MEMBER_DIM_S_1 and the second MEMBER_DIM_T_1. This goes in ETL_QA_SQL.QA_STAT_NAME. To report on these then you need one row in ETL_QA_RESULT_MAP where:

Code: Select all

ETL_QA_RESULT_MAP
  TABLE_NAME          = MEMBER_DIM
  ETL_QA_RESULT_ID_S  = MEMBER_DIM_S_1
  ETL_QA_RESULT_ID_T  = MEMBER_DIM_T_1
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

This is a very complicated process. It is hard to explain. I have tried many times to explain. If you get it working then please let us know.

I can also sum fields. So fact tables can be QA'd.

Code: Select all

select sum(AMT_BILLED) from invoices;
select sum(AmountBilled) from INVOICE_FACT;
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Most of these jobs in EtlStats were built in my spare time. A lot of these the customer said no and I built it anyway because I knew it had value but I could not explain it well enough for them to want to pay for it. After a while they just turn me loose knowing I am better at delivering than explaining. I hope you take these jobs and improve on them and post them where others can download the new improved versions.

A lot of blood, sweat and tears are in all my products. I worked hard on these prodcuts and expect them to be of great value to someone if only to myself. If I had more time and money I am sure they could all be improved but I am too old and tired to work that hard on my own time any more. Someone else needs to take ownership on some of these or buy some products.

I think some of these need to get added to Hawk. I hope if they do automated job generation that add automated QA scripts like the above. These are simple but powerful concepts. If users do not have confidence in your data warehouse then automate the QA and email them the results. You will be surprised how fast they start use the data warehouse.
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I created a document to help people det this up. I have no way of explaining the importance of communicating to your users the accuracy of your data warehouse esecially a datamart. These simple QA completeness reports will build customer confidence in your accuracy. Please try to set this up if you have any kind of star schema with a fact table and dimension tables. You will not regret it. Let me know how it goes.

http://www.duke-consulting.com/Setting_ ... eports.htm

This is free. It is part of EtlStats which is free. Download from my tips page.
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

If your customers like this as much as mine then you will be very happy. I thought about turning this into a product. I may still automate the tables needed for the reports. It will be easy to populate from any mapping table.

Post a couple of your source tables and target tables and I will show you how to set it up.
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I forgot:

http://www.duke-consulting.com/Download ... sSetup.zip

Sample:

http://www.duke-consulting.com/KimD/Etl ... 70305.html

I wish I could show the real reports where the fact table matches to the penny on several columns in the fact table. The trending reports are pretty cool as well. If you are a dot com then is your revenue up or down based on a daily average sales. How important is that to your company.
Mamu Kim
Post Reply