Number of rows affected

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

Number of rows affected

Post by yiminghu »

Hi Everybody,

In our old hard-coded ETL process (implemented as SQL Server stored procedures), every action that modifies a table writes a record to a log table; the record logs which table was changed and how many rows were affected by the action. Since SQL Server offers the global variable @@ROWCOUNT, this functionality is very easy to implement there. I am wondering whether you can easily implement the same thing in DataStage?

I noticed that in DataStage, if you turn the monitor on, the log shows the number of rows passing through each stage, but that information is mixed in with everything else and is not easy to extract.

Can anybody offer me any solutions?

Thanks,

Yiming
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If you have MetaStage, you can use that to collect such information. However, nothing gets you away from naming links according to a consistent and meaningful convention.

In addition, your ETL design architecture will assign meaning to your link statistics. If you have a job design that simply passes rows to an OCI/ODBC stage that uses custom SQL, a stored procedure, or SQL that does a wildcard update (an update on something other than the primary key that can/will update more than one row), then your link statistics become meaningless. If you are using INSERT-ELSE-UPDATE type SQL, you also don't get individual metrics for inserts and updates. Furthermore, if a row would have been updated superfluously (no material difference), your ETL design could have interrogated the row first and skipped the update.

So the answer is: a well-designed process will graphically depict the flow, and the naming convention for the links will dictate how easy metric gathering and analysis are. A thoughtful plan from the beginning makes it EASY on the back end once you have to start doing the day-to-day maintenance and reporting on the process. Too many ETL applications are poorly designed and planned, and their process and design metadata reflect that. The solution is not the tool, it's the architect you used. That's the value of a seasoned ETL architect: you get your money's worth!
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

We employ a script that handles this automatically. It is ideal for us because we use a third-party scheduler to execute jobs, so by using this script instead of dsjob, everything gets handled properly behind the scenes.

I am sure that, at least with PX, it is possible to do the same thing using a custom stage, but the above script works for both PX and Server, so why reinvent the wheel?

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An "active stage finishing" event in the log file shows the number of rows processed along each of its links. You could obtain the number of rows sent to your database from this event in the job log.

Alternatively, you could add some code in an after-job or after-stage routine, or in job control code, that uses the DSGetLinkInfo() function to interrogate just that link and get its row count.
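A minimal after-job subroutine along those lines might look like the following DataStage BASIC sketch. The stage name "DB_Target", link name "lnk_Insert", and the routine name itself are hypothetical placeholders; substitute the names from your own job design:

```
* Sketch of an after-job subroutine that captures a link row count.
* "DB_Target" and "lnk_Insert" are assumed names - change them to match
* the target stage and link in your job.
SUBROUTINE LogRowCount(InputArg, ErrorCode)
      ErrorCode = 0

      * DSJ.ME refers to the current job; DSJ.LINKROWCOUNT asks for the
      * number of rows that passed down the named link.
      RowCount = DSGetLinkInfo(DSJ.ME, "DB_Target", "lnk_Insert", DSJ.LINKROWCOUNT)

      * Write the count to the job log; from here you could instead
      * insert it into your log table.
      Call DSLogInfo("Rows sent to target: " : RowCount, "LogRowCount")
RETURN
```

Instead of (or as well as) calling DSLogInfo(), the routine could write the count to your existing log table, which would reproduce the behaviour of the old stored-procedure logging.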

Note that this is not necessarily the number of rows affected; it is the number of rows sent by DataStage to the database server. It is only the same as the number of rows affected if the database server accepted all rows (did not roll back any transactions).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.