Number of rows affected

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

Number of rows affected

Post by yiminghu »

Hi Everybody,

In our old hard-coded ETL process (implemented as SQL Server stored procedures), every action that modifies a table writes a record to a log table; the record logs which table was changed and how many rows were affected by the action. Since SQL Server offers the global variable @@ROWCOUNT, this functionality is very easy to implement there. I am wondering whether you can easily implement the same thing in DataStage?

I noticed that in DataStage, if you turn the monitor on, the log shows the number of rows passing through each stage, but that information is mixed in with everything else and is not easy to extract.

Can anybody offer me any solutions?

Thanks,

Yiming
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If you have MetaStage, you can use that to collect such information. However, nothing gets you away from naming links according to a consistent and meaningful convention.

In addition, your ETL design architecture will assign meaning to your link statistics. If you have a job design that simply passes rows to an OCI/ODBC stage that uses custom SQL, a stored procedure, or SQL that does a wildcard update (an update on something other than the primary key that can/will update more than one row), then your link statistics become meaningless. If you are using INSERT-ELSE-UPDATE type SQL, you also don't get individual metrics for inserts and updates. Furthermore, if a row would have been updated superfluously (no material difference), your ETL design could have interrogated the row first and skipped the update.

So the answer is: a well-designed process will graphically depict the flow, and the naming convention for the links will dictate how easy metric gathering and analysis are. A thoughtful plan from the beginning makes it EASY on the back end once you have to start doing the day-to-day maintenance and reporting on the process. Too many ETL applications are poorly designed and planned, and their process and design metadata reflect that. The solution is not the tool, it's the architect you used. That's the value of a seasoned ETL architect: you get your money's worth!
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

We employ a script that handles this automatically. It is ideal for us because we use a third-party scheduler to execute jobs, so by using this script instead of dsjob, everything gets handled properly behind the scenes.

I am sure that, at least with PX, it is possible to do the same thing using a custom stage, but the above script works for both PX and Server, so why reinvent the wheel?

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An "active stage finishing" event in the log file shows the number of rows processed along each of its links. You could obtain the number of rows sent to your database from this event in the job log.

Alternatively, you could add some code in an after-job or after-stage routine, or in job control code, that uses the DSGetLinkInfo() function to interrogate just that link and get its row count.
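A minimal after-job subroutine along those lines might look like the following DataStage BASIC sketch. The stage name "DB_Target", link name "lnk_Insert", and the routine name itself are hypothetical placeholders; substitute the names from your own job design:

```
* Sketch of an after-job subroutine that captures a link row count.
* "DB_Target" and "lnk_Insert" are assumed names - change them to match
* the target stage and link in your job.
SUBROUTINE LogRowCount(InputArg, ErrorCode)
      ErrorCode = 0

      * DSJ.ME refers to the current job; DSJ.LINKROWCOUNT asks for the
      * number of rows that passed down the named link.
      RowCount = DSGetLinkInfo(DSJ.ME, "DB_Target", "lnk_Insert", DSJ.LINKROWCOUNT)

      * Write the count to the job log; from here you could instead
      * insert it into your log table.
      Call DSLogInfo("Rows sent to target: " : RowCount, "LogRowCount")
RETURN
```

Instead of (or as well as) calling DSLogInfo(), the routine could write the count to your existing log table, which would reproduce the behaviour of the old stored-procedure logging.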

Note that this is not necessarily the number of rows affected; it is the number of rows sent by DataStage to the database server. It is only the same as the number of rows affected if the database server accepted all rows (did not roll back any transactions).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.