
Commit Progress Information Logging and Skipping

Posted: Fri Jun 11, 2010 9:45 am
by Ultramundane
Note: The requested capabilities may not work correctly on a grid of servers. This request is therefore aimed at customers who run on a single server, or who run a set of related jobs on a single server.

1. Please add the following capabilities to each database stage so that the number of commits issued by a target stage can be tracked.

+ Add a property to every database stage that will cause the stage to count the number of commits that it issues.

Note: When running with a configuration file of more than one node on a single server, this should be a shared counter that the target database stage locks exclusively while updating, so that the counter stays consistent.

+ Add a property to every database stage that causes this counter to be written to a plain-text file after each increment. Each update overwrites the previous value in the file. When the job completes successfully, the file is deleted.

Note: If this property is specified, the lock on the counter must remain held until the new value has been flushed to disk; only then is the exclusive lock released for subsequent updates and flushes.
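To make the locking requirement above concrete, here is a minimal sketch in Python. The class name, the progress-file path, and the use of a `threading.Lock` are all illustrative assumptions, not part of any DataStage API; the point is that the lock is held across both the increment and the flush to disk, and that the file is removed on successful completion.

```python
import os
import threading

class CommitCounter:
    """Hypothetical shared commit counter for a target stage.

    The counter is exclusively locked while it is incremented and
    while the new value is flushed to disk, so the on-disk value
    always matches the in-memory one.
    """

    def __init__(self, progress_path):
        self.progress_path = progress_path
        self.count = 0
        self.lock = threading.Lock()

    def record_commit(self):
        # Hold the lock until the record is flushed to disk; only
        # then may another updater acquire it.
        with self.lock:
            self.count += 1
            with open(self.progress_path, "w") as f:
                f.write(str(self.count))
                f.flush()
                os.fsync(f.fileno())

    def finish(self):
        # On successful job completion the progress file is deleted.
        with self.lock:
            if os.path.exists(self.progress_path):
                os.remove(self.progress_path)
```

In a multi-node configuration on one server, the in-process lock would have to be replaced by a cross-process lock on the file itself, but the hold-until-flushed discipline is the same.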

2. Please add the following capabilities to the sequential file stage and also to the dataset stage.

+ Add the capability of reading a counter from a plain-text file into a variable held by the sequential file and dataset stages. This should be done as follows:
IF the specified file does not exist
THEN set V_SKIP_RECORDS to 0
ELSE IF the file exists and its contents are an integer
THEN set V_SKIP_RECORDS to that value
ELSE ABORT THE JOB
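The rules above can be sketched directly; the function name is illustrative, and "abort the job" is modelled here as raising `SystemExit`:

```python
import os

def read_skip_counter(path):
    """Read V_SKIP_RECORDS from a plain-text counter file.

    Missing file -> 0; integer contents -> that value;
    anything else aborts the job.
    """
    if not os.path.exists(path):
        return 0
    text = open(path).read().strip()
    try:
        return int(text)
    except ValueError:
        # ELSE ABORT THE JOB
        raise SystemExit("invalid counter file: " + path)
```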

+ Add an extended property to the counter read that allows for a number of records to be skipped as follows:
V_SKIP_RECORDS * {COMMIT INTERVAL SPECIFIED BY USER} + <ANYTHING THE USER SPECIFIES>
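As a worked example of that formula (the function and parameter names are illustrative): if the counter file records 7 commits, the user's commit interval is 1000 rows, and the user adds an extra offset of 5, the stage would skip 7 * 1000 + 5 = 7005 records before resuming.

```python
def records_to_skip(v_skip_records, commit_interval, user_offset=0):
    # V_SKIP_RECORDS * {COMMIT INTERVAL} + <user-specified extra>
    return v_skip_records * commit_interval + user_offset
```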

Thanks

Re: Commit Progress Information Logging and Skipping

Posted: Fri Jun 11, 2010 10:05 am
by Ultramundane
Also, jobs must be designed correctly to make use of these properties. Depending on the partitioning methods employed, the number of nodes, and the number of target stages and virtual target stages (that is, one per node), this may not work properly. Customers must therefore design their jobs appropriately to use this capability.

Posted: Fri Jun 11, 2010 6:40 pm
by ray.wurlod
I don't see why this couldn't be written to handle execution on multiple machines, whether in a cluster or a grid, though the information would have to be collected (and maybe reported) per-node, just as row counts currently are.

One of the additional items of information might be the number of rows committed. More difficult is the count of separate insert, update and delete operations requested.