Interesting SQL design conundrum

ShaneMuir · Post by **ShaneMuir** » Tue Sep 21, 2010 2:57 am

Hi All

I realise that this is probably more a question for an Oracle SQL forum but I will ask here anyway.

Currently I have a datastage job which is to select a number of records for processing. To do this it selects messages with an indicator field of null from the table and updates this indicator indicating that it is ready for processing. Easy enough so far. The problems begin to arise as the datastage job which calls this update is run as multi-instance. Further complicating matters is that when selecting the records, checks must be made that no related records are being processed in the other instances.

At first we thought we'd cracked it with a simple update select statement, as follows:

Code: Select all

UPDATE table SET indicator = 'process_0'
where record_id exists in 
(SELECT record id WHERE indicator is null
 AND NOT EXISTS
 (SELECT parent_record_id where indicator is not null))

Obviously that is not the full syntax, but basically we select records where the process indicator is null but exclude related records (those with the same parent record id) where the process indicator is not null (indicating that it is processing elsewhere). Basically related records should be processed either in the same batch/instance, or wait until that batch containing it has finished and then it can be selected by another instance.

This process works if the update job is running in a single instance. However when its running multiple instance its possible that 2 or more instances of the process can call the job at the same time and as a result the first one will be updating and prior to committing those records the second instance can select those same records and overwrite the indicator which its own data. (This indicator determines what process that record will follow).

Anyway so we tried updating the SQL to a stored procedure that would lock the records once selected so that a second instance could not select those records for update, as follows:

Code: Select all

CURSOR IS
(SELECT record id WHERE indicator is null
 AND NOT EXISTS
 (SELECT parent_record_id where indicator is not null))
 FOR UPDATE SKIP LOCKED;
BEGIN
FOR cursor
LOOP
  UPDATE table SET indicator = 'process_0'
  WHERE CURRENT OF cursor
END LOOP
COMMIT;

This process works mostly. Except that related messages can be selected by different instances if the multiple instances start at the same time (as when the second instance starts the related records might not be locked by the first process yet).

So I was wondering if anybody out there who has better SQL skills can see a problem with the logic I am trying to implement. I think its an interesting problem and would really like to know if its possible in the one procedure (I am sure it must be).

Just so you're aware we do have a work around where we have built a delay into the process and are running the original update select statement essentially as a single instance within the controlling sequence, so as stated above this is more a question of interest rather than necessity.

Looking forward to any feedback if any. Thanks in advance.

chulett · Post by **chulett** » Tue Sep 21, 2010 7:14 am

Yikes... way too early in the morning for brain-burners like that, perhaps later when my day is a little futher under way. Bonus points, though, for the use of "conundrum" in the thread title!

kduke · Post by **kduke** » Tue Sep 21, 2010 4:02 pm

You need to keep your instances from processing the same records. Lets say you have 3 instances running then divide a column by 3 and use the MOD() to return remainder so each SQL processes a different set of records.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Wed Sep 22, 2010 1:48 pm

Can you try using WHERE (NODENUMBER = CURRENT NODE) or some similar syntax for your database? What you are trying to do is restrict the node to only update the records in its "slice" of the data. This prevents two different partitions from locking horns over a record.

mhester · Post by **mhester** » Wed Sep 22, 2010 3:15 pm

I am going to go out on a limb here and give my opinion that as long as the data are not filtered per Kim or Andy's suggestion (not even sure that would work either) and that each instance processes the same table and can run at the same time then it is unlikely that this can be solved with SQL queries.

I can think of ways to do this if using SQL is not a requirement.

ShaneMuir · Post by **ShaneMuir** » Thu Sep 23, 2010 2:22 am

Thanks for all your suggestions guys.

I will look into them as they might help provide some solutions, however I think that Mike might be right in that my requirements cannot be met by SQL alone. I have no problem stopping multiple instances from selecting the same records, its more the second requirement of excluding related messages from being selected by another instance.

The MOD idea would work, but the only issue would be with timing. Using that would preclude a record from being read by any other instance, but would also mean that it can only ever be read by a single instance. Which would mean that if it missed being selected by a couple of seconds, it would have to wait until that instance came around to processing again, rather than just getting processed by the next available instance. Unfortunately timeliness is a factor here.

As I mentioned we have a way around it, and this does appear to be working in production, but it has introduced a forced delay into the process by forcing the update job to wait until a certain number of seconds past the minute before starting. Whilst this is not ideal it introduces enough of a delay to ensure that the selection job has started before another process comes along. The a simple is job running command makes the process wait until the selection job is free.

Anyway thanks again for all your suggestions, very much appreciated, especially since its not really a DS question and it has a very specific set of circumstances that I'm sure people really didn't want to think about too much :D

I'll park this for now but will come back and update if we ever get an SQL solution.

FranklinE · Post by **FranklinE** » Thu Sep 23, 2010 8:51 am

Thank you, Shane, for posting an interesting topic.

I'm late to the discussion, but I thought you might find some value in this: row, table and instance contentions are old hat in my shop. We have high volumes of both data and users (people and apps). Our DBAs are tasked with managing that, and the SQL-level controls we use are at their discretion, not the developers'. Not knowing from your posts if this is already on your mind, but the conversation you need to have is with your database administrators about balancing row, table and schema locks. It looks to me like your situation can be resolved with row locks, judicious commit timing and reasonable wait-state settings that work with the DBA's general configuration of your database(s). Good luck.