I have a table containing various terms and usage counts (plus related info).

To update this table, I have a stored procedure. This is executed while I import a bunch (80000+) of records. I have found that importing those records in parallell cuts execution time in half (on my old dual core, soon to be upgraded to quad, PC).

The procedure first checks if the term exists. If it exists then the count is updated and some other operations performed. If not, then it gets added.

This design is clearly not threadsafe.

Since this particular operation is relatively light weight, I'd happily lock the entire table if I can ensure that nobody inserts anything to it after I have checked if the term exists.

Is there a clean and nice way to do this? I'd do a lock() in my .net code, but I have no control over the number of instances running this code. It would solve my immediate problem, but not the underlying one.

asked 12 Oct '11, 11:58

Rune's gravatar image

Rune
2365817
accept rate: 50%

edited 13 Oct '11, 03:34

Volker%20Barth's gravatar image

Volker Barth
40.2k361549822

Exactly what is "not threadsafe"? Please explain. If a row is updated but not committed, all other attempts to update that row are blocked.

(12 Oct '11, 13:43) Breck Carter

Do you try to insert (or update) each record on its own or are you doing set operations? (I guess the former is true.)

Is the whole procedure designed as an atomic operation? - IMHO, that would prevent concurrent inserts.

(12 Oct '11, 16:12) Volker Barth

Volker, yes, the procedure is designed as an atomic operation. It assumes no-one else (i.e. another instance of itself) will add the same term before it does.

Breck, there is a race condition. This stored proc first checks to see if the term is already present. There is no guarantee that it will manage to sneak in a new row before someone else, hence it will sometimes end with a primary key violation.

The sad part is that our database contains other storedprocs designed using a similar pattern (involving different tables). So ideally I need to figure out the most efficient way of solving this. (and I need to be careful about what I ask for -- "lock table" looks tempting, but probably not optimal :) )

(13 Oct '11, 02:39) Rune
Replies hidden
1

@Rune: I'm slow, without actual code to look at it took me a while to figure out what the problem was. It's common... not just in your database but all across the world, perhaps in other galaxies as well.

Another approach, which might work in other cases but probably not this one, is to perform an UPDATE first and then the SELECT; for example, when incrementing a single column in a single row of a single table as part of a home-grown "auto increment" solution. UPDATE first to do the increment, then SELECT the new value and finally COMMIT, guarantees that all connections will single-thread through the incrementing process... because only one connection can UPDATE at a time, all others wait... even at isolation level 0.

(13 Oct '11, 08:43) Breck Carter

The WAITFOR DELAY is just for demonstration purposes, to guarantee collisions:

CREATE TABLE MyTermTable ( 
   MyTerm INTEGER NOT NULL PRIMARY KEY,
   counter INTEGER NOT NULL );

CREATE PROCEDURE merge_MyTermTable ( IN @MyTerm INTEGER ) BEGIN IF NOT EXISTS ( SELECT * FROM MyTermTable WHERE MyTerm = @MyTerm ) THEN BEGIN DECLARE primary_key_violation EXCEPTION FOR SQLSTATE '23W01'; WAITFOR DELAY '00:00:05'; INSERT MyTermTable VALUES ( @MyTerm, 1 ); MESSAGE STRING ( 'INSERT: ', @MyTerm ); EXCEPTION WHEN primary_key_violation THEN UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm; MESSAGE STRING ( 'Exception UPDATE: ', @MyTerm ); WHEN OTHERS THEN RESIGNAL; END ELSE UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm; MESSAGE STRING ( 'Regular UPDATE: ', @MyTerm ); END IF; COMMIT; END;

-- dbisql session 1... CALL merge_MyTermTable ( 1 ); CALL merge_MyTermTable ( 2 ); CALL merge_MyTermTable ( 3 ); CALL merge_MyTermTable ( 4 ); CALL merge_MyTermTable ( 5 );

-- dbisql session 2... CALL merge_MyTermTable ( 1 ); CALL merge_MyTermTable ( 3 ); CALL merge_MyTermTable ( 5 );

-- dbeng12 console... INSERT: 1 Exception UPDATE: 1 INSERT: 2 INSERT: 3 Exception UPDATE: 3 INSERT: 5 INSERT: 4 Regular UPDATE: 5

SELECT * FROM MyTermTable ORDER BY MyTerm;

MyTerm,counter 1,2 2,1 3,2 4,1 5,2

permanent link

answered 12 Oct '11, 17:22

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

edited 12 Oct '11, 17:25

FYI: When adding the "WITH(HOLDLOCK)" table hint to the SELECT (see my comment on Mark's answer), the procedure will deadlock in this test situation - this could be handled as well.

(12 Oct '11, 17:55) Volker Barth
Replies hidden

Thanks Breck.

I was a bit nervous at first, because I associate exceptions with a performance penalty, but then I eventually (I'm slow) realized I observed 5 primary key violations in 84000+ records. I can certainly live with what is literally only a handful exceptions! :)

(13 Oct '11, 02:43) Rune
Replies hidden

If you prefer to omit the PK violation, the HOLDLOCK solution should work for you (though it will also throw an exception) - I guess that's a (mere philosophical?) question whether to prevent a race condition or to handle it after it has happened...

(13 Oct '11, 03:32) Volker Barth

@Volker: When you say "race condition" are you referring to the question, or the code in my answer?

At the point the EXCEPTION is thrown for the answer code above, the other connection has (by definition) performed an INSERT after this connection did the SELECT. This connection's attempt to do the UPDATE will then either (a) be blocked until the other connection commits because the other connection has the row locked, after which it will run, or (b) run without waiting because the other connection has released the lock.

(13 Oct '11, 08:02) Breck Carter

@Rune: You may have noticed that COMMIT is fast whereas ROLLBACK takes forever when a large number of rows are involved; same explanation, the engine is optimized for the 99.9999% case (transactions are committed) rather than the other one.

(13 Oct '11, 08:17) Breck Carter

@Volker: Are you surrrre it will deadlock? One connection has a resource the other needs, and vice versa? What are the two different resources? (they have to be different, two connections can't have locks on the same thing, or... I need more coffee :)

(13 Oct '11, 08:49) Breck Carter

@Breck: "Race condition" referred to the question, though I related it also to the possible PK violation that will be handled by your code - which is (of course!) a working solution IMHO:)

What I'm trying to say is that the HOLDLOCK attempt would prevent the PK violation whereas your code will handle it. Can't say which is more appropriate or performs better...

(13 Oct '11, 09:03) Volker Barth
showing 2 of 7 show all flat view

Perhaps you should look at the MERGE statement. The MERGE statement is capable inserting a row if the row does not exist otherwise it can update the row. Once you have inserted/updated the row you can then proceed to do your other operations.

permanent link

answered 12 Oct '11, 13:14

Mark%20Culp's gravatar image

Mark Culp
24.9k10141297
accept rate: 41%

1

MERGE semantics (and INSERT...ON EXISTING) aren't "threadsafe" in this sense either. The current semantics of MERGE and INSERT ON EXISTING are based on the underlying isolation level of the connection. That is to say, the existence check done by these algorithms is not done (today) using INTENT or WRITE row locks, so it's possible that concurrently-executing statements can incur duplicate primary key errors. These errors will still have to be handled by the application in some way.

(12 Oct '11, 15:30) Glenn Paulley
Replies hidden

Does it mean it would take isolation level 3 to prevent other transactions to insert rows while the particular process is about to insert these?

I.e. would something like this untested code fragment prevent concurrent inserts by use of the HOLDLOCK table hint (error handling omitted):

begin
   if (select count(*)
       from MyTermTable with (holdlock)
       where MyTerm = @MyTerm) = 0 then
      insert MyTermTable...
   else
      update MyTermTable...
   end if;
end;
commit;
(12 Oct '11, 16:21) Volker Barth

@Volker: You should test this... does COUNT(*) meet the definition of phantom row? Does the isolation level 3 HOLDLOCK thingie actually create anti-insert locks? ...these aren't rhetorical questions, I don't know, my cult sect doesn't allow isolation level 3 :)

(13 Oct '11, 08:12) Breck Carter
2

@Glenn: Are you saying that INSERT ON EXISTING actually tests for the row's existence before attempting the INSERT? As opposed to handling the exception? (which is what the syntax implies).

In other words, if I want code that is actually robust, as opposed to exhibiting the appearance of robustness, I have resort to Old School techniques? This is a revelation... :)

(13 Oct '11, 08:23) Breck Carter

@Breck: I hope Glenn will answer these very good questions - I don't have the "code", and I'm not even sure how to confirm that the behaviour I would expect will behave as expected...

And I do not usually use isolation level 3 (except during SQL Remote extractions...), nor do I remember to have used WITH (HOLDLOCK) in production code - we don't do that much in parallel:)

(13 Oct '11, 08:56) Volker Barth

Current tests with two dbisqlc sessions reveal that

select count(*) from MyTermTable with (holdlock) where MyTerm = @MyTerm

does seem to set the same phantom locks as

select * from MyTermTable with (holdlock) where MyTerm = @MyTerm

that is, such a SELECT will block your proposed stored procedure if blocking is on.

FWIW, isolation level 2 - i.e. with (REPEATABLEREAD) - wont't be sufficient, as it will prevent the modification of the row if that is already existing, however, it won't block the insertion of such a row.

Unfortunately, isolation level 3 seems to be blocking any insertion into this table with any @MyTerm value as any new row might be suspicious to fulfil the running select's conditions...

Resume:

I would think in this case the "curing" of PK violations is much better than to prevent them with isolation level 3.

(13 Oct '11, 11:34) Volker Barth

I have to correct my statement on blocking all inserts:

AFAIK, that would only happen when a table scan is used to build the select's result set. As I understand, when the query optimizer uses an index (e.g. for the PK), lesser rows will be blocked.

(13 Oct '11, 12:40) Volker Barth

@Glenn: Breck still might like to get an answer here - and me, too:)

(18 Oct '11, 07:11) Volker Barth
showing 1 of 8 show all flat view

Since this particular operation is relatively light weight, I'd happily lock the entire table if I can ensure that nobody inserts anything to it after I have checked if the term exists.

If you'd like to lock the entire table, there's the LOCK TABLE statement. However, the mechanism that you're specifically referring to (preventing one connection from modifying the table while another connection is also modifying the table) can be handled via Isolation Levels. If you didn't want to change your code to use the MERGE statement as Mark suggested, you might be able to adjust your isolation level for the stored procedure to use your existing stored procedure code and prevent concurrency issues.

permanent link

answered 12 Oct '11, 13:43

Jeff%20Albion's gravatar image

Jeff Albion
10.8k171175
accept rate: 25%

1

@Jeff: At the risk of sounding litigious, it's important to avoid confusion in other people's minds, especially when it involves One Of Life's Greatest Mysteries: Locking And Blocking.

The desired behavior of "preventing one connection from modifying the table while another connection is also modifying the table" does not depend on isolation level. If one connection has updated a row and not yet committed or rolled back, no other connection can update that row, not even at isolation level zero. Isolation levels can be used to "pump up the volume" of locks and blocks, but not reduce it... updates are sacrosanct.

I know I took that "quoted phrase" out of context, I know what you are getting at... but I also know that some folks read more into isolation levels than they should.

(13 Oct '11, 11:50) Breck Carter
1

Thanks Breck - you're right that my comments as written are highly misleading.

I was assuming that the conflict was reading from the source data from the original problem description (which is based on isolation levels), not writing the inserting data. (That was a bad assumption!)

I also should not have said "modifying" in that quote and instead should have said "reading" - not sure why I would indicate that isolation levels would affect writing data. "Even a cow knows" that writes always get exclusive locks! :)

(13 Oct '11, 12:59) Jeff Albion

create special table, which store information (flag) about execution state of your special "one thread procedure". in the beginning of this one thread stored procedure lock this special table in serial mode. So after that, you get one thread procedure.

permanent link

answered 12 Oct '11, 12:47

AlexeyK77's gravatar image

AlexeyK77
70761224
accept rate: 8%

This is a summary of my attempts with the HOLDLOCK table hint.

I think Breck's solution is better - so I just add it here for completeness...

I have changed Breck's sample procedure slightly:

CREATE OR REPLACE PROCEDURE merge_MyTermTable ( IN @MyTerm INTEGER )
BEGIN
   IF NOT EXISTS ( SELECT * FROM MyTermTable WITH (HOLDLOCK) WHERE MyTerm = @MyTerm ) THEN
      BEGIN
         DECLARE primary_key_violation EXCEPTION FOR SQLSTATE '23W01';
         DECLARE rollback_detected  EXCEPTION FOR SQLSTATE '40001';
         WAITFOR DELAY '00:00:05';
         INSERT MyTermTable VALUES ( @MyTerm, 1 );
         MESSAGE STRING ( 'INSERT: ', @MyTerm );
         EXCEPTION 
            WHEN primary_key_violation, rollback_detected THEN
               UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm;
               MESSAGE STRING ( 'Exception UPDATE: ', @MyTerm );
            WHEN OTHERS THEN
               RESIGNAL;
      END
   ELSE
      UPDATE MyTermTable SET counter = counter + 1 WHERE MyTerm = @MyTerm;
      MESSAGE STRING ( 'Regular UPDATE: ', @MyTerm );
   END IF;
   COMMIT;
END;

-- dbisql session 1...
CALL merge_MyTermTable ( 1 );
CALL merge_MyTermTable ( 2 );
CALL merge_MyTermTable ( 3 );
CALL merge_MyTermTable ( 4 );
CALL merge_MyTermTable ( 5 );

-- dbisql session 2...
CALL merge_MyTermTable ( 1 );
CALL merge_MyTermTable ( 3 );
CALL merge_MyTermTable ( 5 );

-- dbeng12 console...
Exception UPDATE: 1
INSERT: 1
UPDATE: 2
INSERT: 3
Regular UPDATE: 3
Exception UPDATE: 4
INSERT: 4
INSERT: 5

SELECT * FROM MyTermTable ORDER BY MyTerm;

MyTerm,counter
1,2
2,1
3,2
4,1
5,2

In the end, it does block much more than Breck's version has to handle PK conflicts. Therefore I guess the performance will suffer because of the HOLDLOCK.

permanent link

answered 13 Oct '11, 11:59

Volker%20Barth's gravatar image

Volker Barth
40.2k361549822
accept rate: 34%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×125
×59
×21

question asked: 12 Oct '11, 11:58

question was seen: 5,704 times

last updated: 18 Oct '11, 07:11