
I have a process in my application where I need to search a large list (large for the program, anyway; at about 2.5 million rows, it isn't large by database standards).

Anyway, I have rows in table #1 that need to be matched up with rows in the large table (table #2), with new rows inserted into a third table. I use an INSERT INTO ... SELECT statement to do this, and it works. The catch is that this process may only be run on a row in table #1 once, so after the INSERT INTO ... SELECT runs, table #1 is updated and a column is changed so those rows won't be selected again.
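In outline, the pattern looks something like this (a sketch only; the table and column names are invented, since the actual schema isn't shown):

```sql
-- Sketch only: table and column names are invented for illustration.
-- Step 1: match unprocessed rows in table #1 against table #2
-- and insert the hits into table #3.
INSERT INTO table3 ( plate, matched_at )
SELECT t1.plate, CURRENT TIMESTAMP
  FROM table1 t1
  JOIN table2 t2 ON t2.plate = t1.plate
 WHERE t1.processed = 0;

-- Step 2: mark the rows so they are never selected again.
-- A row inserted between the two statements gets marked here
-- without ever having been checked; that is the bug described below.
UPDATE table1
   SET processed = 1
 WHERE processed = 0;
```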

There is a bug, however.

The data in table #1 is collected by a sensor device and then inserted into that table. A process that runs as a Windows service executes a stored procedure that does the INSERT INTO ... SELECT statement. The problem is that it is possible for a row to be inserted into table #1 in the window between the INSERT INTO ... SELECT determining which rows to include and the UPDATE beginning. When that happens, that row is marked as handled by the UPDATE but is never included in the results of the INSERT INTO ... SELECT.

To fix this, we are considering setting the transaction isolation level to SNAPSHOT. I know this comes with a performance hit; the question is how big a hit it is. Does anyone with experience using it have any information they're willing to share?
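For reference, snapshot isolation in SQL Anywhere has to be enabled database-wide before a connection can request it. Per the SQL Anywhere documentation, the options look roughly like this (a sketch, not our actual code):

```sql
-- One-time, database-wide setting (requires DBA authority).
SET OPTION PUBLIC.allow_snapshot_isolation = 'On';

-- Then, per connection (e.g. on the Windows service's connection):
SET TEMPORARY OPTION isolation_level = 'snapshot';
```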



asked 14 Sep '12, 10:04 by TonyV

edited 15 Sep '12, 07:49 by Volker Barth

There may be a way to structure your SQL commands so they work without having to bludgeon performance to death... but we need to see a more thorough description of the requirements... or, just show us the actual code that has the problem.

(14 Sep '12, 15:09) Breck Carter

I don't have relevant experience, but asked a similar question lately:

(15 Sep '12, 07:51) Volker Barth

I'm providing a little more information here to make things a little clearer.

The application is used in law enforcement. The sensors collect license plate information about vehicles that drive past them. There is a list of plates belonging to persons of interest (suspended license or plates, stolen plates or vehicle, wanted, etc.). This is table #2, which has 2.5 million rows in our test database. The list is maintained by our proprietary server and is downloaded to each client. When the software is first installed, a client has to download all of the rows in table #2 and add them to its local table. From then on, the server updates the list daily, or multiple times daily, and the table in each client is updated.

As plate information is gathered, each plate has to be checked against table #2, and it may only be checked once. This is table #1. It is possible for multiple sensors to be connected to one computer, and for each camera to collect the same plate information, which goes into table #1 as two rows. Because of the geometry of the installation, one camera may "see" the vehicle some number of milliseconds before the second one, so the check of table #2 for the first camera's row can be under way when the second camera's row is added to table #1. We're trying to prevent the second camera's row from being marked as done before its check is actually done.

We tried Isolation Level 3 and it worked, at first. Then we started getting deadlocks while the initial load of table #2 was occurring. This wasn't acceptable, as it created untold other problems, including delaying the download of table #2, if it was chosen as the victim of the deadlock.

What we ended up doing was setting the isolation level back to the default and adding a DECLARE LOCAL TEMPORARY TABLE to the stored procedure that does the check. We copy information from table #1 into this temporary table first, and the rest of the procedure uses the temporary table to determine which rows in table #1 are used by the INSERT INTO ... SELECT and UPDATE statements. This solves the issue and, once we got the INSERT INTO ... SELECT to use the right index, it's fast enough that deadlocks are very unlikely.
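A sketch of that approach (again with invented names): the local temporary table captures the work list exactly once, and both later statements operate on that fixed set of rows.

```sql
-- Sketch: names are invented. Declared inside the stored procedure,
-- the table exists only for the procedure's scope.
DECLARE LOCAL TEMPORARY TABLE pending (
    row_id  INTEGER PRIMARY KEY,
    plate   VARCHAR(16)
) ON COMMIT PRESERVE ROWS;

-- Capture the work list exactly once.
INSERT INTO pending ( row_id, plate )
SELECT id, plate FROM table1 WHERE processed = 0;

-- Both statements now see the same fixed set of rows, so a row
-- inserted mid-procedure is simply left for the next run.
INSERT INTO table3 ( plate, matched_at )
SELECT p.plate, CURRENT TIMESTAMP
  FROM pending p
  JOIN table2 t2 ON t2.plate = p.plate;

UPDATE table1
   SET processed = 1
 WHERE id IN ( SELECT row_id FROM pending );

-- The temporary table is dropped automatically when the scope ends.
```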


answered 17 Oct '12, 10:40 by TonyV

So you haven't tried snapshot isolation (which should have solved the deadlock problems IMHO)? - I'm just curious about any performance experiences...

(Apparently, your choice to use a local temporary table to store data is a very useful and common way to make a "data snapshot".)

Aside: The subject matter reminds me of another notion of "snapshot" - does table #2 have an external import routine? Just kidding:)

(17 Oct '12, 10:52) Volker Barth

We were afraid of the hit to overall database performance from turning on snapshot isolation. This gives us the snapshot without incurring the hit, whatever it is. Plus, the table is dropped when the stored procedure exits, so we don't have to clean up after ourselves.

(17 Oct '12, 11:00) TonyV

collecting license plate information about vehicles that drive past them

That gives a rollback a very vivid meaning: "Sorry mate, couldn't check your plate, please roll back and try again...". - "Deadlock" works as well:)

(18 Oct '12, 03:53) Volker Barth

In your use case I think isolation level 3 would be better, as you want to prevent changes during your select. The snapshot will, as the name says, give you a temporary copy of the data, so it would prevent you from omitting rows that are changed or deleted during your select. But as I understand it, your problem is the contrary: you select and might overlook newly inserted rows. This seems to fall into the class of phantom rows.
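Requesting level 3 for just the connection that runs the stored procedure would be a one-liner, something like:

```sql
-- Serializable behavior for this connection only (prevents phantoms
-- by locking the scanned ranges until commit).
SET TEMPORARY OPTION isolation_level = 3;
```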

answered 14 Sep '12, 12:32 by Martin

Yes, but snapshot isolation would prevent phantom rows as well: Both the INSERT ... SELECT from table #1 and the UPDATE on table #1 (if run in the same transaction) would see the same set of rows, so only the "seen" rows would be marked as "handled". The next run should then still list all rows that were inserted after the beginning of the first snapshot. - At least that is my idea of TonyV's description:)

And isolation level 3 might prevent the sensor device from delivering its data, or would require a corresponding buffer to wait until the phantom locks are released. That could or could not be a problem here...

(15 Sep '12, 07:48) Volker Barth

Volker Barth is correct in his interpretation of my issue. However, we ended up getting deadlocks when we tried isolation level 3. I'm adding an answer to indicate what we've done to get this to work.

(17 Oct '12, 10:21) TonyV
last updated: 18 Oct '12, 03:53