I have a process in my application where I need to search a large list (large for this program, anyway; we're talking about 2.5 million rows, which isn't large by database standards).
Anyway, I have rows in table #1 that need to be matched up with rows in the large table (table #2), and new rows inserted into a third table. I use an INSERT INTO ... SELECT statement to do this, and it works. The catch is that this process can only be run on a row in table #1 once, so after the INSERT INTO ... SELECT runs, table #1 is updated and a flag column is changed so those rows won't be selected again.
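For concreteness, here is a sketch of the pattern I'm describing. All the table and column names (`sightings` for table #1, `hotlist` for table #2, `matches` for table #3, and the `processed` flag) are made up for illustration; the real schema differs.

```sql
-- Match unprocessed sightings against the hotlist and record the hits.
INSERT INTO matches ( plate, sighted_at )
SELECT s.plate, s.sighted_at
  FROM sightings s
  JOIN hotlist h ON h.plate = s.plate
 WHERE s.processed = 0;

-- Flag the rows so they are never matched again.  Note the hazard:
-- this WHERE clause can now match rows inserted AFTER the SELECT
-- above finished reading, which is exactly the bug described below.
UPDATE sightings
   SET processed = 1
 WHERE processed = 0;
```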
There is a bug, however.
The data in table #1 is collected by a sensor device and then inserted as a row. A process running as a Windows service executes a stored procedure that does the INSERT INTO ... SELECT statement. The problem is that a row can be inserted into table #1 in the window between the moment the INSERT INTO ... SELECT finishes determining which rows to include and the moment the UPDATE begins. When that happens, that row is never included in the results of the INSERT INTO ... SELECT, yet the UPDATE still marks it as processed.
To fix this, we are considering setting the transaction isolation level to SNAPSHOT. I know this comes with a performance hit. The question is: how big a hit? Does anyone with experience with it have information they're willing to share?
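If this is SQL Anywhere (the "isolation level 3" wording further down suggests it is), my understanding is that snapshot isolation has to be enabled at the database level before a connection can request it, roughly like this. Treat these statements as a sketch; check the docs for your version before relying on them.

```sql
-- Enable snapshot isolation for the database (one-time setting).
SET OPTION PUBLIC.allow_snapshot_isolation = 'On';

-- Then, on the connection running the stored procedure:
SET TEMPORARY OPTION isolation_level = 'snapshot';
```

Be aware that enabling it has a cost even for connections that never use it, because the server must start keeping row versions for concurrent updaters.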
I'm providing a little more information here to make things a little clearer.
The application is used in law enforcement. The sensors collect license plate information about vehicles that drive past them. There is a list of plates belonging to persons of interest (suspended license or plates, stolen plates or vehicles, wanted persons, etc.). This is table #2; it has 2.5 million rows in our test database. This list is maintained by our proprietary server and is downloaded to each client. When the software is first installed, a client has to download all of the rows in table #2 and add them to the table. From then on, the server updates that list daily, or multiple times daily, and the table in each client is updated.
As plate information is gathered, each plate has to be checked against table #2, and it can only be checked once. This is table #1. It is possible for multiple sensors to be connected to one computer, and for each camera to collect the same plate information; this goes into table #1 as two rows. Because of the geometry of the installation, one camera can "see" the vehicle some number of milliseconds before the second one. The check of table #2 for the first camera's row can be under way when the second camera's row is added to table #1. We're trying to prevent the second camera's row from being marked as done before that check is actually done.
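One way to close that window without touching the isolation level is to claim the rows first, and then process only the claimed rows, so a row inserted mid-run is simply left for the next run. This is a sketch with hypothetical names (`sightings`, `hotlist`, `matches`, a three-state `processed` column: 0 = new, 1 = claimed, 2 = done), not your actual schema:

```sql
-- Step 1: claim.  Any row inserted after this statement still has
-- processed = 0 and will be picked up by the next run instead.
UPDATE sightings
   SET processed = 1
 WHERE processed = 0;

-- Step 2: match only the claimed rows.
INSERT INTO matches ( plate, sighted_at )
SELECT s.plate, s.sighted_at
  FROM sightings s
  JOIN hotlist h ON h.plate = s.plate
 WHERE s.processed = 1;

-- Step 3: mark the claimed rows as fully done.
UPDATE sightings
   SET processed = 2
 WHERE processed = 1;
```

Because the claim happens before the match, no row can ever be marked done without having been checked, regardless of when the second camera's row arrives.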
We tried isolation level 3 and it worked, at first. Then we started getting deadlocks while the initial load of table #2 was occurring. This wasn't acceptable, as it created untold other problems, including delaying the download of table #2 if that transaction was chosen as the deadlock victim.
What we ended up doing was setting the isolation level back to the default and we added a
answered 17 Oct '12, 10:40
In your use case I think isolation level 3 would be better, since you want to prevent changes during your SELECT. Snapshot isolation gives you, as the name says, a temporary copy of the data, so it would prevent you from omitting rows that are changed or deleted during your SELECT. But as I understand it, your problem is the opposite: you select, and might overlook newly inserted rows. This falls into the class of phantom rows, which is exactly what level 3 (serializable) is meant to prevent.
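Assuming SQL Anywhere, requesting level 3 for just the connection running the stored procedure would look something like this (a sketch; verify against your version's documentation):

```sql
-- Serializable isolation for the current connection only, so other
-- connections (e.g. the table #2 download) are not affected.
SET TEMPORARY OPTION isolation_level = 3;
```

Note that level 3 takes the locks needed to block phantoms for the duration of the transaction, which is likely the source of the deadlocks you saw during the initial load of table #2.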
answered 14 Sep '12, 12:32