
We have a customer whose database has grown by about 50% over the last year - to around 12GB now. They have been complaining of a fall-off in performance which has been hard to quantify, but which can be seen in some (fairly complex) reports. The number of users is relatively small - around 20.

The server is running Windows Server 2008 R2 on a virtualised platform (VMware) with pretty fast links to a SAN. Initially 8GB of RAM was available, with 6GB allocated to a fixed db engine cache.

An initial look at the server showed heavy fragmentation of the database file (1,800 fragments); the file was defragmented, but without any noticeable improvement. Resource Monitor showed that the working set of the dbsrv10 process was using pretty well all the allocated RAM. It also showed very heavy disk activity on the db file when the "complex report" was run, which took about 2 minutes to complete. A second run of the same report was almost instantaneous, with no disk activity. Processor activity remained around 5%. If the same report was run a few hours later it was slow again, suggesting that the required data had been pushed out of cache.

From this we concluded that more RAM might help, so it was increased to 14GB and the fixed db cache to 12GB. Re-running the same report after the database had been running for a day or two produced little improvement. The working set of the db engine had increased, varying between 8GB and about 10GB. Given the previous db activity, all the required data should have been in cache. The disk activity, however, showed a major difference: whereas before the activity related to the db file, it now relates almost completely to the Windows swap file. If I understand it correctly, the "hard faults" number also suggests that data required by the db engine is having to be retrieved from the paging file. Overall physical memory usage (reported by Resource Monitor) seems to vary between 75% and 100%, but doesn't seem to be driven by running particularly demanding queries.
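One way to cross-check what Resource Monitor reports is to ask the server itself how much cache it is actually holding. A minimal sketch, assuming SQL Anywhere's standard server properties (values are reported in KB):

```sql
-- Compare the cache the server currently holds against its configured
-- limits and the peak it has reached since startup.
SELECT PROPERTY('CurrentCacheSize') AS current_cache_kb,
       PROPERTY('MinCacheSize')     AS min_cache_kb,
       PROPERTY('MaxCacheSize')     AS max_cache_kb,
       PROPERTY('PeakCacheSize')    AS peak_cache_kb;
```

If CurrentCacheSize sits well below the fixed 12GB while the working set shrinks, that would support the theory that the OS is paging the cache out from under the engine.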

Our thought now is to try reducing the db cache to give the RAM back to the OS - say an additional 2GB to start with - but does that make sense?

(screenshots attached)


asked 11 Oct '12, 11:32

Justin Willey

edited 11 Oct '12, 11:33


Have you tried looking at the plan for the slow query for the case when the cache was cold? Perhaps the query can be rewritten or an index added that would result in not as much data being required to generate the answer (and hence not as many I/Os needed to get the data into cache) and thus the db cache could be reduced back to its original size?
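To capture the plan for the cold-cache case, the plan functions can be run directly from Interactive SQL. A sketch, with the report's actual query substituted for the placeholder text:

```sql
-- PLAN() returns the short text form of the access plan;
-- GRAPHICAL_PLAN() returns XML that Interactive SQL can display.
SELECT PLAN('SELECT ... the report query ...');
SELECT GRAPHICAL_PLAN('SELECT ... the report query ...');
```

Capturing the plan both immediately after a restart (cold cache) and after the system has been busy for a few hours would show whether the optimizer is actually choosing a different strategy in the slow case.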

(11 Oct '12, 11:46) Mark Culp

Not yet in detail, Mark - the problem doesn't seem to be about just a single report but "general slowness in the system". The overall problem has been difficult to quantify - the report has been a useful proxy for it. I will however look at the plan again in case that particular thing can be improved.

The new high level of paging file activity relating to the db engine is happening even when the particular report isn't being run - though it climbs further when it is.

There is also some internal fragmentation of important tables and indexes - however that wouldn't explain why the paging file has suddenly got so popular!
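For what it's worth, the internal fragmentation can be quantified with SQL Anywhere's system procedures. A sketch (the table name here is hypothetical):

```sql
-- Rows split across multiple pages show up as segments-per-row > 1;
-- sa_index_density shows how full the index pages are.
CALL sa_table_fragmentation('important_table');
CALL sa_index_density('important_table');

-- If fragmentation is confirmed, the table and its indexes can be rebuilt:
-- REORGANIZE TABLE important_table;
```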

(11 Oct '12, 11:57) Justin Willey

Is there anything else running on that machine that could explain why the OS may have trimmed the working set? Is it a file server as well?

(11 Oct '12, 13:29) John Smirnios

Have you taken a look at the "Avg. Disk Queue" metric for the disk in Windows Performance Monitor? This metric will help you know when your disk is "working hard" during these times.

Yes, your analysis is correct: "hard faults" is the metric for page faults, and those misses have to be satisfied from the Windows swap file. This happens because the process size of the database server (plus the OS's own requirements) exceeds the amount of RAM you have.

You may, though, be trading one type of disk work for another by shrinking the database server cache: currently the server is configured to take so much memory for cache that the OS has to start swapping. If you configure the server with less cache, you can avoid the OS swapping behaviour, but the server will likely work harder against its temporary file because of the lower cache value (so the configuration would be better balanced, but actual database performance may not improve).

Adding more system memory would be the only real way to prevent the disk trade-off.

Can you reproduce the same slowness if you run the report, run sa_flush_cache and try to run the report again?

How often do you review your database statistics on your tables (as this database has grown significantly over time)? Are there any particularly large/growing tables that you frequently add/remove data from that may be involved in this report that may gain a benefit by reviewing the statistics?
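Both suggestions can be combined into one repeatable test. A sketch, assuming a hypothetical table name for the statistics refresh:

```sql
-- Empty the database cache on demand, so the "slow" case can be
-- reproduced without waiting hours for normal activity to evict pages.
CALL sa_flush_cache();
-- ... run and time the report here ...

-- Rebuild the optimizer's column statistics for a table suspected
-- of having drifted as the data grew.
CREATE STATISTICS big_orders;
```

If the report is fast after sa_flush_cache() but slow after hours of normal use, the difference is more likely OS paging than a cold database cache.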

(11 Oct '12, 13:31) Jeff Albion

It's a dedicated (virtual) machine - just the db engine and index server (which is using very little). I've been wondering about the OS memory requirement - in the past we've assumed 2GB is enough for a machine not doing anything else - but maybe times have moved on?

(11 Oct '12, 13:51) Justin Willey

The average disk read queue figure is indeed going right up for the "slow" runs compared with not moving for the "fast" runs. Interestingly, if I run the sa_flush_cache() and then run the report again - while it is a bit slower and the disk reads go up - it's not nearly as slow as when I run the report having left it for a few hours of normal database use. It's as though the optimizer is forgetting something.

Also, whereas before there was no significant CPU usage - that now goes to 50%.

Looks like next step is to play with statistics perhaps?

(11 Oct '12, 14:32) Justin Willey

The difference in performance between sa_flush_cache() and the OS swapping out the cache can reasonably be expected. If the OS swaps out the cache pages, each page is brought back in serially as each page fault is handled. Any thread that encounters a page fault is suspended until the page has been read in.

If the cache is empty, the query execution engine can use asynchronous read-ahead to drive IO to higher levels and increase CPU utilization by ensuring that data is in cache before it is actually needed by query execution.

(11 Oct '12, 14:53) John Smirnios

It's hard to say what the OS requirements are and there could be other OS things running in the background that do file access and cause the file cache to grow and force the db cache out of memory. The task manager's Performance tab will show you the size of filesystem cache (the "Cached" line in the "Physical Memory" box). There is probably an equivalent perfmon counter. It may be interesting to monitor that counter.

(11 Oct '12, 14:57) John Smirnios

Thank you, that makes a lot of sense - it also explains why the raw bytes/sec figure seems higher when reading the database file rather than the swap file. The moral seems to be that having to use the swap file is bad news - so we need to think about yet more RAM, as Jeff suggested (unless we can find some Windows processes to kill off permanently).

(11 Oct '12, 15:10) Justin Willey

Do monitor the size of the filesystem cache as mentioned above -- it could be the culprit.

(11 Oct '12, 18:06) John Smirnios

Will do. Is the idea that if that grows it will cause the db server to relinquish cache from the working set - meaning that the stuff I want kept may be being discarded?

I'll keep an eye on the filesystem cache while the system is in proper use tomorrow and see what happens. At the moment, with just me running tests, it's pretty static at around 0.5GB. If that is the case, the cure presumably would be more RAM (as Jeff suggested above)?

If nothing else, I'm learning a lot with this exercise!

(11 Oct '12, 18:16) Justin Willey