Let's say I have a giant SELECT buried inside an application, and I just want to do a quick dbisql timing test without all the application complexity. Here's the pseudo code for an example

SELECT blah blah blah FROM blah blah blah WHERE blah blah blah AND SO ON;

All else being equal (same predicates, same host variable values, same everything else except the SELECT list), can I rely on the following being just as slow as the original?

Let's further assume there is no funkiness in the SELECT list, no user-defined function calls that do RECURSIVE CROSS JOIN UPDATEs between DBA.Mars and DBA.TheMoon, none of that stuff, just ordinary columns.

SELECT COUNT(*) FROM blah blah blah WHERE blah blah blah AND SO ON;

asked 22 Jul '11, 05:41

Breck%20Carter's gravatar image

Breck Carter
32.5k5417261050
accept rate: 20%

What about

select count(*) from
(SELECT blah blah blah
    FROM blah blah blah
    WHERE blah blah blah AND SO ON) S
(22 Jul '11, 07:14) Volker Barth
Replies hidden
2

Unfortunately (for Breck in this case; fortunately for other SA users the rest of the time), the optimizer is likely to prune this query down until the intermediate select list is empty, at which point it is no different from Breck's original count(*) question.

(22 Jul '11, 09:08) David DeHaan

There are a number of reasons the COUNT(*) variant can be faster than the original (I wouldn't expect it to be slower). The query does not need all of the columns that the original query did, so the optimizer can eliminate tables (Elimination of unnecessary inner and outer joins). Further, table columns are not needed in the select list and this may allow a table to be accessed using index-only retrieval where this might not have been possible in the original query. Even if neither of these changes occur, any materializing operators may have a reduced data size because they don't need to carry up columns that are used only in the select list.

I know that you suggested there are no UDFs in the SELECT list (those would not need to be evaluated in the COUNT(*) variant), but even builtin functions can take time to evaluate.

There are a couple of techniques that I use (with their own limitations). You can use a row-limitation clause such as "TOP 1 START AT 1000000000". This won't return any rows to the client, but it does have the same intermediate data characteristics (materializing operators will copy base table columns needed in the original). However, this approach does not evaluate the SELECT list expressions, and it also may alter the plan selection. Another approach is to rely on the GRAPHICAL_PLAN() builtin function with a statistics-level of 3 (or use dbisql's plan viewer and "Detailed statistics". This adds a small amount of per-row cost at the root of the query, but it does not change the plan. All SELECT list expressions are evaluated and the plan is unmodified from the original. Care is needed, though, to match the characteristics of the original query (cursor type, etc.) and this approach does not work with host variables in the statement (they can be replaced by connection-level variables if necessary).

Another approach I have recently become fond of is to use the sa_copy_cursor_to_temp_table() system procedure, setting first-row to 9223372036854775807 so that no rows are actually copied. The advantage of this approach is that it can be used with any statement where you can open a cursor, and it builds the cursor "in-situ".

Comparing the techniques, the TOP-n/START AT approach is closest to the COUNT(*) one. You don't need to turn the statement into a string (escaping ') and it can work with host variables. It is probably closer in performance to the original than the COUNT(*) approach. The GRAPHICAL_PLAN() is probably more accurate, but it may require more setup (convert to string, convert host variables to connection variables). The sa_copy_cursor_to_temp_table() approach works well for timing in-situ but may not be the simple approach you want to reach for when first analysing a query for performance.

All of these approaches intentionally eliminate the network time needed to send the result back. In the cases where that is pertinent, the fetchtst tool is a great way to test the performance of a query; it allows you to control the cursor type, it supports host variables, and it can simulate different ways of fetching the result set. It can also be combined with the above described techniques.

permanent link

answered 22 Jul '11, 06:52

Ivan%20T.%20Bowman's gravatar image

Ivan T. Bowman
2.8k22732
accept rate: 39%

3

You should write a book. I know, you have real work to do, but still, you should do it.

(22 Jul '11, 14:38) Breck Carter
Replies hidden

Well, a blog would do, too ... I guess I remember you sighing "If Ivan had a blog" once upon a time in your blog, Breck:)

(22 Jul '11, 17:26) Volker Barth
2

Thank you @Breck and @Volker; I appreciate your kind comments (and Breck's earlier suggestion that I start a blog). I do have a blog (actually two) that are both for limited consumption. I haven't found that I update either with any appropriate frequency. My non-work blog was last updated six years ago, my "work internal" blog over 2 years ago. If I could achieve a better frequency, I'd consider having a public work-related blog.

(25 Jul '11, 19:21) Ivan T. Bowman
Replies hidden

It gets easier with increased frequency... and nobody's gonna complain if you do a bit of self-plagiary, like edit and republish the detailed answers you have posted here and on the newsgroups. Or not... leave them for ME to republish :)

(26 Jul '11, 08:32) Breck Carter
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×275

question asked: 22 Jul '11, 05:41

question was seen: 2,642 times

last updated: 26 Jul '11, 08:32