Answers to: What does this SUM OVER code do?http://sqlanywhere-forum.sap.com/questions/530/what-does-this-sum-over-code-do<p>Once upon a time I wrote this code... today my Sustaining Engineer asked "what does it do?" and alas, I am at a loss for words:</p>
<pre>SELECT rroad_ranked_table.table_id,
CAST ( SUM ( rroad_ranked_table.row_count )
OVER ( ORDER BY rroad_ranked_table.row_count DESC,
rroad_ranked_table.table_id ) -- OVER ( ORDER BY must be unique
AS DECIMAL ( 15, 4 ) ) AS sum_row_count,
sum_row_count
/ CAST ( ( SELECT SUM ( row_count )
FROM rroad_ranked_table )
AS DECIMAL ( 15, 4 ) ) AS fraction
INTO #temp_row_fraction
FROM rroad_ranked_table;
</pre>
<p>Here's the table:</p>
<pre>DECLARE LOCAL TEMPORARY TABLE rroad_ranked_table (
table_id UNSIGNED INT NOT NULL,
row_fraction DECIMAL ( 15, 4 ) NOT NULL DEFAULT ( 0.0 ),
row_count UNSIGNED BIGINT NOT NULL,
size_fraction DECIMAL ( 15, 4 ) NOT NULL DEFAULT ( 0.0 ),
size_in_bytes UNSIGNED BIGINT NOT NULL,
CONSTRAINT PRIMARY KEY (
table_id ) )
NOT TRANSACTIONAL;
</pre>enWed, 19 Mar 2014 14:38:19 -0400Answer by Ivan T. Bowmanhttp://sqlanywhere-forum.sap.com/questions/530/what-does-this-sum-over-code-do/20828<p>I <strong>really</strong> should know better than to quibble with Glenn, but here are a few comments anyhow:</p>
<p>First, the following statement from the help and Glenn's answer is not correct:</p>
<blockquote>
<p>"If the window specification contains an ORDER BY clause, it is equivalent to specifying RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."</p>
</blockquote>
<p>The RANGE unit can <em>only</em> be used when:</p>
<ul>
<li>The ORDER BY clause contains a single element</li>
<li>The single element of the ORDER BY if of datetime type or a number type</li>
</ul>
<p>If you have an OVER clause with an ORDER BY with two elements, the you are not allowed to use RANGE units (you get an error Composite ORDER BY not allowed with RANGE SQLCODE=-966). In Breck's example, you could not use RANGE because there are two order-by elements. In other queries, the order by element could be a string.</p>
<p>Aha, you may be thinking, RANGE is clearly the wrong unit and we should instead use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Unfortunately the semantics are not the same when there are ties in the ORDER BY column. Consider the following:</p>
<div class="codehilite"><pre><span class="k">create</span> <span class="k">table</span> <span class="n">T_OverExample</span><span class="p">(</span><span class="n">pk</span> <span class="nb">int</span><span class="p">,</span> <span class="n">x</span> <span class="nb">int</span><span class="p">,</span> <span class="n">y</span> <span class="n">double</span><span class="p">);</span>
<span class="k">insert</span> <span class="k">into</span> <span class="n">T_OverExample</span> <span class="k">select</span> <span class="n">row_num</span> <span class="k">as</span> <span class="n">pk</span><span class="p">,</span> <span class="n">pk</span><span class="o">/</span><span class="mi">2</span> <span class="k">as</span> <span class="n">x</span><span class="p">,</span> <span class="n">pk</span> <span class="k">as</span> <span class="n">y</span> <span class="k">from</span> <span class="n">sa_rowgenerator</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">10</span><span class="p">);</span>
<span class="k">select</span> <span class="o">*</span>
<span class="p">,</span> <span class="k">sum</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="n">over</span><span class="p">(</span><span class="k">order</span> <span class="k">by</span> <span class="n">x</span><span class="p">)</span> <span class="k">sum</span>
<span class="p">,</span> <span class="k">sum</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="n">over</span><span class="p">(</span><span class="k">order</span> <span class="k">by</span> <span class="n">x</span> <span class="k">rows</span> <span class="k">between</span> <span class="n">unbounded</span> <span class="n">preceding</span> <span class="k">and</span> <span class="k">current</span> <span class="k">row</span><span class="p">)</span> <span class="n">sumrows</span>
<span class="p">,</span> <span class="k">sum</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="n">over</span><span class="p">(</span><span class="k">order</span> <span class="k">by</span> <span class="n">x</span><span class="p">,</span><span class="n">x</span> <span class="n">range</span> <span class="k">between</span> <span class="n">unbounded</span> <span class="n">preceding</span> <span class="k">and</span> <span class="k">current</span> <span class="k">row</span><span class="p">)</span> <span class="n">sumrange</span>
<span class="k">from</span> <span class="n">T_OverExample</span><span class="p">;</span>
</pre></div>
<p>In this example, we have <code>sum</code> and <code>sumrange</code> always equal. Since there are duplicate <code>x</code> values, the sum is computed over a group with a single x value, and the sum is returned for all rows in the group. The <code>sumrows</code> column is computed using ROWS units and the result is different. Even if there are tie groups according to the ORDER BY, the SUM is computed cumulatively with each row generating a new value. If there are tie groups, the order within the group is non-deterministic(*).</p>
<p>SQL Anywhere does not support a relatively new unit of GROUPS. When an ORDER BY is specified but the window does not define bounds, then the default bounds are actually GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The only real difference is that GROUPS does not put any restrictions on the number or types of elements in the ORDER BY.</p>
<p>A second comment is that in Breck's formulation there is a subquery to compute the total sum--usually this would be done as a second OVER clause as follows:</p>
<div class="codehilite"><pre><span class="k">select</span> <span class="n">pk</span><span class="p">,</span> <span class="k">sum</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="n">over</span><span class="p">(</span><span class="k">order</span> <span class="k">by</span> <span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="k">sum</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="n">over</span><span class="p">()</span> <span class="n">cumulative_frac</span>
<span class="k">from</span> <span class="n">T_OverExample</span><span class="p">;</span>
</pre></div>
<p>In many cases there wont be a huge performance difference but this approach is really important if you have a PARTITION BY -- you want the fraction within the partition instead of the entire table.</p>
<p>(*) If the ORDER BY in an OVER clause is not unique, then the ordering is non-deterministic. Nevertheless, if there are two window functions with an identical ORDER BY, the server is required to choose the same ordering for both. So, if you were worrying about that detail, that one's ok.</p>Ivan T. BowmanWed, 19 Mar 2014 14:38:19 -0400http://sqlanywhere-forum.sap.com/questions/530/what-does-this-sum-over-code-do/20828Answer by Glenn Paulleyhttp://sqlanywhere-forum.sap.com/questions/530/what-does-this-sum-over-code-do/531<p>From the help, you'll know that</p>
<p>"If the window specification contains an ORDER BY clause, it is equivalent to specifying RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."</p>
<p>Aside: my recommedation is to always specify either ROWS or RANGE in a window so that the intent of the computation is clear.</p>
<p>So the window aggregate SUM() is hence computing a cumulative sum for each row in rroad_ranked_table. The third SELECT list expression then computes the fraction of this cumulative sum over the sum of all of the row counts.</p>
<p>Does that help?</p>Glenn PaulleyFri, 19 Mar 2010 14:07:18 -0400http://sqlanywhere-forum.sap.com/questions/530/what-does-this-sum-over-code-do/531