Sometimes one has to use a dummy aggregate function, for example when using a window function over a single row (cf. this answer. Here a "dummy aggregate" means an aggregate function like sum() or max() calculated over a single row (and therefore, a single value). For that reason, it would be semantically irrelevant whether one uses max(), min(), sum(), avg() or even list() (unless the datatype does not allow for that).
Is there any reason to prefer one of those aggregates over others when it comes to applying them to a single row?
Or is there some optimization that would just use the single value in all these cases - thereby make the calculation (nearly) identically simple?
I don't think it matters all that much which function you use, because a difference in the choice of function is not going to significantly affect the choice of access plan. I, too, like Breck's choice of
I have tried to leave this question alone because Glenn's answer is helpful and correct for the question as asked. However, when I first read the question subject, I presumed a different question because I didn't read the parentheses at first. The following is an answer to that presumed question.
The cheapest aggregate function to use (when possible) is
In your examples, you need to actually extract a value from a group. This comes up in a number of contexts; another one I have seen occurs when you need to retrieve a column that is functionally determined by the GROUP BY but the server is not able to detect that. Consider:
select a,b, any_value(c), sum(d) from T group by a,b
Here, assume that (a,b) is a candidate key for T that the optimizer does not recognize. The server will give an error if you don't use an aggregate for c: you could put it in the GROUP BY, but you might prefer not to. Then, you could use an aggregate that selects any value of c. The MIN() and MAX() aggregates work for most data types, so they are good candidates. SUM() may be a little cheaper but is only appropriate for numeric types. The FIRST_VALUE() is acceptable with a window specification, but it will not be parallelized by current servers.
All these aggregates have the unfortunate property that they evaluate the argument of the aggregate for each row in the group (yes, even the FIRST_VALUE). This can be a problem if the argument is a UDF or subquery. It would be nice if for these purposes there were an "ANY_VALUE()" aggregate that only evaluated the argument one time (and Ani Nica has requested this a few times). Until it is available, the best recommendation I can make is to use any of the aggregates discussed; if you do have a UDF or subquery, then prefer to evaluate it with the aggregate as an argument instead of as an argument to the aggregate (so it will be evaluated only once per group instead of once per row).
However, from my formative years on a dairy farm, I'm pretty sure even a cow knows all of the above.