From ff6fe1502d2d8553fa3a305483344cb90a6411d4 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Mon, 13 Dec 1999 17:39:38 +0000 Subject: [PATCH] Update documentation to reflect availability of aggregate(DISTINCT). Try to provide a more lucid discussion in 'Using Aggregate Functions' tutorial section. --- doc/src/sgml/query.sgml | 73 +++++++++++++++++++++++++++++++++++++------- doc/src/sgml/ref/select.sgml | 21 +++++++------ doc/src/sgml/syntax.sgml | 53 ++++++++++++++++++++++---------- 3 files changed, 111 insertions(+), 36 deletions(-) diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml index a1830eb029..961bc0d9dc 100644 --- a/doc/src/sgml/query.sgml +++ b/doc/src/sgml/query.sgml @@ -361,39 +361,90 @@ DELETE FROM classname; Like most other query languages, PostgreSQL supports aggregate functions. - The current implementation of - Postgres aggregate functions have some limitations. - Specifically, while there are aggregates to compute - such functions as the count, sum, + An aggregate function computes a single result from multiple input rows. + For example, there are aggregates to compute the + count, sum, avg (average), max (maximum) and - min (minimum) over a set of instances, aggregates can only - appear in the target list of a query and not directly in the - qualification (the where clause). As an example, + min (minimum) over a set of instances. + + + + It is important to understand the interaction between aggregates and + SQL's where and having clauses. + The fundamental difference between where and + having is this: where selects + input rows before groups and aggregates are computed (thus, it controls + which rows go into the aggregate computation), whereas + having selects group rows after groups and + aggregates are computed. Thus, the + where clause may not contain aggregate functions; + it makes no sense to try to use an aggregate to determine which rows + will be inputs to the aggregates. On the other hand, + having clauses always contain aggregate functions. + (Strictly speaking, you are allowed to write a having + clause that doesn't use aggregates, but it's wasteful; the same condition + could be used more efficiently at the where stage.) + + + + As an example, we can find the highest low-temperature reading anywhere + with SELECT max(temp_lo) FROM weather; - is allowed, while + If we want to know which city (or cities) that reading occurred in, + we might try SELECT city FROM weather WHERE temp_lo = max(temp_lo); - is not. However, as is often the case the query can be restated to accomplish - the intended result; here by using a subselect: + but this will not work since the aggregate max() can't be used in + where. However, as is often the case the query can be + restated to accomplish the intended result; here by using a + subselect: SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather); + This is OK because the sub-select is an independent computation that + computes its own aggregate separately from what's happening in the outer + select. - Aggregates may also have group by clauses: + Aggregates are also very useful in combination with + group by clauses. For example, we can get the + maximum low temperature observed in each city with SELECT city, max(temp_lo) FROM weather GROUP BY city; + which gives us one output row per city. We can filter these grouped + rows using having: + +SELECT city, max(temp_lo) + FROM weather + GROUP BY city + HAVING min(temp_lo) < 0; + + which gives us the same results for only the cities that have some + below-zero readings. Finally, if we only care about cities whose + names begin with 'P', we might do + +SELECT city, max(temp_lo) + FROM weather + WHERE city like 'P%' + GROUP BY city + HAVING min(temp_lo) < 0; + + Note that we can apply the city-name restriction in + where, since it needs no aggregate. This is + more efficient than adding the restriction to having, + because we avoid doing the grouping and aggregate calculations + for all rows that fail the where check. diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index f2afa38b7c..070f8b43d0 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1,5 +1,5 @@ @@ -202,10 +202,10 @@ SELECT [ ALL | DISTINCT [ ON column DISTINCT will eliminate all duplicate rows from the - selection. + result. DISTINCT ON column will eliminate all duplicates in the specified column; this is - equivalent to using + similar to using GROUP BY column. ALL will return all candidate rows, including duplicates. @@ -320,11 +320,13 @@ GROUP BY column [, ...] GROUP BY will condense into a single row all rows that share the - same values for the - grouped columns; aggregates return values derived from all rows - that make up the group. The value returned for an ungrouped - and unaggregated column is dependent on the order in which rows - happen to be read from the database. + same values for the grouped columns. Aggregate functions, if any, + are computed across all rows making up each group, producing a + separate value for each group (whereas without GROUP BY, an + aggregate produces a single value computed across all the selected + rows). When GROUP BY is present, it is not valid to refer to + ungrouped columns except within aggregate functions, since there + would be more than one possible value to return for an ungrouped column. @@ -354,7 +356,8 @@ HAVING cond_expr Each column referenced in cond_expr shall unambiguously - reference a grouping column. + reference a grouping column, unless the reference appears within an + aggregate function. diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index a74bd08be1..beacf8fbdf 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -642,15 +642,16 @@ CAST 'string' AS type a_expr right_unary_operator left_unary_operator a_expr parameter - functional expressions - aggregate expressions + functional expression + aggregate expression - We have already discussed constants and attributes. The two kinds of - operator expressions indicate respectively binary and left_unary - expressions. The following sections discuss the remaining options. + We have already discussed constants and attributes. The three kinds of + operator expressions indicate respectively binary (infix), right-unary + (suffix) and left-unary (prefix) operators. The following sections + discuss the remaining options. @@ -690,7 +691,7 @@ CREATE FUNCTION dept (name) enclosed in parentheses: -function (a_expr [, a_expr ) +function (a_expr [, a_expr ... ] ) @@ -705,20 +706,40 @@ sqrt(emp.salary) - Aggregate Expression + Aggregate Expressions - An aggregate expression - represents a simple aggregate (i.e., one that computes a single value) - or an aggregate function (i.e., one that computes a set of values). - The syntax is the following: + An aggregate expression represents the application + of an aggregate function across the rows selected by a query. + An aggregate function reduces multiple inputs to a single output value, + such as the sum or average of the inputs. + The syntax of an aggregate expression is one of the following: - -aggregate_name (attribute) - + + aggregate_name (expression) + aggregate_name (DISTINCT expression) + aggregate_name ( * ) + + + where aggregate_name is a previously defined + aggregate, and expression is any expression + that doesn't itself contain an aggregate expression. + - where aggregate_name - must be a previously defined aggregate. + + The first form of aggregate expression invokes the aggregate across all + input rows for which the given expression yields a non-null value. + The second form invokes the aggregate for all distinct non-null values + of the expression found in the input rows. The last form invokes the + aggregate once for each input row regardless of null or non-null values; + since no particular input value is specified, it is generally only useful + for the count() aggregate. + + + + For example, count(*) yields the total number of input rows; + count(f1) yields the number of input rows in which f1 is non-null; + count(distinct f1) yields the number of distinct non-null values of f1. -- 2.11.0