Rip out table expression section from SQL syntax chapter and develop it

author Peter Eisentraut <peter_e@gmx.net>

Mon, 22 Jan 2001 23:34:33 +0000 (23:34 +0000)

committer Peter Eisentraut <peter_e@gmx.net>

Mon, 22 Jan 2001 23:34:33 +0000 (23:34 +0000)
author Peter Eisentraut <peter_e@gmx.net>
Mon, 22 Jan 2001 23:34:33 +0000 (23:34 +0000)
committer Peter Eisentraut <peter_e@gmx.net>
Mon, 22 Jan 2001 23:34:33 +0000 (23:34 +0000)
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml

index 799f517..5d784d7 100644 (file)
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.4 2001/01/06 11:58:56 petere Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/filelist.sgml,v 1.5 2001/01/22 23:34:32 petere Exp $ -->
  
  <!entity about      SYSTEM "about.sgml">
  <!entity history    SYSTEM "history.sgml">
@@ -31,7 +31,7 @@
  <!entity plsql      SYSTEM "plsql.sgml">
  <!entity pltcl      SYSTEM "pltcl.sgml">
  <!entity psql       SYSTEM "psql.sgml">
-<!entity query-ug   SYSTEM "query-ug.sgml">
+<!entity queries    SYSTEM "queries.sgml">
  <!entity storage    SYSTEM "storage.sgml">
  <!entity syntax     SYSTEM "syntax.sgml">
  <!entity typeconv   SYSTEM "typeconv.sgml">
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml

new file mode 100644 (file)

index 0000000..e9ede07
--- /dev/null
+++ b/doc/src/sgml/queries.sgml
@@ -0,0 +1,819 @@
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.1 2001/01/22 23:34:33 petere Exp $ -->
+
+<chapter id="queries">
+ <title>Queries</title>
+
+ <para>
+  A <firstterm>query</firstterm> is the process of or the command to
+  retrieve data from a database.  In SQL the <command>SELECT</command>
+  command is used to specify queries.  The general syntax of the
+  <command>SELECT</command> command is
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
+</synopsis>
+  The following sections describe the details of the select list, the
+  table expression, and the sort specification.  The simplest kind of
+  query has the form
+<programlisting>
+SELECT * FROM table1;
+</programlisting>
+  Assuming that there is a table called table1, this command would
+  retrieve all rows and all columns from table1.  (The method of
+  retrieval depends on the client application.  For example, the
+  <application>psql</application> program will display an ASCII-art
+  table on the screen, client libraries will offer functions to
+  retrieve individual rows and columns.)  The select list
+  specification <literal>*</literal> means all columns that the table
+  expression happens to provide.  A select list can also select a
+  subset of the available columns or even make calculations on the
+  columns before retrieving them; see <xref
+  linkend="queries-select-lists">.  For example, if table1 has columns
+  named a, b, and c (and perhaps others) you can make the following
+  query:
+<programlisting>
+SELECT a, b + c FROM table1;
+</programlisting>
+  (assuming that b and c are of a numeric data type).
+ </para>
+
+ <para>
+  <literal>FROM table1</literal> is a particularly simple kind of
+  table expression.  In general, table expressions can be complex
+  constructs of base tables, joins, and subqueries.  But you can also
+  omit the table expression entirely and use the SELECT command as a
+  calculator:
+<programlisting>
+SELECT 3 * 4;
+</programlisting>
+  This is more useful if the expressions in the select list return
+  varying results.  For example, you could call a function this way.
+<programlisting>
+SELECT random();
+</programlisting>
+ </para>
+
+ <sect1 id="queries-table-expressions">
+  <title>Table Expressions</title>
+
+  <para>
+   A <firstterm>table expression</firstterm> specifies a table.  The
+   table expression contains a FROM clause that is optionally followed
+   by WHERE, GROUP BY, and HAVING clauses.  Trivial table expressions
+   simply refer to a table on disk, a so-called base table, but more
+   complex expressions can be used to modify or combine base tables in
+   various ways.
+  </para>
+
+  <para>
+   The WHERE, GROUP BY, and HAVING clauses in the table expression
+   specify a pipeline of successive transformations performed on the
+   table derived in the FROM clause.  The final transformed table that
+   is derived provides the input rows used to derive output rows as
+   specified by the select list of derived column value expressions.
+  </para>
+       
+  <sect2 id="queries-from">
+   <title>FROM clause</title>
+ 
+   <para>
+    The FROM clause derives a table from one or more other tables
+    given in a comma-separated table reference list.
+<synopsis>
+FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
+</synopsis>
+
+    A table reference may be a table name or a derived table such as a
+    subquery, a table join, or complex combinations of these.  If more
+    than one table reference is listed in the FROM clause they are
+    CROSS JOINed (see below) to form the derived table that may then
+    be subject to transformations by the WHERE, GROUP BY, and HAVING
+    clauses and is finally the result of the overall table expression.
+   </para>
+
+   <para>
+    If a table reference is a simple table name and it is the
+    supertable in a table inheritance hierarchy, rows of the table
+    include rows from all of its subtable successors unless the
+    keyword ONLY precedes the table name.
+   </para>
+
+   <sect3 id="queries-join">
+    <title>Joined Tables</title>
+
+    <para>
+     A joined table is a table derived from two other (real or
+     derived) tables according to the rules of the particular join
+     type.  INNER, OUTER, NATURAL, and CROSS JOIN are supported.
+    </para>
+
+    <variablelist>
+     <title>Join Types</title>
+
+     <varlistentry>
+      <term>CROSS JOIN</term>
+
+      <listitem>
+<synopsis>
+<replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
+</synopsis>
+
+       <para>
+        For each combination of rows from
+        <replaceable>T1</replaceable> and
+        <replaceable>T2</replaceable> the derived table will contain a
+        row consisting of all columns in <replaceable>T1</replaceable>
+        followed by all columns in <replaceable>T2</replaceable>.  If
+        the tables have have N and M rows respectively, the joined
+        table will have N * M rows.  A cross join is essentially an
+        <literal>INNER JOIN ON TRUE</literal>.
+       </para>
+
+       <tip>
+        <para>
+         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
+         <replaceable>T2</replaceable></literal> is equivalent to
+         <literal>FROM <replaceable>T1</replaceable>,
+         <replaceable>T2</replaceable></literal>.
+        </para>
+       </tip>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>Qualified JOINs</term>
+      <listitem>
+
+<synopsis>
+<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean expression</replaceable>
+<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
+</synopsis>
+        
+       <para>
+        The words <token>INNER</token> and <token>OUTER</token> are
+        optional for all JOINs.  <token>INNER</token> is the default;
+        <token>LEFT</token>, <token>RIGHT</token>, and
+        <token>FULL</token> are for OUTER JOINs only.
+       </para>
+
+       <para>
+        The <firstterm>join condition</firstterm> is specified in the
+        ON or USING clause.  (The meaning of the join condition
+        depends on the particular join type; see below.)  The ON
+        clause takes a Boolean value expression of the same kind as is
+        used in a WHERE clause.  The USING clause takes a
+        comma-separated list of column names, which the joined tables
+        must have in common, and joins the tables on the equality of
+        those columns as a set, resulting in a joined table having one
+        column for each common column listed and all of the other
+        columns from both tables.  Thus, <literal>USING (a, b,
+        c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
+        t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
+        if ON is used there will be two columns a, b, and c in the
+        result, whereas with USING there will be only one of each.
+       </para>
+
+       <variablelist>
+        <varlistentry>
+         <term>INNER JOIN</term>
+
+         <listitem>
+          <para>
+           For each row R1 of T1, the joined table has a row for each
+           row in T2 that satisfies the join condition with R1.
+          </para>
+         </listitem>
+        </varlistentry>
+
+        <varlistentry>
+         <term>LEFT OUTER JOIN</term>
+
+         <listitem>
+          <para>
+           First, an INNER JOIN is performed.  Then, for a row in T1
+           that does not satisfy the join condition with any row in
+           T2, a joined row is returned with NULL values in columns of
+           T2.  Thus, the joined table unconditionally has a row for each
+           row in T1.
+          </para>
+         </listitem>
+        </varlistentry>
+         
+        <varlistentry>
+         <term>RIGHT OUTER JOIN</term>
+
+         <listitem>
+          <para>
+           This is like a left join, only that the result table will
+           unconditionally have a row for each row in T2.
+          </para>
+         </listitem>
+        </varlistentry>
+         
+        <varlistentry>
+         <term>FULL OUTER JOIN</term>
+
+         <listitem>
+          <para>
+           First, an INNER JOIN is performed.  Then, for each row in
+           T1 that does not satisfy the join condition with any row in
+           T2, a joined row is returned with null values in columns of
+           T2.  Also, for each row of T2 that does not satisfy the
+           join condition with any row in T1, a joined row with null
+           values in the columns of T1 is returned.
+          </para>
+         </listitem>
+        </varlistentry>
+       </variablelist>
+      </listitem>
+     </varlistentry>
+      
+     <varlistentry>
+      <term>NATURAL JOIN</term>
+
+      <listitem>
+<synopsis>
+<replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> JOIN <replaceable>T2</replaceable>
+</synopsis>
+       <para>
+        A natural join creates a joined table where every pair of matching
+        column names between the two tables are merged into one column. The
+        join specification is effectively a USING clause containing all the
+        common column names and is otherwise like a Qualified JOIN.
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
+    <para>
+     Joins of all types can be chained together or nested where either
+     or both of <replaceable>T1</replaceable> and
+     <replaceable>T2</replaceable> may be JOINed tables.  Parenthesis
+     can be used around JOIN clauses to control the join order which
+     are otherwise left to right.
+    </para>
+   </sect3>
+
+   <sect3 id="queries-subqueries">
+    <title>Subqueries</title>
+
+    <para>
+     Subqueries specifying a derived table must be enclosed in
+     parenthesis and <emphasis>must</emphasis> be named using an AS
+     clause.  (See <xref linkend="queries-table-aliases">.)
+    </para>
+
+<programlisting>
+FROM (SELECT * FROM table1) AS alias_name
+</programlisting>
+
+    <para>
+     This example is equivalent to <literal>FROM table1 AS
+     alias_name</literal>.  Many subqueries can be written as table
+     joins instead.
+    </para>
+   </sect3>
+
+   <sect3 id="queries-table-aliases">
+    <title>Table and Column Aliases</title>
+
+    <para>
+     A temporary name can be given to tables and complex table
+     references to be used for references to the derived table in
+     further processing.  This is called a <firstterm>table
+     alias</firstterm>.
+<synopsis>
+FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
+</synopsis>
+     Here, <replaceable>alias</replaceable> can be any regular
+     identifier.  The alias becomes the new name of the table
+     reference for the current query -- it is no longer possible to
+     refer to the table by the original name (if the table reference
+     was an ordinary base table).  Thus
+<programlisting>
+SELECT * FROM my_table AS m WHERE my_table.a > 5;
+</programlisting>
+     is not valid SQL syntax.  What will happen instead, as a
+     <productname>Postgres</productname> extension, is that an implicit
+     table reference is added to the FROM clause, so the query is
+     processed as if it was written as
+<programlisting>
+SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5;
+</programlisting>
+     Table aliases are mainly for notational convenience, but it is
+     necessary to use them when joining a table to itself, e.g.,
+<programlisting>
+SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
+</programlisting>
+     Additionally, an alias is required if the table reference is a
+     subquery.
+    </para>
+
+    <para>
+     Parenthesis are used to resolve ambiguities.  The following
+     statement will assign the alias <literal>b</literal> to the
+     result of the join, unlike the previous example:
+<programlisting>
+SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
+</programlisting>
+    </para>
+
+    <para>
+<synopsis>
+FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
+</synopsis>
+     This form is equivalent the previously treated one; the
+     <token>AS</token> key word is noise.
+    </para>
+
+    <para>
+<synopsis>
+FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
+</synopsis>
+     In addition to renaming the table as described above, the columns
+     of the table are also given temporary names.  If less column
+     aliases are specified than the actual table has columns, the last
+     columns are not renamed.  This syntax is especially useful for
+     self-joins or subqueries.
+    </para>
+   </sect3>
+
+   <sect3 id="queries-table-expression-examples">
+    <title>Examples</title>
+
+    <para>
+<programlisting>
+FROM T1 INNER JOIN T2 USING (C)
+FROM T1 LEFT OUTER JOIN T2 USING (C)
+FROM (T1 RIGHT OUTER JOIN T2 ON (T1C1=T2C1)) AS DT1
+FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2)
+
+FROM T1 NATURAL INNER JOIN T2
+FROM T1 NATURAL LEFT OUTER JOIN T2
+FROM T1 NATURAL RIGHT OUTER JOIN T2
+FROM T1 NATURAL FULL OUTER JOIN T2
+
+FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3
+FROM (SELECT * FROM T1) DT1, T2, T3
+</programlisting>
+
+     Above are some examples of joined tables and complex derived
+     tables.  Notice how the AS clause renames or names a derived
+     table and how the optional comma-separated list of column names
+     that follows gives names or renames the columns.  The last two
+     FROM clauses produce the same derived table from T1, T2, and T3.
+     The AS keyword was omitted in naming the subquery as DT1.  The
+     keywords OUTER and INNER are noise that can be omitted also.
+    </para>
+   </sect3>
+
+  </sect2>
+
+  <sect2 id="queries-where">
+   <title>WHERE clause</title>
+
+   <para>
+    The syntax of the WHERE clause is
+<synopsis>
+WHERE <replaceable>search condition</replaceable>
+</synopsis>
+    where <replaceable>search condition</replaceable> is any value
+    expression as defined in <xref linkend="sql-expressions"> that
+    returns a value of type <type>boolean</type>.
+   </para>
+
+   <para>
+    After the processing of the FROM clause is done, each row of the
+    derived table is checked against the search condition.  If the
+    result of the condition is true, the row is kept in the output
+    table, otherwise (that is, if the result is false or NULL) it is
+    discarded.  The search condition typically references at least some
+    column in the table generated in the FROM clause; this is not
+    required, but otherwise the WHERE clause will be fairly useless.
+   </para>
+
+   <note>
+    <para>
+     Before the implementation of the JOIN syntax, it was necessary to
+     put the join condition of an inner join in the WHERE clause.  For
+     example, these table expressions are equivalent:
+<programlisting>
+FROM a, b WHERE a.id = b.id AND b.val &gt; 5
+</programlisting>
+     and
+<programlisting>
+FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
+</programlisting>
+     or perhaps even
+<programlisting>
+FROM a NATURAL JOIN b WHERE b.val &gt; 5
+</programlisting>
+     Which one of these you use is mainly a matter of style.  The JOIN
+     syntax in the FROM clause is probably not as portable to other
+     products.  For outer joins there is no choice in any case:  they
+     must be done in the FROM clause.
+    </para>
+   </note>
+
+<programlisting>
+FROM FDT WHERE
+    C1 > 5
+
+FROM FDT WHERE
+    C1 IN (1, 2, 3)
+FROM FDT WHERE
+    C1 IN (SELECT C1 FROM T2)
+FROM FDT WHERE
+    C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10)
+
+FROM FDT WHERE
+    C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100
+
+FROM FDT WHERE
+    EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1)
+</programlisting>
+
+   <para>
+    In the examples above, FDT is the table derived in the FROM
+    clause. Rows that do not meet the search condition of the where
+    clause are eliminated from FDT. Notice the use of scalar
+    subqueries as value expressions (C2 assumed UNIQUE).  Just like
+    any other query, the subqueries can employ complex table
+    expressions.  Notice how FDT is referenced in the subqueries.
+    Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a
+    column in the derived input table of the subquery.  Qualifying the
+    column name adds clarity even when it is not needed.  The column
+    naming scope of an outer query extends into its inner queries.
+   </para>
+  </sect2>
+
+
+  <sect2 id="queries-group">
+   <title>GROUP BY and HAVING clauses</title>     
+
+   <para>
+    After passing the WHERE filter, the derived input table may be
+    subject to grouping, using the GROUP BY clause, and elimination of
+    group rows using the HAVING clause.
+   </para>
+
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
+</synopsis>
+
+   <para>
+    The GROUP BY clause is used to group together rows in a table that
+    share the same values in all the columns listed. The order in
+    which the columns are listed does not matter (as opposed to an
+    ORDER BY clause).  The purpose is to reduce each group of rows
+    sharing common values into one group row that is representative of
+    all rows in the group.  This is done to eliminate redundancy in
+    the output and/or obtain aggregates that apply to these groups.
+   </para>
+         
+   <para>
+    Once a table is grouped, columns that are not included in the
+    grouping cannot be referenced, except in aggregate expressions,
+    since a specific value in those columns is ambiguous - which row
+    in the group should it come from?  The grouped-by columns can be
+    referenced in select list column expressions since they have a
+    known constant value per group.  Aggregate functions on the
+    ungrouped columns provide values that span the rows of a group,
+    not of the whole table.  For instance, a
+    <function>sum(sales)</function> on a grouped table by product code
+    gives the total sales for each product, not the total sales on all
+    products.  The aggregates of the ungrouped columns are
+    representative of the group, whereas their individual values may
+    not be.
+   </para>
+
+   <para>
+    Example:
+<programlisting>
+SELECT pid, p.name, (sum(s.units) * p.price) AS sales
+  FROM products p LEFT JOIN sales s USING ( pid )
+  GROUP BY pid, p.name, p.price;
+</programlisting>
+    In this example, the columns pid, p.name, and p.price must be in
+    the GROUP BY clause since they are referenced in the query select
+    list.  The column s.units does not have to be in the GROUP BY list
+    since it is only used in an aggregate expression
+    (<function>sum()</function>), which represents the group of sales
+    of a product.  For each product, a summary row is returned about
+    all sales of the product.
+   </para>
+
+   <para>
+    In strict SQL, GROUP BY can only group by columns of the source
+    table but Postgres extends this to also allow GROUP BY to group by
+    select columns in the query select list.  Grouping by value
+    expressions instead of simple column names is also allowed.
+   </para>
+
+   <para>
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
+</synopsis>
+    If a table has been grouped using a GROUP BY clause, but then only
+    certain groups are of interest, the HAVING clause can be used,
+    much like a WHERE clause, to eliminate groups from a grouped
+    table.  For some queries, Postgres allows a HAVING clause to be
+    used without a GROUP BY and then it acts just like another WHERE
+    clause, but the point in using HAVING that way is not clear. Since
+    HAVING operates on groups, only grouped columns can be listed in
+    the HAVING clause.  If selection based on some ungrouped column is
+    desired, it should be expressed in the WHERE clause.
+   </para>
+
+   <para>
+    Example:
+<programlisting>
+SELECT pid    AS "Products",
+       p.name AS "Over 5000",
+       (sum(s.units) * (p.price - p.cost)) AS "Past Month Profit"
+  FROM products p LEFT JOIN sales s USING ( pid )
+  WHERE p.date > CURRENT_DATE - INTERVAL '4 weeks'
+  GROUP BY pid, p.name, p.price, p.cost
+    HAVING p.price > 5000;
+</programlisting>
+    In the example above, the WHERE clause is selecting rows by a
+    column that is not grouped, while the HAVING clause
+    is selecting groups with a price greater than 5000.
+   </para>
+  </sect2>
+ </sect1>
+
+
+ <sect1 id="queries-select-lists">
+  <title>Select Lists</title>
+
+  <para>
+   The table expression in the <command>SELECT</command> command
+   constructs an intermediate virtual table by possibly combining
+   tables, views, eliminating rows, grouping, etc.  This table is
+   finally passed on to processing by the select list.  The select
+   list determines which <emphasis>columns</emphasis> of the
+   intermediate table are retained.  The simplest kind of select list
+   is <literal>*</literal> which retains all columns that the table
+   expression produces.  Otherwise, a select list is a comma-separated
+   list of value expressions (as defined in <xref
+   linkend="sql-expressions">).  For instance, it could be a list of
+   column names:
+<programlisting>
+SELECT a, b, c FROM ...
+</programlisting>
+   The columns names a, b, and c are either the actual names of the
+   columns of table referenced in the FROM clause, or the aliases
+   given to them as explained in <xref linkend="queries-table-aliases">.
+   The name space available in the select list is the same as in the
+   WHERE clause (unless grouping is used, in which case it is the same
+   as in the HAVING clause).  If more than one table has a column of
+   the same name, the table name must also be given, as in
+<programlisting>
+SELECT tbl1.a, tbl2.b, tbl1.c FROM ...
+</programlisting>
+   (see also <xref linkend="queries-where">).
+  </para>
+
+  <para>
+   If an arbitrary value expression is used in the select list, it
+   conceptually adds a new virtual column to the returned table.  The
+   value expression is effectively evaluated once for each retrieved
+   row with real values substituted for any column references.  But
+   the expressions in the select list do not have to reference any
+   columns in the table expression of the FROM clause; they can be
+   constant arithmetic expressions as well, for instance.
+  </para>
+
+  <sect2 id="queries-column-labels">
+   <title>Column Labels</title>
+
+   <para>
+    The entries in the select list can be assigned names for further
+    processing.  The <quote>further processing</quote> in this case is
+    an optional sort specification and the client application (e.g.,
+    column headers for display).  For example:
+<programlisting>
+SELECT a AS value, b + c AS sum FROM ...
+</programlisting>
+    The AS key word can in fact be omitted.
+   </para>
+
+   <para>
+    If no name is chosen, the system assigns a default.  For simple
+    column references, this is the name of the column.  For function
+    calls, this is the name of the function.  For complex expressions,
+    the system will generate a generic name.
+   </para>
+
+   <note>
+    <para>
+     The naming of output columns here is different from that done in
+     the FROM clause (see <xref linkend="queries-table-aliases">).  This
+     pipeline will in fact allow you to rename the same column twice,
+     but the name chosen in the select list is the one that will be
+     passed on.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="queries-distinct">
+   <title>DISTINCT</title>
+
+   <para>
+    After the select list has been processed, the result table may
+    optionally be subject to the elimination of duplicates.  The
+    <token>DISTINCT</token> key word is written directly after the
+    <token>SELECT</token> to enable this:
+<synopsis>
+SELECT DISTINCT <replaceable>select_list</replaceable> ...
+</synopsis>
+    (Instead of <token>DISTINCT</token> the word <token>ALL</token>
+    can be used to select the default behavior of retaining all rows.)
+   </para>
+
+   <para>
+    Obviously, two rows are considered distinct if they differ in at
+    least one column value.  NULLs are considered equal in this
+    consideration.
+   </para>
+
+   <para>
+    Alternatively, an arbitrary expression can determine what rows are
+    to be considered distinct:
+<synopsis>
+SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
+</synopsis>
+    Here <replaceable>expression</replaceable> is an arbitrary value
+    expression that is evaluated for all rows.  A set of rows for
+    which all the expressions is equal are considered duplicates and
+    only the first row is kept in the output.  Note that the
+    <quote>first row</quote> of a set is unpredictable unless the
+    query is sorted.
+   </para>
+
+   <para>
+    The DISTINCT ON clause is not part of the SQL standard and is
+    sometimes considered bad style because of the indeterminate nature
+    of its results.  With judicious use of GROUP BY and subselects in
+    FROM the construct can be avoided, but it is very often the much
+    more convenient alternative.
+   </para>
+  </sect2>
+ </sect1>
+
+ <sect1 id="queries-union">
+  <title>Combining Queries</title>
+
+  <para>
+   The results of two queries can be combined using the set operations
+   union, intersection, and difference.  The syntax is
+<synopsis>
+<replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
+<replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
+<replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
+</synopsis>
+   <replaceable>query1</replaceable> and
+   <replaceable>query2</replaceable> are queries that can use any of
+   the features discussed up to this point.  Set operations can also
+   be nested and chained, for example
+<synopsis>
+<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
+</synopsis>
+   which really says
+<synopsis>
+(<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
+</synopsis>
+  </para>
+
+  <para>
+   <command>UNION</command> effectively appends the result of
+   <replaceable>query2</replaceable> to the result of
+   <replaceable>query1</replaceable> (although there is no guarantee
+   that this is the order in which the rows are actually returned) and
+   eliminates all duplicate rows, in the sense of DISTINCT, unless ALL
+   is specified.
+  </para>
+
+  <para>
+   <command>INTERSECT</command> returns all rows that are both in the
+   result of <replaceable>query1</replaceable> and in the result of
+   <replaceable>query2</replaceable>.  Duplicate rows are eliminated
+   unless ALL is specified.
+  </para>
+
+  <para>
+   <command>EXCEPT</command> returns all rows that are in the result
+   of <replaceable>query1</replaceable> but not in the result of
+   <replaceable>query2</replaceable>.  Again, duplicates are
+   eliminated unless ALL is specified.
+  </para>
+
+  <para>
+   In order to calculate the union, intersection, or difference of two
+   queries, the two queries must be <quote>union compatible</quote>,
+   which means that they both return the same number of columns, and
+   that the corresponding columns have compatible data types, as
+   described in <xref linkend="typeconv-union-case">.
+  </para>
+ </sect1>
+
+
+ <sect1 id="queries-order">
+  <title>Sorting Rows</title>
+          
+  <para>
+   After a query has produced an output table (after the select list
+   has been processed) it can optionally be sorted.  If sorting is not
+   chosen, the rows will be returned in random order.  The actual
+   order in that case will depend on the scan and join plan types and
+   the order on disk, but it must not be relied on.  A particular
+   ordering can only be guaranteed if the sort step is explicitly
+   chosen.
+  </para>
+
+  <para>
+   The ORDER BY clause specifies the sort order:
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> ORDER BY <replaceable>column1</replaceable> <optional>ASC | DESC</optional> <optional>, <replaceable>column2</replaceable> <optional>ASC | DESC</optional> ...</optional>
+</synopsis>
+   <replaceable>column1</replaceable>, etc., refer to select list
+   columns:  It can either be the name of a column (either the
+   explicit column label or default name, as explained in <xref
+   linkend="queries-column-labels">) or the number of a column.  Some
+   examples:
+<programlisting>
+SELECT a, b FROM table1 ORDER BY a;
+SELECT a + b AS sum, c FROM table1 ORDER BY sum;
+SELECT a, sum(b) FROM table1 GROUP BY a ORDER BY 1;
+</programlisting>
+  </para>
+
+  <para>
+   As an extension to the SQL standard, Postgres also allows ordering
+   by arbitrary expressions:
+<programlisting>
+SELECT a, b FROM table1 ORDER BY a + b;
+</programlisting>
+   References to column names in the FROM clause that are renamed in
+   the select list are also allowed:
+<programlisting>
+SELECT a AS b FROM table1 ORDER BY a;
+</programlisting>
+   But this does not work in queries involving UNION, INTERSECT, or
+   EXCEPT, and is not portable.
+  </para>
+
+  <para>
+   Each column specification may be followed by an optional ASC or
+   DESC to set the sort direction.  ASC is default.  Ascending order
+   puts smaller values first, where <quote>smaller</quote> is defined
+   in terms of the <literal>&lt;</literal> operator.  Similarly,
+   descending order is determined with the <literal>&gt;</literal>
+   operator.
+  </para>
+
+  <para>
+   If more than one sort column is specified the later entries are
+   used to sort the rows that are equal under the order imposed by the
+   earlier sort specifications.
+  </para>
+ </sect1>
+
+ <sect1 id="queries-limit">
+  <title>LIMIT and OFFSET</title>
+
+<synopsis>
+SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional>ORDER BY <replaceable>sort_spec</replaceable></optional> <optional>LIMIT { <replaceable>number</replaceable> | ALL }</optional> <optional>OFFSET <replaceable>number</replaceable></optional>
+</synopsis>
+
+  <para>
+   LIMIT allows you to retrieve just a portion of the rows that are
+   generated by the rest of the query.  If a limit count is given, no
+   more than that many rows will be returned.  If an offset is given,
+   that many rows will be skipped before starting to return rows.
+  </para>
+
+  <para>
+   When using LIMIT, it is a good idea to use an ORDER BY clause that
+   constrains the result rows into a unique order.  Otherwise you will
+   get an unpredictable subset of the query's rows---you may be asking
+   for the tenth through twentieth rows, but tenth through twentieth
+   in what ordering?  The ordering is unknown, unless you specified
+   ORDER BY.
+  </para>
+
+  <para>
+   The query optimizer takes LIMIT into account when generating a
+   query plan, so you are very likely to get different plans (yielding
+   different row orders) depending on what you give for LIMIT and
+   OFFSET.  Thus, using different LIMIT/OFFSET values to select
+   different subsets of a query result <emphasis>will give
+   inconsistent results</emphasis> unless you enforce a predictable
+   result ordering with ORDER BY.  This is not a bug; it is an
+   inherent consequence of the fact that SQL does not promise to
+   deliver the results of a query in any particular order unless ORDER
+   BY is used to constrain the order.
+  </para>
+ </sect1>
+
+</chapter>
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml

index f994cbe..843f2d0 100644 (file)
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,5 +1,5 @@
  <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.36 2001/01/21 22:02:01 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.37 2001/01/22 23:34:33 petere Exp $
  -->
  
  <chapter id="sql-syntax">
@@ -743,7 +743,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
    </para>
  
    <para>
-   In addition to this list, there are a number of contructs that can
+   In addition to this list, there are a number of constructs that can
     be classified as an expression but do not follow any general syntax
     rules.  These generally have the semantics of a function or
     operator and are explained in the appropriate location in <xref
@@ -763,15 +763,15 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
     <para>
      A column can be referenced in the form:
  <synopsis>
-<replaceable>corelation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
+<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
  </synopsis>
  
-    <replaceable>corelation</replaceable> is either the name of a
+    <replaceable>correlation</replaceable> is either the name of a
      table, an alias for a table defined by means of a FROM clause, or
      the keyword <literal>NEW</literal> or <literal>OLD</literal>.
      (NEW and OLD can only appear in the action portion of a rule,
-    while other corelation names can be used in any SQL statement.)
-    The corelation name can be omitted if the column name is unique
+    while other correlation names can be used in any SQL statement.)
+    The correlation name can be omitted if the column name is unique
      across all the tables being used in the current query.  If
      <replaceable>column</replaceable> is of an array type, then the
      optional <replaceable>subscript</replaceable> selects a specific
@@ -895,8 +895,8 @@ sqrt(2)
      The precedence and associativity of the operators is hard-wired
      into the parser.  Most operators have the same precedence and are
      left-associative.  This may lead to non-intuitive behavior; for
-    example the boolean operators "&lt;" and "&gt;" have a different
-    precedence than the boolean operators "&lt;=" and "&gt;=".  Also,
+    example the Boolean operators "&lt;" and "&gt;" have a different
+    precedence than the Boolean operators "&lt;=" and "&gt;=".  Also,
      you will sometimes need to add parentheses when using combinations
      of binary and unary operators.  For instance
  <programlisting>
@@ -917,7 +917,7 @@ SELECT (5 &amp;) ~ 6;
      <tgroup cols="2">
       <thead>
        <row>
-       <entry>OperatorElement</entry>
+       <entry>Operator/Element</entry>
         <entry>Associativity</entry>
         <entry>Description</entry>
        </row>
@@ -1057,526 +1057,6 @@ SELECT (5 &amp;) ~ 6;
     </para>
    </sect1>
  
-
- <sect1 id="sql-table-expressions">
-  <title>Table Expressions</title>
-
-  <para>
-   A <firstterm>table expression</firstterm> specifies a table.  The
-   table expression contains a FROM clause that is optionally followed
-   by WHERE, GROUP BY, and HAVING clauses.  Trivial table expressions
-   simply refer to a table on disk, a so-called base table, but more
-   complex expressions can be used to modify or combine base tables in
-   various ways.
-  </para>
-
-  <para>
-   The general syntax of the <command>SELECT</command> command is
-<synopsis>
-SELECT <replaceable>select_list</replaceable> <replaceable>table_expression</replaceable>
-</synopsis>
-
-   The <replaceable>select_list</replaceable> is a comma separated
-   list of <replaceable>value expressions</replaceable> as defined in
-   <xref linkend="sql-expressions"> that specify the derived columns
-   of the query output table.  Column names in the derived table that
-   is the result of the <replaceable>table_expression</replaceable>
-   can be used in the <replaceable>value expression</replaceable>s of
-   the <replaceable>select_list</replaceable>.
-  </para>
-
-  <para>
-   The WHERE, GROUP BY, and HAVING clauses in the table expression
-   specify a pipeline of successive transformations performed on the
-   table derived in the FROM clause.  The final transformed table that
-   is derived provides the input rows used to derive output rows as
-   specified by the select list of derived column value expressions.
-  </para>
-       
-  <sect2>
-   <title>FROM clause</title>
- 
-   <para>
-    The FROM clause derives a table from one or more other tables
-    given in a comma-separated table reference list.
-<synopsis>
-FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
-</synopsis>
-
-    A table reference may be a table name or a derived table such as a
-    subquery, a table join, or complex combinations of these.  If more
-    than one table reference is listed in the FROM clause they are
-    CROSS JOINed (see below) to form the derived table that may then
-    be subject to transformations by the WHERE, GROUP BY, and HAVING
-    clauses and is finally the result of the overall table expression.
-   </para>
-
-   <para>
-    If a table reference is a simple table name and it is the
-    supertable in a table inheritance hierarchy, rows of the table
-    include rows from all of its subtable successors unless the
-    keyword ONLY precedes the table name.
-   </para>
-
-   <sect3>
-    <title>Joined Tables</title>
-
-    <para>
-     A joined table is a table derived from two other (real or
-     derived) tables according to the rules of the particular join
-     type.  INNER, OUTER, NATURAL, and CROSS JOIN are supported.
-    </para>
-
-    <variablelist>
-     <title>Join Types</title>
-
-     <varlistentry>
-      <term>CROSS JOIN</term>
-
-      <listitem>
-<synopsis>
-<replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
-</synopsis>
-
-       <para>
-        For each combination of rows from
-        <replaceable>T1</replaceable> and
-        <replaceable>T2</replaceable> the derived table will contain a
-        row consisting of all columns in <replaceable>T1</replaceable>
-        followed by all columns in <replaceable>T2</replaceable>.  If
-        the tables have have N and M rows respectively, the joined
-        table will have N * M rows.  A cross join is essentially an
-        <literal>INNER JOIN ON TRUE</literal>.
-       </para>
-
-       <tip>
-        <para>
-         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
-         <replaceable>T2</replaceable></literal> is equivalent to
-         <literal>FROM <replaceable>T1</replaceable>,
-         <replaceable>T2</replaceable></literal>.
-        </para>
-       </tip>
-      </listitem>
-     </varlistentry>
-
-     <varlistentry>
-      <term>Qualified JOINs</term>
-      <listitem>
-
-<synopsis>
-<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean expression</replaceable>
-<replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
-</synopsis>
-        
-       <para>
-        The words <token>INNER</token> and <token>OUTER</token> are
-        optional for all JOINs.  <token>INNER</token> is the default;
-        <token>LEFT</token>, <token>RIGHT</token>, and
-        <token>FULL</token> are for OUTER JOINs only.
-       </para>
-
-       <para>
-        The <firstterm>join condition</firstterm> is specified in the
-        ON or USING clause.  (The meaning of the join condition
-        depends on the particular join type; see below.)  The ON
-        clause takes a boolean value expression of the same kind as is
-        used in a WHERE clause.  The USING clause takes a
-        comma-separated list of column names, which the joined tables
-        must have in common, and joins the tables on the equality of
-        those columns as a set, resulting in a joined table having one
-        column for each common column listed and all of the other
-        columns from both tables.  Thus, <literal>USING (a, b,
-        c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
-        t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
-        if ON is used there will be two columns a, b, and c in the
-        result, whereas with USING there will be only one of each.
-       </para>
-
-       <variablelist>
-        <varlistentry>
-         <term>INNER JOIN</term>
-
-         <listitem>
-          <para>
-           For each row R1 of T1, the joined table has a row for each
-           row in T2 that satisfies the join condition with R1.
-          </para>
-         </listitem>
-        </varlistentry>
-
-        <varlistentry>
-         <term>LEFT OUTER JOIN</term>
-
-         <listitem>
-          <para>
-           First, an INNER JOIN is performed.  Then, for a row in T1
-           that does not satisfy the join condition with any row in
-           T2, a joined row is returned with NULL values in columns of
-           T2.  Thus, the joined table unconditionally has a row for each
-           row in T1.
-          </para>
-         </listitem>
-        </varlistentry>
-         
-        <varlistentry>
-         <term>RIGHT OUTER JOIN</term>
-
-         <listitem>
-          <para>
-           This is like a left join, only that the result table will
-           unconditionally have a row for each row in T2.
-          </para>
-         </listitem>
-        </varlistentry>
-         
-        <varlistentry>
-         <term>FULL OUTER JOIN</term>
-
-         <listitem>
-          <para>
-           First, an INNER JOIN is performed.  Then, for each row in
-           T1 that does not satisfy the join condition with any row in
-           T2, a joined row is returned with null values in columns of
-           T2.  Also, for each row of T2 that does not satisfy the
-           join condition with any row in T1, a joined row with null
-           values in the columns of T1 is returned.
-          </para>
-         </listitem>
-        </varlistentry>
-       </variablelist>
-      </listitem>
-     </varlistentry>
-      
-     <varlistentry>
-      <term>NATURAL JOIN</term>
-
-      <listitem>
-<synopsis>
-<replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> JOIN <replaceable>T2</replaceable>
-</synopsis>
-       <para>
-        A natural join creates a joined table where every pair of matching
-        column names between the two tables are merged into one column. The
-        join specification is effectively a USING clause containing all the
-        common column names and is otherwise like a Qualified JOIN.
-       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>
-
-    <para>
-     Joins of all types can be chained together or nested where either
-     or both of <replaceable>T1</replaceable> and
-     <replaceable>T2</replaceable> may be JOINed tables.  Parenthesis
-     can be used around JOIN clauses to control the join order which
-     are otherwise left to right.
-    </para>
-   </sect3>
-
-   <sect3 id="sql-subqueries">
-    <title>Subqueries</title>
-
-    <para>
-     Subqueries specifying a derived table must be enclosed in
-     parenthesis and <emphasis>must</emphasis> be named using an AS
-     clause.  (See <xref linkend="sql-table-aliases">.)
-    </para>
-
-<programlisting>
-FROM (SELECT * FROM table1) AS alias_name
-</programlisting>
-
-    <para>
-     This example is equivalent to <literal>FROM table1 AS
-     alias_name</literal>.  Many subquieries can be written as table
-     joins instead.
-    </para>
-   </sect3>
-
-   <sect3 id="sql-table-aliases">
-    <title>Table and Column Aliases</title>
-
-    <para>
-     A temporary name can be given to tables and complex table
-     references to be used for references to the derived table in
-     further processing.  This is called a <firstterm>table
-     alias</firstterm>.
-<synopsis>
-FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
-</synopsis>
-     Here, <replaceable>alias</replaceable> can be any regular
-     identifier.  The alias becomes the new name of the table
-     reference for the current query -- it is no longer possible to
-     refer to the table by the original name (if the table reference
-     was an ordinary base table).  Thus
-<programlisting>
-SELECT * FROM my_table AS m WHERE my_table.a > 5;
-</programlisting>
-     is not valid SQL syntax.  What will happen instead, as a
-     <productname>Postgres</productname> extension, is that an implict
-     table reference is added to the FROM clause, so the query is
-     processed as if it was written as
-<programlisting>
-SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5;
-</programlisting>
-     Table aliases are mainly for notational convenience, but it is
-     necessary to use them when joining a table to itself, e.g.,
-<programlisting>
-SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
-</programlisting>
-     Additionally, an alias is required if the table reference is a
-     subquery.
-    </para>
-
-    <para>
-     Parenthesis are used to resolve ambiguities.  The following
-     statement will assign the alias <literal>b</literal> to the
-     result of the join, unlike the previous example:
-<programlisting>
-SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
-</programlisting>
-    </para>
-
-    <para>
-<synopsis>
-FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
-</synopsis>
-     This form is equivalent the previously treated one; the
-     <token>AS</token> key word is noise.
-    </para>
-
-    <para>
-<synopsis>
-FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
-</synopsis>
-     In addition to renaming the table as described above, the columns
-     of the table are also given temporary names.  If less column
-     aliases are specified than the actual table has columns, the last
-     columns are not renamed.  This syntax is especially useful for
-     self-joins or subqueries.
-    </para>
-   </sect3>
-
-   <sect3>
-    <title>Examples</title>
-
-    <para>
-<programlisting>
-FROM T1 INNER JOIN T2 USING (C)
-FROM T1 LEFT OUTER JOIN T2 USING (C)
-FROM (T1 RIGHT OUTER JOIN T2 ON (T1C1=T2C1)) AS DT1
-FROM (T1 FULL OUTER JOIN T2 USING (C)) AS DT1 (DT1C1, DT1C2)
-
-FROM T1 NATURAL INNER JOIN T2
-FROM T1 NATURAL LEFT OUTER JOIN T2
-FROM T1 NATURAL RIGHT OUTER JOIN T2
-FROM T1 NATURAL FULL OUTER JOIN T2
-
-FROM (SELECT * FROM T1) DT1 CROSS JOIN T2, T3
-FROM (SELECT * FROM T1) DT1, T2, T3
-</programlisting>
-
-     Above are some examples of joined tables and complex derived
-     tables.  Notice how the AS clause renames or names a derived
-     table and how the optional comma-separated list of column names
-     that follows gives names or renames the columns.  The last two
-     FROM clauses produce the same derived table from T1, T2, and T3.
-     The AS keyword was omitted in naming the subquery as DT1.  The
-     keywords OUTER and INNER are noise that can be omitted also.
-    </para>
-   </sect3>
-
-  </sect2>
-
-  <sect2>
-   <title>WHERE clause</title>
-
-   <para>
-    The syntax of the WHERE clause is
-<synopsis>
-WHERE <replaceable>search condition</replaceable>
-</synopsis>
-    where <replaceable>search condition</replaceable> is any value
-    expression as defined in <xref linkend="sql-expressions"> that
-    returns a value of type <type>boolean</type>.
-   </para>
-
-   <para>
-    After the processing of the FROM clause is done, each row of the
-    derived table is checked against the search condition.  If the
-    result of the condition is true, the row is kept in the output
-    table, otherwise (that is, if the result is false or NULL) it is
-    discared.  The search condition typically references at least some
-    column in the table generated in the FROM clause; this is not
-    required, but otherwise the WHERE clause will be fairly useless.
-   </para>
-
-   <note>
-    <para>
-     Before the implementation of the JOIN syntax, it was necessary to
-     put the join condition of an inner join in the WHERE clause.  For
-     example, these table expressions are equivalent:
-<programlisting>
-FROM a, b WHERE a.id = b.id AND b.val &gt; 5
-</programlisting>
-     and
-<programlisting>
-FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
-</programlisting>
-     or perhaps even
-<programlisting>
-FROM a NATURAL JOIN b WHERE b.val &gt; 5
-</programlisting>
-     Which one of these you use is mainly a matter of style.  The JOIN
-     syntax in the FROM clause is probably not as portable to other
-     products.  For outer joins there is no choice in any case:  they
-     must be done in the FROM clause.
-    </para>
-   </note>
-
-<programlisting>
-FROM FDT WHERE
-    C1 > 5
-
-FROM FDT WHERE
-    C1 IN (1, 2, 3)
-FROM FDT WHERE
-    C1 IN (SELECT C1 FROM T2)
-FROM FDT WHERE
-    C1 IN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10)
-
-FROM FDT WHERE
-    C1 BETWEEN (SELECT C3 FROM T2 WHERE C2 = FDT.C1 + 10) AND 100
-
-FROM FDT WHERE
-    EXISTS (SELECT C1 FROM T2 WHERE C2 > FDT.C1)
-</programlisting>
-
-   <para>
-    In the examples above, FDT is the table derived in the FROM
-    clause. Rows that do not meet the search condition of the where
-    clause are eliminated from FDT. Notice the use of scalar
-    subqueries as value expressions (C2 assumed UNIQUE).  Just like
-    any other query, the subqueries can employ complex table
-    expressions.  Notice how FDT is referenced in the subqueries.
-    Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a
-    column in the derived input table of the subquery.  Qualifying the
-    column name adds clarity even when it is not needed.  The column
-    naming scope of an outer query extends into its inner queries.
-   </para>
-  </sect2>
-
-<!-- This is confusing as heck.  Make it simpler. -->
-
-<![IGNORE[
-
-  <sect2>
-   <title>GROUP BY and HAVING clauses</title>     
-
-   <para>
-    After passing the WHERE filter, the derived input table may be
-    subject to grouping, using the GROUP BY clause, and elimination of
-    group rows using the HAVING clause.  (The HAVING clause can also
-    be used without GROUP BY, but then it is equivalent to the WHERE
-    clause.)
-   </para>
-
-   <para>
-    In standard SQL, the GROUP BY clause takes a list of column names,
-    that specify a subrow, from the derived input table produced by
-    the previous WHERE or FROM clause and partitions the table into
-    groups with duplicate subrows such that within a column of the
-    subrow, no column value is distinct from other column values. The
-    resulting derived input table is a special type of table, called a
-    grouped table, which still contains all columns but only
-    references to columns of the grouped subrow, and group aggregates,
-    derived from any of the columns, may appear in derived column
-    value expressions in the query select list.  When deriving an
-    output table from a query using a grouped input table, each output
-    row is derived from a corresponding group/partition of the grouped
-    table. Aggregates computed in a derived output column are
-    aggregates on the current partition/group of the grouped input
-    table being processed.  Only one output table row results per
-    group/partition of the grouped input table.
-   </para>
-
-   <para>
-    Postgres has extended the GROUP BY clause to allow some
-    non-standard, but useful behavior. Derived output columns, given
-    names using an AS clause in the query select list, may appear in
-    the GROUP BY clause in combination with, or instead of, the input
-    table column names. Tables may also be grouped by arbitrary
-    expressions. If output table column names appear in the GROUP BY
-    list, then the input table is augmented with additional columns of
-    the output table columns listed in the GROUP BY clause. The value
-    for each row in the additional columns is computed from the value
-    expression that defines the output column in the query select
-    list. The augmented input table is grouped by the column names
-    listed in the GROUP BY clause. The resulting grouped augmented
-    input table is then treated according standard SQL GROUP BY
-    semantics. Only the columns of the unaugmented input table in the
-    grouped subrow (if any), and group aggregates, derived from any of
-    the columns of the unaugmented input table, may be referenced in
-    the value expressions of the derived output columns of the
-    query. Output columns derived with an aggregate expression cannot
-    be named in the GROUP BY clause.
-   </para>
-
-   <para>
-    A HAVING clause may optionally follow a GROUP BY clause.  The
-    HAVING clause selects or eliminates, depending on which
-    perspective is taken, groups from the grouped table derived in the
-    GROUP BY clause that precedes it.  The search condition is the
-    same type of expression allowed in a WHERE clause and may
-    reference any of the input table column names in the grouped
-    subrow, but may not reference any others or any named output
-    columns.  When the search condition results in TRUE the group is
-    retained, otherwise the group is eliminated.
-   </para>
-  </sect2>
-
-   <sect2>
-    <title>ORDER BY and LIMIT clauses</title>
-          
-    <para>
-     ORDER BY and LIMIT clauses are not clauses of a table expression.
-     They are optional clauses that may follow a query expression and
-     are discussed here because they are commonly used with the
-     clauses above.
-    </para>
-
-    <para>
-     ORDER BY takes a comma-separated list of columns and performs a
-     cascaded ordering of the table by the columns listed, in the
-     order listed.  The keyword DESC or ASC may follow any column name
-     or expression in the list to specify descending or ascending
-     ordering, respectively. Ascending order is the default.  The
-     ORDER BY clause conforms to the SQL standard but is extended in
-     Postgres.  Postgres allows ORDER BY to reference both output
-     table columns, as named in the select list using the AS clause,
-     and input table columns, as given by the table derived in the
-     FROM clause and other previous clauses. Postgres also extends
-     ORDER BY to allow ordering by arbitrary expressions. If used in a
-     query with a GROUP BY clause, the ORDER BY clause can only
-     reference output table column names and grouped input table
-     columns.
-    </para>
-
-    <para>
-     LIMIT is not a standard SQL clause.  LIMIT is a Postgres
-     extension that limits the number of rows that will be returned
-     from a query.  The rows returned by a query using the LIMIT
-     clause are random if no ORDER BY clause is specified.  A LIMIT
-     clause may optionally be followed by an OFFSET clause which
-     specifies a number of rows to be skipped in the output table
-     before returning the number of rows specified in the LIMIT
-     clause.
-    </para>
-   </sect2>
-]]>
-  </sect1>
-
  </chapter>
  
  <!-- Keep this comment at the end of the file
diff --git a/doc/src/sgml/user.sgml b/doc/src/sgml/user.sgml

index 78d4220..c78e3d3 100644 (file)
--- a/doc/src/sgml/user.sgml
+++ b/doc/src/sgml/user.sgml
@@ -1,5 +1,5 @@
  <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.24 2001/01/13 23:58:55 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.25 2001/01/22 23:34:33 petere Exp $
  -->
  
  <book id="user">
@@ -44,6 +44,7 @@ $Header: /cvsroot/pgsql/doc/src/sgml/Attic/user.sgml,v 1.24 2001/01/13 23:58:55
  
   &intro;
   &syntax;
+ &queries;
   &datatype;
   &func;
   &typeconv;
author	Peter Eisentraut <peter_e@gmx.net>
	Mon, 22 Jan 2001 23:34:33 +0000 (23:34 +0000)
committer	Peter Eisentraut <peter_e@gmx.net>
	Mon, 22 Jan 2001 23:34:33 +0000 (23:34 +0000)
doc/src/sgml/filelist.sgml		patch \| blob \| history
doc/src/sgml/queries.sgml	[new file with mode: 0644]	patch \| blob
doc/src/sgml/syntax.sgml		patch \| blob \| history
doc/src/sgml/user.sgml		patch \| blob \| history