Surrogate pair support for U& string and identifier syntax
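
Background sketch, not part of the patch: the documentation added below says that both the 4-digit and the 6-digit escape forms can be used to write UTF-16 surrogate pairs for code points above FFFF. The following Python sketch shows the standard UTF-16 arithmetic that relates a surrogate pair to a single code point. The function names are hypothetical and are not taken from the PostgreSQL source. It only illustrates why a pair such as \D83D\DE00 and the 6-digit form \+01F600 would denote the same character.

```python
# Illustrative only: standard UTF-16 surrogate arithmetic.
# Function names are hypothetical, not PostgreSQL internals.

def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a UTF-16 surrogate pair."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low (trailing) surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_code_point(code_point: int) -> tuple[int, int]:
    """Return the (high, low) surrogate pair for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    cp = combine_surrogates(0xD83D, 0xDE00)
    print(f"{cp:06X}")
```

Under these assumptions, the escape pair \D83D\DE00 and the single escape \+01F600 describe the same code point, which is why the patch text calls the surrogate form technically unnecessary once the 6-digit form is available.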

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index f102b33..c805e2e 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,6 +1,4 @@
-<!--
-$PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.105 2005/11/04 23:14:02 petere Exp $
--->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.135 2009/09/21 22:22:07 petere Exp $ -->
  
  <chapter id="sql-syntax">
   <title>SQL Syntax</title>
@@ -13,12 +11,12 @@ $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.105 2005/11/04 23:14:02 petere E
   <para>
    This chapter describes the syntax of SQL.  It forms the foundation
    for understanding the following chapters which will go into detail
-  about how the SQL commands are applied to define and modify data.
+  about how SQL commands are applied to define and modify data.
   </para>
  
   <para>
    We also advise users who are already familiar with SQL to read this
-  chapter carefully because there are several rules and concepts that
+  chapter carefully because it contains several rules and concepts that
    are implemented inconsistently among SQL databases or that are
    specific to <productname>PostgreSQL</productname>.
   </para>
@@ -122,7 +120,7 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
      key word can be letters, underscores, digits
      (<literal>0</literal>-<literal>9</literal>), or dollar signs
      (<literal>$</>).  Note that dollar signs are not allowed in identifiers
-    according to the letter of the SQL standard, so their use may render
+    according to the letter of the SQL standard, so their use might render
      applications less portable.
      The SQL standard will not define a key word that contains
      digits or starts or ends with an underscore, so identifiers of this
@@ -133,12 +131,12 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
     <para>
      <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm>
      The system uses no more than <symbol>NAMEDATALEN</symbol>-1
-    characters of an identifier; longer names can be written in
+    bytes of an identifier; longer names can be written in
      commands, but they will be truncated.  By default,
      <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier
-    length is 63. If this limit is problematic, it can be raised by
+    length is 63 bytes. If this limit is problematic, it can be raised by
      changing the <symbol>NAMEDATALEN</symbol> constant in
-    <filename>src/include/postgres_ext.h</filename>.
+    <filename>src/include/pg_config_manual.h</filename>.
     </para>
  
     <para>
@@ -146,16 +144,16 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
       <primary>case sensitivity</primary>
       <secondary>of SQL commands</secondary>
      </indexterm>
-    Identifier and key word names are case insensitive.  Therefore
+    Identifier and key word names are case insensitive.  Therefore:
  <programlisting>
  UPDATE MY_TABLE SET A = 5;
  </programlisting>
-    can equivalently be written as
+    can equivalently be written as:
  <programlisting>
  uPDaTE my_TabLE SeT a = 5;
  </programlisting>
      A convention often used is to write key words in upper
-    case and names in lower case, e.g.,
+    case and names in lower case, e.g.:
  <programlisting>
  UPDATE my_table SET a = 5;
  </programlisting>
@@ -184,14 +182,69 @@ UPDATE "my_table" SET "a" = 5;
     </para>
  
     <para>
-    Quoted identifiers can contain any character other than a double
-    quote itself.  (To include a double quote, write two double quotes.)
+    Quoted identifiers can contain any character, except the character
+    with code zero.  (To include a double quote, write two double quotes.)
      This allows constructing table or column names that would
      otherwise not be possible, such as ones containing spaces or
      ampersands.  The length limitation still applies.
     </para>
  
     <para>
+    <indexterm><primary>Unicode escape</primary><secondary>in
+    identifiers</secondary></indexterm> A variant of quoted
+    identifiers allows including escaped Unicode characters identified
+    by their code points.  This variant starts
+    with <literal>U&amp;</literal> (upper or lower case U followed by
+    ampersand) immediately before the opening double quote, without
+    any spaces in between, for example <literal>U&amp;"foo"</literal>.
+    (Note that this creates an ambiguity with the
+    operator <literal>&amp;</literal>.  Use spaces around the operator to
+    avoid this problem.)  Inside the quotes, Unicode characters can be
+    specified in escaped form by writing a backslash followed by the
+    four-digit hexadecimal code point number or alternatively a
+    backslash followed by a plus sign followed by a six-digit
+    hexadecimal code point number.  For example, the
+    identifier <literal>"data"</literal> could be written as
+<programlisting>
+U&amp;"d\0061t\+000061"
+</programlisting>
+    The following less trivial example writes the Russian
+    word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&amp;"\0441\043B\043E\043D"
+</programlisting>
+   </para>
+
+   <para>
+    If a different escape character than backslash is desired, it can
+    be specified using
+    the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+    clause after the string, for example:
+<programlisting>
+U&amp;"d!0061t!+000061" UESCAPE '!'
+</programlisting>
+    The escape character can be any single character other than a
+    hexadecimal digit, the plus sign, a single quote, a double quote,
+    or a whitespace character.  Note that the escape character is
+    written in single quotes, not double quotes.
+   </para>
+
+   <para>
+    To include the escape character in the identifier literally, write
+    it twice.
+   </para>
+
+   <para>
+    The Unicode escape syntax works only when the server encoding is
+    UTF8.  When other server encodings are used, only code points in
+    the ASCII range (up to <literal>\007F</literal>) can be specified.
+    Both the 4-digit and the 6-digit form can be used to specify
+    UTF-16 surrogate pairs to compose characters with code points
+    larger than <literal>\FFFF</literal> (although the availability of
+    the 6-digit form technically makes this unnecessary).
+   </para>
+
+   <para>
      Quoting an identifier also makes it case-sensitive, whereas
      unquoted names are always folded to lower case.  For example, the
      identifiers <literal>FOO</literal>, <literal>foo</literal>, and
@@ -242,71 +295,232 @@ UPDATE "my_table" SET "a" = 5;
       </indexterm>
       A string constant in SQL is an arbitrary sequence of characters
       bounded by single quotes (<literal>'</literal>), for example
-     <literal>'This is a string'</literal>.  The standard-compliant way of
-     writing a single-quote character within a string constant is to
-     write two adjacent single quotes, e.g.
+     <literal>'This is a string'</literal>.  To include
+     a single-quote character within a string constant,
+     write two adjacent single quotes, e.g.,
       <literal>'Dianne''s horse'</literal>.
-     <productname>PostgreSQL</productname> also allows single quotes
-     to be escaped with a backslash (<literal>\'</literal>).  However,
-     future versions of <productname>PostgreSQL</productname> will not
-     allow this, so applications using backslashes should convert to the 
-     standard-compliant method outlined above.
+     Note that this is <emphasis>not</> the same as a double-quote
+     character (<literal>"</>). <!-- font-lock sanity: " -->
      </para>
  
      <para>
-     Another <productname>PostgreSQL</productname> extension is that
-     C-style backslash escapes are available: <literal>\b</literal> is a
-     backspace, <literal>\f</literal> is a form feed,
-     <literal>\n</literal> is a newline, <literal>\r</literal> is a
-     carriage return, <literal>\t</literal> is a tab. Also supported is
-     <literal>\<replaceable>digits</replaceable></literal>, where
-     <replaceable>digits</replaceable> represents an octal byte value, and
-     <literal>\x<replaceable>hexdigits</replaceable></literal>, where
-     <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
-     (It is your responsibility that the byte sequences you create are
-     valid characters in the server character set encoding.) Any other
+     Two string constants that are only separated by whitespace
+     <emphasis>with at least one newline</emphasis> are concatenated
+     and effectively treated as if the string had been written as one
+     constant.  For example:
+<programlisting>
+SELECT 'foo'
+'bar';
+</programlisting>
+     is equivalent to:
+<programlisting>
+SELECT 'foobar';
+</programlisting>
+     but:
+<programlisting>
+SELECT 'foo'      'bar';
+</programlisting>
+     is not valid syntax.  (This slightly bizarre behavior is specified
+     by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
+     following the standard.)
+    </para>
+   </sect3>
+
+   <sect3 id="sql-syntax-strings-escape">
+    <title>String Constants with C-Style Escapes</title>
+
+     <indexterm zone="sql-syntax-strings-escape">
+      <primary>escape string syntax</primary>
+     </indexterm>
+     <indexterm zone="sql-syntax-strings-escape">
+      <primary>backslash escapes</primary>
+     </indexterm>
+
+    <para>
+     <productname>PostgreSQL</productname> also accepts <quote>escape</>
+     string constants, which are an extension to the SQL standard.
+     An escape string constant is specified by writing the letter
+     <literal>E</literal> (upper or lower case) just before the opening single
+     quote, e.g., <literal>E'foo'</>.  (When continuing an escape string
+     constant across lines, write <literal>E</> only before the first opening
+     quote.)
+     Within an escape string, a backslash character (<literal>\</>) begins a
+     C-like <firstterm>backslash escape</> sequence, in which the combination
+     of backslash and following character(s) represent a special byte
+     value, as shown in <xref linkend="sql-backslash-table">.
+    </para>
+
+     <table id="sql-backslash-table">
+      <title>Backslash Escape Sequences</title>
+      <tgroup cols="2">
+      <thead>
+       <row>
+        <entry>Backslash Escape Sequence</>
+        <entry>Interpretation</entry>
+       </row>
+      </thead>
+
+      <tbody>
+       <row>
+        <entry><literal>\b</literal></entry>
+        <entry>backspace</entry>
+       </row>
+       <row>
+        <entry><literal>\f</literal></entry>
+        <entry>form feed</entry>
+       </row>
+       <row>
+        <entry><literal>\n</literal></entry>
+        <entry>newline</entry>
+       </row>
+       <row>
+        <entry><literal>\r</literal></entry>
+        <entry>carriage return</entry>
+       </row>
+       <row>
+        <entry><literal>\t</literal></entry>
+        <entry>tab</entry>
+       </row>
+       <row>
+        <entry>
+         <literal>\<replaceable>o</replaceable></literal>,
+         <literal>\<replaceable>oo</replaceable></literal>,
+         <literal>\<replaceable>ooo</replaceable></literal>
+         (<replaceable>o</replaceable> = 0 - 7)
+        </entry>
+        <entry>octal byte value</entry>
+       </row>
+       <row>
+        <entry>
+         <literal>\x<replaceable>h</replaceable></literal>,
+         <literal>\x<replaceable>hh</replaceable></literal>
+         (<replaceable>h</replaceable> = 0 - 9, A - F)
+        </entry>
+        <entry>hexadecimal byte value</entry>
+       </row>
+      </tbody>
+      </tgroup>
+     </table>
+
+    <para>
+     Any other
       character following a backslash is taken literally. Thus, to
-     include a backslash in a string constant, write two backslashes.
+     include a backslash character, write two backslashes (<literal>\\</>).
+     Also, a single quote can be included in an escape string by writing
+     <literal>\'</literal>, in addition to the normal way of <literal>''</>.
      </para>
  
-    <note>
      <para>
-     While ordinary strings now support C-style backslash escapes,
-     future versions will generate warnings for such usage and
-     eventually treat backslashes as literal characters to be
-     standard-conforming. The proper way to specify escape processing is
-     to use the escape string syntax to indicate that escape
-     processing is desired. Escape string syntax is specified by writing
-     the letter <literal>E</literal> (upper or lower case) just before
-     the string, e.g. <literal>E'\041'</>. This method will work in all
-     future versions of <productname>PostgreSQL</productname>.
+     It is your responsibility that the byte sequences you create are
+     valid characters in the server character set encoding.  When the
+     server encoding is UTF-8, then the alternative Unicode escape
+     syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
+     should be used instead.  (The alternative would be doing the
+     UTF-8 encoding by hand and writing out the bytes, which would be
+     very cumbersome.)
      </para>
-    </note>
+
+    <caution>
+    <para>
+     If the configuration parameter
+     <xref linkend="guc-standard-conforming-strings"> is <literal>off</>,
+     then <productname>PostgreSQL</productname> recognizes backslash escapes
+     in both regular and escape string constants.  This is for backward
+     compatibility with the historical behavior, where backslash escapes
+     were always recognized.
+     Although <varname>standard_conforming_strings</> currently defaults to
+     <literal>off</>, the default will change to <literal>on</> in a future
+     release for improved standards compliance.  Applications are therefore
+     encouraged to migrate away from using backslash escapes.  If you need
+     to use a backslash escape to represent a special character, write the
+     string constant with an <literal>E</> to be sure it will be handled the same
+     way in future releases.
+    </para>
+
+    <para>
+     In addition to <varname>standard_conforming_strings</>, the configuration
+     parameters <xref linkend="guc-escape-string-warning"> and
+     <xref linkend="guc-backslash-quote"> govern treatment of backslashes
+     in string constants.
+    </para>
+    </caution>
  
      <para>
       The character with the code zero cannot be in a string constant.
      </para>
+   </sect3>
+
+   <sect3 id="sql-syntax-strings-uescape">
+    <title>String Constants with Unicode Escapes</title>
+
+    <indexterm  zone="sql-syntax-strings-uescape">
+     <primary>Unicode escape</primary>
+     <secondary>in string constants</secondary>
+    </indexterm>
  
      <para>
-     Two string constants that are only separated by whitespace
-     <emphasis>with at least one newline</emphasis> are concatenated
-     and effectively treated as if the string had been written in one
-     constant.  For example:
+     <productname>PostgreSQL</productname> also supports another type
+     of escape syntax for strings that allows specifying arbitrary
+     Unicode characters by code point.  A Unicode escape string
+     constant starts with <literal>U&amp;</literal> (upper or lower case
+     letter U followed by ampersand) immediately before the opening
+     quote, without any spaces in between, for
+     example <literal>U&amp;'foo'</literal>.  (Note that this creates an
+     ambiguity with the operator <literal>&amp;</literal>.  Use spaces
+     around the operator to avoid this problem.)  Inside the quotes,
+     Unicode characters can be specified in escaped form by writing a
+     backslash followed by the four-digit hexadecimal code point
+     number or alternatively a backslash followed by a plus sign
+     followed by a six-digit hexadecimal code point number.  For
+     example, the string <literal>'data'</literal> could be written as
  <programlisting>
-SELECT 'foo'
-'bar';
+U&amp;'d\0061t\+000061'
  </programlisting>
-     is equivalent to
+     The following less trivial example writes the Russian
+     word <quote>slon</quote> (elephant) in Cyrillic letters:
  <programlisting>
-SELECT 'foobar';
+U&amp;'\0441\043B\043E\043D'
  </programlisting>
-     but
+    </para>
+
+    <para>
+     If a different escape character than backslash is desired, it can
+     be specified using
+     the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+     clause after the string, for example:
  <programlisting>
-SELECT 'foo'      'bar';
+U&amp;'d!0061t!+000061' UESCAPE '!'
  </programlisting>
-     is not valid syntax.  (This slightly bizarre behavior is specified
-     by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
-     following the standard.)
+     The escape character can be any single character other than a
+     hexadecimal digit, the plus sign, a single quote, a double quote,
+     or a whitespace character.
+    </para>
+
+    <para>
+     The Unicode escape syntax works only when the server encoding is
+     UTF8.  When other server encodings are used, only code points in
+     the ASCII range (up to <literal>\007F</literal>) can be
+     specified.
+     Both the 4-digit and the 6-digit form can be used to specify
+     UTF-16 surrogate pairs to compose characters with code points
+     larger than <literal>\FFFF</literal> (although the availability
+     of the 6-digit form technically makes this unnecessary).
+    </para>
+
+    <para>
+     Also, the Unicode escape syntax for string constants only works
+     when the configuration
+     parameter <xref linkend="guc-standard-conforming-strings"> is
+     turned on.  This is because otherwise this syntax could confuse
+     clients that parse the SQL statements to the point that it could
+     lead to SQL injections and similar security issues.  If the
+     parameter is set to off, this syntax will be rejected with an
+     error message.
+    </para>
+
+    <para>
+     To include the escape character in the string literally, write it
+     twice.
      </para>
     </sect3>
  
@@ -441,7 +655,7 @@ $function$
       digits (0 through 9).  At least one digit must be before or after the
       decimal point, if one is used.  At least one digit must follow the
       exponent marker (<literal>e</literal>), if one is present.
-     There may not be any spaces or other characters embedded in the
+     There cannot be any spaces or other characters embedded in the
       constant.  Note that any leading plus or minus sign is not actually
       considered part of the constant; it is an operator applied to the
       constant.
@@ -481,7 +695,7 @@ $function$
       force a numeric value to be interpreted as a specific data type
       by casting it.<indexterm><primary>type cast</primary></indexterm>
       For example, you can force a numeric value to be treated as type
-     <type>real</> (<type>float4</>) by writing
+     <type>real</> (<type>float4</>) by writing:
  
  <programlisting>
  REAL '1.23'  -- string style
@@ -512,7 +726,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
       The string constant's text is passed to the input conversion
       routine for the type called <replaceable>type</replaceable>. The
       result is a constant of the indicated type.  The explicit type
-     cast may be omitted if there is no ambiguity as to the type the
+     cast can be omitted if there is no ambiguity as to the type the
       constant must be (for example, when it is assigned directly to a
       table column), in which case it is automatically coerced.
      </para>
@@ -528,7 +742,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
  <synopsis>
  <replaceable>typename</replaceable> ( '<replaceable>string</replaceable>' )
  </synopsis>
-     but not all type names may be used in this way; see <xref
+     but not all type names can be used in this way; see <xref
       linkend="sql-syntax-type-casts"> for details.
      </para>
  
@@ -536,18 +750,18 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
       The <literal>::</literal>, <literal>CAST()</literal>, and
       function-call syntaxes can also be used to specify run-time type
       conversions of arbitrary expressions, as discussed in <xref
-     linkend="sql-syntax-type-casts">.  But the form
-     <literal><replaceable>type</replaceable> '<replaceable>string</replaceable>'</literal>
-     can only be used to specify the type of a literal constant.
-     Another restriction on
-     <literal><replaceable>type</replaceable> '<replaceable>string</replaceable>'</literal>
-     is that it does not work for array types; use <literal>::</literal>
+     linkend="sql-syntax-type-casts">.  To avoid syntactic ambiguity, the
+     <literal><replaceable>type</> '<replaceable>string</>'</literal>
+     syntax can only be used to specify the type of a simple literal constant.
+     Another restriction on the
+     <literal><replaceable>type</> '<replaceable>string</>'</literal>
+     syntax is that it does not work for array types; use <literal>::</literal>
       or <literal>CAST()</literal> to specify the type of an array constant.
      </para>
  
      <para>
       The <literal>CAST()</> syntax conforms to SQL.  The
-     <literal><replaceable>type</replaceable> '<replaceable>string</replaceable>'</literal>
+     <literal><replaceable>type</> '<replaceable>string</>'</literal>
       syntax is a generalization of the standard: SQL specifies this syntax only
       for a few data types, but <productname>PostgreSQL</productname> allows it
       for all types.  The syntax with
@@ -625,7 +839,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
        A dollar sign (<literal>$</literal>) followed by digits is used
        to represent a positional parameter in the body of a function
        definition or a prepared statement.  In other contexts the
-      dollar sign may be part of an identifier or a dollar-quoted string
+      dollar sign can be part of an identifier or a dollar-quoted string
        constant.
       </para>
      </listitem>
@@ -675,8 +889,9 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
       <para>
        The asterisk (<literal>*</literal>) is used in some contexts to denote
        all the fields of a table row or composite value.  It also
-      has a special meaning when used as the argument of the
-      <function>COUNT</function> aggregate function.
+      has a special meaning when used as the argument of an
+      aggregate function, namely that the aggregate does not require
+      any explicit parameter.
       </para>
      </listitem>
  
@@ -700,7 +915,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
     </indexterm>
  
     <para>
-    A comment is an arbitrary sequence of characters beginning with
+    A comment is a sequence of characters beginning with
      double dashes and extending to the end of the line, e.g.:
  <programlisting>
  -- This is a standard SQL comment
@@ -717,7 +932,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
      where the comment begins with <literal>/*</literal> and extends to
      the matching occurrence of <literal>*/</literal>. These block
      comments nest, as specified in the SQL standard but unlike C, so that one can
-    comment out larger blocks of code that may contain existing block
+    comment out larger blocks of code that might contain existing block
      comments.
     </para>
  
@@ -740,23 +955,23 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
      associativity of the operators in <productname>PostgreSQL</>.
      Most operators have the same precedence and are left-associative.
      The precedence and associativity of the operators is hard-wired
-    into the parser.  This may lead to non-intuitive behavior; for
+    into the parser.  This can lead to non-intuitive behavior; for
      example the Boolean operators <literal>&lt;</> and
      <literal>&gt;</> have a different precedence than the Boolean
      operators <literal>&lt;=</> and <literal>&gt;=</>.  Also, you will
      sometimes need to add parentheses when using combinations of
-    binary and unary operators.  For instance
+    binary and unary operators.  For instance:
  <programlisting>
  SELECT 5 ! - 6;
  </programlisting>
-   will be parsed as
+   will be parsed as:
  <programlisting>
  SELECT 5 ! (- 6);
  </programlisting>
      because the parser has no idea &mdash; until it is too late
      &mdash; that <token>!</token> is defined as a postfix operator,
      not an infix one.  To get the desired behavior in this case, you
-    must write
+    must write:
  <programlisting>
  SELECT (5 !) - 6;
  </programlisting>
@@ -910,13 +1125,13 @@ SELECT (5 !) - 6;
  
     <para>
      When a schema-qualified operator name is used in the
-    <literal>OPERATOR</> syntax, as for example in
+    <literal>OPERATOR</> syntax, as for example in:
  <programlisting>
  SELECT 3 OPERATOR(pg_catalog.+) 4;
  </programlisting>
      the <literal>OPERATOR</> construct is taken to have the default precedence
      shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator.  This is true no matter
-    which specific operator name appears inside <literal>OPERATOR()</>.
+    which specific operator appears inside <literal>OPERATOR()</>.
     </para>
    </sect2>
   </sect1>
@@ -958,82 +1173,88 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
     <itemizedlist>
      <listitem>
       <para>
-      A constant or literal value.
+      A constant or literal value
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A column reference.
+      A column reference
       </para>
      </listitem>
  
      <listitem>
       <para>
        A positional parameter reference, in the body of a function definition
-      or prepared statement.
+      or prepared statement
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      A subscripted expression
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A subscripted expression.
+      A field selection expression
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A field selection expression.
+      An operator invocation
       </para>
      </listitem>
  
      <listitem>
       <para>
-      An operator invocation.
+      A function call
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A function call.
+      An aggregate expression
       </para>
      </listitem>
  
      <listitem>
       <para>
-      An aggregate expression.
+      A window function call
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A type cast.
+      A type cast
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A scalar subquery.
+      A scalar subquery
       </para>
      </listitem>
  
      <listitem>
       <para>
-      An array constructor.
+      An array constructor
       </para>
      </listitem>
  
      <listitem>
       <para>
-      A row constructor.
+      A row constructor
       </para>
      </listitem>
  
      <listitem>
       <para>
-      Another value expression in parentheses, useful to group
+      Another value expression in parentheses (used to group
        subexpressions and override
-      precedence.<indexterm><primary>parenthesis</></>
+      precedence<indexterm><primary>parenthesis</></>)
       </para>
      </listitem>
     </itemizedlist>
@@ -1062,7 +1283,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
     </indexterm>
  
     <para>
-    A column can be referenced in the form
+    A column can be referenced in the form:
  <synopsis>
  <replaceable>correlation</replaceable>.<replaceable>columnname</replaceable>
  </synopsis>
@@ -1075,7 +1296,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
      the key words <literal>NEW</literal> or <literal>OLD</literal>.
      (<literal>NEW</literal> and <literal>OLD</literal> can only appear in rewrite rules,
      while other correlation names can be used in any SQL statement.)
-    The correlation name and separating dot may be omitted if the column name
+    The correlation name and separating dot can be omitted if the column name
      is unique across all the tables being used in the current query.  (See also <xref linkend="queries">.)
     </para>
    </sect2>
@@ -1107,7 +1328,7 @@ $<replaceable>number</replaceable>
  
     <para>
      For example, consider the definition of a function,
-    <function>dept</function>, as
+    <function>dept</function>, as:
  
  <programlisting>
  CREATE FUNCTION dept(text) RETURNS dept
@@ -1145,11 +1366,11 @@ CREATE FUNCTION dept(text) RETURNS dept
  
     <para>
      In general the array <replaceable>expression</replaceable> must be
-    parenthesized, but the parentheses may be omitted when the expression
+    parenthesized, but the parentheses can be omitted when the expression
      to be subscripted is just a column reference or positional parameter.
      Also, multiple subscripts can be concatenated when the original array
      is multidimensional.
-    For example,
+    For example:
  
  <programlisting>
  mytable.arraycolumn[4]
@@ -1180,9 +1401,9 @@ $1[10:42]
  
     <para>
      In general the row <replaceable>expression</replaceable> must be
-    parenthesized, but the parentheses may be omitted when the expression
+    parenthesized, but the parentheses can be omitted when the expression
      to be selected from is just a table reference or positional parameter.
-    For example,
+    For example:
  
  <programlisting>
  mytable.mycolumn
@@ -1191,7 +1412,18 @@ $1.somecolumn
  </programlisting>
  
      (Thus, a qualified column reference is actually just a special case
-    of the field selection syntax.)
+    of the field selection syntax.)  An important special case is
+    extracting a field from a table column that is of a composite type:
+
+<programlisting>
+(compositecol).somefield
+(mytable.compositecol).somefield
+</programlisting>
+
+    The parentheses are required here to show that
+    <structfield>compositecol</> is a column name not a table name,
+    or that <structname>mytable</> is a table name not a schema name
+    in the second case.
     </para>
    </sect2>
  
@@ -1213,7 +1445,7 @@ $1.somecolumn
      where the <replaceable>operator</replaceable> token follows the syntax
      rules of <xref linkend="sql-syntax-operators">, or is one of the
      key words <token>AND</token>, <token>OR</token>, and
-    <token>NOT</token>, or is a qualified operator name in the form
+    <token>NOT</token>, or is a qualified operator name in the form:
  <synopsis>
  <literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
  </synopsis>
@@ -1238,7 +1470,7 @@ $1.somecolumn
      enclosed in parentheses:
  
  <synopsis>
-<replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
  </synopsis>
     </para>
  
@@ -1251,7 +1483,7 @@ sqrt(2)
  
     <para>
      The list of built-in functions is in <xref linkend="functions">.
-    Other functions may be added by the user.
+    Other functions can be added by the user.
     </para>
    </sect2>
  
@@ -1271,31 +1503,31 @@ sqrt(2)
      syntax of an aggregate expression is one of the following:
  
  <synopsis>
-<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)
-<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable>)
+<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
+<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
  <replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
  <replaceable>aggregate_name</replaceable> ( * )
  </synopsis>
  
      where <replaceable>aggregate_name</replaceable> is a previously
      defined aggregate (possibly qualified with a schema name), and
-    <replaceable>expression</replaceable> is 
+    <replaceable>expression</replaceable> is
      any value expression that does not itself contain an aggregate
-    expression.
+    expression or a window function call.
     </para>
  
     <para>
      The first form of aggregate expression invokes the aggregate
-    across all input rows for which the given expression yields a
-    non-null value.  (Actually, it is up to the aggregate function
+    across all input rows for which the given expression(s) yield
+    non-null values.  (Actually, it is up to the aggregate function
      whether to ignore null values or not &mdash; but all the standard ones do.)
      The second form is the same as the first, since
      <literal>ALL</literal> is the default.  The third form invokes the
-    aggregate for all distinct non-null values of the expression found
+    aggregate for all distinct non-null values of the expressions found
      in the input rows.  The last form invokes the aggregate once for
      each input row regardless of null or non-null values; since no
      particular input value is specified, it is generally only useful
-    for the <function>count()</function> aggregate function.
+    for the <function>count(*)</function> aggregate function.
     </para>
  
     <para>
@@ -1308,12 +1540,12 @@ sqrt(2)
  
     <para>
      The predefined aggregate functions are described in <xref
-    linkend="functions-aggregate">.  Other aggregate functions may be added
-    by the user. 
+    linkend="functions-aggregate">.  Other aggregate functions can be added
+    by the user.
     </para>
  
     <para>
-    An aggregate expression may only appear in the result list or
+    An aggregate expression can only appear in the result list or
      <literal>HAVING</> clause of a <command>SELECT</> command.
      It is forbidden in other clauses, such as <literal>WHERE</>,
      because those clauses are logically evaluated before the results
@@ -1325,7 +1557,7 @@ sqrt(2)
      <xref linkend="sql-syntax-scalar-subqueries"> and
      <xref linkend="functions-subquery">), the aggregate is normally
      evaluated over the rows of the subquery.  But an exception occurs
-    if the aggregate's argument contains only outer-level variables:
+    if the aggregate's arguments contain only outer-level variables:
      the aggregate then belongs to the nearest such outer level, and is
      evaluated over the rows of that query.  The aggregate expression
      as a whole is then an outer reference for the subquery it appears in,
@@ -1334,6 +1566,128 @@ sqrt(2)
      appearing only in the result list or <literal>HAVING</> clause
      applies with respect to the query level that the aggregate belongs to.
     </para>
+
+   <note>
+    <para>
+     <productname>PostgreSQL</productname> currently does not support
+     <literal>DISTINCT</> with more than one input expression.
+    </para>
+   </note>
+  </sect2>
+
+  <sect2 id="syntax-window-functions">
+   <title>Window Function Calls</title>
+
+   <indexterm zone="syntax-window-functions">
+    <primary>window function</primary>
+    <secondary>invocation</secondary>
+   </indexterm>
+
+   <indexterm zone="syntax-window-functions">
+    <primary>OVER clause</primary>
+   </indexterm>
+
+   <para>
+    A <firstterm>window function call</firstterm> represents the application
+    of an aggregate-like function over some portion of the rows selected
+    by a query.  Unlike regular aggregate function calls, this is not tied
+    to grouping of the selected rows into a single output row &mdash; each
+    row remains separate in the query output.  However, the window function
+    is able to scan all the rows that would be part of the current row's
+    group according to the grouping specification (<literal>PARTITION BY</>
+    list) of the window function call.
+    The syntax of a window function call is one of the following:
+
+<synopsis>
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER ( <replaceable class="parameter">window_definition</replaceable> )
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER <replaceable>window_name</replaceable>
+<replaceable>function_name</replaceable> ( * ) OVER ( <replaceable class="parameter">window_definition</replaceable> )
+<replaceable>function_name</replaceable> ( * ) OVER <replaceable>window_name</replaceable>
+</synopsis>
+    where <replaceable class="parameter">window_definition</replaceable>
+    has the syntax
+<synopsis>
+[ <replaceable class="parameter">existing_window_name</replaceable> ]
+[ PARTITION BY <replaceable class="parameter">expression</replaceable> [, ...] ]
+[ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
+[ <replaceable class="parameter">frame_clause</replaceable> ]
+</synopsis>
+    and the optional <replaceable class="parameter">frame_clause</replaceable>
+    can be one of
+<synopsis>
+RANGE UNBOUNDED PRECEDING
+RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+ROWS UNBOUNDED PRECEDING
+ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+</synopsis>
+
+    Here, <replaceable>expression</replaceable> represents any value
+    expression that does not itself contain window function calls.
+    The <literal>PARTITION BY</> and <literal>ORDER BY</> lists have
+    essentially the same syntax and semantics as <literal>GROUP BY</>
+    and <literal>ORDER BY</> clauses of the whole query, except that their
+    expressions are always just expressions and cannot be output-column
+    names or numbers.
+    <replaceable>window_name</replaceable> is a reference to a named window
+    specification defined in the query's <literal>WINDOW</literal> clause.
+    Named window specifications are usually referenced with just
+    <literal>OVER</> <replaceable>window_name</replaceable>, but it is
+    also possible to write a window name inside the parentheses and then
+    optionally supply an ordering clause and/or frame clause (the referenced
+    window must lack these clauses, if they are supplied here).
+    This latter syntax follows the same rules as modifying an existing
+    window name within the <literal>WINDOW</literal> clause; see the
+    <xref linkend="sql-select" endterm="sql-select-title"> reference
+    page for details.
+   </para>
+
+   <para>
+    The <replaceable class="parameter">frame_clause</replaceable> specifies
+    the set of rows constituting the <firstterm>window frame</>, for those
+    window functions that act on the frame instead of the whole partition.
+    The default framing option is <literal>RANGE UNBOUNDED PRECEDING</>,
+    which is the same as <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND
+    CURRENT ROW</>; it selects rows up through the current row's last
+    peer in the <literal>ORDER BY</> ordering (which means all rows if
+    there is no <literal>ORDER BY</>).  The options
+    <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</> and
+    <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</>
+    are also equivalent: they always select all rows in the partition.
+    Lastly, <literal>ROWS UNBOUNDED PRECEDING</> or its verbose equivalent
+    <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</> selects
+    all rows up through the current row (regardless of duplicates).
+    Beware that this option can produce implementation-dependent results
+    if the <literal>ORDER BY</> ordering does not order the rows uniquely.
+   </para>
+
+   <para>
+    The built-in window functions are described in <xref
+    linkend="functions-window-table">.  Other window functions can be added by
+    the user.  Also, any built-in or user-defined aggregate function can be
+    used as a window function.
+   </para>
+
+   <para>
+    The forms using <literal>*</> are for calling parameter-less
+    aggregate functions as window functions, for example
+    <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
+    <literal>*</> is customarily not used for non-aggregate window functions.
+    Aggregate window functions, unlike normal aggregate functions, do not
+    allow <literal>DISTINCT</> to be used within the function argument list.
+   </para>
+
+   <para>
+    Window function calls are permitted only in the <literal>SELECT</literal>
+    list and the <literal>ORDER BY</> clause of the query.
+   </para>
+
+   <para>
+    More information about window functions can be found in
+    <xref linkend="tutorial-window"> and
+    <xref linkend="queries-window">.
+   </para>
    </sect2>
  
    <sect2 id="sql-syntax-type-casts">
@@ -1374,7 +1728,7 @@ CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable>
     </para>
  
     <para>
-    An explicit type cast may usually be omitted if there is no ambiguity as
+    An explicit type cast can usually be omitted if there is no ambiguity as
      to the type that a value expression must produce (for example, when it is
      assigned to a table column); the system will automatically apply a
      type cast in such cases.  However, automatic casting is only done for
@@ -1392,22 +1746,27 @@ CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable>
  </synopsis>
      However, this only works for types whose names are also valid as
      function names.  For example, <literal>double precision</literal>
-    can't be used this way, but the equivalent <literal>float8</literal>
+    cannot be used this way, but the equivalent <literal>float8</literal>
      can.  Also, the names <literal>interval</>, <literal>time</>, and
      <literal>timestamp</> can only be used in this fashion if they are
      double-quoted, because of syntactic conflicts.  Therefore, the use of
      the function-like cast syntax leads to inconsistencies and should
-    probably be avoided in new applications.
-
-    (The function-like syntax is in fact just a function call.  When
-    one of the two standard cast syntaxes is used to do a run-time
-    conversion, it will internally invoke a registered function to
-    perform the conversion.  By convention, these conversion functions
-    have the same name as their output type, and thus the <quote>function-like
-    syntax</> is nothing more than a direct invocation of the underlying
-    conversion function.  Obviously, this is not something that a portable
-    application should rely on.)
+    probably be avoided.
     </para>
+
+   <note>
+    <para>
+     The function-like syntax is in fact just a function call.  When
+     one of the two standard cast syntaxes is used to do a run-time
+     conversion, it will internally invoke a registered function to
+     perform the conversion.  By convention, these conversion functions
+     have the same name as their output type, and thus the <quote>function-like
+     syntax</> is nothing more than a direct invocation of the underlying
+     conversion function.  Obviously, this is not something that a portable
+     application should rely on.  For further details see
+     <xref linkend="sql-createcast" endterm="sql-createcast-title">.
+    </para>
+   </note>
    </sect2>
  
    <sect2 id="sql-syntax-scalar-subqueries">
@@ -1456,12 +1815,12 @@ SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
  
     <para>
      An array constructor is an expression that builds an
-    array value from values for its member elements.  A simple array
-    constructor 
+    array value using values for its member elements.  A simple array
+    constructor
      consists of the key word <literal>ARRAY</literal>, a left square bracket
-    <literal>[</>, one or more expressions (separated by commas) for the
+    <literal>[</>, a list of expressions (separated by commas) for the
      array element values, and finally a right square bracket <literal>]</>.
-    For example,
+    For example:
  <programlisting>
  SELECT ARRAY[1,2,3+4];
    array
@@ -1469,15 +1828,28 @@ SELECT ARRAY[1,2,3+4];
   {1,2,7}
  (1 row)
  </programlisting>
-    The array element type is the common type of the member expressions,
+    By default,
+    the array element type is the common type of the member expressions,
      determined using the same rules as for <literal>UNION</> or
-    <literal>CASE</> constructs (see <xref linkend="typeconv-union-case">). 
+    <literal>CASE</> constructs (see <xref linkend="typeconv-union-case">).
+    You can override this by explicitly casting the array constructor to the
+    desired type, for example:
+<programlisting>
+SELECT ARRAY[1,2,22.7]::integer[];
+  array
+----------
+ {1,2,23}
+(1 row)
+</programlisting>
+    This has the same effect as casting each expression to the array
+    element type individually.
+    For more on casting, see <xref linkend="sql-syntax-type-casts">.
     </para>
  
     <para>
      Multidimensional array values can be built by nesting array
      constructors.
-    In the inner constructors, the key word <literal>ARRAY</literal> may
+    In the inner constructors, the key word <literal>ARRAY</literal> can
      be omitted.  For example, these produce the same result:
  
  <programlisting>
@@ -1496,6 +1868,8 @@ SELECT ARRAY[[1,2],[3,4]];
  
      Since multidimensional arrays must be rectangular, inner constructors
      at the same level must produce sub-arrays of identical dimensions.
+    Any cast applied to the outer <literal>ARRAY</> constructor propagates
+    automatically to all the inner constructors.
    </para>
  
    <para>
@@ -1516,6 +1890,19 @@ SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
    </para>
  
    <para>
+   You can construct an empty array, but since it's impossible to have an
+   array with no type, you must explicitly cast your empty array to the
+   desired type.  For example:
+<programlisting>
+SELECT ARRAY[]::integer[];
+ array
+-------
+ {}
+(1 row)
+</programlisting>
+  </para>
+
+  <para>
     It is also possible to construct an array from the results of a
     subquery.  In this form, the array constructor is written with the
     key word <literal>ARRAY</literal> followed by a parenthesized (not
@@ -1560,11 +1947,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  
     <para>
      A row constructor is an expression that builds a row value (also
-    called a composite value) from values
+    called a composite value) using values
      for its member fields.  A row constructor consists of the key word
      <literal>ROW</literal>, a left parenthesis, zero or more
      expressions (separated by commas) for the row field values, and finally
-    a right parenthesis.  For example,
+    a right parenthesis.  For example:
  <programlisting>
  SELECT ROW(1,2.5,'this is a test');
  </programlisting>
@@ -1573,10 +1960,35 @@ SELECT ROW(1,2.5,'this is a test');
     </para>
  
     <para>
+    A row constructor can include the syntax
+    <replaceable>rowvalue</replaceable><literal>.*</literal>,
+    which will be expanded to a list of the elements of the row value,
+    just as occurs when the <literal>.*</> syntax is used at the top level
+    of a <command>SELECT</> list.  For example, if table <literal>t</> has
+    columns <literal>f1</> and <literal>f2</>, these are the same:
+<programlisting>
+SELECT ROW(t.*, 42) FROM t;
+SELECT ROW(t.f1, t.f2, 42) FROM t;
+</programlisting>
+   </para>
+
+   <note>
+    <para>
+     Before <productname>PostgreSQL</productname> 8.2, the
+     <literal>.*</literal> syntax was not expanded, so that writing
+     <literal>ROW(t.*, 42)</> created a two-field row whose first field
+     was another row value.  The new behavior is usually more useful.
+     If you need the old behavior of nested row values, write the inner
+     row value without <literal>.*</literal>, for instance
+     <literal>ROW(t, 42)</>.
+    </para>
+   </note>
+
+   <para>
      By default, the value created by a <literal>ROW</> expression is of
      an anonymous record type.  If necessary, it can be cast to a named
      composite type &mdash; either the row type of a table, or a composite type
-    created with <command>CREATE TYPE AS</>.  An explicit cast may be needed
+    created with <command>CREATE TYPE AS</>.  An explicit cast might be needed
      to avoid ambiguity.  For example:
  <programlisting>
  CREATE TABLE mytable(f1 int, f2 float, f3 text);
@@ -1617,11 +2029,11 @@ SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
     in a composite-type table column, or to be passed to a function that
     accepts a composite parameter.  Also,
     it is possible to compare two row values or test a row with
-   <literal>IS NULL</> or <literal>IS NOT NULL</>, for example
+   <literal>IS NULL</> or <literal>IS NOT NULL</>, for example:
  <programlisting>
  SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');
  
-SELECT ROW(a, b, c) IS NOT NULL FROM table;
+SELECT ROW(table.*) IS NULL FROM table;  -- detect all-null rows
  </programlisting>
     For more detail see <xref linkend="functions-comparisons">.
     Row constructors can also be used in connection with subqueries,
@@ -1647,12 +2059,12 @@ SELECT ROW(a, b, c) IS NOT NULL FROM table;
     <para>
      Furthermore, if the result of an expression can be determined by
      evaluating only some parts of it, then other subexpressions
-    might not be evaluated at all.  For instance, if one wrote
+    might not be evaluated at all.  For instance, if one wrote:
  <programlisting>
  SELECT true OR somefunc();
  </programlisting>
      then <literal>somefunc()</literal> would (probably) not be called
-    at all. The same would be the case if one wrote
+    at all. The same would be the case if one wrote:
  <programlisting>
  SELECT somefunc() OR true;
  </programlisting>
@@ -1667,45 +2079,28 @@ SELECT somefunc() OR true;
      rely on side effects or evaluation order in <literal>WHERE</> and <literal>HAVING</> clauses,
      since those clauses are extensively reprocessed as part of
      developing an execution plan.  Boolean
-    expressions (<literal>AND</>/<literal>OR</>/<literal>NOT</> combinations) in those clauses may be reorganized
+    expressions (<literal>AND</>/<literal>OR</>/<literal>NOT</> combinations) in those clauses can be reorganized
      in any manner allowed by the laws of Boolean algebra.
     </para>
  
     <para>
      When it is essential to force evaluation order, a <literal>CASE</>
-    construct (see <xref linkend="functions-conditional">) may be
+    construct (see <xref linkend="functions-conditional">) can be
      used.  For example, this is an untrustworthy way of trying to
      avoid division by zero in a <literal>WHERE</> clause:
  <programlisting>
-SELECT ... WHERE x &lt;&gt; 0 AND y/x &gt; 1.5;
+SELECT ... WHERE x &gt; 0 AND y/x &gt; 1.5;
  </programlisting>
      But this is safe:
  <programlisting>
-SELECT ... WHERE CASE WHEN x &lt;&gt; 0 THEN y/x &gt; 1.5 ELSE false END;
+SELECT ... WHERE CASE WHEN x &gt; 0 THEN y/x &gt; 1.5 ELSE false END;
  </programlisting>
      A <literal>CASE</> construct used in this fashion will defeat optimization
      attempts, so it should only be done when necessary.  (In this particular
-    example, it would doubtless be best to sidestep the problem by writing
+    example, it would be better to sidestep the problem by writing
      <literal>y &gt; 1.5*x</> instead.)
     </para>
    </sect2>
   </sect1>
  
  </chapter>
-
-<!-- Keep this comment at the end of the file
-Local variables:
-mode:sgml
-sgml-omittag:nil
-sgml-shorttag:t
-sgml-minimize-attributes:nil
-sgml-always-quote-attributes:t
-sgml-indent-step:1
-sgml-indent-data:t
-sgml-parent-document:nil
-sgml-default-dtd-file:"./reference.ced"
-sgml-exposed-tags:nil
-sgml-local-catalogs:("/usr/lib/sgml/catalog")
-sgml-local-ecat-files:nil
-End:
--->
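
The frame_clause paragraph added in the window-function hunk above distinguishes the default RANGE frame (rows up through the current row's last peer) from a ROWS frame (rows up through the current row itself). A minimal sketch of that difference follows. It is not part of the patch: the table frame_demo and its data are hypothetical, and it assumes a server that accepts the frame clauses listed in the synopsis.

```sql
-- Hypothetical data: two rows tie on y = 2, so they are peers under ORDER BY y.
CREATE TEMP TABLE frame_demo (x int, y int);
INSERT INTO frame_demo VALUES (1, 1), (1, 2), (1, 2), (1, 3);

SELECT y,
       -- default frame: RANGE UNBOUNDED PRECEDING, counts through the last peer
       count(*) OVER (PARTITION BY x ORDER BY y) AS range_default,
       -- ROWS frame: counts through the current row only
       count(*) OVER (PARTITION BY x ORDER BY y
                      ROWS UNBOUNDED PRECEDING) AS rows_frame,
       -- no ORDER BY: the default frame covers the whole partition
       count(*) OVER (PARTITION BY x) AS whole_partition
FROM frame_demo
ORDER BY y;
```

Under these assumptions, both y = 2 rows should report the same range_default because they are peers, while their rows_frame values differ by one. Which of the two tied rows receives the smaller rows_frame value is not determined by the ordering, which is the implementation-dependent behavior the paragraph warns about.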