-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.122 2008/03/20 21:42:47 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.135 2009/09/21 22:22:07 petere Exp $ -->
<chapter id="sql-syntax">
<title>SQL Syntax</title>
<para>
This chapter describes the syntax of SQL. It forms the foundation
for understanding the following chapters which will go into detail
- about how the SQL commands are applied to define and modify data.
+ about how SQL commands are applied to define and modify data.
</para>
<para>
We also advise users who are already familiar with SQL to read this
- chapter carefully because there are several rules and concepts that
+ chapter carefully because it contains several rules and concepts that
are implemented inconsistently among SQL databases or that are
specific to <productname>PostgreSQL</productname>.
</para>
</para>
<para>
+ <indexterm><primary>Unicode escape</primary><secondary>in
+ identifiers</secondary></indexterm> A variant of quoted
+ identifiers allows including escaped Unicode characters identified
+ by their code points. This variant starts
+ with <literal>U&</literal> (upper or lower case U followed by
+ ampersand) immediately before the opening double quote, without
+ any spaces in between, for example <literal>U&"foo"</literal>.
+ (Note that this creates an ambiguity with the
+ operator <literal>&</literal>. Use spaces around the operator to
+ avoid this problem.) Inside the quotes, Unicode characters can be
+ specified in escaped form by writing a backslash followed by the
+ four-digit hexadecimal code point number or alternatively a
+ backslash followed by a plus sign followed by a six-digit
+ hexadecimal code point number. For example, the
+ identifier <literal>"data"</literal> could be written as
+<programlisting>
+U&"d\0061t\+000061"
+</programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&"\0441\043B\043E\043D"
+</programlisting>
+ </para>
+
+ <para>
+ If a different escape character than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+<programlisting>
+U&"d!0061t!+000061" UESCAPE '!'
+</programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character. Note that the escape character is
+ written in single quotes, not double quotes.
+ </para>
+
+ <para>
+ To include the escape character in the identifier literally, write
+ it twice.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be specified.
+ Both the 4-digit and the 6-digit form can be used to specify
+ UTF-16 surrogate pairs to compose characters with code points
+ larger than <literal>\FFFF</literal> (although the availability of
+ the 6-digit form technically makes this unnecessary).
+ </para>
+
+ <para>
Quoting an identifier also makes it case-sensitive, whereas
unquoted names are always folded to lower case. For example, the
identifiers <literal>FOO</literal>, <literal>foo</literal>, and
bounded by single quotes (<literal>'</literal>), for example
<literal>'This is a string'</literal>. To include
a single-quote character within a string constant,
- write two adjacent single quotes, e.g.
+ write two adjacent single quotes, e.g.,
<literal>'Dianne''s horse'</literal>.
Note that this is <emphasis>not</> the same as a double-quote
- character (<literal>"</>).
+ character (<literal>"</>). <!-- font-lock sanity: " -->
</para>
<para>
by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
following the standard.)
</para>
+ </sect3>
- <para>
- <indexterm>
+ <sect3 id="sql-syntax-strings-escape">
+ <title>String Constants with C-Style Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-escape">
<primary>escape string syntax</primary>
</indexterm>
- <indexterm>
+ <indexterm zone="sql-syntax-strings-escape">
<primary>backslash escapes</primary>
</indexterm>
+
+ <para>
<productname>PostgreSQL</productname> also accepts <quote>escape</>
string constants, which are an extension to the SQL standard.
An escape string constant is specified by writing the letter
<literal>E</literal> (upper or lower case) just before the opening single
- quote, e.g. <literal>E'foo'</>. (When continuing an escape string
+ quote, e.g., <literal>E'foo'</>. (When continuing an escape string
constant across lines, write <literal>E</> only before the first opening
quote.)
Within an escape string, a backslash character (<literal>\</>) begins a
C-like <firstterm>backslash escape</> sequence, in which the combination
- of backslash and following character(s) represents a special byte value.
- <literal>\b</literal> is a backspace,
- <literal>\f</literal> is a form feed,
- <literal>\n</literal> is a newline,
- <literal>\r</literal> is a carriage return,
- <literal>\t</literal> is a tab.
- Also supported are
- <literal>\<replaceable>digits</replaceable></literal>, where
- <replaceable>digits</replaceable> represents an octal byte value, and
- <literal>\x<replaceable>hexdigits</replaceable></literal>, where
- <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
- (It is your responsibility that the byte sequences you create are
- valid characters in the server character set encoding.) Any other
+ of backslash and following character(s) represent a special byte
+ value, as shown in <xref linkend="sql-backslash-table">.
+ </para>
+
+ <table id="sql-backslash-table">
+ <title>Backslash Escape Sequences</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Backslash Escape Sequence</>
+ <entry>Interpretation</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><literal>\b</literal></entry>
+ <entry>backspace</entry>
+ </row>
+ <row>
+ <entry><literal>\f</literal></entry>
+ <entry>form feed</entry>
+ </row>
+ <row>
+ <entry><literal>\n</literal></entry>
+ <entry>newline</entry>
+ </row>
+ <row>
+ <entry><literal>\r</literal></entry>
+ <entry>carriage return</entry>
+ </row>
+ <row>
+ <entry><literal>\t</literal></entry>
+ <entry>tab</entry>
+ </row>
+ <row>
+ <entry>
+ <literal>\<replaceable>o</replaceable></literal>,
+ <literal>\<replaceable>oo</replaceable></literal>,
+ <literal>\<replaceable>ooo</replaceable></literal>
+ (<replaceable>o</replaceable> = 0 - 7)
+ </entry>
+ <entry>octal byte value</entry>
+ </row>
+ <row>
+ <entry>
+ <literal>\x<replaceable>h</replaceable></literal>,
+ <literal>\x<replaceable>hh</replaceable></literal>
+ (<replaceable>h</replaceable> = 0 - 9, A - F)
+ </entry>
+ <entry>hexadecimal byte value</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Any other
character following a backslash is taken literally. Thus, to
include a backslash character, write two backslashes (<literal>\\</>).
Also, a single quote can be included in an escape string by writing
<literal>\'</literal>, in addition to the normal way of <literal>''</>.
</para>
+ <para>
+ It is your responsibility that the byte sequences you create are
+ valid characters in the server character set encoding. When the
+ server encoding is UTF-8, then the alternative Unicode escape
+ syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
+ should be used instead. (The alternative would be doing the
+ UTF-8 encoding by hand and writing out the bytes, which would be
+ very cumbersome.)
+ </para>
+
<caution>
<para>
If the configuration parameter
<xref linkend="guc-standard-conforming-strings"> is <literal>off</>,
then <productname>PostgreSQL</productname> recognizes backslash escapes
in both regular and escape string constants. This is for backward
- compatibility with the historical behavior, in which backslash escapes
+ compatibility with the historical behavior, where backslash escapes
were always recognized.
Although <varname>standard_conforming_strings</> currently defaults to
<literal>off</>, the default will change to <literal>on</> in a future
release for improved standards compliance. Applications are therefore
encouraged to migrate away from using backslash escapes. If you need
to use a backslash escape to represent a special character, write the
- constant with an <literal>E</> to be sure it will be handled the same
+ string constant with an <literal>E</> to be sure it will be handled the same
way in future releases.
</para>
</para>
</sect3>
+ <sect3 id="sql-syntax-strings-uescape">
+ <title>String Constants with Unicode Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-uescape">
+ <primary>Unicode escape</primary>
+ <secondary>in string constants</secondary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> also supports another type
+ of escape syntax for strings that allows specifying arbitrary
+ Unicode characters by code point. A Unicode escape string
+ constant starts with <literal>U&</literal> (upper or lower case
+ letter U followed by ampersand) immediately before the opening
+ quote, without any spaces in between, for
+ example <literal>U&'foo'</literal>. (Note that this creates an
+ ambiguity with the operator <literal>&</literal>. Use spaces
+ around the operator to avoid this problem.) Inside the quotes,
+ Unicode characters can be specified in escaped form by writing a
+ backslash followed by the four-digit hexadecimal code point
+ number or alternatively a backslash followed by a plus sign
+ followed by a six-digit hexadecimal code point number. For
+ example, the string <literal>'data'</literal> could be written as
+<programlisting>
+U&'d\0061t\+000061'
+</programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&'\0441\043B\043E\043D'
+</programlisting>
+ </para>
+
+ <para>
+ If a different escape character than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+<programlisting>
+U&'d!0061t!+000061' UESCAPE '!'
+</programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be
+ specified.
+ Both the 4-digit and the 6-digit form can be used to specify
+ UTF-16 surrogate pairs to compose characters with code points
+ larger than <literal>\FFFF</literal> (although the availability
+ of the 6-digit form technically makes this unnecessary).
+ </para>
+
+ <para>
+ Also, the Unicode escape syntax for string constants only works
+ when the configuration
+ parameter <xref linkend="guc-standard-conforming-strings"> is
+ turned on. This is because otherwise this syntax could confuse
+ clients that parse the SQL statements to the point that it could
+ lead to SQL injections and similar security issues. If the
+ parameter is set to off, this syntax will be rejected with an
+ error message.
+ </para>
+
+ <para>
+ To include the escape character in the string literally, write it
+ twice.
+ </para>
+ </sect3>
+
<sect3 id="sql-syntax-dollar-quoting">
<title>Dollar-Quoted String Constants</title>
</indexterm>
<para>
- A comment is an arbitrary sequence of characters beginning with
+ A comment is a sequence of characters beginning with
double dashes and extending to the end of the line, e.g.:
<programlisting>
-- This is a standard SQL comment
</programlisting>
the <literal>OPERATOR</> construct is taken to have the default precedence
shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator. This is true no matter
- which specific operator name appears inside <literal>OPERATOR()</>.
+ which specific operator appears inside <literal>OPERATOR()</>.
</para>
</sect2>
</sect1>
<itemizedlist>
<listitem>
<para>
- A constant or literal value.
+ A constant or literal value
</para>
</listitem>
<listitem>
<para>
- A column reference.
+ A column reference
</para>
</listitem>
<listitem>
<para>
A positional parameter reference, in the body of a function definition
- or prepared statement.
+ or prepared statement
</para>
</listitem>
<listitem>
<para>
- A subscripted expression.
+ A subscripted expression
</para>
</listitem>
<listitem>
<para>
- A field selection expression.
+ A field selection expression
</para>
</listitem>
<listitem>
<para>
- An operator invocation.
+ An operator invocation
</para>
</listitem>
<listitem>
<para>
- A function call.
+ A function call
</para>
</listitem>
<listitem>
<para>
- An aggregate expression.
+ An aggregate expression
</para>
</listitem>
<listitem>
<para>
- A type cast.
+ A window function call
</para>
</listitem>
<listitem>
<para>
- A scalar subquery.
+ A type cast
</para>
</listitem>
<listitem>
<para>
- An array constructor.
+ A scalar subquery
</para>
</listitem>
<listitem>
<para>
- A row constructor.
+ An array constructor
</para>
</listitem>
<listitem>
<para>
- Another value expression in parentheses, useful to group
+ A row constructor
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Another value expression in parentheses (used to group
subexpressions and override
- precedence.<indexterm><primary>parenthesis</></>
+ precedence<indexterm><primary>parenthesis</></>)
</para>
</listitem>
</itemizedlist>
</indexterm>
<para>
- A column can be referenced in the form
+ A column can be referenced in the form:
<synopsis>
<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable>
</synopsis>
</programlisting>
(Thus, a qualified column reference is actually just a special case
- of the field selection syntax.)
+ of the field selection syntax.) An important special case is
+ extracting a field from a table column that is of a composite type:
+
+<programlisting>
+(compositecol).somefield
+(mytable.compositecol).somefield
+</programlisting>
+
+ The parentheses are required here to show that
+ <structfield>compositecol</> is a column name not a table name,
+ or that <structname>mytable</> is a table name not a schema name
+ in the second case.
</para>
</sect2>
where the <replaceable>operator</replaceable> token follows the syntax
rules of <xref linkend="sql-syntax-operators">, or is one of the
key words <token>AND</token>, <token>OR</token>, and
- <token>NOT</token>, or is a qualified operator name in the form
+ <token>NOT</token>, or is a qualified operator name in the form:
<synopsis>
<literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
</synopsis>
enclosed in parentheses:
<synopsis>
-<replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
</synopsis>
</para>
<synopsis>
<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
-<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] )
+<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
<replaceable>aggregate_name</replaceable> ( * )
</synopsis>
defined aggregate (possibly qualified with a schema name), and
<replaceable>expression</replaceable> is
any value expression that does not itself contain an aggregate
- expression.
+ expression or a window function call.
</para>
<para>
</note>
</sect2>
+ <sect2 id="syntax-window-functions">
+ <title>Window Function Calls</title>
+
+ <indexterm zone="syntax-window-functions">
+ <primary>window function</primary>
+ <secondary>invocation</secondary>
+ </indexterm>
+
+ <indexterm zone="syntax-window-functions">
+ <primary>OVER clause</primary>
+ </indexterm>
+
+ <para>
+ A <firstterm>window function call</firstterm> represents the application
+ of an aggregate-like function over some portion of the rows selected
+ by a query. Unlike regular aggregate function calls, this is not tied
+ to grouping of the selected rows into a single output row — each
+ row remains separate in the query output. However the window function
+ is able to scan all the rows that would be part of the current row's
+ group according to the grouping specification (<literal>PARTITION BY</>
+ list) of the window function call.
+ The syntax of a window function call is one of the following:
+
+<synopsis>
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER ( <replaceable class="parameter">window_definition</replaceable> )
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER <replaceable>window_name</replaceable>
+<replaceable>function_name</replaceable> ( * ) OVER ( <replaceable class="parameter">window_definition</replaceable> )
+<replaceable>function_name</replaceable> ( * ) OVER <replaceable>window_name</replaceable>
+</synopsis>
+ where <replaceable class="parameter">window_definition</replaceable>
+ has the syntax
+<synopsis>
+[ <replaceable class="parameter">existing_window_name</replaceable> ]
+[ PARTITION BY <replaceable class="parameter">expression</replaceable> [, ...] ]
+[ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
+[ <replaceable class="parameter">frame_clause</replaceable> ]
+</synopsis>
+ and the optional <replaceable class="parameter">frame_clause</replaceable>
+ can be one of
+<synopsis>
+RANGE UNBOUNDED PRECEDING
+RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+ROWS UNBOUNDED PRECEDING
+ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+</synopsis>
+
+ Here, <replaceable>expression</replaceable> represents any value
+ expression that does not itself contain window function calls.
+ The <literal>PARTITION BY</> and <literal>ORDER BY</> lists have
+ essentially the same syntax and semantics as <literal>GROUP BY</>
+ and <literal>ORDER BY</> clauses of the whole query, except that their
+ expressions are always just expressions and cannot be output-column
+ names or numbers.
+ <replaceable>window_name</replaceable> is a reference to a named window
+ specification defined in the query's <literal>WINDOW</literal> clause.
+ Named window specifications are usually referenced with just
+ <literal>OVER</> <replaceable>window_name</replaceable>, but it is
+ also possible to write a window name inside the parentheses and then
+ optionally supply an ordering clause and/or frame clause (the referenced
+ window must lack these clauses, if they are supplied here).
+ This latter syntax follows the same rules as modifying an existing
+ window name within the <literal>WINDOW</literal> clause; see the
+ <xref linkend="sql-select" endterm="sql-select-title"> reference
+ page for details.
+ </para>
+
+ <para>
+ The <replaceable class="parameter">frame_clause</replaceable> specifies
+ the set of rows constituting the <firstterm>window frame</>, for those
+ window functions that act on the frame instead of the whole partition.
+ The default framing option is <literal>RANGE UNBOUNDED PRECEDING</>,
+ which is the same as <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND
+ CURRENT ROW</>; it selects rows up through the current row's last
+ peer in the <literal>ORDER BY</> ordering (which means all rows if
+ there is no <literal>ORDER BY</>). The options
+ <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</> and
+ <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</>
+ are also equivalent: they always select all rows in the partition.
+ Lastly, <literal>ROWS UNBOUNDED PRECEDING</> or its verbose equivalent
+ <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</> select
+ all rows up through the current row (regardless of duplicates).
+ Beware that this option can produce implementation-dependent results
+ if the <literal>ORDER BY</> ordering does not order the rows uniquely.
+ </para>
+
+ <para>
+ The built-in window functions are described in <xref
+ linkend="functions-window-table">. Other window functions can be added by
+ the user. Also, any built-in or user-defined aggregate function can be
+ used as a window function.
+ </para>
+
+ <para>
+ The syntaxes using <literal>*</> are used for calling parameter-less
+ aggregate functions as window functions, for example
+ <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
+ <literal>*</> is customarily not used for non-aggregate window functions.
+ Aggregate window functions, unlike normal aggregate functions, do not
+ allow <literal>DISTINCT</> to be used within the function argument list.
+ </para>
+
+ <para>
+ Window function calls are permitted only in the <literal>SELECT</literal>
+ list and the <literal>ORDER BY</> clause of the query.
+ </para>
+
+ <para>
+ More information about window functions can be found in
+ <xref linkend="tutorial-window"> and
+ <xref linkend="queries-window">.
+ </para>
+ </sect2>
+
<sect2 id="sql-syntax-type-casts">
<title>Type Casts</title>
<literal>timestamp</> can only be used in this fashion if they are
double-quoted, because of syntactic conflicts. Therefore, the use of
the function-like cast syntax leads to inconsistencies and should
- probably be avoided in new applications.
+ probably be avoided.
</para>
<note>
<para>
An array constructor is an expression that builds an
- array value from values for its member elements. A simple array
+ array value using values for its member elements. A simple array
constructor
consists of the key word <literal>ARRAY</literal>, a left square bracket
<literal>[</>, a list of expressions (separated by commas) for the
<para>
A row constructor is an expression that builds a row value (also
- called a composite value) from values
+ called a composite value) using values
for its member fields. A row constructor consists of the key word
<literal>ROW</literal>, a left parenthesis, zero or more
expressions (separated by commas) for the row field values, and finally