-<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.72 2002/10/24 17:48:54 petere Exp $
--->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.135 2009/09/21 22:22:07 petere Exp $ -->
<chapter id="sql-syntax">
<title>SQL Syntax</title>
<para>
This chapter describes the syntax of SQL. It forms the foundation
for understanding the following chapters which will go into detail
- about how the SQL commands are applied to define and modify data.
+ about how SQL commands are applied to define and modify data.
</para>
<para>
We also advise users who are already familiar with SQL to read this
- chapter carefully because there are several rules and concepts that
+ chapter carefully because it contains several rules and concepts that
are implemented inconsistently among SQL databases or that are
specific to <productname>PostgreSQL</productname>.
</para>
<sect1 id="sql-syntax-lexical">
<title>Lexical Structure</title>
+ <indexterm>
+ <primary>token</primary>
+ </indexterm>
+
<para>
SQL input consists of a sequence of
<firstterm>commands</firstterm>. A command is composed of a
a <token>SET</token> token to appear in a certain position, and
this particular variation of <command>INSERT</command> also
requires a <token>VALUES</token> in order to be complete. The
- precise syntax rules for each command are described in the
- &cite-reference;.
+ precise syntax rules for each command are described in <xref linkend="reference">.
</para>
<sect2 id="sql-syntax-identifiers">
<title>Identifiers and Key Words</title>
<indexterm zone="sql-syntax-identifiers">
- <primary>identifiers</primary>
+ <primary>identifier</primary>
+ <secondary>syntax of</secondary>
</indexterm>
<indexterm zone="sql-syntax-identifiers">
- <primary>key words</primary>
- <secondary>syntax</secondary>
+ <primary>name</primary>
+ <secondary>syntax of</secondary>
+ </indexterm>
+
+ <indexterm zone="sql-syntax-identifiers">
+ <primary>key word</primary>
+ <secondary>syntax of</secondary>
</indexterm>
<para>
(<literal>a</literal>-<literal>z</literal>, but also letters with
diacritical marks and non-Latin letters) or an underscore
(<literal>_</literal>). Subsequent characters in an identifier or
- key word can be letters, digits
- (<literal>0</literal>-<literal>9</literal>), or underscores,
- although the SQL standard will not define a key word that contains
- digits or starts or ends with an underscore.
+ key word can be letters, underscores, digits
+ (<literal>0</literal>-<literal>9</literal>), or dollar signs
+ (<literal>$</>). Note that dollar signs are not allowed in identifiers
+ according to the letter of the SQL standard, so their use might render
+ applications less portable.
+ The SQL standard will not define a key word that contains
+ digits or starts or ends with an underscore, so identifiers of this
+ form are safe against possible conflict with future extensions of the
+ standard.
</para>
<para>
+ <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm>
The system uses no more than <symbol>NAMEDATALEN</symbol>-1
- characters of an identifier; longer names can be written in
+ bytes of an identifier; longer names can be written in
commands, but they will be truncated. By default,
- <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier length
- is 63 (but at the time the system is built,
- <symbol>NAMEDATALEN</symbol> can be changed in
- <filename>src/include/postgres_ext.h</filename>).
+ <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier
+ length is 63 bytes. If this limit is problematic, it can be raised by
+ changing the <symbol>NAMEDATALEN</symbol> constant in
+ <filename>src/include/pg_config_manual.h</filename>.
</para>
<para>
<indexterm>
<primary>case sensitivity</primary>
- <secondary>SQL commands</secondary>
+ <secondary>of SQL commands</secondary>
</indexterm>
- Identifier and key word names are case insensitive. Therefore
+ Identifier and key word names are case insensitive. Therefore:
<programlisting>
UPDATE MY_TABLE SET A = 5;
</programlisting>
- can equivalently be written as
+ can equivalently be written as:
<programlisting>
uPDaTE my_TabLE SeT a = 5;
</programlisting>
A convention often used is to write key words in upper
- case and names in lower case, e.g.,
+ case and names in lower case, e.g.:
<programlisting>
UPDATE my_table SET a = 5;
</programlisting>
<para>
<indexterm>
- <primary>quotes</primary>
+ <primary>quotation marks</primary>
<secondary>and identifiers</secondary>
</indexterm>
There is a second kind of identifier: the <firstterm>delimited
</para>
<para>
- Quoted identifiers can contain any character other than a double
- quote itself. This allows constructing table or column names that
- would otherwise not be possible, such as ones containing spaces or
+ Quoted identifiers can contain any character, except the character
+ with code zero. (To include a double quote, write two double quotes.)
+ This allows constructing table or column names that would
+ otherwise not be possible, such as ones containing spaces or
ampersands. The length limitation still applies.
</para>
<para>
+ <indexterm><primary>Unicode escape</primary><secondary>in
+ identifiers</secondary></indexterm> A variant of quoted
+ identifiers allows including escaped Unicode characters identified
+ by their code points. This variant starts
+ with <literal>U&</literal> (upper or lower case U followed by
+ ampersand) immediately before the opening double quote, without
+ any spaces in between, for example <literal>U&"foo"</literal>.
+ (Note that this creates an ambiguity with the
+ operator <literal>&</literal>. Use spaces around the operator to
+ avoid this problem.) Inside the quotes, Unicode characters can be
+ specified in escaped form by writing a backslash followed by the
+ four-digit hexadecimal code point number or alternatively a
+ backslash followed by a plus sign followed by a six-digit
+ hexadecimal code point number. For example, the
+ identifier <literal>"data"</literal> could be written as
+<programlisting>
+U&"d\0061t\+000061"
+</programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&"\0441\043B\043E\043D"
+</programlisting>
+ </para>
+
+ <para>
+ If a different escape character than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+<programlisting>
+U&"d!0061t!+000061" UESCAPE '!'
+</programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character. Note that the escape character is
+ written in single quotes, not double quotes.
+ </para>
+
+ <para>
+ To include the escape character in the identifier literally, write
+ it twice.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be specified.
+ Both the 4-digit and the 6-digit form can be used to specify
+ UTF-16 surrogate pairs to compose characters with code points
+ larger than <literal>\FFFF</literal> (although the availability of
+ the 6-digit form technically makes this unnecessary).
+ </para>
+
+ <para>
Quoting an identifier also makes it case-sensitive, whereas
unquoted names are always folded to lower case. For example, the
- identifiers <literal>FOO</literal>, <literal>foo</literal> and
+ identifiers <literal>FOO</literal>, <literal>foo</literal>, and
<literal>"foo"</literal> are considered the same by
- <productname>PostgreSQL</productname>, but <literal>"Foo"</literal>
- and <literal>"FOO"</literal> are different from these three and
- each other.
- <footnote>
- <para>
- The folding of unquoted names to lower case in <productname>PostgreSQL</>
- is incompatible with the SQL standard, which says that unquoted
- names should be folded to upper case. Thus, <literal>foo</literal>
- should be equivalent to <literal>"FOO"</literal> not
- <literal>"foo"</literal> according to the standard. If you want to
- write portable applications you are advised to always quote a particular
- name or never quote it.
- </para>
- </footnote>
+ <productname>PostgreSQL</productname>, but
+ <literal>"Foo"</literal> and <literal>"FOO"</literal> are
+ different from these three and each other. (The folding of
+ unquoted names to lower case in <productname>PostgreSQL</> is
+ incompatible with the SQL standard, which says that unquoted names
+ should be folded to upper case. Thus, <literal>foo</literal>
+ should be equivalent to <literal>"FOO"</literal> not
+ <literal>"foo"</literal> according to the standard. If you want
+ to write portable applications you are advised to always quote a
+ particular name or never quote it.)
</para>
</sect2>
<title>Constants</title>
<indexterm zone="sql-syntax-constants">
- <primary>constants</primary>
+ <primary>constant</primary>
</indexterm>
<para>
strings, bit strings, and numbers.
Constants can also be specified with explicit types, which can
enable more accurate representation and more efficient handling by
- the system. The implicit constants are described below; explicit
- constants are discussed afterwards.
+ the system. These alternatives are discussed in the following
+ subsections.
</para>
<sect3 id="sql-syntax-strings">
<title>String Constants</title>
<indexterm zone="sql-syntax-strings">
- <primary>character strings</primary>
- <secondary>constants</secondary>
+ <primary>character string</primary>
+ <secondary>constant</secondary>
</indexterm>
<para>
<indexterm>
- <primary>quotes</primary>
+ <primary>quotation marks</primary>
<secondary>escaping</secondary>
</indexterm>
A string constant in SQL is an arbitrary sequence of characters
- bounded by single quotes (<quote>'</quote>), e.g., <literal>'This
- is a string'</literal>. SQL allows single quotes to be embedded
- in strings by typing two adjacent single quotes (e.g.,
- <literal>'Dianne''s horse'</literal>). In
- <productname>PostgreSQL</productname> single quotes may
- alternatively be escaped with a backslash (<quote>\</quote>,
- e.g., <literal>'Dianne\'s horse'</literal>).
- </para>
-
- <para>
- C-style backslash escapes are also available:
- <literal>\b</literal> is a backspace, <literal>\f</literal> is a
- form feed, <literal>\n</literal> is a newline,
- <literal>\r</literal> is a carriage return, <literal>\t</literal>
- is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
- where <replaceable>xxx</replaceable> is an octal number, is the
- character with the corresponding ASCII code. Any other character
- following a backslash is taken literally. Thus, to include a
- backslash in a string constant, type two backslashes.
- </para>
-
- <para>
- The character with the code zero cannot be in a string constant.
+ bounded by single quotes (<literal>'</literal>), for example
+ <literal>'This is a string'</literal>. To include
+ a single-quote character within a string constant,
+ write two adjacent single quotes, e.g.,
+ <literal>'Dianne''s horse'</literal>.
+ Note that this is <emphasis>not</> the same as a double-quote
+ character (<literal>"</>). <!-- font-lock sanity: " -->
</para>
<para>
Two string constants that are only separated by whitespace
<emphasis>with at least one newline</emphasis> are concatenated
- and effectively treated as if the string had been written in one
+ and effectively treated as if the string had been written as one
constant. For example:
<programlisting>
SELECT 'foo'
'bar';
</programlisting>
- is equivalent to
+ is equivalent to:
<programlisting>
SELECT 'foobar';
</programlisting>
- but
+ but:
<programlisting>
SELECT 'foo' 'bar';
</programlisting>
is not valid syntax. (This slightly bizarre behavior is specified
- by <acronym>SQL9x</acronym>; <productname>PostgreSQL</productname> is
+ by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
following the standard.)
</para>
</sect3>
+ <sect3 id="sql-syntax-strings-escape">
+ <title>String Constants with C-Style Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-escape">
+ <primary>escape string syntax</primary>
+ </indexterm>
+ <indexterm zone="sql-syntax-strings-escape">
+ <primary>backslash escapes</primary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> also accepts <quote>escape</>
+ string constants, which are an extension to the SQL standard.
+ An escape string constant is specified by writing the letter
+ <literal>E</literal> (upper or lower case) just before the opening single
+ quote, e.g., <literal>E'foo'</>. (When continuing an escape string
+ constant across lines, write <literal>E</> only before the first opening
+ quote.)
+ Within an escape string, a backslash character (<literal>\</>) begins a
+ C-like <firstterm>backslash escape</> sequence, in which the combination
+ of backslash and following character(s) represent a special byte
+ value, as shown in <xref linkend="sql-backslash-table">.
+ </para>
+
+ <table id="sql-backslash-table">
+ <title>Backslash Escape Sequences</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Backslash Escape Sequence</>
+ <entry>Interpretation</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><literal>\b</literal></entry>
+ <entry>backspace</entry>
+ </row>
+ <row>
+ <entry><literal>\f</literal></entry>
+ <entry>form feed</entry>
+ </row>
+ <row>
+ <entry><literal>\n</literal></entry>
+ <entry>newline</entry>
+ </row>
+ <row>
+ <entry><literal>\r</literal></entry>
+ <entry>carriage return</entry>
+ </row>
+ <row>
+ <entry><literal>\t</literal></entry>
+ <entry>tab</entry>
+ </row>
+ <row>
+ <entry>
+ <literal>\<replaceable>o</replaceable></literal>,
+ <literal>\<replaceable>oo</replaceable></literal>,
+ <literal>\<replaceable>ooo</replaceable></literal>
+ (<replaceable>o</replaceable> = 0 - 7)
+ </entry>
+ <entry>octal byte value</entry>
+ </row>
+ <row>
+ <entry>
+ <literal>\x<replaceable>h</replaceable></literal>,
+ <literal>\x<replaceable>hh</replaceable></literal>
+ (<replaceable>h</replaceable> = 0 - 9, A - F)
+ </entry>
+ <entry>hexadecimal byte value</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ Any other
+ character following a backslash is taken literally. Thus, to
+ include a backslash character, write two backslashes (<literal>\\</>).
+ Also, a single quote can be included in an escape string by writing
+ <literal>\'</literal>, in addition to the normal way of <literal>''</>.
+ </para>
+
+ <para>
+ It is your responsibility that the byte sequences you create are
+ valid characters in the server character set encoding. When the
+ server encoding is UTF-8, then the alternative Unicode escape
+ syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
+ should be used instead. (The alternative would be doing the
+ UTF-8 encoding by hand and writing out the bytes, which would be
+ very cumbersome.)
+ </para>
+
+ <caution>
+ <para>
+ If the configuration parameter
+ <xref linkend="guc-standard-conforming-strings"> is <literal>off</>,
+ then <productname>PostgreSQL</productname> recognizes backslash escapes
+ in both regular and escape string constants. This is for backward
+ compatibility with the historical behavior, where backslash escapes
+ were always recognized.
+ Although <varname>standard_conforming_strings</> currently defaults to
+ <literal>off</>, the default will change to <literal>on</> in a future
+ release for improved standards compliance. Applications are therefore
+ encouraged to migrate away from using backslash escapes. If you need
+ to use a backslash escape to represent a special character, write the
+ string constant with an <literal>E</> to be sure it will be handled the same
+ way in future releases.
+ </para>
+
+ <para>
+ In addition to <varname>standard_conforming_strings</>, the configuration
+ parameters <xref linkend="guc-escape-string-warning"> and
+ <xref linkend="guc-backslash-quote"> govern treatment of backslashes
+ in string constants.
+ </para>
+ </caution>
+
+ <para>
+ The character with the code zero cannot be in a string constant.
+ </para>
+ </sect3>
+
+ <sect3 id="sql-syntax-strings-uescape">
+ <title>String Constants with Unicode Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-uescape">
+ <primary>Unicode escape</primary>
+ <secondary>in string constants</secondary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> also supports another type
+ of escape syntax for strings that allows specifying arbitrary
+ Unicode characters by code point. A Unicode escape string
+ constant starts with <literal>U&</literal> (upper or lower case
+ letter U followed by ampersand) immediately before the opening
+ quote, without any spaces in between, for
+ example <literal>U&'foo'</literal>. (Note that this creates an
+ ambiguity with the operator <literal>&</literal>. Use spaces
+ around the operator to avoid this problem.) Inside the quotes,
+ Unicode characters can be specified in escaped form by writing a
+ backslash followed by the four-digit hexadecimal code point
+ number or alternatively a backslash followed by a plus sign
+ followed by a six-digit hexadecimal code point number. For
+ example, the string <literal>'data'</literal> could be written as
+<programlisting>
+U&'d\0061t\+000061'
+</programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+<programlisting>
+U&'\0441\043B\043E\043D'
+</programlisting>
+ </para>
+
+ <para>
+ If a different escape character than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+<programlisting>
+U&'d!0061t!+000061' UESCAPE '!'
+</programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be
+ specified.
+ Both the 4-digit and the 6-digit form can be used to specify
+ UTF-16 surrogate pairs to compose characters with code points
+ larger than <literal>\FFFF</literal> (although the availability
+ of the 6-digit form technically makes this unnecessary).
+ </para>
+
+ <para>
+ Also, the Unicode escape syntax for string constants only works
+ when the configuration
+ parameter <xref linkend="guc-standard-conforming-strings"> is
+ turned on. This is because otherwise this syntax could confuse
+ clients that parse the SQL statements to the point that it could
+ lead to SQL injections and similar security issues. If the
+ parameter is set to off, this syntax will be rejected with an
+ error message.
+ </para>
+
+ <para>
+ To include the escape character in the string literally, write it
+ twice.
+ </para>
+ </sect3>
+
+ <sect3 id="sql-syntax-dollar-quoting">
+ <title>Dollar-Quoted String Constants</title>
+
+ <indexterm>
+ <primary>dollar quoting</primary>
+ </indexterm>
+
+ <para>
+ While the standard syntax for specifying string constants is usually
+ convenient, it can be difficult to understand when the desired string
+ contains many single quotes or backslashes, since each of those must
+ be doubled. To allow more readable queries in such situations,
+ <productname>PostgreSQL</productname> provides another way, called
+ <quote>dollar quoting</quote>, to write string constants.
+ A dollar-quoted string constant
+ consists of a dollar sign (<literal>$</literal>), an optional
+ <quote>tag</quote> of zero or more characters, another dollar
+ sign, an arbitrary sequence of characters that makes up the
+ string content, a dollar sign, the same tag that began this
+ dollar quote, and a dollar sign. For example, here are two
+ different ways to specify the string <quote>Dianne's horse</>
+ using dollar quoting:
+<programlisting>
+$$Dianne's horse$$
+$SomeTag$Dianne's horse$SomeTag$
+</programlisting>
+ Notice that inside the dollar-quoted string, single quotes can be
+ used without needing to be escaped. Indeed, no characters inside
+ a dollar-quoted string are ever escaped: the string content is always
+ written literally. Backslashes are not special, and neither are
+ dollar signs, unless they are part of a sequence matching the opening
+ tag.
+ </para>
+
+ <para>
+ It is possible to nest dollar-quoted string constants by choosing
+ different tags at each nesting level. This is most commonly used in
+ writing function definitions. For example:
+<programlisting>
+$function$
+BEGIN
+ RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
+END;
+$function$
+</programlisting>
+ Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</> represents a
+ dollar-quoted literal string <literal>[\t\r\n\v\\]</>, which will
+ be recognized when the function body is executed by
+ <productname>PostgreSQL</>. But since the sequence does not match
+ the outer dollar quoting delimiter <literal>$function$</>, it is
+ just some more characters within the constant so far as the outer
+ string is concerned.
+ </para>
+
+ <para>
+ The tag, if any, of a dollar-quoted string follows the same rules
+ as an unquoted identifier, except that it cannot contain a dollar sign.
+ Tags are case sensitive, so <literal>$tag$String content$tag$</literal>
+ is correct, but <literal>$TAG$String content$tag$</literal> is not.
+ </para>
+
+ <para>
+ A dollar-quoted string that follows a keyword or identifier must
+ be separated from it by whitespace; otherwise the dollar quoting
+ delimiter would be taken as part of the preceding identifier.
+ </para>
+
+ <para>
+ Dollar quoting is not part of the SQL standard, but it is often a more
+ convenient way to write complicated string literals than the
+ standard-compliant single quote syntax. It is particularly useful when
+ representing string constants inside other constants, as is often needed
+ in procedural function definitions. With single-quote syntax, each
+ backslash in the above example would have to be written as four
+ backslashes, which would be reduced to two backslashes in parsing the
+ original string constant, and then to one when the inner string constant
+ is re-parsed during function execution.
+ </para>
+ </sect3>
+
<sect3 id="sql-syntax-bit-strings">
<title>Bit-String Constants</title>
<indexterm zone="sql-syntax-bit-strings">
- <primary>bit strings</primary>
- <secondary>constants</secondary>
+ <primary>bit string</primary>
+ <secondary>constant</secondary>
</indexterm>
<para>
- Bit-string constants look like string constants with a
+ Bit-string constants look like regular string constants with a
<literal>B</literal> (upper or lower case) immediately before the
opening quote (no intervening whitespace), e.g.,
<literal>B'1001'</literal>. The only characters allowed within
Alternatively, bit-string constants can be specified in hexadecimal
notation, using a leading <literal>X</literal> (upper or lower case),
e.g., <literal>X'1FF'</literal>. This notation is equivalent to
- a bit-string constant with four binary digits for each hex digit.
+ a bit-string constant with four binary digits for each hexadecimal digit.
</para>
<para>
Both forms of bit-string constant can be continued
across lines in the same way as regular string constants.
+ Dollar quoting cannot be used in a bit-string constant.
</para>
</sect3>
<title>Numeric Constants</title>
<indexterm>
- <primary>numeric</primary>
- <secondary>constants</secondary>
+ <primary>number</primary>
+ <secondary>constant</secondary>
</indexterm>
<para>
digits (0 through 9). At least one digit must be before or after the
decimal point, if one is used. At least one digit must follow the
exponent marker (<literal>e</literal>), if one is present.
- There may not be any spaces or other characters embedded in the
- constant. Notice that any leading plus or minus sign is not actually
+ There cannot be any spaces or other characters embedded in the
+ constant. Note that any leading plus or minus sign is not actually
considered part of the constant; it is an operator applied to the
constant.
</para>
</para>
<para>
+ <indexterm><primary>integer</primary></indexterm>
+ <indexterm><primary>bigint</primary></indexterm>
+ <indexterm><primary>numeric</primary></indexterm>
A numeric constant that contains neither a decimal point nor an
exponent is initially presumed to be type <type>integer</> if its
value fits in type <type>integer</> (32 bits); otherwise it is
presumed to be type <type>bigint</> if its
value fits in type <type>bigint</> (64 bits); otherwise it is
- taken to be type <type>numeric</>. Constants that contain decimal
+ taken to be type <type>numeric</>. Constants that contain decimal
points and/or exponents are always initially presumed to be type
<type>numeric</>.
</para>
<para>
The initially assigned data type of a numeric constant is just a
- starting point for the type resolution algorithms. In most
- cases the constant will be automatically coerced to the most
- appropriate type depending on context. When necessary, you
- can force a numeric value to be interpreted as a specific
- data type by casting it. For example, you can force a numeric
- value to be treated as type <type>real</> (<type>float4</>)
- by writing
+ starting point for the type resolution algorithms. In most cases
+ the constant will be automatically coerced to the most
+ appropriate type depending on context. When necessary, you can
+ force a numeric value to be interpreted as a specific data type
+ by casting it.<indexterm><primary>type cast</primary></indexterm>
+ For example, you can force a numeric value to be treated as type
+ <type>real</> (<type>float4</>) by writing:
<programlisting>
REAL '1.23' -- string style
1.23::REAL -- PostgreSQL (historical) style
</programlisting>
- </para>
- </sect3>
+
+ These are actually just special cases of the general casting
+ notations discussed next.
+ </para>
+ </sect3>
<sect3 id="sql-syntax-constants-generic">
<title>Constants of Other Types</title>
<indexterm>
- <primary>data types</primary>
- <secondary>constants</secondary>
+ <primary>data type</primary>
+ <secondary>constant</secondary>
</indexterm>
<para>
'<replaceable>string</replaceable>'::<replaceable>type</replaceable>
CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
</synopsis>
- The string's text is passed to the input conversion
+ The string constant's text is passed to the input conversion
routine for the type called <replaceable>type</replaceable>. The
result is a constant of the indicated type. The explicit type
- cast may be omitted if there is no ambiguity as to the type the
- constant must be (for example, when it is passed as an argument
- to a non-overloaded function), in which case it is automatically
- coerced.
+ cast can be omitted if there is no ambiguity as to the type the
+ constant must be (for example, when it is assigned directly to a
+ table column), in which case it is automatically coerced.
+ </para>
+
+ <para>
+ The string constant can be written using either regular SQL
+ notation or dollar-quoting.
</para>
<para>
<synopsis>
<replaceable>typename</replaceable> ( '<replaceable>string</replaceable>' )
</synopsis>
- but not all type names may be used in this way; see <xref
+ but not all type names can be used in this way; see <xref
linkend="sql-syntax-type-casts"> for details.
</para>
The <literal>::</literal>, <literal>CAST()</literal>, and
function-call syntaxes can also be used to specify run-time type
conversions of arbitrary expressions, as discussed in <xref
- linkend="sql-syntax-type-casts">. But the form
- <replaceable>type</replaceable> '<replaceable>string</replaceable>'
- can only be used to specify the type of a literal constant.
- Another restriction on
- <replaceable>type</replaceable> '<replaceable>string</replaceable>'
- is that it does not work for array types; use <literal>::</literal>
+ linkend="sql-syntax-type-casts">. To avoid syntactic ambiguity, the
+ <literal><replaceable>type</> '<replaceable>string</>'</literal>
+ syntax can only be used to specify the type of a simple literal constant.
+ Another restriction on the
+ <literal><replaceable>type</> '<replaceable>string</>'</literal>
+ syntax is that it does not work for array types; use <literal>::</literal>
or <literal>CAST()</literal> to specify the type of an array constant.
</para>
- </sect3>
-
- <sect3>
- <title>Array constants</title>
-
- <indexterm>
- <primary>arrays</primary>
- <secondary>constants</secondary>
- </indexterm>
-
- <para>
- The general format of an array constant is the following:
-<synopsis>
-'{ <replaceable>val1</replaceable> <replaceable>delim</replaceable> <replaceable>val2</replaceable> <replaceable>delim</replaceable> ... }'
-</synopsis>
- where <replaceable>delim</replaceable> is the delimiter character
- for the type, as recorded in its <literal>pg_type</literal>
- entry. (For all built-in types, this is the comma character
- <quote><literal>,</literal></>.) Each <replaceable>val</replaceable> is either a constant
- of the array element type, or a subarray. An example of an
- array constant is
-<programlisting>
-'{{1,2,3},{4,5,6},{7,8,9}}'
-</programlisting>
- This constant is a two-dimensional, 3-by-3 array consisting of three
- subarrays of integers.
- </para>
-
- <para>
- Individual array elements can be placed between double-quote
- marks (<literal>"</literal>) <!-- " --> to avoid ambiguity
- problems with respect to whitespace. Without quote marks, the
- array-value parser will skip leading whitespace.
- </para>
<para>
- (Array constants are actually only a special case of the generic
- type constants discussed in the previous section. The constant
- is initially treated as a string and passed to the array input
- conversion routine. An explicit type specification might be
- necessary.)
+ The <literal>CAST()</> syntax conforms to SQL. The
+ <literal><replaceable>type</> '<replaceable>string</>'</literal>
+ syntax is a generalization of the standard: SQL specifies this syntax only
+ for a few data types, but <productname>PostgreSQL</productname> allows it
+ for all types. The syntax with
+ <literal>::</literal> is historical <productname>PostgreSQL</productname>
+ usage, as is the function-call syntax.
</para>
</sect3>
</sect2>
-
<sect2 id="sql-syntax-operators">
<title>Operators</title>
<indexterm zone="sql-syntax-operators">
- <primary>operators</primary>
+ <primary>operator</primary>
<secondary>syntax</secondary>
</indexterm>
<para>
- An operator is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
+ An operator name is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
(63 by default) characters from the following list:
<literallayout>
-+ - * / < > = ~ ! @ # % ^ & | ` ? $
++ - * / < > = ~ ! @ # % ^ & | ` ?
</literallayout>
There are a few restrictions on operator names, however:
<itemizedlist>
<listitem>
<para>
- <literal>$</> (dollar) cannot be a single-character operator, although it
- can be part of a multiple-character operator name.
- </para>
- </listitem>
-
- <listitem>
- <para>
<literal>--</literal> and <literal>/*</literal> cannot appear
anywhere in an operator name, since they will be taken as the
start of a comment.
A multiple-character operator name cannot end in <literal>+</> or <literal>-</>,
unless the name also contains at least one of these characters:
<literallayout>
-~ ! @ # % ^ & | ` ? $
+~ ! @ # % ^ & | ` ?
</literallayout>
For example, <literal>@-</literal> is an allowed operator name,
but <literal>*-</literal> is not. This restriction allows
<listitem>
<para>
A dollar sign (<literal>$</literal>) followed by digits is used
- to represent the positional parameters in the body of a function
- definition. In other contexts the dollar sign may be part of an
- operator name.
+ to represent a positional parameter in the body of a function
+ definition or a prepared statement. In other contexts the
+ dollar sign can be part of an identifier or a dollar-quoted string
+ constant.
</para>
</listitem>
<listitem>
<para>
- The asterisk (<literal>*</literal>) has a special meaning when
- used in the <command>SELECT</command> command or with the
- <function>COUNT</function> aggregate function.
+ The asterisk (<literal>*</literal>) is used in some contexts to denote
+ all the fields of a table row or composite value. It also
+ has a special meaning when used as the argument of an
+ aggregate function, namely that the aggregate does not require
+ any explicit parameter.
</para>
</listitem>
<listitem>
<para>
- The period (<literal>.</literal>) is used in floating-point
+ The period (<literal>.</literal>) is used in numeric
constants, and to separate schema, table, and column names.
</para>
</listitem>
<title>Comments</title>
<indexterm zone="sql-syntax-comments">
- <primary>comments</primary>
- <secondary>in SQL</secondary>
+ <primary>comment</primary>
+ <secondary sortas="SQL">in SQL</secondary>
</indexterm>
<para>
- A comment is an arbitrary sequence of characters beginning with
+ A comment is a sequence of characters beginning with
double dashes and extending to the end of the line, e.g.:
<programlisting>
--- This is a standard SQL92 comment
+-- This is a standard SQL comment
</programlisting>
</para>
</programlisting>
where the comment begins with <literal>/*</literal> and extends to
the matching occurrence of <literal>*/</literal>. These block
- comments nest, as specified in SQL99 but unlike C, so that one can
- comment out larger blocks of code that may contain existing block
+ comments nest, as specified in the SQL standard but unlike C, so that one can
+ comment out larger blocks of code that might contain existing block
comments.
</para>
<title>Lexical Precedence</title>
<indexterm zone="sql-precedence">
- <primary>operators</primary>
+ <primary>operator</primary>
<secondary>precedence</secondary>
</indexterm>
<para>
+ <xref linkend="sql-precedence-table"> shows the precedence and
+ associativity of the operators in <productname>PostgreSQL</>.
+ Most operators have the same precedence and are left-associative.
The precedence and associativity of the operators is hard-wired
- into the parser. Most operators have the same precedence and are
- left-associative. This may lead to non-intuitive behavior; for
- example the Boolean operators <literal><</> and <literal>></> have a different
- precedence than the Boolean operators <literal><=</> and <literal>>=</>. Also,
- you will sometimes need to add parentheses when using combinations
- of binary and unary operators. For instance
+ into the parser. This can lead to non-intuitive behavior; for
+ example the Boolean operators <literal><</> and
+ <literal>></> have a different precedence than the Boolean
+ operators <literal><=</> and <literal>>=</>. Also, you will
+ sometimes need to add parentheses when using combinations of
+ binary and unary operators. For instance:
<programlisting>
SELECT 5 ! - 6;
</programlisting>
- will be parsed as
+ will be parsed as:
<programlisting>
SELECT 5 ! (- 6);
</programlisting>
- because the parser has no idea -- until it is too late -- that
- <token>!</token> is defined as a postfix operator, not an infix one.
- To get the desired behavior in this case, you must write
+ because the parser has no idea — until it is too late
+ — that <token>!</token> is defined as a postfix operator,
+ not an infix one. To get the desired behavior in this case, you
+ must write:
<programlisting>
SELECT (5 !) - 6;
</programlisting>
This is the price one pays for extensibility.
</para>
- <table tocentry="1">
+ <table id="sql-precedence-table">
<title>Operator Precedence (decreasing)</title>
<tgroup cols="3">
<row>
<entry><token>BETWEEN</token></entry>
<entry></entry>
- <entry>containment</entry>
+ <entry>range containment</entry>
</row>
<row>
<para>
When a schema-qualified operator name is used in the
- <literal>OPERATOR</> syntax, as for example in
+ <literal>OPERATOR</> syntax, as for example in:
<programlisting>
SELECT 3 OPERATOR(pg_catalog.+) 4;
</programlisting>
the <literal>OPERATOR</> construct is taken to have the default precedence
- shown above for <quote>any other</> operator. This is true no matter
- which specific operator name appears inside <literal>OPERATOR()</>.
+ shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator. This is true no matter
+ which specific operator appears inside <literal>OPERATOR()</>.
</para>
</sect2>
</sect1>
<sect1 id="sql-expressions">
<title>Value Expressions</title>
+ <indexterm zone="sql-expressions">
+ <primary>expression</primary>
+ <secondary>syntax</secondary>
+ </indexterm>
+
+ <indexterm zone="sql-expressions">
+ <primary>value expression</primary>
+ </indexterm>
+
+ <indexterm>
+ <primary>scalar</primary>
+ <see>expression</see>
+ </indexterm>
+
<para>
Value expressions are used in a variety of contexts, such
as in the target list of the <command>SELECT</command> command, as
<itemizedlist>
<listitem>
<para>
- A constant or literal value; see <xref linkend="sql-syntax-constants">.
+ A constant or literal value
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A column reference
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A positional parameter reference, in the body of a function definition
+ or prepared statement
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A subscripted expression
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A field selection expression
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ An operator invocation
</para>
</listitem>
<listitem>
<para>
- A column reference.
+ A function call
</para>
</listitem>
<listitem>
<para>
- A positional parameter reference, in the body of a function declaration.
+ An aggregate expression
</para>
</listitem>
<listitem>
<para>
- An operator invocation.
+ A window function call
</para>
</listitem>
<listitem>
<para>
- A function call.
+ A type cast
</para>
</listitem>
<listitem>
<para>
- An aggregate expression.
+ A scalar subquery
</para>
</listitem>
<listitem>
<para>
- A type cast.
+ An array constructor
</para>
</listitem>
<listitem>
<para>
- A scalar subquery.
+ A row constructor
</para>
</listitem>
<listitem>
-<synopsis>( <replaceable>expression</replaceable> )</synopsis>
<para>
- Parentheses are used to group subexpressions and override precedence.
+ Another value expression in parentheses (used to group
+ subexpressions and override
+ precedence<indexterm><primary>parenthesis</></>)
</para>
</listitem>
</itemizedlist>
<sect2>
<title>Column References</title>
+ <indexterm>
+ <primary>column reference</primary>
+ </indexterm>
+
<para>
A column can be referenced in the form:
<synopsis>
-<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
+<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable>
</synopsis>
+ </para>
+ <para>
<replaceable>correlation</replaceable> is the name of a
- table (possibly qualified), or an alias for a table defined by means of a
- FROM clause, or
+ table (possibly qualified with a schema name), or an alias for a table
+ defined by means of a <literal>FROM</literal> clause, or one of
the key words <literal>NEW</literal> or <literal>OLD</literal>.
- (NEW and OLD can only appear in rules,
+ (<literal>NEW</literal> and <literal>OLD</literal> can only appear in rewrite rules,
while other correlation names can be used in any SQL statement.)
- The correlation name and separating dot may be omitted if the column name
- is unique
- across all the tables being used in the current query. If
- <replaceable>column</replaceable> is of an array type, then the
- optional <replaceable>subscript</replaceable> selects a specific
- element or elements in the array. If no subscript is provided, then the
- whole array is selected. (See <xref linkend="arrays"> for more about
- arrays.)
+ The correlation name and separating dot can be omitted if the column name
+ is unique across all the tables being used in the current query. (See also <xref linkend="queries">.)
</para>
</sect2>
<sect2>
<title>Positional Parameters</title>
+ <indexterm>
+ <primary>parameter</primary>
+ <secondary>syntax</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>$</primary>
+ </indexterm>
+
<para>
- A positional parameter reference is used to indicate a parameter
+ A positional parameter reference is used to indicate a value
that is supplied externally to an SQL statement. Parameters are
- used in SQL function definitions and in prepared queries.
+ used in SQL function definitions and in prepared queries. Some
+ client libraries also support specifying data values separately
+ from the SQL command string, in which case parameters are used to
+ refer to the out-of-line data values.
The form of a parameter reference is:
<synopsis>
$<replaceable>number</replaceable>
<para>
For example, consider the definition of a function,
- <function>dept</function>, as
+ <function>dept</function>, as:
<programlisting>
-CREATE FUNCTION dept (text) RETURNS dept
- AS 'SELECT * FROM dept WHERE name = $1'
- LANGUAGE SQL;
+CREATE FUNCTION dept(text) RETURNS dept
+ AS $$ SELECT * FROM dept WHERE name = $1 $$
+ LANGUAGE SQL;
</programlisting>
- Here the <literal>$1</literal> will be replaced by the first
- function argument when the function is invoked.
+ Here the <literal>$1</literal> references the value of the first
+ function argument whenever the function is invoked.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Subscripts</title>
+
+ <indexterm>
+ <primary>subscript</primary>
+ </indexterm>
+
+ <para>
+ If an expression yields a value of an array type, then a specific
+ element of the array value can be extracted by writing
+<synopsis>
+<replaceable>expression</replaceable>[<replaceable>subscript</replaceable>]
+</synopsis>
+ or multiple adjacent elements (an <quote>array slice</>) can be extracted
+ by writing
+<synopsis>
+<replaceable>expression</replaceable>[<replaceable>lower_subscript</replaceable>:<replaceable>upper_subscript</replaceable>]
+</synopsis>
+ (Here, the brackets <literal>[ ]</literal> are meant to appear literally.)
+ Each <replaceable>subscript</replaceable> is itself an expression,
+ which must yield an integer value.
+ </para>
+
+ <para>
+ In general the array <replaceable>expression</replaceable> must be
+ parenthesized, but the parentheses can be omitted when the expression
+ to be subscripted is just a column reference or positional parameter.
+ Also, multiple subscripts can be concatenated when the original array
+ is multidimensional.
+ For example:
+
+<programlisting>
+mytable.arraycolumn[4]
+mytable.two_d_column[17][34]
+$1[10:42]
+(arrayfunction(a,b))[42]
+</programlisting>
+
+ The parentheses in the last example are required.
+ See <xref linkend="arrays"> for more about arrays.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Field Selection</title>
+
+ <indexterm>
+ <primary>field selection</primary>
+ </indexterm>
+
+ <para>
+ If an expression yields a value of a composite type (row type), then a
+ specific field of the row can be extracted by writing
+<synopsis>
+<replaceable>expression</replaceable>.<replaceable>fieldname</replaceable>
+</synopsis>
+ </para>
+
+ <para>
+ In general the row <replaceable>expression</replaceable> must be
+ parenthesized, but the parentheses can be omitted when the expression
+ to be selected from is just a table reference or positional parameter.
+ For example:
+
+<programlisting>
+mytable.mycolumn
+$1.somecolumn
+(rowfunction(a,b)).col3
+</programlisting>
+
+ (Thus, a qualified column reference is actually just a special case
+ of the field selection syntax.) An important special case is
+ extracting a field from a table column that is of a composite type:
+
+<programlisting>
+(compositecol).somefield
+(mytable.compositecol).somefield
+</programlisting>
+
+ The parentheses are required here to show that
+ <structfield>compositecol</> is a column name not a table name,
+ or that <structname>mytable</> is a table name not a schema name
+ in the second case.
</para>
</sect2>
<sect2>
<title>Operator Invocations</title>
+ <indexterm>
+ <primary>operator</primary>
+ <secondary>invocation</secondary>
+ </indexterm>
+
<para>
There are three possible syntaxes for an operator invocation:
<simplelist>
</simplelist>
where the <replaceable>operator</replaceable> token follows the syntax
rules of <xref linkend="sql-syntax-operators">, or is one of the
- keywords <token>AND</token>, <token>OR</token>, and
- <token>NOT</token>, or is a qualified operator name
+ key words <token>AND</token>, <token>OR</token>, and
+ <token>NOT</token>, or is a qualified operator name in the form:
<synopsis>
- <literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
+<literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
</synopsis>
Which particular operators exist and whether
they are unary or binary depends on what operators have been
<sect2>
<title>Function Calls</title>
+ <indexterm>
+ <primary>function</primary>
+ <secondary>invocation</secondary>
+ </indexterm>
+
<para>
The syntax for a function call is the name of a function
(possibly qualified with a schema name), followed by its argument list
enclosed in parentheses:
<synopsis>
-<replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
</synopsis>
</para>
<para>
The list of built-in functions is in <xref linkend="functions">.
- Other functions may be added by the user.
+ Other functions can be added by the user.
</para>
</sect2>
<title>Aggregate Expressions</title>
<indexterm zone="syntax-aggregates">
- <primary>aggregate functions</primary>
+ <primary>aggregate function</primary>
+ <secondary>invocation</secondary>
</indexterm>
<para>
output value, such as the sum or average of the inputs. The
syntax of an aggregate expression is one of the following:
- <simplelist>
- <member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member>
- <member><replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable>)</member>
- <member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member>
- <member><replaceable>aggregate_name</replaceable> ( * )</member>
- </simplelist>
+<synopsis>
+<replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
+<replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
+<replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
+<replaceable>aggregate_name</replaceable> ( * )
+</synopsis>
where <replaceable>aggregate_name</replaceable> is a previously
- defined aggregate (possibly a qualified name), and
- <replaceable>expression</replaceable> is
+ defined aggregate (possibly qualified with a schema name), and
+ <replaceable>expression</replaceable> is
any value expression that does not itself contain an aggregate
- expression.
+ expression or a window function call.
</para>
<para>
The first form of aggregate expression invokes the aggregate
- across all input rows for which the given expression yields a
- non-null value. (Actually, it is up to the aggregate function
- whether to ignore null values or not --- but all the standard ones do.)
+ across all input rows for which the given expression(s) yield
+ non-null values. (Actually, it is up to the aggregate function
+ whether to ignore null values or not — but all the standard ones do.)
The second form is the same as the first, since
<literal>ALL</literal> is the default. The third form invokes the
- aggregate for all distinct non-null values of the expression found
+ aggregate for all distinct non-null values of the expressions found
in the input rows. The last form invokes the aggregate once for
each input row regardless of null or non-null values; since no
particular input value is specified, it is generally only useful
- for the <function>count()</function> aggregate function.
+ for the <function>count(*)</function> aggregate function.
</para>
<para>
<para>
The predefined aggregate functions are described in <xref
- linkend="functions-aggregate">. Other aggregate functions may be added
- by the user.
+ linkend="functions-aggregate">. Other aggregate functions can be added
+ by the user.
+ </para>
+
+ <para>
+ An aggregate expression can only appear in the result list or
+ <literal>HAVING</> clause of a <command>SELECT</> command.
+ It is forbidden in other clauses, such as <literal>WHERE</>,
+ because those clauses are logically evaluated before the results
+ of aggregates are formed.
+ </para>
+
+ <para>
+ When an aggregate expression appears in a subquery (see
+ <xref linkend="sql-syntax-scalar-subqueries"> and
+ <xref linkend="functions-subquery">), the aggregate is normally
+ evaluated over the rows of the subquery. But an exception occurs
+ if the aggregate's arguments contain only outer-level variables:
+ the aggregate then belongs to the nearest such outer level, and is
+ evaluated over the rows of that query. The aggregate expression
+ as a whole is then an outer reference for the subquery it appears in,
+ and acts as a constant over any one evaluation of that subquery.
+ The restriction about
+ appearing only in the result list or <literal>HAVING</> clause
+ applies with respect to the query level that the aggregate belongs to.
+ </para>
+
+ <note>
+ <para>
+ <productname>PostgreSQL</productname> currently does not support
+ <literal>DISTINCT</> with more than one input expression.
+ </para>
+ </note>
+ </sect2>
+
+ <sect2 id="syntax-window-functions">
+ <title>Window Function Calls</title>
+
+ <indexterm zone="syntax-window-functions">
+ <primary>window function</primary>
+ <secondary>invocation</secondary>
+ </indexterm>
+
+ <indexterm zone="syntax-window-functions">
+ <primary>OVER clause</primary>
+ </indexterm>
+
+ <para>
+ A <firstterm>window function call</firstterm> represents the application
+ of an aggregate-like function over some portion of the rows selected
+ by a query. Unlike regular aggregate function calls, this is not tied
+ to grouping of the selected rows into a single output row — each
+ row remains separate in the query output. However the window function
+ is able to scan all the rows that would be part of the current row's
+ group according to the grouping specification (<literal>PARTITION BY</>
+ list) of the window function call.
+ The syntax of a window function call is one of the following:
+
+<synopsis>
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER ( <replaceable class="parameter">window_definition</replaceable> )
+<replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER <replaceable>window_name</replaceable>
+<replaceable>function_name</replaceable> ( * ) OVER ( <replaceable class="parameter">window_definition</replaceable> )
+<replaceable>function_name</replaceable> ( * ) OVER <replaceable>window_name</replaceable>
+</synopsis>
+ where <replaceable class="parameter">window_definition</replaceable>
+ has the syntax
+<synopsis>
+[ <replaceable class="parameter">existing_window_name</replaceable> ]
+[ PARTITION BY <replaceable class="parameter">expression</replaceable> [, ...] ]
+[ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
+[ <replaceable class="parameter">frame_clause</replaceable> ]
+</synopsis>
+ and the optional <replaceable class="parameter">frame_clause</replaceable>
+ can be one of
+<synopsis>
+RANGE UNBOUNDED PRECEDING
+RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+ROWS UNBOUNDED PRECEDING
+ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+</synopsis>
+
+ Here, <replaceable>expression</replaceable> represents any value
+ expression that does not itself contain window function calls.
+ The <literal>PARTITION BY</> and <literal>ORDER BY</> lists have
+ essentially the same syntax and semantics as <literal>GROUP BY</>
+ and <literal>ORDER BY</> clauses of the whole query, except that their
+ expressions are always just expressions and cannot be output-column
+ names or numbers.
+ <replaceable>window_name</replaceable> is a reference to a named window
+ specification defined in the query's <literal>WINDOW</literal> clause.
+ Named window specifications are usually referenced with just
+ <literal>OVER</> <replaceable>window_name</replaceable>, but it is
+ also possible to write a window name inside the parentheses and then
+ optionally supply an ordering clause and/or frame clause (the referenced
+ window must lack these clauses, if they are supplied here).
+ This latter syntax follows the same rules as modifying an existing
+ window name within the <literal>WINDOW</literal> clause; see the
+ <xref linkend="sql-select" endterm="sql-select-title"> reference
+ page for details.
+ </para>
+
+ <para>
+ The <replaceable class="parameter">frame_clause</replaceable> specifies
+ the set of rows constituting the <firstterm>window frame</>, for those
+ window functions that act on the frame instead of the whole partition.
+ The default framing option is <literal>RANGE UNBOUNDED PRECEDING</>,
+ which is the same as <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND
+ CURRENT ROW</>; it selects rows up through the current row's last
+ peer in the <literal>ORDER BY</> ordering (which means all rows if
+ there is no <literal>ORDER BY</>). The options
+ <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</> and
+ <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</>
+ are also equivalent: they always select all rows in the partition.
+ Lastly, <literal>ROWS UNBOUNDED PRECEDING</> or its verbose equivalent
+ <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</> select
+ all rows up through the current row (regardless of duplicates).
+ Beware that this option can produce implementation-dependent results
+ if the <literal>ORDER BY</> ordering does not order the rows uniquely.
+ </para>
+
+ <para>
+ The built-in window functions are described in <xref
+ linkend="functions-window-table">. Other window functions can be added by
+ the user. Also, any built-in or user-defined aggregate function can be
+ used as a window function.
+ </para>
+
+ <para>
+ The syntaxes using <literal>*</> are used for calling parameter-less
+ aggregate functions as window functions, for example
+ <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
+ <literal>*</> is customarily not used for non-aggregate window functions.
+ Aggregate window functions, unlike normal aggregate functions, do not
+ allow <literal>DISTINCT</> to be used within the function argument list.
+ </para>
+
+ <para>
+ Window function calls are permitted only in the <literal>SELECT</literal>
+ list and the <literal>ORDER BY</> clause of the query.
+ </para>
+
+ <para>
+ More information about window functions can be found in
+ <xref linkend="tutorial-window"> and
+ <xref linkend="queries-window">.
</para>
</sect2>
<sect2 id="sql-syntax-type-casts">
<title>Type Casts</title>
- <indexterm>
- <primary>data types</primary>
- <secondary>type casts</secondary>
- </indexterm>
+ <indexterm>
+ <primary>data type</primary>
+ <secondary>type cast</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>type cast</primary>
+ </indexterm>
<para>
A type cast specifies a conversion from one data type to another.
CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable> )
<replaceable>expression</replaceable>::<replaceable>type</replaceable>
</synopsis>
- The <literal>CAST</> syntax conforms to SQL92; the syntax with
+ The <literal>CAST</> syntax conforms to SQL; the syntax with
<literal>::</literal> is historical <productname>PostgreSQL</productname>
usage.
</para>
<para>
When a cast is applied to a value expression of a known type, it
represents a run-time type conversion. The cast will succeed only
- if a suitable type conversion function is available. Notice that this
+ if a suitable type conversion operation has been defined. Notice that this
is subtly different from the use of casts with constants, as shown in
<xref linkend="sql-syntax-constants-generic">. A cast applied to an
unadorned string literal represents the initial assignment of a type
</para>
<para>
- An explicit type cast may usually be omitted if there is no ambiguity as
+ An explicit type cast can usually be omitted if there is no ambiguity as
to the type that a value expression must produce (for example, when it is
assigned to a table column); the system will automatically apply a
type cast in such cases. However, automatic casting is only done for
- cast functions that are marked <quote>OK to apply implicitly</>
- in the system catalogs. Other cast functions must be invoked with
+ casts that are marked <quote>OK to apply implicitly</>
+ in the system catalogs. Other casts must be invoked with
explicit casting syntax. This restriction is intended to prevent
surprising conversions from being applied silently.
</para>
</synopsis>
However, this only works for types whose names are also valid as
function names. For example, <literal>double precision</literal>
- can't be used this way, but the equivalent <literal>float8</literal>
+ cannot be used this way, but the equivalent <literal>float8</literal>
can. Also, the names <literal>interval</>, <literal>time</>, and
<literal>timestamp</> can only be used in this fashion if they are
double-quoted, because of syntactic conflicts. Therefore, the use of
the function-like cast syntax leads to inconsistencies and should
- probably be avoided in new applications.
+ probably be avoided.
</para>
+
+ <note>
+ <para>
+ The function-like syntax is in fact just a function call. When
+ one of the two standard cast syntaxes is used to do a run-time
+ conversion, it will internally invoke a registered function to
+ perform the conversion. By convention, these conversion functions
+ have the same name as their output type, and thus the <quote>function-like
+ syntax</> is nothing more than a direct invocation of the underlying
+ conversion function. Obviously, this is not something that a portable
+ application should rely on. For further details see
+ <xref linkend="sql-createcast" endterm="sql-createcast-title">.
+ </para>
+ </note>
</sect2>
- <sect2>
+ <sect2 id="sql-syntax-scalar-subqueries">
<title>Scalar Subqueries</title>
+ <indexterm>
+ <primary>subquery</primary>
+ </indexterm>
+
<para>
A scalar subquery is an ordinary
- <command>SELECT</command> in parentheses that returns exactly one
- row with one column. The <command>SELECT</command> query is executed
+ <command>SELECT</command> query in parentheses that returns exactly one
+ row with one column. (See <xref linkend="queries"> for information about writing queries.)
+ The <command>SELECT</command> query is executed
and the single returned value is used in the surrounding value expression.
It is an error to use a query that
returns more than one row or more than one column as a scalar subquery.
there is no error; the scalar result is taken to be null.)
The subquery can refer to variables from the surrounding query,
which will act as constants during any one evaluation of the subquery.
- See also <xref linkend="functions-subquery">.
+ See also <xref linkend="functions-subquery"> for other expressions involving subqueries.
</para>
<para>
state:
<programlisting>
SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
-FROM states;
+ FROM states;
</programlisting>
</para>
</sect2>
- <sect2>
- <title>Expression Evaluation</title>
+ <sect2 id="sql-syntax-array-constructors">
+ <title>Array Constructors</title>
+
+ <indexterm>
+ <primary>array</primary>
+ <secondary>constructor</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>ARRAY</primary>
+ </indexterm>
+
+ <para>
+ An array constructor is an expression that builds an
+ array value using values for its member elements. A simple array
+ constructor
+ consists of the key word <literal>ARRAY</literal>, a left square bracket
+ <literal>[</>, a list of expressions (separated by commas) for the
+ array element values, and finally a right square bracket <literal>]</>.
+ For example:
+<programlisting>
+SELECT ARRAY[1,2,3+4];
+ array
+---------
+ {1,2,7}
+(1 row)
+</programlisting>
+ By default,
+ the array element type is the common type of the member expressions,
+ determined using the same rules as for <literal>UNION</> or
+ <literal>CASE</> constructs (see <xref linkend="typeconv-union-case">).
+ You can override this by explicitly casting the array constructor to the
+ desired type, for example:
+<programlisting>
+SELECT ARRAY[1,2,22.7]::integer[];
+ array
+----------
+ {1,2,23}
+(1 row)
+</programlisting>
+ This has the same effect as casting each expression to the array
+ element type individually.
+ For more on casting, see <xref linkend="sql-syntax-type-casts">.
+ </para>
+
+ <para>
+ Multidimensional array values can be built by nesting array
+ constructors.
+ In the inner constructors, the key word <literal>ARRAY</literal> can
+ be omitted. For example, these produce the same result:
+
+<programlisting>
+SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
+ array
+---------------
+ {{1,2},{3,4}}
+(1 row)
+
+SELECT ARRAY[[1,2],[3,4]];
+ array
+---------------
+ {{1,2},{3,4}}
+(1 row)
+</programlisting>
+
+ Since multidimensional arrays must be rectangular, inner constructors
+ at the same level must produce sub-arrays of identical dimensions.
+ Any cast applied to the outer <literal>ARRAY</> constructor propagates
+ automatically to all the inner constructors.
+ </para>
+
+ <para>
+ Multidimensional array constructor elements can be anything yielding
+ an array of the proper kind, not only a sub-<literal>ARRAY</> construct.
+ For example:
+<programlisting>
+CREATE TABLE arr(f1 int[], f2 int[]);
+
+INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);
+
+SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
+ array
+------------------------------------------------
+ {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ You can construct an empty array, but since it's impossible to have an
+ array with no type, you must explicitly cast your empty array to the
+ desired type. For example:
+<programlisting>
+SELECT ARRAY[]::integer[];
+ array
+-------
+ {}
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ It is also possible to construct an array from the results of a
+ subquery. In this form, the array constructor is written with the
+ key word <literal>ARRAY</literal> followed by a parenthesized (not
+ bracketed) subquery. For example:
+<programlisting>
+SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
+ ?column?
+-------------------------------------------------------------
+ {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31}
+(1 row)
+</programlisting>
+ The subquery must return a single column. The resulting
+ one-dimensional array will have an element for each row in the
+ subquery result, with an element type matching that of the
+ subquery's output column.
+ </para>
+
+ <para>
+ The subscripts of an array value built with <literal>ARRAY</literal>
+ always begin with one. For more information about arrays, see
+ <xref linkend="arrays">.
+ </para>
+
+ </sect2>
+
+ <sect2 id="sql-syntax-row-constructors">
+ <title>Row Constructors</title>
+
+ <indexterm>
+ <primary>composite type</primary>
+ <secondary>constructor</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>row type</primary>
+ <secondary>constructor</secondary>
+ </indexterm>
+
+ <indexterm>
+ <primary>ROW</primary>
+ </indexterm>
+
+ <para>
+ A row constructor is an expression that builds a row value (also
+ called a composite value) using values
+ for its member fields. A row constructor consists of the key word
+ <literal>ROW</literal>, a left parenthesis, zero or more
+ expressions (separated by commas) for the row field values, and finally
+ a right parenthesis. For example:
+<programlisting>
+SELECT ROW(1,2.5,'this is a test');
+</programlisting>
+ The key word <literal>ROW</> is optional when there is more than one
+ expression in the list.
+ </para>
+
+ <para>
+ A row constructor can include the syntax
+ <replaceable>rowvalue</replaceable><literal>.*</literal>,
+ which will be expanded to a list of the elements of the row value,
+ just as occurs when the <literal>.*</> syntax is used at the top level
+ of a <command>SELECT</> list. For example, if table <literal>t</> has
+ columns <literal>f1</> and <literal>f2</>, these are the same:
+<programlisting>
+SELECT ROW(t.*, 42) FROM t;
+SELECT ROW(t.f1, t.f2, 42) FROM t;
+</programlisting>
+ </para>
+
+ <note>
+ <para>
+ Before <productname>PostgreSQL</productname> 8.2, the
+ <literal>.*</literal> syntax was not expanded, so that writing
+ <literal>ROW(t.*, 42)</> created a two-field row whose first field
+ was another row value. The new behavior is usually more useful.
+ If you need the old behavior of nested row values, write the inner
+ row value without <literal>.*</literal>, for instance
+ <literal>ROW(t, 42)</>.
+ </para>
+ </note>
+
+ <para>
+ By default, the value created by a <literal>ROW</> expression is of
+ an anonymous record type. If necessary, it can be cast to a named
+ composite type — either the row type of a table, or a composite type
+ created with <command>CREATE TYPE AS</>. An explicit cast might be needed
+ to avoid ambiguity. For example:
+<programlisting>
+CREATE TABLE mytable(f1 int, f2 float, f3 text);
+
+CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
+
+-- No cast needed since only one getf1() exists
+SELECT getf1(ROW(1,2.5,'this is a test'));
+ getf1
+-------
+ 1
+(1 row)
+
+CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);
+
+CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
+
+-- Now we need a cast to indicate which function to call:
+SELECT getf1(ROW(1,2.5,'this is a test'));
+ERROR: function getf1(record) is not unique
+
+SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
+ getf1
+-------
+ 1
+(1 row)
+
+SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
+ getf1
+-------
+ 11
+(1 row)
+</programlisting>
+ </para>
+
+ <para>
+ Row constructors can be used to build composite values to be stored
+ in a composite-type table column, or to be passed to a function that
+ accepts a composite parameter. Also,
+ it is possible to compare two row values or test a row with
+ <literal>IS NULL</> or <literal>IS NOT NULL</>, for example:
+<programlisting>
+SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');
+
+SELECT ROW(table.*) IS NULL FROM table; -- detect all-null rows
+</programlisting>
+ For more detail see <xref linkend="functions-comparisons">.
+ Row constructors can also be used in connection with subqueries,
+ as discussed in <xref linkend="functions-subquery">.
+ </para>
+
+ </sect2>
+
+ <sect2 id="syntax-express-eval">
+ <title>Expression Evaluation Rules</title>
+
+ <indexterm>
+ <primary>expression</primary>
+ <secondary>order of evaluation</secondary>
+ </indexterm>
<para>
The order of evaluation of subexpressions is not defined. In
<para>
Furthermore, if the result of an expression can be determined by
evaluating only some parts of it, then other subexpressions
- might not be evaluated at all. For instance, if one wrote
+ might not be evaluated at all. For instance, if one wrote:
<programlisting>
SELECT true OR somefunc();
</programlisting>
then <literal>somefunc()</literal> would (probably) not be called
- at all. The same would be the case if one wrote
+ at all. The same would be the case if one wrote:
<programlisting>
SELECT somefunc() OR true;
</programlisting>
<para>
As a consequence, it is unwise to use functions with side effects
as part of complex expressions. It is particularly dangerous to
- rely on side effects or evaluation order in WHERE and HAVING clauses,
+ rely on side effects or evaluation order in <literal>WHERE</> and <literal>HAVING</> clauses,
since those clauses are extensively reprocessed as part of
developing an execution plan. Boolean
- expressions (AND/OR/NOT combinations) in those clauses may be reorganized
+ expressions (<literal>AND</>/<literal>OR</>/<literal>NOT</> combinations) in those clauses can be reorganized
in any manner allowed by the laws of Boolean algebra.
</para>
<para>
- When it is essential to force evaluation order, a CASE construct may
- be used. For example, this is an untrustworthy way of trying to
- avoid division by zero in a WHERE clause:
+ When it is essential to force evaluation order, a <literal>CASE</>
+ construct (see <xref linkend="functions-conditional">) can be
+ used. For example, this is an untrustworthy way of trying to
+ avoid division by zero in a <literal>WHERE</> clause:
<programlisting>
-SELECT ... WHERE x <> 0 AND y/x > 1.5;
+SELECT ... WHERE x > 0 AND y/x > 1.5;
</programlisting>
- but this is safe:
+ But this is safe:
<programlisting>
-SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;
+SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
</programlisting>
- A CASE construct used in this fashion will defeat optimization attempts,
- so it should only be done when necessary.
+ A <literal>CASE</> construct used in this fashion will defeat optimization
+ attempts, so it should only be done when necessary. (In this particular
+ example, it would be better to sidestep the problem by writing
+ <literal>y > 1.5*x</> instead.)
</para>
</sect2>
</sect1>
</chapter>
-
-<!-- Keep this comment at the end of the file
-Local variables:
-mode:sgml
-sgml-omittag:nil
-sgml-shorttag:t
-sgml-minimize-attributes:nil
-sgml-always-quote-attributes:t
-sgml-indent-step:1
-sgml-indent-data:t
-sgml-parent-document:nil
-sgml-default-dtd-file:"./reference.ced"
-sgml-exposed-tags:nil
-sgml-local-catalogs:("/usr/lib/sgml/catalog")
-sgml-local-ecat-files:nil
-End:
--->