doc/src/sgml/syntax.sgml

   1 <!--
   2 $Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.59 2002/03/22 19:20:31 petere Exp $
   3 -->
   4
   5 <chapter id="sql-syntax">
   6  <title>SQL Syntax</title>
   7
   8  <indexterm zone="sql-syntax">
   9   <primary>syntax</primary>
  10   <secondary>SQL</secondary>
  11  </indexterm>
  12
  13   <abstract>
  14    <para>
  15     This chapter describes the syntax of SQL.
  16    </para>
  17   </abstract>
  18
  19  <sect1 id="sql-syntax-lexical">
  20   <title>Lexical Structure</title>
  21
  22   <para>
  23    SQL input consists of a sequence of
  24    <firstterm>commands</firstterm>.  A command is composed of a
  25    sequence of <firstterm>tokens</firstterm>, terminated by a
  26    semicolon (<quote>;</quote>).  The end of the input stream also
  27    terminates a command.  Which tokens are valid depends on the syntax
  28    of the particular command.
  29   </para>
  30
  31   <para>
  32    A token can be a <firstterm>key word</firstterm>, an
  33    <firstterm>identifier</firstterm>, a <firstterm>quoted
  34    identifier</firstterm>, a <firstterm>literal</firstterm> (or
  35    constant), or a special character symbol.  Tokens are normally
  36    separated by whitespace (space, tab, newline), but need not be if
  37    there is no ambiguity (which is generally only the case if a
  38    special character is adjacent to some other token type).
  39   </para>
  40
  41   <para>
  42    Additionally, <firstterm>comments</firstterm> can occur in SQL
  43    input.  They are not tokens, they are effectively equivalent to
  44    whitespace.
  45   </para>
  46
  47   <informalexample id="sql-syntax-ex-commands">
  48    <para>
  49     For example, the following is (syntactically) valid SQL input:
  50 <programlisting>
  51 SELECT * FROM MY_TABLE;
  52 UPDATE MY_TABLE SET A = 5;
  53 INSERT INTO MY_TABLE VALUES (3, 'hi there');
  54 </programlisting>
  55     This is a sequence of three commands, one per line (although this
  56     is not required; more than one command can be on a line, and
  57     commands can usefully be split across lines).
  58    </para>
  59   </informalexample>
  60
  61   <para>
  62    The SQL syntax is not very consistent regarding what tokens
  63    identify commands and which are operands or parameters.  The first
  64    few tokens are generally the command name, so in the above example
  65    we would usually speak of a <quote>SELECT</quote>, an
  66    <quote>UPDATE</quote>, and an <quote>INSERT</quote> command.  But
  67    for instance the <command>UPDATE</command> command always requires
  68    a <token>SET</token> token to appear in a certain position, and
  69    this particular variation of <command>INSERT</command> also
  70    requires a <token>VALUES</token> in order to be complete.  The
  71    precise syntax rules for each command are described in the
  72    <citetitle>Reference Manual</citetitle>.
  73   </para>
  74
  75   <sect2 id="sql-syntax-identifiers">
  76    <title>Identifiers and Key Words</title>
  77
  78    <indexterm zone="sql-syntax-identifiers">
  79     <primary>identifiers</primary>
  80    </indexterm>
  81
  82    <indexterm zone="sql-syntax-identifiers">
  83     <primary>key words</primary>
  84     <secondary>syntax</secondary>
  85    </indexterm>
  86
  87    <para>
  88     Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
  89     <token>VALUES</token> in the example above are examples of
  90     <firstterm>key words</firstterm>, that is, words that have a fixed
  91     meaning in the SQL language.  The tokens <token>MY_TABLE</token>
  92     and <token>A</token> are examples of
  93     <firstterm>identifiers</firstterm>.  They identify names of
  94     tables, columns, or other database objects, depending on the
  95     command they are used in.  Therefore they are sometimes simply
  96     called <quote>names</quote>.  Key words and identifiers have the
  97     same lexical structure, meaning that one cannot know whether a
  98     token is an identifier or a key word without knowing the language.
  99     A complete list of key words can be found in <xref
 100     linkend="sql-keywords-appendix">.
 101    </para>
 102
 103    <para>
 104     SQL identifiers and key words must begin with a letter
 105     (<literal>a</literal>-<literal>z</literal>, but also letters with
 106     diacritical marks and non-Latin letters) or an underscore
 107     (<literal>_</literal>).  Subsequent characters in an identifier or
 108     key word can be letters, digits
 109     (<literal>0</literal>-<literal>9</literal>), or underscores,
 110     although the SQL standard will not define a key word that contains
 111     digits or starts or ends with an underscore.
 112    </para>
 113
 114    <para>
 115     The system uses no more than <symbol>NAMEDATALEN</symbol>-1
 116     characters of an identifier; longer names can be written in
 117     commands, but they will be truncated.  By default,
 118     <symbol>NAMEDATALEN</symbol> is 32 so the maximum identifier length
 119     is 31 (but at the time the system is built,
 120     <symbol>NAMEDATALEN</symbol> can be changed in
 121     <filename>src/include/postgres_ext.h</filename>).
 122    </para>
 123
 124    <para>
 125     <indexterm>
 126      <primary>case sensitivity</primary>
 127      <secondary>SQL commands</secondary>
 128     </indexterm>
 129     Identifier and key word names are case insensitive.  Therefore
 130 <programlisting>
 131 UPDATE MY_TABLE SET A = 5;
 132 </programlisting>
 133     can equivalently be written as
 134 <programlisting>
 135 uPDaTE my_TabLE SeT a = 5;
 136 </programlisting>
 137     A convention often used is to write key words in upper
 138     case and names in lower case, e.g.,
 139 <programlisting>
 140 UPDATE my_table SET a = 5;
 141 </programlisting>
 142    </para>
 143
 144    <para>
 145     <indexterm>
 146      <primary>quotes</primary>
 147      <secondary>and identifiers</secondary>
 148     </indexterm>
 149     There is a second kind of identifier:  the <firstterm>delimited
 150     identifier</firstterm> or <firstterm>quoted
 151     identifier</firstterm>.  It is formed by enclosing an arbitrary
 152     sequence of characters in double-quotes
 153     (<literal>"</literal>). <!-- " font-lock mania --> A delimited
 154     identifier is always an identifier, never a key word.  So
 155     <literal>"select"</literal> could be used to refer to a column or
 156     table named <quote>select</quote>, whereas an unquoted
 157     <literal>select</literal> would be taken as a key word and
 158     would therefore provoke a parse error when used where a table or
 159     column name is expected.  The example can be written with quoted
 160     identifiers like this:
 161 <programlisting>
 162 UPDATE "my_table" SET "a" = 5;
 163 </programlisting>
 164    </para>
 165
 166    <para>
 167     Quoted identifiers can contain any character other than a double
 168     quote itself.  This allows constructing table or column names that
 169     would otherwise not be possible, such as ones containing spaces or
 170     ampersands.  The length limitation still applies.
 171    </para>
 172
 173    <para>
 174     Quoting an identifier also makes it case-sensitive, whereas
 175     unquoted names are always folded to lower case.  For example, the
 176     identifiers <literal>FOO</literal>, <literal>foo</literal> and
 177     <literal>"foo"</literal> are considered the same by
 178     <productname>PostgreSQL</productname>, but <literal>"Foo"</literal>
 179     and <literal>"FOO"</literal> are different from these three and
 180     each other.
 181     <footnote>
 182      <para>
 183       The folding of unquoted names to lower case in <productname>PostgreSQL</>
 184       is incompatible with the SQL standard, which says that unquoted
 185       names should be folded to upper case.  Thus, <literal>foo</literal>
 186       should be equivalent to <literal>"FOO"</literal> not
 187       <literal>"foo"</literal> according to the standard.  If you want to
 188       write portable applications you are advised to always quote a particular
 189       name or never quote it.
 190      </para>
 191     </footnote>
 192    </para>
 193   </sect2>
 194
 195
 196   <sect2 id="sql-syntax-constants">
 197    <title>Constants</title>
 198
 199    <indexterm zone="sql-syntax-constants">
 200     <primary>constants</primary>
 201    </indexterm>
 202
 203    <para>
 204     There are four kinds of <firstterm>implicitly-typed
 205     constants</firstterm> in <productname>PostgreSQL</productname>:
 206     strings, bit strings, integers, and floating-point numbers.
 207     Constants can also be specified with explicit types, which can
 208     enable more accurate representation and more efficient handling by
 209     the system. The implicit constants are described below; explicit
 210     constants are discussed afterwards.
 211    </para>
 212
 213    <sect3 id="sql-syntax-strings">
 214     <title>String Constants</title>
 215
 216     <indexterm zone="sql-syntax-strings">
 217      <primary>character strings</primary>
 218      <secondary>constants</secondary>
 219     </indexterm>
 220
 221     <para>
 222      <indexterm>
 223       <primary>quotes</primary>
 224       <secondary>escaping</secondary>
 225      </indexterm>
 226      A string constant in SQL is an arbitrary sequence of characters
 227      bounded by single quotes (<quote>'</quote>), e.g., <literal>'This
 228      is a string'</literal>.  SQL allows single quotes to be embedded
 229      in strings by typing two adjacent single quotes (e.g.,
 230      <literal>'Dianne''s horse'</literal>).  In
 231      <productname>PostgreSQL</productname> single quotes may
 232      alternatively be escaped with a backslash (<quote>\</quote>,
 233      e.g., <literal>'Dianne\'s horse'</literal>).
 234     </para>
 235
 236     <para>
 237      C-style backslash escapes are also available:
 238      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
 239      form feed, <literal>\n</literal> is a newline,
 240      <literal>\r</literal> is a carriage return, <literal>\t</literal>
 241      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
 242      where <replaceable>xxx</replaceable> is an octal number, is the
 243      character with the corresponding ASCII code.  Any other character
 244      following a backslash is taken literally.  Thus, to include a
 245      backslash in a string constant, type two backslashes.
 246     </para>
 247
 248     <para>
 249      The character with the code zero cannot be in a string constant.
 250     </para>
 251
 252     <para>
 253      Two string constants that are only separated by whitespace
 254      <emphasis>with at least one newline</emphasis> are concatenated
 255      and effectively treated as if the string had been written in one
 256      constant.  For example:
 257 <programlisting>
 258 SELECT 'foo'
 259 'bar';
 260 </programlisting>
 261      is equivalent to
 262 <programlisting>
 263 SELECT 'foobar';
 264 </programlisting>
 265      but
 266 <programlisting>
 267 SELECT 'foo'      'bar';
 268 </programlisting>
 269      is not valid syntax, and <productname>PostgreSQL</productname> is
 270       consistent with <acronym>SQL9x</acronym> in this regard.
 271     </para>
 272    </sect3>
 273
 274    <sect3 id="sql-syntax-bit-strings">
 275     <title>Bit-String Constants</title>
 276
 277     <indexterm zone="sql-syntax-bit-strings">
 278      <primary>bit strings</primary>
 279      <secondary>constants</secondary>
 280     </indexterm>
 281
 282     <para>
 283      Bit-string constants look like string constants with a
 284      <literal>B</literal> (upper or lower case) immediately before the
 285      opening quote (no intervening whitespace), e.g.,
 286      <literal>B'1001'</literal>.  The only characters allowed within
 287      bit-string constants are <literal>0</literal> and
 288      <literal>1</literal>.  Bit-string constants can be continued
 289      across lines in the same way as regular string constants.
 290     </para>
 291    </sect3>
 292
 293    <sect3>
 294     <title>Integer Constants</title>
 295
 296     <para>
 297      Integer constants in SQL are sequences of decimal digits (0
 298      though 9) with no decimal point and no exponent.  The range of legal values
 299      depends on which integer data type is used, but the plain
 300      <type>integer</type> type accepts values ranging from -2147483648
 301      to +2147483647.  (The optional plus or minus sign is actually a
 302      separate unary operator and not part of the integer constant.)
 303     </para>
 304    </sect3>
 305
 306    <sect3>
 307     <title>Floating-Point Constants</title>
 308
 309     <indexterm>
 310      <primary>floating point</primary>
 311      <secondary>constants</secondary>
 312     </indexterm>
 313
 314     <para>
 315      Floating-point constants are accepted in these general forms:
 316 <synopsis>
 317 <replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
 318 <optional><replaceable>digits</replaceable></optional>.<replaceable>digits</replaceable><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
 319 <replaceable>digits</replaceable>e<optional>+-</optional><replaceable>digits</replaceable>
 320 </synopsis>
 321      where <replaceable>digits</replaceable> is one or more decimal
 322      digits.  At least one digit must be before or after the decimal
 323      point. At least one digit must follow the exponent delimiter
 324       (<literal>e</literal>) if that field is present.
 325      Thus, a floating-point constant is distinguished from an integer
 326      constant by the presence of either the decimal point or the
 327      exponent clause (or both).  There must not be a space or other
 328      characters embedded in the constant.
 329     </para>
 330
 331      <informalexample>
 332       <para>
 333        These are some examples of valid floating-point constants:
 334        <literallayout>
 335 3.5
 336 4.
 337 .001
 338 5e2
 339 1.925e-3
 340        </literallayout>
 341       </para>
 342      </informalexample>
 343
 344     <para>
 345      Floating-point constants are of type <type>DOUBLE
 346      PRECISION</type>. <type>REAL</type> can be specified explicitly
 347      by using <acronym>SQL</acronym> string notation or
 348      <productname>PostgreSQL</productname> type notation:
 349
 350       <programlisting>
 351 REAL '1.23'  -- string style
 352 '1.23'::REAL -- PostgreSQL (historical) style
 353       </programlisting>
 354      </para>
 355     </sect3>
 356
 357    <sect3 id="sql-syntax-constants-generic">
 358     <title>Constants of Other Types</title>
 359
 360     <indexterm>
 361      <primary>data types</primary>
 362      <secondary>constants</secondary>
 363     </indexterm>
 364
 365     <para>
 366      A constant of an <emphasis>arbitrary</emphasis> type can be
 367      entered using any one of the following notations:
 368 <synopsis>
 369 <replaceable>type</replaceable> '<replaceable>string</replaceable>'
 370 '<replaceable>string</replaceable>'::<replaceable>type</replaceable>
 371 CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 372 </synopsis>
 373      The string's text is passed to the input conversion
 374      routine for the type called <replaceable>type</replaceable>. The
 375      result is a constant of the indicated type.  The explicit type
 376      cast may be omitted if there is no ambiguity as to the type the
 377      constant must be (for example, when it is passed as an argument
 378      to a non-overloaded function), in which case it is automatically
 379      coerced.
 380     </para>
 381
 382     <para>
 383      It is also possible to specify a type coercion using a function-like
 384      syntax:
 385 <synopsis>
 386 <replaceable>typename</replaceable> ( '<replaceable>string</replaceable>' )
 387 </synopsis>
 388      but not all type names may be used in this way; see <xref
 389      linkend="sql-syntax-type-casts"> for details.
 390     </para>
 391
 392     <para>
 393      The <literal>::</literal>, <literal>CAST()</literal>, and
 394      function-call syntaxes can also be used to specify run-time type
 395      conversions of arbitrary expressions, as discussed in <xref
 396      linkend="sql-syntax-type-casts">.  But the form
 397      <replaceable>type</replaceable> '<replaceable>string</replaceable>'
 398      can only be used to specify the type of a literal constant.
 399      Another restriction on
 400      <replaceable>type</replaceable> '<replaceable>string</replaceable>'
 401      is that it does not work for array types; use <literal>::</literal>
 402      or <literal>CAST()</literal> to specify the type of an array constant.
 403     </para>
 404    </sect3>
 405
 406    <sect3>
 407     <title>Array constants</title>
 408
 409     <indexterm>
 410      <primary>arrays</primary>
 411      <secondary>constants</secondary>
 412     </indexterm>
 413
 414     <para>
 415      The general format of an array constant is the following:
 416 <synopsis>
 417 '{ <replaceable>val1</replaceable> <replaceable>delim</replaceable> <replaceable>val2</replaceable> <replaceable>delim</replaceable> ... }'
 418 </synopsis>
 419      where <replaceable>delim</replaceable> is the delimiter character
 420      for the type, as recorded in its <literal>pg_type</literal>
 421      entry.  (For all built-in types, this is the comma character
 422      <quote><literal>,</literal></>.)  Each <replaceable>val</replaceable> is either a constant
 423      of the array element type, or a subarray.  An example of an
 424      array constant is
 425 <programlisting>
 426 '{{1,2,3},{4,5,6},{7,8,9}}'
 427 </programlisting>
 428      This constant is a two-dimensional, 3-by-3 array consisting of three
 429      subarrays of integers.
 430     </para>
 431
 432     <para>
 433      Individual array elements can be placed between double-quote
 434      marks (<literal>"</literal>) <!-- " --> to avoid ambiguity
 435      problems with respect to whitespace.  Without quote marks, the
 436      array-value parser will skip leading whitespace.
 437     </para>
 438
 439     <para>
 440      (Array constants are actually only a special case of the generic
 441      type constants discussed in the previous section.  The constant
 442      is initially treated as a string and passed to the array input
 443      conversion routine.  An explicit type specification might be
 444      necessary.)
 445     </para>
 446    </sect3>
 447   </sect2>
 448
 449
 450   <sect2 id="sql-syntax-operators">
 451    <title>Operators</title>
 452
 453    <indexterm zone="sql-syntax-operators">
 454     <primary>operators</primary>
 455     <secondary>syntax</secondary>
 456    </indexterm>
 457
 458    <para>
 459     An operator is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
 460     (31 by default) characters from the following list:
 461 <literallayout>
 462 + - * / &lt; &gt; = ~ ! @ # % ^ &amp; | ` ? $
 463 </literallayout>
 464
 465     There are a few restrictions on operator names, however:
 466     <itemizedlist>
 467      <listitem>
 468       <para>
 469        <literal>$</> (dollar) cannot be a single-character operator, although it
 470        can be part of a multiple-character operator name.
 471       </para>
 472      </listitem>
 473
 474      <listitem>
 475       <para>
 476        <literal>--</literal> and <literal>/*</literal> cannot appear
 477        anywhere in an operator name, since they will be taken as the
 478        start of a comment.
 479       </para>
 480      </listitem>
 481
 482      <listitem>
 483       <para>
 484        A multiple-character operator name cannot end in <literal>+</> or <literal>-</>,
 485        unless the name also contains at least one of these characters:
 486 <literallayout>
 487 ~ ! @ # % ^ &amp; | ` ? $
 488 </literallayout>
 489        For example, <literal>@-</literal> is an allowed operator name,
 490        but <literal>*-</literal> is not.  This restriction allows
 491        <productname>PostgreSQL</productname> to parse SQL-compliant
 492        queries without requiring spaces between tokens.
 493       </para>
 494      </listitem>
 495     </itemizedlist>
 496    </para>
 497
 498    <para>
 499     When working with non-SQL-standard operator names, you will usually
 500     need to separate adjacent operators with spaces to avoid ambiguity.
 501     For example, if you have defined a left unary operator named <literal>@</literal>,
 502     you cannot write <literal>X*@Y</literal>; you must write
 503     <literal>X* @Y</literal> to ensure that
 504     <productname>PostgreSQL</productname> reads it as two operator names
 505     not one.
 506    </para>
 507   </sect2>
 508
 509   <sect2>
 510    <title>Special Characters</title>
 511
 512   <para>
 513    Some characters that are not alphanumeric have a special meaning
 514    that is different from being an operator.  Details on the usage can
 515    be found at the location where the respective syntax element is
 516    described.  This section only exists to advise the existence and
 517    summarize the purposes of these characters.
 518
 519    <itemizedlist>
 520     <listitem>
 521      <para>
 522       A dollar sign (<literal>$</literal>) followed by digits is used
 523       to represent the positional parameters in the body of a function
 524       definition.  In other contexts the dollar sign may be part of an
 525       operator name.
 526      </para>
 527     </listitem>
 528
 529     <listitem>
 530      <para>
 531       Parentheses (<literal>()</literal>) have their usual meaning to
 532       group expressions and enforce precedence.  In some cases
 533       parentheses are required as part of the fixed syntax of a
 534       particular SQL command.
 535      </para>
 536     </listitem>
 537
 538     <listitem>
 539      <para>
 540       Brackets (<literal>[]</literal>) are used to select the elements
 541       of an array.  See <xref linkend="arrays"> for more information
 542       on arrays.
 543      </para>
 544     </listitem>
 545
 546     <listitem>
 547      <para>
 548       Commas (<literal>,</literal>) are used in some syntactical
 549       constructs to separate the elements of a list.
 550      </para>
 551     </listitem>
 552
 553     <listitem>
 554      <para>
 555       The semicolon (<literal>;</literal>) terminates an SQL command.
 556       It cannot appear anywhere within a command, except within a
 557       string constant or quoted identifier.
 558      </para>
 559     </listitem>
 560
 561     <listitem>
 562      <para>
 563       The colon (<literal>:</literal>) is used to select
 564       <quote>slices</quote> from arrays. (See <xref
 565       linkend="arrays">.)  In certain SQL dialects (such as Embedded
 566       SQL), the colon is used to prefix variable names.
 567      </para>
 568     </listitem>
 569
 570     <listitem>
 571      <para>
 572       The asterisk (<literal>*</literal>) has a special meaning when
 573       used in the <command>SELECT</command> command or with the
 574       <function>COUNT</function> aggregate function.
 575      </para>
 576     </listitem>
 577
 578     <listitem>
 579      <para>
 580       The period (<literal>.</literal>) is used in floating-point
 581       constants, and to separate table and column names.
 582      </para>
 583     </listitem>
 584    </itemizedlist>
 585
 586    </para>
 587   </sect2>
 588
 589   <sect2 id="sql-syntax-comments">
 590    <title>Comments</title>
 591
 592    <indexterm zone="sql-syntax-comments">
 593     <primary>comments</primary>
 594     <secondary>in SQL</secondary>
 595    </indexterm>
 596
 597    <para>
 598     A comment is an arbitrary sequence of characters beginning with
 599     double dashes and extending to the end of the line, e.g.:
 600 <programlisting>
 601 -- This is a standard SQL92 comment
 602 </programlisting>
 603    </para>
 604
 605    <para>
 606     Alternatively, C-style block comments can be used:
 607 <programlisting>
 608 /* multiline comment
 609  * with nesting: /* nested block comment */
 610  */
 611 </programlisting>
 612     where the comment begins with <literal>/*</literal> and extends to
 613     the matching occurrence of <literal>*/</literal>. These block
 614     comments nest, as specified in SQL99 but unlike C, so that one can
 615     comment out larger blocks of code that may contain existing block
 616     comments.
 617    </para>
 618
 619    <para>
 620     A comment is removed from the input stream before further syntax
 621     analysis and is effectively replaced by whitespace.
 622    </para>
 623   </sect2>
 624  </sect1>
 625
 626
 627   <sect1 id="sql-syntax-columns">
 628    <title>Columns</title>
 629
 630     <para>
 631      A <firstterm>column</firstterm>
 632      is either a user-defined column of a given table or one of the
 633      following system-defined columns:
 634
 635      <indexterm>
 636       <primary>columns</primary>
 637       <secondary>system columns</secondary>
 638      </indexterm>
 639
 640      <variablelist>
 641       <varlistentry>
 642        <term><structfield>oid</></term>
 643        <listitem>
 644         <para>
 645          <indexterm>
 646           <primary>OID</primary>
 647          </indexterm>
 648          The object identifier (object ID) of a row.  This is a serial number
 649          that is automatically added by <productname>PostgreSQL</productname> to all table rows (unless
 650          the table was created WITHOUT OIDS, in which case this column is
 651          not present).
 652         </para>
 653        </listitem>
 654       </varlistentry>
 655
 656       <varlistentry>
 657       <term><structfield>tableoid</></term>
 658        <listitem>
 659         <para>
 660          The OID of the table containing this row.  This attribute is
 661          particularly handy for queries that select from inheritance
 662          hierarchies, since without it, it's difficult to tell which
 663          individual table a row came from.  The
 664          <structfield>tableoid</structfield> can be joined against the
 665          <structfield>oid</structfield> column of
 666          <classname>pg_class</classname> to obtain the table name.
 667         </para>
 668        </listitem>
 669       </varlistentry>
 670
 671       <varlistentry>
 672        <term><structfield>xmin</></term>
 673        <listitem>
 674         <para>
 675          The identity (transaction ID) of the inserting transaction for
 676          this tuple.  (Note: A tuple is an individual state of a row;
 677          each update of a row creates a new tuple for the same logical row.)
 678         </para>
 679        </listitem>
 680       </varlistentry>
 681
 682       <varlistentry>
 683       <term><structfield>cmin</></term>
 684        <listitem>
 685         <para>
 686          The command identifier (starting at zero) within the inserting
 687          transaction.
 688         </para>
 689        </listitem>
 690       </varlistentry>
 691
 692       <varlistentry>
 693       <term><structfield>xmax</></term>
 694        <listitem>
 695         <para>
 696          The identity (transaction ID) of the deleting transaction,
 697          or zero for an undeleted tuple.  It is possible for this field
 698          to be nonzero in a visible tuple: that usually indicates that the
 699          deleting transaction hasn't committed yet, or that an attempted
 700          deletion was rolled back.
 701         </para>
 702        </listitem>
 703       </varlistentry>
 704
 705       <varlistentry>
 706       <term><structfield>cmax</></term>
 707        <listitem>
 708         <para>
 709          The command identifier within the deleting transaction, or zero.
 710         </para>
 711        </listitem>
 712       </varlistentry>
 713
 714       <varlistentry>
 715       <term><structfield>ctid</></term>
 716        <listitem>
 717         <para>
 718          The tuple ID of the tuple within its table.  This is a pair
 719          (block number, tuple index within block) that identifies the
 720          physical location of the tuple.  Note that although the <structfield>ctid</structfield>
 721          can be used to locate the tuple very quickly, a row's <structfield>ctid</structfield>
 722          will change each time it is updated or moved by <command>VACUUM
 723          FULL</>.
 724          Therefore <structfield>ctid</structfield> is useless as a long-term row identifier.
 725          The OID, or even better a user-defined serial number, should
 726          be used to identify logical rows.
 727         </para>
 728        </listitem>
 729       </varlistentry>
 730      </variablelist>
 731     </para>
 732
 733     <para>
 734      OIDs are 32-bit quantities and are assigned from a single cluster-wide
 735      counter.  In a large or long-lived database, it is possible for the
 736      counter to wrap around.  Hence, it is bad practice to assume that OIDs
 737      are unique, unless you take steps to ensure that they are unique.
 738      Recommended practice when using OIDs for row identification is to create
 739      a unique constraint on the OID column of each table for which the OID will be
 740      used.  Never assume that OIDs are unique across tables; use the
 741      combination of <structfield>tableoid</> and row OID if you need a database-wide
 742      identifier.  (Future releases of <productname>PostgreSQL</productname> are likely to use a separate
 743      OID counter for each table, so that <structfield>tableoid</> <emphasis>must</> be
 744      included to arrive at a globally unique identifier.)
 745     </para>
 746
 747     <para>
 748      Transaction identifiers are 32-bit quantities.  In a long-lived
 749      database it is possible for transaction IDs to wrap around.  This
 750      is not a fatal problem given appropriate maintenance procedures;
 751      see the <citetitle>Administrator's Guide</> for details.  However, it is
 752      unwise to depend on uniqueness of transaction IDs over the long term
 753      (more than one billion transactions).
 754     </para>
 755
 756     <para>
 757      Command identifiers are also 32-bit quantities.  This creates a hard
 758      limit of 2<superscript>32</> (4 billion) SQL commands within a single transaction.
 759      In practice this limit is not a problem --- note that the limit is on
 760      number of SQL queries, not number of tuples processed.
 761     </para>
 762   </sect1>
 763
 764
 765  <sect1 id="sql-expressions">
 766   <title>Value Expressions</title>
 767
 768   <para>
 769    Value expressions are used in a variety of contexts, such
 770    as in the target list of the <command>SELECT</command> command, as
 771    new column values in <command>INSERT</command> or
 772    <command>UPDATE</command>, or in search conditions in a number of
 773    commands.  The result of a value expression is sometimes called a
 774    <firstterm>scalar</firstterm>, to distinguish it from the result of
 775    a table expression (which is a table).  Value expressions are
 776    therefore also called <firstterm>scalar expressions</firstterm> (or
 777    even simply <firstterm>expressions</firstterm>).  The expression
 778    syntax allows the calculation of values from primitive parts using
 779    arithmetic, logical, set, and other operations.
 780   </para>
 781
 782   <para>
 783    A value expression is one of the following:
 784
 785    <itemizedlist>
 786     <listitem>
 787      <para>
 788       A constant or literal value; see <xref linkend="sql-syntax-constants">.
 789      </para>
 790     </listitem>
 791
 792     <listitem>
 793      <para>
 794       A column reference.
 795      </para>
 796     </listitem>
 797
 798     <listitem>
 799      <para>
 800       A positional parameter reference, in the body of a function declaration.
 801      </para>
 802     </listitem>
 803
 804     <listitem>
 805      <para>
 806       An operator invocation.
 807      </para>
 808     </listitem>
 809
 810     <listitem>
 811      <para>
 812       A function call.
 813      </para>
 814     </listitem>
 815
 816     <listitem>
 817      <para>
 818       An aggregate expression.
 819      </para>
 820     </listitem>
 821
 822     <listitem>
 823      <para>
 824       A type cast.
 825      </para>
 826     </listitem>
 827
 828     <listitem>
 829      <para>
 830       A scalar subquery.
 831      </para>
 832     </listitem>
 833
 834     <listitem>
 835 <synopsis>( <replaceable>expression</replaceable> )</synopsis>
 836      <para>
 837       Parentheses are used to group subexpressions and override precedence.
 838      </para>
 839     </listitem>
 840    </itemizedlist>
 841   </para>
 842
 843   <para>
 844    In addition to this list, there are a number of constructs that can
 845    be classified as an expression but do not follow any general syntax
 846    rules.  These generally have the semantics of a function or
 847    operator and are explained in the appropriate location in <xref
 848    linkend="functions">.  An example is the <literal>IS NULL</literal>
 849    clause.
 850   </para>
 851
 852   <para>
 853    We have already discussed constants in <xref
 854    linkend="sql-syntax-constants">.  The following sections discuss
 855    the remaining options.
 856   </para>
 857
 858   <sect2>
 859    <title>Column References</title>
 860
 861    <para>
 862     A column can be referenced in the form:
 863 <synopsis>
 864 <replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
 865 </synopsis>
 866
 867     <replaceable>correlation</replaceable> is either the name of a
 868     table, an alias for a table defined by means of a FROM clause, or
 869     the key words <literal>NEW</literal> or <literal>OLD</literal>.
 870     (NEW and OLD can only appear in the action portion of a rule,
 871     while other correlation names can be used in any SQL statement.)
 872     The correlation name and separating dot may be omitted if the column name
 873     is unique
 874     across all the tables being used in the current query.  If
 875     <replaceable>column</replaceable> is of an array type, then the
 876     optional <replaceable>subscript</replaceable> selects a specific
 877     element or elements in the array.  If no subscript is provided, then the
 878     whole array is selected.  (See <xref linkend="arrays"> for more about
 879     arrays.)
 880    </para>
 881   </sect2>
 882
 883   <sect2>
 884    <title>Positional Parameters</title>
 885
 886    <para>
 887     A positional parameter reference is used to indicate a parameter
 888     in an SQL function.  Typically this is used in SQL function
 889     definition statements.  The form of a parameter is:
 890 <synopsis>
 891 $<replaceable>number</replaceable>
 892 </synopsis>
 893    </para>
 894
 895    <para>
 896     For example, consider the definition of a function,
 897     <function>dept</function>, as
 898
 899 <programlisting>
 900 CREATE FUNCTION dept (text) RETURNS dept
 901   AS 'SELECT * FROM dept WHERE name = $1'
 902   LANGUAGE SQL;
 903 </programlisting>
 904
 905     Here the <literal>$1</literal> will be replaced by the first
 906     function argument when the function is invoked.
 907    </para>
 908   </sect2>
 909
 910   <sect2>
 911    <title>Operator Invocations</title>
 912
 913    <para>
 914     There are three possible syntaxes for an operator invocation:
 915     <simplelist>
 916      <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> <replaceable>expression</replaceable> (binary infix operator)</member>
 917      <member><replaceable>operator</replaceable> <replaceable>expression</replaceable> (unary prefix operator)</member>
 918      <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
 919     </simplelist>
 920     where the <replaceable>operator</replaceable> token follows the syntax
 921     rules of <xref linkend="sql-syntax-operators"> or is one of the
 922     tokens <token>AND</token>, <token>OR</token>, and
 923     <token>NOT</token>.  Which particular operators exist and whether
 924     they are unary or binary depends on what operators have been
 925     defined by the system or the user.  <xref linkend="functions">
 926     describes the built-in operators.
 927    </para>
 928   </sect2>
 929
 930   <sect2>
 931    <title>Function Calls</title>
 932
 933    <para>
 934     The syntax for a function call is the name of a function
 935     (which is subject to the syntax rules for identifiers of <xref
 936     linkend="sql-syntax-identifiers">), followed by its argument list
 937     enclosed in parentheses:
 938
 939 <synopsis>
 940 <replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
 941 </synopsis>
 942    </para>
 943
 944    <para>
 945     For example, the following computes the square root of 2:
 946 <programlisting>
 947 sqrt(2)
 948 </programlisting>
 949    </para>
 950
 951    <para>
 952     The list of built-in functions is in <xref linkend="functions">.
 953     Other functions may be added by the user.
 954    </para>
 955   </sect2>
 956
 957   <sect2 id="syntax-aggregates">
 958    <title>Aggregate Expressions</title>
 959
 960    <indexterm zone="syntax-aggregates">
 961     <primary>aggregate functions</primary>
 962    </indexterm>
 963
 964    <para>
 965     An <firstterm>aggregate expression</firstterm> represents the
 966     application of an aggregate function across the rows selected by a
 967     query.  An aggregate function reduces multiple inputs to a single
 968     output value, such as the sum or average of the inputs.  The
 969     syntax of an aggregate expression is one of the following:
 970
 971     <simplelist>
 972      <member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member>
 973      <member><replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable>)</member>
 974      <member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member>
 975      <member><replaceable>aggregate_name</replaceable> ( * )</member>
 976     </simplelist>
 977
 978     where <replaceable>aggregate_name</replaceable> is a previously
 979     defined aggregate, and <replaceable>expression</replaceable> is
 980     any value expression that does not itself contain an aggregate
 981     expression.
 982    </para>
 983
 984    <para>
 985     The first form of aggregate expression invokes the aggregate
 986     across all input rows for which the given expression yields a
 987     non-NULL value.  (Actually, it is up to the aggregate function
 988     whether to ignore NULLs or not --- but all the standard ones do.)
 989     The second form is the same as the first, since
 990     <literal>ALL</literal> is the default.  The third form invokes the
 991     aggregate for all distinct non-NULL values of the expression found
 992     in the input rows.  The last form invokes the aggregate once for
 993     each input row regardless of NULL or non-NULL values; since no
 994     particular input value is specified, it is generally only useful
 995     for the <function>count()</function> aggregate function.
 996    </para>
 997
 998    <para>
 999     For example, <literal>count(*)</literal> yields the total number
1000     of input rows; <literal>count(f1)</literal> yields the number of
1001     input rows in which <literal>f1</literal> is non-NULL;
1002     <literal>count(distinct f1)</literal> yields the number of
1003     distinct non-NULL values of <literal>f1</literal>.
1004    </para>
1005
1006    <para>
1007     The predefined aggregate functions are described in <xref
1008     linkend="functions-aggregate">.  Other aggregate functions may be added
1009     by the user.
1010    </para>
1011   </sect2>
1012
1013   <sect2 id="sql-syntax-type-casts">
1014    <title>Type Casts</title>
1015
1016     <indexterm>
1017      <primary>data types</primary>
1018      <secondary>type casts</secondary>
1019     </indexterm>
1020
1021    <para>
1022     A type cast specifies a conversion from one data type to another.
1023     <productname>PostgreSQL</productname> accepts two equivalent syntaxes
1024     for type casts:
1025 <synopsis>
1026 CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable> )
1027 <replaceable>expression</replaceable>::<replaceable>type</replaceable>
1028 </synopsis>
1029     The <literal>CAST</> syntax conforms to SQL92; the syntax with
1030     <literal>::</literal> is historical <productname>PostgreSQL</productname>
1031     usage.
1032    </para>
1033
1034    <para>
1035     When a cast is applied to a value expression of a known type, it
1036     represents a run-time type conversion.  The cast will succeed only
1037     if a suitable type conversion function is available.  Notice that this
1038     is subtly different from the use of casts with constants, as shown in
1039     <xref linkend="sql-syntax-constants-generic">.  A cast applied to an
1040     unadorned string literal represents the initial assignment of a type
1041     to a literal constant value, and so it will succeed for any type
1042     (if the contents of the string literal are acceptable input syntax for the
1043     data type).
1044    </para>
1045
1046    <para>
1047     An explicit type cast may be omitted if there is no ambiguity as to the
1048     type that a value expression must produce (for example, when it is
1049     assigned to a table column); the system will automatically apply a
1050     type cast in such cases.
1051    </para>
1052
1053    <para>
1054     It is also possible to specify a type cast using a function-like
1055     syntax:
1056 <synopsis>
1057 <replaceable>typename</replaceable> ( <replaceable>expression</replaceable> )
1058 </synopsis>
1059     However, this only works for types whose names are also valid as
1060     function names.  For example, <literal>double precision</literal>
1061     can't be used this way, but the equivalent <literal>float8</literal>
1062     can.  Also, the names <literal>interval</>, <literal>time</>, and
1063     <literal>timestamp</> can only be used in this fashion if they are
1064     double-quoted, because of parser conflicts.  Therefore, the use of
1065     the function-like cast syntax leads to inconsistencies and should
1066     probably be avoided in new applications.
1067    </para>
1068   </sect2>
1069
1070   <sect2>
1071    <title>Scalar Subqueries</title>
1072
1073    <para>
1074     A scalar subquery is an ordinary
1075     <command>SELECT</command> in parentheses that returns exactly one
1076     row with one column.  The <command>SELECT</command> query is executed
1077     and the single returned value is used in the surrounding value expression.
1078     It is an error to use a query that
1079     returns more than one row or more than one column as a scalar subquery.
1080     (But if, during a particular execution, the subquery returns no rows,
1081     there is no error; the scalar result is taken to be NULL.)
1082     The subquery can refer to variables from the surrounding query,
1083     which will act as constants during any one evaluation of the subquery.
1084     See also <xref linkend="functions-subquery">.
1085    </para>
1086
1087    <para>
1088     For example, the following finds the largest city population in each
1089     state:
1090 <programlisting>
1091 SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
1092 FROM states;
1093 </programlisting>
1094    </para>
1095   </sect2>
1096
1097  </sect1>
1098
1099
1100   <sect1 id="sql-precedence">
1101    <title>Lexical Precedence</title>
1102
1103    <indexterm zone="sql-precedence">
1104     <primary>operators</primary>
1105     <secondary>precedence</secondary>
1106    </indexterm>
1107
1108    <para>
1109     The precedence and associativity of the operators is hard-wired
1110     into the parser.  Most operators have the same precedence and are
1111     left-associative.  This may lead to non-intuitive behavior; for
1112     example the Boolean operators <literal>&lt;</> and <literal>&gt;</> have a different
1113     precedence than the Boolean operators <literal>&lt;=</> and <literal>&gt;=</>.  Also,
1114     you will sometimes need to add parentheses when using combinations
1115     of binary and unary operators.  For instance
1116 <programlisting>
1117 SELECT 5 ! - 6;
1118 </programlisting>
1119    will be parsed as
1120 <programlisting>
1121 SELECT 5 ! (- 6);
1122 </programlisting>
1123     because the parser has no idea -- until it is too late -- that
1124     <token>!</token> is defined as a postfix operator, not an infix one.
1125     To get the desired behavior in this case, you must write
1126 <programlisting>
1127 SELECT (5 !) - 6;
1128 </programlisting>
1129     This is the price one pays for extensibility.
1130    </para>
1131
1132    <table tocentry="1">
1133     <title>Operator Precedence (decreasing)</title>
1134
1135     <tgroup cols="3">
1136      <thead>
1137       <row>
1138        <entry>Operator/Element</entry>
1139        <entry>Associativity</entry>
1140        <entry>Description</entry>
1141       </row>
1142      </thead>
1143
1144      <tbody>
1145       <row>
1146        <entry><token>::</token></entry>
1147        <entry>left</entry>
1148        <entry><productname>PostgreSQL</productname>-style typecast</entry>
1149       </row>
1150
1151       <row>
1152        <entry><token>[</token> <token>]</token></entry>
1153        <entry>left</entry>
1154        <entry>array element selection</entry>
1155       </row>
1156
1157       <row>
1158        <entry><token>.</token></entry>
1159        <entry>left</entry>
1160        <entry>table/column name separator</entry>
1161       </row>
1162
1163       <row>
1164        <entry><token>-</token></entry>
1165        <entry>right</entry>
1166        <entry>unary minus</entry>
1167       </row>
1168
1169       <row>
1170        <entry><token>^</token></entry>
1171        <entry>left</entry>
1172        <entry>exponentiation</entry>
1173       </row>
1174
1175       <row>
1176        <entry><token>*</token> <token>/</token> <token>%</token></entry>
1177        <entry>left</entry>
1178        <entry>multiplication, division, modulo</entry>
1179       </row>
1180
1181       <row>
1182        <entry><token>+</token> <token>-</token></entry>
1183        <entry>left</entry>
1184        <entry>addition, subtraction</entry>
1185       </row>
1186
1187       <row>
1188        <entry><token>IS</token></entry>
1189        <entry></entry>
1190        <entry>test for TRUE, FALSE, UNKNOWN, NULL</entry>
1191       </row>
1192
1193       <row>
1194        <entry><token>ISNULL</token></entry>
1195        <entry></entry>
1196        <entry>test for NULL</entry>
1197       </row>
1198
1199       <row>
1200        <entry><token>NOTNULL</token></entry>
1201        <entry></entry>
1202        <entry>test for NOT NULL</entry>
1203       </row>
1204
1205       <row>
1206        <entry>(any other)</entry>
1207        <entry>left</entry>
1208        <entry>all other native and user-defined operators</entry>
1209       </row>
1210
1211       <row>
1212        <entry><token>IN</token></entry>
1213        <entry></entry>
1214        <entry>set membership</entry>
1215       </row>
1216
1217       <row>
1218        <entry><token>BETWEEN</token></entry>
1219        <entry></entry>
1220        <entry>containment</entry>
1221       </row>
1222
1223       <row>
1224        <entry><token>OVERLAPS</token></entry>
1225        <entry></entry>
1226        <entry>time interval overlap</entry>
1227       </row>
1228
1229       <row>
1230        <entry><token>LIKE</token> <token>ILIKE</token></entry>
1231        <entry></entry>
1232        <entry>string pattern matching</entry>
1233       </row>
1234
1235       <row>
1236        <entry><token>&lt;</token> <token>&gt;</token></entry>
1237        <entry></entry>
1238        <entry>less than, greater than</entry>
1239       </row>
1240
1241       <row>
1242        <entry><token>=</token></entry>
1243        <entry>right</entry>
1244        <entry>equality, assignment</entry>
1245       </row>
1246
1247       <row>
1248        <entry><token>NOT</token></entry>
1249        <entry>right</entry>
1250        <entry>logical negation</entry>
1251       </row>
1252
1253       <row>
1254        <entry><token>AND</token></entry>
1255        <entry>left</entry>
1256        <entry>logical conjunction</entry>
1257       </row>
1258
1259       <row>
1260        <entry><token>OR</token></entry>
1261        <entry>left</entry>
1262        <entry>logical disjunction</entry>
1263       </row>
1264      </tbody>
1265     </tgroup>
1266    </table>
1267
1268    <para>
1269     Note that the operator precedence rules also apply to user-defined
1270     operators that have the same names as the built-in operators
1271     mentioned above.  For example, if you define a
1272     <quote>+</quote> operator for some custom data type it will have
1273     the same precedence as the built-in <quote>+</quote> operator, no
1274     matter what yours does.
1275    </para>
1276   </sect1>
1277
1278 </chapter>
1279
1280 <!-- Keep this comment at the end of the file
1281 Local variables:
1282 mode:sgml
1283 sgml-omittag:nil
1284 sgml-shorttag:t
1285 sgml-minimize-attributes:nil
1286 sgml-always-quote-attributes:t
1287 sgml-indent-step:1
1288 sgml-indent-data:t
1289 sgml-parent-document:nil
1290 sgml-default-dtd-file:"./reference.ced"
1291 sgml-exposed-tags:nil
1292 sgml-local-catalogs:("/usr/lib/sgml/catalog")
1293 sgml-local-ecat-files:nil
1294 End:
1295 -->