doc/src/sgml/syntax.sgml

   1 <!--
   2 $Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.50 2001/11/08 23:34:33 petere Exp $
   3 -->
   4
   5 <chapter id="sql-syntax">
   6  <title>SQL Syntax</title>
   7
   8  <indexterm zone="sql-syntax">
   9   <primary>syntax</primary>
  10   <secondary>SQL</secondary>
  11  </indexterm>
  12
  13   <abstract>
  14    <para>
  15     This chapter describes the syntax of SQL.
  16    </para>
  17   </abstract>
  18
  19  <sect1 id="sql-syntax-lexical">
  20   <title>Lexical Structure</title>
  21
  22   <para>
  23    SQL input consists of a sequence of
  24    <firstterm>commands</firstterm>.  A command is composed of a
  25    sequence of <firstterm>tokens</firstterm>, terminated by a
  26    semicolon (<quote>;</quote>).  The end of the input stream also
  27    terminates a command.  Which tokens are valid depends on the syntax
  28    of the particular command.
  29   </para>
  30
  31   <para>
  32    A token can be a <firstterm>key word</firstterm>, an
  33    <firstterm>identifier</firstterm>, a <firstterm>quoted
  34    identifier</firstterm>, a <firstterm>literal</firstterm> (or
  35    constant), or a special character symbol.  Tokens are normally
  36    separated by whitespace (space, tab, newline), but need not be if
  37    there is no ambiguity (which is generally only the case if a
  38    special character is adjacent to some other token type).
  39   </para>
  40
  41   <para>
  42    Additionally, <firstterm>comments</firstterm> can occur in SQL
  43    input.  They are not tokens; they are effectively equivalent to
  44    whitespace.
  45   </para>
  46
  47   <informalexample id="sql-syntax-ex-commands">
  48    <para>
  49     For example, the following is (syntactically) valid SQL input:
  50 <programlisting>
  51 SELECT * FROM MY_TABLE;
  52 UPDATE MY_TABLE SET A = 5;
  53 INSERT INTO MY_TABLE VALUES (3, 'hi there');
  54 </programlisting>
  55     This is a sequence of three commands, one per line (although this
  56     is not required; more than one command can be on a line, and
  57     commands can usefully be split across lines).
  58    </para>
  59   </informalexample>
  60
  61   <para>
  62    The SQL syntax is not very consistent regarding which tokens
  63    identify commands and which are operands or parameters.  The first
  64    few tokens are generally the command name, so in the above example
  65    we would usually speak of a <quote>SELECT</quote>, an
  66    <quote>UPDATE</quote>, and an <quote>INSERT</quote> command.  But
  67    for instance the <command>UPDATE</command> command always requires
  68    a <token>SET</token> token to appear in a certain position, and
  69    this particular variation of <command>INSERT</command> also
  70    requires a <token>VALUES</token> in order to be complete.  The
  71    precise syntax rules for each command are described in the
  72    <citetitle>Reference Manual</citetitle>.
  73   </para>
  74
  75   <sect2 id="sql-syntax-identifiers">
  76    <title>Identifiers and Key Words</title>
  77
  78    <indexterm zone="sql-syntax-identifiers">
  79     <primary>identifiers</primary>
  80    </indexterm>
  81
  82    <indexterm zone="sql-syntax-identifiers">
  83     <primary>key words</primary>
  84     <secondary>syntax</secondary>
  85    </indexterm>
  86
  87    <para>
  88     Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
  89     <token>VALUES</token> in the example above are examples of
  90     <firstterm>key words</firstterm>, that is, words that have a fixed
  91     meaning in the SQL language.  The tokens <token>MY_TABLE</token>
  92     and <token>A</token> are examples of
  93     <firstterm>identifiers</firstterm>.  They identify names of
  94     tables, columns, or other database objects, depending on the
  95     command they are used in.  Therefore they are sometimes simply
  96     called <quote>names</quote>.  Key words and identifiers have the
  97     same lexical structure, meaning that one cannot know whether a
  98     token is an identifier or a key word without knowing the language.
  99     A complete list of key words can be found in <xref
 100     linkend="sql-keywords-appendix">.
 101    </para>
 102
 103    <para>
 104     SQL identifiers and key words must begin with a letter
 105     (<literal>a</literal>-<literal>z</literal>) or underscore
 106     (<literal>_</literal>).  Subsequent characters in an identifier or
 107     key word can be letters, digits
 108     (<literal>0</literal>-<literal>9</literal>), or underscores,
 109     although the SQL standard will not define a key word that contains
 110     digits or starts or ends with an underscore.
 111    </para>
 112
 113    <para>
 114     The system uses no more than <symbol>NAMEDATALEN</symbol>-1
 115     characters of an identifier; longer names can be written in
 116     commands, but they will be truncated.  By default,
 117     <symbol>NAMEDATALEN</symbol> is 32 so the maximum identifier length
 118     is 31 (but at the time the system is built,
 119     <symbol>NAMEDATALEN</symbol> can be changed in
 120     <filename>src/include/postgres_ext.h</filename>).
 121    </para>
 122
 123    <para>
 124     <indexterm>
 125      <primary>case sensitivity</primary>
 126      <secondary>SQL commands</secondary>
 127     </indexterm>
 128     Identifier and key word names are case insensitive.  Therefore
 129 <programlisting>
 130 UPDATE MY_TABLE SET A = 5;
 131 </programlisting>
 132     can equivalently be written as
 133 <programlisting>
 134 uPDaTE my_TabLE SeT a = 5;
 135 </programlisting>
 136     A convention often used is to write key words in upper
 137     case and names in lower case, e.g.,
 138 <programlisting>
 139 UPDATE my_table SET a = 5;
 140 </programlisting>
 141    </para>
 142
 143    <para>
 144     <indexterm>
 145      <primary>quotes</primary>
 146      <secondary>and identifiers</secondary>
 147     </indexterm>
 148     There is a second kind of identifier:  the <firstterm>delimited
 149     identifier</firstterm> or <firstterm>quoted
 150     identifier</firstterm>.  It is formed by enclosing an arbitrary
 151     sequence of characters in double-quotes
 152     (<literal>"</literal>). <!-- " font-lock mania --> A delimited
 153     identifier is always an identifier, never a key word.  So
 154     <literal>"select"</literal> could be used to refer to a column or
 155     table named <quote>select</quote>, whereas an unquoted
 156     <literal>select</literal> would be taken as a key word and
 157     would therefore provoke a parse error when used where a table or
 158     column name is expected.  The example can be written with quoted
 159     identifiers like this:
 160 <programlisting>
 161 UPDATE "my_table" SET "a" = 5;
 162 </programlisting>
 163    </para>
 164
 165    <para>
 166     Quoted identifiers can contain any character other than a double
 167     quote itself.  This allows constructing table or column names that
 168     would otherwise not be possible, such as ones containing spaces or
 169     ampersands.  The length limitation still applies.
 170    </para>
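
   <para>
    As a brief sketch (the table and column names are hypothetical and
    chosen only for illustration), quoting permits names that contain
    spaces or ampersands:
<programlisting>
CREATE TABLE "order details" ("unit price" numeric, "R&amp;D code" text);
SELECT "unit price" FROM "order details";
</programlisting>
    Such names must be quoted every time they are used, because the
    unquoted spelling would be read as several separate tokens.
   </para>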
 171
 172    <para>
 173     Quoting an identifier also makes it case-sensitive, whereas
 174     unquoted names are always folded to lower case.  For example, the
 175     identifiers <literal>FOO</literal>, <literal>foo</literal> and
 176     <literal>"foo"</literal> are considered the same by
 177     <productname>Postgres</productname>, but <literal>"Foo"</literal>
 178     and <literal>"FOO"</literal> are different from these three and
 179     each other.
 180     <footnote>
 181      <para>
 182       The folding of unquoted names to lower case in <productname>PostgreSQL</>
 183       is incompatible with the SQL standard, which says that unquoted
 184       names should be folded to upper case.  Thus, <literal>foo</literal>
 185       should be equivalent to <literal>"FOO"</literal> not
 186       <literal>"foo"</literal> according to the standard.  If you want to
 187       write portable applications you are advised to always quote a particular
 188       name or never quote it.
 189      </para>
 190     </footnote>
 191    </para>
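
   <para>
    The folding rule can be illustrated with a short sketch.  The
    table name is hypothetical, and the comments assume the
    lower-case folding described above:
<programlisting>
CREATE TABLE foo (a integer);
SELECT a FROM FOO;      -- unquoted, folded to lower case: refers to foo
SELECT a FROM "foo";    -- quoted lower case: refers to the same table
SELECT a FROM "Foo";    -- a different name; fails unless a table "Foo" exists
</programlisting>
   </para>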
 192   </sect2>
 193
 194
 195   <sect2 id="sql-syntax-constants">
 196    <title>Constants</title>
 197
 198    <indexterm zone="sql-syntax-constants">
 199     <primary>constants</primary>
 200    </indexterm>
 201
 202    <para>
 203     There are four kinds of <firstterm>implicitly typed
 204     constants</firstterm> in <productname>Postgres</productname>:
 205     strings, bit strings, integers, and floating point numbers.
 206     Constants can also be specified with explicit types, which can
 207     enable more accurate representation and more efficient handling by
 208     the system. The implicit constants are described below; explicit
 209     constants are discussed afterwards.
 210    </para>
 211
 212    <sect3 id="sql-syntax-strings">
 213     <title>String Constants</title>
 214
 215     <indexterm zone="sql-syntax-strings">
 216      <primary>character strings</primary>
 217      <secondary>constants</secondary>
 218     </indexterm>
 219
 220     <para>
 221      <indexterm>
 222       <primary>quotes</primary>
 223       <secondary>escaping</secondary>
 224      </indexterm>
 225      A string constant in SQL is an arbitrary sequence of characters
 226      bounded by single quotes (<quote>'</quote>), e.g., <literal>'This
 227      is a string'</literal>.  SQL allows single quotes to be embedded
 228      in strings by typing two adjacent single quotes (e.g.,
 229      <literal>'Dianne''s horse'</literal>).  In
 230      <productname>Postgres</productname> single quotes may
 231      alternatively be escaped with a backslash (<quote>\</quote>,
 232      e.g., <literal>'Dianne\'s horse'</literal>).
 233     </para>
 234
 235     <para>
 236      C-style backslash escapes are also available:
 237      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
 238      form feed, <literal>\n</literal> is a newline,
 239      <literal>\r</literal> is a carriage return, <literal>\t</literal>
 240      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
 241      where <replaceable>xxx</replaceable> is an octal number, is the
 242      character with the corresponding ASCII code.  Any other character
 243      following a backslash is taken literally.  Thus, to include a
 244      backslash in a string constant, type two backslashes.
 245     </para>
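
    <para>
     A few illustrative constants follow.  The comments describe the
     result expected under the escape rules given above; they are a
     sketch, not output captured from a server:
<programlisting>
SELECT 'first line\nsecond line';   -- contains a newline
SELECT 'C:\\temp';                  -- contains a single backslash
SELECT '\101';                      -- octal 101, the ASCII code for A
</programlisting>
    </para>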
 246
 247     <para>
 248      The character with the code zero cannot be in a string constant.
 249     </para>
 250
 251     <para>
 252      Two string constants that are only separated by whitespace
 253      <emphasis>with at least one newline</emphasis> are concatenated
 254      and effectively treated as if the string had been written in one
 255      constant.  For example:
 256 <programlisting>
 257 SELECT 'foo'
 258 'bar';
 259 </programlisting>
 260      is equivalent to
 261 <programlisting>
 262 SELECT 'foobar';
 263 </programlisting>
 264      but
 265 <programlisting>
 266 SELECT 'foo'      'bar';
 267 </programlisting>
 268      is not valid syntax.
 269     </para>
 270    </sect3>
 271
 272    <sect3 id="sql-syntax-bit-strings">
 273     <title>Bit String Constants</title>
 274
 275     <indexterm zone="sql-syntax-bit-strings">
 276      <primary>bit strings</primary>
 277      <secondary>constants</secondary>
 278     </indexterm>
 279
 280     <para>
 281      Bit string constants look like string constants with a
 282      <literal>B</literal> (upper or lower case) immediately before the
 283      opening quote (no intervening whitespace), e.g.,
 284      <literal>B'1001'</literal>.  The only characters allowed within
 285      bit string constants are <literal>0</literal> and
 286      <literal>1</literal>.  Bit string constants can be continued
 287      across lines in the same way as regular string constants.
 288     </para>
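
    <para>
     As a small sketch of the continuation rule, the second command
     below is expected to be read the same way as
     <literal>B'10010110'</literal>, assuming that the continuation
     works exactly as it does for regular string constants:
<programlisting>
SELECT B'1001';
SELECT B'1001'
'0110';
</programlisting>
    </para>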
 289    </sect3>
 290
 291    <sect3>
 292     <title>Integer Constants</title>
 293
 294     <para>
 295      Integer constants in SQL are sequences of decimal digits (0
 296      through 9) with no decimal point.  The range of legal values
 297      depends on which integer data type is used, but the plain
 298      <type>integer</type> type accepts values ranging from -2147483648
 299      to +2147483647.  (The optional plus or minus sign is actually a
 300      separate unary operator and not part of the integer constant.)
 301     </para>
 302    </sect3>
 303
 304    <sect3>
 305     <title>Floating Point Constants</title>
 306
 307     <indexterm>
 308      <primary>floating point</primary>
 309      <secondary>constants</secondary>
 310     </indexterm>
 311
 312     <para>
 313      Floating point constants are accepted in these general forms:
 314 <synopsis>
 315 <replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
 316 <optional><replaceable>digits</replaceable></optional>.<replaceable>digits</replaceable><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
 317 <replaceable>digits</replaceable>e<optional>+-</optional><replaceable>digits</replaceable>
 318 </synopsis>
 319      where <replaceable>digits</replaceable> is one or more decimal
 320      digits.  At least one digit must be before or after the decimal
 321      point, and after the <literal>e</literal> if you use that option.
 322      Thus, a floating point constant is distinguished from an integer
 323      constant by the presence of either the decimal point or the
 324      exponent clause (or both).  There must not be a space or other
 325      characters embedded in the constant.
 326     </para>
 327
 328     <informalexample>
 329      <para>
 330       These are some examples of valid floating point constants:
 331 <literallayout>
 332 3.5
 333 4.
 334 .001
 335 5e2
 336 1.925e-3
 337 </literallayout>
 338      </para>
 339     </informalexample>
 340
 341     <para>
 342      Floating point constants are of type <type>DOUBLE
 343      PRECISION</type>. <type>REAL</type> can be specified explicitly
 344      by using <acronym>SQL</acronym> string notation or
 345      <productname>Postgres</productname> type notation:
 346
 347 <programlisting>
 348 REAL '1.23'  -- string style
 349 '1.23'::REAL -- Postgres (historical) style
 350 </programlisting>
 351     </para>
 352    </sect3>
 353
 354    <sect3 id="sql-syntax-constants-generic">
 355     <title>Constants of Other Types</title>
 356
 357     <indexterm>
 358      <primary>data types</primary>
 359      <secondary>constants</secondary>
 360     </indexterm>
 361
 362     <para>
 363      A constant of an <emphasis>arbitrary</emphasis> type can be
 364      entered using any one of the following notations:
 365 <synopsis>
 366 <replaceable>type</replaceable> '<replaceable>string</replaceable>'
 367 '<replaceable>string</replaceable>'::<replaceable>type</replaceable>
 368 CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 369 </synopsis>
 370      The value inside the string is passed to the input conversion
 371      routine for the type called <replaceable>type</replaceable>. The
 372      result is a constant of the indicated type.  The explicit type
 373      cast may be omitted if there is no ambiguity as to the type the
 374      constant must be (for example, when it is passed as an argument
 375      to a non-overloaded function), in which case it is automatically
 376      coerced.
 377     </para>
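
    <para>
     For illustration, the three notations can be applied to the same
     string.  The <type>date</type> type and the value are chosen
     arbitrarily; each command is expected to yield the same date
     constant:
<programlisting>
SELECT date '2001-11-08';
SELECT '2001-11-08'::date;
SELECT CAST('2001-11-08' AS date);
</programlisting>
    </para>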
 378
 379     <para>
 380      It is also possible to specify a type coercion using a function-like
 381      syntax:
 382 <synopsis>
 383 <replaceable>typename</replaceable> ( <replaceable>value</replaceable> )
 384 </synopsis>
 385      although this only works for types whose names are also valid as
 386      function names.  (For example, <literal>double precision</literal>
 387      can't be used this way --- but the equivalent <literal>float8</literal>
 388      can.)
 389     </para>
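
    <para>
     A minimal sketch of the function-like form, using the
     <literal>float8</literal> name mentioned above with an arbitrary
     value:
<programlisting>
SELECT float8(42);
</programlisting>
     The same coercion can be written as
     <literal>CAST(42 AS double precision)</literal>, which does not
     depend on the type name being valid as a function name.
    </para>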
 390
 391     <para>
 392      The <literal>::</literal>, <literal>CAST()</literal>, and
 393      function-call syntaxes can also be used to specify the type of
 394      arbitrary expressions, but the form
 395      <replaceable>type</replaceable>
 396      '<replaceable>string</replaceable>' can only be used to specify
 397      the type of a literal constant.
 398     </para>
 399    </sect3>
 400
 401    <sect3>
 402     <title>Array Constants</title>
 403
 404     <indexterm>
 405      <primary>arrays</primary>
 406      <secondary>constants</secondary>
 407     </indexterm>
 408
 409     <para>
 410      The general format of an array constant is the following:
 411 <synopsis>
 412 '{ <replaceable>val1</replaceable> <replaceable>delim</replaceable> <replaceable>val2</replaceable> <replaceable>delim</replaceable> ... }'
 413 </synopsis>
 414      where <replaceable>delim</replaceable> is the delimiter character
 415      for the type, as recorded in its <literal>pg_type</literal>
 416      entry.  (For all built-in types, this is the comma character
 417      <quote><literal>,</literal></>.)  Each <replaceable>val</replaceable> is either a constant
 418      of the array element type, or a sub-array.  An example of an
 419      array constant is
 420 <programlisting>
 421 '{{1,2,3},{4,5,6},{7,8,9}}'
 422 </programlisting>
 423      This constant is a two-dimensional, 3 by 3 array consisting of three
 424      sub-arrays of integers.
 425     </para>
 426
 427     <para>
 428      Individual array elements can be placed between double-quote
 429      marks (<literal>"</literal>) <!-- " --> to avoid ambiguity
 430      problems with respect to white space.  Without quote marks, the
 431      array-value parser will skip leading white space.
 432     </para>
 433
 434     <para>
 435      (Array constants are actually only a special case of the generic
 436      type constants discussed in the previous section.  The constant
 437      is initially treated as a string and passed to the array input
 438      conversion routine.  An explicit type specification might be
 439      necessary.)
 440     </para>
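
    <para>
     The following sketch shows array constants in use.  The tables
     and columns are hypothetical.  Because each constant is assigned
     to a column of known array type, no explicit type specification
     should be needed here:
<programlisting>
CREATE TABLE grid (cells integer[][]);
INSERT INTO grid VALUES ('{{1,2,3},{4,5,6},{7,8,9}}');

CREATE TABLE notes (tags text[]);
INSERT INTO notes VALUES ('{"  leading spaces kept", plain}');
</programlisting>
     In the second constant the double-quoted element keeps its
     leading white space, whereas the white space before
     <literal>plain</literal> is skipped.
    </para>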
 441    </sect3>
 442   </sect2>
 443
 444
 445   <sect2 id="sql-syntax-operators">
 446    <title>Operators</title>
 447
 448    <indexterm zone="sql-syntax-operators">
 449     <primary>operators</primary>
 450     <secondary>syntax</secondary>
 451    </indexterm>
 452
 453    <para>
 454     An operator is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
 455     (31 by default) characters from the following list:
 456 <literallayout>
 457 + - * / &lt; &gt; = ~ ! @ # % ^ &amp; | ` ? $
 458 </literallayout>
 459
 460     There are a few restrictions on operator names, however:
 461     <itemizedlist>
 462      <listitem>
 463       <para>
 464        <literal>$</> (dollar) cannot be a single-character operator, although it
 465        can be part of a multiple-character operator name.
 466       </para>
 467      </listitem>
 468
 469      <listitem>
 470       <para>
 471        <literal>--</literal> and <literal>/*</literal> cannot appear
 472        anywhere in an operator name, since they will be taken as the
 473        start of a comment.
 474       </para>
 475      </listitem>
 476
 477      <listitem>
 478       <para>
 479        A multiple-character operator name cannot end in <literal>+</> or <literal>-</>,
 480        unless the name also contains at least one of these characters:
 481 <literallayout>
 482 ~ ! @ # % ^ &amp; | ` ? $
 483 </literallayout>
 484        For example, <literal>@-</literal> is an allowed operator name,
 485        but <literal>*-</literal> is not.  This restriction allows
 486        <productname>Postgres</productname> to parse SQL-compliant
 487        queries without requiring spaces between tokens.
 488       </para>
 489      </listitem>
 490     </itemizedlist>
 491    </para>
 492
 493    <para>
 494     When working with non-SQL-standard operator names, you will usually
 495     need to separate adjacent operators with spaces to avoid ambiguity.
 496     For example, if you have defined a left-unary operator named <literal>@</literal>,
 497     you cannot write <literal>X*@Y</literal>; you must write
 498     <literal>X* @Y</literal> to ensure that
 499     <productname>Postgres</productname> reads it as two operator names
 500     not one.
 501    </para>
 502   </sect2>
 503
 504   <sect2>
 505    <title>Special Characters</title>
 506
 507   <para>
 508    Some characters that are not alphanumeric have a special meaning
 509    that is different from being an operator.  Details on the usage can
 510    be found at the location where the respective syntax element is
 511    described.  This section exists only to point out the existence and
 512    summarize the purposes of these characters.
 513
 514    <itemizedlist>
 515     <listitem>
 516      <para>
 517       A dollar sign (<literal>$</literal>) followed by digits is used
 518       to represent the positional parameters in the body of a function
 519       definition.  In other contexts the dollar sign may be part of an
 520       operator name.
 521      </para>
 522     </listitem>
 523
 524     <listitem>
 525      <para>
 526       Parentheses (<literal>()</literal>) have their usual meaning to
 527       group expressions and enforce precedence.  In some cases
 528       parentheses are required as part of the fixed syntax of a
 529       particular SQL command.
 530      </para>
 531     </listitem>
 532
 533     <listitem>
 534      <para>
 535       Brackets (<literal>[]</literal>) are used to select the elements
 536       of an array.  See <xref linkend="arrays"> for more information
 537       on arrays.
 538      </para>
 539     </listitem>
 540
 541     <listitem>
 542      <para>
 543       Commas (<literal>,</literal>) are used in some syntactical
 544       constructs to separate the elements of a list.
 545      </para>
 546     </listitem>
 547
 548     <listitem>
 549      <para>
 550       The semicolon (<literal>;</literal>) terminates an SQL command.
 551       It cannot appear anywhere within a command, except within a
 552       string constant or quoted identifier.
 553      </para>
 554     </listitem>
 555
 556     <listitem>
 557      <para>
 558       The colon (<literal>:</literal>) is used to select
 559       <quote>slices</quote> from arrays. (See <xref
 560       linkend="arrays">.)  In certain SQL dialects (such as Embedded
 561       SQL), the colon is used to prefix variable names.
 562      </para>
 563     </listitem>
 564
 565     <listitem>
 566      <para>
 567       The asterisk (<literal>*</literal>) has a special meaning when
 568       used in the <command>SELECT</command> command or with the
 569       <function>COUNT</function> aggregate function.
 570      </para>
 571     </listitem>
 572
 573     <listitem>
 574      <para>
 575       The period (<literal>.</literal>) is used in floating point
 576       constants, and to separate table and column names.
 577      </para>
 578     </listitem>
 579    </itemizedlist>
 580
 581    </para>
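
  <para>
   A few of these characters are shown in context below.  The tables
   <literal>my_table</literal> and <literal>samples</literal> and the
   array column <literal>readings</literal> are hypothetical names
   used only for illustration:
<programlisting>
SELECT count(*) FROM my_table;                    -- asterisk with COUNT
SELECT my_table.a FROM my_table;                  -- period separates table and column
SELECT readings[1], readings[2:3] FROM samples;   -- brackets select elements, colon selects a slice
</programlisting>
  </para>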
 582   </sect2>
 583
 584   <sect2 id="sql-syntax-comments">
 585    <title>Comments</title>
 586
 587    <indexterm zone="sql-syntax-comments">
 588     <primary>comments</primary>
 589     <secondary>in SQL</secondary>
 590    </indexterm>
 591
 592    <para>
 593     A comment is an arbitrary sequence of characters beginning with
 594     double dashes and extending to the end of the line, e.g.:
 595 <programlisting>
 596 -- This is a standard SQL92 comment
 597 </programlisting>
 598    </para>
 599
 600    <para>
 601     Alternatively, C-style block comments can be used:
 602 <programlisting>
 603 /* multiline comment
 604  * with nesting: /* nested block comment */
 605  */
 606 </programlisting>
 607     where the comment begins with <literal>/*</literal> and extends to
 608     the matching occurrence of <literal>*/</literal>. These block
 609     comments nest, as specified in SQL99 but unlike C, so that one can
 610     comment out larger blocks of code that may contain existing block
 611     comments.
 612    </para>
 613
 614    <para>
 615     A comment is removed from the input stream before further syntax
 616     analysis and is effectively replaced by whitespace.
 617    </para>
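
   <para>
    Because a comment is treated like whitespace, it can appear
    anywhere whitespace is allowed between tokens.  A small sketch,
    reusing the table from the earlier examples:
<programlisting>
SELECT a      -- the column to fetch
  FROM /* a block comment between two tokens */ my_table;
</programlisting>
   </para>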
 618   </sect2>
 619  </sect1>
 620
 621
 622   <sect1 id="sql-syntax-columns">
 623    <title>Columns</title>
 624
 625     <para>
 626      A <firstterm>column</firstterm>
 627      is either a user-defined column of a given table or one of the
 628      following system-defined columns:
 629
 630      <indexterm>
 631       <primary>columns</primary>
 632       <secondary>system columns</secondary>
 633      </indexterm>
 634
 635      <variablelist>
 636       <varlistentry>
 637        <term><structfield>oid</></term>
 638        <listitem>
 639         <para>
 640          <indexterm>
 641           <primary>OID</primary>
 642          </indexterm>
 643          The object identifier (object ID) of a row.  This is a serial number
 644          that is automatically added by Postgres to all table rows (unless
 645          the table was created WITHOUT OIDS, in which case this column is
 646          not present).
 647         </para>
 648        </listitem>
 649       </varlistentry>
 650
 651       <varlistentry>
 652       <term><structfield>tableoid</></term>
 653        <listitem>
 654         <para>
 655          The OID of the table containing this row.  This attribute is
 656          particularly handy for queries that select from inheritance
 657          hierarchies, since without it, it's difficult to tell which
 658          individual table a row came from.  The
 659          <structfield>tableoid</structfield> can be joined against the
 660          <structfield>oid</structfield> column of
 661          <classname>pg_class</classname> to obtain the table name.
 662         </para>
 663        </listitem>
 664       </varlistentry>
 665
 666       <varlistentry>
 667        <term><structfield>xmin</></term>
 668        <listitem>
 669         <para>
 670          The identity (transaction ID) of the inserting transaction for
 671          this tuple.  (Note: a tuple is an individual state of a row;
 672          each UPDATE of a row creates a new tuple for the same logical row.)
 673         </para>
 674        </listitem>
 675       </varlistentry>
 676
 677       <varlistentry>
 678       <term><structfield>cmin</></term>
 679        <listitem>
 680         <para>
 681          The command identifier (starting at zero) within the inserting
 682          transaction.
 683         </para>
 684        </listitem>
 685       </varlistentry>
 686
 687       <varlistentry>
 688       <term><structfield>xmax</></term>
 689        <listitem>
 690         <para>
 691          The identity (transaction ID) of the deleting transaction,
 692          or zero for an undeleted tuple.  It is possible for this field
 693          to be nonzero in a visible tuple: that usually indicates that the
 694          deleting transaction hasn't committed yet, or that an attempted
 695          deletion was rolled back.
 696         </para>
 697        </listitem>
 698       </varlistentry>
 699
 700       <varlistentry>
 701       <term><structfield>cmax</></term>
 702        <listitem>
 703         <para>
 704          The command identifier within the deleting transaction, or zero.
 705         </para>
 706        </listitem>
 707       </varlistentry>
 708
 709       <varlistentry>
 710       <term><structfield>ctid</></term>
 711        <listitem>
 712         <para>
 713          The tuple ID of the tuple within its table.  This is a pair
 714          (block number, tuple index within block) that identifies the
 715          physical location of the tuple.  Note that although the <structfield>ctid</structfield>
 716          can be used to locate the tuple very quickly, a row's <structfield>ctid</structfield>
 717          will change each time it is updated or moved by <command>VACUUM
 718          FULL</>.
 719          Therefore <structfield>ctid</structfield> is useless as a long-term row identifier.
 720          The OID, or even better a user-defined serial number, should
 721          be used to identify logical rows.
 722         </para>
 723        </listitem>
 724       </varlistentry>
 725      </variablelist>
 726     </para>
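
    <para>
     As a sketch, the system columns can be inspected by naming them
     explicitly, since they are not included in the expansion of
     <literal>*</literal>.  The table <literal>my_table</literal> and
     its column <literal>a</literal> are hypothetical, and the table
     is assumed to have been created with OIDs.  The second query
     applies the join against <classname>pg_class</classname>
     described above to obtain the table name for each row:
<programlisting>
SELECT oid, tableoid, xmin, cmin, xmax, cmax, ctid, * FROM my_table;

SELECT c.relname, t.a
    FROM my_table t, pg_class c
    WHERE t.tableoid = c.oid;
</programlisting>
    </para>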
 727
 728     <para>
 729      OIDs are 32-bit quantities and are assigned from a single cluster-wide
 730      counter.  In a large or long-lived database, it is possible for the
 731      counter to wrap around.  Hence, it is bad practice to assume that OIDs
 732      are unique, unless you take steps to ensure that they are unique.
 733      Recommended practice when using OIDs for row identification is to create
 734      a unique index on the OID column of each table for which the OID will be
 735      used.  Never assume that OIDs are unique across tables; use the
 736      combination of <structfield>tableoid</> and row OID if you need a database-wide
 737      identifier.  (Future releases of Postgres are likely to use a separate
 738      OID counter for each table, so that <structfield>tableoid</> <emphasis>must</> be
 739      included to arrive at a globally unique identifier.)
 740     </para>
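
    <para>
     The recommended practice might look like the following sketch,
     in which the table and index names are hypothetical:
<programlisting>
-- Enforce uniqueness of OIDs within one table
CREATE UNIQUE INDEX my_table_oid_idx ON my_table (oid);

-- Pair tableoid with the row OID when a database-wide identifier is needed
SELECT tableoid, oid FROM my_table;
</programlisting>
    </para>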
 741
 742     <para>
 743      Transaction identifiers are 32-bit quantities.  In a long-lived
 744      database it is possible for transaction IDs to wrap around.  This
 745      is not a fatal problem given appropriate maintenance procedures;
 746      see the Administrator's Guide for details.  However, it is unwise
 747      to depend on uniqueness of transaction IDs over the long term
 748      (more than one billion transactions).
 749     </para>
 750
 751     <para>
 752      Command identifiers are also 32-bit quantities.  This creates a hard
 753      limit of 2^32 (4 billion) SQL commands within a single transaction.
 754      In practice this limit is not a problem: it applies to the number
 755      of SQL commands, not to the number of tuples processed.
 756     </para>
 757
 758     <para>
 759      For further information on the system attributes consult
 760      <xref linkend="STON87a">.
 761     </para>
 762
 763   </sect1>
 764
 765
 766  <sect1 id="sql-expressions">
 767   <title>Value Expressions</title>
 768
 769   <para>
 770    Value expressions are used in a variety of contexts, such
 771    as in the target list of the <command>SELECT</command> command, as
 772    new column values in <command>INSERT</command> or
 773    <command>UPDATE</command>, or in search conditions in a number of
 774    commands.  The result of a value expression is sometimes called a
 775    <firstterm>scalar</firstterm>, to distinguish it from the result of
 776    a table expression (which is a table).  Value expressions are
 777    therefore also called <firstterm>scalar expressions</firstterm> (or
 778    even simply <firstterm>expressions</firstterm>).  The expression
 779    syntax allows the calculation of values from primitive parts using
 780    arithmetic, logical, set, and other operations.
 781   </para>
 782
 783   <para>
 784    A value expression is one of the following:
 785
 786    <itemizedlist>
 787     <listitem>
 788      <para>
 789       A constant or literal value; see <xref linkend="sql-syntax-constants">.
 790      </para>
 791     </listitem>
 792
 793     <listitem>
 794      <para>
 795       A column reference
 796      </para>
 797     </listitem>
 798
 799     <listitem>
 800      <para>
 801       An operator invocation:
 802       <simplelist>
 803        <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> <replaceable>expression</replaceable> (binary infix operator)</member>
 804        <member><replaceable>operator</replaceable> <replaceable>expression</replaceable> (unary prefix operator)</member>
 805        <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
 806       </simplelist>
 807       where <replaceable>operator</replaceable> follows the syntax
 808       rules of <xref linkend="sql-syntax-operators"> or is one of the
 809       tokens <token>AND</token>, <token>OR</token>, and
 810       <token>NOT</token>.  Which particular operators exist and whether
 811       they are unary or binary depends on what operators have been
 812       defined by the system or the user.  <xref linkend="functions">
 813       describes the built-in operators.
 814      </para>
 815     </listitem>
 816
 817     <listitem>
 818 <synopsis>( <replaceable>expression</replaceable> )</synopsis>
 819      <para>
 820       Parentheses are used to group subexpressions and override precedence.
 821      </para>
 822     </listitem>
 823
 824     <listitem>
 825      <para>
      A positional parameter reference, in the body of a function definition.
 827      </para>
 828     </listitem>
 829
 830     <listitem>
 831      <para>
      A function call.
 833      </para>
 834     </listitem>
 835
 836     <listitem>
 837      <para>
      An aggregate expression.
 839      </para>
 840     </listitem>
 841
 842     <listitem>
 843      <para>
 844       A scalar subquery.  This is an ordinary
 845       <command>SELECT</command> in parentheses that returns exactly one
 846       row with one column.  It is an error to use a subquery that
 847       returns more than one row or more than one column in the context
 848       of a value expression.
 849      </para>
 850     </listitem>
 851    </itemizedlist>
 852   </para>
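
  <para>
   As a rough sketch of the scalar subquery form listed above, the
   following query uses a parenthesized <command>SELECT</command> as a
   value expression.  The tables <literal>states</literal> and
   <literal>cities</literal> and their columns are hypothetical and
   serve only as an illustration:
<programlisting>
-- Hypothetical tables: states(name), cities(state, pop).
-- For each row of states, the subquery yields one row with one column,
-- because max() without GROUP BY always produces a single value.
SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
    FROM states;
</programlisting>
   A subquery that could return several rows or several columns in
   this position would be rejected, as noted above.
  </para>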
 853
 854   <para>
 855    In addition to this list, there are a number of constructs that can
 856    be classified as an expression but do not follow any general syntax
 857    rules.  These generally have the semantics of a function or
 858    operator and are explained in the appropriate location in <xref
 859    linkend="functions">.  An example is the <literal>IS NULL</literal>
 860    clause.
 861   </para>
 862
 863   <para>
 864    We have already discussed constants in <xref
 865    linkend="sql-syntax-constants">.  The following sections discuss
 866    the remaining options.
 867   </para>
 868
 869   <sect2>
 870    <title>Column References</title>
 871
 872    <para>
 873     A column can be referenced in the form:
 874 <synopsis>
 875 <replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
 876 </synopsis>
 877
    <replaceable>correlation</replaceable> is the name of a table, an
    alias for a table defined by means of a <literal>FROM</literal>
    clause, or one of the key words <literal>NEW</literal> and
    <literal>OLD</literal>.  (<literal>NEW</literal> and
    <literal>OLD</literal> can only appear in the action portion of a
    rule, while other correlation names can be used in any SQL
    statement.)  The correlation name and the separating dot can be
    omitted if the column name is unique across all the tables being
    used in the current query.  If
    <replaceable>columnname</replaceable> is of an array type, then the
    optional <replaceable>subscript</replaceable> selects a specific
    element in the array.  If no subscript is provided, then the whole
    array is selected.  Refer to the description of the particular
    commands in the <citetitle>PostgreSQL Reference Manual</citetitle>
    for the allowed syntax in each case.
 891    </para>
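
   <para>
    To illustrate, the queries below sketch the cases just described.
    The table <literal>emp</literal> and its columns are hypothetical;
    <literal>pay_by_quarter</literal> is assumed to be an array column.
<programlisting>
-- Hypothetical table: emp(name text, dept text, pay_by_quarter int4[])
SELECT emp.name FROM emp;                 -- qualified by the table name
SELECT e.name FROM emp e;                 -- qualified by an alias from FROM
SELECT e.pay_by_quarter[1] FROM emp e;    -- one element of the array column
SELECT e.pay_by_quarter FROM emp e;       -- no subscript: the whole array
SELECT name FROM emp;                     -- correlation omitted; name is unambiguous
</programlisting>
   </para>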
 892   </sect2>
 893
 894   <sect2>
 895    <title>Positional Parameters</title>
 896
 897    <para>
    A positional parameter reference denotes an argument of an SQL
    function by its position in the argument list.  It is typically
    used in the body of an SQL function definition.  The form of a
    parameter reference is:
 901 <synopsis>
 902 $<replaceable>number</replaceable>
 903 </synopsis>
 904    </para>
 905
 906    <para>
 907     For example, consider the definition of a function,
 908     <function>dept</function>, as
 909
 910 <programlisting>
 911 CREATE FUNCTION dept (text) RETURNS dept
 912   AS 'SELECT * FROM dept WHERE name = $1'
 913   LANGUAGE 'sql';
 914 </programlisting>
 915
 916     Here the <literal>$1</literal> will be replaced by the first
 917     function argument when the function is invoked.
 918    </para>
 919   </sect2>
 920
 921   <sect2>
 922    <title>Function Calls</title>
 923
 924    <para>
 925     The syntax for a function call is the name of a function
 926     (which is subject to the syntax rules for identifiers of <xref
 927     linkend="sql-syntax-identifiers">), followed by its argument list
 928     enclosed in parentheses:
 929
 930 <synopsis>
 931 <replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
 932 </synopsis>
 933    </para>
 934
 935    <para>
 936     For example, the following computes the square root of 2:
 937 <programlisting>
 938 sqrt(2)
 939 </programlisting>
 940    </para>
 941
 942    <para>
 943     The list of built-in functions is in <xref linkend="functions">.
 944     Other functions may be added by the user.
 945    </para>
 946   </sect2>
 947
 948   <sect2 id="syntax-aggregates">
 949    <title>Aggregate Expressions</title>
 950
 951    <indexterm zone="syntax-aggregates">
 952     <primary>aggregate functions</primary>
 953    </indexterm>
 954
 955    <para>
 956     An <firstterm>aggregate expression</firstterm> represents the
 957     application of an aggregate function across the rows selected by a
 958     query.  An aggregate function reduces multiple inputs to a single
 959     output value, such as the sum or average of the inputs.  The
 960     syntax of an aggregate expression is one of the following:
 961
 962     <simplelist>
 963      <member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member>
 964      <member><replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable>)</member>
 965      <member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member>
 966      <member><replaceable>aggregate_name</replaceable> ( * )</member>
 967     </simplelist>
 968
 969     where <replaceable>aggregate_name</replaceable> is a previously
 970     defined aggregate, and <replaceable>expression</replaceable> is
 971     any expression that does not itself contain an aggregate
 972     expression.
 973    </para>
 974
 975    <para>
 976     The first form of aggregate expression invokes the aggregate
 977     across all input rows for which the given expression yields a
    non-NULL value.  (Strictly speaking, it is up to the aggregate function
    whether to ignore NULLs, but all the standard ones do.)
 980     The second form is the same as the first, since
 981     <literal>ALL</literal> is the default.  The third form invokes the
 982     aggregate for all distinct non-NULL values of the expression found
 983     in the input rows.  The last form invokes the aggregate once for
 984     each input row regardless of NULL or non-NULL values; since no
 985     particular input value is specified, it is generally only useful
 986     for the <function>count()</function> aggregate function.
 987    </para>
 988
 989    <para>
 990     For example, <literal>count(*)</literal> yields the total number
 991     of input rows; <literal>count(f1)</literal> yields the number of
 992     input rows in which <literal>f1</literal> is non-NULL;
 993     <literal>count(distinct f1)</literal> yields the number of
 994     distinct non-NULL values of <literal>f1</literal>.
 995    </para>
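
   <para>
    The following sketch shows these forms side by side.  The table
    <literal>t</literal> and its contents are hypothetical and chosen
    only to make the differences visible:
<programlisting>
-- Hypothetical table t with a single column f1 holding: 1, 1, 2, NULL
SELECT count(*),             -- 4: every input row
       count(f1),            -- 3: rows in which f1 is non-NULL
       count(ALL f1),        -- 3: same as count(f1)
       count(DISTINCT f1)    -- 2: distinct non-NULL values (1 and 2)
    FROM t;
</programlisting>
   </para>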
 996
 997    <para>
 998     The predefined aggregate functions are described in <xref
 999     linkend="functions-aggregate">.  Other aggregate functions may be added
1000     by the user.
1001    </para>
1002   </sect2>
1003
1004  </sect1>
1005
1006
1007   <sect1 id="sql-precedence">
1008    <title>Lexical Precedence</title>
1009
1010    <indexterm zone="sql-precedence">
1011     <primary>operators</primary>
1012     <secondary>precedence</secondary>
1013    </indexterm>
1014
1015    <para>
    The precedence and associativity of the operators are hard-wired
    into the parser.  Most operators have the same precedence and are
    left-associative.  This may lead to non-intuitive behavior; for
    example, the comparison operators <literal>&lt;</> and <literal>&gt;</> have a different
    precedence than the comparison operators <literal>&lt;=</> and <literal>&gt;=</>.  Also,
    you will sometimes need to add parentheses when using combinations
    of binary and unary operators.  For instance,
1023 <programlisting>
1024 SELECT 5 ! - 6;
1025 </programlisting>
1026    will be parsed as
1027 <programlisting>
1028 SELECT 5 ! (- 6);
1029 </programlisting>
    because the parser does not find out until it is too late that
    <token>!</token> is defined as a postfix operator, not an infix one.
    To get the desired behavior in this case, you must write
1033 <programlisting>
1034 SELECT (5 !) - 6;
1035 </programlisting>
1036     This is the price one pays for extensibility.
1037    </para>
1038
1039    <table tocentry="1">
1040     <title>Operator Precedence (decreasing)</title>
1041
1042     <tgroup cols="3">
1043      <thead>
1044       <row>
1045        <entry>Operator/Element</entry>
1046        <entry>Associativity</entry>
1047        <entry>Description</entry>
1048       </row>
1049      </thead>
1050
1051      <tbody>
1052       <row>
1053        <entry><token>::</token></entry>
1054        <entry>left</entry>
       <entry><productname>PostgreSQL</productname>-style typecast</entry>
1056       </row>
1057
1058       <row>
1059        <entry><token>[</token> <token>]</token></entry>
1060        <entry>left</entry>
1061        <entry>array element selection</entry>
1062       </row>
1063
1064       <row>
1065        <entry><token>.</token></entry>
1066        <entry>left</entry>
1067        <entry>table/column name separator</entry>
1068       </row>
1069
1070       <row>
1071        <entry><token>-</token></entry>
1072        <entry>right</entry>
1073        <entry>unary minus</entry>
1074       </row>
1075
1076       <row>
1077        <entry><token>^</token></entry>
1078        <entry>left</entry>
1079        <entry>exponentiation</entry>
1080       </row>
1081
1082       <row>
1083        <entry><token>*</token> <token>/</token> <token>%</token></entry>
1084        <entry>left</entry>
1085        <entry>multiplication, division, modulo</entry>
1086       </row>
1087
1088       <row>
1089        <entry><token>+</token> <token>-</token></entry>
1090        <entry>left</entry>
1091        <entry>addition, subtraction</entry>
1092       </row>
1093
1094       <row>
1095        <entry><token>IS</token></entry>
1096        <entry></entry>
1097        <entry>test for TRUE, FALSE, UNKNOWN, NULL</entry>
1098       </row>
1099
1100       <row>
1101        <entry><token>ISNULL</token></entry>
1102        <entry></entry>
1103        <entry>test for NULL</entry>
1104       </row>
1105
1106       <row>
1107        <entry><token>NOTNULL</token></entry>
1108        <entry></entry>
       <entry>test for non-NULL</entry>
1110       </row>
1111
1112       <row>
1113        <entry>(any other)</entry>
1114        <entry>left</entry>
1115        <entry>all other native and user-defined operators</entry>
1116       </row>
1117
1118       <row>
1119        <entry><token>IN</token></entry>
1120        <entry></entry>
1121        <entry>set membership</entry>
1122       </row>
1123
1124       <row>
1125        <entry><token>BETWEEN</token></entry>
1126        <entry></entry>
1127        <entry>containment</entry>
1128       </row>
1129
1130       <row>
1131        <entry><token>OVERLAPS</token></entry>
1132        <entry></entry>
1133        <entry>time interval overlap</entry>
1134       </row>
1135
1136       <row>
1137        <entry><token>LIKE</token> <token>ILIKE</token></entry>
1138        <entry></entry>
1139        <entry>string pattern matching</entry>
1140       </row>
1141
1142       <row>
1143        <entry><token>&lt;</token> <token>&gt;</token></entry>
1144        <entry></entry>
1145        <entry>less than, greater than</entry>
1146       </row>
1147
1148       <row>
1149        <entry><token>=</token></entry>
1150        <entry>right</entry>
1151        <entry>equality, assignment</entry>
1152       </row>
1153
1154       <row>
1155        <entry><token>NOT</token></entry>
1156        <entry>right</entry>
1157        <entry>logical negation</entry>
1158       </row>
1159
1160       <row>
1161        <entry><token>AND</token></entry>
1162        <entry>left</entry>
1163        <entry>logical conjunction</entry>
1164       </row>
1165
1166       <row>
1167        <entry><token>OR</token></entry>
1168        <entry>left</entry>
1169        <entry>logical disjunction</entry>
1170       </row>
1171      </tbody>
1172     </tgroup>
1173    </table>
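
   <para>
    As a small illustration of the table above, the first two pairs of
    commands below are parsed identically; the parentheses in the
    second command of each pair only make the default grouping
    explicit.  The last command shows parentheses overriding the
    default.
<programlisting>
SELECT 2 + 3 * 4;                   -- * binds more tightly than +
SELECT 2 + (3 * 4);                 -- same result: 14

SELECT true OR false AND false;     -- AND binds more tightly than OR
SELECT true OR (false AND false);   -- same result: true

SELECT (2 + 3) * 4;                 -- parentheses override precedence: 20
</programlisting>
   </para>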
1174
1175    <para>
1176     Note that the operator precedence rules also apply to user-defined
1177     operators that have the same names as the built-in operators
1178     mentioned above.  For example, if you define a
    <quote>+</quote> operator for some custom data type, it will have
1180     the same precedence as the built-in <quote>+</quote> operator, no
1181     matter what yours does.
1182    </para>
1183   </sect1>
1184
1185 </chapter>
1186
1187 <!-- Keep this comment at the end of the file
1188 Local variables:
1189 mode:sgml
1190 sgml-omittag:nil
1191 sgml-shorttag:t
1192 sgml-minimize-attributes:nil
1193 sgml-always-quote-attributes:t
1194 sgml-indent-step:1
1195 sgml-indent-data:t
1196 sgml-parent-document:nil
1197 sgml-default-dtd-file:"./reference.ced"
1198 sgml-exposed-tags:nil
1199 sgml-local-catalogs:("/usr/lib/sgml/catalog")
1200 sgml-local-ecat-files:nil
1201 End:
1202 -->