doc/src/sgml/query.sgml

   1 <!-- doc/src/sgml/query.sgml -->
   2
   3  <chapter id="tutorial-sql">
   4   <title>The <acronym>SQL</acronym> Language</title>
   5
   6   <sect1 id="tutorial-sql-intro">
   7    <title>Introduction</title>
   8
   9    <para>
  10     This chapter provides an overview of how to use
  11     <acronym>SQL</acronym> to perform simple operations.  This
  12     tutorial is only intended to give you an introduction and is in no
  13     way a complete tutorial on <acronym>SQL</acronym>.  Numerous books
  14     have been written on <acronym>SQL</acronym>, including <xref
  15     linkend="MELT93"> and <xref linkend="DATE97">.
  16     You should be aware that some <productname>PostgreSQL</productname>
  17     language features are extensions to the standard.
  18    </para>
  19
  20    <para>
  21     In the examples that follow, we assume that you have created a
  22     database named <literal>mydb</literal>, as described in the previous
  23     chapter, and have been able to start <application>psql</application>.
  24    </para>
  25
  26    <para>
  27     Examples in this manual can also be found in the
  28     <productname>PostgreSQL</productname> source distribution
  29     in the directory <filename>src/tutorial/</filename>.  (Binary
  30     distributions of <productname>PostgreSQL</productname> might not
  31     compile these files.)  To use those
  32     files, first change to that directory and run <application>make</>:
  33
  34 <screen>
  35 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
  36 <prompt>$</prompt> <userinput>make</userinput>
  37 </screen>
  38
  39     This creates the scripts and compiles the C files containing user-defined
  40     functions and types.  Then, to start the tutorial, do the following:
  41
  42 <screen>
  43 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/tutorial</userinput>
  44 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
  45 <computeroutput>
  46 ...
  47 </computeroutput>
  48
  49 <prompt>mydb=&gt;</prompt> <userinput>\i basics.sql</userinput>
  50 </screen>
  51
  52     The <literal>\i</literal> command reads in commands from the
  53     specified file. <command>psql</command>'s <literal>-s</> option puts you in
  54     single step mode which pauses before sending each statement to the
  55     server.  The commands used in this section are in the file
  56     <filename>basics.sql</filename>.
  57    </para>
  58   </sect1>
  59
  60
  61   <sect1 id="tutorial-concepts">
  62    <title>Concepts</title>
  63
  64    <para>
  65     <indexterm><primary>relational database</primary></indexterm>
  66     <indexterm><primary>hierarchical database</primary></indexterm>
  67     <indexterm><primary>object-oriented database</primary></indexterm>
  68     <indexterm><primary>relation</primary></indexterm>
  69     <indexterm><primary>table</primary></indexterm>
  70
  71     <productname>PostgreSQL</productname> is a <firstterm>relational
  72     database management system</firstterm> (<acronym>RDBMS</acronym>).
  73     That means it is a system for managing data stored in
  74     <firstterm>relations</firstterm>.  Relation is essentially a
  75     mathematical term for <firstterm>table</firstterm>.  The notion of
  76     storing data in tables is so commonplace today that it might
  77     seem inherently obvious, but there are a number of other ways of
  78     organizing databases.  Files and directories on Unix-like
  79     operating systems form an example of a hierarchical database.  A
  80     more modern development is the object-oriented database.
  81    </para>
  82
  83    <para>
  84     <indexterm><primary>row</primary></indexterm>
  85     <indexterm><primary>column</primary></indexterm>
  86
  87     Each table is a named collection of <firstterm>rows</firstterm>.
  88     Each row of a given table has the same set of named
  89     <firstterm>columns</firstterm>,
  90     and each column is of a specific data type.  Whereas columns have
  91     a fixed order in each row, it is important to remember that SQL
  92     does not guarantee the order of the rows within the table in any
  93     way (although they can be explicitly sorted for display).
  94    </para>
  95
  96    <para>
  97     <indexterm><primary>database cluster</primary></indexterm>
  98     <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
  99
 100     Tables are grouped into databases, and a collection of databases
 101     managed by a single <productname>PostgreSQL</productname> server
 102     instance constitutes a database <firstterm>cluster</firstterm>.
 103    </para>
 104   </sect1>
 105
 106
 107   <sect1 id="tutorial-table">
 108    <title>Creating a New Table</title>
 109
 110    <indexterm zone="tutorial-table">
 111     <primary>CREATE TABLE</primary>
 112    </indexterm>
 113
 114    <para>
 115     You  can  create  a  new  table by specifying the table
 116     name, along with all column names and their types:
 117
 118 <programlisting>
 119 CREATE TABLE weather (
 120     city            varchar(80),
 121     temp_lo         int,           -- low temperature
 122     temp_hi         int,           -- high temperature
 123     prcp            real,          -- precipitation
 124     date            date
 125 );
 126 </programlisting>
 127
 128     You can enter this into <command>psql</command> with the line
 129     breaks.  <command>psql</command> will recognize that the command
 130     is not terminated until the semicolon.
 131    </para>
 132
 133    <para>
 134     White space (i.e., spaces, tabs, and newlines) can be used freely
 135     in SQL commands.  That means you can type the command aligned
 136     differently than above, or even all on one line.  Two dashes
 137     (<quote><literal>--</literal></quote>) introduce comments.
 138     Whatever follows them is ignored up to the end of the line.  SQL
 139     is case insensitive about key words and identifiers, except
 140     when identifiers are double-quoted to preserve the case (not done
 141     above).
 142    </para>
 143
 144    <para>
 145     <type>varchar(80)</type> specifies a data type that can store
 146     arbitrary character strings up to 80 characters in length.
 147     <type>int</type> is the normal integer type.  <type>real</type> is
 148     a type for storing single precision floating-point numbers.
 149     <type>date</type> should be self-explanatory.  (Yes, the column of
 150     type <type>date</type> is also named <structfield>date</structfield>.
 151     This might be convenient or confusing &mdash; you choose.)
 152    </para>
 153
 154    <para>
 155     <productname>PostgreSQL</productname> supports the standard
 156     <acronym>SQL</acronym> types <type>int</type>,
 157     <type>smallint</type>, <type>real</type>, <type>double
 158     precision</type>, <type>char(<replaceable>N</>)</type>,
 159     <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
 160     <type>time</type>, <type>timestamp</type>, and
 161     <type>interval</type>, as well as other types of general utility
 162     and a rich set of geometric types.
 163     <productname>PostgreSQL</productname> can be customized with an
 164     arbitrary number of user-defined data types.  Consequently, type
 165     names are not key words in the syntax, except where required to
 166     support special cases in the <acronym>SQL</acronym> standard.
 167    </para>
 168
 169    <para>
 170     The second example will store cities and their associated
 171     geographical location:
 172 <programlisting>
 173 CREATE TABLE cities (
 174     name            varchar(80),
 175     location        point
 176 );
 177 </programlisting>
 178     The <type>point</type> type is an example of a
 179     <productname>PostgreSQL</productname>-specific data type.
 180    </para>
 181
 182    <para>
 183     <indexterm>
 184      <primary>DROP TABLE</primary>
 185     </indexterm>
 186
 187     Finally, it should be mentioned that if you don't need a table any
 188     longer or want to recreate it differently you can remove it using
 189     the following command:
 190 <synopsis>
 191 DROP TABLE <replaceable>tablename</replaceable>;
 192 </synopsis>
 193    </para>
 194   </sect1>
 195
 196
 197   <sect1 id="tutorial-populate">
 198    <title>Populating a Table With Rows</title>
 199
 200    <indexterm zone="tutorial-populate">
 201     <primary>INSERT</primary>
 202    </indexterm>
 203
 204    <para>
 205     The <command>INSERT</command> statement is used to populate a table  with
 206     rows:
 207
 208 <programlisting>
 209 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
 210 </programlisting>
 211
 212     Note that all data types use rather obvious input formats.
 213     Constants that are not simple numeric values usually must be
 214     surrounded by single quotes (<literal>'</>), as in the example.
 215     The
 216     <type>date</type> type is actually quite flexible in what it
 217     accepts, but for this tutorial we will stick to the unambiguous
 218     format shown here.
 219    </para>
 220
 221    <para>
 222     The <type>point</type> type requires a coordinate pair as input,
 223     as shown here:
 224 <programlisting>
 225 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
 226 </programlisting>
 227    </para>
 228
 229    <para>
 230     The syntax used so far requires you to remember the order of the
 231     columns.  An alternative syntax allows you to list the columns
 232     explicitly:
 233 <programlisting>
 234 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
 235     VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
 236 </programlisting>
 237     You can list the columns in a different order if you wish or
 238     even omit some columns, e.g., if the precipitation is unknown:
 239 <programlisting>
 240 INSERT INTO weather (date, city, temp_hi, temp_lo)
 241     VALUES ('1994-11-29', 'Hayward', 54, 37);
 242 </programlisting>
 243     Many developers consider explicitly listing the columns better
 244     style than relying on the order implicitly.
 245    </para>
 246
 247    <para>
 248     Please enter all the commands shown above so you have some data to
 249     work with in the following sections.
 250    </para>
 251
 252    <para>
 253     <indexterm>
 254      <primary>COPY</primary>
 255     </indexterm>
 256
 257     You could also have used <command>COPY</command> to load large
 258     amounts of data from flat-text files.  This is usually faster
 259     because the <command>COPY</command> command is optimized for this
 260     application while allowing less flexibility than
 261     <command>INSERT</command>.  An example would be:
 262
 263 <programlisting>
 264 COPY weather FROM '/home/user/weather.txt';
 265 </programlisting>
 266
 267     where the file name for the source file must be available on the
 268     machine running the backend process, not the client, since the backend process
 269     reads the file directly.  You can read more about the
 270     <command>COPY</command> command in <xref linkend="sql-copy">.
 271    </para>
 272   </sect1>
 273
 274
 275   <sect1 id="tutorial-select">
 276    <title>Querying a Table</title>
 277
 278    <para>
 279     <indexterm><primary>query</primary></indexterm>
 280     <indexterm><primary>SELECT</primary></indexterm>
 281
 282     To retrieve data from a table, the table is
 283     <firstterm>queried</firstterm>.  An <acronym>SQL</acronym>
 284     <command>SELECT</command> statement is used to do this.  The
 285     statement is divided into a select list (the part that lists the
 286     columns to be returned), a table list (the part that lists the
 287     tables from which to retrieve the data), and an optional
 288     qualification (the part that specifies any restrictions).  For
 289     example, to retrieve all the rows of table
 290     <structname>weather</structname>, type:
 291 <programlisting>
 292 SELECT * FROM weather;
 293 </programlisting>
 294     Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
 295      <footnote>
 296       <para>
 297        While <literal>SELECT *</literal> is useful for off-the-cuff
 298        queries, it is widely considered bad style in production code,
 299        since adding a column to the table would change the results.
 300       </para>
 301      </footnote>
 302     So the same result would be had with:
 303 <programlisting>
 304 SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
 305 </programlisting>
 306
 307     The output should be:
 308
 309 <screen>
 310      city      | temp_lo | temp_hi | prcp |    date
 311 ---------------+---------+---------+------+------------
 312  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 313  San Francisco |      43 |      57 |    0 | 1994-11-29
 314  Hayward       |      37 |      54 |      | 1994-11-29
 315 (3 rows)
 316 </screen>
 317    </para>
 318
 319    <para>
 320     You can write expressions, not just simple column references, in the
 321     select list.  For example, you can do:
 322 <programlisting>
 323 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
 324 </programlisting>
 325     This should give:
 326 <screen>
 327      city      | temp_avg |    date
 328 ---------------+----------+------------
 329  San Francisco |       48 | 1994-11-27
 330  San Francisco |       50 | 1994-11-29
 331  Hayward       |       45 | 1994-11-29
 332 (3 rows)
 333 </screen>
 334     Notice how the <literal>AS</literal> clause is used to relabel the
 335     output column.  (The <literal>AS</literal> clause is optional.)
 336    </para>
 337
 338    <para>
 339     A query can be <quote>qualified</> by adding a <literal>WHERE</>
 340     clause that specifies which rows are wanted.  The <literal>WHERE</>
 341     clause contains a Boolean (truth value) expression, and only rows for
 342     which the Boolean expression is true are returned.  The usual
 343     Boolean operators (<literal>AND</literal>,
 344     <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
 345     the qualification.  For example, the following
 346     retrieves the weather of San Francisco on rainy days:
 347
 348 <programlisting>
 349 SELECT * FROM weather
 350     WHERE city = 'San Francisco' AND prcp &gt; 0.0;
 351 </programlisting>
 352     Result:
 353 <screen>
 354      city      | temp_lo | temp_hi | prcp |    date
 355 ---------------+---------+---------+------+------------
 356  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 357 (1 row)
 358 </screen>
 359    </para>
 360
 361    <para>
 362     <indexterm><primary>ORDER BY</primary></indexterm>
 363
 364     You can request that the results of a query
 365     be returned in sorted order:
 366
 367 <programlisting>
 368 SELECT * FROM weather
 369     ORDER BY city;
 370 </programlisting>
 371
 372 <screen>
 373      city      | temp_lo | temp_hi | prcp |    date
 374 ---------------+---------+---------+------+------------
 375  Hayward       |      37 |      54 |      | 1994-11-29
 376  San Francisco |      43 |      57 |    0 | 1994-11-29
 377  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 378 </screen>
 379
 380     In this example, the sort order isn't fully specified, and so you
 381     might get the San Francisco rows in either order.  But you'd always
 382     get the results shown above if you do:
 383
 384 <programlisting>
 385 SELECT * FROM weather
 386     ORDER BY city, temp_lo;
 387 </programlisting>
 388    </para>
 389
 390    <para>
 391     <indexterm><primary>DISTINCT</primary></indexterm>
 392     <indexterm><primary>duplicate</primary></indexterm>
 393
 394     You can request that duplicate rows be removed from the result of
 395     a query:
 396
 397 <programlisting>
 398 SELECT DISTINCT city
 399     FROM weather;
 400 </programlisting>
 401
 402 <screen>
 403      city
 404 ---------------
 405  Hayward
 406  San Francisco
 407 (2 rows)
 408 </screen>
 409
 410     Here again, the result row ordering might vary.
 411     You can ensure consistent results by using <literal>DISTINCT</literal> and
 412     <literal>ORDER BY</literal> together:
 413      <footnote>
 414       <para>
 415        In some database systems, including older versions of
 416        <productname>PostgreSQL</productname>, the implementation of
 417        <literal>DISTINCT</literal> automatically orders the rows and
 418        so <literal>ORDER BY</literal> is unnecessary.  But this is not
 419        required by the SQL standard, and current
 420        <productname>PostgreSQL</productname> does not guarantee that
 421        <literal>DISTINCT</literal> causes the rows to be ordered.
 422       </para>
 423      </footnote>
 424
 425 <programlisting>
 426 SELECT DISTINCT city
 427     FROM weather
 428     ORDER BY city;
 429 </programlisting>
 430    </para>
 431   </sect1>
 432
 433
 434   <sect1 id="tutorial-join">
 435    <title>Joins Between Tables</title>
 436
 437    <indexterm zone="tutorial-join">
 438     <primary>join</primary>
 439    </indexterm>
 440
 441    <para>
 442     Thus far, our queries have only accessed one table at a time.
 443     Queries can access multiple tables at once, or access the same
 444     table in such a way that multiple rows of the table are being
 445     processed at the same time.  A query that accesses multiple rows
 446     of the same or different tables at one time is called a
 447     <firstterm>join</firstterm> query.  As an example, say you wish to
 448     list all the weather records together with the location of the
 449     associated city.  To do that, we need to compare the <structfield>city</>
 450     column of each row of the <structname>weather</> table with the
 451     <structfield>name</> column of all rows in the <structname>cities</>
 452     table, and select the pairs of rows where these values match.
 453     <note>
 454      <para>
 455       This  is only a conceptual model.  The join is usually performed
 456       in a more efficient manner than actually comparing each possible
 457       pair of rows, but this is invisible to the user.
 458      </para>
 459     </note>
 460     This would be accomplished by the following query:
 461
 462 <programlisting>
 463 SELECT *
 464     FROM weather, cities
 465     WHERE city = name;
 466 </programlisting>
 467
 468 <screen>
 469      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 470 ---------------+---------+---------+------+------------+---------------+-----------
 471  San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 472  San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
 473 (2 rows)
 474 </screen>
 475
 476    </para>
 477
 478    <para>
 479     Observe two things about the result set:
 480     <itemizedlist>
 481      <listitem>
 482       <para>
 483        There is no result row for the city of Hayward.  This is
 484        because there is no matching entry in the
 485        <structname>cities</structname> table for Hayward, so the join
 486        ignores the unmatched rows in the <structname>weather</> table.  We will see
 487        shortly how this can be fixed.
 488       </para>
 489      </listitem>
 490
 491      <listitem>
 492       <para>
 493        There are two columns containing the city name.  This is
 494        correct because the lists of columns from the
 495        <structname>weather</structname> and
 496        <structname>cities</structname> tables are concatenated.  In
 497        practice this is undesirable, though, so you will probably want
 498        to list the output columns explicitly rather than using
 499        <literal>*</literal>:
 500 <programlisting>
 501 SELECT city, temp_lo, temp_hi, prcp, date, location
 502     FROM weather, cities
 503     WHERE city = name;
 504 </programlisting>
 505       </para>
 506      </listitem>
 507     </itemizedlist>
 508    </para>
 509
 510    <formalpara>
 511     <title>Exercise:</title>
 512
 513     <para>
 514      Attempt to determine the semantics of this query when the
 515      <literal>WHERE</literal> clause is omitted.
 516     </para>
 517    </formalpara>
 518
 519    <para>
 520     Since the columns all had different names, the parser
 521     automatically found which table they belong to.  If there
 522     were duplicate column names in the two tables you'd need to
 523     <firstterm>qualify</> the column names to show which one you
 524     meant, as in:
 525
 526 <programlisting>
 527 SELECT weather.city, weather.temp_lo, weather.temp_hi,
 528        weather.prcp, weather.date, cities.location
 529     FROM weather, cities
 530     WHERE cities.name = weather.city;
 531 </programlisting>
 532
 533     It is widely considered good style to qualify all column names
 534     in a join query, so that the query won't fail if a duplicate
 535     column name is later added to one of the tables.
 536    </para>
 537
 538    <para>
 539     Join queries of the kind seen thus far can also be written in this
 540     alternative form:
 541
 542 <programlisting>
 543 SELECT *
 544     FROM weather INNER JOIN cities ON (weather.city = cities.name);
 545 </programlisting>
 546
 547     This syntax is not as commonly used as the one above, but we show
 548     it here to help you understand the following topics.
 549    </para>
 550
 551    <para>
 552     <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
 553
 554     Now we will figure out how we can get the Hayward records back in.
 555     What we want the query to do is to scan the
 556     <structname>weather</structname> table and for each row to find the
 557     matching <structname>cities</structname> row(s).  If no matching row is
 558     found we want some <quote>empty values</quote> to be substituted
 559     for the <structname>cities</structname> table's columns.  This kind
 560     of query is called an <firstterm>outer join</firstterm>.  (The
 561     joins we have seen so far are inner joins.)  The command looks
 562     like this:
 563
 564 <programlisting>
 565 SELECT *
 566     FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
 567
 568      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 569 ---------------+---------+---------+------+------------+---------------+-----------
 570  Hayward       |      37 |      54 |      | 1994-11-29 |               |
 571  San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 572  San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
 573 (3 rows)
 574 </programlisting>
 575
 576     This query is called a <firstterm>left outer
 577     join</firstterm> because the table mentioned on the left of the
 578     join operator will have each of its rows in the output at least
 579     once, whereas the table on the right will only have those rows
 580     output that match some row of the left table.  When outputting a
 581     left-table row for which there is no right-table match, empty (null)
 582     values are substituted for the right-table columns.
 583    </para>
 584
 585    <formalpara>
 586     <title>Exercise:</title>
 587
 588     <para>
 589      There are also right outer joins and full outer joins.  Try to
 590      find out what those do.
 591     </para>
 592    </formalpara>
 593
 594    <para>
 595     <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
 596     <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
 597
 598     We can also join a table against itself.  This is called a
 599     <firstterm>self join</firstterm>.  As an example, suppose we wish
 600     to find all the weather records that are in the temperature range
 601     of other weather records.  So we need to compare the
 602     <structfield>temp_lo</> and <structfield>temp_hi</> columns of
 603     each <structname>weather</structname> row to the
 604     <structfield>temp_lo</structfield> and
 605     <structfield>temp_hi</structfield> columns of all other
 606     <structname>weather</structname> rows.  We can do this with the
 607     following query:
 608
 609 <programlisting>
 610 SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
 611     W2.city, W2.temp_lo AS low, W2.temp_hi AS high
 612     FROM weather W1, weather W2
 613     WHERE W1.temp_lo &lt; W2.temp_lo
 614     AND W1.temp_hi &gt; W2.temp_hi;
 615
 616      city      | low | high |     city      | low | high
 617 ---------------+-----+------+---------------+-----+------
 618  San Francisco |  43 |   57 | San Francisco |  46 |   50
 619  Hayward       |  37 |   54 | San Francisco |  46 |   50
 620 (2 rows)
 621 </programlisting>
 622
 623     Here we have relabeled the weather table as <literal>W1</> and
 624     <literal>W2</> to be able to distinguish the left and right side
 625     of the join.  You can also use these kinds of aliases in other
 626     queries to save some typing, e.g.:
 627 <programlisting>
 628 SELECT *
 629     FROM weather w, cities c
 630     WHERE w.city = c.name;
 631 </programlisting>
 632     You will encounter this style of abbreviating quite frequently.
 633    </para>
 634   </sect1>
 635
 636
 637   <sect1 id="tutorial-agg">
 638    <title>Aggregate Functions</title>
 639
 640    <indexterm zone="tutorial-agg">
 641     <primary>aggregate function</primary>
 642    </indexterm>
 643
 644    <para>
 645     Like  most  other relational database products,
 646     <productname>PostgreSQL</productname> supports
 647     <firstterm>aggregate functions</>.
 648     An aggregate function computes a single result from multiple input rows.
 649     For example, there are aggregates to compute the
 650     <function>count</function>, <function>sum</function>,
 651     <function>avg</function> (average), <function>max</function> (maximum) and
 652     <function>min</function> (minimum) over a set of rows.
 653    </para>
 654
 655    <para>
 656     As an example, we can find the highest low-temperature reading anywhere
 657     with:
 658
 659 <programlisting>
 660 SELECT max(temp_lo) FROM weather;
 661 </programlisting>
 662
 663 <screen>
 664  max
 665 -----
 666   46
 667 (1 row)
 668 </screen>
 669    </para>
 670
 671    <para>
 672     <indexterm><primary>subquery</primary></indexterm>
 673
 674     If we wanted to know what city (or cities) that reading occurred in,
 675     we might try:
 676
 677 <programlisting>
 678 SELECT city FROM weather WHERE temp_lo = max(temp_lo);     <lineannotation>WRONG</lineannotation>
 679 </programlisting>
 680
 681     but this will not work since the aggregate
 682     <function>max</function> cannot be used in the
 683     <literal>WHERE</literal> clause.  (This restriction exists because
 684     the <literal>WHERE</literal> clause determines which rows will be
 685     included in the aggregate calculation; so obviously it has to be evaluated
 686     before aggregate functions are computed.)
 687     However, as is often the case
 688     the query can be restated to accomplish the desired result, here
 689     by using a <firstterm>subquery</firstterm>:
 690
 691 <programlisting>
 692 SELECT city FROM weather
 693     WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
 694 </programlisting>
 695
 696 <screen>
 697      city
 698 ---------------
 699  San Francisco
 700 (1 row)
 701 </screen>
 702
 703     This is OK because the subquery is an independent computation
 704     that computes its own aggregate separately from what is happening
 705     in the outer query.
 706    </para>
 707
 708    <para>
 709     <indexterm><primary>GROUP BY</primary></indexterm>
 710     <indexterm><primary>HAVING</primary></indexterm>
 711
 712     Aggregates are also very useful in combination with <literal>GROUP
 713     BY</literal> clauses.  For example, we can get the maximum low
 714     temperature observed in each city with:
 715
 716 <programlisting>
 717 SELECT city, max(temp_lo)
 718     FROM weather
 719     GROUP BY city;
 720 </programlisting>
 721
 722 <screen>
 723      city      | max
 724 ---------------+-----
 725  Hayward       |  37
 726  San Francisco |  46
 727 (2 rows)
 728 </screen>
 729
 730     which gives us one output row per city.  Each aggregate result is
 731     computed over the table rows matching that city.
 732     We can filter these grouped
 733     rows using <literal>HAVING</literal>:
 734
 735 <programlisting>
 736 SELECT city, max(temp_lo)
 737     FROM weather
 738     GROUP BY city
 739     HAVING max(temp_lo) &lt; 40;
 740 </programlisting>
 741
 742 <screen>
 743   city   | max
 744 ---------+-----
 745  Hayward |  37
 746 (1 row)
 747 </screen>
 748
 749     which gives us the same results for only the cities that have all
 750     <structfield>temp_lo</> values below 40.  Finally, if we only care about
 751     cities whose
 752     names begin with <quote><literal>S</literal></quote>, we might do:
 753
 754 <programlisting>
 755 SELECT city, max(temp_lo)
 756     FROM weather
 757     WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
 758     GROUP BY city
 759     HAVING max(temp_lo) &lt; 40;
 760 </programlisting>
 761    <calloutlist>
 762     <callout arearefs="co.tutorial-agg-like">
 763      <para>
 764       The <literal>LIKE</literal> operator does pattern matching and
 765       is explained in <xref linkend="functions-matching">.
 766      </para>
 767     </callout>
 768    </calloutlist>
 769    </para>
 770
 771    <para>
 772     It is important to understand the interaction between aggregates and
 773     <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
 774     The fundamental difference between <literal>WHERE</literal> and
 775     <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
 776     input rows before groups and aggregates are computed (thus, it controls
 777     which rows go into the aggregate computation), whereas
 778     <literal>HAVING</literal> selects group rows after groups and
 779     aggregates are computed.  Thus, the
 780     <literal>WHERE</literal> clause must not contain aggregate functions;
 781     it makes no sense to try to use an aggregate to determine which rows
 782     will be inputs to the aggregates.  On the other hand, the
 783     <literal>HAVING</literal> clause always contains aggregate functions.
 784     (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
 785     clause that doesn't use aggregates, but it's seldom useful. The same
 786     condition could be used more efficiently at the <literal>WHERE</literal>
 787     stage.)
 788    </para>
 789
 790    <para>
 791     In the previous example, we can apply the city name restriction in
 792     <literal>WHERE</literal>, since it needs no aggregate.  This is
 793     more efficient than adding the restriction to <literal>HAVING</literal>,
 794     because we avoid doing the grouping and aggregate calculations
 795     for all rows that fail the <literal>WHERE</literal> check.
 796    </para>
 797   </sect1>
 798
 799
 800   <sect1 id="tutorial-update">
 801    <title>Updates</title>
 802
 803    <indexterm zone="tutorial-update">
 804     <primary>UPDATE</primary>
 805    </indexterm>
 806
 807    <para>
 808     You can update existing rows using the
 809     <command>UPDATE</command> command.
 810     Suppose you discover the temperature readings are
 811     all off by 2 degrees after November 28.  You can correct the
 812     data as follows:
 813
 814 <programlisting>
 815 UPDATE weather
 816     SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
 817     WHERE date &gt; '1994-11-28';
 818 </programlisting>
 819    </para>
 820
 821    <para>
 822     Look at the new state of the data:
 823 <programlisting>
 824 SELECT * FROM weather;
 825
 826      city      | temp_lo | temp_hi | prcp |    date
 827 ---------------+---------+---------+------+------------
 828  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 829  San Francisco |      41 |      55 |    0 | 1994-11-29
 830  Hayward       |      35 |      52 |      | 1994-11-29
 831 (3 rows)
 832 </programlisting>
 833    </para>
 834   </sect1>
 835
 836   <sect1 id="tutorial-delete">
 837    <title>Deletions</title>
 838
 839    <indexterm zone="tutorial-delete">
 840     <primary>DELETE</primary>
 841    </indexterm>
 842
 843    <para>
 844     Rows can be removed from a table using the <command>DELETE</command>
 845     command.
 846     Suppose you are no longer interested in the weather of Hayward.
 847     Then you can do the following to delete those rows from the table:
 848 <programlisting>
 849 DELETE FROM weather WHERE city = 'Hayward';
 850 </programlisting>
 851
 852     All weather records belonging to Hayward are removed.
 853
 854 <programlisting>
 855 SELECT * FROM weather;
 856 </programlisting>
 857
 858 <screen>
 859      city      | temp_lo | temp_hi | prcp |    date
 860 ---------------+---------+---------+------+------------
 861  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 862  San Francisco |      41 |      55 |    0 | 1994-11-29
 863 (2 rows)
 864 </screen>
 865    </para>
 866
 867    <para>
 868     One should be wary of statements of the form
 869 <synopsis>
 870 DELETE FROM <replaceable>tablename</replaceable>;
 871 </synopsis>
 872
 873     Without a qualification, <command>DELETE</command> will
 874     remove  <emphasis>all</>  rows from the given table, leaving it
 875     empty.  The system will not request confirmation before
 876     doing this!
 877    </para>
 878   </sect1>
 879
 880  </chapter>