1 <!-- doc/src/sgml/query.sgml -->
3 <chapter id="tutorial-sql">
4 <title>The <acronym>SQL</acronym> Language</title>
6 <sect1 id="tutorial-sql-intro">
7 <title>Introduction</title>
10 This chapter provides an overview of how to use
11 <acronym>SQL</acronym> to perform simple operations. This
12 tutorial is only intended to give you an introduction and is in no
13 way a complete tutorial on <acronym>SQL</acronym>. Numerous books
14 have been written on <acronym>SQL</acronym>, including <xref
15 linkend="MELT93"> and <xref linkend="DATE97">.
16 You should be aware that some <productname>PostgreSQL</productname>
17 language features are extensions to the standard.
21 In the examples that follow, we assume that you have created a
22 database named <literal>mydb</literal>, as described in the previous
23 chapter, and have been able to start <application>psql</application>.
27 Examples in this manual can also be found in the
28 <productname>PostgreSQL</productname> source distribution
29 in the directory <filename>src/tutorial/</filename>. (Binary
30 distributions of <productname>PostgreSQL</productname> might not
31 compile these files.) To use those
32 files, first change to that directory and run <application>make</>:
35 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
36 <prompt>$</prompt> <userinput>make</userinput>
39 This creates the scripts and compiles the C files containing user-defined
40 functions and types. Then, to start the tutorial, do the following:
43 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/tutorial</userinput>
44 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
49 <prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
52 The <literal>\i</literal> command reads in commands from the
53 specified file. <command>psql</command>'s <literal>-s</> option puts you in
54 single step mode which pauses before sending each statement to the
55 server. The commands used in this section are in the file
56 <filename>basics.sql</filename>.
61 <sect1 id="tutorial-concepts">
62 <title>Concepts</title>
65 <indexterm><primary>relational database</primary></indexterm>
66 <indexterm><primary>hierarchical database</primary></indexterm>
67 <indexterm><primary>object-oriented database</primary></indexterm>
68 <indexterm><primary>relation</primary></indexterm>
69 <indexterm><primary>table</primary></indexterm>
71 <productname>PostgreSQL</productname> is a <firstterm>relational
72 database management system</firstterm> (<acronym>RDBMS</acronym>).
73 That means it is a system for managing data stored in
74 <firstterm>relations</firstterm>. Relation is essentially a
75 mathematical term for <firstterm>table</firstterm>. The notion of
76 storing data in tables is so commonplace today that it might
77 seem inherently obvious, but there are a number of other ways of
78 organizing databases. Files and directories on Unix-like
79 operating systems form an example of a hierarchical database. A
80 more modern development is the object-oriented database.
84 <indexterm><primary>row</primary></indexterm>
85 <indexterm><primary>column</primary></indexterm>
87 Each table is a named collection of <firstterm>rows</firstterm>.
88 Each row of a given table has the same set of named
89 <firstterm>columns</firstterm>,
90 and each column is of a specific data type. Whereas columns have
91 a fixed order in each row, it is important to remember that SQL
92 does not guarantee the order of the rows within the table in any
93 way (although they can be explicitly sorted for display).
97 <indexterm><primary>database cluster</primary></indexterm>
98 <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
100 Tables are grouped into databases, and a collection of databases
101 managed by a single <productname>PostgreSQL</productname> server
102 instance constitutes a database <firstterm>cluster</firstterm>.
107 <sect1 id="tutorial-table">
108 <title>Creating a New Table</title>
110 <indexterm zone="tutorial-table">
111 <primary>CREATE TABLE</primary>
115 You can create a new table by specifying the table
116 name, along with all column names and their types:
119 CREATE TABLE weather (
121 temp_lo int, -- low temperature
122 temp_hi int, -- high temperature
123 prcp real, -- precipitation
128 You can enter this into <command>psql</command> with the line
129 breaks. <command>psql</command> will recognize that the command
130 is not terminated until the semicolon.
134 White space (i.e., spaces, tabs, and newlines) can be used freely
135 in SQL commands. That means you can type the command aligned
136 differently than above, or even all on one line. Two dashes
137 (<quote><literal>--</literal></quote>) introduce comments.
138 Whatever follows them is ignored up to the end of the line. SQL
139 is case insensitive about key words and identifiers, except
140 when identifiers are double-quoted to preserve the case (not done
145 <type>varchar(80)</type> specifies a data type that can store
146 arbitrary character strings up to 80 characters in length.
147 <type>int</type> is the normal integer type. <type>real</type> is
148 a type for storing single precision floating-point numbers.
149 <type>date</type> should be self-explanatory. (Yes, the column of
150 type <type>date</type> is also named <structfield>date</structfield>.
151 This might be convenient or confusing — you choose.)
155 <productname>PostgreSQL</productname> supports the standard
156 <acronym>SQL</acronym> types <type>int</type>,
157 <type>smallint</type>, <type>real</type>, <type>double
158 precision</type>, <type>char(<replaceable>N</>)</type>,
159 <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
160 <type>time</type>, <type>timestamp</type>, and
161 <type>interval</type>, as well as other types of general utility
162 and a rich set of geometric types.
163 <productname>PostgreSQL</productname> can be customized with an
164 arbitrary number of user-defined data types. Consequently, type
165 names are not key words in the syntax, except where required to
166 support special cases in the <acronym>SQL</acronym> standard.
170 The second example will store cities and their associated
171 geographical location:
173 CREATE TABLE cities (
178 The <type>point</type> type is an example of a
179 <productname>PostgreSQL</productname>-specific data type.
184 <primary>DROP TABLE</primary>
187 Finally, it should be mentioned that if you don't need a table any
188 longer or want to recreate it differently you can remove it using
189 the following command:
191 DROP TABLE <replaceable>tablename</replaceable>;
197 <sect1 id="tutorial-populate">
198 <title>Populating a Table With Rows</title>
200 <indexterm zone="tutorial-populate">
201 <primary>INSERT</primary>
205 The <command>INSERT</command> statement is used to populate a table with
209 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
212 Note that all data types use rather obvious input formats.
213 Constants that are not simple numeric values usually must be
214 surrounded by single quotes (<literal>'</>), as in the example.
216 <type>date</type> type is actually quite flexible in what it
217 accepts, but for this tutorial we will stick to the unambiguous
222 The <type>point</type> type requires a coordinate pair as input,
225 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
230 The syntax used so far requires you to remember the order of the
231 columns. An alternative syntax allows you to list the columns
234 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
235 VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
237 You can list the columns in a different order if you wish or
238 even omit some columns, e.g., if the precipitation is unknown:
240 INSERT INTO weather (date, city, temp_hi, temp_lo)
241 VALUES ('1994-11-29', 'Hayward', 54, 37);
243 Many developers consider explicitly listing the columns better
244 style than relying on the order implicitly.
248 Please enter all the commands shown above so you have some data to
249 work with in the following sections.
254 <primary>COPY</primary>
257 You could also have used <command>COPY</command> to load large
258 amounts of data from flat-text files. This is usually faster
259 because the <command>COPY</command> command is optimized for this
260 application while allowing less flexibility than
261 <command>INSERT</command>. An example would be:
264 COPY weather FROM '/home/user/weather.txt';
267 where the file name for the source file must be available on the
268 machine running the backend process, not the client, since the backend process
269 reads the file directly. You can read more about the
270 <command>COPY</command> command in <xref linkend="sql-copy">.
275 <sect1 id="tutorial-select">
276 <title>Querying a Table</title>
279 <indexterm><primary>query</primary></indexterm>
280 <indexterm><primary>SELECT</primary></indexterm>
282 To retrieve data from a table, the table is
283 <firstterm>queried</firstterm>. An <acronym>SQL</acronym>
284 <command>SELECT</command> statement is used to do this. The
285 statement is divided into a select list (the part that lists the
286 columns to be returned), a table list (the part that lists the
287 tables from which to retrieve the data), and an optional
288 qualification (the part that specifies any restrictions). For
289 example, to retrieve all the rows of table
290 <structname>weather</structname>, type:
292 SELECT * FROM weather;
294 Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
297 While <literal>SELECT *</literal> is useful for off-the-cuff
298 queries, it is widely considered bad style in production code,
299 since adding a column to the table would change the results.
302 So the same result would be had with:
304 SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
307 The output should be:
310 city | temp_lo | temp_hi | prcp | date
311 ---------------+---------+---------+------+------------
312 San Francisco | 46 | 50 | 0.25 | 1994-11-27
313 San Francisco | 43 | 57 | 0 | 1994-11-29
314 Hayward | 37 | 54 | | 1994-11-29
320 You can write expressions, not just simple column references, in the
321 select list. For example, you can do:
323 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
327 city | temp_avg | date
328 ---------------+----------+------------
329 San Francisco | 48 | 1994-11-27
330 San Francisco | 50 | 1994-11-29
331 Hayward | 45 | 1994-11-29
334 Notice how the <literal>AS</literal> clause is used to relabel the
335 output column. (The <literal>AS</literal> clause is optional.)
339 A query can be <quote>qualified</> by adding a <literal>WHERE</>
340 clause that specifies which rows are wanted. The <literal>WHERE</>
341 clause contains a Boolean (truth value) expression, and only rows for
342 which the Boolean expression is true are returned. The usual
343 Boolean operators (<literal>AND</literal>,
344 <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
345 the qualification. For example, the following
346 retrieves the weather of San Francisco on rainy days:
349 SELECT * FROM weather
350 WHERE city = 'San Francisco' AND prcp > 0.0;
354 city | temp_lo | temp_hi | prcp | date
355 ---------------+---------+---------+------+------------
356 San Francisco | 46 | 50 | 0.25 | 1994-11-27
362 <indexterm><primary>ORDER BY</primary></indexterm>
364 You can request that the results of a query
365 be returned in sorted order:
368 SELECT * FROM weather
373 city | temp_lo | temp_hi | prcp | date
374 ---------------+---------+---------+------+------------
375 Hayward | 37 | 54 | | 1994-11-29
376 San Francisco | 43 | 57 | 0 | 1994-11-29
377 San Francisco | 46 | 50 | 0.25 | 1994-11-27
380 In this example, the sort order isn't fully specified, and so you
381 might get the San Francisco rows in either order. But you'd always
382 get the results shown above if you do:
385 SELECT * FROM weather
386 ORDER BY city, temp_lo;
391 <indexterm><primary>DISTINCT</primary></indexterm>
392 <indexterm><primary>duplicate</primary></indexterm>
394 You can request that duplicate rows be removed from the result of
410 Here again, the result row ordering might vary.
411 You can ensure consistent results by using <literal>DISTINCT</literal> and
412 <literal>ORDER BY</literal> together:
415 In some database systems, including older versions of
416 <productname>PostgreSQL</productname>, the implementation of
417 <literal>DISTINCT</literal> automatically orders the rows and
418 so <literal>ORDER BY</literal> is unnecessary. But this is not
419 required by the SQL standard, and current
420 <productname>PostgreSQL</productname> does not guarantee that
421 <literal>DISTINCT</literal> causes the rows to be ordered.
434 <sect1 id="tutorial-join">
435 <title>Joins Between Tables</title>
437 <indexterm zone="tutorial-join">
438 <primary>join</primary>
442 Thus far, our queries have only accessed one table at a time.
443 Queries can access multiple tables at once, or access the same
444 table in such a way that multiple rows of the table are being
445 processed at the same time. A query that accesses multiple rows
446 of the same or different tables at one time is called a
447 <firstterm>join</firstterm> query. As an example, say you wish to
448 list all the weather records together with the location of the
449 associated city. To do that, we need to compare the <structfield>city</>
450 column of each row of the <structname>weather</> table with the
451 <structfield>name</> column of all rows in the <structname>cities</>
452 table, and select the pairs of rows where these values match.
455 This is only a conceptual model. The join is usually performed
456 in a more efficient manner than actually comparing each possible
457 pair of rows, but this is invisible to the user.
460 This would be accomplished by the following query:
469 city | temp_lo | temp_hi | prcp | date | name | location
470 ---------------+---------+---------+------+------------+---------------+-----------
471 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
472 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
479 Observe two things about the result set:
483 There is no result row for the city of Hayward. This is
484 because there is no matching entry in the
485 <structname>cities</structname> table for Hayward, so the join
486 ignores the unmatched rows in the <structname>weather</> table. We will see
487 shortly how this can be fixed.
493 There are two columns containing the city name. This is
494 correct because the lists of columns from the
495 <structname>weather</structname> and
496 <structname>cities</structname> tables are concatenated. In
497 practice this is undesirable, though, so you will probably want
498 to list the output columns explicitly rather than using
499 <literal>*</literal>:
501 SELECT city, temp_lo, temp_hi, prcp, date, location
511 <title>Exercise:</title>
514 Attempt to determine the semantics of this query when the
515 <literal>WHERE</literal> clause is omitted.
520 Since the columns all had different names, the parser
521 automatically found which table they belong to. If there
522 were duplicate column names in the two tables you'd need to
523 <firstterm>qualify</> the column names to show which one you
527 SELECT weather.city, weather.temp_lo, weather.temp_hi,
528 weather.prcp, weather.date, cities.location
530 WHERE cities.name = weather.city;
533 It is widely considered good style to qualify all column names
534 in a join query, so that the query won't fail if a duplicate
535 column name is later added to one of the tables.
539 Join queries of the kind seen thus far can also be written in this
544 FROM weather INNER JOIN cities ON (weather.city = cities.name);
547 This syntax is not as commonly used as the one above, but we show
548 it here to help you understand the following topics.
552 <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
554 Now we will figure out how we can get the Hayward records back in.
555 What we want the query to do is to scan the
556 <structname>weather</structname> table and for each row to find the
557 matching <structname>cities</structname> row(s). If no matching row is
558 found we want some <quote>empty values</quote> to be substituted
559 for the <structname>cities</structname> table's columns. This kind
560 of query is called an <firstterm>outer join</firstterm>. (The
561 joins we have seen so far are inner joins.) The command looks
566 FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
568 city | temp_lo | temp_hi | prcp | date | name | location
569 ---------------+---------+---------+------+------------+---------------+-----------
570 Hayward | 37 | 54 | | 1994-11-29 | |
571 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
572 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
576 This query is called a <firstterm>left outer
577 join</firstterm> because the table mentioned on the left of the
578 join operator will have each of its rows in the output at least
579 once, whereas the table on the right will only have those rows
580 output that match some row of the left table. When outputting a
581 left-table row for which there is no right-table match, empty (null)
582 values are substituted for the right-table columns.
586 <title>Exercise:</title>
589 There are also right outer joins and full outer joins. Try to
590 find out what those do.
595 <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
596 <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
598 We can also join a table against itself. This is called a
599 <firstterm>self join</firstterm>. As an example, suppose we wish
600 to find all the weather records that are in the temperature range
601 of other weather records. So we need to compare the
602 <structfield>temp_lo</> and <structfield>temp_hi</> columns of
603 each <structname>weather</structname> row to the
604 <structfield>temp_lo</structfield> and
605 <structfield>temp_hi</structfield> columns of all other
606 <structname>weather</structname> rows. We can do this with the
610 SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
611 W2.city, W2.temp_lo AS low, W2.temp_hi AS high
612 FROM weather W1, weather W2
613 WHERE W1.temp_lo < W2.temp_lo
614 AND W1.temp_hi > W2.temp_hi;
616 city | low | high | city | low | high
617 ---------------+-----+------+---------------+-----+------
618 San Francisco | 43 | 57 | San Francisco | 46 | 50
619 Hayward | 37 | 54 | San Francisco | 46 | 50
623 Here we have relabeled the weather table as <literal>W1</> and
624 <literal>W2</> to be able to distinguish the left and right side
625 of the join. You can also use these kinds of aliases in other
626 queries to save some typing, e.g.:
629 FROM weather w, cities c
630 WHERE w.city = c.name;
632 You will encounter this style of abbreviating quite frequently.
637 <sect1 id="tutorial-agg">
638 <title>Aggregate Functions</title>
640 <indexterm zone="tutorial-agg">
641 <primary>aggregate function</primary>
645 Like most other relational database products,
646 <productname>PostgreSQL</productname> supports
647 <firstterm>aggregate functions</>.
648 An aggregate function computes a single result from multiple input rows.
649 For example, there are aggregates to compute the
650 <function>count</function>, <function>sum</function>,
651 <function>avg</function> (average), <function>max</function> (maximum) and
652 <function>min</function> (minimum) over a set of rows.
656 As an example, we can find the highest low-temperature reading anywhere
660 SELECT max(temp_lo) FROM weather;
672 <indexterm><primary>subquery</primary></indexterm>
674 If we wanted to know what city (or cities) that reading occurred in,
678 SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
681 but this will not work since the aggregate
682 <function>max</function> cannot be used in the
683 <literal>WHERE</literal> clause. (This restriction exists because
684 the <literal>WHERE</literal> clause determines which rows will be
685 included in the aggregate calculation; so obviously it has to be evaluated
686 before aggregate functions are computed.)
687 However, as is often the case
688 the query can be restated to accomplish the desired result, here
689 by using a <firstterm>subquery</firstterm>:
692 SELECT city FROM weather
693 WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
703 This is OK because the subquery is an independent computation
704 that computes its own aggregate separately from what is happening
709 <indexterm><primary>GROUP BY</primary></indexterm>
710 <indexterm><primary>HAVING</primary></indexterm>
712 Aggregates are also very useful in combination with <literal>GROUP
713 BY</literal> clauses. For example, we can get the maximum low
714 temperature observed in each city with:
717 SELECT city, max(temp_lo)
724 ---------------+-----
730 which gives us one output row per city. Each aggregate result is
731 computed over the table rows matching that city.
732 We can filter these grouped
733 rows using <literal>HAVING</literal>:
736 SELECT city, max(temp_lo)
739 HAVING max(temp_lo) < 40;
749 which gives us the same results for only the cities that have all
750 <structfield>temp_lo</> values below 40. Finally, if we only care about
752 names begin with <quote><literal>S</literal></quote>, we might do:
755 SELECT city, max(temp_lo)
757 WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
759 HAVING max(temp_lo) < 40;
762 <callout arearefs="co.tutorial-agg-like">
764 The <literal>LIKE</literal> operator does pattern matching and
765 is explained in <xref linkend="functions-matching">.
772 It is important to understand the interaction between aggregates and
773 <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
774 The fundamental difference between <literal>WHERE</literal> and
775 <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
776 input rows before groups and aggregates are computed (thus, it controls
777 which rows go into the aggregate computation), whereas
778 <literal>HAVING</literal> selects group rows after groups and
779 aggregates are computed. Thus, the
780 <literal>WHERE</literal> clause must not contain aggregate functions;
781 it makes no sense to try to use an aggregate to determine which rows
782 will be inputs to the aggregates. On the other hand, the
783 <literal>HAVING</literal> clause always contains aggregate functions.
784 (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
785 clause that doesn't use aggregates, but it's seldom useful. The same
786 condition could be used more efficiently at the <literal>WHERE</literal>
791 In the previous example, we can apply the city name restriction in
792 <literal>WHERE</literal>, since it needs no aggregate. This is
793 more efficient than adding the restriction to <literal>HAVING</literal>,
794 because we avoid doing the grouping and aggregate calculations
795 for all rows that fail the <literal>WHERE</literal> check.
800 <sect1 id="tutorial-update">
801 <title>Updates</title>
803 <indexterm zone="tutorial-update">
804 <primary>UPDATE</primary>
808 You can update existing rows using the
809 <command>UPDATE</command> command.
810 Suppose you discover the temperature readings are
811 all off by 2 degrees after November 28. You can correct the
816 SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
817 WHERE date > '1994-11-28';
822 Look at the new state of the data:
824 SELECT * FROM weather;
826 city | temp_lo | temp_hi | prcp | date
827 ---------------+---------+---------+------+------------
828 San Francisco | 46 | 50 | 0.25 | 1994-11-27
829 San Francisco | 41 | 55 | 0 | 1994-11-29
830 Hayward | 35 | 52 | | 1994-11-29
836 <sect1 id="tutorial-delete">
837 <title>Deletions</title>
839 <indexterm zone="tutorial-delete">
840 <primary>DELETE</primary>
844 Rows can be removed from a table using the <command>DELETE</command>
846 Suppose you are no longer interested in the weather of Hayward.
847 Then you can do the following to delete those rows from the table:
849 DELETE FROM weather WHERE city = 'Hayward';
852 All weather records belonging to Hayward are removed.
855 SELECT * FROM weather;
859 city | temp_lo | temp_hi | prcp | date
860 ---------------+---------+---------+------+------------
861 San Francisco | 46 | 50 | 0.25 | 1994-11-27
862 San Francisco | 41 | 55 | 0 | 1994-11-29
868 One should be wary of statements of the form
870 DELETE FROM <replaceable>tablename</replaceable>;
873 Without a qualification, <command>DELETE</command> will
874 remove <emphasis>all</> rows from the given table, leaving it
875 empty. The system will not request confirmation before