1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
5 >Routine Database Maintenance Tasks</TITLE
8 CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
10 HREF="mailto:pgsql-docs@postgresql.org"><LINK
12 TITLE="PostgreSQL 7.4.1 Documentation"
13 HREF="index.html"><LINK
15 TITLE="Server Administration"
16 HREF="admin.html"><LINK
18 TITLE="Character Set Support"
19 HREF="multibyte.html"><LINK
21 TITLE="Routine Reindexing"
22 HREF="routine-reindex.html"><LINK
25 HREF="stylesheet.css"><META
27 CONTENT="2003-12-22T03:48:47"></HEAD
33 SUMMARY="Header navigation table"
43 >PostgreSQL 7.4.1 Documentation</TH
81 HREF="routine-reindex.html"
96 >Chapter 21. Routine Database Maintenance Tasks</H1
102 >Table of Contents</B
106 HREF="maintenance.html#ROUTINE-VACUUMING"
107 >Routine Vacuuming</A
113 HREF="maintenance.html#VACUUM-FOR-SPACE-RECOVERY"
114 >Recovering disk space</A
118 HREF="maintenance.html#VACUUM-FOR-STATISTICS"
119 >Updating planner statistics</A
123 HREF="maintenance.html#VACUUM-FOR-WRAPAROUND"
124 >Preventing transaction ID wraparound failures</A
130 HREF="routine-reindex.html"
131 >Routine Reindexing</A
135 HREF="logfile-maintenance.html"
136 >Log File Maintenance</A
144 > There are a few routine maintenance chores that must be performed on
145 a regular basis to keep a <SPAN
149 server running smoothly. The tasks discussed here are repetitive
150 in nature and can easily be automated using standard Unix tools such
154 > scripts. But it is the database
155 administrator's responsibility to set up appropriate scripts, and to
156 check that they execute successfully.
159 > One obvious maintenance task is creation of backup copies of the data on a
160 regular schedule. Without a recent backup, you have no chance of recovery
161 after a catastrophe (disk failure, fire, mistakenly dropping a critical
162 table, etc.). The backup and recovery mechanisms available in
166 > are discussed at length in
173 > The other main category of maintenance task is periodic <SPAN
177 of the database. This activity is discussed in
179 HREF="maintenance.html#ROUTINE-VACUUMING"
184 > Something else that might need periodic attention is log file management.
185 This is discussed in <A
186 HREF="logfile-maintenance.html"
194 > is low-maintenance compared
195 to some other database management systems. Nonetheless,
196 appropriate attention to these tasks will go far towards ensuring a
197 pleasant and productive experience with the system.
204 NAME="ROUTINE-VACUUMING"
205 >21.1. Routine Vacuuming</A
218 must be run on a regular basis for several reasons:
226 >To recover disk space occupied by updated or deleted
231 >To update data statistics used by the
239 >To protect against loss of very old data due to
242 >transaction ID wraparound</I
248 The frequency and scope of the <TT
251 > operations performed for each of
252 these reasons will vary depending on the needs of each site.
253 Therefore, database administrators must understand these issues and
254 develop an appropriate maintenance strategy. This section concentrates
255 on explaining the high-level issues; for details about command syntax
256 and so on, see the <TT
259 > command reference page.
265 > 7.2, the standard form
269 > can run in parallel with normal database operations
270 (selects, inserts, updates, deletes, but not changes to table definitions).
271 Routine vacuuming is therefore not nearly as intrusive as it was in prior
272 releases, and it's not as critical to try to schedule it at low-usage
280 NAME="VACUUM-FOR-SPACE-RECOVERY"
281 >21.1.1. Recovering disk space</A
298 immediately remove the old version of the row.
299 This approach is necessary to gain the benefits of multiversion
300 concurrency control (see <A
304 must not be deleted while it is still potentially visible to other
305 transactions. But eventually, an outdated or deleted row version is no
306 longer of interest to any transaction. The space it occupies must be
307 reclaimed for reuse by new rows, to avoid infinite growth of disk
308 space requirements. This is done by running <TT
314 > Clearly, a table that receives frequent updates or deletes will need
315 to be vacuumed more often than tables that are seldom updated. It
316 may be useful to set up periodic <SPAN
320 vacuum only selected tables, skipping tables that are known not to
321 change often. This is only likely to be helpful if you have both
322 large heavily-updated tables and large seldom-updated tables --- the
323 extra cost of vacuuming a small table isn't enough to be worth
327 > The standard form of <TT
330 > is best used with the goal of
331 maintaining a fairly level steady-state usage of disk space. The standard
332 form finds old row versions and makes their space available for re-use within
333 the table, but it does not try very hard to shorten the table file and
334 return disk space to the operating system. If you need to return disk
335 space to the operating system you can use <TT
339 but what's the point of releasing disk space that will only have to be
340 allocated again soon? Moderately frequent standard <TT
344 are a better approach than infrequent <TT
348 maintaining heavily-updated tables.
351 > Recommended practice for most sites is to schedule a database-wide
355 > once a day at a low-usage time of day, supplemented
356 by more frequent vacuuming of heavily-updated tables if necessary.
357 (If you have multiple databases in a cluster, don't forget to
358 vacuum each one; the program <TT
369 vacuuming for space recovery.
375 > is recommended for cases where you know you have
376 deleted the majority of rows in a table, so that the steady-state size
377 of the table can be shrunk substantially with <TT
381 more aggressive approach.
384 > If you have a table whose contents are deleted completely every so often,
385 consider doing it with <TT
403 NAME="VACUUM-FOR-STATISTICS"
404 >21.1.2. Updating planner statistics</A
416 > query planner relies on
417 statistical information about the contents of tables in order to
418 generate good plans for queries. These statistics are gathered by
422 > command, which can be invoked by itself or
423 as an optional step in <TT
426 >. It is important to have
427 reasonably accurate statistics, otherwise poor choices of plans may
428 degrade database performance.
431 > As with vacuuming for space recovery, frequent updates of statistics
432 are more useful for heavily-updated tables than for seldom-updated
433 ones. But even for a heavily-updated table, there may be no need for
434 statistics updates if the statistical distribution of the data is
435 not changing much. A simple rule of thumb is to think about how much
436 the minimum and maximum values of the columns in the table change.
440 > column that contains the time
441 of row update will have a constantly-increasing maximum value as
442 rows are added and updated; such a column will probably need more
443 frequent statistics updates than, say, a column containing URLs for
444 pages accessed on a website. The URL column may receive changes just
445 as often, but the statistical distribution of its values probably
446 changes relatively slowly.
449 > It is possible to run <TT
452 > on specific tables and even
453 just specific columns of a table, so the flexibility exists to update some
454 statistics more frequently than others if your application requires it.
455 In practice, however, the usefulness of this feature is doubtful.
463 > is a fairly fast operation even on large tables,
464 because it uses a statistical random sampling of the rows of a table
465 rather than reading every single row. So it's probably much simpler
466 to just run it over the whole database every so often.
475 > Although per-column tweaking of <TT
478 > frequency may not be
479 very productive, you may well find it worthwhile to do per-column
480 adjustment of the level of detail of the statistics collected by
484 >. Columns that are heavily used in <TT
488 and have highly irregular data distributions may require a finer-grain
489 data histogram than other columns. See <TT
498 > Recommended practice for most sites is to schedule a database-wide
502 > once a day at a low-usage time of day; this can
503 usefully be combined with a nightly <TT
507 sites with relatively slowly changing table statistics may find that
508 this is overkill, and that less-frequent <TT
520 NAME="VACUUM-FOR-WRAPAROUND"
521 >21.1.3. Preventing transaction ID wraparound failures</A
530 >'s MVCC transaction semantics
531 depend on being able to compare transaction ID (<ACRONYM
535 numbers: a row version with an insertion XID greater than the current
536 transaction's XID is <SPAN
538 >"in the future"</SPAN
539 > and should not be visible
540 to the current transaction. But since transaction IDs have limited size
541 (32 bits at this writing) a cluster that runs for a long time (more
542 than 4 billion transactions) will suffer <I
546 >: the XID counter wraps around to zero, and all of a sudden
547 transactions that were in the past appear to be in the future --- which
548 means their outputs become invisible. In short, catastrophic data loss.
549 (Actually the data is still there, but that's cold comfort if you can't
556 > 7.2, the only defense
557 against XID wraparound was to re-<TT
561 billion transactions. This of course was not very satisfactory for
562 high-traffic sites, so a better solution has been devised. The new
563 approach allows a server to remain up indefinitely, without
567 > or any sort of restart. The price is this
568 maintenance requirement: <SPAN
572 >every table in the database must
573 be vacuumed at least once every billion transactions</I
578 > In practice this isn't an onerous requirement, but since the
579 consequences of failing to meet it can be complete data loss (not
580 just wasted disk space or slow performance), some special provisions
581 have been made to help database administrators keep track of the
582 time since the last <TT
585 >. The remainder of this
586 section gives the details.
589 > The new approach to XID comparison distinguishes two special XIDs,
597 >). These two XIDs are always considered older
598 than every normal XID. Normal XIDs (those greater than 2) are
599 compared using modulo-2<SUP
601 > arithmetic. This means
602 that for every normal XID, there are two billion XIDs that are
606 > and two billion that are <SPAN
610 way to say it is that the normal XID space is circular with no
611 endpoint. Therefore, once a row version has been created with a particular
612 normal XID, the row version will appear to be <SPAN
616 the next two billion transactions, no matter which normal XID we are
617 talking about. If the row version still exists after more than two billion
618 transactions, it will suddenly appear to be in the future. To
619 prevent data loss, old row versions must be reassigned the XID
623 > sometime before they reach the
624 two-billion-transactions-old mark. Once they are assigned this
625 special XID, they will appear to be <SPAN
629 normal transactions regardless of wraparound issues, and so such
630 row versions will be good until deleted, no matter how long that is. This
631 reassignment of XID is handled by <TT
640 >'s normal policy is to reassign <TT
644 to any row version with a normal XID more than one billion transactions in the
645 past. This policy preserves the original insertion XID until it is not
646 likely to be of interest anymore. (In fact, most row versions will probably
647 live and die without ever being <SPAN
650 >.) With this policy,
651 the maximum safe interval between <TT
655 is exactly one billion transactions: if you wait longer, it's possible
656 that a row version that was not quite old enough to be reassigned last time
657 is now more than two billion transactions old and has wrapped around
658 into the future --- i.e., is lost to you. (Of course, it'll reappear
659 after another two billion transactions, but that's no help.)
665 > runs are needed anyway for the reasons
666 described earlier, it's unlikely that any table would not be vacuumed
667 for as long as a billion transactions. But to help administrators ensure
668 this constraint is met, <TT
671 > stores transaction ID
672 statistics in the system table <TT
679 > column of a database's
683 > row is updated at the completion of any
684 database-wide vacuum operation (i.e., <TT
688 name a specific table). The value stored in this field is the freeze
689 cutoff XID that was used by that <TT
692 > command. All normal
693 XIDs older than this cutoff XID are guaranteed to have been replaced by
697 > within that database. A convenient way to
698 examine this information is to execute the query
701 CLASS="PROGRAMLISTING"
702 >SELECT datname, age(datfrozenxid) FROM pg_database;</PRE
708 > column measures the number of transactions from the
709 cutoff XID to the current transaction's XID.
712 > With the standard freezing policy, the <TT
716 at one billion for a freshly-vacuumed database. When the <TT
720 approaches two billion, the database must be vacuumed again to avoid
721 risk of wraparound failures. Recommended practice is to vacuum each
722 database at least once every half-a-billion (500 million) transactions,
723 so as to provide plenty of safety margin. To help meet this rule,
724 each database-wide <TT
727 > automatically delivers a warning
735 > of more than 1.5 billion transactions, for example:
738 CLASS="PROGRAMLISTING"
740 WARNING: some databases have not been vacuumed in 1613770184 transactions
741 HINT: Better vacuum them within 533713463 transactions, or you may have a wraparound failure.
753 aggressive freezing policy: row versions are frozen if they are old enough
754 to be considered good by all open transactions. In particular, if a
758 > is performed in an otherwise-idle
759 database, it is guaranteed that <SPAN
765 > row versions in that
766 database will be frozen. Hence, as long as the database is not
767 modified in any way, it will not need subsequent vacuuming to avoid
768 transaction ID wraparound problems. This technique is used by
776 It should also be used to prepare any user-created databases that
787 >, since there isn't any convenient way to
788 vacuum a database that you can't connect to. Note that
792 >'s automatic warning message about
793 unvacuumed databases will ignore <TT
804 giving false warnings about these databases; therefore it's up to
805 you to ensure that such databases are frozen correctly.
815 SUMMARY="Footer navigation table"
826 HREF="multibyte.html"
844 HREF="routine-reindex.html"
854 >Character Set Support</TD
868 >Routine Reindexing</TD