From fdff883aca7f13660d01e708f38bfb105c3c7872 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Fri, 21 Oct 2005 19:39:08 +0000
Subject: [PATCH] Clean up autovacuum documentation, which was a bit out of sync with what the code actually does, and needed copy-editing anyway. Also take the opportunity to expand the section on routine reindexing.
---
 doc/src/sgml/maintenance.sgml | 134 +++++++++++++++++++++++++++++-------------
 1 file changed, 92 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index d347e27332..672d740930 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -1,5 +1,5 @@
@@ -474,9 +474,9 @@ HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
tuples. These checks use the row-level statistics collection facility; therefore, the autovacuum daemon cannot be used unless and are set true. Also, it's - important to allow a slot for the autovacuum process when choosing the - value of . + linkend="guc-stats-row-level"> are set to true. Also, + it's important to allow a slot for the autovacuum process when choosing + the value of .
@@ -487,75 +487,91 @@ HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
database-wide VACUUM call, or VACUUM FREEZE if it's a template database, and then terminates. If no database fulfills this criterion, the one that was least recently - processed by autovacuum itself is chosen. In this mode, each table in - the database is checked for new and obsolete tuples, according to the - applicable autovacuum parameters. If a - pg_autovacuum tuple is found for this - table, these settings are applied; otherwise the global values in - postgresql.conf are used. See - for more details on the global settings. + processed by autovacuum is chosen. In this case each table in + the selected database is checked, and individual VACUUM + or ANALYZE commands are issued as needed.
- For each table, two conditions are used to determine which operation to - apply. If the number of obsolete tuples since the last + For each table, two conditions are used to determine which operation(s) + to apply. If the number of obsolete tuples since the last VACUUM exceeds the vacuum threshold, the - table is vacuumed and analyzed. The vacuum threshold is defined as: + table is vacuumed. The vacuum threshold is defined as: vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples where the vacuum base threshold is - pg_autovacuum.vac_base_thresh, + , the vacuum scale factor is - pg_autovacuum.vac_scale_factor + , and the number of tuples is pg_class.reltuples. - The number of obsolete tuples is taken from the statistics - collector, which is a semi-accurate count updated by each + The number of obsolete tuples is obtained from the statistics + collector; it is a semi-accurate count updated by each UPDATE and DELETE operation. (It is only semi-accurate because some information may be lost under heavy - load.) For analyze, a similar condition is used: the threshold, calculated - by an equivalent equation to that above, is compared to the number of - new tuples, that is, those created by the INSERT and - COPY commands. + load.) For analyze, a similar condition is used: the threshold, defined as + +analyze threshold = analyze base threshold + analyze scale factor * number of tuples + + is compared to the total number of tuples inserted, updated, or deleted + since the last ANALYZE. - Note that if any of the values in pg_autovacuum - are set to a negative number, or if a tuple is not present at all in - pg_autovacuum for any particular table, the - equivalent value from postgresql.conf is used. + The default thresholds and scale factors are taken from + postgresql.conf, but it is possible to override them + on a table-by-table basis by making entries in the system catalog + pg_autovacuum. 
+ If a pg_autovacuum row exists for a particular + table, the settings it specifies are applied; otherwise the global + settings are used. See for + more details on the global settings. Besides the base threshold values and scale factors, there are three - parameters that can be set for each table in pg_autovacuum. - The first parameter, pg_autovacuum.enabled, - can be used to instruct the autovacuum daemon to skip any particular table - by setting it to false. - The other two, the vacuum cost delay + more parameters that can be set for each table in + pg_autovacuum. + The first, pg_autovacuum.enabled, + can be set to false to instruct the autovacuum daemon + to skip that particular table entirely. In this case + autovacuum will only touch the table when it vacuums the entire database + to prevent transaction ID wraparound. + The other two parameters, the vacuum cost delay (pg_autovacuum.vac_cost_delay) and the vacuum cost limit (pg_autovacuum.vac_cost_limit), are used to set table-specific values for the - feature. The above note about negative values also applies here, but - also note that if the postgresql.conf variables - autovacuum_vacuum_cost_limit and - autovacuum_vacuum_cost_delay are also set to negative - values, the global vacuum_cost_limit and - vacuum_cost_delay values will be used instead. + feature. - + + If any of the values in pg_autovacuum + are set to a negative number, or if a row is not present at all in + pg_autovacuum for any particular table, the + corresponding values from postgresql.conf are used. + + + + There is not currently any support for making + pg_autovacuum entries, except by doing + manual INSERTs into the catalog. This feature will be + improved in future releases, and it is likely that the catalog + definition will change. + + + The contents of the pg_autovacuum system catalog are currently not saved in database dumps created by the tools pg_dump and pg_dumpall. 
- If you need to preserve them across a dump/reload cycle, make sure you + If you want to preserve them across a dump/reload cycle, make sure you dump the catalog manually. - +
@@ -571,8 +587,42 @@ vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuple
In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command. - However, PostgreSQL 7.4 has substantially reduced the need - for this activity compared to earlier releases. + + + + In PostgreSQL releases before 7.4, periodic reindexing + was frequently necessary to avoid index bloat, due to lack of + internal space reclamation in btree indexes. Any situation in which the + range of index keys changed over time — for example, an index on + timestamps in a table where old entries are eventually deleted — + would result in bloat, because index pages for no-longer-needed portions + of the key range were not reclaimed for re-use. Over time, the index size + could become indefinitely much larger than the amount of useful data in it. + + + + In PostgreSQL 7.4 and later, index pages that have become + completely empty are reclaimed for re-use. There is still a possibility + for inefficient use of space: if all but a few index keys on a page have + been deleted, the page remains allocated. So a usage pattern in which all + but a few keys in each range are eventually deleted will see poor use of + space. The potential for bloat is not indefinite — at worst there + will be one key per page — but it may still be worthwhile to schedule + periodic reindexing for indexes that have such usage patterns. + + + + The potential for bloat in non-btree indexes has not been well + characterized. It is a good idea to keep an eye on the index's physical + size when using any non-btree index type.
+ + + Also, for btree indexes a freshly-constructed index is somewhat faster to + access than one that has been updated many times, because logically + adjacent pages are usually also physically adjacent in a newly built index. + (This consideration does not currently apply to non-btree indexes.) It + might be worthwhile to reindex periodically just to improve access speed.
-- 
2.11.0
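
Reviewer note: the threshold rules described in the revised text can be read as the small Python sketch below. It is only an illustration of the documented arithmetic, not the autovacuum daemon's code. The `Settings` class, the `effective` and `decide` helpers, and all numeric values are hypothetical; the field names merely echo the `pg_autovacuum.vac_base_thresh` style used in the removed text. The sketch assumes "exceeds" means strictly greater than.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Settings:
    """Hypothetical container for one set of threshold parameters."""
    vac_base_thresh: float
    vac_scale_factor: float
    anl_base_thresh: float
    anl_scale_factor: float


def effective(per_table: Optional[Settings], global_: Settings) -> Settings:
    """Apply the fallback rule from the text: a missing per-table row,
    or any negative per-table value, falls back to the global value."""
    if per_table is None:
        return global_
    chosen = {}
    for f in fields(Settings):
        value = getattr(per_table, f.name)
        chosen[f.name] = getattr(global_, f.name) if value < 0 else value
    return Settings(**chosen)


def decide(reltuples: float, obsolete: int, changed: int, s: Settings):
    """Return (needs_vacuum, needs_analyze) for one table.

    obsolete: obsolete tuples since the last VACUUM.
    changed:  tuples inserted, updated, or deleted since the last ANALYZE.
    """
    vacuum_threshold = s.vac_base_thresh + s.vac_scale_factor * reltuples
    analyze_threshold = s.anl_base_thresh + s.anl_scale_factor * reltuples
    return obsolete > vacuum_threshold, changed > analyze_threshold


# Illustrative global values only; not asserted to be any release's defaults.
GLOBAL = Settings(1000, 0.4, 500, 0.2)

# No per-table row: thresholds are 5000 (vacuum) and 2500 (analyze).
print(decide(10000, 6000, 2000, effective(None, GLOBAL)))

# Per-table row overriding only the vacuum scale factor; -1 means "use global".
override = Settings(-1, 0.1, -1, -1)
print(decide(10000, 2500, 2000, effective(override, GLOBAL)))
```

With the override, the vacuum threshold for a 10000-tuple table drops from 5000 to 2000, so the same table is vacuumed much sooner, while the analyze threshold is unchanged.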