Remove transactions TODO.detail.

author Bruce Momjian <bruce@momjian.us>

Tue, 17 Aug 2004 17:11:44 +0000 (17:11 +0000)

committer Bruce Momjian <bruce@momjian.us>

Tue, 17 Aug 2004 17:11:44 +0000 (17:11 +0000)
author Bruce Momjian <bruce@momjian.us>
Tue, 17 Aug 2004 17:11:44 +0000 (17:11 +0000)
committer Bruce Momjian <bruce@momjian.us>
Tue, 17 Aug 2004 17:11:44 +0000 (17:11 +0000)
diff --git a/doc/TODO.detail/transactions b/doc/TODO.detail/transactions

deleted file mode 100644 (file)

index ce7af3e..0000000
--- a/doc/TODO.detail/transactions
+++ /dev/null
@@ -1,1177 +0,0 @@
-From pgsql-hackers-owner+M11649@postgresql.org Wed Aug  1 15:22:46 2001
-Return-path: <pgsql-hackers-owner+M11649@postgresql.org>
-Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
-       by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f71JMjN09768
-       for <pgman@candle.pha.pa.us>; Wed, 1 Aug 2001 15:22:45 -0400 (EDT)
-Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
-       by postgresql.org (8.11.3/8.11.1) with SMTP id f71JMUf62338;
-       Wed, 1 Aug 2001 15:22:30 -0400 (EDT)
-       (envelope-from pgsql-hackers-owner+M11649@postgresql.org)
-Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged))
-       by postgresql.org (8.11.3/8.11.1) with SMTP id f71J4df57086
-       for <pgsql-hackers@postgresql.org>; Wed, 1 Aug 2001 15:04:40 -0400 (EDT)
-       (envelope-from vmikheev@SECTORBASE.COM)
-Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
-       id <PG1LSSPZ>; Wed, 1 Aug 2001 12:04:31 -0700
-Message-ID: <3705826352029646A3E91C53F7189E32016705@sectorbase2.sectorbase.com>
-From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-To: "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>
-Subject: [HACKERS] Using POSIX mutex-es
-Date: Wed, 1 Aug 2001 12:04:24 -0700 
-MIME-Version: 1.0
-X-Mailer: Internet Mail Service (5.5.2653.19)
-Content-Type: text/plain;
-       charset="koi8-r"
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: RO
-
-1. Just changed
-       TAS(lock) to pthread_mutex_trylock(lock)
-       S_LOCK(lock) to pthread_mutex_lock(lock)
-       S_UNLOCK(lock) to pthread_mutex_unlock(lock)
-(and S_INIT_LOCK to share mutex-es between processes).
-
-2. pgbench was initialized with scale 10.
-   SUN WS 10 (512Mb), Solaris 2.6 (I'm unable to test on E4500 -:()
-   -B 16384, wal_files 8, wal_buffers 256,
-   checkpoint_segments 64, checkpoint_timeout 3600
-   50 clients x 100 transactions
-   (after initialization DB dir was saved and before each test
-    copyed back and vacuum-ed).
-
-3. No difference.
-   Mutex version maybe 0.5-1 % faster (eg: 37.264238 tps vs 37.083339 tps).
-
-So - no gain, but no performance loss "from using pthread library"
-(I've also run tests with 1 client), at least on Solaris.
-
-And so - looks like we can use POSIX mutex-es and conditional variables
-(not semaphores; man pthread_cond_wait) and should implement light lmgr,
-probably with priority locking.
-
-Vadim
-
----------------------------(end of broadcast)---------------------------
-TIP 2: you can get off all lists at once with the unregister command
-    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
-
-From pgsql-hackers-owner+M18052=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 13:39:19 2002
-Return-path: <pgsql-hackers-owner+M18052=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0NIdIU26480
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 13:39:18 -0500 (EST)
-Received: (qmail 59371 invoked by alias); 23 Jan 2002 18:39:18 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 23 Jan 2002 18:39:18 -0000
-Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0NIJ8l47400
-       for <pgsql-hackers@postgreSQL.org>; Wed, 23 Jan 2002 13:19:08 -0500 (EST)
-       (envelope-from pgman@candle.pha.pa.us)
-Received: (from pgman@localhost)
-       by candle.pha.pa.us (8.11.6/8.10.1) id g0NIJ5i24508
-       for pgsql-hackers@postgreSQL.org; Wed, 23 Jan 2002 13:19:05 -0500 (EST)
-From: Bruce Momjian <pgman@candle.pha.pa.us>
-Message-ID: <200201231819.g0NIJ5i24508@candle.pha.pa.us>
-Subject: [HACKERS] Savepoints
-To: PostgreSQL-development <pgsql-hackers@postgresql.org>
-Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST)
-X-Mailer: ELM [version 2.4ME+ PL96 (25)]
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-Content-Type: text/plain; charset=US-ASCII
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: RO
-
-I have talked in the past about a possible implementation of
-savepoints/nested transactions.  I would like to more formally outline
-my ideas below.
-
-We have talked about using WAL for such a purpose, but that requires WAL
-files to remain for the life of a transaction, which seems unacceptable.
-Other database systems do that, and it is a pain for administrators.  I
-realized we could do some sort of WAL compaction, but that seems quite
-complex too.
-
-Basically, under my plan, WAL would be unchanged.  WAL's function is
-crash recovery, and it would retain that.  There would also be no
-on-disk changes.  I would use the command counter in certain cases to
-identify savepoints.
-
-My idea is to keep savepoint undo information in a private area per
-backend, either in memory or on disk.  We can either save the
-relid/tids of modified rows, or if there are too many, discard the
-saved ones and just remember the modified relids.  On rollback to save
-point, either clear up the modified relid/tids, or sequential scan
-through the relid and clear up all the tuples that have our transaction
-id and have command counters that are part of the undo savepoint.
-
-It seems marking undo savepoint rows with a fixed aborted transaction id
-would be the easiest solution.
-
-Of course, we only remember modified rows when we are in savepoints, and
-only undo them when we rollback to a savepoint.  Transaction processing
-remains the same.
-
-There is no reason for other backend to be able to see savepoint undo
-information, and keeping it private greatly simplifies the
-implementation.
-
--- 
-  Bruce Momjian                        |  http://candle.pha.pa.us
-  pgman@candle.pha.pa.us               |  (610) 853-3000
-  +  If your life is a hard drive,     |  830 Blythe Avenue
-  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-
----------------------------(end of broadcast)---------------------------
-TIP 6: Have you searched our list archives?
-
-http://archives.postgresql.org
-
-From hstenger@adinet.com.uy Wed Jan 23 14:13:33 2002
-Return-path: <hstenger@adinet.com.uy>
-Received: from correo.adinet.com.uy (fecorreo01.adinet.com.uy [206.99.44.217])
-       by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NJDWU29832
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 14:13:33 -0500 (EST)
-Received: from adinet.com.uy (200.61.76.155) by correo.adinet.com.uy (5.5.052) (authenticated as hstenger@adinet.com.uy)
-        id 3C4DBC5C00017E9F; Wed, 23 Jan 2002 16:13:25 -0300
-Message-ID: <3C4F0BC0.5CFBB919@adinet.com.uy>
-Date: Wed, 23 Jan 2002 16:15:12 -0300
-From: Haroldo Stenger <hstenger@adinet.com.uy>
-X-Mailer: Mozilla 4.78 [en] (Win98; U)
-X-Accept-Language: en
-MIME-Version: 1.0
-To: Bruce Momjian <pgman@candle.pha.pa.us>
-cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
-Subject: Re: [HACKERS] Savepoints
-References: <200201231819.g0NIJ5i24508@candle.pha.pa.us>
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: OR
-
-Bruce Momjian wrote:
-> 
-> Basically, under my plan, WAL would be unchanged.  WAL's function is
-> crash recovery, and it would retain that.  There would also be no
-> on-disk changes.  I would use the command counter in certain cases to
-> identify savepoints.
-
-This is a pointer to the previous August thread, where your original proposal
-was posted, and some WAL/not WAL discussion took place. Just not to repeat the
-already mentioned points. Oh, it's google archive just for fun, and to not
-overload hub.org ;-)
-
-http://groups.google.com/groups?hl=en&threadm=200108050432.f754Wdo11696%40candle.pha.pa.us&rnum=1&prev=/groups%3Fhl%3Den%26selm%3D200108050432.f754Wdo11696%2540candle.pha.pa.us
-
-Regards,
-Haroldo.
-
-From vmikheev@SECTORBASE.COM Wed Jan 23 18:23:04 2002
-Return-path: <vmikheev@SECTORBASE.COM>
-Received: from sectorbase2.sectorbase.com ([66.106.163.120])
-       by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NNN3U21442
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 18:23:04 -0500 (EST)
-Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
-       id <DKXVZ14S>; Wed, 23 Jan 2002 15:22:52 -0800
-Message-ID: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
-From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-To: "'Bruce Momjian'" <pgman@candle.pha.pa.us>,
-   PostgreSQL-development
-  <pgsql-hackers@postgreSQL.org>
-Subject: RE: [HACKERS] Savepoints
-Date: Wed, 23 Jan 2002 15:22:42 -0800
-MIME-Version: 1.0
-X-Mailer: Internet Mail Service (5.5.2653.19)
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Status: ORr
-
-> I have talked in the past about a possible implementation of
-> savepoints/nested transactions.  I would like to more formally outline
-> my ideas below.
-
-Well, I would like to do the same -:)
-
-> ...
-> There is no reason for other backend to be able to see savepoint undo
-> information, and keeping it private greatly simplifies the
-> implementation.
-
-Yes... and requires additional memory/disk space: we keep old records
-in data files and we'll store them again...
-
-How about: use overwriting smgr + put old records into rollback
-segments - RS - (you have to keep them somewhere till TX's running
-anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-changes and WAL will be used for RS/data files recovery).
-Something like what Oracle does.
-
-Vadim
-
-From pgsql-hackers-owner+M18085=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:15:02 2002
-Return-path: <pgsql-hackers-owner+M18085=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1F1U26461
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 20:15:02 -0500 (EST)
-Received: (qmail 92866 invoked by alias); 24 Jan 2002 01:14:59 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 24 Jan 2002 01:14:59 -0000
-Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O18ml91949
-       for <pgsql-hackers@postgresql.org>; Wed, 23 Jan 2002 20:08:50 -0500 (EST)
-       (envelope-from pgman@candle.pha.pa.us)
-Received: (from pgman@localhost)
-       by candle.pha.pa.us (8.11.6/8.10.1) id g0O18jV26044;
-       Wed, 23 Jan 2002 20:08:45 -0500 (EST)
-From: Bruce Momjian <pgman@candle.pha.pa.us>
-Message-ID: <200201240108.g0O18jV26044@candle.pha.pa.us>
-Subject: Re: [HACKERS] Savepoints
-In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-Date: Wed, 23 Jan 2002 20:08:45 -0500 (EST)
-cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
-X-Mailer: ELM [version 2.4ME+ PL96 (25)]
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-Content-Type: text/plain; charset=US-ASCII
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-Mikheev, Vadim wrote:
-> > I have talked in the past about a possible implementation of
-> > savepoints/nested transactions.  I would like to more formally outline
-> > my ideas below.
-> 
-> Well, I would like to do the same -:)
-
-Good.
-
-> > ...
-> > There is no reason for other backend to be able to see savepoint undo
-> > information, and keeping it private greatly simplifies the
-> > implementation.
-> 
-> Yes... and requires additional memory/disk space: we keep old records
-> in data files and we'll store them again...
-
-I was suggesting keeping only relid/tid or in some cases only relid. 
-Seems like one or the other will fit all needs:  relid/tid for update of
-a few rows, relid for many rows updated in the same table.  I saw no
-need to store the actual data.
-
-> How about: use overwriting smgr + put old records into rollback
-> segments - RS - (you have to keep them somewhere till TX's running
-> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> changes and WAL will be used for RS/data files recovery).
-> Something like what Oracle does.
-
-Why record the old data rows rather than the tids?  While the
-transaction is running, the rows can't be moved anyway.  Also, why store
-them in a shared area.  That has additional requirements because one old
-transaction can require all transactions to keep their stuff around. 
-Why not just make it a private data file for each backend?
-
--- 
-  Bruce Momjian                        |  http://candle.pha.pa.us
-  pgman@candle.pha.pa.us               |  (610) 853-3000
-  +  If your life is a hard drive,     |  830 Blythe Avenue
-  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-
----------------------------(end of broadcast)---------------------------
-TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
-
-From pgsql-hackers-owner+M18086=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:25:47 2002
-Return-path: <pgsql-hackers-owner+M18086=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1PkU26964
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 20:25:47 -0500 (EST)
-Received: (qmail 94878 invoked by alias); 24 Jan 2002 01:25:44 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 24 Jan 2002 01:25:44 -0000
-Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O1L1l94075
-       for <pgsql-hackers@postgreSQL.org>; Wed, 23 Jan 2002 20:21:01 -0500 (EST)
-       (envelope-from pgman@candle.pha.pa.us)
-Received: (from pgman@localhost)
-       by candle.pha.pa.us (8.11.6/8.10.1) id g0O1Kwm26748;
-       Wed, 23 Jan 2002 20:20:58 -0500 (EST)
-From: Bruce Momjian <pgman@candle.pha.pa.us>
-Message-ID: <200201240120.g0O1Kwm26748@candle.pha.pa.us>
-Subject: Re: [HACKERS] Savepoints
-In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-Date: Wed, 23 Jan 2002 20:20:58 -0500 (EST)
-cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
-X-Mailer: ELM [version 2.4ME+ PL96 (25)]
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-Content-Type: text/plain; charset=US-ASCII
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-> > There is no reason for other backend to be able to see savepoint undo
-> > information, and keeping it private greatly simplifies the
-> > implementation.
-> 
-> Yes... and requires additional memory/disk space: we keep old records
-> in data files and we'll store them again...
-> 
-> How about: use overwriting smgr + put old records into rollback
-> segments - RS - (you have to keep them somewhere till TX's running
-> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> changes and WAL will be used for RS/data files recovery).
-> Something like what Oracle does.
-
-I am sorry.  I see what you are saying now.  I missed the words
-"overwriting smgr".  You are suggesting going to an overwriting storage
-manager.  Is this to be done only because of savepoints.  Doesn't seem
-worth it when I have a possible solution without such a drastic change.
-Also, overwriting storage manager will require MVCC to read through
-there to get accurate MVCC visibility, right?
-
--- 
-  Bruce Momjian                        |  http://candle.pha.pa.us
-  pgman@candle.pha.pa.us               |  (610) 853-3000
-  +  If your life is a hard drive,     |  830 Blythe Avenue
-  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-
----------------------------(end of broadcast)---------------------------
-TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
-
-From vmikheev@SECTORBASE.COM Wed Jan 23 21:03:29 2002
-Return-path: <vmikheev@SECTORBASE.COM>
-Received: from sectorbase2.sectorbase.com ([66.106.163.120])
-       by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O23TU28813
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 21:03:29 -0500 (EST)
-Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
-       id <DKXVZFBY>; Wed, 23 Jan 2002 18:03:18 -0800
-Message-ID: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com>
-From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-To: "'Bruce Momjian'" <pgman@candle.pha.pa.us>
-cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
-Subject: RE: [HACKERS] Savepoints
-Date: Wed, 23 Jan 2002 18:03:11 -0800
-MIME-Version: 1.0
-X-Mailer: Internet Mail Service (5.5.2653.19)
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Status: ORr
-
-> > How about: use overwriting smgr + put old records into rollback
-> > segments - RS - (you have to keep them somewhere till TX's running
-> > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> > changes and WAL will be used for RS/data files recovery).
-> > Something like what Oracle does.
-> 
-> I am sorry. I see what you are saying now. I missed the words
-
-And I'm sorry for missing your notes about storing relid+tid only.
-
-> "overwriting smgr". You are suggesting going to an overwriting
-> storage manager. Is this to be done only because of savepoints.
-
-No. One point I made a few monthes ago (and never got objections)
-is - why to keep old data in data files sooooo long?
-Imagine long running TX (eg pg_dump). Why other TX-s must read
-again and again completely useless (for them) old data we keep
-for pg_dump?
-
-> Doesn't seem worth it when I have a possible solution without
-> such a drastic change.
-> Also, overwriting storage manager will require MVCC to read
-> through there to get accurate MVCC visibility, right?
-
-Right... just like now non-overwriting smgr requires *ALL*
-TX-s to read old data in data files. But with overwriting smgr
-TX will read RS only when it is required and as far (much) as
-it is required.
-
-Simple solutions are not always the best ones.
-Compare Oracle and InterBase. Both have MVCC.
-Smgr-s are different. What RDBMS is more cool?
-Why doesn't Oracle use more simple non-overwriting smgr
-(as InterBase... and we do)?
-
-Vadim
-
-From dhogaza@pacifier.com Wed Jan 23 21:05:37 2002
-Return-path: <dhogaza@pacifier.com>
-Received: from comet.pacifier.com ([199.2.117.155])
-       by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O25bU28962
-       for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 21:05:37 -0500 (EST)
-Received: from pacifier.com (dsl-dhogaza.pacifier.net [207.202.226.68])
-       by comet.pacifier.com (8.11.2/8.11.1) with ESMTP id g0O24qX29917;
-       Wed, 23 Jan 2002 18:04:52 -0800 (PST)
-Message-ID: <3C4F6BF0.2010406@pacifier.com>
-Date: Wed, 23 Jan 2002 18:05:36 -0800
-From: Don Baccus <dhogaza@pacifier.com>
-User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221
-X-Accept-Language: en-us
-MIME-Version: 1.0
-To: Bruce Momjian <pgman@candle.pha.pa.us>
-cc: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>,
-   PostgreSQL-development <pgsql-hackers@postgresql.org>
-Subject: Re: [HACKERS] Savepoints
-References: <200201240120.g0O1Kwm26748@candle.pha.pa.us>
-Content-Type: text/plain; charset=us-ascii; format=flowed
-Content-Transfer-Encoding: 7bit
-Status: OR
-
-Bruce Momjian wrote:
-
-
-> I am sorry.  I see what you are saying now.  I missed the words
-> "overwriting smgr".  You are suggesting going to an overwriting storage
-> manager.
-
-
-Overwriting storage managers don't suffer from unbounded growth of 
-datafiles until garbage collection (vacuum) is performed.  In fact, 
-there's no need for a vacuum-style utility.  The rollback segments only 
-need to keep around enough past history to rollback transactions that 
-are executing.
-
-Of course, then the size of your transactions are limited by the size of 
-your rollback segments, which in Oracle are fixed in length when you 
-build your database (there are ways to change this when you figure out 
-that you didn't pick a good number when creating it).
-
- >Is this to be done only because of savepoints.
-
-Not in traditional storage managers such as Oracle uses.  The complexity 
-of managing visibility and the like are traded off against the fact that 
-you're not stuck ever needing to garbage collect a database that 
-occupies a roomful of disks.
-
-It's a trade-off.  PG's current storage manager seems to work awfully 
-well in a lot of common database scenarios, and Tom's new vacuum is 
-meant to help mitigate against the drawbacks.   But overwriting storage 
-managers certainly have their advantages, too.
-
- >  Doesn't seem
-
-> worth it when I have a possible solution without such a drastic change.
-> Also, overwriting storage manager will require MVCC to read through
-> there to get accurate MVCC visibility, right?
-
-
-Yep...
-
--- 
-Don Baccus
-Portland, OR
-http://donb.photo.net, http://birdnotes.net, http://openacs.org
-
-
-From Inoue@tpf.co.jp Thu Jan 24 11:34:48 2002
-Return-path: <Inoue@tpf.co.jp>
-Received: from p2272.nsk.ne.jp (p2272.nsk.ne.jp [210.145.18.145])
-       by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0OGYjU23980
-       for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 11:34:47 -0500 (EST)
-Received: from mcadnote1 (ppm132.noc.fukui.nsk.ne.jp [61.198.95.32])
-       by p2272.nsk.ne.jp (8.9.3/3.7W-20000722) with SMTP id BAA12147;
-       Fri, 25 Jan 2002 01:34:24 +0900 (JST)
-From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-cc: "PostgreSQL-development" <pgsql-hackers@postgreSQL.org>,
-   "'Bruce Momjian'" <pgman@candle.pha.pa.us>
-Subject: RE: [HACKERS] Savepoints
-Date: Fri, 25 Jan 2002 01:34:29 +0900
-Message-ID: <EKEJJICOHDIEMGPNIFIJKEFBGJAA.Inoue@tpf.co.jp>
-MIME-Version: 1.0
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Content-Transfer-Encoding: 7bit
-X-Priority: 3 (Normal)
-X-MSMail-Priority: Normal
-X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
-In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
-X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
-Importance: Normal
-Status: OR
-
-> -----Original Message-----
-> From: Mikheev, Vadim
-> 
-> How about: use overwriting smgr + put old records into rollback
-> segments - RS - (you have to keep them somewhere till TX's running
-> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> changes and WAL will be used for RS/data files recovery).
-> Something like what Oracle does.
-
-As long as we use no overwriting manager
-1) Rollback(data) isn't needed in case of a db crash.
-2) Rollback(data) isn't needed to cancal a transaction entirely.
-3) We don't need to mind the transaction size so much.
-
-We can't use the db any longer if a REDO recovery fails now.
-Under overwriting smgr we can't use the db any longer either
-if rollback fails. How could PG be not less reliable than now ?
-
-regards,
-Hiroshi Inoue
-
-From pgsql-hackers-owner+M18123=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:15:11 2002
-Return-path: <pgsql-hackers-owner+M18123=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJFAU12547
-       for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 14:15:10 -0500 (EST)
-Received: (qmail 43413 invoked by alias); 24 Jan 2002 19:13:48 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 24 Jan 2002 19:13:48 -0000
-Received: from sectorbase2.sectorbase.com ([66.106.163.120])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJC4l42011
-       for <pgsql-hackers@postgreSQL.org>; Thu, 24 Jan 2002 14:12:04 -0500 (EST)
-       (envelope-from vmikheev@SECTORBASE.COM)
-Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
-       id <DKXVZF9P>; Thu, 24 Jan 2002 11:11:54 -0800
-Message-ID: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com>
-From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-To: "'Hiroshi Inoue'" <Inoue@tpf.co.jp>
-cc: PostgreSQL-development <pgsql-hackers@postgresql.org>,
-   "'Bruce Momjian'"
-  <pgman@candle.pha.pa.us>
-Subject: Re: [HACKERS] Savepoints
-Date: Thu, 24 Jan 2002 11:11:52 -0800
-MIME-Version: 1.0
-X-Mailer: Internet Mail Service (5.5.2653.19)
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-> > How about: use overwriting smgr + put old records into rollback
-> > segments - RS - (you have to keep them somewhere till TX's running
-> > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> > changes and WAL will be used for RS/data files recovery).
-> > Something like what Oracle does.
-> 
-> As long as we use no overwriting manager
-> 1) Rollback(data) isn't needed in case of a db crash.
-> 2) Rollback(data) isn't needed to cancal a transaction entirely.
-
--1) But vacuum must read a huge amount of data to remove dirt.
--2) But TX-s must read data they are not interested at all.
-
-> 3) We don't need to mind the transaction size so much.
-
--3) The same with overwriting smgr and WAL used *only as REDO log*:
-we are not required to keep WAL files for duration of transaction
-- as soon as server knows that changes logged in some WAL file
-applied to data files and RS on disk (and archived, for WAL-based
-BAR) that file may be reused/removed. Old data will still occupy
-space in RS but their space in data files will be available
-for reuse.
-
-> We can't use the db any longer if a REDO recovery fails now.
-
-Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or
-use good RAM - whatever caused problem with restart.
-
-> Under overwriting smgr we can't use the db any longer either
-> if rollback fails.
-
-Why should it fail? Bugs? Fix them.
-
-> How could PG be not less reliable than now ?
-
-Is today' RG more reliable than Oracle, Informix, DB2?
-
-Vadim
-
----------------------------(end of broadcast)---------------------------
-TIP 3: if posting/reading through Usenet, please send an appropriate
-subscribe-nomail command to majordomo@postgresql.org so that your
-message can get through to the mailing list cleanly
-
-From pgsql-hackers-owner+M18125=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:23:42 2002
-Return-path: <pgsql-hackers-owner+M18125=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJNfU13481
-       for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 14:23:42 -0500 (EST)
-Received: (qmail 49604 invoked by alias); 24 Jan 2002 19:23:40 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 24 Jan 2002 19:23:40 -0000
-Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJMTl48885
-       for <pgsql-hackers@postgreSQL.org>; Thu, 24 Jan 2002 14:22:29 -0500 (EST)
-       (envelope-from pgman@candle.pha.pa.us)
-Received: (from pgman@localhost)
-       by candle.pha.pa.us (8.11.6/8.10.1) id g0OJMJf13378;
-       Thu, 24 Jan 2002 14:22:19 -0500 (EST)
-From: Bruce Momjian <pgman@candle.pha.pa.us>
-Message-ID: <200201241922.g0OJMJf13378@candle.pha.pa.us>
-Subject: Re: [HACKERS] Savepoints
-In-Reply-To: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com>
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-Date: Thu, 24 Jan 2002 14:22:19 -0500 (EST)
-cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
-X-Mailer: ELM [version 2.4ME+ PL96 (25)]
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-Content-Type: text/plain; charset=US-ASCII
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-
-OK, I have had time to think about this, and I think I can put the two
-proposals into perspective.  I will use Vadim's terminology.
-
-In our current setup, rollback/undo data is kept in the same file as our
-live data.  This data is used for two purposes, one, for rollback of
-transactions, and perhaps subtransactions in the future, and second, for
-MVCC visibility for backends making changes.
-
-So, it seems the real question is whether a database modification should
-write the old data into a separate rollback segment and modify the heap
-data, or just create a new row and require the old row to be removed
-later by vacuum.
-
-Let's look at this behavior without MVCC.  In such cases, if someone
-tries to read a modified row, it will block and wait for the modifying
-backend to commit or rollback, when it will then continue.  In such
-cases, there is no reason for the waiting transaction to read the old
-data in the redo segment because it can't continue anyway.
-
-Now, with MVCC, the backend has to read through the redo segment to get
-the original data value for that row.
-
-Now, while rollback segments do help with cleaning out old UPDATE rows,
-how does it improve DELETE performance?  Seems it would just mark it as
-expired like we do now.
-
-One objection I always had to redo segments was that if I start a
-transaction in the morning and walk away, none of the redo segments can
-be recycled.  I was going to ask if we can force some type of redo
-segment compaction to keep old active rows and delete rows no longer
-visible to any transaction.  However, I now realize that our VACUUM has
-the same problem.  Tuples with XID >= GetOldestXmin() are not recycled,
-meaning we have this problem in our current implementation too.  (I
-wonder if our vacuum could be smarter about knowing which rows are
-visible, perhaps by creating a sorted list of xid's and doing a binary
-search on the list to determine visibility.)
-
-So, I guess the issue is, do we want to keep redo information in the
-main table, or split it out into redo segments.  Certainly we have to
-eliminate the Oracle restrictions that redo segment size is fixed at
-install time.
-
-The advantages of a redo segment is that hopefully we don't have
-transactions reading through irrelevant undo information.  The
-disadvantage is that we now have redo information grouped into table
-files where a sequential scan can be performed.  (Index scans of redo
-info are a performance problem currently.)  We would have to somehow
-efficiently access redo information grouped into the redo segments. 
-Perhaps a hash based in relid would help here.  Another disadvantage is
-concurrency.  When we start modifying heap data in place, we have to
-prevent other backends from seeing that modification while we move the
-old data to the redo segment.
-
-I guess my feeling is that if we can get vacuum to happen automatically,
-how is our current non-overwriting storage manager different from redo
-segments?
-
-One big advantage of redo segments would be that right now, if someone
-updates a row repeatedly, there are lots of heap versions of the row
-that are difficult to shrink in the table, while if they are in the redo
-segments, we can more efficiently remove them, and there is only on heap
-row.
-
-How is recovery handled with rollback segments?  Do we write old and new
-data to WAL?   We just write new data to WAL now, right?  Do we fsync
-rollback segments?
-
-Have I outlined this accurately?
-
----------------------------------------------------------------------------
-
-Mikheev, Vadim wrote:
-> > > How about: use overwriting smgr + put old records into rollback
-> > > segments - RS - (you have to keep them somewhere till TX's running
-> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> > > changes and WAL will be used for RS/data files recovery).
-> > > Something like what Oracle does.
-> > 
-> > I am sorry. I see what you are saying now. I missed the words
-> 
-> And I'm sorry for missing your notes about storing relid+tid only.
-> 
-> > "overwriting smgr". You are suggesting going to an overwriting
-> > storage manager. Is this to be done only because of savepoints.
-> 
-> No. One point I made a few monthes ago (and never got objections)
-> is - why to keep old data in data files sooooo long?
-> Imagine long running TX (eg pg_dump). Why other TX-s must read
-> again and again completely useless (for them) old data we keep
-> for pg_dump?
-> 
-> > Doesn't seem worth it when I have a possible solution without
-> > such a drastic change.
-> > Also, overwriting storage manager will require MVCC to read
-> > through there to get accurate MVCC visibility, right?
-> 
-> Right... just like now non-overwriting smgr requires *ALL*
-> TX-s to read old data in data files. But with overwriting smgr
-> TX will read RS only when it is required and as far (much) as
-> it is required.
-> 
-> Simple solutions are not always the best ones.
-> Compare Oracle and InterBase. Both have MVCC.
-> Smgr-s are different. What RDBMS is more cool?
-> Why doesn't Oracle use more simple non-overwriting smgr
-> (as InterBase... and we do)?
-> 
-> Vadim
-> 
-
--- 
-  Bruce Momjian                        |  http://candle.pha.pa.us
-  pgman@candle.pha.pa.us               |  (610) 853-3000
-  +  If your life is a hard drive,     |  830 Blythe Avenue
-  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-
----------------------------(end of broadcast)---------------------------
-TIP 2: you can get off all lists at once with the unregister command
-    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
-
-From pgsql-hackers-owner+M18141=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 19:43:38 2002
-Return-path: <pgsql-hackers-owner+M18141=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0P0hbU15026
-       for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 19:43:38 -0500 (EST)
-Received: (qmail 28642 invoked by alias); 25 Jan 2002 00:43:24 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 25 Jan 2002 00:43:24 -0000
-Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
-       by postgresql.org (8.11.3/8.11.4) with SMTP id g0P0YIl27208
-       for <pgsql-hackers@postgreSQL.org>; Thu, 24 Jan 2002 19:34:18 -0500 (EST)
-       (envelope-from Inoue@tpf.co.jp)
-Received: (qmail 3661 invoked from network); 25 Jan 2002 00:34:19 -0000
-Received: from unknown (HELO viscomail.tpf.co.jp) (100.0.0.108)
-  by sd2.tpf-fw-c.co.jp with SMTP; 25 Jan 2002 00:34:19 -0000
-Received: from tpf.co.jp (3dgateway1 [126.0.1.60])
-       by viscomail.tpf.co.jp (8.8.8+Sun/8.8.8) with ESMTP id JAA00756;
-       Fri, 25 Jan 2002 09:34:18 +0900 (JST)
-Message-ID: <3C50A807.32A29E09@tpf.co.jp>
-Date: Fri, 25 Jan 2002 09:34:15 +0900
-From: Hiroshi Inoue <Inoue@tpf.co.jp>
-X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U)
-X-Accept-Language: ja
-MIME-Version: 1.0
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-cc: PostgreSQL-development <pgsql-hackers@postgresql.org>,
-   "'Bruce Momjian'" <pgman@candle.pha.pa.us>
-Subject: Re: [HACKERS] Savepoints
-References: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com>
-Content-Type: text/plain; charset=iso-2022-jp
-Content-Transfer-Encoding: 7bit
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-"Mikheev, Vadim" wrote:
-> 
-> > > How about: use overwriting smgr + put old records into rollback
-> > > segments - RS - (you have to keep them somewhere till TX's running
-> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> > > changes and WAL will be used for RS/data files recovery).
-> > > Something like what Oracle does.
-> >
-> > As long as we use no overwriting manager
-> > 1) Rollback(data) isn't needed in case of a db crash.
-> > 2) Rollback(data) isn't needed to cancal a transaction entirely.
-> 
-> -1) But vacuum must read a huge amount of data to remove dirt.
-> -2) But TX-s must read data they are not interested at all.
-> 
-> > 3) We don't need to mind the transaction size so much.
-> 
-> -3) The same with overwriting smgr and WAL used *only as REDO log*:
-
-The larger RS becomes the longer it would take time to cancel
-the transaction whereas it is executed in a momemnt under no
-overwriting smgr and for example if RS exhausted all disk space
-is PG really safe ? Other backends would also fail because they
-couldn't write RS any mode. Many transactions would execute
-UNDO operations simultaneously but there's no space to write
-WALs (UNDO operations must be written to WAL also) and PG
-system would abort. And could PG restart under such situations ?
-Even though there's a way to recover from the situation, I
-think we should avoid such dangerous situations from the
-first. Basically recovery operations should never fail.
- 
-> 
-> > We can't use the db any longer if a REDO recovery fails now.
-> 
-> Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or
-> use good RAM - whatever caused problem with restart.
-
-As I already mentioned recovery operations should never fail.
-> 
-> > Under overwriting smgr we can't use the db any longer either
-> > if rollback fails.
-> 
-> Why should it fail? Bugs? Fix them.
-
-Rollback operations are executed much more often than
-REDO recovery and it is hard to fix such bugs once PG
-was released. Most people in such troubles have no
-time to persue the cause. In reality I replied to the
-PG restart troubles twice (with --wal-debug and pg_resetxlog
-suggestions ) in Japan but got no further replies.
-
-> 
-> > How could PG be not less reliable than now ?
-> 
-> Is today' RG more reliable than Oracle, Informix, DB2?
-
-I have never been and would never be optiomistic
-about recovery. Is 7.1 more reliable than 7.0 from the
-recovery POV ? I see no reason why overwriting smgr is
-more relaible than no overwriting smgr as for recovery.
-
-regards,
-Hiroshi Inoue
-
----------------------------(end of broadcast)---------------------------
-TIP 6: Have you searched our list archives?
-
-http://archives.postgresql.org
-
-From ZeugswetterA@spardat.at Fri Jan 25 09:21:40 2002
-Return-path: <ZeugswetterA@spardat.at>
-Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1])
-       by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0PELde10640
-       for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 09:21:39 -0500 (EST)
-Received: from m01x1.s-mxs.net [10.3.55.201]
-       by smxsat1.smxs.net
-       with XWall v3.18f ;
-       Fri, 25 Jan 2002 15:22:51 +0100
-Received: from m0103.s-mxs.net [10.3.55.3]
-       by m01x1.s-mxs.net
-       with XWall v3.18a ;
-       Fri, 25 Jan 2002 15:21:23 +0100
-Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966);
-  Fri, 25 Jan 2002 15:21:22 +0100
-X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
-content-class: urn:content-classes:message
-MIME-Version: 1.0
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Subject: RE: [HACKERS] Savepoints
-Date: Fri, 25 Jan 2002 15:21:22 +0100
-Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA42128DE@m0114.s-mxs.net>
-Thread-Topic: [HACKERS] Savepoints
-Thread-Index: AcGkZ8SMKn//UUTjS3mi+qC7+gZAwwBQ4YMA
-From: "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>,
-   "Bruce Momjian" <pgman@candle.pha.pa.us>,
-   "PostgreSQL-development" <pgsql-hackers@postgresql.org>
-X-OriginalArrivalTime: 25 Jan 2002 14:21:22.0648 (UTC) FILETIME=[9090BD80:01C1A5AB]
-Content-Transfer-Encoding: 8bit
-X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id g0PELde10640
-Status: OR
-
-Vadim wrote:
-> How about: use overwriting smgr + put old records into rollback
-> segments - RS - (you have to keep them somewhere till TX's running
-> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
-> changes and WAL will be used for RS/data files recovery).
-> Something like what Oracle does.
-
-We have all the info we need in WAL and in the old rows,
-why would you want to write them to RS ?
-You only need RS for overwriting smgr.
-
-Andreas
-
-From pgsql-hackers-owner+M18209=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:14:02 2002
-Return-path: <pgsql-hackers-owner+M18209=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLE1e19182
-       for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 16:14:01 -0500 (EST)
-Received: (qmail 85111 invoked by alias); 25 Jan 2002 21:13:59 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 25 Jan 2002 21:13:59 -0000
-Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PL48l79366
-       for <pgsql-hackers@postgresql.org>; Fri, 25 Jan 2002 16:04:09 -0500 (EST)
-       (envelope-from ZeugswetterA@spardat.at)
-Received: from m01x1.s-mxs.net [10.3.55.201]
-       by smxsat1.smxs.net
-       with XWall v3.18f ;
-       Fri, 25 Jan 2002 22:05:21 +0100
-Received: from m0102.s-mxs.net [10.3.55.2]
-       by m01x1.s-mxs.net
-       with XWall v3.18a ;
-       Fri, 25 Jan 2002 22:03:54 +0100
-Received: from m0114.s-mxs.net ([10.3.55.14]) by m0102.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966);
-  Fri, 25 Jan 2002 22:03:53 +0100
-X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
-content-class: urn:content-classes:message
-MIME-Version: 1.0
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Subject: Re: [HACKERS] Savepoints
-Date: Fri, 25 Jan 2002 22:03:53 +0100
-Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net>
-Thread-Topic: [HACKERS] Savepoints
-Thread-Index: AcGlDMGVwSWndt4kT1C7QhclLvQPWgA1arbw
-From: "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>
-To: "Bruce Momjian" <pgman@candle.pha.pa.us>,
-   "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
-X-OriginalArrivalTime: 25 Jan 2002 21:03:53.0685 (UTC) FILETIME=[CBB48850:01C1A5E3]
-Content-Transfer-Encoding: 8bit
-X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLDAm83732
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: ORr
-
-
-> Now, with MVCC, the backend has to read through the redo segment to get
-
-You mean rollback segment, but ...
-
-> the original data value for that row.
-
-Will only need to be looked up if the row is currently beeing modified by 
-a not yet comitted txn (at least in the default read committed mode)  
-
-> 
-> Now, while rollback segments do help with cleaning out old UPDATE rows,
-> how does it improve DELETE performance?  Seems it would just mark it as
-> expired like we do now.
-
-delete would probably be: 
-1. mark original deleted and write whole row to RS
-
-I don't think you would like to mix looking up deleted rows in heap
-but updated rows in RS
-
-Andreas
-
-PS: not that I like overwrite with MVCC now
-If you think of VACUUM as garbage collection PG is highly trendy with
-the non-overwriting smgr.
-
----------------------------(end of broadcast)---------------------------
-TIP 5: Have you checked our extensive FAQ?
-
-http://www.postgresql.org/users-lounge/docs/faq.html
-
-From pgsql-hackers-owner+M18211=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:53:45 2002
-Return-path: <pgsql-hackers-owner+M18211=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLrie22174
-       for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 16:53:44 -0500 (EST)
-Received: (qmail 96831 invoked by alias); 25 Jan 2002 21:53:43 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 25 Jan 2002 21:53:43 -0000
-Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PLpRl96298
-       for <pgsql-hackers@postgresql.org>; Fri, 25 Jan 2002 16:51:27 -0500 (EST)
-       (envelope-from ZeugswetterA@spardat.at)
-Received: from m01x1.s-mxs.net [10.3.55.201]
-       by smxsat1.smxs.net
-       with XWall v3.18f ;
-       Fri, 25 Jan 2002 22:52:54 +0100
-Received: from m0103.s-mxs.net [10.3.55.3]
-       by m01x1.s-mxs.net
-       with XWall v3.18a ;
-       Fri, 25 Jan 2002 22:51:25 +0100
-Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966);
-  Fri, 25 Jan 2002 22:51:25 +0100
-X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
-content-class: urn:content-classes:message
-MIME-Version: 1.0
-Content-Type: text/plain;
-       charset="iso-8859-1"
-Subject: Re: [HACKERS] Savepoints
-Date: Fri, 25 Jan 2002 22:51:24 +0100
-Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C5@m0114.s-mxs.net>
-Thread-Topic: [HACKERS] Savepoints
-Thread-Index: AcGlznYKFcqoYpMnSlGQHhQuEf6LuAAGpxnQ
-From: "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>
-To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
-cc: <pgsql-hackers@postgresql.org>
-X-OriginalArrivalTime: 25 Jan 2002 21:51:25.0008 (UTC) FILETIME=[6F39E500:01C1A5EA]
-Content-Transfer-Encoding: 8bit
-X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLrP196418
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-
-> > > How about: use overwriting smgr + put old records into rollback
-> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-> > > segments - RS - (you have to keep them somewhere till TX's running
-> > > anyway) + use WAL only as REDO log (RS will be used to 
-> rollback TX'
-> > > changes and WAL will be used for RS/data files recovery).
-> > > Something like what Oracle does.
-> > 
-> > We have all the info we need in WAL and in the old rows,
-> > why would you want to write them to RS ?
-> > You only need RS for overwriting smgr.
-> 
-> This is what I'm saying - implement Overwriting smgr...
-
-Yes I am sorry, I am catching up on email and had not read Bruce's 
-comment (nor yours correctly) :-(
-
-I was also long in the pro overwriting camp, because I am used to 
-non MVCC dbs like DB/2 and Informix. (which I like very much) 
-But I am starting to doubt that overwriting is really so good for
-an MVCC db. And I don't think PG wants to switch to non MVCC :-)
-
-Imho it would only need a much more aggressive VACUUM backend.
-(aka garbage collector :-) Maybe It could be designed to sniff the 
-redo log (buffer) to get a hint at what to actually clean out next.
-
-Andreas
-
----------------------------(end of broadcast)---------------------------
-TIP 4: Don't 'kill -9' the postmaster
-
-From pgsql-hackers-owner+M18218=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 19:14:24 2002
-Return-path: <pgsql-hackers-owner+M18218=candle.pha.pa.us=pgman@postgresql.org>
-Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
-       by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0Q0ENe03543
-       for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 19:14:23 -0500 (EST)
-Received: (qmail 22482 invoked by alias); 26 Jan 2002 00:13:55 -0000
-Received: from unknown (HELO postgresql.org) (64.49.215.8)
-  by www.postgresql.org with SMTP; 26 Jan 2002 00:13:55 -0000
-Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
-       by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PNw1l20714
-       for <pgsql-hackers@postgresql.org>; Fri, 25 Jan 2002 18:58:01 -0500 (EST)
-       (envelope-from pgman@candle.pha.pa.us)
-Received: (from pgman@localhost)
-       by candle.pha.pa.us (8.11.6/8.10.1) id g0PNvoL02515;
-       Fri, 25 Jan 2002 18:57:50 -0500 (EST)
-From: Bruce Momjian <pgman@candle.pha.pa.us>
-Message-ID: <200201252357.g0PNvoL02515@candle.pha.pa.us>
-Subject: Re: [HACKERS] Savepoints
-In-Reply-To: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net>
-To: Zeugswetter Andreas SB SD <ZeugswetterA@spardat.at>
-Date: Fri, 25 Jan 2002 18:57:50 -0500 (EST)
-cc: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>,
-   PostgreSQL-development <pgsql-hackers@postgresql.org>
-X-Mailer: ELM [version 2.4ME+ PL96 (25)]
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-Content-Type: text/plain; charset=US-ASCII
-Precedence: bulk
-Sender: pgsql-hackers-owner@postgresql.org
-Status: OR
-
-Zeugswetter Andreas SB SD wrote:
-> 
-> > Now, with MVCC, the backend has to read through the redo segment to get
-> 
-> You mean rollback segment, but ...
-
-
-Sorry, yes.  I get redo/undo/rollback mixed up sometimes. :-)
-
-> > the original data value for that row.
-> 
-> Will only need to be looked up if the row is currently beeing modified by 
-> a not yet comitted txn (at least in the default read committed mode)  
-
-Uh, not really.  The transaction may have completed after my transaction
-started, meaning even though it looks like it is committed, to me, it is
-not visible.  Most MVCC visibility will require undo lookup.
-
-> 
-> > 
-> > Now, while rollback segments do help with cleaning out old UPDATE rows,
-> > how does it improve DELETE performance?  Seems it would just mark it as
-> > expired like we do now.
-> 
-> delete would probably be: 
-> 1. mark original deleted and write whole row to RS
-> 
-> I don't think you would like to mix looking up deleted rows in heap
-> but updated rows in RS
-
-Yes, so really the overwriting is only a big win for UPDATE.  Right now,
-UPDATE is DELETE/INSERT, and that DELETE makes MVCC happy. :-)
-
-My whole goal was to simplify this so we can see the differences.
-
-
-> PS: not that I like overwrite with MVCC now
-> If you think of VACUUM as garbage collection PG is highly trendy with
-> the non-overwriting smgr.
-
-Yes, that is basically what it is now, a garbage collector that collects
-in heap rather than in undo.
-
--- 
-  Bruce Momjian                        |  http://candle.pha.pa.us
-  pgman@candle.pha.pa.us               |  (610) 853-3000
-  +  If your life is a hard drive,     |  830 Blythe Avenue
-  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-
----------------------------(end of broadcast)---------------------------
-TIP 3: if posting/reading through Usenet, please send an appropriate
-subscribe-nomail command to majordomo@postgresql.org so that your
-message can get through to the mailing list cleanly
-
-From pgman Wed Jan 23 10:36:13 2002
-Subject: Savepoints
-To: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
-Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST)
-X-Mailer: ELM [version 2.4ME+ PL96 (25)]
-MIME-Version: 1.0
-Content-Transfer-Encoding: 7bit
-Content-Type: text/plain; charset=US-ASCII
-Content-Length:  1829
-Status: OR
-
-I have talked in the past about a possible implementation of
-savepoints/nested transactions.  I would like to more formally outline
-my ideas below.
-
-We have talked about using WAL for such a purpose, but that requires WAL
-files to remain for the life of a transaction, which seems unacceptable.
-Other database systems do that, and it is a pain for administrators.  I
-realized we could do some sort of WAL compaction, but that seems quite
-complex too.
-
-Basically, under my plan, WAL would be unchanged.  WAL's function is
-crash recovery, and it would retain that.  There would also be no
-on-disk changes.  I would use the command counter in certain cases to
-identify savepoints.
-
-My idea is to keep savepoint undo information in a private area per
-backend, either in memory or on disk.  We can either save the
-relid/tids of modified rows, or if there are too many, discard the
-saved ones and just remember the modified relids.  On rollback to save
-point, either clear up the modified relid/tids, or sequential scan
-through the relid and clear up all the tuples that have our transaction
-id and have command counters that are part of the undo savepoint.
-
-It seems marking undo savepoint rows with a fixed aborted transaction id
-would be the easiest solution.
-
-Of course, we only remember modified rows when we are in savepoints, and
-only undo them when we rollback to a savepoint.  Transaction processing
-remains the same.
-
-There is no reason for other backend to be able to see savepoint undo
-information, and keeping it private greatly simplifies the
-implementation.
-
--- 
-  Bruce Momjian                        |  http://candle.pha.pa.us
-  pgman@candle.pha.pa.us               |  (610) 853-3000
-  +  If your life is a hard drive,     |  830 Blythe Avenue
-  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
-
author	Bruce Momjian <bruce@momjian.us>
	Tue, 17 Aug 2004 17:11:44 +0000 (17:11 +0000)
committer	Bruce Momjian <bruce@momjian.us>
	Tue, 17 Aug 2004 17:11:44 +0000 (17:11 +0000)