Better internal documentation of locking for Hot Standby conflict resolution.

author Simon Riggs <simon@2ndQuadrant.com>

Thu, 21 Jan 2010 00:53:58 +0000 (00:53 +0000)

committer Simon Riggs <simon@2ndQuadrant.com>

Thu, 21 Jan 2010 00:53:58 +0000 (00:53 +0000)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 41b7941..1793783 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -37,7 +37,7 @@
   *
   *
   * IDENTIFICATION
- *       $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.57 2010/01/16 17:17:26 tgl Exp $
+ *       $PostgreSQL: pgsql/src/backend/storage/ipc/procarray.c,v 1.58 2010/01/21 00:53:58 sriggs Exp $
   *
   *-------------------------------------------------------------------------
   */
@@ -1623,17 +1623,57 @@ GetCurrentVirtualXIDs(TransactionId limitXmin, bool excludeXmin0,
  /*
   * GetConflictingVirtualXIDs -- returns an array of currently active VXIDs.
   *
- * The array is palloc'd and is terminated with an invalid VXID.
- *
   * Usage is limited to conflict resolution during recovery on standby servers.
   * limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId
   * in cases where we cannot accurately determine a value for latestRemovedXid.
- * If limitXmin is InvalidTransactionId then we know that the very
+ *
+ * If limitXmin is InvalidTransactionId then we are forced to assume that the
   * latest xid that might have caused a cleanup record will be
   * latestCompletedXid, so we set limitXmin to be latestCompletedXid instead.
   * We then skip any backends with xmin > limitXmin. This means that
   * cleanup records don't conflict with some recent snapshots.
   *
+ * The reason for using latestCompletedXid is that we aren't certain which
+ * of the xids in KnownAssignedXids belong to transactions that died with
+ * FATAL errors and did not write abort records. In almost every case they
+ * won't, but we don't know that for certain. So we need to conflict with all
+ * current snapshots whose xmin is less than latestCompletedXid to be safe.
+ * This causes false positives in our assessment of which vxids conflict.
+ *
+ * By using an exclusive lock we prevent new snapshots from being taken while
+ * we work out which snapshots to conflict with. This protects those new
+ * snapshots from also being included in our conflict list.
+ *
+ * After the lock is released, we allow snapshots again. It is possible
+ * that we arrive at a snapshot that is identical to one that we just
+ * decided we should conflict with. This is a case of a false positive, not
+ * an actual problem.
+ *
+ * There are two cases: (1) if we were correct in using latestCompletedXid
+ * then all xids in the snapshot lower than that belong to transactions that
+ * hit FATAL errors, so they are xids that never commit. We can make no
+ * visibility errors if we allow such xids into the snapshot. (2) if we erred
+ * on the side of caution and in fact the latestRemovedXid should have been
+ * earlier than latestCompletedXid then we conflicted with a snapshot
+ * needlessly. Taking another identical snapshot is OK, because the earlier
+ * conflicted snapshot was a false positive.
+ *
+ * In either case, a snapshot taken after conflict assessment will still be
+ * valid and non-conflicting even if an identical snapshot that existed
+ * before conflict assessment was assessed as conflicting.
+ *
+ * If we allowed concurrent snapshots while we were deciding who to
+ * conflict with we would need to include all concurrent snapshotters in
+ * the conflict list as well. We'd have difficulty in working out exactly
+ * who that was, so it is happier for all concerned if we take an exclusive
+ * lock. Notice that we only hold that lock for as long as it takes to
+ * make the conflict list, not for the whole duration of the conflict
+ * resolution.
+ *
+ * It also means that making users wait for a snapshot is a good thing,
+ * since they are more likely to live longer after having waited. So it is
+ * a benefit, not an oversight, that we use an exclusive lock here.
+ *
   * We replace InvalidTransactionId with latestCompletedXid here because
   * this is the most convenient place to do that, while we hold ProcArrayLock.
   * The originator of the cleanup record wanted to avoid checking the value of
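
The selection rule described in the comment above can be sketched as a small standalone C model. This is only an illustration, not the code in procarray.c: ToyXid, ToyBackend, toy_xid_follows and toy_get_conflicting_vxids are hypothetical names, the locking is reduced to a comment, and skipping backends that hold no snapshot is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and a backend's proc entry. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct ToyBackend
{
	int		vxid;	/* toy identifier standing in for a virtual xid */
	ToyXid	xmin;	/* snapshot xmin, TOY_INVALID_XID if no snapshot */
} ToyBackend;

/* Wraparound-aware "a is newer than b" test using modulo-2^32 arithmetic. */
static bool
toy_xid_follows(ToyXid a, ToyXid b)
{
	return (int32_t) (a - b) > 0;
}

/*
 * Fill out[] with the vxids of backends whose snapshots conflict and
 * return how many were found.  out[] must have room for n entries.
 */
size_t
toy_get_conflicting_vxids(const ToyBackend *backends, size_t n,
						  ToyXid limit_xmin, ToyXid latest_completed_xid,
						  int *out)
{
	size_t	count = 0;

	/*
	 * The real function would hold an exclusive lock around this whole
	 * block, so that no new snapshot can appear while the list is built.
	 */

	/* With no usable limit, fall back to the latest completed xid. */
	if (limit_xmin == TOY_INVALID_XID)
		limit_xmin = latest_completed_xid;

	for (size_t i = 0; i < n; i++)
	{
		ToyXid	xmin = backends[i].xmin;

		/* Assumption: a backend without a snapshot cannot conflict. */
		if (xmin == TOY_INVALID_XID)
			continue;

		/* Skip any backend with xmin > limit_xmin. */
		if (toy_xid_follows(xmin, limit_xmin))
			continue;

		out[count++] = backends[i].vxid;
	}

	/* The lock would be released here, before conflicts are resolved. */
	return count;
}
```

With backends holding xmin values 90, 100 and 101, a limit of 95 selects only the first one, while passing TOY_INVALID_XID with a latest completed xid of 100 selects the first two. That is the wider, false-positive-prone list the comment describes.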