From 6c508b597893a09af4d287284df281b3b1a11b65 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 14 Jul 2020 02:30:44 -0400
Subject: [PATCH v6 5/5] Documentation - WIP

TODOs:

1] TBC: which section should document the pg_is_in_readonly() function; right now it is under "Recovery Information Functions"
2] Documentation regarding ALTER SYSTEM READ/WRITE
---
 src/backend/access/transam/README | 60 ++++++++++++++++++++++++++++---
 src/backend/storage/page/README   | 12 +++----
 2 files changed, 61 insertions(+), 11 deletions(-)
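
Reviewer note (ignored by git am): the ordering that the new README section
asks for can be sketched as below. This is only a minimal standalone model of
the rule, not the implementation in this series. Every demo_* name and all
three state variables are hypothetical stand-ins, and the real
XLogBeginInsert() would fail an assertion where this model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for server state; not PostgreSQL's real variables. */
static bool wal_prohibited = false;         /* models ALTER SYSTEM READ ONLY */
static int  crit_section_depth = 0;         /* > 0 inside a critical section */
static bool wal_permission_checked = false; /* the "permission checked" flag */

/* Models CheckWALPermitted(): fail while an ordinary error is still safe. */
static bool
demo_check_wal_permitted(void)
{
    if (wal_prohibited)
        return false;           /* the real routine would report an ERROR */
    wal_permission_checked = true;
    return true;
}

static void
demo_start_crit_section(void)
{
    crit_section_depth++;
}

static void
demo_end_crit_section(void)
{
    assert(crit_section_depth > 0);
    crit_section_depth--;
    wal_permission_checked = false;     /* flag resets on exit */
}

/*
 * Models the check in XLogBeginInsert(): inside a critical section the
 * permission flag must already be set.  Returns false where the README
 * says an assertion would fail.
 */
static bool
demo_xlog_begin_insert(void)
{
    if (crit_section_depth > 0 && !wal_permission_checked)
        return false;
    return true;
}

/* The ordering the README asks for: check first, then enter the section. */
static bool
demo_log_change(void)
{
    bool ok;

    if (!demo_check_wal_permitted())
        return false;           /* error surfaces before the critical section */

    demo_start_crit_section();
    ok = demo_xlog_begin_insert();
    demo_end_crit_section();
    return ok;
}
```

With wal_prohibited unset, demo_log_change() succeeds and leaves the flag
cleared again. With it set, the call fails before the critical section is
entered, which is the point of the rule.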

diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 1edc8180c12..fe7a84f93da 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -442,8 +442,8 @@ to be modified.
 2. START_CRIT_SECTION()  (Any error during the next three steps must cause a
 PANIC because the shared buffers will contain unlogged changes, which we
 have to ensure don't get to disk.  Obviously, you should check conditions
-such as whether there's enough free space on the page before you start the
-critical section.)
+such as whether there's WAL write permission and enough free space on the page
+before you start the critical section.)

 3. Apply the required changes to the shared buffer(s).

@@ -486,6 +486,54 @@ with the incomplete-split flag set, it will finish the interrupted split by
 inserting the key to the parent, before proceeding.

+Read only system state
+----------------------
+
+The system is in the read only state when it is not currently possible to
+insert write-ahead log records, either because it is still in recovery or
+because it was forced into the WAL prohibit state by executing ALTER SYSTEM
+READ ONLY.  We have a lower-level defense in XLogBeginInsert() and elsewhere to
+stop us from modifying data during recovery when !XLogInsertAllowed(), but if
+XLogBeginInsert() is called inside a critical section we must not depend on it
+to report an error, since that error would become a PANIC as mentioned above.
+
+During recovery we never reach the point where we try to write WAL, but ALTER
+SYSTEM READ ONLY can be executed by the user at any time to stop WAL writing.
+Any backend that receives the read-only system state transition barrier
+interrupt needs to stop writing WAL immediately.  While absorbing the barrier,
+a backend kills its running transaction if that transaction has a valid XID,
+which indicates that it has written or is planning to write WAL.  A transaction
+that has not acquired a valid XID yet, or an operation such as VACUUM or CREATE
+INDEX CONCURRENTLY that does not necessarily have a valid XID when writing WAL,
+is not stopped by barrier processing, and might hit the error from
+XLogBeginInsert() while trying to write WAL in the read only system state.  To
+prevent that error from being raised inside a critical section, WAL write
+permission has to be checked before START_CRIT_SECTION().
+
+To enforce the practice of checking WAL permission before entering a critical
+section that writes WAL, we have added an assert-checked flag that indicates
+whether permission has been checked before XLogBeginInsert() is called.  If it
+has not, XLogBeginInsert() fails an assertion.  The permission check is not
+mandatory when XLogBeginInsert() is called outside a critical section, where
+throwing an error is acceptable.  To set the flag, call one of
+CheckWALPermitted(), AssertWALPermitted_HaveXID(), or AssertWALPermitted()
+before START_CRIT_SECTION().  The flag is reset automatically on exit from the
+critical section.  The rules for choosing among the permission check routines
+are:
+
+	Places where a WAL write in a critical section can be expected without
+	a valid XID (e.g. vacuum) need to be protected by CheckWALPermitted(), so
+	that the error can be reported before entering the critical section.
+
+	Places where INSERT and UPDATE are expected, which never happen without
+	a valid XID, can be checked using AssertWALPermitted_HaveXID(), so that
+	non-assert builds do not pay the checking overhead.
+
+	Places that we know cannot be reached in the read only state, and that
+	may or may not have an XID, but that need to ensure the permission has
+	been checked on assert-enabled builds, should use AssertWALPermitted().
+
+
 Constructing a WAL record
 -------------------------

@@ -531,7 +579,8 @@ Details of the API functions:

 void XLogBeginInsert(void)

-    Must be called before XLogRegisterBuffer and XLogRegisterData.
+	Must be called before XLogRegisterBuffer and XLogRegisterData.  WAL
+	permission must be checked before calling it in a critical section.

 void XLogResetInsertion(void)

@@ -638,8 +687,9 @@ MarkBufferDirtyHint() to mark the block dirty.
 If the buffer is clean and checksums are in use then MarkBufferDirtyHint()
 inserts an XLOG_FPI_FOR_HINT record to ensure that we take a full page image
 that includes the hint. We do this to avoid a partial page write, when we
-write the dirtied page. WAL is not written during recovery, so we simply skip
-dirtying blocks because of hints when in recovery.
+write the dirtied page. WAL is not written while the system is read only (i.e.
+during recovery or in the WAL prohibit state), so we simply skip dirtying blocks
+because of hints when in recovery.

 If you do decide to optimise away a WAL record, then any calls to
 MarkBufferDirty() must be replaced by MarkBufferDirtyHint(),
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59ad..15f0bb4b7b5 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -56,9 +56,9 @@ WAL is a fatal error and prevents further recovery, whereas a checksum failure
 on a normal data block is a hard error but not a critical one for the server,
 even if it is a very bad thing for the user.

-New WAL records cannot be written during recovery, so hint bits set during
-recovery must not dirty the page if the buffer is not already dirty, when
-checksums are enabled.  Systems in Hot-Standby mode may benefit from hint bits
-being set, but with checksums enabled, a page cannot be dirtied after setting a
-hint bit (due to the torn page risk). So, it must wait for full-page images
-containing the hint bit updates to arrive from the primary.
+New WAL records cannot be written during recovery or while in the WAL prohibit
+state, so hint bits set during recovery must not dirty the page if the buffer is
+not already dirty, when checksums are enabled.  Systems in Hot-Standby mode may
+benefit from hint bits being set, but with checksums enabled, a page cannot be
+dirtied after setting a hint bit (due to the torn page risk). So, it must wait
+for full-page images containing the hint bit updates to arrive from the primary.
-- 
2.22.0