From 33b8057d8789418c02b6f3afd954a17c5c340986 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jun 2020 00:45:20 -0400
Subject: [PATCH v1 6/6] Documentation - WIP

TODOs:

1] TBC: which section should document the pg_is_in_readonly() function; right now it is under "Recovery Information Functions"
2] Documentation regarding ALTER SYSTEM READ/WRITE
---
 src/backend/access/transam/README | 59 ++++++++++++++++++++++++++++---
 src/backend/storage/page/README   | 13 +++----
 2 files changed, 61 insertions(+), 11 deletions(-)
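
Reviewer note (ignored by git am, not part of the commit): the ordering rule
described in the README text of this patch can be illustrated with a toy
model.  This is only a sketch under stated assumptions, not PostgreSQL code.
Every toy_* name is hypothetical and merely stands in for the routine named
in its comment, and resetting the flag on the outermost exit is an assumption
made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for server state; hypothetical, not PostgreSQL code. */
static bool toy_wal_prohibited = false;     /* as after ALTER SYSTEM READ ONLY */
static bool toy_permission_checked = false; /* the permission check flag */
static int  toy_crit_section_depth = 0;

/* Stands in for CheckWALPermitted(); false models the reported error. */
static bool
toy_check_wal_permitted(void)
{
    if (toy_wal_prohibited)
        return false;           /* refused outside any critical section */
    toy_permission_checked = true;
    return true;
}

/* Stands in for START_CRIT_SECTION(). */
static void
toy_start_crit_section(void)
{
    toy_crit_section_depth++;
}

/* Stands in for END_CRIT_SECTION(); the flag resets on the outermost exit. */
static void
toy_end_crit_section(void)
{
    assert(toy_crit_section_depth > 0);
    toy_crit_section_depth--;
    if (toy_crit_section_depth == 0)
        toy_permission_checked = false;
}

/* Stands in for XLogBeginInsert(): inside a critical section the check must
 * already have happened, because an error there would become a PANIC. */
static void
toy_xlog_begin_insert(void)
{
    if (toy_crit_section_depth > 0)
        assert(toy_permission_checked);
}

/* A logged change: check permission first, then enter the critical section.
 * Returns false if the change was refused before the critical section. */
static bool
toy_logged_change(void)
{
    if (!toy_check_wal_permitted())
        return false;
    toy_start_crit_section();
    toy_xlog_begin_insert();    /* record construction would follow here */
    toy_end_crit_section();
    return true;
}
```

In this model a refused change never enters the critical section, so the
failure stays an ordinary error rather than a PANIC.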

diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index eb9aac5fd39..f62929f1660 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -433,8 +433,8 @@ to be modified.
 2. START_CRIT_SECTION()  (Any error during the next three steps must cause a
 PANIC because the shared buffers will contain unlogged changes, which we
 have to ensure don't get to disk.  Obviously, you should check conditions
-such as whether there's enough free space on the page before you start the
-critical section.)
+such as whether WAL writes are permitted and whether there's enough free space
+on the page before you start the critical section.)
 
 3. Apply the required changes to the shared buffer(s).
 
@@ -477,6 +477,54 @@ with the incomplete-split flag set, it will finish the interrupted split by
 inserting the key to the parent, before proceeding.
 
 
+Read-only system state
+----------------------
+
+The system is in a read-only state when it is not currently possible to insert
+write-ahead log records, either because the system is still in recovery or
+because WAL has been prohibited by executing ALTER SYSTEM READ ONLY.  We have a
+lower-level defense in XLogBeginInsert() and elsewhere to stop us from modifying
+data during recovery when !XLogInsertAllowed(), but if XLogBeginInsert() is
+called inside a critical section we must not depend on it to report the error,
+because any error raised there is promoted to PANIC, as mentioned previously.
+
+We do not reach the point where we try to write WAL during recovery, but ALTER
+SYSTEM READ ONLY can be executed by the user at any time to stop WAL writing.
+Any backend that receives the read-only system state transition barrier
+interrupt needs to stop writing WAL immediately.  While absorbing the barrier, a
+backend kills its running transaction if that transaction has a valid XID, since
+a valid XID indicates that the transaction has performed or is planning a WAL
+write.  A transaction that has not yet acquired a valid XID, or an operation
+such as VACUUM or CREATE INDEX CONCURRENTLY that does not necessarily have a
+valid XID when writing WAL, is not stopped during barrier processing and might
+hit the error from XLogBeginInsert() when it tries to write WAL in the read-only
+system state.  To keep that error from being raised inside a critical section,
+WAL write permission has to be checked before START_CRIT_SECTION().
+
+To enforce the practice of checking WAL permission before entering a critical
+section that writes WAL, we have added a flag, checked by an assertion, that
+indicates permission has been checked before calling XLogBeginInsert().  If it
+has not, XLogBeginInsert() fails the assertion.  The WAL permission check is not
+mandatory when XLogBeginInsert() is called outside a critical section, where
+throwing an error is acceptable.  To set the permission check flag, one of
+CheckWALPermitted(), AssertWALPermitted_HaveXID(), or AssertWALPermitted()
+should be called before START_CRIT_SECTION().  The flag is reset automatically
+on exit from the critical section.  The rules for choosing among the permission
+check routines are:
+
+	Places where WAL can be written in a critical section without a valid XID
+	(e.g. vacuum) need to be protected by CheckWALPermitted(), so that the
+	error is reported before entering the critical section.
+
+	Places that perform INSERT and UPDATE, which never happen without a valid
+	XID, can be checked using AssertWALPermitted_HaveXID(), so that non-assert
+	builds do not pay the checking overhead.
+
+	Places that we know cannot be reached in the read-only state, and that may
+	or may not have an XID, but where we need to ensure the permission has been
+	checked on assert-enabled builds, should use AssertWALPermitted().
+
+
 Constructing a WAL record
 -------------------------
 
@@ -522,7 +570,8 @@ Details of the API functions:
 
 void XLogBeginInsert(void)
 
-    Must be called before XLogRegisterBuffer and XLogRegisterData.
+	Must be called before XLogRegisterBuffer and XLogRegisterData.  WAL
+	permission must be checked before calling it in a critical section.
 
 void XLogResetInsertion(void)
 
@@ -630,8 +679,8 @@ If the buffer is clean and checksums are in use then
 MarkBufferDirtyHint() inserts an XLOG_FPI record to ensure that we
 take a full page image that includes the hint. We do this to avoid
 a partial page write, when we write the dirtied page. WAL is not
-written during recovery, so we simply skip dirtying blocks because
-of hints when in recovery.
+written while the system is read-only (i.e. during recovery or in WAL prohibit
+state), so we simply skip dirtying blocks because of hints when read-only.
 
 If you do decide to optimise away a WAL record, then any calls to
 MarkBufferDirty() must be replaced by MarkBufferDirtyHint(),
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 4e45bd92abc..e5a32e53649 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -56,9 +56,10 @@ WAL is a fatal error and prevents further recovery, whereas a checksum failure
 on a normal data block is a hard error but not a critical one for the server,
 even if it is a very bad thing for the user.
 
-New WAL records cannot be written during recovery, so hint bits set during
-recovery must not dirty the page if the buffer is not already dirty, when
-checksums are enabled.  Systems in Hot-Standby mode may benefit from hint bits
-being set, but with checksums enabled, a page cannot be dirtied after setting a
-hint bit (due to the torn page risk). So, it must wait for full-page images
-containing the hint bit updates to arrive from the master.
+New WAL records cannot be written during recovery or while in WAL prohibit
+state, so hint bits set during read-only system state must not dirty the page if
+the buffer is not already dirty, when checksums are enabled.  Systems in
+Hot-Standby mode may benefit from hint bits being set, but with checksums
+enabled, a page cannot be dirtied after setting a hint bit (due to the torn page
+risk). So, it must wait for full-page images containing the hint bit updates to
+arrive from the master.
-- 
2.18.0