From 33b8057d8789418c02b6f3afd954a17c5c340986 Mon Sep 17 00:00:00 2001
From: Amul Sul
Date: Tue, 2 Jun 2020 00:45:20 -0400
Subject: [PATCH v1 6/6] Documentation - WIP

TODOs:

1] TBC, the section for pg_is_in_readonly() function, right now it is under
   "Recovery Information Functions"
2] Documentation regarding ALTER SYSTEM READ/WRITE
---
 src/backend/access/transam/README | 59 ++++++++++++++++++++++++++++---
 src/backend/storage/page/README   | 13 +++----
 2 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index eb9aac5fd39..f62929f1660 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -433,8 +433,8 @@ to be modified.
 2. START_CRIT_SECTION() (Any error during the next three steps must cause a
 PANIC because the shared buffers will contain unlogged changes, which we
 have to ensure don't get to disk. Obviously, you should check conditions
-such as whether there's enough free space on the page before you start the
-critical section.)
+such as whether WAL writes are permitted and whether there's enough free space
+on the page before you start the critical section.)
 
 3. Apply the required changes to the shared buffer(s).
 
@@ -477,6 +477,54 @@ with the incomplete-split flag set, it will finish the interrupted split by
 inserting the key to the parent, before proceeding.
 
 
+Read only system state
+----------------------
+
+The system is in a read only state when it is not currently possible to insert
+write-ahead log records, either because the system is still in recovery or
+because it was forced into the WAL prohibited state by ALTER SYSTEM READ ONLY.
+We have a lower-level defense in XLogBeginInsert() and elsewhere to stop us
+from modifying data during recovery when !XLogInsertAllowed(), but if
+XLogBeginInsert() is inside a critical section we must not depend on it to
+report an error, because that would cause a PANIC, as mentioned previously.
+
+We never reach the point of trying to write WAL during recovery, but ALTER
+SYSTEM READ ONLY can be executed by the user at any time to stop WAL writing.
+Any backend that receives the read-only system state transition barrier
+interrupt needs to stop writing WAL immediately. To absorb the barrier, a
+backend kills its running transaction if that transaction has a valid XID,
+since a valid XID indicates that the transaction has performed, or is planning,
+a WAL write. A transaction that has not yet acquired a valid XID, or an
+operation such as VACUUM or CREATE INDEX CONCURRENTLY that does not necessarily
+have a valid XID when writing WAL, is not stopped during barrier processing and
+might hit the error from XLogBeginInsert() when it tries to write WAL in the
+read only system state. To keep that error from being raised inside a critical
+section, WAL write permission has to be checked before START_CRIT_SECTION().
+
+To enforce the practice of checking WAL permission before entering a critical
+section that writes WAL, we have added a flag, checked by an assertion, that
+records whether permission was checked before XLogBeginInsert() is called. If
+it was not, XLogBeginInsert() fails an assertion. The permission check is not
+mandatory when XLogBeginInsert() is called outside a critical section, where
+throwing an error is acceptable. To set the flag, call one of
+CheckWALPermitted(), AssertWALPermitted_HaveXID(), or AssertWALPermitted()
+before START_CRIT_SECTION(). The flag is reset automatically on exit from the
+critical section. The rules for choosing among the permission check routines
+are:
+
+    Places where a WAL write in a critical section can be expected without a
+    valid XID (e.g. vacuum) need to be protected by CheckWALPermitted(), so
+    that the error can be reported before the critical section is entered.
+
+    Places where only INSERT and UPDATE are expected, which never happen
+    without a valid XID, can be checked using AssertWALPermitted_HaveXID(), so
+    that non-assert builds do not pay the checking overhead.
+
+    Places that we know cannot be reached in the read-only state, and that may
+    or may not have an XID, should use AssertWALPermitted() to ensure on
+    assert-enabled builds that the permission has been checked.
+
+
 Constructing a WAL record
 -------------------------
 
@@ -522,7 +570,8 @@ Details of the API functions:
 
 void XLogBeginInsert(void)
 
-    Must be called before XLogRegisterBuffer and XLogRegisterData.
+    Must be called before XLogRegisterBuffer and XLogRegisterData. WAL
+    permission must be checked before calling it in a critical section.
 
 void XLogResetInsertion(void)
 
@@ -630,8 +679,8 @@ If the buffer is clean and checksums are in use then
 MarkBufferDirtyHint() inserts an XLOG_FPI record to ensure that we
 take a full page image that includes the hint. We do this to avoid
 a partial page write, when we write the dirtied page. WAL is not
-written during recovery, so we simply skip dirtying blocks because
-of hints when in recovery.
+written while the system is read only (i.e. during recovery or in the WAL
+prohibited state), so we simply skip dirtying blocks because of hints then.
 
 If you do decide to optimise away a WAL record, then any calls to
 MarkBufferDirty() must be replaced by MarkBufferDirtyHint(),
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 4e45bd92abc..e5a32e53649 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -56,9 +56,10 @@ WAL is a fatal error and prevents further recovery, whereas a checksum failure
 on a normal data block is a hard error but not a critical one for the server,
 even if it is a very bad thing for the user.
 
-New WAL records cannot be written during recovery, so hint bits set during
-recovery must not dirty the page if the buffer is not already dirty, when
-checksums are enabled. Systems in Hot-Standby mode may benefit from hint bits
-being set, but with checksums enabled, a page cannot be dirtied after setting a
-hint bit (due to the torn page risk). So, it must wait for full-page images
-containing the hint bit updates to arrive from the master.
+New WAL records cannot be written during recovery or while in the WAL
+prohibited state, so hint bits set while the system is read only must not dirty
+the page if the buffer is not already dirty, when checksums are enabled.
+Systems in Hot-Standby mode may benefit from hint bits being set, but with
+checksums enabled, a page cannot be dirtied after setting a hint bit (due to
+the torn page risk). So, it must wait for full-page images containing the hint
+bit updates to arrive from the master.
-- 
2.18.0
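
The calling convention described in the "Read only system state" section can be modeled in a small, self-contained C sketch. Everything below is a hypothetical stand-in written for illustration: the `model_` functions, the `WalResult` enum, and the three static variables are assumptions that only mimic the roles of CheckWALPermitted(), START_CRIT_SECTION(), and XLogBeginInsert() as the README text describes them. None of it is PostgreSQL's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not PostgreSQL's real variables. */
static bool wal_prohibited = false;         /* as if ALTER SYSTEM READ ONLY ran */
static int  crit_section_depth = 0;         /* > 0 while in a critical section */
static bool wal_permission_checked = false; /* the flag the README describes */

typedef enum
{
    WAL_OK,            /* record could be inserted */
    WAL_ERROR,         /* ordinary error, raised outside a critical section */
    WAL_PANIC,         /* error raised inside a critical section */
    WAL_ASSERT_FAILED  /* permission was never checked (assert builds) */
} WalResult;

/* Stand-in for CheckWALPermitted(): fail early, or record that we checked. */
static bool model_check_wal_permitted(void)
{
    if (wal_prohibited)
        return false;               /* real code would report an ERROR here */
    wal_permission_checked = true;
    return true;
}

static void model_start_crit_section(void)
{
    crit_section_depth++;
}

/* The flag is reset automatically when the critical section is left. */
static void model_end_crit_section(void)
{
    if (--crit_section_depth == 0)
        wal_permission_checked = false;
}

/* Stand-in for XLogBeginInsert() with the permission-checked assertion. */
static WalResult model_xlog_begin_insert(void)
{
    if (crit_section_depth > 0 && !wal_permission_checked)
        return WAL_ASSERT_FAILED;
    if (wal_prohibited)
        return crit_section_depth > 0 ? WAL_PANIC : WAL_ERROR;
    return WAL_OK;
}

/* Recommended order: check permission, then enter the critical section. */
static WalResult modify_page_checked(bool read_only)
{
    WalResult result;

    wal_prohibited = read_only;
    if (!model_check_wal_permitted())
        return WAL_ERROR;           /* reported before the critical section */
    model_start_crit_section();
    result = model_xlog_begin_insert();
    model_end_crit_section();
    return result;
}

/* Wrong order: no permission check before the critical section. */
static WalResult modify_page_unchecked(bool read_only)
{
    WalResult result;

    wal_prohibited = read_only;
    model_start_crit_section();
    result = model_xlog_begin_insert();
    model_end_crit_section();
    return result;
}
```

In this model, `modify_page_checked(true)` yields an ordinary error instead of a PANIC, while `modify_page_unchecked()` trips the assertion-style check even when WAL is permitted, which is the behavior the flag is meant to enforce.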