From e7a1f9e599ec56a3b142b943b2351b10b5037585 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 19 Jun 2018 16:03:51 +0900
Subject: [PATCH] Add note in pg_rewind documentation about read-only files

When performing pg_rewind, the presence of a read-only file which is
not accessible for writes will cause a failure while processing.  This
can cause the control file of the target data folder to be truncated,
making it unusable for a subsequent run.

We have discussed on the thread a couple of ways to deal with the
problem:
1) Consider EACCES failures on relation files as critical, but not on
non-relation files, a category which covers custom configuration files
as well as FSM or VM files.  Being able to tell the difference between
custom files and the ones critical for the system requires additional
maintenance each time new filtering rules are added, which is costly in
the long term.
2) Order the fetched block ranges and delay the truncation of a file
until its first range chunk is received.  That's actually not reliable
either, as some of the relation files may have already been manipulated
when the failure happens.

If by chance one is able to start and stop the target server after a
failed rewind, there is a good chance that the instance is already
broken.

There are some solutions which could be considered:
1) Add pre-checks making sure that the set of files which are going to
be processed for copy can be sanely written to, and complain before
writing any data to the target's data folder.  This has the
disadvantage that a user would still need to rebuild the links used for
what was previously a set of read-only files, and also that read-only
files would be fetched from the source and saved as raw files locally,
which could be security-sensitive.
2) Prevent the data of read-only files from being fetched from the
source, which would save some bandwidth, but this requires the
backend's pg_stat_file() to be extended with an entry's st_mode.  This
can only happen on HEAD.
3) Allow callers to specify custom exclusion rules; however, this could
be a foot-gun if critical files get accidentally filtered out.

1) is the one causing the least code churn, still it is not clear to me
whether any of those are worth considering, as read-only files would
most likely need to be rebuilt after a rewind.  Hence the simplest
solution is to just document the behavior and tell users not to do
that.  We could always consider new solutions in future releases to
ease the handling of such files, but that may not be worth it.  2)
would be rather interesting, but that's a lot of infrastructure to
justify.  Also, when pg_rewind fails mid-flight, there is likely no way
to recover the target data folder anyway, in which case a new base
backup is the best option.

A note is added in the documentation as well.

Discussion: https://postgr.es/m/20180104200633.17004.16377%40wrigleys.postgresql.org
---
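As an illustration of the pre-check idea of solution 1 above, the same
kind of verification can be scripted on the client side before
attempting the rewind at all.  This is only a rough sketch, assuming
that $PGDATA points at the target's data folder and that GNU find is
available for -writable, which tests write access the same way the copy
phase would trip on it:

    # List files in the target data folder that a rewind could not
    # write to.
    find "$PGDATA" -type f ! -writable

Any file reported here, typically a read-only SSL key or certificate
shared with the source, is a candidate for removal before the rewind
runs.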
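To make the recommendation of the documentation note below more
concrete, here is a sketch of the workflow around a single read-only
file.  The file name server.key, the paths and the connection string
are hypothetical; --target-pgdata and --source-server are actual
pg_rewind options:

    # Move the read-only link out of the way before the rewind.
    mv "$PGDATA/server.key" /tmp/server.key.saved
    pg_rewind --target-pgdata="$PGDATA" \
        --source-server="host=source.example.com user=postgres"
    # The rewind may have fetched a raw copy of the file from the
    # source: remove that copy, then restore the link used previously.
    rm -f "$PGDATA/server.key"
    mv /tmp/server.key.saved "$PGDATA/server.key"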
 doc/src/sgml/ref/pg_rewind.sgml | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 520d843f0e..ee35ce18b0 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -95,6 +95,26 @@ PostgreSQL documentation
    are currently on by default.  <xref linkend="guc-full-page-writes"/>
    must also be set to <literal>on</literal>, but is enabled by default.
   </para>
+
+
+  <para>
+   If <application>pg_rewind</application> fails while processing, then
+   the data folder of the target is likely not in a state that can be
+   recovered.  In such a case, taking a new fresh backup is recommended.
+  </para>
+
+  <para>
+   <application>pg_rewind</application> will fail immediately if it finds
+   files it cannot write directly to.  This can happen for example when
+   the source and the target server use the same file mapping for read-only
+   SSL keys and certificates.  If such files are present on the target server
+   it is recommended to remove them before running
+   <application>pg_rewind</application>.  After doing the rewind, some of
+   those files may have been copied from the source, in which case it may
+   be necessary to remove the data copied and restore back the set of links
+   used before the rewind.
+  </para>
+
  </refsect1>
 
  <refsect1>
-- 
2.17.1