Quick Links

Re: Backend crash on non-exclusive backup cancel

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	David Steele <david(at)pgmasters(dot)net>
Cc:	PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: Backend crash on non-exclusive backup cancel
Date:	2017-02-28 03:05:11
Message-ID:	CAB7nPqRCyoJTbgaLs9sH5Ehf0thHZp-KY8hJY6mCkaXAMW0i6w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs pgsql-hackers

On Tue, Feb 28, 2017 at 10:33 AM, David Steele <david(at)pgmasters(dot)net> wrote:
> This condition should throw "backup is not in progress" just as a
> exclusive backup would, whether assertions are enabled or not.
>
> I believe the solution is to move the exclusive flag to xlog.c and only
> decrement XLogCtl->Insert.nonExclusiveBackups when exclusive is true,
> otherwise return an error. Even then, it wouldn't be clear if the
> backup had completed or not.

I understand by this sentence that you mean
nonexclusive_backup_running, and I think that I agree with that. We
need a way to reset it properly the session-level switch in case of an
interrupt found, and that needs visibly to happen in
pg_stop_backup_callback and pg_start_backup_callback(). At the same
time I think that exclusive_backup_running had better be moved to
xlog.c as well. If I look at the failure in details when issuing a
cancel, I can see that XLogCtl->Insert.nonExclusiveBackups gets
decremented at the end do_pg_stop_backup, but
nonexclusive_backup_running never gets set back to false because of
the query cancellation.

> I suppose any cancelled non-exclusive
> pg_stop_backup() should be considered aborted whether a stop backup
> record was written or not?

That's not necessarily true, I can see a stop backup able to finish as
well by issuing a cancel request. It seems to me that we just need to
have the shmem information updated at the same time as the
session-level switches for consistency and we're good. The
inconsistency in places when updating the session-level flags and the
shmem-level flags is what is causing harm.

> If that makes sense I'm happy to work up a patch. This is definitely an
> edge case and I seriously doubt it is causing any issues in the field.

Well, at least it is nothing caused directly by 974ece5, I am able to
crash the server with or without that...

There is actually some code that I know of that can issue a cancel
request on pg_stop_backend(), this does not have assertions of course
but it may become an issue. That's pretty close to the improvements I
did recently for lock handling of exclusive backups, so there's
nothing really new for me :)

David, are you willing to write a patch or should I? Changing
nonexclusive_backup_running/exclusive_backup_running to be an enum
would make the code more readable as well IMO.
--
Michael

In response to

Backend crash on non-exclusive backup cancel at 2017-02-28 01:33:34 from David Steele

Responses

Re: Backend crash on non-exclusive backup cancel at 2017-02-28 13:21:18 from David Steele

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Magnus Hagander	2017-02-28 11:15:30	Re: BUG #14543: libpq fails with group readable ssl keys
Previous Message	David Steele	2017-02-28 01:33:34	Backend crash on non-exclusive backup cancel

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Stephen Frost	2017-02-28 03:10:52	Re: logical replication access control patches
Previous Message	Bruce Momjian	2017-02-28 03:01:15	Re: rename pg_log directory?