From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | David Steele <david(at)pgmasters(dot)net> |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Backend crash on non-exclusive backup cancel |
Date: | 2017-02-28 03:05:11 |
Message-ID: | CAB7nPqRCyoJTbgaLs9sH5Ehf0thHZp-KY8hJY6mCkaXAMW0i6w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On Tue, Feb 28, 2017 at 10:33 AM, David Steele <david(at)pgmasters(dot)net> wrote:
> This condition should throw "backup is not in progress" just as a
> exclusive backup would, whether assertions are enabled or not.
>
> I believe the solution is to move the exclusive flag to xlog.c and only
> decrement XLogCtl->Insert.nonExclusiveBackups when exclusive is true,
> otherwise return an error. Even then, it wouldn't be clear if the
> backup had completed or not.
I understand by this sentence that you mean
nonexclusive_backup_running, and I think that I agree with that. We
need a way to reset it properly the session-level switch in case of an
interrupt found, and that needs visibly to happen in
pg_stop_backup_callback and pg_start_backup_callback(). At the same
time I think that exclusive_backup_running had better be moved to
xlog.c as well. If I look at the failure in details when issuing a
cancel, I can see that XLogCtl->Insert.nonExclusiveBackups gets
decremented at the end do_pg_stop_backup, but
nonexclusive_backup_running never gets set back to false because of
the query cancellation.
> I suppose any cancelled non-exclusive
> pg_stop_backup() should be considered aborted whether a stop backup
> record was written or not?
That's not necessarily true, I can see a stop backup able to finish as
well by issuing a cancel request. It seems to me that we just need to
have the shmem information updated at the same time as the
session-level switches for consistency and we're good. The
inconsistency in places when updating the session-level flags and the
shmem-level flags is what is causing harm.
> If that makes sense I'm happy to work up a patch. This is definitely an
> edge case and I seriously doubt it is causing any issues in the field.
Well, at least it is nothing caused directly by 974ece5, I am able to
crash the server with or without that...
There is actually some code that I know of that can issue a cancel
request on pg_stop_backend(), this does not have assertions of course
but it may become an issue. That's pretty close to the improvements I
did recently for lock handling of exclusive backups, so there's
nothing really new for me :)
David, are you willing to write a patch or should I? Changing
nonexclusive_backup_running/exclusive_backup_running to be an enum
would make the code more readable as well IMO.
--
Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2017-02-28 11:15:30 | Re: BUG #14543: libpq fails with group readable ssl keys |
Previous Message | David Steele | 2017-02-28 01:33:34 | Backend crash on non-exclusive backup cancel |
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2017-02-28 03:10:52 | Re: logical replication access control patches |
Previous Message | Bruce Momjian | 2017-02-28 03:01:15 | Re: rename pg_log directory? |