[RFC 0/1] Check for pid re-use before killing domain process

manish.mishra posted 1 patch 1 year, 6 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20221006093331.237161-1-manish.mishra@nutanix.com
src/conf/domain_conf.c                        |  14 ++-
src/conf/domain_conf.h                        |   1 +
src/libvirt_private.syms                      |   7 ++
src/qemu/qemu_domain.h                        |   1 +
src/qemu/qemu_process.c                       |  49 ++++++--
src/util/vircommand.c                         |  37 ++++++
src/util/vircommand.h                         |   3 +
src/util/virpidfile.c                         | 113 +++++++++++++-----
src/util/virpidfile.h                         |  17 +++
src/util/virprocess.c                         |  73 ++++++++++-
src/util/virprocess.h                         |   8 ++
.../qemustatusxml2xmldata/backup-pull-in.xml  |   2 +-
.../blockjob-blockdev-in.xml                  |   2 +-
.../blockjob-mirror-in.xml                    |   2 +-
.../migration-in-params-in.xml                |   2 +-
.../migration-out-nbd-bitmaps-in.xml          |   2 +-
.../migration-out-nbd-in.xml                  |   2 +-
.../migration-out-nbd-out.xml                 |   2 +-
.../migration-out-nbd-tls-in.xml              |   2 +-
.../migration-out-nbd-tls-out.xml             |   2 +-
.../migration-out-params-in.xml               |   2 +-
tests/qemustatusxml2xmldata/modern-in.xml     |   2 +-
tests/qemustatusxml2xmldata/upgrade-in.xml    |   2 +-
tests/qemustatusxml2xmldata/upgrade-out.xml   |   2 +-
.../qemustatusxml2xmldata/vcpus-multi-in.xml  |   2 +-
25 files changed, 295 insertions(+), 56 deletions(-)
[RFC 0/1] Check for pid re-use before killing domain process
Posted by manish.mishra 1 year, 6 months ago
Libvirt stores pid of domain(e.g Qemu) process when a domain process is started
and same pid is used while destroying domain process. There is always a race
possible that before libvirt tries to kill domain process, actual domain process
is already dead and same pid is used for another process. In that case libvirt
may kill an innocent process.

With this patch we store start time of domain process when domain is started and
we match stime again while killing domain process if it does not match we can
assume domain process is already dead.

This patch series tries to create a generic interface which can be used for
non-domain processes too, even though initial patch series handles pid re-use
for domain process only.

Proposed changes:
1. While creating a daemon process through virExec, create a file which stores
   start time of the process, along with the pid file. A Separate file for start
   time is created instead of storing start time too in *.pid file so that
   existing external user of <domain-id>*.pid file does not break.
2. Persist stime in domstatus in domain xml.
3. In virProcessKill before sending signal to process also verify start time of 
   process. For sending signal to process pidfd_send_signal is used avoid any 
   race between verifying start time and sending signal. Following steps are
   taken while killing a process.
   3.1 Open a pid-fd for given pid with pidfd_open.
   3.2 Verify start time of pid with stored start time. If start time does not 
       match, assume process is already dead.
   3.3 Send signal to pid-fd with pidfd_send_signal. Even if pid was re-used
       just after verifying star time, signal will be sent to a pidfd instance
       which was created before verifying timestamp.

manish.mishra (1):
  Check for pid re-use before killing qemu process

 src/conf/domain_conf.c                        |  14 ++-
 src/conf/domain_conf.h                        |   1 +
 src/libvirt_private.syms                      |   7 ++
 src/qemu/qemu_domain.h                        |   1 +
 src/qemu/qemu_process.c                       |  49 ++++++--
 src/util/vircommand.c                         |  37 ++++++
 src/util/vircommand.h                         |   3 +
 src/util/virpidfile.c                         | 113 +++++++++++++-----
 src/util/virpidfile.h                         |  17 +++
 src/util/virprocess.c                         |  73 ++++++++++-
 src/util/virprocess.h                         |   8 ++
 .../qemustatusxml2xmldata/backup-pull-in.xml  |   2 +-
 .../blockjob-blockdev-in.xml                  |   2 +-
 .../blockjob-mirror-in.xml                    |   2 +-
 .../migration-in-params-in.xml                |   2 +-
 .../migration-out-nbd-bitmaps-in.xml          |   2 +-
 .../migration-out-nbd-in.xml                  |   2 +-
 .../migration-out-nbd-out.xml                 |   2 +-
 .../migration-out-nbd-tls-in.xml              |   2 +-
 .../migration-out-nbd-tls-out.xml             |   2 +-
 .../migration-out-params-in.xml               |   2 +-
 tests/qemustatusxml2xmldata/modern-in.xml     |   2 +-
 tests/qemustatusxml2xmldata/upgrade-in.xml    |   2 +-
 tests/qemustatusxml2xmldata/upgrade-out.xml   |   2 +-
 .../qemustatusxml2xmldata/vcpus-multi-in.xml  |   2 +-
 25 files changed, 295 insertions(+), 56 deletions(-)

-- 
2.22.3
Re: [RFC 0/1] Check for pid re-use before killing domain process
Posted by Jonathon Jongsma 1 year, 6 months ago
I believe that pidfd syscalls were introduced in kernel 5.2. Judging by 
our CI build setup, the oldest distrubution that we support is Alma 
Linux 8, which still has kernel version 4.18.


On 10/6/22 4:33 AM, manish.mishra wrote:
> Libvirt stores pid of domain(e.g Qemu) process when a domain process is started
> and same pid is used while destroying domain process. There is always a race
> possible that before libvirt tries to kill domain process, actual domain process
> is already dead and same pid is used for another process. In that case libvirt
> may kill an innocent process.
> 
> With this patch we store start time of domain process when domain is started and
> we match stime again while killing domain process if it does not match we can
> assume domain process is already dead.
> 
> This patch series tries to create a generic interface which can be used for
> non-domain processes too, even though initial patch series handles pid re-use
> for domain process only.
> 
> Proposed changes:
> 1. While creating a daemon process through virExec, create a file which stores
>     start time of the process, along with the pid file. A Separate file for start
>     time is created instead of storing start time too in *.pid file so that
>     existing external user of <domain-id>*.pid file does not break.
> 2. Persist stime in domstatus in domain xml.
> 3. In virProcessKill before sending signal to process also verify start time of
>     process. For sending signal to process pidfd_send_signal is used avoid any
>     race between verifying start time and sending signal. Following steps are
>     taken while killing a process.
>     3.1 Open a pid-fd for given pid with pidfd_open.
>     3.2 Verify start time of pid with stored start time. If start time does not
>         match, assume process is already dead.
>     3.3 Send signal to pid-fd with pidfd_send_signal. Even if pid was re-used
>         just after verifying star time, signal will be sent to a pidfd instance
>         which was created before verifying timestamp.



> 
> manish.mishra (1):
>    Check for pid re-use before killing qemu process
> 
>   src/conf/domain_conf.c                        |  14 ++-
>   src/conf/domain_conf.h                        |   1 +
>   src/libvirt_private.syms                      |   7 ++
>   src/qemu/qemu_domain.h                        |   1 +
>   src/qemu/qemu_process.c                       |  49 ++++++--
>   src/util/vircommand.c                         |  37 ++++++
>   src/util/vircommand.h                         |   3 +
>   src/util/virpidfile.c                         | 113 +++++++++++++-----
>   src/util/virpidfile.h                         |  17 +++
>   src/util/virprocess.c                         |  73 ++++++++++-
>   src/util/virprocess.h                         |   8 ++
>   .../qemustatusxml2xmldata/backup-pull-in.xml  |   2 +-
>   .../blockjob-blockdev-in.xml                  |   2 +-
>   .../blockjob-mirror-in.xml                    |   2 +-
>   .../migration-in-params-in.xml                |   2 +-
>   .../migration-out-nbd-bitmaps-in.xml          |   2 +-
>   .../migration-out-nbd-in.xml                  |   2 +-
>   .../migration-out-nbd-out.xml                 |   2 +-
>   .../migration-out-nbd-tls-in.xml              |   2 +-
>   .../migration-out-nbd-tls-out.xml             |   2 +-
>   .../migration-out-params-in.xml               |   2 +-
>   tests/qemustatusxml2xmldata/modern-in.xml     |   2 +-
>   tests/qemustatusxml2xmldata/upgrade-in.xml    |   2 +-
>   tests/qemustatusxml2xmldata/upgrade-out.xml   |   2 +-
>   .../qemustatusxml2xmldata/vcpus-multi-in.xml  |   2 +-
>   25 files changed, 295 insertions(+), 56 deletions(-)
>
Re: [RFC 0/1] Check for pid re-use before killing domain process
Posted by Daniel P. Berrangé 1 year, 6 months ago
On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
> I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our
> CI build setup, the oldest distrubution that we support is Alma Linux 8,
> which still has kernel version 4.18.

We need to support FreeBSD / macOS for the QEMU driver too, which becomes
the same problem.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [RFC 0/1] Check for pid re-use before killing domain process
Posted by manish.mishra 1 year, 6 months ago
Thanks for review Jonathon, Daniel

On 11/10/22 9:56 pm, Daniel P. Berrangé wrote:
> On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
>> I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our
>> CI build setup, the oldest distrubution that we support is Alma Linux 8,
>> which still has kernel version 4.18.

I did not know that, will it okay to put it in some condional check. Pidfd

is used anyway to remove a very small window race, more important one

is start time comparison. As these kind of issues can be very easily

reproducible if domain process dies when libvirt is dead and by time

libvirt comes pid is re-used. So atleast start time checking reduce race

window even for older kernel.

> We need to support FreeBSD / macOS for the QEMU driver too, which becomes
> the same problem.

Sorry Daniel i could not understand much of this, can you please give some

more detail.

>
> With regards,
> Daniel

Thanks

Manish Mishra
Re: [RFC 0/1] Check for pid re-use before killing domain process
Posted by Daniel P. Berrangé 1 year, 6 months ago
On Tue, Oct 11, 2022 at 10:11:29PM +0530, manish.mishra wrote:
> Thanks for review Jonathon, Daniel
> 
> On 11/10/22 9:56 pm, Daniel P. Berrangé wrote:
> > On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
> > > I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our
> > > CI build setup, the oldest distrubution that we support is Alma Linux 8,
> > > which still has kernel version 4.18.
> 
> I did not know that, will it okay to put it in some condional check. Pidfd
> is used anyway to remove a very small window race, more important one
> is start time comparison. As these kind of issues can be very easily
> reproducible if domain process dies when libvirt is dead and by time
> libvirt comes pid is re-used. So atleast start time checking reduce race
> window even for older kernel.

I presume you're seeing this on fairly old kernels ? On modern
kernels, pid_max is ~4 billion, instead of 64k, so the chances
of seeing PID reuse is tiny.

If the risk is primarily from the situation where libvirtd was
shutoff at the time QEMU stopped, then we ought to read
/proc/$PID/stat in qemuProcessReconnect() and entirely skip
any attempt to reconnect if we see the starttime has changed
while we were stopped. Of course needs to be skipped on non-Linux.

Also having an conditionally compiled pidfd check during the
kill() path would be a second safety net, for modern Linux.

> > We need to support FreeBSD / macOS for the QEMU driver too, which becomes
> > the same problem.
> 
> Sorry Daniel i could not understand much of this, can you please give some

Libvirt targets multiple platforms, not merely Linux. the QEMU driver
is expected to run on Linux, FreeBSD and macOS, so cannot rely on
Linux specific functionality - that has to be conditionally built.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [RFC 0/1] Check for pid re-use before killing domain process
Posted by manish.mishra 1 year, 6 months ago
On 11/10/22 10:17 pm, Daniel P. Berrangé wrote:
> On Tue, Oct 11, 2022 at 10:11:29PM +0530, manish.mishra wrote:
>> Thanks for review Jonathon, Daniel
>>
>> On 11/10/22 9:56 pm, Daniel P. Berrangé wrote:
>>> On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote:
>>>> I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our
>>>> CI build setup, the oldest distrubution that we support is Alma Linux 8,
>>>> which still has kernel version 4.18.
>> I did not know that, will it okay to put it in some condional check. Pidfd
>> is used anyway to remove a very small window race, more important one
>> is start time comparison. As these kind of issues can be very easily
>> reproducible if domain process dies when libvirt is dead and by time
>> libvirt comes pid is re-used. So atleast start time checking reduce race
>> window even for older kernel.
> I presume you're seeing this on fairly old kernels ? On modern
> kernels, pid_max is ~4 billion, instead of 64k, so the chances
> of seeing PID reuse is tiny.
>
> If the risk is primarily from the situation where libvirtd was
> shutoff at the time QEMU stopped, then we ought to read
> /proc/$PID/stat in qemuProcessReconnect() and entirely skip
> any attempt to reconnect if we see the starttime has changed
> while we were stopped. Of course needs to be skipped on non-Linux.
>
> Also having an conditionally compiled pidfd check during the
> kill() path would be a second safety net, for modern Linux.
>
>>> We need to support FreeBSD / macOS for the QEMU driver too, which becomes
>>> the same problem.
>> Sorry Daniel i could not understand much of this, can you please give some
> Libvirt targets multiple platforms, not merely Linux. the QEMU driver
> is expected to run on Linux, FreeBSD and macOS, so cannot rely on
> Linux specific functionality - that has to be conditionally built.

Yes got it, this clarifies, thanks Daniel, I currently i had for non-windows

platform, sure i will make is for specific, also check for kernel version.

Are you okay to create a separate file for stime. I mean i have to keep

this keeping in mind that i can not change .pid file and libvirt can restart

any time so persisting in just domain xml by sharing stime with pipes

may not be full proof.

>
> With regards,
> Daniel