src/conf/domain_conf.c | 14 ++- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 7 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 49 ++++++-- src/util/vircommand.c | 37 ++++++ src/util/vircommand.h | 3 + src/util/virpidfile.c | 113 +++++++++++++----- src/util/virpidfile.h | 17 +++ src/util/virprocess.c | 73 ++++++++++- src/util/virprocess.h | 8 ++ .../qemustatusxml2xmldata/backup-pull-in.xml | 2 +- .../blockjob-blockdev-in.xml | 2 +- .../blockjob-mirror-in.xml | 2 +- .../migration-in-params-in.xml | 2 +- .../migration-out-nbd-bitmaps-in.xml | 2 +- .../migration-out-nbd-in.xml | 2 +- .../migration-out-nbd-out.xml | 2 +- .../migration-out-nbd-tls-in.xml | 2 +- .../migration-out-nbd-tls-out.xml | 2 +- .../migration-out-params-in.xml | 2 +- tests/qemustatusxml2xmldata/modern-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-out.xml | 2 +- .../qemustatusxml2xmldata/vcpus-multi-in.xml | 2 +- 25 files changed, 295 insertions(+), 56 deletions(-)
Libvirt stores pid of domain(e.g Qemu) process when a domain process is started and same pid is used while destroying domain process. There is always a race possible that before libvirt tries to kill domain process, actual domain process is already dead and same pid is used for another process. In that case libvirt may kill an innocent process. With this patch we store start time of domain process when domain is started and we match stime again while killing domain process if it does not match we can assume domain process is already dead. This patch series tries to create a generic interface which can be used for non-domain processes too, even though initial patch series handles pid re-use for domain process only. Proposed changes: 1. While creating a daemon process through virExec, create a file which stores start time of the process, along with the pid file. A Separate file for start time is created instead of storing start time too in *.pid file so that existing external user of <domain-id>*.pid file does not break. 2. Persist stime in domstatus in domain xml. 3. In virProcessKill before sending signal to process also verify start time of process. For sending signal to process pidfd_send_signal is used avoid any race between verifying start time and sending signal. Following steps are taken while killing a process. 3.1 Open a pid-fd for given pid with pidfd_open. 3.2 Verify start time of pid with stored start time. If start time does not match, assume process is already dead. 3.3 Send signal to pid-fd with pidfd_send_signal. Even if pid was re-used just after verifying star time, signal will be sent to a pidfd instance which was created before verifying timestamp. manish.mishra (1): Check for pid re-use before killing qemu process src/conf/domain_conf.c | 14 ++- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 7 ++ src/qemu/qemu_domain.h | 1 + src/qemu/qemu_process.c | 49 ++++++-- src/util/vircommand.c | 37 ++++++ src/util/vircommand.h | 3 + src/util/virpidfile.c | 113 +++++++++++++----- src/util/virpidfile.h | 17 +++ src/util/virprocess.c | 73 ++++++++++- src/util/virprocess.h | 8 ++ .../qemustatusxml2xmldata/backup-pull-in.xml | 2 +- .../blockjob-blockdev-in.xml | 2 +- .../blockjob-mirror-in.xml | 2 +- .../migration-in-params-in.xml | 2 +- .../migration-out-nbd-bitmaps-in.xml | 2 +- .../migration-out-nbd-in.xml | 2 +- .../migration-out-nbd-out.xml | 2 +- .../migration-out-nbd-tls-in.xml | 2 +- .../migration-out-nbd-tls-out.xml | 2 +- .../migration-out-params-in.xml | 2 +- tests/qemustatusxml2xmldata/modern-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-in.xml | 2 +- tests/qemustatusxml2xmldata/upgrade-out.xml | 2 +- .../qemustatusxml2xmldata/vcpus-multi-in.xml | 2 +- 25 files changed, 295 insertions(+), 56 deletions(-) -- 2.22.3
I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our CI build setup, the oldest distrubution that we support is Alma Linux 8, which still has kernel version 4.18. On 10/6/22 4:33 AM, manish.mishra wrote: > Libvirt stores pid of domain(e.g Qemu) process when a domain process is started > and same pid is used while destroying domain process. There is always a race > possible that before libvirt tries to kill domain process, actual domain process > is already dead and same pid is used for another process. In that case libvirt > may kill an innocent process. > > With this patch we store start time of domain process when domain is started and > we match stime again while killing domain process if it does not match we can > assume domain process is already dead. > > This patch series tries to create a generic interface which can be used for > non-domain processes too, even though initial patch series handles pid re-use > for domain process only. > > Proposed changes: > 1. While creating a daemon process through virExec, create a file which stores > start time of the process, along with the pid file. A Separate file for start > time is created instead of storing start time too in *.pid file so that > existing external user of <domain-id>*.pid file does not break. > 2. Persist stime in domstatus in domain xml. > 3. In virProcessKill before sending signal to process also verify start time of > process. For sending signal to process pidfd_send_signal is used avoid any > race between verifying start time and sending signal. Following steps are > taken while killing a process. > 3.1 Open a pid-fd for given pid with pidfd_open. > 3.2 Verify start time of pid with stored start time. If start time does not > match, assume process is already dead. > 3.3 Send signal to pid-fd with pidfd_send_signal. Even if pid was re-used > just after verifying star time, signal will be sent to a pidfd instance > which was created before verifying timestamp. > > manish.mishra (1): > Check for pid re-use before killing qemu process > > src/conf/domain_conf.c | 14 ++- > src/conf/domain_conf.h | 1 + > src/libvirt_private.syms | 7 ++ > src/qemu/qemu_domain.h | 1 + > src/qemu/qemu_process.c | 49 ++++++-- > src/util/vircommand.c | 37 ++++++ > src/util/vircommand.h | 3 + > src/util/virpidfile.c | 113 +++++++++++++----- > src/util/virpidfile.h | 17 +++ > src/util/virprocess.c | 73 ++++++++++- > src/util/virprocess.h | 8 ++ > .../qemustatusxml2xmldata/backup-pull-in.xml | 2 +- > .../blockjob-blockdev-in.xml | 2 +- > .../blockjob-mirror-in.xml | 2 +- > .../migration-in-params-in.xml | 2 +- > .../migration-out-nbd-bitmaps-in.xml | 2 +- > .../migration-out-nbd-in.xml | 2 +- > .../migration-out-nbd-out.xml | 2 +- > .../migration-out-nbd-tls-in.xml | 2 +- > .../migration-out-nbd-tls-out.xml | 2 +- > .../migration-out-params-in.xml | 2 +- > tests/qemustatusxml2xmldata/modern-in.xml | 2 +- > tests/qemustatusxml2xmldata/upgrade-in.xml | 2 +- > tests/qemustatusxml2xmldata/upgrade-out.xml | 2 +- > .../qemustatusxml2xmldata/vcpus-multi-in.xml | 2 +- > 25 files changed, 295 insertions(+), 56 deletions(-) >
On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote: > I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our > CI build setup, the oldest distrubution that we support is Alma Linux 8, > which still has kernel version 4.18. We need to support FreeBSD / macOS for the QEMU driver too, which becomes the same problem. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Thanks for review Jonathon, Daniel On 11/10/22 9:56 pm, Daniel P. Berrangé wrote: > On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote: >> I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our >> CI build setup, the oldest distrubution that we support is Alma Linux 8, >> which still has kernel version 4.18. I did not know that, will it okay to put it in some condional check. Pidfd is used anyway to remove a very small window race, more important one is start time comparison. As these kind of issues can be very easily reproducible if domain process dies when libvirt is dead and by time libvirt comes pid is re-used. So atleast start time checking reduce race window even for older kernel. > We need to support FreeBSD / macOS for the QEMU driver too, which becomes > the same problem. Sorry Daniel i could not understand much of this, can you please give some more detail. > > With regards, > Daniel Thanks Manish Mishra
On Tue, Oct 11, 2022 at 10:11:29PM +0530, manish.mishra wrote: > Thanks for review Jonathon, Daniel > > On 11/10/22 9:56 pm, Daniel P. Berrangé wrote: > > On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote: > > > I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our > > > CI build setup, the oldest distrubution that we support is Alma Linux 8, > > > which still has kernel version 4.18. > > I did not know that, will it okay to put it in some condional check. Pidfd > is used anyway to remove a very small window race, more important one > is start time comparison. As these kind of issues can be very easily > reproducible if domain process dies when libvirt is dead and by time > libvirt comes pid is re-used. So atleast start time checking reduce race > window even for older kernel. I presume you're seeing this on fairly old kernels ? On modern kernels, pid_max is ~4 billion, instead of 64k, so the chances of seeing PID reuse is tiny. If the risk is primarily from the situation where libvirtd was shutoff at the time QEMU stopped, then we ought to read /proc/$PID/stat in qemuProcessReconnect() and entirely skip any attempt to reconnect if we see the starttime has changed while we were stopped. Of course needs to be skipped on non-Linux. Also having an conditionally compiled pidfd check during the kill() path would be a second safety net, for modern Linux. > > We need to support FreeBSD / macOS for the QEMU driver too, which becomes > > the same problem. > > Sorry Daniel i could not understand much of this, can you please give some Libvirt targets multiple platforms, not merely Linux. the QEMU driver is expected to run on Linux, FreeBSD and macOS, so cannot rely on Linux specific functionality - that has to be conditionally built. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 11/10/22 10:17 pm, Daniel P. Berrangé wrote: > On Tue, Oct 11, 2022 at 10:11:29PM +0530, manish.mishra wrote: >> Thanks for review Jonathon, Daniel >> >> On 11/10/22 9:56 pm, Daniel P. Berrangé wrote: >>> On Tue, Oct 11, 2022 at 11:20:00AM -0500, Jonathon Jongsma wrote: >>>> I believe that pidfd syscalls were introduced in kernel 5.2. Judging by our >>>> CI build setup, the oldest distrubution that we support is Alma Linux 8, >>>> which still has kernel version 4.18. >> I did not know that, will it okay to put it in some condional check. Pidfd >> is used anyway to remove a very small window race, more important one >> is start time comparison. As these kind of issues can be very easily >> reproducible if domain process dies when libvirt is dead and by time >> libvirt comes pid is re-used. So atleast start time checking reduce race >> window even for older kernel. > I presume you're seeing this on fairly old kernels ? On modern > kernels, pid_max is ~4 billion, instead of 64k, so the chances > of seeing PID reuse is tiny. > > If the risk is primarily from the situation where libvirtd was > shutoff at the time QEMU stopped, then we ought to read > /proc/$PID/stat in qemuProcessReconnect() and entirely skip > any attempt to reconnect if we see the starttime has changed > while we were stopped. Of course needs to be skipped on non-Linux. > > Also having an conditionally compiled pidfd check during the > kill() path would be a second safety net, for modern Linux. > >>> We need to support FreeBSD / macOS for the QEMU driver too, which becomes >>> the same problem. >> Sorry Daniel i could not understand much of this, can you please give some > Libvirt targets multiple platforms, not merely Linux. the QEMU driver > is expected to run on Linux, FreeBSD and macOS, so cannot rely on > Linux specific functionality - that has to be conditionally built. Yes got it, this clarifies, thanks Daniel, I currently i had for non-windows platform, sure i will make is for specific, also check for kernel version. Are you okay to create a separate file for stime. I mean i have to keep this keeping in mind that i can not change .pid file and libvirt can restart any time so persisting in just domain xml by sharing stime with pipes may not be full proof. > > With regards, > Daniel
© 2016 - 2024 Red Hat, Inc.