Add a mutext to protect the SIGBUS case, as we cannot mess concurrently
with the sigbus handler and we have to manage the global variable
sigbus_memset_context. The MADV_POPULATE_WRITE path can run
concurrently.
Note that page_mutex and page_cond are shared between concurrent
invocations, which shouldn't be a problem.
This is a preparation for future virtio-mem prealloc code, which will call
os_mem_prealloc() asynchronously from an iothread when handling guest
requests.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
util/oslib-posix.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 60d1da2d6c..181f6bbf1a 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -94,6 +94,7 @@ typedef struct MemsetThread MemsetThread;
/* used by sigbus_handler() */
static MemsetContext *sigbus_memset_context;
+static QemuMutex sigbus_mutex;
static QemuMutex page_mutex;
static QemuCond page_cond;
@@ -605,12 +606,17 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
Error **errp)
{
+ static gsize initialized;
int ret;
struct sigaction act, oldact;
size_t hpagesize = qemu_fd_getpagesize(fd);
size_t numpages = DIV_ROUND_UP(memory, hpagesize);
bool use_madv_populate_write;
+ if (g_once_init_enter(&initialized)) {
+ qemu_mutex_init(&sigbus_mutex);
+ }
+
/*
* Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
* some special mappings, such as mapping /dev/mem.
@@ -620,6 +626,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
}
if (!use_madv_populate_write) {
+ qemu_mutex_lock(&sigbus_mutex);
memset(&act, 0, sizeof(act));
act.sa_handler = &sigbus_handler;
act.sa_flags = 0;
@@ -646,6 +653,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
perror("os_mem_prealloc: failed to reinstall signal handler");
exit(1);
}
+ qemu_mutex_unlock(&sigbus_mutex);
}
}
--
2.31.1
On Wed, Jul 14, 2021 at 01:23:06PM +0200, David Hildenbrand wrote:
> Add a mutext to protect the SIGBUS case, as we cannot mess concurrently
typo s/mutext/mutex/
> with the sigbus handler and we have to manage the global variable
> sigbus_memset_context. The MADV_POPULATE_WRITE path can run
> concurrently.
>
> Note that page_mutex and page_cond are shared between concurrent
> invocations, which shouldn't be a problem.
>
> This is a preparation for future virtio-mem prealloc code, which will call
> os_mem_prealloc() asynchronously from an iothread when handling guest
> requests.
Hmm, I'm wondering how the need to temporarily play with SIGBUS
at runtime for mem preallocation will interact with the SIGBUS
handler installed by softmmu/cpus.c.
The SIGBUS handler the preallocation code is installed just
blindly assumes the SIGBUS is related to the preallocation
work being done. This is a fine assumption during initially
startup where we're single threaded and not running guest
CPUs. I'm less clear on whether that's a valid assumption
at runtime once guest CPUs are running.
If the sigbus_handler method in softmmu/cpus.c is doing
something important for QEMU, then why is it ok for us to
periodically disable that handler and replace it with
something else that takes a completely different action ?
Of course with the madvise impl we're bypassing the SIGBUS
dance entirely. This is good for people with new kernels,
but is this SIGBUS stuff safe for older kernels ?
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> util/oslib-posix.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 60d1da2d6c..181f6bbf1a 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -94,6 +94,7 @@ typedef struct MemsetThread MemsetThread;
>
> /* used by sigbus_handler() */
> static MemsetContext *sigbus_memset_context;
> +static QemuMutex sigbus_mutex;
>
> static QemuMutex page_mutex;
> static QemuCond page_cond;
> @@ -605,12 +606,17 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
> void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
> Error **errp)
> {
> + static gsize initialized;
> int ret;
> struct sigaction act, oldact;
> size_t hpagesize = qemu_fd_getpagesize(fd);
> size_t numpages = DIV_ROUND_UP(memory, hpagesize);
> bool use_madv_populate_write;
>
> + if (g_once_init_enter(&initialized)) {
> + qemu_mutex_init(&sigbus_mutex);
> + }
> +
> /*
> * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
> * some special mappings, such as mapping /dev/mem.
> @@ -620,6 +626,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
> }
>
> if (!use_madv_populate_write) {
> + qemu_mutex_lock(&sigbus_mutex);
> memset(&act, 0, sizeof(act));
> act.sa_handler = &sigbus_handler;
> act.sa_flags = 0;
> @@ -646,6 +653,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
> perror("os_mem_prealloc: failed to reinstall signal handler");
> exit(1);
> }
> + qemu_mutex_unlock(&sigbus_mutex);
> }
> }
>
> --
> 2.31.1
>
>
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 20.07.21 16:22, Daniel P. Berrangé wrote: > On Wed, Jul 14, 2021 at 01:23:06PM +0200, David Hildenbrand wrote: >> Add a mutext to protect the SIGBUS case, as we cannot mess concurrently > > typo s/mutext/mutex/ > >> with the sigbus handler and we have to manage the global variable >> sigbus_memset_context. The MADV_POPULATE_WRITE path can run >> concurrently. >> >> Note that page_mutex and page_cond are shared between concurrent >> invocations, which shouldn't be a problem. >> >> This is a preparation for future virtio-mem prealloc code, which will call >> os_mem_prealloc() asynchronously from an iothread when handling guest >> requests. > > Hmm, I'm wondering how the need to temporarily play with SIGBUS > at runtime for mem preallocation will interact with the SIGBUS > handler installed by softmmu/cpus.c. That's exactly why I came up with MADV_POPULATE_WRITE, to avoid having to mess with different kinds of sigbus at the same time. You can only get it wrong. > > The SIGBUS handler the preallocation code is installed just > blindly assumes the SIGBUS is related to the preallocation > work being done. This is a fine assumption during initially > startup where we're single threaded and not running guest > CPUs. I'm less clear on whether that's a valid assumption > at runtime once guest CPUs are running. I assume it's quite broken, for example, already when hotplugging a DIMM and prallocating memory for the memory backend. > > If the sigbus_handler method in softmmu/cpus.c is doing > something important for QEMU, then why is it ok for us to > periodically disable that handler and replace it with > something else that takes a completely different action ? I don't think it is ok. I assume it has been broken for a long time, just nobody ever ran into that race. > > Of course with the madvise impl we're bypassing the SIGBUS > dance entirely. This is good for people with new kernels, > but is this SIGBUS stuff safe for older kernels ? It remains broken with old kernels I guess. There isn't too much that we can do: disabling prealloc=on once the VM is running breaks existing use cases. Fortunately, running into that race seems to be rare, at least I never hear reports. -- Thanks, David / dhildenb
On Tue, Jul 20, 2021 at 04:27:32PM +0200, David Hildenbrand wrote: > On 20.07.21 16:22, Daniel P. Berrangé wrote: > > On Wed, Jul 14, 2021 at 01:23:06PM +0200, David Hildenbrand wrote: > > > Add a mutext to protect the SIGBUS case, as we cannot mess concurrently > > > > typo s/mutext/mutex/ > > > > > with the sigbus handler and we have to manage the global variable > > > sigbus_memset_context. The MADV_POPULATE_WRITE path can run > > > concurrently. > > > > > > Note that page_mutex and page_cond are shared between concurrent > > > invocations, which shouldn't be a problem. > > > > > > This is a preparation for future virtio-mem prealloc code, which will call > > > os_mem_prealloc() asynchronously from an iothread when handling guest > > > requests. > > > > Hmm, I'm wondering how the need to temporarily play with SIGBUS > > at runtime for mem preallocation will interact with the SIGBUS > > handler installed by softmmu/cpus.c. > > That's exactly why I came up with MADV_POPULATE_WRITE, to avoid having to > mess with different kinds of sigbus at the same time. You can only get it > wrong. > > > > > The SIGBUS handler the preallocation code is installed just > > blindly assumes the SIGBUS is related to the preallocation > > work being done. This is a fine assumption during initially > > startup where we're single threaded and not running guest > > CPUs. I'm less clear on whether that's a valid assumption > > at runtime once guest CPUs are running. > > I assume it's quite broken, for example, already when hotplugging a DIMM and > prallocating memory for the memory backend. > > > > > If the sigbus_handler method in softmmu/cpus.c is doing > > something important for QEMU, then why is it ok for us to > > periodically disable that handler and replace it with > > something else that takes a completely different action ? > > I don't think it is ok. I assume it has been broken for a long time, just > nobody ever ran into that race. > > > > > Of course with the madvise impl we're bypassing the SIGBUS > > dance entirely. This is good for people with new kernels, > > but is this SIGBUS stuff safe for older kernels ? > > It remains broken with old kernels I guess. There isn't too much that we can > do: disabling prealloc=on once the VM is running breaks existing use cases. Ok, while refactoring this, could you add a scary warning next to the sigaction calls mentioning that this code is not likely to play well with qemu's other handling of sigbus, as a reminder to future reviewers. > Fortunately, running into that race seems to be rare, at least I never hear > reports. The failure mode is likely to be silent or easily mis-interpreted Is there any value in emitting a one-time per process warning message on stderr if we take the old codepath post-startup ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>>> >>> Of course with the madvise impl we're bypassing the SIGBUS >>> dance entirely. This is good for people with new kernels, >>> but is this SIGBUS stuff safe for older kernels ? >> >> It remains broken with old kernels I guess. There isn't too much that we can >> do: disabling prealloc=on once the VM is running breaks existing use cases. > > Ok, while refactoring this, could you add a scary warning next to the > sigaction calls mentioning that this code is not likely to play well > with qemu's other handling of sigbus, as a reminder to future reviewers. Sure thing! > >> Fortunately, running into that race seems to be rare, at least I never hear >> reports. > > The failure mode is likely to be silent or easily mis-interpreted > > Is there any value in emitting a one-time per process warning message > on stderr if we take the old codepath post-startup ? Will look into emitting a warning when running this code while the VM is already running. (hope it won't be too ugly to have vm state checks in util/oslib-posix). -- Thanks, David / dhildenb
© 2016 - 2026 Red Hat, Inc.