[PATCH v1 3/3] util/oslib-posix: Support concurrent os_mem_prealloc() invocation

David Hildenbrand posted 3 patches 4 years, 6 months ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
[PATCH v1 3/3] util/oslib-posix: Support concurrent os_mem_prealloc() invocation
Posted by David Hildenbrand 4 years, 6 months ago
Add a mutext to protect the SIGBUS case, as we cannot mess concurrently
with the sigbus handler and we have to manage the global variable
sigbus_memset_context. The MADV_POPULATE_WRITE path can run
concurrently.

Note that page_mutex and page_cond are shared between concurrent
invocations, which shouldn't be a problem.

This is a preparation for future virtio-mem prealloc code, which will call
os_mem_prealloc() asynchronously from an iothread when handling guest
requests.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 util/oslib-posix.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 60d1da2d6c..181f6bbf1a 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -94,6 +94,7 @@ typedef struct MemsetThread MemsetThread;
 
 /* used by sigbus_handler() */
 static MemsetContext *sigbus_memset_context;
+static QemuMutex sigbus_mutex;
 
 static QemuMutex page_mutex;
 static QemuCond page_cond;
@@ -605,12 +606,17 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
 void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
                      Error **errp)
 {
+    static gsize initialized;
     int ret;
     struct sigaction act, oldact;
     size_t hpagesize = qemu_fd_getpagesize(fd);
     size_t numpages = DIV_ROUND_UP(memory, hpagesize);
     bool use_madv_populate_write;
 
+    if (g_once_init_enter(&initialized)) {
+        qemu_mutex_init(&sigbus_mutex);
+    }
+
     /*
      * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
      * some special mappings, such as mapping /dev/mem.
@@ -620,6 +626,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
     }
 
     if (!use_madv_populate_write) {
+        qemu_mutex_lock(&sigbus_mutex);
         memset(&act, 0, sizeof(act));
         act.sa_handler = &sigbus_handler;
         act.sa_flags = 0;
@@ -646,6 +653,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
             perror("os_mem_prealloc: failed to reinstall signal handler");
             exit(1);
         }
+        qemu_mutex_unlock(&sigbus_mutex);
     }
 }
 
-- 
2.31.1


Re: [PATCH v1 3/3] util/oslib-posix: Support concurrent os_mem_prealloc() invocation
Posted by Daniel P. Berrangé 4 years, 6 months ago
On Wed, Jul 14, 2021 at 01:23:06PM +0200, David Hildenbrand wrote:
> Add a mutext to protect the SIGBUS case, as we cannot mess concurrently

typo  s/mutext/mutex/

> with the sigbus handler and we have to manage the global variable
> sigbus_memset_context. The MADV_POPULATE_WRITE path can run
> concurrently.
> 
> Note that page_mutex and page_cond are shared between concurrent
> invocations, which shouldn't be a problem.
> 
> This is a preparation for future virtio-mem prealloc code, which will call
> os_mem_prealloc() asynchronously from an iothread when handling guest
> requests.

Hmm, I'm wondering how the need to temporarily play with SIGBUS
at runtime for mem preallocation will interact with the SIGBUS
handler installed by softmmu/cpus.c.

The SIGBUS handler the preallocation code is installed just
blindly assumes the SIGBUS is related to the preallocation
work being done. This is a fine assumption during initially
startup where we're single threaded and not running guest
CPUs. I'm less clear on whether that's a valid assumption
at runtime once guest CPUs are running.

If the sigbus_handler method in softmmu/cpus.c is doing
something important for QEMU, then why is it ok for us to
periodically disable that handler and replace it with
something else that takes a completely different action ?

Of course with the madvise impl we're bypassing the SIGBUS
dance entirely. This is good for people with new kernels,
but is this SIGBUS stuff safe for older kernels ?

> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  util/oslib-posix.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 60d1da2d6c..181f6bbf1a 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -94,6 +94,7 @@ typedef struct MemsetThread MemsetThread;
>  
>  /* used by sigbus_handler() */
>  static MemsetContext *sigbus_memset_context;
> +static QemuMutex sigbus_mutex;
>  
>  static QemuMutex page_mutex;
>  static QemuCond page_cond;
> @@ -605,12 +606,17 @@ static bool madv_populate_write_possible(char *area, size_t pagesize)
>  void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
>                       Error **errp)
>  {
> +    static gsize initialized;
>      int ret;
>      struct sigaction act, oldact;
>      size_t hpagesize = qemu_fd_getpagesize(fd);
>      size_t numpages = DIV_ROUND_UP(memory, hpagesize);
>      bool use_madv_populate_write;
>  
> +    if (g_once_init_enter(&initialized)) {
> +        qemu_mutex_init(&sigbus_mutex);
> +    }
> +
>      /*
>       * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
>       * some special mappings, such as mapping /dev/mem.
> @@ -620,6 +626,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
>      }
>  
>      if (!use_madv_populate_write) {
> +        qemu_mutex_lock(&sigbus_mutex);
>          memset(&act, 0, sizeof(act));
>          act.sa_handler = &sigbus_handler;
>          act.sa_flags = 0;
> @@ -646,6 +653,7 @@ void os_mem_prealloc(int fd, char *area, size_t memory, int smp_cpus,
>              perror("os_mem_prealloc: failed to reinstall signal handler");
>              exit(1);
>          }
> +        qemu_mutex_unlock(&sigbus_mutex);
>      }
>  }



>  
> -- 
> 2.31.1
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH v1 3/3] util/oslib-posix: Support concurrent os_mem_prealloc() invocation
Posted by David Hildenbrand 4 years, 6 months ago
On 20.07.21 16:22, Daniel P. Berrangé wrote:
> On Wed, Jul 14, 2021 at 01:23:06PM +0200, David Hildenbrand wrote:
>> Add a mutext to protect the SIGBUS case, as we cannot mess concurrently
> 
> typo  s/mutext/mutex/
> 
>> with the sigbus handler and we have to manage the global variable
>> sigbus_memset_context. The MADV_POPULATE_WRITE path can run
>> concurrently.
>>
>> Note that page_mutex and page_cond are shared between concurrent
>> invocations, which shouldn't be a problem.
>>
>> This is a preparation for future virtio-mem prealloc code, which will call
>> os_mem_prealloc() asynchronously from an iothread when handling guest
>> requests.
> 
> Hmm, I'm wondering how the need to temporarily play with SIGBUS
> at runtime for mem preallocation will interact with the SIGBUS
> handler installed by softmmu/cpus.c.

That's exactly why I came up with MADV_POPULATE_WRITE, to avoid having 
to mess with different kinds of sigbus at the same time. You can only 
get it wrong.

> 
> The SIGBUS handler the preallocation code is installed just
> blindly assumes the SIGBUS is related to the preallocation
> work being done. This is a fine assumption during initially
> startup where we're single threaded and not running guest
> CPUs. I'm less clear on whether that's a valid assumption
> at runtime once guest CPUs are running.

I assume it's quite broken, for example, already when hotplugging a DIMM 
and prallocating memory for the memory backend.

> 
> If the sigbus_handler method in softmmu/cpus.c is doing
> something important for QEMU, then why is it ok for us to
> periodically disable that handler and replace it with
> something else that takes a completely different action ?

I don't think it is ok. I assume it has been broken for a long time, 
just nobody ever ran into that race.

> 
> Of course with the madvise impl we're bypassing the SIGBUS
> dance entirely. This is good for people with new kernels,
> but is this SIGBUS stuff safe for older kernels ?

It remains broken with old kernels I guess. There isn't too much that we 
can do: disabling prealloc=on once the VM is running breaks existing use 
cases.

Fortunately, running into that race seems to be rare, at least I never 
hear reports.

-- 
Thanks,

David / dhildenb


Re: [PATCH v1 3/3] util/oslib-posix: Support concurrent os_mem_prealloc() invocation
Posted by Daniel P. Berrangé 4 years, 6 months ago
On Tue, Jul 20, 2021 at 04:27:32PM +0200, David Hildenbrand wrote:
> On 20.07.21 16:22, Daniel P. Berrangé wrote:
> > On Wed, Jul 14, 2021 at 01:23:06PM +0200, David Hildenbrand wrote:
> > > Add a mutext to protect the SIGBUS case, as we cannot mess concurrently
> > 
> > typo  s/mutext/mutex/
> > 
> > > with the sigbus handler and we have to manage the global variable
> > > sigbus_memset_context. The MADV_POPULATE_WRITE path can run
> > > concurrently.
> > > 
> > > Note that page_mutex and page_cond are shared between concurrent
> > > invocations, which shouldn't be a problem.
> > > 
> > > This is a preparation for future virtio-mem prealloc code, which will call
> > > os_mem_prealloc() asynchronously from an iothread when handling guest
> > > requests.
> > 
> > Hmm, I'm wondering how the need to temporarily play with SIGBUS
> > at runtime for mem preallocation will interact with the SIGBUS
> > handler installed by softmmu/cpus.c.
> 
> That's exactly why I came up with MADV_POPULATE_WRITE, to avoid having to
> mess with different kinds of sigbus at the same time. You can only get it
> wrong.
> 
> > 
> > The SIGBUS handler the preallocation code is installed just
> > blindly assumes the SIGBUS is related to the preallocation
> > work being done. This is a fine assumption during initially
> > startup where we're single threaded and not running guest
> > CPUs. I'm less clear on whether that's a valid assumption
> > at runtime once guest CPUs are running.
> 
> I assume it's quite broken, for example, already when hotplugging a DIMM and
> prallocating memory for the memory backend.
> 
> > 
> > If the sigbus_handler method in softmmu/cpus.c is doing
> > something important for QEMU, then why is it ok for us to
> > periodically disable that handler and replace it with
> > something else that takes a completely different action ?
> 
> I don't think it is ok. I assume it has been broken for a long time, just
> nobody ever ran into that race.
> 
> > 
> > Of course with the madvise impl we're bypassing the SIGBUS
> > dance entirely. This is good for people with new kernels,
> > but is this SIGBUS stuff safe for older kernels ?
> 
> It remains broken with old kernels I guess. There isn't too much that we can
> do: disabling prealloc=on once the VM is running breaks existing use cases.

Ok, while refactoring this, could you add a scary warning next to the
sigaction calls mentioning that this code is not likely to play well
with qemu's other handling of sigbus, as a reminder to future reviewers.

> Fortunately, running into that race seems to be rare, at least I never hear
> reports.

The failure mode is likely to be silent or easily mis-interpreted

Is there any value in emitting a one-time per process warning message
on stderr if we take the old codepath post-startup ?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH v1 3/3] util/oslib-posix: Support concurrent os_mem_prealloc() invocation
Posted by David Hildenbrand 4 years, 6 months ago
>>>
>>> Of course with the madvise impl we're bypassing the SIGBUS
>>> dance entirely. This is good for people with new kernels,
>>> but is this SIGBUS stuff safe for older kernels ?
>>
>> It remains broken with old kernels I guess. There isn't too much that we can
>> do: disabling prealloc=on once the VM is running breaks existing use cases.
> 
> Ok, while refactoring this, could you add a scary warning next to the
> sigaction calls mentioning that this code is not likely to play well
> with qemu's other handling of sigbus, as a reminder to future reviewers.

Sure thing!

> 
>> Fortunately, running into that race seems to be rare, at least I never hear
>> reports.
> 
> The failure mode is likely to be silent or easily mis-interpreted
> 
> Is there any value in emitting a one-time per process warning message
> on stderr if we take the old codepath post-startup ?

Will look into emitting a warning when running this code while the VM is 
already running. (hope it won't be too ugly to have vm state checks in 
util/oslib-posix).

-- 
Thanks,

David / dhildenb