Since 5.0, QEMU uses the hostmem backend to allocate main guest RAM.
The backend, however, calls mbind(), which is typically a NOP
when the default policy is used and no host-nodes bitmap is given.
However, when running in a container with a black-listed mbind()
syscall, QEMU fails to start with the error
 "cannot bind memory to host NUMA nodes: Operation not permitted"
even when the user hasn't explicitly provided host-nodes to pin to
(which is the case with the -m option).

To fix the issue, call mbind() only when the user has provided
host-nodes explicitly (i.e. the host_nodes bitmap is not empty).
That should allow running QEMU without memory pinning in containers
with a black-listed mbind(). If QEMU-provided memory pinning
is required, the user still has to white-list mbind() in the container
configuration.

Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
CC: berrange@redhat.com
CC: ehabkost@redhat.com
CC: pbonzini@redhat.com
CC: mhohmann@physnet.uni-hamburg.de
CC: qemu-stable@nongnu.org
---
backends/hostmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 327f9eebc3..0efd7b7bd6 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -383,8 +383,10 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
         assert(sizeof(backend->host_nodes) >=
                BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
         assert(maxnode <= MAX_NODES);
-        if (mbind(ptr, sz, backend->policy,
-                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
+
+        if (maxnode &&
+            mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
+                  flags)) {
             if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
                 error_setg_errno(errp, errno,
                                  "cannot bind memory to host NUMA nodes");
--
2.18.1
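
The guard introduced by the hunk above can be sketched in isolation. This is only an illustration, not QEMU code: `sketch_bind_fn`, `sketch_denied_bind`, `sketch_bind_calls` and `sketch_bind_if_requested` are hypothetical names, and the stub stands in for an mbind() call that a container's syscall filter rejects with EPERM. The error filtering on MPOL_DEFAULT/ENOSYS that the real code keeps is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical function-pointer type mirroring the argument list used in the patch. */
typedef int (*sketch_bind_fn)(void *addr, unsigned long len, int mode,
                              const unsigned long *nodemask,
                              unsigned long maxnode, unsigned flags);

/* Counts how often the stub below was reached. */
static int sketch_bind_calls;

/* Stub that behaves like a black-listed mbind(): always fails with EPERM. */
static int sketch_denied_bind(void *addr, unsigned long len, int mode,
                              const unsigned long *nodemask,
                              unsigned long maxnode, unsigned flags)
{
    (void)addr; (void)len; (void)mode;
    (void)nodemask; (void)maxnode; (void)flags;
    sketch_bind_calls++;
    errno = EPERM;
    return -1;
}

/*
 * Same short-circuit as in the patch: when maxnode is 0 (empty host-nodes
 * bitmap) the bind function is never called, so a filtered syscall cannot
 * fail. Returns 0 on success or when nothing had to be bound, -1 otherwise.
 */
static int sketch_bind_if_requested(sketch_bind_fn do_bind, void *ptr,
                                    unsigned long sz, int policy,
                                    const unsigned long *host_nodes,
                                    unsigned long maxnode, unsigned flags)
{
    assert(do_bind != NULL);
    if (maxnode &&
        do_bind(ptr, sz, policy, host_nodes, maxnode + 1, flags)) {
        return -1;
    }
    return 0;
}
```

With an empty bitmap (`maxnode == 0`) the helper returns 0 without touching the stub; with a non-empty bitmap the stub is reached once and its EPERM is reported, which matches the "user still has to white-list mbind()" case in the commit message.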
Typo "empty" in patch subject.

On 4/30/20 5:46 PM, Igor Mammedov wrote:
> [...]
Thanks! I applied the patch, and QEMU now also works inside the Docker container for all architectures (i386, x86_64, arm, aarch64) for which I have test cases at hand. Since the container is configured by a public cloud service, there is no way to change any security settings. Disabling mbind() unless explicitly requested seems to be the best way to go here.

On 30.04.20 19:42, Philippe Mathieu-Daudé wrote:
> Typo "empty" in patch subject.
>
> On 4/30/20 5:46 PM, Igor Mammedov wrote:
>> [...]
On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
> [...]
> -        if (mbind(ptr, sz, backend->policy,
> -                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
> +
> +        if (maxnode &&
> +            mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
> +                  flags)) {
>              if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
>                  error_setg_errno(errp, errno,
>                                   "cannot bind memory to host NUMA nodes");

Personally, I would have found this code clearer if the
check had been "if (backend->policy != MPOL_DEFAULT && ..."
as I had to read quite a few lines to understand that
'maxnode' is zero if-and-only-if policy == MPOL_DEFAULT.

Regardless though, this is functionally correct so

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
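
The equivalence Daniel points out can be illustrated with a small sketch. It assumes, as his comment states, that earlier validation only lets through configurations where the bitmap is empty exactly when the policy is the default one. All `sketch_*` names and `SKETCH_POLICY_DEFAULT` are hypothetical; `sketch_maxnode` is a stand-in for however the backend derives `maxnode` from the host_nodes bitmap, not a copy of it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_POLICY_DEFAULT 0 /* stand-in for MPOL_DEFAULT */
#define SKETCH_BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Returns (index of the highest set bit) + 1, or 0 when the bitmap is empty. */
static unsigned long sketch_maxnode(const unsigned long *mask,
                                    unsigned long nbits)
{
    unsigned long i;

    assert(mask != NULL);
    for (i = nbits; i > 0; i--) {
        unsigned long bit = i - 1;

        if (mask[bit / SKETCH_BITS_PER_ULONG] &
            (1UL << (bit % SKETCH_BITS_PER_ULONG))) {
            return i;
        }
    }
    return 0;
}

/* The invariant from the review: empty bitmap if and only if default policy. */
static bool sketch_config_valid(int policy, unsigned long maxnode)
{
    return (policy == SKETCH_POLICY_DEFAULT) == (maxnode == 0);
}

/* Guard as written in the patch. */
static bool sketch_guard_by_maxnode(unsigned long maxnode)
{
    return maxnode != 0;
}

/* Guard as suggested in the review. */
static bool sketch_guard_by_policy(int policy)
{
    return policy != SKETCH_POLICY_DEFAULT;
}
```

For every configuration that `sketch_config_valid()` accepts, both guards return the same value, which is why the review calls the patch functionally correct while preferring the policy-based spelling for readability.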
On 5/1/20 10:57 AM, Daniel P. Berrangé wrote:
> On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
>> [...]
>
> Regardless though, this is functionally correct so
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

I could reproduce the failure by running 'make check-qtest-hppa' on the
qemu:fedora image:

  TEST    check-qtest-hppa: tests/qtest/boot-serial-test
  qemu-system-hppa: cannot bind memory to host NUMA nodes: Operation not permitted
  Broken pipe
  tests/qtest/libqtest.c:166: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
  ERROR - too few tests run (expected 1, got 0)
  make: *** [tests/Makefile.include:637: check-qtest-hppa] Error 1

Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
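
The "Operation not permitted" in this log is the text for EPERM, and the unchanged context line in the patch shows why that was fatal before the fix even with the default policy: a failure is ignored only when the policy is MPOL_DEFAULT and errno is ENOSYS. A minimal sketch of that predicate follows; `sketch_bind_error_is_fatal` and `SKETCH_MPOL_DEFAULT` are hypothetical names, and the value 0 for the default policy is taken from Linux's mempolicy definitions.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MPOL_DEFAULT 0 /* MPOL_DEFAULT is 0 on Linux */

/*
 * Mirrors the condition "backend->policy != MPOL_DEFAULT || errno != ENOSYS"
 * from the patch context: only a missing syscall under the default policy
 * is tolerated, every other failure is reported.
 */
static bool sketch_bind_error_is_fatal(int policy, int err)
{
    return policy != SKETCH_MPOL_DEFAULT || err != ENOSYS;
}
```

Under this predicate, `(SKETCH_MPOL_DEFAULT, EPERM)` is fatal while `(SKETCH_MPOL_DEFAULT, ENOSYS)` is not, so a filter that answers with EPERM could only be tolerated by not calling mbind() at all, which is what the patch does for an empty bitmap.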
On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
> [...]
> Reported-by: Manuel Hohmann <mhohmann@physnet.uni-hamburg.de>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

Queued on machine-next, thanks!

-- 
Eduardo
Hi Eduardo,

On 5/4/20 5:44 PM, Eduardo Habkost wrote:
> On Thu, Apr 30, 2020 at 11:46:06AM -0400, Igor Mammedov wrote:
>> [...]
>
> Queued on machine-next, thanks!

I've been debugging this issue again today and found that the patch has
not been merged yet. If possible, can you add the
"Cc: qemu-stable@nongnu.org" tag before sending your pull request?

Thanks,

Phil.
On Mon, 11 May 2020 18:00:01 +0200
Philippe Mathieu-Daudé <philmd@redhat.com> wrote:

> [...]
> I've been debugging this issue again today and figured it was not
> merged, if possible can you add the "Cc: qemu-stable@nongnu.org" tag
> before sending your pull request?

It's CCed already, so my impression was that it would be picked up once
it was reviewed.
On 5/11/20 9:24 PM, Igor Mammedov wrote:
> [...]
> It's CCed already, so my impression was that it would be picked up once
> it was reviewed.

Correct; however, some distributions find it easier to grep for the
merged 'Cc: qemu-stable@nongnu.org' tag before qemu-stable is released.