[PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd

Peter Xu posted 3 patches 1 year, 3 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230125224016.212529-1-peterx@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Cornelia Huck <cohuck@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>
There is a newer version of this series
[PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
The new /dev/userfaultfd handle is superior to the system call: it offers
better permission control and also works in a restricted seccomp
environment.

The new device was only introduced in v6.1 so we need a header update.

Please have a look, thanks.

Peter Xu (3):
  linux-headers: Update to v6.1
  util/userfaultfd: Add uffd_open()
  util/userfaultfd: Support /dev/userfaultfd

 include/qemu/userfaultfd.h                    |   1 +
 include/standard-headers/drm/drm_fourcc.h     |  34 ++++-
 include/standard-headers/linux/ethtool.h      |  63 +++++++-
 include/standard-headers/linux/fuse.h         |   6 +-
 .../linux/input-event-codes.h                 |   1 +
 include/standard-headers/linux/virtio_blk.h   |  19 +++
 linux-headers/asm-generic/hugetlb_encode.h    |  26 ++--
 linux-headers/asm-generic/mman-common.h       |   2 +
 linux-headers/asm-mips/mman.h                 |   2 +
 linux-headers/asm-riscv/kvm.h                 |   4 +
 linux-headers/linux/kvm.h                     |   1 +
 linux-headers/linux/psci.h                    |  14 ++
 linux-headers/linux/userfaultfd.h             |   4 +
 linux-headers/linux/vfio.h                    | 142 ++++++++++++++++++
 migration/postcopy-ram.c                      |  11 +-
 tests/qtest/migration-test.c                  |   3 +-
 util/trace-events                             |   1 +
 util/userfaultfd.c                            |  49 +++++-
 18 files changed, 354 insertions(+), 29 deletions(-)

-- 
2.37.3
Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Michal Prívozník 1 year, 3 months ago
On 1/25/23 23:40, Peter Xu wrote:
> The new /dev/userfaultfd handle is superior to the system call with a
> better permission control and also works for a restricted seccomp
> environment.
> 
> The new device was only introduced in v6.1 so we need a header update.
> 
> Please have a look, thanks.

I was wondering whether it would make sense/be possible for the mgmt app
(libvirt) to pass an FD for /dev/userfaultfd instead of QEMU opening it
itself. But looking into the code, libvirt would need to do that when
spawning QEMU because that's when QEMU itself initializes internal state
and queries userfaultfd caps.

Michal
Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Dr. David Alan Gilbert 1 year, 3 months ago
* Michal Prívozník (mprivozn@redhat.com) wrote:
> On 1/25/23 23:40, Peter Xu wrote:
> > The new /dev/userfaultfd handle is superior to the system call with a
> > better permission control and also works for a restricted seccomp
> > environment.
> > 
> > The new device was only introduced in v6.1 so we need a header update.
> > 
> > Please have a look, thanks.
> 
> I was wondering whether it would make sense/be possible for mgmt app
> (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> itself. But looking into the code, libvirt would need to do that when
> spawning QEMU because that's when QEMU itself initializes internal state
> and queries userfaultfd caps.

You also have to be careful about what the userfaultfd semantics are; I
can't remember them - but if you open it in one process and pass it to
another process, which process's address space are you trying to
monitor?

Dave

> Michal
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> * Michal Prívozník (mprivozn@redhat.com) wrote:
> > On 1/25/23 23:40, Peter Xu wrote:
> > > The new /dev/userfaultfd handle is superior to the system call with a
> > > better permission control and also works for a restricted seccomp
> > > environment.
> > > 
> > > The new device was only introduced in v6.1 so we need a header update.
> > > 
> > > Please have a look, thanks.
> > 
> > I was wondering whether it would make sense/be possible for mgmt app
> > (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> > itself. But looking into the code, libvirt would need to do that when
> > spawning QEMU because that's when QEMU itself initializes internal state
> > and queries userfaultfd caps.
> 
> You also have to be careful about what the userfaultfd semantics are; I
> can't remember them - but if you open it in one process and pass it to
> another process, which processes address space are you trying to
> monitor?

Yes it's a problem.  The kernel always fetches the current mm_struct* which
represents the current context of virtual address space when creating the
uffd handle (for either the syscall or the ioctl() approach).

It would work only if Libvirt invoked QEMU as a thread, so that they share
the same address space.

Why would libvirt like to do so?

Thanks,

-- 
Peter Xu


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Daniel P. Berrangé 1 year, 3 months ago
On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > * Michal Prívozník (mprivozn@redhat.com) wrote:
> > > On 1/25/23 23:40, Peter Xu wrote:
> > > > The new /dev/userfaultfd handle is superior to the system call with a
> > > > better permission control and also works for a restricted seccomp
> > > > environment.
> > > > 
> > > > The new device was only introduced in v6.1 so we need a header update.
> > > > 
> > > > Please have a look, thanks.
> > > 
> > > I was wondering whether it would make sense/be possible for mgmt app
> > > (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> > > itself. But looking into the code, libvirt would need to do that when
> > > spawning QEMU because that's when QEMU itself initializes internal state
> > > and queries userfaultfd caps.
> > 
> > You also have to be careful about what the userfaultfd semantics are; I
> > can't remember them - but if you open it in one process and pass it to
> > another process, which processes address space are you trying to
> > monitor?
> 
> Yes it's a problem.  The kernel always fetches the current mm_struct* which
> represents the current context of virtual address space when creating the
> uffd handle (for either the syscall or the ioctl() approach).

At what point does the process address space get associated? When
/dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
is called? If it is the former, then we have no choice: QEMU must open
it. If it is the latter, then libvirt can open /dev/userfaultfd and pass
it to QEMU, which can then do the ioctl(USERFAULTFD_IOC_NEW).

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
On Thu, Jan 26, 2023 at 03:59:33PM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > > * Michal Prívozník (mprivozn@redhat.com) wrote:
> > > > On 1/25/23 23:40, Peter Xu wrote:
> > > > > The new /dev/userfaultfd handle is superior to the system call with a
> > > > > better permission control and also works for a restricted seccomp
> > > > > environment.
> > > > > 
> > > > > The new device was only introduced in v6.1 so we need a header update.
> > > > > 
> > > > > Please have a look, thanks.
> > > > 
> > > > I was wondering whether it would make sense/be possible for mgmt app
> > > > (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> > > > itself. But looking into the code, libvirt would need to do that when
> > > > spawning QEMU because that's when QEMU itself initializes internal state
> > > > and queries userfaultfd caps.
> > > 
> > > You also have to be careful about what the userfaultfd semantics are; I
> > > can't remember them - but if you open it in one process and pass it to
> > > another process, which processes address space are you trying to
> > > monitor?
> > 
> > Yes it's a problem.  The kernel always fetches the current mm_struct* which
> > represents the current context of virtual address space when creating the
> > uffd handle (for either the syscall or the ioctl() approach).
> 
> At what point does the process address space get associated ? When
> the /dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
> is called ?  If it is the former, then we have no choice, QEMU must open
> it. if it is the latter, then libvirt can open /dev/userfaultfd, pass
> it to QEMU which can then do the ioctl(USERFAULTFD_IOC_NEW).

Good point. It should be the latter, so it should be doable.

What would be the best interface for QEMU to detect the fd passed over to
it?  IIUC qemu_open() requires the name to be /dev/fdset/*, but there's no
existing cmdline option that tells QEMU which fd number to fetch from the
fdset and use as the /dev/userfaultfd descriptor.

monitor_get_fd() seems more suitable: we can define a unique string so
that Libvirt can preset the descriptor with the same string attached to it,
and then I can try monitor_get_fd() first, before trying to open() or doing
the syscall.

Thanks,

-- 
Peter Xu


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
On Thu, Jan 26, 2023 at 12:26:45PM -0500, Peter Xu wrote:
> On Thu, Jan 26, 2023 at 03:59:33PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> > > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Michal Prívozník (mprivozn@redhat.com) wrote:
> > > > > On 1/25/23 23:40, Peter Xu wrote:
> > > > > > The new /dev/userfaultfd handle is superior to the system call with a
> > > > > > better permission control and also works for a restricted seccomp
> > > > > > environment.
> > > > > > 
> > > > > > The new device was only introduced in v6.1 so we need a header update.
> > > > > > 
> > > > > > Please have a look, thanks.
> > > > > 
> > > > > I was wondering whether it would make sense/be possible for mgmt app
> > > > > (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> > > > > itself. But looking into the code, libvirt would need to do that when
> > > > > spawning QEMU because that's when QEMU itself initializes internal state
> > > > > and queries userfaultfd caps.
> > > > 
> > > > You also have to be careful about what the userfaultfd semantics are; I
> > > > can't remember them - but if you open it in one process and pass it to
> > > > another process, which processes address space are you trying to
> > > > monitor?
> > > 
> > > Yes it's a problem.  The kernel always fetches the current mm_struct* which
> > > represents the current context of virtual address space when creating the
> > > uffd handle (for either the syscall or the ioctl() approach).
> > 
> > At what point does the process address space get associated ? When
> > the /dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
> > is called ?  If it is the former, then we have no choice, QEMU must open
> > it. if it is the latter, then libvirt can open /dev/userfaultfd, pass
> > it to QEMU which can then do the ioctl(USERFAULTFD_IOC_NEW).
> 
> Good point.. It should be the latter, so should be doable.
> 
> What should be the best interface for QEMU to detect the fd passing over to
> it?  IIUC qemu_open() requires the name to be /dev/fdset/*, but there's no
> existing cmdline that QEMU can know which fd number to fetch from fdset to
> be used as the /dev/userfaultfd descriptor.
> 
> monitor_get_fd() seems more proper, where we can define an unique string so
> Libvirt can preset the descriptor with the same string attached to it, then
> I can opt-in monitor_get_fd() before trying to open() or doing the syscall.

Daniel/Michal, any input here from Libvirt side?

I just noticed that monitor_get_fd() is bound to a specific monitor, so
it's not clear which one belongs to Libvirt.  If we use qemu_open() and
add-fd, I think we need another QEMU cmdline option to set the fd path, iiuc.

I can also leave that for later if opening /dev/userfaultfd already
resolves the immediate problem in containers.

Thanks,

-- 
Peter Xu


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Daniel P. Berrangé 1 year, 3 months ago
On Tue, Jan 31, 2023 at 02:48:54PM -0500, Peter Xu wrote:
> On Thu, Jan 26, 2023 at 12:26:45PM -0500, Peter Xu wrote:
> > On Thu, Jan 26, 2023 at 03:59:33PM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> > > > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Michal Prívozník (mprivozn@redhat.com) wrote:
> > > > > > On 1/25/23 23:40, Peter Xu wrote:
> > > > > > > The new /dev/userfaultfd handle is superior to the system call with a
> > > > > > > better permission control and also works for a restricted seccomp
> > > > > > > environment.
> > > > > > > 
> > > > > > > The new device was only introduced in v6.1 so we need a header update.
> > > > > > > 
> > > > > > > Please have a look, thanks.
> > > > > > 
> > > > > > I was wondering whether it would make sense/be possible for mgmt app
> > > > > > (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> > > > > > itself. But looking into the code, libvirt would need to do that when
> > > > > > spawning QEMU because that's when QEMU itself initializes internal state
> > > > > > and queries userfaultfd caps.
> > > > > 
> > > > > You also have to be careful about what the userfaultfd semantics are; I
> > > > > can't remember them - but if you open it in one process and pass it to
> > > > > another process, which processes address space are you trying to
> > > > > monitor?
> > > > 
> > > > Yes it's a problem.  The kernel always fetches the current mm_struct* which
> > > > represents the current context of virtual address space when creating the
> > > > uffd handle (for either the syscall or the ioctl() approach).
> > > 
> > > At what point does the process address space get associated ? When
> > > the /dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
> > > is called ?  If it is the former, then we have no choice, QEMU must open
> > > it. if it is the latter, then libvirt can open /dev/userfaultfd, pass
> > > it to QEMU which can then do the ioctl(USERFAULTFD_IOC_NEW).
> > 
> > Good point.. It should be the latter, so should be doable.
> > 
> > What should be the best interface for QEMU to detect the fd passing over to
> > it?  IIUC qemu_open() requires the name to be /dev/fdset/*, but there's no
> > existing cmdline that QEMU can know which fd number to fetch from fdset to
> > be used as the /dev/userfaultfd descriptor.
> > 
> > monitor_get_fd() seems more proper, where we can define an unique string so
> > Libvirt can preset the descriptor with the same string attached to it, then
> > I can opt-in monitor_get_fd() before trying to open() or doing the syscall.
> 
> Daniel/Michal, any input here from Libvirt side?
> 
> I just noticed that monitor_get_fd() is bound to a specific monitor, then
> it seems not clear which one is from Libvirt.  If to use qemu_open() and
> add-fd I think we need another QEMU cmdline to set the fd path, iiuc.
> 
> I can also leave that for later if opening /dev/userfaultfd is already
> resolving the immediate problem in containers.

I don't have any great ideas really. If we assume that /dev/userfaultfd
is accessible to QEMU, we can ignore it.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
On Tue, Jan 31, 2023 at 08:06:55PM +0000, Daniel P. Berrangé wrote:
> On Tue, Jan 31, 2023 at 02:48:54PM -0500, Peter Xu wrote:
> > On Thu, Jan 26, 2023 at 12:26:45PM -0500, Peter Xu wrote:
> > > On Thu, Jan 26, 2023 at 03:59:33PM +0000, Daniel P. Berrangé wrote:
> > > > On Thu, Jan 26, 2023 at 10:25:05AM -0500, Peter Xu wrote:
> > > > > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Michal Prívozník (mprivozn@redhat.com) wrote:
> > > > > > > On 1/25/23 23:40, Peter Xu wrote:
> > > > > > > > The new /dev/userfaultfd handle is superior to the system call with a
> > > > > > > > better permission control and also works for a restricted seccomp
> > > > > > > > environment.
> > > > > > > > 
> > > > > > > > The new device was only introduced in v6.1 so we need a header update.
> > > > > > > > 
> > > > > > > > Please have a look, thanks.
> > > > > > > 
> > > > > > > I was wondering whether it would make sense/be possible for mgmt app
> > > > > > > (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> > > > > > > itself. But looking into the code, libvirt would need to do that when
> > > > > > > spawning QEMU because that's when QEMU itself initializes internal state
> > > > > > > and queries userfaultfd caps.
> > > > > > 
> > > > > > You also have to be careful about what the userfaultfd semantics are; I
> > > > > > can't remember them - but if you open it in one process and pass it to
> > > > > > another process, which processes address space are you trying to
> > > > > > monitor?
> > > > > 
> > > > > Yes it's a problem.  The kernel always fetches the current mm_struct* which
> > > > > represents the current context of virtual address space when creating the
> > > > > uffd handle (for either the syscall or the ioctl() approach).
> > > > 
> > > > At what point does the process address space get associated ? When
> > > > the /dev/userfaultfd is opened, or only when ioctl(USERFAULTFD_IOC_NEW)
> > > > is called ?  If it is the former, then we have no choice, QEMU must open
> > > > it. if it is the latter, then libvirt can open /dev/userfaultfd, pass
> > > > it to QEMU which can then do the ioctl(USERFAULTFD_IOC_NEW).
> > > 
> > > Good point.. It should be the latter, so should be doable.
> > > 
> > > What should be the best interface for QEMU to detect the fd passing over to
> > > it?  IIUC qemu_open() requires the name to be /dev/fdset/*, but there's no
> > > existing cmdline that QEMU can know which fd number to fetch from fdset to
> > > be used as the /dev/userfaultfd descriptor.
> > > 
> > > monitor_get_fd() seems more proper, where we can define an unique string so
> > > Libvirt can preset the descriptor with the same string attached to it, then
> > > I can opt-in monitor_get_fd() before trying to open() or doing the syscall.
> > 
> > Daniel/Michal, any input here from Libvirt side?
> > 
> > I just noticed that monitor_get_fd() is bound to a specific monitor, then
> > it seems not clear which one is from Libvirt.  If to use qemu_open() and
> > add-fd I think we need another QEMU cmdline to set the fd path, iiuc.
> > 
> > I can also leave that for later if opening /dev/userfaultfd is already
> > resolving the immediate problem in containers.
> 
> I don't have any great ideas really. If we assume the /dev/userfaultfd
> is accessible to QEMU we can ignore it.

It's my understanding that the QEMU process will be invoked by a user or
group that has access to /dev/userfaultfd, probably in the same context as
what Libvirt specified. So hopefully everything will work out naturally
already.

There's one thing I'm unsure about in introducing a new qemu cmdline
option: I can't remember where I got this from, but IIRC Paolo suggested at
some point to reduce or forbid introducing new options to QEMU.

To remedy that, we can also add a migration parameter which will point to
/dev/userfaultfd (which can be set to "/dev/fdset/N" by Libvirt in QMP in
QEMU's early stage). Considering that so far most of the uffd features are
used by the migration submodule, IMHO it's fine to do so.

That said, I think we can always work on top of this series if that'll be
useful to libvirt some day; the change should be trivial.  So I can keep
this series simple.

I'll wait 1-2 more days to see whether Michal has anything to comment.

Thanks,

-- 
Peter Xu


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Michal Prívozník 1 year, 3 months ago
On 1/31/23 22:01, Peter Xu wrote:
> I'll wait 1-2 more days to see whether Michal has anything to comment.

Yeah, we can go with your patches and leave FD passing for future work.
It's orthogonal after all.

In the end we can have (in the order of precedence):

1) new cmd line argument, say:

   qemu-system-x86_64 -userfaultfd fd=5

   where FD 5 is passed by libvirt when exec()-ing qemu, just like other
   FDs, e.g. -chardev socket,fd=XXX

2) your patches, where qemu opens /dev/userfaultfd

3) current behavior, userfaultfd syscall


Michal
Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
On Wed, Feb 01, 2023 at 08:55:01AM +0100, Michal Prívozník wrote:
> On 1/31/23 22:01, Peter Xu wrote:
> > I'll wait 1-2 more days to see whether Michal has anything to comment.
> 
> Yeah, we can go with your patches and leave FD passing for future work.
> It's orthogonal after all.
> 
> In the end we can have (in the order of precedence):
> 
> 1) new cmd line argument, say:
> 
>    qemu-system-x86_64 -userfaultfd fd=5 # where FD 5 is passed by
> libvirt when exec()-ing qemu, just like other FDs, e.g. -chardev
> socket,fd=XXX
> 
> 2) your patches, where qemu opens /dev/userfaultfd
> 
> 3) current behavior, userfaultfd syscall

Sounds good.  Thanks.

-- 
Peter Xu


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Michal Prívozník 1 year, 3 months ago
On 1/26/23 16:25, Peter Xu wrote:
> On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
>> * Michal Prívozník (mprivozn@redhat.com) wrote:
>>> On 1/25/23 23:40, Peter Xu wrote:
>>>> The new /dev/userfaultfd handle is superior to the system call with a
>>>> better permission control and also works for a restricted seccomp
>>>> environment.
>>>>
>>>> The new device was only introduced in v6.1 so we need a header update.
>>>>
>>>> Please have a look, thanks.
>>>
>>> I was wondering whether it would make sense/be possible for mgmt app
>>> (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
>>> itself. But looking into the code, libvirt would need to do that when
>>> spawning QEMU because that's when QEMU itself initializes internal state
>>> and queries userfaultfd caps.
>>
>> You also have to be careful about what the userfaultfd semantics are; I
>> can't remember them - but if you open it in one process and pass it to
>> another process, which processes address space are you trying to
>> monitor?
> 
> Yes it's a problem.  The kernel always fetches the current mm_struct* which
> represents the current context of virtual address space when creating the
> uffd handle (for either the syscall or the ioctl() approach).

Ah, I did not realize that.

> 
> It works only if Libvirt will invoke QEMU as a thread and they'll share the
> same address space.
> 
> Why libvirt would like to do so?

Well, we tend to pass files as FDs more and more, because it allows us to
give access to "privileged" files to an unprivileged process. What I did
not realize is that userfaultfd is different, not just another file.

Michal


Re: [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd
Posted by Peter Xu 1 year, 3 months ago
On Thu, Jan 26, 2023 at 04:29:10PM +0100, Michal Prívozník wrote:
> On 1/26/23 16:25, Peter Xu wrote:
> > On Thu, Jan 26, 2023 at 02:15:11PM +0000, Dr. David Alan Gilbert wrote:
> >> * Michal Prívozník (mprivozn@redhat.com) wrote:
> >>> On 1/25/23 23:40, Peter Xu wrote:
> >>>> The new /dev/userfaultfd handle is superior to the system call with a
> >>>> better permission control and also works for a restricted seccomp
> >>>> environment.
> >>>>
> >>>> The new device was only introduced in v6.1 so we need a header update.
> >>>>
> >>>> Please have a look, thanks.
> >>>
> >>> I was wondering whether it would make sense/be possible for mgmt app
> >>> (libvirt) to pass FD for /dev/userfaultfd instead of QEMU opening it
> >>> itself. But looking into the code, libvirt would need to do that when
> >>> spawning QEMU because that's when QEMU itself initializes internal state
> >>> and queries userfaultfd caps.
> >>
> >> You also have to be careful about what the userfaultfd semantics are; I
> >> can't remember them - but if you open it in one process and pass it to
> >> another process, which processes address space are you trying to
> >> monitor?
> > 
> > Yes it's a problem.  The kernel always fetches the current mm_struct* which
> > represents the current context of virtual address space when creating the
> > uffd handle (for either the syscall or the ioctl() approach).
> 
> Ah, I did not realize that.
> 
> > 
> > It works only if Libvirt will invoke QEMU as a thread and they'll share the
> > same address space.
> > 
> > Why libvirt would like to do so?
> 
> Well, we tend to pass files as FD more and more, because it allows us to
> give access to "privileged" files to unprivileged process. What I did
> not realize is that userfaultfd is different, not yet another file.

I see.  Yes, uffd is special compared to most other fds, IMHO mainly
because it is not a public resource but is closely bound to the process
context of the mm.

There used to be proposals to grant permission to open a uffd handle for
other processes, but the security implications were still not fully clear
and that discussion was discontinued.

Then the question is whether there is still any scenario in which QEMU may
not have the privilege to either open /dev/userfaultfd or use the syscall.

Thanks,

-- 
Peter Xu