[v1] RE: [RFC PATCH 00/18] KVM: Post-copy live migration for guest_memfd

Posted by Wang, Wei W 1 year, 6 months ago

On Thursday, July 11, 2024 7:42 AM, James Houghton wrote:
> This patch series implements the KVM-based demand paging system that was
> first introduced back in November[1] by David Matlack.
> 
> The working name for this new system is KVM Userfault, but that name is very
> confusing so it will not be the final name.
> 
Hi James,
I have implemented a similar approach for TDX post-copy migration, though there are
quite a few differences. I have some questions about your design below.

> Problem: post-copy with guest_memfd
> ===================================
> 
> Post-copy live migration makes it possible to migrate VMs from one host to
> another no matter how fast they are writing to memory while keeping the VM
> paused for a minimal amount of time. For post-copy to work, we
> need:
>  1. to be able to prevent KVM from being able to access particular pages
>     of guest memory until we have populated it
>  2. for userspace to know when KVM is trying to access a particular
>     page.
>  3. a way to allow the access to proceed.
> 
> Traditionally, post-copy live migration is implemented using userfaultfd, which
> hooks into the main mm fault path. KVM hits this path when it is doing HVA ->
> PFN translations (with GUP) or when it itself attempts to access guest memory.
> Userfaultfd sends a page fault notification to userspace, and KVM goes to sleep.
> 
> Userfaultfd works well, as it is not specific to KVM; everyone who attempts to
> access guest memory will block the same way.
> 
> However, with guest_memfd, we do not use GUP to translate from GFN to HPA
> (nor is there an intermediate HVA).
> 
> So userfaultfd in its current form cannot be used to support post-copy live
> migration with guest_memfd-backed VMs.
> 
> Solution: hook into the gfn -> pfn translation
> ==============================================
> 
> The only way to implement post-copy with a non-KVM-specific userfaultfd-like
> system would be to introduce the concept of a file-userfault[2] to intercept
> faults on a guest_memfd.
> 
> Instead, we take the simpler approach of adding a KVM-specific API, and we
> hook into the GFN -> HVA or GFN -> PFN translation steps (for traditional
> memslots and for guest_memfd respectively).


Why take KVM_EXIT_MEMORY_FAULT exits for the traditional shared
pages (i.e. GFN -> HVA)?
It seems simpler to use KVM_EXIT_MEMORY_FAULT for private pages only, leaving
shared pages to go through the existing userfaultfd mechanism:
- The need for “asynchronous userfaults,” introduced by patch 14, could be eliminated.
- The additional support (e.g., KVM_MEMORY_EXIT_FLAG_USERFAULT) for private page
  faults exiting to userspace for postcopy might not be necessary, because all pages on the
  destination side are initially “shared,” and the guest’s first access will always cause an
  exit to userspace for shared->private conversion. So the VMM can leverage that exit to
  fetch the page data from the source (the VMM knows whether a page's data has already
  been fetched from the source).

> 
> I have intentionally added support for traditional memslots, as the complexity
> that it adds is minimal, and it is useful for some VMMs, as it can be used to
> fully implement post-copy live migration.
> 
> Implementation Details
> ======================
> 
> Let's break down how KVM implements each of the three core requirements
> for implementing post-copy as laid out above:
> 
> --- Preventing access: KVM_MEMORY_ATTRIBUTE_USERFAULT ---
> 
> The most straightforward way to inform KVM of userfault-enabled pages is to
> use a new memory attribute, say KVM_MEMORY_ATTRIBUTE_USERFAULT.
> 
> There is already infrastructure in place for modifying and checking memory
> attributes. Using this interface is slightly challenging, as there is no UAPI for
> setting/clearing particular attributes; we must set the exact attributes we want.
> 
> The synchronization that is in place for updating memory attributes is not
> suitable for post-copy live migration either, which will require updating
> memory attributes (from userfault to no-userfault) very frequently.
> 
> Another potential interface could be to use something akin to a dirty bitmap,
> where a bitmap describes which pages within a memslot (or VM) should trigger
> userfaults. This way, it is straightforward to make updates to the userfault
> status of a page cheap.
> 
> When KVM Userfault is enabled, we need to be careful not to map a userfault
> page in response to a fault on a non-userfault page. In this RFC, I've taken the
> simplest approach: force new PTEs to be PAGE_SIZE.
> 
> --- Page fault notifications ---
> 
> For page faults generated by vCPUs running in guest mode, if the page the
> vCPU is trying to access is a userfault-enabled page, we use

Why is it necessary to add the per-page control (with uAPIs for the VMM to set/clear)?
Are there any functional issues if we just have all page faults exit to userspace during the
post-copy period?
- As mentioned above, userspace can easily tell whether a page needs to be
  fetched from the source, so upon a fault exit to userspace, the VMM can
  decide to block the faulting vCPU thread or return to KVM immediately.
- If an improvement is really needed (this would need profiling first) to reduce the number
  of exits to userspace, a KVM-internal status (bitmap or xarray) seems sufficient.
  Each page only needs to exit to userspace once for the purpose of fetching its data
  from the source in postcopy. It doesn't seem necessary for userspace to enable the exit
  again for the page (via a new uAPI), right?
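
For illustration, the VMM-side decision described in the first bullet could look
roughly like the sketch below. It is not code from this series or from any VMM:
the state names, vmm_on_fault_exit() and vmm_page_installed() are hypothetical,
and it assumes a single thread touches the state array (a real VMM would need
locking or atomics).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMM-side view of one guest page during post-copy. */
enum page_state {
	PAGE_NOT_FETCHED = 0,	/* data still lives on the source */
	PAGE_IN_FLIGHT,		/* request sent, data not yet installed */
	PAGE_PRESENT,		/* data installed on the destination */
};

/* What the vCPU thread should do after a fault exit (hypothetical). */
enum fault_action {
	ACTION_REQUEST_AND_WAIT,	/* ask the source for the page, then block */
	ACTION_WAIT,			/* already requested; just block */
	ACTION_RESUME,			/* nothing to fetch; return to KVM */
};

/* Decide how to handle a fault exit for the page at gfn_offset. */
enum fault_action vmm_on_fault_exit(enum page_state *states, size_t npages,
				    uint64_t gfn_offset)
{
	assert(gfn_offset < npages);

	switch (states[gfn_offset]) {
	case PAGE_NOT_FETCHED:
		/* First fault on this page: remember that a request is out. */
		states[gfn_offset] = PAGE_IN_FLIGHT;
		return ACTION_REQUEST_AND_WAIT;
	case PAGE_IN_FLIGHT:
		return ACTION_WAIT;
	case PAGE_PRESENT:
	default:
		return ACTION_RESUME;
	}
}

/* Called once the page data from the source has been installed. */
void vmm_page_installed(enum page_state *states, size_t npages,
			uint64_t gfn_offset)
{
	assert(gfn_offset < npages);
	states[gfn_offset] = PAGE_PRESENT;
}
```

The in-flight state is only there so that a second vCPU faulting on the same
page does not send a duplicate request to the source.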

Re: [RFC PATCH 00/18] KVM: Post-copy live migration for guest_memfd

Posted by James Houghton 1 year, 6 months ago

On Mon, Jul 15, 2024 at 8:28 AM Wang, Wei W <wei.w.wang@intel.com> wrote:
>
> On Thursday, July 11, 2024 7:42 AM, James Houghton wrote:
> > This patch series implements the KVM-based demand paging system that was
> > first introduced back in November[1] by David Matlack.
> >
> > The working name for this new system is KVM Userfault, but that name is very
> > confusing so it will not be the final name.
> >
> Hi James,
> I had implemented a similar approach for TDX post-copy migration, there are quite
> some differences though. Got some questions about your design below.

Thanks for the feedback!!

>
> > Problem: post-copy with guest_memfd
> > ===================================
> >
> > Post-copy live migration makes it possible to migrate VMs from one host to
> > another no matter how fast they are writing to memory while keeping the VM
> > paused for a minimal amount of time. For post-copy to work, we
> > need:
> >  1. to be able to prevent KVM from being able to access particular pages
> >     of guest memory until we have populated it
> >  2. for userspace to know when KVM is trying to access a particular
> >     page.
> >  3. a way to allow the access to proceed.
> >
> > Traditionally, post-copy live migration is implemented using userfaultfd, which
> > hooks into the main mm fault path. KVM hits this path when it is doing HVA ->
> > PFN translations (with GUP) or when it itself attempts to access guest memory.
> > Userfaultfd sends a page fault notification to userspace, and KVM goes to sleep.
> >
> > Userfaultfd works well, as it is not specific to KVM; everyone who attempts to
> > access guest memory will block the same way.
> >
> > However, with guest_memfd, we do not use GUP to translate from GFN to HPA
> > (nor is there an intermediate HVA).
> >
> > So userfaultfd in its current form cannot be used to support post-copy live
> > migration with guest_memfd-backed VMs.
> >
> > Solution: hook into the gfn -> pfn translation
> > ==============================================
> >
> > The only way to implement post-copy with a non-KVM-specific userfaultfd-like
> > system would be to introduce the concept of a file-userfault[2] to intercept
> > faults on a guest_memfd.
> >
> > Instead, we take the simpler approach of adding a KVM-specific API, and we
> > hook into the GFN -> HVA or GFN -> PFN translation steps (for traditional
> > memslots and for guest_memfd respectively).
>
>
> Why taking KVM_EXIT_MEMORY_FAULT faults for the traditional shared
> pages (i.e. GFN -> HVA)?
> It seems simpler if we use KVM_EXIT_MEMORY_FAULT for private pages only, leaving
> shared pages to go through the existing userfaultfd mechanism:
> - The need for “asynchronous userfaults,” introduced by patch 14, could be eliminated.
> - The additional support (e.g., KVM_MEMORY_EXIT_FLAG_USERFAULT) for private page
>   faults exiting to userspace for postcopy might not be necessary, because all pages on the
>   destination side are initially “shared,” and the guest’s first access will always cause an
>   exit to userspace for shared->private conversion. So VMM is able to leverage the exit to
>   fetch the page data from the source (VMM can know if a page data has been fetched
>   from the source or not).

You're right that, today, including support for guest-private memory
*only* indeed simplifies things (no async userfaults). I think your
strategy for implementing post-copy would work (so, shared->private
conversion faults for vCPU accesses to private memory, and userfaultfd
for everything else).

I'm not 100% sure what should happen in the case of a non-vCPU access
to should-be-private memory; today it seems like KVM just provides the
shared version of the page, so conventional use of userfaultfd
shouldn't break anything.

But eventually guest_memfd itself will support "shared" memory, and
(IIUC) it won't use VMAs, so userfaultfd won't be usable (without
changes anyway). For a non-confidential VM, all memory will be
"shared", so shared->private conversions can't help us there either.
Starting everything as private almost works (so using private->shared
conversions as a notification mechanism), but if the first time KVM
attempts to use a page is not from a vCPU (and is from a place where
we cannot easily return to userspace), the need for "async userfaults"
comes back.

For this use case, it seems cleaner to have a new interface. (And, as
far as I can tell, we would at least need some kind of "async
userfault"-like mechanism.)

Another reason why, today, KVM Userfault is helpful is that
userfaultfd has a couple drawbacks. Userfaultfd migration with
HugeTLB-1G is basically unusable, as HugeTLB pages cannot be mapped at
PAGE_SIZE. Some discussion here[1][2].

Moving the implementation of post-copy to KVM means that, throughout
post-copy, we can avoid changes to the main mm page tables, and we
only need to modify the second stage page tables. This saves the
memory needed to store the extra set of shattered page tables, and we
save the performance overhead of the page table modifications and
accounting that mm does.
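
As a rough order of magnitude for the page-table memory mentioned above (a
back-of-the-envelope sketch, not a measurement): assuming x86-64 with 4 KiB base
pages and 8-byte entries (so 512 entries per table), splitting a 1 GiB leaf
mapping down to 4 KiB needs one PMD table plus 512 PTE tables. The helper name
split_pt_overhead_bytes() is made up for this example, and it assumes the
region is a multiple of 1 GiB.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_BYTES	4096ULL			/* size of one page-table page */
#define SZ_2M		(4096ULL * 512)		/* range covered by one PTE table */
#define SZ_1G		(SZ_2M * 512)		/* range covered by one PMD table */

/*
 * Bytes of PMD and PTE tables needed to map `bytes` of memory with 4 KiB
 * entries instead of 1 GiB leaf entries. `bytes` must be a multiple of 1 GiB.
 */
uint64_t split_pt_overhead_bytes(uint64_t bytes)
{
	uint64_t pte_tables, pmd_tables;

	assert(bytes % SZ_1G == 0);

	pte_tables = bytes / SZ_2M;	/* 512 per GiB */
	pmd_tables = bytes / SZ_1G;	/* 1 per GiB */

	return (pte_tables + pmd_tables) * PT_PAGE_BYTES;
}
```

Under these assumptions that is 513 page-table pages (a little over 2 MiB) per
GiB of guest memory, so roughly 2 GiB for a 1 TiB guest.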

There's some more discussion about these points in David's RFC[3].

[1]: https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/
[2]: https://lore.kernel.org/linux-mm/ZdcKwK7CXgEsm-Co@x1n/
[3]: https://lore.kernel.org/kvm/CALzav=d23P5uE=oYqMpjFohvn0CASMJxXB_XEOEi-jtqWcFTDA@mail.gmail.com/

>
> >

> > I have intentionally added support for traditional memslots, as the complexity
> > that it adds is minimal, and it is useful for some VMMs, as it can be used to
> > fully implement post-copy live migration.
> >
> > Implementation Details
> > ======================
> >
> > Let's break down how KVM implements each of the three core requirements
> > for implementing post-copy as laid out above:
> >
> > --- Preventing access: KVM_MEMORY_ATTRIBUTE_USERFAULT ---
> >
> > The most straightforward way to inform KVM of userfault-enabled pages is to
> > use a new memory attribute, say KVM_MEMORY_ATTRIBUTE_USERFAULT.
> >
> > There is already infrastructure in place for modifying and checking memory
> > attributes. Using this interface is slightly challenging, as there is no UAPI for
> > setting/clearing particular attributes; we must set the exact attributes we want.
> >
> > The synchronization that is in place for updating memory attributes is not
> > suitable for post-copy live migration either, which will require updating
> > memory attributes (from userfault to no-userfault) very frequently.
> >
> > Another potential interface could be to use something akin to a dirty bitmap,
> > where a bitmap describes which pages within a memslot (or VM) should trigger
> > userfaults. This way, it is straightforward to make updates to the userfault
> > status of a page cheap.
> >
> > When KVM Userfault is enabled, we need to be careful not to map a userfault
> > page in response to a fault on a non-userfault page. In this RFC, I've taken the
> > simplest approach: force new PTEs to be PAGE_SIZE.
> >
> > --- Page fault notifications ---
> >
> > For page faults generated by vCPUs running in guest mode, if the page the
> > vCPU is trying to access is a userfault-enabled page, we use
>
> Why is it necessary to add the per-page control (with uAPIs for VMM to set/clear)?
> Any functional issues if we just have all the page faults exit to userspace during the
> post-copy period?
> - As also mentioned above, userspace can easily know if a page needs to be
>   fetched from the source or not, so upon a fault exit to userspace, VMM can
>   decide to block the faulting vcpu thread or return back to KVM immediately.
> - If improvement is really needed (would need profiling first) to reduce number
>   of exits to userspace, a  KVM internal status (bitmap or xarray) seems sufficient.
>   Each page only needs to exit to userspace once for the purpose of fetching its data
>   from the source in postcopy. It doesn't seem to need userspace to enable the exit
>   again for the page (via a new uAPI), right?

We don't necessarily need a way to go from no-fault -> fault for a
page, that's right[4]. But we do need a way for KVM to be able to
allow the access to proceed (i.e., go from fault -> no-fault). IOW, if
we get a fault and come out to userspace, we need a way to tell KVM
not to do that again. In the case of shared->private conversions, that
mechanism is toggling the memory attributes for a gfn. For
conventional userfaultfd, that's using UFFDIO_COPY/CONTINUE/POISON.
Maybe I'm misunderstanding your question.

[4]: It is helpful for poison emulation for HugeTLB-backed VMs today,
but this is not important.
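
To make the fault -> no-fault transition concrete, here is a minimal userspace
model of the bitmap idea from the cover letter. It is only a sketch: the struct
and function names are hypothetical, it is not the interface proposed in this
series, and it uses plain (non-atomic) bit operations, so it assumes a single
thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define UF_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-memslot bitmap: bit set => access must fault to userspace. */
struct userfault_bitmap {
	uint64_t npages;
	unsigned long *words;
};

/* Start post-copy with every page marked userfault. Returns 0 on success. */
int userfault_bitmap_init(struct userfault_bitmap *b, uint64_t npages)
{
	size_t nwords = (npages + UF_BITS_PER_WORD - 1) / UF_BITS_PER_WORD;

	if (npages == 0)
		return -1;

	b->words = malloc(nwords * sizeof(*b->words));
	if (!b->words)
		return -1;

	memset(b->words, 0xff, nwords * sizeof(*b->words));
	b->npages = npages;
	return 0;
}

/* Fault path: should an access to page idx exit to userspace? */
bool userfault_pending(const struct userfault_bitmap *b, uint64_t idx)
{
	assert(idx < b->npages);
	return (b->words[idx / UF_BITS_PER_WORD] &
		(1UL << (idx % UF_BITS_PER_WORD))) != 0;
}

/* fault -> no-fault: called after the page contents have been populated. */
void userfault_resolve(struct userfault_bitmap *b, uint64_t idx)
{
	assert(idx < b->npages);
	b->words[idx / UF_BITS_PER_WORD] &= ~(1UL << (idx % UF_BITS_PER_WORD));
}

void userfault_bitmap_destroy(struct userfault_bitmap *b)
{
	free(b->words);
	b->words = NULL;
	b->npages = 0;
}
```

Only the clear direction is modelled, since that is the transition that is
strictly needed; there is deliberately no helper to set a bit again.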

RE: [RFC PATCH 00/18] KVM: Post-copy live migration for guest_memfd

Posted by Wang, Wei W 1 year, 6 months ago

On Wednesday, July 17, 2024 1:10 AM, James Houghton wrote:
> You're right that, today, including support for guest-private memory
> *only* indeed simplifies things (no async userfaults). I think your strategy for
> implementing post-copy would work (so, shared->private conversion faults for
> vCPU accesses to private memory, and userfaultfd for everything else).

Yes, it works and has been used for our internal tests.

> 
> I'm not 100% sure what should happen in the case of a non-vCPU access to
> should-be-private memory; today it seems like KVM just provides the shared
> version of the page, so conventional use of userfaultfd shouldn't break
> anything.

This seems to be the trusted IO usage (I'm not aware of other usages; emulated device
backends, such as vhost, work with shared pages). Migration support for trusted device
passthrough doesn't seem to be architecturally ready yet. Especially for postcopy,
AFAIK, even the legacy VM case lacks support for device passthrough (not sure if
you've made it work internally). So it seems too early to discuss this in detail.


> 
> But eventually guest_memfd itself will support "shared" memory, 

OK, I thought of this. Not sure how feasible it would be to extend gmem for
shared memory. I think questions like the ones below need to be investigated:
#1 What are the tangible benefits of gmem-based shared memory, compared to the
   legacy shared memory that we have now?
#2 There would be some gaps to close to make gmem usable for shared pages. For
   example, would it support mapping by userspace (without security concerns)?
#3 If gmem gets extended to be something like hugetlb (e.g. 1GB), would it result
   in the same issue as hugetlb?

The support of using gmem for shared memory isn't in place yet, and this seems
to be a dependency for the support being added here.

> and
> (IIUC) it won't use VMAs, so userfaultfd won't be usable (without changes
> anyway). For a non-confidential VM, all memory will be "shared", so
> shared->private conversions can't help us there either.
> Starting everything as private almost works (so using private->shared
> conversions as a notification mechanism), but if the first time KVM attempts to
> use a page is not from a vCPU (and is from a place where we cannot easily
> return to userspace), the need for "async userfaults"
> comes back.

Yeah, this needs to be resolved for KVM userfaults. If gmem is used for private
pages only, this wouldn't be an issue (it will be covered by userfaultfd).


> 
> For this use case, it seems cleaner to have a new interface. (And, as far as I can
> tell, we would at least need some kind of "async userfault"-like mechanism.)
> 
> Another reason why, today, KVM Userfault is helpful is that userfaultfd has a
> couple drawbacks. Userfaultfd migration with HugeTLB-1G is basically
> unusable, as HugeTLB pages cannot be mapped at PAGE_SIZE. Some discussion
> here[1][2].
> 
> Moving the implementation of post-copy to KVM means that, throughout
> post-copy, we can avoid changes to the main mm page tables, and we only
> need to modify the second stage page tables. This saves the memory needed
> to store the extra set of shattered page tables, and we save the performance
> overhead of the page table modifications and accounting that mm does.

It would be nice to see some data comparing KVM faults and userfaultfd,
e.g., the end-to-end latency of handling a page fault by getting data from the source.
(I didn't find such data in the links you shared. Please correct me if I missed it.)


> We don't necessarily need a way to go from no-fault -> fault for a page, that's
> right[4]. But we do need a way for KVM to be able to allow the access to
> proceed (i.e., go from fault -> no-fault). IOW, if we get a fault and come out to
> userspace, we need a way to tell KVM not to do that again.
> In the case of shared->private conversions, that mechanism is toggling the memory
> attributes for a gfn.  For conventional userfaultfd, that's using
> UFFDIO_COPY/CONTINUE/POISON.
> Maybe I'm misunderstanding your question.

We can come back to this after the dependency discussion above is done. (If gmem is only
used for private pages, the support for postcopy, including the changes required for VMMs,
would be simpler.)

Re: [RFC PATCH 00/18] KVM: Post-copy live migration for guest_memfd

Posted by James Houghton 1 year, 6 months ago

On Wed, Jul 17, 2024 at 8:03 AM Wang, Wei W <wei.w.wang@intel.com> wrote:
>
> On Wednesday, July 17, 2024 1:10 AM, James Houghton wrote:
> > You're right that, today, including support for guest-private memory
> > *only* indeed simplifies things (no async userfaults). I think your strategy for
> > implementing post-copy would work (so, shared->private conversion faults for
> > vCPU accesses to private memory, and userfaultfd for everything else).
>
> Yes, it works and has been used for our internal tests.
>
> >
> > I'm not 100% sure what should happen in the case of a non-vCPU access to
> > should-be-private memory; today it seems like KVM just provides the shared
> > version of the page, so conventional use of userfaultfd shouldn't break
> > anything.
>
> This seems to be the trusted IO usage (not aware of other usages, emulated device
> backends, such as vhost, work with shared pages). Migration support for trusted device
> passthrough doesn't seem to be architecturally ready yet. Especially for postcopy,
> AFAIK, even the legacy VM case lacks the support for device passthrough (not sure if
> you've made it internally). So it seems too early to discuss this in detail.

We don't migrate VMs with passthrough devices.

I still think the way KVM handles non-vCPU accesses to private memory
is wrong: surely it is an error, yet we simply provide the shared
version of the page. *shrug*

>
> >
> > But eventually guest_memfd itself will support "shared" memory,
>
> OK, I thought of this. Not sure how feasible it would be to extend gmem for
> shared memory. I think questions like below need to be investigated:

An RFC for it got posted recently[1]. :)

> #1 what are the tangible benefits of gmem based shared memory, compared to the
>      legacy shared memory that we have now?

For [1], unmapping guest memory from the direct map.

> #2 There would be some gaps to make gmem usable for shared pages. For
>       example, would it support userspace to map (without security concerns)?

At least in [1], userspace would be able to mmap it, but KVM would
still not be able to GUP it (instead going through the normal
guest_memfd path).

> #3 if gmem gets extended to be something like hugetlb (e.g. 1GB), would it result
>      in the same issue as hugetlb?

Good question. At the end of the day, the problem is that GUP relies
on host mm page table mappings, and HugeTLB can't map things with
PAGE_SIZE PTEs.

At least as of [1], given that KVM doesn't GUP guest_memfd memory, we
don't rely on the host mm page table layout, so we don't have the same
problem.

VMMs that want to catch userspace (or non-GUP kernel) accesses via
a guest_memfd VMA may run into the same issue. But for
VMMs that don't care to catch these kinds of accesses (the kind of
user that would use KVM Userfault to implement post-copy), it doesn't
matter.

[1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/

>
> The support of using gmem for shared memory isn't in place yet, and this seems
> to be a dependency for the support being added here.

Perhaps I've been slightly preemptive. :) I still think there's useful
discussion here.

> > and
> > (IIUC) it won't use VMAs, so userfaultfd won't be usable (without changes
> > anyway). For a non-confidential VM, all memory will be "shared", so
> > shared->private conversions can't help us there either.
> > Starting everything as private almost works (so using private->shared
> > conversions as a notification mechanism), but if the first time KVM attempts to
> > use a page is not from a vCPU (and is from a place where we cannot easily
> > return to userspace), the need for "async userfaults"
> > comes back.
>
> Yeah, this needs to be resolved for KVM userfaults. If gmem is used for private
> pages only, this wouldn't be an issue (it will be covered by userfaultfd).

We're on the same page here.

>
>
> >
> > For this use case, it seems cleaner to have a new interface. (And, as far as I can
> > tell, we would at least need some kind of "async userfault"-like mechanism.)
> >
> > Another reason why, today, KVM Userfault is helpful is that userfaultfd has a
> > couple drawbacks. Userfaultfd migration with HugeTLB-1G is basically
> > unusable, as HugeTLB pages cannot be mapped at PAGE_SIZE. Some discussion
> > here[1][2].
> >
> > Moving the implementation of post-copy to KVM means that, throughout
> > post-copy, we can avoid changes to the main mm page tables, and we only
> > need to modify the second stage page tables. This saves the memory needed
> > to store the extra set of shattered page tables, and we save the performance
> > overhead of the page table modifications and accounting that mm does.
>
> It would be nice to see some data for comparisons between kvm faults and userfaultfd
> e.g., end to end latency of handling a page fault via getting data from the source.
> (I didn't find data from the link you shared. Please correct me if I missed it)

I don't have an A/B comparison for kernel end-to-end fault latency. :(
But I can tell you that with 32us or so network latency, it's not a
huge difference (assuming Anish's series[2]).

The real performance issue comes when we are collapsing the page
tables at the end. We basically have to do ~2x of everything (TLB
flushes, etc.), plus additional accounting that HugeTLB/THP does
(adjusting refcount/mapcount), etc. And one must optimize how the
unmap MMU notifiers are called so as to not stall vCPUs unnecessarily.

[2]: https://lore.kernel.org/kvm/20240215235405.368539-1-amoorthy@google.com/

>
>
> > We don't necessarily need a way to go from no-fault -> fault for a page, that's
> > right[4]. But we do need a way for KVM to be able to allow the access to
> > proceed (i.e., go from fault -> no-fault). IOW, if we get a fault and come out to
> > userspace, we need a way to tell KVM not to do that again.
> > In the case of shared->private conversions, that mechanism is toggling the memory
> > attributes for a gfn.  For conventional userfaultfd, that's using
> > UFFDIO_COPY/CONTINUE/POISON.
> > Maybe I'm misunderstanding your question.
>
> We can come back to this after the dependency discussion above is done. (If gmem is only
> used for private pages, the support for postcopy, including changes required for VMMs, would
> be simpler)

RE: [RFC PATCH 00/18] KVM: Post-copy live migration for guest_memfd

Posted by Wang, Wei W 1 year, 6 months ago

On Thursday, July 18, 2024 9:09 AM, James Houghton wrote:
> On Wed, Jul 17, 2024 at 8:03 AM Wang, Wei W <wei.w.wang@intel.com> wrote:
> >
> > On Wednesday, July 17, 2024 1:10 AM, James Houghton wrote:
> > > You're right that, today, including support for guest-private memory
> > > *only* indeed simplifies things (no async userfaults). I think your
> > > strategy for implementing post-copy would work (so, shared->private
> > > conversion faults for vCPU accesses to private memory, and userfaultfd for
> > > everything else).
> >
> > Yes, it works and has been used for our internal tests.
> >
> > >
> > > I'm not 100% sure what should happen in the case of a non-vCPU
> > > access to should-be-private memory; today it seems like KVM just
> > > provides the shared version of the page, so conventional use of
> > > userfaultfd shouldn't break anything.
> >
> > This seems to be the trusted IO usage (not aware of other usages,
> > emulated device backends, such as vhost, work with shared pages).
> > Migration support for trusted device passthrough doesn't seem to be
> > architecturally ready yet. Especially for postcopy, AFAIK, even the
> > legacy VM case lacks the support for device passthrough (not sure if you've
> > made it internally). So it seems too early to discuss this in detail.
> 
> We don't migrate VMs with passthrough devices.
> 
> I still think the way KVM handles non-vCPU accesses to private memory is
> wrong: surely it is an error, yet we simply provide the shared version of the
> page. *shrug*
> 
> >
> > >
> > > But eventually guest_memfd itself will support "shared" memory,
> >
> > OK, I thought of this. Not sure how feasible it would be to extend
> > gmem for shared memory. I think questions like below need to be
> > investigated:
> 
> An RFC for it got posted recently[1]. :)
> 
> > #1 what are the tangible benefits of gmem based shared memory, compared to the
> >      legacy shared memory that we have now?
> 
> For [1], unmapping guest memory from the direct map.
> 
> > #2 There would be some gaps to make gmem usable for shared pages. For
> >       example, would it support userspace to map (without security concerns)?
> 
> At least in [1], userspace would be able to mmap it, but KVM would still not be
> able to GUP it (instead going through the normal guest_memfd path).
> 
> > #3 if gmem gets extended to be something like hugetlb (e.g. 1GB), would it result
> >      in the same issue as hugetlb?
> 
> Good question. At the end of the day, the problem is that GUP relies on host
> mm page table mappings, and HugeTLB can't map things with PAGE_SIZE PTEs.
> 
> At least as of [1], given that KVM doesn't GUP guest_memfd memory, we don't
> rely on the host mm page table layout, so we don't have the same problem.
> 
> For VMMs that want to catch userspace (or non-GUP kernel) accesses via a
> guest_memfd VMA, then it's possible it has the same issue. But for VMMs that
> don't care to catch these kinds of accesses (the kind of user that would use
> KVM Userfault to implement post-copy), it doesn't matter.
> 
> [1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/

Ah, I overlooked this series, thanks for the reminder.
Let me check the details first.