mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings | Patchew

[PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Stanislav Kinsburskii posted 3 patches 3 weeks, 4 days ago

Documentation/mm/hmm.rst               |   62 +++++++++++
include/linux/hmm.h                    |    1
lib/test_hmm.c                         |  122 +++++++++++++++++++++
lib/test_hmm_uapi.h                    |    1
mm/hmm.c                               |  187 ++++++++++++++++++++++++--------
tools/testing/selftests/mm/hmm-tests.c |  149 +++++++++++++++++++++++++
6 files changed, 478 insertions(+), 44 deletions(-)

[PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Posted by Stanislav Kinsburskii 3 weeks, 4 days ago

This series extends the HMM framework to support userfaultfd-backed memory
by allowing the mmap read lock to be dropped during hmm_range_fault().

Some page fault handlers — most notably userfaultfd — require the mmap lock
to be released so that userspace can resolve the fault. The current HMM
interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
in pages from userfaultfd-registered regions.

This series follows the established int *locked pattern from
get_user_pages_remote() in mm/gup.c. A new entry point,
hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
the caller to restart its walk. The existing hmm_range_fault() is
refactored into a thin wrapper that passes NULL, preserving current
behavior for all existing callers.
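
The caller-side contract described above can be sketched in userspace
as follows. This is a mock, not kernel code: struct mock_mm, the
mock_read_lock()/mock_read_unlock() helpers, mock_range_fault_unlockable()
and mock_populate() are all hypothetical stand-ins, and the real
signature of hmm_range_fault_unlockable() is whatever the patches
define. The sketch only shows the "return 0 with *locked = 0, then
restart" loop that a caller would presumably need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an mm and its mmap lock; not a kernel type. */
struct mock_mm {
	bool read_locked;	/* models the mmap read lock being held */
	int faults_pending;	/* faults that can only be resolved unlocked */
};

static void mock_read_lock(struct mock_mm *mm)
{
	assert(!mm->read_locked);
	mm->read_locked = true;
}

static void mock_read_unlock(struct mock_mm *mm)
{
	assert(mm->read_locked);
	mm->read_locked = false;
}

/*
 * Mock of the contract in the cover letter: returns 0 with *locked == 0
 * when the lock had to be dropped, and 0 with *locked == 1 when the
 * range was populated with the lock still held.
 */
static int mock_range_fault_unlockable(struct mock_mm *mm, int *locked)
{
	assert(mm->read_locked && *locked == 1);
	if (mm->faults_pending > 0) {
		mm->faults_pending--;
		mock_read_unlock(mm);	/* e.g. userspace resolves the fault */
		*locked = 0;
	}
	return 0;
}

/* Caller loop: retake the lock and restart whenever it was dropped. */
static int mock_populate(struct mock_mm *mm, int *restarts)
{
	int locked, ret;

	*restarts = 0;
	for (;;) {
		mock_read_lock(mm);
		locked = 1;
		ret = mock_range_fault_unlockable(mm, &locked);
		if (locked) {
			/* Lock still held: the result (or error) is final. */
			mock_read_unlock(mm);
			return ret;
		}
		if (ret)
			return ret;	/* error, lock already dropped */
		(*restarts)++;		/* lock was dropped: restart the walk */
	}
}
```

The key point of the pattern is that the caller never unlocks when
*locked comes back as 0, because the callee has already done so.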

Faulting hugetlb pages on the unlockable path is not supported because
walk_hugetlb_range() unconditionally holds and releases
hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
inside the callback, the VMA may be freed before the walk framework's
unlock. Hugetlb pages already present in page tables are handled normally.
Possible approaches to lift this limitation are documented in
Documentation/mm/hmm.rst.

Changes in v3:
- Return -EFAULT from dmirror_fault_unlockable() when the mirrored mm can no longer be pinned.
- Add an eventfd stop signal for the userfaultfd handler thread to avoid waiting for the poll timeout on successful test completion.
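
As an illustration of the eventfd stop signal mentioned above, here is
a minimal userspace sketch, assuming the handler waits with poll() on
two descriptors. The names wait_for_event_or_stop(), request_stop() and
demo_stop() are hypothetical and not taken from hmm-tests.c; a second
eventfd stands in for the userfaultfd, and the wait is run inline
rather than in a separate thread to keep the example deterministic.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

enum wait_result { WAIT_EVENT, WAIT_STOPPED, WAIT_TIMEOUT, WAIT_ERROR };

/*
 * Wait for either work on event_fd (stand-in for the userfaultfd) or a
 * stop request on stop_fd. Without stop_fd, an idle handler could only
 * finish by running out the poll timeout.
 */
static enum wait_result wait_for_event_or_stop(int event_fd, int stop_fd,
					       int timeout_ms)
{
	struct pollfd fds[2] = {
		{ .fd = event_fd, .events = POLLIN },
		{ .fd = stop_fd,  .events = POLLIN },
	};
	int n = poll(fds, 2, timeout_ms);

	if (n < 0)
		return WAIT_ERROR;
	if (n == 0)
		return WAIT_TIMEOUT;
	if (fds[1].revents & POLLIN)
		return WAIT_STOPPED;	/* stop wins over pending events */
	if (fds[0].revents & POLLIN)
		return WAIT_EVENT;
	return WAIT_ERROR;
}

/* Raise the stop signal; an eventfd write takes an 8-byte counter. */
static int request_stop(int stop_fd)
{
	uint64_t one = 1;

	return write(stop_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Signal stop first, then wait: poll() returns without using the timeout. */
static enum wait_result demo_stop(void)
{
	int event_fd = eventfd(0, EFD_CLOEXEC);
	int stop_fd = eventfd(0, EFD_CLOEXEC);
	enum wait_result res = WAIT_ERROR;

	if (event_fd >= 0 && stop_fd >= 0 && request_stop(stop_fd) == 0)
		res = wait_for_event_or_stop(event_fd, stop_fd, 5000);
	if (event_fd >= 0)
		close(event_fd);
	if (stop_fd >= 0)
		close(stop_fd);
	return res;
}
```

In a threaded test, the main thread would call something like
request_stop() just before pthread_join(), so the join does not have to
wait for the handler's poll timeout.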

Changes in v2:
- Split into a preparatory refactor (new patch 1) that moves
  handle_mm_fault() out of the walk callbacks, plus a smaller feature
  patch on top. Suggested by David Hildenbrand.
- Hugetlb regions are now supported on the unlockable path; the v1
  -EFAULT short-circuit and the hugetlb_vma_lock_read drop/retake
  dance are gone.
- Distinct internal sentinels for "needs fault" (HMM_FAULT_PENDING)
  and "lock dropped" (HMM_FAULT_UNLOCKED).
- Outer loop now re-walks after a successful internal fault so the
  faulted pfns end up in range->hmm_pfns.
- Kernel-doc on hmm_range_fault_unlockable() and the
  Documentation/mm/hmm.rst example match the implementation.
- Dropped the mshv driver conversion (v1 patch 2); will post
  separately.
- Selftest converted to drive the path through test_hmm with a
  userfaultfd handler (new HMM_DMIRROR_READ_UNLOCKABLE ioctl).
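
The two sentinels and the re-walk loop from the list above can be
modelled roughly as below. This is a userspace sketch under stated
assumptions, not the code in mm/hmm.c: the MOCK_* values, struct
mock_range and the mock_walk()/mock_fault()/mock_range_fault() helpers
are hypothetical, and the mock assumes a fault that drops the lock is
fully resolved by the time it returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinels; the real ones are internal to mm/hmm.c. */
enum { MOCK_OK = 0, MOCK_FAULT_PENDING = 1, MOCK_FAULT_UNLOCKED = 2 };

#define MOCK_NPAGES 4

struct mock_range {
	bool present[MOCK_NPAGES];	/* models page-table state */
	bool needs_unlock[MOCK_NPAGES];	/* fault must drop the lock */
	unsigned long pfns[MOCK_NPAGES];/* models range->hmm_pfns */
};

/* Walk stand-in: records pfns and reports, but never handles, a fault. */
static int mock_walk(struct mock_range *r, size_t *fault_idx)
{
	for (size_t i = 0; i < MOCK_NPAGES; i++) {
		if (!r->present[i]) {
			*fault_idx = i;
			return MOCK_FAULT_PENDING;
		}
		r->pfns[i] = 0x1000 + i;
	}
	return MOCK_OK;
}

/* Fault stand-in, called outside the walk callbacks. */
static int mock_fault(struct mock_range *r, size_t idx, bool can_unlock)
{
	if (r->needs_unlock[idx]) {
		if (!can_unlock)
			return -1;	/* models a hard failure */
		/* Assumption: "userspace" resolves the fault while unlocked. */
		r->needs_unlock[idx] = false;
		r->present[idx] = true;
		return MOCK_FAULT_UNLOCKED;
	}
	r->present[idx] = true;
	return MOCK_OK;
}

/* Outer loop: fault outside the walk, then re-walk to fill in pfns. */
static int mock_range_fault(struct mock_range *r, bool can_unlock)
{
	size_t idx;
	int ret;

	for (;;) {
		ret = mock_walk(r, &idx);
		if (ret != MOCK_FAULT_PENDING)
			return ret;
		ret = mock_fault(r, idx, can_unlock);
		if (ret != MOCK_OK)
			return ret;	/* error, or lock dropped */
		/* Fault succeeded with the lock held: walk again. */
	}
}
```

Keeping "needs fault" and "lock dropped" as separate values lets the
outer loop tell "fault and walk again" apart from "hand control back to
the caller", which a single sentinel could not express.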

---

Stanislav Kinsburskii (3):
      mm/hmm: move page fault handling out of walk callbacks
      mm/hmm: add hmm_range_fault_unlockable() for mmap lock-drop support
      selftests/mm: add userfaultfd test for HMM unlockable path

 Documentation/mm/hmm.rst               |   62 +++++++++++
 include/linux/hmm.h                    |    1
 lib/test_hmm.c                         |  122 +++++++++++++++++++++
 lib/test_hmm_uapi.h                    |    1
 mm/hmm.c                               |  187 ++++++++++++++++++++++++--------
 tools/testing/selftests/mm/hmm-tests.c |  149 +++++++++++++++++++++++++
 6 files changed, 478 insertions(+), 44 deletions(-)

Re: [PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Posted by Andrew Morton 3 weeks, 2 days ago

On Wed, 20 May 2026 07:09:19 -0700 Stanislav Kinsburskii <skinsburskii@gmail.com> wrote:

> This series extends the HMM framework to support userfaultfd-backed memory
> by allowing the mmap read lock to be dropped during hmm_range_fault().
> 
> Some page fault handlers — most notably userfaultfd — require the mmap lock
> to be released so that userspace can resolve the fault. The current HMM
> interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
> in pages from userfaultfd-registered regions.
> 
> This series follows the established int *locked pattern from
> get_user_pages_remote() in mm/gup.c. A new entry point,
> hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
> mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
> VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
> the caller to restart its walk. The existing hmm_range_fault() is
> refactored into a thin wrapper that passes NULL, preserving current
> behavior for all existing callers.
> 
> Faulting hugetlb pages on the unlockable path is not supported because
> walk_hugetlb_range() unconditionally holds and releases
> hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
> inside the callback, the VMA may be freed before the walk framework's
> unlock. Hugetlb pages already present in page tables are handled normally.
> Possible approaches to lift this limitation are documented in
> Documentation/mm/hmm.rst.

Thanks.  AI review identified one possible issue, possibly a duplicate
from the v2 series?

	https://sashiko.dev/#/patchset/177928604779.589431.14703161356676674288.stgit@skinsburskii

I'll take no action at this stage, shall await reviewer input.  Please
poke me in a week or so if nothing has happened.

Which is quite possible - things seem rather hectic at this time and
we're almost at -rc5!

Re: [PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Posted by Stanislav Kinsburskii 2 weeks, 2 days ago

On Thu, May 21, 2026 at 04:33:09PM -0700, Andrew Morton wrote:
> On Wed, 20 May 2026 07:09:19 -0700 Stanislav Kinsburskii <skinsburskii@gmail.com> wrote:
> 
> > This series extends the HMM framework to support userfaultfd-backed memory
> > by allowing the mmap read lock to be dropped during hmm_range_fault().
> > 
> > Some page fault handlers — most notably userfaultfd — require the mmap lock
> > to be released so that userspace can resolve the fault. The current HMM
> > interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
> > in pages from userfaultfd-registered regions.
> > 
> > This series follows the established int *locked pattern from
> > get_user_pages_remote() in mm/gup.c. A new entry point,
> > hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
> > mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
> > VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
> > the caller to restart its walk. The existing hmm_range_fault() is
> > refactored into a thin wrapper that passes NULL, preserving current
> > behavior for all existing callers.
> > 
> > Faulting hugetlb pages on the unlockable path is not supported because
> > walk_hugetlb_range() unconditionally holds and releases
> > hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
> > inside the callback, the VMA may be freed before the walk framework's
> > unlock. Hugetlb pages already present in page tables are handled normally.
> > Possible approaches to lift this limitation are documented in
> > Documentation/mm/hmm.rst.
> 
> Thanks.  AI review identified one possible issue, possibly a duplicate
> from the v2 series?
> 
> 	https://sashiko.dev/#/patchset/177928604779.589431.14703161356676674288.stgit@skinsburskii
> 
> I'll take no action at this stage, shall await reviewer input.  Please
> poke me in a week or so if nothing has happened.
> 

Hi Andrew,

A gentle reminder as requested: do you think this change could be taken into
the mm tree?
It's beneficial not only for the MSHV driver, but can be used for
post-copy live migration of GPU states in future.

Thanks,
Stanislav

> Which is quite possible - things seem rather hectic at this time and
> we're almost at -rc5!

Re: [PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Posted by Andrew Morton 2 weeks, 2 days ago

On Thu, 28 May 2026 12:53:59 -0700 Stanislav Kinsburskii <skinsburskii@gmail.com> wrote:

> A gentle reminder as requested: do you think this change could be taken into
> the mm tree?
> It's beneficial not only for the MSHV driver, but can be used for
> post-copy live migration of GPU states in future.

Still no review, alas.  It's not a trivial thing, affecting both hmm
and userfaultfd.  And we're closing in on -rc6.

I'd prefer that we revisit in the next cycle, please.  Refresh retest
and resend after -rc1?

Re: [PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Posted by Stanislav Kinsburskii 2 weeks, 2 days ago

On Thu, May 28, 2026 at 01:11:15PM -0700, Andrew Morton wrote:
> On Thu, 28 May 2026 12:53:59 -0700 Stanislav Kinsburskii <skinsburskii@gmail.com> wrote:
> 
> > A gentle reminder as requested: do you think this change could be taken into
> > the mm tree?
> > It's beneficial not only for the MSHV driver, but can be used for
> > post-copy live migration of GPU states in future.
> 
> Still no review, alas.  It's not a trivial thing, affecting both hmm
> and userfaultfd.  And we're closing in on -rc6.
> 
> I'd prefer that we revisit in the next cycle, please.  Refresh retest
> and resend after -rc1?
> 

Sure, will do.

Thanks,
Stanislav

Re: [PATCH v3 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings

Posted by Stanislav Kinsburskii 3 weeks, 2 days ago

On Thu, May 21, 2026 at 04:33:09PM -0700, Andrew Morton wrote:
> On Wed, 20 May 2026 07:09:19 -0700 Stanislav Kinsburskii <skinsburskii@gmail.com> wrote:
> 
> > This series extends the HMM framework to support userfaultfd-backed memory
> > by allowing the mmap read lock to be dropped during hmm_range_fault().
> > 
> > Some page fault handlers — most notably userfaultfd — require the mmap lock
> > to be released so that userspace can resolve the fault. The current HMM
> > interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
> > in pages from userfaultfd-registered regions.
> > 
> > This series follows the established int *locked pattern from
> > get_user_pages_remote() in mm/gup.c. A new entry point,
> > hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
> > mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
> > VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
> > the caller to restart its walk. The existing hmm_range_fault() is
> > refactored into a thin wrapper that passes NULL, preserving current
> > behavior for all existing callers.
> > 
> > Faulting hugetlb pages on the unlockable path is not supported because
> > walk_hugetlb_range() unconditionally holds and releases
> > hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
> > inside the callback, the VMA may be freed before the walk framework's
> > unlock. Hugetlb pages already present in page tables are handled normally.
> > Possible approaches to lift this limitation are documented in
> > Documentation/mm/hmm.rst.
> 
> Thanks.  AI review identified one possible issue, possibly a duplicate
> from the v2 series?
> 
> 	https://sashiko.dev/#/patchset/177928604779.589431.14703161356676674288.stgit@skinsburskii
> 

I think this Sashiko finding is a false positive for current kselftest_harness.h.

ASSERT_EQ() expands to __EXPECT(..., 1), then the optional handler calls
__bail(1, _metadata). For assertions, __bail() calls abort() after
fixture teardown, not a plain return from the test function. See
tools/testing/selftests/kselftest_harness.h:521 and
tools/testing/selftests/kselftest_harness.h:962.

So for these lines after pthread_create() in
tools/testing/selftests/mm/hmm-tests.c:2979, a failed ASSERT_*
terminates the test process. The background thread does not continue
running after the test function returns with uffd_args popped, because
there is no normal return from the assertion path.

There is still a cleanup-quality argument: aborting skips the explicit
eventfd wake, pthread_join(), and frees/closes. But in a kselftest child
process that should be acceptable failure-path behavior, not a stack
use-after-free.
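
To illustrate the abort-versus-return distinction in isolation, here is
a small userspace sketch. It does not reproduce kselftest_harness.h:
mock_failed_assert(), mock_test_body() and child_died_by_abort() are
hypothetical names, and the fixture teardown step is omitted. It only
shows that a test body which aborts in a forked child never returns to
its caller, so the parent observes SIGABRT rather than a normal exit.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a failed ASSERT_*: it never returns. */
static void mock_failed_assert(void)
{
	abort();
}

/* Stand-in for a test body; nothing after the assert is executed. */
static int mock_test_body(void)
{
	mock_failed_assert();
	return 0;	/* never reached */
}

/*
 * Run the test body in a child process and report how the child ended:
 * 1 if it was killed by SIGABRT, 0 if it exited normally, -1 on error.
 */
static int child_died_by_abort(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(mock_test_body());
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT)
		return 1;
	return WIFEXITED(status) ? 0 : -1;
}
```

Because the whole child process is torn down by the signal, any helper
threads it started go away with it instead of continuing to run against
a stack frame that has been popped.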

> I'll take no action at this stage, shall await reviewer input.  Please
> poke me in a week or so if nothing has happened.
> 

Given the explanation above, I don't intend to address Sashiko's
comment and send another revision unless you are certain there is an
issue to fix there.
If you are, please let me know.

> Which is quite possible - things seem rather hectic at this time and
> we're almost at -rc5!

Indeed.

Thank you again for your time,
Stanislav