[PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()

Posted by Lorenzo Stoakes 2 weeks, 4 days ago
These functions are very confusing indeed. 'Entering' a lock could be
interpreted as acquiring it, but this is not what these functions are
interacting with.

Equally they don't indicate at all what kind of lock we are 'entering' or
'exiting'. Finally they are misleading as we invoke these functions when we
already hold a write lock to detach a VMA.

These functions are explicitly simply 'entering' and 'exiting' a state in
which we hold the EXCLUSIVE lock in order that we can either mark the VMA
as being write-locked, or mark the VMA detached.

Rename the functions accordingly, and also update
__vma_exit_exclusive_locked() to return detached state with a __must_check
directive, as it is simply clumsy to pass an output pointer here to
detached state and inconsistent vs. __vma_enter_exclusive_locked().

Finally, remove the unnecessary 'inline' directives.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mmap_lock.h |  4 +--
 mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
 2 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index da63b1be6ec0..873bc5f3c97c 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
 	__vma_lockdep_release_read(vma);
 	detached = __vma_refcount_put(vma, &refcnt);
 	/*
-	 * __vma_enter_locked() may be sleeping waiting for readers to drop
-	 * their reference count, so wake it up if we were the last reader
+	 * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
+	 * drop their reference count, so wake it up if we were the last reader
 	 * blocking it from being acquired.
 	 */
 	if (!detached && are_readers_excluded(refcnt))
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 7a0361cff6db..f73221174a8b 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
 #ifdef CONFIG_MMU
 #ifdef CONFIG_PER_VMA_LOCK
 
-static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
+/*
+ * Now that all readers have been evicted, mark the VMA as being out of the
+ * 'exclude readers' state.
+ *
+ * Returns true if the VMA is now detached, otherwise false.
+ */
+static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
 {
-	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
-					  &vma->vm_refcnt);
+	bool detached;
+
+	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+					 &vma->vm_refcnt);
 	__vma_lockdep_release_exclusive(vma);
+	return detached;
 }
 
 /*
- * __vma_enter_locked() returns 0 immediately if the vma is not
- * attached, otherwise it waits for any current readers to finish and
- * returns 1.  Returns -EINTR if a signal is received while waiting.
+ * Mark the VMA as being in a state of excluding readers, check to see if any
+ * VMA read locks are indeed held, and if so wait for them to be released.
+ *
+ * Note that this function pairs with vma_refcount_put() which will wake up this
+ * thread when it detects that the last reader has released its lock.
+ *
+ * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
+ * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
+ * is permitted to kill it.
+ *
+ * The function will return 0 immediately if the VMA is detached, and 1 once the
+ * VMA has evicted all readers, leaving the VMA exclusively locked.
+ *
+ * If the function returns 1, the caller is required to invoke
+ * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
+ *
+ * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
+ * may also return -EINTR to indicate a fatal signal was received while waiting.
  */
-static inline int __vma_enter_locked(struct vm_area_struct *vma,
+static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
 		bool detaching, int state)
 {
 	int err;
@@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
 		   state);
 	if (err) {
-		bool detached;
-
-		__vma_exit_locked(vma, &detached);
-		if (detached) {
+		if (__vma_exit_exclusive_locked(vma)) {
 			/*
 			 * The wait failed, but the last reader went away
-			 * as well.  Tell the caller the VMA is detached.
+			 * as well. Tell the caller the VMA is detached.
 			 */
 			WARN_ON_ONCE(!detaching);
 			err = 0;
@@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 {
 	int locked;
 
-	locked = __vma_enter_locked(vma, false, state);
+	locked = __vma_enter_exclusive_locked(vma, false, state);
 	if (locked < 0)
 		return locked;
 
@@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
 
-	if (locked) {
-		bool detached;
-
-		__vma_exit_locked(vma, &detached);
-		WARN_ON_ONCE(detached); /* vma should remain attached */
-	}
+	/* vma should remain attached. */
+	if (locked)
+		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
 
 	return 0;
 }
@@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
 	detached = __vma_refcount_put(vma, NULL);
 	if (unlikely(!detached)) {
 		/* Wait until vma is detached with no readers. */
-		if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
+		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
 			/*
 			 * Once this is complete, no readers can increment the
 			 * reference count, and the VMA is marked detached.
 			 */
-			__vma_exit_locked(vma, &detached);
+			detached = __vma_exit_exclusive_locked(vma);
 			WARN_ON_ONCE(!detached);
 		}
 	}
-- 
2.52.0
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Vlastimil Babka 2 weeks, 3 days ago
On 1/22/26 14:01, Lorenzo Stoakes wrote:
> These functions are very confusing indeed. 'Entering' a lock could be
> interpreted as acquiring it, but this is not what these functions are
> interacting with.
> 
> Equally they don't indicate at all what kind of lock we are 'entering' or
> 'exiting'. Finally they are misleading as we invoke these functions when we
> already hold a write lock to detach a VMA.
> 
> These functions are explicitly simply 'entering' and 'exiting' a state in
> which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> as being write-locked, or mark the VMA detached.

If we hold a write lock (i.e. in vma_mark_detached()), that normally means
it's also exclusive?
And if we talk about the state between __vma_enter_exclusive_locked() and
__vma_exit_exclusive_locked() as "holding an EXCLUSIVE lock", it's not
exactly the same lock as what we call "VMA write lock" right, so what lock
is it?

Maybe it would help if we stopped calling this internal thing a "lock"?
Except we use it for lockdep's lock_acquire_exclusive(). Sigh, sorry I don't
have any great suggestion.

Maybe call those functions __vma_exclude_readers_start() and
__vma_exclude_readers_end() instead, or something?

> Rename the functions accordingly, and also update
> __vma_exit_exclusive_locked() to return detached state with a __must_check
> directive, as it is simply clumsy to pass an output pointer here to
> detached state and inconsistent vs. __vma_enter_exclusive_locked().
> 
> Finally, remove the unnecessary 'inline' directives.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mmap_lock.h |  4 +--
>  mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
>  2 files changed, 41 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index da63b1be6ec0..873bc5f3c97c 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
>  	__vma_lockdep_release_read(vma);
>  	detached = __vma_refcount_put(vma, &refcnt);
>  	/*
> -	 * __vma_enter_locked() may be sleeping waiting for readers to drop
> -	 * their reference count, so wake it up if we were the last reader
> +	 * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
> +	 * drop their reference count, so wake it up if we were the last reader
>  	 * blocking it from being acquired.
>  	 */
>  	if (!detached && are_readers_excluded(refcnt))
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 7a0361cff6db..f73221174a8b 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
>  
> -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> +/*
> + * Now that all readers have been evicted, mark the VMA as being out of the
> + * 'exclude readers' state.
> + *
> + * Returns true if the VMA is now detached, otherwise false.
> + */
> +static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
>  {
> -	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -					  &vma->vm_refcnt);
> +	bool detached;
> +
> +	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +					 &vma->vm_refcnt);
>  	__vma_lockdep_release_exclusive(vma);
> +	return detached;
>  }
>  
>  /*
> - * __vma_enter_locked() returns 0 immediately if the vma is not
> - * attached, otherwise it waits for any current readers to finish and
> - * returns 1.  Returns -EINTR if a signal is received while waiting.
> + * Mark the VMA as being in a state of excluding readers, check to see if any
> + * VMA read locks are indeed held, and if so wait for them to be released.
> + *
> + * Note that this function pairs with vma_refcount_put() which will wake up this
> + * thread when it detects that the last reader has released its lock.
> + *
> + * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> + * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> + * is permitted to kill it.
> + *
> + * The function will return 0 immediately if the VMA is detached, and 1 once the
> + * VMA has evicted all readers, leaving the VMA exclusively locked.
> + *
> + * If the function returns 1, the caller is required to invoke
> + * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> + *
> + * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> + * may also return -EINTR to indicate a fatal signal was received while waiting.
>   */
> -static inline int __vma_enter_locked(struct vm_area_struct *vma,
> +static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
>  		bool detaching, int state)
>  {
>  	int err;
> @@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>  		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>  		   state);
>  	if (err) {
> -		bool detached;
> -
> -		__vma_exit_locked(vma, &detached);
> -		if (detached) {
> +		if (__vma_exit_exclusive_locked(vma)) {
>  			/*
>  			 * The wait failed, but the last reader went away
> -			 * as well.  Tell the caller the VMA is detached.
> +			 * as well. Tell the caller the VMA is detached.
>  			 */
>  			WARN_ON_ONCE(!detaching);
>  			err = 0;
> @@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>  {
>  	int locked;
>  
> -	locked = __vma_enter_locked(vma, false, state);
> +	locked = __vma_enter_exclusive_locked(vma, false, state);
>  	if (locked < 0)
>  		return locked;
>  
> @@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>  	 */
>  	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>  
> -	if (locked) {
> -		bool detached;
> -
> -		__vma_exit_locked(vma, &detached);
> -		WARN_ON_ONCE(detached); /* vma should remain attached */
> -	}
> +	/* vma should remain attached. */
> +	if (locked)
> +		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
>  
>  	return 0;
>  }
> @@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
>  	detached = __vma_refcount_put(vma, NULL);
>  	if (unlikely(!detached)) {
>  		/* Wait until vma is detached with no readers. */
> -		if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> +		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
>  			/*
>  			 * Once this is complete, no readers can increment the
>  			 * reference count, and the VMA is marked detached.
>  			 */
> -			__vma_exit_locked(vma, &detached);
> +			detached = __vma_exit_exclusive_locked(vma);
>  			WARN_ON_ONCE(!detached);
>  		}
>  	}
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Fri, Jan 23, 2026 at 10:16:22AM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > These functions are very confusing indeed. 'Entering' a lock could be
> > interpreted as acquiring it, but this is not what these functions are
> > interacting with.
> >
> > Equally they don't indicate at all what kind of lock we are 'entering' or
> > 'exiting'. Finally they are misleading as we invoke these functions when we
> > already hold a write lock to detach a VMA.
> >
> > These functions are explicitly simply 'entering' and 'exiting' a state in
> > which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> > as being write-locked, or mark the VMA detached.
>
> If we hold a write lock (i.e. in vma_mark_detached()), that normally means
> it's also exclusive?
> And if we talk about the state between __vma_enter_exclusive_locked() and
> __vma_exit_exclusive_locked() as "holding an EXCLUSIVE lock", it's not
> exactly the same lock as what we call "VMA write lock" right, so what lock
> is it?

Well it's not exclusive because spurious reader refcount increments are
possible (unless already detached of course...)

We are _excluding_ readers, including spurious ones really.
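
Roughly what such a spurious increment looks like on the read side -- a
simplified sketch only, not the actual lockless read path:

	/* Speculatively take a reader reference... */
	if (!refcount_inc_not_zero(&vma->vm_refcnt))
		return NULL;	/* Already detached. */

	/*
	 * ...then notice that a writer holds
	 * VM_REFCNT_EXCLUDE_READERS_FLAG and back off. The increment was
	 * momentarily visible, so the state is never truly 'exclusive'.
	 * vma_refcount_put() also wakes the excluder if we were the last
	 * reader blocking it.
	 */
	if (are_readers_excluded(refcount_read(&vma->vm_refcnt))) {
		vma_refcount_put(vma);
		return NULL;
	}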

>
> Maybe it would help if we stopped calling this internal thing a "lock"?
> Except we use it for lockdep's lock_acquire_exclusive(). Sigh, sorry I don't
> have any great suggestion.
>
> Maybe call those functions __vma_exclude_readers_start() and
> __vma_exclude_readers_end() instead, or something?

Yeah that's probably better actually, will rename accordingly.

Cheers, Lorenzo
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Lorenzo Stoakes 2 weeks, 2 days ago
On Fri, Jan 23, 2026 at 04:17:54PM +0000, Lorenzo Stoakes wrote:
> On Fri, Jan 23, 2026 at 10:16:22AM +0100, Vlastimil Babka wrote:
> > On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > > These functions are very confusing indeed. 'Entering' a lock could be
> > > interpreted as acquiring it, but this is not what these functions are
> > > interacting with.
> > >
> > > Equally they don't indicate at all what kind of lock we are 'entering' or
> > > 'exiting'. Finally they are misleading as we invoke these functions when we
> > > already hold a write lock to detach a VMA.
> > >
> > > These functions are explicitly simply 'entering' and 'exiting' a state in
> > > which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> > > as being write-locked, or mark the VMA detached.
> >
> > If we hold a write lock (i.e. in vma_mark_detached()), that normally means
> > it's also exclusive?
> > And if we talk about the state between __vma_enter_exclusive_locked() and
> > __vma_exit_exclusive_locked() as "holding an EXCLUSIVE lock", it's not
> > exactly the same lock as what we call "VMA write lock" right, so what lock
> > is it?
>
> Well it's not exclusive because spurious reader refcount increments are
> possible (unless already detached of course...)
>
> We are _excluding_ readers, including spurious ones really.
>
> >
> > Maybe it would help if we stopped calling this internal thing a "lock"?
> > Except we use it for lockdep's lock_acquire_exclusive(). Sigh, sorry I don't
> > have any great suggestion.
> >
> > Maybe call those functions __vma_exclude_readers_start() and
> > __vma_exclude_readers_end() instead, or something?
>
> Yeah that's probably better actually, will rename accordingly.

Actually went with __vma_start_exclude_readers() and
__vma_end_exclude_readers() as I felt they read better.
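
i.e. roughly the following signatures -- a sketch of the intended v4 naming,
assuming everything else stays as in this patch:

	static int __vma_start_exclude_readers(struct vm_area_struct *vma,
			bool detaching, int state);
	static bool __must_check __vma_end_exclude_readers(struct vm_area_struct *vma);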

Cheers, Lorenzo
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Suren Baghdasaryan 2 weeks, 3 days ago
On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> These functions are very confusing indeed. 'Entering' a lock could be
> interpreted as acquiring it, but this is not what these functions are
> interacting with.
>
> Equally they don't indicate at all what kind of lock we are 'entering' or
> 'exiting'. Finally they are misleading as we invoke these functions when we
> already hold a write lock to detach a VMA.
>
> These functions are explicitly simply 'entering' and 'exiting' a state in
> which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> as being write-locked, or mark the VMA detached.
>
> Rename the functions accordingly, and also update
> __vma_exit_exclusive_locked() to return detached state with a __must_check
> directive, as it is simply clumsy to pass an output pointer here to
> detached state and inconsistent vs. __vma_enter_exclusive_locked().
>
> Finally, remove the unnecessary 'inline' directives.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mmap_lock.h |  4 +--
>  mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
>  2 files changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index da63b1be6ec0..873bc5f3c97c 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
>         __vma_lockdep_release_read(vma);
>         detached = __vma_refcount_put(vma, &refcnt);
>         /*
> -        * __vma_enter_locked() may be sleeping waiting for readers to drop
> -        * their reference count, so wake it up if we were the last reader
> +        * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
> +        * drop their reference count, so wake it up if we were the last reader
>          * blocking it from being acquired.
>          */
>         if (!detached && are_readers_excluded(refcnt))
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 7a0361cff6db..f73221174a8b 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
>
> -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> +/*
> + * Now that all readers have been evicted, mark the VMA as being out of the
> + * 'exclude readers' state.
> + *
> + * Returns true if the VMA is now detached, otherwise false.
> + */
> +static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
>  {
> -       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -                                         &vma->vm_refcnt);
> +       bool detached;
> +
> +       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +                                        &vma->vm_refcnt);
>         __vma_lockdep_release_exclusive(vma);
> +       return detached;
>  }
>
>  /*
> - * __vma_enter_locked() returns 0 immediately if the vma is not
> - * attached, otherwise it waits for any current readers to finish and
> - * returns 1.  Returns -EINTR if a signal is received while waiting.
> + * Mark the VMA as being in a state of excluding readers, check to see if any
> + * VMA read locks are indeed held, and if so wait for them to be released.
> + *
> + * Note that this function pairs with vma_refcount_put() which will wake up this
> + * thread when it detects that the last reader has released its lock.
> + *
> + * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> + * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> + * is permitted to kill it.
> + *
> + * The function will return 0 immediately if the VMA is detached, and 1 once the
> + * VMA has evicted all readers, leaving the VMA exclusively locked.

The wording here is a bit misleading. We do not evict the readers,
just wait for them to complete and exit.

> + *
> + * If the function returns 1, the caller is required to invoke
> + * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> + *
> + * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> + * may also return -EINTR to indicate a fatal signal was received while waiting.
>   */
> -static inline int __vma_enter_locked(struct vm_area_struct *vma,
> +static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
>                 bool detaching, int state)
>  {
>         int err;
> @@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                    state);
>         if (err) {
> -               bool detached;
> -
> -               __vma_exit_locked(vma, &detached);
> -               if (detached) {
> +               if (__vma_exit_exclusive_locked(vma)) {
>                         /*
>                          * The wait failed, but the last reader went away
> -                        * as well.  Tell the caller the VMA is detached.
> +                        * as well. Tell the caller the VMA is detached.
>                          */
>                         WARN_ON_ONCE(!detaching);
>                         err = 0;
> @@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>  {
>         int locked;
>
> -       locked = __vma_enter_locked(vma, false, state);
> +       locked = __vma_enter_exclusive_locked(vma, false, state);
>         if (locked < 0)
>                 return locked;
>
> @@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>          */
>         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>
> -       if (locked) {
> -               bool detached;
> -
> -               __vma_exit_locked(vma, &detached);
> -               WARN_ON_ONCE(detached); /* vma should remain attached */
> -       }
> +       /* vma should remain attached. */
> +       if (locked)
> +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));

I'm wary of calling functions from WARN_ON_ONCE() statements. If
someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
but it's because I have been bitten by that before...
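
For reference, the hazard is that the !CONFIG_DEBUG_VM variant compiles the
condition away entirely -- roughly as follows (a simplified sketch, not the
exact include/linux/mmdebug.h definitions):

	#ifdef CONFIG_DEBUG_VM
	#define VM_WARN_ON_ONCE(cond)	WARN_ON_ONCE(cond)
	#else
	/*
	 * sizeof() never evaluates its operand at runtime, so any side
	 * effects in 'cond' -- such as the refcount update performed by
	 * __vma_exit_exclusive_locked() -- silently disappear.
	 */
	#define VM_WARN_ON_ONCE(cond)	((void)sizeof(!!(cond)))
	#endif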

>
>         return 0;
>  }
> @@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
>         detached = __vma_refcount_put(vma, NULL);
>         if (unlikely(!detached)) {
>                 /* Wait until vma is detached with no readers. */
> -               if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> +               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
>                         /*
>                          * Once this is complete, no readers can increment the
>                          * reference count, and the VMA is marked detached.
>                          */
> -                       __vma_exit_locked(vma, &detached);
> +                       detached = __vma_exit_exclusive_locked(vma);
>                         WARN_ON_ONCE(!detached);
>                 }
>         }
> --
> 2.52.0
>
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Lorenzo Stoakes 2 weeks, 2 days ago
On Thu, Jan 22, 2026 at 12:15:20PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > These functions are very confusing indeed. 'Entering' a lock could be
> > interpreted as acquiring it, but this is not what these functions are
> > interacting with.
> >
> > Equally they don't indicate at all what kind of lock we are 'entering' or
> > 'exiting'. Finally they are misleading as we invoke these functions when we
> > already hold a write lock to detach a VMA.
> >
> > These functions are explicitly simply 'entering' and 'exiting' a state in
> > which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> > as being write-locked, or mark the VMA detached.
> >
> > Rename the functions accordingly, and also update
> > __vma_exit_exclusive_locked() to return detached state with a __must_check
> > directive, as it is simply clumsy to pass an output pointer here to
> > detached state and inconsistent vs. __vma_enter_exclusive_locked().
> >
> > Finally, remove the unnecessary 'inline' directives.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  include/linux/mmap_lock.h |  4 +--
> >  mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
> >  2 files changed, 41 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index da63b1be6ec0..873bc5f3c97c 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
> >         __vma_lockdep_release_read(vma);
> >         detached = __vma_refcount_put(vma, &refcnt);
> >         /*
> > -        * __vma_enter_locked() may be sleeping waiting for readers to drop
> > -        * their reference count, so wake it up if we were the last reader
> > +        * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
> > +        * drop their reference count, so wake it up if we were the last reader
> >          * blocking it from being acquired.
> >          */
> >         if (!detached && are_readers_excluded(refcnt))
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 7a0361cff6db..f73221174a8b 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >  #ifdef CONFIG_MMU
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> > +/*
> > + * Now that all readers have been evicted, mark the VMA as being out of the
> > + * 'exclude readers' state.
> > + *
> > + * Returns true if the VMA is now detached, otherwise false.
> > + */
> > +static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> >  {
> > -       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > -                                         &vma->vm_refcnt);
> > +       bool detached;
> > +
> > +       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > +                                        &vma->vm_refcnt);
> >         __vma_lockdep_release_exclusive(vma);
> > +       return detached;
> >  }
> >
> >  /*
> > - * __vma_enter_locked() returns 0 immediately if the vma is not
> > - * attached, otherwise it waits for any current readers to finish and
> > - * returns 1.  Returns -EINTR if a signal is received while waiting.
> > + * Mark the VMA as being in a state of excluding readers, check to see if any
> > + * VMA read locks are indeed held, and if so wait for them to be released.
> > + *
> > + * Note that this function pairs with vma_refcount_put() which will wake up this
> > + * thread when it detects that the last reader has released its lock.
> > + *
> > + * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> > + * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> > + * is permitted to kill it.
> > + *
> > + * The function will return 0 immediately if the VMA is detached, and 1 once the
> > + * VMA has evicted all readers, leaving the VMA exclusively locked.
>
> The wording here is a bit misleading. We do not evict the readers,
> just wait for them to complete and exit.

OK updated to:

 * The function will return 0 immediately if the VMA is detached, or wait for
 * readers and return 1 once they have all exited, leaving the VMA exclusively
 * locked.

>
> > + *
> > + * If the function returns 1, the caller is required to invoke
> > + * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> > + *
> > + * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> > + * may also return -EINTR to indicate a fatal signal was received while waiting.
> >   */
> > -static inline int __vma_enter_locked(struct vm_area_struct *vma,
> > +static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> >                 bool detaching, int state)
> >  {
> >         int err;
> > @@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> >                    state);
> >         if (err) {
> > -               bool detached;
> > -
> > -               __vma_exit_locked(vma, &detached);
> > -               if (detached) {
> > +               if (__vma_exit_exclusive_locked(vma)) {
> >                         /*
> >                          * The wait failed, but the last reader went away
> > -                        * as well.  Tell the caller the VMA is detached.
> > +                        * as well. Tell the caller the VMA is detached.
> >                          */
> >                         WARN_ON_ONCE(!detaching);
> >                         err = 0;
> > @@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >  {
> >         int locked;
> >
> > -       locked = __vma_enter_locked(vma, false, state);
> > +       locked = __vma_enter_exclusive_locked(vma, false, state);
> >         if (locked < 0)
> >                 return locked;
> >
> > @@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >          */
> >         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> >
> > -       if (locked) {
> > -               bool detached;
> > -
> > -               __vma_exit_locked(vma, &detached);
> > -               WARN_ON_ONCE(detached); /* vma should remain attached */
> > -       }
> > +       /* vma should remain attached. */
> > +       if (locked)
> > +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
>
> I'm wary of calling functions from WARN_ON_ONCE() statements. If
> someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
> call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
> but it's because I have been bitten by that before...

OK replaced with:

	if (locked) {
		bool detached = __vma_end_exclude_readers(vma);

		/* The VMA should remain attached. */
		WARN_ON_ONCE(detached);
	}

Note that this, and indeed the comment above, both get replaced in a later
commit :) but I will action these changes regardless to stay consistent.
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Andrew Morton 2 weeks, 3 days ago
On Thu, 22 Jan 2026 12:15:20 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> > +       /* vma should remain attached. */
> > +       if (locked)
> > +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> 
> I'm wary of calling functions from WARN_ON_ONCE() statements. If
> someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
> call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
> but it's because I have been bitten by that before...

Yes please.  The elision is desirable if the function has no side-effects, but
__vma_exit_exclusive_locked() changes stuff.

Someone(tm) should check for this.  A pathetically partial grep turns
up plenty of things:

mm/slab_common.c:	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
mm/slab_common.c:	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
mm/page-writeback.c:		WARN_ON_ONCE(atomic_long_add_return(delta,
mm/page_isolation.c:		WARN_ON_ONCE(!pageblock_unisolate_and_move_free_pages(zone, page));
mm/page_alloc.c:	VM_WARN_ONCE(get_pageblock_isolate(page),
mm/numa_memblks.c:	WARN_ON(memblock_clear_hotplug(0, max_addr));
mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.memory, NUMA_NO_NODE));
mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.reserved,
mm/zsmalloc.c:		WARN_ON(!zpdesc_trylock(zpdesc));
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Thu, Jan 22, 2026 at 12:55:29PM -0800, Andrew Morton wrote:
> On Thu, 22 Jan 2026 12:15:20 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > > +       /* vma should remain attached. */
> > > +       if (locked)
> > > +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> >
> > I'm wary of calling functions from WARN_ON_ONCE() statements. If
> > someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
> > call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
> > but it's because I have been bitten by that before...
>
> Yes please.  The elision is desirable if the function has no side-effects, but
> __vma_exit_exclusive_locked() changes stuff.

Ack will update in this case :)

>
> Someone(tm) should check for this.  A pathetically partial grep turns
> up plenty of things:
>
> mm/slab_common.c:	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
> mm/slab_common.c:	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
> mm/page-writeback.c:		WARN_ON_ONCE(atomic_long_add_return(delta,
> mm/page_isolation.c:		WARN_ON_ONCE(!pageblock_unisolate_and_move_free_pages(zone, page));
> mm/page_alloc.c:	VM_WARN_ONCE(get_pageblock_isolate(page),
> mm/numa_memblks.c:	WARN_ON(memblock_clear_hotplug(0, max_addr));
> mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.memory, NUMA_NO_NODE));
> mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.reserved,
> mm/zsmalloc.c:		WARN_ON(!zpdesc_trylock(zpdesc));
>

*Adds to todo*

Cheers, Lorenzo
Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
Posted by Lorenzo Stoakes 2 weeks, 4 days ago
I'm clearly cursed today... but obviously this is the RESEND 6/10 :)

Andrew - can you check to make sure I didn't confuse your scripts? Apologies for
this!

Cheers, Lorenzo