[PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns

Lorenzo Stoakes posted 10 patches 2 weeks, 3 days ago
There is a newer version of this series
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
error (but only when waiting for readers in TASK_KILLABLE state), and
storing the return value in a stack variable called 'locked' adds further
confusion.

More generally, we are doing a lot of rather finicky things during the
acquisition of a state in which readers are excluded and when moving out of
this state, including tracking whether or not we are detached and whether an
error occurred.

We are implementing logic in __vma_enter_exclusive_locked() that
effectively says 'if one caller invokes us, do X; if another, do Y',
which is very confusing from a control flow perspective.

Introducing a shared helper state object helps us avoid this, as we can
now handle the 'an error arose but we're detached' condition correctly in
both callers - warning if not detaching, and treating the situation as if
no error arose when a VMA is detaching.

This also acts to help document what's going on and allows us to add some
more logical debug asserts.

Also update vma_mark_detached() to add a guard clause for the likely
'already detached' state (given we hold the mmap write lock), and add a
comment about ephemeral VMA read lock reference count increments to clarify
why we are entering/exiting an exclusive locked state here.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
 1 file changed, 91 insertions(+), 53 deletions(-)

diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index f73221174a8b..75166a43ffa4 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
 #ifdef CONFIG_MMU
 #ifdef CONFIG_PER_VMA_LOCK

+/* State shared across __vma_[enter, exit]_exclusive_locked(). */
+struct vma_exclude_readers_state {
+	/* Input parameters. */
+	struct vm_area_struct *vma;
+	int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
+	bool detaching;
+
+	bool detached;
+	bool exclusive; /* Are we exclusively locked? */
+};
+
 /*
  * Now that all readers have been evicted, mark the VMA as being out of the
  * 'exclude readers' state.
  *
  * Returns true if the VMA is now detached, otherwise false.
  */
-static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
+static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
 {
-	bool detached;
+	struct vm_area_struct *vma = ves->vma;
+
+	VM_WARN_ON_ONCE(ves->detached);
+	VM_WARN_ON_ONCE(!ves->exclusive);

-	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
-					 &vma->vm_refcnt);
+	ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+					      &vma->vm_refcnt);
 	__vma_lockdep_release_exclusive(vma);
-	return detached;
+}
+
+static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
+{
+	const unsigned int tgt = ves->detaching ? 0 : 1;
+
+	return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
 }

 /*
@@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
  * Note that this function pairs with vma_refcount_put() which will wake up this
  * thread when it detects that the last reader has released its lock.
  *
- * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
- * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
- * is permitted to kill it.
+ * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
+ * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
+ * signal is permitted to kill it.
  *
- * The function will return 0 immediately if the VMA is detached, and 1 once the
- * VMA has evicted all readers, leaving the VMA exclusively locked.
+ * The function sets the ves->locked parameter to true if an exclusive lock was
+ * acquired, or false if the VMA was detached or an error arose on wait.
  *
- * If the function returns 1, the caller is required to invoke
- * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
+ * If the function indicates an exclusive lock was acquired via ves->exclusive
+ * (or equivalently, returning 0 with !ves->detached), the caller is required to
+ * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
+ * required.
  *
- * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
- * may also return -EINTR to indicate a fatal signal was received while waiting.
+ * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
+ * function may also return -EINTR to indicate a fatal signal was received while
+ * waiting.
  */
-static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
-		bool detaching, int state)
+static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
 {
-	int err;
-	unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
+	struct vm_area_struct *vma = ves->vma;
+	unsigned int tgt_refcnt = get_target_refcnt(ves);
+	int err = 0;

 	mmap_assert_write_locked(vma->vm_mm);
-
-	/* Additional refcnt if the vma is attached. */
-	if (!detaching)
-		tgt_refcnt++;
+	VM_WARN_ON_ONCE(ves->detached);
+	VM_WARN_ON_ONCE(ves->exclusive);

 	/*
 	 * If vma is detached then only vma_mark_attached() can raise the
@@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
-	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
+	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
+		ves->detached = true;
 		return 0;
+	}

 	__vma_lockdep_acquire_exclusive(vma);
 	err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
-		   state);
+		   ves->state);
 	if (err) {
-		if (__vma_exit_exclusive_locked(vma)) {
-			/*
-			 * The wait failed, but the last reader went away
-			 * as well. Tell the caller the VMA is detached.
-			 */
-			WARN_ON_ONCE(!detaching);
-			err = 0;
-		}
+		__vma_exit_exclusive_locked(ves);
 		return err;
 	}
-	__vma_lockdep_stat_mark_acquired(vma);

-	return 1;
+	__vma_lockdep_stat_mark_acquired(vma);
+	ves->exclusive = true;
+	return 0;
 }

 int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 		int state)
 {
-	int locked;
+	int err;
+	struct vma_exclude_readers_state ves = {
+		.vma = vma,
+		.state = state,
+	};

-	locked = __vma_enter_exclusive_locked(vma, false, state);
-	if (locked < 0)
-		return locked;
+	err = __vma_enter_exclusive_locked(&ves);
+	if (err) {
+		WARN_ON_ONCE(ves.detached);
+		return err;
+	}

 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
@@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);

-	/* vma should remain attached. */
-	if (locked)
-		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
+	if (!ves.detached) {
+		__vma_exit_exclusive_locked(&ves);
+		/* VMA should remain attached. */
+		WARN_ON_ONCE(ves.detached);
+	}

 	return 0;
 }
@@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);

 void vma_mark_detached(struct vm_area_struct *vma)
 {
-	bool detached;
+	struct vma_exclude_readers_state ves = {
+		.vma = vma,
+		.state = TASK_UNINTERRUPTIBLE,
+		.detaching = true,
+	};
+	int err;

 	vma_assert_write_locked(vma);
 	vma_assert_attached(vma);
@@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
-	detached = __vma_refcount_put(vma, NULL);
-	if (unlikely(!detached)) {
-		/* Wait until vma is detached with no readers. */
-		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
-			/*
-			 * Once this is complete, no readers can increment the
-			 * reference count, and the VMA is marked detached.
-			 */
-			detached = __vma_exit_exclusive_locked(vma);
-			WARN_ON_ONCE(!detached);
-		}
+	if (likely(__vma_refcount_put(vma, NULL)))
+		return;
+
+	/*
+	 * Wait until the VMA is detached with no readers. Since we hold the VMA
+	 * write lock, the only read locks that might be present are those from
+	 * threads trying to acquire the read lock and incrementing the
+	 * reference count before realising the write lock is held and
+	 * decrementing it.
+	 */
+	err = __vma_enter_exclusive_locked(&ves);
+	if (!err && !ves.detached) {
+		/*
+		 * Once this is complete, no readers can increment the
+		 * reference count, and the VMA is marked detached.
+		 */
+		__vma_exit_exclusive_locked(&ves);
 	}
+	/* If an error arose but we were detached anyway, we don't care. */
+	WARN_ON_ONCE(!ves.detached);
 }

 /*
--
2.52.0
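For readers following the refcount arithmetic, the target-count computation in get_target_refcnt() above can be modeled in plain userspace C. This is a sketch, not kernel code: the flag value below is an arbitrary stand-in, and the real VM_REFCNT_EXCLUDE_READERS_FLAG is defined in the kernel sources.

```c
#include <stdbool.h>

/* Userspace stand-in; the real flag value lives in the kernel sources. */
#define VM_REFCNT_EXCLUDE_READERS_FLAG 0x40000000u

struct vma_exclude_readers_state {
	bool detaching;  /* input: are we detaching the VMA? */
	bool detached;   /* output */
	bool exclusive;  /* output */
};

/* Mirrors get_target_refcnt() from the patch: when detaching, we wait for
 * the refcount to drop to the exclude-readers flag alone (the attach
 * reference has already been put); otherwise the attach reference (1)
 * remains alongside the flag. */
static unsigned int get_target_refcnt(const struct vma_exclude_readers_state *ves)
{
	const unsigned int tgt = ves->detaching ? 0 : 1;

	return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
}
```

This matches the two call sites: __vma_start_write() (not detaching, so the attach reference stays) and vma_mark_detached() (detaching, so only the flag should remain once all transient readers have gone).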
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Vlastimil Babka 2 weeks, 2 days ago
On 1/22/26 14:01, Lorenzo Stoakes wrote:
> It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> error (but only when waiting for readers in TASK_KILLABLE state), and
> having the return value be stored in a stack variable called 'locked' is
> further confusion.
> 
> More generally, we are doing a lot of rather finicky things during the
> acquisition of a state in which readers are excluded and moving out of this
> state, including tracking whether we are detached or not or whether an
> error occurred.
> 
> We are implementing logic in __vma_enter_exclusive_locked() that
> effectively acts as if 'if one caller calls us do X, if another then do Y',
> which is very confusing from a control flow perspective.
> 
> Introducing the shared helper object state helps us avoid this, as we can
> now handle the 'an error arose but we're detached' condition correctly in
> both callers - a warning if not detaching, and treating the situation as if
> no error arose in the case of a VMA detaching.
> 
> This also acts to help document what's going on and allows us to add some
> more logical debug asserts.
> 
> Also update vma_mark_detached() to add a guard clause for the likely
> 'already detached' state (given we hold the mmap write lock), and add a
> comment about ephemeral VMA read lock reference count increments to clarify
> why we are entering/exiting an exclusive locked state here.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 91 insertions(+), 53 deletions(-)
> 
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index f73221174a8b..75166a43ffa4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
> 
> +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> +struct vma_exclude_readers_state {
> +	/* Input parameters. */
> +	struct vm_area_struct *vma;
> +	int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> +	bool detaching;
> +
> +	bool detached;
> +	bool exclusive; /* Are we exclusively locked? */
> +};
> +
>  /*
>   * Now that all readers have been evicted, mark the VMA as being out of the
>   * 'exclude readers' state.
>   *
>   * Returns true if the VMA is now detached, otherwise false.
>   */
> -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
>  {
> -	bool detached;
> +	struct vm_area_struct *vma = ves->vma;
> +
> +	VM_WARN_ON_ONCE(ves->detached);
> +	VM_WARN_ON_ONCE(!ves->exclusive);

I think this will trigger when called on wait failure from
__vma_enter_exclusive_locked(). Given the other things Suren raised about
the field, I wonder if it's worth keeping it?
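The failure mode described here can be sketched with a tiny userspace model (names and types simplified; this is not kernel code): ves->exclusive only becomes true after the rcuwait succeeds, yet the wait-error path in the patch calls the exit helper first, so the exit-side VM_WARN_ON_ONCE(!ves->exclusive) would fire.

```c
#include <stdbool.h>

struct ves { bool detached, exclusive; };

/* Stands in for VM_WARN_ON_ONCE(!ves->exclusive) firing. */
static bool warned;

/* Model of __vma_exit_exclusive_locked(): warns if we never reached the
 * exclusive state. */
static void exit_exclusive(struct ves *v)
{
	if (!v->exclusive)
		warned = true;
}

/* Model of __vma_enter_exclusive_locked(): on wait error we call the exit
 * helper before ves->exclusive was ever set to true, so the exit-side
 * assert fires. */
static int enter_exclusive(struct ves *v, int wait_err)
{
	if (wait_err) {
		exit_exclusive(v);  /* v->exclusive is still false here */
		return wait_err;
	}
	v->exclusive = true;
	return 0;
}
```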

> -	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -					 &vma->vm_refcnt);
> +	ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +					      &vma->vm_refcnt);
>  	__vma_lockdep_release_exclusive(vma);
> -	return detached;
> +}
> +

> @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> 
>  void vma_mark_detached(struct vm_area_struct *vma)
>  {
> -	bool detached;
> +	struct vma_exclude_readers_state ves = {
> +		.vma = vma,
> +		.state = TASK_UNINTERRUPTIBLE,
> +		.detaching = true,
> +	};
> +	int err;
> 
>  	vma_assert_write_locked(vma);
>  	vma_assert_attached(vma);
> @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
>  	 * See the comment describing the vm_area_struct->vm_refcnt field for
>  	 * details of possible refcnt values.
>  	 */
> -	detached = __vma_refcount_put(vma, NULL);
> -	if (unlikely(!detached)) {
> -		/* Wait until vma is detached with no readers. */
> -		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> -			/*
> -			 * Once this is complete, no readers can increment the
> -			 * reference count, and the VMA is marked detached.
> -			 */
> -			detached = __vma_exit_exclusive_locked(vma);
> -			WARN_ON_ONCE(!detached);
> -		}
> +	if (likely(__vma_refcount_put(vma, NULL)))
> +		return;

Seems to me it would be worthwhile splitting this function to an
static-inline-in-header vma_mark_detached() that does only the asserts and
__vma_refcount_put(), and keeping the function here as __vma_mark_detached()
(or maybe differently named since the detaching kinda already happened with
the refcount put... __vma_mark_detached_finish()?) handling the rare case
__vma_refcount_put() returns false.

> +
> +	/*
> +	 * Wait until the VMA is detached with no readers. Since we hold the VMA
> +	 * write lock, the only read locks that might be present are those from
> +	 * threads trying to acquire the read lock and incrementing the
> +	 * reference count before realising the write lock is held and
> +	 * decrementing it.
> +	 */
> +	err = __vma_enter_exclusive_locked(&ves);
> +	if (!err && !ves.detached) {
> +		/*
> +		 * Once this is complete, no readers can increment the
> +		 * reference count, and the VMA is marked detached.
> +		 */
> +		__vma_exit_exclusive_locked(&ves);
>  	}
> +	/* If an error arose but we were detached anyway, we don't care. */
> +	WARN_ON_ONCE(!ves.detached);
>  }
> 
>  /*
> --
> 2.52.0
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Lorenzo Stoakes 2 weeks, 2 days ago
On Fri, Jan 23, 2026 at 11:02:04AM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> > error (but only when waiting for readers in TASK_KILLABLE state), and
> > having the return value be stored in a stack variable called 'locked' is
> > further confusion.
> >
> > More generally, we are doing a lot of rather finicky things during the
> > acquisition of a state in which readers are excluded and moving out of this
> > state, including tracking whether we are detached or not or whether an
> > error occurred.
> >
> > We are implementing logic in __vma_enter_exclusive_locked() that
> > effectively acts as if 'if one caller calls us do X, if another then do Y',
> > which is very confusing from a control flow perspective.
> >
> > Introducing the shared helper object state helps us avoid this, as we can
> > now handle the 'an error arose but we're detached' condition correctly in
> > both callers - a warning if not detaching, and treating the situation as if
> > no error arose in the case of a VMA detaching.
> >
> > This also acts to help document what's going on and allows us to add some
> > more logical debug asserts.
> >
> > Also update vma_mark_detached() to add a guard clause for the likely
> > 'already detached' state (given we hold the mmap write lock), and add a
> > comment about ephemeral VMA read lock reference count increments to clarify
> > why we are entering/exiting an exclusive locked state here.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
> >  1 file changed, 91 insertions(+), 53 deletions(-)
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index f73221174a8b..75166a43ffa4 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >  #ifdef CONFIG_MMU
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> > +struct vma_exclude_readers_state {
> > +	/* Input parameters. */
> > +	struct vm_area_struct *vma;
> > +	int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> > +	bool detaching;
> > +
> > +	bool detached;
> > +	bool exclusive; /* Are we exclusively locked? */
> > +};
> > +
> >  /*
> >   * Now that all readers have been evicted, mark the VMA as being out of the
> >   * 'exclude readers' state.
> >   *
> >   * Returns true if the VMA is now detached, otherwise false.
> >   */
> > -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
> >  {
> > -	bool detached;
> > +	struct vm_area_struct *vma = ves->vma;
> > +
> > +	VM_WARN_ON_ONCE(ves->detached);
> > +	VM_WARN_ON_ONCE(!ves->exclusive);
>
> I think this will triger when called on wait failure from
> __vma_enter_exclusive_locked(). Given the other things Suren raised about
> the field, I wonder if it's worth keeping it?

He was suggesting I use ves->exclusive over ves->detached? I've now actioned
those changes so... yeah, not dropping that.

But you're right that assert is wrong, removed.

>
> > -	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > -					 &vma->vm_refcnt);
> > +	ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > +					      &vma->vm_refcnt);
> >  	__vma_lockdep_release_exclusive(vma);
> > -	return detached;
> > +}
> > +
>
> > @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> >
> >  void vma_mark_detached(struct vm_area_struct *vma)
> >  {
> > -	bool detached;
> > +	struct vma_exclude_readers_state ves = {
> > +		.vma = vma,
> > +		.state = TASK_UNINTERRUPTIBLE,
> > +		.detaching = true,
> > +	};
> > +	int err;
> >
> >  	vma_assert_write_locked(vma);
> >  	vma_assert_attached(vma);
> > @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
> >  	 * See the comment describing the vm_area_struct->vm_refcnt field for
> >  	 * details of possible refcnt values.
> >  	 */
> > -	detached = __vma_refcount_put(vma, NULL);
> > -	if (unlikely(!detached)) {
> > -		/* Wait until vma is detached with no readers. */
> > -		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > -			/*
> > -			 * Once this is complete, no readers can increment the
> > -			 * reference count, and the VMA is marked detached.
> > -			 */
> > -			detached = __vma_exit_exclusive_locked(vma);
> > -			WARN_ON_ONCE(!detached);
> > -		}
> > +	if (likely(__vma_refcount_put(vma, NULL)))
> > +		return;
>
> Seems to me it would be worthwhile splitting this function to an
> static-inline-in-header vma_mark_detached() that does only the asserts and
> __vma_refcount_put(), and keeping the function here as __vma_mark_detached()
> (or maybe differently named since the detaching kinda already happened with
> the refcount put... __vma_mark_detached_finish()?) handling the rare case
> __vma_refcount_put() returns false.

Yeah good idea, that saves us always having the ves state etc. too and separates
it out nicely.

Have called it __vma_exclude_readers_for_detach(), and made the change.
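The agreed split might look roughly like the following userspace sketch. Everything here is a model under assumptions: the refcount is a plain C11 atomic rather than refcount_t, the asserts and the exclude-readers wait are elided, and only the slow-path function name comes from this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the VMA and its refcount. */
struct vm_area_struct { atomic_uint vm_refcnt; };

static int slow_path_calls;

/* Models __vma_refcount_put(): drop the attach reference and report
 * whether the count reached zero (fully detached, no transient readers). */
static bool vma_refcount_put_model(struct vm_area_struct *vma)
{
	return atomic_fetch_sub(&vma->vm_refcnt, 1) == 1;
}

/* The rare out-of-line case (named __vma_exclude_readers_for_detach() in
 * the follow-up); here it only records that it ran. */
static void __vma_exclude_readers_for_detach(struct vm_area_struct *vma)
{
	(void)vma;
	slow_path_calls++;
}

/* The proposed static-inline-in-header fast path. */
static inline void vma_mark_detached(struct vm_area_struct *vma)
{
	if (vma_refcount_put_model(vma))  /* likely: no transient readers */
		return;
	__vma_exclude_readers_for_detach(vma);
}
```

The point of the split is that the common case (no transient readers, count drops straight to zero) never constructs the helper state or leaves the header at all.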

>
> > +
> > +	/*
> > +	 * Wait until the VMA is detached with no readers. Since we hold the VMA
> > +	 * write lock, the only read locks that might be present are those from
> > +	 * threads trying to acquire the read lock and incrementing the
> > +	 * reference count before realising the write lock is held and
> > +	 * decrementing it.
> > +	 */
> > +	err = __vma_enter_exclusive_locked(&ves);
> > +	if (!err && !ves.detached) {
> > +		/*
> > +		 * Once this is complete, no readers can increment the
> > +		 * reference count, and the VMA is marked detached.
> > +		 */
> > +		__vma_exit_exclusive_locked(&ves);
> >  	}
> > +	/* If an error arose but we were detached anyway, we don't care. */
> > +	WARN_ON_ONCE(!ves.detached);
> >  }
> >
> >  /*
> > --
> > 2.52.0
>

Cheers, Lorenzo
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Suren Baghdasaryan 2 weeks, 3 days ago
On Thu, Jan 22, 2026 at 5:03 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> error (but only when waiting for readers in TASK_KILLABLE state), and
> having the return value be stored in a stack variable called 'locked' is
> further confusion.
>
> More generally, we are doing a lot of rather finicky things during the
> acquisition of a state in which readers are excluded and moving out of this
> state, including tracking whether we are detached or not or whether an
> error occurred.
>
> We are implementing logic in __vma_enter_exclusive_locked() that
> effectively acts as if 'if one caller calls us do X, if another then do Y',
> which is very confusing from a control flow perspective.
>
> Introducing the shared helper object state helps us avoid this, as we can
> now handle the 'an error arose but we're detached' condition correctly in
> both callers - a warning if not detaching, and treating the situation as if
> no error arose in the case of a VMA detaching.
>
> This also acts to help document what's going on and allows us to add some
> more logical debug asserts.
>
> Also update vma_mark_detached() to add a guard clause for the likely
> 'already detached' state (given we hold the mmap write lock), and add a
> comment about ephemeral VMA read lock reference count increments to clarify
> why we are entering/exiting an exclusive locked state here.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 91 insertions(+), 53 deletions(-)
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index f73221174a8b..75166a43ffa4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
>
> +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> +struct vma_exclude_readers_state {
> +       /* Input parameters. */
> +       struct vm_area_struct *vma;
> +       int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> +       bool detaching;
> +
Are these:
            /* Output parameters. */
?
> +       bool detached;
> +       bool exclusive; /* Are we exclusively locked? */
> +};
> +
>  /*
>   * Now that all readers have been evicted, mark the VMA as being out of the
>   * 'exclude readers' state.
>   *
>   * Returns true if the VMA is now detached, otherwise false.
>   */
> -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
>  {
> -       bool detached;
> +       struct vm_area_struct *vma = ves->vma;
> +
> +       VM_WARN_ON_ONCE(ves->detached);
> +       VM_WARN_ON_ONCE(!ves->exclusive);
>
> -       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -                                        &vma->vm_refcnt);
> +       ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +                                             &vma->vm_refcnt);
>         __vma_lockdep_release_exclusive(vma);
> -       return detached;
> +}
> +
> +static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
> +{
> +       const unsigned int tgt = ves->detaching ? 0 : 1;
> +
> +       return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
>  }
>
>  /*
> @@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
>   * Note that this function pairs with vma_refcount_put() which will wake up this
>   * thread when it detects that the last reader has released its lock.
>   *
> - * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> - * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> - * is permitted to kill it.
> + * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
> + * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
> + * signal is permitted to kill it.
>   *
> - * The function will return 0 immediately if the VMA is detached, and 1 once the
> - * VMA has evicted all readers, leaving the VMA exclusively locked.
> + * The function sets the ves->locked parameter to true if an exclusive lock was

s/ves->locked/ves->exclusive

> + * acquired, or false if the VMA was detached or an error arose on wait.
>   *
> - * If the function returns 1, the caller is required to invoke
> - * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> + * If the function indicates an exclusive lock was acquired via ves->exclusive
> + * (or equivalently, returning 0 with !ves->detached),

I would remove the mention of that equivalence because with this
change, return 0 simply indicates that the operation was successful
and should not be used to infer any additional states. To get specific
state the caller should use proper individual ves fields. Using return
value for anything else defeats the whole purpose of this cleanup.

> the caller is required to
> + * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
> + * required.
>   *
> - * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> - * may also return -EINTR to indicate a fatal signal was received while waiting.
> + * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
> + * function may also return -EINTR to indicate a fatal signal was received while
> + * waiting.
>   */
> -static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> -               bool detaching, int state)
> +static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
>  {
> -       int err;
> -       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
> +       struct vm_area_struct *vma = ves->vma;
> +       unsigned int tgt_refcnt = get_target_refcnt(ves);
> +       int err = 0;
>
>         mmap_assert_write_locked(vma->vm_mm);
> -
> -       /* Additional refcnt if the vma is attached. */
> -       if (!detaching)
> -               tgt_refcnt++;
> +       VM_WARN_ON_ONCE(ves->detached);
> +       VM_WARN_ON_ONCE(ves->exclusive);

Aren't these output parameters? If so, why do we stipulate their
initial values instead of setting them appropriately?

>
>         /*
>          * If vma is detached then only vma_mark_attached() can raise the
> @@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
>          * See the comment describing the vm_area_struct->vm_refcnt field for
>          * details of possible refcnt values.
>          */
> -       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> +               ves->detached = true;
>                 return 0;
> +       }
>
>         __vma_lockdep_acquire_exclusive(vma);
>         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> -                  state);
> +                  ves->state);
>         if (err) {
> -               if (__vma_exit_exclusive_locked(vma)) {
> -                       /*
> -                        * The wait failed, but the last reader went away
> -                        * as well. Tell the caller the VMA is detached.
> -                        */
> -                       WARN_ON_ONCE(!detaching);
> -                       err = 0;
> -               }
> +               __vma_exit_exclusive_locked(ves);
>                 return err;

Nice! We preserve both error and detached state information.

>         }
> -       __vma_lockdep_stat_mark_acquired(vma);
>
> -       return 1;
> +       __vma_lockdep_stat_mark_acquired(vma);
> +       ves->exclusive = true;
> +       return 0;
>  }
>
>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>                 int state)
>  {
> -       int locked;
> +       int err;
> +       struct vma_exclude_readers_state ves = {
> +               .vma = vma,
> +               .state = state,
> +       };
>
> -       locked = __vma_enter_exclusive_locked(vma, false, state);
> -       if (locked < 0)
> -               return locked;
> +       err = __vma_enter_exclusive_locked(&ves);
> +       if (err) {
> +               WARN_ON_ONCE(ves.detached);

I believe the above WARN_ON_ONCE() should stay inside of
__vma_enter_exclusive_locked(). Its correctness depends on the
implementation details of __vma_enter_exclusive_locked(). More
specifically, it is only correct because
__vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
if there was a pending SIGKILL.

> +               return err;
> +       }
>
>         /*
>          * We should use WRITE_ONCE() here because we can have concurrent reads
> @@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>          */
>         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>
> -       /* vma should remain attached. */
> -       if (locked)
> -               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> +       if (!ves.detached) {

Strictly speaking the above check should be checking ves->exclusive
instead of !ves.detached. What you have is technically correct but
it's again related to that comment:

"If the function indicates an exclusive lock was acquired via
ves->exclusive (or equivalently, returning 0 with !ves->detached), the
caller is required to invoke __vma_exit_exclusive_locked() once the
exclusive state is no longer required."

So, here you are using returning 0 with !ves->detached as an
indication that the VMA was successfully locked. I think it's less
confusing if we use the field dedicated for that purpose.

> +               __vma_exit_exclusive_locked(&ves);
> +               /* VMA should remain attached. */
> +               WARN_ON_ONCE(ves.detached);
> +       }
>
>         return 0;
>  }
> @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
>
>  void vma_mark_detached(struct vm_area_struct *vma)
>  {
> -       bool detached;
> +       struct vma_exclude_readers_state ves = {
> +               .vma = vma,
> +               .state = TASK_UNINTERRUPTIBLE,
> +               .detaching = true,
> +       };
> +       int err;
>
>         vma_assert_write_locked(vma);
>         vma_assert_attached(vma);
> @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
>          * See the comment describing the vm_area_struct->vm_refcnt field for
>          * details of possible refcnt values.
>          */
> -       detached = __vma_refcount_put(vma, NULL);
> -       if (unlikely(!detached)) {
> -               /* Wait until vma is detached with no readers. */
> -               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> -                       /*
> -                        * Once this is complete, no readers can increment the
> -                        * reference count, and the VMA is marked detached.
> -                        */
> -                       detached = __vma_exit_exclusive_locked(vma);
> -                       WARN_ON_ONCE(!detached);
> -               }
> +       if (likely(__vma_refcount_put(vma, NULL)))
> +               return;
> +
> +       /*
> +        * Wait until the VMA is detached with no readers. Since we hold the VMA
> +        * write lock, the only read locks that might be present are those from
> +        * threads trying to acquire the read lock and incrementing the
> +        * reference count before realising the write lock is held and
> +        * decrementing it.
> +        */
> +       err = __vma_enter_exclusive_locked(&ves);
> +       if (!err && !ves.detached) {

Same here, we should be checking ves->exclusive to decide if
__vma_exit_exclusive_locked() should be called or not.

> +               /*
> +                * Once this is complete, no readers can increment the
> +                * reference count, and the VMA is marked detached.
> +                */
> +               __vma_exit_exclusive_locked(&ves);
>         }
> +       /* If an error arose but we were detached anyway, we don't care. */
> +       WARN_ON_ONCE(!ves.detached);
>  }
>
>  /*
> --
> 2.52.0
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Lorenzo Stoakes 2 weeks, 2 days ago
On Thu, Jan 22, 2026 at 01:41:39PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:03 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> > error (but only when waiting for readers in TASK_KILLABLE state), and
> > having the return value be stored in a stack variable called 'locked' is
> > further confusion.
> >
> > More generally, we are doing a lot of rather finicky things during the
> > acquisition of a state in which readers are excluded and moving out of this
> > state, including tracking whether we are detached or not or whether an
> > error occurred.
> >
> > We are implementing logic in __vma_enter_exclusive_locked() that
> > effectively acts as if 'if one caller calls us do X, if another then do Y',
> > which is very confusing from a control flow perspective.
> >
> > Introducing the shared helper object state helps us avoid this, as we can
> > now handle the 'an error arose but we're detached' condition correctly in
> > both callers - a warning if not detaching, and treating the situation as if
> > no error arose in the case of a VMA detaching.
> >
> > This also acts to help document what's going on and allows us to add some
> > more logical debug asserts.
> >
> > Also update vma_mark_detached() to add a guard clause for the likely
> > 'already detached' state (given we hold the mmap write lock), and add a
> > comment about ephemeral VMA read lock reference count increments to clarify
> > why we are entering/exiting an exclusive locked state here.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
> >  1 file changed, 91 insertions(+), 53 deletions(-)
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index f73221174a8b..75166a43ffa4 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >  #ifdef CONFIG_MMU
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> > +struct vma_exclude_readers_state {
> > +       /* Input parameters. */
> > +       struct vm_area_struct *vma;
> > +       int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> > +       bool detaching;
> > +
> Are these:
>             /* Output parameters. */
> ?

Yurp.

Oh you added the comment :) well clearly I should have, will do so.

> > +       bool detached;
> > +       bool exclusive; /* Are we exclusively locked? */
> > +};
> > +
> >  /*
> >   * Now that all readers have been evicted, mark the VMA as being out of the
> >   * 'exclude readers' state.
> >   *
> >   * Returns true if the VMA is now detached, otherwise false.
> >   */
> > -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
> >  {
> > -       bool detached;
> > +       struct vm_area_struct *vma = ves->vma;
> > +
> > +       VM_WARN_ON_ONCE(ves->detached);
> > +       VM_WARN_ON_ONCE(!ves->exclusive);
> >
> > -       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > -                                        &vma->vm_refcnt);
> > +       ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > +                                             &vma->vm_refcnt);
> >         __vma_lockdep_release_exclusive(vma);
> > -       return detached;
> > +}
> > +
> > +static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
> > +{
> > +       const unsigned int tgt = ves->detaching ? 0 : 1;
> > +
> > +       return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
> >  }
> >
> >  /*
> > @@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> >   * Note that this function pairs with vma_refcount_put() which will wake up this
> >   * thread when it detects that the last reader has released its lock.
> >   *
> > - * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> > - * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> > - * is permitted to kill it.
> > + * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
> > + * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
> > + * signal is permitted to kill it.
> >   *
> > - * The function will return 0 immediately if the VMA is detached, and 1 once the
> > - * VMA has evicted all readers, leaving the VMA exclusively locked.
> > + * The function sets the ves->locked parameter to true if an exclusive lock was
>
> s/ves->locked/ves->exclusive
>
> > + * acquired, or false if the VMA was detached or an error arose on wait.
> >   *
> > - * If the function returns 1, the caller is required to invoke
> > - * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> > + * If the function indicates an exclusive lock was acquired via ves->exclusive
> > + * (or equivalently, returning 0 with !ves->detached),
>
> I would remove the mention of that equivalence because with this
> change, return 0 simply indicates that the operation was successful
> and should not be used to infer any additional states. To get specific
> state the caller should use proper individual ves fields. Using return
> value for anything else defeats the whole purpose of this cleanup.

OK I'll remove the equivalency comment.

>
> > the caller is required to
> > + * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
> > + * required.
> >   *
> > - * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> > - * may also return -EINTR to indicate a fatal signal was received while waiting.
> > + * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
> > + * function may also return -EINTR to indicate a fatal signal was received while
> > + * waiting.
> >   */
> > -static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> > -               bool detaching, int state)
> > +static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
> >  {
> > -       int err;
> > -       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
> > +       struct vm_area_struct *vma = ves->vma;
> > +       unsigned int tgt_refcnt = get_target_refcnt(ves);
> > +       int err = 0;
> >
> >         mmap_assert_write_locked(vma->vm_mm);
> > -
> > -       /* Additional refcnt if the vma is attached. */
> > -       if (!detaching)
> > -               tgt_refcnt++;
> > +       VM_WARN_ON_ONCE(ves->detached);
> > +       VM_WARN_ON_ONCE(ves->exclusive);
>
> Aren't these output parameters? If so, why do we stipulate their
> initial values instead of setting them appropriately?

I guess it was just to ensure correctly set up but yes it's a bit weird, will
remove.

>
> >
> >         /*
> >          * If vma is detached then only vma_mark_attached() can raise the
> > @@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> >          * See the comment describing the vm_area_struct->vm_refcnt field for
> >          * details of possible refcnt values.
> >          */
> > -       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> > +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> > +               ves->detached = true;
> >                 return 0;
> > +       }
> >
> >         __vma_lockdep_acquire_exclusive(vma);
> >         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
> >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> > -                  state);
> > +                  ves->state);
> >         if (err) {
> > -               if (__vma_exit_exclusive_locked(vma)) {
> > -                       /*
> > -                        * The wait failed, but the last reader went away
> > -                        * as well. Tell the caller the VMA is detached.
> > -                        */
> > -                       WARN_ON_ONCE(!detaching);
> > -                       err = 0;
> > -               }
> > +               __vma_exit_exclusive_locked(ves);
> >                 return err;
>
> Nice! We preserve both error and detached state information.

:)

>
> >         }
> > -       __vma_lockdep_stat_mark_acquired(vma);
> >
> > -       return 1;
> > +       __vma_lockdep_stat_mark_acquired(vma);
> > +       ves->exclusive = true;
> > +       return 0;
> >  }
> >
> >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >                 int state)
> >  {
> > -       int locked;
> > +       int err;
> > +       struct vma_exclude_readers_state ves = {
> > +               .vma = vma,
> > +               .state = state,
> > +       };
> >
> > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > -       if (locked < 0)
> > -               return locked;
> > +       err = __vma_enter_exclusive_locked(&ves);
> > +       if (err) {
> > +               WARN_ON_ONCE(ves.detached);
>
> I believe the above WARN_ON_ONCE() should stay inside of
> __vma_enter_exclusive_locked(). Its correctness depends on the
> implementation details of __vma_enter_exclusive_locked(). More

Well this was kind of horrible in the original implementation, as you are
literally telling the function whether you are detaching or not, and only doing
this assert if you were not.

That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
smell, you should have X do the thing.

> specifically, it is only correct because
> __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> if there was a pending SIGKILL.

Well it's a documented aspect of the function that we return 0 immediately on
detached state so I'm not sure that is an implementation detail?

I significantly prefer having that here vs. 'if not detaching then assert if
detached' for people to scratch their heads over in the function.

I think this detail is incorrect anyway, because:

	if (err) {
		if (__vma_exit_exclusive_locked(vma)) {
			/*
			 * The wait failed, but the last reader went away
			 * as well. Tell the caller the VMA is detached.
			 */
			 WARN_ON_ONCE(!detaching);
			 err = 0;
		}
		...
	}

Implies - hey we're fine with err not being zero AND detaching right? In which
case reset the error?

Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
see an error?

Or do we?

Either way it's something we handle differently based on _caller_. So it doesn't
belong in the function at all.

It's certainly logic that's highly confusing and needs to be handled
differently.

>
> > +               return err;
> > +       }
> >
> >         /*
> >          * We should use WRITE_ONCE() here because we can have concurrent reads
> > @@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >          */
> >         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> >
> > -       /* vma should remain attached. */
> > -       if (locked)
> > -               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> > +       if (!ves.detached) {
>
> Strictly speaking the above check should be checking ves->exclusive
> instead of !ves.detached. What you have is technically correct but
> it's again related to that comment:
>
> "If the function indicates an exclusive lock was acquired via
> ves->exclusive (or equivalently, returning 0 with !ves->detached), the
> caller is required to invoke __vma_exit_exclusive_locked() once the
> exclusive state is no longer required."
>
> So, here you are using returning 0 with !ves->detached as an
> indication that the VMA was successfully locked. I think it's less
> confusing if we use the field dedicated for that purpose.

OK changed.

>
> > +               __vma_exit_exclusive_locked(&ves);
> > +               /* VMA should remain attached. */
> > +               WARN_ON_ONCE(ves.detached);
> > +       }
> >
> >         return 0;
> >  }
> > @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> >
> >  void vma_mark_detached(struct vm_area_struct *vma)
> >  {
> > -       bool detached;
> > +       struct vma_exclude_readers_state ves = {
> > +               .vma = vma,
> > +               .state = TASK_UNINTERRUPTIBLE,
> > +               .detaching = true,
> > +       };
> > +       int err;
> >
> >         vma_assert_write_locked(vma);
> >         vma_assert_attached(vma);
> > @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
> >          * See the comment describing the vm_area_struct->vm_refcnt field for
> >          * details of possible refcnt values.
> >          */
> > -       detached = __vma_refcount_put(vma, NULL);
> > -       if (unlikely(!detached)) {
> > -               /* Wait until vma is detached with no readers. */
> > -               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > -                       /*
> > -                        * Once this is complete, no readers can increment the
> > -                        * reference count, and the VMA is marked detached.
> > -                        */
> > -                       detached = __vma_exit_exclusive_locked(vma);
> > -                       WARN_ON_ONCE(!detached);
> > -               }
> > +       if (likely(__vma_refcount_put(vma, NULL)))
> > +               return;
> > +
> > +       /*
> > +        * Wait until the VMA is detached with no readers. Since we hold the VMA
> > +        * write lock, the only read locks that might be present are those from
> > +        * threads trying to acquire the read lock and incrementing the
> > +        * reference count before realising the write lock is held and
> > +        * decrementing it.
> > +        */
> > +       err = __vma_enter_exclusive_locked(&ves);
> > +       if (!err && !ves.detached) {
>
> Same here, we should be checking ves->exclusive to decide if
> __vma_exit_exclusive_locked() should be called or not.

Ack, changed.

>
> > +               /*
> > +                * Once this is complete, no readers can increment the
> > +                * reference count, and the VMA is marked detached.
> > +                */
> > +               __vma_exit_exclusive_locked(&ves);
> >         }
> > +       /* If an error arose but we were detached anyway, we don't care. */
> > +       WARN_ON_ONCE(!ves.detached);
> >  }
> >
> >  /*
> > --
> > 2.52.0

Cheers, Lorenzo
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Suren Baghdasaryan 2 weeks, 2 days ago
On Fri, Jan 23, 2026 at 9:59 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Thu, Jan 22, 2026 at 01:41:39PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 22, 2026 at 5:03 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> > > error (but only when waiting for readers in TASK_KILLABLE state), and
> > > having the return value be stored in a stack variable called 'locked' is
> > > further confusion.
> > >
> > > More generally, we are doing a lot of rather finicky things during the
> > > acquisition of a state in which readers are excluded and moving out of this
> > > state, including tracking whether we are detached or not or whether an
> > > error occurred.
> > >
> > > We are implementing logic in __vma_enter_exclusive_locked() that
> > > effectively acts as if 'if one caller calls us do X, if another then do Y',
> > > which is very confusing from a control flow perspective.
> > >
> > > Introducing the shared helper object state helps us avoid this, as we can
> > > now handle the 'an error arose but we're detached' condition correctly in
> > > both callers - a warning if not detaching, and treating the situation as if
> > > no error arose in the case of a VMA detaching.
> > >
> > > This also acts to help document what's going on and allows us to add some
> > > more logical debug asserts.
> > >
> > > Also update vma_mark_detached() to add a guard clause for the likely
> > > 'already detached' state (given we hold the mmap write lock), and add a
> > > comment about ephemeral VMA read lock reference count increments to clarify
> > > why we are entering/exiting an exclusive locked state here.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > ---
> > >  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
> > >  1 file changed, 91 insertions(+), 53 deletions(-)
> > >
> > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > > index f73221174a8b..75166a43ffa4 100644
> > > --- a/mm/mmap_lock.c
> > > +++ b/mm/mmap_lock.c
> > > @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> > >  #ifdef CONFIG_MMU
> > >  #ifdef CONFIG_PER_VMA_LOCK
> > >
> > > +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> > > +struct vma_exclude_readers_state {
> > > +       /* Input parameters. */
> > > +       struct vm_area_struct *vma;
> > > +       int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> > > +       bool detaching;
> > > +
> > Are these:
> >             /* Output parameters. */
> > ?
>
> Yurp.
>
> Oh you added the comment :) well clearly I should have, will do so.
>
> > > +       bool detached;
> > > +       bool exclusive; /* Are we exclusively locked? */
> > > +};
> > > +
> > >  /*
> > >   * Now that all readers have been evicted, mark the VMA as being out of the
> > >   * 'exclude readers' state.
> > >   *
> > >   * Returns true if the VMA is now detached, otherwise false.
> > >   */
> > > -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > > +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
> > >  {
> > > -       bool detached;
> > > +       struct vm_area_struct *vma = ves->vma;
> > > +
> > > +       VM_WARN_ON_ONCE(ves->detached);
> > > +       VM_WARN_ON_ONCE(!ves->exclusive);
> > >
> > > -       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > > -                                        &vma->vm_refcnt);
> > > +       ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > > +                                             &vma->vm_refcnt);
> > >         __vma_lockdep_release_exclusive(vma);
> > > -       return detached;
> > > +}
> > > +
> > > +static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
> > > +{
> > > +       const unsigned int tgt = ves->detaching ? 0 : 1;
> > > +
> > > +       return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
> > >  }
> > >
> > >  /*
> > > @@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > >   * Note that this function pairs with vma_refcount_put() which will wake up this
> > >   * thread when it detects that the last reader has released its lock.
> > >   *
> > > - * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> > > - * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> > > - * is permitted to kill it.
> > > + * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
> > > + * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
> > > + * signal is permitted to kill it.
> > >   *
> > > - * The function will return 0 immediately if the VMA is detached, and 1 once the
> > > - * VMA has evicted all readers, leaving the VMA exclusively locked.
> > > + * The function sets the ves->locked parameter to true if an exclusive lock was
> >
> > s/ves->locked/ves->exclusive
> >
> > > + * acquired, or false if the VMA was detached or an error arose on wait.
> > >   *
> > > - * If the function returns 1, the caller is required to invoke
> > > - * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> > > + * If the function indicates an exclusive lock was acquired via ves->exclusive
> > > + * (or equivalently, returning 0 with !ves->detached),
> >
> > I would remove the mention of that equivalence because with this
> > change, return 0 simply indicates that the operation was successful
> > and should not be used to infer any additional states. To get specific
> > state the caller should use proper individual ves fields. Using return
> > value for anything else defeats the whole purpose of this cleanup.
>
> OK I'll remove the equivalency comment.
>
> >
> > > the caller is required to
> > > + * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
> > > + * required.
> > >   *
> > > - * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> > > - * may also return -EINTR to indicate a fatal signal was received while waiting.
> > > + * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
> > > + * function may also return -EINTR to indicate a fatal signal was received while
> > > + * waiting.
> > >   */
> > > -static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> > > -               bool detaching, int state)
> > > +static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
> > >  {
> > > -       int err;
> > > -       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
> > > +       struct vm_area_struct *vma = ves->vma;
> > > +       unsigned int tgt_refcnt = get_target_refcnt(ves);
> > > +       int err = 0;
> > >
> > >         mmap_assert_write_locked(vma->vm_mm);
> > > -
> > > -       /* Additional refcnt if the vma is attached. */
> > > -       if (!detaching)
> > > -               tgt_refcnt++;
> > > +       VM_WARN_ON_ONCE(ves->detached);
> > > +       VM_WARN_ON_ONCE(ves->exclusive);
> >
> > Aren't these output parameters? If so, why do we stipulate their
> > initial values instead of setting them appropriately?
>
> I guess it was just to ensure correctly set up but yes it's a bit weird, will
> remove.
>
> >
> > >
> > >         /*
> > >          * If vma is detached then only vma_mark_attached() can raise the
> > > @@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> > >          * See the comment describing the vm_area_struct->vm_refcnt field for
> > >          * details of possible refcnt values.
> > >          */
> > > -       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> > > +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> > > +               ves->detached = true;
> > >                 return 0;
> > > +       }
> > >
> > >         __vma_lockdep_acquire_exclusive(vma);
> > >         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
> > >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> > > -                  state);
> > > +                  ves->state);
> > >         if (err) {
> > > -               if (__vma_exit_exclusive_locked(vma)) {
> > > -                       /*
> > > -                        * The wait failed, but the last reader went away
> > > -                        * as well. Tell the caller the VMA is detached.
> > > -                        */
> > > -                       WARN_ON_ONCE(!detaching);
> > > -                       err = 0;
> > > -               }
> > > +               __vma_exit_exclusive_locked(ves);
> > >                 return err;
> >
> > Nice! We preserve both error and detached state information.
>
> :)
>
> >
> > >         }
> > > -       __vma_lockdep_stat_mark_acquired(vma);
> > >
> > > -       return 1;
> > > +       __vma_lockdep_stat_mark_acquired(vma);
> > > +       ves->exclusive = true;
> > > +       return 0;
> > >  }
> > >
> > >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > >                 int state)
> > >  {
> > > -       int locked;
> > > +       int err;
> > > +       struct vma_exclude_readers_state ves = {
> > > +               .vma = vma,
> > > +               .state = state,
> > > +       };
> > >
> > > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > > -       if (locked < 0)
> > > -               return locked;
> > > +       err = __vma_enter_exclusive_locked(&ves);
> > > +       if (err) {
> > > +               WARN_ON_ONCE(ves.detached);
> >
> > I believe the above WARN_ON_ONCE() should stay inside of
> > __vma_enter_exclusive_locked(). Its correctness depends on the
> > implementation details of __vma_enter_exclusive_locked(). More
>
> Well this was kind of horrible in the original implementation, as you are
> literally telling the function whether you are detaching or not, and only doing
> this assert if you were not.
>
> That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
> smell, you should have X do the thing.
>
> > specifically, it is only correct because
> > __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> > if there was a pending SIGKILL.
>
> Well it's a documented aspect of the function that we return 0 immediately on
> detached state so I'm not sure that is an implementation detail?
>
> I significantly prefer having that here vs. 'if not detaching then assert if
> detached' for people to scratch their heads over in the function.
>
> I think this detail is incorrect anyway, because:
>
>         if (err) {
>                 if (__vma_exit_exclusive_locked(vma)) {
>                         /*
>                          * The wait failed, but the last reader went away
>                          * as well. Tell the caller the VMA is detached.
>                          */
>                          WARN_ON_ONCE(!detaching);
>                          err = 0;
>                 }
>                 ...
>         }
>
> Implies - hey we're fine with err not being zero AND detaching right? In which
> case reset the error?
>
> Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
> see an error?
>
> Or do we?
>
> Either way it's something we handle differently based on _caller_. So it doesn't
> belong in the function at all.
>
> It's certainly logic that's highly confusing and needs to be handled
> differently.

Just to be clear, I'm not defending the way it is done before your
change, however the old check for "if not detaching then assert if
detached" makes more sense to me than "if
__vma_enter_exclusive_locked() failed, assert that the VMA is still
attached". The latter one does not make logical sense to me. It's only
correct because of the implementation detail of
__vma_enter_exclusive_locked().

Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Lorenzo Stoakes 2 weeks, 2 days ago
On Fri, Jan 23, 2026 at 11:34:25AM -0800, Suren Baghdasaryan wrote:
> > > >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > > >                 int state)
> > > >  {
> > > > -       int locked;
> > > > +       int err;
> > > > +       struct vma_exclude_readers_state ves = {
> > > > +               .vma = vma,
> > > > +               .state = state,
> > > > +       };
> > > >
> > > > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > > > -       if (locked < 0)
> > > > -               return locked;
> > > > +       err = __vma_enter_exclusive_locked(&ves);
> > > > +       if (err) {
> > > > +               WARN_ON_ONCE(ves.detached);
> > >
> > > I believe the above WARN_ON_ONCE() should stay inside of
> > > __vma_enter_exclusive_locked(). Its correctness depends on the
> > > implementation details of __vma_enter_exclusive_locked(). More
> >
> > Well this was kind of horrible in the original implementation, as you are
> > literally telling the function whether you are detaching or not, and only doing
> > this assert if you were not.
> >
> > That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
> > smell, you should have X do the thing.
> >
> > > specifically, it is only correct because
> > > __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> > > if there was a pending SIGKILL.
> >
> > Well it's a documented aspect of the function that we return 0 immediately on
> > detached state so I'm not sure that is an implementation detail?
> >
> > I significantly prefer having that here vs. 'if not detaching then assert if
> > detached' for people to scratch their heads over in the function.
> >
> > I think this detail is incorrect anyway, because:
> >
> >         if (err) {
> >                 if (__vma_exit_exclusive_locked(vma)) {
> >                         /*
> >                          * The wait failed, but the last reader went away
> >                          * as well. Tell the caller the VMA is detached.
> >                          */
> >                          WARN_ON_ONCE(!detaching);
> >                          err = 0;
> >                 }
> >                 ...
> >         }
> >
> > Implies - hey we're fine with err not being zero AND detaching right? In which
> > case reset the error?
> >
> > Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
> > see an error?
> >
> > Or do we?
> >
> > Either way it's something we handle differently based on _caller_. So it doesn't
> > belong in the function at all.
> >
> > It's certainly logic that's highly confusing and needs to be handled
> > differently.
>
> Just to be clear, I'm not defending the way it is done before your
> change, however the old check for "if not detaching then assert if

I mean you basically are since here I am trying to change it and you're
telling me not to, so you are definitely defending this.

> detached" makes more sense to me than "if
> __vma_enter_exclusive_locked() failed, assert that the VMA is still
> attached". The latter one does not make logical sense to me. It's only

I don't understand what you're quoting here?

> correct because of the implementation detail of
> __vma_enter_exclusive_locked().

Except that implementation detail no longer exists?


Before:

         if (err) {
                 if (__vma_exit_exclusive_locked(vma)) {
                         /*
                          * The wait failed, but the last reader went away
                          * as well. Tell the caller the VMA is detached.
                          */
                          WARN_ON_ONCE(!detaching);
                          err = 0;
                 }
                 ...
         }

After:

	if (err) {
		__vma_end_exclude_readers(ves);
		return err;
	}

So now each caller receives an error _and decides what to do with it_.

In __vma_exclude_readers_for_detach():


	err = __vma_start_exclude_readers(&ves);
	if (!err && ves.exclusive) {
		...
	}
	/* If an error arose but we were detached anyway, we don't care. */
	WARN_ON_ONCE(!ves.detached);

Right that's pretty clear? We expect to be detached no matter what, and the
comment points out that, yeah, err could result in detachment.

In the __vma_start_write() path:

	err = __vma_start_exclude_readers(&ves);
	if (err) {
		WARN_ON_ONCE(ves.detached);
		return err;
	}

I mean, yes we don't expect to be detached when we're acquiring a write.

Honestly I've spent the past 6 hours responding to review for a series I
really didn't want to write in the first place, updating and testing
etc. as I go, and I've essentially accepted every single point of feedback.

So I'm a little frustrated at getting stuck on this issue.

So I'm afraid I'm going to send the v4 out as-is and we can have a v5 (or
ideally, a fix-patch) if we have to, but you definitely need to be more
convincing about this.

I might just be wrong and missing the point out of tiredness but, at this
stage, I'm not going to hold up the respin over this.

Thanks, Lorenzo
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Suren Baghdasaryan 2 weeks, 2 days ago
On Fri, Jan 23, 2026 at 12:04 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Jan 23, 2026 at 11:34:25AM -0800, Suren Baghdasaryan wrote:
> > > > >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > > > >                 int state)
> > > > >  {
> > > > > -       int locked;
> > > > > +       int err;
> > > > > +       struct vma_exclude_readers_state ves = {
> > > > > +               .vma = vma,
> > > > > +               .state = state,
> > > > > +       };
> > > > >
> > > > > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > > > > -       if (locked < 0)
> > > > > -               return locked;
> > > > > +       err = __vma_enter_exclusive_locked(&ves);
> > > > > +       if (err) {
> > > > > +               WARN_ON_ONCE(ves.detached);
> > > >
> > > > I believe the above WARN_ON_ONCE() should stay inside of
> > > > __vma_enter_exclusive_locked(). Its correctness depends on the
> > > > implementation details of __vma_enter_exclusive_locked(). More
> > >
> > > Well this was kind of horrible in the original implementation, as you are
> > > literally telling the function whether you are detaching or not, and only doing
> > > this assert if you were not.
> > >
> > > That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
> > > smell, you should have X do the thing.
> > >
> > > > specifically, it is only correct because
> > > > __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> > > > if there was a pending SIGKILL.
> > >
> > > Well it's a documented aspect of the function that we return 0 immediately on
> > > detached state so I'm not sure that is an implementation detail?
> > >
> > > I significantly prefer having that here vs. 'if not detaching then assert if
> > > detached' for people to scratch their heads over in the function.
> > >
> > > I think this detail is incorrect anyway, because:
> > >
> > >         if (err) {
> > >                 if (__vma_exit_exclusive_locked(vma)) {
> > >                         /*
> > >                          * The wait failed, but the last reader went away
> > >                          * as well. Tell the caller the VMA is detached.
> > >                          */
> > >                          WARN_ON_ONCE(!detaching);
> > >                          err = 0;
> > >                 }
> > >                 ...
> > >         }
> > >
> > > Implies - hey we're fine with err not being zero AND detaching right? In which
> > > case reset the error?
> > >
> > > Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
> > > see an error?
> > >
> > > Or do we?
> > >
> > > Either way it's something we handle differently based on _caller_. So it doesn't
> > > belong in the function at all.
> > >
> > > It's certainly logic that's highly confusing and needs to be handled
> > > differently.
> >
> > Just to be clear, I'm not defending the way it is done before your
> > change, however the old check for "if not detaching then assert if
>
> I mean you basically are since here I am trying to change it and you're
> telling me not to, so you are definitely defending this.
>
> > detached" makes more sense to me than "if
> > __vma_enter_exclusive_locked() failed, assert that the VMA is still
> > attached". The latter one does not make logical sense to me. It's only
>
> I don't understand what you're quoting here?
>
> > correct because of the implementation detail of
> > __vma_enter_exclusive_locked().
>
> Except that implementation detail no longer exists?
>
>
> Before:
>
>          if (err) {
>                  if (__vma_exit_exclusive_locked(vma)) {
>                          /*
>                           * The wait failed, but the last reader went away
>                           * as well. Tell the caller the VMA is detached.
>                           */
>                           WARN_ON_ONCE(!detaching);
>                           err = 0;
>                  }
>                  ...
>          }
>
> After:
>
>         if (err) {
>                 __vma_end_exclude_readers(ves);
>                 return err;
>         }
>
> So now each caller receives an error _and decides what to do with it_.
>
> In __vma_exclude_readers_for_detach():
>
>
>         err = __vma_start_exclude_readers(&ves);
>         if (!err && ves.exclusive) {
>                 ...
>         }
>         /* If an error arose but we were detached anyway, we don't care. */
>         WARN_ON_ONCE(!ves.detached);
>
> Right that's pretty clear? We expect to be detached no matter what, and the
> comment points out that, yeah, err could result in detachment.
>
> In the __vma_start_write() path:
>
>         err = __vma_start_exclude_readers(&ves);
>         if (err) {
>                 WARN_ON_ONCE(ves.detached);
>                 return err;
>         }
>
> I mean, yes we don't expect to be detached when we're acquiring a write.
>
> Honestly I've spent the past 6 hours responding to review for a series I
> really didn't want to write in the first place, updating and testing
> etc. as I go, and I've essentially accepted every single point of feedback.
>
> So I'm a little frustrated at getting stuck on this issue.
>
> So I'm afraid I'm going to send the v4 out as-is and we can have a v5 (or
> ideally, a fix-patch) if we have to, but you definitely need to be more
> convincing about this.
>
> I might just be wrong and missing the point out of tiredness but, at this
> stage, I'm not going to hold up the respin over this.

Sorry, I didn't realize I was causing that much trouble and I
understand your frustration.
From your reply, it sounds like you made enough changes to the patch
that my concern might already be obsolete. I'll review the new
submission on Sunday and will provide my feedback.
Thanks,
Suren.

>
> Thanks, Lorenzo
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Lorenzo Stoakes 2 weeks, 1 day ago
On Fri, Jan 23, 2026 at 02:07:43PM -0800, Suren Baghdasaryan wrote:
>
> Sorry, I didn't realize I was causing that much trouble and I
> understand your frustration.
> From your reply, it sounds like you made enough changes to the patch
> that my concern might already be obsolete. I'll review the new
> submission on Sunday and will provide my feedback.
> Thanks,
> Suren.

Apologies for being grumpy, long day :) to be clear I value your and
Vlastimil's feedback very much, and thanks to you both for having taken the
time to review the rework.

Hopefully that's reflected in just how much I've updated the series in
response to both your absolutely valid pointing out of mistakes as well as
suggestions for improvements, I think the series is way better with your
input! (As always with code review - it is just a net positive).

Please do review the new revision with scrutiny and comment on anything you
find that you feel I should update, including this issue, perhaps I simply
misunderstood you, but hopefully you can also see my point of view as to
why I felt it was useful to factor that out.

In general I'm hoping to move away from cleanups and towards meatier series
but as co-maintainer of the VMA locks I felt it really important to make
the VMA locks logic a lot clearer - to not just complain but do something
:)

In general the issue has been around abstraction at the 'intermediate'
level as Vlasta describes it, the public API is fine, so just rearranging
things such that developers coming to the code can build a good mental
model of what's going on.

So hopefully this series helps get us at least a decent way along that road!

Cheers, Lorenzo
Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
Posted by Suren Baghdasaryan 2 weeks ago
On Sat, Jan 24, 2026 at 12:54 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Jan 23, 2026 at 02:07:43PM -0800, Suren Baghdasaryan wrote:
> >
> > Sorry, I didn't realize I was causing that much trouble and I
> > understand your frustration.
> > From your reply, it sounds like you made enough changes to the patch
> > that my concern might already be obsolete. I'll review the new
> > submission on Sunday and will provide my feedback.
> > Thanks,
> > Suren.
>
> Apologies for being grumpy, long day :)

No worries. I get it.

> to be clear I value your and
> Vlastimil's feedback very much, and thanks to you both for having taken the
> time to review the rework.

The cleanup you did may not be the most exciting work you do but it
really aids understanding the code and its intent. Thanks for doing
that!

>
> Hopefully that's reflected in just how much I've updated the series in
> response to both your absolutely valid pointing out of mistakes as well as
> suggestions for improvements, I think the series is way better with your
> input! (As always with code review - it is just a net positive).

That's my goal.

>
> Please do review the new revision with scrutiny and comment on anything you
> find that you feel I should update, including this issue, perhaps I simply
> misunderstood you, but hopefully you can also see my point of view as to
> why I felt it was useful to factor that out.

I started reviewing new patches but need to check the important ones
with fresh eyes. Will finish them tomorrow morning.

>
> In general I'm hoping to move away from cleanups and towards meatier series
> but as co-maintainer of the VMA locks I felt it really important to make
> the VMA locks logic a lot clearer - to not just complain but do something
> :)
>
> In general the issue has been around abstraction at the 'intermediate'
> level as Vlasta describes it, the public API is fine, so just rearranging
> things such that developers coming to the code can build a good mental
> model of what's going on.
>
> So hopefully this series helps get us at least a decent way along that road!

It definitely does. Thanks again for doing this!
Cheers,
Suren.

>
> Cheers, Lorenzo