[PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment

Lorenzo Stoakes posted 10 patches 2 weeks, 4 days ago
There is a newer version of this series
[PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
Posted by Lorenzo Stoakes 2 weeks, 4 days ago
The possible vma->vm_refcnt values are confusing and vague, explain in
detail what these can be in a comment describing the vma->vm_refcnt field
and reference this comment in various places that read/write this field.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
 include/linux/mmap_lock.h |  7 +++++++
 mm/mmap_lock.c            |  6 ++++++
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 94de392ed3c5..e5ee66f84d9a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
  * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
  * vma_start_read() that the reference count should be left alone.
  *
- * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
+ * See the comment describing vm_refcnt in vm_area_struct for details as to
+ * which values the VMA reference count can be.
  */
 #define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
 #define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
@@ -989,7 +990,41 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 #ifdef CONFIG_PER_VMA_LOCK
-	/* Unstable RCU readers are allowed to read this. */
+	/*
+	 * Used to keep track of the number of references taken by VMA read or
+	 * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
+	 * indicating that a thread has entered __vma_enter_locked() and is
+	 * waiting on any outstanding read locks to exit.
+	 *
+	 * This value can be equal to:
+	 *
+	 * 0 - Detached.
+	 *
+	 * 1 - Unlocked or write-locked.
+	 *
+	 * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
+	 * write-locked with other threads having temporarily incremented the
+	 * reference count prior to determining it is write-locked and
+	 * decrementing it again.
+	 *
+	 * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
+	 * __vma_exit_locked() completion which will decrement the reference
+	 * count to zero. IMPORTANT - at this stage no further readers can
+	 * increment the reference count. It can only be reduced.
+	 *
+	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
+	 * __vma_exit_locked() completion which will decrement the reference
+	 * count to one, OR a detached VMA waiting on a single spurious reader
+	 * to decrement reference count. IMPORTANT - as above, no further
+	 * readers can increment the reference count.
+	 *
+	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
+	 * whether it is attempting to acquire a write lock or attempting to
+	 * detach. IMPORTANT - as above, no ruther readers can increment the
+	 * reference count.
+	 *
+	 * NOTE: Unstable RCU readers are allowed to read this.
+	 */
 	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map vmlock_dep_map;
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 5acbd4ba1b52..a764439d0276 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
 	 * attached. Waiting on a detached vma happens only in
 	 * vma_mark_detached() and is a rare case, therefore most of the time
 	 * there will be no unnecessary wakeup.
+	 *
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
 	 */
 	return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
 		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
@@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
 	unsigned int mm_lock_seq;

+	/*
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
+	 */
 	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
 		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 1d23b48552e9..75dc098aea14 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 	/*
 	 * If vma is detached then only vma_mark_attached() can raise the
 	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
+	 *
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
 	 */
 	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
 		return 0;
@@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
 	 * before they check vm_lock_seq, realize the vma is locked and drop
 	 * back the vm_refcnt. That is a narrow window for observing a raised
 	 * vm_refcnt.
+	 *
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
 	 */
 	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
 		/* Wait until vma is detached with no readers. */
--
2.52.0
Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
Posted by Vlastimil Babka 2 weeks, 4 days ago
On 1/22/26 14:01, Lorenzo Stoakes wrote:
> The possible vma->vm_refcnt values are confusing and vague, explain in
> detail what these can be in a comment describing the vma->vm_refcnt field
> and reference this comment in various places that read/write this field.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
try to be as precise as possible, I believe.

> ---
>  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
>  include/linux/mmap_lock.h |  7 +++++++
>  mm/mmap_lock.c            |  6 ++++++
>  3 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 94de392ed3c5..e5ee66f84d9a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
>   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
>   * vma_start_read() that the reference count should be left alone.
>   *
> - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> + * See the comment describing vm_refcnt in vm_area_struct for details as to
> + * which values the VMA reference count can be.
>   */
>  #define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
>  #define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> @@ -989,7 +990,41 @@ struct vm_area_struct {
>  	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
>  #endif
>  #ifdef CONFIG_PER_VMA_LOCK
> -	/* Unstable RCU readers are allowed to read this. */
> +	/*
> +	 * Used to keep track of the number of references taken by VMA read or
> +	 * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set

I wonder about the "or write locks" part. The process of acquiring it uses
VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
refcount? (the sentence could be read it way IMHO) It's vma being attached
that does, AFAIK?

> +	 * indicating that a thread has entered __vma_enter_locked() and is
> +	 * waiting on any outstanding read locks to exit.
> +	 *
> +	 * This value can be equal to:
> +	 *
> +	 * 0 - Detached.

Is it worth saying that readers can't increment the refcount?

> +	 * 1 - Unlocked or write-locked.

"Attached and either unlocked or write-locked." ?

(see how "write-locked" isn't reflected, I argued above)

> +	 *
> +	 * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> +	 * write-locked with other threads having temporarily incremented the
> +	 * reference count prior to determining it is write-locked and
> +	 * decrementing it again.

Ack.

> +	 * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> +	 * __vma_exit_locked() completion which will decrement the reference
> +	 * count to zero. IMPORTANT - at this stage no further readers can
> +	 * increment the reference count. It can only be reduced.
> +	 *
> +	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> +	 * __vma_exit_locked() completion which will decrement the reference
> +	 * count to one, OR a detached VMA waiting on a single spurious reader
> +	 * to decrement reference count. IMPORTANT - as above, no further
> +	 * readers can increment the reference count.
> +	 *
> +	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,

"VMA is waiting" sounds weird? a thread might be, but VMA itself?
(similarly in the previous paragraph)

> +	 * whether it is attempting to acquire a write lock or attempting to
> +	 * detach. IMPORTANT - as above, no ruther readers can increment the
> +	 * reference count.
> +	 *
> +	 * NOTE: Unstable RCU readers are allowed to read this.
> +	 */
>  	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  	struct lockdep_map vmlock_dep_map;
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 5acbd4ba1b52..a764439d0276 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
>  	 * attached. Waiting on a detached vma happens only in
>  	 * vma_mark_detached() and is a rare case, therefore most of the time
>  	 * there will be no unnecessary wakeup.
> +	 *
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
>  	 */
>  	return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
>  		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> @@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
>  {
>  	unsigned int mm_lock_seq;
> 
> +	/*
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
> +	 */
>  	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
>  		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
>  }
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 1d23b48552e9..75dc098aea14 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>  	/*
>  	 * If vma is detached then only vma_mark_attached() can raise the
>  	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
> +	 *
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
>  	 */
>  	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
>  		return 0;
> @@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
>  	 * before they check vm_lock_seq, realize the vma is locked and drop
>  	 * back the vm_refcnt. That is a narrow window for observing a raised
>  	 * vm_refcnt.
> +	 *
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
>  	 */
>  	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
>  		/* Wait until vma is detached with no readers. */
> --
> 2.52.0
Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Thu, Jan 22, 2026 at 05:48:07PM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The possible vma->vm_refcnt values are confusing and vague, explain in
> > detail what these can be in a comment describing the vma->vm_refcnt field
> > and reference this comment in various places that read/write this field.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
> try to be as precise as possible, I believe.

Ack.

>
> > ---
> >  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
> >  include/linux/mmap_lock.h |  7 +++++++
> >  mm/mmap_lock.c            |  6 ++++++
> >  3 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 94de392ed3c5..e5ee66f84d9a 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> >   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
> >   * vma_start_read() that the reference count should be left alone.
> >   *
> > - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> > + * See the comment describing vm_refcnt in vm_area_struct for details as to
> > + * which values the VMA reference count can be.
> >   */
> >  #define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
> >  #define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> > @@ -989,7 +990,41 @@ struct vm_area_struct {
> >  	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
> >  #endif
> >  #ifdef CONFIG_PER_VMA_LOCK
> > -	/* Unstable RCU readers are allowed to read this. */
> > +	/*
> > +	 * Used to keep track of the number of references taken by VMA read or
> > +	 * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
>
> I wonder about the "or write locks" part. The process of acquiring it uses
> VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
> refcount? (the sentence could be read it way IMHO) It's vma being attached
> that does, AFAIK?

Right the intent is to say that, in the process of excluding readers, a write
lock can vary the reference count.

It's a pity we can't describe the refcnt in some sensible, logical way as it's
really being overloaded quite a bit for multiple things.

It really isn't actually keeping track of references (other than read locks
taken).

OK so I have updated this to say:

	 * Used to keep track of firstly, whether the VMA is attached, secondly,
	 * if attached, how many read locks are taken, and thirdly, if the
	 * VM_REFCNT_EXCLUDE_READERS_FLAG is set, whether any read locks held
	 * are currently in the process of being excluded.

>
> > +	 * indicating that a thread has entered __vma_enter_locked() and is
> > +	 * waiting on any outstanding read locks to exit.
> > +	 *
> > +	 * This value can be equal to:
> > +	 *
> > +	 * 0 - Detached.
>
> Is it worth saying that readers can't increment the refcount?

Added, updated to say:

	 * 0 - Detached. IMPORTANT: when the refcnt is zero, readers cannot
	 * increment it.

>
> > +	 * 1 - Unlocked or write-locked.
>
> "Attached and either unlocked or write-locked." ?
>
> (see how "write-locked" isn't reflected, I argued above)

I'm not sure what you mean, a write lock requires the VMA to be attached (or it
bails out on attempted write lock). So it's kind of implicit right?

But yes better to be explicit, so have replaced with your suggestion.

>
> > +	 *
> > +	 * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> > +	 * write-locked with other threads having temporarily incremented the
> > +	 * reference count prior to determining it is write-locked and
> > +	 * decrementing it again.
>
> Ack.
>
> > +	 * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> > +	 * __vma_exit_locked() completion which will decrement the reference
> > +	 * count to zero. IMPORTANT - at this stage no further readers can
> > +	 * increment the reference count. It can only be reduced.
> > +	 *
> > +	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> > +	 * __vma_exit_locked() completion which will decrement the reference
> > +	 * count to one, OR a detached VMA waiting on a single spurious reader
> > +	 * to decrement reference count. IMPORTANT - as above, no further
> > +	 * readers can increment the reference count.
> > +	 *
> > +	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
>
> "VMA is waiting" sounds weird? a thread might be, but VMA itself?
> (similarly in the previous paragraph)

I was trying to make it a bit more succinct that way I think, but agreed
it's unclear. Have replaced with:

	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either write-locking
	 * an attached VMA and has yet to invoke __vma_exit_locked(), OR a
	 * thread is detaching a VMA and is waiting on a single spurious reader
	 * in order to decrement the reference count. IMPORTANT - as above, no
	 * further readers can increment the reference count.
	 *
	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread either
	 * write-locking or detaching a VMA is waiting on readers to
	 * exit. IMPORTANT - as above, no ruther readers can increment the
	 * reference count.

Cheers, Lorenzo
Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
Posted by Suren Baghdasaryan 2 weeks, 4 days ago
On Thu, Jan 22, 2026 at 8:48 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The possible vma->vm_refcnt values are confusing and vague, explain in
> > detail what these can be in a comment describing the vma->vm_refcnt field
> > and reference this comment in various places that read/write this field.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
> try to be as precise as possible, I believe.

Another thanks from me.

>
> > ---
> >  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
> >  include/linux/mmap_lock.h |  7 +++++++
> >  mm/mmap_lock.c            |  6 ++++++
> >  3 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 94de392ed3c5..e5ee66f84d9a 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> >   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
> >   * vma_start_read() that the reference count should be left alone.
> >   *
> > - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> > + * See the comment describing vm_refcnt in vm_area_struct for details as to
> > + * which values the VMA reference count can be.
> >   */
> >  #define VM_REFCNT_EXCLUDE_READERS_BIT        (30)
> >  #define VM_REFCNT_EXCLUDE_READERS_FLAG       (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> > @@ -989,7 +990,41 @@ struct vm_area_struct {
> >       struct vma_numab_state *numab_state;    /* NUMA Balancing state */
> >  #endif
> >  #ifdef CONFIG_PER_VMA_LOCK
> > -     /* Unstable RCU readers are allowed to read this. */
> > +     /*
> > +      * Used to keep track of the number of references taken by VMA read or
> > +      * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
>
> I wonder about the "or write locks" part. The process of acquiring it uses
> VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
> refcount? (the sentence could be read it way IMHO) It's vma being attached
> that does, AFAIK?

Yes, since there can be only one write-locker it only has to set
VM_REFCNT_EXCLUDE_READERS_FLAG bit to announce its presence, without
incrementing the refcount.

>
> > +      * indicating that a thread has entered __vma_enter_locked() and is
> > +      * waiting on any outstanding read locks to exit.
> > +      *
> > +      * This value can be equal to:
> > +      *
> > +      * 0 - Detached.
>
> Is it worth saying that readers can't increment the refcount?

Yes, you mention that for VM_REFCNT_EXCLUDE_READERS_FLAG value. The
same IMPORTANT notice applies here.

>
> > +      * 1 - Unlocked or write-locked.
>
> "Attached and either unlocked or write-locked." ?

Agree. That's more specific.
Should we also mention here that unlocked vs write-locked distinction
is determined using the vm_lock_seq member?

>
> (see how "write-locked" isn't reflected, I argued above)
>
> > +      *
> > +      * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> > +      * write-locked with other threads having temporarily incremented the
> > +      * reference count prior to determining it is write-locked and
> > +      * decrementing it again.
>
> Ack.
>
> > +      * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> > +      * __vma_exit_locked() completion which will decrement the reference
> > +      * count to zero. IMPORTANT - at this stage no further readers can
> > +      * increment the reference count. It can only be reduced.
> > +      *
> > +      * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> > +      * __vma_exit_locked() completion which will decrement the reference
> > +      * count to one, OR a detached VMA waiting on a single spurious reader
> > +      * to decrement reference count. IMPORTANT - as above, no further
> > +      * readers can increment the reference count.
> > +      *
> > +      * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
>
> "VMA is waiting" sounds weird? a thread might be, but VMA itself?
> (similarly in the previous paragraph)

Maybe "VMA in the process of being write-locked or detached, which got
blocked due to the spurious readers that temporarily raised the
refcount"?

>
> > +      * whether it is attempting to acquire a write lock or attempting to
> > +      * detach. IMPORTANT - as above, no ruther readers can increment the
> > +      * reference count.
> > +      *
> > +      * NOTE: Unstable RCU readers are allowed to read this.
> > +      */
> >       refcount_t vm_refcnt ____cacheline_aligned_in_smp;
> >  #ifdef CONFIG_DEBUG_LOCK_ALLOC
> >       struct lockdep_map vmlock_dep_map;
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index 5acbd4ba1b52..a764439d0276 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
> >        * attached. Waiting on a detached vma happens only in
> >        * vma_mark_detached() and is a rare case, therefore most of the time
> >        * there will be no unnecessary wakeup.
> > +      *
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> >        */
> >       return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
> >               refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> > @@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
> >  {
> >       unsigned int mm_lock_seq;
> >
> > +     /*
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> > +      */
> >       VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
> >                     !__is_vma_write_locked(vma, &mm_lock_seq), vma);
> >  }
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 1d23b48552e9..75dc098aea14 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >       /*
> >        * If vma is detached then only vma_mark_attached() can raise the
> >        * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
> > +      *
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> >        */
> >       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> >               return 0;
> > @@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
> >        * before they check vm_lock_seq, realize the vma is locked and drop
> >        * back the vm_refcnt. That is a narrow window for observing a raised
> >        * vm_refcnt.
> > +      *
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> >        */
> >       if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
> >               /* Wait until vma is detached with no readers. */
> > --
> > 2.52.0
>
Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
Sorry missed this before moving on to 3/10, responses below.

On Thu, Jan 22, 2026 at 09:28:05AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 8:48 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > > The possible vma->vm_refcnt values are confusing and vague, explain in
> > > detail what these can be in a comment describing the vma->vm_refcnt field
> > > and reference this comment in various places that read/write this field.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
> > try to be as precise as possible, I believe.
>
> Another thanks from me.

Ack

>
> >
> > > ---
> > >  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
> > >  include/linux/mmap_lock.h |  7 +++++++
> > >  mm/mmap_lock.c            |  6 ++++++
> > >  3 files changed, 50 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 94de392ed3c5..e5ee66f84d9a 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> > >   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
> > >   * vma_start_read() that the reference count should be left alone.
> > >   *
> > > - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> > > + * See the comment describing vm_refcnt in vm_area_struct for details as to
> > > + * which values the VMA reference count can be.
> > >   */
> > >  #define VM_REFCNT_EXCLUDE_READERS_BIT        (30)
> > >  #define VM_REFCNT_EXCLUDE_READERS_FLAG       (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> > > @@ -989,7 +990,41 @@ struct vm_area_struct {
> > >       struct vma_numab_state *numab_state;    /* NUMA Balancing state */
> > >  #endif
> > >  #ifdef CONFIG_PER_VMA_LOCK
> > > -     /* Unstable RCU readers are allowed to read this. */
> > > +     /*
> > > +      * Used to keep track of the number of references taken by VMA read or
> > > +      * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
> >
> > I wonder about the "or write locks" part. The process of acquiring it uses
> > VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
> > refcount? (the sentence could be read it way IMHO) It's vma being attached
> > that does, AFAIK?
>
> Yes, since there can be only one write-locker it only has to set
> VM_REFCNT_EXCLUDE_READERS_FLAG bit to announce its presence, without
> incrementing the refcount.

See reply to Vlastimil, I've reworked this to:

	 * Used to keep track of firstly, whether the VMA is attached, secondly,
	 * if attached, how many read locks are taken, and thirdly, if the
	 * VM_REFCNT_EXCLUDE_READERS_FLAG is set, whether any read locks held
	 * are currently in the process of being excluded.

>
> >
> > > +      * indicating that a thread has entered __vma_enter_locked() and is
> > > +      * waiting on any outstanding read locks to exit.
> > > +      *
> > > +      * This value can be equal to:
> > > +      *
> > > +      * 0 - Detached.
> >
> > Is it worth saying that readers can't increment the refcount?
>
> Yes, you mention that for VM_REFCNT_EXCLUDE_READERS_FLAG value. The
> same IMPORTANT notice applies here.

Yup already done as per reply to Vlastimil.

>
> >
> > > +      * 1 - Unlocked or write-locked.
> >
> > "Attached and either unlocked or write-locked." ?
>
> Agree. That's more specific.
> Should we also mention here that unlocked vs write-locked distinction
> is determined using the vm_lock_seq member?

Good idea. Have updated to:

	 * 1 - Attached and either unlocked or write-locked. Write locks are
	 * identified via __is_vma_write_locked() which checks for equality of
	 * vma->vm_lock_seq and mm->mm_lock_seq.

Note I felt it'd be distracting to say 'vma->vm_mm->mm_lock_seq.sequence' :)
mm->mm_lock_seq and hey go look at the function to see the gory details.

>
> >
> > (see how "write-locked" isn't reflected, I argued above)
> >
> > > +      *
> > > +      * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> > > +      * write-locked with other threads having temporarily incremented the
> > > +      * reference count prior to determining it is write-locked and
> > > +      * decrementing it again.
> >
> > Ack.
> >
> > > +      * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> > > +      * __vma_exit_locked() completion which will decrement the reference
> > > +      * count to zero. IMPORTANT - at this stage no further readers can
> > > +      * increment the reference count. It can only be reduced.
> > > +      *
> > > +      * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> > > +      * __vma_exit_locked() completion which will decrement the reference
> > > +      * count to one, OR a detached VMA waiting on a single spurious reader
> > > +      * to decrement reference count. IMPORTANT - as above, no further
> > > +      * readers can increment the reference count.
> > > +      *
> > > +      * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
> >
> > "VMA is waiting" sounds weird? a thread might be, but VMA itself?
> > (similarly in the previous paragraph)
>
> Maybe "VMA in the process of being write-locked or detached, which got
> blocked due to the spurious readers that temporarily raised the
> refcount"?
>

Well no, because you might have legit read locks held that you're waiting on
right? That are not spurious? Can be a mix of spurious and legit (though
unlikely, since by then you'd hope the READ_ONCE() check would succeed, but hey
there's weird out-of-order arches out there).

As per reply to Vlastimil I updated it to:

	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either write-locking
	 * an attached VMA and has yet to invoke __vma_exit_locked(), OR a
	 * thread is detaching a VMA and is waiting on a single spurious reader
	 * in order to decrement the reference count. IMPORTANT - as above, no
	 * further readers can increment the reference count.
	 *
	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either
	 * write-locking or detaching a VMA is waiting on readers to
	 * exit. IMPORTANT - as above, no ruther readers can increment the
	 * reference count.

Thanks, Lorenzo