This series first introduces a series of refactorings, intended to
significantly improve readability and abstraction of the code.

Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot be
changed underneath us. This will be the case if EITHER the VMA lock or the
mmap lock is held.

We already open-code this in two places - anon_vma_name() in mm/madvise.c and
vma_flag_set_atomic() in include/linux/mm.h. This series adds
vma_assert_stabilised(), which abstracts this and can be used in these
callsites instead.

This implementation uses lockdep where possible - that is, for VMA read locks
- which correctly track read lock acquisition/release via:

vma_start_read() -> rwsem_acquire_read()
vma_start_read_locked() -> vma_start_read_locked_nested() -> rwsem_acquire_read()

And:

vma_end_read() -> vma_refcount_put() -> rwsem_release()

We don't track VMA write locks using lockdep, however these are predicated
upon the mmap write lock whose lockdep state we do track, and additionally
vma_assert_stabilised() falls back to asserting the mmap lock if a VMA read
lock is not held, so we get lockdep coverage in this case also.

We also add extensive comments to describe what we're doing. There's some
tricky stuff around mmap locking and stabilisation races that we have to be
careful of, which I describe in the patch introducing vma_assert_stabilised().

This change also lays the foundation for future series to add this assert in
further places where we wish to make it clear that we rely upon a stabilised
VMA - that was precisely the motivation for this change.
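To make the above concrete, here is a minimal, purely illustrative sketch of
the shape such an assert might take. This is NOT the code from the series
(which handles the lockdep and racing subtleties described below more
carefully); vma_assert_locked(), mmap_assert_locked() and rwsem_is_locked()
are existing kernel helpers, but the exact structure shown is an assumption:

/*
 * Illustrative sketch only - not the actual implementation. The idea: a VMA
 * is "stabilised" if EITHER the per-VMA lock OR the mmap lock is held, so the
 * assert is satisfied by whichever lock the caller happens to hold.
 *
 * Assumes <linux/mm_types.h> and <linux/mmap_lock.h>.
 */
static inline void vma_assert_stabilised(struct vm_area_struct *vma)
{
#ifdef CONFIG_PER_VMA_LOCK
	/*
	 * If the mmap lock is held the VMA is stable; otherwise the caller
	 * must hold the VMA lock. (The real series is more careful about
	 * racing lockers - see the v2 notes below.)
	 */
	if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
		return;

	vma_assert_locked(vma);
#else
	/* Without per-VMA locks, only the mmap lock can stabilise a VMA. */
	mmap_assert_locked(vma->vm_mm);
#endif
}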
v4:
* Propagated tags (thanks Vlastimil, Suren!)
* Updated reference count documentation in 2/10 as per Vlastimil, Suren.
* Updated 7/10 to update the references in the reference count comment from
  __vma_exit_locked() to __vma_end_exclude_readers().
* Renamed are_readers_excluded() to __vma_are_readers_excluded() as per
  Vlastimil.
* Several more comment updates as per Vlastimil, Suren in 3/10.
* Updated 3/10 commit message as per Suren.
* Updated __vma_refcount_put() to just return the newcnt as per Suren.
* Renamed __vma_refcount_put() to __vma_refcount_put_return() as per Vlastimil.
* Made __vma_refcount_put_return() __must_check too.
* Comment fixups on 4/10 as per Vlastimil.
* Renamed __vma_enter_exclusive_locked() and __vma_exit_exclusive_locked()
  to __vma_start_exclude_readers() and __vma_end_exclude_readers() as per
  Vlastimil in 6/10.
* Reworked comment as per Suren in 6/10.
* Avoided WARN_ON_ONCE() function invocation as per Suren in 6/10.
* s/ves->locked/ves->exclusive/ as per Suren in 7/10.
* Removed confusing asserts in 7/10 as per Suren.
* Changed from !ves.detached to ves.exclusive in 7/10 as per Suren.
* Updated comments in 7/10 as per Suren.
* Removed broken assert in __vma_end_exclude_readers() in 7/10 as per
  Vlastimil.
* Separated out vma_mark_detached() into static inline portion and unlikely
  exclude readers in 7/10 as per Vlastimil.
* Removed mm seq num output parameter from __is_vma_write_locked() as per
  Vlastimil in 8/10.
* Converted VM_BUG_ON_VMA() to VM_WARN_ON_ONCE() in 8/10 as per Vlastimil
  (though he said it in reply to a future commit :).
* Added helper function __vma_raw_mm_seqnum() to aid the conversion of
  __is_vma_write_locked() and updated the commit message accordingly.
* Moved mmap_assert_write_locked() to __vma_raw_mm_seqnum() as it is
  required for this access to be valid.
* Replaced VM_BUG_ON_VMA() with VM_WARN_ON_ONCE_VMA() on 9/10 as per
  Vlastimil.
* Renamed refs to refcnt in vma_assert_locked() to be consistent.
* Moved comment about reference count possible values above refcnt
  assignment so it's not just weirdly at the top of the function.

v3:
* Added 8 patches of refactoring the VMA lock implementation :)
* Dropped the vma_is_*locked() predicates as too difficult to get entirely
  right.
* Updated vma_assert_locked() to assert what we sensibly can, use lockdep if
  possible and invoke vma_assert_write_locked() to share code as before.
* Took into account extensive feedback received from Vlastimil (thanks! :)
https://lore.kernel.org/all/cover.1769086312.git.lorenzo.stoakes@oracle.com/

v2:
* Added lockdep as much as possible to the mix as per Peter and Sebastian.
* Added comments to make clear what we're doing in each case.
* I realise I made a mistake in saying the previous duplicative VMA stable
  asserts were wrong - vma_assert_locked() is not a no-op if
  !CONFIG_PER_VMA_LOCK, instead it degrades to asserting that the mmap lock
  is held, so this is correct, though it means we'd have checked this twice,
  only triggering an assert the second time.
* Accounted for is_vma_writer_only() case in vma_is_read_locked().
* Accounted for two hideous issues - we cannot check the VMA lock first,
  because we may be holding a VMA write lock and be raced by VMA readers of
  _other_ VMAs. If we check the mmap lock first and assert, we may hold a VMA
  read lock and race other threads which hold the mmap read lock, and fail an
  assert. We resolve this by a precise mmap ownership check if lockdep is
  used, and allowing the check to be approximate if no lockdep.
* Added more comments and updated commit logs.
* Dropped Suren's Suggested-by as significant changes in this set (this was
  for the vma_is_read_locked() as a concept).
https://lore.kernel.org/all/cover.1768855783.git.lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/all/cover.1768569863.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (10):
  mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  mm/vma: document possible vma->vm_refcnt values and reference comment
  mm/vma: rename is_vma_write_only(), separate out shared refcount put
  mm/vma: add+use vma lockdep acquire/release defines
  mm/vma: de-duplicate __vma_enter_locked() error path
  mm/vma: clean up __vma_enter/exit_locked()
  mm/vma: introduce helper struct + thread through exclusive lock fns
  mm/vma: improve and document __is_vma_write_locked()
  mm/vma: update vma_assert_locked() to use lockdep
  mm/vma: add and use vma_assert_stabilised()

 include/linux/mm.h        |   5 +-
 include/linux/mm_types.h  |  57 +++++++-
 include/linux/mmap_lock.h | 264 +++++++++++++++++++++++++++++++++-----
 mm/madvise.c              |   4 +-
 mm/mmap_lock.c            | 173 ++++++++++++++++---------
 5 files changed, 396 insertions(+), 107 deletions(-)

--
2.52.0
On Fri, 23 Jan 2026 20:12:10 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> This series first introduces a series of refactorings, intended to
> significantly improve readability and abstraction of the code.
I updated the mm-unstable branch with this, thanks.
> v4:
> * Propagated tags (thanks Vlastimil, Suren!)
> * Updated reference count documentation in 2/10 as per Vlastimil, Suren.
> * Updated 7/10 to update the references in the reference count comment from
> __vma_exit_locked() to __vma_end_exclude_readers().
> * Renamed are_readers_excluded() to __vma_are_readers_excluded() as per
> Vlastimil.
> * Several more comment updates as per Vlastimil, Suren in 3/10.
> * Updated 3/10 commit message as per Suren.
> * Updated __vma_refcount_put() to just return the newcnt as per Suren.
> * Renamed __vma_refcount_put() to __vma_refcount_put_return() as per Vlastimil.
> * Made __vma_refcount_put_return() __must_check too.
> * Comment fixups on 4/10 as per Vlastimil.
> * Renamed __vma_enter_exclusive_locked() and __vma_exit_exclusive_locked()
> to __vma_start_exclude_readers() and __vma_end_exclude_readers() as per
> Vlastimil in 6/10.
> * Reworked comment as per Suren in 6/10.
> * Avoided WARN_ON_ONCE() function invocation as per Suren in 6/10.
> * s/ves->locked/ves->exclusive/ as per Suren in 7/10.
> * Removed confusing asserts in 7/10 as per Suren.
> * Changed from !ves.detached to ves.exclusive in 7/10 as per Suren.
> * Updated comments in 7/10 as per Suren.
> * Removed broken assert in __vma_end_exclude_readers() in 7/10 as per
> Vlastimil.
> * Separated out vma_mark_detached() into static inline portion and unlikely
> exclude readers in 7/10 as per Vlastimil.
> * Removed mm seq num output parameter from __is_vma_write_locked() as per
> Vlastimil in 8/10.
> * Converted VM_BUG_ON_VMA() to VM_WARN_ON_ONCE() in 8/10 as per Vlastimil
> (though he said it in reply to a future commit :).
> * Added helper function __vma_raw_mm_seqnum() to aid the conversion of
> __is_vma_write_locked() and updated the commit message accordingly.
> * Moved mmap_assert_write_locked() to __vma_raw_mm_seqnum() as it is
> required for this access to be valid.
> * Replaced VM_BUG_ON_VMA() with VM_WARN_ON_ONCE_VMA() on 9/10 as per
> Vlastimil.
> * Renamed refs to refcnt in vma_assert_locked() to be consistent.
> * Moved comment about reference count possible values above refcnt
> assignment so it's not just weirdly at the top of the function.
Below is how v4 altered mm.git:
--- a/include/linux/mmap_lock.h~b
+++ a/include/linux/mmap_lock.h
@@ -87,10 +87,11 @@ static inline void mmap_assert_write_loc
*
* Write locks are acquired exclusively per-VMA, but released in a shared
* fashion, that is upon vma_end_write_all(), we update the mmap's seqcount such
- * that write lock is de-acquired.
+ * that write lock is released.
*
* We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
- * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
+ * is the fact that, of course, we do lockdep-track the mmap lock rwsem which
+ * must be held when taking a VMA write lock.
*
* We do, however, want to indicate that during either acquisition of a VMA
* write lock or detachment of a VMA that we require the lock held be exclusive,
@@ -152,14 +153,9 @@ static inline void vma_lock_init(struct
vma->vm_lock_seq = UINT_MAX;
}
-/**
- * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
- * excluded all VMA read locks.
- * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
- *
- * We may be raced by other readers temporarily incrementing the reference
- * count, though the race window is very small, this might cause spurious
- * wakeups.
+/*
+ * This function determines whether the input VMA reference count describes a
+ * VMA which has excluded all VMA read locks.
*
* In the case of a detached VMA, we may incorrectly indicate that readers are
* excluded when one remains, because in that scenario we target a refcount of
@@ -170,7 +166,7 @@ static inline void vma_lock_init(struct
*
* Returns: true if readers are excluded, false otherwise.
*/
-static inline bool are_readers_excluded(int refcnt)
+static inline bool __vma_are_readers_excluded(int refcnt)
{
/*
* See the comment describing the vm_area_struct->vm_refcnt field for
@@ -180,15 +176,21 @@ static inline bool are_readers_excluded(
refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
}
-static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)
+/*
+ * Actually decrement the VMA reference count.
+ *
+ * The function returns the reference count as it was immediately after the
+ * decrement took place. If it returns zero, the VMA is now detached.
+ */
+static inline __must_check unsigned int
+__vma_refcount_put_return(struct vm_area_struct *vma)
{
int oldcnt;
- bool detached;
- detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
- if (refcnt)
- *refcnt = oldcnt - 1;
- return detached;
+ if (__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt))
+ return 0;
+
+ return oldcnt - 1;
}
/**
@@ -203,17 +205,21 @@ static inline void vma_refcount_put(stru
{
/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
struct mm_struct *mm = vma->vm_mm;
- int refcnt;
- bool detached;
+ int newcnt;
__vma_lockdep_release_read(vma);
- detached = __vma_refcount_put(vma, &refcnt);
+ newcnt = __vma_refcount_put_return(vma);
+
/*
- * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
+ * __vma_start_exclude_readers() may be sleeping waiting for readers to
* drop their reference count, so wake it up if we were the last reader
* blocking it from being acquired.
+ *
+ * We may be raced by other readers temporarily incrementing the
+ * reference count, though the race window is very small, this might
+ * cause spurious wakeups.
*/
- if (!detached && are_readers_excluded(refcnt))
+ if (newcnt && __vma_are_readers_excluded(newcnt))
rcuwait_wake_up(&mm->vma_writer_wait);
}
@@ -252,6 +258,15 @@ static inline void vma_end_read(struct v
vma_refcount_put(vma);
}
+static inline unsigned int __vma_raw_mm_seqnum(struct vm_area_struct *vma)
+{
+ const struct mm_struct *mm = vma->vm_mm;
+
+ /* We must hold an exclusive write lock for this access to be valid. */
+ mmap_assert_write_locked(vma->vm_mm);
+ return mm->mm_lock_seq.sequence;
+}
+
/*
* Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
* write lock is held.
@@ -260,22 +275,13 @@ static inline void vma_end_read(struct v
*
* Note that mm_lock_seq is updated only if the VMA is NOT write-locked.
*/
-static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
- unsigned int *mm_lock_seq)
+static inline bool __is_vma_write_locked(struct vm_area_struct *vma)
{
- struct mm_struct *mm = vma->vm_mm;
- const unsigned int seq = mm->mm_lock_seq.sequence;
-
- mmap_assert_write_locked(mm);
-
/*
* current task is holding mmap_write_lock, both vma->vm_lock_seq and
* mm->mm_lock_seq can't be concurrently modified.
*/
- if (vma->vm_lock_seq == seq)
- return true;
- *mm_lock_seq = seq;
- return false;
+ return vma->vm_lock_seq == __vma_raw_mm_seqnum(vma);
}
int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
@@ -288,12 +294,10 @@ int __vma_start_write(struct vm_area_str
*/
static inline void vma_start_write(struct vm_area_struct *vma)
{
- unsigned int mm_lock_seq;
-
- if (__is_vma_write_locked(vma, &mm_lock_seq))
+ if (__is_vma_write_locked(vma))
return;
- __vma_start_write(vma, mm_lock_seq, TASK_UNINTERRUPTIBLE);
+ __vma_start_write(vma, __vma_raw_mm_seqnum(vma), TASK_UNINTERRUPTIBLE);
}
/**
@@ -312,11 +316,10 @@ static inline void vma_start_write(struc
static inline __must_check
int vma_start_write_killable(struct vm_area_struct *vma)
{
- unsigned int mm_lock_seq;
-
- if (__is_vma_write_locked(vma, &mm_lock_seq))
+ if (__is_vma_write_locked(vma))
return 0;
- return __vma_start_write(vma, mm_lock_seq, TASK_KILLABLE);
+
+ return __vma_start_write(vma, __vma_raw_mm_seqnum(vma), TASK_KILLABLE);
}
/**
@@ -325,9 +328,7 @@ int vma_start_write_killable(struct vm_a
*/
static inline void vma_assert_write_locked(struct vm_area_struct *vma)
{
- unsigned int mm_lock_seq;
-
- VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma);
+ VM_WARN_ON_ONCE_VMA(!__is_vma_write_locked(vma), vma);
}
/**
@@ -337,12 +338,7 @@ static inline void vma_assert_write_lock
*/
static inline void vma_assert_locked(struct vm_area_struct *vma)
{
- unsigned int refs;
-
- /*
- * See the comment describing the vm_area_struct->vm_refcnt field for
- * details of possible refcnt values.
- */
+ unsigned int refcnt;
/*
* If read-locked or currently excluding readers, then the VMA is
@@ -353,18 +349,22 @@ static inline void vma_assert_locked(str
return;
#endif
- refs = refcount_read(&vma->vm_refcnt);
+ /*
+ * See the comment describing the vm_area_struct->vm_refcnt field for
+ * details of possible refcnt values.
+ */
+ refcnt = refcount_read(&vma->vm_refcnt);
/*
* In this case we're either read-locked, write-locked with temporary
* readers, or in the midst of excluding readers, all of which means
* we're locked.
*/
- if (refs > 1)
+ if (refcnt > 1)
return;
/* It is a bug for the VMA to be detached here. */
- VM_BUG_ON_VMA(!refs, vma);
+ VM_WARN_ON_ONCE_VMA(!refcnt, vma);
/*
* OK, the VMA has a reference count of 1 which means it is either
@@ -447,7 +447,28 @@ static inline void vma_mark_attached(str
refcount_set_release(&vma->vm_refcnt, 1);
}
-void vma_mark_detached(struct vm_area_struct *vma);
+void __vma_exclude_readers_for_detach(struct vm_area_struct *vma);
+
+static inline void vma_mark_detached(struct vm_area_struct *vma)
+{
+ vma_assert_write_locked(vma);
+ vma_assert_attached(vma);
+
+ /*
+ * The VMA still being attached (refcnt > 0) - is unlikely, because the
+ * vma has been already write-locked and readers can increment vm_refcnt
+ * only temporarily before they check vm_lock_seq, realize the vma is
+ * locked and drop back the vm_refcnt. That is a narrow window for
+ * observing a raised vm_refcnt.
+ *
+ * See the comment describing the vm_area_struct->vm_refcnt field for
+ * details of possible refcnt values.
+ */
+ if (likely(!__vma_refcount_put_return(vma)))
+ return;
+
+ __vma_exclude_readers_for_detach(vma);
+}
struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
unsigned long address);
--- a/include/linux/mm_types.h~b
+++ a/include/linux/mm_types.h
@@ -753,7 +753,7 @@ static inline struct anon_vma_name *anon
#endif
/*
- * WHile __vma_enter_locked() is working to ensure are no read-locks held on a
+ * While __vma_enter_locked() is working to ensure are no read-locks held on a
* VMA (either while acquiring a VMA write lock or marking a VMA detached) we
* set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
* vma_start_read() that the reference count should be left alone.
@@ -991,16 +991,19 @@ struct vm_area_struct {
#endif
#ifdef CONFIG_PER_VMA_LOCK
/*
- * Used to keep track of the number of references taken by VMA read or
- * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
- * indicating that a thread has entered __vma_enter_locked() and is
- * waiting on any outstanding read locks to exit.
+ * Used to keep track of firstly, whether the VMA is attached, secondly,
+ * if attached, how many read locks are taken, and thirdly, if the
+ * VM_REFCNT_EXCLUDE_READERS_FLAG is set, whether any read locks held
+ * are currently in the process of being excluded.
*
* This value can be equal to:
*
- * 0 - Detached.
+ * 0 - Detached. IMPORTANT: when the refcnt is zero, readers cannot
+ * increment it.
*
- * 1 - Unlocked or write-locked.
+ * 1 - Attached and either unlocked or write-locked. Write locks are
+ * identified via __is_vma_write_locked() which checks for equality of
+ * vma->vm_lock_seq and mm->mm_lock_seq.
*
* >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
* write-locked with other threads having temporarily incremented the
@@ -1008,19 +1011,19 @@ struct vm_area_struct {
* decrementing it again.
*
* VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
- * __vma_exit_locked() completion which will decrement the reference
- * count to zero. IMPORTANT - at this stage no further readers can
- * increment the reference count. It can only be reduced.
- *
- * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
- * __vma_exit_locked() completion which will decrement the reference
- * count to one, OR a detached VMA waiting on a single spurious reader
- * to decrement reference count. IMPORTANT - as above, no further
- * readers can increment the reference count.
- *
- * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
- * whether it is attempting to acquire a write lock or attempting to
- * detach. IMPORTANT - as above, no ruther readers can increment the
+ * __vma_exit_exclusive_locked() completion which will decrement the
+ * reference count to zero. IMPORTANT - at this stage no further readers
+ * can increment the reference count. It can only be reduced.
+ *
+ * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either write-locking
+ * an attached VMA and has yet to invoke __vma_exit_exclusive_locked(),
+ * OR a thread is detaching a VMA and is waiting on a single spurious
+ * reader in order to decrement the reference count. IMPORTANT - as
+ * above, no further readers can increment the reference count.
+ *
+ * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either
+ * write-locking or detaching a VMA is waiting on readers to
+ * exit. IMPORTANT - as above, no ruther readers can increment the
* reference count.
*
* NOTE: Unstable RCU readers are allowed to read this.
--- a/mm/mmap_lock.c~b
+++ a/mm/mmap_lock.c
@@ -53,6 +53,7 @@ struct vma_exclude_readers_state {
int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
bool detaching;
+ /* Output parameters. */
bool detached;
bool exclusive; /* Are we exclusively locked? */
};
@@ -60,15 +61,12 @@ struct vma_exclude_readers_state {
/*
* Now that all readers have been evicted, mark the VMA as being out of the
* 'exclude readers' state.
- *
- * Returns true if the VMA is now detached, otherwise false.
*/
-static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
+static void __vma_end_exclude_readers(struct vma_exclude_readers_state *ves)
{
struct vm_area_struct *vma = ves->vma;
VM_WARN_ON_ONCE(ves->detached);
- VM_WARN_ON_ONCE(!ves->exclusive);
ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
&vma->vm_refcnt);
@@ -93,27 +91,24 @@ static unsigned int get_target_refcnt(st
* where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
* signal is permitted to kill it.
*
- * The function sets the ves->locked parameter to true if an exclusive lock was
- * acquired, or false if the VMA was detached or an error arose on wait.
+ * The function sets the ves->exclusive parameter to true if readers were
+ * excluded, or false if the VMA was detached or an error arose on wait.
*
* If the function indicates an exclusive lock was acquired via ves->exclusive
- * (or equivalently, returning 0 with !ves->detached), the caller is required to
- * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
- * required.
+ * the caller is required to invoke __vma_end_exclude_readers() once the
+ * exclusive state is no longer required.
*
* If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
* function may also return -EINTR to indicate a fatal signal was received while
* waiting.
*/
-static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
+static int __vma_start_exclude_readers(struct vma_exclude_readers_state *ves)
{
struct vm_area_struct *vma = ves->vma;
unsigned int tgt_refcnt = get_target_refcnt(ves);
int err = 0;
mmap_assert_write_locked(vma->vm_mm);
- VM_WARN_ON_ONCE(ves->detached);
- VM_WARN_ON_ONCE(ves->exclusive);
/*
* If vma is detached then only vma_mark_attached() can raise the
@@ -132,7 +127,7 @@ static int __vma_enter_exclusive_locked(
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
ves->state);
if (err) {
- __vma_exit_exclusive_locked(ves);
+ __vma_end_exclude_readers(ves);
return err;
}
@@ -150,7 +145,7 @@ int __vma_start_write(struct vm_area_str
.state = state,
};
- err = __vma_enter_exclusive_locked(&ves);
+ err = __vma_start_exclude_readers(&ves);
if (err) {
WARN_ON_ONCE(ves.detached);
return err;
@@ -164,8 +159,8 @@ int __vma_start_write(struct vm_area_str
*/
WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
- if (!ves.detached) {
- __vma_exit_exclusive_locked(&ves);
+ if (ves.exclusive) {
+ __vma_end_exclude_readers(&ves);
/* VMA should remain attached. */
WARN_ON_ONCE(ves.detached);
}
@@ -174,7 +169,7 @@ int __vma_start_write(struct vm_area_str
}
EXPORT_SYMBOL_GPL(__vma_start_write);
-void vma_mark_detached(struct vm_area_struct *vma)
+void __vma_exclude_readers_for_detach(struct vm_area_struct *vma)
{
struct vma_exclude_readers_state ves = {
.vma = vma,
@@ -183,16 +178,6 @@ void vma_mark_detached(struct vm_area_st
};
int err;
- vma_assert_write_locked(vma);
- vma_assert_attached(vma);
-
- /*
- * See the comment describing the vm_area_struct->vm_refcnt field for
- * details of possible refcnt values.
- */
- if (likely(__vma_refcount_put(vma, NULL)))
- return;
-
/*
* Wait until the VMA is detached with no readers. Since we hold the VMA
* write lock, the only read locks that might be present are those from
@@ -200,13 +185,13 @@ void vma_mark_detached(struct vm_area_st
* reference count before realising the write lock is held and
* decrementing it.
*/
- err = __vma_enter_exclusive_locked(&ves);
- if (!err && !ves.detached) {
+ err = __vma_start_exclude_readers(&ves);
+ if (!err && ves.exclusive) {
/*
* Once this is complete, no readers can increment the
* reference count, and the VMA is marked detached.
*/
- __vma_exit_exclusive_locked(&ves);
+ __vma_end_exclude_readers(&ves);
}
/* If an error arose but we were detached anyway, we don't care. */
WARN_ON_ONCE(!ves.detached);
_