[ Resend because I fat-fingered the template. Sorry for the noise ]
This is an update to V1 of this conversion series, which can be found here:
https://lore.kernel.org/all/20250806195624.880096284@linutronix.de
The recently fixed reference count leaks could have been detected by using
refcount_t, and refcount_t would at least have mitigated the potential
overflow.
It turned out that converting the code as is does not work as the
allocation code ends up doing a refcount_inc() for the first allocation,
which causes refcount_t sanity checks to emit a UAF warning.
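For context, a minimal sketch (illustrative names only, not taken from the
series) of the refcount_t semantics involved: incrementing a refcount_t
which sits at zero trips the sanity checks as a potential use-after-free,
so the first reference has to be established with refcount_set() rather
than refcount_inc().

#include <linux/refcount.h>

/* Illustrative only: how the first vs. subsequent references have to be
 * taken with refcount_t. */
static refcount_t example_count = REFCOUNT_INIT(0);

static void example_first_reference(void)
{
	/* refcount_inc(&example_count) here would WARN: incrementing a
	 * zero refcount looks like a use-after-free to the checks. */
	refcount_set(&example_count, 1);	/* first reference */
}

static void example_additional_reference(void)
{
	/* Only valid while at least one reference is already held. */
	refcount_inc(&example_count);
}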
The reason is that the code shares functionality at the wrong level and
ends up being overly complicated for no reason. That is what inevitably led
to the refcount leak problems.
Address this by splitting the ringbuffer and the AUX buffer mapping and
allocation parts out into separate functions, which handle the reference
counts in a sane way.
That not only simplifies the code and makes it halfway comprehensible, but
also allows converting the mmap() related reference counts to refcount_t.
It survives lightweight testing with perf and passes the perf/mmap
selftest.
Changes vs. V1:
- Fix the inverse condition in the temporary workaround for the AUX buffer
split out - Lorenzo
- Apply writable flags in the AUX buffer allocation - Lorenzo
- Fix the bogus subject line of the AUX buffer allocation splitout
- Add a comment about size matching
- Rebased on v6.17-rc1
- Picked up Reviewed-by tags where applicable
Delta patch below
The series applies on top of Linus tree and is also available from git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git perf/refcounts
Thanks,
tglx
---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2f061d392cd9..d2de721a7614 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6987,6 +6987,7 @@ static int perf_mmap_rb(struct vm_area_struct *vma, struct perf_event *event,
WARN_ON_ONCE(event->ctx->parent_ctx);
if (rb) {
+ /* Must have the same size */
if (data_page_nr(rb) != nr_pages)
return -EINVAL;
@@ -7084,6 +7085,9 @@ static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
return -EPERM;
}
+ if (vma->vm_flags & VM_WRITE)
+ rb_flags |= RING_BUFFER_WRITABLE;
+
ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
event->attr.aux_watermark, rb_flags);
if (ret) {
---
include/linux/perf_event.h | 2
kernel/events/core.c | 365 ++++++++++++++++++++++----------------------
kernel/events/internal.h | 4
kernel/events/ring_buffer.c | 2
4 files changed, 189 insertions(+), 184 deletions(-)
It is already checked whether the VMA size is the same as
nr_pages * PAGE_SIZE, so later checking both:
aux_size == vma_size && aux_size == nr_pages * PAGE_SIZE
is redundant. Remove the vma_size check as nr_pages is what is actually
used in the allocation function. That prepares for splitting out the buffer
allocation into separate functions, so that only nr_pages needs to be
handed in.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7043,7 +7043,7 @@ static int perf_mmap(struct file *file,
if (rb_has_aux(rb) && rb->aux_pgoff != vma->vm_pgoff)
goto aux_unlock;
- if (aux_size != vma_size || aux_size != nr_pages * PAGE_SIZE)
+ if (aux_size != nr_pages * PAGE_SIZE)
goto aux_unlock;
/* already mapped with a different size */
On Mon, Aug 11, 2025 at 02:36:28PM +0200, Thomas Gleixner wrote:
> is redundant. Remove the vma_size check as nr_pages is what is actually
> used in the allocation function. That prepares for splitting out the buffer
> allocation into seperate functions, so that only nr_pages needs to be

NIT: seperate -> separate.

> handed in.
>
> No functional change.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To prepare for splitting the buffer allocation out into separate functions
for the ring buffer and the AUX buffer, split out mlock limit handling into
a helper function, which can be called from both.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
kernel/events/core.c | 75 +++++++++++++++++++++++++--------------------------
1 file changed, 38 insertions(+), 37 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6927,17 +6927,49 @@ static int map_range(struct perf_buffer
return err;
}
+static bool perf_mmap_calc_limits(struct vm_area_struct *vma, long *user_extra, long *extra)
+{
+ unsigned long user_locked, user_lock_limit, locked, lock_limit;
+ struct user_struct *user = current_user();
+
+ user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
+ /* Increase the limit linearly with more CPUs */
+ user_lock_limit *= num_online_cpus();
+
+ user_locked = atomic_long_read(&user->locked_vm);
+
+ /*
+ * sysctl_perf_event_mlock may have changed, so that
+ * user->locked_vm > user_lock_limit
+ */
+ if (user_locked > user_lock_limit)
+ user_locked = user_lock_limit;
+ user_locked += *user_extra;
+
+ if (user_locked > user_lock_limit) {
+ /*
+ * charge locked_vm until it hits user_lock_limit;
+ * charge the rest from pinned_vm
+ */
+ *extra = user_locked - user_lock_limit;
+ *user_extra -= *extra;
+ }
+
+ lock_limit = rlimit(RLIMIT_MEMLOCK);
+ lock_limit >>= PAGE_SHIFT;
+ locked = atomic64_read(&vma->vm_mm->pinned_vm) + *extra;
+
+ return locked <= lock_limit || !perf_is_paranoid() || capable(CAP_IPC_LOCK);
+}
+
static int perf_mmap(struct file *file, struct vm_area_struct *vma)
{
struct perf_event *event = file->private_data;
- unsigned long user_locked, user_lock_limit;
struct user_struct *user = current_user();
+ unsigned long vma_size, nr_pages;
+ long user_extra = 0, extra = 0;
struct mutex *aux_mutex = NULL;
struct perf_buffer *rb = NULL;
- unsigned long locked, lock_limit;
- unsigned long vma_size;
- unsigned long nr_pages;
- long user_extra = 0, extra = 0;
int ret, flags = 0;
mapped_f mapped;
@@ -7063,38 +7095,7 @@ static int perf_mmap(struct file *file,
}
}
- user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
-
- /*
- * Increase the limit linearly with more CPUs:
- */
- user_lock_limit *= num_online_cpus();
-
- user_locked = atomic_long_read(&user->locked_vm);
-
- /*
- * sysctl_perf_event_mlock may have changed, so that
- * user->locked_vm > user_lock_limit
- */
- if (user_locked > user_lock_limit)
- user_locked = user_lock_limit;
- user_locked += user_extra;
-
- if (user_locked > user_lock_limit) {
- /*
- * charge locked_vm until it hits user_lock_limit;
- * charge the rest from pinned_vm
- */
- extra = user_locked - user_lock_limit;
- user_extra -= extra;
- }
-
- lock_limit = rlimit(RLIMIT_MEMLOCK);
- lock_limit >>= PAGE_SHIFT;
- locked = atomic64_read(&vma->vm_mm->pinned_vm) + extra;
-
- if ((locked > lock_limit) && perf_is_paranoid() &&
- !capable(CAP_IPC_LOCK)) {
+ if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
ret = -EPERM;
goto unlock;
}
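For illustration, a small standalone model (made-up numbers, not part of the
patch) of the charging split the new helper performs: pages within the
per-user limit are charged to user->locked_vm via user_extra, anything above
it spills into extra and is charged to mm->pinned_vm.

#include <stdio.h>

/* Standalone model of the perf_mmap_calc_limits() split with hypothetical
 * numbers; only the arithmetic is mirrored here. */
int main(void)
{
	long user_lock_limit = 516;	/* per-user mlock budget in pages */
	long user_locked = 400;		/* pages already charged to the user */
	long user_extra = 300;		/* pages requested by this mmap() */
	long extra = 0;

	if (user_locked > user_lock_limit)
		user_locked = user_lock_limit;
	user_locked += user_extra;

	if (user_locked > user_lock_limit) {
		extra = user_locked - user_lock_limit;	/* -> pinned_vm */
		user_extra -= extra;			/* -> locked_vm */
	}

	printf("locked_vm += %ld pages, pinned_vm += %ld pages\n",
	       user_extra, extra);
	return 0;
}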
On Mon, Aug 11, 2025 at 02:36:30PM +0200, Thomas Gleixner wrote:
> To prepare for splitting the buffer allocation out into seperate functions

NIT: seperate -> separate

> for the ring buffer and the AUX buffer, split out mlock limit handling into
> a helper function, which can be called from both.
>
> No functional change intended.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Similarly to the mlock limit calculation, the VM accounting is required for
both the ringbuffer and the AUX buffer allocations.
To prepare for splitting them out into separate functions, move the
accounting into a helper function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
kernel/events/core.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6962,10 +6962,17 @@ static bool perf_mmap_calc_limits(struct
return locked <= lock_limit || !perf_is_paranoid() || capable(CAP_IPC_LOCK);
}
+static void perf_mmap_account(struct vm_area_struct *vma, long user_extra, long extra)
+{
+ struct user_struct *user = current_user();
+
+ atomic_long_add(user_extra, &user->locked_vm);
+ atomic64_add(extra, &vma->vm_mm->pinned_vm);
+}
+
static int perf_mmap(struct file *file, struct vm_area_struct *vma)
{
struct perf_event *event = file->private_data;
- struct user_struct *user = current_user();
unsigned long vma_size, nr_pages;
long user_extra = 0, extra = 0;
struct mutex *aux_mutex = NULL;
@@ -7136,9 +7143,7 @@ static int perf_mmap(struct file *file,
unlock:
if (!ret) {
- atomic_long_add(user_extra, &user->locked_vm);
- atomic64_add(extra, &vma->vm_mm->pinned_vm);
-
+ perf_mmap_account(vma, user_extra, extra);
atomic_inc(&event->mmap_count);
} else if (rb) {
/* AUX allocation failed */
On Mon, Aug 11, 2025 at 02:36:33PM +0200, Thomas Gleixner wrote:
> Similary to the mlock limit calculation the VM accounting is required for

NIT: Similary -> similarly

> both the ringbuffer and the AUX buffer allocations.
>
> To prepare for splitting them out into seperate functions, move the

NIT: seperate -> separate

> accounting into a helper function.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
The code logic in perf_mmap() is incomprehensible and has been a source of
subtle bugs in the past. It makes it impossible to convert the atomic_t
reference counts to refcount_t.
There is not really much shared between the ringbuffer and AUX buffer
allocation code now that the mlock limit calculation and the accounting
have been split out into helper functions.
Move the AUX buffer allocation code out and integrate the call with a
momentary workaround to allow skipping the remaining ringbuffer related
code completely. That workaround will be removed once the ringbuffer
allocation is moved to its own function as well.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fixup the inverse condition and add the dropped flags setup back - Lorenzo
Fixup subject line to match the content
---
kernel/events/core.c | 137 +++++++++++++++++++++++++++++----------------------
1 file changed, 78 insertions(+), 59 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6970,12 +6970,79 @@ static void perf_mmap_account(struct vm_
atomic64_add(extra, &vma->vm_mm->pinned_vm);
}
+static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
+ unsigned long nr_pages)
+{
+ long user_extra = nr_pages, extra = 0;
+ struct perf_buffer *rb = event->rb;
+ u64 aux_offset, aux_size;
+ int ret, rb_flags = 0;
+
+ /*
+ * AUX area mapping: if rb->aux_nr_pages != 0, it's already
+ * mapped, all subsequent mappings should have the same size
+ * and offset. Must be above the normal perf buffer.
+ */
+ aux_offset = READ_ONCE(rb->user_page->aux_offset);
+ aux_size = READ_ONCE(rb->user_page->aux_size);
+
+ if (aux_offset < perf_data_size(rb) + PAGE_SIZE)
+ return -EINVAL;
+
+ if (aux_offset != vma->vm_pgoff << PAGE_SHIFT)
+ return -EINVAL;
+
+ /* Already mapped with a different offset */
+ if (rb_has_aux(rb) && rb->aux_pgoff != vma->vm_pgoff)
+ return -EINVAL;
+
+ if (aux_size != nr_pages * PAGE_SIZE)
+ return -EINVAL;
+
+ /* Already mapped with a different size */
+ if (rb_has_aux(rb) && rb->aux_nr_pages != nr_pages)
+ return -EINVAL;
+
+ if (!is_power_of_2(nr_pages))
+ return -EINVAL;
+
+ /* If this succeeds, subsequent failures have to undo it */
+ if (!atomic_inc_not_zero(&rb->mmap_count))
+ return -EINVAL;
+
+ /* If mapped, attach to it */
+ if (rb_has_aux(rb)) {
+ atomic_inc(&rb->aux_mmap_count);
+ return 0;
+ }
+
+ if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
+ atomic_dec(&rb->mmap_count);
+ return -EPERM;
+ }
+
+ if (vma->vm_flags & VM_WRITE)
+ rb_flags |= RING_BUFFER_WRITABLE;
+
+ ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
+ event->attr.aux_watermark, rb_flags);
+ if (ret) {
+ atomic_dec(&rb->mmap_count);
+ return ret;
+ }
+
+ atomic_set(&rb->aux_mmap_count, 1);
+ rb->aux_mmap_locked = extra;
+ perf_mmap_account(vma, user_extra, extra);
+ atomic_inc(&event->mmap_count);
+ return 0;
+}
+
static int perf_mmap(struct file *file, struct vm_area_struct *vma)
{
struct perf_event *event = file->private_data;
unsigned long vma_size, nr_pages;
long user_extra = 0, extra = 0;
- struct mutex *aux_mutex = NULL;
struct perf_buffer *rb = NULL;
int ret, flags = 0;
mapped_f mapped;
@@ -7055,51 +7122,15 @@ static int perf_mmap(struct file *file,
}
} else {
- /*
- * AUX area mapping: if rb->aux_nr_pages != 0, it's already
- * mapped, all subsequent mappings should have the same size
- * and offset. Must be above the normal perf buffer.
- */
- u64 aux_offset, aux_size;
-
- rb = event->rb;
- if (!rb)
- goto aux_unlock;
-
- aux_mutex = &rb->aux_mutex;
- mutex_lock(aux_mutex);
-
- aux_offset = READ_ONCE(rb->user_page->aux_offset);
- aux_size = READ_ONCE(rb->user_page->aux_size);
-
- if (aux_offset < perf_data_size(rb) + PAGE_SIZE)
- goto aux_unlock;
-
- if (aux_offset != vma->vm_pgoff << PAGE_SHIFT)
- goto aux_unlock;
-
- /* already mapped with a different offset */
- if (rb_has_aux(rb) && rb->aux_pgoff != vma->vm_pgoff)
- goto aux_unlock;
-
- if (aux_size != nr_pages * PAGE_SIZE)
- goto aux_unlock;
-
- /* already mapped with a different size */
- if (rb_has_aux(rb) && rb->aux_nr_pages != nr_pages)
- goto aux_unlock;
-
- if (!is_power_of_2(nr_pages))
- goto aux_unlock;
-
- if (!atomic_inc_not_zero(&rb->mmap_count))
- goto aux_unlock;
-
- if (rb_has_aux(rb)) {
- atomic_inc(&rb->aux_mmap_count);
- ret = 0;
- goto unlock;
+ if (!event->rb) {
+ ret = -EINVAL;
+ } else {
+ scoped_guard(mutex, &event->rb->aux_mutex)
+ ret = perf_mmap_aux(vma, event, nr_pages);
}
+ // Temporary workaround to split out AUX handling first
+ mutex_unlock(&event->mmap_mutex);
+ goto out;
}
if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
@@ -7132,28 +7163,16 @@ static int perf_mmap(struct file *file,
perf_event_init_userpage(event);
perf_event_update_userpage(event);
ret = 0;
- } else {
- ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
- event->attr.aux_watermark, flags);
- if (!ret) {
- atomic_set(&rb->aux_mmap_count, 1);
- rb->aux_mmap_locked = extra;
- }
}
-
unlock:
if (!ret) {
perf_mmap_account(vma, user_extra, extra);
atomic_inc(&event->mmap_count);
- } else if (rb) {
- /* AUX allocation failed */
- atomic_dec(&rb->mmap_count);
}
-aux_unlock:
- if (aux_mutex)
- mutex_unlock(aux_mutex);
mutex_unlock(&event->mmap_mutex);
+// Temporary until RB allocation is split out.
+out:
if (ret)
return ret;
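For readers unfamiliar with the guard infrastructure used above:
scoped_guard() from <linux/cleanup.h> takes the lock for the duration of the
attached statement and drops it automatically when the scope is left, so the
call in the hunk above is roughly equivalent to the open-coded form (sketch,
not part of the patch):

	/* As used in the patch: */
	scoped_guard(mutex, &event->rb->aux_mutex)
		ret = perf_mmap_aux(vma, event, nr_pages);

	/* Roughly equivalent open-coded form: */
	mutex_lock(&event->rb->aux_mutex);
	ret = perf_mmap_aux(vma, event, nr_pages);
	mutex_unlock(&event->rb->aux_mutex);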
On Mon, Aug 11, 2025 at 02:36:35PM +0200, Thomas Gleixner wrote:
> The code logic in perf_mmap() is incomprehensible and has been source of
> subtle bugs in the past. It makes it impossible to convert the atomic_t
> reference counts to refcount_t.
>
> Move the AUX buffer allocation code out and integrate the call with a
> momentary workaround to allow skipping the remaining ringbuffer related
> code completely. That workaround will be removed once the ringbuffer
> allocation is moved to its own function as well.
>
> No functional change.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

git range-diff made this much easier :)

I see you have also rearranged the ordering of the event->state <=
PERF_EVENT_STATE_REVOKED check and the data_page_nr() check as well, which
comparing to the original, seems to be a correct thing to do (I guess maybe
we'd just have a different error message than anticipated at worst from
having them in the 'wrong' order).

Overall this LGTM, thanks for getting this respun quickly!!

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
On Mon, Aug 11, 2025 at 02:21:08PM +0100, Lorenzo Stoakes wrote:
> On Mon, Aug 11, 2025 at 02:36:35PM +0200, Thomas Gleixner wrote:
> > Move the AUX buffer allocation code out and integrate the call with a
> > momentary workaround to allow skipping the remaining ringbuffer related
> > code completely. That workaround will be removed once the ringbuffer
> > allocation is moved to its own function as well.
>
> Overall this LGTM, thanks for getting this respun quickly!!
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Well, there's still that problem below. That was a goto unlock and would
pass through the perf_mmap_account() and atomic_inc(mmap_count), where
now it does not.

Arguably it should not do the perf_mmap_account(), but lets keep the
code functionally equivalent for now.

Anyway, I've re-done these two break-out patches in a more fine grained
fashion and will post a new series in a few.

> > +	/* If mapped, attach to it */
> > +	if (rb_has_aux(rb)) {
> > +		atomic_inc(&rb->aux_mmap_count);
> > +		return 0;

here, the return 0 should've been a goto to...

> > +	}
> > +
> > +	if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
> > +		atomic_dec(&rb->mmap_count);
> > +		return -EPERM;
> > +	}
> > +
> > +	if (vma->vm_flags & VM_WRITE)
> > +		rb_flags |= RING_BUFFER_WRITABLE;
> > +
> > +	ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
> > +			   event->attr.aux_watermark, rb_flags);
> > +	if (ret) {
> > +		atomic_dec(&rb->mmap_count);
> > +		return ret;
> > +	}
> > +
> > +	atomic_set(&rb->aux_mmap_count, 1);
> > +	rb->aux_mmap_locked = extra;

here:

> > +	perf_mmap_account(vma, user_extra, extra);
> > +	atomic_inc(&event->mmap_count);
> > +	return 0;
> > +}
On Tue, Aug 12, 2025 at 12:06:13PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 11, 2025 at 02:21:08PM +0100, Lorenzo Stoakes wrote:
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Well, there's still that problem below. That was a goto unlock and would
> pass through the perf_mmap_account() and atomic_inc(mmap_count), where
> now it does not.

Yep, you're right, damn it.

I managed to miss this one among all the rest, and I was looking very very
carefully bit-by-bit... goes to show just how horrible the code was in the
first place.

> Arguably it should not do the perf_mmap_account(), but lets keep the
> code functionally equivalent for now.
>
> Anyway, I've re-done these two break-out patches in a more fine grained
> fashion and will post a new series in a few.

OK will take a look.

> > > +	/* If mapped, attach to it */
> > > +	if (rb_has_aux(rb)) {
> > > +		atomic_inc(&rb->aux_mmap_count);
> > > +		return 0;
>
> here, the return 0 should've been a goto to...

Yeah... I think I had assumed we fixed this up somehow on the return but of
course we can't since we moved the perf_mmap_account() here (nor could we
differentiate return values).

But I think partly a product of the diff not showing that bit and, despite
checking line-by-line against the old version I just about missed this one.

Yeah, this code, this refactor is very much needed!
The code logic in perf_mmap() is incomprehensible and has been a source of
subtle bugs in the past. It makes it impossible to convert the atomic_t
reference counts to refcount_t.
Now that the AUX buffer mapping and allocation code is in its own function,
apply the same treatment to the ringbuffer part and remove the temporary
workarounds created by the AUX split out.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/events/core.c | 176 ++++++++++++++++++++++-----------------------------
1 file changed, 78 insertions(+), 98 deletions(-)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6970,6 +6970,70 @@ static void perf_mmap_account(struct vm_
atomic64_add(extra, &vma->vm_mm->pinned_vm);
}
+static int perf_mmap_rb(struct vm_area_struct *vma, struct perf_event *event,
+ unsigned long nr_pages)
+{
+ long user_extra = nr_pages, extra = 0;
+ struct perf_buffer *rb = event->rb;
+ int rb_flags = 0;
+
+ /*
+ * If we have rb pages ensure they're a power-of-two number, so we
+ * can do bitmasks instead of modulo.
+ */
+ if (--nr_pages != 0 && !is_power_of_2(nr_pages))
+ return -EINVAL;
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+
+ if (rb) {
+ /* Must have the same size */
+ if (data_page_nr(rb) != nr_pages)
+ return -EINVAL;
+
+ if (atomic_inc_not_zero(&event->rb->mmap_count)) {
+ /*
+ * Success -- managed to mmap() the same buffer
+ * multiple times.
+ */
+ atomic_inc(&event->mmap_count);
+ return 0;
+ }
+ /*
+ * Raced against perf_mmap_close()'s
+ * atomic_dec_and_mutex_lock() remove the event and
+ * continue as if !event->rb
+ */
+ ring_buffer_attach(event, NULL);
+ }
+
+ if (!perf_mmap_calc_limits(vma, &user_extra, &extra))
+ return -EPERM;
+
+ if (vma->vm_flags & VM_WRITE)
+ rb_flags |= RING_BUFFER_WRITABLE;
+
+ rb = rb_alloc(nr_pages, event->attr.watermark ? event->attr.wakeup_watermark : 0,
+ event->cpu, rb_flags);
+
+ if (!rb)
+ return -ENOMEM;
+
+ atomic_set(&rb->mmap_count, 1);
+ rb->mmap_user = get_current_user();
+ rb->mmap_locked = extra;
+
+ ring_buffer_attach(event, rb);
+
+ perf_event_update_time(event);
+ perf_event_init_userpage(event);
+ perf_event_update_userpage(event);
+
+ perf_mmap_account(vma, user_extra, extra);
+ atomic_set(&event->mmap_count, 1);
+ return 0;
+}
+
static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
unsigned long nr_pages)
{
@@ -7042,10 +7106,8 @@ static int perf_mmap(struct file *file,
{
struct perf_event *event = file->private_data;
unsigned long vma_size, nr_pages;
- long user_extra = 0, extra = 0;
- struct perf_buffer *rb = NULL;
- int ret, flags = 0;
mapped_f mapped;
+ int ret;
/*
* Don't allow mmap() of inherited per-task counters. This would
@@ -7071,114 +7133,32 @@ static int perf_mmap(struct file *file,
if (vma_size != PAGE_SIZE * nr_pages)
return -EINVAL;
- user_extra = nr_pages;
-
- mutex_lock(&event->mmap_mutex);
- ret = -EINVAL;
-
- /*
- * This relies on __pmu_detach_event() taking mmap_mutex after marking
- * the event REVOKED. Either we observe the state, or __pmu_detach_event()
- * will detach the rb created here.
- */
- if (event->state <= PERF_EVENT_STATE_REVOKED) {
- ret = -ENODEV;
- goto unlock;
- }
-
- if (vma->vm_pgoff == 0) {
- nr_pages -= 1;
-
+ scoped_guard(mutex, &event->mmap_mutex) {
/*
- * If we have rb pages ensure they're a power-of-two number, so we
- * can do bitmasks instead of modulo.
+ * This relies on __pmu_detach_event() taking mmap_mutex
+ * after marking the event REVOKED. Either we observe the
+ * state, or __pmu_detach_event() will detach the rb
+ * created here.
*/
- if (nr_pages != 0 && !is_power_of_2(nr_pages))
- goto unlock;
-
- WARN_ON_ONCE(event->ctx->parent_ctx);
+ if (event->state <= PERF_EVENT_STATE_REVOKED)
+ return -ENODEV;
- if (event->rb) {
- if (data_page_nr(event->rb) != nr_pages)
- goto unlock;
-
- if (atomic_inc_not_zero(&event->rb->mmap_count)) {
- /*
- * Success -- managed to mmap() the same buffer
- * multiple times.
- */
- ret = 0;
- /* We need the rb to map pages. */
- rb = event->rb;
- goto unlock;
- }
-
- /*
- * Raced against perf_mmap_close()'s
- * atomic_dec_and_mutex_lock() remove the
- * event and continue as if !event->rb
- */
- ring_buffer_attach(event, NULL);
- }
-
- } else {
- if (!event->rb) {
- ret = -EINVAL;
+ if (vma->vm_pgoff == 0) {
+ ret = perf_mmap_rb(vma, event, nr_pages);
} else {
+ if (!event->rb)
+ return -EINVAL;
scoped_guard(mutex, &event->rb->aux_mutex)
ret = perf_mmap_aux(vma, event, nr_pages);
}
- // Temporary workaround to split out AUX handling first
- mutex_unlock(&event->mmap_mutex);
- goto out;
- }
-
- if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
- ret = -EPERM;
- goto unlock;
- }
-
- WARN_ON(!rb && event->rb);
-
- if (vma->vm_flags & VM_WRITE)
- flags |= RING_BUFFER_WRITABLE;
-
- if (!rb) {
- rb = rb_alloc(nr_pages,
- event->attr.watermark ? event->attr.wakeup_watermark : 0,
- event->cpu, flags);
-
- if (!rb) {
- ret = -ENOMEM;
- goto unlock;
- }
-
- atomic_set(&rb->mmap_count, 1);
- rb->mmap_user = get_current_user();
- rb->mmap_locked = extra;
-
- ring_buffer_attach(event, rb);
-
- perf_event_update_time(event);
- perf_event_init_userpage(event);
- perf_event_update_userpage(event);
- ret = 0;
- }
-unlock:
- if (!ret) {
- perf_mmap_account(vma, user_extra, extra);
- atomic_inc(&event->mmap_count);
}
- mutex_unlock(&event->mmap_mutex);
-// Temporary until RB allocation is split out.
-out:
if (ret)
return ret;
/*
* Since pinned accounting is per vm we cannot allow fork() to copy our
- * vma.
+ * VMA. The VMA is fixed size and must not be included in dumps.
*/
vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &perf_mmap_vmops;
@@ -7193,7 +7173,7 @@ static int perf_mmap(struct file *file,
* full cleanup in this case and therefore does not invoke
* vmops::close().
*/
- ret = map_range(rb, vma);
+ ret = map_range(event->rb, vma);
if (ret)
perf_mmap_close(vma);
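For reference, the rb/AUX split mirrors how user space maps the two regions:
the ring buffer (user page plus data pages) is mapped at page offset 0 and
the AUX area at the offset advertised in the user page, which is what the
vm_pgoff check above dispatches on. A rough userspace sketch (error handling
omitted, page counts are arbitrary examples, fd comes from
perf_event_open()):

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch only: map the ring buffer and the AUX area of a perf event fd. */
static void *map_perf_buffers(int fd, void **aux_out)
{
	size_t page = sysconf(_SC_PAGESIZE);
	size_t rb_pages = 1 + 8;	/* user page + 2^3 data pages */
	size_t aux_pages = 16;		/* must be a power of two */
	struct perf_event_mmap_page *up;

	/* pgoff 0 -> the perf_mmap_rb() path in the kernel */
	up = mmap(NULL, rb_pages * page, PROT_READ | PROT_WRITE,
		  MAP_SHARED, fd, 0);

	/* Advertise where the AUX area lives, then map it. */
	up->aux_offset = rb_pages * page;
	up->aux_size   = aux_pages * page;

	/* non-zero pgoff -> the perf_mmap_aux() path in the kernel */
	*aux_out = mmap(NULL, up->aux_size, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, up->aux_offset);
	return up;
}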
On Mon, Aug 11, 2025 at 02:36:37PM +0200, Thomas Gleixner wrote:
> Now that the AUX buffer mapping and allocation code is in it's own function
> apply the same treatment to the ringbuffer part and remove the temporary
> workarounds created by the AUX split out.
>
> No functional change intended.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Aside from nits/code style stuff, this LGTM afaict.

I've done a bunch of (basic) testing locally and hit no issues as well with
the series so this all seems fine.

So:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> +	if (rb) {
> +		/* Must have the same size */
> +		if (data_page_nr(rb) != nr_pages)
> +			return -EINVAL;
> +
> +		if (atomic_inc_not_zero(&event->rb->mmap_count)) {

NIT: Can't we use &rb->mmap_count here?

> +	if (vma->vm_flags & VM_WRITE)
> +		rb_flags |= RING_BUFFER_WRITABLE;
> +
> +	rb = rb_alloc(nr_pages, event->attr.watermark ? event->attr.wakeup_watermark : 0,
> +		      event->cpu, rb_flags);

Not sure I love us reusing 'rb' here, perhaps we should declare:

	struct perf_buffer *prev_rb = event->rb;
	struct perf_buffer *rb;

Then refactor above to refer to prev_rb and assign rb here?

> -	ret = map_range(rb, vma);
> +	ret = map_range(event->rb, vma);

Have gone through the code and convinced myself we always set this by this
point :)

I seem to remember there was some issue with me using event->rb in a
previous version of my series to add map_range(). Perhaps it was positioned
differently... Or maybe misremembering...

At any rate we used to have a WARN_ON(rb != event->rb) here so it's clearly
an invariant since hey we're meant to set this field anyway...
> On Mon, Aug 11, 2025 at 02:36:37PM +0200, Thomas Gleixner wrote:
> > +	perf_mmap_account(vma, user_extra, extra);
> > +	atomic_set(&event->mmap_count, 1);

OK so just one question - here we move from an atomic_inc() below, to an
atomic_set() here, which resolves the issue we saw before with use of
refcounts.

Is it definitely safe to do so and not breaking some expectation that this
might somehow be raced by something?

Thanks!

> > +	return 0;
> > +}
> >
> > -unlock:
> > -	if (!ret) {
> > -		perf_mmap_account(vma, user_extra, extra);
> > -		atomic_inc(&event->mmap_count);
> > -	}
The recently fixed reference count leaks could have been detected by using
refcount_t, and refcount_t would at least have mitigated the potential
overflow.
Now that the code is properly structured, convert the mmap() related
mmap_count variants over to refcount_t.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/perf_event.h | 2 +-
kernel/events/core.c | 40 ++++++++++++++++++++--------------------
kernel/events/internal.h | 4 ++--
kernel/events/ring_buffer.c | 2 +-
4 files changed, 24 insertions(+), 24 deletions(-)
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -859,7 +859,7 @@ struct perf_event {
/* mmap bits */
struct mutex mmap_mutex;
- atomic_t mmap_count;
+ refcount_t mmap_count;
struct perf_buffer *rb;
struct list_head rb_entry;
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3968,7 +3968,7 @@ static noinline int visit_groups_merge(s
*/
static inline bool event_update_userpage(struct perf_event *event)
{
- if (likely(!atomic_read(&event->mmap_count)))
+ if (likely(!refcount_read(&event->mmap_count)))
return false;
perf_event_update_time(event);
@@ -6704,11 +6704,11 @@ static void perf_mmap_open(struct vm_are
struct perf_event *event = vma->vm_file->private_data;
mapped_f mapped = get_mapped(event, event_mapped);
- atomic_inc(&event->mmap_count);
- atomic_inc(&event->rb->mmap_count);
+ refcount_inc(&event->mmap_count);
+ refcount_inc(&event->rb->mmap_count);
if (vma->vm_pgoff)
- atomic_inc(&event->rb->aux_mmap_count);
+ refcount_inc(&event->rb->aux_mmap_count);
if (mapped)
mapped(event, vma->vm_mm);
@@ -6743,7 +6743,7 @@ static void perf_mmap_close(struct vm_ar
* to avoid complications.
*/
if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
- atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &rb->aux_mutex)) {
+ refcount_dec_and_mutex_lock(&rb->aux_mmap_count, &rb->aux_mutex)) {
/*
* Stop all AUX events that are writing to this buffer,
* so that we can free its AUX pages and corresponding PMU
@@ -6763,10 +6763,10 @@ static void perf_mmap_close(struct vm_ar
mutex_unlock(&rb->aux_mutex);
}
- if (atomic_dec_and_test(&rb->mmap_count))
+ if (refcount_dec_and_test(&rb->mmap_count))
detach_rest = true;
- if (!atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex))
+ if (!refcount_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex))
goto out_put;
ring_buffer_attach(event, NULL);
@@ -6991,17 +6991,17 @@ static int perf_mmap_rb(struct vm_area_s
if (data_page_nr(rb) != nr_pages)
return -EINVAL;
- if (atomic_inc_not_zero(&event->rb->mmap_count)) {
+ if (refcount_inc_not_zero(&event->rb->mmap_count)) {
/*
* Success -- managed to mmap() the same buffer
* multiple times.
*/
- atomic_inc(&event->mmap_count);
+ refcount_inc(&event->mmap_count);
return 0;
}
/*
* Raced against perf_mmap_close()'s
- * atomic_dec_and_mutex_lock() remove the event and
+ * refcount_dec_and_mutex_lock() remove the event and
* continue as if !event->rb
*/
ring_buffer_attach(event, NULL);
@@ -7019,7 +7019,7 @@ static int perf_mmap_rb(struct vm_area_s
if (!rb)
return -ENOMEM;
- atomic_set(&rb->mmap_count, 1);
+ refcount_set(&rb->mmap_count, 1);
rb->mmap_user = get_current_user();
rb->mmap_locked = extra;
@@ -7030,7 +7030,7 @@ static int perf_mmap_rb(struct vm_area_s
perf_event_update_userpage(event);
perf_mmap_account(vma, user_extra, extra);
- atomic_set(&event->mmap_count, 1);
+ refcount_set(&event->mmap_count, 1);
return 0;
}
@@ -7071,17 +7071,17 @@ static int perf_mmap_aux(struct vm_area_
return -EINVAL;
/* If this succeeds, subsequent failures have to undo it */
- if (!atomic_inc_not_zero(&rb->mmap_count))
+ if (!refcount_inc_not_zero(&rb->mmap_count))
return -EINVAL;
/* If mapped, attach to it */
if (rb_has_aux(rb)) {
- atomic_inc(&rb->aux_mmap_count);
+ refcount_inc(&rb->aux_mmap_count);
return 0;
}
if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
- atomic_dec(&rb->mmap_count);
+ refcount_dec(&rb->mmap_count);
return -EPERM;
}
@@ -7091,14 +7091,14 @@ static int perf_mmap_aux(struct vm_area_
ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
event->attr.aux_watermark, rb_flags);
if (ret) {
- atomic_dec(&rb->mmap_count);
+ refcount_dec(&rb->mmap_count);
return ret;
}
- atomic_set(&rb->aux_mmap_count, 1);
+ refcount_set(&rb->aux_mmap_count, 1);
rb->aux_mmap_locked = extra;
perf_mmap_account(vma, user_extra, extra);
- atomic_inc(&event->mmap_count);
+ refcount_inc(&event->mmap_count);
return 0;
}
@@ -13247,7 +13247,7 @@ perf_event_set_output(struct perf_event
mutex_lock_double(&event->mmap_mutex, &output_event->mmap_mutex);
set:
/* Can't redirect output if we've got an active mmap() */
- if (atomic_read(&event->mmap_count))
+ if (refcount_read(&event->mmap_count))
goto unlock;
if (output_event) {
@@ -13260,7 +13260,7 @@ perf_event_set_output(struct perf_event
goto unlock;
/* did we race against perf_mmap_close() */
- if (!atomic_read(&rb->mmap_count)) {
+ if (!refcount_read(&rb->mmap_count)) {
ring_buffer_put(rb);
goto unlock;
}
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -35,7 +35,7 @@ struct perf_buffer {
spinlock_t event_lock;
struct list_head event_list;
- atomic_t mmap_count;
+ refcount_t mmap_count;
unsigned long mmap_locked;
struct user_struct *mmap_user;
@@ -47,7 +47,7 @@ struct perf_buffer {
unsigned long aux_pgoff;
int aux_nr_pages;
int aux_overwrite;
- atomic_t aux_mmap_count;
+ refcount_t aux_mmap_count;
unsigned long aux_mmap_locked;
void (*free_aux)(void *);
refcount_t aux_refcount;
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -400,7 +400,7 @@ void *perf_aux_output_begin(struct perf_
* the same order, see perf_mmap_close. Otherwise we end up freeing
* aux pages in this path, which is a bug, because in_atomic().
*/
- if (!atomic_read(&rb->aux_mmap_count))
+ if (!refcount_read(&rb->aux_mmap_count))
goto err;
if (!refcount_inc_not_zero(&rb->aux_refcount))
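The conversion keeps the dec-and-lock pattern used by perf_mmap_close(): the
mutex is only taken when the count actually drops to zero, so teardown is
serialized without penalizing the common unmap. A generic sketch with
illustrative names:

#include <linux/mutex.h>
#include <linux/refcount.h>

/* Illustrative only: drop one reference and, if it was the last one,
 * run the teardown under the mutex. */
static void example_put(refcount_t *count, struct mutex *lock)
{
	if (!refcount_dec_and_mutex_lock(count, lock))
		return;		/* not the last reference, lock not taken */

	/* Last reference: teardown runs with the mutex held. */
	mutex_unlock(lock);
}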
On Mon, Aug 11, 2025 at 02:36:39PM +0200, Thomas Gleixner wrote:
> Now that the code is properly structured, convert the mmap() related
> mmap_count variants over to refcount_t.
>
> No functional change intended.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

OK, this LGTM on basis that the _inc() -> _set() change mentioned is all
good, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

>  	perf_mmap_account(vma, user_extra, extra);
> -	atomic_set(&event->mmap_count, 1);
> +	refcount_set(&event->mmap_count, 1);

This is the really pertinent change - with this now a _set (asked about on
5/6 just to confirm this is all good) - the refcount doesn't think there's
a UAF and we're all good.