From: Thomas Gleixner
To: LKML
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
    Arnaldo Carvalho de Melo, Lorenzo Stoakes, Kees Cook
Subject: [patch 5/6] perf/core: Split the ringbuffer mmap() and allocation code out
Date: Wed, 6 Aug 2025 22:13:00 +0200 (CEST)
Message-ID: <20250806200617.513959766@linutronix.de>
References: <20250806195624.880096284@linutronix.de>

The code logic in perf_mmap() is incomprehensible and has been a source
of subtle bugs in the past. It makes it impossible to convert the
atomic_t reference counts to refcount_t.

Now that the AUX buffer mapping and allocation code is in its own
function, apply the same treatment to the ringbuffer part and remove
the temporary workarounds created by the AUX split-out.

No functional change intended.

Signed-off-by: Thomas Gleixner
---
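Background note: the reuse path in the new perf_mmap_rb() hinges on the
inc-not-zero idiom. An already attached ringbuffer may only be mapped
again while its mmap_count can still be raised from a non-zero value; a
count that has already dropped to zero means a concurrent
perf_mmap_close() is tearing the buffer down, so the caller must behave
as if no buffer existed and allocate a fresh one. A minimal user-space
sketch of that idiom, with hypothetical names (this is not the kernel's
atomic_inc_not_zero() implementation):

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's perf_buffer. */
struct buffer {
        atomic_int mmap_count;  /* 0 means teardown is in progress */
};

/*
 * Take a reference only while the count is still non-zero. A zero
 * count means the final put is racing with us and the buffer must
 * not be reused.
 */
static bool buffer_tryget(struct buffer *buf)
{
        int old = atomic_load(&buf->mmap_count);

        while (old != 0) {
                /* On failure, old is reloaded and the check repeats. */
                if (atomic_compare_exchange_weak(&buf->mmap_count, &old, old + 1))
                        return true;    /* safe to mmap() the buffer again */
        }
        return false;   /* raced with the final unmap: allocate a new one */
}

The failure leg of that race is why the new helper falls through to
ring_buffer_attach(event, NULL) and a fresh rb_alloc().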
 kernel/events/core.c | 175 ++++++++++++++++++++++-----------------------
 1 file changed, 77 insertions(+), 98 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6970,6 +6970,69 @@ static void perf_mmap_account(struct vm_
         atomic64_add(extra, &vma->vm_mm->pinned_vm);
 }

+static int perf_mmap_rb(struct vm_area_struct *vma, struct perf_event *event,
+                        unsigned long nr_pages)
+{
+        long user_extra = nr_pages, extra = 0;
+        struct perf_buffer *rb = event->rb;
+        int rb_flags = 0;
+
+        /*
+         * If we have rb pages ensure they're a power-of-two number, so we
+         * can do bitmasks instead of modulo.
+         */
+        if (--nr_pages != 0 && !is_power_of_2(nr_pages))
+                return -EINVAL;
+
+        WARN_ON_ONCE(event->ctx->parent_ctx);
+
+        if (rb) {
+                if (data_page_nr(rb) != nr_pages)
+                        return -EINVAL;
+
+                if (atomic_inc_not_zero(&event->rb->mmap_count)) {
+                        /*
+                         * Success -- managed to mmap() the same buffer
+                         * multiple times.
+                         */
+                        atomic_inc(&event->mmap_count);
+                        return 0;
+                }
+                /*
+                 * Raced against perf_mmap_close()'s
+                 * atomic_dec_and_mutex_lock() remove the event and
+                 * continue as if !event->rb
+                 */
+                ring_buffer_attach(event, NULL);
+        }
+
+        if (!perf_mmap_calc_limits(vma, &user_extra, &extra))
+                return -EPERM;
+
+        if (vma->vm_flags & VM_WRITE)
+                rb_flags |= RING_BUFFER_WRITABLE;
+
+        rb = rb_alloc(nr_pages, event->attr.watermark ? event->attr.wakeup_watermark : 0,
+                      event->cpu, rb_flags);
+
+        if (!rb)
+                return -ENOMEM;
+
+        atomic_set(&rb->mmap_count, 1);
+        rb->mmap_user = get_current_user();
+        rb->mmap_locked = extra;
+
+        ring_buffer_attach(event, rb);
+
+        perf_event_update_time(event);
+        perf_event_init_userpage(event);
+        perf_event_update_userpage(event);
+
+        perf_mmap_account(vma, user_extra, extra);
+        atomic_set(&event->mmap_count, 1);
+        return 0;
+}
+
 static int perf_mmap_aux(struct vm_area_struct *vma, struct perf_event *event,
                          unsigned long nr_pages)
 {
@@ -7039,10 +7102,8 @@ static int perf_mmap(struct file *file,
 {
         struct perf_event *event = file->private_data;
         unsigned long vma_size, nr_pages;
-        long user_extra = 0, extra = 0;
-        struct perf_buffer *rb = NULL;
-        int ret, flags = 0;
         mapped_f mapped;
+        int ret;

         /*
          * Don't allow mmap() of inherited per-task counters. This would
@@ -7068,114 +7129,32 @@ static int perf_mmap(struct file *file,
         if (vma_size != PAGE_SIZE * nr_pages)
                 return -EINVAL;

-        user_extra = nr_pages;
-
-        mutex_lock(&event->mmap_mutex);
-        ret = -EINVAL;
-
-        /*
-         * This relies on __pmu_detach_event() taking mmap_mutex after marking
-         * the event REVOKED. Either we observe the state, or __pmu_detach_event()
-         * will detach the rb created here.
-         */
-        if (event->state <= PERF_EVENT_STATE_REVOKED) {
-                ret = -ENODEV;
-                goto unlock;
-        }
-
-        if (vma->vm_pgoff == 0) {
-                nr_pages -= 1;
-
+        scoped_guard(mutex, &event->mmap_mutex) {
                 /*
-                 * If we have rb pages ensure they're a power-of-two number, so we
-                 * can do bitmasks instead of modulo.
+                 * This relies on __pmu_detach_event() taking mmap_mutex
+                 * after marking the event REVOKED. Either we observe the
+                 * state, or __pmu_detach_event() will detach the rb
+                 * created here.
                  */
-                if (nr_pages != 0 && !is_power_of_2(nr_pages))
-                        goto unlock;
-
-                WARN_ON_ONCE(event->ctx->parent_ctx);
-
-                if (event->rb) {
-                        if (data_page_nr(event->rb) != nr_pages)
-                                goto unlock;
-
-                        if (atomic_inc_not_zero(&event->rb->mmap_count)) {
-                                /*
-                                 * Success -- managed to mmap() the same buffer
-                                 * multiple times.
-                                 */
-                                ret = 0;
-                                /* We need the rb to map pages. */
-                                rb = event->rb;
-                                goto unlock;
-                        }
-
-                        /*
-                         * Raced against perf_mmap_close()'s
-                         * atomic_dec_and_mutex_lock() remove the
-                         * event and continue as if !event->rb
-                         */
-                        ring_buffer_attach(event, NULL);
-                }
+                if (event->state <= PERF_EVENT_STATE_REVOKED)
+                        return -ENODEV;

-        } else {
-                if (event->rb) {
-                        ret = -EINVAL;
+                if (vma->vm_pgoff == 0) {
+                        ret = perf_mmap_rb(vma, event, nr_pages);
                 } else {
+                        if (!event->rb)
+                                return -EINVAL;
                         scoped_guard(mutex, &event->rb->aux_mutex)
                                 ret = perf_mmap_aux(vma, event, nr_pages);
                 }
-                // Temporary workaround to split out AUX handling first
-                mutex_unlock(&event->mmap_mutex);
-                goto out;
-        }
-
-        if (!perf_mmap_calc_limits(vma, &user_extra, &extra)) {
-                ret = -EPERM;
-                goto unlock;
-        }
-
-        WARN_ON(!rb && event->rb);
-
-        if (vma->vm_flags & VM_WRITE)
-                flags |= RING_BUFFER_WRITABLE;
-
-        if (!rb) {
-                rb = rb_alloc(nr_pages,
-                              event->attr.watermark ? event->attr.wakeup_watermark : 0,
-                              event->cpu, flags);
-
-                if (!rb) {
-                        ret = -ENOMEM;
-                        goto unlock;
-                }
-
-                atomic_set(&rb->mmap_count, 1);
-                rb->mmap_user = get_current_user();
-                rb->mmap_locked = extra;
-
-                ring_buffer_attach(event, rb);
-
-                perf_event_update_time(event);
-                perf_event_init_userpage(event);
-                perf_event_update_userpage(event);
-                ret = 0;
-        }
-unlock:
-        if (!ret) {
-                perf_mmap_account(vma, user_extra, extra);
-                atomic_inc(&event->mmap_count);
         }
-        mutex_unlock(&event->mmap_mutex);

-// Temporary until RB allocation is split out.
-out:
         if (ret)
                 return ret;

         /*
          * Since pinned accounting is per vm we cannot allow fork() to copy our
-         * vma.
+         * VMA. The VMA is fixed size and must not be included in dumps.
          */
         vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
         vma->vm_ops = &perf_mmap_vmops;
@@ -7190,7 +7169,7 @@ static int perf_mmap(struct file *file,
          * full cleanup in this case and therefore does not invoke
          * vmops::close().
          */
-        ret = map_range(rb, vma);
+        ret = map_range(event->rb, vma);
         if (ret)
                 perf_mmap_close(vma);
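
Note on the control flow: the unlock/out label dance can be deleted
because scoped_guard() ties the mutex release to scope exit, so every
early return path unlocks automatically. A rough user-space
approximation of that pattern, assuming GCC/Clang's cleanup attribute
and hypothetical helper names (the kernel's scoped_guard() machinery is
considerably more general):

#include <pthread.h>

static void mutex_unlock_cleanup(pthread_mutex_t **m)
{
        pthread_mutex_unlock(*m);
}

/*
 * Acquire the mutex and schedule the release for scope exit,
 * loosely emulating scoped_guard(mutex, m).
 */
#define SCOPED_MUTEX(m)                                                 \
        pthread_mutex_t *_guard                                         \
                __attribute__((cleanup(mutex_unlock_cleanup))) = (m);   \
        pthread_mutex_lock(_guard)

static int do_work_locked(pthread_mutex_t *lock, int revoked)
{
        SCOPED_MUTEX(lock);

        if (revoked)
                return -1;      /* unlock runs automatically here */

        /* ... critical section ... */
        return 0;               /* ... and here as well */
}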