From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B85C8C433FE for ; Thu, 29 Sep 2022 22:55:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229881AbiI2Wzo (ORCPT ); Thu, 29 Sep 2022 18:55:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229676AbiI2WzZ (ORCPT ); Thu, 29 Sep 2022 18:55:25 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD97ACBAF1 for ; Thu, 29 Sep 2022 15:55:23 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 258D4B82686 for ; Thu, 29 Sep 2022 22:55:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E08F7C433D7; Thu, 29 Sep 2022 22:55:20 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SI-000clR-2Q; Thu, 29 Sep 2022 18:56:34 -0400 Message-ID: <20220929225634.360646581@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:43 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Song Liu Subject: [for-next][PATCH 01/15] ftrace: Fix recursive locking direct_mutex in ftrace_modify_direct_caller References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Song Liu Naveen reported recursive locking of direct_mutex with sample 
ftrace-direct-modify.ko: [ 74.762406] WARNING: possible recursive locking detected [ 74.762887] 6.0.0-rc6+ #33 Not tainted [ 74.763216] -------------------------------------------- [ 74.763672] event-sample-fn/1084 is trying to acquire lock: [ 74.764152] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ register_ftrace_function+0x1f/0x180 [ 74.764922] [ 74.764922] but task is already holding lock: [ 74.765421] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ modify_ftrace_direct+0x34/0x1f0 [ 74.766142] [ 74.766142] other info that might help us debug this: [ 74.766701] Possible unsafe locking scenario: [ 74.766701] [ 74.767216] CPU0 [ 74.767437] ---- [ 74.767656] lock(direct_mutex); [ 74.767952] lock(direct_mutex); [ 74.768245] [ 74.768245] *** DEADLOCK *** [ 74.768245] [ 74.768750] May be due to missing lock nesting notation [ 74.768750] [ 74.769332] 1 lock held by event-sample-fn/1084: [ 74.769731] #0: ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ modify_ftrace_direct+0x34/0x1f0 [ 74.770496] [ 74.770496] stack backtrace: [ 74.770884] CPU: 4 PID: 1084 Comm: event-sample-fn Not tainted ... [ 74.771498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 74.772474] Call Trace: [ 74.772696] [ 74.772896] dump_stack_lvl+0x44/0x5b [ 74.773223] __lock_acquire.cold.74+0xac/0x2b7 [ 74.773616] lock_acquire+0xd2/0x310 [ 74.773936] ? register_ftrace_function+0x1f/0x180 [ 74.774357] ? lock_is_held_type+0xd8/0x130 [ 74.774744] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.775213] __mutex_lock+0x99/0x1010 [ 74.775536] ? register_ftrace_function+0x1f/0x180 [ 74.775954] ? slab_free_freelist_hook.isra.43+0x115/0x160 [ 74.776424] ? ftrace_set_hash+0x195/0x220 [ 74.776779] ? register_ftrace_function+0x1f/0x180 [ 74.777194] ? kfree+0x3e1/0x440 [ 74.777482] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.777941] ? __schedule+0xb40/0xb40 [ 74.778258] ? register_ftrace_function+0x1f/0x180 [ 74.778672] ? 
my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.779128] register_ftrace_function+0x1f/0x180 [ 74.779527] ? ftrace_set_filter_ip+0x33/0x70 [ 74.779910] ? __schedule+0xb40/0xb40 [ 74.780231] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.780678] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.781147] ftrace_modify_direct_caller+0x5b/0x90 [ 74.781563] ? 0xffffffffa0201000 [ 74.781859] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.782309] modify_ftrace_direct+0x1b2/0x1f0 [ 74.782690] ? __schedule+0xb40/0xb40 [ 74.783014] ? simple_thread+0x2a/0xb0 [ftrace_direct_modify] [ 74.783508] ? __schedule+0xb40/0xb40 [ 74.783832] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.784294] simple_thread+0x76/0xb0 [ftrace_direct_modify] [ 74.784766] kthread+0xf5/0x120 [ 74.785052] ? kthread_complete_and_exit+0x20/0x20 [ 74.785464] ret_from_fork+0x22/0x30 [ 74.785781] Fix this by using register_ftrace_function_nolock in ftrace_modify_direct_caller. Link: https://lkml.kernel.org/r/20220927004146.1215303-1-song@kernel.org Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same fun= ction") Reported-and-tested-by: Naveen N. Rao Signed-off-by: Song Liu Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ftrace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 5a1ec7e1af33..406d0597c409 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5427,6 +5427,8 @@ static struct ftrace_ops stub_ops =3D { * it is safe to modify the ftrace record, where it should be * currently calling @old_addr directly, to call @new_addr. * + * This is called with direct_mutex locked. + * * Safety checks should be made to make sure that the code at * @rec->ip is currently calling @old_addr. And this must * also update entry->direct to @new_addr. 
@@ -5439,6 +5441,8 @@ int __weak ftrace_modify_direct_caller(struct ftrace_= func_entry *entry, unsigned long ip =3D rec->ip; int ret; =20 + lockdep_assert_held(&direct_mutex); + /* * The ftrace_lock was used to determine if the record * had more than one registered user to it. If it did, @@ -5461,7 +5465,7 @@ int __weak ftrace_modify_direct_caller(struct ftrace_= func_entry *entry, if (ret) goto out_lock; =20 - ret =3D register_ftrace_function(&stub_ops); + ret =3D register_ftrace_function_nolock(&stub_ops); if (ret) { ftrace_set_filter_ip(&stub_ops, ip, 1, 0); goto out_lock; --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1471AC433F5 for ; Thu, 29 Sep 2022 22:55:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229747AbiI2Wz2 (ORCPT ); Thu, 29 Sep 2022 18:55:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229488AbiI2WzZ (ORCPT ); Thu, 29 Sep 2022 18:55:25 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E2B7A50E3; Thu, 29 Sep 2022 15:55:22 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 12C2A618D5; Thu, 29 Sep 2022 22:55:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7D3EBC43470; Thu, 29 Sep 2022 22:55:21 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SJ-000clz-0l; Thu, 29 Sep 2022 18:56:35 -0400 Message-ID: 
<20220929225634.862151649@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:44 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 02/15] ring-buffer: Allow splice to read previous partially read pages References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" If a page is partially read, and then the splice system call is run against the ring buffer, it will always fail to read, no matter how much is in the ring buffer. That's because the code path for a partial read of the page will fail if the "full" flag is set. The splice system call wants full pages, so if the read of the ring buffer is not yet full, it should return zero, and the splice will block. But if a previous read was done, where the beginning has been consumed, it should still be given to the splice caller if the rest of the page has been written to. This caused the splice command to never consume data in this scenario, and let the ring buffer just fill up and lose events. 
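The condition described above can be sketched outside the kernel as a small predicate. This is only an illustrative model, not the kernel code: the function name must_skip_full_read() and its parameters are hypothetical, and it assumes commit >= read so that the unsigned subtraction cannot wrap.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the "full" check on the partial-read
 * path. Returns true when a caller that insists on a full page (such as
 * splice) must be turned away.
 *
 *   full             - caller wants a full page
 *   read             - bytes of the reader page already consumed
 *   commit           - bytes committed (written) to the reader page
 *   len              - room available in the caller's buffer
 *   reader_is_commit - the reader page is still the commit page, so the
 *                      writer may add more data to it
 *
 * Assumption: commit >= read.
 */
bool must_skip_full_read(bool full, size_t read, size_t commit,
			 size_t len, bool reader_is_commit)
{
	if (!full)
		return false;

	/* No earlier partial read happened, so this is not the special case. */
	if (read == 0)
		return true;

	/* The rest of the page does not fit in the caller's buffer. */
	if (len < commit - read)
		return true;

	/* The writer has not moved off this page yet. */
	if (reader_is_commit)
		return true;

	/* Previously partially read, rest fits, writer has moved on. */
	return false;
}
```

With this model, a page that had 40 of its 100 committed bytes consumed earlier is handed to a full-page reader once the writer has left the page, which is the case that used to be rejected.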
Link: https://lkml.kernel.org/r/20220927144317.46be6b80@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 8789a9e7df6bf ("ring-buffer: read page interface") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index d59b6a328b7f..6b145d48dfd1 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -5616,7 +5616,15 @@ int ring_buffer_read_page(struct trace_buffer *buffe= r, unsigned int pos =3D 0; unsigned int size; =20 - if (full) + /* + * If a full page is expected, this can still be returned + * if there's been a previous partial read and the + * rest of the page can be read and the commit page is off + * the reader page. + */ + if (full && + (!read || (len < (commit - read)) || + cpu_buffer->reader_page =3D=3D cpu_buffer->commit_page)) goto out_unlock; =20 if (len > (commit - read)) --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DE2FC43217 for ; Thu, 29 Sep 2022 22:55:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229926AbiI2Wzb (ORCPT ); Thu, 29 Sep 2022 18:55:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229449AbiI2WzY (ORCPT ); Thu, 29 Sep 2022 18:55:24 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21E4582D3B; Thu, 29 Sep 2022 15:55:23 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 72E72621C1; Thu, 29 Sep 2022 22:55:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D9A57C433B5; Thu, 29 Sep 2022 22:55:21 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SJ-000cmX-2M; Thu, 29 Sep 2022 18:56:35 -0400 Message-ID: <20220929225635.339549783@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:45 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 03/15] ring-buffer: Have the shortest_full queue be the shortest not longest References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" The logic that decides when the shortest waiters on the ring buffer should be woken up uses a less-than instead of a greater-than comparison, which causes shortest_full to actually be the longest. 
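As a rough illustration of the comparison being fixed, the update can be modelled as a pure function. The name update_shortest_full() is hypothetical, and the sketch assumes, as in the patch, that a value of 0 means no waiter has recorded a threshold yet.

```c
/*
 * Hypothetical model of the shortest_full update.
 * shortest_full == 0 is treated as "not set yet".
 * Returns the new value to store: the smallest "full" percentage
 * requested by any waiter so far.
 */
int update_shortest_full(int shortest_full, int full)
{
	/* Take the new value if nothing is recorded or it is smaller. */
	if (!shortest_full || shortest_full > full)
		return full;

	return shortest_full;
}
```

With the old less-than comparison, a waiter asking for 75 after one asking for 25 would have replaced the stored value, so the waiter with the smallest threshold could be left sleeping longer than it asked for.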
Link: https://lkml.kernel.org/r/20220927231823.718039222@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to w= ake up reader") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 6b145d48dfd1..02db92c9eb1b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1011,7 +1011,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int= cpu, int full) nr_pages =3D cpu_buffer->nr_pages; dirty =3D ring_buffer_nr_dirty_pages(buffer, cpu); if (!cpu_buffer->shortest_full || - cpu_buffer->shortest_full < full) + cpu_buffer->shortest_full > full) cpu_buffer->shortest_full =3D full; raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); if (!pagebusy && --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F4B1C433FE for ; Thu, 29 Sep 2022 22:55:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229908AbiI2Wzi (ORCPT ); Thu, 29 Sep 2022 18:55:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47404 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229658AbiI2WzZ (ORCPT ); Thu, 29 Sep 2022 18:55:25 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DD8BACBAED; Thu, 29 Sep 2022 15:55:23 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS 
id D584D621DA; Thu, 29 Sep 2022 22:55:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4FCEAC433C1; Thu, 29 Sep 2022 22:55:22 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SK-000cn7-0i; Thu, 29 Sep 2022 18:56:36 -0400 Message-ID: <20220929225635.826306562@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:46 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 04/15] ring-buffer: Check pending waiters when doing wake ups as well References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" The wake up waiters only checks the "wakeup_full" variable and not the "full_waiters_pending". The full_waiters_pending is set when a waiter is added to the wait queue. The wakeup_full is only set when an event is triggered, and it clears the full_waiters_pending to avoid multiple calls to irq_work_queue(). The irq_work callback really needs to check both wakeup_full as well as full_waiters_pending such that this code can be used to wake up waiters when a file is closed that represents the ring buffer and the waiters need to be woken up. 
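The flag handling described above can be sketched as a tiny state model. This is not the kernel's rb_wake_up_waiters(); struct wake_state and should_wake_full_waiters() are hypothetical names, and the actual wake_up_all() call is reduced to a boolean return value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two flags kept in struct rb_irq_work. */
struct wake_state {
	bool full_waiters_pending;	/* a waiter was added to the queue */
	bool wakeup_full;		/* an event asked for a wake up */
};

/*
 * Returns true when the "full" waiters should be woken.
 * Either flag is enough, and both are cleared afterwards so that the
 * next wake up starts from a clean state.
 */
bool should_wake_full_waiters(struct wake_state *s)
{
	if (s->full_waiters_pending || s->wakeup_full) {
		s->wakeup_full = false;
		s->full_waiters_pending = false;
		return true;
	}

	return false;
}
```

Checking only wakeup_full would return false for a waiter that queued itself but has not yet been matched by an event, which is the situation when the file is closed.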
Link: https://lkml.kernel.org/r/20220927231824.209460321@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring bu= ffer code") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/ring_buffer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 02db92c9eb1b..5a7d818ca3ea 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -917,8 +917,9 @@ static void rb_wake_up_waiters(struct irq_work *work) struct rb_irq_work *rbwork =3D container_of(work, struct rb_irq_work, wor= k); =20 wake_up_all(&rbwork->waiters); - if (rbwork->wakeup_full) { + if (rbwork->full_waiters_pending || rbwork->wakeup_full) { rbwork->wakeup_full =3D false; + rbwork->full_waiters_pending =3D false; wake_up_all(&rbwork->full_waiters); } } --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44607C433FE for ; Thu, 29 Sep 2022 22:55:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230041AbiI2Wzt (ORCPT ); Thu, 29 Sep 2022 18:55:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229734AbiI2WzZ (ORCPT ); Thu, 29 Sep 2022 18:55:25 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6B352D2D6A; Thu, 29 Sep 2022 15:55:24 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with 
ESMTPS id 5CB18621D9; Thu, 29 Sep 2022 22:55:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB165C433D6; Thu, 29 Sep 2022 22:55:22 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SK-000cng-2J; Thu, 29 Sep 2022 18:56:36 -0400 Message-ID: <20220929225636.321282157@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:47 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 05/15] ring-buffer: Add ring_buffer_wake_waiters() References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" On closing of a file that represents a ring buffer or flushing the file, there may be waiters on the ring buffer that need to be woken up so that they can exit the ring_buffer_wait() function. Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer and allow them to exit the wait loop. 
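The wait_index scheme used by this patch is a generation counter: a waiter snapshots the counter before sleeping and leaves its loop once the counter has moved. A minimal user-space sketch follows. It swaps the kernel's READ_ONCE()/smp_wmb()/smp_rmb() for C11 atomics with release/acquire ordering, and struct waitq_model and the three helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the wait_index field added to rb_irq_work. */
struct waitq_model {
	atomic_long wait_index;
};

/* Waiter side: remember the generation before going to sleep. */
long waiter_snapshot(struct waitq_model *w)
{
	return atomic_load_explicit(&w->wait_index, memory_order_acquire);
}

/* Waker side: bump the generation, then wake the sleepers (not modelled). */
void force_wake(struct waitq_model *w)
{
	atomic_fetch_add_explicit(&w->wait_index, 1, memory_order_release);
}

/* Waiter side, after waking: exit the loop if the generation changed. */
bool waiter_should_exit(struct waitq_model *w, long snapshot)
{
	return atomic_load_explicit(&w->wait_index,
				    memory_order_acquire) != snapshot;
}
```

The point of the counter is that a woken task can tell a forced wake up apart from an ordinary one even when the buffer still has not reached the fill level it was waiting for.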
Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring bu= ffer code") Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 2 +- kernel/trace/ring_buffer.c | 39 +++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index dac53fd3afea..2504df9a0453 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags,= struct lock_class_key *k int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full); __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu, struct file *filp, poll_table *poll_table); - +void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu); =20 #define RING_BUFFER_ALL_CPUS -1 =20 diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 5a7d818ca3ea..3046deacf7b3 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -413,6 +413,7 @@ struct rb_irq_work { struct irq_work work; wait_queue_head_t waiters; wait_queue_head_t full_waiters; + long wait_index; bool waiters_pending; bool full_waiters_pending; bool wakeup_full; @@ -924,6 +925,37 @@ static void rb_wake_up_waiters(struct irq_work *work) } } =20 +/** + * ring_buffer_wake_waiters - wake up any waiters on this ring buffer + * @buffer: The ring buffer to wake waiters on + * + * In the case of a file that represents a ring buffer is closing, + * it is prudent to wake up any waiters that are on this. + */ +void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu) +{ + struct ring_buffer_per_cpu *cpu_buffer; + struct rb_irq_work *rbwork; + + if (cpu =3D=3D RING_BUFFER_ALL_CPUS) { + + /* Wake up individual ones too. 
One level recursion */ + for_each_buffer_cpu(buffer, cpu) + ring_buffer_wake_waiters(buffer, cpu); + + rbwork =3D &buffer->irq_work; + } else { + cpu_buffer =3D buffer->buffers[cpu]; + rbwork =3D &cpu_buffer->irq_work; + } + + rbwork->wait_index++; + /* make sure the waiters see the new index */ + smp_wmb(); + + rb_wake_up_waiters(&rbwork->work); +} + /** * ring_buffer_wait - wait for input to the ring buffer * @buffer: buffer to wait on @@ -939,6 +971,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int c= pu, int full) struct ring_buffer_per_cpu *cpu_buffer; DEFINE_WAIT(wait); struct rb_irq_work *work; + long wait_index; int ret =3D 0; =20 /* @@ -957,6 +990,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int c= pu, int full) work =3D &cpu_buffer->irq_work; } =20 + wait_index =3D READ_ONCE(work->wait_index); =20 while (true) { if (full) @@ -1021,6 +1055,11 @@ int ring_buffer_wait(struct trace_buffer *buffer, in= t cpu, int full) } =20 schedule(); + + /* Make sure to see the new wait index */ + smp_rmb(); + if (wait_index !=3D work->wait_index) + break; } =20 if (full) --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C1D3C433F5 for ; Thu, 29 Sep 2022 22:55:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229791AbiI2Wzw (ORCPT ); Thu, 29 Sep 2022 18:55:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229744AbiI2WzZ (ORCPT ); Thu, 29 Sep 2022 18:55:25 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98E3AD74D6; Thu, 29 Sep 2022 15:55:24 -0700 (PDT) Received: from smtp.kernel.org 
(relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6F208621DE; Thu, 29 Sep 2022 22:55:23 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48B05C433C1; Thu, 29 Sep 2022 22:55:23 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SL-000coF-0d; Thu, 29 Sep 2022 18:56:37 -0400 Message-ID: <20220929225636.815153112@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:48 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 06/15] tracing: Wake up ring buffer waiters on closing of the file References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" When the file that represents the ring buffer is closed, there may be waiters waiting on more input from the ring buffer. Call ring_buffer_wake_waiters() to wake up any waiters when the file is closed. 
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 1 + kernel/trace/trace.c | 15 +++++++++++++++ 2 files changed, 16 insertions(+) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 8401dec93c15..20749bd9db71 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -92,6 +92,7 @@ struct trace_iterator { unsigned int temp_size; char *fmt; /* modified format holder */ unsigned int fmt_size; + long wait_index; =20 /* trace_seq for __print_flags() and __print_symbolic() etc. */ struct trace_seq tmp_seq; diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index aed7ea6e6045..e101b0764b39 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struct inode *ino= de, struct file *file) =20 __trace_array_put(iter->tr); =20 + iter->wait_index++; + /* Make sure the waiters see the new wait_index */ + smp_wmb(); + + ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file); + if (info->spare) ring_buffer_free_read_page(iter->array_buffer->buffer, info->spare_cpu, info->spare); @@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file *file, loff_t= *ppos, =20 /* did we read anything? 
*/ if (!spd.nr_pages) { + long wait_index; + if (ret) goto out; =20 @@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file *file, loff= _t *ppos, if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK)) goto out; =20 + wait_index =3D READ_ONCE(iter->wait_index); + ret =3D wait_on_pipe(iter, iter->tr->buffer_percent); if (ret) goto out; =20 + /* Make sure we see the new wait_index */ + smp_rmb(); + if (wait_index !=3D iter->wait_index) + goto out; + goto again; } =20 --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AC603C433F5 for ; Thu, 29 Sep 2022 22:56:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230175AbiI2W4C (ORCPT ); Thu, 29 Sep 2022 18:56:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229819AbiI2Wz2 (ORCPT ); Thu, 29 Sep 2022 18:55:28 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B840E7E1D; Thu, 29 Sep 2022 15:55:25 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 5EF87621D7; Thu, 29 Sep 2022 22:55:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CB18AC433D6; Thu, 29 Sep 2022 22:55:23 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SL-000coo-2F; Thu, 29 Sep 2022 18:56:37 -0400 Message-ID: <20220929225637.311818825@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:49 -0400 From: Steven Rostedt 
To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 07/15] tracing: Add ioctl() to force ring buffer waiters to wake up References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" If a process is waiting on the ring buffer for data, there currently isn't a clean way to force it to wake up. Add an ioctl call that will force any tasks that are waiting on the trace_pipe_raw file to wake up. Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index e101b0764b39..58afc83afc9d 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8349,12 +8349,34 @@ tracing_buffers_splice_read(struct file *file, loff= _t *ppos, return ret; } =20 +/* An ioctl call with cmd 0 to the ring buffer file will wake up all waite= rs */ +static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, uns= igned long arg) +{ + struct ftrace_buffer_info *info =3D file->private_data; + struct trace_iterator *iter =3D &info->iter; + + if (cmd) + return -ENOIOCTLCMD; + + mutex_lock(&trace_types_lock); + + iter->wait_index++; + /* Make sure the waiters see the new wait_index */ + smp_wmb(); + + ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file); + + mutex_unlock(&trace_types_lock); + return 0; +} + static const struct file_operations tracing_buffers_fops =3D { .open =3D tracing_buffers_open, .read =3D tracing_buffers_read, .poll =3D tracing_buffers_poll, .release =3D 
tracing_buffers_release, .splice_read =3D tracing_buffers_splice_read, + .unlocked_ioctl =3D tracing_buffers_ioctl, .llseek =3D no_llseek, }; =20 --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0967EC433FE for ; Thu, 29 Sep 2022 22:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230154AbiI2Wz7 (ORCPT ); Thu, 29 Sep 2022 18:55:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47492 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229488AbiI2Wz2 (ORCPT ); Thu, 29 Sep 2022 18:55:28 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C05FEA590; Thu, 29 Sep 2022 15:55:25 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id CCF99621DD; Thu, 29 Sep 2022 22:55:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40F1AC433B5; Thu, 29 Sep 2022 22:55:24 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SM-000cpN-0b; Thu, 29 Sep 2022 18:56:38 -0400 Message-ID: <20220929225637.791264178@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:50 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , stable@vger.kernel.org Subject: [for-next][PATCH 08/15] tracing: Wake up waiters when tracing is disabled References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8" From: "Steven Rostedt (Google)" When tracing is disabled, there's no reason that waiters should stay waiting. Wake them up, otherwise tasks get stuck when they should be flushing the buffers. Cc: stable@vger.kernel.org Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 58afc83afc9d..bb5597c6bfc1 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file *file, loff_= t *ppos, if (ret) goto out; =20 + /* No need to wait after waking up when tracing is off */ + if (!tracer_tracing_is_on(iter->tr)) + goto out; + /* Make sure we see the new wait_index */ smp_rmb(); if (wait_index !=3D iter->wait_index) @@ -9065,6 +9069,8 @@ rb_simple_write(struct file *filp, const char __user = *ubuf, tracer_tracing_off(tr); if (tr->current_trace->stop) tr->current_trace->stop(tr); + /* Wake up any waiters */ + ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS); } mutex_unlock(&trace_types_lock); } --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8435BC433F5 for ; Thu, 29 Sep 2022 22:55:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229964AbiI2Wzz (ORCPT ); Thu, 29 Sep 2022 18:55:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229735AbiI2Wz2 (ORCPT ); Thu, 29 Sep 2022 18:55:28 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net 
(Postfix) with ESMTPS id 5C370EC55A for ; Thu, 29 Sep 2022 15:55:25 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id EC4A2621E1 for ; Thu, 29 Sep 2022 22:55:24 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C2A83C433C1; Thu, 29 Sep 2022 22:55:24 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SM-000cpw-2W; Thu, 29 Sep 2022 18:56:38 -0400 Message-ID: <20220929225638.299125775@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:51 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Colin Ian King Subject: [for-next][PATCH 09/15] tracing: Fix spelling mistake "preapre" -> "prepare" References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Colin Ian King There is a spelling mistake in the trace text. Fix it. Link: https://lkml.kernel.org/r/20220928215828.66325-1-colin.i.king@gmail.c= om Signed-off-by: Colin Ian King Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index bb5597c6bfc1..def721de68a0 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -10157,7 +10157,7 @@ __init static int tracer_alloc_buffers(void) * buffer. The memory will be removed once the "instance" is removed. 
*/ ret =3D cpuhp_setup_state_multi(CPUHP_TRACE_RB_PREPARE, - "trace/RB:preapre", trace_rb_cpu_prepare, + "trace/RB:prepare", trace_rb_cpu_prepare, NULL); if (ret < 0) goto out_free_cpumask; --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56EE3C433F5 for ; Thu, 29 Sep 2022 22:56:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230196AbiI2W4I (ORCPT ); Thu, 29 Sep 2022 18:56:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229923AbiI2Wzb (ORCPT ); Thu, 29 Sep 2022 18:55:31 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A3C6D2D6A for ; Thu, 29 Sep 2022 15:55:28 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B5BBAB82593 for ; Thu, 29 Sep 2022 22:55:26 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 89ED7C433D6; Thu, 29 Sep 2022 22:55:25 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SN-000cqV-19; Thu, 29 Sep 2022 18:56:39 -0400 Message-ID: <20220929225638.903022694@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:52 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Beau Belgrave Subject: [for-next][PATCH 10/15] tracing/user_events: Use NULL for strstr checks References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Beau Belgrave Trivial fix to ensure strstr checks use NULL instead of 0. Link: https://lkml.kernel.org/r/20220728233309.1896-2-beaub@linux.microsoft= .com Signed-off-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_events_user.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_u= ser.c index a6621c52ce45..075d694d20e3 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -277,7 +277,7 @@ static int user_event_add_field(struct user_event *user= , const char *type, goto add_field; =20 add_validator: - if (strstr(type, "char") !=3D 0) + if (strstr(type, "char") !=3D NULL) validator_flags |=3D VALIDATOR_ENSURE_NULL; =20 validator =3D kmalloc(sizeof(*validator), GFP_KERNEL); @@ -458,7 +458,7 @@ static const char *user_field_format(const char *type) return "%d"; if (strcmp(type, "unsigned char") =3D=3D 0) return "%u"; - if (strstr(type, "char[") !=3D 0) + if (strstr(type, "char[") !=3D NULL) return "%s"; =20 /* Unknown, likely struct, allowed treat as 64-bit */ @@ -479,7 +479,7 @@ static bool user_field_is_dyn_string(const char *type, = const char **str_func) =20 return false; check: - return strstr(type, "char") !=3D 0; + return strstr(type, "char") !=3D NULL; } =20 #define LEN_OR_ZERO (len ? 
len - pos : 0) --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6279C433FE for ; Thu, 29 Sep 2022 22:56:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230090AbiI2W40 (ORCPT ); Thu, 29 Sep 2022 18:56:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48796 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229929AbiI2Wzf (ORCPT ); Thu, 29 Sep 2022 18:55:35 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A1BFFF509D for ; Thu, 29 Sep 2022 15:55:28 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 2DD9EB82685 for ; Thu, 29 Sep 2022 22:55:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id D755CC433C1; Thu, 29 Sep 2022 22:55:25 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SN-000cr4-2g; Thu, 29 Sep 2022 18:56:39 -0400 Message-ID: <20220929225639.450643112@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:53 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Mathieu Desnoyers , Beau Belgrave Subject: [for-next][PATCH 11/15] tracing/user_events: Use WRITE instead of READ for io vector import References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Beau Belgrave 
import_single_range expects the direction/rw to be where it came from, not the protection/limit. Since the import is in a write path use WRITE. Link: https://lkml.kernel.org/r/20220728233309.1896-3-beaub@linux.microsoft= .com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.= zimbra@efficios.com/ Reported-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_events_user.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_u= ser.c index 075d694d20e3..15edbf6b1e2e 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -1245,7 +1245,8 @@ static ssize_t user_events_write(struct file *file, c= onst char __user *ubuf, if (unlikely(*ppos !=3D 0)) return -EFAULT; =20 - if (unlikely(import_single_range(READ, (char *)ubuf, count, &iov, &i))) + if (unlikely(import_single_range(WRITE, (char __user *)ubuf, + count, &iov, &i))) return -EFAULT; =20 return user_events_write_core(file, &i); --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4ADCAC433FE for ; Thu, 29 Sep 2022 22:56:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230189AbiI2W4E (ORCPT ); Thu, 29 Sep 2022 18:56:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229842AbiI2Wz3 (ORCPT ); Thu, 29 Sep 2022 18:55:29 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44ED3F192B for ; Thu, 29 Sep 2022 15:55:27 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org 
[52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 16238621C1 for ; Thu, 29 Sep 2022 22:55:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7DB87C433B5; Thu, 29 Sep 2022 22:55:26 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SO-000crd-18; Thu, 29 Sep 2022 18:56:40 -0400 Message-ID: <20220929225639.943545555@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:54 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Mathieu Desnoyers , Beau Belgrave Subject: [for-next][PATCH 12/15] tracing/user_events: Ensure user provided strings are safely formatted References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Beau Belgrave User processes can provide bad strings that may cause issues or leak kernel details back out. Don't trust the content of these strings when formatting strings for matching. This also moves to a consistent dynamic length string creation model. 
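For illustration only, the measure-then-fill approach described above can be sketched in user-space C. The helper names join_field() and alloc_field() are hypothetical and are not part of this patch; the sketch assumes only standard snprintf() behavior, where a zero-length call returns the length that would have been written.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): formats "type name" into buf.
 * Untrusted text is only ever passed as an argument to "%s", never as the
 * format string itself, so sequences such as "%n" are copied literally.
 * Returns the number of bytes needed, including the terminating NUL.
 */
static int join_field(char *buf, size_t len, const char *type, const char *name)
{
	int pos = snprintf(buf, len, "%s %s", type, name);

	return pos + 1;
}

/* Returns a heap-allocated "type name" string, or NULL on failure. */
static char *alloc_field(const char *type, const char *name)
{
	/* Pass 1: a zero-length call only measures the required size. */
	int need = join_field(NULL, 0, type, name);
	char *buf;

	if (need <= 0)
		return NULL;

	buf = malloc((size_t)need);

	/* Pass 2: fill the exactly-sized buffer. */
	if (buf)
		join_field(buf, (size_t)need, type, name);

	return buf;
}
```

Sizing the buffer from the first pass avoids a fixed maximum length, so long inputs are neither truncated nor able to overrun the buffer.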
Link: https://lkml.kernel.org/r/20220728233309.1896-4-beaub@linux.microsoft= .com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.= zimbra@efficios.com/ Reported-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_events_user.c | 91 +++++++++++++++++++++----------- 1 file changed, 59 insertions(+), 32 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_u= ser.c index 15edbf6b1e2e..f9bb7d37d76f 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -45,7 +45,6 @@ #define MAX_EVENT_DESC 512 #define EVENT_NAME(user_event) ((user_event)->tracepoint.name) #define MAX_FIELD_ARRAY_SIZE 1024 -#define MAX_FIELD_ARG_NAME 256 =20 static char *register_page_data; =20 @@ -483,6 +482,48 @@ static bool user_field_is_dyn_string(const char *type,= const char **str_func) } =20 #define LEN_OR_ZERO (len ? len - pos : 0) +static int user_dyn_field_set_string(int argc, const char **argv, int *iou= t, + char *buf, int len, bool *colon) +{ + int pos =3D 0, i =3D *iout; + + *colon =3D false; + + for (; i < argc; ++i) { + if (i !=3D *iout) + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " "); + + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, "%s", argv[i]); + + if (strchr(argv[i], ';')) { + ++i; + *colon =3D true; + break; + } + } + + /* Actual set, advance i */ + if (len !=3D 0) + *iout =3D i; + + return pos + 1; +} + +static int user_field_set_string(struct ftrace_event_field *field, + char *buf, int len, bool colon) +{ + int pos =3D 0; + + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, "%s", field->type); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " "); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, "%s", field->name); + + if (colon) + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, ";"); + + return pos + 1; +} + static int user_event_set_print_fmt(struct user_event *user, char *buf, in= t len) { struct ftrace_event_field *field, *next; @@ -926,49 +967,35 
@@ static int user_event_free(struct dyn_event *ev) static bool user_field_match(struct ftrace_event_field *field, int argc, const char **argv, int *iout) { - char *field_name, *arg_name; - int len, pos, i =3D *iout; + char *field_name =3D NULL, *dyn_field_name =3D NULL; bool colon =3D false, match =3D false; + int dyn_len, len; =20 - if (i >=3D argc) + if (*iout >=3D argc) return false; =20 - len =3D MAX_FIELD_ARG_NAME; - field_name =3D kmalloc(len, GFP_KERNEL); - arg_name =3D kmalloc(len, GFP_KERNEL); + dyn_len =3D user_dyn_field_set_string(argc, argv, iout, dyn_field_name, + 0, &colon); =20 - if (!arg_name || !field_name) - goto out; - - pos =3D 0; - - for (; i < argc; ++i) { - if (i !=3D *iout) - pos +=3D snprintf(arg_name + pos, len - pos, " "); + len =3D user_field_set_string(field, field_name, 0, colon); =20 - pos +=3D snprintf(arg_name + pos, len - pos, argv[i]); - - if (strchr(argv[i], ';')) { - ++i; - colon =3D true; - break; - } - } + if (dyn_len !=3D len) + return false; =20 - pos =3D 0; + dyn_field_name =3D kmalloc(dyn_len, GFP_KERNEL); + field_name =3D kmalloc(len, GFP_KERNEL); =20 - pos +=3D snprintf(field_name + pos, len - pos, field->type); - pos +=3D snprintf(field_name + pos, len - pos, " "); - pos +=3D snprintf(field_name + pos, len - pos, field->name); + if (!dyn_field_name || !field_name) + goto out; =20 - if (colon) - pos +=3D snprintf(field_name + pos, len - pos, ";"); + user_dyn_field_set_string(argc, argv, iout, dyn_field_name, + dyn_len, &colon); =20 - *iout =3D i; + user_field_set_string(field, field_name, len, colon); =20 - match =3D strcmp(arg_name, field_name) =3D=3D 0; + match =3D strcmp(dyn_field_name, field_name) =3D=3D 0; out: - kfree(arg_name); + kfree(dyn_field_name); kfree(field_name); =20 return match; --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3D24C433F5 for ; Thu, 29 Sep 2022 22:56:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230220AbiI2W4P (ORCPT ); Thu, 29 Sep 2022 18:56:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229606AbiI2Wzf (ORCPT ); Thu, 29 Sep 2022 18:55:35 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B23CD74D6 for ; Thu, 29 Sep 2022 15:55:28 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 8FAE9621C2 for ; Thu, 29 Sep 2022 22:55:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0A399C43142; Thu, 29 Sep 2022 22:55:27 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SO-000csD-2u; Thu, 29 Sep 2022 18:56:40 -0400 Message-ID: <20220929225640.475558980@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:55 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Mathieu Desnoyers , Beau Belgrave Subject: [for-next][PATCH 13/15] tracing/user_events: Use refcount instead of atomic for ref tracking References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Beau Belgrave User processes could open up enough event references to cause rollovers. These could cause use-after-free scenarios, which we do not want. Switching to refcount APIs prevents this, but will leak memory once saturated. 
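As a rough user-space model of the trade-off described above (not the kernel's refcount_t, whose implementation differs in detail), a saturating counter can be sketched as follows. The struct and function names are hypothetical, and UINT_MAX as the saturation value is an assumption made for the sketch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of a saturating reference count. */
struct sat_ref {
	unsigned int count;
};

#define SAT_REF_SATURATED UINT_MAX

/* Increment, but pin the counter at the saturation value instead of wrapping. */
static void sat_ref_inc(struct sat_ref *r)
{
	if (r->count != SAT_REF_SATURATED)
		r->count++;
}

/*
 * Decrement and report whether the object may now be freed.
 * A saturated counter stays saturated, so the object is never freed:
 * memory is leaked, but a use-after-free cannot happen.
 */
static bool sat_ref_dec_and_test(struct sat_ref *r)
{
	if (r->count == 0 || r->count == SAT_REF_SATURATED)
		return false;

	return --r->count == 0;
}
```

A plain unsigned counter would instead wrap from UINT_MAX to 0 on the next increment, which is the rollover that makes a live object look unreferenced.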
Once saturated, user processes can still use the events. This prevents a bad user process from stopping existing telemetry from being emitted. Link: https://lkml.kernel.org/r/20220728233309.1896-5-beaub@linux.microsoft= .com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.= zimbra@efficios.com/ Reported-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_events_user.c | 53 +++++++++++++++----------------- 1 file changed, 24 insertions(+), 29 deletions(-) diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_u= ser.c index f9bb7d37d76f..2bcae7abfa81 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -57,7 +58,7 @@ static DECLARE_BITMAP(page_bitmap, MAX_EVENTS); * within a file a user_event might be created if it does not * already exist. These are globally used and their lifetime * is tied to the refcnt member. These cannot go away until the - * refcnt reaches zero. + * refcnt reaches one. 
*/ struct user_event { struct tracepoint tracepoint; @@ -67,7 +68,7 @@ struct user_event { struct hlist_node node; struct list_head fields; struct list_head validators; - atomic_t refcnt; + refcount_t refcnt; int index; int flags; int min_size; @@ -105,6 +106,12 @@ static u32 user_event_key(char *name) return jhash(name, strlen(name), 0); } =20 +static __always_inline __must_check +bool user_event_last_ref(struct user_event *user) +{ + return refcount_read(&user->refcnt) =3D=3D 1; +} + static __always_inline __must_check size_t copy_nofault(void *addr, size_t bytes, struct iov_iter *i) { @@ -662,7 +669,7 @@ static struct user_event *find_user_event(char *name, u= 32 *outkey) =20 hash_for_each_possible(register_table, user, node, key) if (!strcmp(EVENT_NAME(user), name)) { - atomic_inc(&user->refcnt); + refcount_inc(&user->refcnt); return user; } =20 @@ -876,12 +883,12 @@ static int user_event_reg(struct trace_event_call *ca= ll, =20 return ret; inc: - atomic_inc(&user->refcnt); + refcount_inc(&user->refcnt); update_reg_page_for(user); return 0; dec: update_reg_page_for(user); - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); return 0; } =20 @@ -907,7 +914,7 @@ static int user_event_create(const char *raw_command) ret =3D user_event_parse_cmd(name, &user); =20 if (!ret) - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); =20 mutex_unlock(&reg_mutex); =20 @@ -951,14 +958,14 @@ static bool user_event_is_busy(struct dyn_event *ev) { struct user_event *user =3D container_of(ev, struct user_event, devent); =20 - return atomic_read(&user->refcnt) !=3D 0; + return !user_event_last_ref(user); } =20 static int user_event_free(struct dyn_event *ev) { struct user_event *user =3D container_of(ev, struct user_event, devent); =20 - if (atomic_read(&user->refcnt) !=3D 0) + if (!user_event_last_ref(user)) return -EBUSY; =20 return destroy_user_event(user); @@ -1137,8 +1144,8 @@ static int user_event_parse(char *name, char *args, c= har *flags, =20 user->index =3D 
index; =20 - /* Ensure we track ref */ - atomic_inc(&user->refcnt); + /* Ensure we track self ref and caller ref (2) */ + refcount_set(&user->refcnt, 2); =20 dyn_event_init(&user->devent, &user_event_dops); dyn_event_add(&user->devent, &user->call); @@ -1164,29 +1171,17 @@ static int user_event_parse(char *name, char *args,= char *flags, static int delete_user_event(char *name) { u32 key; - int ret; struct user_event *user =3D find_user_event(name, &key); =20 if (!user) return -ENOENT; =20 - /* Ensure we are the last ref */ - if (atomic_read(&user->refcnt) !=3D 1) { - ret =3D -EBUSY; - goto put_ref; - } - - ret =3D destroy_user_event(user); + refcount_dec(&user->refcnt); =20 - if (ret) - goto put_ref; - - return ret; -put_ref: - /* No longer have this ref */ - atomic_dec(&user->refcnt); + if (!user_event_last_ref(user)) + return -EBUSY; =20 - return ret; + return destroy_user_event(user); } =20 /* @@ -1314,7 +1309,7 @@ static int user_events_ref_add(struct file *file, str= uct user_event *user) =20 new_refs->events[i] =3D user; =20 - atomic_inc(&user->refcnt); + refcount_inc(&user->refcnt); =20 rcu_assign_pointer(file->private_data, new_refs); =20 @@ -1374,7 +1369,7 @@ static long user_events_ioctl_reg(struct file *file, = unsigned long uarg) ret =3D user_events_ref_add(file, user); =20 /* No longer need parse ref, ref_add either worked or not */ - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); =20 /* Positive number is index and valid */ if (ret < 0) @@ -1464,7 +1459,7 @@ static int user_events_release(struct inode *node, st= ruct file *file) user =3D refs->events[i]; =20 if (user) - atomic_dec(&user->refcnt); + refcount_dec(&user->refcnt); } out: file->private_data =3D NULL; --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 
36E94C433F5 for ; Thu, 29 Sep 2022 22:56:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230199AbiI2W4M (ORCPT ); Thu, 29 Sep 2022 18:56:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229562AbiI2Wzf (ORCPT ); Thu, 29 Sep 2022 18:55:35 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B450E11A6 for ; Thu, 29 Sep 2022 15:55:28 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 433CC621DF for ; Thu, 29 Sep 2022 22:55:28 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 82A13C433C1; Thu, 29 Sep 2022 22:55:27 +0000 (UTC) Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SP-000cso-1U; Thu, 29 Sep 2022 18:56:41 -0400 Message-ID: <20220929225641.003313801@goodmis.org> User-Agent: quilt/0.66 Date: Thu, 29 Sep 2022 18:55:56 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org Cc: Ingo Molnar , Andrew Morton , Mathieu Desnoyers , Beau Belgrave Subject: [for-next][PATCH 14/15] tracing/user_events: Use bits vs bytes for enabled status page data References: <20220929225542.784716766@goodmis.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Beau Belgrave User processes may require many events and when they do the cache performance of a byte index status check is less ideal than a bit index. The previous event limit per-page was 4096, the new limit is 32,768. This change adds a bitwise index to the user_reg struct. 
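As a rough illustration of the bitwise index, the byte and bit selection can be sketched in user-space C. The helper names are hypothetical and the 4096-byte page size is an assumption; the shift and mask mirror the MAP_STATUS_BYTE/MAP_STATUS_MASK macros and the selftest's status_check() helper from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size; one status page then holds 4096 * 8 = 32768 event bits. */
#define STATUS_PAGE_BYTES 4096u

/* The upper bits of the index select the byte within the status page. */
static size_t status_byte(unsigned int bit)
{
	return bit >> 3;
}

/* The lower 3 bits of the index select the bit within that byte. */
static unsigned char status_mask(unsigned int bit)
{
	return (unsigned char)(1u << (bit & 7));
}

/* True when the event registered at index "bit" is enabled in the page. */
static bool status_enabled(const unsigned char *page, unsigned int bit)
{
	return (page[status_byte(bit)] & status_mask(bit)) != 0;
}
```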
Programs check that the bit at status_bit has a bit set within the status page(s). Link: https://lkml.kernel.org/r/20220728233309.1896-6-beaub@linux.microsoft= .com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.= zimbra@efficios.com/ Suggested-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- include/linux/user_events.h | 15 +--- kernel/trace/trace_events_user.c | 75 +++++++++++++++++-- samples/user_events/example.c | 25 +++++-- .../selftests/user_events/ftrace_test.c | 47 ++++++++++-- .../testing/selftests/user_events/perf_test.c | 11 ++- 5 files changed, 135 insertions(+), 38 deletions(-) diff --git a/include/linux/user_events.h b/include/linux/user_events.h index 736e05603463..592a3fbed98e 100644 --- a/include/linux/user_events.h +++ b/include/linux/user_events.h @@ -20,15 +20,6 @@ #define USER_EVENTS_SYSTEM "user_events" #define USER_EVENTS_PREFIX "u:" =20 -/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */ -#define EVENT_BIT_FTRACE 0 -#define EVENT_BIT_PERF 1 -#define EVENT_BIT_OTHER 7 - -#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE) -#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF) -#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER) - /* Create dynamic location entry within a 32-bit value */ #define DYN_LOC(offset, size) ((size) << 16 | (offset)) =20 @@ -45,12 +36,12 @@ struct user_reg { /* Input: Pointer to string with event name, description and flags */ __u64 name_args; =20 - /* Output: Byte index of the event within the status page */ - __u32 status_index; + /* Output: Bitwise index of the event within the status page */ + __u32 status_bit; =20 /* Output: Index of the event to use when writing data */ __u32 write_index; -}; +} __attribute__((__packed__)); =20 #define DIAG_IOC_MAGIC '*' =20 diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_u= ser.c index 2bcae7abfa81..2c0a6ec75548 100644 --- a/kernel/trace/trace_events_user.c +++ 
b/kernel/trace/trace_events_user.c @@ -40,17 +40,44 @@ */ #define MAX_PAGE_ORDER 0 #define MAX_PAGES (1 << MAX_PAGE_ORDER) -#define MAX_EVENTS (MAX_PAGES * PAGE_SIZE) +#define MAX_BYTES (MAX_PAGES * PAGE_SIZE) +#define MAX_EVENTS (MAX_BYTES * 8) =20 /* Limit how long of an event name plus args within the subsystem. */ #define MAX_EVENT_DESC 512 #define EVENT_NAME(user_event) ((user_event)->tracepoint.name) #define MAX_FIELD_ARRAY_SIZE 1024 =20 +/* + * The MAP_STATUS_* macros are used for taking a index and determining the + * appropriate byte and the bit in the byte to set/reset for an event. + * + * The lower 3 bits of the index decide which bit to set. + * The remaining upper bits of the index decide which byte to use for the = bit. + * + * This is used when an event has a probe attached/removed to reflect live + * status of the event wanting tracing or not to user-programs via shared + * memory maps. + */ +#define MAP_STATUS_BYTE(index) ((index) >> 3) +#define MAP_STATUS_MASK(index) BIT((index) & 7) + +/* + * Internal bits (kernel side only) to keep track of connected probes: + * These are used when status is requested in text form about an event. Th= ese + * bits are compared against an internal byte on the event to determine wh= ich + * probes to print out to the user. + * + * These do not reflect the mapped bytes between the user and kernel space. 
+ */ +#define EVENT_STATUS_FTRACE BIT(0) +#define EVENT_STATUS_PERF BIT(1) +#define EVENT_STATUS_OTHER BIT(7) + static char *register_page_data; =20 static DEFINE_MUTEX(reg_mutex); -static DEFINE_HASHTABLE(register_table, 4); +static DEFINE_HASHTABLE(register_table, 8); static DECLARE_BITMAP(page_bitmap, MAX_EVENTS); =20 /* @@ -72,6 +99,7 @@ struct user_event { int index; int flags; int min_size; + char status; }; =20 /* @@ -106,6 +134,22 @@ static u32 user_event_key(char *name) return jhash(name, strlen(name), 0); } =20 +static __always_inline +void user_event_register_set(struct user_event *user) +{ + int i =3D user->index; + + register_page_data[MAP_STATUS_BYTE(i)] |=3D MAP_STATUS_MASK(i); +} + +static __always_inline +void user_event_register_clear(struct user_event *user) +{ + int i =3D user->index; + + register_page_data[MAP_STATUS_BYTE(i)] &=3D ~MAP_STATUS_MASK(i); +} + static __always_inline __must_check bool user_event_last_ref(struct user_event *user) { @@ -648,7 +692,7 @@ static int destroy_user_event(struct user_event *user) =20 dyn_event_remove(&user->devent); =20 - register_page_data[user->index] =3D 0; + user_event_register_clear(user); clear_bit(user->index, page_bitmap); hash_del(&user->node); =20 @@ -827,7 +871,12 @@ static void update_reg_page_for(struct user_event *use= r) rcu_read_unlock_sched(); } =20 - register_page_data[user->index] =3D status; + if (status) + user_event_register_set(user); + else + user_event_register_clear(user); + + user->status =3D status; } =20 /* @@ -1332,7 +1381,17 @@ static long user_reg_get(struct user_reg __user *ure= g, struct user_reg *kreg) if (size > PAGE_SIZE) return -E2BIG; =20 - return copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); + if (size < offsetofend(struct user_reg, write_index)) + return -EINVAL; + + ret =3D copy_struct_from_user(kreg, sizeof(*kreg), ureg, size); + + if (ret) + return ret; + + kreg->size =3D size; + + return 0; } =20 /* @@ -1376,7 +1435,7 @@ static long 
user_events_ioctl_reg(struct file *file, = unsigned long uarg) return ret; =20 put_user((u32)ret, &ureg->write_index); - put_user(user->index, &ureg->status_index); + put_user(user->index, &ureg->status_bit); =20 return 0; } @@ -1485,7 +1544,7 @@ static int user_status_mmap(struct file *file, struct= vm_area_struct *vma) { unsigned long size =3D vma->vm_end - vma->vm_start; =20 - if (size !=3D MAX_EVENTS) + if (size !=3D MAX_BYTES) return -EINVAL; =20 return remap_pfn_range(vma, vma->vm_start, @@ -1520,7 +1579,7 @@ static int user_seq_show(struct seq_file *m, void *p) mutex_lock(&reg_mutex); =20 hash_for_each(register_table, i, user, node) { - status =3D register_page_data[user->index]; + status =3D user->status; flags =3D user->flags; =20 seq_printf(m, "%d:%s", user->index, EVENT_NAME(user)); diff --git a/samples/user_events/example.c b/samples/user_events/example.c index 4f5778e441c0..d06dc24156ec 100644 --- a/samples/user_events/example.c +++ b/samples/user_events/example.c @@ -12,13 +12,21 @@ #include #include #include +#include +#include #include =20 +#if __BITS_PER_LONG =3D=3D 64 +#define endian_swap(x) htole64(x) +#else +#define endian_swap(x) htole32(x) +#endif + /* Assumes debugfs is mounted */ const char *data_file =3D "/sys/kernel/debug/tracing/user_events_data"; const char *status_file =3D "/sys/kernel/debug/tracing/user_events_status"; =20 -static int event_status(char **status) +static int event_status(long **status) { int fd =3D open(status_file, O_RDONLY); =20 @@ -33,7 +41,8 @@ static int event_status(char **status) return 0; } =20 -static int event_reg(int fd, const char *command, int *status, int *write) +static int event_reg(int fd, const char *command, long *index, long *mask, + int *write) { struct user_reg reg =3D {0}; =20 @@ -43,7 +52,8 @@ static int event_reg(int fd, const char *command, int *st= atus, int *write) if (ioctl(fd, DIAG_IOCSREG, &reg) =3D=3D -1) return -1; =20 - *status =3D reg.status_index; + *index =3D reg.status_bit / 
__BITS_PER_LONG; + *mask =3D endian_swap(1L << (reg.status_bit % __BITS_PER_LONG)); *write =3D reg.write_index; =20 return 0; @@ -51,8 +61,9 @@ static int event_reg(int fd, const char *command, int *st= atus, int *write) =20 int main(int argc, char **argv) { - int data_fd, status, write; - char *status_page; + int data_fd, write; + long index, mask; + long *status_page; struct iovec io[2]; __u32 count =3D 0; =20 @@ -61,7 +72,7 @@ int main(int argc, char **argv) =20 data_fd =3D open(data_file, O_RDWR); =20 - if (event_reg(data_fd, "test u32 count", &status, &write) =3D=3D -1) + if (event_reg(data_fd, "test u32 count", &index, &mask, &write) =3D=3D -1) return errno; =20 /* Setup iovec */ @@ -75,7 +86,7 @@ int main(int argc, char **argv) getchar(); =20 /* Check if anyone is listening */ - if (status_page[status]) { + if (status_page[index] & mask) { /* Yep, trace out our data */ writev(data_fd, (const struct iovec *)io, 2); =20 diff --git a/tools/testing/selftests/user_events/ftrace_test.c b/tools/test= ing/selftests/user_events/ftrace_test.c index a80fb5ef61d5..404a2713dcae 100644 --- a/tools/testing/selftests/user_events/ftrace_test.c +++ b/tools/testing/selftests/user_events/ftrace_test.c @@ -22,6 +22,11 @@ const char *enable_file =3D "/sys/kernel/debug/tracing/e= vents/user_events/__test_e const char *trace_file =3D "/sys/kernel/debug/tracing/trace"; const char *fmt_file =3D "/sys/kernel/debug/tracing/events/user_events/__t= est_event/format"; =20 +static inline int status_check(char *status_page, int status_bit) +{ + return status_page[status_bit >> 3] & (1 << (status_bit & 7)); +} + static int trace_bytes(void) { int fd =3D open(trace_file, O_RDONLY); @@ -197,12 +202,12 @@ TEST_F(user, register_events) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); =20 /* Multiple registers should result in same index */ ASSERT_EQ(0, 
ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); =20 /* Ensure disabled */ self->enable_fd =3D open(enable_file, O_RDWR); @@ -212,15 +217,15 @@ TEST_F(user, register_events) { /* MMAP should work and be zero'd */ ASSERT_NE(MAP_FAILED, status_page); ASSERT_NE(NULL, status_page); - ASSERT_EQ(0, status_page[reg.status_index]); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); =20 /* Enable event and ensure bits updated in status */ ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) - ASSERT_EQ(EVENT_STATUS_FTRACE, status_page[reg.status_index]); + ASSERT_NE(0, status_check(status_page, reg.status_bit)); =20 /* Disable event and ensure bits updated in status */ ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0"))) - ASSERT_EQ(0, status_page[reg.status_index]); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); =20 /* File still open should return -EBUSY for delete */ ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event")); @@ -240,6 +245,8 @@ TEST_F(user, write_events) { struct iovec io[3]; __u32 field1, field2; int before =3D 0, after =3D 0; + int page_size =3D sysconf(_SC_PAGESIZE); + char *status_page; =20 reg.size =3D sizeof(reg); reg.name_args =3D (__u64)"__test_event u32 field1; u32 field2"; @@ -254,10 +261,18 @@ TEST_F(user, write_events) { io[2].iov_base =3D &field2; io[2].iov_len =3D sizeof(field2); =20 + status_page =3D mmap(NULL, page_size, PROT_READ, MAP_SHARED, + self->status_fd, 0); + /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); + + /* MMAP should work and be zero'd */ + ASSERT_NE(MAP_FAILED, status_page); + ASSERT_NE(NULL, status_page); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); =20 /* Write should fail on invalid slot with ENOENT */ io[0].iov_base =3D &field2; @@ -271,6 +286,9 @@ 
TEST_F(user, write_events) { self->enable_fd =3D open(enable_file, O_RDWR); ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) =20 + /* Event should now be enabled */ + ASSERT_NE(0, status_check(status_page, reg.status_bit)); + /* Write should make it out to ftrace buffers */ before =3D trace_bytes(); ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 3)); @@ -298,7 +316,7 @@ TEST_F(user, write_fault) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); =20 /* Write should work normally */ ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 2)); @@ -315,6 +333,11 @@ TEST_F(user, write_validator) { int loc, bytes; char data[8]; int before =3D 0, after =3D 0; + int page_size =3D sysconf(_SC_PAGESIZE); + char *status_page; + + status_page =3D mmap(NULL, page_size, PROT_READ, MAP_SHARED, + self->status_fd, 0); =20 reg.size =3D sizeof(reg); reg.name_args =3D (__u64)"__test_event __rel_loc char[] data"; @@ -322,7 +345,12 @@ TEST_F(user, write_validator) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); + ASSERT_NE(0, reg.status_bit); + + /* MMAP should work and be zero'd */ + ASSERT_NE(MAP_FAILED, status_page); + ASSERT_NE(NULL, status_page); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); =20 io[0].iov_base =3D ®.write_index; io[0].iov_len =3D sizeof(reg.write_index); @@ -340,6 +368,9 @@ TEST_F(user, write_validator) { self->enable_fd =3D open(enable_file, O_RDWR); ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1"))) =20 + /* Event should now be enabled */ + ASSERT_NE(0, status_check(status_page, reg.status_bit)); + /* Full in-bounds write should work */ before =3D trace_bytes(); loc =3D DYN_LOC(0, bytes); diff --git a/tools/testing/selftests/user_events/perf_test.c b/tools/testin= 
g/selftests/user_events/perf_test.c index 26851d51d6bb..8b4c7879d5a7 100644 --- a/tools/testing/selftests/user_events/perf_test.c +++ b/tools/testing/selftests/user_events/perf_test.c @@ -35,6 +35,11 @@ static long perf_event_open(struct perf_event_attr *pe, = pid_t pid, return syscall(__NR_perf_event_open, pe, pid, cpu, group_fd, flags); } =20 +static inline int status_check(char *status_page, int status_bit) +{ + return status_page[status_bit >> 3] & (1 << (status_bit & 7)); +} + static int get_id(void) { FILE *fp =3D fopen(id_file, "r"); @@ -120,8 +125,8 @@ TEST_F(user, perf_write) { /* Register should work */ ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, ®)); ASSERT_EQ(0, reg.write_index); - ASSERT_NE(0, reg.status_index); - ASSERT_EQ(0, status_page[reg.status_index]); + ASSERT_NE(0, reg.status_bit); + ASSERT_EQ(0, status_check(status_page, reg.status_bit)); =20 /* Id should be there */ id =3D get_id(); @@ -144,7 +149,7 @@ TEST_F(user, perf_write) { ASSERT_NE(MAP_FAILED, perf_page); =20 /* Status should be updated */ - ASSERT_EQ(EVENT_STATUS_PERF, status_page[reg.status_index]); + ASSERT_NE(0, status_check(status_page, reg.status_bit)); =20 event.index =3D reg.write_index; event.field1 =3D 0xc001; --=20 2.35.1 From nobody Mon Apr 6 11:53:08 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5820EC433FE for ; Thu, 29 Sep 2022 22:56:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230082AbiI2W4Y (ORCPT ); Thu, 29 Sep 2022 18:56:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229932AbiI2Wzf (ORCPT ); Thu, 29 Sep 2022 18:55:35 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by 
	lindbergh.monkeyblade.net (Postfix) with ESMTPS id E98BAEC55A for ; Thu, 29 Sep 2022 15:55:28 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9E949621E1 for ; Thu, 29 Sep 2022 22:55:28 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 13A63C433D7; Thu, 29 Sep 2022 22:55:28 +0000 (UTC)
Received: from rostedt by gandalf.local.home with local (Exim 4.96) (envelope-from ) id 1oe2SP-000ctN-3D; Thu, 29 Sep 2022 18:56:42 -0400
Message-ID: <20220929225641.570530955@goodmis.org>
User-Agent: quilt/0.66
Date: Thu, 29 Sep 2022 18:55:57 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Andrew Morton, Beau Belgrave
Subject: [for-next][PATCH 15/15] tracing/user_events: Update ABI documentation to align to bits vs bytes
References: <20220929225542.784716766@goodmis.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Beau Belgrave

Update the documentation to reflect the new ABI requirements and how to use
the byte index with the mask properly to check event status.

Link: https://lkml.kernel.org/r/20220728233309.1896-7-beaub@linux.microsoft.com
Signed-off-by: Beau Belgrave
Signed-off-by: Steven Rostedt (Google)
---
 Documentation/trace/user_events.rst | 86 +++++++++++++++++++----------
 1 file changed, 58 insertions(+), 28 deletions(-)

diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
index c180936f49fc..9f181f342a70 100644
--- a/Documentation/trace/user_events.rst
+++ b/Documentation/trace/user_events.rst
@@ -20,14 +20,14 @@ dynamic_events is the same as the ioctl with the u: prefix applied.
 
 Typically programs will register a set of events that they wish to expose to
 tools that can read trace_events (such as ftrace and perf). The registration
-process gives back two ints to the program for each event. The first int is the
-status index. This index describes which byte in the
+process gives back two ints to the program for each event. The first int is
+the status bit. This describes which bit in little-endian format in the
 /sys/kernel/debug/tracing/user_events_status file represents this event. The
-second int is the write index. This index describes the data when a write() or
+second int is the write index which describes the data when a write() or
 writev() is called on the /sys/kernel/debug/tracing/user_events_data file.
 
-The structures referenced in this document are contained with the
-/include/uap/linux/user_events.h file in the source tree.
+The structures referenced in this document are contained within the
+/include/uapi/linux/user_events.h file in the source tree.
 
 **NOTE:** *Both user_events_status and user_events_data are under the tracefs
 filesystem and may be mounted at different paths than above.*
@@ -38,18 +38,18 @@ Registering within a user process is done via ioctl() out to the
 /sys/kernel/debug/tracing/user_events_data file. The command to issue is
 DIAG_IOCSREG.
 
-This command takes a struct user_reg as an argument::
+This command takes a packed struct user_reg as an argument::
 
   struct user_reg {
         u32 size;
         u64 name_args;
-        u32 status_index;
+        u32 status_bit;
         u32 write_index;
   };
 
 The struct user_reg requires two inputs, the first is the size of the structure
 to ensure forward and backward compatibility. The second is the command string
-to issue for registering. Upon success two outputs are set, the status index
+to issue for registering. Upon success two outputs are set, the status bit
 and the write index.
 
 User based events show up under tracefs like any other event under the
@@ -111,15 +111,56 @@ in realtime. This allows user programs to only incur the cost of the write() or
 writev() calls when something is actively attached to the event.
 
 User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
-check the status for each event that is registered. The byte to check in the
-file is given back after the register ioctl() via user_reg.status_index.
+check the status for each event that is registered. The bit to check in the
+file is given back after the register ioctl() via user_reg.status_bit. The bit
+is always in little-endian format. Programs can check if the bit is set either
+using a byte-wise index with a mask or a long-wise index with a little-endian
+mask.
+
 Currently the size of user_events_status is a single page, however, custom
 kernel configurations can change this size to allow more user based events. In
 all cases the size of the file is a multiple of a page size.
 
-For example, if the register ioctl() gives back a status_index of 3 you would
-check byte 3 of the returned mmap data to see if anything is attached to that
-event.
+For example, if the register ioctl() gives back a status_bit of 3 you would
+check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8
+(1 << (3 % 8)) to see if anything is attached to that event.
+
+A byte-wise index check is performed as follows::
+
+  int index, mask;
+  char *status_page;
+
+  index = status_bit / 8;
+  mask = 1 << (status_bit % 8);
+
+  ...
+
+  if (status_page[index] & mask) {
+	/* Enabled */
+  }
+
+A long-wise index check is performed as follows::
+
+  #include
+  #include
+
+  #if __BITS_PER_LONG == 64
+  #define endian_swap(x) htole64(x)
+  #else
+  #define endian_swap(x) htole32(x)
+  #endif
+
+  long index, mask, *status_page;
+
+  index = status_bit / __BITS_PER_LONG;
+  mask = 1L << (status_bit % __BITS_PER_LONG);
+  mask = endian_swap(mask);
+
+  ...
+
+  if (status_page[index] & mask) {
+	/* Enabled */
+  }
 
 Administrators can easily check the status of all registered events by reading
 the user_events_status file directly via a terminal. The output is as follows::
@@ -137,7 +178,7 @@ For example, on a system that has a single event the output looks like this::
 
   Active: 1
   Busy: 0
-  Max: 4096
+  Max: 32768
 
 If a user enables the user event via ftrace, the output would change to this::
 
@@ -145,21 +186,10 @@ If a user enables the user event via ftrace, the output would change to this::
 
   Active: 1
   Busy: 1
-  Max: 4096
-
-**NOTE:** *A status index of 0 will never be returned. This allows user
-programs to have an index that can be used on error cases.*
-
-Status Bits
-^^^^^^^^^^^
-The byte being checked will be non-zero if anything is attached. Programs can
-check specific bits in the byte to see what mechanism has been attached.
-
-The following values are defined to aid in checking what has been attached:
-
-**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
+  Max: 32768
 
-**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
+**NOTE:** *A status bit of 0 will never be returned. This allows user programs
+to have a bit that can be used on error cases.*
 
 Writing Data
 ------------
-- 
2.35.1