[v13] ring-buffer: Making persistent ring buffers robust

[PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest

Posted by Masami Hiramatsu (Google) 1 week, 2 days ago

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add a self-destructive test for the persistent ring buffer. This
invalidates some sub-buffer pages in the persistent ring buffer
when the kernel panics, and checks after reboot whether the number
of detected invalid pages and the total entry_bytes match the
recorded values.

This ensures that the kernel correctly recovers a partially corrupted
persistent ring buffer at boot.

The test only runs on the persistent ring buffer whose name is
"ptracingtest". The user has to fill it up with events before the
kernel panics.

To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
and set up the kernel cmdline as follows:

 reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
 panic=1

Then run the following commands after the 1st boot:

 cd /sys/kernel/tracing/instances/ptracingtest
 echo 1 > tracing_on
 echo 1 > events/enable
 sleep 3
 echo c > /proc/sysrq-trigger

After the panic message, the kernel will reboot and run the
verification on the persistent ring buffer, e.g.

 Ring buffer meta [2] invalid buffer page detected
 Ring buffer meta [2] is from previous boot! (318 pages discarded)
 Ring buffer testing [2] invalid pages: PASSED (318/318)
 Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)
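
For illustration, the inject-and-verify cycle described above can be
modeled in userspace C. This is only a sketch under assumptions: the
plain array of commit values, SUBBUF_SIZE, struct inject_record and both
function names are hypothetical stand-ins, not the kernel code in this
patch.

```c
#include <assert.h>
#include <stdint.h>

#define SUBBUF_SIZE 4096u	/* assumed sub-buffer size for this model */

/* Hypothetical record, standing in for nr_invalid/entry_bytes in the meta. */
struct inject_record {
	uint32_t nr_invalid;
	uint32_t entry_bytes;
};

/*
 * Model of the injection at panic time: skip unused pages, push the
 * commit value of even-indexed pages past the sub-buffer size, and sum
 * the commit bytes of the odd-indexed pages that are left intact.
 */
static struct inject_record inject_invalid_pages(uint32_t *commit, int nr)
{
	struct inject_record rec = { 0, 0 };

	for (int i = 0; i < nr; i++) {
		if (!commit[i])
			continue;
		if (!(i & 1)) {
			commit[i] += SUBBUF_SIZE + 1;
			rec.nr_invalid++;
		} else {
			rec.entry_bytes += commit[i];
		}
	}
	return rec;
}

/*
 * Model of the check at boot: a commit value larger than the sub-buffer
 * size marks the page as invalid. Such a page is discarded (reset to
 * empty here); all other pages are counted as valid data.
 */
static struct inject_record validate_pages(uint32_t *commit, int nr)
{
	struct inject_record seen = { 0, 0 };

	for (int i = 0; i < nr; i++) {
		if (commit[i] > SUBBUF_SIZE) {
			commit[i] = 0;
			seen.nr_invalid++;
		} else {
			seen.entry_bytes += commit[i];
		}
	}
	return seen;
}
```

Comparing the record returned by inject_invalid_pages() with the result
of validate_pages() corresponds to the PASSED/FAILED lines above. In
this model, a second validate_pages() call on the already-recovered
array finds no invalid page, so the comparison is only meaningful on the
first boot after the panic.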

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v10:
  - Add entry_bytes test.
  - Do not compile test code if CONFIG_RING_BUFFER_PERSISTENT_SELFTEST=n.
 Changes in v9:
  - Test also reader pages.
---
 include/linux/ring_buffer.h |    1 +
 kernel/trace/Kconfig        |   15 +++++++++
 kernel/trace/ring_buffer.c  |   69 +++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.c        |    4 ++
 4 files changed, 89 insertions(+)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 994f52b34344..0670742b2d60 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -238,6 +238,7 @@ int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
 
 enum ring_buffer_flags {
 	RB_FL_OVERWRITE		= 1 << 0,
+	RB_FL_TESTING		= 1 << 1,
 };
 
 #ifdef CONFIG_RING_BUFFER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e130da35808f..094d5511bb17 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -1202,6 +1202,21 @@ config RING_BUFFER_VALIDATE_TIME_DELTAS
 	  Only say Y if you understand what this does, and you
 	  still want it enabled. Otherwise say N
 
+config RING_BUFFER_PERSISTENT_SELFTEST
+	bool "Enable persistent ring buffer selftest"
+	depends on RING_BUFFER
+	help
+	  Run a selftest on the persistent ring buffer named
+	  "ptracingtest" (and its backup) when the kernel panics, by
+	  invalidating ring buffer pages.
+	  Note that the user has to enable events on the persistent ring
+	  buffer manually to fill up the ring buffers before rebooting.
+	  Since this invalidates the data on the test target ring buffer,
+	  the "ptracingtest" persistent ring buffer must not be used for
+	  actual tracing, but only for testing.
+
+	  If unsure, say N
+
 config MMIOTRACE_TEST
 	tristate "Test module for mmiotrace"
 	depends on MMIOTRACE && m
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e5178239f2f9..10443347a6d8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -64,6 +64,10 @@ struct ring_buffer_cpu_meta {
 	unsigned long	commit_buffer;
 	__u32		subbuf_size;
 	__u32		nr_subbufs;
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+	__u32		nr_invalid;
+	__u32		entry_bytes;
+#endif
 	int		buffers[];
 };
 
@@ -2077,6 +2081,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
 
 	pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
 		cpu_buffer->cpu, discarded);
+
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+	if (meta->nr_invalid)
+		pr_info("Ring buffer testing [%d] invalid pages: %s (%d/%d)\n",
+			cpu_buffer->cpu,
+			(discarded == meta->nr_invalid) ? "PASSED" : "FAILED",
+			discarded, meta->nr_invalid);
+	if (meta->entry_bytes)
+		pr_info("Ring buffer testing [%d] entry_bytes: %s (%ld/%ld)\n",
+			cpu_buffer->cpu,
+			(entry_bytes == meta->entry_bytes) ? "PASSED" : "FAILED",
+			(long)entry_bytes, (long)meta->entry_bytes);
+#endif
 	return;
 
  invalid:
@@ -2557,12 +2574,64 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 	kfree(cpu_buffer);
 }
 
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+static void rb_test_inject_invalid_pages(struct trace_buffer *buffer)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct ring_buffer_cpu_meta *meta;
+	struct buffer_data_page *dpage;
+	u32 entry_bytes = 0;
+	unsigned long ptr;
+	int subbuf_size;
+	int invalid = 0;
+	int cpu;
+	int i;
+
+	if (!(buffer->flags & RB_FL_TESTING))
+		return;
+
+	guard(preempt)();
+	cpu = smp_processor_id();
+
+	cpu_buffer = buffer->buffers[cpu];
+	meta = cpu_buffer->ring_meta;
+	ptr = (unsigned long)rb_subbufs_from_meta(meta);
+	subbuf_size = meta->subbuf_size;
+
+	for (i = 0; i < meta->nr_subbufs; i++) {
+		int idx = meta->buffers[i];
+
+		dpage = (void *)(ptr + idx * subbuf_size);
+		/* Skip unused pages */
+		if (!local_read(&dpage->commit))
+			continue;
+
+		/* Invalidate even pages. */
+		if (!(i & 0x1)) {
+			local_add(subbuf_size + 1, &dpage->commit);
+			invalid++;
+		} else {
+			/* Count total commit bytes. */
+			entry_bytes += local_read(&dpage->commit);
+		}
+	}
+
+	pr_info("Inject invalidated %d pages on CPU%d, total size: %ld\n",
+		invalid, cpu, (long)entry_bytes);
+	meta->nr_invalid = invalid;
+	meta->entry_bytes = entry_bytes;
+}
+#else /* !CONFIG_RING_BUFFER_PERSISTENT_SELFTEST */
+#define rb_test_inject_invalid_pages(buffer)	do { } while (0)
+#endif
+
 /* Stop recording on a persistent buffer and flush cache if needed. */
 static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
 {
 	struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
 
 	ring_buffer_record_off(buffer);
+	rb_test_inject_invalid_pages(buffer);
 	arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
 	return NOTIFY_DONE;
 }
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4189ec9df6a5..108b0d16badf 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9366,6 +9366,8 @@ static void setup_trace_scratch(struct trace_array *tr,
 	memset(tscratch, 0, size);
 }
 
+#define TRACE_TEST_PTRACING_NAME	"ptracingtest"
+
 static int
 allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned long size)
 {
@@ -9378,6 +9380,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned
 	buf->tr = tr;
 
 	if (tr->range_addr_start && tr->range_addr_size) {
+		if (!strcmp(tr->name, TRACE_TEST_PTRACING_NAME))
+			rb_flags |= RB_FL_TESTING;
 		/* Add scratch buffer to handle 128 modules */
 		buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
 						      tr->range_addr_start,

Re: [PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest

Posted by Steven Rostedt 6 days, 6 hours ago

On Wed, 25 Mar 2026 11:25:25 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Add a self-destractive test for the persistent ring buffer. This
> will invalidate some sub-buffer pages in the persistent ring buffer
> when kernel gets panic, and check whether the number of detected
> invalid pages and the total entry_bytes are the same as record
> after reboot.
> 
> This can ensure the kernel correctly recover partially corrupted
> persistent ring buffer when boot.
> 
> The test only runs on the persistent ring buffer whose name is
> "ptracingtest". And user has to fill it up with events before
> kernel panics.
> 
> To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_SELFTEST

I think a more appropriate config name would be:

  CONFIG_PERSISTENT_RING_BUFFER_ERROR_INJECT

as that's what it is doing as it is only testing error injection and not
the persistent ring buffer.

> and you have to setup the kernel cmdline;
> 
>  reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
>  panic=1
> 
> And run following commands after the 1st boot;
> 
>  cd /sys/kernel/tracing/instances/ptracingtest
>  echo 1 > tracing_on
>  echo 1 > events/enable
>  sleep 3
>  echo c > /proc/sysrq-trigger

These instructions should probably be in the CONFIG help message.

> 
> After panic message, the kernel will reboot and run the verification
> on the persistent ring buffer, e.g.
> 
>  Ring buffer meta [2] invalid buffer page detected
>  Ring buffer meta [2] is from previous boot! (318 pages discarded)
>  Ring buffer testing [2] invalid pages: PASSED (318/318)
>  Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)

BTW, when I tested this, I got the above on the first boot, but if I
rebooted normally without re-enabling the persistent ring buffer, I would
get on the next boot:


[    0.966510] Ring buffer meta [2] is from previous boot! (0 pages discarded)
[    0.971338]  #2
[    1.003431] Ring buffer meta [3] is from previous boot! (0 pages discarded)
[    1.007737]  #3
[    1.039091] Ring buffer meta [4] is from previous boot! (0 pages discarded)
[    1.043181] Ring buffer testing [4] invalid pages: FAILED (0/1597)
[    1.044660] Ring buffer testing [4] entry_bytes: PASSED (6512464/6512464)
[    1.047829]  #4
[    1.079811] Ring buffer meta [5] is from previous boot! (0 pages discarded)
[    1.083728]  #5
[    1.116764] Ring buffer meta [6] is from previous boot! (0 pages discarded)
[    1.120846]  #6
[    1.156502] Ring buffer meta [7] is from previous boot! (0 pages discarded)
[    1.160857]  #7

I'll start testing the previous 3 patches and may add them to next.

Also, I noticed that there's nothing that reads the RB_MISSING as I thought
it might. I'll have to look into how to pass that info to the trace output.

-- Steve

Re: [PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest

Posted by Masami Hiramatsu (Google) 4 days, 1 hour ago

On Fri, 27 Mar 2026 16:25:08 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 25 Mar 2026 11:25:25 +0900
> "Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> 
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Add a self-destractive test for the persistent ring buffer. This
> > will invalidate some sub-buffer pages in the persistent ring buffer
> > when kernel gets panic, and check whether the number of detected
> > invalid pages and the total entry_bytes are the same as record
> > after reboot.
> > 
> > This can ensure the kernel correctly recover partially corrupted
> > persistent ring buffer when boot.
> > 
> > The test only runs on the persistent ring buffer whose name is
> > "ptracingtest". And user has to fill it up with events before
> > kernel panics.
> > 
> > To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
> 
> I think a more appropriate config name would be:
> 
>   CONFIG_PERSISTENT_RING_BUFFER_ERROR_INJECT
> 
> as that's what it is doing as it is only testing error injection and not
> the persistent ring buffer.

OK, the selftest will be a separate implementation.

(Preparing a buffer with test data and checking the recovery process?)

> 
> > and you have to setup the kernel cmdline;
> > 
> >  reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
> >  panic=1
> > 
> > And run following commands after the 1st boot;
> > 
> >  cd /sys/kernel/tracing/instances/ptracingtest
> >  echo 1 > tracing_on
> >  echo 1 > events/enable
> >  sleep 3
> >  echo c > /proc/sysrq-trigger
> 
> These instructions should probably be in the CONFIG help message.

OK. I'll add it.

> 
> > 
> > After panic message, the kernel will reboot and run the verification
> > on the persistent ring buffer, e.g.
> > 
> >  Ring buffer meta [2] invalid buffer page detected
> >  Ring buffer meta [2] is from previous boot! (318 pages discarded)
> >  Ring buffer testing [2] invalid pages: PASSED (318/318)
> >  Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)
> 
> BTW, when I tested this, I got the above on the first boot, but if I
> rebooted normally without re-enabling the persistent ring buffer, I would
> get on the next boot:

Hmm, since it is already recovered (rewound), the 2nd rewind process
may not work correctly. Let me fix it.
> 
> 
> [    0.966510] Ring buffer meta [2] is from previous boot! (0 pages discarded)
> [    0.971338]  #2
> [    1.003431] Ring buffer meta [3] is from previous boot! (0 pages discarded)
> [    1.007737]  #3
> [    1.039091] Ring buffer meta [4] is from previous boot! (0 pages discarded)
> [    1.043181] Ring buffer testing [4] invalid pages: FAILED (0/1597)
> [    1.044660] Ring buffer testing [4] entry_bytes: PASSED (6512464/6512464)
> [    1.047829]  #4
> [    1.079811] Ring buffer meta [5] is from previous boot! (0 pages discarded)
> [    1.083728]  #5
> [    1.116764] Ring buffer meta [6] is from previous boot! (0 pages discarded)
> [    1.120846]  #6
> [    1.156502] Ring buffer meta [7] is from previous boot! (0 pages discarded)
> [    1.160857]  #7
> 
> I'll start testing the previous 3 patches and may add them to next.

Thanks,

> 
> Also, I noticed that there's nothing that reads the RB_MISSING as I thought
> it might. I'll have to look into how to pass that info to the trace output.
> 
> -- Steve
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

Re: [PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest

Posted by Steven Rostedt 6 days, 6 hours ago

On Fri, 27 Mar 2026 16:25:08 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Also, I noticed that there's nothing that reads the RB_MISSING as I thought
> it might. I'll have to look into how to pass that info to the trace output.

And when I cat /sys/kernel/tracing/instances/ptracingtest/per_cpu/cpuX/trace_pipe

   (where X is the failed buffer)

It triggered an infinite loop of:

[  206.549217] ------------[ cut here ]------------
[  206.550907] WARNING: kernel/trace/ring_buffer.c:5751 at __rb_get_reader_page+0xa6b/0x1040, CPU#2: cat/1197
[  206.554111] Modules linked in:
[  206.555331] CPU: 2 UID: 0 PID: 1197 Comm: cat Tainted: G        W           7.0.0-rc4-test-00028-g7b37f48b2c57-dirty #276 PREEMPT(full) 
[  206.559048] Tainted: [W]=WARN
[  206.560244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[  206.563212] RIP: 0010:__rb_get_reader_page+0xa6b/0x1040
[  206.564964] Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 4a 05 00 00 48 8b 43 10 be 04 00 00 00 4c 8d 60 08 4c 89 e7 e8 9a 2d 63 00 f0 41 ff 04 24 <0f> 0b e9 36 fb ff ff e8 29 39 05 00 fb 0f 1f 44 00 00 4d 85 f6 0f
[  206.572295] RSP: 0018:ffff888112a77938 EFLAGS: 00010006
[  206.574095] RAX: 0000000000000001 RBX: ffff888100d6e000 RCX: 0000000000000001
[  206.576458] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff88810027b808
[  206.578749] RBP: 1ffff1102254ef34 R08: ffffffff909a1556 R09: ffffed102004f701
[  206.581020] R10: ffffed102004f702 R11: ffff88823443a000 R12: ffff88810027b808
[  206.583312] R13: ffff888100f65f00 R14: ffff888100f65f00 R15: dffffc0000000000
[  206.585647] FS:  00007f98e4d80780(0000) GS:ffff88829e3c2000(0000) knlGS:0000000000000000
[  206.588246] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  206.590179] CR2: 00007f98e4d3e000 CR3: 000000012272e006 CR4: 0000000000172ef0
[  206.592444] Call Trace:
[  206.593518]  <TASK>
[  206.594436]  ? __pfx___rb_get_reader_page+0x10/0x10
[  206.596148]  ? lock_acquire+0x1b2/0x340
[  206.597599]  rb_buffer_peek+0x37e/0x520
[  206.598954]  ring_buffer_peek+0xe9/0x310
[  206.601956]  peek_next_entry+0x15a/0x280
[  206.603420]  __find_next_entry+0x39f/0x530
[  206.604918]  ? __pfx___mutex_lock+0x10/0x10
[  206.606474]  ? rcu_is_watching+0x15/0xb0
[  206.616049]  ? __pfx___find_next_entry+0x10/0x10
[  206.617741]  ? preempt_count_sub+0x10c/0x1c0
[  206.619242]  ? __pfx_down_read+0x10/0x10
[  206.620687]  trace_find_next_entry_inc+0x2f/0x240
[  206.622351]  tracing_read_pipe+0x4e7/0xc60
[  206.623852]  ? rw_verify_area+0x353/0x5f0
[  206.625325]  vfs_read+0x171/0xb20
[  206.626592]  ? __lock_acquire+0x487/0x2220
[  206.628135]  ? __pfx___handle_mm_fault+0x10/0x10
[  206.629784]  ? __pfx_vfs_read+0x10/0x10
[  206.632696]  ? __pfx_css_rstat_updated+0x10/0x10
[  206.634351]  ? rcu_is_watching+0x15/0xb0
[  206.635835]  ? trace_preempt_on+0x126/0x160
[  206.637362]  ? preempt_count_sub+0x10c/0x1c0
[  206.638880]  ? count_memcg_events+0x10a/0x4b0
[  206.640455]  ? find_held_lock+0x2b/0x80
[  206.641908]  ? rcu_read_unlock+0x17/0x60
[  206.643340]  ? lock_release+0x1ab/0x320
[  206.644812]  ksys_read+0xff/0x200
[  206.646127]  ? __pfx_ksys_read+0x10/0x10
[  206.647651]  do_syscall_64+0x117/0x16c0
[  206.649035]  ? irqentry_exit+0xd9/0x690
[  206.650548]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  206.652331] RIP: 0033:0x7f98e4e14eb2
[  206.653743] Code: 18 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 1a 83 e2 39 83 fa 08 75 12 e8 2b ff ff ff 0f 1f 00 49 89 ca 48 8b 44 24 20 0f 05 <48> 83 c4 18 c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 10 ff 74 24 18
[  206.659364] RSP: 002b:00007ffdc0a8d930 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[  206.663251] RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007f98e4e14eb2
[  206.665614] RDX: 0000000000040000 RSI: 00007f98e4d3f000 RDI: 0000000000000003
[  206.668022] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
[  206.670306] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f98e4d3f000
[  206.672624] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
[  206.674941]  </TASK>
[  206.675927] irq event stamp: 7898
[  206.677154] hardirqs last  enabled at (7897): [<ffffffff90991f6f>] ring_buffer_empty_cpu+0x19f/0x2f0
[  206.680088] hardirqs last disabled at (7898): [<ffffffff909a277d>] ring_buffer_peek+0x17d/0x310
[  206.682881] softirqs last  enabled at (7888): [<ffffffff9056cffc>] handle_softirqs+0x5bc/0x7c0
[  206.685710] softirqs last disabled at (7879): [<ffffffff9056d322>] __irq_exit_rcu+0x112/0x230
[  206.688483] ---[ end trace 0000000000000000 ]---

OK, that RB_MISSED_EVENTS is causing an issue. Something else we need to
look into. The warning is that __rb_get_reader_page() is trying more than 3
times. Thus I think it's constantly swapping the head page and the reader
page. Something to investigate.

So, I'm holding off pulling in these patches. I may take the first one
though.

-- Steve

Re: [PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest

Posted by Masami Hiramatsu (Google) 3 days, 1 hour ago

On Fri, 27 Mar 2026 16:47:48 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 27 Mar 2026 16:25:08 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Also, I noticed that there's nothing that reads the RB_MISSING as I thought
> > it might. I'll have to look into how to pass that info to the trace output.
> 
> And when I cat /sys/kernel/tracing/instances/ptracingtest/per_cpu/cpuX/trace_pipe
> 
>    (where X is the failed buffer)
> 
> It triggered an infinite loop of:
> 
> [  206.549217] ------------[ cut here ]------------
> [  206.550907] WARNING: kernel/trace/ring_buffer.c:5751 at __rb_get_reader_page+0xa6b/0x1040, CPU#2: cat/1197
> [  206.554111] Modules linked in:
> [  206.555331] CPU: 2 UID: 0 PID: 1197 Comm: cat Tainted: G        W           7.0.0-rc4-test-00028-g7b37f48b2c57-dirty #276 PREEMPT(full) 
> [  206.559048] Tainted: [W]=WARN
> [  206.560244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
> [  206.563212] RIP: 0010:__rb_get_reader_page+0xa6b/0x1040
> [  206.564964] Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 4a 05 00 00 48 8b 43 10 be 04 00 00 00 4c 8d 60 08 4c 89 e7 e8 9a 2d 63 00 f0 41 ff 04 24 <0f> 0b e9 36 fb ff ff e8 29 39 05 00 fb 0f 1f 44 00 00 4d 85 f6 0f
> [  206.572295] RSP: 0018:ffff888112a77938 EFLAGS: 00010006
> [  206.574095] RAX: 0000000000000001 RBX: ffff888100d6e000 RCX: 0000000000000001
> [  206.576458] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff88810027b808
> [  206.578749] RBP: 1ffff1102254ef34 R08: ffffffff909a1556 R09: ffffed102004f701
> [  206.581020] R10: ffffed102004f702 R11: ffff88823443a000 R12: ffff88810027b808
> [  206.583312] R13: ffff888100f65f00 R14: ffff888100f65f00 R15: dffffc0000000000
> [  206.585647] FS:  00007f98e4d80780(0000) GS:ffff88829e3c2000(0000) knlGS:0000000000000000
> [  206.588246] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  206.590179] CR2: 00007f98e4d3e000 CR3: 000000012272e006 CR4: 0000000000172ef0
> [  206.592444] Call Trace:
> [  206.593518]  <TASK>
> [  206.594436]  ? __pfx___rb_get_reader_page+0x10/0x10
> [  206.596148]  ? lock_acquire+0x1b2/0x340
> [  206.597599]  rb_buffer_peek+0x37e/0x520
> [  206.598954]  ring_buffer_peek+0xe9/0x310
> [  206.601956]  peek_next_entry+0x15a/0x280
> [  206.603420]  __find_next_entry+0x39f/0x530
> [  206.604918]  ? __pfx___mutex_lock+0x10/0x10
> [  206.606474]  ? rcu_is_watching+0x15/0xb0
> [  206.616049]  ? __pfx___find_next_entry+0x10/0x10
> [  206.617741]  ? preempt_count_sub+0x10c/0x1c0
> [  206.619242]  ? __pfx_down_read+0x10/0x10
> [  206.620687]  trace_find_next_entry_inc+0x2f/0x240
> [  206.622351]  tracing_read_pipe+0x4e7/0xc60
> [  206.623852]  ? rw_verify_area+0x353/0x5f0
> [  206.625325]  vfs_read+0x171/0xb20
> [  206.626592]  ? __lock_acquire+0x487/0x2220
> [  206.628135]  ? __pfx___handle_mm_fault+0x10/0x10
> [  206.629784]  ? __pfx_vfs_read+0x10/0x10
> [  206.632696]  ? __pfx_css_rstat_updated+0x10/0x10
> [  206.634351]  ? rcu_is_watching+0x15/0xb0
> [  206.635835]  ? trace_preempt_on+0x126/0x160
> [  206.637362]  ? preempt_count_sub+0x10c/0x1c0
> [  206.638880]  ? count_memcg_events+0x10a/0x4b0
> [  206.640455]  ? find_held_lock+0x2b/0x80
> [  206.641908]  ? rcu_read_unlock+0x17/0x60
> [  206.643340]  ? lock_release+0x1ab/0x320
> [  206.644812]  ksys_read+0xff/0x200
> [  206.646127]  ? __pfx_ksys_read+0x10/0x10
> [  206.647651]  do_syscall_64+0x117/0x16c0
> [  206.649035]  ? irqentry_exit+0xd9/0x690
> [  206.650548]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  206.652331] RIP: 0033:0x7f98e4e14eb2
> [  206.653743] Code: 18 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 1a 83 e2 39 83 fa 08 75 12 e8 2b ff ff ff 0f 1f 00 49 89 ca 48 8b 44 24 20 0f 05 <48> 83 c4 18 c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 10 ff 74 24 18
> [  206.659364] RSP: 002b:00007ffdc0a8d930 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
> [  206.663251] RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007f98e4e14eb2
> [  206.665614] RDX: 0000000000040000 RSI: 00007f98e4d3f000 RDI: 0000000000000003
> [  206.668022] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
> [  206.670306] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f98e4d3f000
> [  206.672624] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
> [  206.674941]  </TASK>
> [  206.675927] irq event stamp: 7898
> [  206.677154] hardirqs last  enabled at (7897): [<ffffffff90991f6f>] ring_buffer_empty_cpu+0x19f/0x2f0
> [  206.680088] hardirqs last disabled at (7898): [<ffffffff909a277d>] ring_buffer_peek+0x17d/0x310
> [  206.682881] softirqs last  enabled at (7888): [<ffffffff9056cffc>] handle_softirqs+0x5bc/0x7c0
> [  206.685710] softirqs last disabled at (7879): [<ffffffff9056d322>] __irq_exit_rcu+0x112/0x230
> [  206.688483] ---[ end trace 0000000000000000 ]---
> 
> OK, that RB_MISSED_EVENTS is causing an issue. Something else we need to
> look into. The warning is that __rb_get_reader_page() is trying more than 3
> times. Thus I think it's constantly swapping the head page and the reader
> page. Something to investigate.

I think this happens because the invalidated buffer is empty. After
recovering the persistent ring buffer, there can be contiguous empty
buffers in the ring buffer. Thus the reader needs to find the next
non-empty buffer on the list.
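
As a rough userspace sketch of that idea (not the kernel reader code):
the function name next_nonempty_page() and the flat array of commit
values are hypothetical, and the walk is bounded by the number of
sub-buffers so that a ring with only empty pages cannot spin forever.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: starting at 'start', walk the circular list of
 * sub-buffers and return the index of the first one that holds data.
 * Returns -1 if every sub-buffer is empty (or if nr is not positive).
 */
static int next_nonempty_page(const uint32_t *commit, int nr, int start)
{
	for (int step = 0; step < nr; step++) {
		int idx = (start + step) % nr;

		if (commit[idx])
			return idx;
	}
	return -1;
}
```

With commit values { 0, 0, 120, 0, 0, 64 }, a search from index 3 skips
the two empty pages and lands on index 5, and a search from index 0
lands on index 2.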

Thank you,


> 
> So, I'm holding off pulling in these patches. I may take the first one
> though.
> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

Re: [PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest

Posted by Masami Hiramatsu (Google) 3 days, 18 hours ago

On Fri, 27 Mar 2026 16:47:48 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 27 Mar 2026 16:25:08 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Also, I noticed that there's nothing that reads the RB_MISSING as I thought
> > it might. I'll have to look into how to pass that info to the trace output.
> 
> And when I cat /sys/kernel/tracing/instances/ptracingtest/per_cpu/cpuX/trace_pipe

I tried this, but it works:

~ # cat /sys/kernel/tracing/instances/ptracingtest/per_cpu/cpu5/stats 
entries: 36198
overrun: 0
commit overrun: 0
bytes: 1301360
oldest event ts:    24.796202
now ts:    48.613676
dropped events: 0
read events: 0
~ # cat /sys/kernel/tracing/instances/ptracingtest/per_cpu/cpu5/trace_pipe >> /dev/null 
~ # cat /sys/kernel/tracing/instances/ptracingtest/per_cpu/cpu5/stats 
entries: 0
overrun: 0
commit overrun: 0
bytes: 52
oldest event ts:    27.931273
now ts:    71.443017
dropped events: 0
read events: 36198



> 
>    (where X is the failed buffer)
> 
> It triggered an infinite loop of:
> 
> [  206.549217] ------------[ cut here ]------------
> [  206.550907] WARNING: kernel/trace/ring_buffer.c:5751 at __rb_get_reader_page+0xa6b/0x1040, CPU#2: cat/1197
> [  206.554111] Modules linked in:
> [  206.555331] CPU: 2 UID: 0 PID: 1197 Comm: cat Tainted: G        W           7.0.0-rc4-test-00028-g7b37f48b2c57-dirty #276 PREEMPT(full) 
> [  206.559048] Tainted: [W]=WARN
> [  206.560244] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
> [  206.563212] RIP: 0010:__rb_get_reader_page+0xa6b/0x1040
> [  206.564964] Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 4a 05 00 00 48 8b 43 10 be 04 00 00 00 4c 8d 60 08 4c 89 e7 e8 9a 2d 63 00 f0 41 ff 04 24 <0f> 0b e9 36 fb ff ff e8 29 39 05 00 fb 0f 1f 44 00 00 4d 85 f6 0f
> [  206.572295] RSP: 0018:ffff888112a77938 EFLAGS: 00010006
> [  206.574095] RAX: 0000000000000001 RBX: ffff888100d6e000 RCX: 0000000000000001
> [  206.576458] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff88810027b808
> [  206.578749] RBP: 1ffff1102254ef34 R08: ffffffff909a1556 R09: ffffed102004f701
> [  206.581020] R10: ffffed102004f702 R11: ffff88823443a000 R12: ffff88810027b808
> [  206.583312] R13: ffff888100f65f00 R14: ffff888100f65f00 R15: dffffc0000000000
> [  206.585647] FS:  00007f98e4d80780(0000) GS:ffff88829e3c2000(0000) knlGS:0000000000000000
> [  206.588246] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  206.590179] CR2: 00007f98e4d3e000 CR3: 000000012272e006 CR4: 0000000000172ef0
> [  206.592444] Call Trace:
> [  206.593518]  <TASK>
> [  206.594436]  ? __pfx___rb_get_reader_page+0x10/0x10
> [  206.596148]  ? lock_acquire+0x1b2/0x340
> [  206.597599]  rb_buffer_peek+0x37e/0x520
> [  206.598954]  ring_buffer_peek+0xe9/0x310
> [  206.601956]  peek_next_entry+0x15a/0x280
> [  206.603420]  __find_next_entry+0x39f/0x530
> [  206.604918]  ? __pfx___mutex_lock+0x10/0x10
> [  206.606474]  ? rcu_is_watching+0x15/0xb0
> [  206.616049]  ? __pfx___find_next_entry+0x10/0x10
> [  206.617741]  ? preempt_count_sub+0x10c/0x1c0
> [  206.619242]  ? __pfx_down_read+0x10/0x10
> [  206.620687]  trace_find_next_entry_inc+0x2f/0x240
> [  206.622351]  tracing_read_pipe+0x4e7/0xc60
> [  206.623852]  ? rw_verify_area+0x353/0x5f0
> [  206.625325]  vfs_read+0x171/0xb20
> [  206.626592]  ? __lock_acquire+0x487/0x2220
> [  206.628135]  ? __pfx___handle_mm_fault+0x10/0x10
> [  206.629784]  ? __pfx_vfs_read+0x10/0x10
> [  206.632696]  ? __pfx_css_rstat_updated+0x10/0x10
> [  206.634351]  ? rcu_is_watching+0x15/0xb0
> [  206.635835]  ? trace_preempt_on+0x126/0x160
> [  206.637362]  ? preempt_count_sub+0x10c/0x1c0
> [  206.638880]  ? count_memcg_events+0x10a/0x4b0
> [  206.640455]  ? find_held_lock+0x2b/0x80
> [  206.641908]  ? rcu_read_unlock+0x17/0x60
> [  206.643340]  ? lock_release+0x1ab/0x320
> [  206.644812]  ksys_read+0xff/0x200
> [  206.646127]  ? __pfx_ksys_read+0x10/0x10
> [  206.647651]  do_syscall_64+0x117/0x16c0
> [  206.649035]  ? irqentry_exit+0xd9/0x690
> [  206.650548]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  206.652331] RIP: 0033:0x7f98e4e14eb2
> [  206.653743] Code: 18 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 1a 83 e2 39 83 fa 08 75 12 e8 2b ff ff ff 0f 1f 00 49 89 ca 48 8b 44 24 20 0f 05 <48> 83 c4 18 c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 10 ff 74 24 18
> [  206.659364] RSP: 002b:00007ffdc0a8d930 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
> [  206.663251] RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007f98e4e14eb2
> [  206.665614] RDX: 0000000000040000 RSI: 00007f98e4d3f000 RDI: 0000000000000003
> [  206.668022] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
> [  206.670306] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f98e4d3f000
> [  206.672624] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
> [  206.674941]  </TASK>
> [  206.675927] irq event stamp: 7898
> [  206.677154] hardirqs last  enabled at (7897): [<ffffffff90991f6f>] ring_buffer_empty_cpu+0x19f/0x2f0
> [  206.680088] hardirqs last disabled at (7898): [<ffffffff909a277d>] ring_buffer_peek+0x17d/0x310
> [  206.682881] softirqs last  enabled at (7888): [<ffffffff9056cffc>] handle_softirqs+0x5bc/0x7c0
> [  206.685710] softirqs last disabled at (7879): [<ffffffff9056d322>] __irq_exit_rcu+0x112/0x230
> [  206.688483] ---[ end trace 0000000000000000 ]---
> 
> OK, that RB_MISSED_EVENTS is causing an issue. Something else we need to
> look into. The warning is that __rb_get_reader_page() is trying more than 3
> times. Thus I think it's constantly swapping the head page and the reader
> page. Something to investigate.

In this version, RB_MISSED_EVENTS is not set on invalid pages.
However, the validation ignores that bit:


static int rb_validate_buffer(struct buffer_page *bpage, int cpu,
			      struct ring_buffer_cpu_meta *meta)
{
[...]
	/*
	 * When a sub-buffer is recovered from a read, the commit value may
	 * have RB_MISSED_* bits set, as these bits are reset on reuse.
	 * Even after clearing these bits, a commit value greater than the
	 * subbuf_size is considered invalid.
	 */
	tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
	if (tail <= meta->subbuf_size)
		ret = rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);

But it does not remove the RB_MISSED_EVENTS flag from commit if
the page is *VALID* (it is cleared only if the page is invalid).

Thus, if the page originally has RB_MISSED_EVENTS set, the recovery
process does not remove it, and the reader may loop infinitely.

I think in any case, these flags should be removed when the page is
validated.
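
A minimal userspace sketch of what I mean, assuming a 32-bit commit
word and assuming the RB_MISSED_* bits sit in the top two bits (the
values below are written out for illustration only). The function name
validate_commit() is hypothetical; this is not the actual fix.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout, for illustration only. */
#define RB_MISSED_EVENTS	(1u << 31)
#define RB_MISSED_STORED	(1u << 30)
#define RB_MISSED_MASK		(3u << 30)

/*
 * Hypothetical validation step on a raw commit word.
 * Valid page: strip the RB_MISSED_* bits and keep the length, return 1.
 * Invalid page (length larger than subbuf_size): reset to empty, return 0.
 */
static int validate_commit(uint32_t *commit, uint32_t subbuf_size)
{
	uint32_t tail = *commit & ~RB_MISSED_MASK;

	if (tail > subbuf_size) {
		*commit = 0;
		return 0;
	}
	*commit = tail;
	return 1;
}
```

The point is that the valid path also writes back the masked value, so
a reader never sees a stale RB_MISSED_EVENTS bit on a recovered page.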

Thank you,


> 
> So, I'm holding off pulling in these patches. I may take the first one
> though.
> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

[PATCH v13 1/4] ring-buffer: Flush and stop persistent ring buffer on panic
[PATCH v13 2/4] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
[PATCH v13 3/4] ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
[PATCH v13 4/4] ring-buffer: Add persistent ring buffer selftest