kernel/trace/ring_buffer.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)
From: Steven Rostedt <rostedt@goodmis.org>
When reading a memory mapped buffer the reader page is just swapped out
with the last page written in the write buffer. If the reader page is the
same as the commit buffer (the buffer that is currently being written to)
it was assumed that it should never have missed events. If it does, it
triggers a WARN_ON_ONCE().
But there just happens to be one scenario where this can legitimately
happen. That is on a commit_overrun. A commit overrun is when an interrupt
preempts an event being written to the buffer and then the interrupt adds
so many new events that it fills and wraps the buffer back to the commit.
Any new events would then be dropped and be reported as "missed_events".
In this case, the next page to read is the commit buffer and after the
swap of the reader page, the reader page will be the commit buffer, but
this time there will be missed events and this triggers the following
warning:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1127 at kernel/trace/ring_buffer.c:7357 ring_buffer_map_get_reader+0x49a/0x780
Modules linked in: kvm_intel kvm irqbypass
CPU: 2 UID: 0 PID: 1127 Comm: trace-cmd Not tainted 6.15.0-rc7-test-00004-g478bc2824b45-dirty #564 PREEMPT
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:ring_buffer_map_get_reader+0x49a/0x780
Code: 00 00 00 48 89 fe 48 c1 ee 03 80 3c 2e 00 0f 85 ec 01 00 00 4d 3b a6 a8 00 00 00 0f 85 8a fd ff ff 48 85 c0 0f 84 55 fe ff ff <0f> 0b e9 4e fe ff ff be 08 00 00 00 4c 89 54 24 58 48 89 54 24 50
RSP: 0018:ffff888121787dc0 EFLAGS: 00010002
RAX: 00000000000006a2 RBX: ffff888100062800 RCX: ffffffff8190cb49
RDX: ffff888126934c00 RSI: 1ffff11020200a15 RDI: ffff8881010050a8
RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffed1024d26982
R10: ffff888126934c17 R11: ffff8881010050a8 R12: ffff888126934c00
R13: ffff8881010050b8 R14: ffff888101005000 R15: ffff888126930008
FS: 00007f95c8cd7540(0000) GS:ffff8882b576e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f95c8de4dc0 CR3: 0000000128452002 CR4: 0000000000172ef0
Call Trace:
<TASK>
? __pfx_ring_buffer_map_get_reader+0x10/0x10
tracing_buffers_ioctl+0x283/0x370
__x64_sys_ioctl+0x134/0x190
do_syscall_64+0x79/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f95c8de48db
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffe037ba110 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe037bb2b0 RCX: 00007f95c8de48db
RDX: 0000000000000000 RSI: 0000000000005220 RDI: 0000000000000006
RBP: 00007ffe037ba180 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe037bb6f8 R14: 00007f95c9065000 R15: 00005575c7492c90
</TASK>
irq event stamp: 5080
hardirqs last enabled at (5079): [<ffffffff83e0adb0>] _raw_spin_unlock_irqrestore+0x50/0x70
hardirqs last disabled at (5080): [<ffffffff83e0aa83>] _raw_spin_lock_irqsave+0x63/0x70
softirqs last enabled at (4182): [<ffffffff81516122>] handle_softirqs+0x552/0x710
softirqs last disabled at (4159): [<ffffffff815163f7>] __irq_exit_rcu+0x107/0x210
---[ end trace 0000000000000000 ]---
The way to differentiate this case from the normal case of there only
being one page written to where the swap of the reader page received that
one page (which is the commit page), check if the tail page is on the
reader page. The difference between the commit page and the tail page is
that the tail page is where new writes go to, and the commit page holds
the first write that hasn't been committed yet. In the case of an
interrupt preempting the write of an event and filling the buffer, it
would move the tail page but not the commit page.
Have the warning only trigger if the tail page is also on the reader page,
and also print out the number of events dropped by a commit overrun as
that can not yet be safely added to the page so that the reader can see
there were events dropped.
Cc: stable@vger.kernel.org
Fixes: fe832be05a8ee ("ring-buffer: Have mmapped ring buffer keep track of missed events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 241acb470c42..508826886a87 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7351,10 +7351,25 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
}
} else {
/*
- * There really shouldn't be any missed events if the commit
+ * There really shouldn't be any missed events if the tail_page
* is on the reader page.
*/
- WARN_ON_ONCE(missed_events);
+ if (missed_events) {
+ if (!WARN_ONCE(cpu_buffer->reader_page == cpu_buffer->tail_page,
+ "Reader on commit with %ld missed events",
+ missed_events)) {
+ /*
+ * If the tail page is not on the reader page but
+ * the commit_page is, that would mean that there's
+ * a commit_overrun (an interrupt preempted an
+ * addition of an event and then filled the buffer
+ * with new events). In this case it's not an
+ * error, but it should still be reported.
+ */
+ pr_info("Ring buffer commit overrun lost %ld events at timestamp:%lld\n",
+ missed_events, cpu_buffer->reader_page->page->time_stamp);
+ }
+ }
}
cpu_buffer->lost_events = 0;
--
2.47.2
On Tue, 27 May 2025 12:11:40 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> When reading a memory mapped buffer the reader page is just swapped out
> with the last page written in the write buffer. If the reader page is the
> same as the commit buffer (the buffer that is currently being written to)
> it was assumed that it should never have missed events. If it does, it
> triggers a WARN_ON_ONCE().
>
> But there just happens to be one scenario where this can legitimately
> happen. That is on a commit_overrun. A commit overrun is when an interrupt
> preempts an event being written to the buffer and then the interrupt adds
> so many new events that it fills and wraps the buffer back to the commit.
> Any new events would then be dropped and be reported as "missed_events".
>
> In this case, the next page to read is the commit buffer and after the
> swap of the reader page, the reader page will be the commit buffer, but
> this time there will be missed events and this triggers the following
> warning:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 1127 at kernel/trace/ring_buffer.c:7357 ring_buffer_map_get_reader+0x49a/0x780
> Modules linked in: kvm_intel kvm irqbypass
> CPU: 2 UID: 0 PID: 1127 Comm: trace-cmd Not tainted 6.15.0-rc7-test-00004-g478bc2824b45-dirty #564 PREEMPT
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:ring_buffer_map_get_reader+0x49a/0x780
> Code: 00 00 00 48 89 fe 48 c1 ee 03 80 3c 2e 00 0f 85 ec 01 00 00 4d 3b a6 a8 00 00 00 0f 85 8a fd ff ff 48 85 c0 0f 84 55 fe ff ff <0f> 0b e9 4e fe ff ff be 08 00 00 00 4c 89 54 24 58 48 89 54 24 50
> RSP: 0018:ffff888121787dc0 EFLAGS: 00010002
> RAX: 00000000000006a2 RBX: ffff888100062800 RCX: ffffffff8190cb49
> RDX: ffff888126934c00 RSI: 1ffff11020200a15 RDI: ffff8881010050a8
> RBP: dffffc0000000000 R08: 0000000000000000 R09: ffffed1024d26982
> R10: ffff888126934c17 R11: ffff8881010050a8 R12: ffff888126934c00
> R13: ffff8881010050b8 R14: ffff888101005000 R15: ffff888126930008
> FS: 00007f95c8cd7540(0000) GS:ffff8882b576e000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f95c8de4dc0 CR3: 0000000128452002 CR4: 0000000000172ef0
> Call Trace:
> <TASK>
> ? __pfx_ring_buffer_map_get_reader+0x10/0x10
> tracing_buffers_ioctl+0x283/0x370
> __x64_sys_ioctl+0x134/0x190
> do_syscall_64+0x79/0x1c0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f95c8de48db
> Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> RSP: 002b:00007ffe037ba110 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007ffe037bb2b0 RCX: 00007f95c8de48db
> RDX: 0000000000000000 RSI: 0000000000005220 RDI: 0000000000000006
> RBP: 00007ffe037ba180 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffe037bb6f8 R14: 00007f95c9065000 R15: 00005575c7492c90
> </TASK>
> irq event stamp: 5080
> hardirqs last enabled at (5079): [<ffffffff83e0adb0>] _raw_spin_unlock_irqrestore+0x50/0x70
> hardirqs last disabled at (5080): [<ffffffff83e0aa83>] _raw_spin_lock_irqsave+0x63/0x70
> softirqs last enabled at (4182): [<ffffffff81516122>] handle_softirqs+0x552/0x710
> softirqs last disabled at (4159): [<ffffffff815163f7>] __irq_exit_rcu+0x107/0x210
> ---[ end trace 0000000000000000 ]---
>
> The way to differentiate this case from the normal case of there only
> being one page written to where the swap of the reader page received that
> one page (which is the commit page), check if the tail page is on the
> reader page. The difference between the commit page and the tail page is
> that the tail page is where new writes go to, and the commit page holds
> the first write that hasn't been committed yet. In the case of an
> interrupt preempting the write of an event and filling the buffer, it
> would move the tail page but not the commit page.
(BTW, what happen if the interrupted process commits the event? That
event will be lost, or commit and just move commit_page?)
Thus the reader_page == commit_page && reader_page == tail_page,
missed_events should be 0?
Possible cases if missed_events != 0:
- reader_page != commit_page
(writer's commit overtook the reader)
- reader_page == commit_page but reader_page != tail_page
(writer overtook the reader, but commit is not completed yet.)
if
- reader_page == commit_page == tail_page
in this case, missed_events should be 0.
Since the reader_page is out of the ring buffer, writer should not
use reader_page while reading the same reader_page, is that right?
>
> Have the warning only trigger if the tail page is also on the reader page,
> and also print out the number of events dropped by a commit overrun as
> that can not yet be safely added to the page so that the reader can see
> there were events dropped.
>
> Cc: stable@vger.kernel.org
> Fixes: fe832be05a8ee ("ring-buffer: Have mmapped ring buffer keep track of missed events")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> kernel/trace/ring_buffer.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 241acb470c42..508826886a87 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -7351,10 +7351,25 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
> }
> } else {
> /*
> - * There really shouldn't be any missed events if the commit
> + * There really shouldn't be any missed events if the tail_page
> * is on the reader page.
> */
> - WARN_ON_ONCE(missed_events);
> + if (missed_events) {
> + if (!WARN_ONCE(cpu_buffer->reader_page == cpu_buffer->tail_page,
> + "Reader on commit with %ld missed events",
> + missed_events)) {
> + /*
> + * If the tail page is not on the reader page but
> + * the commit_page is, that would mean that there's
> + * a commit_overrun (an interrupt preempted an
> + * addition of an event and then filled the buffer
> + * with new events). In this case it's not an
> + * error, but it should still be reported.
> + */
> + pr_info("Ring buffer commit overrun lost %ld events at timestamp:%lld\n",
> + missed_events, cpu_buffer->reader_page->page->time_stamp);
Do we need this pr_info() for each commit overrun?
> + }
> + }
> }
Just for cleanup the code idea, with above change, this code is
something like;
----------------
missed_events = cpu_buffer->lost_events;
if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
if (missed_event) {
}
} else {
if (missed_event) {
if (!WARN_ONCE(cpu_buffer->reader_page == cpu_buffer->tail_page,"...")) {
pr_info("...")
}
}
}
----------------
Can we make it as below?
----------------
missed_events = cpu_buffer->lost_events;
if (missed_event) {
if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
} else if (!WARN_ONCE(cpu_buffer->reader_page == cpu_buffer->tail_page, "...") {
/**/
pr_info("...");
}
}
----------------
Thank you,
>
> cpu_buffer->lost_events = 0;
> --
> 2.47.2
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Wed, 28 May 2025 10:42:03 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > The way to differentiate this case from the normal case of there
> > only being one page written to where the swap of the reader page
> > received that one page (which is the commit page), check if the
> > tail page is on the reader page. The difference between the commit
> > page and the tail page is that the tail page is where new writes go
> > to, and the commit page holds the first write that hasn't been
> > committed yet. In the case of an interrupt preempting the write of
> > an event and filling the buffer, it would move the tail page but
> > not the commit page.
>
> (BTW, what happen if the interrupted process commits the event? That
> event will be lost, or commit and just move commit_page?)
No, the first event to be created is the "commit" event. If it gets
interrupted and the interrupt adds a bunch of events that wraps the
ring buffer, it can't touch the commit event, it will just start
dropping events. Then when the commit event finishes, it can either be
read by the reader, or overwritten by the next events coming in.
>
>
> Thus the reader_page == commit_page && reader_page == tail_page,
> missed_events should be 0?
>
> Possible cases if missed_events != 0:
>
> - reader_page != commit_page
> (writer's commit overtook the reader)
The reader is never in the write buffer. Just the head page will move.
When a new reader page is taken it will swap out the old reader page
with the head page. If the head page is the commit page, then the
commit page becomes the reader page too.
>
> - reader_page == commit_page but reader_page != tail_page
> (writer overtook the reader, but commit is not completed yet.)
No, "writer overtook the reader" doesn't make sense as the reader is
not on the write buffer, so the writer can not catch up to it. What the
write buffer has is the "head" page which is where the next reader will
come to.
The only way reader_page == commit_page and reader_page != tail_page is
if the commit was interrupted and the interrupt added events and moved
forward off the commit_page. The only way there would be missed events
in that case is if the interrupt added so many events it wrapped the
buffer and then started dropping events.
>
> if
> - reader_page == commit_page == tail_page
> in this case, missed_events should be 0.
>
> Since the reader_page is out of the ring buffer, writer should not
> use reader_page while reading the same reader_page, is that right?
Correct. But the writer could end up on the reader page after the swap,
if the head page happened to be the commit page.
>
> > cpu_buffer->tail_page,
> > + "Reader on commit with %ld
> > missed events",
> > + missed_events)) {
> > + /*
> > + * If the tail page is not on the
> > reader page but
> > + * the commit_page is, that would
> > mean that there's
> > + * a commit_overrun (an interrupt
> > preempted an
> > + * addition of an event and then
> > filled the buffer
> > + * with new events). In this case
> > it's not an
> > + * error, but it should still be
> > reported.
> > + */
> > + pr_info("Ring buffer commit
> > overrun lost %ld events at timestamp:%lld\n",
> > + missed_events,
> > cpu_buffer->reader_page->page->time_stamp);
>
> Do we need this pr_info() for each commit overrun?
Yes. When doing this stress test, it printed at most 4 times. It
happens once per time the interrupt fills the buffer while interrupting
the buffer.
I seldom ever get commit overruns. It's one of the fields in the status
file located in: /sys/kernel/tracing/per_cpu/cpu*/stats
>
> > + }
> > + }
> > }
>
> Just for cleanup the code idea, with above change, this code is
> something like;
>
> ----------------
>
> missed_events = cpu_buffer->lost_events;
>
> if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> if (missed_event) {
>
> }
> } else {
> if (missed_event) {
> if (!WARN_ONCE(cpu_buffer->reader_page ==
> cpu_buffer->tail_page,"...")) { pr_info("...")
> }
> }
> }
>
> ----------------
>
> Can we make it as below?
>
> ----------------
> missed_events = cpu_buffer->lost_events;
>
> if (missed_event) {
> if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
>
> } else if (!WARN_ONCE(cpu_buffer->reader_page ==
> cpu_buffer->tail_page, "...") { /**/
> pr_info("...");
> }
> }
Hmm, OK, I'll look at that.
Thanks,
-- Steve
On Tue, 27 May 2025 22:17:35 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 28 May 2025 10:42:03 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > > The way to differentiate this case from the normal case of there
> > > only being one page written to where the swap of the reader page
> > > received that one page (which is the commit page), check if the
> > > tail page is on the reader page. The difference between the commit
> > > page and the tail page is that the tail page is where new writes go
> > > to, and the commit page holds the first write that hasn't been
> > > committed yet. In the case of an interrupt preempting the write of
> > > an event and filling the buffer, it would move the tail page but
> > > not the commit page.
> >
> > (BTW, what happen if the interrupted process commits the event? That
> > event will be lost, or commit and just move commit_page?)
>
> No, the first event to be created is the "commit" event. If it gets
> interrupted and the interrupt adds a bunch of events that wraps the
> ring buffer, it can't touch the commit event, it will just start
> dropping events. Then when the commit event finishes, it can either be
> read by the reader, or overwritten by the next events coming in.
Hmm, rb_end_commit() seems to try commit all nested events (committing > 1)
if committing == 1, and the rb_set_commit_to_write() pushes commit pointer
to the latest write pointer. So when it "start dropping events"? (is that
write side?)
> >
> > Thus the reader_page == commit_page && reader_page == tail_page,
> > missed_events should be 0?
> >
> > Possible cases if missed_events != 0:
> >
> > - reader_page != commit_page
> > (writer's commit overtook the reader)
>
> The reader is never in the write buffer. Just the head page will move.
> When a new reader page is taken it will swap out the old reader page
> with the head page. If the head page is the commit page, then the
> commit page becomes the reader page too.
Hmm, so this is the case that the writer is too fast and it moves
head page to the next page. Thus some events on old head page are
lost.
>
> >
> > - reader_page == commit_page but reader_page != tail_page
> > (writer overtook the reader, but commit is not completed yet.)
>
> No, "writer overtook the reader" doesn't make sense as the reader is
> not on the write buffer, so the writer can not catch up to it. What the
> write buffer has is the "head" page which is where the next reader will
> come to.
>
> The only way reader_page == commit_page and reader_page != tail_page is
> if the commit was interrupted and the interrupt added events and moved
> forward off the commit_page. The only way there would be missed events
> in that case is if the interrupt added so many events it wrapped the
> buffer and then started dropping events.
OK, so "wrapped the buffer" means "tail (write) page caught up the commit"?
>
> >
> > if
> > - reader_page == commit_page == tail_page
> > in this case, missed_events should be 0.
> >
> > Since the reader_page is out of the ring buffer, writer should not
> > use reader_page while reading the same reader_page, is that right?
>
> Correct. But the writer could end up on the reader page after the swap,
> if the head page happened to be the commit page.
OK.
>
> >
>
>
> > > cpu_buffer->tail_page,
> > > + "Reader on commit with %ld
> > > missed events",
> > > + missed_events)) {
> > > + /*
> > > + * If the tail page is not on the
> > > reader page but
> > > + * the commit_page is, that would
> > > mean that there's
> > > + * a commit_overrun (an interrupt
> > > preempted an
> > > + * addition of an event and then
> > > filled the buffer
> > > + * with new events). In this case
> > > it's not an
> > > + * error, but it should still be
> > > reported.
> > > + */
> > > + pr_info("Ring buffer commit
> > > overrun lost %ld events at timestamp:%lld\n",
> > > + missed_events,
> > > cpu_buffer->reader_page->page->time_stamp);
> >
> > Do we need this pr_info() for each commit overrun?
>
> Yes. When doing this stress test, it printed at most 4 times. It
> happens once per time the interrupt fills the buffer while interrupting
> the buffer.
Ah, I meant we should report it in printk buffer, since the event
lost will happen in some case and reader can notice that.
>
> I seldom ever get commit overruns. It's one of the fields in the status
> file located in: /sys/kernel/tracing/per_cpu/cpu*/stats
>
> >
> > > + }
> > > + }
> > > }
> >
> > Just for cleanup the code idea, with above change, this code is
> > something like;
> >
> > ----------------
> >
> > missed_events = cpu_buffer->lost_events;
> >
> > if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> > if (missed_event) {
> >
> > }
> > } else {
> > if (missed_event) {
> > if (!WARN_ONCE(cpu_buffer->reader_page ==
> > cpu_buffer->tail_page,"...")) { pr_info("...")
> > }
> > }
> > }
> >
> > ----------------
> >
> > Can we make it as below?
> >
> > ----------------
> > missed_events = cpu_buffer->lost_events;
> >
> > if (missed_event) {
> > if (cpu_buffer->reader_page != cpu_buffer->commit_page) {
> >
> > } else if (!WARN_ONCE(cpu_buffer->reader_page ==
> > cpu_buffer->tail_page, "...") { /**/
> > pr_info("...");
> > }
> > }
>
> Hmm, OK, I'll look at that.
Thanks! Anyway, the logic looks good to me.
>
> Thanks,
>
> -- Steve
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Thu, 29 May 2025 00:03:42 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > > (BTW, what happen if the interrupted process commits the event? That
> > > event will be lost, or commit and just move commit_page?)
> >
> > No, the first event to be created is the "commit" event. If it gets
> > interrupted and the interrupt adds a bunch of events that wraps the
> > ring buffer, it can't touch the commit event, it will just start
> > dropping events. Then when the commit event finishes, it can either be
> > read by the reader, or overwritten by the next events coming in.
>
> Hmm, rb_end_commit() seems to try commit all nested events (committing > 1)
> if committing == 1, and the rb_set_commit_to_write() pushes commit pointer
> to the latest write pointer. So when it "start dropping events"? (is that
> write side?)
The "start dropping events" is when committing > 1. The commit overrun
events dropping happens in rb_move_tail(), where a write falls off the end
of the sub buffer and moves the tail to the next sub-buffer. If the next
sub-buffer is the commit page, then it fails and just drops the event. This
will never happen when committing == 1.
>
> > >
> > > Thus the reader_page == commit_page && reader_page == tail_page,
> > > missed_events should be 0?
> > >
> > > Possible cases if missed_events != 0:
> > >
> > > - reader_page != commit_page
> > > (writer's commit overtook the reader)
> >
> > The reader is never in the write buffer. Just the head page will move.
> > When a new reader page is taken it will swap out the old reader page
> > with the head page. If the head page is the commit page, then the
> > commit page becomes the reader page too.
>
> Hmm, so this is the case that the writer is too fast and it moves
> head page to the next page. Thus some events on old head page are
> lost.
This has nothing to do with the speed of the reader. When a event is being
written, and it gets interrupted and the interrupt starts writing events,
the interrupted event stays put. The reader could be very fast, but it will
stop when it gets to the commit page. Thus, the reader could be all caught
up, but is now waiting for new events to be committed.
If an event is interrupted, no event is fully committed until that event is
no longer interrupted and then can continue.
rb_reserve_next_event() {
rb_start_commit() <<-- here the reader can never go past the commit page
[ interrupt start ]
for (i = 0; i < 10000000000; i++)
rb_reserve_next_event(); <<-- This will eventually fill up the
buffer, but will stop when it
hits the commit page and that
will be when the commit overrun
events start dropping.
[ On another CPU:
ring_buffer_map_get_reader() <<-- This now gets the commit page as the
reader page, and there will be missed
events.
]
[ interrupt end ]
rb_end_commit() <<-- This will move the commit page forward.
}
This is a very unlikely scenario, but I can trigger it with the
perf/trace-cmd test that enables all events and function tracing with KASAN
and lockdep enabled running hackbench, as that pretty much creates the:
for (i = 0; i < 10000000000; i++)
rb_reserve_next_event();
scenario.
And even in that case, it triggers at most a couple of times a second:
> OK, so "wrapped the buffer" means "tail (write) page caught up the commit"?
Yes.
> > > Do we need this pr_info() for each commit overrun?
> >
> > Yes. When doing this stress test, it printed at most 4 times. It
> > happens once per time the interrupt fills the buffer while interrupting
> > the buffer.
>
> Ah, I meant we should report it in printk buffer, since the event
> lost will happen in some case and reader can notice that.
I think we agree here?
Thanks for the review, I'll have a new patch shortly.
-- Steve
On Wed, 28 May 2025 11:34:39 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > And even in that case, it triggers at most a couple of times a second: Forgot to cut and paste: [ 367.010564] Ring buffer [1] commit overrun lost 1070 events at timestamp:367737571566 [ 371.328151] Ring buffer [3] commit overrun lost 21442 events at timestamp:372051356588 This run had only two prints. I've had at most up to 8 prints for a test that runs over a minute. I updated the code to print the per CPU buffer it happened on as well as the timestamp. -- Steve
On Tue, 27 May 2025 12:11:40 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > But there just happens to be one scenario where this can legitimately > happen. That is on a commit_overrun. A commit overrun is when an interrupt > preempts an event being written to the buffer and then the interrupt adds > so many new events that it fills and wraps the buffer back to the commit. > Any new events would then be dropped and be reported as "missed_events". I'll probably update the commit log, but the way I triggered this was to run: # perf record -o perf-test.dat -a -- trace-cmd record --nosplice -e all -p function hackbench 50 Which causes perf to trigger a bunch of interrupts while trace-cmd enables function tracing and all events. This is on a debug kernel that has lockdep, KASAN and interrupt and preemption disabling events enabled. Basically, this causes a lot to be traced in an interrupt. Enough to fill 1.4 megs of the tracing buffer with events in interrupts before a single event could be recorded. I've never triggered this when those extreme conditions were not there. -- Steve
On Tue, 27 May 2025 16:11:30 -0400 Steven Rostedt <rostedt@goodmis.org> wrote: > On Tue, 27 May 2025 12:11:40 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > But there just happens to be one scenario where this can legitimately > > happen. That is on a commit_overrun. A commit overrun is when an interrupt > > preempts an event being written to the buffer and then the interrupt adds > > so many new events that it fills and wraps the buffer back to the commit. > > Any new events would then be dropped and be reported as "missed_events". > > I'll probably update the commit log, but the way I triggered this was to run: > > # perf record -o perf-test.dat -a -- trace-cmd record --nosplice -e all -p function hackbench 50 Hmm, so this runs 3 commands, hackbench, which is traced by trace-cmd, which is traced by perf. > > Which causes perf to trigger a bunch of interrupts while trace-cmd enables > function tracing and all events. This is on a debug kernel that has > lockdep, KASAN and interrupt and preemption disabling events enabled. Ah, that is the full-set of the interrupt and tracing :) Thanks, > > Basically, this causes a lot to be traced in an interrupt. Enough to fill > 1.4 megs of the tracing buffer with events in interrupts before a single > event could be recorded. > > I've never triggered this when those extreme conditions were not there. > > -- Steve -- Masami Hiramatsu (Google) <mhiramat@kernel.org>
© 2016 - 2025 Red Hat, Inc.