From: Steven Rostedt <rostedt@goodmis.org>
In order to prevent any subtle races with the buffer update counter,
change it to an atomic_t. Also, since atomic_t is 32 bits, move its
location in the ring_buffer_per_cpu structure next to "current_context" as
that too is only 32 bits (making 64 bit alignment).
The counter is only used to detect that the buffer has been updated when
the buffer verifier check is being done. It's not really that important
that it's atomic or not. But since the updates to the counter are never in
the fast path, having it be consistent isn't a bad thing.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Note, this is based on top of:
https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/
kernel/trace/ring_buffer.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a6a1c26ea2e3..bbf7f68f9af2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -481,9 +481,9 @@ struct ring_buffer_per_cpu {
 	struct buffer_data_page		*free_page;
 	unsigned long			nr_pages;
 	unsigned int			current_context;
-	struct list_head		*pages;
 	/* pages generation counter, incremented when the list changes */
-	unsigned long			cnt;
+	atomic_t			cnt;
+	struct list_head		*pages;
 	struct buffer_page		*head_page;	/* read from head */
 	struct buffer_page		*tail_page;	/* write to tail */
 	struct buffer_page		*commit_page;	/* committed pages */
@@ -1532,7 +1532,7 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	head = rb_list_head(cpu_buffer->pages);
 	if (!rb_check_links(cpu_buffer, head))
 		goto out_locked;
-	buffer_cnt = cpu_buffer->cnt;
+	buffer_cnt = atomic_read(&cpu_buffer->cnt);
 	tmp = head;
 	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 	return;
@@ -1540,7 +1540,7 @@ static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
 	while (true) {
 		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-		if (buffer_cnt != cpu_buffer->cnt) {
+		if (buffer_cnt != atomic_read(&cpu_buffer->cnt)) {
 			/* The list was updated, try again. */
 			raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 			goto again;
@@ -2585,7 +2585,7 @@ rb_remove_pages(struct ring_buffer_per_cpu *cpu_buffer, unsigned long nr_pages)
 	/* make sure pages points to a valid page in the ring buffer */
 	cpu_buffer->pages = next_page;
-	cpu_buffer->cnt++;
+	atomic_inc(&cpu_buffer->cnt);
 
 	/* update head page */
 	if (head_bit)
@@ -2692,7 +2692,7 @@ rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
 			 * pointer to point to end of list
 			 */
 			head_page->prev = last_page;
-			cpu_buffer->cnt++;
+			atomic_inc(&cpu_buffer->cnt);
 			success = true;
 			break;
 		}
@@ -5347,7 +5347,7 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 	rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
 	rb_inc_page(&cpu_buffer->head_page);
-	cpu_buffer->cnt++;
+	atomic_inc(&cpu_buffer->cnt);
 	local_inc(&cpu_buffer->pages_read);
 
 	/* Finally update the reader page to the new head */
@@ -6804,7 +6804,7 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 		cpu_buffer->pages = cpu_buffer->new_pages.next;
 		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
 		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
-		cpu_buffer->cnt++;
+		atomic_inc(&cpu_buffer->cnt);
 
 		/* Clear the new_pages list */
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);
--
2.45.2
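
A side note for readers on what the counter is for: as the hunks above show,
rb_check_pages() records the generation count under the reader_lock, drops the
lock while validating the list, and restarts the walk if the count changed in
the meantime. Below is a small, self-contained userspace sketch of that
check-and-retry pattern (plain C with pthreads; the struct, the names and the
fixed-size array standing in for the page list are made up for illustration,
not the kernel code):

#include <pthread.h>
#include <stdio.h>

#define NR_ENTRIES	64
#define CHUNK		16

struct pages {
	pthread_mutex_t lock;		/* stands in for reader_lock */
	unsigned long cnt;		/* generation counter */
	int entry[NR_ENTRIES];		/* stands in for the page list */
};

/* Writer side: every list modification bumps the generation under the lock. */
static void pages_update(struct pages *p, int idx, int val)
{
	pthread_mutex_lock(&p->lock);
	p->entry[idx] = val;
	p->cnt++;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Verifier side: check the structure in chunks, dropping the lock between
 * chunks so writers are not blocked for the whole walk.  If the generation
 * changed while the lock was dropped, restart from the beginning.
 */
static int pages_check(struct pages *p)
{
	unsigned long gen;
	int i;

again:
	pthread_mutex_lock(&p->lock);
	gen = p->cnt;
	for (i = 0; i < NR_ENTRIES; i++) {
		if (i && (i % CHUNK) == 0) {
			pthread_mutex_unlock(&p->lock);
			/* writers may modify the list here */
			pthread_mutex_lock(&p->lock);
			if (gen != p->cnt) {
				/* The list was updated, try again. */
				pthread_mutex_unlock(&p->lock);
				goto again;
			}
		}
		if (p->entry[i] < 0) {	/* the actual consistency check */
			pthread_mutex_unlock(&p->lock);
			return -1;
		}
	}
	pthread_mutex_unlock(&p->lock);
	return 0;
}

int main(void)
{
	struct pages p = { .lock = PTHREAD_MUTEX_INITIALIZER };

	pages_update(&p, 3, 42);
	printf("check: %d\n", pages_check(&p));
	return 0;
}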
On 10/11/24 01:58, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> In order to prevent any subtle races with the buffer update counter,
> change it to an atomic_t. Also, since atomic_t is 32 bits, move its
> location in the ring_buffer_per_cpu structure next to "current_context" as
> that too is only 32 bits (making 64 bit alignment).
> 
> The counter is only used to detect that the buffer has been updated when
> the buffer verifier check is being done. It's not really that important
> that it's atomic or not. But since the updates to the counter are never in
> the fast path, having it be consistent isn't a bad thing.
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Note, this is based on top of:
> 
> https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/

Sorry for not replying to your last comment on my patch, I was ill.

The member ring_buffer_per_cpu.cnt is intended to be accessed under the
reader_lock, same as the pages pointer which it is tied to, so this
change shouldn't be strictly needed.

-- 
Thanks,
Petr
On Fri, 11 Oct 2024 10:05:51 +0200
Petr Pavlu <petr.pavlu@suse.com> wrote:

> On 10/11/24 01:58, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> > 
> > In order to prevent any subtle races with the buffer update counter,
> > change it to an atomic_t. Also, since atomic_t is 32 bits, move its
> > location in the ring_buffer_per_cpu structure next to "current_context" as
> > that too is only 32 bits (making 64 bit alignment).
> > 
> > The counter is only used to detect that the buffer has been updated when
> > the buffer verifier check is being done. It's not really that important
> > that it's atomic or not. But since the updates to the counter are never in
> > the fast path, having it be consistent isn't a bad thing.
> > 
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---
> > Note, this is based on top of:
> > 
> > https://lore.kernel.org/linux-trace-kernel/20240715145141.5528-1-petr.pavlu@suse.com/
> 
> Sorry for not replying to your last comment on my patch, I was ill.
> 
> The member ring_buffer_per_cpu.cnt is intended to be accessed under the
> reader_lock, same as the pages pointer which it is tied to, so this
> change shouldn't be strictly needed.
> 

Right, but there was one location that the cnt was updated outside the
lock. The one I commented on. But instead of adding a lock around it, I
decided to just make it an atomic. Then there would be no need for the
lock. Hmm, it still needs a memory barrier though. At least a
smp_mb__after_atomic().

-- Steve
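
As a side illustration of the "subtle races" being discussed: the problem with
a plain cnt++ performed outside the lock is the classic lost-update race of a
non-atomic read-modify-write, which atomic_inc() removes (while, as noted
above, atomicity alone provides no ordering against the list update itself,
hence the smp_mb__after_atomic() remark). Below is a standalone C11 sketch of
the lost-update effect, not related to the ring buffer code:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define LOOPS 1000000

static unsigned long plain_cnt;		/* like a bare cnt++ with no locking */
static atomic_ulong atomic_cnt;		/* like atomic_inc(&cpu_buffer->cnt) */

static void *bump(void *arg)
{
	(void)arg;
	for (int i = 0; i < LOOPS; i++) {
		plain_cnt++;				/* racy read-modify-write, can lose updates */
		atomic_fetch_add(&atomic_cnt, 1);	/* never loses an update */
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, bump, NULL);
	pthread_create(&t2, NULL, bump, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* plain_cnt typically ends up short of 2 * LOOPS; atomic_cnt never does */
	printf("plain:  %lu\natomic: %lu\n", plain_cnt, atomic_load(&atomic_cnt));
	return 0;
}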
On Fri, 11 Oct 2024 10:01:32 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > Sorry for not replying to your last comment on my patch, I was ill.
> > 
> > The member ring_buffer_per_cpu.cnt is intended to be accessed under the
> > reader_lock, same as the pages pointer which it is tied to, so this
> > change shouldn't be strictly needed.
> > 
> 
> Right, but there was one location that the cnt was updated outside the
> lock. The one I commented on. But instead of adding a lock around it, I
> decided to just make it an atomic. Then there would be no need for the
> lock. Hmm, it still needs a memory barrier though. At least a
> smp_mb__after_atomic().

Hmm, I don't like how the update in ring_buffer_subbuf_order_set() is
unprotected. I think this is the proper patch:

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 696d422d5b35..0672df07b599 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6774,6 +6774,7 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	}
 
 	for_each_buffer_cpu(buffer, cpu) {
+		unsigned long flags;
 
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
@@ -6800,11 +6801,15 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 						struct buffer_page, list);
 		list_del_init(&cpu_buffer->reader_page->list);
 
+		/* Synchronize with rb_check_pages() */
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
 		/* The cpu_buffer pages are a link list with no head */
 		cpu_buffer->pages = cpu_buffer->new_pages.next;
 		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
 		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
 		cpu_buffer->cnt++;
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 		/* Clear the new_pages list */
 		INIT_LIST_HEAD(&cpu_buffer->new_pages);

Even though it's also protected by the buffer->mutex, I feel more
comfortable with this.

-- Steve
On 10/11/24 16:48, Steven Rostedt wrote:
> On Fri, 11 Oct 2024 10:01:32 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>>> Sorry for not replying to your last comment on my patch, I was ill.
>>>
>>> The member ring_buffer_per_cpu.cnt is intended to be accessed under the
>>> reader_lock, same as the pages pointer which it is tied to, so this
>>> change shouldn't be strictly needed.
>>>
>>
>> Right, but there was one location that the cnt was updated outside the
>> lock. The one I commented on. But instead of adding a lock around it, I
>> decided to just make it an atomic. Then there would be no need for the
>> lock. Hmm, it still needs a memory barrier though. At least a
>> smp_mb__after_atomic().
> 
> Hmm, I don't like how the update in ring_buffer_subbuf_order_set() is
> unprotected. I think this is the proper patch:
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 696d422d5b35..0672df07b599 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -6774,6 +6774,7 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  	}
>  
>  	for_each_buffer_cpu(buffer, cpu) {
> +		unsigned long flags;
>  
>  		if (!cpumask_test_cpu(cpu, buffer->cpumask))
>  			continue;
> @@ -6800,11 +6801,15 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  						struct buffer_page, list);
>  		list_del_init(&cpu_buffer->reader_page->list);
>  
> +		/* Synchronize with rb_check_pages() */
> +		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
>  		/* The cpu_buffer pages are a link list with no head */
>  		cpu_buffer->pages = cpu_buffer->new_pages.next;
>  		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
>  		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
>  		cpu_buffer->cnt++;
> +		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>  
>  		/* Clear the new_pages list */
>  		INIT_LIST_HEAD(&cpu_buffer->new_pages);
> 
> Even though it's also protected by the buffer->mutex, I feel more
> comfortable with this.

Sorry, I missed context of your comment and that it is about
ring_buffer_subbuf_order_set().

I agree, I also noticed the missing locking in this function and it
looked to me as something that should be fixed. I happen to have
a somewhat more complex patch for it from a few months ago (pasted
below). I think I didn't send it to the list because I then noticed
other potential locking problems with the subbuf code, which I wanted to
examine more closely first.

-- 
Petr

From 359f5aa523bc27ca8600b4efae471eefc0514ce0 Mon Sep 17 00:00:00 2001
From: Petr Pavlu <petr.pavlu@suse.com>
Date: Tue, 16 Jul 2024 11:35:00 +0200
Subject: [PATCH] ring-buffer: Fix reader locking when changing the sub buffer
 order

The function ring_buffer_subbuf_order_set() updates each
ring_buffer_per_cpu and installs new sub buffers that match the requested
page order. This operation may be invoked concurrently with readers that
rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
the ring_buffer_per_cpu.pages and reader_page pointers. However, no
exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
the mentioned data while a reader also operates on them can then result in
incorrect memory access and various crashes.

Fix the problem by taking the reader_lock when updating a specific
ring_buffer_per_cpu in ring_buffer_subbuf_order_set().

Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
---
 kernel/trace/ring_buffer.c | 41 ++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4c24191fa47d..d324f4f9ab9d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6773,36 +6773,38 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	}
 
 	for_each_buffer_cpu(buffer, cpu) {
+		struct buffer_data_page *old_free_data_page;
+		struct list_head old_pages;
+		unsigned long flags;
 
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
 
 		cpu_buffer = buffer->buffers[cpu];
 
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+
 		/* Clear the head bit to make the link list normal to read */
 		rb_head_page_deactivate(cpu_buffer);
 
-		/* Now walk the list and free all the old sub buffers */
-		list_for_each_entry_safe(bpage, tmp, cpu_buffer->pages, list) {
-			list_del_init(&bpage->list);
-			free_buffer_page(bpage);
-		}
-		/* The above loop stopped an the last page needing to be freed */
-		bpage = list_entry(cpu_buffer->pages, struct buffer_page, list);
-		free_buffer_page(bpage);
-
-		/* Free the current reader page */
-		free_buffer_page(cpu_buffer->reader_page);
+		/*
+		 * Collect buffers from the cpu_buffer pages list and the
+		 * reader_page on old_pages, so they can be freed later when not
+		 * under a spinlock. The pages list is a linked list with no
+		 * head, adding old_pages turns it into a regular list with
+		 * old_pages being the head.
+		 */
+		list_add(&old_pages, cpu_buffer->pages);
+		list_add(&cpu_buffer->reader_page->list, &old_pages);
 
 		/* One page was allocated for the reader page */
 		cpu_buffer->reader_page = list_entry(cpu_buffer->new_pages.next,
 						     struct buffer_page, list);
 		list_del_init(&cpu_buffer->reader_page->list);
 
-		/* The cpu_buffer pages are a link list with no head */
+		/* Install the new pages, remove the head from the list */
 		cpu_buffer->pages = cpu_buffer->new_pages.next;
-		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
-		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
+		list_del(&cpu_buffer->new_pages);
 		cpu_buffer->cnt++;
 
 		/* Clear the new_pages list */
@@ -6815,11 +6817,20 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 		cpu_buffer->nr_pages = cpu_buffer->nr_pages_to_update;
 		cpu_buffer->nr_pages_to_update = 0;
 
-		free_pages((unsigned long)cpu_buffer->free_page, old_order);
+		old_free_data_page = cpu_buffer->free_page;
 		cpu_buffer->free_page = NULL;
 
 		rb_head_page_activate(cpu_buffer);
 
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+
+		/* Free old sub buffers */
+		list_for_each_entry_safe(bpage, tmp, &old_pages, list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+		free_pages((unsigned long)old_free_data_page, old_order);
+
 		rb_check_pages(cpu_buffer);
 	}
On Fri, 11 Oct 2024 17:20:12 +0200
Petr Pavlu <petr.pavlu@suse.com> wrote:

> >From 359f5aa523bc27ca8600b4efae471eefc0514ce0 Mon Sep 17 00:00:00 2001
> From: Petr Pavlu <petr.pavlu@suse.com>
> Date: Tue, 16 Jul 2024 11:35:00 +0200
> Subject: [PATCH] ring-buffer: Fix reader locking when changing the sub buffer
>  order
> 
> The function ring_buffer_subbuf_order_set() updates each
> ring_buffer_per_cpu and installs new sub buffers that match the requested
> page order. This operation may be invoked concurrently with readers that
> rely on some of the modified data, such as the head bit (RB_PAGE_HEAD), or
> the ring_buffer_per_cpu.pages and reader_page pointers. However, no
> exclusive access is acquired by ring_buffer_subbuf_order_set(). Modifying
> the mentioned data while a reader also operates on them can then result in
> incorrect memory access and various crashes.
> 
> Fix the problem by taking the reader_lock when updating a specific
> ring_buffer_per_cpu in ring_buffer_subbuf_order_set().
> 
> Fixes: 8e7b58c27b3c ("ring-buffer: Just update the subbuffers when changing their allocation order")
> Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
> ---
>  kernel/trace/ring_buffer.c | 41 ++++++++++++++++++++++++--------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 4c24191fa47d..d324f4f9ab9d 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -6773,36 +6773,38 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  	}
>  
>  	for_each_buffer_cpu(buffer, cpu) {
> +		struct buffer_data_page *old_free_data_page;
> +		struct list_head old_pages;
> +		unsigned long flags;
>  
>  		if (!cpumask_test_cpu(cpu, buffer->cpumask))
>  			continue;
>  
>  		cpu_buffer = buffer->buffers[cpu];
>  
> +		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> +
>  		/* Clear the head bit to make the link list normal to read */
>  		rb_head_page_deactivate(cpu_buffer);
>  
> -		/* Now walk the list and free all the old sub buffers */
> -		list_for_each_entry_safe(bpage, tmp, cpu_buffer->pages, list) {
> -			list_del_init(&bpage->list);
> -			free_buffer_page(bpage);
> -		}
> -		/* The above loop stopped an the last page needing to be freed */
> -		bpage = list_entry(cpu_buffer->pages, struct buffer_page, list);
> -		free_buffer_page(bpage);
> -
> -		/* Free the current reader page */
> -		free_buffer_page(cpu_buffer->reader_page);
> +		/*
> +		 * Collect buffers from the cpu_buffer pages list and the
> +		 * reader_page on old_pages, so they can be freed later when not
> +		 * under a spinlock. The pages list is a linked list with no
> +		 * head, adding old_pages turns it into a regular list with
> +		 * old_pages being the head.
> +		 */
> +		list_add(&old_pages, cpu_buffer->pages);
> +		list_add(&cpu_buffer->reader_page->list, &old_pages);
> 
>  		/* One page was allocated for the reader page */
>  		cpu_buffer->reader_page = list_entry(cpu_buffer->new_pages.next,
>  						     struct buffer_page, list);
>  		list_del_init(&cpu_buffer->reader_page->list);
> 
> -		/* The cpu_buffer pages are a link list with no head */
> +		/* Install the new pages, remove the head from the list */
>  		cpu_buffer->pages = cpu_buffer->new_pages.next;
> -		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
> -		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
> +		list_del(&cpu_buffer->new_pages);

Heh, not sure why I didn't just use list_del() on the head itself instead
of open coding it.

But anyway, it should be list_del_init(), as I will likely remove the
initialization of new_pages and have it always be initialized. I rather
have it clean than pointing to something unknown.

Care to send this as a real patch? We can look at any other locking
issues here separately, but this should be fixed now.

-- Steve

>  		cpu_buffer->cnt++;
>  
>  		/* Clear the new_pages list */
> @@ -6815,11 +6817,20 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
>  		cpu_buffer->nr_pages = cpu_buffer->nr_pages_to_update;
>  		cpu_buffer->nr_pages_to_update = 0;
>  
> -		free_pages((unsigned long)cpu_buffer->free_page, old_order);
> +		old_free_data_page = cpu_buffer->free_page;
>  		cpu_buffer->free_page = NULL;
>  
>  		rb_head_page_activate(cpu_buffer);
>  
> +		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> +
> +		/* Free old sub buffers */
> +		list_for_each_entry_safe(bpage, tmp, &old_pages, list) {
> +			list_del_init(&bpage->list);
> +			free_buffer_page(bpage);
> +		}
> +		free_pages((unsigned long)old_free_data_page, old_order);
> +
>  		rb_check_pages(cpu_buffer);
>  	}
> 
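
For readers following the list_del()/list_del_init() point: both unlink the
entry, but list_del() leaves the entry's pointers unusable (the kernel poisons
them with LIST_POISON1/2), while list_del_init() re-initializes the entry so
it stays a valid, empty list head that can be tested or reused later, which is
the property wanted for new_pages here. A minimal userspace rendition of the
difference (simplified from the helpers in include/linux/list.h; NULL stands
in for the poison values):

#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert "new" right after "head" (same semantics as the kernel helper). */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

static void __list_del_entry(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* list_del(): unlink and leave the entry pointing at junk (the kernel sets
 * LIST_POISON1/2 so any later use faults loudly; NULL is used here). */
static void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = NULL;
	entry->prev = NULL;
}

/* list_del_init(): unlink and re-initialize, so the entry is a valid empty
 * list head afterwards and list_empty() on it returns true. */
static void list_del_init(struct list_head *entry)
{
	__list_del_entry(entry);
	INIT_LIST_HEAD(entry);
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

int main(void)
{
	struct list_head head, node;

	INIT_LIST_HEAD(&head);
	INIT_LIST_HEAD(&node);

	list_add(&node, &head);
	list_del_init(&node);
	assert(list_empty(&node));	/* safe to test or reuse afterwards */

	list_add(&node, &head);
	list_del(&node);		/* node.next/node.prev now unusable */
	return 0;
}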
On Fri, 11 Oct 2024 17:20:12 +0200
Petr Pavlu <petr.pavlu@suse.com> wrote:

> I agree, I also noticed the missing locking in this function and it
> looked to me as something that should be fixed. I happen to have
> a somewhat more complex patch for it from a few months ago (pasted
> below). I think I didn't send it to the list because I then noticed
> other potential locking problems with the subbuf code, which I wanted to
> examine more closely first.
> 

Hmm, I think you are correct that the buffer->mutex isn't enough for the
sub buffer page and it requires a bigger window. I'll look at your patch
and also the logic to see if it can be squeezed down a little.

-- Steve