The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
it under mmap_read_lock() and this is correct.
And it is completely unclear why register_for_each_vma() takes mmap_lock
for writing, add a comment to explain that mmap_write_lock() is needed to
avoid the following race:
- A task T hits the bp installed by uprobe and calls
find_active_uprobe()
- uprobe_unregister() removes this uprobe/bp
- T calls find_uprobe() which returns NULL
- another uprobe_register() installs the bp at the same address
- T calls is_trap_at_addr() which returns true
- T returns to handle_swbp() and gets SIGTRAP.
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/events/uprobes.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..d52b624a50fa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
* @vaddr: the virtual address to store the opcode.
* @opcode: opcode to be written at @vaddr.
*
- * Called with mm->mmap_lock held for write.
+ * Called with mm->mmap_lock held for read or write.
* Return 0 (success) or a negative errno.
*/
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
@@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
if (err && is_register)
goto free;
-
+ /*
+ * We take mmap_lock for writing to avoid the race with
+ * find_active_uprobe(), install_breakpoint() must not
+ * make is_trap_at_addr() true right after find_uprobe()
+ * returns NULL.
+ */
mmap_write_lock(mm);
vma = find_vma(mm, info->vaddr);
if (!vma || !valid_vma(vma, is_register) ||
--
2.25.1.362.g51ebf55
On Wed, 10 Jul 2024 16:00:45 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
> it under mmap_read_lock() and this is correct.
>
> And it is completely unclear why register_for_each_vma() takes mmap_lock
> for writing, add a comment to explain that mmap_write_lock() is needed to
> avoid the following race:
>
> - A task T hits the bp installed by uprobe and calls
> find_active_uprobe()
>
> - uprobe_unregister() removes this uprobe/bp
>
> - T calls find_uprobe() which returns NULL
>
> - another uprobe_register() installs the bp at the same address
>
> - T calls is_trap_at_addr() which returns true
>
> - T returns to handle_swbp() and gets SIGTRAP.
>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> kernel/events/uprobes.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 2c83ba776fc7..d52b624a50fa 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
> * @vaddr: the virtual address to store the opcode.
> * @opcode: opcode to be written at @vaddr.
> *
> - * Called with mm->mmap_lock held for write.
> + * Called with mm->mmap_lock held for read or write.
> * Return 0 (success) or a negative errno.
> */
> int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>
> if (err && is_register)
> goto free;
> -
> + /*
> + * We take mmap_lock for writing to avoid the race with
> + * find_active_uprobe(), install_breakpoint() must not
> + * make is_trap_at_addr() true right after find_uprobe()
> + * returns NULL.
Sorry, I couldn't catch the latter part. What is the relationship of
taking the mmap_lock and install_breakpoint() and is_trap_at_addr() here?
You meant that find_active_uprobe() is using find_uprobe() which searches
uprobe from rbtree? But it seems uprobe is already inserted to the rbtree
in alloc_uprobe() so find_uprobe() will not return NULL here, right?
Thank you,
> + */
> mmap_write_lock(mm);
> vma = find_vma(mm, info->vaddr);
> if (!vma || !valid_vma(vma, is_register) ||
> --
> 2.25.1.362.g51ebf55
>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On 07/10, Masami Hiramatsu wrote:
>
> On Wed, 10 Jul 2024 16:00:45 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
> > it under mmap_read_lock() and this is correct.
> >
> > And it is completely unclear why register_for_each_vma() takes mmap_lock
> > for writing, add a comment to explain that mmap_write_lock() is needed to
> > avoid the following race:
> >
> > - A task T hits the bp installed by uprobe and calls
> > find_active_uprobe()
> >
> > - uprobe_unregister() removes this uprobe/bp
> >
> > - T calls find_uprobe() which returns NULL
> >
> > - another uprobe_register() installs the bp at the same address
> >
> > - T calls is_trap_at_addr() which returns true
> >
> > - T returns to handle_swbp() and gets SIGTRAP.
...
> > int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> >
> > if (err && is_register)
> > goto free;
> > -
> > + /*
> > + * We take mmap_lock for writing to avoid the race with
> > + * find_active_uprobe(), install_breakpoint() must not
> > + * make is_trap_at_addr() true right after find_uprobe()
> > + * returns NULL.
>
> Sorry, I couldn't catch the latter part. What is the relationship of
> taking the mmap_lock and install_breakpoint() and is_trap_at_addr() here?
Please see the changelog above, it tries to explain this race in more
detail...
> You meant that find_active_uprobe() is using find_uprobe() which searches
> uprobe from rbtree?
Yes,
> But it seems uprobe is already inserted to the rbtree
> in alloc_uprobe() so find_uprobe() will not return NULL here, right?
uprobe_register() -> alloc_uprobe() can come after
find_active_uprobe() -> find_uprobe() returns NULL.
Now, if uprobe_register() -> register_for_each_vma() used mmap_read_lock(), it
could do install_breakpoint() before find_active_uprobe() calls is_trap_at_addr().
In this case find_active_uprobe() returns with uprobe == NULL and is_swbp == 1,
handle_swbp() treats this case as the "normal" int3 without uprobe and does
if (!uprobe) {
if (is_swbp > 0) {
/* No matching uprobe; signal SIGTRAP. */
force_sig(SIGTRAP);
Does this answer your question?
Oleg.
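The interleaving described above can be replayed deterministically in a small user-space model. This is only a sketch under loud assumptions: every `toy_*` name is hypothetical and does not exist in the kernel, the rbtree and the patched instruction are each reduced to one boolean, and the `register_is_exclusive` flag stands in for "registration needs mmap_lock for writing, so install_breakpoint() cannot run while the handler holds it for reading". The restart outcome for the "no uprobe, no trap" case is likewise an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_outcome {
	TOY_UPROBE_HANDLED,	/* a uprobe was found for the breakpoint */
	TOY_SIGTRAP,		/* no uprobe found, but a trap is present */
	TOY_RESTART_INSN,	/* no uprobe, no trap: assumed to be retried */
};

struct toy_mm {
	bool uprobe_in_tree;	/* what the find_uprobe() step would see */
	bool bp_in_memory;	/* what the is_trap_at_addr() step would see */
};

/* Stand-in for uprobe_unregister(): drops the uprobe and the bp. */
static void toy_unregister(struct toy_mm *mm)
{
	mm->uprobe_in_tree = false;
	mm->bp_in_memory = false;
}

/* Stand-in for the alloc_uprobe() part of a new registration. */
static void toy_alloc_uprobe(struct toy_mm *mm)
{
	mm->uprobe_in_tree = true;
}

/* Stand-in for install_breakpoint() in register_for_each_vma(). */
static void toy_install_breakpoint(struct toy_mm *mm)
{
	mm->bp_in_memory = true;
}

/*
 * Replay the race from the changelog, step by step, for a task T that
 * has already hit the breakpoint. If register_is_exclusive is true,
 * install_breakpoint() is pushed past T's lookup window, as it would be
 * when the registering side has to wait for the write lock.
 */
enum toy_outcome toy_hit_breakpoint(bool register_is_exclusive)
{
	struct toy_mm mm = { .uprobe_in_tree = true, .bp_in_memory = true };
	bool found, is_swbp;

	toy_unregister(&mm);			/* uprobe/bp removed */
	found = mm.uprobe_in_tree;		/* T: lookup fails */
	toy_alloc_uprobe(&mm);			/* new registration starts */
	if (!register_is_exclusive)
		toy_install_breakpoint(&mm);	/* sneaks into T's window */
	is_swbp = mm.bp_in_memory;		/* T: checks for a trap */
	if (register_is_exclusive)
		toy_install_breakpoint(&mm);	/* runs after T is done */

	if (found)
		return TOY_UPROBE_HANDLED;
	return is_swbp ? TOY_SIGTRAP : TOY_RESTART_INSN;
}

/* Both orderings, checked side by side. */
void toy_self_check(void)
{
	assert(toy_hit_breakpoint(false) == TOY_SIGTRAP);
	assert(toy_hit_breakpoint(true) == TOY_RESTART_INSN);
}
```

Calling `toy_self_check()` from a `main()` exercises both orderings: without exclusion the model ends in the unexpected SIGTRAP path, with exclusion the task sees neither a uprobe nor a trap.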
On Wed, 10 Jul 2024 17:10:07 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/10, Masami Hiramatsu wrote:
> >
> > On Wed, 10 Jul 2024 16:00:45 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
> > > it under mmap_read_lock() and this is correct.
> > >
> > > And it is completely unclear why register_for_each_vma() takes mmap_lock
> > > for writing, add a comment to explain that mmap_write_lock() is needed to
> > > avoid the following race:
> > >
> > > - A task T hits the bp installed by uprobe and calls
> > > find_active_uprobe()
> > >
> > > - uprobe_unregister() removes this uprobe/bp
> > >
> > > - T calls find_uprobe() which returns NULL
> > >
> > > - another uprobe_register() installs the bp at the same address
> > >
> > > - T calls is_trap_at_addr() which returns true
> > >
> > > - T returns to handle_swbp() and gets SIGTRAP.
>
> ...
>
> > > int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> > >
> > > if (err && is_register)
> > > goto free;
> > > -
> > > + /*
> > > + * We take mmap_lock for writing to avoid the race with
> > > + * find_active_uprobe(), install_breakpoint() must not
> > > + * make is_trap_at_addr() true right after find_uprobe()
> > > + * returns NULL.
> >
> > Sorry, I couldn't catch the latter part. What is the relationship of
> > taking the mmap_lock and install_breakpoint() and is_trap_at_addr() here?
>
> Please see the changelog above, it tries to explain this race in more
> detail...
OK, but it seems we should write the above longer explanation here.
What about the comment like this?
/*
* We take mmap_lock for writing to avoid the race with
* find_active_uprobe() and is_trap_at_addr() in reader
* side.
* If the reader, which hits a swbp and is handling it,
* does not take mmap_lock for reading, it is possible
* that find_active_uprobe() returns NULL (because
* uprobe_unregister() removes uprobes right before that),
* but is_trap_at_addr() can return true afterwards (because
* another thread calls uprobe_register() on the same address).
* This causes unexpected SIGTRAP on reader thread.
* Taking mmap_lock avoids this race.
*/
>
> > You meant that find_active_uprobe() is using find_uprobe() which searches
> > uprobe from rbtree?
>
> Yes,
>
> > But it seems uprobe is already inserted to the rbtree
> > in alloc_uprobe() so find_uprobe() will not return NULL here, right?
>
> uprobe_register() -> alloc_uprobe() can come after
> find_active_uprobe() -> find_uprobe() returns NULL.
>
> Now, if uprobe_register() -> register_for_each_vma() used mmap_read_lock(), it
> could do install_breakpoint() before find_active_uprobe() calls is_trap_at_addr().
>
> In this case find_active_uprobe() returns with uprobe == NULL and is_swbp == 1,
> handle_swbp() treats this case as the "normal" int3 without uprobe and does
>
> if (!uprobe) {
> if (is_swbp > 0) {
> /* No matching uprobe; signal SIGTRAP. */
> force_sig(SIGTRAP);
>
> Does this answer your question?
No, thanks for the explanation.
Thank you!
>
> Oleg.
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On 07/11, Masami Hiramatsu wrote:
>
> On Wed, 10 Jul 2024 17:10:07 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > > int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > > > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> > > >
> > > > if (err && is_register)
> > > > goto free;
> > > > -
> > > > + /*
> > > > + * We take mmap_lock for writing to avoid the race with
> > > > + * find_active_uprobe(), install_breakpoint() must not
> > > > + * make is_trap_at_addr() true right after find_uprobe()
> > > > + * returns NULL.
> > >
...
> OK, but it seems we should write the above longer explanation here.
> What about the comment like this?
Well, I am biased, but your version looks much more confusing to me...
> /*
> * We take mmap_lock for writing to avoid the race with
> * find_active_uprobe() and is_trap_at_addr() in reader
> * side.
> * If the reader, which hits a swbp and is handling it,
> * does not take mmap_lock for reading,
this looks as if the reader which hits a swbp takes mmap_lock for reading
because of this race. No, find_active_uprobe() needs mmap_read_lock() for
vma_lookup, get_user_pages, etc.
> it is possible
> * that find_active_uprobe() returns NULL (because
> * uprobe_unregister() removes uprobes right before that),
> * but is_trap_at_addr() can return true afterwards (because
> * another thread calls uprobe_register() on the same address).
^^^^^^^^^^^^^^^
We are the thread which called uprobe_register(), we are going to
do install_breakpoint().
And btw, not that I think this makes sense, but register_for_each_vma()
could probably do
if (is_register)
mmap_write_lock(mm);
else
mmap_read_lock(mm);
Oleg.
On Thu, 11 Jul 2024 11:49:40 +0200
Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/11, Masami Hiramatsu wrote:
> >
> > On Wed, 10 Jul 2024 17:10:07 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > > > int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > > > > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> > > > >
> > > > > if (err && is_register)
> > > > > goto free;
> > > > > -
> > > > > + /*
> > > > > + * We take mmap_lock for writing to avoid the race with
> > > > > + * find_active_uprobe(), install_breakpoint() must not
> > > > > + * make is_trap_at_addr() true right after find_uprobe()
> > > > > + * returns NULL.
> > > >
> ...
>
> > OK, but it seems we should write the above longer explanation here.
> > What about the comment like this?
>
> Well, I am biased, but your version looks much more confusing to me...
>
> > /*
> > * We take mmap_lock for writing to avoid the race with
> > * find_active_uprobe() and is_trap_at_addr() in reader
> > * side.
> > * If the reader, which hits a swbp and is handling it,
> > * does not take mmap_lock for reading,
>
> this looks as if the reader which hits a swbp takes mmap_lock for reading
> because of this race. No, find_active_uprobe() needs mmap_read_lock() for
> vma_lookup, get_user_pages, etc.
OK, so it is for looking up VMA. (But in the end, this lock protects both
the VMAs and uprobes, right?)
>
> > it is possible
> > * that find_active_uprobe() returns NULL (because
> > * uprobe_unregister() removes uprobes right before that),
> > * but is_trap_at_addr() can return true afterwards (because
> > * another thread calls uprobe_register() on the same address).
> ^^^^^^^^^^^^^^^
> We are the thread which called uprobe_register(), we are going to
> do install_breakpoint().
Ah, yes :)
What about this?
* We take mmap_lock for writing to avoid the race with
* find_active_uprobe(), which takes mmap_lock for reading.
* Thus this install_breakpoint() must not make
* is_trap_at_addr() true right after find_uprobe()
* returns NULL in find_active_uprobe().
>
> And btw, not that I think this makes sense, but register_for_each_vma()
> could probably do
>
> if (is_register)
> mmap_write_lock(mm);
> else
> mmap_read_lock(mm);
Agreed.
Thank you,
>
> Oleg.
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
On 07/11, Masami Hiramatsu wrote:
>
> What about this?
>
> * We take mmap_lock for writing to avoid the race with
> * find_active_uprobe(), which takes mmap_lock for reading.
> * Thus this install_breakpoint() must not make
> * is_trap_at_addr() true right after find_uprobe()
> * returns NULL in find_active_uprobe().
Thanks! will change.
Oleg.