[RFC PATCH 0/8] uprobe/x86: Add support to optimize prologue

Jiri Olsa posted 8 patches 2 months, 3 weeks ago
[RFC PATCH 0/8] uprobe/x86: Add support to optimize prologue
Posted by Jiri Olsa 2 months, 3 weeks ago
hi,
the subject is a bit too optimistic; in a nutshell, the idea is to allow
optimization on top of emulated instructions and then add support to
emulate more instructions that commonly appear in function prologues.

This patchset adds support to optimize a uprobe placed on top of an
instruction that can be emulated, and also adds support to emulate
particular variants of the mov and sub instructions to cover some user
space function prologues, like:

  pushq %rbp
  movq  %rsp,%rbp
  subq  $0xb0,%rsp
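
As an illustration of what emulating this prologue involves, here is a
minimal user space sketch. It is not taken from the patches: struct
toy_regs and the emu_* helpers are hypothetical names, the stack is a
small byte array rather than real user memory, and the flags that sub
updates are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of user state; not the kernel's struct pt_regs. */
struct toy_regs {
	uint64_t rsp;
	uint64_t rbp;
	uint8_t stack[256];	/* toy stack memory, valid addresses 0..255 */
};

/* pushq %rbp: decrement rsp by 8, then store rbp at the new top of stack. */
void emu_push_rbp(struct toy_regs *r)
{
	r->rsp -= 8;
	memcpy(&r->stack[r->rsp], &r->rbp, sizeof(r->rbp));
}

/* movq %rsp,%rbp: copy the stack pointer into the frame pointer. */
void emu_mov_rsp_rbp(struct toy_regs *r)
{
	r->rbp = r->rsp;
}

/* subq $imm32,%rsp: the immediate is sign-extended to 64 bits (flags ignored). */
void emu_sub_imm_rsp(struct toy_regs *r, int32_t imm)
{
	r->rsp -= (uint64_t)(int64_t)imm;
}

/* Emulate the three-instruction prologue from the example above, in order. */
void emu_prologue(struct toy_regs *r)
{
	emu_push_rbp(r);
	emu_mov_rsp_rbp(r);
	emu_sub_imm_rsp(r, 0xb0);
}
```

Starting from rsp = 256, emu_prologue() leaves rbp = 248 and rsp = 72,
with the old rbp value saved at stack offset 248.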

The idea is to store the instructions that occupy the underlying 5 bytes
and emulate them during the int3 and uprobe syscall execution:

  - install 'call trampoline' through the standard int3 update
  - if the int3 is hit before we finish optimizing, we emulate
    all underlying instructions
  - once the call is installed, the uprobe syscall emulates
    all underlying instructions
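
To see why more than one instruction has to be emulated, take the
example prologue: its usual encodings are 55 (pushq %rbp, 1 byte),
48 89 e5 (movq %rsp,%rbp, 3 bytes) and 48 81 ec b0 00 00 00
(subq $0xb0,%rsp, 7 bytes). A 5-byte call written at the probe address
overwrites the push, the mov and the first byte of the sub, so all three
instructions are affected. The sketch below computes that overlap. The
names TOY_CALL_SIZE and insns_under_call are hypothetical, and the
patches may track this differently.

```c
#include <stddef.h>

#define TOY_CALL_SIZE 5	/* size of a 'call rel32' instruction on x86 */

/*
 * Given the lengths of consecutive instructions starting at the probe
 * address, return how many whole instructions a 5-byte call overlaps.
 * Returns 0 if the listed instructions do not cover 5 bytes.
 * *covered receives the number of bytes spanned by the counted instructions.
 */
size_t insns_under_call(const unsigned char *lens, size_t n, size_t *covered)
{
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		bytes += lens[i];
		if (bytes >= TOY_CALL_SIZE) {
			*covered = bytes;
			return i + 1;
		}
	}
	*covered = bytes;
	return 0;
}
```

For the lengths {1, 3, 7} this reports 3 instructions spanning 11 bytes,
so in this model execution would have to resume 11 bytes past the probe
address once all three instructions are emulated.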

There's an additional issue that single-instruction replacement does
not have: user space code could jump into the middle of those 5 bytes.
I think that's unlikely to happen in a function prologue, but a uprobe
can be placed anywhere. I'm not sure how to mitigate this other than
with some enable/disable switch or config option, which is unfortunate.
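
The hazard can be stated as a simple range check. This is only an
illustrative sketch with a hypothetical helper name; the kernel has no
general way to know every jump target in user space code, which is why
the problem is hard to mitigate.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a jump to 'target' would land strictly inside the
 * 5-byte call written at 'probe_addr', i.e. on one of the bytes that
 * used to start or belong to a later original instruction.
 */
bool lands_inside_patch(uint64_t probe_addr, uint64_t target)
{
	return target > probe_addr && target < probe_addr + 5;
}
```

With the example prologue, 'movq %rsp,%rbp' originally starts at offset 1,
so a jump to probe_addr + 1 would now decode the call's displacement
bytes as instructions.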

The patchset is based on bpf-next/master with [1] changes merged in.

thanks,
jirka


[1] https://lore.kernel.org/lkml/20251117093137.572132-1-jolsa@kernel.org/T/#m95a3208943ec24c5eee17ad6113002fdc6776cf8
---
Jiri Olsa (8):
      uprobe/x86: Introduce struct arch_uprobe_xol object
      uprobe/x86: Use struct arch_uprobe_xol in emulate callback
      uprobe/x86: Add support to emulate mov reg,reg instructions
      uprobe/x86: Add support to emulate sub imm,reg instructions
      uprobe/x86: Add support to optimize on top of emulated instructions
      selftests/bpf: Add test for mov and sub emulation
      selftests/bpf: Add test for uprobe prologue optimization
      selftests/bpf: Add race test for uprobe prologue optimization

 arch/x86/include/asm/uprobes.h                          |  35 +++++++---
 arch/x86/kernel/uprobes.c                               | 336 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
 include/linux/uprobes.h                                 |   1 +
 kernel/events/uprobes.c                                 |   6 ++
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 129 ++++++++++++++++++++++++++++++++-----
 5 files changed, 434 insertions(+), 73 deletions(-)
Re: [RFC PATCH 0/8] uprobe/x86: Add support to optimize prologue
Posted by Jiri Olsa 2 months ago
On Mon, Nov 17, 2025 at 01:40:49PM +0100, Jiri Olsa wrote:
> hi,
> the subject is a bit too optimistic; in a nutshell, the idea is to allow
> optimization on top of emulated instructions and then add support to
> emulate more instructions that commonly appear in function prologues.
> 
> This patchset adds support to optimize a uprobe placed on top of an
> instruction that can be emulated, and also adds support to emulate
> particular variants of the mov and sub instructions to cover some user
> space function prologues, like:
> 
>   pushq %rbp
>   movq  %rsp,%rbp
>   subq  $0xb0,%rsp
> 
> The idea is to store the instructions that occupy the underlying 5 bytes
> and emulate them during the int3 and uprobe syscall execution:
> 
>   - install 'call trampoline' through the standard int3 update
>   - if the int3 is hit before we finish optimizing, we emulate
>     all underlying instructions
>   - once the call is installed, the uprobe syscall emulates
>     all underlying instructions

David, sorry, I used the wrong email. I think the update here might
be a problem; any chance you could check?

thanks,
jirka


Re: [RFC PATCH 0/8] uprobe/x86: Add support to optimize prologue
Posted by Oleg Nesterov 2 months, 2 weeks ago
On 11/17, Jiri Olsa wrote:
>
> This patchset adds support to optimize a uprobe placed on top of an
> instruction that can be emulated, and also adds support to emulate
> particular variants of the mov and sub instructions to cover some user
> space function prologues, like:
>
>   pushq %rbp
>   movq  %rsp,%rbp
>   subq  $0xb0,%rsp

...

> There's an additional issue that single-instruction replacement does
> not have: user space code could jump into the middle of those 5 bytes.
> I think that's unlikely to happen in a function prologue, but a uprobe
> can be placed anywhere. I'm not sure how to mitigate this other than
> with some enable/disable switch or config option, which is unfortunate.

plus this breaks single-stepping... Although perhaps we don't really care.

Oleg.
Re: [RFC PATCH 0/8] uprobe/x86: Add support to optimize prologue
Posted by Masami Hiramatsu (Google) 2 months ago
On Mon, 24 Nov 2025 19:12:42 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> On 11/17, Jiri Olsa wrote:
> >
> > This patchset adds support to optimize a uprobe placed on top of an
> > instruction that can be emulated, and also adds support to emulate
> > particular variants of the mov and sub instructions to cover some user
> > space function prologues, like:
> >
> >   pushq %rbp
> >   movq  %rsp,%rbp
> >   subq  $0xb0,%rsp
> 
> ...
> 
> > There's an additional issue that single-instruction replacement does
> > not have: user space code could jump into the middle of those 5 bytes.
> > I think that's unlikely to happen in a function prologue, but a uprobe
> > can be placed anywhere. I'm not sure how to mitigate this other than
> > with some enable/disable switch or config option, which is unfortunate.
> 
> plus this breaks single-stepping... Although perhaps we don't really care.

Yeah, and I think we can stop optimization if post_handler is set.

Thanks,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Re: [RFC PATCH 0/8] uprobe/x86: Add support to optimize prologue
Posted by Oleg Nesterov 2 months ago
On 12/08, Masami Hiramatsu wrote:
>
> On Mon, 24 Nov 2025 19:12:42 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 11/17, Jiri Olsa wrote:
> > >
> > > There's an additional issue that single-instruction replacement does
> > > not have: user space code could jump into the middle of those 5 bytes.
> > > I think that's unlikely to happen in a function prologue, but a uprobe
> > > can be placed anywhere. I'm not sure how to mitigate this other than
> > > with some enable/disable switch or config option, which is unfortunate.
> >
> > plus this breaks single-stepping... Although perhaps we don't really care.
>
> Yeah, and I think we can stop optimization if post_handler is set.

Hmm, why? This doesn't depend on whether ->ret_handler is set or not...

Oleg.