[PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain

Alexis Lothoré (eBPF Foundation) posted 7 patches 3 months, 4 weeks ago
[PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Alexis Lothoré (eBPF Foundation) 3 months, 4 weeks ago
When the target function receives more arguments than available
registers, the additional arguments are passed on stack, and so the
generated trampoline needs to read those to prepare the bpf context,
but also to prepare the target function stack when it is in charge of
calling it. This works well for scalar types, but if the value is a
struct, we can not know for sure the exact struct location, as it may
have been packed or manually aligned to a greater value.

Prevent wrong readings by refusing trampoline attachment if the target
function receives a struct on stack. While at it, move the max bpf args
check in the new function.

Fixes: 473e3150e30a ("bpf, x86: allow function arguments up to 12 for TRACING")
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
---
 arch/x86/net/bpf_jit_comp.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 9689834de1bb1a90fdc28156e0e2a56ac0ff2076..120e05a978679c046631cc94d942800c3051ad0a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3001,6 +3001,29 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
 	return 0;
 }
 
+static int validate_args(const struct btf_func_model *m)
+{
+	int i, arg_regs = 0, nr_regs = 0;
+
+	for (i = 0; i < min_t(int, m->nr_args, MAX_BPF_FUNC_ARGS); i++) {
+		arg_regs = (m->arg_size[i] + 7) / 8;
+
+		if (nr_regs + arg_regs > MAX_REGS_FOR_ARGS &&
+		    m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
+			return -ENOTSUPP;
+		nr_regs += arg_regs;
+	}
+
+	/* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. 1-6
+	 * are passed through regs, the remains are through stack.
+	 */
+	if (nr_regs > MAX_BPF_FUNC_ARGS)
+		return -ENOTSUPP;
+
+
+	return 0;
+}
+
 /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
 #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack)	\
 	__LOAD_TCC_PTR(-round_up(stack, 8) - 8)
@@ -3089,18 +3112,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
 	WARN_ON_ONCE((flags & BPF_TRAMP_F_INDIRECT) &&
 		     (flags & ~(BPF_TRAMP_F_INDIRECT | BPF_TRAMP_F_RET_FENTRY_RET)));
 
+	/* make sure that any argument can be located and processed by the
+	 * trampoline
+	 */
+	ret = validate_args(m);
+	if (ret)
+		return ret;
+
 	/* extra registers for struct arguments */
 	for (i = 0; i < m->nr_args; i++) {
 		if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG)
 			nr_regs += (m->arg_size[i] + 7) / 8 - 1;
 	}
 
-	/* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. 1-6
-	 * are passed through regs, the remains are through stack.
-	 */
-	if (nr_regs > MAX_BPF_FUNC_ARGS)
-		return -ENOTSUPP;
-
 	/* Generated trampoline stack layout:
 	 *
 	 * RBP + 8         [ return address  ]

-- 
2.49.0

Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Peter Zijlstra 3 months, 4 weeks ago
On Fri, Jun 13, 2025 at 09:37:11AM +0200, Alexis Lothoré (eBPF Foundation) wrote:
> When the target function receives more arguments than available
> registers, the additional arguments are passed on stack, and so the
> generated trampoline needs to read those to prepare the bpf context,
> but also to prepare the target function stack when it is in charge of
> calling it. This works well for scalar types, but if the value is a
> struct, we can not know for sure the exact struct location, as it may
> have been packed or manually aligned to a greater value.

https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

Has fairly clear rules on how arguments are encoded. Broadly speaking
for the kernel, if the structure exceeds 2 registers in size, it is
passed as a reference, otherwise it is passed as two registers.
Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Alexis Lothoré 3 months, 4 weeks ago
Hi Peter,

On Fri Jun 13, 2025 at 10:11 AM CEST, Peter Zijlstra wrote:
> On Fri, Jun 13, 2025 at 09:37:11AM +0200, Alexis Lothoré (eBPF Foundation) wrote:
>> When the target function receives more arguments than available
>> registers, the additional arguments are passed on stack, and so the
>> generated trampoline needs to read those to prepare the bpf context,
>> but also to prepare the target function stack when it is in charge of
>> calling it. This works well for scalar types, but if the value is a
>> struct, we can not know for sure the exact struct location, as it may
>> have been packed or manually aligned to a greater value.
>
> https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
>
> Has fairly clear rules on how arguments are encoded. Broadly speaking
> for the kernel, if the structure exceeds 2 registers in size, it is
> passed as a reference, otherwise it is passed as two registers.

Maybe my commit wording is not precise enough, but indeed, there's not
doubt about whether the struct value is passed on the stack or through a
register/a pair of registers. The doubt is rather about the struct location
when it is passed _by value_ and _on the stack_: the ABI indeed clearly
states that "Structures and unions assume the alignment of their most
strictly aligned component" (p.13), but this rule is "silently broken" when
a struct has an __attribute__((packed)) or and __attribute__((aligned(X))),
and AFAICT this case can not be detected at runtime with current BTF info.

-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Peter Zijlstra 3 months, 4 weeks ago
On Fri, Jun 13, 2025 at 10:26:37AM +0200, Alexis Lothoré wrote:
> Hi Peter,
> 
> On Fri Jun 13, 2025 at 10:11 AM CEST, Peter Zijlstra wrote:
> > On Fri, Jun 13, 2025 at 09:37:11AM +0200, Alexis Lothoré (eBPF Foundation) wrote:
> >> When the target function receives more arguments than available
> >> registers, the additional arguments are passed on stack, and so the
> >> generated trampoline needs to read those to prepare the bpf context,
> >> but also to prepare the target function stack when it is in charge of
> >> calling it. This works well for scalar types, but if the value is a
> >> struct, we can not know for sure the exact struct location, as it may
> >> have been packed or manually aligned to a greater value.
> >
> > https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
> >
> > Has fairly clear rules on how arguments are encoded. Broadly speaking
> > for the kernel, if the structure exceeds 2 registers in size, it is
> > passed as a reference, otherwise it is passed as two registers.
> 
> Maybe my commit wording is not precise enough, but indeed, there's not
> doubt about whether the struct value is passed on the stack or through a
> register/a pair of registers. The doubt is rather about the struct location
> when it is passed _by value_ and _on the stack_: the ABI indeed clearly
> states that "Structures and unions assume the alignment of their most
> strictly aligned component" (p.13), but this rule is "silently broken" when
> a struct has an __attribute__((packed)) or and __attribute__((aligned(X))),
> and AFAICT this case can not be detected at runtime with current BTF info.

Ah, okay. So it is a failure of BTF. That was indeed not clear.
Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Alexis Lothoré 3 months, 4 weeks ago
On Fri Jun 13, 2025 at 10:32 AM CEST, Peter Zijlstra wrote:
> On Fri, Jun 13, 2025 at 10:26:37AM +0200, Alexis Lothoré wrote:
>> Hi Peter,
>> 
>> On Fri Jun 13, 2025 at 10:11 AM CEST, Peter Zijlstra wrote:
>> > On Fri, Jun 13, 2025 at 09:37:11AM +0200, Alexis Lothoré (eBPF Foundation) wrote:

[...]

>> Maybe my commit wording is not precise enough, but indeed, there's not
>> doubt about whether the struct value is passed on the stack or through a
>> register/a pair of registers. The doubt is rather about the struct location
>> when it is passed _by value_ and _on the stack_: the ABI indeed clearly
>> states that "Structures and unions assume the alignment of their most
>> strictly aligned component" (p.13), but this rule is "silently broken" when
>> a struct has an __attribute__((packed)) or and __attribute__((aligned(X))),
>> and AFAICT this case can not be detected at runtime with current BTF info.
>
> Ah, okay. So it is a failure of BTF. That was indeed not clear.

If I need to respin, I'll rewrite the commit message to include the details
above.

Alexis




-- 
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Alexei Starovoitov 3 months, 4 weeks ago
On Fri, Jun 13, 2025 at 1:59 AM Alexis Lothoré
<alexis.lothore@bootlin.com> wrote:
>
> On Fri Jun 13, 2025 at 10:32 AM CEST, Peter Zijlstra wrote:
> > On Fri, Jun 13, 2025 at 10:26:37AM +0200, Alexis Lothoré wrote:
> >> Hi Peter,
> >>
> >> On Fri Jun 13, 2025 at 10:11 AM CEST, Peter Zijlstra wrote:
> >> > On Fri, Jun 13, 2025 at 09:37:11AM +0200, Alexis Lothoré (eBPF Foundation) wrote:
>
> [...]
>
> >> Maybe my commit wording is not precise enough, but indeed, there's not
> >> doubt about whether the struct value is passed on the stack or through a
> >> register/a pair of registers. The doubt is rather about the struct location
> >> when it is passed _by value_ and _on the stack_: the ABI indeed clearly
> >> states that "Structures and unions assume the alignment of their most
> >> strictly aligned component" (p.13), but this rule is "silently broken" when
> >> a struct has an __attribute__((packed)) or and __attribute__((aligned(X))),
> >> and AFAICT this case can not be detected at runtime with current BTF info.
> >
> > Ah, okay. So it is a failure of BTF. That was indeed not clear.
>
> If I need to respin, I'll rewrite the commit message to include the details
> above.

No need to respin. The cover letter is quite detailed already.

But looking at the patch and this thread I think we need to agree
on the long term approach to BTF, since people assume that
it's a more compact dwarf and any missing information
should be added to it.
Like in this case special alignment case and packed attributes
are not expressed in BTF and I believe they should not be.
BTF is not a debug format and not a substitute for dwarf.
There is no goal to express everything possible in C.
It's minimal, because BTF is _practical_ description of
types and data present in the kernel.
I don't think the special case of packing and alignment exists
in the kernel today, so the current format is sufficient.
It doesn't miss anything.
I think we made arm64 JIT unnecessary restrictive and now considering
to make all other JITs restrictive too for hypothetical case
of some future kernel functions.
I feel we're going in the wrong direction.
Instead we should teach pahole to sanitize BTF where functions
are using this fancy alignment and packed structs.
pahole can see it in dwarf and can skip emitting BTF for such
functions. Then the kernel JITs on all architectures won't even
see such cases.

The issue was initially discovered by a selftest:
https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-3-0a32fe72339e@bootlin.com/
that attempted to support these two types:
+struct bpf_testmod_struct_arg_4 {
+ __u64 a;
+ __u64 b;
+};
+
+struct bpf_testmod_struct_arg_5 {
+ __int128 a;
+};

The former is present in the kernel. It's more or less sockptr_t,
and people want to access it for observability in tracing.
The latter doesn't exist in the kernel and we cannot represent
it properly in BTF without losing alignment.

So I think we should go back to that series:
https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-0-0a32fe72339e@bootlin.com/

remove __int128 selftest, but also teach pahole
to recognize types that cannot be represented in BTF and
don't emit them either into vmlinux or in kernel module
(like in this case it was bpf_testmod.ko)
I think that would be a better path forward aligned
with the long term goal of BTF.

And before people ask... pahole is a trusted component of the build
system. We trust it just as we trust gcc, clang, linker, objtool.
Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Alexis Lothoré 3 months, 3 weeks ago
On Sat Jun 14, 2025 at 12:35 AM CEST, Alexei Starovoitov wrote:
> On Fri, Jun 13, 2025 at 1:59 AM Alexis Lothoré
> <alexis.lothore@bootlin.com> wrote:
>>
>> On Fri Jun 13, 2025 at 10:32 AM CEST, Peter Zijlstra wrote:
>> > On Fri, Jun 13, 2025 at 10:26:37AM +0200, Alexis Lothoré wrote:

[...]

>> If I need to respin, I'll rewrite the commit message to include the details
>> above.
>
> No need to respin. The cover letter is quite detailed already.
>
> But looking at the patch and this thread I think we need to agree
> on the long term approach to BTF, since people assume that
> it's a more compact dwarf and any missing information
> should be added to it.
> Like in this case special alignment case and packed attributes
> are not expressed in BTF and I believe they should not be.
> BTF is not a debug format and not a substitute for dwarf.
> There is no goal to express everything possible in C.
> It's minimal, because BTF is _practical_ description of
> types and data present in the kernel.
> I don't think the special case of packing and alignment exists
> in the kernel today, so the current format is sufficient.
> It doesn't miss anything.
> I think we made arm64 JIT unnecessary restrictive and now considering
> to make all other JITs restrictive too for hypothetical case
> of some future kernel functions.
> I feel we're going in the wrong direction.
> Instead we should teach pahole to sanitize BTF where functions
> are using this fancy alignment and packed structs.
> pahole can see it in dwarf and can skip emitting BTF for such
> functions. Then the kernel JITs on all architectures won't even
> see such cases.
>
> The issue was initially discovered by a selftest:
> https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-3-0a32fe72339e@bootlin.com/
> that attempted to support these two types:
> +struct bpf_testmod_struct_arg_4 {
> + __u64 a;
> + __u64 b;
> +};
> +
> +struct bpf_testmod_struct_arg_5 {
> + __int128 a;
> +};
>
> The former is present in the kernel. It's more or less sockptr_t,
> and people want to access it for observability in tracing.
> The latter doesn't exist in the kernel and we cannot represent
> it properly in BTF without losing alignment.
>
> So I think we should go back to that series:
> https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-0-0a32fe72339e@bootlin.com/
>
> remove __int128 selftest, but also teach pahole
> to recognize types that cannot be represented in BTF and
> don't emit them either into vmlinux or in kernel module
> (like in this case it was bpf_testmod.ko)
> I think that would be a better path forward aligned
> with the long term goal of BTF.
>
> And before people ask... pahole is a trusted component of the build
> system. We trust it just as we trust gcc, clang, linker, objtool.

So if I understand correctly your point, it would be better to just move out
those constraints from the JIT compilers, and just not represent those special
cases in BTF, so that it becomes impossible to hook programs on those functions,
since they are not event present in BTF info.
And so:
- cancel this series
- revert the small ARM64 check about struct passed on stack
- update pahole to make sure that it does not encode info about this specific
  kind of functions.

I still expect some challenges with this. AFAIU pahole uses DWARF to generate
BTF, and discussions in [1] highlighted the fact that the attributes altering
the structs alignment are not reliably encoded in DWARF. Maybe pahole can
"guess" if a struct has been altered, by doing something like
btf_is_struct_packed in libbpf ? As Andrii mentioned in [2], it may not be
able to cover all cases, but that could  be a start. If that's indeed the
desired direction, I can take a further look at this.

+ CC dwarves ML

Alexis

[1] https://lore.kernel.org/bpf/9a2ba0ad-b34d-42f8-89a6-d9a44f007bdc@linux.dev/
[2] https://lore.kernel.org/bpf/CAEf4BzZHMYyGDZ4c4eNXG7Fm=ecxCCbKhKbQTbCjvWmKtdwvBw@mail.gmail.com/
Re: [PATCH bpf 2/7] bpf/x86: prevent trampoline attachment when args location on stack is uncertain
Posted by Alexei Starovoitov 3 months, 3 weeks ago
On Sun, Jun 15, 2025 at 7:00 AM Alexis Lothoré
<alexis.lothore@bootlin.com> wrote:
>
> On Sat Jun 14, 2025 at 12:35 AM CEST, Alexei Starovoitov wrote:
> > On Fri, Jun 13, 2025 at 1:59 AM Alexis Lothoré
> > <alexis.lothore@bootlin.com> wrote:
> >>
> >> On Fri Jun 13, 2025 at 10:32 AM CEST, Peter Zijlstra wrote:
> >> > On Fri, Jun 13, 2025 at 10:26:37AM +0200, Alexis Lothoré wrote:
>
> [...]
>
> >> If I need to respin, I'll rewrite the commit message to include the details
> >> above.
> >
> > No need to respin. The cover letter is quite detailed already.
> >
> > But looking at the patch and this thread I think we need to agree
> > on the long term approach to BTF, since people assume that
> > it's a more compact dwarf and any missing information
> > should be added to it.
> > Like in this case special alignment case and packed attributes
> > are not expressed in BTF and I believe they should not be.
> > BTF is not a debug format and not a substitute for dwarf.
> > There is no goal to express everything possible in C.
> > It's minimal, because BTF is _practical_ description of
> > types and data present in the kernel.
> > I don't think the special case of packing and alignment exists
> > in the kernel today, so the current format is sufficient.
> > It doesn't miss anything.
> > I think we made arm64 JIT unnecessary restrictive and now considering
> > to make all other JITs restrictive too for hypothetical case
> > of some future kernel functions.
> > I feel we're going in the wrong direction.
> > Instead we should teach pahole to sanitize BTF where functions
> > are using this fancy alignment and packed structs.
> > pahole can see it in dwarf and can skip emitting BTF for such
> > functions. Then the kernel JITs on all architectures won't even
> > see such cases.
> >
> > The issue was initially discovered by a selftest:
> > https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-3-0a32fe72339e@bootlin.com/
> > that attempted to support these two types:
> > +struct bpf_testmod_struct_arg_4 {
> > + __u64 a;
> > + __u64 b;
> > +};
> > +
> > +struct bpf_testmod_struct_arg_5 {
> > + __int128 a;
> > +};
> >
> > The former is present in the kernel. It's more or less sockptr_t,
> > and people want to access it for observability in tracing.
> > The latter doesn't exist in the kernel and we cannot represent
> > it properly in BTF without losing alignment.
> >
> > So I think we should go back to that series:
> > https://lore.kernel.org/bpf/20250411-many_args_arm64-v1-0-0a32fe72339e@bootlin.com/
> >
> > remove __int128 selftest, but also teach pahole
> > to recognize types that cannot be represented in BTF and
> > don't emit them either into vmlinux or in kernel module
> > (like in this case it was bpf_testmod.ko)
> > I think that would be a better path forward aligned
> > with the long term goal of BTF.
> >
> > And before people ask... pahole is a trusted component of the build
> > system. We trust it just as we trust gcc, clang, linker, objtool.
>
> So if I understand correctly your point, it would be better to just move out
> those constraints from the JIT compilers, and just not represent those special
> cases in BTF, so that it becomes impossible to hook programs on those functions,
> since they are not event present in BTF info.
> And so:
> - cancel this series
> - revert the small ARM64 check about struct passed on stack
> - update pahole to make sure that it does not encode info about this specific
>   kind of functions.

yes

> I still expect some challenges with this. AFAIU pahole uses DWARF to generate
> BTF, and discussions in [1] highlighted the fact that the attributes altering
> the structs alignment are not reliably encoded in DWARF. Maybe pahole can
> "guess" if a struct has been altered, by doing something like
> btf_is_struct_packed in libbpf ? As Andrii mentioned in [2], it may not be
> able to cover all cases, but that could  be a start. If that's indeed the
> desired direction, I can take a further look at this.

so be it. If syzbot was taught to fuzz dwarf I'm sure it would
have exposed hundreds of bugs in the format itself and compilers,
but since such convoluted constructs are not present in the kernel
source code it's not a concern.