Uprobe programs that modify regs require different runtime assumptions
than those that do not. Mixing !kprobe_write_ctx progs with
kprobe_write_ctx progs via tail calls could break these assumptions.
To address this, reject the combination of !kprobe_write_ctx progs with
kprobe_write_ctx progs in bpf_map_owner_matches(), which prevents the
tail callee from modifying regs unexpectedly.
Also reject kprobe_write_ctx mismatches during initialization to
prevent bypassing the above restriction.
Without this check, the above restriction can be bypassed as follows.
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 1);
__uint(key_size, 4);
__uint(value_size, 4);
} jmp_table SEC(".maps");
SEC("?kprobe")
int prog_a(struct pt_regs *regs)
{
regs->ax = 0;
bpf_tail_call_static(regs, &jmp_table, 0);
return 0;
}
SEC("?kprobe")
int prog_b(struct pt_regs *regs)
{
bpf_tail_call_static(regs, &jmp_table, 0);
return 0;
}
The jmp_table is shared between prog_a and prog_b.
* Load prog_a.
At this point, owner->kprobe_write_ctx=true.
* Load prog_b.
At this point, prog_b passes the compatibility check.
* Add prog_a to jmp_table.
* Attach prog_b to a kernel function.
When the kernel function runs, prog_a will unexpectedly modify regs.
Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
include/linux/bpf.h | 7 ++++---
kernel/bpf/core.c | 30 +++++++++++++++++++++++++-----
2 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 05b34a6355b0..dbafed52b2ba 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -285,9 +285,10 @@ struct bpf_list_node_kern {
*/
struct bpf_map_owner {
enum bpf_prog_type type;
- bool jited;
- bool xdp_has_frags;
- bool sleepable;
+ u32 jited:1;
+ u32 xdp_has_frags:1;
+ u32 sleepable:1;
+ u32 kprobe_write_ctx:1;
u64 storage_cookie[MAX_BPF_CGROUP_STORAGE_TYPE];
const struct btf_type *attach_func_proto;
enum bpf_attach_type expected_attach_type;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b24a613d99f2..121a697d4da5 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2390,6 +2390,7 @@ static void bpf_map_owner_init(struct bpf_map_owner *owner, const struct bpf_pro
owner->jited = fp->jited;
owner->xdp_has_frags = aux->xdp_has_frags;
owner->sleepable = fp->sleepable;
+ owner->kprobe_write_ctx = aux->kprobe_write_ctx;
owner->expected_attach_type = fp->expected_attach_type;
owner->attach_func_proto = aux->attach_func_proto;
for_each_cgroup_storage_type(i)
@@ -2397,8 +2398,14 @@ static void bpf_map_owner_init(struct bpf_map_owner *owner, const struct bpf_pro
aux->cgroup_storage[i]->cookie : 0;
}
+enum bpf_map_owner_match_type {
+ BPF_MAP_OWNER_MATCH_FOR_INIT,
+ BPF_MAP_OWNER_MATCH_FOR_UPDATE,
+};
+
static bool bpf_map_owner_matches(const struct bpf_map *map, const struct bpf_prog *fp,
- enum bpf_prog_type prog_type)
+ enum bpf_prog_type prog_type,
+ enum bpf_map_owner_match_type match)
{
struct bpf_map_owner *owner = map->owner;
struct bpf_prog_aux *aux = fp->aux;
@@ -2411,6 +2418,18 @@ static bool bpf_map_owner_matches(const struct bpf_map *map, const struct bpf_pr
owner->sleepable != fp->sleepable)
return false;
+ switch (match) {
+ case BPF_MAP_OWNER_MATCH_FOR_INIT:
+ if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
+ return false;
+ break;
+
+ case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
+ if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
+ return false;
+ break;
+ }
+
if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY &&
owner->expected_attach_type != fp->expected_attach_type)
return false;
@@ -2436,7 +2455,8 @@ static bool bpf_map_owner_matches(const struct bpf_map *map, const struct bpf_pr
}
static bool __bpf_prog_map_compatible(struct bpf_map *map,
- const struct bpf_prog *fp)
+ const struct bpf_prog *fp,
+ enum bpf_map_owner_match_type match)
{
enum bpf_prog_type prog_type = resolve_prog_type(fp);
bool ret = false;
@@ -2453,7 +2473,7 @@ static bool __bpf_prog_map_compatible(struct bpf_map *map,
bpf_map_owner_init(map->owner, fp, prog_type);
ret = true;
} else {
- ret = bpf_map_owner_matches(map, fp, prog_type);
+ ret = bpf_map_owner_matches(map, fp, prog_type, match);
}
err:
spin_unlock(&map->owner_lock);
@@ -2470,7 +2490,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map, const struct bpf_prog *fp)
if (bpf_prog_is_dev_bound(fp->aux))
return false;
- return __bpf_prog_map_compatible(map, fp);
+ return __bpf_prog_map_compatible(map, fp, BPF_MAP_OWNER_MATCH_FOR_UPDATE);
}
static int bpf_check_tail_call(const struct bpf_prog *fp)
@@ -2485,7 +2505,7 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
if (!map_type_contains_progs(map))
continue;
- if (!__bpf_prog_map_compatible(map, fp)) {
+ if (!__bpf_prog_map_compatible(map, fp, BPF_MAP_OWNER_MATCH_FOR_INIT)) {
ret = -EINVAL;
goto out;
}
--
2.52.0
On Tue, Mar 03, 2026 at 11:06:36PM +0800, Leon Hwang wrote:
> Uprobe programs that modify regs require different runtime assumptions
> than those that do not. Mixing !kprobe_write_ctx progs with
> kprobe_write_ctx progs via tail calls could break these assumptions.
>
> To address this, reject the combination of !kprobe_write_ctx progs with
> kprobe_write_ctx progs in bpf_map_owner_matches(), which prevents the
> tail callee from modifying regs unexpectedly.
hi,
could you please give some example where this is actual problem?
I'd expect it's up to user (whoever installs the tailcall map) to
avoid such situations.
thanks,
jirka
>
> Also reject kprobe_write_ctx mismatches during initialization to
> prevent bypassing the above restriction.
>
> Without this check, the above restriction can be bypassed as follows.
>
> struct {
> __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> __uint(max_entries, 1);
> __uint(key_size, 4);
> __uint(value_size, 4);
> } jmp_table SEC(".maps");
>
> SEC("?kprobe")
> int prog_a(struct pt_regs *regs)
> {
> regs->ax = 0;
> bpf_tail_call_static(regs, &jmp_table, 0);
> return 0;
> }
>
> SEC("?kprobe")
> int prog_b(struct pt_regs *regs)
> {
> bpf_tail_call_static(regs, &jmp_table, 0);
> return 0;
> }
>
> The jmp_table is shared between prog_a and prog_b.
>
> * Load prog_a.
> At this point, owner->kprobe_write_ctx=true.
> * Load prog_b.
> At this point, prog_b passes the compatibility check.
> * Add prog_a to jmp_table.
> * Attach prog_b to a kernel function.
>
> When the kernel function runs, prog_a will unexpectedly modify regs.
>
> Fixes: 7384893d970e ("bpf: Allow uprobe program to change context registers")
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> include/linux/bpf.h | 7 ++++---
> kernel/bpf/core.c | 30 +++++++++++++++++++++++++-----
> 2 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 05b34a6355b0..dbafed52b2ba 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -285,9 +285,10 @@ struct bpf_list_node_kern {
> */
> struct bpf_map_owner {
> enum bpf_prog_type type;
> - bool jited;
> - bool xdp_has_frags;
> - bool sleepable;
> + u32 jited:1;
> + u32 xdp_has_frags:1;
> + u32 sleepable:1;
> + u32 kprobe_write_ctx:1;
> u64 storage_cookie[MAX_BPF_CGROUP_STORAGE_TYPE];
> const struct btf_type *attach_func_proto;
> enum bpf_attach_type expected_attach_type;
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b24a613d99f2..121a697d4da5 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2390,6 +2390,7 @@ static void bpf_map_owner_init(struct bpf_map_owner *owner, const struct bpf_pro
> owner->jited = fp->jited;
> owner->xdp_has_frags = aux->xdp_has_frags;
> owner->sleepable = fp->sleepable;
> + owner->kprobe_write_ctx = aux->kprobe_write_ctx;
> owner->expected_attach_type = fp->expected_attach_type;
> owner->attach_func_proto = aux->attach_func_proto;
> for_each_cgroup_storage_type(i)
> @@ -2397,8 +2398,14 @@ static void bpf_map_owner_init(struct bpf_map_owner *owner, const struct bpf_pro
> aux->cgroup_storage[i]->cookie : 0;
> }
>
> +enum bpf_map_owner_match_type {
> + BPF_MAP_OWNER_MATCH_FOR_INIT,
> + BPF_MAP_OWNER_MATCH_FOR_UPDATE,
> +};
> +
> static bool bpf_map_owner_matches(const struct bpf_map *map, const struct bpf_prog *fp,
> - enum bpf_prog_type prog_type)
> + enum bpf_prog_type prog_type,
> + enum bpf_map_owner_match_type match)
> {
> struct bpf_map_owner *owner = map->owner;
> struct bpf_prog_aux *aux = fp->aux;
> @@ -2411,6 +2418,18 @@ static bool bpf_map_owner_matches(const struct bpf_map *map, const struct bpf_pr
> owner->sleepable != fp->sleepable)
> return false;
>
> + switch (match) {
> + case BPF_MAP_OWNER_MATCH_FOR_INIT:
> + if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
> + return false;
> + break;
> +
> + case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
> + if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
> + return false;
> + break;
> + }
> +
> if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY &&
> owner->expected_attach_type != fp->expected_attach_type)
> return false;
> @@ -2436,7 +2455,8 @@ static bool bpf_map_owner_matches(const struct bpf_map *map, const struct bpf_pr
> }
>
> static bool __bpf_prog_map_compatible(struct bpf_map *map,
> - const struct bpf_prog *fp)
> + const struct bpf_prog *fp,
> + enum bpf_map_owner_match_type match)
> {
> enum bpf_prog_type prog_type = resolve_prog_type(fp);
> bool ret = false;
> @@ -2453,7 +2473,7 @@ static bool __bpf_prog_map_compatible(struct bpf_map *map,
> bpf_map_owner_init(map->owner, fp, prog_type);
> ret = true;
> } else {
> - ret = bpf_map_owner_matches(map, fp, prog_type);
> + ret = bpf_map_owner_matches(map, fp, prog_type, match);
> }
> err:
> spin_unlock(&map->owner_lock);
> @@ -2470,7 +2490,7 @@ bool bpf_prog_map_compatible(struct bpf_map *map, const struct bpf_prog *fp)
> if (bpf_prog_is_dev_bound(fp->aux))
> return false;
>
> - return __bpf_prog_map_compatible(map, fp);
> + return __bpf_prog_map_compatible(map, fp, BPF_MAP_OWNER_MATCH_FOR_UPDATE);
> }
>
> static int bpf_check_tail_call(const struct bpf_prog *fp)
> @@ -2485,7 +2505,7 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
> if (!map_type_contains_progs(map))
> continue;
>
> - if (!__bpf_prog_map_compatible(map, fp)) {
> + if (!__bpf_prog_map_compatible(map, fp, BPF_MAP_OWNER_MATCH_FOR_INIT)) {
> ret = -EINVAL;
> goto out;
> }
> --
> 2.52.0
>
On 12/3/26 06:45, Jiri Olsa wrote:
> On Tue, Mar 03, 2026 at 11:06:36PM +0800, Leon Hwang wrote:
>> Uprobe programs that modify regs require different runtime assumptions
>> than those that do not. Mixing !kprobe_write_ctx progs with
>> kprobe_write_ctx progs via tail calls could break these assumptions.
>>
>> To address this, reject the combination of !kprobe_write_ctx progs with
>> kprobe_write_ctx progs in bpf_map_owner_matches(), which prevents the
>> tail callee from modifying regs unexpectedly.
>
> hi,
> could you please give some example where this is actual problem?
> I'd expect it's up to user (whoever installs the tailcall map) to
> avoid such situations.
>
The self test in patch #6 can verify the problem, that
kprobe_write_ctx=true progs can be abused to modify struct pt_regs via
tail calls when tracing kernel functions using kprobe_write_ctx=false prog.
Explain the problem by reusing the test code:
int dummy_run;
u64 data;
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 1);
__uint(key_size, sizeof(__u32));
__uint(value_size, sizeof(__u32));
} prog_array_dummy SEC(".maps");
SEC("?kprobe")
int dummy_kprobe(void *ctx)
{
dummy_run++;
bpf_tail_call_static(ctx, &prog_array_dummy, 0);
return 0;
}
SEC("?kprobe")
int kprobe(struct pt_regs *regs)
{
data = regs->di = 0;
return 0;
}
The "kprobe" prog will be added to "prog_array_dummy" map.
In user space, the "dummy_kprobe" prog will attach to kernel function
"bpf_fentry_test1".
Actually, without this patch, when "bpf_fentry_test1" runs, the arg "a"
will be updated as 0. Thus, bpf_prog_test_run_tracing() returns -EFAULT
instead of 0.
bpf_prog_test_run_tracing()
|-->bpf_fentry_test1()
|-->dummy_kprobe()
|-->kprobe() /* via tail call */
|-->regs->di = 0;
return 1; /* instead of 2 */
return -EFAULT;
Yep, the commit log is not clear to describe this abuse problem. Will
update it.
Thanks,
Leon
On Thu, Mar 12, 2026 at 10:24:24AM +0800, Leon Hwang wrote:
> On 12/3/26 06:45, Jiri Olsa wrote:
> > On Tue, Mar 03, 2026 at 11:06:36PM +0800, Leon Hwang wrote:
> >> Uprobe programs that modify regs require different runtime assumptions
> >> than those that do not. Mixing !kprobe_write_ctx progs with
> >> kprobe_write_ctx progs via tail calls could break these assumptions.
> >>
> >> To address this, reject the combination of !kprobe_write_ctx progs with
> >> kprobe_write_ctx progs in bpf_map_owner_matches(), which prevents the
> >> tail callee from modifying regs unexpectedly.
> >
> > hi,
> > could you please give some example where this is actual problem?
> > I'd expect it's up to user (whoever installs the tailcall map) to
> > avoid such situations.
> >
> The self test in patch #6 can verify the problem, that
> kprobe_write_ctx=true progs can be abused to modify struct pt_regs via
> tail calls when tracing kernel functions using kprobe_write_ctx=false prog.
>
> Explain the problem by reusing the test code:
>
> int dummy_run;
> u64 data;
>
> struct {
> __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> __uint(max_entries, 1);
> __uint(key_size, sizeof(__u32));
> __uint(value_size, sizeof(__u32));
> } prog_array_dummy SEC(".maps");
>
> SEC("?kprobe")
> int dummy_kprobe(void *ctx)
> {
> dummy_run++;
> bpf_tail_call_static(ctx, &prog_array_dummy, 0);
> return 0;
> }
>
> SEC("?kprobe")
> int kprobe(struct pt_regs *regs)
> {
> data = regs->di = 0;
> return 0;
> }
>
> The "kprobe" prog will be added to "prog_array_dummy" map.
>
> In user space, the "dummy_kprobe" prog will attach to kernel function
> "bpf_fentry_test1".
>
> Actually, without this patch, when "bpf_fentry_test1" runs, the arg "a"
> will be updated as 0. Thus, bpf_prog_test_run_tracing() returns -EFAULT
> instead of 0.
>
> bpf_prog_test_run_tracing()
> |-->bpf_fentry_test1()
> |-->dummy_kprobe()
> |-->kprobe() /* via tail call */
> |-->regs->di = 0;
> return 1; /* instead of 2 */
> return -EFAULT;
>
> Yep, the commit log is not clear to describe this abuse problem. Will
> update it.
ah right :-\ ok, I think we need to do the suggested one way check and
that should prevent kprobes having writeable ctx
thanks,
jirka
On 2026/3/12 18:46, Jiri Olsa wrote:
> On Thu, Mar 12, 2026 at 10:24:24AM +0800, Leon Hwang wrote:
>> On 12/3/26 06:45, Jiri Olsa wrote:

[...]

>> Actually, without this patch, when "bpf_fentry_test1" runs, the arg "a"
>> will be updated as 0. Thus, bpf_prog_test_run_tracing() returns -EFAULT
>> instead of 0.
>>
>> bpf_prog_test_run_tracing()
>> |-->bpf_fentry_test1()
>>    |-->dummy_kprobe()
>>       |-->kprobe() /* via tail call */
>>          |-->regs->di = 0;
>>    return 1; /* instead of 2 */
>> return -EFAULT;
>>
>> Yep, the commit log is not clear to describe this abuse problem. Will
>> update it.
>
> ah right :-\ ok, I think we need to do the suggested one way check and
> that should prevent kprobes having writeable ctx
>

Yeah, will apply strict both-ways check. See
https://lore.kernel.org/bpf/8a3c7f14-b3d1-4208-acbd-76e5db6c5320@linux.dev/.

Thanks,
Leon
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
[ ... ]
> + switch (match) {
> + case BPF_MAP_OWNER_MATCH_FOR_INIT:
> + if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
> + return false;
> + break;
> +
> + case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
> + if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
> + return false;
> + break;
> + }
In the v1 review, Alexei Starovoitov asked for strict conformance
both ways rather than one-directional validation:
"Don't be fancy. Require strict conformance both ways
in *all* patches."
https://lore.kernel.org/bpf/CAADnVQ+-V0-1i8_+CYpK7G0CnV-2n8e9Szv43yM3Az197eL_0A@mail.gmail.com/
The BPF_MAP_OWNER_MATCH_FOR_UPDATE case still uses one-directional
matching, rejecting only !owner->kprobe_write_ctx && aux->kprobe_write_ctx
but allowing the reverse. Was this an intentional design choice, and
if so, should the changelog note the disagreement?
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22630000544
On Tue, 3 Mar 2026 at 17:04, <bot+bpf-ci@kernel.org> wrote:
>
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
>
> [ ... ]
>
> > + switch (match) {
> > + case BPF_MAP_OWNER_MATCH_FOR_INIT:
> > + if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
> > + return false;
> > + break;
> > +
> > + case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
> > + if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
> > + return false;
> > + break;
> > + }
>
> In the v1 review, Alexei Starovoitov asked for strict conformance
> both ways rather than one-directional validation:
>
> "Don't be fancy. Require strict conformance both ways
> in *all* patches."
>
> https://lore.kernel.org/bpf/CAADnVQ+-V0-1i8_+CYpK7G0CnV-2n8e9Szv43yM3Az197eL_0A@mail.gmail.com/
>
> The BPF_MAP_OWNER_MATCH_FOR_UPDATE case still uses one-directional
> matching, rejecting only !owner->kprobe_write_ctx && aux->kprobe_write_ctx
> but allowing the reverse. Was this an intentional design choice, and
> if so, should the changelog note the disagreement?
>
Let's follow the approach Alexei outlined, while the changes look ok
to me, let's remove the one way check and just do
owner->kprobe_write_ctx != aux->kprobe_write_ctx, same for the other
two checks. With this we can also get rid of this INIT vs UPDATE
distinction.
Other than that I think patches are good, please also test both
directions in the selftest in next respin.
>
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
>
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/22630000544
On 11/3/26 01:23, Kumar Kartikeya Dwivedi wrote:
> On Tue, 3 Mar 2026 at 17:04, <bot+bpf-ci@kernel.org> wrote:
>>
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>
>> [ ... ]
>>
>>> + switch (match) {
>>> + case BPF_MAP_OWNER_MATCH_FOR_INIT:
>>> + if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
>>> + return false;
>>> + break;
>>> +
>>> + case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
>>> + if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
>>> + return false;
>>> + break;
>>> + }
>>
>> In the v1 review, Alexei Starovoitov asked for strict conformance
>> both ways rather than one-directional validation:
>>
>> "Don't be fancy. Require strict conformance both ways
>> in *all* patches."
>>
>> https://lore.kernel.org/bpf/CAADnVQ+-V0-1i8_+CYpK7G0CnV-2n8e9Szv43yM3Az197eL_0A@mail.gmail.com/
>>
>> The BPF_MAP_OWNER_MATCH_FOR_UPDATE case still uses one-directional
>> matching, rejecting only !owner->kprobe_write_ctx && aux->kprobe_write_ctx
>> but allowing the reverse. Was this an intentional design choice, and
>> if so, should the changelog note the disagreement?
>>
>
> Let's follow the approach Alexei outlined, while the changes look ok
> to me, let's remove the one way check and just do
> owner->kprobe_write_ctx != aux->kprobe_write_ctx, same for the other
> two checks. With this we can also get rid of this INIT vs UPDATE
> distinction.
>
> Other than that I think patches are good, please also test both
> directions in the selftest in next respin.
>
Hi Kumar,
Thanks for your review.
I agreed that both-ways check could simplify the code. But, the UX could
not be great.
Let me explain it with an example.
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 1);
__uint(key_size, sizeof(__u32));
__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");
SEC("?kprobe")
int prog_a(struct pt_regs *regs)
{
regs->ax = 0;
bpf_tail_call_static(regs, &jmp_table, 0);
return 0;
}
SEC("?kprobe")
int prog_b(struct pt_regs *regs)
{
return 0;
}
prog_a is kprobe_write_ctx.
prog_b is !kprobe_write_ctx. And, it will be added to jmp_table.
With both-ways check, prog_b is required to modify regs. I think it's
too restrictive for users to use prog_b as tail callee.
With one-way-check of this patch, prog_b can be used as tail callee to
keep regs as-is.
This was what I was concerned about, and the reason why I did not follow
Alexei's approach.
Thanks,
Leon
On Wed, 11 Mar 2026 at 07:08, Leon Hwang <leon.hwang@linux.dev> wrote:
>
> On 11/3/26 01:23, Kumar Kartikeya Dwivedi wrote:
> > On Tue, 3 Mar 2026 at 17:04, <bot+bpf-ci@kernel.org> wrote:
> >>
> >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> >>> --- a/kernel/bpf/core.c
> >>> +++ b/kernel/bpf/core.c
> >>
> >> [ ... ]
> >>
> >>> + switch (match) {
> >>> + case BPF_MAP_OWNER_MATCH_FOR_INIT:
> >>> + if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
> >>> + return false;
> >>> + break;
> >>> +
> >>> + case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
> >>> + if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
> >>> + return false;
> >>> + break;
> >>> + }
> >>
> >> In the v1 review, Alexei Starovoitov asked for strict conformance
> >> both ways rather than one-directional validation:
> >>
> >> "Don't be fancy. Require strict conformance both ways
> >> in *all* patches."
> >>
> >> https://lore.kernel.org/bpf/CAADnVQ+-V0-1i8_+CYpK7G0CnV-2n8e9Szv43yM3Az197eL_0A@mail.gmail.com/
> >>
> >> The BPF_MAP_OWNER_MATCH_FOR_UPDATE case still uses one-directional
> >> matching, rejecting only !owner->kprobe_write_ctx && aux->kprobe_write_ctx
> >> but allowing the reverse. Was this an intentional design choice, and
> >> if so, should the changelog note the disagreement?
> >>
> >
> > Let's follow the approach Alexei outlined, while the changes look ok
> > to me, let's remove the one way check and just do
> > owner->kprobe_write_ctx != aux->kprobe_write_ctx, same for the other
> > two checks. With this we can also get rid of this INIT vs UPDATE
> > distinction.
> >
> > Other than that I think patches are good, please also test both
> > directions in the selftest in next respin.
> >
>
> Hi Kumar,
>
> Thanks for your review.
>
> I agreed that both-ways check could simplify the code. But, the UX could
> not be great.
>
> Let me explain it with an example.
>
> struct {
> __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> __uint(max_entries, 1);
> __uint(key_size, sizeof(__u32));
> __uint(value_size, sizeof(__u32));
> } jmp_table SEC(".maps");
>
> SEC("?kprobe")
> int prog_a(struct pt_regs *regs)
> {
> regs->ax = 0;
> bpf_tail_call_static(regs, &jmp_table, 0);
> return 0;
> }
>
> SEC("?kprobe")
> int prog_b(struct pt_regs *regs)
> {
> return 0;
> }
>
> prog_a is kprobe_write_ctx.
> prog_b is !kprobe_write_ctx. And, it will be added to jmp_table.
>
> With both-ways check, prog_b is required to modify regs. I think it's
> too restrictive for users to use prog_b as tail callee.
>
> With one-way-check of this patch, prog_b can be used as tail callee to
> keep regs as-is.
>
> This was what I was concerned about, and the reason why I did not follow
> Alexei's approach.
I agree but the main question is whether such a case is realistic, are
we going to have write_ctx programs tail calling this way?
Tail calls are already pretty rare, thinking more about it extension
programs are probably also broken wrt checks in this set.
bpf_check_attach_target is doing none of these things for
prog_extension = true. Nobody reported a problem, so I doubt anyone is
hitting this.
It probably also needs to be fixed.
Since you noticed it, we should close the gap conservatively for now,
and wait for a real use case to pop up before enabling this one-way.
>
> Thanks,
> Leon
>
On Wed, Mar 11, 2026 at 2:22 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Wed, 11 Mar 2026 at 07:08, Leon Hwang <leon.hwang@linux.dev> wrote:
> >
> > On 11/3/26 01:23, Kumar Kartikeya Dwivedi wrote:
> > > On Tue, 3 Mar 2026 at 17:04, <bot+bpf-ci@kernel.org> wrote:
> > >>
> > >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > >>> --- a/kernel/bpf/core.c
> > >>> +++ b/kernel/bpf/core.c
> > >>
> > >> [ ... ]
> > >>
> > >>> + switch (match) {
> > >>> + case BPF_MAP_OWNER_MATCH_FOR_INIT:
> > >>> + if (owner->kprobe_write_ctx != aux->kprobe_write_ctx)
> > >>> + return false;
> > >>> + break;
> > >>> +
> > >>> + case BPF_MAP_OWNER_MATCH_FOR_UPDATE:
> > >>> + if (!owner->kprobe_write_ctx && aux->kprobe_write_ctx)
> > >>> + return false;
> > >>> + break;
> > >>> + }
> > >>
> > >> In the v1 review, Alexei Starovoitov asked for strict conformance
> > >> both ways rather than one-directional validation:
> > >>
> > >> "Don't be fancy. Require strict conformance both ways
> > >> in *all* patches."
> > >>
> > >> https://lore.kernel.org/bpf/CAADnVQ+-V0-1i8_+CYpK7G0CnV-2n8e9Szv43yM3Az197eL_0A@mail.gmail.com/
> > >>
> > >> The BPF_MAP_OWNER_MATCH_FOR_UPDATE case still uses one-directional
> > >> matching, rejecting only !owner->kprobe_write_ctx && aux->kprobe_write_ctx
> > >> but allowing the reverse. Was this an intentional design choice, and
> > >> if so, should the changelog note the disagreement?
> > >>
> > >
> > > Let's follow the approach Alexei outlined, while the changes look ok
> > > to me, let's remove the one way check and just do
> > > owner->kprobe_write_ctx != aux->kprobe_write_ctx, same for the other
> > > two checks. With this we can also get rid of this INIT vs UPDATE
> > > distinction.
> > >
> > > Other than that I think patches are good, please also test both
> > > directions in the selftest in next respin.
> > >
> >
> > Hi Kumar,
> >
> > Thanks for your review.
> >
> > I agreed that both-ways check could simplify the code. But, the UX could
> > not be great.
> >
> > Let me explain it with an example.
> >
> > struct {
> > __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
> > __uint(max_entries, 1);
> > __uint(key_size, sizeof(__u32));
> > __uint(value_size, sizeof(__u32));
> > } jmp_table SEC(".maps");
> >
> > SEC("?kprobe")
> > int prog_a(struct pt_regs *regs)
> > {
> > regs->ax = 0;
> > bpf_tail_call_static(regs, &jmp_table, 0);
> > return 0;
> > }
> >
> > SEC("?kprobe")
> > int prog_b(struct pt_regs *regs)
> > {
> > return 0;
> > }
> >
> > prog_a is kprobe_write_ctx.
> > prog_b is !kprobe_write_ctx. And, it will be added to jmp_table.
> >
> > With both-ways check, prog_b is required to modify regs. I think it's
> > too restrictive for users to use prog_b as tail callee.
> >
> > With one-way-check of this patch, prog_b can be used as tail callee to
> > keep regs as-is.
> >
> > This was what I was concerned about, and the reason why I did not follow
> > Alexei's approach.
>
> I agree but the main question is whether such a case is realistic, are
> we going to have write_ctx programs tail calling this way?
> Tail calls are already pretty rare, thinking more about it extension
> programs are probably also broken wrt checks in this set.
> bpf_check_attach_target is doing none of these things for
> prog_extension = true. Nobody reported a problem, so I doubt anyone is
> hitting this.
> It probably also needs to be fixed.
> Since you noticed it, we should close the gap conservatively for now,
> and wait for a real use case to pop up before enabling this one-way.
+1
tail_calls in general hopefully will be deprecated soon.
As soon as we have support for indirect calls there won't be any reason
for tail calls to exist. (other than not breaking current users).
We definitely don't want to open it up further.
So the simplest fix.
On 2026/3/11 23:44, Alexei Starovoitov wrote:
> On Wed, Mar 11, 2026 at 2:22 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
>>

[...]

>>
>> I agree but the main question is whether such a case is realistic, are
>> we going to have write_ctx programs tail calling this way?
>> Tail calls are already pretty rare, thinking more about it extension
>> programs are probably also broken wrt checks in this set.
>> bpf_check_attach_target is doing none of these things for
>> prog_extension = true. Nobody reported a problem, so I doubt anyone is
>> hitting this.
>> It probably also needs to be fixed.
>> Since you noticed it, we should close the gap conservatively for now,
>> and wait for a real use case to pop up before enabling this one-way.
>
> +1
> tail_calls in general hopefully will be deprecated soon.
> As soon as we have support for indirect calls there won't be any reason
> for tail calls to exist. (other than not breaking current users).
> We definitely don't want to open it up further.
> So the simplest fix.

Got it. Will follow the both-ways check approach in the next revision.

Will apply the conservative check to extension programs using another
patch series, after verifying the potential issues for them.

Thanks,
Leon
© 2016 - 2026 Red Hat, Inc.