[PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx

Menglong Dong posted 1 patch 1 week, 2 days ago
kernel/trace/bpf_trace.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
[PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
Posted by Menglong Dong 1 week, 2 days ago
The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
Therefore, we can store the "is_return" to the last bit of the "data",
which can make bpf_session_run_ctx 8-bytes aligned and save memory.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
 kernel/trace/bpf_trace.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f2360579658e..b6a34aa039f6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2269,7 +2269,6 @@ fs_initcall(bpf_event_init);
 
 struct bpf_session_run_ctx {
 	struct bpf_run_ctx run_ctx;
-	bool is_return;
 	void *data;
 };
 
@@ -2535,8 +2534,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 {
 	struct bpf_kprobe_multi_run_ctx run_ctx = {
 		.session_ctx = {
-			.is_return = is_return,
-			.data = data,
+			.data = (void *)((unsigned long)data | !!is_return),
 		},
 		.link = link,
 		.entry_ip = entry_ip,
@@ -3058,8 +3056,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
 	struct bpf_uprobe_multi_link *link = uprobe->link;
 	struct bpf_uprobe_multi_run_ctx run_ctx = {
 		.session_ctx = {
-			.is_return = is_return,
-			.data = data,
+			.data = (void *)((unsigned long)data | !!is_return),
 		},
 		.entry_ip = entry_ip,
 		.uprobe = uprobe,
@@ -3316,7 +3313,7 @@ __bpf_kfunc bool bpf_session_is_return(void)
 	struct bpf_session_run_ctx *session_ctx;
 
 	session_ctx = container_of(current->bpf_ctx, struct bpf_session_run_ctx, run_ctx);
-	return session_ctx->is_return;
+	return (unsigned long)session_ctx->data & 1;
 }
 
 __bpf_kfunc __u64 *bpf_session_cookie(void)
@@ -3324,7 +3321,7 @@ __bpf_kfunc __u64 *bpf_session_cookie(void)
 	struct bpf_session_run_ctx *session_ctx;
 
 	session_ctx = container_of(current->bpf_ctx, struct bpf_session_run_ctx, run_ctx);
-	return session_ctx->data;
+	return (__u64 *)((unsigned long)session_ctx->data & ~1);
 }
 
 __bpf_kfunc_end_defs();
-- 
2.51.0
Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
Posted by kernel test robot 1 week, 1 day ago
Hi Menglong,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/bpf-remove-is_return-in-struct-bpf_session_run_ctx/20250922-175833
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250922095705.252519-1-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
config: i386-randconfig-r113-20250923 (https://download.01.org/0day-ci/archive/20250923/202509231550.BWhLcP9e-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250923/202509231550.BWhLcP9e-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509231550.BWhLcP9e-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   kernel/trace/bpf_trace.c:833:41: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr @@     got void * @@
   kernel/trace/bpf_trace.c:833:41: sparse:     expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr
   kernel/trace/bpf_trace.c:833:41: sparse:     got void *
>> kernel/trace/bpf_trace.c:3059:62: sparse: sparse: dubious: x | !y
   kernel/trace/bpf_trace.c:3512:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3526:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3540:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3547:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3555:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3563:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
   include/linux/rcupdate.h:869:9: sparse: sparse: context imbalance in 'uprobe_prog_run' - unexpected unlock

vim +3059 kernel/trace/bpf_trace.c

  3050	
  3051	static int uprobe_prog_run(struct bpf_uprobe *uprobe,
  3052				   unsigned long entry_ip,
  3053				   struct pt_regs *regs,
  3054				   bool is_return, void *data)
  3055	{
  3056		struct bpf_uprobe_multi_link *link = uprobe->link;
  3057		struct bpf_uprobe_multi_run_ctx run_ctx = {
  3058			.session_ctx = {
> 3059				.data = (void *)((unsigned long)data | !!is_return),
  3060			},
  3061			.entry_ip = entry_ip,
  3062			.uprobe = uprobe,
  3063		};
  3064		struct bpf_prog *prog = link->link.prog;
  3065		bool sleepable = prog->sleepable;
  3066		struct bpf_run_ctx *old_run_ctx;
  3067		int err;
  3068	
  3069		if (link->task && !same_thread_group(current, link->task))
  3070			return 0;
  3071	
  3072		if (sleepable)
  3073			rcu_read_lock_trace();
  3074		else
  3075			rcu_read_lock();
  3076	
  3077		migrate_disable();
  3078	
  3079		old_run_ctx = bpf_set_run_ctx(&run_ctx.session_ctx.run_ctx);
  3080		err = bpf_prog_run(link->link.prog, regs);
  3081		bpf_reset_run_ctx(old_run_ctx);
  3082	
  3083		migrate_enable();
  3084	
  3085		if (sleepable)
  3086			rcu_read_unlock_trace();
  3087		else
  3088			rcu_read_unlock();
  3089		return err;
  3090	}
  3091	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
Posted by Song Liu 1 week, 2 days ago
On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> Therefore, we can store the "is_return" to the last bit of the "data",
> which can make bpf_session_run_ctx 8-bytes aligned and save memory.

Does this really save anything? AFAICT, bpf_session_run_ctx is
only allocated on the stack. Therefore, we don't save any memory
unless there is potential risk of stack overflow.

OTOH, this last-bit logic is confusing and error prone. I would argue
against this type of optimization.

Thanks,
Song
Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
Posted by Menglong Dong 1 week, 2 days ago
On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > Therefore, we can store the "is_return" to the last bit of the "data",
> > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
>
> Does this really save anything? AFAICT, bpf_session_run_ctx is
> only allocated on the stack. Therefore, we don't save any memory
> unless there is potential risk of stack overflow.

Hi, Song. My original intention is to save the usage of the
stack to prevent potential stack overflow, especially when we
trace all the kernel functions with kprobe-multi.

The most thing for me is that the unaligned field in the struct
looks very awkward, and it consumes 8-bytes only for a bit.

>
> OTOH, this last-bit logic is confusing and error prone. I would argue
> against this type of optimization.

Ah, you are right about this part. It does make the code more
confusing :/

Thanks!
Menglong Dong

>
> Thanks,
> Song
Re: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
Posted by Alexei Starovoitov 1 week, 1 day ago
On Mon, Sep 22, 2025 at 7:11 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >
> > > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > > Therefore, we can store the "is_return" to the last bit of the "data",
> > > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
> >
> > Does this really save anything? AFAICT, bpf_session_run_ctx is
> > only allocated on the stack. Therefore, we don't save any memory
> > unless there is potential risk of stack overflow.
>
> Hi, Song. My original intention is to save the usage of the
> stack to prevent potential stack overflow,

8 bytes won't matter, but wasting 8 bytes for 1 bit is indeed annoying.

> especially when we
> trace all the kernel functions with kprobe-multi.

What do you mean? kprobe-multi won't recurse,
so tracing all or a few functions is the same concern
from stack overflow pov, no ?

> The most thing for me is that the unaligned field in the struct
> looks very awkward, and it consumes 8-bytes only for a bit.

let's keep it as-is. If stack overflow is indeed an issue we need
a generic way to detect it and prevent it.
We've been thinking whether vmap stack guard pages
can become JIT's extable-like things, so when stack overflow
happens we unwind stack and stop bpf prog instead of panicing.