The "data" pointer in struct bpf_session_run_ctx is always 8-byte aligned.
Therefore, we can store "is_return" in the lowest bit of "data", which
keeps bpf_session_run_ctx 8-byte aligned and saves memory.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
kernel/trace/bpf_trace.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f2360579658e..b6a34aa039f6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2269,7 +2269,6 @@ fs_initcall(bpf_event_init);
struct bpf_session_run_ctx {
struct bpf_run_ctx run_ctx;
- bool is_return;
void *data;
};
@@ -2535,8 +2534,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
{
struct bpf_kprobe_multi_run_ctx run_ctx = {
.session_ctx = {
- .is_return = is_return,
- .data = data,
+ .data = (void *)((unsigned long)data | !!is_return),
},
.link = link,
.entry_ip = entry_ip,
@@ -3058,8 +3056,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
struct bpf_uprobe_multi_link *link = uprobe->link;
struct bpf_uprobe_multi_run_ctx run_ctx = {
.session_ctx = {
- .is_return = is_return,
- .data = data,
+ .data = (void *)((unsigned long)data | !!is_return),
},
.entry_ip = entry_ip,
.uprobe = uprobe,
@@ -3316,7 +3313,7 @@ __bpf_kfunc bool bpf_session_is_return(void)
struct bpf_session_run_ctx *session_ctx;
session_ctx = container_of(current->bpf_ctx, struct bpf_session_run_ctx, run_ctx);
- return session_ctx->is_return;
+ return (unsigned long)session_ctx->data & 1;
}
__bpf_kfunc __u64 *bpf_session_cookie(void)
@@ -3324,7 +3321,7 @@ __bpf_kfunc __u64 *bpf_session_cookie(void)
struct bpf_session_run_ctx *session_ctx;
session_ctx = container_of(current->bpf_ctx, struct bpf_session_run_ctx, run_ctx);
- return session_ctx->data;
+ return (__u64 *)((unsigned long)session_ctx->data & ~1);
}
__bpf_kfunc_end_defs();
--
2.51.0
Hi Menglong,

kernel test robot noticed the following build warnings:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/bpf-remove-is_return-in-struct-bpf_session_run_ctx/20250922-175833
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20250922095705.252519-1-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next] bpf: remove is_return in struct bpf_session_run_ctx
config: i386-randconfig-r113-20250923 (https://download.01.org/0day-ci/archive/20250923/202509231550.BWhLcP9e-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250923/202509231550.BWhLcP9e-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509231550.BWhLcP9e-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   kernel/trace/bpf_trace.c:833:41: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr @@ got void * @@
   kernel/trace/bpf_trace.c:833:41: sparse:     expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr
   kernel/trace/bpf_trace.c:833:41: sparse:     got void *
>> kernel/trace/bpf_trace.c:3059:62: sparse: sparse: dubious: x | !y
   kernel/trace/bpf_trace.c:3512:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3526:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3540:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3547:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3555:52: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c:3563:56: sparse: sparse: cast removes address space '__user' of expression
   kernel/trace/bpf_trace.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
   include/linux/rcupdate.h:869:9: sparse: sparse: context imbalance in 'uprobe_prog_run' - unexpected unlock

vim +3059 kernel/trace/bpf_trace.c

  3050
  3051	static int uprobe_prog_run(struct bpf_uprobe *uprobe,
  3052				   unsigned long entry_ip,
  3053				   struct pt_regs *regs,
  3054				   bool is_return, void *data)
  3055	{
  3056		struct bpf_uprobe_multi_link *link = uprobe->link;
  3057		struct bpf_uprobe_multi_run_ctx run_ctx = {
  3058			.session_ctx = {
> 3059				.data = (void *)((unsigned long)data | !!is_return),
  3060			},
  3061			.entry_ip = entry_ip,
  3062			.uprobe = uprobe,
  3063		};
  3064		struct bpf_prog *prog = link->link.prog;
  3065		bool sleepable = prog->sleepable;
  3066		struct bpf_run_ctx *old_run_ctx;
  3067		int err;
  3068
  3069		if (link->task && !same_thread_group(current, link->task))
  3070			return 0;
  3071
  3072		if (sleepable)
  3073			rcu_read_lock_trace();
  3074		else
  3075			rcu_read_lock();
  3076
  3077		migrate_disable();
  3078
  3079		old_run_ctx = bpf_set_run_ctx(&run_ctx.session_ctx.run_ctx);
  3080		err = bpf_prog_run(link->link.prog, regs);
  3081		bpf_reset_run_ctx(old_run_ctx);
  3082
  3083		migrate_enable();
  3084
  3085		if (sleepable)
  3086			rcu_read_unlock_trace();
  3087		else
  3088			rcu_read_unlock();
  3089		return err;
  3090	}
  3091

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> Therefore, we can store the "is_return" to the last bit of the "data",
> which can make bpf_session_run_ctx 8-bytes aligned and save memory.

Does this really save anything? AFAICT, bpf_session_run_ctx is
only allocated on the stack. Therefore, we don't save any memory
unless there is potential risk of stack overflow.

OTOH, this last-bit logic is confusing and error prone. I would argue
against this type of optimization.

Thanks,
Song
On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
>
> On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > Therefore, we can store the "is_return" to the last bit of the "data",
> > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
>
> Does this really save anything? AFAICT, bpf_session_run_ctx is
> only allocated on the stack. Therefore, we don't save any memory
> unless there is potential risk of stack overflow.

Hi, Song. My original intention is to save the usage of the
stack to prevent potential stack overflow, especially when we
trace all the kernel functions with kprobe-multi.

The most thing for me is that the unaligned field in the struct
looks very awkward, and it consumes 8-bytes only for a bit.

>
> OTOH, this last-bit logic is confusing and error prone. I would argue
> against this type of optimization.

Ah, you are right about this part. It does make the code
more confusing :/

Thanks!
Menglong Dong

>
> Thanks,
> Song
On Mon, Sep 22, 2025 at 7:11 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Mon, Sep 22, 2025 at 10:08 PM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Sep 22, 2025 at 11:57 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >
> > > The "data" in struct bpf_session_run_ctx is always 8-bytes aligned.
> > > Therefore, we can store the "is_return" to the last bit of the "data",
> > > which can make bpf_session_run_ctx 8-bytes aligned and save memory.
> >
> > Does this really save anything? AFAICT, bpf_session_run_ctx is
> > only allocated on the stack. Therefore, we don't save any memory
> > unless there is potential risk of stack overflow.
>
> Hi, Song. My original intention is to save the usage of the
> stack to prevent potential stack overflow,

8 bytes won't matter, but wasting 8 bytes for 1 bit is indeed annoying.

> especially when we
> trace all the kernel functions with kprobe-multi.

What do you mean? kprobe-multi won't recurse, so tracing all or
a few functions is the same concern from stack overflow pov, no ?

> The most thing for me is that the unaligned field in the struct
> looks very awkward, and it consumes 8-bytes only for a bit.

let's keep it as-is. If stack overflow is indeed an issue we need
a generic way to detect it and prevent it.
We've been thinking whether vmap stack guard pages can become
JIT's extable-like things, so when stack overflow happens we unwind
stack and stop bpf prog instead of panicing.