From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
    linux-perf-users@vger.kernel.org, bpf@vger.kernel.org, Song Liu,
    Howard Chu, Jakub Brnak
Subject: [PATCH 1/5] perf trace: use standard syscall tracepoint structs for augmentation
Date: Thu, 14 Aug 2025 00:17:50 -0700
Message-ID: <20250814071754.193265-2-namhyung@kernel.org>
In-Reply-To: <20250814071754.193265-1-namhyung@kernel.org>
References: <20250814071754.193265-1-namhyung@kernel.org>

From: Jakub Brnak

Replace the custom syscall structs with the standard
trace_event_raw_sys_enter and trace_event_raw_sys_exit from vmlinux.h.

This fixes a data structure misalignment issue discovered on RHEL-9,
which prevented BPF programs from correctly accessing syscall arguments.
This change also aims to improve compatibility between different
versions of the perf tool and the kernel by using CO-RE, so that the
BPF code can correctly adjust field offsets.

Signed-off-by: Jakub Brnak
[ coding style updates and fix a BPF verifier issue ]
Signed-off-by: Namhyung Kim
---
 .../bpf_skel/augmented_raw_syscalls.bpf.c  | 62 ++++++++-----------
 tools/perf/util/bpf_skel/vmlinux/vmlinux.h | 14 +++++
 2 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index cb86e261b4de0685..2c9bcc6b8cb0c06c 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -60,18 +60,6 @@ struct syscalls_sys_exit {
 	__uint(max_entries, 512);
 } syscalls_sys_exit SEC(".maps");
 
-struct syscall_enter_args {
-	unsigned long long common_tp_fields;
-	long syscall_nr;
-	unsigned long args[6];
-};
-
-struct syscall_exit_args {
-	unsigned long long common_tp_fields;
-	long syscall_nr;
-	long ret;
-};
-
 /*
  * Desired design of maximum size and alignment (see RFC2553)
  */
@@ -115,7 +103,7 @@ struct pids_filtered {
 } pids_filtered SEC(".maps");
 
 struct augmented_args_payload {
-	struct syscall_enter_args args;
+	struct trace_event_raw_sys_enter args;
 	struct augmented_arg arg, arg2; // We have to reserve space for two arguments (rename, etc)
 };
 
@@ -135,7 +123,7 @@ struct beauty_map_enter {
 } beauty_map_enter SEC(".maps");
 
 struct beauty_payload_enter {
-	struct syscall_enter_args args;
+	struct trace_event_raw_sys_enter args;
 	struct augmented_arg aug_args[6];
 };
 
@@ -192,7 +180,7 @@ unsigned int augmented_arg__read_str(struct augmented_arg *augmented_arg, const
 }
 
 SEC("tp/raw_syscalls/sys_enter")
-int syscall_unaugmented(struct syscall_enter_args *args)
+int syscall_unaugmented(struct trace_event_raw_sys_enter *args)
 {
 	return 1;
 }
@@ -204,7 +192,7 @@ int syscall_unaugmented(struct syscall_enter_args *args)
  * filename.
  */
 SEC("tp/syscalls/sys_enter_connect")
-int sys_enter_connect(struct syscall_enter_args *args)
+int sys_enter_connect(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *sockaddr_arg = (const void *)args->args[1];
@@ -225,7 +213,7 @@ int sys_enter_connect(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_sendto")
-int sys_enter_sendto(struct syscall_enter_args *args)
+int sys_enter_sendto(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *sockaddr_arg = (const void *)args->args[4];
@@ -243,7 +231,7 @@ int sys_enter_sendto(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_open")
-int sys_enter_open(struct syscall_enter_args *args)
+int sys_enter_open(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *filename_arg = (const void *)args->args[0];
@@ -258,7 +246,7 @@ int sys_enter_open(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_openat")
-int sys_enter_openat(struct syscall_enter_args *args)
+int sys_enter_openat(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *filename_arg = (const void *)args->args[1];
@@ -273,7 +261,7 @@ int sys_enter_openat(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_rename")
-int sys_enter_rename(struct syscall_enter_args *args)
+int sys_enter_rename(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *oldpath_arg = (const void *)args->args[0],
@@ -304,7 +292,7 @@ int sys_enter_rename(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_renameat2")
-int sys_enter_renameat2(struct syscall_enter_args *args)
+int sys_enter_renameat2(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *oldpath_arg = (const void *)args->args[1],
@@ -346,7 +334,7 @@ struct perf_event_attr_size {
 };
 
 SEC("tp/syscalls/sys_enter_perf_event_open")
-int sys_enter_perf_event_open(struct syscall_enter_args *args)
+int sys_enter_perf_event_open(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const struct perf_event_attr_size *attr = (const struct perf_event_attr_size *)args->args[0], *attr_read;
@@ -378,7 +366,7 @@ int sys_enter_perf_event_open(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_clock_nanosleep")
-int sys_enter_clock_nanosleep(struct syscall_enter_args *args)
+int sys_enter_clock_nanosleep(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *rqtp_arg = (const void *)args->args[2];
@@ -399,7 +387,7 @@ int sys_enter_clock_nanosleep(struct syscall_enter_args *args)
 }
 
 SEC("tp/syscalls/sys_enter_nanosleep")
-int sys_enter_nanosleep(struct syscall_enter_args *args)
+int sys_enter_nanosleep(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args = augmented_args_payload();
 	const void *req_arg = (const void *)args->args[0];
@@ -429,7 +417,7 @@ static bool pid_filter__has(struct pids_filtered *pids, pid_t pid)
 	return bpf_map_lookup_elem(pids, &pid) != NULL;
 }
 
-static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
+static int augment_sys_enter(void *ctx, struct trace_event_raw_sys_enter *args)
 {
 	bool augmented, do_output = false;
 	int zero = 0, index, value_size = sizeof(struct augmented_arg) - offsetof(struct augmented_arg, value);
@@ -444,7 +432,7 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 		return 1;
 
 	/* use syscall number to get beauty_map entry */
-	nr = (__u32)args->syscall_nr;
+	nr = (__u32)args->id;
 	beauty_map = bpf_map_lookup_elem(&beauty_map_enter, &nr);
 
 	/* set up payload for output */
@@ -454,8 +442,8 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 	if (beauty_map == NULL || payload == NULL)
 		return 1;
 
-	/* copy the sys_enter header, which has the syscall_nr */
-	__builtin_memcpy(&payload->args, args, sizeof(struct syscall_enter_args));
+	/* copy the sys_enter header, which has the id */
+	__builtin_memcpy(&payload->args, args, sizeof(*args));
 
 	/*
 	 * Determine what type of argument and how many bytes to read from user space, using the
@@ -489,9 +477,11 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 			index = -(size + 1);
 			barrier_var(index); // Prevent clang (noticed with v18) from removing the &= 7 trick.
 			index &= 7; // Satisfy the bounds checking with the verifier in some kernels.
-			aug_size = args->args[index] > TRACE_AUG_MAX_BUF ? TRACE_AUG_MAX_BUF : args->args[index];
+			aug_size = args->args[index];
 
 			if (aug_size > 0) {
+				if (aug_size > TRACE_AUG_MAX_BUF)
+					aug_size = TRACE_AUG_MAX_BUF;
 				if (!bpf_probe_read_user(((struct augmented_arg *)payload_offset)->value, aug_size, arg))
 					augmented = true;
 			}
@@ -515,14 +505,14 @@ static int augment_sys_enter(void *ctx, struct syscall_enter_args *args)
 		}
 	}
 
-	if (!do_output || (sizeof(struct syscall_enter_args) + output) > sizeof(struct beauty_payload_enter))
+	if (!do_output || (sizeof(*args) + output) > sizeof(*payload))
 		return 1;
 
-	return augmented__beauty_output(ctx, payload, sizeof(struct syscall_enter_args) + output);
+	return augmented__beauty_output(ctx, payload, sizeof(*args) + output);
 }
 
 SEC("tp/raw_syscalls/sys_enter")
-int sys_enter(struct syscall_enter_args *args)
+int sys_enter(struct trace_event_raw_sys_enter *args)
 {
 	struct augmented_args_payload *augmented_args;
 	/*
@@ -550,16 +540,16 @@ int sys_enter(struct syscall_enter_args *args)
 	 * unaugmented tracepoint payload.
 	 */
 	if (augment_sys_enter(args, &augmented_args->args))
-		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.syscall_nr);
+		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.id);
 
 	// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
 	return 0;
 }
 
 SEC("tp/raw_syscalls/sys_exit")
-int sys_exit(struct syscall_exit_args *args)
+int sys_exit(struct trace_event_raw_sys_exit *args)
 {
-	struct syscall_exit_args exit_args;
+	struct trace_event_raw_sys_exit exit_args;
 
 	if (pid_filter__has(&pids_filtered, getpid()))
 		return 0;
@@ -570,7 +560,7 @@ int sys_exit(struct syscall_exit_args *args)
 	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
 	 * unaugmented tracepoint payload.
 	 */
-	bpf_tail_call(args, &syscalls_sys_exit, exit_args.syscall_nr);
+	bpf_tail_call(args, &syscalls_sys_exit, exit_args.id);
 	/*
 	 * If not found on the PROG_ARRAY syscalls map, then we're filtering it:
 	 */
diff --git a/tools/perf/util/bpf_skel/vmlinux/vmlinux.h b/tools/perf/util/bpf_skel/vmlinux/vmlinux.h
index a59ce912be18cd0f..b8b2347268633cdf 100644
--- a/tools/perf/util/bpf_skel/vmlinux/vmlinux.h
+++ b/tools/perf/util/bpf_skel/vmlinux/vmlinux.h
@@ -212,4 +212,18 @@ struct pglist_data {
 	int nr_zones;
 } __attribute__((preserve_access_index));
 
+struct trace_event_raw_sys_enter {
+	struct trace_entry ent;
+	long int id;
+	long unsigned int args[6];
+	char __data[0];
+} __attribute__((preserve_access_index));
+
+struct trace_event_raw_sys_exit {
+	struct trace_entry ent;
+	long int id;
+	long int ret;
+	char __data[0];
+} __attribute__((preserve_access_index));
+
 #endif // __VMLINUX_H
-- 
2.51.0.rc1.167.g924127e9c0-goog

From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
    linux-perf-users@vger.kernel.org, bpf@vger.kernel.org, Song Liu,
    Howard Chu
Subject: [PATCH 2/5] perf trace: Split unaugmented sys_exit program
Date: Thu, 14 Aug 2025 00:17:51 -0700
Message-ID: <20250814071754.193265-3-namhyung@kernel.org>
In-Reply-To: <20250814071754.193265-1-namhyung@kernel.org>
References: <20250814071754.193265-1-namhyung@kernel.org>

We want to handle the syscall exit path differently, so let's split out
a separate unaugmented BPF program for sys_exit. Currently it does
nothing (same as the sys_enter one).

Signed-off-by: Namhyung Kim
---
 tools/perf/builtin-trace.c                            |  8 +++++---
 tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c |  8 +++++++-
 tools/perf/util/bpf_trace_augment.c                   |  9 +++++++--
 tools/perf/util/trace_augment.h                       | 10 ++++++++--
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index fe737b3ac6e67d3b..1bc912273af2db66 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3770,13 +3770,15 @@ static void trace__init_syscall_bpf_progs(struct trace *trace, int e_machine, in
 static int trace__bpf_prog_sys_enter_fd(struct trace *trace, int e_machine, int id)
 {
 	struct syscall *sc = trace__syscall_info(trace, NULL, e_machine, id);
-	return sc ? bpf_program__fd(sc->bpf_prog.sys_enter) : bpf_program__fd(unaugmented_prog);
+	return sc ? bpf_program__fd(sc->bpf_prog.sys_enter) :
+		    bpf_program__fd(augmented_syscalls__unaugmented_enter());
 }
 
 static int trace__bpf_prog_sys_exit_fd(struct trace *trace, int e_machine, int id)
 {
 	struct syscall *sc = trace__syscall_info(trace, NULL, e_machine, id);
-	return sc ? bpf_program__fd(sc->bpf_prog.sys_exit) : bpf_program__fd(unaugmented_prog);
+	return sc ? bpf_program__fd(sc->bpf_prog.sys_exit) :
+		    bpf_program__fd(augmented_syscalls__unaugmented_exit());
 }
 
 static int trace__bpf_sys_enter_beauty_map(struct trace *trace, int e_machine, int key, unsigned int *beauty_array)
@@ -3977,7 +3979,7 @@ static int trace__init_syscalls_bpf_prog_array_maps(struct trace *trace, int e_m
 	if (augmented_syscalls__get_map_fds(&map_enter_fd, &map_exit_fd, &beauty_map_fd) < 0)
 		return -1;
 
-	unaugmented_prog = augmented_syscalls__unaugmented();
+	unaugmented_prog = augmented_syscalls__unaugmented_enter();
 
 	for (int i = 0, num_idx = syscalltbl__num_idx(e_machine); i < num_idx; ++i) {
 		int prog_fd, key = syscalltbl__id_at_idx(e_machine, i);
diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 2c9bcc6b8cb0c06c..0016deb321fe0d97 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -180,7 +180,13 @@ unsigned int augmented_arg__read_str(struct augmented_arg *augmented_arg, const
 }
 
 SEC("tp/raw_syscalls/sys_enter")
-int syscall_unaugmented(struct trace_event_raw_sys_enter *args)
+int sys_enter_unaugmented(struct trace_event_raw_sys_enter *args)
+{
+	return 1;
+}
+
+SEC("tp/raw_syscalls/sys_exit")
+int sys_exit_unaugmented(struct trace_event_raw_sys_exit *args)
 {
 	return 1;
 }
diff --git a/tools/perf/util/bpf_trace_augment.c b/tools/perf/util/bpf_trace_augment.c
index 56ed17534caa4f3f..f2792ede0249ab89 100644
--- a/tools/perf/util/bpf_trace_augment.c
+++ b/tools/perf/util/bpf_trace_augment.c
@@ -115,9 +115,14 @@ int augmented_syscalls__get_map_fds(int *enter_fd, int *exit_fd, int *beauty_fd)
 	return 0;
 }
 
-struct bpf_program *augmented_syscalls__unaugmented(void)
+struct bpf_program *augmented_syscalls__unaugmented_enter(void)
 {
-	return skel->progs.syscall_unaugmented;
+	return skel->progs.sys_enter_unaugmented;
+}
+
+struct bpf_program *augmented_syscalls__unaugmented_exit(void)
+{
+	return skel->progs.sys_exit_unaugmented;
 }
 
 struct bpf_program *augmented_syscalls__find_by_title(const char *name)
diff --git a/tools/perf/util/trace_augment.h b/tools/perf/util/trace_augment.h
index 4f729bc6775304b4..70b11d3f52906c36 100644
--- a/tools/perf/util/trace_augment.h
+++ b/tools/perf/util/trace_augment.h
@@ -14,7 +14,8 @@ void augmented_syscalls__setup_bpf_output(void);
 int augmented_syscalls__set_filter_pids(unsigned int nr, pid_t *pids);
 int augmented_syscalls__get_map_fds(int *enter_fd, int *exit_fd, int *beauty_fd);
 struct bpf_program *augmented_syscalls__find_by_title(const char *name);
-struct bpf_program *augmented_syscalls__unaugmented(void);
+struct bpf_program *augmented_syscalls__unaugmented_enter(void);
+struct bpf_program *augmented_syscalls__unaugmented_exit(void);
 void augmented_syscalls__cleanup(void);
 
 #else /* !HAVE_BPF_SKEL */
@@ -52,7 +53,12 @@ augmented_syscalls__find_by_title(const char *name __maybe_unused)
 	return NULL;
 }
 
-static inline struct bpf_program *augmented_syscalls__unaugmented(void)
+static inline struct bpf_program *augmented_syscalls__unaugmented_enter(void)
+{
+	return NULL;
+}
+
+static inline struct bpf_program *augmented_syscalls__unaugmented_exit(void)
 {
 	return NULL;
 }
-- 
2.51.0.rc1.167.g924127e9c0-goog

From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
    linux-perf-users@vger.kernel.org, bpf@vger.kernel.org, Song Liu,
    Howard Chu
Subject: [PATCH 3/5] perf trace: Do not return 0 from syscall tracepoint BPF
Date: Thu, 14 Aug 2025 00:17:52 -0700
Message-ID: <20250814071754.193265-4-namhyung@kernel.org>
In-Reply-To: <20250814071754.193265-1-namhyung@kernel.org>
References: <20250814071754.193265-1-namhyung@kernel.org>

Howard reported that returning 0 from the BPF program affected syscall
tracepoint handling globally. What we want is only to drop syscall
output in the current perf session, so we need a different approach.

Currently perf trace uses the bpf-output event for augmented arguments
and the raw_syscalls:sys_{enter,exit} tracepoint events for normal
arguments. But I think we can just use bpf-output in both cases and
drop the tracepoint events.

Then it needs to tell whether bpf-output data is for syscall entry or
exit. Repurpose struct trace_entry.type, which is common to both the
syscall entry and exit tracepoints.

Closes: https://lore.kernel.org/r/20250529065537.529937-1-howardchu95@gmail.com
Suggested-by: Howard Chu
Signed-off-by: Namhyung Kim
---
 tools/perf/builtin-trace.c                    | 119 ++++++++++++++----
 .../bpf_skel/augmented_raw_syscalls.bpf.c     |  37 ++++--
 tools/perf/util/bpf_skel/perf_trace_u.h       |  14 +++
 3 files changed, 133 insertions(+), 37 deletions(-)
 create mode 100644 tools/perf/util/bpf_skel/perf_trace_u.h

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 1bc912273af2db66..e1caa82bc427b68b 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -22,6 +22,7 @@
 #include
 #endif
 #include "util/bpf_map.h"
+#include "util/bpf_skel/perf_trace_u.h"
 #include "util/rlimit.h"
 #include "builtin.h"
 #include "util/cgroup.h"
@@ -535,6 +536,61 @@ static struct evsel *perf_evsel__raw_syscall_newtp(const char *direction, void *
 	return NULL;
 }
 
+static struct syscall_tp sys_enter_tp;
+static struct syscall_tp sys_exit_tp;
+
+static int evsel__init_bpf_output_tp(struct evsel *evsel)
+{
+	struct tep_event *event;
+	struct tep_format_field *field;
+	struct syscall_tp *sc;
+
+	if (evsel == NULL)
+		return 0;
+
+	event = trace_event__tp_format("raw_syscalls", "sys_enter");
+	if (IS_ERR(event))
+		event = trace_event__tp_format("syscalls", "sys_enter");
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	field = tep_find_field(event, "id");
+	if (field == NULL)
+		return -EINVAL;
+
+	tp_field__init_uint(&sys_enter_tp.id, field, evsel->needs_swap);
+	__tp_field__init_ptr(&sys_enter_tp.args, sys_enter_tp.id.offset + sizeof(u64));
+
+	/* ID is at the same offset, use evsel sc for convenience */
+	sc = evsel__syscall_tp(evsel);
+	if (sc == NULL)
+		return -ENOMEM;
+
+	event = trace_event__tp_format("raw_syscalls", "sys_exit");
+	if (IS_ERR(event))
+		event = trace_event__tp_format("syscalls", "sys_exit");
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	field = tep_find_field(event, "id");
+	if (field == NULL)
+		return -EINVAL;
+
+	tp_field__init_uint(&sys_exit_tp.id, field, evsel->needs_swap);
+
+	field = tep_find_field(event, "ret");
+	if (field == NULL)
+		return -EINVAL;
+
+	tp_field__init_uint(&sys_exit_tp.ret, field, evsel->needs_swap);
+
+	/* Save the common part to the evsel sc */
+	BUG_ON(sys_enter_tp.id.offset != sys_exit_tp.id.offset);
+	sc->id = sys_enter_tp.id;
+
+	return 0;
+}
+
 #define perf_evsel__sc_tp_uint(evsel, name, sample) \
 	({ struct syscall_tp *fields = __evsel__syscall_tp(evsel); \
 	   fields->name.integer(&fields->name, sample); })
@@ -2777,7 +2833,10 @@ static int trace__sys_enter(struct trace *trace, struct evsel *evsel,
 
 	trace__fprintf_sample(trace, evsel, sample, thread);
 
-	args = perf_evsel__sc_tp_ptr(evsel, args, sample);
+	if (evsel == trace->syscalls.events.bpf_output)
+		args = sys_enter_tp.args.pointer(&sys_enter_tp.args, sample);
+	else
+		args = perf_evsel__sc_tp_ptr(evsel, args, sample);
 
 	if (ttrace->entry_str == NULL) {
 		ttrace->entry_str = malloc(trace__entry_str_size);
@@ -2797,8 +2856,10 @@ static int trace__sys_enter(struct trace *trace, struct evsel *evsel,
 	 * thinking that the extra 2 u64 args are the augmented filename, so just check
 	 * here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
 	 */
-	if (evsel != trace->syscalls.events.sys_enter)
-		augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);
+	if (evsel == trace->syscalls.events.bpf_output) {
+		augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size,
+							 trace->raw_augmented_syscalls_args_size);
+	}
 	ttrace->entry_time = sample->time;
 	msg = ttrace->entry_str;
 	printed += scnprintf(msg + printed, trace__entry_str_size - printed, "%s(", sc->name);
@@ -2922,7 +2983,10 @@ static int trace__sys_exit(struct trace *trace, struct evsel *evsel,
 
 	trace__fprintf_sample(trace, evsel, sample, thread);
 
-	ret = perf_evsel__sc_tp_uint(evsel, ret, sample);
+	if (evsel == trace->syscalls.events.bpf_output)
+		ret = sys_exit_tp.ret.integer(&sys_exit_tp.ret, sample);
+	else
+		ret = perf_evsel__sc_tp_uint(evsel, ret, sample);
 
 	if (trace->summary)
 		thread__update_stats(thread, ttrace, id, sample, ret, trace);
@@ -3252,6 +3316,17 @@ static int trace__event_handler(struct trace *trace, struct evsel *evsel,
 		}
 	}
 
+	if (evsel == trace->syscalls.events.bpf_output) {
+		short *event_type = sample->raw_data;
+
+		if (*event_type == SYSCALL_TRACE_ENTER)
+			trace__sys_enter(trace, evsel, event, sample);
+		else
+			trace__sys_exit(trace, evsel, event, sample);
+
+		goto printed;
+	}
+
 	trace__printf_interrupted_entry(trace);
 	trace__fprintf_tstamp(trace, sample->time, trace->output);
 
@@ -3261,25 +3336,6 @@ static int trace__event_handler(struct trace *trace, struct evsel *evsel,
 	if (thread)
 		trace__fprintf_comm_tid(trace, thread, trace->output);
 
-	if (evsel == trace->syscalls.events.bpf_output) {
-		int id = perf_evsel__sc_tp_uint(evsel, id, sample);
-		int e_machine = thread ? thread__e_machine(thread, trace->host) : EM_HOST;
-		struct syscall *sc = trace__syscall_info(trace, evsel, e_machine, id);
-
-		if (sc) {
-			fprintf(trace->output, "%s(", sc->name);
-			trace__fprintf_sys_enter(trace, evsel, sample);
-			fputc(')', trace->output);
-			goto newline;
-		}
-
-		/*
-		 * XXX: Not having the associated syscall info or not finding/adding
-		 *	the thread should never happen, but if it does...
-		 *	fall thru and print it as a bpf_output event.
-		 */
-	}
-
 	fprintf(trace->output, "%s(", evsel->name);
 
 	if (evsel__is_bpf_output(evsel)) {
@@ -3299,7 +3355,6 @@ static int trace__event_handler(struct trace *trace, struct evsel *evsel,
 		}
 	}
 
-newline:
 	fprintf(trace->output, ")\n");
 
 	if (callchain_ret > 0)
@@ -3307,6 +3362,7 @@ static int trace__event_handler(struct trace *trace, struct evsel *evsel,
 	else if (callchain_ret < 0)
 		pr_err("Problem processing %s callchain, skipping...\n", evsel__name(evsel));
 
+printed:
 	++trace->nr_events_printed;
 
 	if (evsel->max_events != ULONG_MAX && ++evsel->nr_events_printed == evsel->max_events) {
@@ -4527,7 +4583,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
 
 	trace->multiple_threads = perf_thread_map__pid(evlist->core.threads, 0) == -1 ||
 		perf_thread_map__nr(evlist->core.threads) > 1 ||
-		evlist__first(evlist)->core.attr.inherit;
+		!trace->opts.no_inherit;
 
 	/*
 	 * Now that we already used evsel->core.attr to ask the kernel to setup the
@@ -5552,8 +5608,6 @@ int cmd_trace(int argc, const char **argv)
 		if (err < 0)
 			goto skip_augmentation;
 
-		trace__add_syscall_newtp(&trace);
-
 		err = augmented_syscalls__create_bpf_output(trace.evlist);
 		if (err == 0)
 			trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
@@ -5589,6 +5643,7 @@ int cmd_trace(int argc, const char **argv)
 
 	if (trace.evlist->core.nr_entries > 0) {
 		bool use_btf = false;
+		struct evsel *augmented = trace.syscalls.events.bpf_output;
 
 		evlist__set_default_evsel_handler(trace.evlist, trace__event_handler);
 		if (evlist__set_syscall_tp_fields(trace.evlist, &use_btf)) {
@@ -5598,6 +5653,16 @@ int cmd_trace(int argc, const char **argv)
 
 		if (use_btf)
 			trace__load_vmlinux_btf(&trace);
+
+		if (augmented) {
+			if (evsel__init_bpf_output_tp(augmented) < 0) {
+				perror("failed to initialize bpf output fields\n");
+				goto out;
+			}
+			trace.raw_augmented_syscalls_args_size = sys_enter_tp.id.offset;
+			trace.raw_augmented_syscalls_args_size += (6 + 1) * sizeof(long);
+			trace.raw_augmented_syscalls = true;
+		}
 	}
 
 	/*
diff --git a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
index 0016deb321fe0d97..979d60d7dce6565b 100644
--- a/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
+++ b/tools/perf/util/bpf_skel/augmented_raw_syscalls.bpf.c
@@ -7,6 +7,7 @@
  */
 
 #include "vmlinux.h"
+#include "perf_trace_u.h"
 
 #include
 #include
@@ -140,7 +141,7 @@ static inline struct augmented_args_payload *augmented_args_payload(void)
 	return bpf_map_lookup_elem(&augmented_args_tmp, &key);
 }
 
-static inline int augmented__output(void *ctx, struct augmented_args_payload *args, int len)
+static inline int augmented__output(void *ctx, void *args, int len)
 {
 	/* If perf_event_output fails, return non-zero so that it gets recorded unaugmented */
 	return bpf_perf_event_output(ctx, &__augmented_syscalls__, BPF_F_CURRENT_CPU, args, len);
@@ -182,12 +183,20 @@ unsigned int augmented_arg__read_str(struct augmented_arg *augmented_arg, const
 SEC("tp/raw_syscalls/sys_enter")
 int sys_enter_unaugmented(struct trace_event_raw_sys_enter *args)
 {
+	struct augmented_args_payload *augmented_args = augmented_args_payload();
+
+	if (augmented_args)
+		augmented__output(args, &augmented_args->args, sizeof(*args));
 	return 1;
 }
 
 SEC("tp/raw_syscalls/sys_exit")
 int sys_exit_unaugmented(struct trace_event_raw_sys_exit *args)
 {
+	struct augmented_args_payload *augmented_args = augmented_args_payload();
+
+	if (augmented_args)
+		augmented__output(args, &augmented_args->args, sizeof(*args));
 	return 1;
 }
 
@@ -450,6 +459,7 @@ static int augment_sys_enter(void *ctx, struct trace_event_raw_sys_enter *args)
 
 	/* copy the sys_enter header, which has the id */
 	__builtin_memcpy(&payload->args, args, sizeof(*args));
+	payload->args.ent.type = SYSCALL_TRACE_ENTER;
 
 	/*
 	 * Determine what type of argument and how many bytes to read from user space, using the
@@ -532,13 +542,14 @@ int sys_enter(struct trace_event_raw_sys_enter *args)
 	 */
 
 	if (pid_filter__has(&pids_filtered, getpid()))
-		return 0;
+		return 1;
 
 	augmented_args = augmented_args_payload();
 	if (augmented_args == NULL)
 		return 1;
 
 	bpf_probe_read_kernel(&augmented_args->args, sizeof(augmented_args->args), args);
+	augmented_args->args.ent.type = SYSCALL_TRACE_ENTER;
 
 	/*
 	 * Jump to syscall specific augmenter, even if the default one,
@@ -548,29 +559,35 @@ int sys_enter(struct trace_event_raw_sys_enter *args)
 	if (augment_sys_enter(args, &augmented_args->args))
 		bpf_tail_call(args, &syscalls_sys_enter, augmented_args->args.id);
 
-	// If not found on the PROG_ARRAY syscalls map, then we're filtering it:
-	return 0;
+	return 1;
 }
 
 SEC("tp/raw_syscalls/sys_exit")
 int sys_exit(struct trace_event_raw_sys_exit *args)
 {
-	struct trace_event_raw_sys_exit exit_args;
+	struct augmented_args_payload *augmented_args;
 
 	if (pid_filter__has(&pids_filtered, getpid()))
-		return 0;
+		return 1;
+
+	augmented_args = augmented_args_payload();
+	if (augmented_args == NULL)
+		return 1;
+
+	bpf_probe_read_kernel(&augmented_args->args, sizeof(*args), args);
+	augmented_args->args.ent.type = SYSCALL_TRACE_EXIT;
 
-	bpf_probe_read_kernel(&exit_args, sizeof(exit_args), args);
 	/*
 	 * Jump to syscall specific return augmenter, even if the default one,
 	 * "!raw_syscalls:unaugmented" that will just return 1 to return the
 	 * unaugmented tracepoint payload.
 	 */
-	bpf_tail_call(args, &syscalls_sys_exit, exit_args.id);
+	bpf_tail_call(args, &syscalls_sys_exit, args->id);
 	/*
-	 * If not found on the PROG_ARRAY syscalls map, then we're filtering it:
+	 * If not found on the PROG_ARRAY syscalls map, then we're filtering it
+	 * by not emitting bpf-output event.
 	 */
-	return 0;
+	return 1;
 }
 
 char _license[] SEC("license") = "GPL";
diff --git a/tools/perf/util/bpf_skel/perf_trace_u.h b/tools/perf/util/bpf_skel/perf_trace_u.h
new file mode 100644
index 0000000000000000..5b41afa734331d89
--- /dev/null
+++ b/tools/perf/util/bpf_skel/perf_trace_u.h
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (c) 2025 Google
+
+// This file will be shared between BPF and userspace.
+
+#ifndef __PERF_TRACE_U_H
+#define __PERF_TRACE_U_H
+
+enum syscall_trace_type {
+	SYSCALL_TRACE_ENTER = 0,
+	SYSCALL_TRACE_EXIT,
+};
+
+#endif /* __PERF_TRACE_U_H */
-- 
2.51.0.rc1.167.g924127e9c0-goog

From nobody Sat Oct 4 17:30:18 2025
From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users@vger.kernel.org, bpf@vger.kernel.org, Song Liu, Howard Chu
Subject: [PATCH 4/5] perf trace: Remove unused code
Date: Thu, 14 Aug 2025 00:17:53 -0700
Message-ID: <20250814071754.193265-5-namhyung@kernel.org>
In-Reply-To: <20250814071754.193265-1-namhyung@kernel.org>
References: <20250814071754.193265-1-namhyung@kernel.org>

Now that syscall init for augmented arguments is simplified, let's get rid
of the dead code.
Signed-off-by: Namhyung Kim
---
 tools/perf/builtin-trace.c | 110 -------------------------------------
 1 file changed, 110 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e1caa82bc427b68b..a7a49d8997d55594 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -470,38 +470,6 @@ static int evsel__init_syscall_tp(struct evsel *evsel)
 	return -ENOMEM;
 }
 
-static int evsel__init_augmented_syscall_tp(struct evsel *evsel, struct evsel *tp)
-{
-	struct syscall_tp *sc = evsel__syscall_tp(evsel);
-
-	if (sc != NULL) {
-		struct tep_format_field *syscall_id = evsel__field(tp, "id");
-		if (syscall_id == NULL)
-			syscall_id = evsel__field(tp, "__syscall_nr");
-		if (syscall_id == NULL ||
-		    __tp_field__init_uint(&sc->id, syscall_id->size, syscall_id->offset, evsel->needs_swap))
-			return -EINVAL;
-
-		return 0;
-	}
-
-	return -ENOMEM;
-}
-
-static int evsel__init_augmented_syscall_tp_args(struct evsel *evsel)
-{
-	struct syscall_tp *sc = __evsel__syscall_tp(evsel);
-
-	return __tp_field__init_ptr(&sc->args, sc->id.offset + sizeof(u64));
-}
-
-static int evsel__init_augmented_syscall_tp_ret(struct evsel *evsel)
-{
-	struct syscall_tp *sc = __evsel__syscall_tp(evsel);
-
-	return __tp_field__init_uint(&sc->ret, sizeof(u64), sc->id.offset + sizeof(u64), evsel->needs_swap);
-}
-
 static int evsel__init_raw_syscall_tp(struct evsel *evsel, void *handler)
 {
 	if (evsel__syscall_tp(evsel) != NULL) {
@@ -5506,7 +5474,6 @@ int cmd_trace(int argc, const char **argv)
 	};
 	bool __maybe_unused max_stack_user_set = true;
 	bool mmap_pages_user_set = true;
-	struct evsel *evsel;
 	const char * const trace_subcommands[] = { "record", NULL };
 	int err = -1;
 	char bf[BUFSIZ];
@@ -5665,83 +5632,6 @@ int cmd_trace(int argc, const char **argv)
 		}
 	}
 
-	/*
-	 * If we are augmenting syscalls, then combine what we put in the
-	 * __augmented_syscalls__ BPF map with what is in the
-	 * syscalls:sys_exit_FOO tracepoints, i.e. just like we do without BPF,
-	 * combining raw_syscalls:sys_enter with raw_syscalls:sys_exit.
-	 *
-	 * We'll switch to look at two BPF maps, one for sys_enter and the
-	 * other for sys_exit when we start augmenting the sys_exit paths with
-	 * buffers that are being copied from kernel to userspace, think 'read'
-	 * syscall.
-	 */
-	if (trace.syscalls.events.bpf_output) {
-		evlist__for_each_entry(trace.evlist, evsel) {
-			bool raw_syscalls_sys_exit = evsel__name_is(evsel, "raw_syscalls:sys_exit");
-
-			if (raw_syscalls_sys_exit) {
-				trace.raw_augmented_syscalls = true;
-				goto init_augmented_syscall_tp;
-			}
-
-			if (trace.syscalls.events.bpf_output->priv == NULL &&
-			    strstr(evsel__name(evsel), "syscalls:sys_enter")) {
-				struct evsel *augmented = trace.syscalls.events.bpf_output;
-				if (evsel__init_augmented_syscall_tp(augmented, evsel) ||
-				    evsel__init_augmented_syscall_tp_args(augmented))
-					goto out;
-				/*
-				 * Augmented is __augmented_syscalls__ BPF_OUTPUT event
-				 * Above we made sure we can get from the payload the tp fields
-				 * that we get from syscalls:sys_enter tracefs format file.
-				 */
-				augmented->handler = trace__sys_enter;
-				/*
-				 * Now we do the same for the *syscalls:sys_enter event so that
-				 * if we handle it directly, i.e. if the BPF prog returns 0 so
-				 * as not to filter it, then we'll handle it just like we would
-				 * for the BPF_OUTPUT one:
-				 */
-				if (evsel__init_augmented_syscall_tp(evsel, evsel) ||
-				    evsel__init_augmented_syscall_tp_args(evsel))
-					goto out;
-				evsel->handler = trace__sys_enter;
-			}
-
-			if (strstarts(evsel__name(evsel), "syscalls:sys_exit_")) {
-				struct syscall_tp *sc;
-init_augmented_syscall_tp:
-				if (evsel__init_augmented_syscall_tp(evsel, evsel))
-					goto out;
-				sc = __evsel__syscall_tp(evsel);
-				/*
-				 * For now with BPF raw_augmented we hook into
-				 * raw_syscalls:sys_enter and there we get all
-				 * 6 syscall args plus the tracepoint common
-				 * fields and the syscall_nr (another long).
-				 * So we check if that is the case and if so
-				 * don't look after the sc->args_size but
-				 * always after the full raw_syscalls:sys_enter
-				 * payload, which is fixed.
-				 *
-				 * We'll revisit this later to pass
-				 * s->args_size to the BPF augmenter (now
-				 * tools/perf/examples/bpf/augmented_raw_syscalls.c,
-				 * so that it copies only what we need for each
-				 * syscall, like what happens when we use
-				 * syscalls:sys_enter_NAME, so that we reduce
-				 * the kernel/userspace traffic to just what is
-				 * needed for each syscall.
-				 */
-				if (trace.raw_augmented_syscalls)
-					trace.raw_augmented_syscalls_args_size = (6 + 1) * sizeof(long) + sc->id.offset;
-				evsel__init_augmented_syscall_tp_ret(evsel);
-				evsel->handler = trace__sys_exit;
-			}
-		}
-	}
-
 	if ((argc >= 1) && (strcmp(argv[0], "record") == 0)) {
 		err = trace__record(&trace, argc-1, &argv[1]);
 		goto out;
-- 
2.51.0.rc1.167.g924127e9c0-goog

From nobody Sat Oct 4 17:30:18 2025
From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users@vger.kernel.org, bpf@vger.kernel.org, Song Liu, Howard Chu
Subject: [PATCH 5/5] perf test: Remove exclusive tag from perf trace tests
Date: Thu, 14 Aug 2025 00:17:54 -0700
Message-ID: <20250814071754.193265-6-namhyung@kernel.org>
In-Reply-To: <20250814071754.193265-1-namhyung@kernel.org>
References: <20250814071754.193265-1-namhyung@kernel.org>

Now it's safe to run multiple perf trace commands at the same time.
Let's make them non-exclusive so that they can run in parallel.

  $ sudo perf test 'perf trace'
  113: Check open filename arg using perf trace + vfs_getname : Skip
  114: perf trace enum augmentation tests                     : Ok
  115: perf trace BTF general tests                           : Ok
  116: perf trace exit race                                   : Ok
  117: perf trace record and replay                           : Ok
  118: perf trace summary                                     : Ok

Signed-off-by: Namhyung Kim
---
 tools/perf/tests/shell/trace+probe_vfs_getname.sh | 2 +-
 tools/perf/tests/shell/trace_summary.sh           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/tests/shell/trace+probe_vfs_getname.sh b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
index 7a0b1145d0cd744b..ff7c2f8d41db5802 100755
--- a/tools/perf/tests/shell/trace+probe_vfs_getname.sh
+++ b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Check open filename arg using perf trace + vfs_getname (exclusive)
+# Check open filename arg using perf trace + vfs_getname
 
 # Uses the 'perf test shell' library to add probe:vfs_getname to the system
 # then use it with 'perf trace' using 'touch' to write to a temp file, then
diff --git a/tools/perf/tests/shell/trace_summary.sh b/tools/perf/tests/shell/trace_summary.sh
index 22e2651d59191676..1a99a125492955ad 100755
--- a/tools/perf/tests/shell/trace_summary.sh
+++ b/tools/perf/tests/shell/trace_summary.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# perf trace summary (exclusive)
+# perf trace summary
 # SPDX-License-Identifier: GPL-2.0
 
 # Check that perf trace works with various summary mode
-- 
2.51.0.rc1.167.g924127e9c0-goog
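For illustration only: in [PATCH 1/5] the BPF program tags every bpf-output record by overwriting the common ent.type field with a value from enum syscall_trace_type, and cmd_trace() sizes the raw sys_enter payload as the offset of id plus (6 + 1) longs, i.e. the syscall id followed by six arguments. The C sketch below models that layout in userspace. It is not perf's code: the *_mirror structs, raw_args_size() and classify_sample() are hypothetical names, and the struct layouts are assumed copies of trace_entry, trace_event_raw_sys_enter and trace_event_raw_sys_exit rather than types resolved through BTF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace mirrors of the tracepoint records. The real types
 * come from vmlinux.h/BTF; these copies ASSUME the common header is
 * {type, flags, preempt_count, pid}, so the offsets below are only what
 * this compiler lays out for them.
 */
struct trace_entry_mirror {
	unsigned short type;		/* BPF side overwrites this with enum syscall_trace_type */
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
};

struct sys_enter_mirror {
	struct trace_entry_mirror ent;
	long id;
	unsigned long args[6];
};

struct sys_exit_mirror {
	struct trace_entry_mirror ent;
	long id;
	long ret;
};

/* Same values as the enum added in perf_trace_u.h. */
enum syscall_trace_type {
	SYSCALL_TRACE_ENTER = 0,
	SYSCALL_TRACE_EXIT,
};

/* Offset of 'id' plus the id itself and six arguments, as in cmd_trace(). */
size_t raw_args_size(void)
{
	return offsetof(struct sys_enter_mirror, id) + (6 + 1) * sizeof(long);
}

/*
 * Hypothetical consumer: choose how to decode one bpf-output sample from
 * the type field that the BPF program stored in the common header.
 */
const char *classify_sample(const void *data, size_t len)
{
	struct trace_entry_mirror ent;

	if (len < sizeof(ent))
		return "short";
	memcpy(&ent, data, sizeof(ent));	/* copy out to avoid unaligned reads */

	switch (ent.type) {
	case SYSCALL_TRACE_ENTER:
		return len >= sizeof(struct sys_enter_mirror) ? "enter" : "short";
	case SYSCALL_TRACE_EXIT:
		return len >= sizeof(struct sys_exit_mirror) ? "exit" : "short";
	default:
		return "unknown";
	}
}
```

With these mirrors, raw_args_size() equals sizeof(struct sys_enter_mirror) on both LP64 and ILP32 targets, since the payload ends right after args[5]. In the patch itself the id offset comes from sys_enter_tp.id.offset at run time rather than from a compile-time mirror like this one.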