From nobody Thu Nov 28 06:35:01 2024
From: Mathieu Desnoyers
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Peter Zijlstra,
 Alexei Starovoitov, Yonghong Song, "Paul E. McKenney", Ingo Molnar,
 Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Namhyung Kim,
 Andrii Nakryiko, bpf@vger.kernel.org, Joel Fernandes,
 linux-trace-kernel@vger.kernel.org, Michael Jeanson
Subject: [PATCH v1 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
Date: Thu, 3 Oct 2024 11:16:31 -0400
Message-Id: <20241003151638.1608537-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com>
References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com>

In preparation for allowing system call tracepoints to handle page
faults, introduce TRACE_EVENT_SYSCALL to declare the sys_enter/sys_exit
tracepoints.

Emit the static inlines register_trace_syscall_##name for events
declared with TRACE_EVENT_SYSCALL, allowing source-level validation
that only probes meant to handle system call entry/exit events are
registered to them.

Move the common code between __DECLARE_TRACE and __DECLARE_TRACE_SYSCALL
into __DECLARE_TRACE_COMMON.

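The source-level validation described above can be illustrated with a
small user-space sketch. It is not the kernel implementation: every
demo_* and DEMO_* name is hypothetical, and the tracepoint is reduced to
a single probe slot. It only shows how a dedicated declaration macro
emits register_trace_syscall_<name>() and trace_syscall_<name>() while
the plain register_trace_<name>() is never generated, so code that still
uses the old name fails to compile.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct tracepoint. */
struct demo_tracepoint {
	void (*probe)(void *data, long id);
	void *data;
};

/* Shared part, playing the role of __DECLARE_TRACE_COMMON in this sketch. */
#define DEMO_DECLARE_TRACE_COMMON(name)					\
	static struct demo_tracepoint demo_tp_##name;

/* Regular flavour: emits register_trace_<name>() and trace_<name>(). */
#define DEMO_DECLARE_TRACE(name)					\
	DEMO_DECLARE_TRACE_COMMON(name)					\
	static inline int						\
	register_trace_##name(void (*probe)(void *, long), void *data)	\
	{								\
		demo_tp_##name.probe = probe;				\
		demo_tp_##name.data = data;				\
		return 0;						\
	}								\
	static inline void trace_##name(long id)			\
	{								\
		if (demo_tp_##name.probe)				\
			demo_tp_##name.probe(demo_tp_##name.data, id);	\
	}

/* Syscall flavour: same behaviour, but only *_syscall_* names exist. */
#define DEMO_DECLARE_TRACE_SYSCALL(name)				\
	DEMO_DECLARE_TRACE_COMMON(name)					\
	static inline int						\
	register_trace_syscall_##name(void (*probe)(void *, long),	\
				      void *data)			\
	{								\
		demo_tp_##name.probe = probe;				\
		demo_tp_##name.data = data;				\
		return 0;						\
	}								\
	static inline void trace_syscall_##name(long id)		\
	{								\
		if (demo_tp_##name.probe)				\
			demo_tp_##name.probe(demo_tp_##name.data, id);	\
	}

DEMO_DECLARE_TRACE_SYSCALL(sys_enter)

static long demo_last_id = -1;

/* Probe written with the syscall tracepoint in mind. */
static void demo_probe(void *data, long id)
{
	(void)data;
	demo_last_id = id;
}

/* Registers the probe, fires the tracepoint once, returns what was seen. */
static long demo_run(void)
{
	/*
	 * register_trace_sys_enter() is never generated here, so a caller
	 * that was not updated for the syscall flavour would not compile.
	 */
	register_trace_syscall_sys_enter(demo_probe, NULL);
	trace_syscall_sys_enter(42);
	return demo_last_id;
}
```

Calling demo_run() registers the probe through the syscall-specific
helper and fires the tracepoint once with the value 42.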
This change is not meant to alter the generated code, and only prepares
the following modifications.

Signed-off-by: Mathieu Desnoyers
Cc: Michael Jeanson
Cc: Steven Rostedt
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Cc: Alexei Starovoitov
Cc: Yonghong Song
Cc: Paul E. McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
Changes since v0:
- Fix allnoconfig build by adding __DECLARE_TRACE_SYSCALL define in
  CONFIG_TRACEPOINTS=n case.
- Rename unregister_trace_sys_{enter,exit} to
  unregister_trace_syscall_sys_{enter,exit} for symmetry with register.
- Add emit trace_syscall_##name##_enabled for syscall tracepoints rather
  than trace_##name##_enabled, so it is in sync with the rest of the
  naming.
---
 include/linux/tracepoint.h      | 83 ++++++++++++++++++++++++++++++---
 include/trace/bpf_probe.h       |  3 ++
 include/trace/define_trace.h    |  5 ++
 include/trace/events/syscalls.h |  4 +-
 include/trace/perf.h            |  3 ++
 include/trace/trace_events.h    | 28 +++++++++++
 kernel/entry/common.c           |  4 +-
 kernel/trace/trace_syscalls.c   | 16 +++----
 8 files changed, 127 insertions(+), 19 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 93a9f3070b48..666499b9f3be 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -268,10 +268,17 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  * site if it is not watching, as it will need to be active when the
  * tracepoint is enabled.
  */
-#define __DECLARE_TRACE(name, proto, args, cond, data_proto)	\
+#define __DECLARE_TRACE_COMMON(name, proto, args, cond, data_proto)	\
 	extern int __traceiter_##name(data_proto);	\
 	DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name);	\
 	extern struct tracepoint __tracepoint_##name;	\
+	static inline void	\
+	check_trace_callback_type_##name(void (*cb)(data_proto))	\
+	{	\
+	}	\
+
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto)	\
+	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto))	\
 	static inline void trace_##name(proto)	\
 	{	\
 		if (static_key_false(&__tracepoint_##name.key))	\
@@ -283,8 +290,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 				  "RCU not watching for tracepoint");	\
 		}	\
 	}	\
-	__DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),	\
-			    PARAMS(cond))	\
+	static inline void trace_##name##_rcuidle(proto)	\
+	{	\
+		if (static_key_false(&__tracepoint_##name.key))	\
+			__DO_TRACE(name,	\
+				TP_ARGS(args),	\
+				TP_CONDITION(cond), 1);	\
+	}	\
 	static inline int	\
 	register_trace_##name(void (*probe)(data_proto), void *data)	\
 	{	\
@@ -302,14 +314,42 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 	unregister_trace_##name(void (*probe)(data_proto), void *data)	\
 	{	\
 		return tracepoint_probe_unregister(&__tracepoint_##name,\
-						(void *)probe, data);	\
+						(void *)probe, data);\
 	}	\
-	static inline void	\
-	check_trace_callback_type_##name(void (*cb)(data_proto))	\
+	static inline bool	\
+	trace_##name##_enabled(void)	\
+	{	\
+		return static_key_false(&__tracepoint_##name.key);	\
+	}
+
+
+#define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto)	\
+	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto))	\
+	static inline void trace_syscall_##name(proto)	\
+	{	\
+		if (static_key_false(&__tracepoint_##name.key))	\
+			__DO_TRACE(name,	\
+				TP_ARGS(args),	\
+				TP_CONDITION(cond), 0);	\
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {	\
+			WARN_ONCE(!rcu_is_watching(),	\
+				  "RCU not watching for tracepoint");	\
+		}	\
+	}	\
+	static inline int	\
+	register_trace_syscall_##name(void (*probe)(data_proto), void *data)	\
 	{	\
+		return tracepoint_probe_register(&__tracepoint_##name,	\
+						(void *)probe, data);	\
+	}	\
+	static inline int	\
+	unregister_trace_syscall_##name(void (*probe)(data_proto), void *data)	\
+	{	\
+		return tracepoint_probe_unregister(&__tracepoint_##name,\
+						(void *)probe, data);\
 	}	\
 	static inline bool	\
-	trace_##name##_enabled(void)	\
+	trace_syscall_##name##_enabled(void)	\
 	{	\
 		return static_key_false(&__tracepoint_##name.key);	\
 	}
@@ -398,6 +438,27 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		return false;	\
 	}
 
+#define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto)	\
+	static inline void trace_syscall_##name(proto)	\
+	{ }	\
+	static inline int	\
+	register_trace_syscall_##name(void (*probe)(data_proto),	\
+				void *data)	\
+	{	\
+		return -ENOSYS;	\
+	}	\
+	static inline int	\
+	unregister_trace_syscall_##name(void (*probe)(data_proto),	\
+				void *data)	\
+	{	\
+		return -ENOSYS;	\
+	}	\
+	static inline bool	\
+	trace_syscall_##name##_enabled(void)	\
+	{	\
+		return false;	\
+	}
+
 #define DEFINE_TRACE_FN(name, reg, unreg, proto, args)
 #define DEFINE_TRACE(name, proto, args)
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
@@ -459,6 +520,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		cpu_online(raw_smp_processor_id()) && (PARAMS(cond)),	\
 		PARAMS(void *__data, proto))
 
+#define DECLARE_TRACE_SYSCALL(name, proto, args)	\
+	__DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args),	\
+				cpu_online(raw_smp_processor_id()),	\
+				PARAMS(void *__data, proto))
+
 #define TRACE_EVENT_FLAGS(event, flag)
 
 #define TRACE_EVENT_PERF_PERM(event, expr...)
@@ -596,6 +662,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			      struct, assign, print)	\
 	DECLARE_TRACE_CONDITION(name, PARAMS(proto),	\
 				PARAMS(args), PARAMS(cond))
+#define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign,	\
+			    print, reg, unreg)	\
+	DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
 
 #define TRACE_EVENT_FLAGS(event, flag)
 
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index a2ea11cc912e..c85bbce5aaa5 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -53,6 +53,9 @@ __bpf_trace_##call(void *__data, proto)	\
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
 	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 /*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 00723935dcc7..ff5fa17a6259 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -46,6 +46,10 @@
 		assign, print, reg, unreg)	\
 	DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args))
 
+#undef TRACE_EVENT_SYSCALL
+#define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg)	\
+	DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args))
+
 #undef TRACE_EVENT_NOP
 #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print)
 
@@ -107,6 +111,7 @@
 #undef TRACE_EVENT
 #undef TRACE_EVENT_FN
 #undef TRACE_EVENT_FN_COND
+#undef TRACE_EVENT_SYSCALL
 #undef TRACE_EVENT_CONDITION
 #undef TRACE_EVENT_NOP
 #undef DEFINE_EVENT_NOP
diff --git a/include/trace/events/syscalls.h b/include/trace/events/syscalls.h
index b6e0cbc2c71f..f31ff446b468 100644
--- a/include/trace/events/syscalls.h
+++ b/include/trace/events/syscalls.h
@@ -15,7 +15,7 @@
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 
-TRACE_EVENT_FN(sys_enter,
+TRACE_EVENT_SYSCALL(sys_enter,
 
 	TP_PROTO(struct pt_regs *regs, long id),
 
@@ -41,7 +41,7 @@ TRACE_EVENT_FN(sys_enter,
 
 TRACE_EVENT_FLAGS(sys_enter, TRACE_EVENT_FL_CAP_ANY)
 
-TRACE_EVENT_FN(sys_exit,
+TRACE_EVENT_SYSCALL(sys_exit,
 
 	TP_PROTO(struct pt_regs *regs, long ret),
 
diff --git a/include/trace/perf.h b/include/trace/perf.h
index 2c11181c82e0..ded997af481e 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -55,6 +55,9 @@ perf_trace_##call(void *__data, proto)	\
 				  head, __task);	\
 }
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 /*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index c2f9cabf154d..8bcbb9ee44de 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -45,6 +45,16 @@
 			     PARAMS(print));	\
 	DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
 
+#undef TRACE_EVENT_SYSCALL
+#define TRACE_EVENT_SYSCALL(name, proto, args, tstruct, assign, print, reg, unreg)	\
+	DECLARE_EVENT_SYSCALL_CLASS(name,	\
+			     PARAMS(proto),	\
+			     PARAMS(args),	\
+			     PARAMS(tstruct),	\
+			     PARAMS(assign),	\
+			     PARAMS(print));	\
+	DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
+
 #include "stages/stage1_struct_define.h"
 
 #undef DECLARE_EVENT_CLASS
@@ -57,6 +67,9 @@
 	\
 	static struct trace_event_class event_class_##name;
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)	\
 	static struct trace_event_call __used	\
@@ -117,6 +130,9 @@
 		tstruct;	\
 	};
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)
 
@@ -208,6 +224,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = {	\
 	.trace = trace_raw_output_##call,	\
 };
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, call, proto, args, print)	\
 static notrace enum print_line_t	\
@@ -265,6 +284,9 @@ static inline notrace int trace_event_get_offsets_##call(	\
 	return __data_size;	\
 }
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 /*
@@ -409,6 +431,9 @@ trace_event_raw_event_##call(void *__data, proto)	\
  * fail to compile unless it too is updated.
  */
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, call, proto, args)	\
 static inline void ftrace_test_probe_##call(void)	\
@@ -434,6 +459,9 @@ static struct trace_event_class __used __refdata event_class_##call = {	\
 	_TRACE_PERF_INIT(call)	\
 };
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, call, proto, args)	\
 	\
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 5b6934e23c21..c9ac1c605d8b 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -58,7 +58,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
 	syscall = syscall_get_nr(current, regs);
 
 	if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
-		trace_sys_enter(regs, syscall);
+		trace_syscall_sys_enter(regs, syscall);
 		/*
 		 * Probes or BPF hooks in the tracepoint may have changed the
 		 * system call number as well.
@@ -166,7 +166,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
 	audit_syscall_exit(regs);
 
 	if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
-		trace_sys_exit(regs, syscall_get_return_value(current, regs));
+		trace_syscall_sys_exit(regs, syscall_get_return_value(current, regs));
 
 	step = report_single_step(work);
 	if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 785733245ead..67ac5366f724 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -377,7 +377,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
 	if (!tr->sys_refcount_enter)
-		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
+		ret = register_trace_syscall_sys_enter(ftrace_syscall_enter, tr);
 	if (!ret) {
 		rcu_assign_pointer(tr->enter_syscall_files[num], file);
 		tr->sys_refcount_enter++;
@@ -399,7 +399,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 	tr->sys_refcount_enter--;
 	RCU_INIT_POINTER(tr->enter_syscall_files[num], NULL);
 	if (!tr->sys_refcount_enter)
-		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
+		unregister_trace_syscall_sys_enter(ftrace_syscall_enter, tr);
 	mutex_unlock(&syscall_trace_lock);
 }
 
@@ -415,7 +415,7 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
 		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
 	if (!tr->sys_refcount_exit)
-		ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
+		ret = register_trace_syscall_sys_exit(ftrace_syscall_exit, tr);
 	if (!ret) {
 		rcu_assign_pointer(tr->exit_syscall_files[num], file);
 		tr->sys_refcount_exit++;
@@ -437,7 +437,7 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 	tr->sys_refcount_exit--;
 	RCU_INIT_POINTER(tr->exit_syscall_files[num], NULL);
 	if (!tr->sys_refcount_exit)
-		unregister_trace_sys_exit(ftrace_syscall_exit, tr);
+		unregister_trace_syscall_sys_exit(ftrace_syscall_exit, tr);
 	mutex_unlock(&syscall_trace_lock);
 }
 
@@ -633,7 +633,7 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 
 	mutex_lock(&syscall_trace_lock);
 	if (!sys_perf_refcount_enter)
-		ret = register_trace_sys_enter(perf_syscall_enter, NULL);
+		ret = register_trace_syscall_sys_enter(perf_syscall_enter, NULL);
 	if (ret) {
 		pr_info("event trace: Could not activate syscall entry trace point");
 	} else {
@@ -654,7 +654,7 @@ static void perf_sysenter_disable(struct trace_event_call *call)
 	sys_perf_refcount_enter--;
 	clear_bit(num, enabled_perf_enter_syscalls);
 	if (!sys_perf_refcount_enter)
-		unregister_trace_sys_enter(perf_syscall_enter, NULL);
+		unregister_trace_syscall_sys_enter(perf_syscall_enter, NULL);
 	mutex_unlock(&syscall_trace_lock);
 }
 
@@ -732,7 +732,7 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 
 	mutex_lock(&syscall_trace_lock);
 	if (!sys_perf_refcount_exit)
-		ret = register_trace_sys_exit(perf_syscall_exit, NULL);
+		ret = register_trace_syscall_sys_exit(perf_syscall_exit, NULL);
 	if (ret) {
 		pr_info("event trace: Could not activate syscall exit trace point");
 	} else {
@@ -753,7 +753,7 @@ static void perf_sysexit_disable(struct trace_event_call *call)
 	sys_perf_refcount_exit--;
 	clear_bit(num, enabled_perf_exit_syscalls);
 	if (!sys_perf_refcount_exit)
-		unregister_trace_sys_exit(perf_syscall_exit, NULL);
+		unregister_trace_syscall_sys_exit(perf_syscall_exit, NULL);
 	mutex_unlock(&syscall_trace_lock);
 }
 
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024
From: Mathieu Desnoyers
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Peter Zijlstra,
 Alexei Starovoitov, Yonghong Song, "Paul E. McKenney", Ingo Molnar,
 Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Namhyung Kim,
 Andrii Nakryiko, bpf@vger.kernel.org, Joel Fernandes,
 linux-trace-kernel@vger.kernel.org, Michael Jeanson
Subject: [PATCH v1 2/8] tracing/ftrace: guard syscall probe with preempt_notrace
Date: Thu, 3 Oct 2024 11:16:32 -0400
Message-Id: <20241003151638.1608537-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com>
References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com>

In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that ftrace can handle this change by
explicitly disabling preemption within the ftrace system call tracepoint
probes to respect the current expectations within ftrace ring buffer
code.

This change does not yet allow ftrace to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.

Signed-off-by: Mathieu Desnoyers
Acked-by: Masami Hiramatsu (Google)
Cc: Michael Jeanson
Cc: Steven Rostedt
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Cc: Alexei Starovoitov
Cc: Yonghong Song
Cc: Paul E. McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/trace/trace_events.h  | 38 ++++++++++++++++++++++++++++-------
 kernel/trace/trace_syscalls.c | 12 +++++++++++
 2 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 8bcbb9ee44de..0228d9ed94a3 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -263,6 +263,9 @@ static struct trace_event_fields trace_event_fields_##call[] = {	\
 	tstruct	\
 	{} };
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)
 
@@ -396,11 +399,11 @@ static inline notrace int trace_event_get_offsets_##call(	\
 
 #include "stages/stage6_event_callback.h"
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
-	\
+
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
 static notrace void	\
-trace_event_raw_event_##call(void *__data, proto)	\
+do_trace_event_raw_event_##call(void *__data, proto)	\
 {	\
 	struct trace_event_file *trace_file = __data;	\
 	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -425,15 +428,34 @@ trace_event_raw_event_##call(void *__data, proto)	\
 	\
 	trace_event_buffer_commit(&fbuffer);	\
 }
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct),	\
+		      PARAMS(assign), PARAMS(print))	\
+static notrace void	\
+trace_event_raw_event_##call(void *__data, proto)	\
+{	\
+	do_trace_event_raw_event_##call(__data, args);	\
+}
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print)	\
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct),	\
+		      PARAMS(assign), PARAMS(print))	\
+static notrace void	\
+trace_event_raw_event_##call(void *__data, proto)	\
+{	\
+	guard(preempt_notrace)();	\
+	do_trace_event_raw_event_##call(__data, args);	\
+}
+
 /*
  * The ftrace_test_probe is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the ftrace probe will
  * fail to compile unless it too is updated.
  */
 
-#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
-
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, call, proto, args)	\
 static inline void ftrace_test_probe_##call(void)	\
@@ -443,6 +465,8 @@ static inline void ftrace_test_probe_##call(void)	\
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
+#undef __DECLARE_EVENT_CLASS
+
 #include "stages/stage7_class_define.h"
 
 #undef DECLARE_EVENT_CLASS
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 67ac5366f724..ab4db8c23f36 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -299,6 +299,12 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	int syscall_nr;
 	int size;
 
+	/*
+	 * Syscall probe called with preemption enabled, but the ring
+	 * buffer and per-cpu data require preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
@@ -338,6 +344,12 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	struct trace_event_buffer fbuffer;
 	int syscall_nr;
 
+	/*
+	 * Syscall probe called with preemption enabled, but the ring
+	 * buffer and per-cpu data require preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024
From: Mathieu Desnoyers
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Peter Zijlstra,
 Alexei Starovoitov, Yonghong Song, "Paul E. McKenney", Ingo Molnar,
 Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Namhyung Kim,
 Andrii Nakryiko, bpf@vger.kernel.org, Joel Fernandes,
 linux-trace-kernel@vger.kernel.org, Michael Jeanson
Subject: [PATCH v1 3/8] tracing/perf: guard syscall probe with preempt_notrace
Date: Thu, 3 Oct 2024 11:16:33 -0400
Message-Id: <20241003151638.1608537-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com>
References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com>

In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that perf can handle this change by
explicitly disabling preemption within the perf system call tracepoint
probes to respect the current expectations within perf ring buffer
code.

This change does not yet allow perf to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.

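As an illustration of the guard pattern this patch relies on, here is a
minimal user-space sketch. It assumes a GCC or Clang compiler with the
cleanup variable attribute, which is the mechanism the kernel's guard()
helper builds on. All demo_* and DEMO_* names are hypothetical, and a
plain counter stands in for the real preempt count. The point is only
that the wrapper disables "preemption" on entry and re-enables it on
every exit path, including early returns, before handing off to the
actual probe body.

```c
#include <assert.h>

/* Hypothetical stand-in for the preempt count. */
static int demo_preempt_count;

static inline void demo_preempt_disable(void) { demo_preempt_count++; }
static inline void demo_preempt_enable(void) { demo_preempt_count--; }

/* Cleanup handler: runs when the guard variable goes out of scope. */
static inline void demo_guard_exit(int *unused)
{
	(void)unused;
	demo_preempt_enable();
}

/*
 * Declares a scope-bound guard variable. The initializer disables
 * "preemption" for its side effect and yields 0 as a dummy value.
 */
#define DEMO_GUARD_PREEMPT()						\
	int demo_guard __attribute__((cleanup(demo_guard_exit), unused)) = \
		(demo_preempt_disable(), 0)

/* Records the count seen inside the probe body, for inspection. */
static int demo_observed_count;

/* Probe body, analogous in role to do_perf_trace_##call. */
static void demo_do_probe(long id)
{
	(void)id;
	demo_observed_count = demo_preempt_count;
}

/* Wrapper, analogous in role to the syscall flavour of perf_trace_##call. */
static void demo_probe(long id)
{
	DEMO_GUARD_PREEMPT();

	if (id < 0)
		return;	/* the early return still runs demo_guard_exit() */
	demo_do_probe(id);
}
```

After demo_probe() returns, demo_preempt_count is back to its previous
value whichever path was taken, while demo_do_probe() ran with the count
raised.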
Signed-off-by: Mathieu Desnoyers
Cc: Michael Jeanson
Cc: Steven Rostedt
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Cc: Alexei Starovoitov
Cc: Yonghong Song
Cc: Paul E. McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/trace/perf.h          | 41 +++++++++++++++++++++++++++++++----
 kernel/trace/trace_syscalls.c | 12 ++++++++++
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/include/trace/perf.h b/include/trace/perf.h
index ded997af481e..5650c1bad088 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -12,10 +12,10 @@
 #undef __perf_task
 #define __perf_task(t) (__task = (t))
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
 static notrace void	\
-perf_trace_##call(void *__data, proto)	\
+do_perf_trace_##call(void *__data, proto)	\
 {	\
 	struct trace_event_call *event_call = __data;	\
 	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -55,8 +55,38 @@ perf_trace_##call(void *__data, proto)	\
 				  head, __task);	\
 }
 
+/*
+ * Define unused __count and __task variables to use @args to pass
+ * arguments to do_perf_trace_##call. This is needed because the
+ * macros __perf_count and __perf_task introduce the side-effect to
+ * store copies into those local variables.
+ */
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct),	\
+		      PARAMS(assign), PARAMS(print))	\
+static notrace void	\
+perf_trace_##call(void *__data, proto)	\
+{	\
+	u64 __count __attribute__((unused));	\
+	struct task_struct *__task __attribute__((unused));	\
+	\
+	do_perf_trace_##call(__data, args);	\
+}
+
 #undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print)	\
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct),	\
+		      PARAMS(assign), PARAMS(print))	\
+static notrace void	\
+perf_trace_##call(void *__data, proto)	\
+{	\
+	u64 __count __attribute__((unused));	\
+	struct task_struct *__task __attribute__((unused));	\
+	\
+	guard(preempt_notrace)();	\
+	do_perf_trace_##call(__data, args);	\
+}
 
 /*
  * This part is compiled out, it is only here as a build time check
@@ -76,4 +106,7 @@ static inline void perf_test_probe_##call(void)	\
 	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef __DECLARE_EVENT_CLASS
+
 #endif /* CONFIG_PERF_EVENTS */
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index ab4db8c23f36..edcfa47446c7 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -596,6 +596,12 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	int rctx;
 	int size;
 
+	/*
+	 * Syscall probe called with preemption enabled, but the ring
+	 * buffer and per-cpu data require preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
@@ -698,6 +704,12 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	int rctx;
 	int size;
 
+	/*
+	 * Syscall probe called with preemption enabled, but the ring
+	 * buffer and per-cpu data require preemption to be disabled.
+	 */
+	guard(preempt_notrace)();
+
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DCC971779A5; Thu, 3 Oct 2024 15:18:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968730; cv=none; b=D18RH3qMXM690oRBjzE0e918zSAfZ28+wqZSRQG+sEaNNpMeSbSKdQxjy5JQ6PrOXECZvf44OYDQPPG/352OoG6BGFdWbWY/QJzM2IinwHqUqvTCA5t7FIqmBGdhm4m0XLLxTXtCa7lb+JUT2yjxIg5s1B5gqcZ6TjDRWSmmWIA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968730; c=relaxed/simple; bh=uQvGGRjFBg81/CMoacZvVtIn7AY3sce+G3Hk/Qjw2wU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tdRMwbZ6YyMfbuSyRY9jsyzgbW476TbdiTytjF545PrP5skLwpnLEpT+DEYDrC/2R18YLjIJnylbDcsIJ+4bUiKo6pbuLTYABWLDYcjYCIFVqZ6ukLVC51MT0e9OTxutbpQUkPWDNweeMVr3p8zlbzpm2iueUPKhBTRHFng6nDU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=mU85fo2b; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none
dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="mU85fo2b" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1727968724; bh=uQvGGRjFBg81/CMoacZvVtIn7AY3sce+G3Hk/Qjw2wU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mU85fo2bLIGDjHEQ16l7Q//TqPy25gR39Uz3iHNyh7HDSOk5kiZOw69UunjzQxabq PruZ3Ulfkk3NrGM2bde5AvrYpb7venBKofxhMJbmY/6r89BUAkm7tlv6yzt/+FRUmJ eTLjY1vgL591RUNznwCOq2JCTnocAjl85/5SBWhsONrXNxcE+6awzyR7XnCQXG2DBH Lw1SebxTTOEULZw9oh7qDyqsSNwnkMmQH1jURAKSFfWXxcPFinvAl01eJEZH52rMb0 xOc6xYWkOHpS9OtP5KND58X5DMByHUJGYqVEcAh3LA/AP1gXdy5l8XbodLKxicds3J 9iem7qOXN8snQ== Received: from thinkos.internal.efficios.com (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XKFgm2T3Cz62g; Thu, 3 Oct 2024 11:18:44 -0400 (EDT) From: Mathieu Desnoyers To: Steven Rostedt , Masami Hiramatsu Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . 
McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Namhyung Kim , Andrii Nakryiko , bpf@vger.kernel.org, Joel Fernandes , linux-trace-kernel@vger.kernel.org, Andrii Nakryiko , Michael Jeanson Subject: [PATCH v1 4/8] tracing/bpf: guard syscall probe with preempt_notrace Date: Thu, 3 Oct 2024 11:16:34 -0400 Message-Id: <20241003151638.1608537-5-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In preparation for allowing system call enter/exit instrumentation to handle page faults, make sure that bpf can handle this change by explicitly disabling preemption within the bpf system call tracepoint probes to respect the current expectations within bpf tracing code. This change does not yet allow bpf to take page faults per se within its probe, but allows its existing probes to adapt to the upcoming change. Signed-off-by: Mathieu Desnoyers Acked-by: Andrii Nakryiko Tested-by: Andrii Nakryiko # BPF parts Cc: Michael Jeanson Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/trace/bpf_probe.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index c85bbce5aaa5..211b98d45fc6 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -53,8 +53,17 @@ __bpf_trace_##call(void *__data, proto) \
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
 	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
 
+#define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args) \
+static notrace void \
+__bpf_trace_##call(void *__data, proto) \
+{ \
+	guard(preempt_notrace)(); \
+	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
+}
+
 #undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+	__BPF_DECLARE_TRACE_SYSCALL(call, PARAMS(proto), PARAMS(args))
 
 /*
  * This part is compiled out, it is only here as a build time check
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CDDEE1A2C32; Thu, 3 Oct 2024 15:18:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968733; cv=none; b=uIzWSrPFltPa7DLTk136GhoAZs/+w4TmAeG6kpwUkKXY69stX23i/HUfIEeWuEkjiAcaA/zhEyB89H07L29bS4flGOFKuzpWW6yo5L0tDKRgaViMOoBa+XDgSXhqXUS9gzBz/bOSoXZ0BtKj7yozIdwKmGi3ffy+UqJO6rwaHSI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968733; c=relaxed/simple;
bh=cFKaryhx8iL3A0KR0AkdZdd7ZsIGBNvmJeXwPGSJJBE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=AZNaTIhD6sKF65IP68xMEdZ6Syz5O0BN/aJavqHwkCo+bmzBwwE9758ycMtzSNy7Tm59zFewafmsJMwJ/CV5sZEOv063fgrtt3X1u2xfrkdeNMg4W2tiuzgQ2kykp4FzddtsazcWJ6YwIrCMzxKFZacUjZnXN1CtCsf+wSddVCE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=ba7jj/3k; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="ba7jj/3k" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1727968725; bh=cFKaryhx8iL3A0KR0AkdZdd7ZsIGBNvmJeXwPGSJJBE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ba7jj/3kHWelvv8zFIa0KiKXxkHJoQSR8ucgpInmUSXf1Ddq3iHqFe/ylEeBA2B59 JL0juShEwPw7iLpsYPQQCEAoohnReLjCelQ0mAjOQ/jUn0x9kK5Re3ZHZmD0n5uhGG ELM1iqbb+lAcDIzUnS+IAgewwdMMU8v3Lpy27Q23gIgEc1VnprQDoyhzEz7jM2iu7Y LbuYDIhqyOa88gen751+RbBgYY/fMX5W3mLE8uUeGaXyTyoX2RJcpKD08OCmoU51/z gZ3jjipHklwwbED1/hkxcL13lQ52HCkMrEOIhTJCoxUz8mHPAPRoQqr3Dxm4j2eSTY wTrIvxv706lKQ== Received: from thinkos.internal.efficios.com (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XKFgm5G4rz5pT; Thu, 3 Oct 2024 11:18:44 -0400 (EDT) From: Mathieu Desnoyers To: Steven Rostedt , Masami Hiramatsu Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . 
McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Namhyung Kim , Andrii Nakryiko , bpf@vger.kernel.org, Joel Fernandes , linux-trace-kernel@vger.kernel.org, Michael Jeanson Subject: [PATCH v1 5/8] tracing: Allow system call tracepoints to handle page faults Date: Thu, 3 Oct 2024 11:16:35 -0400 Message-Id: <20241003151638.1608537-6-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Use Tasks Trace RCU to protect iteration of system call enter/exit tracepoint probes to allow those probes to handle page faults. In preparation for this change, all tracers registering to system call enter/exit tracepoints should expect those to be called with preemption enabled. This allows tracers to fault-in userspace system call arguments such as path strings within their probe callbacks. Signed-off-by: Mathieu Desnoyers Cc: Michael Jeanson Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/linux/tracepoint.h | 25 +++++++++++++++++--------
 init/Kconfig               |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 666499b9f3be..6faf34e5efc9 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -109,6 +110,7 @@ void for_each_tracepoint_in_module(struct module *mod,
 #ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_rcu_tasks_trace();
 	synchronize_srcu(&tracepoint_srcu);
 	synchronize_rcu();
 }
@@ -211,7 +213,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
  */
-#define __DO_TRACE(name, args, cond, rcuidle) \
+#define __DO_TRACE(name, args, cond, rcuidle, syscall) \
 	do { \
 		int __maybe_unused __idx = 0; \
 \
@@ -222,8 +224,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 				"Bad RCU usage for tracepoint")) \
 			return; \
 \
-		/* keep srcu and sched-rcu usage consistent */ \
-		preempt_disable_notrace(); \
+		if (syscall) { \
+			rcu_read_lock_trace(); \
+		} else { \
+			/* keep srcu and sched-rcu usage consistent */ \
+			preempt_disable_notrace(); \
+		} \
 \
 		/* \
 		 * For rcuidle callers, use srcu since sched-rcu \
@@ -241,7 +247,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
 		} \
 \
-		preempt_enable_notrace(); \
+		if (syscall) \
+			rcu_read_unlock_trace(); \
+		else \
+			preempt_enable_notrace(); \
 	} while (0)
 
 #ifndef MODULE
@@ -251,7 +260,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(name, \
 				TP_ARGS(args), \
-				TP_CONDITION(cond), 1); \
+				TP_CONDITION(cond), 1, 0); \
 	}
 #else
 #define __DECLARE_TRACE_RCU(name, proto, args, cond)
@@ -284,7 +293,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(name, \
 				TP_ARGS(args), \
-				TP_CONDITION(cond), 0); \
+				TP_CONDITION(cond), 0, 0); \
 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
 			WARN_ONCE(!rcu_is_watching(), \
 				  "RCU not watching for tracepoint"); \
@@ -295,7 +304,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		if (static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(name, \
 				TP_ARGS(args), \
-				TP_CONDITION(cond), 1); \
+				TP_CONDITION(cond), 1, 0); \
 	} \
 	static inline int \
 	register_trace_##name(void (*probe)(data_proto), void *data) \
@@ -330,7 +339,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		if
(static_key_false(&__tracepoint_##name.key)) \
 			__DO_TRACE(name, \
 				TP_ARGS(args), \
-				TP_CONDITION(cond), 0); \
+				TP_CONDITION(cond), 0, 1); \
 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
 			WARN_ONCE(!rcu_is_watching(), \
 				  "RCU not watching for tracepoint"); \
diff --git a/init/Kconfig b/init/Kconfig
index fbd0cb06a50a..eedd0064fb36 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1984,6 +1984,7 @@ config BINDGEN_VERSION_TEXT
 #
 config TRACEPOINTS
 	bool
+	select TASKS_TRACE_RCU
 
 source "kernel/Kconfig.kexec"
 
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD5851A2C05; Thu, 3 Oct 2024 15:18:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968732; cv=none; b=G1l4nigL8+3mqg9chPYfmZV5TDxBKqyOqknvKbzkKasAxi15gcJPQzKQ1Q/nbj6q2yO1vRwUV6XAuAHLj2vYPS7/NOGj54inSox8HAkAj/SY7jzU3rLD9dd2cRF96x2NxwRSf3K5SEg1Q/5K9mL09iuQ8Ol75CF3F3Dc97CwrJY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968732; c=relaxed/simple; bh=mtMV15nwSgC24wpcl6rcJCStzDUXS/XjqRPiMOjRqhA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=LiEFbYfCFTMASBgvwkwxbDs7f8YI2+QZutwZtP1tJdYeVipW3Wtw7yzH/7/BUCQ+D0qvLqeo/HPe9Lwek9f6GzW00MNA/pfZ3H59XpZzvmussc4hvYgwdI6A+hbhAYeTsHaEp3koKR75ziz8U6237ymwGrnXOQNa8f/Tn/l/64k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=hbV7WCRK; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="hbV7WCRK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1727968725; bh=mtMV15nwSgC24wpcl6rcJCStzDUXS/XjqRPiMOjRqhA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hbV7WCRKltuEdzVtpPFHreHAjf7XCF2DxigiVx4HVFO59+Xs8Po0FO+kKpPKC77PM CF8FV3qm6HiFOse7/ZJHzexYFCXYErjX/9MFVKwWFmCRMsRWrB0zjlcP7cmZycsz4z 2uOOe/pe5nwPcuMtDVq3KyAW7VFuad4m/W89czZTYjIGPtJY30jBHrCIOxYrnM0D8K pu6mMNdjuVkFl07TU6LQW0HZiywg3G+yCbXHDNmIiQNaWII1QvHLSfNzpzg0SzdOFT 0U43387P7QOocr3w586bS4vLBQbWgpJczOO8K5f5CJ+0p01dKDRxsbnOF33Ms87nfH VFVeKJN4TcsVA== Received: from thinkos.internal.efficios.com (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XKFgn14Qkz640; Thu, 3 Oct 2024 11:18:45 -0400 (EDT) From: Mathieu Desnoyers To: Steven Rostedt , Masami Hiramatsu Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . 
McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Namhyung Kim , Andrii Nakryiko , bpf@vger.kernel.org, Joel Fernandes , linux-trace-kernel@vger.kernel.org, Michael Jeanson Subject: [PATCH v1 6/8] tracing/ftrace: Add might_fault check to syscall probes Date: Thu, 3 Oct 2024 11:16:36 -0400 Message-Id: <20241003151638.1608537-7-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a might_fault() check to validate that the ftrace sys_enter/sys_exit probe callbacks are indeed called from a context where page faults can be handled. Signed-off-by: Mathieu Desnoyers Acked-by: Masami Hiramatsu (Google) Cc: Michael Jeanson Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/trace/trace_events.h  | 1 +
 kernel/trace/trace_syscalls.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 0228d9ed94a3..e0d4850b0d77 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -446,6 +446,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
 static notrace void \
 trace_event_raw_event_##call(void *__data, proto) \
 { \
+	might_fault(); \
 	guard(preempt_notrace)(); \
 	do_trace_event_raw_event_##call(__data, args); \
 }
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index edcfa47446c7..89d7e4c57b5b 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
@@ -348,6 +349,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4217D1A3A9B; Thu, 3 Oct 2024 15:18:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968732; cv=none; b=u7k+XmdQP425mRQCxZ/2chfxjtujB0cXfifHYNVuFyslNNzv1d+M9oijecwK2SYEm2xdBRlTmcBuAeXiG5pDSdclVjrtc9nUhdJaH8gn1j1v0bqPgPN1xKhqdpUkdpu4dXSkxwsl8BDfqEIBXZwFg282Q5RYlC4Z/oIeq4vWlR4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968732; c=relaxed/simple; bh=Lii3pZqSuFPJyMdWMDiDKTX8jZ8Szv5keXbzZp3BAq8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=mZQDMC6GG9cPw2pVH8Xx2Hh/rrjZxI+zNq7Bhvuq9dV/48C2712i4szssMAFshyy/20lAXGAzyDrpG0t9rVT8MxQLefUwHOJfpnjgwgxtkrdBMq1OYNlNOIqwknTlOuFnwkpwldn70PYzEvZaEf7n0hOOeTI2lzsEfZb6R0bzFc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=wzJ41y/g; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="wzJ41y/g" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1727968725; bh=Lii3pZqSuFPJyMdWMDiDKTX8jZ8Szv5keXbzZp3BAq8=;
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=wzJ41y/gY4clu1bSep1bTUDwRRe2e4PuQGiPcOIqM6lkcVF8QzitjNi+iEXwHJD5F 8tbMTcW0awu75RSPBCvBumsoVEvTiqqFIhn/EKrkdpbp3WxTvQe7k40wSqcrrJd3ja qq5Qsw8LS2o5y2mMGDcJjz/+08hyVmp1a7XkhcFT4rCekuij4s7IuPegsVqm/B5K8j RQ2iIV+WG+5U2L4tznSkhZuA58MzimnkhTv/rrZjZaCDvWKSCJaktU9l/qAYcK71eM 173q0B94P+C8JP4NTnpjNX/Qcg9G34XiEl4nVapFv8ro2yH9n2D/oggK34dHs9j6et DlSYrFB/63EsQ== Received: from thinkos.internal.efficios.com (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XKFgn3cV6z5t3; Thu, 3 Oct 2024 11:18:45 -0400 (EDT) From: Mathieu Desnoyers To: Steven Rostedt , Masami Hiramatsu Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Namhyung Kim , Andrii Nakryiko , bpf@vger.kernel.org, Joel Fernandes , linux-trace-kernel@vger.kernel.org, Michael Jeanson Subject: [PATCH v1 7/8] tracing/perf: Add might_fault check to syscall probes Date: Thu, 3 Oct 2024 11:16:37 -0400 Message-Id: <20241003151638.1608537-8-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a might_fault() check to validate that the perf sys_enter/sys_exit probe callbacks are indeed called from a context where page faults can be handled. Signed-off-by: Mathieu Desnoyers Cc: Michael Jeanson Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/trace/perf.h          | 1 +
 kernel/trace/trace_syscalls.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/trace/perf.h b/include/trace/perf.h
index 5650c1bad088..321bfd7919f6 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -84,6 +84,7 @@ perf_trace_##call(void *__data, proto) \
 	u64 __count __attribute__((unused)); \
 	struct task_struct *__task __attribute__((unused)); \
 \
+	might_fault(); \
 	guard(preempt_notrace)(); \
 	do_perf_trace_##call(__data, args); \
 }
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 89d7e4c57b5b..0d42d6f293d6 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -602,6 +602,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
@@ -710,6 +711,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
+	might_fault();
 	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
-- 
2.39.2

From nobody Thu Nov 28 06:35:01 2024 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4012319EECD; Thu, 3 Oct 2024 15:18:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968732; cv=none; b=VyMtj+qgXPtUCcCQfwfHF9fAP7ncGip7s35Y1NpRkymn4gvvp2zG2MPkKyk4ZD76Br+t6QmwQcDF1LJBLfryZBwMwoyHmjU1gikTsdRrZMB4d9D8/OyMhl3klSd0tcTzf8qlgdRcEprGiOiraupnchCzajptyY9DDP9QXTd9bI0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968732; c=relaxed/simple; bh=iV8BNOEUAlVnHaxKIKuldZvcuyZQrIw4r05iL6v8clg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=pGixRWpNp9OvqC9TB0Dc4aoJ+z3n7iD+FTAXIIcYQdnU94iejxcL9yLWHRd0UAC6Z7iTmC+dWfUxiBd/3b+keVKmxXoNLXsmo9N7QVMFEsCk/IG8lf/76Q/p4KNj4LR109fzkiESgFCUsHLRXwjGAWu24hOBgSikxvADrhYr2jY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=p4BRDEib; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="p4BRDEib" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1727968726; bh=iV8BNOEUAlVnHaxKIKuldZvcuyZQrIw4r05iL6v8clg=;
h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=p4BRDEibEFUrShfp4oo5FOB2ZXFQ/6+zGl80lZBPupnfK9PBU6VtoFe0cOPyGtQ0x P/XF1NaRyDifYVq/vKFoePTlsCxt8nxL7J1Vt3Q+5I2KL/gMSOnA5521p6TS+X9KDO LDPGZxlm1reC4mguEwMaNhhuGWsZVFkLAXpAibjCzKkuiDSAfT12ROUwZbJBpA6+Yr RGS3g6MtJlQ6hvChtYmBJ+d/kQL7+GazLYyhP/flAE5Bnk5E0Zw1X+lJ+XF6HdtDjA DHmGRjdAsvrRUsNO8sJwx+fE/rGfBSdUHDDiuyLGSpIA4B7l8stXnKrWQ8ZLIfMxnk 5YJbpwO3FUFew== Received: from thinkos.internal.efficios.com (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XKFgn6Jqyz6Dp; Thu, 3 Oct 2024 11:18:45 -0400 (EDT) From: Mathieu Desnoyers To: Steven Rostedt , Masami Hiramatsu Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Peter Zijlstra , Alexei Starovoitov , Yonghong Song , "Paul E . McKenney" , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Namhyung Kim , Andrii Nakryiko , bpf@vger.kernel.org, Joel Fernandes , linux-trace-kernel@vger.kernel.org, Andrii Nakryiko , Michael Jeanson Subject: [PATCH v1 8/8] tracing/bpf: Add might_fault check to syscall probes Date: Thu, 3 Oct 2024 11:16:38 -0400 Message-Id: <20241003151638.1608537-9-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> References: <20241003151638.1608537-1-mathieu.desnoyers@efficios.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a might_fault() check to validate that the bpf sys_enter/sys_exit probe callbacks are indeed called from a context where page faults can be handled. Signed-off-by: Mathieu Desnoyers Acked-by: Andrii Nakryiko Tested-by: Andrii Nakryiko # BPF parts Cc: Michael Jeanson Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Peter Zijlstra Cc: Alexei Starovoitov Cc: Yonghong Song Cc: Paul E. 
McKenney
Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Mark Rutland
Cc: Alexander Shishkin
Cc: Namhyung Kim
Cc: Andrii Nakryiko
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes
---
 include/trace/bpf_probe.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 211b98d45fc6..099df5c3e38a 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -57,6 +57,7 @@ __bpf_trace_##call(void *__data, proto) \
 static notrace void \
 __bpf_trace_##call(void *__data, proto) \
 { \
+	might_fault(); \
 	guard(preempt_notrace)(); \
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
 }
-- 
2.39.2