From: Steven Rostedt
Date: Tue, 28 Oct 2025 19:11:15 -0400
Message-ID: <20251028231147.096570057@kernel.org>
References: <20251028231114.820213884@kernel.org>
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Peter Zijlstra, Namhyung Kim, Takaya Saeki, Tom Zanussi, Thomas Gleixner, Ian Rogers, Douglas Raillard, Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ingo Molnar
Subject: [PATCH v5 01/13] tracing: Make trace_user_fault_read() exposed to rest of tracing

The write to the trace_marker file is a critical section where it can
neither take locks nor allocate memory. To read from user space, it
allocates a per CPU buffer when the trace_marker file is opened, and then
when the write system call is performed, it uses the following method to
read from user space:

	preempt_disable();
	buffer = per_cpu_ptr(cpu_buffers, cpu);
	do {
		cnt = nr_context_switches_cpu();
		migrate_disable();
		preempt_enable();
		ret = copy_from_user(buffer, ptr, len);
		preempt_disable();
		migrate_enable();
	} while (!ret && cnt != nr_context_switches_cpu());
	if (!ret)
		ring_buffer_write(buffer);
	preempt_enable();

It records the number of context switches for the current CPU, enables
preemption, copies from user space, disables preemption and then checks
if the number of context switches changed.
If it did not, then the buffer is valid, otherwise the buffer may have been
corrupted and the read from user space must be tried again.

The system call trace events are now faultable and have the same
restrictions as the trace_marker write. For a system call trace event to
read a user space buffer (for example, the file name passed to the openat
system call), it needs the same logic. Instead of copying the code over to
the system call trace events, make the code generic to allow the system
call trace events to use the same code.

The following API is added internally to the tracing subsystem (these
functions are only exposed within the tracing subsystem and are not to be
used outside of it):

  trace_user_fault_init() - initializes a trace_user_buf_info descriptor
    and allocates the per CPU buffers to copy from user space into.

  trace_user_fault_destroy() - frees the allocations made by
    trace_user_fault_init().

  trace_user_fault_get() - increments the ref count of the info descriptor
    to allow more than one user to use the same descriptor.

  trace_user_fault_put() - decrements the ref count.

  trace_user_fault_read() - performs the above action to read user space
    into the per CPU buffer. Preemption must be disabled before calling
    this function and must remain disabled while the returned buffer is
    in use.
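
The retry loop described in this changelog can be sketched in plain user
space C. This is only an illustration of the pattern, not the kernel
implementation: sim_ctx_switches, sim_preempt_budget, sim_copy() and
sim_fault_read() are hypothetical names that stand in for the per CPU
context switch counter, for a task being scheduled out during the copy,
for copy_from_user(), and for trace_user_fault_read(). The real code also
toggles preemption and migration, which user space cannot model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per CPU context switch counter. */
static unsigned long sim_ctx_switches;

/* Hypothetical knob: how many upcoming copies get "preempted". */
static int sim_preempt_budget;

/* Stand-in for copy_from_user(): 0 on success, non-zero on a fault. */
static int sim_copy(char *dst, const char *src, size_t len)
{
	if (!src)
		return 1;		/* simulated fault */

	memcpy(dst, src, len);

	/* Pretend another task ran and may have reused the buffer. */
	if (sim_preempt_budget > 0) {
		sim_preempt_budget--;
		sim_ctx_switches++;
	}
	return 0;
}

/*
 * Repeat the copy until it completes with no context switch in between.
 * Returns dst on success, NULL on a fault or if len exceeds dst_size.
 * *tries is set to the number of copies that were attempted.
 */
static char *sim_fault_read(char *dst, size_t dst_size,
			    const char *src, size_t len, int *tries)
{
	unsigned long cnt;

	assert(dst && tries);
	*tries = 0;

	/* The caller must not ask for more than the buffer can hold. */
	if (len > dst_size)
		return NULL;

	do {
		cnt = sim_ctx_switches;	/* snapshot before the copy */
		(*tries)++;
		if (sim_copy(dst, src, len))
			return NULL;
	} while (cnt != sim_ctx_switches);	/* changed: buffer is suspect */

	return dst;
}
```

With sim_preempt_budget set to 2, the first two copies are discarded
because the counter moved, and the third one is accepted, so
sim_fault_read() reports three tries. A fault or an oversized request
returns NULL, which mirrors the return convention documented for
trace_user_fault_read() in this patch.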

Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace.c | 250 ++++++++++++++++++++++++++++++++-----------
 kernel/trace/trace.h |  17 +++
 2 files changed, 205 insertions(+), 62 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d1e527cf2aae..50832411c5c0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7223,52 +7223,43 @@ struct trace_user_buf {
 	char		*buf;
 };
 
-struct trace_user_buf_info {
-	struct trace_user_buf __percpu	*tbuf;
-	int				ref;
-};
-
-
 static DEFINE_MUTEX(trace_user_buffer_mutex);
 static struct trace_user_buf_info *trace_user_buffer;
 
-static void trace_user_fault_buffer_free(struct trace_user_buf_info *tinfo)
+/**
+ * trace_user_fault_destroy - free up allocated memory of a trace user buffer
+ * @tinfo: The descriptor to free up
+ *
+ * Frees any data allocated in the trace info descriptor.
+ */
+void trace_user_fault_destroy(struct trace_user_buf_info *tinfo)
 {
 	char *buf;
 	int cpu;
 
+	if (!tinfo || !tinfo->tbuf)
+		return;
+
 	for_each_possible_cpu(cpu) {
 		buf = per_cpu_ptr(tinfo->tbuf, cpu)->buf;
 		kfree(buf);
 	}
 	free_percpu(tinfo->tbuf);
-	kfree(tinfo);
 }
 
-static int trace_user_fault_buffer_enable(void)
+static int user_fault_buffer_enable(struct trace_user_buf_info *tinfo, size_t size)
 {
-	struct trace_user_buf_info *tinfo;
 	char *buf;
 	int cpu;
 
-	guard(mutex)(&trace_user_buffer_mutex);
-
-	if (trace_user_buffer) {
-		trace_user_buffer->ref++;
-		return 0;
-	}
-
-	tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL);
-	if (!tinfo)
-		return -ENOMEM;
+	lockdep_assert_held(&trace_user_buffer_mutex);
 
 	tinfo->tbuf = alloc_percpu(struct trace_user_buf);
-	if (!tinfo->tbuf) {
-		kfree(tinfo);
+	if (!tinfo->tbuf)
 		return -ENOMEM;
-	}
 
 	tinfo->ref = 1;
+	tinfo->size = size;
 
 	/* Clear each buffer in case of error */
 	for_each_possible_cpu(cpu) {
@@ -7276,42 +7267,165 @@ static int trace_user_fault_buffer_enable(void)
 	}
 
 	for_each_possible_cpu(cpu) {
-		buf = kmalloc_node(TRACE_MARKER_MAX_SIZE, GFP_KERNEL,
+		buf = kmalloc_node(size, GFP_KERNEL,
 				   cpu_to_node(cpu));
-		if (!buf) {
-			trace_user_fault_buffer_free(tinfo);
+		if (!buf)
 			return -ENOMEM;
-		}
 		per_cpu_ptr(tinfo->tbuf, cpu)->buf = buf;
 	}
 
-	trace_user_buffer = tinfo;
-
 	return 0;
 }
 
-static void trace_user_fault_buffer_disable(void)
+/* For internal use. Free and reinitialize */
+static void user_buffer_free(struct trace_user_buf_info **tinfo)
 {
-	struct trace_user_buf_info *tinfo;
+	lockdep_assert_held(&trace_user_buffer_mutex);
 
-	guard(mutex)(&trace_user_buffer_mutex);
+	trace_user_fault_destroy(*tinfo);
+	kfree(*tinfo);
+	*tinfo = NULL;
+}
+
+/* For internal use. Initialize and allocate */
+static int user_buffer_init(struct trace_user_buf_info **tinfo, size_t size)
+{
+	bool alloc = false;
+	int ret;
+
+	lockdep_assert_held(&trace_user_buffer_mutex);
+
+	if (!*tinfo) {
+		alloc = true;
+		*tinfo = kzalloc(sizeof(**tinfo), GFP_KERNEL);
+		if (!*tinfo)
+			return -ENOMEM;
+	}
 
-	tinfo = trace_user_buffer;
+	ret = user_fault_buffer_enable(*tinfo, size);
+	if (ret < 0 && alloc)
+		user_buffer_free(tinfo);
 
-	if (WARN_ON_ONCE(!tinfo))
+	return ret;
+}
+
+/* For internal use, dereference and free if necessary */
+static void user_buffer_put(struct trace_user_buf_info **tinfo)
+{
+	guard(mutex)(&trace_user_buffer_mutex);
+
+	if (WARN_ON_ONCE(!*tinfo || !(*tinfo)->ref))
 		return;
 
-	if (--tinfo->ref)
+	if (--(*tinfo)->ref)
 		return;
 
-	trace_user_fault_buffer_free(tinfo);
-	trace_user_buffer = NULL;
+	user_buffer_free(tinfo);
 }
 
-/* Must be called with preemption disabled */
-static char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
-				   const char __user *ptr, size_t size,
-				   size_t *read_size)
+/**
+ * trace_user_fault_init - Allocate or reference a per CPU buffer
+ * @tinfo: A pointer to the trace buffer descriptor
+ * @size: The size to allocate each per CPU buffer
+ *
+ * Create a per CPU buffer that can be used to copy from user space
+ * in a task context. When calling trace_user_fault_read(), preemption
+ * must be disabled, and it will enable preemption and copy user
+ * space data to the buffer. If any schedule switches occur, it will
+ * retry until it succeeds without a schedule switch knowing the buffer
+ * is still valid.
+ *
+ * Returns 0 on success, negative on failure.
+ */
+int trace_user_fault_init(struct trace_user_buf_info *tinfo, size_t size)
+{
+	int ret;
+
+	if (!tinfo)
+		return -EINVAL;
+
+	guard(mutex)(&trace_user_buffer_mutex);
+
+	ret = user_buffer_init(&tinfo, size);
+	if (ret < 0)
+		trace_user_fault_destroy(tinfo);
+
+	return ret;
+}
+
+/**
+ * trace_user_fault_get - up the ref count for the user buffer
+ * @tinfo: A pointer to the trace buffer descriptor
+ *
+ * Ups the ref count of the trace buffer.
+ *
+ * Returns the new ref count.
+ */
+int trace_user_fault_get(struct trace_user_buf_info *tinfo)
+{
+	if (!tinfo)
+		return -1;
+
+	guard(mutex)(&trace_user_buffer_mutex);
+
+	tinfo->ref++;
+	return tinfo->ref;
+}
+
+/**
+ * trace_user_fault_put - dereference a per cpu trace buffer
+ * @tinfo: The @tinfo that was passed to trace_user_fault_get()
+ *
+ * Decrement the ref count of @tinfo.
+ *
+ * Returns the new refcount (negative on error).
+ */
+int trace_user_fault_put(struct trace_user_buf_info *tinfo)
+{
+	guard(mutex)(&trace_user_buffer_mutex);
+
+	if (WARN_ON_ONCE(!tinfo || !tinfo->ref))
+		return -1;
+
+	--tinfo->ref;
+	return tinfo->ref;
+}
+
+/**
+ * trace_user_fault_read - Read user space into a per CPU buffer
+ * @tinfo: The @tinfo initialized by trace_user_fault_init()
+ * @ptr: The user space pointer to read
+ * @size: The size of user space to read.
+ * @copy_func: Optional function to use to copy from user space
+ * @data: Data to pass to copy_func if it was supplied
+ *
+ * Preemption must be disabled when this is called, and must not
+ * be enabled while using the returned buffer.
+ * This does the copying from user space into a per CPU buffer.
+ *
+ * The @size must not be greater than the size passed in to
+ * trace_user_fault_init().
+ *
+ * If @copy_func is NULL, trace_user_fault_read() will use copy_from_user(),
+ * otherwise it will call @copy_func. It will call @copy_func with:
+ *
+ *   buffer: the per CPU buffer of the @tinfo.
+ *   ptr: The pointer @ptr to user space to read
+ *   size: The @size of the ptr to read
+ *   data: The @data parameter
+ *
+ * It is expected that @copy_func will return 0 on success and non zero
+ * if there was a fault.
+ *
+ * Returns a pointer to the buffer with the content read from @ptr.
+ *   Preemption must remain disabled while the caller accesses the
+ *   buffer returned by this function.
+ * Returns NULL if there was a fault, or the size passed in is
+ *   greater than the size passed to trace_user_fault_init().
+ */
+char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
+			    const char __user *ptr, size_t size,
+			    trace_user_buf_copy copy_func, void *data)
 {
 	int cpu = smp_processor_id();
 	char *buffer = per_cpu_ptr(tinfo->tbuf, cpu)->buf;
@@ -7319,9 +7433,14 @@ static char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
 	int trys = 0;
 	int ret;
 
-	if (size > TRACE_MARKER_MAX_SIZE)
-		size = TRACE_MARKER_MAX_SIZE;
-	*read_size = 0;
+	lockdep_assert_preemption_disabled();
+
+	/*
+	 * It's up to the caller to not try to copy more than it said
+	 * it would.
+	 */
+	if (size > tinfo->size)
+		return NULL;
 
 	/*
 	 * This acts similar to a seqcount. The per CPU context switches are
@@ -7361,7 +7480,14 @@ static char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
 		 */
 		preempt_enable_notrace();
 
-		ret = __copy_from_user(buffer, ptr, size);
+		/* Make sure preemption is enabled here */
+		lockdep_assert_preemption_enabled();
+
+		if (copy_func) {
+			ret = copy_func(buffer, ptr, size, data);
+		} else {
+			ret = __copy_from_user(buffer, ptr, size);
+		}
 
 		preempt_disable_notrace();
 		migrate_enable();
@@ -7378,7 +7504,6 @@ static char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
 		 */
 	} while (nr_context_switches_cpu(cpu) != cnt);
 
-	*read_size = size;
 	return buffer;
 }
 
@@ -7389,7 +7514,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
 	struct trace_array *tr = filp->private_data;
 	ssize_t written = -ENODEV;
 	unsigned long ip;
-	size_t size;
 	char *buf;
 
 	if (tracing_disabled)
@@ -7407,13 +7531,10 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
 	/* Must have preemption disabled while having access to the buffer */
 	guard(preempt_notrace)();
 
-	buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, &size);
+	buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, NULL, NULL);
 	if (!buf)
 		return -EFAULT;
 
-	if (cnt > size)
-		cnt = size;
-
 	/* The selftests expect this function to be the IP address */
 	ip = _THIS_IP_;
 
@@ -7473,7 +7594,6 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
 {
 	struct trace_array *tr = filp->private_data;
 	ssize_t written = -ENODEV;
-	size_t size;
 	char *buf;
 
 	if (tracing_disabled)
@@ -7486,17 +7606,17 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
 	if (cnt < sizeof(unsigned int))
 		return -EINVAL;
 
+	/* raw write is all or nothing */
+	if (cnt > TRACE_MARKER_MAX_SIZE)
+		return -EINVAL;
+
 	/* Must have preemption disabled while having access to the buffer */
 	guard(preempt_notrace)();
 
-	buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, &size);
+	buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, NULL, NULL);
 	if (!buf)
 		return -EFAULT;
 
-	/* raw write is all or nothing */
-	if (cnt > size)
-		return -EINVAL;
-
 	/* The global trace_marker_raw can go to multiple instances */
 	if (tr == &global_trace) {
 		guard(rcu)();
@@ -7516,20 +7636,26 @@ static int tracing_mark_open(struct inode *inode, struct file *filp)
 {
 	int ret;
 
-	ret = trace_user_fault_buffer_enable();
-	if (ret < 0)
-		return ret;
+	scoped_guard(mutex, &trace_user_buffer_mutex) {
+		if (!trace_user_buffer) {
+			ret = user_buffer_init(&trace_user_buffer, TRACE_MARKER_MAX_SIZE);
+			if (ret < 0)
+				return ret;
+		} else {
+			trace_user_buffer->ref++;
+		}
+	}
 
 	stream_open(inode, filp);
 	ret = tracing_open_generic_tr(inode, filp);
 	if (ret < 0)
-		trace_user_fault_buffer_disable();
+		user_buffer_put(&trace_user_buffer);
 	return ret;
 }
 
 static int tracing_mark_release(struct inode *inode, struct file *file)
 {
-	trace_user_fault_buffer_disable();
+	user_buffer_put(&trace_user_buffer);
 	return tracing_release_generic_tr(inode, file);
 }
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 85eabb454bee..8439fe3058cc 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1531,6 +1531,23 @@ void trace_buffered_event_enable(void);
 
 void early_enable_events(struct trace_array *tr, char *buf, bool disable_first);
 
+struct trace_user_buf;
+struct trace_user_buf_info {
+	struct trace_user_buf __percpu	*tbuf;
+	size_t				size;
+	int				ref;
+};
+
+typedef int (*trace_user_buf_copy)(char *dst, const char __user *src,
+				  size_t size, void *data);
+int trace_user_fault_init(struct trace_user_buf_info *tinfo, size_t size);
+int trace_user_fault_get(struct trace_user_buf_info *tinfo);
+int trace_user_fault_put(struct trace_user_buf_info *tinfo);
+void trace_user_fault_destroy(struct trace_user_buf_info *tinfo);
+char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
+			    const char __user *ptr, size_t size,
+			    trace_user_buf_copy copy_func, void *data);
+
 static inline void
 __trace_event_discard_commit(struct trace_buffer *buffer,
 			     struct ring_buffer_event *event)
-- 
2.51.0

From: Steven Rostedt
Date: Tue, 28 Oct 2025 19:11:16 -0400
Message-ID: <20251028231147.261867956@kernel.org>
References: <20251028231114.820213884@kernel.org>
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Peter Zijlstra, Namhyung Kim, Takaya Saeki, Tom Zanussi, Thomas Gleixner, Ian Rogers, Douglas Raillard, Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter, Ingo Molnar
Subject: [PATCH v5 02/13] tracing: Have syscall trace events read user space string

As of commit 654ced4a1377 ("tracing: Introduce tracepoint_is_faultable()")
system call trace events allow faulting in user space memory. Have some of
the system call trace events take advantage of this.

Use the trace_user_fault_read() logic to read the user space buffer, and
instead of just saving the pointer to the buffer in the system call event,
also save the string that is passed in.

The syscall event has its nb_args shortened from an int to a short (where
even u8 is plenty big enough) and the freed two bytes are used for
"user_mask". The new "user_mask" field is a bit mask in which bit i marks
entry i of the "args" array as holding the address to read from user
space. This value is set to 0 if the system call event does not need to
read user space for a field. This mask can be used to know if the event
may fault or not. Only one bit set in user_mask is supported at this time.

This allows the output to look like this:

  sys_access(filename: 0x7f8c55368470 "/etc/ld.so.preload", mode: 4)
  sys_execve(filename: 0x564ebcf5a6b8 "/usr/bin/emacs", argv: 0x7fff357c0300, envp: 0x564ebc4a4820)

Signed-off-by: Steven Rostedt (Google)
---
Changes since v4: https://lore.kernel.org/20251021005232.590696802@kernel.org

- Hide __NR_ values in switch statement when not defined
  (kernel test robot)

 include/trace/syscall.h       |   4 +-
 kernel/trace/trace_syscalls.c | 436 ++++++++++++++++++++++++++++++++--
 2 files changed, 420 insertions(+), 20 deletions(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 8e193f3a33b3..85f21ca15a41 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -16,6 +16,7 @@
  * @name: name of the syscall
  * @syscall_nr: number of the syscall
  * @nb_args: number of parameters it takes
+ * @user_mask: mask of @args that will read user space
  * @types: list of types as strings
  * @args: list of args as strings (args[i] matches types[i])
  * @enter_fields: list of fields for syscall_enter trace event
@@ -25,7 +26,8 @@
 struct syscall_metadata {
 	const char	*name;
 	int		syscall_nr;
-	int		nb_args;
+	short		nb_args;
+	short		user_mask;
 	const char	**types;
 	const char	**args;
 	struct list_head enter_fields;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 0f932b22f9ec..528ac90eda5d 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
 #include
+#include
 #include
 #include
 #include
@@ -123,6 +124,9 @@ const char *get_syscall_name(int syscall)
 	return entry->name;
 }
 
+/* Added to user strings when max limit is reached */
+#define EXTRA "..."
+
 static enum print_line_t
 print_syscall_enter(struct trace_iterator *iter, int flags,
 		    struct trace_event *event)
@@ -132,7 +136,9 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 	struct trace_entry *ent = iter->ent;
 	struct syscall_trace_enter *trace;
 	struct syscall_metadata *entry;
-	int i, syscall;
+	int i, syscall, val;
+	unsigned char *ptr;
+	int len;
 
 	trace = (typeof(trace))ent;
 	syscall = trace->nr;
@@ -167,6 +173,19 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 		else
 			trace_seq_printf(s, "%s: 0x%lx", entry->args[i],
 					 trace->args[i]);
+
+		if (!(BIT(i) & entry->user_mask))
+			continue;
+
+		/* This arg points to a user space string */
+		ptr = (void *)trace->args + sizeof(long) * entry->nb_args;
+		val = *(int *)ptr;
+
+		/* The value is a dynamic string (len << 16 | offset) */
+		ptr = (void *)ent + (val & 0xffff);
+		len = val >> 16;
+
+		trace_seq_printf(s, " \"%.*s\"", len, ptr);
 	}
 
 	trace_seq_putc(s, ')');
@@ -223,15 +242,27 @@ __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
 
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
 	for (i = 0; i < entry->nb_args; i++) {
-		pos += snprintf(buf + pos, LEN_OR_ZERO, "%s: 0x%%0%zulx%s",
-				entry->args[i], sizeof(unsigned long),
-				i == entry->nb_args - 1 ? "" : ", ");
+		if (i)
+			pos += snprintf(buf + pos, LEN_OR_ZERO, ", ");
+		pos += snprintf(buf + pos, LEN_OR_ZERO, "%s: 0x%%0%zulx",
+				entry->args[i], sizeof(unsigned long));
+
+		if (!(BIT(i) & entry->user_mask))
+			continue;
+
+		/* Add the format for the user space string */
+		pos += snprintf(buf + pos, LEN_OR_ZERO, " \\\"%%s\\\"");
 	}
 	pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
 
 	for (i = 0; i < entry->nb_args; i++) {
 		pos += snprintf(buf + pos, LEN_OR_ZERO, ", ((unsigned long)(REC->%s))",
 				entry->args[i]);
+		if (!(BIT(i) & entry->user_mask))
+			continue;
+		/* The user space string for arg has name ___val */
+		pos += snprintf(buf + pos, LEN_OR_ZERO, ", __get_str(__%s_val)",
+				entry->args[i]);
 	}
 
 #undef LEN_OR_ZERO
@@ -277,8 +308,12 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
 {
 	struct syscall_trace_enter trace;
 	struct syscall_metadata *meta = call->data;
+	unsigned long mask;
+	char *arg;
 	int offset = offsetof(typeof(trace), args);
+	int idx;
 	int ret = 0;
+	int len;
 	int i;
 
 	for (i = 0; i < meta->nb_args; i++) {
@@ -291,9 +326,148 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
 		offset += sizeof(unsigned long);
 	}
 
+	if (ret || !meta->user_mask)
+		return ret;
+
+	mask = meta->user_mask;
+	idx = ffs(mask) - 1;
+
+	/*
+	 * User space strings are faulted into a temporary buffer and then
+	 * added as a dynamic string to the end of the event.
+	 * The user space string name for the arg pointer is "___val".
+	 */
+	len = strlen(meta->args[idx]) + sizeof("___val");
+	arg = kmalloc(len, GFP_KERNEL);
+	if (WARN_ON_ONCE(!arg)) {
+		meta->user_mask = 0;
+		return -ENOMEM;
+	}
+
+	snprintf(arg, len, "__%s_val", meta->args[idx]);
+
+	ret = trace_define_field(call, "__data_loc char[]",
+				 arg, offset, sizeof(int), 0,
+				 FILTER_OTHER);
+	if (ret)
+		kfree(arg);
 	return ret;
 }
 
+#define SYSCALL_FAULT_BUF_SZ 512
+
+/* Use the tracing per CPU buffer infrastructure to copy from user space */
+struct syscall_user_buffer {
+	struct trace_user_buf_info	buf;
+	struct rcu_head			rcu;
+};
+
+static struct syscall_user_buffer *syscall_buffer;
+
+static int syscall_fault_buffer_enable(void)
+{
+	struct syscall_user_buffer *sbuf;
+	int ret;
+
+	lockdep_assert_held(&syscall_trace_lock);
+
+	if (syscall_buffer) {
+		trace_user_fault_get(&syscall_buffer->buf);
+		return 0;
+	}
+
+	sbuf = kmalloc(sizeof(*sbuf), GFP_KERNEL);
+	if (!sbuf)
+		return -ENOMEM;
+
+	ret = trace_user_fault_init(&sbuf->buf, SYSCALL_FAULT_BUF_SZ);
+	if (ret < 0) {
+		kfree(sbuf);
+		return ret;
+	}
+
+	WRITE_ONCE(syscall_buffer, sbuf);
+
+	return 0;
+}
+
+static void rcu_free_syscall_buffer(struct rcu_head *rcu)
+{
+	struct syscall_user_buffer *sbuf =
+		container_of(rcu, struct syscall_user_buffer, rcu);
+
+	trace_user_fault_destroy(&sbuf->buf);
+	kfree(sbuf);
+}
+
+
+static void syscall_fault_buffer_disable(void)
+{
+	struct syscall_user_buffer *sbuf = syscall_buffer;
+
+	lockdep_assert_held(&syscall_trace_lock);
+
+	if (trace_user_fault_put(&sbuf->buf))
+		return;
+
+	WRITE_ONCE(syscall_buffer, NULL);
+	call_rcu_tasks_trace(&sbuf->rcu, rcu_free_syscall_buffer);
+}
+
+static int syscall_copy_user(char *buf, const char __user *ptr,
+			     size_t size, void *data)
+{
+	unsigned long *ret_size = data;
+	int ret;
+
+	ret = strncpy_from_user(buf, ptr, size);
+	if (ret < 0)
+		return 1;
+	*ret_size = ret;
+	return 0;
+}
+
+static char *sys_fault_user(struct syscall_metadata *sys_data,
+			    struct syscall_user_buffer *sbuf,
+			    unsigned long *args, unsigned int *data_size)
+{
+	unsigned long size = SYSCALL_FAULT_BUF_SZ - 1;
+	unsigned long mask = sys_data->user_mask;
+	int idx = ffs(mask) - 1;
+	char *ptr;
+	char *buf;
+
+	/* Get the pointer to user space memory to read */
+	ptr = (char *)args[idx];
+	*data_size = 0;
+
+	buf = trace_user_fault_read(&sbuf->buf, ptr, size,
+				    syscall_copy_user, &size);
+	if (!buf)
+		return NULL;
+
+	/* Replace any non-printable characters with '.' */
+	for (int i = 0; i < size; i++) {
+		if (!isprint(buf[i]))
+			buf[i] = '.';
+	}
+
+	/*
+	 * If the text was truncated due to our max limit, add "..." to
+	 * the string.
+	 */
+	if (size > SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA)) {
+		strscpy(buf + SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA),
+			EXTRA, sizeof(EXTRA));
+		size = SYSCALL_FAULT_BUF_SZ;
+	} else {
+		buf[size++] = '\0';
+	}
+
+	*data_size = size;
+	return buf;
+}
+
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
 	struct trace_array *tr = data;
@@ -302,15 +476,17 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	struct syscall_metadata *sys_data;
 	struct trace_event_buffer fbuffer;
 	unsigned long args[6];
+	char *user_ptr;
+	int user_size = 0;
 	int syscall_nr;
-	int size;
+	int size = 0;
+	bool mayfault;
 
 	/*
 	 * Syscall probe called with preemption enabled, but the ring
 	 * buffer and per-cpu data require preemption to be disabled.
 	 */
 	might_fault();
-	guard(preempt_notrace)();
 
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
@@ -327,7 +503,32 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (!sys_data)
 		return;
 
-	size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
+	/* Check if this syscall event faults in user space memory */
+	mayfault = sys_data->user_mask != 0;
+
+	guard(preempt_notrace)();
+
+	syscall_get_arguments(current, regs, args);
+
+	if (mayfault) {
+		struct syscall_user_buffer *sbuf;
+
+		/* If the syscall_buffer is NULL, tracing is being shutdown */
+		sbuf = READ_ONCE(syscall_buffer);
+		if (!sbuf)
+			return;
+
+		user_ptr = sys_fault_user(sys_data, sbuf, args, &user_size);
+		/*
+		 * user_size is the amount of data to append.
+		 * Need to add 4 for the meta field that points to
+		 * the user memory at the end of the event and also
+		 * stores its size.
+		 */
+		size = 4 + user_size;
+	}
+
+	size += sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
 
 	entry = trace_event_buffer_reserve(&fbuffer, trace_file, size);
 	if (!entry)
@@ -335,9 +536,36 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 
 	entry = ring_buffer_event_data(fbuffer.event);
 	entry->nr = syscall_nr;
-	syscall_get_arguments(current, regs, args);
+
 	memcpy(entry->args, args, sizeof(unsigned long) * sys_data->nb_args);
 
+	if (mayfault) {
+		void *ptr;
+		int val;
+
+		/*
+		 * Set the pointer to point to the meta data of the event
+		 * that has information about the stored user space memory.
+		 */
+		ptr = (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args;
+
+		/*
+		 * The meta data will store the offset of the user data from
+		 * the beginning of the event.
+		 */
+		val = (ptr - (void *)entry) + 4;
+
+		/* Store the offset and the size into the meta data */
+		*(int *)ptr = val | (user_size << 16);
+
+		/* Nothing to do if the user space was empty or faulted */
+		if (user_size) {
+			/* Now store the user space data into the event */
+			ptr += 4;
+			memcpy(ptr, user_ptr, user_size);
+		}
+	}
+
 	trace_event_buffer_commit(&fbuffer);
 }
 
@@ -386,39 +614,50 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 static int reg_event_syscall_enter(struct trace_event_file *file,
 				   struct trace_event_call *call)
 {
+	struct syscall_metadata *sys_data = call->data;
 	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	num = sys_data->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return -ENOSYS;
-	mutex_lock(&syscall_trace_lock);
-	if (!tr->sys_refcount_enter)
+	guard(mutex)(&syscall_trace_lock);
+	if (sys_data->user_mask) {
+		ret = syscall_fault_buffer_enable();
+		if (ret < 0)
+			return ret;
+	}
+	if (!tr->sys_refcount_enter) {
 		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
-	if (!ret) {
-		WRITE_ONCE(tr->enter_syscall_files[num], file);
-		tr->sys_refcount_enter++;
+		if (ret < 0) {
+			if (sys_data->user_mask)
+				syscall_fault_buffer_disable();
+			return ret;
+		}
 	}
-	mutex_unlock(&syscall_trace_lock);
-	return ret;
+	WRITE_ONCE(tr->enter_syscall_files[num], file);
+	tr->sys_refcount_enter++;
+	return 0;
 }
 
 static void unreg_event_syscall_enter(struct trace_event_file *file,
 				      struct trace_event_call *call)
 {
+	struct syscall_metadata *sys_data = call->data;
 	struct trace_array *tr = file->tr;
 	int num;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	num = sys_data->syscall_nr;
 	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
 		return;
-	mutex_lock(&syscall_trace_lock);
+	guard(mutex)(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
 	WRITE_ONCE(tr->enter_syscall_files[num], NULL);
if (!tr->sys_refcount_enter) unregister_trace_sys_enter(ftrace_syscall_enter, tr); - mutex_unlock(&syscall_trace_lock); + if (sys_data->user_mask) + syscall_fault_buffer_disable(); } =20 static int reg_event_syscall_exit(struct trace_event_file *file, @@ -459,6 +698,163 @@ static void unreg_event_syscall_exit(struct trace_eve= nt_file *file, mutex_unlock(&syscall_trace_lock); } =20 +/* + * For system calls that reference user space memory that can + * be recorded into the event, set the system call meta data's user_mask + * to the "args" index that points to the user space memory to retrieve. + */ +static void check_faultable_syscall(struct trace_event_call *call, int nr) +{ + struct syscall_metadata *sys_data =3D call->data; + + /* Only work on entry */ + if (sys_data->enter_event !=3D call) + return; + + switch (nr) { + /* user arg at position 0 */ +#ifdef __NR_access + case __NR_access: +#endif + case __NR_acct: + case __NR_add_key: /* Just _type. TODO add _description */ + case __NR_chdir: +#ifdef __NR_chown + case __NR_chown: +#endif +#ifdef __NR_chmod + case __NR_chmod: +#endif + case __NR_chroot: +#ifdef __NR_creat + case __NR_creat: +#endif + case __NR_delete_module: + case __NR_execve: + case __NR_fsopen: + case __NR_getxattr: /* Just pathname, TODO add name */ +#ifdef __NR_lchown + case __NR_lchown: +#endif + case __NR_lgetxattr: /* Just pathname, TODO add name */ + case __NR_lremovexattr: /* Just pathname, TODO add name */ +#ifdef __NR_link + case __NR_link: /* Just oldname. 
TODO add newname */ +#endif + case __NR_listxattr: /* Just pathname, TODO add list */ + case __NR_llistxattr: /* Just pathname, TODO add list */ + case __NR_lsetxattr: /* Just pathname, TODO add list */ +#ifdef __NR_open + case __NR_open: +#endif + case __NR_memfd_create: + case __NR_mount: /* Just dev_name, TODO add dir_name and type */ +#ifdef __NR_mkdir + case __NR_mkdir: +#endif +#ifdef __NR_mknod + case __NR_mknod: +#endif + case __NR_mq_open: + case __NR_mq_unlink: + case __NR_pivot_root: /* Just new_root, TODO add old_root */ +#ifdef __NR_readlink + case __NR_readlink: +#endif + case __NR_removexattr: /* Just pathname, TODO add name */ +#ifdef __NR_rename + case __NR_rename: /* Just oldname. TODO add newname */ +#endif + case __NR_request_key: /* Just _type. TODO add _description */ +#ifdef __NR_rmdir + case __NR_rmdir: +#endif + case __NR_setxattr: /* Just pathname, TODO add list */ + case __NR_shmdt: +#ifdef __NR_statfs + case __NR_statfs: +#endif + case __NR_swapon: + case __NR_swapoff: +#ifdef __NR_symlink + case __NR_symlink: /* Just oldname. TODO add newname */ +#endif +#ifdef __NR_truncate + case __NR_truncate: +#endif +#ifdef __NR_unlink + case __NR_unlink: +#endif + case __NR_umount2: +#ifdef __NR_utime + case __NR_utime: +#endif +#ifdef __NR_utimes + case __NR_utimes: +#endif + sys_data->user_mask =3D BIT(0); + break; + /* user arg at position 1 */ + case __NR_execveat: + case __NR_faccessat: + case __NR_faccessat2: + case __NR_finit_module: + case __NR_fchmodat: + case __NR_fchmodat2: + case __NR_fchownat: + case __NR_fgetxattr: + case __NR_flistxattr: + case __NR_fsetxattr: + case __NR_fspick: + case __NR_fremovexattr: +#ifdef __NR_futimesat + case __NR_futimesat: +#endif + case __NR_getxattrat: /* Just pathname, TODO add name */ + case __NR_inotify_add_watch: + case __NR_linkat: /* Just oldname. 
TODO add newname */ + case __NR_listxattrat: /* Just pathname, TODO add list */ + case __NR_mkdirat: + case __NR_mknodat: + case __NR_mount_setattr: + case __NR_move_mount: /* Just from_pathname, TODO add to_pathname */ + case __NR_name_to_handle_at: +#ifdef __NR_newfstatat + case __NR_newfstatat: +#endif + case __NR_openat: + case __NR_openat2: + case __NR_open_tree: + case __NR_open_tree_attr: + case __NR_readlinkat: +#ifdef __NR_renameat + case __NR_renameat: /* Just oldname. TODO add newname */ +#endif + case __NR_renameat2: /* Just oldname. TODO add newname */ + case __NR_removexattrat: /* Just pathname, TODO add name */ + case __NR_quotactl: + case __NR_setxattrat: /* Just pathname, TODO add list */ + case __NR_syslog: + case __NR_symlinkat: /* Just oldname. TODO add newname */ + case __NR_statx: + case __NR_unlinkat: + case __NR_utimensat: + sys_data->user_mask =3D BIT(1); + break; + /* user arg at position 2 */ + case __NR_init_module: + case __NR_fsconfig: + sys_data->user_mask =3D BIT(2); + break; + /* user arg at position 4 */ + case __NR_fanotify_mark: + sys_data->user_mask =3D BIT(4); + break; + default: + sys_data->user_mask =3D 0; + } +} + static int __init init_syscall_trace(struct trace_event_call *call) { int id; @@ -471,6 +867,8 @@ static int __init init_syscall_trace(struct trace_event= _call *call) return -ENOSYS; } =20 + check_faultable_syscall(call, num); + if (set_syscall_print_fmt(call) < 0) return -ENOMEM; =20 --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B05A83596FB; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693068; cv=none; 
b=VGoKlucgTFUr1jqSBhu8yoOtbKXDLbkVHERbvw9NWqpmWr5xHjYBym+NAYGFrCwlSopIiQibBH1gjbUzS90C7nl+/mnsv8Vwp5vqciMvvtvTIMillPwTddQtjUfKNtslH45qMsx1sXDN0Xr4cUP5qXm6QNjuN+z+bzJOYaSLauo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693068; c=relaxed/simple; bh=WLsBwXREdr0DyUqjdCg2PWKUh9JaCW/A96J4RdAFlf4=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=CvPZljyF+nMyux8Xj9lC7bdQOKEat04SfAM14XlHoR3iJhXAtkpwlKQKHtq4y3yF7PbjbsEoNTX7K90z4KXxlWRBREpKxzfNJxL9Qg3WYDQ58rj1YWqjGbMrp5+kh/3RKRXA+XH7wtfdii9sNq9CPXHR/drGFRjuzhQA2npk68A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=p6ep1J5R; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="p6ep1J5R" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4954AC116B1; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693068; bh=WLsBwXREdr0DyUqjdCg2PWKUh9JaCW/A96J4RdAFlf4=; h=Date:From:To:Cc:Subject:References:From; b=p6ep1J5RFKLOizMslwbFGSSaCij/hxdsUSEToSe+eexDfZASXEs4h096u5BdA3EIg m0+u4N3i2eIGQUhFdFyPbckQ1gHl5hUbRzhmefun5sRbmFEtt4UePG43TA4f5e5qLM zLBlUU6eir3pOiYV+ND6vA5trK4RFbLJgzcVdPKIGzKC1B/NVf7jD0biC9g+icbOv5 6MAOnlrD0cl9mtAcFVm/tKTwtDPrn113rZqpZoxMOZBaytSuj0VNltUIPug7UHNXyr /lQse05rm/v+4FhbX0/CDE9vP82EIcfeTFHkHHqjVAp3nky6B3SA51FPTpKArOWSzJ +fM8/aJifTU9g== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqx-00000004qqD-2PPU; Tue, 28 Oct 2025 19:11:47 -0400 Message-ID: <20251028231147.429583335@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:17 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , 
Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 03/13] perf: tracing: Simplify perf_sysenter_enable/disable() with guards References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Use guard(mutex)(&syscall_trace_lock) for perf_sysenter_enable() and perf_sysenter_disable() as well as for the perf_sysexit_enable() and perf_sysexit_disable(). This will make it easier to update these functions with other code that has early exit handling. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_syscalls.c | 48 ++++++++++++++++------------------- 1 file changed, 22 insertions(+), 26 deletions(-) diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index 528ac90eda5d..42d066d8c0ab 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -1049,22 +1049,21 @@ static void perf_syscall_enter(void *ignore, struct= pt_regs *regs, long id) =20 static int perf_sysenter_enable(struct trace_event_call *call) { - int ret =3D 0; int num; =20 num =3D ((struct syscall_metadata *)call->data)->syscall_nr; =20 - mutex_lock(&syscall_trace_lock); - if (!sys_perf_refcount_enter) - ret =3D register_trace_sys_enter(perf_syscall_enter, NULL); - if (ret) { - pr_info("event trace: Could not activate syscall entry trace point"); - } else { - set_bit(num, enabled_perf_enter_syscalls); - sys_perf_refcount_enter++; + guard(mutex)(&syscall_trace_lock); + if (!sys_perf_refcount_enter) { + int ret =3D register_trace_sys_enter(perf_syscall_enter, NULL); + if (ret) { + pr_info("event trace: Could not activate syscall entry trace point"); + return ret; + } } - 
mutex_unlock(&syscall_trace_lock); - return ret; + set_bit(num, enabled_perf_enter_syscalls); + sys_perf_refcount_enter++; + return 0; } =20 static void perf_sysenter_disable(struct trace_event_call *call) @@ -1073,12 +1072,11 @@ static void perf_sysenter_disable(struct trace_even= t_call *call) =20 num =3D ((struct syscall_metadata *)call->data)->syscall_nr; =20 - mutex_lock(&syscall_trace_lock); + guard(mutex)(&syscall_trace_lock); sys_perf_refcount_enter--; clear_bit(num, enabled_perf_enter_syscalls); if (!sys_perf_refcount_enter) unregister_trace_sys_enter(perf_syscall_enter, NULL); - mutex_unlock(&syscall_trace_lock); } =20 static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_reg= s *regs, @@ -1155,22 +1153,21 @@ static void perf_syscall_exit(void *ignore, struct = pt_regs *regs, long ret) =20 static int perf_sysexit_enable(struct trace_event_call *call) { - int ret =3D 0; int num; =20 num =3D ((struct syscall_metadata *)call->data)->syscall_nr; =20 - mutex_lock(&syscall_trace_lock); - if (!sys_perf_refcount_exit) - ret =3D register_trace_sys_exit(perf_syscall_exit, NULL); - if (ret) { - pr_info("event trace: Could not activate syscall exit trace point"); - } else { - set_bit(num, enabled_perf_exit_syscalls); - sys_perf_refcount_exit++; + guard(mutex)(&syscall_trace_lock); + if (!sys_perf_refcount_exit) { + int ret =3D register_trace_sys_exit(perf_syscall_exit, NULL); + if (ret) { + pr_info("event trace: Could not activate syscall exit trace point"); + return ret; + } } - mutex_unlock(&syscall_trace_lock); - return ret; + set_bit(num, enabled_perf_exit_syscalls); + sys_perf_refcount_exit++; + return 0; } =20 static void perf_sysexit_disable(struct trace_event_call *call) @@ -1179,12 +1176,11 @@ static void perf_sysexit_disable(struct trace_event= _call *call) =20 num =3D ((struct syscall_metadata *)call->data)->syscall_nr; =20 - mutex_lock(&syscall_trace_lock); + guard(mutex)(&syscall_trace_lock); sys_perf_refcount_exit--; clear_bit(num, 
enabled_perf_exit_syscalls); if (!sys_perf_refcount_exit) unregister_trace_sys_exit(perf_syscall_exit, NULL); - mutex_unlock(&syscall_trace_lock); } =20 #endif /* CONFIG_PERF_EVENTS */ --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7ECD36A61E; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=nvQnfzt0HGojtLX+qYc2whMzwzQAjQ5VoRWaSbmdWaFKRYVvk/hwFV8wq5gFl9CLQSjWbkgsx/ENh+VTFJb99mG0KUaiQuKMmHok9BL6AECfG0/f0aDmQGSwwZ68jkgIDTx7UUAQTAgioFiDsjIOKoVNwToEkipaPlyG2eBDJoo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; c=relaxed/simple; bh=ctlTeGXqwpdvN57n2osWjQggasajNrzURKXjqnKy6XI=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=nWYLqwL/XXvf/mI0CkR7B5kwPUGCAUvJDgD/sVdIDr0tmFTf4pKNUd8t8P5SDYM3pYWdUZaWvNX9j8t8C7iRYnyY72DD6Jz2TlFJkmR4y+KRKOARkZTwZ+zcxHIzYAzXahDMeF8CN7SQGL4K8iaSiC1x7b+q+Or+N5ygbNOaX7g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iu/fSy8a; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iu/fSy8a" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 906D9C19422; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693068; bh=ctlTeGXqwpdvN57n2osWjQggasajNrzURKXjqnKy6XI=; h=Date:From:To:Cc:Subject:References:From; b=iu/fSy8a7SvoLO/KSu0CP8Vp090US5MkRIw5tJEUVQdXnoUJm5i771XWXQEbrfe1U 
X9gsKKTLUuO33CzeL12Nd7k2jnvBYrzfpOt2ZzwYYcuN9iHQ6HX7p3fJGgn0BTUpYU 0F2ELXJVabZPGH45Xrx24tm+TOtazBg5gRC6R+SkcMAC19omj8/K1LxF5NYbY4DTq1 fCJRzlq/68fAeztmhfBnFSAbj8Z8Uaf6P3LvpnL4EgcqnTmq1TVw7yV+ST3H39vDXW +GeZYP1RvGIjIdu/cOrcS89R0OwmXAsfqYwrqxfloxHvQP4V8w+xo/j91MOfG3KbyE kMffhrbLKZbZA== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqx-00000004qqh-37go; Tue, 28 Oct 2025 19:11:47 -0400 Message-ID: <20251028231147.593925979@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:18 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 04/13] perf: tracing: Have perf system calls read user space References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Allow some of the system call events to read user space buffers. Instead of just showing the pointer into user space, allow perf events to also record the content of those pointers. For example: # perf record -e syscalls:sys_enter_openat ls /usr/bin [..] 
# perf script
ls 1024 [005] 52.902721: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbae321c "/etc/ld.so.cache", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.902899: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaae140 "/lib/x86_64-linux-gnu/libselinux.so.1", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.903471: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaae690 "/lib/x86_64-linux-gnu/libcap.so.2", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.903946: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaaebe0 "/lib/x86_64-linux-gnu/libc.so.6", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.904629: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaaf110 "/lib/x86_64-linux-gnu/libpcre2-8.so.0", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.906985: syscalls:sys_enter_openat: dfd: 0xffffffffffffff9c, filename: 0x7fc1dba92904 "/proc/filesystems", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.907323: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dba19490 "/usr/lib/locale/locale-archive", flags: 0x00080000, mode: 0x00000000
ls 1024 [005] 52.907746: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x556fb888dcd0 "/usr/bin", flags: 0x00090800, mode: 0x00000000

Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace_syscalls.c | 136 ++++++++++++++++++++++------------
 1 file changed, 90 insertions(+), 46 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 42d066d8c0ab..ed9332f8bdf8 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -468,6 +468,58 @@ static char *sys_fault_user(struct syscall_metadata *sys_data,
 	return buf;
 }
 
+static int
+syscall_get_data(struct syscall_metadata *sys_data, unsigned long *args,
+		 char **buffer, int *size, int *user_size)
+{
+	struct syscall_user_buffer *sbuf;
+
+	/* If the syscall_buffer is NULL,
tracing is being shutdown */ + sbuf =3D READ_ONCE(syscall_buffer); + if (!sbuf) + return -1; + + *buffer =3D sys_fault_user(sys_data, sbuf, args, user_size); + /* + * user_size is the amount of data to append. + * Need to add 4 for the meta field that points to + * the user memory at the end of the event and also + * stores its size. + */ + *size =3D 4 + *user_size; + return 0; +} + +static void syscall_put_data(struct syscall_metadata *sys_data, + struct syscall_trace_enter *entry, + char *buffer, int size) +{ + void *ptr; + int val; + + /* + * Set the pointer to point to the meta data of the event + * that has information about the stored user space memory. + */ + ptr =3D (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args; + + /* + * The meta data will store the offset of the user data from + * the beginning of the event. + */ + val =3D (ptr - (void *)entry) + 4; + + /* Store the offset and the size into the meta data */ + *(int *)ptr =3D val | (size << 16); + + /* Nothing to do if the user space was empty or faulted */ + if (size) { + /* Now store the user space data into the event */ + ptr +=3D 4; + memcpy(ptr, buffer, size); + } +} + static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id) { struct trace_array *tr =3D data; @@ -511,21 +563,9 @@ static void ftrace_syscall_enter(void *data, struct pt= _regs *regs, long id) syscall_get_arguments(current, regs, args); =20 if (mayfault) { - struct syscall_user_buffer *sbuf; - - /* If the syscall_buffer is NULL, tracing is being shutdown */ - sbuf =3D READ_ONCE(syscall_buffer); - if (!sbuf) + if (syscall_get_data(sys_data, args, &user_ptr, + &size, &user_size) < 0) return; - - user_ptr =3D sys_fault_user(sys_data, sbuf, args, &user_size); - /* - * user_size is the amount of data to append. - * Need to add 4 for the meta field that points to - * the user memory at the end of the event and also - * stores its size. 
- */ - size =3D 4 + user_size; } =20 size +=3D sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args; @@ -539,32 +579,8 @@ static void ftrace_syscall_enter(void *data, struct pt= _regs *regs, long id) =20 memcpy(entry->args, args, sizeof(unsigned long) * sys_data->nb_args); =20 - if (mayfault) { - void *ptr; - int val; - - /* - * Set the pointer to point to the meta data of the event - * that has information about the stored user space memory. - */ - ptr =3D (void *)entry->args + sizeof(unsigned long) * sys_data->nb_args; - - /* - * The meta data will store the offset of the user data from - * the beginning of the event. - */ - val =3D (ptr - (void *)entry) + 4; - - /* Store the offset and the size into the meta data */ - *(int *)ptr =3D val | (user_size << 16); - - /* Nothing to do if the user space was empty or faulted */ - if (user_size) { - /* Now store the user space data into the event */ - ptr +=3D 4; - memcpy(ptr, user_ptr, user_size); - } - } + if (mayfault) + syscall_put_data(sys_data, entry, user_ptr, user_size); =20 trace_event_buffer_commit(&fbuffer); } @@ -996,9 +1012,12 @@ static void perf_syscall_enter(void *ignore, struct p= t_regs *regs, long id) struct hlist_head *head; unsigned long args[6]; bool valid_prog_array; + bool mayfault; + char *user_ptr; int syscall_nr; + int user_size; int rctx; - int size; + int size =3D 0; =20 /* * Syscall probe called with preemption enabled, but the ring @@ -1017,13 +1036,24 @@ static void perf_syscall_enter(void *ignore, struct= pt_regs *regs, long id) if (!sys_data) return; =20 + syscall_get_arguments(current, regs, args); + + /* Check if this syscall event faults in user space memory */ + mayfault =3D sys_data->user_mask !=3D 0; + + if (mayfault) { + if (syscall_get_data(sys_data, args, &user_ptr, + &size, &user_size) < 0) + return; + } + head =3D this_cpu_ptr(sys_data->enter_event->perf_events); valid_prog_array =3D bpf_prog_array_valid(sys_data->enter_event); if (!valid_prog_array && hlist_empty(head)) 
return; =20 /* get the size after alignment with the u32 buffer size field */ - size =3D sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec); + size +=3D sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec); size =3D ALIGN(size + sizeof(u32), sizeof(u64)); size -=3D sizeof(u32); =20 @@ -1032,9 +1062,11 @@ static void perf_syscall_enter(void *ignore, struct = pt_regs *regs, long id) return; =20 rec->nr =3D syscall_nr; - syscall_get_arguments(current, regs, args); memcpy(&rec->args, args, sizeof(unsigned long) * sys_data->nb_args); =20 + if (mayfault) + syscall_put_data(sys_data, rec, user_ptr, user_size); + if ((valid_prog_array && !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec= )) || hlist_empty(head)) { @@ -1049,15 +1081,24 @@ static void perf_syscall_enter(void *ignore, struct= pt_regs *regs, long id) =20 static int perf_sysenter_enable(struct trace_event_call *call) { + struct syscall_metadata *sys_data =3D call->data; int num; + int ret; =20 - num =3D ((struct syscall_metadata *)call->data)->syscall_nr; + num =3D sys_data->syscall_nr; =20 guard(mutex)(&syscall_trace_lock); + if (sys_data->user_mask) { + ret =3D syscall_fault_buffer_enable(); + if (ret < 0) + return ret; + } if (!sys_perf_refcount_enter) { - int ret =3D register_trace_sys_enter(perf_syscall_enter, NULL); + ret =3D register_trace_sys_enter(perf_syscall_enter, NULL); if (ret) { pr_info("event trace: Could not activate syscall entry trace point"); + if (sys_data->user_mask) + syscall_fault_buffer_disable(); return ret; } } @@ -1068,15 +1109,18 @@ static int perf_sysenter_enable(struct trace_event_= call *call) =20 static void perf_sysenter_disable(struct trace_event_call *call) { + struct syscall_metadata *sys_data =3D call->data; int num; =20 - num =3D ((struct syscall_metadata *)call->data)->syscall_nr; + num =3D sys_data->syscall_nr; =20 guard(mutex)(&syscall_trace_lock); sys_perf_refcount_enter--; clear_bit(num, enabled_perf_enter_syscalls); if 
(!sys_perf_refcount_enter) unregister_trace_sys_enter(perf_syscall_enter, NULL); + if (sys_data->user_mask) + syscall_fault_buffer_disable(); } =20 static int perf_call_bpf_exit(struct trace_event_call *call, struct pt_reg= s *regs, --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7FA236A61F; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=UbLd2Ta4N+C4QlzL2uqa+OA6KeEznoGKs9NAPycYkV6TzmWQ4QyaxCcPXb0BxcysWxMVZY+5iK7FL3JnXarrKJLLCyyNgL5qAj7Z/azF3UHqakY+kGi/7QIhCkQSOokFfpel9cRotTqZZB+DkgVSvdO6GmVZn+//Id9+YCjGVrc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; c=relaxed/simple; bh=xX9mS6YsF8vrUH+44CC7Pk4PJ2IStcEFjc2ZwrOqVM8=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=OejP2FOj9O2fifRy6Hj2Tujtm8sMHSrsel+fFJuljEHQkdWCNq55Z+vhQXj3Ycv/rOQMaINWSukuzxT90Gv3dhEVPQgSzKNPHDR2Ea1HW+q2rdvY858YTnV1zoZbHtjsE3CJUwMreguPDKgA+lEmi99KmbKf/zxHHdztjbubbRA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QgNLY6MM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QgNLY6MM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 94ED5C2BCB0; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693068; bh=xX9mS6YsF8vrUH+44CC7Pk4PJ2IStcEFjc2ZwrOqVM8=; h=Date:From:To:Cc:Subject:References:From; 
b=QgNLY6MM8dw+TW7faONZWgWLwMNDC2P/McVQJtLGo6kDeKhBpVTXH/JUUobexL1PA K/xz734yfA6OSh46ba6SXlHssYRlip5H08z5BA1XkvGvRwY3jKfdQ5Le058EyAhyFu ZFSv6sWbksO4B/ot+dWtlwMIZN6fj+xK2Ngs/UsLD0ExfgXuCmTfYRF60WDH30Aby/ FmCGJg21U7y3x0H8BlUfu/i7oMca7mZG1E7Lu2+K7cpU2hq59yXuYcMj7obhYqc9mE zrcwyH3xrhHdXRJkDkM/UKdGDA1YakTSpfIoPOtDdPUQYA1ZiRzrQ1bwdUCd3sm+9Q /7OmIlPfGd80Q== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqx-00000004qrB-3pLV; Tue, 28 Oct 2025 19:11:47 -0400 Message-ID: <20251028231147.763528474@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:19 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 05/13] tracing: Have system call events record user array data References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt For system call events that have a length field, add a "user_arg_size" parameter to the system call meta data that denotes the index of the args array that holds the size of arg that the user_mask field has a bit set for. The "user_mask" has a bit set that denotes the arg that points to an array in the user space address space and if a system call event has the user_mask field set and the user_arg_size set, it will then record the content of that address into the trace event, up to the size defined by SYSCALL_FAULT_BUF_SZ - 1. 
This allows the output to look like:

  sys_write(fd: 0xa, buf: 0x5646978d13c0 (01:00:05:00:00:00:00:00:01:87:55:89:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00), count: 0x20)

Signed-off-by: Steven Rostedt (Google)
---
 include/trace/syscall.h       |   4 +-
 kernel/trace/trace_syscalls.c | 121 ++++++++++++++++++++++++----------
 2 files changed, 90 insertions(+), 35 deletions(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 85f21ca15a41..9413c139da66 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -16,6 +16,7 @@
  * @name: name of the syscall
  * @syscall_nr: number of the syscall
  * @nb_args: number of parameters it takes
+ * @user_arg_size: holds @arg that has size of the user space to read
  * @user_mask: mask of @args that will read user space
  * @types: list of types as strings
  * @args: list of args as strings (args[i] matches types[i])
@@ -26,7 +27,8 @@
 struct syscall_metadata {
 	const char *name;
 	int syscall_nr;
-	short nb_args;
+	u8 nb_args;
+	s8 user_arg_size;
 	short user_mask;
 	const char **types;
 	const char **args;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index ed9332f8bdf8..3f3cdfc9958e 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -124,7 +124,7 @@ const char *get_syscall_name(int syscall)
 	return entry->name;
 }
 
-/* Added to user strings when max limit is reached */
+/* Added to user strings or arrays when max limit is reached */
 #define EXTRA "..."
=20 static enum print_line_t @@ -136,9 +136,8 @@ print_syscall_enter(struct trace_iterator *iter, int fl= ags, struct trace_entry *ent =3D iter->ent; struct syscall_trace_enter *trace; struct syscall_metadata *entry; - int i, syscall, val; + int i, syscall, val, len; unsigned char *ptr; - int len; =20 trace =3D (typeof(trace))ent; syscall =3D trace->nr; @@ -185,7 +184,23 @@ print_syscall_enter(struct trace_iterator *iter, int f= lags, ptr =3D (void *)ent + (val & 0xffff); len =3D val >> 16; =20 - trace_seq_printf(s, " \"%.*s\"", len, ptr); + if (entry->user_arg_size < 0) { + trace_seq_printf(s, " \"%.*s\"", len, ptr); + continue; + } + + val =3D trace->args[entry->user_arg_size]; + + trace_seq_puts(s, " ("); + for (int x =3D 0; x < len; x++, ptr++) { + if (x) + trace_seq_putc(s, ':'); + trace_seq_printf(s, "%02x", *ptr); + } + if (len < val) + trace_seq_printf(s, ", %s", EXTRA); + + trace_seq_putc(s, ')'); } =20 trace_seq_putc(s, ')'); @@ -250,8 +265,11 @@ __set_enter_print_fmt(struct syscall_metadata *entry, = char *buf, int len) if (!(BIT(i) & entry->user_mask)) continue; =20 - /* Add the format for the user space string */ - pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " \\\"%%s\\\""); + /* Add the format for the user space string or array */ + if (entry->user_arg_size < 0) + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " \\\"%%s\\\""); + else + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " (%%s)"); } pos +=3D snprintf(buf + pos, LEN_OR_ZERO, "\""); =20 @@ -260,9 +278,14 @@ __set_enter_print_fmt(struct syscall_metadata *entry, = char *buf, int len) ", ((unsigned long)(REC->%s))", entry->args[i]); if (!(BIT(i) & entry->user_mask)) continue; - /* The user space string for arg has name ___val */ - pos +=3D snprintf(buf + pos, LEN_OR_ZERO, ", __get_str(__%s_val)", - entry->args[i]); + /* The user space data for arg has name ___val */ + if (entry->user_arg_size < 0) { + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, ", __get_str(__%s_val)", + entry->args[i]); + } else { + pos 
+=3D snprintf(buf + pos, LEN_OR_ZERO, ", __print_dynamic_array(__%s= _val, 1)", + entry->args[i]); + } } =20 #undef LEN_OR_ZERO @@ -333,9 +356,9 @@ static int __init syscall_enter_define_fields(struct tr= ace_event_call *call) idx =3D ffs(mask) - 1; =20 /* - * User space strings are faulted into a temporary buffer and then - * added as a dynamic string to the end of the event. - * The user space string name for the arg pointer is "___val". + * User space data is faulted into a temporary buffer and then + * added as a dynamic string or array to the end of the event. + * The user space data name for the arg pointer is "___val". */ len =3D strlen(meta->args[idx]) + sizeof("___val"); arg =3D kmalloc(len, GFP_KERNEL); @@ -431,9 +454,11 @@ static char *sys_fault_user(struct syscall_metadata *s= ys_data, struct syscall_user_buffer *sbuf, unsigned long *args, unsigned int *data_size) { + trace_user_buf_copy syscall_copy =3D syscall_copy_user; unsigned long size =3D SYSCALL_FAULT_BUF_SZ - 1; unsigned long mask =3D sys_data->user_mask; int idx =3D ffs(mask) - 1; + bool array =3D false; char *ptr; char *buf; =20 @@ -441,27 +466,43 @@ static char *sys_fault_user(struct syscall_metadata *= sys_data, ptr =3D (char *)args[idx]; *data_size =3D 0; =20 + /* + * If this system call event has a size argument, use + * it to define how much of user space memory to read, + * and read it as an array and not a string. + */ + if (sys_data->user_arg_size >=3D 0) { + array =3D true; + size =3D args[sys_data->user_arg_size]; + if (size > SYSCALL_FAULT_BUF_SZ - 1) + size =3D SYSCALL_FAULT_BUF_SZ - 1; + /* use normal copy_from_user() */ + syscall_copy =3D NULL; + } + buf =3D trace_user_fault_read(&sbuf->buf, ptr, size, - syscall_copy_user, &size); + syscall_copy, &size); if (!buf) return NULL; =20 - /* Replace any non-printable characters with '.' */ - for (int i =3D 0; i < size; i++) { - if (!isprint(buf[i])) - buf[i] =3D '.'; - } + /* For strings, replace any non-printable characters with '.' 
*/ + if (!array) { + for (int i =3D 0; i < size; i++) { + if (!isprint(buf[i])) + buf[i] =3D '.'; + } =20 - /* - * If the text was truncated due to our max limit, add "..." to - * the string. - */ - if (size > SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA)) { - strscpy(buf + SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA), - EXTRA, sizeof(EXTRA)); - size =3D SYSCALL_FAULT_BUF_SZ; - } else { - buf[size++] =3D '\0'; + /* + * If the text was truncated due to our max limit, add "..." to + * the string. + */ + if (size > SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA)) { + strscpy(buf + SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA), + EXTRA, sizeof(EXTRA)); + size =3D SYSCALL_FAULT_BUF_SZ; + } else { + buf[size++] =3D '\0'; + } } =20 *data_size =3D size; @@ -492,7 +533,7 @@ syscall_get_data(struct syscall_metadata *sys_data, uns= igned long *args, =20 static void syscall_put_data(struct syscall_metadata *sys_data, struct syscall_trace_enter *entry, - char *buffer, int size) + char *buffer, int size, int user_size) { void *ptr; int val; @@ -510,13 +551,16 @@ static void syscall_put_data(struct syscall_metadata = *sys_data, val =3D (ptr - (void *)entry) + 4; =20 /* Store the offset and the size into the meta data */ - *(int *)ptr =3D val | (size << 16); + *(int *)ptr =3D val | (user_size << 16); + + if (WARN_ON_ONCE((ptr - (void *)entry + user_size) > size)) + user_size =3D 0; =20 /* Nothing to do if the user space was empty or faulted */ - if (size) { + if (user_size) { /* Now store the user space data into the event */ ptr +=3D 4; - memcpy(ptr, buffer, size); + memcpy(ptr, buffer, user_size); } } =20 @@ -580,7 +624,7 @@ static void ftrace_syscall_enter(void *data, struct pt_= regs *regs, long id) memcpy(entry->args, args, sizeof(unsigned long) * sys_data->nb_args); =20 if (mayfault) - syscall_put_data(sys_data, entry, user_ptr, user_size); + syscall_put_data(sys_data, entry, user_ptr, size, user_size); =20 trace_event_buffer_commit(&fbuffer); } @@ -727,7 +771,16 @@ static void check_faultable_syscall(struct 
trace_event= _call *call, int nr) if (sys_data->enter_event !=3D call) return; =20 + sys_data->user_arg_size =3D -1; + switch (nr) { + /* user arg 1 with size arg at 2 */ + case __NR_write: + case __NR_mq_timedsend: + case __NR_pwrite64: + sys_data->user_mask =3D BIT(1); + sys_data->user_arg_size =3D 2; + break; /* user arg at position 0 */ #ifdef __NR_access case __NR_access: @@ -1065,7 +1118,7 @@ static void perf_syscall_enter(void *ignore, struct p= t_regs *regs, long id) memcpy(&rec->args, args, sizeof(unsigned long) * sys_data->nb_args); =20 if (mayfault) - syscall_put_data(sys_data, rec, user_ptr, user_size); + syscall_put_data(sys_data, rec, user_ptr, size, user_size); =20 if ((valid_prog_array && !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec= )) || --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC15E36B961; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=eTj0lL7OjmUTt/3+ZhzK6WK6kPwQS31Aayv3re94B3VGlH0VE5BPBX5B4towe74aI8ta6oZdturWBg+8pZQZwheZjwR1YoIPjDyy5SnYpIfgEl2jcWubumdaHIQ8+/0Vvmman0E1JENDOhkZ5Td1Aadf4zlDGBVK1nCejTBa75w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; c=relaxed/simple; bh=uN+l80Ab7ApPK9S87261XX0pIHui/zk9ztMMRsEW41c=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=GcdSVN8LWfJ7JP9mfmhIhwJZH/1bdHHi9xCQ5hN2sssgUnUJwlB+5vemD7pqqTg+P66xM91TeXc6VzPVWEavMkxDJscPiZ8QnWplp0MB9BSFxKi6kEdapRGQcHgyhM3Kj+ONK+lHV9J3NMjhi9QGOIrGFL+ljji2m4FDY/P+yRg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b=vANUJpgh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="vANUJpgh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BC602C4CEE7; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693068; bh=uN+l80Ab7ApPK9S87261XX0pIHui/zk9ztMMRsEW41c=; h=Date:From:To:Cc:Subject:References:From; b=vANUJpghhlwC/Fl10kmkH/KXl/4cZNjj1jx6gMYBP01H6EN7Rvv54fAtl8aDZl6iy 0tTeKxeL2m8kwmHdPOfFLeWQUVz39wHdg7uyMk0HYZ2Qdj7IGHXgMPCFjZfwtjqls7 UwB9tfZNpl7H+3kHflK/vxZ8qjr+cPw+joMswYJiHh/KCVYaGRfrVTVKr8VpR0wuSN HDTV5ns7hDu2BCQ1/H2cCmsqnRHgYHoxgTP8ZsrnKJT01/uD8FrbOUPh+rSHCdBiFq WJi0BR76RnTihaiFwLEdX6VVnJLydNjP4y9nWlgbU7wX2KvMOqdFY24B1PE5kP/+tP 7dW+i8LfKWlcw== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqy-00000004qrf-0Jqi; Tue, 28 Oct 2025 19:11:48 -0400 Message-ID: <20251028231147.930550359@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:20 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 06/13] tracing: Display some syscall arrays as strings References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Some of the system calls that read a fixed length of memory from the user space address are not arrays but strings. 
Take a bit away from the nb_args field in the syscall meta data to use as a flag to denote that the system call's user_arg_size is being used as a string. The nb_args should never be more than 6, so 7 bits is plenty to hold that number. Add the user_arg_is_str flag that, when set, will display the data array from the user space address as a string and not an array. This will allow the output to look like this: sys_sethostname(name: 0x5584310eb2a0 "debian", len: 6) Signed-off-by: Steven Rostedt (Google) --- Changes since v4: https://lore.kernel.org/20251021005233.251192796@kernel.o= rg - Use #ifdef __NR_kexec_file_load instead to hide that value include/trace/syscall.h | 4 +++- kernel/trace/trace_syscalls.c | 22 +++++++++++++++++++--- 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/trace/syscall.h b/include/trace/syscall.h index 9413c139da66..0dd7f2b33431 100644 --- a/include/trace/syscall.h +++ b/include/trace/syscall.h @@ -16,6 +16,7 @@ * @name: name of the syscall * @syscall_nr: number of the syscall * @nb_args: number of parameters it takes + * @user_arg_is_str: set if the arg for @user_arg_size is a string * @user_arg_size: holds @arg that has size of the user space to read * @user_mask: mask of @args that will read user space * @types: list of types as strings @@ -27,7 +28,8 @@ struct syscall_metadata { const char *name; int syscall_nr; - u8 nb_args; + u8 nb_args:7; + u8 user_arg_is_str:1; s8 user_arg_size; short user_mask; const char **types; diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index 3f3cdfc9958e..b8e9774a8abd 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -184,7 +184,7 @@ print_syscall_enter(struct trace_iterator *iter, int fl= ags, ptr =3D (void *)ent + (val & 0xffff); len =3D val >> 16; =20 - if (entry->user_arg_size < 0) { + if (entry->user_arg_size < 0 || entry->user_arg_is_str) { trace_seq_printf(s, " \"%.*s\"", len, ptr); continue; } @@ -249,6 +249,7 @@ 
print_syscall_exit(struct trace_iterator *iter, int fla= gs, static int __init __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len) { + bool is_string =3D entry->user_arg_is_str; int i; int pos =3D 0; =20 @@ -266,7 +267,7 @@ __set_enter_print_fmt(struct syscall_metadata *entry, c= har *buf, int len) continue; =20 /* Add the format for the user space string or array */ - if (entry->user_arg_size < 0) + if (entry->user_arg_size < 0 || is_string) pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " \\\"%%s\\\""); else pos +=3D snprintf(buf + pos, LEN_OR_ZERO, " (%%s)"); @@ -279,7 +280,7 @@ __set_enter_print_fmt(struct syscall_metadata *entry, c= har *buf, int len) if (!(BIT(i) & entry->user_mask)) continue; /* The user space data for arg has name ___val */ - if (entry->user_arg_size < 0) { + if (entry->user_arg_size < 0 || is_string) { pos +=3D snprintf(buf + pos, LEN_OR_ZERO, ", __get_str(__%s_val)", entry->args[i]); } else { @@ -781,6 +782,21 @@ static void check_faultable_syscall(struct trace_event= _call *call, int nr) sys_data->user_mask =3D BIT(1); sys_data->user_arg_size =3D 2; break; + /* user arg 0 with size arg at 1 as string */ + case __NR_setdomainname: + case __NR_sethostname: + sys_data->user_mask =3D BIT(0); + sys_data->user_arg_size =3D 1; + sys_data->user_arg_is_str =3D 1; + break; +#ifdef __NR_kexec_file_load + /* user arg 4 with size arg at 3 as string */ + case __NR_kexec_file_load: + sys_data->user_mask =3D BIT(4); + sys_data->user_arg_size =3D 3; + sys_data->user_arg_is_str =3D 1; + break; +#endif /* user arg at position 0 */ #ifdef __NR_access case __NR_access: --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8CBE036B995; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=hbi3eIdcuYhfP1+isuuglgBWfv1oXVFRCzlHaUu6ibdiia7w3ksCPS0MNPF63hNXIoeUdDXArGrZ2rL/Wg5mUdCVg+At2o3Wzl3KhEmdC9JVbWnrAOZkSM3zqq5soJUkl0PfHAi/8OIOniiEVvYWSuoByUFVHCq2EIPBdHVn/LQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; c=relaxed/simple; bh=RLJZLUQvp5+TQAFf9/GPVT3D7wyG54pV2Z69yQRZrl4=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=fO/IPRTzWILrcOgy8zezMCp86s4r8h1bPkoZUxGG/D9HIuCkcz7K0EhZ+8AK1EVnIoXWbFADFfpR8fIkCASSCNKZTng9BpxeJ5BbZRem0E5xIHrT0UCwbVzttVMAzWCab7pl6ZWXBGEJoelCtvC/oZWGX3IV4YboKuh1f5JeV/Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CsvyR+Kj; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CsvyR+Kj" Received: by smtp.kernel.org (Postfix) with ESMTPSA id E2F69C19421; Tue, 28 Oct 2025 23:11:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=RLJZLUQvp5+TQAFf9/GPVT3D7wyG54pV2Z69yQRZrl4=; h=Date:From:To:Cc:Subject:References:From; b=CsvyR+Kju9uNX3raBQzCWAZzfuz8zlfFw20a0PtwQENi0fo9oSHNCHy72e+ZUrBbi zsd2NQJTK1U2Q7UEJb6YMSd+lYYGVm2ZOVqqE5i7pTA5KM3yDUSKKkuYlneQOaCzig IrW+Ky12k56AH7hT5Pq4FLBbEqvMLe32IU48+W8QYBRuoHE9dIb9bHp/NhPmBYonKN JG1MXeqAuFca1c1jqgBYAM4EaUwapuuaBQ01b4LSCyag4dBMJeC/GVJHjKMUmcZ9BL 5y2kHHUCrkx7zg2BPQGh0CRBQ1pgCjBWpPYHi55iCdjeIKvCQp/icMAp+bJgM9ocQp C38XSJGN13SDQ== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqy-00000004qs9-10ab; Tue, 28 Oct 2025 19:11:48 -0400 Message-ID: <20251028231148.095789277@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:21 -0400 From: Steven Rostedt To: 
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 07/13] tracing: Allow syscall trace events to read more than one user parameter References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Allow more than one field of a syscall trace event to read user space. Build on top of the user_mask by allowing more than one bit to be set that corresponds to the @args array of the syscall metadata. For each argument in the @args array that is to be read, it will have a dynamic array/string field associated with it. Note that reading multiple fields from user space is not supported if the user_arg_size field is set in the syscall metadata. That field can only be used if only one field is being read from user space as that field is a number representing the size field of the syscall event that holds the size of the data to read from user space. It becomes ambiguous if the system call reads more than one field. Currently this is not an issue. If a syscall event happens to enable two fields to read user space and sets the user_arg_size field, it will trigger a warning at boot and the user_arg_size field will be cleared. The per CPU buffer that is used to read the user space addresses is now broken up into 3 sections, each of 168 bytes. The reason for 168 is that it is the biggest portion of 512 bytes divided by 3 that is 8 byte aligned. The max amount copied into the ring buffer from user space is now only 128 bytes, which is plenty. 
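As a rough illustration of the 168 byte figure above, the arithmetic can be sketched in plain user space C. The macro and function names here (BUF_SZ, MAX_CNT, ALIGN_TO, section_size(), section_offset()) are made up for this sketch and are not the names used by the patch; only the values 512, 3, 8 and 168 come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the numeric values come from the description. */
#define BUF_SZ   512	/* size of the per CPU buffer */
#define MAX_CNT  3	/* number of sections it is split into */
#define ALIGN_TO 8	/* each section is 8 byte aligned */

/* Largest multiple of 'align' that fits 'cnt' times into 'total'. */
static inline size_t section_size(size_t total, size_t cnt, size_t align)
{
	size_t share = total / cnt;	/* 512 / 3 = 170 */

	return share - (share % align);	/* 170 - 2 = 168 */
}

/* Byte offset of section 'i' when every section is 'sz' bytes long. */
static inline size_t section_offset(size_t i, size_t sz)
{
	return i * sz;
}

/* Three 168 byte sections use 504 of the 512 bytes. */
static_assert(3 * 168 <= 512, "three sections must fit in the buffer");
```

With these values, section_size(BUF_SZ, MAX_CNT, ALIGN_TO) gives 168 and the third section starts at offset 336, leaving 8 bytes of the buffer unused.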
When reading user space, it still reads 167 (168-1) bytes and uses the remaining to know if it should append the extra "..." to the end or not. This will allow the event to look like this: sys_renameat2(olddfd: 0xffffff9c, oldname: 0x7ffe02facdff "/tmp/x", newdf= d: 0xffffff9c, newname: 0x7ffe02face06 "/tmp/y", flags: 1) Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_syscalls.c | 337 +++++++++++++++++++++++----------- 1 file changed, 229 insertions(+), 108 deletions(-) diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index b8e9774a8abd..3eafe1b8f53e 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -138,6 +138,7 @@ print_syscall_enter(struct trace_iterator *iter, int fl= ags, struct syscall_metadata *entry; int i, syscall, val, len; unsigned char *ptr; + int offset =3D 0; =20 trace =3D (typeof(trace))ent; syscall =3D trace->nr; @@ -177,12 +178,13 @@ print_syscall_enter(struct trace_iterator *iter, int = flags, continue; =20 /* This arg points to a user space string */ - ptr =3D (void *)trace->args + sizeof(long) * entry->nb_args; + ptr =3D (void *)trace->args + sizeof(long) * entry->nb_args + offset; val =3D *(int *)ptr; =20 /* The value is a dynamic string (len << 16 | offset) */ ptr =3D (void *)ent + (val & 0xffff); len =3D val >> 16; + offset +=3D 4; =20 if (entry->user_arg_size < 0 || entry->user_arg_is_str) { trace_seq_printf(s, " \"%.*s\"", len, ptr); @@ -335,7 +337,6 @@ static int __init syscall_enter_define_fields(struct tr= ace_event_call *call) unsigned long mask; char *arg; int offset =3D offsetof(typeof(trace), args); - int idx; int ret =3D 0; int len; int i; @@ -354,31 +355,56 @@ static int __init syscall_enter_define_fields(struct = trace_event_call *call) return ret; =20 mask =3D meta->user_mask; - idx =3D ffs(mask) - 1; =20 - /* - * User space data is faulted into a temporary buffer and then - * added as a dynamic string or array to the end of the event. 
- * The user space data name for the arg pointer is "___val". - */ - len =3D strlen(meta->args[idx]) + sizeof("___val"); - arg =3D kmalloc(len, GFP_KERNEL); - if (WARN_ON_ONCE(!arg)) { - meta->user_mask =3D 0; - return -ENOMEM; - } + while (mask) { + int idx =3D ffs(mask) - 1; + mask &=3D ~BIT(idx); + + /* + * User space data is faulted into a temporary buffer and then + * added as a dynamic string or array to the end of the event. + * The user space data name for the arg pointer is + * "___val". + */ + len =3D strlen(meta->args[idx]) + sizeof("___val"); + arg =3D kmalloc(len, GFP_KERNEL); + if (WARN_ON_ONCE(!arg)) { + meta->user_mask =3D 0; + return -ENOMEM; + } =20 - snprintf(arg, len, "__%s_val", meta->args[idx]); + snprintf(arg, len, "__%s_val", meta->args[idx]); =20 - ret =3D trace_define_field(call, "__data_loc char[]", - arg, offset, sizeof(int), 0, - FILTER_OTHER); - if (ret) - kfree(arg); + ret =3D trace_define_field(call, "__data_loc char[]", + arg, offset, sizeof(int), 0, + FILTER_OTHER); + if (ret) { + kfree(arg); + break; + } + offset +=3D 4; + } return ret; } =20 +/* + * Create a per CPU temporary buffer to copy user space pointers into. + * + * SYSCALL_FAULT_BUF_SZ holds the size of the per CPU buffer to use + * to copy memory from user space addresses into. + * + * SYSCALL_FAULT_ARG_SZ is the amount to copy from user space. + * + * SYSCALL_FAULT_USER_MAX is the amount to copy into the ring buffer. + * It's slightly smaller than SYSCALL_FAULT_ARG_SZ to know if it + * needs to append the EXTRA or not. + * + * This only allows up to 3 args from system calls. 
+ */ #define SYSCALL_FAULT_BUF_SZ 512 +#define SYSCALL_FAULT_ARG_SZ 168 +#define SYSCALL_FAULT_USER_MAX 128 +#define SYSCALL_FAULT_MAX_CNT 3 =20 /* Use the tracing per CPU buffer infrastructure to copy from user space */ struct syscall_user_buffer { @@ -438,34 +464,58 @@ static void syscall_fault_buffer_disable(void) call_rcu_tasks_trace(&sbuf->rcu, rcu_free_syscall_buffer); } =20 +struct syscall_args { + char *ptr_array[SYSCALL_FAULT_MAX_CNT]; + int read[SYSCALL_FAULT_MAX_CNT]; + int uargs; +}; + static int syscall_copy_user(char *buf, const char __user *ptr, size_t size, void *data) { - unsigned long *ret_size =3D data; + struct syscall_args *args =3D data; + int ret; + + for (int i =3D 0; i < args->uargs; i++, buf +=3D SYSCALL_FAULT_ARG_SZ) { + ptr =3D (char __user *)args->ptr_array[i]; + ret =3D strncpy_from_user(buf, ptr, size); + args->read[i] =3D ret; + } + return 0; +} + +static int syscall_copy_user_array(char *buf, const char __user *ptr, + size_t size, void *data) +{ + struct syscall_args *args =3D data; int ret; =20 - ret =3D strncpy_from_user(buf, ptr, size); - if (ret < 0) - return 1; - *ret_size =3D ret; + for (int i =3D 0; i < args->uargs; i++, buf +=3D SYSCALL_FAULT_ARG_SZ) { + ptr =3D (char __user *)args->ptr_array[i]; + ret =3D __copy_from_user(buf, ptr, size); + args->read[i] =3D ret ? 
-1 : size; + } return 0; } =20 static char *sys_fault_user(struct syscall_metadata *sys_data, struct syscall_user_buffer *sbuf, - unsigned long *args, unsigned int *data_size) + unsigned long *args, + unsigned int data_size[SYSCALL_FAULT_MAX_CNT]) { trace_user_buf_copy syscall_copy =3D syscall_copy_user; - unsigned long size =3D SYSCALL_FAULT_BUF_SZ - 1; unsigned long mask =3D sys_data->user_mask; - int idx =3D ffs(mask) - 1; + unsigned long size =3D SYSCALL_FAULT_ARG_SZ - 1; + struct syscall_args sargs; bool array =3D false; - char *ptr; + char *buffer; char *buf; + int ret; + int i =3D 0; =20 - /* Get the pointer to user space memory to read */ - ptr =3D (char *)args[idx]; - *data_size =3D 0; + /* The extra is appended to the user data in the buffer */ + BUILD_BUG_ON(SYSCALL_FAULT_USER_MAX + sizeof(EXTRA) >=3D + SYSCALL_FAULT_ARG_SZ); =20 /* * If this system call event has a size argument, use @@ -475,67 +525,103 @@ static char *sys_fault_user(struct syscall_metadata = *sys_data, if (sys_data->user_arg_size >=3D 0) { array =3D true; size =3D args[sys_data->user_arg_size]; - if (size > SYSCALL_FAULT_BUF_SZ - 1) - size =3D SYSCALL_FAULT_BUF_SZ - 1; - /* use normal copy_from_user() */ - syscall_copy =3D NULL; + if (size > SYSCALL_FAULT_ARG_SZ - 1) + size =3D SYSCALL_FAULT_ARG_SZ - 1; + syscall_copy =3D syscall_copy_user_array; } =20 - buf =3D trace_user_fault_read(&sbuf->buf, ptr, size, - syscall_copy, &size); - if (!buf) + while (mask) { + int idx =3D ffs(mask) - 1; + mask &=3D ~BIT(idx); + + if (WARN_ON_ONCE(i =3D=3D SYSCALL_FAULT_MAX_CNT)) + break; + + /* Get the pointer to user space memory to read */ + sargs.ptr_array[i++] =3D (char *)args[idx]; + } + + sargs.uargs =3D i; + + /* Clear the values that are not used */ + for (; i < SYSCALL_FAULT_MAX_CNT; i++) { + data_size[i] =3D -1; /* Denotes no pointer */ + } + + buffer =3D trace_user_fault_read(&sbuf->buf, NULL, size, + syscall_copy, &sargs); + if (!buffer) return NULL; =20 - /* For strings, replace any 
non-printable characters with '.' */ - if (!array) { - for (int i =3D 0; i < size; i++) { - if (!isprint(buf[i])) - buf[i] =3D '.'; - } + buf =3D buffer; + for (i =3D 0; i < sargs.uargs; i++, buf +=3D SYSCALL_FAULT_ARG_SZ) { =20 - /* - * If the text was truncated due to our max limit, add "..." to - * the string. - */ - if (size > SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA)) { - strscpy(buf + SYSCALL_FAULT_BUF_SZ - sizeof(EXTRA), - EXTRA, sizeof(EXTRA)); - size =3D SYSCALL_FAULT_BUF_SZ; + ret =3D sargs.read[i]; + if (ret < 0) + continue; + buf[ret] =3D '\0'; + + /* For strings, replace any non-printable characters with '.' */ + if (!array) { + for (int x =3D 0; x < ret; x++) { + if (!isprint(buf[x])) + buf[x] =3D '.'; + } + + /* + * If the text was truncated due to our max limit, + * add "..." to the string. + */ + if (ret > SYSCALL_FAULT_USER_MAX) { + strscpy(buf + SYSCALL_FAULT_USER_MAX, EXTRA, + sizeof(EXTRA)); + ret =3D SYSCALL_FAULT_USER_MAX + sizeof(EXTRA); + } else { + buf[ret++] =3D '\0'; + } } else { - buf[size++] =3D '\0'; + ret =3D min(ret, SYSCALL_FAULT_USER_MAX); } + data_size[i] =3D ret; } =20 - *data_size =3D size; - return buf; + return buffer; } =20 static int syscall_get_data(struct syscall_metadata *sys_data, unsigned long *args, - char **buffer, int *size, int *user_size) + char **buffer, int *size, int *user_sizes, int *uargs) { struct syscall_user_buffer *sbuf; + int i; =20 /* If the syscall_buffer is NULL, tracing is being shutdown */ sbuf =3D READ_ONCE(syscall_buffer); if (!sbuf) return -1; =20 - *buffer =3D sys_fault_user(sys_data, sbuf, args, user_size); + *buffer =3D sys_fault_user(sys_data, sbuf, args, user_sizes); /* * user_size is the amount of data to append. * Need to add 4 for the meta field that points to * the user memory at the end of the event and also * stores its size. 
*/ - *size =3D 4 + *user_size; + for (i =3D 0; i < SYSCALL_FAULT_MAX_CNT; i++) { + if (user_sizes[i] < 0) + break; + *size +=3D user_sizes[i] + 4; + } + /* Save the number of user read arguments of this syscall */ + *uargs =3D i; return 0; } =20 static void syscall_put_data(struct syscall_metadata *sys_data, struct syscall_trace_enter *entry, - char *buffer, int size, int user_size) + char *buffer, int size, int *user_sizes, int uargs) { + char *buf =3D buffer; void *ptr; int val; =20 @@ -547,21 +633,30 @@ static void syscall_put_data(struct syscall_metadata = *sys_data, =20 /* * The meta data will store the offset of the user data from - * the beginning of the event. + * the beginning of the event. That is after the static arguments + * and the meta data fields. */ - val =3D (ptr - (void *)entry) + 4; + val =3D (ptr - (void *)entry) + 4 * uargs; + + for (int i =3D 0; i < uargs; i++) { =20 - /* Store the offset and the size into the meta data */ - *(int *)ptr =3D val | (user_size << 16); + if (i) + val +=3D user_sizes[i - 1]; =20 - if (WARN_ON_ONCE((ptr - (void *)entry + user_size) > size)) - user_size =3D 0; + /* Store the offset and the size into the meta data */ + *(int *)ptr =3D val | (user_sizes[i] << 16); =20 - /* Nothing to do if the user space was empty or faulted */ - if (user_size) { - /* Now store the user space data into the event */ + /* Skip the meta data */ ptr +=3D 4; - memcpy(ptr, buffer, user_size); + } + + for (int i =3D 0; i < uargs; i++, buf +=3D SYSCALL_FAULT_ARG_SZ) { + /* Nothing to do if the user space was empty or faulted */ + if (!user_sizes[i]) + continue; + + memcpy(ptr, buf, user_sizes[i]); + ptr +=3D user_sizes[i]; } } =20 @@ -574,9 +669,10 @@ static void ftrace_syscall_enter(void *data, struct pt= _regs *regs, long id) struct trace_event_buffer fbuffer; unsigned long args[6]; char *user_ptr; - int user_size =3D 0; + int user_sizes[SYSCALL_FAULT_MAX_CNT] =3D {}; int syscall_nr; int size =3D 0; + int uargs =3D 0; bool mayfault; =20 /* 
@@ -609,7 +705,7 @@ static void ftrace_syscall_enter(void *data, struct pt_= regs *regs, long id) =20 if (mayfault) { if (syscall_get_data(sys_data, args, &user_ptr, - &size, &user_size) < 0) + &size, user_sizes, &uargs) < 0) return; } =20 @@ -625,7 +721,7 @@ static void ftrace_syscall_enter(void *data, struct pt_= regs *regs, long id) memcpy(entry->args, args, sizeof(unsigned long) * sys_data->nb_args); =20 if (mayfault) - syscall_put_data(sys_data, entry, user_ptr, size, user_size); + syscall_put_data(sys_data, entry, user_ptr, size, user_sizes, uargs); =20 trace_event_buffer_commit(&fbuffer); } @@ -767,6 +863,7 @@ static void unreg_event_syscall_exit(struct trace_event= _file *file, static void check_faultable_syscall(struct trace_event_call *call, int nr) { struct syscall_metadata *sys_data =3D call->data; + unsigned long mask; =20 /* Only work on entry */ if (sys_data->enter_event !=3D call) @@ -802,7 +899,6 @@ static void check_faultable_syscall(struct trace_event_= call *call, int nr) case __NR_access: #endif case __NR_acct: - case __NR_add_key: /* Just _type. TODO add _description */ case __NR_chdir: #ifdef __NR_chown case __NR_chown: @@ -817,23 +913,13 @@ static void check_faultable_syscall(struct trace_even= t_call *call, int nr) case __NR_delete_module: case __NR_execve: case __NR_fsopen: - case __NR_getxattr: /* Just pathname, TODO add name */ #ifdef __NR_lchown case __NR_lchown: #endif - case __NR_lgetxattr: /* Just pathname, TODO add name */ - case __NR_lremovexattr: /* Just pathname, TODO add name */ -#ifdef __NR_link - case __NR_link: /* Just oldname. 
TODO add newname */ -#endif - case __NR_listxattr: /* Just pathname, TODO add list */ - case __NR_llistxattr: /* Just pathname, TODO add list */ - case __NR_lsetxattr: /* Just pathname, TODO add list */ #ifdef __NR_open case __NR_open: #endif case __NR_memfd_create: - case __NR_mount: /* Just dev_name, TODO add dir_name and type */ #ifdef __NR_mkdir case __NR_mkdir: #endif @@ -842,28 +928,18 @@ static void check_faultable_syscall(struct trace_even= t_call *call, int nr) #endif case __NR_mq_open: case __NR_mq_unlink: - case __NR_pivot_root: /* Just new_root, TODO add old_root */ #ifdef __NR_readlink case __NR_readlink: #endif - case __NR_removexattr: /* Just pathname, TODO add name */ -#ifdef __NR_rename - case __NR_rename: /* Just oldname. TODO add newname */ -#endif - case __NR_request_key: /* Just _type. TODO add _description */ #ifdef __NR_rmdir case __NR_rmdir: #endif - case __NR_setxattr: /* Just pathname, TODO add list */ case __NR_shmdt: #ifdef __NR_statfs case __NR_statfs: #endif case __NR_swapon: case __NR_swapoff: -#ifdef __NR_symlink - case __NR_symlink: /* Just oldname. TODO add newname */ -#endif #ifdef __NR_truncate case __NR_truncate: #endif @@ -895,14 +971,10 @@ static void check_faultable_syscall(struct trace_even= t_call *call, int nr) #ifdef __NR_futimesat case __NR_futimesat: #endif - case __NR_getxattrat: /* Just pathname, TODO add name */ case __NR_inotify_add_watch: - case __NR_linkat: /* Just oldname. TODO add newname */ - case __NR_listxattrat: /* Just pathname, TODO add list */ case __NR_mkdirat: case __NR_mknodat: case __NR_mount_setattr: - case __NR_move_mount: /* Just from_pathname, TODO add to_pathname */ case __NR_name_to_handle_at: #ifdef __NR_newfstatat case __NR_newfstatat: @@ -912,15 +984,8 @@ static void check_faultable_syscall(struct trace_event= _call *call, int nr) case __NR_open_tree: case __NR_open_tree_attr: case __NR_readlinkat: -#ifdef __NR_renameat - case __NR_renameat: /* Just oldname. 
TODO add newname */ -#endif - case __NR_renameat2: /* Just oldname. TODO add newname */ - case __NR_removexattrat: /* Just pathname, TODO add name */ case __NR_quotactl: - case __NR_setxattrat: /* Just pathname, TODO add list */ case __NR_syslog: - case __NR_symlinkat: /* Just oldname. TODO add newname */ case __NR_statx: case __NR_unlinkat: case __NR_utimensat: @@ -935,9 +1000,64 @@ static void check_faultable_syscall(struct trace_even= t_call *call, int nr) case __NR_fanotify_mark: sys_data->user_mask =3D BIT(4); break; + /* 2 user args, 0 and 1 */ + case __NR_add_key: + case __NR_getxattr: + case __NR_lgetxattr: + case __NR_lremovexattr: +#ifdef __NR_link + case __NR_link: +#endif + case __NR_listxattr: + case __NR_llistxattr: + case __NR_lsetxattr: + case __NR_pivot_root: + case __NR_removexattr: +#ifdef __NR_rename + case __NR_rename: +#endif + case __NR_request_key: + case __NR_setxattr: +#ifdef __NR_symlink + case __NR_symlink: +#endif + sys_data->user_mask =3D BIT(0) | BIT(1); + break; + /* 2 user args, 0 and 2 */ + case __NR_symlinkat: + sys_data->user_mask =3D BIT(0) | BIT(2); + break; + /* 2 user args, 1 and 3 */ + case __NR_getxattrat: + case __NR_linkat: + case __NR_listxattrat: + case __NR_move_mount: +#ifdef __NR_renameat + case __NR_renameat: +#endif + case __NR_renameat2: + case __NR_removexattrat: + case __NR_setxattrat: + sys_data->user_mask =3D BIT(1) | BIT(3); + break; + case __NR_mount: /* Just dev_name and dir_name, TODO add type */ + sys_data->user_mask =3D BIT(0) | BIT(1) | BIT(2); + break; default: sys_data->user_mask =3D 0; + return; } + + if (sys_data->user_arg_size < 0) + return; + + /* + * The user_arg_size can only be used when the system call + * is reading only a single address from user space. 
+ */ + mask =3D sys_data->user_mask; + if (WARN_ON(mask & (mask - 1))) + sys_data->user_arg_size =3D -1; } =20 static int __init init_syscall_trace(struct trace_event_call *call) @@ -1083,10 +1203,11 @@ static void perf_syscall_enter(void *ignore, struct= pt_regs *regs, long id) bool valid_prog_array; bool mayfault; char *user_ptr; + int user_sizes[SYSCALL_FAULT_MAX_CNT] =3D {}; int syscall_nr; - int user_size; int rctx; int size =3D 0; + int uargs =3D 0; =20 /* * Syscall probe called with preemption enabled, but the ring @@ -1112,7 +1233,7 @@ static void perf_syscall_enter(void *ignore, struct p= t_regs *regs, long id) =20 if (mayfault) { if (syscall_get_data(sys_data, args, &user_ptr, - &size, &user_size) < 0) + &size, user_sizes, &uargs) < 0) return; } =20 @@ -1134,7 +1255,7 @@ static void perf_syscall_enter(void *ignore, struct p= t_regs *regs, long id) memcpy(&rec->args, args, sizeof(unsigned long) * sys_data->nb_args); =20 if (mayfault) - syscall_put_data(sys_data, rec, user_ptr, size, user_size); + syscall_put_data(sys_data, rec, user_ptr, size, user_sizes, uargs); =20 if ((valid_prog_array && !perf_call_bpf_enter(sys_data->enter_event, fake_regs, sys_data, rec= )) || --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A5A036B996; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=QuXkuAguY7EJeCYQf6MspO0Rwlv2htnHqxvvClYHkrtJsA0UfFn7tP5R7mo7O5CsaofKRM1gz1byyJmTUj+Yui2Cx/13Y8OtrctKB0a4pP7Bw/W7ewBxmUQm7KISSC8R+iXfiMg8ZbVfgYtciLau7Q2wNu3Y9yMuEdL8NboxHGA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; 
c=relaxed/simple; bh=I45Ybulf26+YYRIPFaDSUbcSl4yLjjO42toNC67S6kM=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=jt3IJFrr/3tw2PegmHN094f9EvdBowabKf3GvRhAgIvbFxDsUITP6UdEbxRW/6S+JUOcwrUGlhBTLXeLq0WMVeoBE/z9qODbvrYZcJ1ZeTiRbWiid7g+WVYUTuD2p74vieKzmkfXJTFOW/zfQQWSiyMmzgqn5UEQVuTTQwa3nT8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oNhn5Nu9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oNhn5Nu9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 17FDCC116B1; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=I45Ybulf26+YYRIPFaDSUbcSl4yLjjO42toNC67S6kM=; h=Date:From:To:Cc:Subject:References:From; b=oNhn5Nu9c5tFpg/Ox40G4EnGD2L5lilgOuf+//vqqej+Bb+QCzOMedFu+sVvq9F+v ukyGCOfpW3gB1AkCghvg0zggA0ojwiXx3K3XoCkCmozV643WMkQJOt/rMfNoCPWbRJ 1dxH08ci2zaQ5T5RGsl1psMQjdZD4MT7fihZh22Z+4F8CoJ/ycZzFKQhudmnYvUPkk YeOuiPiXDPbb8nrwaPenJv9mjD+BvDmO+w2OqDbk3ZOWuf2NLLQCUMDoXoKJzOIoU0 ZbmsTGkxRcvp5AEGgdBT5Q8OZE1nAVVqLW/Txw139JTUWhQ5mB6rPtHAKsaxqq2Xr9 ZkWebwv9gRKwQ== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqy-00000004qsd-1iQj; Tue, 28 Oct 2025 19:11:48 -0400 Message-ID: <20251028231148.260068913@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:22 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 08/13] tracing: Add a config and syscall_user_buf_size file to limit amount 
written References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt A system call trace event that copies data from a user space address into the ring buffer can copy up to 511 bytes of data. This can waste precious ring buffer space if the user isn't interested in the output. Add a new file "syscall_user_buf_size" that gets initialized to a new config CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT that defaults to 63. The config is also used to limit how much perf can read from user space. Also lower the max down to 165, as this isn't meant to record everything that a system call may be passing through to the kernel. 165 is more than enough. The reason for 165 is that adding one for the nul terminating byte, plus 4 bytes for possibly appending the "..." string, turns it into 170 bytes. Up to 3 arguments need to be saved, and 3 * 170 is 510, which fits nicely in 512 bytes (a power of 2). Signed-off-by: Steven Rostedt (Google) --- Documentation/trace/ftrace.rst | 8 ++++++ kernel/trace/Kconfig | 14 +++++++++ kernel/trace/trace.c | 52 ++++++++++++++++++++++++++++++++++ kernel/trace/trace.h | 3 ++ kernel/trace/trace_syscalls.c | 50 ++++++++++++++++++-------------- 5 files changed, 105 insertions(+), 22 deletions(-) diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index aef674df3afd..d1f313a5f4ad 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -366,6 +366,14 @@ of ftrace. Here is a list of some of the key files: for each function. The displayed address is the patch-site address and can differ from /proc/kallsyms address. =20 + syscall_user_buf_size: + + Some system call trace events will record the data from a user + space address that one of the parameters point to. The amount of + data per event is limited.
This file holds the max number of bytes + that will be recorded into the ring buffer to hold this data. + The max value is currently 165. + dyn_ftrace_total_info: =20 This file is for debugging purposes. The number of functions that diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index d2c79da81e4f..99283b2dcfd6 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -575,6 +575,20 @@ config FTRACE_SYSCALLS help Basic tracer to catch the syscall entry and exit events. =20 +config TRACE_SYSCALL_BUF_SIZE_DEFAULT + int "System call user read max size" + range 0 165 + default 63 + depends on FTRACE_SYSCALLS + help + Some system call trace events will record the data from a user + space address that one of the parameters point to. The amount of + data per event is limited. That limit is set by this config and + this config also affects how much user space data perf can read. + + For a tracing instance, this size may be changed by writing into + its syscall_user_buf_size file. 
+ config TRACER_SNAPSHOT bool "Create a snapshot trace buffer" select TRACER_MAX_TRACE diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 50832411c5c0..2aee9a3088f4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6911,6 +6911,43 @@ static ssize_t tracing_splice_read_pipe(struct file = *filp, goto out; } =20 +static ssize_t +tracing_syscall_buf_read(struct file *filp, char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + struct inode *inode =3D file_inode(filp); + struct trace_array *tr =3D inode->i_private; + char buf[64]; + int r; + + r =3D snprintf(buf, 64, "%d\n", tr->syscall_buf_sz); + + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); +} + +static ssize_t +tracing_syscall_buf_write(struct file *filp, const char __user *ubuf, + size_t cnt, loff_t *ppos) +{ + struct inode *inode =3D file_inode(filp); + struct trace_array *tr =3D inode->i_private; + unsigned long val; + int ret; + + ret =3D kstrtoul_from_user(ubuf, cnt, 10, &val); + if (ret) + return ret; + + if (val > SYSCALL_FAULT_USER_MAX) + val =3D SYSCALL_FAULT_USER_MAX; + + tr->syscall_buf_sz =3D val; + + *ppos +=3D cnt; + + return cnt; +} + static ssize_t tracing_entries_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos) @@ -8043,6 +8080,14 @@ static const struct file_operations tracing_entries_= fops =3D { .release =3D tracing_release_generic_tr, }; =20 +static const struct file_operations tracing_syscall_buf_fops =3D { + .open =3D tracing_open_generic_tr, + .read =3D tracing_syscall_buf_read, + .write =3D tracing_syscall_buf_write, + .llseek =3D generic_file_llseek, + .release =3D tracing_release_generic_tr, +}; + static const struct file_operations tracing_buffer_meta_fops =3D { .open =3D tracing_buffer_meta_open, .read =3D seq_read, @@ -10145,6 +10190,8 @@ trace_array_create_systems(const char *name, const = char *systems, =20 raw_spin_lock_init(&tr->start_lock); =20 + tr->syscall_buf_sz =3D global_trace.syscall_buf_sz; + tr->max_lock =3D 
(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; #ifdef CONFIG_TRACER_MAX_TRACE spin_lock_init(&tr->snapshot_trigger_lock); @@ -10461,6 +10508,9 @@ init_tracer_tracefs(struct trace_array *tr, struct = dentry *d_tracer) trace_create_file("buffer_subbuf_size_kb", TRACE_MODE_WRITE, d_tracer, tr, &buffer_subbuf_size_fops); =20 + trace_create_file("syscall_user_buf_size", TRACE_MODE_WRITE, d_tracer, + tr, &tracing_syscall_buf_fops); + create_trace_options_dir(tr); =20 #ifdef CONFIG_TRACER_MAX_TRACE @@ -11386,6 +11436,8 @@ __init static int tracer_alloc_buffers(void) =20 global_trace.flags =3D TRACE_ARRAY_FL_GLOBAL; =20 + global_trace.syscall_buf_sz =3D CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT; + INIT_LIST_HEAD(&global_trace.systems); INIT_LIST_HEAD(&global_trace.events); INIT_LIST_HEAD(&global_trace.hist_vars); diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 8439fe3058cc..d5cb4bc6cd2e 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -131,6 +131,8 @@ enum trace_type { #define HIST_STACKTRACE_SIZE (HIST_STACKTRACE_DEPTH * sizeof(unsigned long= )) #define HIST_STACKTRACE_SKIP 5 =20 +#define SYSCALL_FAULT_USER_MAX 165 + /* * syscalls are special, and need special handling, this is why * they are not included in trace_entries.h @@ -430,6 +432,7 @@ struct trace_array { int function_enabled; #endif int no_filter_buffering_ref; + unsigned int syscall_buf_sz; struct list_head hist_vars; #ifdef CONFIG_TRACER_SNAPSHOT struct cond_snapshot *cond_snapshot; diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index 3eafe1b8f53e..a2de6364777a 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -390,21 +390,19 @@ static int __init syscall_enter_define_fields(struct = trace_event_call *call) /* * Create a per CPU temporary buffer to copy user space pointers into. * - * SYSCALL_FAULT_BUF_SZ holds the size of the per CPU buffer to use - * to copy memory from user space addresses into. 
- * - * SYSCALL_FAULT_ARG_SZ is the amount to copy from user space. - * - * SYSCALL_FAULT_USER_MAX is the amount to copy into the ring buffer. - * It's slightly smaller than SYSCALL_FAULT_ARG_SZ to know if it - * needs to append the EXTRA or not. + * SYSCALL_FAULT_USER_MAX is the amount to copy from user space. + * (defined in kernel/trace/trace.h) + + * SYSCALL_FAULT_ARG_SZ is the amount to copy from user space plus the + * nul terminating byte and possibly appended EXTRA (4 bytes). * - * This only allows up to 3 args from system calls. + * SYSCALL_FAULT_BUF_SZ holds the size of the per CPU buffer to use + * to copy memory from user space addresses into that will hold + * 3 args as only 3 args are allowed to be copied from system calls. */ -#define SYSCALL_FAULT_BUF_SZ 512 -#define SYSCALL_FAULT_ARG_SZ 168 -#define SYSCALL_FAULT_USER_MAX 128 +#define SYSCALL_FAULT_ARG_SZ (SYSCALL_FAULT_USER_MAX + 1 + 4) #define SYSCALL_FAULT_MAX_CNT 3 +#define SYSCALL_FAULT_BUF_SZ (SYSCALL_FAULT_ARG_SZ * SYSCALL_FAULT_MAX_CNT) =20 /* Use the tracing per CPU buffer infrastructure to copy from user space */ struct syscall_user_buffer { @@ -498,7 +496,8 @@ static int syscall_copy_user_array(char *buf, const cha= r __user *ptr, return 0; } =20 -static char *sys_fault_user(struct syscall_metadata *sys_data, +static char *sys_fault_user(unsigned int buf_size, + struct syscall_metadata *sys_data, struct syscall_user_buffer *sbuf, unsigned long *args, unsigned int data_size[SYSCALL_FAULT_MAX_CNT]) @@ -548,6 +547,10 @@ static char *sys_fault_user(struct syscall_metadata *s= ys_data, data_size[i] =3D -1; /* Denotes no pointer */ } =20 + /* A zero size means do not even try */ + if (!buf_size) + return NULL; + buffer =3D trace_user_fault_read(&sbuf->buf, NULL, size, syscall_copy, &sargs); if (!buffer) @@ -568,19 +571,20 @@ static char *sys_fault_user(struct syscall_metadata *= sys_data, buf[x] =3D '.'; } =20 + size =3D min(buf_size, SYSCALL_FAULT_USER_MAX); + /* * If the text was truncated 
due to our max limit, * add "..." to the string. */ - if (ret > SYSCALL_FAULT_USER_MAX) { - strscpy(buf + SYSCALL_FAULT_USER_MAX, EXTRA, - sizeof(EXTRA)); - ret =3D SYSCALL_FAULT_USER_MAX + sizeof(EXTRA); + if (ret > size) { + strscpy(buf + size, EXTRA, sizeof(EXTRA)); + ret =3D size + sizeof(EXTRA); } else { buf[ret++] =3D '\0'; } } else { - ret =3D min(ret, SYSCALL_FAULT_USER_MAX); + ret =3D min((unsigned int)ret, buf_size); } data_size[i] =3D ret; } @@ -590,7 +594,8 @@ static char *sys_fault_user(struct syscall_metadata *sy= s_data, =20 static int syscall_get_data(struct syscall_metadata *sys_data, unsigned long *args, - char **buffer, int *size, int *user_sizes, int *uargs) + char **buffer, int *size, int *user_sizes, int *uargs, + int buf_size) { struct syscall_user_buffer *sbuf; int i; @@ -600,7 +605,7 @@ syscall_get_data(struct syscall_metadata *sys_data, uns= igned long *args, if (!sbuf) return -1; =20 - *buffer =3D sys_fault_user(sys_data, sbuf, args, user_sizes); + *buffer =3D sys_fault_user(buf_size, sys_data, sbuf, args, user_sizes); /* * user_size is the amount of data to append. 
* Need to add 4 for the meta field that points to @@ -705,7 +710,7 @@ static void ftrace_syscall_enter(void *data, struct pt_= regs *regs, long id) =20 if (mayfault) { if (syscall_get_data(sys_data, args, &user_ptr, - &size, user_sizes, &uargs) < 0) + &size, user_sizes, &uargs, tr->syscall_buf_sz) < 0) return; } =20 @@ -1204,6 +1209,7 @@ static void perf_syscall_enter(void *ignore, struct p= t_regs *regs, long id) bool mayfault; char *user_ptr; int user_sizes[SYSCALL_FAULT_MAX_CNT] =3D {}; + int buf_size =3D CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT; int syscall_nr; int rctx; int size =3D 0; @@ -1233,7 +1239,7 @@ static void perf_syscall_enter(void *ignore, struct p= t_regs *regs, long id) =20 if (mayfault) { if (syscall_get_data(sys_data, args, &user_ptr, - &size, user_sizes, &uargs) < 0) + &size, user_sizes, &uargs, buf_size) < 0) return; } =20 --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A69D36B997; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=o88b3TDL6XPF/wfbrUG1wb9Agc5hFSCQpWIyQSY5fsLGVYtFcnTUDC6K2e1qR2YFu5PX34pIe5QZYTeBLTh/i37UzY2E2wa88qMk13GECFBrqYC3dLG6a1WEBWFPY/rGiY+7vQtesg65w/FzCCmCfY9VOehPiwSZO9Ip9Jtf5hY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; c=relaxed/simple; bh=wSwArNUKZ0GhTI12DbyC0pJqy9+aiJFgUD1QWSxVO2g=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=EJBTeHYJmaqLdJwz3q4qA4drBQjYmsvzflJFgANDdpmHMZpMlWLGZGbhBzPt0wccYych/gqN5w1vLv7C7DyVuwHRRuFcc23lPM98+r1fpUL8lmbOMxSG2qeckxQxYsTlIM4WnVj5uB1W7wsY3RFUKTFGFOYyrEFB2SKXVRJAbAc= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CmgQ55i3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CmgQ55i3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3E9D8C113D0; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=wSwArNUKZ0GhTI12DbyC0pJqy9+aiJFgUD1QWSxVO2g=; h=Date:From:To:Cc:Subject:References:From; b=CmgQ55i3mcLx4EAODWd0NO0bUXKKg9HDiZlS4ocGVeH2XL33jalnC61Noqs9vSkMt ltIpe7oG163nFtRox8YJmyABDRB3hRAdnU48QJwCSE6+2KlHLmwGhW5BCywXkD+IDb tXCw/j87qQZo5L6xmOdF/+s0x/kj/tMEW/EksiT1/9u1ZfTtacp8Gt4qxGPOIC5jlK /73CvPRNmVJRbeAUrA2WHRYsjcV2hQiy5Iya8MkVX3kTiME2gAIwNXD3aR/HAZWH6y FOIIYRztuYkRlTadcqFXo1nbKUDyEd7HMsUAFU08xugBKXmE2DJtQ11AbRhHTqNSPM lUnXHIn9KxVmg== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqy-00000004qt8-2Phy; Tue, 28 Oct 2025 19:11:48 -0400 Message-ID: <20251028231148.429422865@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:23 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 09/13] tracing: Show printable characters in syscall arrays References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt When displaying the contents of the user space data 
passed to the kernel, instead of just showing the array values, also print any printable content. Instead of just: bash-1113 [003] ..... 3433.290654: sys_write(fd: 2, buf: 0x555a8deedd= b0 (72:6f:6f:74:40:64:65:62:69:61:6e:2d:78:38:36:2d:36:34:3a:7e:23:20), cou= nt: 0x16) Display: bash-1113 [003] ..... 3433.290654: sys_write(fd: 2, buf: 0x555a8deedd= b0 (72:6f:6f:74:40:64:65:62:69:61:6e:2d:78:38:36:2d:36:34:3a:7e:23:20) "roo= t@debian-x86-64:~# ", count: 0x16) This only affects tracing and does not affect perf, as this only updates the output from the kernel. The output from perf is via user space. This may change by an update to libtraceevent that will then update perf to have this as well. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_syscalls.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index a2de6364777a..2d1307f13e13 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -155,6 +155,8 @@ print_syscall_enter(struct trace_iterator *iter, int fl= ags, trace_seq_printf(s, "%s(", entry->name); =20 for (i =3D 0; i < entry->nb_args; i++) { + bool printable =3D false; + char *str; =20 if (trace_seq_has_overflowed(s)) goto end; @@ -193,8 +195,11 @@ print_syscall_enter(struct trace_iterator *iter, int f= lags, =20 val =3D trace->args[entry->user_arg_size]; =20 + str =3D ptr; trace_seq_puts(s, " ("); for (int x =3D 0; x < len; x++, ptr++) { + if (isascii(*ptr) && isprint(*ptr)) + printable =3D true; if (x) trace_seq_putc(s, ':'); trace_seq_printf(s, "%02x", *ptr); @@ -203,6 +208,22 @@ print_syscall_enter(struct trace_iterator *iter, int f= lags, trace_seq_printf(s, ", %s", EXTRA); =20 trace_seq_putc(s, ')'); + + /* If nothing is printable, don't bother printing anything */ + if (!printable) + continue; + + trace_seq_puts(s, " \""); + for (int x =3D 0; x < len; x++) { + if (isascii(str[x]) && isprint(str[x])) + trace_seq_putc(s, str[x]); + 
else + trace_seq_putc(s, '.'); + } + if (len < val) + trace_seq_printf(s, "\"%s", EXTRA); + else + trace_seq_putc(s, '"'); } =20 trace_seq_putc(s, ')'); --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3A2B36B99A; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; cv=none; b=CpUkzS5msRB2bYunYlOUFLC8YvT95jab8rnp5f9GLM2WtapqL0wp/9GN2j0zEu9qSDVLkkJbafp72E90fJAJkJ2MrtJ2rfj/6Pt5xXPPBnqmrFnzszbSisq8f9pQCP+IqIRZkb9ksR8nYcoJkq7emJQKshyjKYsT0l0AQRxVctk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693069; c=relaxed/simple; bh=Lp/sEkMZsE67tXH+Ndy0qtN++BL6+DdO3Q5iet8yKzg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Y9UrdlOArzUH9BymnOl5r73T1JWf/yPkci67mBw3LybgCmaHRrRV1/4D6t7aqHOuEUQsmnu9CA3EymWyTwwVUL9vVpOnUs50xM3SIhAFmFgHSIbICwD11IaXqzTsEVy3hcpGvzXLG17Xw6N312SLjbpMj7PDGIYGXrtWAgFA/H0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=i8krNUSc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="i8krNUSc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 67EDFC19422; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=Lp/sEkMZsE67tXH+Ndy0qtN++BL6+DdO3Q5iet8yKzg=; h=Date:From:To:Cc:Subject:References:From; b=i8krNUScaLziVHZs+ozI3RPNMAAeZZPq0nbXGTrj59pCCA7n3/6dyGnkkR7waj8h9 
F+mIs/z8eOpUyuZrnLJAvYsiv62C+Q59l9z+mumDDI/ImZuPXj2H5aMjRlLRQptkek v/hHurLMtHK3WiEFVdb5BUCRHnXoD4mCOUhN42Qij4ZeXz461m1RKA8FKa5Mt2VBYN GVa+noXqKr96KLAjhIPnsGkYkd9sZ9ZTfPRHdGQH2T895jhWT5pHCuUyEXY0xP8Or0 oeOqHVgyTzMv6bngFbamDy4+vQYys9fStI1oeuCl57kL6QkZY424Uttc3HFbI1xA2j oCpXc0KcwHjhQ== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqy-00000004qtf-37bz; Tue, 28 Oct 2025 19:11:48 -0400 Message-ID: <20251028231148.594898736@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:24 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 10/13] tracing: Add trace_seq_pop() and seq_buf_pop() References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt In order to allow an interface to remove an added character from the trace_seq and seq_buf descriptors, add helper functions trace_seq_pop() and seq_buf_pop(). Signed-off-by: Steven Rostedt (Google) --- include/linux/seq_buf.h | 17 +++++++++++++++++ include/linux/trace_seq.h | 13 +++++++++++++ 2 files changed, 30 insertions(+) diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h index 52791e070506..9f2839e73f8a 100644 --- a/include/linux/seq_buf.h +++ b/include/linux/seq_buf.h @@ -149,6 +149,23 @@ static inline void seq_buf_commit(struct seq_buf *s, i= nt num) } } =20 +/** + * seq_buf_pop - pop off the last written character + * @s: the seq_buf handle + * + * Removes the last written character to the seq_buf @s. 
+ * + * Returns the last character or -1 if it is empty. + */ +static inline int seq_buf_pop(struct seq_buf *s) +{ + if (!s->len) + return -1; + + s->len--; + return (unsigned int)s->buffer[s->len]; +} + extern __printf(2, 3) int seq_buf_printf(struct seq_buf *s, const char *fmt, ...); extern __printf(2, 0) diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h index 557780fe1c77..4a0b8c172d27 100644 --- a/include/linux/trace_seq.h +++ b/include/linux/trace_seq.h @@ -80,6 +80,19 @@ static inline bool trace_seq_has_overflowed(struct trace= _seq *s) return s->full || seq_buf_has_overflowed(&s->seq); } =20 +/** + * trace_seq_pop - pop off the last written character + * @s: trace sequence descriptor + * + * Removes the last written character to the trace_seq @s. + * + * Returns the last character or -1 if it is empty. + */ +static inline int trace_seq_pop(struct trace_seq *s) +{ + return seq_buf_pop(&s->seq); +} + /* * Currently only defined when tracing is enabled. */ --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D977F36C22D; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693070; cv=none; b=I8lVDG49X5WrOD1oiC0hbPxMbvVAE+Lg3jKASvZwNTrlYGv2oRkmMG72pvfl+VsFEkXfyL7HW2V+CeX5PB5NptA7AcfURxEyS/wEQkEydQCnlMam/U3CkqMuvqjFPgP9XuARq78/rVyKDu2KgXRHOPK8owQ5dT9xPZm6j95ObuU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693070; c=relaxed/simple; bh=wMndS1zpOKjDai9i3vF4Mrgfka5EbpCT2G1NvoumHq4=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; 
b=TZtO+/eWUXZ+l8vEI7f+P2vUiurqMNdKvD2cRGKN/xeUfXQnA/p8nQpLBZfz7ueltEWJ8MGPhTzOYNU77jL9JovGoequhjkcGwTxk4DnQGp+lg8AddK02jlHyA6SeBtFQJ9pjirvgCZgsk6ch7ed3CrL3N5/tVluJdrpb/W9tNQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=evt3jJDT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="evt3jJDT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A2AFDC4CEE7; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=wMndS1zpOKjDai9i3vF4Mrgfka5EbpCT2G1NvoumHq4=; h=Date:From:To:Cc:Subject:References:From; b=evt3jJDTSe2z/nQn0SpOdg3Zj6KB6RTSTaRRxh1/02ni5Ogg7fELX0g/QulSz4U8L 42WbN5xdHhqrSoW9ZcDnyPRfrKewFdEWQoEaX11WBCutXSCIJXw1PwsnICDiPMi0hw InOejdvo8NnsI3KDAMWl2Bf4I6Ax6w8VN4pthmt/qzRsk3Xn1EX2cihwKpKaYLDzdi 3wHFfEoqNxG5BZOkqMpMCZ8zcr3HOqpZrhiIniMJthHwvyjvQBESiiiZP8FZtQeTox RROwWvdiK39oLwQJijKHA9XCFdBgIPE8TXnd57dSDZxxENKvsRSOsw7IeJmuOLEBNx NkWFrG4F0fmAA== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqy-00000004qu9-3otn; Tue, 28 Oct 2025 19:11:48 -0400 Message-ID: <20251028231148.763161484@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:25 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 11/13] tracing: Add parsing of flags to the sys_enter_openat trace event References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Add some logic to give the openat system call trace event a bit more human readable information: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7f0053dc121c "/e= tc/ld.so.cache", flags: O_RDONLY|O_CLOEXEC, mode: 0000 The above is output from "perf script" and now shows the flags used by the openat system call. Since the output from tracing is in the kernel, it can also remove the mode field when not used (when flags does not contain O_CREAT or O_TMPFILE): touch-1185 [002] ...1. 1291.690154: sys_openat(dfd: 4294967196, file= name: 139785545139344 "/usr/lib/locale/locale-archive", flags: O_RDONLY|O_C= LOEXEC) touch-1185 [002] ...1. 1291.690504: sys_openat(dfd: 1844674407370955= 1516, filename: 140733603151330 "/tmp/x", flags: O_WRONLY|O_CREAT|O_NOCTTY|= O_NONBLOCK, mode: 0666) As system calls have a fixed ABI, their trace events can be extended. This currently only updates the openat system call, but others may be extended in the future. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_syscalls.c | 192 ++++++++++++++++++++++++++++++++-- 1 file changed, 182 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c index 2d1307f13e13..47d9771e8f7c 100644 --- a/kernel/trace/trace_syscalls.c +++ b/kernel/trace/trace_syscalls.c @@ -127,6 +127,116 @@ const char *get_syscall_name(int syscall) /* Added to user strings or arrays when max limit is reached */ #define EXTRA "..."
=20 +static void get_dynamic_len_ptr(struct syscall_trace_enter *trace, + struct syscall_metadata *entry, + int *offset_p, int *len_p, unsigned char **ptr_p) +{ + unsigned char *ptr; + int offset =3D *offset_p; + int val; + + /* This arg points to a user space string */ + ptr =3D (void *)trace->args + sizeof(long) * entry->nb_args + offset; + val =3D *(int *)ptr; + + /* The value is a dynamic string (len << 16 | offset) */ + ptr =3D (void *)trace + (val & 0xffff); + *len_p =3D val >> 16; + offset +=3D 4; + + *ptr_p =3D ptr; + *offset_p =3D offset; +} + +static enum print_line_t +sys_enter_openat_print(struct syscall_trace_enter *trace, struct syscall_m= etadata *entry, + struct trace_seq *s, struct trace_event *event) +{ + unsigned char *ptr; + int offset =3D 0; + int bits, len; + bool done =3D false; + static const struct trace_print_flags __flags[] =3D + { + { O_TMPFILE, "O_TMPFILE" }, + { O_WRONLY, "O_WRONLY" }, + { O_RDWR, "O_RDWR" }, + { O_CREAT, "O_CREAT" }, + { O_EXCL, "O_EXCL" }, + { O_NOCTTY, "O_NOCTTY" }, + { O_TRUNC, "O_TRUNC" }, + { O_APPEND, "O_APPEND" }, + { O_NONBLOCK, "O_NONBLOCK" }, + { O_DSYNC, "O_DSYNC" }, + { O_DIRECT, "O_DIRECT" }, + { O_LARGEFILE, "O_LARGEFILE" }, + { O_DIRECTORY, "O_DIRECTORY" }, + { O_NOFOLLOW, "O_NOFOLLOW" }, + { O_NOATIME, "O_NOATIME" }, + { O_CLOEXEC, "O_CLOEXEC" }, + { -1, NULL } + }; + + trace_seq_printf(s, "%s(", entry->name); + + for (int i =3D 0; !done && i < entry->nb_args; i++) { + + if (trace_seq_has_overflowed(s)) + goto end; + + if (i) + trace_seq_puts(s, ", "); + + switch (i) { + case 2: + bits =3D trace->args[2]; + + trace_seq_puts(s, "flags: "); + + /* No need to show mode when not creating the file */ + if (!(bits & (O_CREAT|O_TMPFILE))) + done =3D true; + + if (!(bits & O_ACCMODE)) { + if (!bits) { + trace_seq_puts(s, "O_RDONLY"); + continue; + } + trace_seq_puts(s, "O_RDONLY|"); + } + + trace_print_flags_seq(s, "|", bits, __flags); + /* + * trace_print_flags_seq() adds a '\0' to the + * buffer, but this 
needs to append more to the seq. + */ + if (!trace_seq_has_overflowed(s)) + trace_seq_pop(s); + + continue; + case 3: + trace_seq_printf(s, "%s: 0%03o", entry->args[i], + (unsigned int)trace->args[i]); + continue; + } + + trace_seq_printf(s, "%s: %lu", entry->args[i], + trace->args[i]); + + if (!(BIT(i) & entry->user_mask)) + continue; + + get_dynamic_len_ptr(trace, entry, &offset, &len, &ptr); + trace_seq_printf(s, " \"%.*s\"", len, ptr); + } + + trace_seq_putc(s, ')'); +end: + trace_seq_putc(s, '\n'); + + return trace_handle_return(s); +} + static enum print_line_t print_syscall_enter(struct trace_iterator *iter, int flags, struct trace_event *event) @@ -152,6 +262,15 @@ print_syscall_enter(struct trace_iterator *iter, int f= lags, goto end; } =20 + switch (entry->syscall_nr) { + case __NR_openat: + if (!tr || !(tr->trace_flags & TRACE_ITER_VERBOSE)) + return sys_enter_openat_print(trace, entry, s, event); + break; + default: + break; + } + trace_seq_printf(s, "%s(", entry->name); =20 for (i =3D 0; i < entry->nb_args; i++) { @@ -179,14 +298,7 @@ print_syscall_enter(struct trace_iterator *iter, int f= lags, if (!(BIT(i) & entry->user_mask)) continue; =20 - /* This arg points to a user space string */ - ptr =3D (void *)trace->args + sizeof(long) * entry->nb_args + offset; - val =3D *(int *)ptr; - - /* The value is a dynamic string (len << 16 | offset) */ - ptr =3D (void *)ent + (val & 0xffff); - len =3D val >> 16; - offset +=3D 4; + get_dynamic_len_ptr(trace, entry, &offset, &len, &ptr); =20 if (entry->user_arg_size < 0 || entry->user_arg_is_str) { trace_seq_printf(s, " \"%.*s\"", len, ptr); @@ -269,6 +381,62 @@ print_syscall_exit(struct trace_iterator *iter, int fl= ags, .size =3D sizeof(_type), .align =3D __alignof__(_type), \ .is_signed =3D is_signed_type(_type), .filter_type =3D FILTER_OTHER } =20 +/* When len=3D0, we just calculate the needed length */ +#define LEN_OR_ZERO (len ? 
len - pos : 0) + +static int __init +sys_enter_openat_print_fmt(struct syscall_metadata *entry, char *buf, int = len) +{ + int pos =3D 0; + + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "\"dfd: 0x%%08lx, filename: 0x%%08lx \\\"%%s\\\", flags: %%s%%s, mode: = 0%%03o\","); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + " ((unsigned long)(REC->dfd)),"); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + " ((unsigned long)(REC->filename)),"); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + " __get_str(__filename_val),"); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + " (REC->flags & ~3) && !(REC->flags & 3) ? \"O_RDONLY|\" : \"\", "); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + " REC->flags ? __print_flags(REC->flags, \"|\", "); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_WRONLY\" }, ", O_WRONLY); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_RDWR\" }, ", O_RDWR); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_CREAT\" }, ", O_CREAT); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_EXCL\" }, ", O_EXCL); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_NOCTTY\" }, ", O_NOCTTY); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_TRUNC\" }, ", O_TRUNC); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_APPEND\" }, ", O_APPEND); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_NONBLOCK\" }, ", O_NONBLOCK); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_DSYNC\" }, ", O_DSYNC); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_DIRECT\" }, ", O_DIRECT); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_LARGEFILE\" }, ", O_LARGEFILE); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_DIRECTORY\" }, ", O_DIRECTORY); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_NOFOLLOW\" }, ", O_NOFOLLOW); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_NOATIME\" }, ", O_NOATIME); + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + "{ 0x%x, \"O_CLOEXEC\" }) : 
\"O_RDONLY\", ", O_CLOEXEC); + + pos +=3D snprintf(buf + pos, LEN_OR_ZERO, + " ((unsigned long)(REC->mode))"); + return pos; +} + static int __init __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len) { @@ -276,8 +444,12 @@ __set_enter_print_fmt(struct syscall_metadata *entry, = char *buf, int len) int i; int pos =3D 0; =20 - /* When len=3D0, we just calculate the needed length */ -#define LEN_OR_ZERO (len ? len - pos : 0) + switch (entry->syscall_nr) { + case __NR_openat: + return sys_enter_openat_print_fmt(entry, buf, len); + default: + break; + } =20 pos +=3D snprintf(buf + pos, LEN_OR_ZERO, "\""); for (i =3D 0; i < entry->nb_args; i++) { --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFB4F36C230; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693070; cv=none; b=O+68cmto4neJmJmz+l2nN4vA34N4CxpwMr7wz5ZOZeHxfkRO5cHZLbFsfy3M6xEbZMt5F9t6YaCS+806KFWO0SZLqMWsPsyFzRdvj7y9yK044w/QKTKAZG5Oa+0hsvXJtsI2hdyVve3njGwSOarKbwI+LvnD3JKBZXQn92yge7M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693070; c=relaxed/simple; bh=pEZ5P2P/pcsHQBwGmSCZzSbB6LsFpFMiLNyesbHEmAw=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=fzcmG4Kc5j8nCAkjWl5WFveIZsNU5HUDAnPeYLBedmCL0C8aE8lWemdBlrV0WdIUV67kF7KC4BiciRtUgNz4HhNooheZEhib7ITPT1ZsaScFB8jw3xPuT4BXntAmKpEGJq84EK+hl+N4vmzBveAsAvx5qHtGdXP2GgfkvDfjFNw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Z9/TnuhV; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Z9/TnuhV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CE146C116C6; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=pEZ5P2P/pcsHQBwGmSCZzSbB6LsFpFMiLNyesbHEmAw=; h=Date:From:To:Cc:Subject:References:From; b=Z9/TnuhVpyEh5iCmXDPSB7UJwa0bGZ3dErdVZXPMUFQ/46iS7BhNsn6//oOpg0vjB a/4HRxmyL/sZSCoWUob17jNv1P2uE3CJ1XtfZRvd0GR0rrhafmlrpqtqYjkDOfYDdl meo92kG0SYzJ4YDlWXCqHyHzkXRl7p48yGiyfRkEWu+djRcWQGDyjeKTjgsb+nMDLm Dw0C0CPo7GKd2ny6W0IIZHok4eFx20fLvgCP0zc9AL10yO32Gh9SExom4sQAll9g29 M7kzVUYxRSYig6YvrslIBlTzx+OfTAzlS+CP2wADRiTWakzP6YiOd20PUvpSatlPvo nZX9TYpUnf9lg== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 1vDsqz-00000004quf-0KOC; Tue, 28 Oct 2025 19:11:49 -0400 Message-ID: <20251028231148.929243047@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:26 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 12/13] tracing: Check for printable characters when printing field dyn strings References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt When the "fields" option is enabled, it prints each trace event field based on its type. But a dynamic array and a dynamic string can both have a "char *" type. 
Printing it as a string can cause escape characters to be printed and mess up the output of the trace. For dynamic strings, test whether there are any non-printable characters and, if so, print the string with each non-printable character replaced by '.', followed by the hex values of the array. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_output.c | 27 +++++++++++++++++++++++++-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 97db0b0ccf3e..718b255b6fd8 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -950,7 +950,9 @@ static void print_fields(struct trace_iterator *iter, s= truct trace_event_call *c int offset; int len; int ret; + int i; void *pos; + char *str; =20 list_for_each_entry_reverse(field, head, link) { trace_seq_printf(&iter->seq, " %s=3D", field->name); @@ -977,8 +979,29 @@ static void print_fields(struct trace_iterator *iter, = struct trace_event_call *c trace_seq_puts(&iter->seq, ""); break; } - pos =3D (void *)iter->ent + offset; - trace_seq_printf(&iter->seq, "%.*s", len, (char *)pos); + str =3D (char *)iter->ent + offset; + /* Check if there are any non-printable characters */ + for (i =3D 0; i < len; i++) { + if (str[i] && !(isascii(str[i]) && isprint(str[i]))) + break; + } + if (i < len) { + for (i =3D 0; i < len; i++) { + if (isascii(str[i]) && isprint(str[i])) + trace_seq_putc(&iter->seq, str[i]); + else + trace_seq_putc(&iter->seq, '.'); + } + trace_seq_puts(&iter->seq, " ("); + for (i =3D 0; i < len; i++) { + if (i) + trace_seq_putc(&iter->seq, ':'); + trace_seq_printf(&iter->seq, "%02x", str[i]); + } + trace_seq_putc(&iter->seq, ')'); + } else { + trace_seq_printf(&iter->seq, "%.*s", len, str); + } break; case FILTER_PTR_STRING: if (!iter->fmt_size) --=20 2.51.0 From nobody Wed Dec 17 16:29:14 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B87236C231; Tue, 28 Oct 2025 23:11:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693070; cv=none; b=OQGSw3XlYAnU3EoIcGB1gdG9Ccn81x1nlkgRmslRslyxf48KEloLO2byl+C2e2RELuLhMLYmobqFdlOtOPcpX1swiy1TE3ga8/9SSQ2LxQ4Bo8OfjbZXTr8gQ49pODBsuzTFdiP+02TksqJmSjrCHBUCvCEhlAeEtf/MyooXJ28= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1761693070; c=relaxed/simple; bh=v6HVi3+ezmsDitNvxonnm8ii3SaF/3B18gnealIqhSU=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=dVn+DU2xQIVPhBoxPAqA42ZGm3MphkDukOVZ4JB9X6u9FQlgJ2IZb8hoaUkTy69zT7Mt/FkCBwH6f/Fb6FncPOsQZXNv0oOOULxnNnh+iW6a9p57qtcGWXPqnlrZor9hOTI9Z9RKmOPhP9+6/zrc5hN5L6ZTuLPkFQuDtm3d8OE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=unqV5gJK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="unqV5gJK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DF294C113D0; Tue, 28 Oct 2025 23:11:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1761693069; bh=v6HVi3+ezmsDitNvxonnm8ii3SaF/3B18gnealIqhSU=; h=Date:From:To:Cc:Subject:References:From; b=unqV5gJKwllyBDtYy4N9dCNsHQcocPufWwwc6S3NDmdMgmzB06INRy3nSOTzktMfq jaeVXpQj+aj/R+FEh4MyKMvKrTXqs8AODH2AZaaS129XTXR8VVB5YXbZOb00wSssfh QWPnCn0t/cCcfvYbbibrFpAeM/j3Op0nfCpx5sqbaJILZEmh6kEx7h2jrqq9b9nuAV X1ChrM/AflpMgE3Z2VerrEFBIm+6tIzvag53MU5VCFo+6avWmI6LRzCreuDt8cD+nm 2tJ7gAlJMR7mQOSUZ1swjdLwgYRH72MAB42DF7QszBTUQBWZ8b9HlrD4sTq+L1p2hu YMe6xh4ltCeYQ== Received: from rostedt by gandalf with local (Exim 4.98.2) (envelope-from ) id 
1vDsqz-00000004qvA-11Yl; Tue, 28 Oct 2025 19:11:49 -0400 Message-ID: <20251028231149.097404581@kernel.org> User-Agent: quilt/0.68 Date: Tue, 28 Oct 2025 19:11:27 -0400 From: Steven Rostedt To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Masami Hiramatsu , Mark Rutland , Mathieu Desnoyers , Andrew Morton , Peter Zijlstra , Namhyung Kim , Takaya Saeki , Tom Zanussi , Thomas Gleixner , Ian Rogers , Douglas Raillard , Arnaldo Carvalho de Melo , Jiri Olsa , Adrian Hunter , Ingo Molnar Subject: [PATCH v5 13/13] tracing: Have persistent ring buffer print syscalls normally References: <20251028231114.820213884@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Steven Rostedt Printing events from the persistent ring buffer of a previous boot has to be done carefully, as the print formats of arbitrary events can contain pointers to strings and other data that are no longer available. Ftrace static events (like the function tracer event) are stable and are printed normally. System call event formats are also stable. Allow them to be printed normally as well: Instead of: <...>-1 [005] ...1. 57.240405: sys_enter_waitid: __syscall_nr=3D= 0xf7 (247) which=3D0x1 (1) upid=3D0x499 (1177) infop=3D0x7ffd5294d690 (1407= 25988939408) options=3D0x5 (5) ru=3D0x0 (0) <...>-1 [005] ...1. 57.240433: sys_exit_waitid: __syscall_nr=3D0= xf7 (247) ret=3D0x0 (0) <...>-1 [005] ...1. 57.240437: sys_enter_rt_sigprocmask: __sysca= ll_nr=3D0xe (14) how=3D0x2 (2) nset=3D0x7ffd5294d7c0 (140725988939712) oset= =3D0x0 (0) sigsetsize=3D0x8 (8) <...>-1 [005] ...1. 57.240438: sys_exit_rt_sigprocmask: __syscal= l_nr=3D0xe (14) ret=3D0x0 (0) <...>-1 [005] ...1. 57.240442: sys_enter_close: __syscall_nr=3D0= x3 (3) fd=3D0x4 (4) <...>-1 [005] ...1.
57.240463: sys_exit_close: __syscall_nr=3D0x= 3 (3) ret=3D0x0 (0) <...>-1 [005] ...1. 57.240485: sys_enter_openat: __syscall_nr=3D= 0x101 (257) dfd=3D0xffffffffffdfff9c (-2097252) filename=3D(0xffff8b81639ca= 01c) flags=3D0x80000 (524288) mode=3D0x0 (0) __filename_val=3D/run/systemd/= reboot-param <...>-1 [005] ...1. 57.240555: sys_exit_openat: __syscall_nr=3D0= x101 (257) ret=3D0xffffffffffdffffe (-2097154) <...>-1 [005] ...1. 57.240571: sys_enter_openat: __syscall_nr=3D= 0x101 (257) dfd=3D0xffffffffffdfff9c (-2097252) filename=3D(0xffff8b81639ca= 01c) flags=3D0x80000 (524288) mode=3D0x0 (0) __filename_val=3D/run/systemd/= reboot-param <...>-1 [005] ...1. 57.240620: sys_exit_openat: __syscall_nr=3D0= x101 (257) ret=3D0xffffffffffdffffe (-2097154) <...>-1 [005] ...1. 57.240629: sys_enter_writev: __syscall_nr=3D= 0x14 (20) fd=3D0x3 (3) vec=3D0x7ffd5294ce50 (140725988937296) vlen=3D0x7 (7) <...>-1 [005] ...1. 57.242281: sys_exit_writev: __syscall_nr=3D0= x14 (20) ret=3D0x24 (36) <...>-1 [005] ...1. 57.242286: sys_enter_reboot: __syscall_nr=3D= 0xa9 (169) magic1=3D0xfee1dead (4276215469) magic2=3D0x28121969 (672274793)= cmd=3D0x1234567 (19088743) arg=3D0x0 (0) Have: <...>-1 [000] ...1. 91.446011: sys_waitid(which: 1, upid: 0x4d2,= infop: 0x7ffdccdadfd0, options: 5, ru: 0) <...>-1 [000] ...1. 91.446042: sys_waitid -> 0x0 <...>-1 [000] ...1. 91.446045: sys_rt_sigprocmask(how: 2, nset: = 0x7ffdccdae100, oset: 0, sigsetsize: 8) <...>-1 [000] ...1. 91.446047: sys_rt_sigprocmask -> 0x0 <...>-1 [000] ...1. 91.446051: sys_close(fd: 4) <...>-1 [000] ...1. 91.446073: sys_close -> 0x0 <...>-1 [000] ...1. 91.446095: sys_openat(dfd: 18446744073709551= 516, filename: 139732544945794 "/run/systemd/reboot-param", flags: O_RDONLY= |O_CLOEXEC) <...>-1 [000] ...1. 91.446165: sys_openat -> 0xfffffffffffffffe <...>-1 [000] ...1. 91.446182: sys_openat(dfd: 18446744073709551= 516, filename: 139732544945794 "/run/systemd/reboot-param", flags: O_RDONLY= |O_CLOEXEC) <...>-1 [000] ...1. 
91.446233: sys_openat -> 0xfffffffffffffffe <...>-1 [000] ...1. 91.446242: sys_writev(fd: 3, vec: 0x7ffdccda= d790, vlen: 7) <...>-1 [000] ...1. 91.447877: sys_writev -> 0x24 <...>-1 [000] ...1. 91.447883: sys_reboot(magic1: 0xfee1dead, ma= gic2: 0x28121969, cmd: 0x1234567, arg: 0) Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2aee9a3088f4..a765792d3428 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -4219,6 +4220,22 @@ static void test_cpu_buff_start(struct trace_iterato= r *iter) iter->cpu); } =20 +#ifdef CONFIG_FTRACE_SYSCALLS +static bool is_syscall_event(struct trace_event *event) +{ + return (event->funcs =3D=3D &enter_syscall_print_funcs) || + (event->funcs =3D=3D &exit_syscall_print_funcs); + +} +#define syscall_buf_size CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT +#else +static inline bool is_syscall_event(struct trace_event *event) +{ + return false; +} +#define syscall_buf_size 0 +#endif /* CONFIG_FTRACE_SYSCALLS */ + static enum print_line_t print_trace_fmt(struct trace_iterator *iter) { struct trace_array *tr =3D iter->tr; @@ -4251,10 +4268,12 @@ static enum print_line_t print_trace_fmt(struct tra= ce_iterator *iter) * safe to use if the array has delta offsets * Force printing via the fields. 
*/ - if ((tr->text_delta) && - event->type > __TRACE_LAST_TYPE) + if ((tr->text_delta)) { + /* ftrace and system call events are still OK */ + if ((event->type > __TRACE_LAST_TYPE) && + !is_syscall_event(event)) return print_event_fields(iter, event); - + } return event->funcs->trace(iter, sym_flags, event); } =20 @@ -11436,7 +11455,7 @@ __init static int tracer_alloc_buffers(void) =20 global_trace.flags =3D TRACE_ARRAY_FL_GLOBAL; =20 - global_trace.syscall_buf_sz =3D CONFIG_TRACE_SYSCALL_BUF_SIZE_DEFAULT; + global_trace.syscall_buf_sz =3D syscall_buf_size; =20 INIT_LIST_HEAD(&global_trace.systems); INIT_LIST_HEAD(&global_trace.events); --=20 2.51.0
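
For readers following the print_trace_fmt() change in patch 13/13, the new condition can be restated as a small truth-table style helper. This is only a user-space sketch, not kernel code: must_print_fields() and LAST_STATIC_TYPE are hypothetical names introduced here for illustration (LAST_STATIC_TYPE stands in for __TRACE_LAST_TYPE with an arbitrary value), and the is_syscall argument stands in for the result of is_syscall_event().

```c
#include <assert.h>	/* only needed by callers that check the result */
#include <stdbool.h>

/* Hypothetical stand-in for __TRACE_LAST_TYPE; the value is arbitrary. */
#define LAST_STATIC_TYPE 20

/*
 * Hypothetical helper mirroring the condition in the patched
 * print_trace_fmt(): return true when the event must be printed
 * field by field (print_event_fields()) rather than through its
 * own trace() callback.
 *
 *   text_delta - true when the buffer has delta offsets, i.e. it
 *                comes from a previous boot
 *   type       - the event type (event->type in the patch)
 *   is_syscall - what is_syscall_event() would return for the event
 */
bool must_print_fields(bool text_delta, int type, bool is_syscall)
{
	/* No delta offsets: every event can use its normal printer. */
	if (!text_delta)
		return false;

	/* Static ftrace events and system call events are still OK. */
	return type > LAST_STATIC_TYPE && !is_syscall;
}
```

With these assumptions, must_print_fields(true, LAST_STATIC_TYPE + 1, false) is true (an arbitrary event read from a previous boot), while the same event with is_syscall set, or any event read when text_delta is false, takes the normal print path.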