From nobody Mon Feb 9 10:28:33 2026
Date: Sat, 31 Jan 2026 13:28:25 +0000
In-Reply-To: <20260131132848.254084-1-vdonnefort@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260131132848.254084-1-vdonnefort@google.com>
X-Mailer: git-send-email 2.53.0.rc1.225.gd81095ad13-goog
Message-ID: <20260131132848.254084-8-vdonnefort@google.com>
Subject: [PATCH v11 07/30] tracing: Add non-consuming read to trace remotes
From: Vincent Donnefort
To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
 linux-trace-kernel@vger.kernel.org, maz@kernel.org, oliver.upton@linux.dev,
 joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 jstultz@google.com, qperret@google.com, will@kernel.org,
 aneesh.kumar@kernel.org, kernel-team@android.com,
 linux-kernel@vger.kernel.org, Vincent Donnefort
Content-Type: text/plain; charset="utf-8"

Allow reading the trace file for trace remotes. This performs a
non-consuming read of the trace buffer.

Signed-off-by: Vincent Donnefort

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1fcbdea992d2..d6eee96a6789 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4605,7 +4605,7 @@ static int s_show(struct seq_file *m, void *v)
  * Should be used after trace_array_get(), trace_types_lock
  * ensures that i_cdev was already initialized.
  */
-static inline int tracing_get_cpu(struct inode *inode)
+int tracing_get_cpu(struct inode *inode)
 {
 	if (inode->i_cdev) /* See trace_create_cpu_file() */
 		return (long)inode->i_cdev - 1;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fdbcd068ba38..aa397acb6be9 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -686,6 +686,7 @@ struct dentry *trace_create_cpu_file(const char *name,
 				     void *data,
 				     long cpu,
 				     const struct file_operations *fops);
+int tracing_get_cpu(struct inode *inode);


 /**
diff --git a/kernel/trace/trace_remote.c b/kernel/trace/trace_remote.c
index 82060bb19ba1..e03fde73aac7 100644
--- a/kernel/trace/trace_remote.c
+++ b/kernel/trace/trace_remote.c
@@ -18,14 +18,23 @@
 #define TRACEFS_MODE_WRITE 0640
 #define TRACEFS_MODE_READ 0440

+enum tri_type {
+	TRI_CONSUMING,
+	TRI_NONCONSUMING,
+};
+
 struct trace_remote_iterator {
 	struct trace_remote *remote;
 	struct trace_seq seq;
 	struct delayed_work poll_work;
 	unsigned long lost_events;
 	u64 ts;
+	struct ring_buffer_iter *rb_iter;
+	struct ring_buffer_iter **rb_iters;
 	int cpu;
 	int evt_cpu;
+	loff_t pos;
+	enum tri_type type;
 };

 struct trace_remote {
@@ -36,6 +45,8 @@ struct trace_remote {
 	unsigned long trace_buffer_size;
 	struct ring_buffer_remote rb_remote;
 	struct mutex lock;
+	struct rw_semaphore reader_lock;
+	struct rw_semaphore *pcpu_reader_locks;
 	unsigned int nr_readers;
 	unsigned int poll_ms;
 	bool tracing_on;
@@ -230,6 +241,20 @@ static int trace_remote_get(struct trace_remote *remote, int cpu)
 	if (ret)
 		return ret;

+	if (cpu != RING_BUFFER_ALL_CPUS && !remote->pcpu_reader_locks) {
+		int lock_cpu;
+
+		remote->pcpu_reader_locks = kcalloc(nr_cpu_ids, sizeof(*remote->pcpu_reader_locks),
+						    GFP_KERNEL);
+		if (!remote->pcpu_reader_locks) {
+			trace_remote_try_unload(remote);
+			return -ENOMEM;
+		}
+
+		for_each_possible_cpu(lock_cpu)
+			init_rwsem(&remote->pcpu_reader_locks[lock_cpu]);
+	}
+
 	remote->nr_readers++;

 	return 0;
@@ -244,6 +269,9 @@ static void trace_remote_put(struct trace_remote *remote)
 	if (remote->nr_readers)
 		return;

+	kfree(remote->pcpu_reader_locks);
+	remote->pcpu_reader_locks = NULL;
+
 	trace_remote_try_unload(remote);
 }

@@ -258,13 +286,58 @@ static void __poll_remote(struct work_struct *work)
 			      msecs_to_jiffies(iter->remote->poll_ms));
 }

-static struct trace_remote_iterator *trace_remote_iter(struct trace_remote *remote, int cpu)
+static void __free_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
+{
+	if (!iter->rb_iter)
+		return;
+
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		ring_buffer_read_finish(iter->rb_iter);
+		return;
+	}
+
+	for_each_possible_cpu(cpu) {
+		if (iter->rb_iters[cpu])
+			ring_buffer_read_finish(iter->rb_iters[cpu]);
+	}
+
+	kfree(iter->rb_iters);
+}
+
+static int __alloc_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu)
+{
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		iter->rb_iter = ring_buffer_read_start(iter->remote->trace_buffer, cpu, GFP_KERNEL);
+
+		return iter->rb_iter ? 0 : -ENOMEM;
+	}
+
+	iter->rb_iters = kcalloc(nr_cpu_ids, sizeof(*iter->rb_iters), GFP_KERNEL);
+	if (!iter->rb_iters)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		iter->rb_iters[cpu] = ring_buffer_read_start(iter->remote->trace_buffer, cpu,
+							     GFP_KERNEL);
+		if (!iter->rb_iters[cpu]) {
+			__free_ring_buffer_iter(iter, RING_BUFFER_ALL_CPUS);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static struct trace_remote_iterator
+*trace_remote_iter(struct trace_remote *remote, int cpu, enum tri_type type)
 {
 	struct trace_remote_iterator *iter = NULL;
 	int ret;

 	lockdep_assert_held(&remote->lock);

+	if (type == TRI_NONCONSUMING && !trace_remote_loaded(remote))
+		return NULL;

 	ret = trace_remote_get(remote, cpu);
 	if (ret)
@@ -279,9 +352,21 @@ static struct trace_remote_iterator *trace_remote_iter(struct trace_remote *remo
 	if (iter) {
 		iter->remote = remote;
 		iter->cpu = cpu;
+		iter->type = type;
 		trace_seq_init(&iter->seq);
-		INIT_DELAYED_WORK(&iter->poll_work, __poll_remote);
-		schedule_delayed_work(&iter->poll_work, msecs_to_jiffies(remote->poll_ms));
+
+		switch (type) {
+		case TRI_CONSUMING:
+			INIT_DELAYED_WORK(&iter->poll_work, __poll_remote);
+			schedule_delayed_work(&iter->poll_work, msecs_to_jiffies(remote->poll_ms));
+			break;
+		case TRI_NONCONSUMING:
+			ret = __alloc_ring_buffer_iter(iter, cpu);
+			break;
+		}
+
+		if (ret)
+			goto err;

 		return iter;
 	}
@@ -305,10 +390,100 @@ static void trace_remote_iter_free(struct trace_remote_iterator *iter)

 	lockdep_assert_held(&remote->lock);

+	switch (iter->type) {
+	case TRI_CONSUMING:
+		cancel_delayed_work_sync(&iter->poll_work);
+		break;
+	case TRI_NONCONSUMING:
+		__free_ring_buffer_iter(iter, iter->cpu);
+		break;
+	}
+
 	kfree(iter);
 	trace_remote_put(remote);
 }

+static void trace_remote_iter_read_start(struct trace_remote_iterator *iter)
+{
+	struct trace_remote *remote = iter->remote;
+	int cpu = iter->cpu;
+
+	/* Acquire global reader lock */
+	if (cpu == RING_BUFFER_ALL_CPUS && iter->type == TRI_CONSUMING)
+		down_write(&remote->reader_lock);
+	else
+		down_read(&remote->reader_lock);
+
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		return;
+
+	/*
+	 * No need for the remote lock here, iter holds a reference on
+	 * remote->nr_readers
+	 */
+
+	/* Get the per-CPU one */
+	if (WARN_ON_ONCE(!remote->pcpu_reader_locks))
+		return;
+
+	if (iter->type == TRI_CONSUMING)
+		down_write(&remote->pcpu_reader_locks[cpu]);
+	else
+		down_read(&remote->pcpu_reader_locks[cpu]);
+}
+
+static void trace_remote_iter_read_finished(struct trace_remote_iterator *iter)
+{
+	struct trace_remote *remote = iter->remote;
+	int cpu = iter->cpu;
+
+	/* Release per-CPU reader lock */
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		/*
+		 * No need for the remote lock here, iter holds a reference on
+		 * remote->nr_readers
+		 */
+		if (iter->type == TRI_CONSUMING)
+			up_write(&remote->pcpu_reader_locks[cpu]);
+		else
+			up_read(&remote->pcpu_reader_locks[cpu]);
+	}
+
+	/* Release global reader lock */
+	if (cpu == RING_BUFFER_ALL_CPUS && iter->type == TRI_CONSUMING)
+		up_write(&remote->reader_lock);
+	else
+		up_read(&remote->reader_lock);
+}
+
+static struct ring_buffer_iter *__get_rb_iter(struct trace_remote_iterator *iter, int cpu)
+{
+	return iter->cpu != RING_BUFFER_ALL_CPUS ? iter->rb_iter : iter->rb_iters[cpu];
+}
+
+static struct ring_buffer_event *
+__peek_event(struct trace_remote_iterator *iter, int cpu, u64 *ts, unsigned long *lost_events)
+{
+	struct ring_buffer_event *rb_evt;
+	struct ring_buffer_iter *rb_iter;
+
+	switch (iter->type) {
+	case TRI_CONSUMING:
+		return ring_buffer_peek(iter->remote->trace_buffer, cpu, ts, lost_events);
+	case TRI_NONCONSUMING:
+		rb_iter = __get_rb_iter(iter, cpu);
+		rb_evt = ring_buffer_iter_peek(rb_iter, ts);
+		if (!rb_evt)
+			return NULL;
+
+		*lost_events = ring_buffer_iter_dropped(rb_iter);
+
+		return rb_evt;
+	}
+
+	return NULL;
+}
+
 static bool trace_remote_iter_read_event(struct trace_remote_iterator *iter)
 {
 	struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
@@ -318,7 +493,7 @@ static bool trace_remote_iter_read_event(struct trace_remote_iterator *iter)
 		if (ring_buffer_empty_cpu(trace_buffer, cpu))
 			return false;

-		if (!ring_buffer_peek(trace_buffer, cpu, &iter->ts, &iter->lost_events))
+		if (!__peek_event(iter, cpu, &iter->ts, &iter->lost_events))
 			return false;

 		iter->evt_cpu = cpu;
@@ -333,7 +508,7 @@ static bool trace_remote_iter_read_event(struct trace_remote_iterator *iter)
 		if (ring_buffer_empty_cpu(trace_buffer, cpu))
 			continue;

-		if (!ring_buffer_peek(trace_buffer, cpu, &ts, &lost_events))
+		if (!__peek_event(iter, cpu, &ts, &lost_events))
 			continue;

 		if (ts >= iter->ts)
@@ -347,6 +522,20 @@ static bool trace_remote_iter_read_event(struct trace_remote_iterator *iter)
 	return iter->ts != U64_MAX;
 }

+static void trace_remote_iter_move(struct trace_remote_iterator *iter)
+{
+	struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
+
+	switch (iter->type) {
+	case TRI_CONSUMING:
+		ring_buffer_consume(trace_buffer, iter->evt_cpu, NULL, NULL);
+		break;
+	case TRI_NONCONSUMING:
+		ring_buffer_iter_advance(__get_rb_iter(iter, iter->evt_cpu));
+		break;
+	}
+}
+
 static int trace_remote_iter_print_event(struct trace_remote_iterator *iter)
 {
 	unsigned long usecs_rem;
@@ -369,13 +558,14 @@ static int trace_pipe_open(struct inode *inode, struct file *filp)
 {
 	struct trace_remote *remote = inode->i_private;
 	struct trace_remote_iterator *iter;
-	int cpu = RING_BUFFER_ALL_CPUS;
-
-	if (inode->i_cdev)
-		cpu = (long)inode->i_cdev - 1;
+	int cpu = tracing_get_cpu(inode);

 	guard(mutex)(&remote->lock);
-	iter = trace_remote_iter(remote, cpu);
+
+	iter = trace_remote_iter(remote, cpu, TRI_CONSUMING);
+	if (IS_ERR(iter))
+		return PTR_ERR(iter);
+
 	filp->private_data = iter;

 	return IS_ERR(iter) ? PTR_ERR(iter) : 0;
@@ -410,6 +600,8 @@ static ssize_t trace_pipe_read(struct file *filp, char __user *ubuf, size_t cnt,
 	if (ret < 0)
 		return ret;

+	trace_remote_iter_read_start(iter);
+
 	while (trace_remote_iter_read_event(iter)) {
 		int prev_len = iter->seq.seq.len;

@@ -418,9 +610,11 @@ static ssize_t trace_pipe_read(struct file *filp, char __user *ubuf, size_t cnt,
 			break;
 		}

-		ring_buffer_consume(trace_buffer, iter->evt_cpu, NULL, NULL);
+		trace_remote_iter_move(iter);
 	}

+	trace_remote_iter_read_finished(iter);
+
 	goto copy_to_user;
 }

@@ -430,14 +624,123 @@ static const struct file_operations trace_pipe_fops = {
 	.release = trace_pipe_release,
 };

+static void *trace_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct trace_remote_iterator *iter = m->private;
+
+	++*pos;
+
+	if (!iter || !trace_remote_iter_read_event(iter))
+		return NULL;
+
+	trace_remote_iter_move(iter);
+	iter->pos++;
+
+	return iter;
+}
+
+static void *trace_start(struct seq_file *m, loff_t *pos)
+{
+	struct trace_remote_iterator *iter = m->private;
+	loff_t i;
+
+	if (!iter)
+		return NULL;
+
+	if (!*pos) {
+		iter->pos = -1;
+		return trace_next(m, NULL, &i);
+	}
+
+	i = iter->pos;
+	while (i < *pos) {
+		iter = trace_next(m, NULL, &i);
+		if (!iter)
+			return NULL;
+	}
+
+	return iter;
+}
+
+static int trace_show(struct seq_file *m, void *v)
+{
+	struct trace_remote_iterator *iter = v;
+
+	trace_seq_init(&iter->seq);
+
+	if (trace_remote_iter_print_event(iter)) {
+		seq_printf(m, "[EVENT %d PRINT TOO BIG]\n", iter->evt->id);
+		return 0;
+	}
+
+	return trace_print_seq(m, &iter->seq);
+}
+
+static void trace_stop(struct seq_file *s, void *v) { }
+
+static const struct seq_operations trace_sops = {
+	.start = trace_start,
+	.next = trace_next,
+	.show = trace_show,
+	.stop = trace_stop,
+};
+
+static int trace_open(struct inode *inode, struct file *filp)
+{
+	struct trace_remote *remote = inode->i_private;
+	struct trace_remote_iterator *iter = NULL;
+	int cpu = tracing_get_cpu(inode);
+	int ret;
+
+	if (!(filp->f_mode & FMODE_READ))
+		return 0;
+
+	guard(mutex)(&remote->lock);
+
+	iter = trace_remote_iter(remote, cpu, TRI_NONCONSUMING);
+	if (IS_ERR(iter))
+		return PTR_ERR(iter);
+
+	ret = seq_open(filp, &trace_sops);
+	if (ret) {
+		trace_remote_iter_free(iter);
+		return ret;
+	}
+
+	if (iter)
+		trace_remote_iter_read_start(iter);
+
+	((struct seq_file *)filp->private_data)->private = (void *)iter;
+
+	return 0;
+}
+
+static int trace_release(struct inode *inode, struct file *filp)
+{
+	struct trace_remote_iterator *iter;
+
+	if (!(filp->f_mode & FMODE_READ))
+		return 0;
+
+	iter = ((struct seq_file *)filp->private_data)->private;
+	seq_release(inode, filp);
+
+	if (!iter)
+		return 0;
+
+	guard(mutex)(&iter->remote->lock);
+
+	trace_remote_iter_read_finished(iter);
+	trace_remote_iter_free(iter);
+
+	return 0;
+}
+
 static ssize_t trace_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos)
 {
 	struct inode *inode = file_inode(filp);
 	struct trace_remote *remote = inode->i_private;
-	int cpu = RING_BUFFER_ALL_CPUS;
-
-	if (inode->i_cdev)
-		cpu = (long)inode->i_cdev - 1;
+	int cpu = tracing_get_cpu(inode);

 	guard(mutex)(&remote->lock);

@@ -447,7 +750,11 @@ static ssize_t trace_write(struct file *filp, const char __user *ubuf, size_t cn
 }

 static const struct file_operations trace_fops = {
+	.open = trace_open,
 	.write = trace_write,
+	.read = seq_read,
+	.read_iter = seq_read_iter,
+	.release = trace_release,
 };

 static int trace_remote_init_tracefs(const char *name, struct trace_remote *remote)
@@ -566,6 +873,7 @@ int trace_remote_register(const char *name, struct trace_remote_callbacks *cbs,
 	remote->trace_buffer_size = 7 << 10;
 	remote->poll_ms = 100;
 	mutex_init(&remote->lock);
+	init_rwsem(&remote->reader_lock);

 	if (trace_remote_init_tracefs(name, remote)) {
 		kfree(remote);
-- 
2.53.0.rc1.225.gd81095ad13-goog