This patch introduces two new fields to /proc/[pid]/status to display the
set of CPUs, representing the CPU affinity of the process's active
memory context, in both mask and list format: "Cpus_active_mm" and
"Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
cache synchronisation.
Exposing this information allows userspace to easily identify
memory-task affinity and gain insight into NUMA alignment, CPU
isolation, and real-time workload placement.
Frequent mm_cpumask changes may indicate instability in placement
policies or excessive task migration overhead.
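For example (illustrative values from a hypothetical 8-CPU system), a
task pinned to CPUs 2-3 that earlier ran on CPU 0 might report:

  Cpus_allowed:	0c
  Cpus_allowed_list:	2-3
  Cpus_active_mm:	0d
  Cpus_active_mm_list:	0,2-3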
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
fs/proc/array.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..8887c5e38e51 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
cpumask_pr_args(&task->cpus_mask));
}
+/**
+ * task_cpus_active_mm - Show the mm_cpumask for a process
+ * @m: The seq_file structure for the /proc/PID/status output
+ * @mm: The memory descriptor of the process
+ *
+ * Prints the set of CPUs, representing the CPU affinity of the process's
+ * active memory context, in both mask and list format. This mask is
+ * primarily used for TLB and cache synchronisation.
+ */
+static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
+{
+ seq_printf(m, "Cpus_active_mm:\t%*pb\n",
+ cpumask_pr_args(mm_cpumask(mm)));
+ seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
+ cpumask_pr_args(mm_cpumask(mm)));
+}
+
static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
{
seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
@@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
task_core_dumping(m, task);
task_thp_status(m, mm);
task_untag_mask(m, mm);
- mmput(mm);
}
task_sig(m, task);
task_cap(m, task);
task_seccomp(m, task);
task_cpus_allowed(m, task);
+ if (mm) {
+ task_cpus_active_mm(m, mm);
+ mmput(mm);
+ }
cpuset_task_status_allowed(m, task);
task_context_switch_counts(m, task);
arch_proc_pid_thread_features(m, task);
--
2.51.0
On 12/17/25 03:46, Aaron Tomlin wrote:
> This patch introduces two new fields to /proc/[pid]/status to display the
> set of CPUs, representing the CPU affinity of the process's active
> memory context, in both mask and list format: "Cpus_active_mm" and
> "Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
> cache synchronisation.
>
> Exposing this information allows userspace to easily identify
> memory-task affinity and gain insight into NUMA alignment, CPU
> isolation, and real-time workload placement.
>
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.

I agree with Oleg's comments.

Given that everybody has read access to /proc/$PID/status IIUC, I wonder
if that information could somehow help an attacker to better attack a
target program (knowing which CPUs have dirty TLB etc). As you said,
it's primarily for TLB and cache sync ...

Just a thought, have nothing concrete in mind.

--
Cheers

David
On Thu, Dec 18, 2025 at 09:30:53AM +0100, David Hildenbrand (Red Hat) wrote:
> I agree with Oleg's comments.
>
> Given that everybody has read access to /proc/$PID/status IIUC, I wonder if
> that information could somehow help an attacker to better attack a target
> program (knowing which CPUs have dirty TLB etc). As you said, it's
> primarily for TLB and cache sync ...
>
> Just a thought, have nothing concrete in mind.
Hi David,
Thank you for raising this point; security and information leakage are,
quite rightly, paramount considerations when adding new entries to
world-readable interfaces like /proc/[pid]/status. Upon reflection, I
submit that the risk here is minimal for a few reasons:
1. Existing Visibility: The kernel already exposes a significant amount
of CPU-residency information. For instance, /proc/[pid]/stat explicitly
shows the CPU a task last ran on (field 39, i.e., task_cpu(task)), and
"Cpus_allowed" already defines the bounds of where a task may run. See
do_task_stat(). (A tiny userspace sketch demonstrating this follows
point 3 below.)
2. Resolution of Data: The mm_cpumask is a relatively coarse-grained
diagnostic. While it indicates where TLB entries might be valid, it
does not provide the fine-grained timing or cache-line information
typically required for sophisticated side-channel attacks.
3. Diagnostic Value: The primary intent is to provide visibility into
the "memory footprint" across CPUs, which is invaluable for debugging
performance issues related to IPI storms and TLB shootdowns in
large-scale NUMA systems. The CPU affinity sets the boundary; the
mm_cpumask records the arrival; the two complement each other.
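As promised under point 1, here is a tiny illustrative sketch (untested,
error handling trimmed, nothing kernel-side assumed) showing that the
last-used CPU is already available to any unprivileged reader via
field 39 of /proc/<pid>/stat:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Print the CPU a task last ran on, i.e. field 39 ("processor") of
 * /proc/<pid>/stat. The comm field (field 2) may itself contain
 * spaces, so skip past its closing ')' before counting fields.
 */
int main(int argc, char **argv)
{
	char path[64], buf[4096];
	FILE *f;
	char *p;
	int field = 2;	/* fields consumed so far: pid, comm */

	snprintf(path, sizeof(path), "/proc/%s/stat",
		 argc > 1 ? argv[1] : "self");
	f = fopen(path, "r");
	if (!f || !fgets(buf, sizeof(buf), f))
		return 1;
	p = strrchr(buf, ')');
	while (p && field < 39) {
		p = strchr(p + 1, ' ');
		field++;
	}
	if (p)
		printf("last ran on CPU %d\n", atoi(p + 1));
	fclose(f);
	return 0;
}

Compile with any C compiler and run with a PID argument (it defaults to
the reader's own PID).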
I trust that the diagnostic utility outweighs the theoretical risk in
this instance.
Kind regards,
--
Aaron Tomlin
Can't really comment on this patch... I mean the intent.
Just a couple of nits:
- I think this patch should also update Documentation/filesystems/proc.rst
- I won't object, but do we really need/want another "if (mm)" block ?
- I guess this is just my poor English, but the usage of "affinity"
in the changelog/comment looks a bit confusing to me ;) As if this
refers to task_struct.cpus_mask.
Fortunately "Cpus_active_mm..." in task_cpus_active_mm() makes it
more clear, so feel free to ignore.
Oleg.
On 12/16, Aaron Tomlin wrote:
>
> This patch introduces two new fields to /proc/[pid]/status to display the
> set of CPUs, representing the CPU affinity of the process's active
> memory context, in both mask and list format: "Cpus_active_mm" and
> "Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
> cache synchronisation.
>
> Exposing this information allows userspace to easily identify
> memory-task affinity and gain insight into NUMA alignment, CPU
> isolation, and real-time workload placement.
>
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
> fs/proc/array.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 42932f88141a..8887c5e38e51 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
> cpumask_pr_args(&task->cpus_mask));
> }
>
> +/**
> + * task_cpus_active_mm - Show the mm_cpumask for a process
> + * @m: The seq_file structure for the /proc/PID/status output
> + * @mm: The memory descriptor of the process
> + *
> + * Prints the set of CPUs, representing the CPU affinity of the process's
> + * active memory context, in both mask and list format. This mask is
> + * primarily used for TLB and cache synchronisation.
> + */
> +static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
> +{
> + seq_printf(m, "Cpus_active_mm:\t%*pb\n",
> + cpumask_pr_args(mm_cpumask(mm)));
> + seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
> + cpumask_pr_args(mm_cpumask(mm)));
> +}
> +
> static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
> {
> seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
> @@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
> task_core_dumping(m, task);
> task_thp_status(m, mm);
> task_untag_mask(m, mm);
> - mmput(mm);
> }
> task_sig(m, task);
> task_cap(m, task);
> task_seccomp(m, task);
> task_cpus_allowed(m, task);
> + if (mm) {
> + task_cpus_active_mm(m, mm);
> + mmput(mm);
> + }
> cpuset_task_status_allowed(m, task);
> task_context_switch_counts(m, task);
> arch_proc_pid_thread_features(m, task);
> --
> 2.51.0
>
On Wed, Dec 17, 2025 at 06:33:26PM +0100, Oleg Nesterov wrote:
> Can't really comment on this patch... I mean the intent.
> Just a couple of nits:

Hi Oleg,

Long time no speak. Thank you for your response.

> - I think this patch should also update
>   Documentation/filesystems/proc.rst

Acknowledged. I will do so in the follow-up patch; a rough sketch of
what I have in mind is appended at the end of this mail.

> - I won't object, but do we really need/want another "if (mm)" block ?

I appreciate your observation; technically, the code could be more
compact if this were merged into the earlier conditional block. However,
my reasoning here was primarily a personal preference regarding the
resulting output of /proc/[PID]/status: I felt it was beneficial to keep
"Cpus_active_mm" and "Cpus_active_mm_list" in close proximity to their
counterparts, "Cpus_allowed" and "Cpus_allowed_list", to provide a more
intuitive and logically grouped view for the user.

> - I guess this is just my poor English, but the usage of "affinity"
>   in the changelog/comment looks a bit confusing to me ;) As if this
>   refers to task_struct.cpus_mask.
>
>   Fortunately "Cpus_active_mm..." in task_cpus_active_mm() makes it
>   more clear, so feel free to ignore.

I appreciate your perspective on the use of the word "affinity." My
intention was to describe the relationship between the CPUs on which a
memory descriptor is "active" and the CPUs on which the thread is
allowed to execute. In other words: the affinity sets the boundary; the
mm_cpumask records the arrival. However, I see how this could be
misconstrued, so I will refine the language in the changelog to remove
any ambiguity between the two.

Kind regards,

--
Aaron Tomlin
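For reference, the documentation addition might look roughly like this
(a sketch only; the exact wording, and the surrounding table layout in
Documentation/filesystems/proc.rst, are assumed):

 Cpus_active_mm              mask of CPUs on which this process's mm is
                             currently active (mm_cpumask)
 Cpus_active_mm_list         same as previous, but in "list format"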