[PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status

Aaron Tomlin posted 1 patch 1 month, 3 weeks ago
There is a newer version of this series
fs/proc/array.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
[PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
Posted by Aaron Tomlin 1 month, 3 weeks ago
This patch introduces two new fields to /proc/[pid]/status to display the
set of CPUs, representing the CPU affinity of the process's active
memory context, in both mask and list format: "Cpus_active_mm" and
"Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
cache synchronisation.

Exposing this information allows userspace to easily identify
memory-task affinity and gain insight into NUMA alignment, CPU isolation
and real-time workload placement.

Frequent mm_cpumask changes may indicate instability in placement
policies or excessive task migration overhead.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 fs/proc/array.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..8887c5e38e51 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
 		   cpumask_pr_args(&task->cpus_mask));
 }
 
+/**
+ * task_cpus_active_mm - Show the mm_cpumask for a process
+ * @m: The seq_file structure for the /proc/PID/status output
+ * @mm: The memory descriptor of the process
+ *
+ * Prints the set of CPUs, representing the CPU affinity of the process's
+ * active memory context, in both mask and list format. This mask is
+ * primarily used for TLB and cache synchronisation.
+ */
+static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
+{
+	seq_printf(m, "Cpus_active_mm:\t%*pb\n",
+		   cpumask_pr_args(mm_cpumask(mm)));
+	seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
+		   cpumask_pr_args(mm_cpumask(mm)));
+}
+
 static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
 {
 	seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
@@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 		task_core_dumping(m, task);
 		task_thp_status(m, mm);
 		task_untag_mask(m, mm);
-		mmput(mm);
 	}
 	task_sig(m, task);
 	task_cap(m, task);
 	task_seccomp(m, task);
 	task_cpus_allowed(m, task);
+	if (mm) {
+		task_cpus_active_mm(m, mm);
+		mmput(mm);
+	}
 	cpuset_task_status_allowed(m, task);
 	task_context_switch_counts(m, task);
 	arch_proc_pid_thread_features(m, task);
-- 
2.51.0
Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
Posted by David Hildenbrand (Red Hat) 1 month, 3 weeks ago
On 12/17/25 03:46, Aaron Tomlin wrote:
> This patch introduces two new fields to /proc/[pid]/status to display the
> set of CPUs, representing the CPU affinity of the process's active
> memory context, in both mask and list format: "Cpus_active_mm" and
> "Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
> cache synchronisation.
> 
> Exposing this information allows userspace to easily identify
> memory-task affinity and gain insight into NUMA alignment, CPU isolation
> and real-time workload placement.
> 
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.

I agree with Oleg's comments.

Given that everybody has read access to /proc/$PID/status IIUC, I wonder 
if that information could somehow help an attacker to better attack a 
target program (knowing which CPUs have dirty TLB etc). As you said, 
it's primarily for TLB and cache sync ...

Just a thought, have nothing concrete in mind.

-- 
Cheers

David
Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
Posted by Aaron Tomlin 1 month, 3 weeks ago
On Thu, Dec 18, 2025 at 09:30:53AM +0100, David Hildenbrand (Red Hat) wrote:
> I agree with Oleg's comments.
> 
> Given that everybody has read access to /proc/$PID/status IIUC, I wonder if
> that information could somehow help an attacker to better attack a target
> program (knowing which CPUs have dirty TLB etc). As you said, it's
> primarily for TLB and cache sync ...
> 
> Just a thought, have nothing concrete in mind.

Hi David,

Thank you for raising this point; security and information leakage are,
quite rightly, paramount considerations when adding new entries to
world-readable interfaces like /proc/[pid]/status. Upon reflection, I
submit that the risk here is minimal for a few reasons:

    1. Existing Visibility: The kernel already exposes a significant amount
    of CPU residency information. For instance, /proc/[pid]/stat explicitly
    shows the CPU a task is currently running on (field 39)
    i.e., task_cpu(task), and "Cpus_allowed" already defines the bounds of
    where a task can be. See do_task_stat().

    2. Resolution of Data: The mm_cpumask is a relatively coarse-grained
    diagnostic. While it indicates where TLB entries might be valid, it
    does not provide the fine-grained timing or cache-line information
    typically required for sophisticated side-channel attacks.

    3. Diagnostic Value: The primary intent is to provide visibility into
    the "memory footprint" across CPUs, which is invaluable for debugging
    performance issues related to IPI storms and TLB shootdowns in
    large-scale NUMA systems. The CPU-affinity sets the boundary; the
    mm_cpumask records the arrival; they complement each other.
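
For reference, the existing visibility mentioned in point 1 can be checked
from userspace today. The following is a minimal sketch in C, assuming the
field layout documented in proc(5), where field 39 of /proc/[pid]/stat is
"processor". The helper names stat_line_processor() and task_last_cpu() are
hypothetical and not part of the patch. Since the comm field (field 2) may
itself contain spaces or parentheses, the parser counts fields from the
last ')' rather than from the start of the line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return field 39 ("processor") from the contents of
 * /proc/[pid]/stat, or -1 if the text cannot be parsed.
 */
int stat_line_processor(const char *line)
{
	const char *p;
	int field = 2;	/* the last ')' closes field 2 (comm) */

	assert(line != NULL);

	p = strrchr(line, ')');
	if (!p)
		return -1;
	p++;

	for (;;) {
		char *end;
		long v;

		while (*p == ' ')
			p++;
		if (*p == '\0' || *p == '\n')
			return -1;	/* ran out of fields */
		if (++field == 39) {
			v = strtol(p, &end, 10);
			return end == p ? -1 : (int)v;
		}
		/* Skip over the current field. */
		while (*p && *p != ' ' && *p != '\n')
			p++;
	}
}

/* Hypothetical helper: read /proc/<pid>/stat and extract field 39. */
int task_last_cpu(int pid)
{
	char path[64], buf[1024];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return stat_line_processor(buf);
}
```

Any unprivileged user who can read /proc/[pid]/stat can call something like
task_last_cpu() on that pid, which is the sense in which CPU residency is
already visible.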

I trust that the diagnostic utility is seen to outweigh the theoretical
risk in this instance.


Kind regards,
-- 
Aaron Tomlin
Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
Posted by Oleg Nesterov 1 month, 3 weeks ago
Can't really comment on this patch... I mean the intent.
Just a couple of nits:

	- I think this patch should also update Documentation/filesystems/proc.rst

	- I won't object, but do we really need/want another "if (mm)" block ?

	- I guess this is just my poor English, but the usage of "affinity"
	  in the changelog/comment looks a bit confusing to me ;) As if this
	  refers to task_struct.cpus_mask.

	  Fortunately "Cpus_active_mm..." in task_cpus_active_mm() makes it
	  more clear, so feel free to ignore.

Oleg.

On 12/16, Aaron Tomlin wrote:
>
> This patch introduces two new fields to /proc/[pid]/status to display the
> set of CPUs, representing the CPU affinity of the process's active
> memory context, in both mask and list format: "Cpus_active_mm" and
> "Cpus_active_mm_list". The mm_cpumask is primarily used for TLB and
> cache synchronisation.
>
> Exposing this information allows userspace to easily identify
> memory-task affinity and gain insight into NUMA alignment, CPU isolation
> and real-time workload placement.
>
> Frequent mm_cpumask changes may indicate instability in placement
> policies or excessive task migration overhead.
>
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>  fs/proc/array.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/array.c b/fs/proc/array.c
> index 42932f88141a..8887c5e38e51 100644
> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -409,6 +409,23 @@ static void task_cpus_allowed(struct seq_file *m, struct task_struct *task)
>  		   cpumask_pr_args(&task->cpus_mask));
>  }
>
> +/**
> + * task_cpus_active_mm - Show the mm_cpumask for a process
> + * @m: The seq_file structure for the /proc/PID/status output
> + * @mm: The memory descriptor of the process
> + *
> + * Prints the set of CPUs, representing the CPU affinity of the process's
> + * active memory context, in both mask and list format. This mask is
> + * primarily used for TLB and cache synchronisation.
> + */
> +static void task_cpus_active_mm(struct seq_file *m, struct mm_struct *mm)
> +{
> +	seq_printf(m, "Cpus_active_mm:\t%*pb\n",
> +		   cpumask_pr_args(mm_cpumask(mm)));
> +	seq_printf(m, "Cpus_active_mm_list:\t%*pbl\n",
> +		   cpumask_pr_args(mm_cpumask(mm)));
> +}
> +
>  static inline void task_core_dumping(struct seq_file *m, struct task_struct *task)
>  {
>  	seq_put_decimal_ull(m, "CoreDumping:\t", !!task->signal->core_state);
> @@ -450,12 +467,15 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
>  		task_core_dumping(m, task);
>  		task_thp_status(m, mm);
>  		task_untag_mask(m, mm);
> -		mmput(mm);
>  	}
>  	task_sig(m, task);
>  	task_cap(m, task);
>  	task_seccomp(m, task);
>  	task_cpus_allowed(m, task);
> +	if (mm) {
> +		task_cpus_active_mm(m, mm);
> +		mmput(mm);
> +	}
>  	cpuset_task_status_allowed(m, task);
>  	task_context_switch_counts(m, task);
>  	arch_proc_pid_thread_features(m, task);
> --
> 2.51.0
>
Re: [PATCH] fs/proc: Expose mm_cpumask in /proc/[pid]/status
Posted by Aaron Tomlin 1 month, 3 weeks ago
On Wed, Dec 17, 2025 at 06:33:26PM +0100, Oleg Nesterov wrote:
> Can't really comment on this patch... I mean the intent.
> Just a couple of nits:

Hi Oleg,

Long time no speak. Thank you for your response.

> 	- I think this patch should also update
> 	Documentation/filesystems/proc.rst

Acknowledged. I will do so in the follow-up patch.

> 	- I won't object, but do we really need/want another "if (mm)" block ?

I appreciate your observation; technically, the code could be more compact
by merging this into the earlier conditional block. However, my reasoning
here was primarily a personal preference regarding the resulting output of
/proc/[PID]/status. I felt it was beneficial to keep "Cpus_active_mm" and
"Cpus_active_mm_list" in close proximity to their counterparts,
"Cpus_allowed" and "Cpus_allowed_list", to provide a more intuitive and
logically grouped view for the user.
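
To illustrate how a userspace consumer might pick these neighbouring fields
out of the status text, here is a rough sketch in C. The function name
status_field() is hypothetical. "Cpus_allowed" and "Cpus_allowed_list" are
existing fields; "Cpus_active_mm" and "Cpus_active_mm_list" would only be
present on a kernel carrying this patch, so a caller should expect the
lookup to fail elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key:" from the text of
 * /proc/[pid]/status into out. Returns 0 on success, -1 if the key is
 * absent. The key must match a whole field name at the start of a line,
 * so "Cpus_allowed" does not match the "Cpus_allowed_list" line.
 */
int status_field(const char *text, const char *key, char *out, size_t outlen)
{
	const char *line = text;
	size_t klen;

	assert(text && key && out);
	if (outlen == 0)
		return -1;
	klen = strlen(key);

	while (line && *line) {
		const char *eol = strchr(line, '\n');

		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			const char *val = line + klen + 1;
			size_t n;

			/* Values are separated from the key by a tab. */
			while (*val == ' ' || *val == '\t')
				val++;
			n = eol ? (size_t)(eol - val) : strlen(val);
			if (n >= outlen)
				n = outlen - 1;	/* truncate to fit */
			memcpy(out, val, n);
			out[n] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

Because the lookup is by name rather than by position, a parser of this
shape would not depend on where in the file the new fields are printed.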

> 	- I guess this is just my poor English, but the usage of "affinity"
> 	  in the changelog/comment looks a bit confusing to me ;) As if this
> 	  refers to task_struct.cpus_mask.
> 
> 	  Fortunately "Cpus_active_mm..." in task_cpus_active_mm() makes it
> 	  more clear, so feel free to ignore.

I appreciate your perspective on the use of the word "affinity."
My intention was to describe the relationship between CPUs where a memory
descriptor is "active" and the CPUs where the thread is allowed to execute.
In other words: the affinity sets the boundary; the mm_cpumask records the
arrival. However, I see how this could be misconstrued. I will certainly
refine the language in the changelog and ensure there is no ambiguity
between the two.
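
As a rough illustration of that distinction, the sketch below (in C)
parses two CPU lists in the range/comma form that "%*pbl" prints, for
example "0-3,8,10-11", and counts the CPUs present in one but not in the
other. The names parse_cpu_list() and count_outside() are hypothetical,
and the "Cpus_active_mm_list" input assumes a kernel with this patch
applied. Nothing here assumes one mask is a subset of the other; the two
can legitimately differ, which is the point of comparing them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse a list such as "0-3,8,10-11" into a bitmap
 * of ncpus bits (bits must hold at least (ncpus + 7) / 8 bytes).
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_cpu_list(const char *s, unsigned char *bits, int ncpus)
{
	assert(s && bits && ncpus > 0);
	memset(bits, 0, (size_t)(ncpus + 7) / 8);

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s || lo < 0)
			return -1;
		hi = lo;
		s = end;
		if (*s == '-') {	/* a range "lo-hi" */
			s++;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
			s = end;
		}
		if (hi >= ncpus)
			return -1;
		for (long c = lo; c <= hi; c++)
			bits[c / 8] |= (unsigned char)(1u << (c % 8));
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}

/* Count the CPUs that are set in a but not in b. */
int count_outside(const unsigned char *a, const unsigned char *b, int ncpus)
{
	int n = 0;

	for (int c = 0; c < ncpus; c++) {
		int in_a = (a[c / 8] >> (c % 8)) & 1;
		int in_b = (b[c / 8] >> (c % 8)) & 1;

		if (in_a && !in_b)
			n++;
	}
	return n;
}
```

With the "Cpus_active_mm_list" value as a and "Cpus_allowed_list" as b,
a non-zero count would mean the mm is still marked on CPUs the inspected
task is no longer allowed to run on, for instance after an affinity change
or because other threads sharing the mm have a different affinity.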


Kind regards,
-- 
Aaron Tomlin