[v4] x86,fs/resctrl: Fix long-standing issues

[PATCH v4 00/10] x86,fs/resctrl: Fix long-standing issues

Posted by Reinette Chatre 5 days, 2 hours ago

v3: https://lore.kernel.org/lkml/cover.1779476724.git.reinette.chatre@intel.com/
v2: https://lore.kernel.org/lkml/20260515193944.15114-1-tony.luck@intel.com/
v1: https://lore.kernel.org/all/20260508182143.14592-1-tony.luck@intel.com/

While reviewing the AET series [1], Sashiko reported a deadlock during mount
and a use-after-free when an L3 domain is removed during CPU offline. More issues
were uncovered as fixes were developed and reviewed. While the goal is to
fix all issues, the races surrounding pseudo-locked regions are not yet
solved, and the fixes for them have been removed from this version.

Applies against tip/master to ensure it considers pending x86/cache changes.

Changes since V3:
- Drop the majority of the pseudo-locking fixes; keep only the double-free/
  double-list-add fix.
- Add patch to help document safe RCU list traversal.
- See individual patches for detailed changes.

[1] https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com

Reinette Chatre (7):
  x86,fs/resctrl: Document safe RCU list traversal
  fs/resctrl: Fix deadlock on errors during mount
  fs/resctrl: Prevent use-after-free in rdtgroup_kn_put()
  fs/resctrl: Fix double-add of pseudo-locked region's RMID to free list
  fs/resctrl: Prevent deadlock and use-after-free in info file handlers
  x86/resctrl: Ensure domain fully initialized before placed on RCU list
  fs/resctrl: Fix UAF from worker threads when domains are removed

Tony Luck (3):
  fs/resctrl: Move functions to avoid forward references in subsequent
    fixes
  fs/resctrl: Free mon_data structures on rdt_get_tree() failure
  fs/resctrl: Fix use-after-free during unmount

 arch/x86/kernel/cpu/resctrl/core.c        |  18 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |   4 +-
 arch/x86/kernel/cpu/resctrl/intel_aet.c   |   5 +-
 arch/x86/kernel/cpu/resctrl/monitor.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |   4 +-
 fs/resctrl/ctrlmondata.c                  |  50 +-
 fs/resctrl/internal.h                     |   3 +-
 fs/resctrl/monitor.c                      | 123 ++--
 fs/resctrl/pseudo_lock.c                  |   2 +-
 fs/resctrl/rdtgroup.c                     | 859 ++++++++++++++--------
 10 files changed, 663 insertions(+), 407 deletions(-)

-- 
2.50.1

Re: [PATCH v4 00/10] x86,fs/resctrl: Fix long-standing issues

Posted by Reinette Chatre 3 days, 14 hours ago

Hi Everybody,

Addressing the non-patch-specific Sashiko feedback from
https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com

I respond to patch-specific Sashiko feedback in the individual patches.

On 6/2/26 8:27 PM, Reinette Chatre wrote:
> v3: https://lore.kernel.org/lkml/cover.1779476724.git.reinette.chatre@intel.com/
> v2: https://lore.kernel.org/lkml/20260515193944.15114-1-tony.luck@intel.com/
> v1: https://lore.kernel.org/all/20260508182143.14592-1-tony.luck@intel.com/
> 
> While reviewing the AET series [1], Sashiko reported a deadlock during mount
> and a use-after-free when an L3 domain is removed during CPU offline. More issues
> were uncovered as fixes were developed and reviewed. While the goal is to
> fix all issues, the races surrounding pseudo-locked regions are not yet
> solved, and the fixes for them have been removed from this version.

As anticipated, Sashiko reported the issues surrounding pseudo-locking. One new
concern was raised about the interaction between pseudo-locked regions and
assigned counters, but pseudo-locking is a model-specific feature found on
hardware that does not support assigned counters.

Sashiko did uncover one new issue related to the limbo handler when SNC is enabled.
To address it, I currently plan to add the following patch to this series:

From 95da3282f94754e8840497be632314e542375e67 Mon Sep 17 00:00:00 2001
Message-ID: <95da3282f94754e8840497be632314e542375e67.1780586352.git.reinette.chatre@intel.com>
From: Reinette Chatre <reinette.chatre@intel.com>
Date: Wed, 3 Jun 2026 13:37:15 -0700
Subject: [PATCH] x86,fs/resctrl: Prevent out-of-bounds access while offlining
 CPU when SNC enabled

The architecture updates the cpu_mask in a domain's header to track which
online CPUs are associated with the domain. When this mask becomes empty,
the architecture initiates offline of the domain, which includes calling
on resctrl fs to offline the domain. If it is a monitoring domain in
which LLC occupancy is tracked, resctrl fs forces the limbo handler to
release all busy RMIDs.

The limbo handler reads the current event value associated with a busy
RMID irrespective of whether the RMID is being checked as part of the
regular "is it still busy" check or will be force-released anyway. When
reading an RMID on a system with SNC enabled, the "logical RMID" is
converted to the "physical RMID". This conversion requires the NUMA node
ID of the resctrl monitoring domain, which is in turn determined by
querying the NUMA node ID of any CPU belonging to the monitoring domain.

When the monitoring domain is going offline, its cpu_mask is empty. This
causes the NUMA node ID query via cpu_to_node() to be done with
"nr_cpu_ids" as argument, resulting in an out-of-bounds access.

Refactor the limbo handler to skip reading the RMID when the RMID will
be force-released anyway. Add a safety check to the architecture's
RMID reader to protect against this scenario.

Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=9
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since v4:
- New patch
---
 arch/x86/kernel/cpu/resctrl/monitor.c |  5 ++++
 fs/resctrl/monitor.c                  | 39 +++++++++++++++------------
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9bf9d7e201aa..fb7024ae50e6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -259,6 +259,11 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
+	if (cpumask_empty(&hdr->cpu_mask)) {
+		pr_warn_once("Domain %d has no CPUs\n", hdr->id);
+		return -EINVAL;
+	}
+
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	hw_dom = resctrl_to_arch_mon_dom(d);
 	cpu = cpumask_any(&hdr->cpu_mask);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 0e6a389a16bf..a932a1fea818 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -135,10 +135,10 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	struct rmid_entry *entry;
+	bool rmid_dirty = true;
 	u32 idx, cur_idx = 1;
 	void *arch_mon_ctx;
 	void *arch_priv;
-	bool rmid_dirty;
 	u64 val = 0;
 
 	arch_priv = mon_event_all[QOS_L3_OCCUP_EVENT_ID].arch_priv;
@@ -161,22 +161,27 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 			break;
 
 		entry = __rmid_entry(idx);
-		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, arch_priv, &val,
-					   arch_mon_ctx)) {
-			rmid_dirty = true;
-		} else {
-			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
-
-			/*
-			 * x86's CLOSID and RMID are independent numbers, so the entry's
-			 * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
-			 * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't
-			 * used to select the configuration. It is thus necessary to track both
-			 * CLOSID and RMID because there may be dependencies between them
-			 * on some architectures.
-			 */
-			trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->hdr.id, val);
+		if (!force_free) {
+			if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid,
+						   entry->rmid, QOS_L3_OCCUP_EVENT_ID,
+						   arch_priv, &val, arch_mon_ctx)) {
+				rmid_dirty = true;
+			} else {
+				rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+
+				/*
+				 * x86's CLOSID and RMID are independent numbers,
+				 * so the entry's CLOSID is an empty CLOSID
+				 * (X86_RESCTRL_EMPTY_CLOSID). On Arm the RMID
+				 * (PMG) extends the CLOSID (PARTID) space with
+				 * bits that aren't used to select the configuration.
+				 * It is thus necessary to track both CLOSID and
+				 * RMID because there may be dependencies between
+				 * them on some architectures.
+				 */
+				trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid,
+							      d->hdr.id, val);
+			}
 		}
 
 		if (force_free || !rmid_dirty) {
-- 
2.53.0
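
For readers who want to reason about the failure mode outside the kernel,
here is a minimal userspace sketch of the two changes in the patch above. It
is an illustration only, not kernel code: every model_* name, the 8-CPU
table, and the 32-bit mask are hypothetical stand-ins for cpumask_any(),
cpu_to_node(), resctrl_arch_rmid_read() and __check_limbo(). It assumes, as
the changelog describes, that picking a CPU from an empty mask yields
"nr_cpu_ids", which would then index one past the end of the per-CPU data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPU_IDS 8u	/* hypothetical CPU count for this model */

/* Hypothetical per-CPU NUMA node table, standing in for cpu_to_node() data. */
static const int model_node_of_cpu[MODEL_NR_CPU_IDS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Model of cpumask_any(): lowest set CPU, or MODEL_NR_CPU_IDS if mask is empty. */
static unsigned int model_cpumask_any(uint32_t mask)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

/* Model of the guarded reader: refuse to look up a node for an empty domain. */
static int model_rmid_read(uint32_t cpu_mask, uint64_t occupancy, uint64_t *val)
{
	unsigned int cpu;
	int node;

	/* Without this check the table index below would be MODEL_NR_CPU_IDS. */
	if (cpu_mask == 0)
		return -EINVAL;

	cpu = model_cpumask_any(cpu_mask);
	assert(cpu < MODEL_NR_CPU_IDS);
	node = model_node_of_cpu[cpu];	/* node would select the physical RMID */
	(void)node;
	*val = occupancy;
	return 0;
}

/*
 * Model of the refactored limbo decision. Returns true if the RMID may be
 * released. *reads counts how often the reader was invoked.
 */
static bool model_limbo_release(uint32_t cpu_mask, uint64_t occupancy,
				uint64_t threshold, bool force_free, int *reads)
{
	bool rmid_dirty = true;	/* stays true if the read is skipped or fails */
	uint64_t val = 0;

	if (!force_free) {
		(*reads)++;
		if (model_rmid_read(cpu_mask, occupancy, &val) == 0)
			rmid_dirty = (val >= threshold);
	}

	return force_free || !rmid_dirty;
}
```

With force_free set and an empty mask, model_limbo_release() releases the
RMID without ever calling the reader, which mirrors the offline path. With
force_free clear, an empty mask makes the reader fail and the RMID stays
busy, which mirrors the added safety check.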