From nobody Mon Jun  8 06:35:46 2026
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 01/10] x86,fs/resctrl: Document safe RCU list traversal
Date: Tue,  2 Jun 2026 20:27:29 -0700
Message-ID: 
 <776eb116e624f312239fa71cb20d9005e0f709fb.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

rdt_resource::ctrl_domains and rdt_resource::mon_domains are RCU lists with
entries added and removed by the architecture code from CPU hotplug
callbacks that run with cpus_write_lock() held. These lists can be
traversed safely from resctrl fs by either holding cpus_read_lock() or
relying on an RCU read-side critical section.

resctrl fs traversals of rdt_resource::ctrl_domains and
rdt_resource::mon_domains are done using list_for_each_entry() with
cpus_read_lock() held. Similarly, x86 architecture callbacks use
list_for_each_entry() expecting that resctrl fs makes the call with
cpus_read_lock() held. A lockdep_assert_cpus_held() inconsistently
precedes the list_for_each_entry() call, at varying distance, to document
that this RCU list traversal is safe.

An upcoming traversal of rdt_resource::ctrl_domains needs to be done from
an RCU read-side critical section. In preparation, developers must always
know exactly in which context the list is being traversed.

Replace the list_for_each_entry() traversals of RCU lists with
list_for_each_entry_rcu() to document that an RCU list is being traversed.
Make use of the built-in lockdep expression to additionally document that
it is cpus_read_lock() that enables the list to be traversed outside an
RCU read-side critical section. Only fall back to documenting the safety
of a traversal with a comment where lockdep lacks the needed visibility,
which is the case in functions called via smp_call*().

The lockdep expression within list_for_each_entry_rcu() depends on
RCU_EXPERT, which is not set in a typical debug kernel. Keep the existing
lockdep_assert_cpus_held() calls since they are active with
CONFIG_LOCKDEP=3Dy, which is set in a typical debug kernel.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
---
Changes since v3:
- New patch.
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  4 ++--
 fs/resctrl/ctrlmondata.c                  | 12 +++++++-----
 fs/resctrl/monitor.c                      | 23 +++++++++++++---------
 fs/resctrl/pseudo_lock.c                  |  2 +-
 fs/resctrl/rdtgroup.c                     | 24 +++++++++++------------
 7 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cp=
u/resctrl/ctrlmondata.c
index b20e705606b8..e74f1ed54b86 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -53,7 +53,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u=
32 closid)
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
=20
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 		hw_dom =3D resctrl_to_arch_ctrl_dom(d);
 		msr_param.res =3D NULL;
 		for (t =3D 0; t < CDP_NUM_TYPES; t++) {
@@ -115,7 +115,7 @@ static void _resctrl_sdciae_enable(struct rdt_resource =
*r, bool enable)
 	lockdep_assert_cpus_held();
=20
 	/* Update MSR_IA32_L3_QOS_EXT_CFG MSR on all the CPUs in all domains */
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list)
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld())
 		on_each_cpu_mask(&d->hdr.cpu_mask, resctrl_sdciae_set_one_amd, &enable, =
1);
 }
=20
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/re=
sctrl/monitor.c
index 9bf9d7e201aa..ca9c88d6fd14 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -500,7 +500,7 @@ static void _resctrl_abmc_enable(struct rdt_resource *r=
, bool enable)
=20
 	lockdep_assert_cpus_held();
=20
-	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_hel=
d()) {
 		on_each_cpu_mask(&d->hdr.cpu_mask, resctrl_abmc_set_one_amd,
 				 &enable, 1);
 		resctrl_arch_reset_rmid_all(r, d);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/r=
esctrl/rdtgroup.c
index 885026468440..5ffa39fa86fa 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -151,7 +151,7 @@ static int set_cache_qos_cfg(int level, bool enable)
 		return -ENOMEM;
=20
 	r_l =3D &rdt_resources_all[level].r_resctrl;
-	list_for_each_entry(d, &r_l->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r_l->ctrl_domains, hdr.list, lockdep_is_cpus_=
held()) {
 		if (r_l->cache.arch_has_per_cpu_cfg)
 			/* Pick all the CPUs in the domain instance */
 			for_each_cpu(cpu, &d->hdr.cpu_mask)
@@ -249,7 +249,7 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *=
r)
 	 * CBMs in all ctrl_domains to the maximum mask value. Pick one CPU
 	 * from each domain to update the MSRs below.
 	 */
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 		hw_dom =3D resctrl_to_arch_ctrl_dom(d);
=20
 		for (i =3D 0; i < hw_res->num_closid; i++)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 9a7dfc48cb2e..f33712c17d38 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -261,7 +261,7 @@ static int parse_line(char *line, struct resctrl_schema=
 *s,
 		return -EINVAL;
 	}
 	dom =3D strim(dom);
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 		if (d->hdr.id =3D=3D dom_id) {
 			data.buf =3D dom;
 			data.closid =3D rdtgrp->closid;
@@ -397,7 +397,7 @@ static void show_doms(struct seq_file *s, struct resctr=
l_schema *schema,
=20
 	if (resource_name)
 		seq_printf(s, "%*s:", max_name_width, resource_name);
-	list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(dom, &r->ctrl_domains, hdr.list, lockdep_is_cpus_=
held()) {
 		if (sep)
 			seq_puts(s, ";");
=20
@@ -535,6 +535,8 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_=
head *h, int id,
 	struct rdt_domain_hdr *d;
 	struct list_head *l;
=20
+	lockdep_assert_cpus_held();
+
 	list_for_each(l, h) {
 		d =3D list_entry(l, struct rdt_domain_hdr, list);
 		/* When id is found, return its domain. */
@@ -717,7 +719,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		 * struct mon_data. Search all domains in the resource for
 		 * one that matches this cache id.
 		 */
-		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 			if (d->ci_id =3D=3D domid) {
 				cpu =3D cpumask_any(&d->hdr.cpu_mask);
 				ci =3D get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
@@ -817,7 +819,7 @@ static int resctrl_io_alloc_init_cbm(struct resctrl_sch=
ema *s, u32 closid)
 	/* Keep CDP_CODE and CDP_DATA of io_alloc CLOSID's CBM in sync. */
 	if (resctrl_arch_get_cdp_enabled(r->rid)) {
 		peer_type =3D resctrl_peer_type(s->conf_type);
-		list_for_each_entry(d, &s->res->ctrl_domains, hdr.list)
+		list_for_each_entry_rcu(d, &s->res->ctrl_domains, hdr.list, lockdep_is_c=
pus_held())
 			memcpy(&d->staged_config[peer_type],
 			       &d->staged_config[s->conf_type],
 			       sizeof(d->staged_config[0]));
@@ -980,7 +982,7 @@ static int resctrl_io_alloc_parse_line(char *line,  str=
uct rdt_resource *r,
 	}
=20
 	dom =3D strim(dom);
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 		if (update_all || d->hdr.id =3D=3D dom_id) {
 			data.buf =3D dom;
 			data.mode =3D RDT_MODE_SHAREABLE;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 0e6a389a16bf..d2aa7d045056 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -304,7 +304,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	idx =3D resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
=20
 	entry->busy =3D 0;
-	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_hel=
d()) {
 		/*
 		 * For the first limbo RMID in the domain,
 		 * setup up the limbo worker.
@@ -502,6 +502,11 @@ static int __l3_mon_event_count_sum(struct rdtgroup *r=
dtgrp, struct rmid_read *r
 	 * all domains fail for any reason.
 	 */
 	ret =3D -EINVAL;
+	/*
+	 * RCU list being traversed with CPU hotplug lock held. lockdep
+	 * unable to help prove this here since this work is scheduled via
+	 * smp_call*(). Not called from MBM overflow handler.
+	 */
 	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
 		if (d->ci_id !=3D rr->ci->id)
 			continue;
@@ -1226,7 +1231,7 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_m=
on_domain *d, struct rdtgro
 	int ret =3D 0;
=20
 	if (!d) {
-		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 			int err;
=20
 			err =3D rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
@@ -1298,7 +1303,7 @@ static void rdtgroup_unassign_cntr_event(struct rdt_l=
3_mon_domain *d, struct rdt
 	struct rdt_resource *r =3D resctrl_arch_get_resource(mevt->rid);
=20
 	if (!d) {
-		list_for_each_entry(d, &r->mon_domains, hdr.list)
+		list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_he=
ld())
 			rdtgroup_free_unassign_cntr(r, d, rdtgrp, mevt);
 	} else {
 		rdtgroup_free_unassign_cntr(r, d, rdtgrp, mevt);
@@ -1370,7 +1375,7 @@ static void rdtgroup_update_cntr_event(struct rdt_res=
ource *r, struct rdtgroup *
 	struct rdt_l3_mon_domain *d;
 	int cntr_id;
=20
-	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_hel=
d()) {
 		cntr_id =3D mbm_cntr_get(r, d, rdtgrp, evtid);
 		if (cntr_id >=3D 0)
 			rdtgroup_assign_cntr(r, d, evtid, rdtgrp->mon.rmid,
@@ -1540,7 +1545,7 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_o=
pen_file *of, char *buf,
 		/*
 		 * Reset all the non-achitectural RMID state and assignable counters.
 		 */
-		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 			mbm_cntr_free_all(r, d);
 			resctrl_reset_rmid_all(r, d);
 		}
@@ -1563,7 +1568,7 @@ int resctrl_num_mbm_cntrs_show(struct kernfs_open_fil=
e *of,
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
=20
-	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(dom, &r->mon_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 		if (sep)
 			seq_putc(s, ';');
=20
@@ -1597,7 +1602,7 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_op=
en_file *of,
 		goto out_unlock;
 	}
=20
-	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(dom, &r->mon_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 		if (sep)
 			seq_putc(s, ';');
=20
@@ -1647,7 +1652,7 @@ int mbm_L3_assignments_show(struct kernfs_open_file *=
of, struct seq_file *s, voi
=20
 		sep =3D false;
 		seq_printf(s, "%s:", mevt->name);
-		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+		list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 			if (sep)
 				seq_putc(s, ';');
=20
@@ -1745,7 +1750,7 @@ static int resctrl_parse_mbm_assignment(struct rdt_re=
source *r, struct rdtgroup
 	}
=20
 	/* Verify if the dom_id is valid */
-	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_hel=
d()) {
 		if (d->hdr.id =3D=3D dom_id) {
 			ret =3D rdtgroup_modify_assign_state(dom_str, d, rdtgrp, mevt);
 			if (ret) {
diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
index d1cb0986006e..dea2b4bf966f 100644
--- a/fs/resctrl/pseudo_lock.c
+++ b/fs/resctrl/pseudo_lock.c
@@ -656,7 +656,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_ctr=
l_domain *d)
 	 * associated with them.
 	 */
 	for_each_alloc_capable_rdt_resource(r) {
-		list_for_each_entry(d_i, &r->ctrl_domains, hdr.list) {
+		list_for_each_entry_rcu(d_i, &r->ctrl_domains, hdr.list, lockdep_is_cpus=
_held()) {
 			if (d_i->plr)
 				cpumask_or(cpu_with_psl, cpu_with_psl,
 					   &d_i->hdr.cpu_mask);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index af2cbab14497..2a6221925767 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -117,7 +117,7 @@ void rdt_staged_configs_clear(void)
 	lockdep_assert_held(&rdtgroup_mutex);
=20
 	for_each_alloc_capable_rdt_resource(r) {
-		list_for_each_entry(dom, &r->ctrl_domains, hdr.list)
+		list_for_each_entry_rcu(dom, &r->ctrl_domains, hdr.list, lockdep_is_cpus=
_held())
 			memset(dom->staged_config, 0, sizeof(dom->staged_config));
 	}
 }
@@ -1063,7 +1063,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file=
 *of,
=20
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
-	list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(dom, &r->ctrl_domains, hdr.list, lockdep_is_cpus_=
held()) {
 		if (sep)
 			seq_putc(seq, ';');
 		hw_shareable =3D r->cache.shareable_bits;
@@ -1415,7 +1415,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgr=
oup *rdtgrp)
 		if (r->rid =3D=3D RDT_RESOURCE_MBA || r->rid =3D=3D RDT_RESOURCE_SMBA)
 			continue;
 		has_cache =3D true;
-		list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 			ctrl =3D resctrl_arch_get_config(r, d, closid,
 						       s->conf_type);
 			if (rdtgroup_cbm_overlaps(s, d, ctrl, closid, false)) {
@@ -1604,7 +1604,7 @@ static int rdtgroup_size_show(struct kernfs_open_file=
 *of,
 		type =3D schema->conf_type;
 		sep =3D false;
 		seq_printf(s, "%*s:", max_name_width, schema->name);
-		list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+		list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 			if (sep)
 				seq_putc(s, ';');
 			if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1649,7 +1649,7 @@ static int mbm_config_show(struct seq_file *s, struct=
 rdt_resource *r, u32 evtid
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
=20
-	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(dom, &r->mon_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 		if (sep)
 			seq_puts(s, ";");
=20
@@ -1763,7 +1763,7 @@ static int mon_config_write(struct rdt_resource *r, c=
har *tok, u32 evtid)
 		return -EINVAL;
 	}
=20
-	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_hel=
d()) {
 		if (d->hdr.id =3D=3D dom_id) {
 			mbm_config_write_domain(r, d, evtid, val);
 			goto next;
@@ -2554,7 +2554,7 @@ static int set_mba_sc(bool mba_sc)
=20
 	rdtgroup_default.mba_mbps_event =3D mba_mbps_default_event;
=20
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 		for (i =3D 0; i < num_closid; i++)
 			d->mbps_val[i] =3D MBA_MAX_MBPS;
 	}
@@ -2879,7 +2879,7 @@ static int rdt_get_tree(struct fs_context *fc)
=20
 	if (resctrl_is_mbm_enabled()) {
 		r =3D resctrl_arch_get_resource(RDT_RESOURCE_L3);
-		list_for_each_entry(dom, &r->mon_domains, hdr.list)
+		list_for_each_entry_rcu(dom, &r->mon_domains, hdr.list, lockdep_is_cpus_=
held())
 			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
 						   RESCTRL_PICK_ANY_CPU);
 	}
@@ -3435,7 +3435,7 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_=
node *parent_kn,
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
=20
-	list_for_each_entry(hdr, &r->mon_domains, list) {
+	list_for_each_entry_rcu(hdr, &r->mon_domains, list, lockdep_is_cpus_held(=
)) {
 		ret =3D mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
 		if (ret)
 			return ret;
@@ -3620,7 +3620,7 @@ int rdtgroup_init_cat(struct resctrl_schema *s, u32 c=
losid)
 	struct rdt_ctrl_domain *d;
 	int ret;
=20
-	list_for_each_entry(d, &s->res->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &s->res->ctrl_domains, hdr.list, lockdep_is_cp=
us_held()) {
 		ret =3D __init_one_rdt_domain(d, s, closid);
 		if (ret < 0)
 			return ret;
@@ -3635,7 +3635,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r,=
 u32 closid)
 	struct resctrl_staged_config *cfg;
 	struct rdt_ctrl_domain *d;
=20
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list, lockdep_is_cpus_he=
ld()) {
 		if (is_mba_sc(r)) {
 			d->mbps_val[closid] =3D MBA_MAX_MBPS;
 			continue;
@@ -4506,7 +4506,7 @@ static struct rdt_l3_mon_domain *get_mon_domain_from_=
cpu(int cpu,
=20
 	lockdep_assert_cpus_held();
=20
-	list_for_each_entry(d, &r->mon_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->mon_domains, hdr.list, lockdep_is_cpus_hel=
d()) {
 		/* Find the domain that contains this CPU */
 		if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
 			return d;
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 02/10] fs/resctrl: Move functions to avoid forward
 references in subsequent fixes
Date: Tue,  2 Jun 2026 20:27:30 -0700
Message-ID: 
 <741d65f435bd6745693c321b817eca58b70ec0b2.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Tony Luck <tony.luck@intel.com>

rdt_get_tree() manages resctrl fs mount and rdt_kill_sb() manages resctrl
fs unmount.

There is significant overlap between error cleanup during resctrl mount
failure and cleanup on resctrl unmount, yet the cleanup is not done
consistently in these two flows.

Move some cleanup functions ahead of rdt_get_tree() in preparation for
a new helper that can be shared between mount and unmount.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
---
Changes since V2:
- Rewrite changelog.

Changes since V3:
- Add Ben's Reviewed-by tag.
---
 fs/resctrl/rdtgroup.c | 376 +++++++++++++++++++++---------------------
 1 file changed, 188 insertions(+), 188 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 2a6221925767..2b624cf02147 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2792,6 +2792,194 @@ static void schemata_list_destroy(void)
 	}
 }
=20
+/*
+ * Move tasks from one to the other group. If @from is NULL, then all tasks
+ * in the systems are moved unconditionally (used for teardown).
+ *
+ * If @mask is not NULL the cpus on which moved tasks are running are set
+ * in that mask so the update smp function call is restricted to affected
+ * cpus.
+ */
+static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *t=
o,
+				 struct cpumask *mask)
+{
+	struct task_struct *p, *t;
+
+	read_lock(&tasklist_lock);
+	for_each_process_thread(p, t) {
+		if (!from || is_closid_match(t, from) ||
+		    is_rmid_match(t, from)) {
+			resctrl_arch_set_closid_rmid(t, to->closid,
+						     to->mon.rmid);
+
+			/*
+			 * Order the closid/rmid stores above before the loads
+			 * in task_curr(). This pairs with the full barrier
+			 * between the rq->curr update and
+			 * resctrl_arch_sched_in() during context switch.
+			 */
+			smp_mb();
+
+			/*
+			 * If the task is on a CPU, set the CPU in the mask.
+			 * The detection is inaccurate as tasks might move or
+			 * schedule before the smp function call takes place.
+			 * In such a case the function call is pointless, but
+			 * there is no other side effect.
+			 */
+			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
+				cpumask_set_cpu(task_cpu(t), mask);
+		}
+	}
+	read_unlock(&tasklist_lock);
+}
+
+static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
+{
+	struct rdtgroup *sentry, *stmp;
+	struct list_head *head;
+
+	head =3D &rdtgrp->mon.crdtgrp_list;
+	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+		rdtgroup_unassign_cntrs(sentry);
+		free_rmid(sentry->closid, sentry->mon.rmid);
+		list_del(&sentry->mon.crdtgrp_list);
+
+		if (atomic_read(&sentry->waitcount) !=3D 0)
+			sentry->flags =3D RDT_DELETED;
+		else
+			rdtgroup_remove(sentry);
+	}
+}
+
+/*
+ * Forcibly remove all of subdirectories under root.
+ */
+static void rmdir_all_sub(void)
+{
+	struct rdtgroup *rdtgrp, *tmp;
+
+	/* Move all tasks to the default resource group */
+	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
+
+	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+		/* Free any child rmids */
+		free_all_child_rdtgrp(rdtgrp);
+
+		/* Remove each rdtgroup other than root */
+		if (rdtgrp =3D=3D &rdtgroup_default)
+			continue;
+
+		if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED)
+			rdtgroup_pseudo_lock_remove(rdtgrp);
+
+		/*
+		 * Give any CPUs back to the default group. We cannot copy
+		 * cpu_online_mask because a CPU might have executed the
+		 * offline callback already, but is still marked online.
+		 */
+		cpumask_or(&rdtgroup_default.cpu_mask,
+			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+
+		rdtgroup_unassign_cntrs(rdtgrp);
+
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
+
+		kernfs_remove(rdtgrp->kn);
+		list_del(&rdtgrp->rdtgroup_list);
+
+		if (atomic_read(&rdtgrp->waitcount) !=3D 0)
+			rdtgrp->flags =3D RDT_DELETED;
+		else
+			rdtgroup_remove(rdtgrp);
+	}
+	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
+	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
+
+	kernfs_remove(kn_info);
+	kernfs_remove(kn_mongrp);
+	kernfs_remove(kn_mondata);
+}
+
+/**
+ * mon_get_kn_priv() - Get the mon_data priv data for this event.
+ *
+ * The same values are used across the mon_data directories of all control=
 and
+ * monitor groups for the same event in the same domain. Keep a list of
+ * allocated structures and re-use an existing one with the same values for
+ * @rid, @domid, etc.
+ *
+ * @rid:    The resource id for the event file being created.
+ * @domid:  The domain id for the event file being created.
+ * @mevt:   The type of event file being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ *	    when @rid =3D=3D RDT_RESOURCE_L3.
+ *
+ * Return: Pointer to mon_data private data of the event, NULL on failure.
+ */
+static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int do=
mid,
+					struct mon_evt *mevt,
+					bool do_sum)
+{
+	struct mon_data *priv;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
+		if (priv->rid =3D=3D rid && priv->domid =3D=3D domid &&
+		    priv->sum =3D=3D do_sum && priv->evt =3D=3D mevt)
+			return priv;
+	}
+
+	priv =3D kzalloc_obj(*priv);
+	if (!priv)
+		return NULL;
+
+	priv->rid =3D rid;
+	priv->domid =3D domid;
+	priv->sum =3D do_sum;
+	priv->evt =3D mevt;
+	list_add_tail(&priv->list, &mon_data_kn_priv_list);
+
+	return priv;
+}
+
+/**
+ * mon_put_kn_priv() - Free all allocated mon_data structures.
+ *
+ * Called when resctrl file system is unmounted.
+ */
+static void mon_put_kn_priv(void)
+{
+	struct mon_data *priv, *tmp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
+		list_del(&priv->list);
+		kfree(priv);
+	}
+}
+
+static void resctrl_fs_teardown(void)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/* Cleared by rdtgroup_destroy_root() */
+	if (!rdtgroup_default.kn)
+		return;
+
+	rmdir_all_sub();
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
+	mon_put_kn_priv();
+	rdt_pseudo_lock_release();
+	rdtgroup_default.mode =3D RDT_MODE_SHAREABLE;
+	closid_exit();
+	schemata_list_destroy();
+	rdtgroup_destroy_root();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx =3D rdt_fc2context(fc);
@@ -2991,194 +3179,6 @@ static int rdt_init_fs_context(struct fs_context *f=
c)
 	return 0;
 }
=20
-/*
- * Move tasks from one to the other group. If @from is NULL, then all tasks
- * in the systems are moved unconditionally (used for teardown).
- *
- * If @mask is not NULL the cpus on which moved tasks are running are set
- * in that mask so the update smp function call is restricted to affected
- * cpus.
- */
-static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *t=
o,
-				 struct cpumask *mask)
-{
-	struct task_struct *p, *t;
-
-	read_lock(&tasklist_lock);
-	for_each_process_thread(p, t) {
-		if (!from || is_closid_match(t, from) ||
-		    is_rmid_match(t, from)) {
-			resctrl_arch_set_closid_rmid(t, to->closid,
-						     to->mon.rmid);
-
-			/*
-			 * Order the closid/rmid stores above before the loads
-			 * in task_curr(). This pairs with the full barrier
-			 * between the rq->curr update and
-			 * resctrl_arch_sched_in() during context switch.
-			 */
-			smp_mb();
-
-			/*
-			 * If the task is on a CPU, set the CPU in the mask.
-			 * The detection is inaccurate as tasks might move or
-			 * schedule before the smp function call takes place.
-			 * In such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
-				cpumask_set_cpu(task_cpu(t), mask);
-		}
-	}
-	read_unlock(&tasklist_lock);
-}
-
-static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
-{
-	struct rdtgroup *sentry, *stmp;
-	struct list_head *head;
-
-	head =3D &rdtgrp->mon.crdtgrp_list;
-	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
-		rdtgroup_unassign_cntrs(sentry);
-		free_rmid(sentry->closid, sentry->mon.rmid);
-		list_del(&sentry->mon.crdtgrp_list);
-
-		if (atomic_read(&sentry->waitcount) !=3D 0)
-			sentry->flags =3D RDT_DELETED;
-		else
-			rdtgroup_remove(sentry);
-	}
-}
-
-/*
- * Forcibly remove all of subdirectories under root.
- */
-static void rmdir_all_sub(void)
-{
-	struct rdtgroup *rdtgrp, *tmp;
-
-	/* Move all tasks to the default resource group */
-	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
-
-	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
-		/* Free any child rmids */
-		free_all_child_rdtgrp(rdtgrp);
-
-		/* Remove each rdtgroup other than root */
-		if (rdtgrp =3D=3D &rdtgroup_default)
-			continue;
-
-		if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP ||
-		    rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED)
-			rdtgroup_pseudo_lock_remove(rdtgrp);
-
-		/*
-		 * Give any CPUs back to the default group. We cannot copy
-		 * cpu_online_mask because a CPU might have executed the
-		 * offline callback already, but is still marked online.
-		 */
-		cpumask_or(&rdtgroup_default.cpu_mask,
-			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
-
-		rdtgroup_unassign_cntrs(rdtgrp);
-
-		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
-
-		kernfs_remove(rdtgrp->kn);
-		list_del(&rdtgrp->rdtgroup_list);
-
-		if (atomic_read(&rdtgrp->waitcount) !=3D 0)
-			rdtgrp->flags =3D RDT_DELETED;
-		else
-			rdtgroup_remove(rdtgrp);
-	}
-	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
-	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
-
-	kernfs_remove(kn_info);
-	kernfs_remove(kn_mongrp);
-	kernfs_remove(kn_mondata);
-}
-
-/**
- * mon_get_kn_priv() - Get the mon_data priv data for this event.
- *
- * The same values are used across the mon_data directories of all control=
 and
- * monitor groups for the same event in the same domain. Keep a list of
- * allocated structures and re-use an existing one with the same values for
- * @rid, @domid, etc.
- *
- * @rid:    The resource id for the event file being created.
- * @domid:  The domain id for the event file being created.
- * @mevt:   The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created. Only set
- *	    when @rid =3D=3D RDT_RESOURCE_L3.
- *
- * Return: Pointer to mon_data private data of the event, NULL on failure.
- */
-static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int do=
mid,
-					struct mon_evt *mevt,
-					bool do_sum)
-{
-	struct mon_data *priv;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
-		if (priv->rid =3D=3D rid && priv->domid =3D=3D domid &&
-		    priv->sum =3D=3D do_sum && priv->evt =3D=3D mevt)
-			return priv;
-	}
-
-	priv =3D kzalloc_obj(*priv);
-	if (!priv)
-		return NULL;
-
-	priv->rid =3D rid;
-	priv->domid =3D domid;
-	priv->sum =3D do_sum;
-	priv->evt =3D mevt;
-	list_add_tail(&priv->list, &mon_data_kn_priv_list);
-
-	return priv;
-}
-
-/**
- * mon_put_kn_priv() - Free all allocated mon_data structures.
- *
- * Called when resctrl file system is unmounted.
- */
-static void mon_put_kn_priv(void)
-{
-	struct mon_data *priv, *tmp;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
-		list_del(&priv->list);
-		kfree(priv);
-	}
-}
-
-static void resctrl_fs_teardown(void)
-{
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	/* Cleared by rdtgroup_destroy_root() */
-	if (!rdtgroup_default.kn)
-		return;
-
-	rmdir_all_sub();
-	rdtgroup_unassign_cntrs(&rdtgroup_default);
-	mon_put_kn_priv();
-	rdt_pseudo_lock_release();
-	rdtgroup_default.mode =3D RDT_MODE_SHAREABLE;
-	closid_exit();
-	schemata_list_destroy();
-	rdtgroup_destroy_root();
-}
-
 static void rdt_kill_sb(struct super_block *sb)
 {
 	struct rdt_resource *r;
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA1AF3F1ACB
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457282; cv=none;
 b=pz37sCeAbL6oOmBH75jB/Lz3D9Ph1Dmi52cFCtD8zbGyOo6f9d+Rgvs70CVL2w3erozWHwfwVcWfcRlHRb75+s8aWWl8psl0pFSx//i2QoCnQ0yHJKDGbjpOxclSGGxJyMOfRLZEx6wR8PDj9Gmj4lFxuY2s34jnKr7kGzlfS0E=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457282; c=relaxed/simple;
	bh=C8hY132qG+xjkRnVVTukTcbYF6pljAiJfoDuxTvymqU=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=lwv6i8ZC9hYukQsJRy8O2Xcayy/oldjU4sj7zK5Gk+LceJl+HuQvOkwZFzLY7RCNCG3ppFAAwo/jqQe53Y9Po4wGSPPFWLAVeyscFolb+JDqu3uOoKdDu5mVWzugTF036ufqnjOvHl4xcrdNzA/4LN2Y7HKV5Kq50nKbSTw9irQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=Hf0kSrZg; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="Hf0kSrZg"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457282; x=1811993282;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=C8hY132qG+xjkRnVVTukTcbYF6pljAiJfoDuxTvymqU=;
  b=Hf0kSrZgAL+bGmqHmYa5FVaDz04fr/LnGLdz3JeAQTYSWSG86wLSK3HZ
   1LJVQDWAaQ8/VJLBEZ5xnJ6G0kEcIUK+0RDF0PUO1XgbPiKk/rsp/3PhB
   dciH36vYdXWywA+hwepvYe6uhTuZ/jlftZweG1uxiBAKHyOpqxgxRdXt4
   Ugw6usOuNRePaA6CpWEdipi1+gHO9hTQZHEJyoZ5ehpQ/eD4soG7149Fa
   58LBKBLRHdP07iBNHXVB5S0jKPt3T531semjfE3Xgrr1iuoJm1/qSD8tx
   VyuORjjMOgIY0YRYNAURFd/16XaitXL1ukuTX6lXN10EA1YiUTfEAmDUH
   A==;
X-CSE-ConnectionGUID: 43jwLUH0QkuzKYJVOXApUQ==
X-CSE-MsgGUID: +GhKgDqIT8GKouvyzR9Slw==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91938989"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91938989"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:00 -0700
X-CSE-ConnectionGUID: j4VKF9jvTUqPXgHdZo7cQA==
X-CSE-MsgGUID: bUkltbcWQwi8Pk5mgLa0BA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110103"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:27:59 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 03/10] fs/resctrl: Free mon_data structures on
 rdt_get_tree() failure
Date: Tue,  2 Jun 2026 20:27:31 -0700
Message-ID: 
 <ae8539a3cd23b82f78f14809c120db63632d4318.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Tony Luck <tony.luck@intel.com>

If mkdir_mondata_all() or a subsequent call in rdt_get_tree() fails, the
mon_data structures allocated by mon_get_kn_priv() are leaked.

Add mon_put_kn_priv() to the out_mongrp error path to free the mon_data
structures.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing m=
on_data_bits")
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Closes: https://lore.kernel.org/lkml/5d38c1fb-8f91-472b-8897-24b2f50c772b@i=
ntel.com/
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
---
Changes since V2:
- Reword changelog.

Changes since V3:
- Add Chenyu's Reviewed-by tag that should have been added in V2.
- Add Ben's Reviewed-by tag.
- Add Closes: tag.
---
 fs/resctrl/rdtgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 2b624cf02147..31cfb54a5488 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3081,6 +3081,7 @@ static int rdt_get_tree(struct fs_context *fc)
 		kernfs_remove(kn_mondata);
 out_mongrp:
 	if (resctrl_arch_mon_capable()) {
+		mon_put_kn_priv();
 		rdtgroup_unassign_cntrs(&rdtgroup_default);
 		kernfs_remove(kn_mongrp);
 	}
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC7233F39EC
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:02 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457284; cv=none;
 b=J9QLx4AyztxoH2BBtM4ViV+FGXZtuaVQ1KtYLH692DUyfWkql4pGQAF/O9BaWNc1zLgGb3IPjRw5QHmdrEYuVUkqBeAWMPzFZn29ArC38gfX9pFDLjq5YtHYRK2myJ+VjFNiTyqsw6/S57gCdYQTf94XTUORp/SWYdeK2Pw1IqA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457284; c=relaxed/simple;
	bh=uj2HbwpeVZoI2H2eNfrEoI9yD3Oy3gxW4pyNDQE7YT4=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=ictM4TAXvAKA8Wd1iBt9tL2H8DzEJpfmQWA3qxi4//ASz0mCyuUQ0/WXBFjzoGXpwsZDDFc3TmwtH1muAvjXX58eomaI78UEQbnE7SZg8ZS7cK46S4bszuSVaZAwNGYlcMzfEMKABWrlt9XEuixWubKpjmm87N9lTeF7QFVbV0M=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=aIJySfnM; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="aIJySfnM"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457283; x=1811993283;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=uj2HbwpeVZoI2H2eNfrEoI9yD3Oy3gxW4pyNDQE7YT4=;
  b=aIJySfnMot3sTGfsyCMvQXGY7gx2jZi2/FnjUlo9+wfR29nMvt0RD1dD
   K1cB1BwpuBLLZK734PDEqDSnDxQNSr4fFfDBPM3gmabNWno/ph04Aw8TG
   y1rM/iuTKPwFBJ4TNhoT5iD8fSxiPOJSVffJITPf+3dHsWqWPiuoYlnUE
   HtOy/+BbHL01nKgDx74xdkrFyIDRaiB2kY7F9jy8DN1qdnaVMwa0sihEF
   XhpbFoSCbmgMh81AWjibQv9Npcj4GoDgi4R5ps/KEjqSSSPrY3X/+1dHz
   dtxoRs691akp5aomwkKtSBTzLNv+nOUlHrLi6SBntgklQOiyI89P/a+9q
   Q==;
X-CSE-ConnectionGUID: 25cwYGYeQrufZuPXiPeltw==
X-CSE-MsgGUID: Zwt4JPDwT9avqRB9NTU4Ag==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91938999"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91938999"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:00 -0700
X-CSE-ConnectionGUID: rt556nYLQ9Wc7DZoPIo8CA==
X-CSE-MsgGUID: 9He36uLFT8qGq+TD0vD2tw==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110106"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:27:59 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 04/10] fs/resctrl: Fix use-after-free during unmount
Date: Tue,  2 Jun 2026 20:27:32 -0700
Message-ID: 
 <ae6174bbd6b8320943ddd0edefab7bceb16d101e.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Tony Luck <tony.luck@intel.com>

During unmount or failure teardown, all mon_data structures that contain
monitoring event file private data are freed, after which kernfs nodes are
removed. However, the RDT_DELETED flag is never set for the statically
allocated default resource group.

A concurrent reader of an event file associated with the default resource
group may, after dropping kernfs active protection, block on rdtgroup_mutex
while unmount proceeds to free the file private data and destroy the kernfs
node without waiting for the reader.

When the mutex is released, the reader wakes up, observes that RDT_DELETED
is not set for the default group, and dereferences the already-freed
file private data.

The scenario can be depicted as follows:
  CPU0                                      CPU1
   /*
    * Default resource group's
    * monitoring data accessible via
    * kernfs file with kernfs_node::priv
    * pointing to a struct mon_data.
    * User opens the file for reading.
    */
   rdtgroup_mondata_show()                 /* arch encounters fatal error */
    rdtgroup_kn_lock_live()                 resctrl_exit()
     atomic_inc(&rdtgroup_default.waitcount) cpus_read_lock()
     kernfs_break_active_protection(kn)      mutex_lock(&rdtgroup_mutex)
     cpus_read_lock()                        resctrl_fs_teardown()
     mutex_lock(&rdtgroup_mutex)              rmdir_all_sub()
                                              mon_put_kn_priv()
                                               /* Delete all mon_data struc=
tures */
                                              rdtgroup_destroy_root()
                                               kernfs_destroy_root()
                                               rdtgroup_default.kn =3D NULL
                                             mutex_unlock(&rdtgroup_mutex)
     /*
      * rdtgroup_default.flags is empty so
      * rdtgroup_kn_lock_live() returns
      * &rdtgroup_default
      */
     md =3D of->kn->priv;

     /* md points to freed mon_data */

Set RDT_DELETED for the default group unconditionally since the flag does
not lead to the freeing of this statically allocated group.

Do not allow a new resctrl mount while there are any waiters on the default
group from a previous mount. A new mount re-initializes the default group,
which would make the default group appear accessible to waiters from the
previous mount and cause them to access mon_data structures of the previous
mount that have already been freed.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing m=
on_data_bits")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40i=
ntel.com?part=3D2 [1]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
---
Changes since V2:
- Rewrite changelog to not describe code as much.
- Rework changelog to switch to "Reported-by/Closes".
- Merge the duplicate rdtgroup_remove() comment with the function comment.
- Fix changelog to not mention that RDT_DELETED flag is set conditionally.
- Change "Fixes:" tag to point to commit that introduced dynamically
  allocated mon_data this bug involves.

Changes since V3:
- Depict the race. (Chenyu)
- Add Chenyu's Reviewed-by tag.
- Changelog grammar fixes.
---
 fs/resctrl/rdtgroup.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 31cfb54a5488..809f0965474c 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -585,14 +585,20 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open=
_file *of,
  *
  * On resource group creation via a mkdir, an extra kernfs_node reference =
is
  * taken to ensure that the rdtgroup structure remains accessible for the
- * rdtgroup_kn_unlock() calls where it is removed.
+ * rdtgroup_kn_unlock() calls where it is removed. The default group is
+ * statically allocated: it does not have an extra reference but will have
+ * RDT_DELETED set on unmount to support safe access to its associated fil=
es
+ * via rdtgroup_kn_lock_live/rdtgroup_kn_unlock().
  *
- * Drop the extra reference here, then free the rdtgroup structure.
+ * For all but the default group: drop the extra reference, then free the
+ * rdtgroup structure.
  *
  * Return: void
  */
 static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 {
+	if (rdtgrp =3D=3D &rdtgroup_default)
+		return;
 	kernfs_put(rdtgrp->kn);
 	kfree(rdtgrp);
 }
@@ -2975,6 +2981,7 @@ static void resctrl_fs_teardown(void)
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode =3D RDT_MODE_SHAREABLE;
+	rdtgroup_default.flags =3D RDT_DELETED;
 	closid_exit();
 	schemata_list_destroy();
 	rdtgroup_destroy_root();
@@ -3000,6 +3007,12 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out;
 	}
=20
+	/* Avoid races from pending operations from a previous mount */
+	if (atomic_read(&rdtgroup_default.waitcount) !=3D 0) {
+		ret =3D -EBUSY;
+		goto out;
+	}
+
 	ret =3D setup_rmid_lru_list();
 	if (ret)
 		goto out;
@@ -4275,6 +4288,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context =
*ctx)
=20
 	ctx->kfc.root =3D rdt_root;
 	rdtgroup_default.kn =3D kernfs_root_to_node(rdt_root);
+	rdtgroup_default.flags =3D 0;
=20
 	return 0;
 }
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id DCCDA3F39EF
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:02 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457285; cv=none;
 b=B0buYTrhtC/yHOZYkNXgEgYgwsmnq/ZnZl5y6jsE99iSxqt0aSFI9uUVLg1SzQq0zq79xJacCUeoCm4aK0YQtFVYaAeCb3eL1fEON2D2HXX3ceH30KRbcm6P1plWTvU8oU+/qE949Jg2g1beJrC7bydMxBdzi+6Pg7PR0IYHm4Y=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457285; c=relaxed/simple;
	bh=LHCo0MUmMyDzclk6PO+WqAPmteois0/AC1uNvOLDCqs=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=GLI4p3V6iA2ljSwFdWfj2CgpZS5BDSgDmzltt1mEP9fVlL03Dy1ZvOgdsoUd3PH9gQUOiyd+uB3CamBrjg1X6dI/XI87dI/rInEtcxC88nxG//wFRqxWqvm2L1gGCrfPvPe5yRzCzb1yZhpgP3Vj+buwXAJuUW5o5I8wtDFjjV0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=iy81/ARA; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="iy81/ARA"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457283; x=1811993283;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=LHCo0MUmMyDzclk6PO+WqAPmteois0/AC1uNvOLDCqs=;
  b=iy81/ARAlSMTI7jvpUcJhvuJ0/lDedOdWrr6TSr8DdfH6nU8ONIf//VM
   /YcCifWzZy4HWZhXcf7n3JL7c9bKndU6mCs+vwbn5yKxvQybSa8lziDX9
   pllcgudhLBODhvEHl5WpOW9q6pq/FbVQiuT3n7ZO3+8gR5VfxouZoNFrL
   Ip8IFQ+/gEeX37XzPe8WLqRHhIaL0KZz1jjeFN82Vf08ZL2cJKfzNaoVp
   k0kowyOnrz9ccMV1ng5fHs5kctN2neoTS7fZSkwVhope+ZtRIXq4LPuLW
   gZXOMrQHhwO2Jjd3XjWjwQQOjc9yEs5LfE0s7YROWKGuIkOK4DpKCBd8e
   w==;
X-CSE-ConnectionGUID: peDmkmpYR9CwF8ptEemJHg==
X-CSE-MsgGUID: ftJnmAXtTUiShQxLR6/oXw==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91939009"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91939009"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:00 -0700
X-CSE-ConnectionGUID: Dm+WnOybRc2uEBTPpVFKJg==
X-CSE-MsgGUID: LEkV1cLHQcGp9diTW8qpLg==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110110"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:00 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 05/10] fs/resctrl: Fix deadlock on errors during mount
Date: Tue,  2 Jun 2026 20:27:33 -0700
Message-ID: 
 <1184040fb321fb99fde6155a4ab91c654b059b1b.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

rdt_get_tree() acquires rdtgroup_mutex before calling kernfs_get_tree(). If
superblock setup fails inside kernfs_get_tree(), the VFS calls .kill_sb()
(rdt_kill_sb()) on the same thread before kernfs_get_tree() returns.
rdt_kill_sb() unconditionally attempts to acquire rdtgroup_mutex, which the
thread already holds, so the mount deadlocks.

Since a mount failure inside kernfs_get_tree() already invokes the resctrl
fs unmount handler (rdt_kill_sb()), let the mount error path and
rdt_kill_sb() share one helper to make it clear that both paths perform the
same cleanup.

Call kernfs_get_tree() outside of locks. If kernfs_get_tree() fails and
ctx->kfc.new_sb_created is set, then rdt_kill_sb() has already been called
and no further cleanup is needed.

kernfs_get_tree() may set ctx->kfc.new_sb_created and then fail to obtain
an inode for the new kn, causing the rdt_kill_sb() path to run with one few=
er
reference than required for the root to remain accessible in kernfs_kill_sb=
().
Add an extra hold on rdtgroup_default.kn to defend against this scenario
and ensure the root can be dereferenced safely from kernfs_kill_sb().

Dropping locks before kernfs_get_tree() creates a window where CPU hotplug
callbacks can race with the mount operation. Specifically, an online event
observing resctrl_mounted =3D=3D true could concurrently append directories=
 to
the unactivated kernfs tree, allocate mon_data structures, and arm backgrou=
nd
workers.

This concurrency is safe because the mount has not yet returned to the VFS,
meaning userspace cannot interact with these transient files. If
kernfs_get_tree() subsequently fails, the standard resctrl_unmount() teardo=
wn
safely manages the concurrent modifications: any dynamically generated kern=
fs
nodes are removed, and the associated memory is freed. Any background
workers spawned by the hotplug event will naturally exit without re-arming
when they acquire rdtgroup_mutex and observe resctrl_mounted =3D=3D false.

Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40i=
ntel.com [1]
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
---
Changes since V2:
- Switch to "Reported-by/Closes" in changelog

Changes since V3:
- Add Ben's Reviewed-by tag.
- Rework subject and changelog.
- s/root kn/root/ in comment. (Chenyu)
- Add Chenyu's Reviewed-by tag.
- Changelog grammar fixes.
- Add snippet to changelog about potential race with hotplug handlers.
---
 fs/resctrl/rdtgroup.c | 83 +++++++++++++++++++++++++++++--------------
 1 file changed, 56 insertions(+), 27 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 809f0965474c..0d073d4db734 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2987,10 +2987,34 @@ static void resctrl_fs_teardown(void)
 	rdtgroup_destroy_root();
 }
=20
+static void resctrl_unmount(void)
+{
+	struct rdt_resource *r;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_disable_ctx();
+
+	/* Put everything back to default values. */
+	for_each_alloc_capable_rdt_resource(r)
+		resctrl_arch_reset_all_ctrls(r);
+
+	resctrl_fs_teardown();
+	if (resctrl_arch_alloc_capable())
+		resctrl_arch_disable_alloc();
+	if (resctrl_arch_mon_capable())
+		resctrl_arch_disable_mon();
+	resctrl_mounted =3D false;
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx =3D rdt_fc2context(fc);
 	unsigned long flags =3D RFTYPE_CTRL_BASE;
+	struct kernfs_node *rdt_root_kn;
 	struct rdt_l3_mon_domain *dom;
 	struct rdt_resource *r;
 	int ret;
@@ -3066,10 +3090,6 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (ret)
 		goto out_mondata;
=20
-	ret =3D kernfs_get_tree(fc);
-	if (ret < 0)
-		goto out_psl;
-
 	if (resctrl_arch_alloc_capable())
 		resctrl_arch_enable_alloc();
 	if (resctrl_arch_mon_capable())
@@ -3085,10 +3105,38 @@ static int rdt_get_tree(struct fs_context *fc)
 						   RESCTRL_PICK_ANY_CPU);
 	}
=20
-	goto out;
+	/*
+	 * Ensure root remains accessible after mutex is unlocked so that
+	 * kernfs_kill_sb() can run safely if called by kernfs_get_tree()'s
+	 * failure path after creating a superblock but before taking reference
+	 * on root kn (for example, if unable to get inode for root kn).
+	 */
+	kernfs_get(rdtgroup_default.kn);
+
+	/*
+	 * Make backup of the current root kn being created to be used in
+	 * kernfs_put(). The additional reference taken above will prevent the
+	 * kn from being freed before kernfs_kill_sb() can run but
+	 * rdtgroup_default.kn may be set to NULL via rdtgroup_destroy_root()
+	 * and its backing root (rdt_root) could be overwritten before
+	 * kernfs_put() can run.
+	 */
+	rdt_root_kn =3D rdtgroup_default.kn;
+
+	rdt_last_cmd_clear();
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	ret =3D kernfs_get_tree(fc);
+	/*
+	 * resctrl can only be mounted once, new superblock only expected
+	 * to be created once.
+	 */
+	if (!ctx->kfc.new_sb_created)
+		resctrl_unmount();
+	kernfs_put(rdt_root_kn);
+	return ret;
=20
-out_psl:
-	rdt_pseudo_lock_release();
 out_mondata:
 	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
@@ -3108,7 +3156,6 @@ static int rdt_get_tree(struct fs_context *fc)
 out_root:
 	rdtgroup_destroy_root();
 out:
-	rdt_last_cmd_clear();
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
 	return ret;
@@ -3195,26 +3242,8 @@ static int rdt_init_fs_context(struct fs_context *fc)
=20
 static void rdt_kill_sb(struct super_block *sb)
 {
-	struct rdt_resource *r;
-
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
-
-	rdt_disable_ctx();
-
-	/* Put everything back to default values. */
-	for_each_alloc_capable_rdt_resource(r)
-		resctrl_arch_reset_all_ctrls(r);
-
-	resctrl_fs_teardown();
-	if (resctrl_arch_alloc_capable())
-		resctrl_arch_disable_alloc();
-	if (resctrl_arch_mon_capable())
-		resctrl_arch_disable_mon();
-	resctrl_mounted =3D false;
+	resctrl_unmount();
 	kernfs_kill_sb(sb);
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
=20
 static struct file_system_type rdt_fs_type =3D {
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08AF03F411A
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:03 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457284; cv=none;
 b=d+hraj5IvGDapi0+7RfylvFKi9AsnodGO33QUlR9syDZGA4TuLWd6ff5+sX8IPZV9ajLh9mAL1bUg9LZ3+qjT+aBANcCoP+YCWJsCd4WsdsG/Q3VK/qBXYuB+CYsSnQOXNoHmL6cUC5iMrkdWz6sndiTtZ+V3q+5f/cBaf5c+VE=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457284; c=relaxed/simple;
	bh=bQGZ3Gi1hecEzRonc8KkP4mUrFcfRliOT9dqTRLLRN4=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=kJHamgwliDB+7POn1TIdgOF+qpml3Z7GRdqsy9L467Gu8vUk32Kyglegww0wE8rAZ5uh2WrLOJiujjUkbUNq6O1oh6qWImi7jv4H1s4LNkENYHSaW46Im8Nzdm1FLt/3utqrlG9LDsZbsYjes9aBzrJl3oa/kP0+wa2AoEx+QU4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=e1hdAnrT; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="e1hdAnrT"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457283; x=1811993283;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=bQGZ3Gi1hecEzRonc8KkP4mUrFcfRliOT9dqTRLLRN4=;
  b=e1hdAnrTjtQEQ4Nemj5kpeNULYTnNHH2RIVftI0KR7LDdCXeT5YZwZ60
   AC0ALDs9kaM0EbZ49Pn3rOHZKj7uJvn5X+l0uEmCy+eLuazljBUiLOw6Z
   sQrIsghn1tI4z8vbKxIEJEUATb9/JrB/pi9ISqV7JIyV+zNRzPQqSw5MN
   7FzQMlWoNOGlmy/809/exonzGS5igOwBNIebwQNe0QpFeydAAsVKuilte
   PQvIrHMQnE0KOwbjgZEAKK5WJLpHu05jxaUGzh7FsFm+VuKVnmepa75gj
   uU0FiyZF96Z7diZLTw8QKzfDGxNalR8YqPgZ1BZ4QsUdHdDxfYxsiRAu+
   Q==;
X-CSE-ConnectionGUID: mTd5iHFTRv2ma11aLTQ+2w==
X-CSE-MsgGUID: 56i2y7V4QS2zClRC427NpQ==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91939019"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91939019"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:01 -0700
X-CSE-ConnectionGUID: 4atK+GKBRHKn7zvQjHHrYw==
X-CSE-MsgGUID: s45oagcsQGuyrH3D8BLmQg==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110114"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:00 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 06/10] fs/resctrl: Prevent use-after-free in
 rdtgroup_kn_put()
Date: Tue,  2 Jun 2026 20:27:34 -0700
Message-ID: 
 <a2ba7d12c8dae5c4d04ea0bbda6bf4340f5b0d27.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

A struct rdtgroup is reference counted via rdtgroup::waitcount. Callers
that need the structure to remain valid across a sleep (while waiting to
acquire rdtgroup_mutex) take a reference with rdtgroup_kn_get() and
release it with rdtgroup_kn_put(). The release path is intended to serve
as the fallback freer: if the count drops to zero and the group has
already been marked RDT_DELETED, rdtgroup_kn_put() frees the structure.
The bulk teardown paths free_all_child_rdtgrp() and rmdir_all_sub(),
triggered by removing a resctrl directory or unmounting the resctrl
filesystem, act as the primary freer: they hold rdtgroup_mutex and free
each rdtgroup whose waitcount is zero; otherwise they set RDT_DELETED and
leave the freeing to the last waiter.

These two freers race. rdtgroup_kn_put() commits waitcount =3D=3D 0 with
atomic_dec_and_test() outside rdtgroup_mutex, then reads rdtgroup::flags.
Between those two operations a concurrent caller of free_all_child_rdtgrp()
or rmdir_all_sub() (which holds the mutex) can observe waitcount =3D=3D 0 v=
ia
atomic_read(), call rdtgroup_remove(), and kfree() the structure. The
subsequent read of rdtgroup::flags in rdtgroup_kn_put() is then a
use-after-free, and the structure may even be freed twice if the freed
memory happens to satisfy the RDT_DELETED flag check.

Replace the bare atomic_dec_and_test() with atomic_dec_and_mutex_lock()
so that the decrement-to-zero takes rdtgroup_mutex before the count
becomes globally visible. The inspection of rdtgroup::flags then runs
under the same mutex held by the bulk freers, making the two paths
mutually exclusive. The common case where the count does not reach
zero remains lock-free. Defer kernfs_unbreak_active_protection() until
after the mutex is dropped since kernfs active protections functionally
wrap rdtgroup_mutex. Remove the resource group, which in turn drops its
kernfs reference, after kernfs active protection is restored.

Fixes: b8511ccc75c0 ("x86/resctrl: Fix use-after-free when deleting resourc=
e groups")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40i=
ntel.com?part=3D1
Assisted-by: GitHub_Copilot:gemini-3.1-pro
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Changes since V2:
- New patch

Changes since V3:
- Add Ben's Reviewed-by tag.
- Add Tony's Reviewed-by tag.
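
For readers less familiar with the dec-and-lock pattern, the locking change
described in the changelog above can be sketched in userspace roughly as
follows. This is only an illustration under stated assumptions, not the
kernel implementation: struct group, group_mutex, dec_and_mutex_lock() and
group_put() are hypothetical stand-ins built on C11 atomics and pthreads,
and kernfs active protection is not modelled at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rdtgroup (waitcount and RDT_DELETED). */
struct group {
	atomic_int waitcount;
	bool deleted;
};

/* Hypothetical stand-in for rdtgroup_mutex. */
static pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Drop one reference. Returns true with @lock held if the count reached
 * zero, false with @lock not held otherwise.
 */
static bool dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* A put without a matching get is a caller bug. */
	assert(old > 0);

	/* Fast path: decrement without the lock unless it would hit zero. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}

	/* Slow path: the final decrement only happens under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}

/* Returns true if this caller is the one that must free @g. */
static bool group_put(struct group *g)
{
	bool needs_free;

	if (!dec_and_mutex_lock(&g->waitcount, &group_mutex))
		return false;

	/* The flag is sampled while any bulk freer is excluded by the mutex. */
	needs_free = g->deleted;
	pthread_mutex_unlock(&group_mutex);
	return needs_free;
}
```

In this sketch a bulk freer holding group_mutex can never observe a zero
count while a concurrent group_put() still has to read the deleted flag,
which is the property the patch relies on.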
---
 fs/resctrl/rdtgroup.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 0d073d4db734..c04424c081a4 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2606,15 +2606,24 @@ static void rdtgroup_kn_get(struct rdtgroup *rdtgrp=
, struct kernfs_node *kn)
=20
 static void rdtgroup_kn_put(struct rdtgroup *rdtgrp, struct kernfs_node *k=
n)
 {
-	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
-	    (rdtgrp->flags & RDT_DELETED)) {
+	bool needs_free;
+
+	if (!atomic_dec_and_mutex_lock(&rdtgrp->waitcount, &rdtgroup_mutex)) {
+		kernfs_unbreak_active_protection(kn);
+		return;
+	}
+
+	needs_free =3D rdtgrp->flags & RDT_DELETED;
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	kernfs_unbreak_active_protection(kn);
+
+	if (needs_free) {
 		if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP ||
 		    rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED)
 			rdtgroup_pseudo_lock_remove(rdtgrp);
-		kernfs_unbreak_active_protection(kn);
 		rdtgroup_remove(rdtgrp);
-	} else {
-		kernfs_unbreak_active_protection(kn);
 	}
 }
=20
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 616D03F54AC
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:04 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457285; cv=none;
 b=WiBsEhFZH3kNFfeWAIefjqHkOwDuxeupq/gKpCpsK0ow95JJZqZ/cr6bkaHBc68zpPbrmw4+cS20IDbF4Nx1hU56frvn3cVzHzRzTPiXskuKoYtCA0cXFLtocDJc73CkTyDCVk6u2n2RRvLJeEF4QA+F+8EW7//9lhUwVgVcmpY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457285; c=relaxed/simple;
	bh=+bCwbGXodPOyojvefV3AVotYYV06mZ2z2YAeKsYo9NE=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=TyDa1+E1nxkbzwd1JKuDHhenvUuMJ+seT2qq7zhFQiyumB6oBY6Snd0C4uiEOFKMxEr6BUtpgqQQyFPLGojoX6R4oHy7R184gP6NGOg2IUcJ593nVDPpwTkbWL0ZYVeooNJxbXuIOEK2NKEYTtwqgcbhLHLRLVrLp1KbnZDAzfE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=FJ4UH0z9; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="FJ4UH0z9"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457284; x=1811993284;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=+bCwbGXodPOyojvefV3AVotYYV06mZ2z2YAeKsYo9NE=;
  b=FJ4UH0z9xfJTlt3sya+7s1brfk/P1V3zFP55mU1F03s2CjlpvbgOV/X+
   4UCSGpA6BmXZTk/R5kIRodazoWshOhkUrpZrgVF2J5mubCNZvNJMENatv
   0aHczjx6Um+8HC7EeF5VgALnG4qbLzR/Fz8zu6loUM6SjXblGmgBSmt0t
   2c8/FlOHpsAtdlXwCK1k4IAy+S+2u7SYu82e64G0jibzw8eJmP2GD9RVX
   X8k0j/0rKZOe5PbUa9TRSnBmOEX7IyMp3WpGiPXBuMwgG1I6RfykDKGpl
   KHL1Djz68O7neX8jZeV4Mh4fVCdkGTmQwonI0+zhtIvNbEyJUhDMMTtCp
   w==;
X-CSE-ConnectionGUID: 6ulp+qe1ThWjqmrQd6msPw==
X-CSE-MsgGUID: ij9zDYQqSkiv3wTeoiWBHQ==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91939029"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91939029"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:01 -0700
X-CSE-ConnectionGUID: I7K+eS4KQmKbRtrIiO8SgA==
X-CSE-MsgGUID: qNzA9OWNSiq/ySxEloFnkg==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110119"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:00 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 07/10] fs/resctrl: Fix double-add of pseudo-locked region's
 RMID to free list
Date: Tue,  2 Jun 2026 20:27:35 -0700
Message-ID: 
 <2eda4d2873e6607b3c3f19bb5cac1d9fa8e2c04d.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

A pseudo-locked group's RMID is freed when it is created. On unmount,
rmdir_all_sub() unconditionally frees the RMID of every group, resulting
in a double-free of the pseudo-locked group's RMID. The consequence is that
the original free adds the pseudo-locked group's RMID to the rmid_free_lru
linked list and the second free then attempts to add the same RMID entry
to rmid_free_lru again.

Do not double-free a pseudo-locked group's RMID.

Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-loc=
ked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
---
Changes since V2:
- New patch

Changes since V3:
- Extract the double-add/double-free fix from all the other pseudo-locking
  fixes that will be deferred. This issue was uncovered during testing
  of the race fixes so drop all the Reported-by and Closes tags.
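
The double-add described in the changelog above can be illustrated with a
small userspace sketch. Everything below is an assumption made for
illustration only: the list helpers are a simplified imitation of the
kernel's list_head, and struct group, group_pseudo_lock_setup() and
group_teardown() are hypothetical names that do not exist in resctrl.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head;
	head->next = head;
}

/*
 * Link @n before @head. Calling this for an entry that is already linked
 * overwrites its pointers and corrupts the list.
 */
static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static size_t list_len(const struct node *head)
{
	size_t len = 0;
	const struct node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		len++;
	return len;
}

/* Hypothetical, simplified stand-ins for the resctrl types. */
enum group_mode { MODE_SHAREABLE, MODE_PSEUDO_LOCKED };

struct group {
	enum group_mode mode;
	struct node rmid_entry;
};

/* Pseudo-lock setup hands the RMID back to the free list early. */
static void group_pseudo_lock_setup(struct group *g, struct node *free_lru)
{
	g->mode = MODE_PSEUDO_LOCKED;
	list_add_tail(&g->rmid_entry, free_lru);
}

/* Teardown frees the RMID only if setup has not already done so. */
static void group_teardown(struct group *g, struct node *free_lru)
{
	if (g->mode == MODE_PSEUDO_LOCKED)
		return;
	list_add_tail(&g->rmid_entry, free_lru);
}
```

Without the mode check in group_teardown(), the pseudo-locked group's entry
would be linked a second time while it is still on the free list.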
---
 fs/resctrl/rdtgroup.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index c04424c081a4..77c9d22017bc 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2885,10 +2885,6 @@ static void rmdir_all_sub(void)
 		if (rdtgrp =3D=3D &rdtgroup_default)
 			continue;
=20
-		if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP ||
-		    rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED)
-			rdtgroup_pseudo_lock_remove(rdtgrp);
-
 		/*
 		 * Give any CPUs back to the default group. We cannot copy
 		 * cpu_online_mask because a CPU might have executed the
@@ -2899,7 +2895,13 @@ static void rmdir_all_sub(void)
=20
 		rdtgroup_unassign_cntrs(rdtgrp);
=20
-		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
+		if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED) {
+			rdtgroup_pseudo_lock_remove(rdtgrp);
+		} else {
+			/* Pseudo-locked group's RMID is freed during setup. */
+			free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
+		}
=20
 		kernfs_remove(rdtgrp->kn);
 		list_del(&rdtgrp->rdtgroup_list);
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B60583F58E2
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:04 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457287; cv=none;
 b=d7uUQXd37VeG7rsFg3wE+G6IrKXAZCJ5ipUweHrYt1/4nf/uP4jAMnsAXaze2mV9lXkEk50QBCE/N1Jw4Su8d+z2c8KZKIFzuqCxgK5e8zWEgolafUK/bxqzv/Ie7sb/ci87Oraep//LO5cZM6tDqSenldNUL/jc+tc2ZPP1wtY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457287; c=relaxed/simple;
	bh=nCfOM0nWNwA84mKyfiVdAhpO8ALeWZENqh+IUSVvkDo=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=bV2SwVmndcPMNfosZndKJRFYR7015C0Kj8nK1anv7dp/SzCWaX6z8zNk70USkjdt9znWpe8nqYennjR1RQuQ2uGW58oEu/GHUi5SLRhdcrf0I1jCEejbVXgLWi0BvDmG29ry5gOUolTTk71qKAUKHKdwKgM3YzM8mtkhYlrbT6A=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=WpqbrsAp; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="WpqbrsAp"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457285; x=1811993285;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=nCfOM0nWNwA84mKyfiVdAhpO8ALeWZENqh+IUSVvkDo=;
  b=WpqbrsAp+jS/eC2c983qgZF02XTrknThReJroiCyv/6ocl+opDkXomWN
   IjdvRpp5P/Asn7VlA//eohdKE3UZvfbgYAhQum+OweQb26knG9A9k9KzJ
   DbkeLuwu/KVGhaI/kGCns4GA8vwJmuhjYk3CryN4JlkeU9kGdenvOb5+0
   TI2M2NjgBzH2O+cCNjdvSeloU1FngwU2GvQPapO88d55qcQDPpa+X+jzz
   /dKk0Zl8q5NJfcU4e6vFtBAiq79vAqOCdBh+548yvJMf6OMs2aEkApIhH
   FLPpNYG5CQDObobRGEBxHCpyRVXIY82QJJEQferb6VpxDbIhZ+VHR+4GC
   A==;
X-CSE-ConnectionGUID: hd2m857+QEm/pmbHbUirqQ==
X-CSE-MsgGUID: zulWo/9MRdWur0iNg+ET3A==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91939040"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91939040"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:01 -0700
X-CSE-ConnectionGUID: x8+YdgjUTzeyWRYUmtCeXQ==
X-CSE-MsgGUID: 3LcRbBpzTYGd11GeF13gUw==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110141"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:01 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 08/10] fs/resctrl: Prevent deadlock and use-after-free in
 info file handlers
Date: Tue,  2 Jun 2026 20:27:36 -0700
Message-ID: 
 <e5dc7a2474f1591e628b26d463c029a1872a4237.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

resctrl provides files under the info/ directory to expose global
configuration and capabilities to userspace. These files are instantiated
statically during filesystem mount and expose data associated with internal
schema structures via kernfs private pointers.

A potential deadlock exists between userspace readers of these info files
and the filesystem teardown performed on unmount. Reading an info file
invokes kernfs, which acquires an active reference, after which the
handler typically attempts to acquire rdtgroup_mutex. Concurrently,
unmounting the filesystem holds rdtgroup_mutex and then attempts to
recursively remove the info kernfs nodes, which involves kernfs_drain()
blocking until all active references are released. Another problem exists
where info files might be accessed from an outdated mount if the
filesystem is unmounted and remounted during a reader's execution, leading
to a use-after-free when reading the now-deleted private schema data.

Introduce info_kn_lock() and info_kn_unlock() helpers to coordinate locking
across all info handlers. These helpers mirror similar logic used by resour=
ce
group handlers by deliberately breaking the kernfs active protection before
attempting to acquire the rdtgroup_mutex, preventing the deadlock. To guard
against the use-after-free from rapid mount cycling, info_kn_lock() walks
the parent lineage of the kernfs node under an RCU section to confirm that
the node belongs to the globally active root before permitting the
operation to proceed. Convert all info file handlers to use these helpers
and only dereference the schema after it is determined safe to do so.

Make no attempt to output an error message to last_cmd_status on failure
since failure implies there is no filesystem with which to display the error
to user space.

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40i=
ntel.com?part=3D3
Assisted-by: GitHub_Copilot:gemini-3.1-pro
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Changes since V2:
- New patch

Changes since V3:
- Add Tony's Reviewed-by tag.
- Changelog grammar fixes.
---
 fs/resctrl/ctrlmondata.c |  38 ++++----
 fs/resctrl/internal.h    |   3 +-
 fs/resctrl/monitor.c     |  48 +++++-----
 fs/resctrl/rdtgroup.c    | 192 ++++++++++++++++++++++++++++++++-------
 4 files changed, 203 insertions(+), 78 deletions(-)

diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index f33712c17d38..2b29fb5a8702 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -771,10 +771,12 @@ int rdtgroup_mondata_show(struct seq_file *m, void *a=
rg)
 int resctrl_io_alloc_show(struct kernfs_open_file *of, struct seq_file *se=
q, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
+	r =3D s->res;
 	if (r->cache.io_alloc_capable) {
 		if (resctrl_arch_get_io_alloc_enabled(r))
 			seq_puts(seq, "enabled\n");
@@ -784,7 +786,7 @@ int resctrl_io_alloc_show(struct kernfs_open_file *of, =
struct seq_file *seq, voi
 		seq_puts(seq, "not supported\n");
 	}
=20
-	mutex_unlock(&rdtgroup_mutex);
+	info_kn_unlock(of->kn);
=20
 	return 0;
 }
@@ -849,7 +851,7 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file =
*of, char *buf,
 			       size_t nbytes, loff_t off)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
 	char const *grp_name;
 	u32 io_alloc_closid;
 	bool enable;
@@ -859,9 +861,10 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file=
 *of, char *buf,
 	if (ret)
 		return ret;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
+	r =3D s->res;
 	rdt_last_cmd_clear();
=20
 	if (!r->cache.io_alloc_capable) {
@@ -909,8 +912,7 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file =
*of, char *buf,
 	}
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
@@ -918,14 +920,15 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_fil=
e *of, char *buf,
 int resctrl_io_alloc_cbm_show(struct kernfs_open_file *of, struct seq_file=
 *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
 	int ret =3D 0;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	rdt_last_cmd_clear();
=20
+	r =3D s->res;
 	if (!r->cache.io_alloc_capable) {
 		rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
 		ret =3D -ENODEV;
@@ -947,8 +950,7 @@ int resctrl_io_alloc_cbm_show(struct kernfs_open_file *=
of, struct seq_file *seq,
 	show_doms(seq, s, NULL, resctrl_io_alloc_closid(r));
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
 	return ret;
 }
=20
@@ -1015,7 +1017,7 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open=
_file *of, char *buf,
 				   size_t nbytes, loff_t off)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
 	u32 io_alloc_closid;
 	int ret =3D 0;
=20
@@ -1025,10 +1027,11 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_op=
en_file *of, char *buf,
=20
 	buf[nbytes - 1] =3D '\0';
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	rdt_last_cmd_clear();
=20
+	r =3D s->res;
 	if (!r->cache.io_alloc_capable) {
 		rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
 		ret =3D -ENODEV;
@@ -1053,8 +1056,7 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open=
_file *of, char *buf,
 out_clear_configs:
 	rdt_staged_configs_clear();
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 48af75b9dc85..e62a277dee85 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -335,8 +335,9 @@ __printf(1, 2)
 void rdt_last_cmd_printf(const char *fmt, ...);
=20
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
-
 void rdtgroup_kn_unlock(struct kernfs_node *kn);
+bool info_kn_lock(struct kernfs_node *kn);
+void info_kn_unlock(struct kernfs_node *kn);
=20
 int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
=20
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index d2aa7d045056..f7ab9a1bc726 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1057,7 +1057,8 @@ int event_filter_show(struct kernfs_open_file *of, st=
ruct seq_file *seq, void *v
 	bool sep =3D false;
 	int ret =3D 0, i;
=20
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	rdt_last_cmd_clear();
=20
 	r =3D resctrl_arch_get_resource(mevt->rid);
@@ -1078,7 +1079,7 @@ int event_filter_show(struct kernfs_open_file *of, st=
ruct seq_file *seq, void *v
 	seq_putc(seq, '\n');
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
+	info_kn_unlock(of->kn);
=20
 	return ret;
 }
@@ -1089,7 +1090,8 @@ int resctrl_mbm_assign_on_mkdir_show(struct kernfs_op=
en_file *of, struct seq_fil
 	struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn);
 	int ret =3D 0;
=20
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	rdt_last_cmd_clear();
=20
 	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
@@ -1101,7 +1103,7 @@ int resctrl_mbm_assign_on_mkdir_show(struct kernfs_op=
en_file *of, struct seq_fil
 	seq_printf(s, "%u\n", r->mon.mbm_assign_on_mkdir);
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
+	info_kn_unlock(of->kn);
=20
 	return ret;
 }
@@ -1117,7 +1119,8 @@ ssize_t resctrl_mbm_assign_on_mkdir_write(struct kern=
fs_open_file *of, char *buf
 	if (ret)
 		return ret;
=20
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	rdt_last_cmd_clear();
=20
 	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
@@ -1129,7 +1132,7 @@ ssize_t resctrl_mbm_assign_on_mkdir_write(struct kern=
fs_open_file *of, char *buf
 	r->mon.mbm_assign_on_mkdir =3D value;
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
@@ -1419,8 +1422,8 @@ ssize_t event_filter_write(struct kernfs_open_file *o=
f, char *buf, size_t nbytes
=20
 	buf[nbytes - 1] =3D '\0';
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	rdt_last_cmd_clear();
=20
@@ -1443,8 +1446,7 @@ ssize_t event_filter_write(struct kernfs_open_file *o=
f, char *buf, size_t nbytes
 	}
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
@@ -1455,7 +1457,8 @@ int resctrl_mbm_assign_mode_show(struct kernfs_open_f=
ile *of,
 	struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn);
 	bool enabled;
=20
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	enabled =3D resctrl_arch_mbm_cntr_assign_enabled(r);
=20
 	if (r->mon.mbm_cntr_assignable) {
@@ -1474,7 +1477,7 @@ int resctrl_mbm_assign_mode_show(struct kernfs_open_f=
ile *of,
 		seq_puts(s, "[default]\n");
 	}
=20
-	mutex_unlock(&rdtgroup_mutex);
+	info_kn_unlock(of->kn);
=20
 	return 0;
 }
@@ -1493,8 +1496,8 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_o=
pen_file *of, char *buf,
=20
 	buf[nbytes - 1] =3D '\0';
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	rdt_last_cmd_clear();
=20
@@ -1552,8 +1555,7 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_o=
pen_file *of, char *buf,
 	}
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
@@ -1565,8 +1567,8 @@ int resctrl_num_mbm_cntrs_show(struct kernfs_open_fil=
e *of,
 	struct rdt_l3_mon_domain *dom;
 	bool sep =3D false;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	list_for_each_entry_rcu(dom, &r->mon_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 		if (sep)
@@ -1577,8 +1579,7 @@ int resctrl_num_mbm_cntrs_show(struct kernfs_open_fil=
e *of,
 	}
 	seq_putc(s, '\n');
=20
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
 	return 0;
 }
=20
@@ -1591,8 +1592,8 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_op=
en_file *of,
 	u32 cntrs, i;
 	int ret =3D 0;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	rdt_last_cmd_clear();
=20
@@ -1618,8 +1619,7 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_op=
en_file *of,
 	seq_putc(s, '\n');
=20
 out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret;
 }
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 77c9d22017bc..9f998e394911 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -977,13 +977,14 @@ static int rdt_last_cmd_status_show(struct kernfs_ope=
n_file *of,
 {
 	int len;
=20
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	len =3D seq_buf_used(&last_cmd_status);
 	if (len)
 		seq_printf(seq, "%.*s", len, last_cmd_status_buf);
 	else
 		seq_puts(seq, "ok\n");
-	mutex_unlock(&rdtgroup_mutex);
+	info_kn_unlock(of->kn);
 	return 0;
 }
=20
@@ -1002,7 +1003,11 @@ static int rdt_num_closids_show(struct kernfs_open_f=
ile *of,
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	seq_printf(seq, "%u\n", s->num_closid);
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1010,9 +1015,14 @@ static int rdt_default_ctrl_show(struct kernfs_open_=
file *of,
 				 struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%x\n", resctrl_get_default_ctrl(r));
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1020,9 +1030,15 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_=
file *of,
 				 struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
+
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%u\n", r->cache.min_cbm_bits);
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1030,9 +1046,14 @@ static int rdt_shareable_bits_show(struct kernfs_ope=
n_file *of,
 				   struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%x\n", r->cache.shareable_bits);
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1060,15 +1081,16 @@ static int rdt_bit_usage_show(struct kernfs_open_fi=
le *of,
 	 */
 	unsigned long sw_shareable =3D 0, hw_shareable =3D 0;
 	unsigned long exclusive =3D 0, pseudo_locked =3D 0;
-	struct rdt_resource *r =3D s->res;
 	struct rdt_ctrl_domain *dom;
 	int i, hwb, swb, excl, psl;
+	struct rdt_resource *r;
 	enum rdtgrp_mode mode;
 	bool sep =3D false;
 	u32 ctrl_val;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	list_for_each_entry_rcu(dom, &r->ctrl_domains, hdr.list, lockdep_is_cpus_=
held()) {
 		if (sep)
 			seq_putc(seq, ';');
@@ -1144,8 +1166,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file=
 *of,
 		sep =3D true;
 	}
 	seq_putc(seq, '\n');
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
 	return 0;
 }
=20
@@ -1153,9 +1174,14 @@ static int rdt_min_bw_show(struct kernfs_open_file *=
of,
 			   struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%u\n", r->membw.min_bw);
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1164,8 +1190,12 @@ static int rdt_num_rmids_show(struct kernfs_open_fil=
e *of,
 {
 	struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn);
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	seq_printf(seq, "%u\n", r->mon.num_rmid);
=20
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1175,6 +1205,8 @@ static int rdt_mon_features_show(struct kernfs_open_f=
ile *of,
 	struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn);
 	struct mon_evt *mevt;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	for_each_mon_event(mevt) {
 		if (mevt->rid !=3D r->rid || !mevt->enabled)
 			continue;
@@ -1184,6 +1216,8 @@ static int rdt_mon_features_show(struct kernfs_open_f=
ile *of,
 			seq_printf(seq, "%s_config\n", mevt->name);
 	}
=20
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1191,9 +1225,14 @@ static int rdt_bw_gran_show(struct kernfs_open_file =
*of,
 			    struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%u\n", r->membw.bw_gran);
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1201,16 +1240,24 @@ static int rdt_delay_linear_show(struct kernfs_open=
_file *of,
 				 struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%u\n", r->membw.delay_linear);
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
 static int max_threshold_occ_show(struct kernfs_open_file *of,
 				  struct seq_file *seq, void *v)
 {
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
 	seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);
+	info_kn_unlock(of->kn);
=20
 	return 0;
 }
@@ -1219,22 +1266,28 @@ static int rdt_thread_throttle_mode_show(struct ker=
nfs_open_file *of,
 					 struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
+
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
+	r =3D s->res;
 	switch (r->membw.throttle_mode) {
 	case THREAD_THROTTLE_PER_THREAD:
 		seq_puts(seq, "per-thread\n");
-		return 0;
+		break;
 	case THREAD_THROTTLE_MAX:
 		seq_puts(seq, "max\n");
-		return 0;
+		break;
 	case THREAD_THROTTLE_UNDEFINED:
 		seq_puts(seq, "undefined\n");
-		return 0;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
 	}
=20
-	WARN_ON_ONCE(1);
-
+	info_kn_unlock(of->kn);
 	return 0;
 }
=20
@@ -1248,12 +1301,20 @@ static ssize_t max_threshold_occ_write(struct kernf=
s_open_file *of,
 	if (ret)
 		return ret;
=20
-	if (bytes > resctrl_rmid_realloc_limit)
-		return -EINVAL;
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+
+	if (bytes > resctrl_rmid_realloc_limit) {
+		ret =3D -EINVAL;
+		goto out_unlock;
+	}
=20
 	resctrl_rmid_realloc_threshold =3D resctrl_arch_round_mon_val(bytes);
=20
-	return nbytes;
+out_unlock:
+	info_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
 }
=20
 /*
@@ -1293,10 +1354,15 @@ static int rdt_has_sparse_bitmasks_show(struct kern=
fs_open_file *of,
 					struct seq_file *seq, void *v)
 {
 	struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn);
-	struct rdt_resource *r =3D s->res;
+	struct rdt_resource *r;
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+	r =3D s->res;
 	seq_printf(seq, "%u\n", r->cache.arch_has_sparse_bitmasks);
=20
+	info_kn_unlock(of->kn);
+
 	return 0;
 }
=20
@@ -1652,8 +1718,8 @@ static int mbm_config_show(struct seq_file *s, struct=
 rdt_resource *r, u32 evtid
 	struct rdt_l3_mon_domain *dom;
 	bool sep =3D false;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	lockdep_assert_cpus_held();
+	lockdep_assert_held(&rdtgroup_mutex);
=20
 	list_for_each_entry_rcu(dom, &r->mon_domains, hdr.list, lockdep_is_cpus_h=
eld()) {
 		if (sep)
@@ -1670,8 +1736,6 @@ static int mbm_config_show(struct seq_file *s, struct=
 rdt_resource *r, u32 evtid
 	}
 	seq_puts(s, "\n");
=20
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
=20
 	return 0;
 }
@@ -1681,8 +1745,12 @@ static int mbm_total_bytes_config_show(struct kernfs=
_open_file *of,
 {
 	struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn);
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+
 	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
=20
+	info_kn_unlock(of->kn);
 	return 0;
 }
=20
@@ -1691,8 +1759,12 @@ static int mbm_local_bytes_config_show(struct kernfs=
_open_file *of,
 {
 	struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn);
=20
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
+
 	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
=20
+	info_kn_unlock(of->kn);
 	return 0;
 }
=20
@@ -1790,8 +1862,8 @@ static ssize_t mbm_total_bytes_config_write(struct ke=
rnfs_open_file *of,
 	if (nbytes =3D=3D 0 || buf[nbytes - 1] !=3D '\n')
 		return -EINVAL;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	rdt_last_cmd_clear();
=20
@@ -1799,8 +1871,7 @@ static ssize_t mbm_total_bytes_config_write(struct ke=
rnfs_open_file *of,
=20
 	ret =3D mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
=20
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
@@ -1816,8 +1887,8 @@ static ssize_t mbm_local_bytes_config_write(struct ke=
rnfs_open_file *of,
 	if (nbytes =3D=3D 0 || buf[nbytes - 1] !=3D '\n')
 		return -EINVAL;
=20
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
+	if (!info_kn_lock(of->kn))
+		return -ENOENT;
=20
 	rdt_last_cmd_clear();
=20
@@ -1825,8 +1896,7 @@ static ssize_t mbm_local_bytes_config_write(struct ke=
rnfs_open_file *of,
=20
 	ret =3D mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
=20
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
+	info_kn_unlock(of->kn);
=20
 	return ret ?: nbytes;
 }
@@ -2659,6 +2729,58 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
 	rdtgroup_kn_put(rdtgrp, kn);
 }
=20
+/*
+ * Accessing the kn after breaking active protection is safe since an open
+ * resctrl file holds a kernfs base reference (different from active
+ * protection) on the kn, ensuring that it remains accessible even if
+ * unlinked. Each kn in turn holds a base reference to its parent, so the
+ * kn's genealogy remains in memory until all base references are dropped.
+ */
+static bool is_active_resctrl_node(struct kernfs_node *kn)
+{
+	struct kernfs_node *p;
+	bool match =3D false;
+
+	guard(rcu)();
+	p =3D kn;
+	while (p) {
+		if (p =3D=3D rdtgroup_default.kn) {
+			match =3D true;
+			break;
+		}
+		p =3D rcu_dereference(p->__parent);
+	}
+
+	return match;
+}
+
+bool info_kn_lock(struct kernfs_node *kn)
+{
+	kernfs_break_active_protection(kn);
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	/*
+	 * Check both if resctrl is torn down (!rdtgroup_default.kn) and
+	 * if the reader's kernfs_node originates from a dead mount.
+	 */
+	if (!rdtgroup_default.kn || !is_active_resctrl_node(kn)) {
+		mutex_unlock(&rdtgroup_mutex);
+		cpus_read_unlock();
+		kernfs_unbreak_active_protection(kn);
+		return false;
+	}
+
+	return true;
+}
+
+void info_kn_unlock(struct kernfs_node *kn)
+{
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+	kernfs_unbreak_active_protection(kn);
+}
+
 static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 			     struct rdtgroup *prgrp,
 			     struct kernfs_node **mon_data_kn);
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 010643F7893
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457287; cv=none;
 b=EALAfRMdUUmzSvtlZyAi7Qr+RDUdBsXfBOK5VTe3RadTkfj+Mi5Lug1FWKYAip8Wv10zetz2WjrU2mKY+Vp1ayJoxIUdvTzLMUtcMq0ayLLoGrnyx+4WOq6sGpXveOhtHbn5JZ//7UhJ0m022Ak+FGDZfEifpXLrIL10lQyFy8g=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457287; c=relaxed/simple;
	bh=LI3fWkBiCZaOC31ilZkPdUqLlp43WfXVSeGsZHEYUvs=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=IHEzGW5vHrvF+TMdU4sEBuB+cqM7Z8Wn8oLa3t5Nsfwmo/f01Y8COwhq/4lZ7bpJplTDLTRt/JTlnfXZ9bDrozbMMzPINV+ViWwBy5cQkef/IlIzBoTRErS1Bk7A0jrX1WH5JftQ1fo4k6BkE4gAZkGjSXmIpuNEI5xTa7R+nH8=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=dPXTTdry; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="dPXTTdry"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457286; x=1811993286;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=LI3fWkBiCZaOC31ilZkPdUqLlp43WfXVSeGsZHEYUvs=;
  b=dPXTTdryDzJ2JAxUuHCFDQZtmKtld4epfjS1dtSVtH8UwJJ1g28rrsac
   4ge4258us4o3IIPh0BTQxz5hTuoGEByHfLzx6Ekuvxv3ZA0qqe5uz9jC/
   3zIqakUbmDFh4maii6/JY0OutGXEYzqMrpg5kdWglLPLlQtPdDKI+ODru
   +0IKzXwXxRHgyhsCk68ql0THov73o7i9lQ8/KKLiTZKAB/U7WH76Hi09j
   b/oMLD7IFAXFJ1TRaCx65iawZpZRy1D9e17hM3HHRA3yHuFJfEuKObnQV
   OKIlWWjFoHIDtQItDfj1oOCl8Dd/gX5SDA3yhx3cVLZzEsx3b6ciI9Xgj
   w==;
X-CSE-ConnectionGUID: Fw1xRiomSHSC44ynQX9AqA==
X-CSE-MsgGUID: E5bRddxURzO1BNsn3B2kcA==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91939050"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91939050"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:02 -0700
X-CSE-ConnectionGUID: Z1M80lUZRn2YTPwIH7PLPQ==
X-CSE-MsgGUID: N0pjBagxT+KsqCU8LKtwSw==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110144"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:01 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 09/10] x86/resctrl: Ensure domain fully initialized before
 placed on RCU list
Date: Tue,  2 Jun 2026 20:27:37 -0700
Message-ID: 
 <b456813ba11633881b8feb12813b040647aacf44.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

A resctrl domain consists of the domain structure itself, which includes
pointers to dynamically allocated filesystem as well as architecture
specific data. For example, the L3 monitoring domain structure consists
of the architecture specific struct rdt_hw_l3_mon_domain that contains
the dynamically allocated rdt_hw_l3_mon_domain::arch_mbm_states
architectural state and the embedded struct rdt_l3_mon_domain that
contains the dynamically allocated rdt_l3_mon_domain::mbm_states resctrl
fs state.

The domains are added to and removed from an RCU protected list while
cpus_write_lock() is held so that readers can access domains while
holding cpus_read_lock() or from an RCU read-side critical section.

A reader accessing a domain via the RCU list expects that the domain and
all its dynamically allocated data are accessible. Only place the domain
on the RCU list when all its dynamically allocated data is ready.
Similarly, unlink it from the RCU list (again with cpus_write_lock() held)
before removing any of its dynamically allocated data.

Calling resctrl_online_mon_domain() before adding the domain to the RCU
list creates the kernfs files that expose the domain's monitoring data to
user space while the domain is not yet on the list. This is safe because
rdtgroup_mondata_show() acquires cpus_read_lock() before it traverses the
RCU list and will thus block until the domain is added to the RCU list.

There are currently no readers accessing a domain via the RCU list.
Ensure that such access is safe when such a reader arrives.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reported-by: Sashiko <sashiko-bot@kernel.org>
---
Changes since V2:
- New patch

Changes since V3:
- Add Tony's Reviewed-by tag.
- Add Chenyu's Reviewed-by tag.
- Grammar fixes in changelog.
- Add snippet to changelog about possible race with rdtgroup_mondata_show().
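
Reviewer note: the ordering rule in the changelog (initialize fully, then
publish; on failure free without unlinking) can be sketched in user space
as below. This is only an illustration under stated assumptions, not the
resctrl code: struct domain, domain_list, domain_add() and domain_find()
are hypothetical names, C11 atomics stand in for the RCU list primitives,
and a single writer is assumed (the role cpus_write_lock() plays in the
patch).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a domain with dynamically allocated state. */
struct domain {
	int id;
	int *state;	/* must be valid before readers can see the domain */
	struct domain *next;
};

/* List head that readers load with acquire semantics. */
static _Atomic(struct domain *) domain_list;

/*
 * Allocate and fully initialize a domain, then publish it.
 * @fail_init simulates a failure of the "online" step.
 * Assumes a single writer. Returns 0 on success, -1 on failure.
 */
static int domain_add(int id, int fail_init)
{
	struct domain *d = calloc(1, sizeof(*d));

	if (!d)
		return -1;
	d->id = id;
	d->state = fail_init ? NULL : calloc(4, sizeof(*d->state));
	if (!d->state) {
		/* Never published: no unlink and no grace period needed. */
		free(d);
		return -1;
	}
	/* Publish last; the release pairs with the acquire in domain_find(). */
	d->next = atomic_load_explicit(&domain_list, memory_order_relaxed);
	atomic_store_explicit(&domain_list, d, memory_order_release);
	return 0;
}

/* Reader: any domain found here has a valid ->state. */
static struct domain *domain_find(int id)
{
	struct domain *d;

	d = atomic_load_explicit(&domain_list, memory_order_acquire);
	for (; d; d = d->next)
		if (d->id == id)
			return d;
	return NULL;
}
```

With this ordering the error path shrinks to a plain free, which mirrors
how the diff drops list_del_rcu() and synchronize_rcu() from the failure
branches.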
---
 arch/x86/kernel/cpu/resctrl/core.c      | 18 +++++++-----------
 arch/x86/kernel/cpu/resctrl/intel_aet.c |  5 ++---
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resct=
rl/core.c
index 9c01d2562b7a..bca782050198 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -515,14 +515,12 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_r=
esource *r)
 		return;
 	}
=20
-	list_add_tail_rcu(&d->hdr.list, add_pos);
-
 	err =3D resctrl_online_ctrl_domain(r, d);
 	if (err) {
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
 		ctrl_domain_free(hw_dom);
+		return;
 	}
+	list_add_tail_rcu(&d->hdr.list, add_pos);
 }
=20
 static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, s=
truct list_head *add_pos)
@@ -556,14 +554,12 @@ static void l3_mon_domain_setup(int cpu, int id, stru=
ct rdt_resource *r, struct
 		return;
 	}
=20
-	list_add_tail_rcu(&d->hdr.list, add_pos);
-
 	err =3D resctrl_online_mon_domain(r, &d->hdr);
 	if (err) {
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
 		l3_mon_domain_free(hw_dom);
+		return;
 	}
+	list_add_tail_rcu(&d->hdr.list, add_pos);
 }
=20
 static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
@@ -642,9 +638,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_=
resource *r)
 	d =3D container_of(hdr, struct rdt_ctrl_domain, hdr);
 	hw_dom =3D resctrl_to_arch_ctrl_dom(d);
=20
-	resctrl_offline_ctrl_domain(r, d);
 	list_del_rcu(&hdr->list);
 	synchronize_rcu();
+	resctrl_offline_ctrl_domain(r, d);
=20
 	/*
 	 * rdt_ctrl_domain "d" is going to be freed below, so clear
@@ -689,9 +685,9 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_r=
esource *r)
=20
 		d =3D container_of(hdr, struct rdt_l3_mon_domain, hdr);
 		hw_dom =3D resctrl_to_arch_mon_dom(d);
-		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&hdr->list);
 		synchronize_rcu();
+		resctrl_offline_mon_domain(r, hdr);
 		l3_mon_domain_free(hw_dom);
 		break;
 	}
@@ -702,9 +698,9 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_r=
esource *r)
 			return;
=20
 		pkgd =3D container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr);
-		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&hdr->list);
 		synchronize_rcu();
+		resctrl_offline_mon_domain(r, hdr);
 		kfree(pkgd);
 		break;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/=
resctrl/intel_aet.c
index 89b8b619d5d5..c22c3cf5167d 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -398,12 +398,11 @@ void intel_aet_mon_domain_setup(int cpu, int id, stru=
ct rdt_resource *r,
 	d->hdr.type =3D RESCTRL_MON_DOMAIN;
 	d->hdr.rid =3D RDT_RESOURCE_PERF_PKG;
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
-	list_add_tail_rcu(&d->hdr.list, add_pos);
=20
 	err =3D resctrl_online_mon_domain(r, &d->hdr);
 	if (err) {
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
 		kfree(d);
+		return;
 	}
+	list_add_tail_rcu(&d->hdr.list, add_pos);
 }
--=20
2.50.1
From nobody Mon Jun  8 06:35:46 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D2EC3F7899
	for <linux-kernel@vger.kernel.org>; Wed,  3 Jun 2026 03:28:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=192.198.163.9
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780457288; cv=none;
 b=tD3s8himccZqaAElFJnHjaU4PfJyZPqy//lyZdooDUEu31fl73eRSujMb57eIzNY7bNLuMDoHjEOpkVOj/pHlxtXOP1DdpLuhZNGQf+YGNHXVQh9CKplnPGSR4r7B1O1cR3ngzDWzlMC42dPcnL356ZRf1TUHgSNWMdTpVNhvP4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780457288; c=relaxed/simple;
	bh=NkRz2ujFLstUVTop8cwQvKn2tC+od6QZJws0jIRRO8k=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=fk4zom8IGUyPGvGJ3+MW5AieG14yW5unis5GXhr8QAqVxWE+LOzjz3QG5PDSohXRQWZeSfGMSVchmmQI4axBw4uWjANZrrGp95Hnusgt69i/mmpRoYgT0+VoCTIHIKdpTH0Ih0/hjV8no3RuVmXpP4dMg5MIf/Hc/TG75RUrWQw=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com;
 spf=pass smtp.mailfrom=intel.com;
 dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b=R09Bng2h; arc=none smtp.client-ip=192.198.163.9
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com
 header.b="R09Bng2h"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1780457286; x=1811993286;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=NkRz2ujFLstUVTop8cwQvKn2tC+od6QZJws0jIRRO8k=;
  b=R09Bng2hugCtX8+g/L9/hM7m7IUaiz9SlVxFjnl4j7ROVmIs9WAsoE5N
   MhE7uXPLBqVxU4n5JIXA62bnt42jgSr0SLLVYHetRdVKmsU6Lp822QSTr
   awmtp+4msa/kC++kdMDcyRw4y1w5UQwhPv/6mZOfkBd7asLhdwjswaG4j
   LzujE38c07k7jIUTbBgeqw1HboA9o2P/ofI8/+Cj2j2bnXIyakXBBYH3H
   jhx2WRL0lvKTfIWqgLW3qm6aJ7MmKtHcLW6r17SSBTaVhsl79KdwhW/Uz
   +YUONnrWqbj/ZiCPX6o92C5NYU7a9GALMzKa0VP+3C9DfUNa38FNqqjJB
   w==;
X-CSE-ConnectionGUID: DypI4DMCQPeR07Wo6r3kAQ==
X-CSE-MsgGUID: Aem/4RThQcGtiFZ7Sq961Q==
X-IronPort-AV: E=McAfee;i="6800,10657,11805"; a="91939061"
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="91939061"
Received: from fmviesa007.fm.intel.com ([10.60.135.147])
  by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:02 -0700
X-CSE-ConnectionGUID: trpoU+IfQoanmb/bMAAi9w==
X-CSE-MsgGUID: WdrtOkcIR1KdSAaIWULefw==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.24,184,1774335600";
   d="scan'208";a="241110148"
Received: from rchatre-desk1.jf.intel.com ([10.165.154.99])
  by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 02 Jun 2026 20:28:01 -0700
From: Reinette Chatre <reinette.chatre@intel.com>
To: tony.luck@intel.com,
	james.morse@arm.com,
	Dave.Martin@arm.com,
	babu.moger@amd.com,
	bp@alien8.de,
	tglx@linutronix.de,
	dave.hansen@linux.intel.com
Cc: x86@kernel.org,
	hpa@zytor.com,
	ben.horgan@arm.com,
	fustini@kernel.org,
	fenghuay@nvidia.com,
	peternewman@google.com,
	yu.c.chen@intel.com,
	linux-kernel@vger.kernel.org,
	patches@lists.linux.dev,
	reinette.chatre@intel.com
Subject: [PATCH v4 10/10] fs/resctrl: Fix UAF from worker threads when domains
 are removed
Date: Tue,  2 Jun 2026 20:27:38 -0700
Message-ID: 
 <c6bdc19625e0aba4978db2031b7cefb3bfd6fbc1.1780456704.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
In-Reply-To: <cover.1780456704.git.reinette.chatre@intel.com>
References: <cover.1780456704.git.reinette.chatre@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

The mbm_handle_overflow() and cqm_handle_limbo() workers read event
counters and may sleep while doing so. They are scheduled via
delayed_work embedded in struct rdt_l3_mon_domain. The architecture
allocates and frees these domains from CPU hotplug callbacks under
cpus_write_lock(), and the workers acquire cpus_read_lock() to keep the
domain alive across their access.

A use-after-free can occur when a worker is blocked waiting for
cpus_read_lock() while the hotplug core holds cpus_write_lock():
the architecture frees the rdt_l3_mon_domain that contains the worker's
work_struct. When the worker unblocks, the container_of() it performs on
the embedded work pointer dereferences freed memory.

Drop cpus_read_lock() from the workers and instead drain pending and
in-flight work synchronously before the architecture can free the domain.
Since the architecture offlines the domain under cpus_write_lock() after
it has been unlinked from the RCU list and a grace period has elapsed, no
new work can be scheduled. The cancel only needs to wait out existing
work. Drop rdtgroup_mutex during CPU offline around
cancel_delayed_work_sync() so that a worker waiting on the mutex can
complete before the work is re-pinned to a different CPU.

When offlining a CPU the architecture may iterate over resources in any
order. For example, the MBA control domain may be offlined before or
after a corresponding L3 monitor domain. Ensure that resctrl fs cancels
the workers no matter what order the architecture offlines the domains.

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40i=
ntel.com # [1]
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since v2:
- Rewrite changelog
- v2 attempted to solve the issue by using is_percpu_thread() within the
  worker to learn whether the CPU the worker was running on is going
  offline. A Sashiko review
  (https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%=
40intel.com?part=3D5)
  pointed out that this cannot handle the scenario where one of the
  hotplug handlers following the resctrl offline handlers fails.
- Other attempted fixes that failed:
  - Switch to accessing domain structure in handler via RCU so that CPU
    hotplug lock is no longer needed. Use cancel_delayed_work_sync() with
    mutex dropped to cancel worker. Running worker from RCU read-side
    critical section is a problem since the worker needs to be
    able to sleep (mbm_handle_overflow()->mbm_update()->
		    mbm_update_one_event()->resctrl_arch_mon_ctx_alloc()->
		    might_sleep())
  - Adding a reference count to the domain structure to avoid the worker
    needing to take CPU hotplug lock. This ended up being very complicated
    with the architecture needing new APIs to manage the reference count
    which cannot cleanly integrate into MPAM since it uses a single
    architecture domain structure to contain both the control and monitoring
    domain structures. Managing the references across mount, unmount,
    online, offline, as well as worker self exit resulted in several
    asymmetrical and complicated paths that were error prone. Locking also
    proved to be complicated since architecture would need to initiate
    domain free that will need to call back into resctrl that will take
    rdtgroup_mutex which means that references need to be taken/released
    without locking.

Changes since V3:
----------------
- Traverse mon_domains list using list_for_each_entry_rcu( ...,
  lockdep_is_cpus_held()) to document how CPU hotplug lock is required
  to be held (via architecture).
- Add snippet in changelog to motivate canceling work in monitor and
  control domain offline handlers.
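
Reviewer note: the "drain in-flight work before freeing the structure
that embeds it" rule from the changelog can be sketched in user space as
below. This is an analogy under stated assumptions, not the kernel
implementation: a pthread plus a stop flag stands in for delayed_work,
pthread_join() stands in for cancel_delayed_work_sync(), and struct work,
struct domain, domain_online() and domain_offline() are hypothetical
names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for delayed_work: one thread plus a stop flag. */
struct work {
	pthread_t thread;
	atomic_int stop;
};

/* Hypothetical domain with the work embedded in it. */
struct domain {
	int id;
	atomic_int ticks;
	struct work overflow;
};

static void *overflow_worker(void *arg)
{
	struct work *w = arg;
	/* Same idea as container_of(): recover the domain from the work. */
	struct domain *d = (struct domain *)((char *)w -
					     offsetof(struct domain, overflow));

	while (!atomic_load(&w->stop)) {
		atomic_fetch_add(&d->ticks, 1);
		sched_yield();
	}
	return NULL;
}

static struct domain *domain_online(int id)
{
	struct domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->id = id;
	atomic_init(&d->ticks, 0);
	atomic_init(&d->overflow.stop, 0);
	if (pthread_create(&d->overflow.thread, NULL, overflow_worker,
			   &d->overflow)) {
		free(d);
		return NULL;
	}
	return d;
}

/* Returns the number of worker iterations observed before teardown. */
static int domain_offline(struct domain *d)
{
	int ticks;

	atomic_store(&d->overflow.stop, 1);
	/* Drain: after the join no worker can touch @d any more. */
	pthread_join(d->overflow.thread, NULL);
	ticks = atomic_load(&d->ticks);
	free(d);
	return ticks;
}
```

Freeing before the join would reproduce the use-after-free described in
the changelog, since the worker derives the domain pointer from the
embedded work.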
---
 fs/resctrl/monitor.c  | 52 ++++++++++++++++++++++++++++++++++---------
 fs/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index f7ab9a1bc726..db56c0153e3a 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -628,14 +628,22 @@ void mon_event_count(void *info)
 		rr->err =3D 0;
 }
=20
-static struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu,
-							struct rdt_resource *r)
+/*
+ * Find the software controller's ctrl domain that contains @cpu on resour=
ce @r.
+ *
+ * Only called from the mbm_over worker via update_mba_bw() where the retu=
rned
+ * domain is kept alive by cancel_delayed_work_sync() in
+ * resctrl_offline_ctrl_domain(). This drains this worker and then waits on
+ * rdtgroup_mutex held here before the architecture can free the ctrl doma=
in.
+ *
+ * Context: Call from RCU read-side critical section.
+ */
+static struct rdt_ctrl_domain *get_sc_ctrl_domain_from_cpu(int cpu,
+							   struct rdt_resource *r)
 {
 	struct rdt_ctrl_domain *d;
=20
-	lockdep_assert_cpus_held();
-
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list) {
 		/* Find the domain that contains this CPU */
 		if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
 			return d;
@@ -696,7 +704,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct=
 rdt_l3_mon_domain *dom_m
 	if (WARN_ON_ONCE(!pmbm_data))
 		return;
=20
-	dom_mba =3D get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
+	guard(rcu)();
+	dom_mba =3D get_sc_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
 	if (!dom_mba) {
 		pr_warn_once("Failure to get domain for MBA update\n");
 		return;
@@ -799,9 +808,19 @@ void cqm_handle_limbo(struct work_struct *work)
 	unsigned long delay =3D msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
 	struct rdt_l3_mon_domain *d;
=20
-	cpus_read_lock();
+	/*
+	 * Safe to run without CPU hotplug lock. Work is guaranteed to be
+	 * canceled before the domain structure is removed.
+	 */
 	mutex_lock(&rdtgroup_mutex);
=20
+	/*
+	 * Ensure the worker is dedicated to a CPU as intended and not
+	 * relocated by workqueue subsystem as part of CPU going offline.
+	 */
+	if (!is_percpu_thread())
+		goto out_unlock;
+
 	d =3D container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
=20
 	__check_limbo(d, false);
@@ -813,8 +832,8 @@ void cqm_handle_limbo(struct work_struct *work)
 					 delay);
 	}
=20
+out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
=20
 /**
@@ -846,7 +865,10 @@ void mbm_handle_overflow(struct work_struct *work)
 	struct list_head *head;
 	struct rdt_resource *r;
=20
-	cpus_read_lock();
+	/*
+	 * Safe to run without CPU hotplug lock. Work is guaranteed to be
+	 * canceled before the domain structure is removed.
+	 */
 	mutex_lock(&rdtgroup_mutex);
=20
 	/*
@@ -856,6 +878,17 @@ void mbm_handle_overflow(struct work_struct *work)
 	if (!resctrl_mounted || !resctrl_arch_mon_capable())
 		goto out_unlock;
=20
+	/*
+	 * Ensure the worker is dedicated to a CPU and not relocated by
+	 * workqueue subsystem as part of CPU going offline since reading
+	 * events depends on smp_processor_id(). After passing this check
+	 * smp_processor_id() is valid for entire duration of this worker
+	 * since it runs with rdtgroup_mutex held and the offline handler needs
+	 * rdtgroup_mutex to offline the CPU being run on here.
+	 */
+	if (!is_percpu_thread())
+		goto out_unlock;
+
 	r =3D resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	d =3D container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
=20
@@ -880,7 +913,6 @@ void mbm_handle_overflow(struct work_struct *work)
=20
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
=20
 /**
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 9f998e394911..b5fb59d0e035 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4491,6 +4491,29 @@ static void domain_destroy_l3_mon_state(struct rdt_l=
3_mon_domain *d)
=20
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_d=
omain *d)
 {
+	/*
+	 * mbm_handle_overflow() may dereference this ctrl domain via
+	 * update_mba_bw()->get_sc_ctrl_domain_from_cpu(). The architecture has
+	 * unlinked the domain from the RCU list and waited a grace period, so
+	 * no new worker iteration can find it; drain any worker that already
+	 * holds a pointer to it before the architecture frees the domain.
+	 *
+	 * Software controller is enabled/disabled on mount/unmount with
+	 * cpus_read_lock() held. Running here with cpus_write_lock() so
+	 * there are no concurrent changes to software controller status.
+	 */
+	if (r->rid =3D=3D RDT_RESOURCE_MBA && is_mba_sc(r)) {
+		struct rdt_resource *l3 =3D resctrl_arch_get_resource(RDT_RESOURCE_L3);
+		struct rdt_l3_mon_domain *mon_d;
+
+		list_for_each_entry_rcu(mon_d, &l3->mon_domains, hdr.list, lockdep_is_cp=
us_held()) {
+			if (mon_d->hdr.id =3D=3D d->hdr.id) {
+				cancel_delayed_work_sync(&mon_d->mbm_over);
+				break;
+			}
+		}
+	}
+
 	mutex_lock(&rdtgroup_mutex);
=20
 	if (supports_mba_mbps() && r->rid =3D=3D RDT_RESOURCE_MBA)
@@ -4503,6 +4526,24 @@ void resctrl_offline_mon_domain(struct rdt_resource =
*r, struct rdt_domain_hdr *h
 {
 	struct rdt_l3_mon_domain *d;
=20
+	/*
+	 * Called by architecture under CPU hotplug lock as it prepares to remove
+	 * the domain which is guaranteed to be accessible here.
+	 * The domain has been unlinked from the RCU list and a grace period
+	 * has elapsed, so no new worker can be scheduled. Drain any worker that
+	 * is in flight or pending before letting architecture proceed to free
+	 * the domain that has the workers' struct delayed_work embedded.
+	 * Do so before taking rdtgroup_mutex since the workers also acquire it.
+	 */
+	if (r->rid =3D=3D RDT_RESOURCE_L3 &&
+	    domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
+		d =3D container_of(hdr, struct rdt_l3_mon_domain, hdr);
+		if (resctrl_is_mbm_enabled())
+			cancel_delayed_work_sync(&d->mbm_over);
+		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
+			cancel_delayed_work_sync(&d->cqm_limbo);
+	}
+
 	mutex_lock(&rdtgroup_mutex);
=20
 	/*
@@ -4519,8 +4560,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *=
r, struct rdt_domain_hdr *h
 		goto out_unlock;
=20
 	d =3D container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	if (resctrl_is_mbm_enabled())
-		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(=
d)) {
 		/*
 		 * When a package is going down, forcefully
@@ -4531,7 +4570,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *=
r, struct rdt_domain_hdr *h
 		 * package never comes back.
 		 */
 		__check_limbo(d, true);
-		cancel_delayed_work(&d->cqm_limbo);
 	}
=20
 	domain_destroy_l3_mon_state(d);
@@ -4712,12 +4750,16 @@ void resctrl_offline_cpu(unsigned int cpu)
 	d =3D get_mon_domain_from_cpu(cpu, l3);
 	if (d) {
 		if (resctrl_is_mbm_enabled() && cpu =3D=3D d->mbm_work_cpu) {
-			cancel_delayed_work(&d->mbm_over);
+			mutex_unlock(&rdtgroup_mutex);
+			cancel_delayed_work_sync(&d->mbm_over);
+			mutex_lock(&rdtgroup_mutex);
 			mbm_setup_overflow_handler(d, 0, cpu);
 		}
 		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
 		    cpu =3D=3D d->cqm_work_cpu && has_busy_rmid(d)) {
-			cancel_delayed_work(&d->cqm_limbo);
+			mutex_unlock(&rdtgroup_mutex);
+			cancel_delayed_work_sync(&d->cqm_limbo);
+			mutex_lock(&rdtgroup_mutex);
 			cqm_setup_limbo_handler(d, 0, cpu);
 		}
 	}
--=20
2.50.1