From nobody Mon May 25 08:11:42 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: Borislav Petkov, x86@kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v2 1/5] fs/resctrl: Move functions to avoid forward references in subsequent fixes
Date: Fri, 15 May 2026 12:39:40 -0700
Message-ID: <20260515193944.15114-2-tony.luck@intel.com>
In-Reply-To: <20260515193944.15114-1-tony.luck@intel.com>
References: <20260515193944.15114-1-tony.luck@intel.com>

No functional change. Move some functions ahead of rdt_get_tree() so that
subsequent fixes can call them without forward declarations.

Signed-off-by: Tony Luck
---
 fs/resctrl/rdtgroup.c | 376 +++++++++++++++++++++---------------------
 1 file changed, 188 insertions(+), 188 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5dfdaa6f9d8f..a6376a3fc4c3 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2782,6 +2782,194 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Move tasks from one to the other group. If @from is NULL, then all tasks
+ * in the systems are moved unconditionally (used for teardown).
+ *
+ * If @mask is not NULL the cpus on which moved tasks are running are set
+ * in that mask so the update smp function call is restricted to affected
+ * cpus.
+ */
+static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
+				 struct cpumask *mask)
+{
+	struct task_struct *p, *t;
+
+	read_lock(&tasklist_lock);
+	for_each_process_thread(p, t) {
+		if (!from || is_closid_match(t, from) ||
+		    is_rmid_match(t, from)) {
+			resctrl_arch_set_closid_rmid(t, to->closid,
+						     to->mon.rmid);
+
+			/*
+			 * Order the closid/rmid stores above before the loads
+			 * in task_curr(). This pairs with the full barrier
+			 * between the rq->curr update and
+			 * resctrl_arch_sched_in() during context switch.
+			 */
+			smp_mb();
+
+			/*
+			 * If the task is on a CPU, set the CPU in the mask.
+			 * The detection is inaccurate as tasks might move or
+			 * schedule before the smp function call takes place.
+			 * In such a case the function call is pointless, but
+			 * there is no other side effect.
+			 */
+			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
+				cpumask_set_cpu(task_cpu(t), mask);
+		}
+	}
+	read_unlock(&tasklist_lock);
+}
+
+static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
+{
+	struct rdtgroup *sentry, *stmp;
+	struct list_head *head;
+
+	head = &rdtgrp->mon.crdtgrp_list;
+	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+		rdtgroup_unassign_cntrs(sentry);
+		free_rmid(sentry->closid, sentry->mon.rmid);
+		list_del(&sentry->mon.crdtgrp_list);
+
+		if (atomic_read(&sentry->waitcount) != 0)
+			sentry->flags = RDT_DELETED;
+		else
+			rdtgroup_remove(sentry);
+	}
+}
+
+/*
+ * Forcibly remove all of subdirectories under root.
+ */
+static void rmdir_all_sub(void)
+{
+	struct rdtgroup *rdtgrp, *tmp;
+
+	/* Move all tasks to the default resource group */
+	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
+
+	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+		/* Free any child rmids */
+		free_all_child_rdtgrp(rdtgrp);
+
+		/* Remove each rdtgroup other than root */
+		if (rdtgrp == &rdtgroup_default)
+			continue;
+
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+			rdtgroup_pseudo_lock_remove(rdtgrp);
+
+		/*
+		 * Give any CPUs back to the default group. We cannot copy
+		 * cpu_online_mask because a CPU might have executed the
+		 * offline callback already, but is still marked online.
+		 */
+		cpumask_or(&rdtgroup_default.cpu_mask,
+			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+
+		rdtgroup_unassign_cntrs(rdtgrp);
+
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
+
+		kernfs_remove(rdtgrp->kn);
+		list_del(&rdtgrp->rdtgroup_list);
+
+		if (atomic_read(&rdtgrp->waitcount) != 0)
+			rdtgrp->flags = RDT_DELETED;
+		else
+			rdtgroup_remove(rdtgrp);
+	}
+	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
+	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
+
+	kernfs_remove(kn_info);
+	kernfs_remove(kn_mongrp);
+	kernfs_remove(kn_mondata);
+}
+
+/**
+ * mon_get_kn_priv() - Get the mon_data priv data for this event.
+ *
+ * The same values are used across the mon_data directories of all control and
+ * monitor groups for the same event in the same domain. Keep a list of
+ * allocated structures and re-use an existing one with the same values for
+ * @rid, @domid, etc.
+ *
+ * @rid: The resource id for the event file being created.
+ * @domid: The domain id for the event file being created.
+ * @mevt: The type of event file being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ *          when @rid == RDT_RESOURCE_L3.
+ *
+ * Return: Pointer to mon_data private data of the event, NULL on failure.
+ */
+static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
+					struct mon_evt *mevt,
+					bool do_sum)
+{
+	struct mon_data *priv;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
+		if (priv->rid == rid && priv->domid == domid &&
+		    priv->sum == do_sum && priv->evt == mevt)
+			return priv;
+	}
+
+	priv = kzalloc_obj(*priv);
+	if (!priv)
+		return NULL;
+
+	priv->rid = rid;
+	priv->domid = domid;
+	priv->sum = do_sum;
+	priv->evt = mevt;
+	list_add_tail(&priv->list, &mon_data_kn_priv_list);
+
+	return priv;
+}
+
+/**
+ * mon_put_kn_priv() - Free all allocated mon_data structures.
+ *
+ * Called when resctrl file system is unmounted.
+ */
+static void mon_put_kn_priv(void)
+{
+	struct mon_data *priv, *tmp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
+		list_del(&priv->list);
+		kfree(priv);
+	}
+}
+
+static void resctrl_fs_teardown(void)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/* Cleared by rdtgroup_destroy_root() */
+	if (!rdtgroup_default.kn)
+		return;
+
+	rmdir_all_sub();
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
+	mon_put_kn_priv();
+	rdt_pseudo_lock_release();
+	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	closid_exit();
+	schemata_list_destroy();
+	rdtgroup_destroy_root();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2981,194 +3169,6 @@ static int rdt_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
-/*
- * Move tasks from one to the other group. If @from is NULL, then all tasks
- * in the systems are moved unconditionally (used for teardown).
- *
- * If @mask is not NULL the cpus on which moved tasks are running are set
- * in that mask so the update smp function call is restricted to affected
- * cpus.
- */
-static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
-				 struct cpumask *mask)
-{
-	struct task_struct *p, *t;
-
-	read_lock(&tasklist_lock);
-	for_each_process_thread(p, t) {
-		if (!from || is_closid_match(t, from) ||
-		    is_rmid_match(t, from)) {
-			resctrl_arch_set_closid_rmid(t, to->closid,
-						     to->mon.rmid);
-
-			/*
-			 * Order the closid/rmid stores above before the loads
-			 * in task_curr(). This pairs with the full barrier
-			 * between the rq->curr update and
-			 * resctrl_arch_sched_in() during context switch.
-			 */
-			smp_mb();
-
-			/*
-			 * If the task is on a CPU, set the CPU in the mask.
-			 * The detection is inaccurate as tasks might move or
-			 * schedule before the smp function call takes place.
-			 * In such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
-				cpumask_set_cpu(task_cpu(t), mask);
-		}
-	}
-	read_unlock(&tasklist_lock);
-}
-
-static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
-{
-	struct rdtgroup *sentry, *stmp;
-	struct list_head *head;
-
-	head = &rdtgrp->mon.crdtgrp_list;
-	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
-		rdtgroup_unassign_cntrs(sentry);
-		free_rmid(sentry->closid, sentry->mon.rmid);
-		list_del(&sentry->mon.crdtgrp_list);
-
-		if (atomic_read(&sentry->waitcount) != 0)
-			sentry->flags = RDT_DELETED;
-		else
-			rdtgroup_remove(sentry);
-	}
-}
-
-/*
- * Forcibly remove all of subdirectories under root.
- */
-static void rmdir_all_sub(void)
-{
-	struct rdtgroup *rdtgrp, *tmp;
-
-	/* Move all tasks to the default resource group */
-	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
-
-	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
-		/* Free any child rmids */
-		free_all_child_rdtgrp(rdtgrp);
-
-		/* Remove each rdtgroup other than root */
-		if (rdtgrp == &rdtgroup_default)
-			continue;
-
-		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
-		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
-			rdtgroup_pseudo_lock_remove(rdtgrp);
-
-		/*
-		 * Give any CPUs back to the default group. We cannot copy
-		 * cpu_online_mask because a CPU might have executed the
-		 * offline callback already, but is still marked online.
-		 */
-		cpumask_or(&rdtgroup_default.cpu_mask,
-			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
-
-		rdtgroup_unassign_cntrs(rdtgrp);
-
-		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
-
-		kernfs_remove(rdtgrp->kn);
-		list_del(&rdtgrp->rdtgroup_list);
-
-		if (atomic_read(&rdtgrp->waitcount) != 0)
-			rdtgrp->flags = RDT_DELETED;
-		else
-			rdtgroup_remove(rdtgrp);
-	}
-	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
-	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
-
-	kernfs_remove(kn_info);
-	kernfs_remove(kn_mongrp);
-	kernfs_remove(kn_mondata);
-}
-
-/**
- * mon_get_kn_priv() - Get the mon_data priv data for this event.
- *
- * The same values are used across the mon_data directories of all control and
- * monitor groups for the same event in the same domain. Keep a list of
- * allocated structures and re-use an existing one with the same values for
- * @rid, @domid, etc.
- *
- * @rid: The resource id for the event file being created.
- * @domid: The domain id for the event file being created.
- * @mevt: The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created. Only set
- *          when @rid == RDT_RESOURCE_L3.
- *
- * Return: Pointer to mon_data private data of the event, NULL on failure.
- */
-static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
-					struct mon_evt *mevt,
-					bool do_sum)
-{
-	struct mon_data *priv;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
-		if (priv->rid == rid && priv->domid == domid &&
-		    priv->sum == do_sum && priv->evt == mevt)
-			return priv;
-	}
-
-	priv = kzalloc_obj(*priv);
-	if (!priv)
-		return NULL;
-
-	priv->rid = rid;
-	priv->domid = domid;
-	priv->sum = do_sum;
-	priv->evt = mevt;
-	list_add_tail(&priv->list, &mon_data_kn_priv_list);
-
-	return priv;
-}
-
-/**
- * mon_put_kn_priv() - Free all allocated mon_data structures.
- *
- * Called when resctrl file system is unmounted.
- */
-static void mon_put_kn_priv(void)
-{
-	struct mon_data *priv, *tmp;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
-		list_del(&priv->list);
-		kfree(priv);
-	}
-}
-
-static void resctrl_fs_teardown(void)
-{
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	/* Cleared by rdtgroup_destroy_root() */
-	if (!rdtgroup_default.kn)
-		return;
-
-	rmdir_all_sub();
-	rdtgroup_unassign_cntrs(&rdtgroup_default);
-	mon_put_kn_priv();
-	rdt_pseudo_lock_release();
-	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
-	closid_exit();
-	schemata_list_destroy();
-	rdtgroup_destroy_root();
-}
-
 static void rdt_kill_sb(struct super_block *sb)
 {
 	struct rdt_resource *r;
-- 
2.54.0

From nobody Mon May 25 08:11:42 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: Borislav Petkov, x86@kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v2 2/5] fs/resctrl: Free mon_data structures on rdt_get_tree() failure
Date: Fri, 15 May 2026 12:39:41 -0700
Message-ID: <20260515193944.15114-3-tony.luck@intel.com>
In-Reply-To: <20260515193944.15114-1-tony.luck@intel.com>
References: <20260515193944.15114-1-tony.luck@intel.com>

If mkdir_mondata_all() succeeds but a subsequent call in rdt_get_tree()
fails, the mon_data structures allocated by mon_get_kn_priv() are leaked.

Add mon_put_kn_priv() to the out_mondata error path to free them.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing mon_data_bits")
Reported-by: Reinette Chatre
Signed-off-by: Tony Luck
Reviewed-by: Chen Yu
---
 fs/resctrl/rdtgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a6376a3fc4c3..506b40dc9430 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3071,6 +3071,7 @@ static int rdt_get_tree(struct fs_context *fc)
 		kernfs_remove(kn_mondata);
 out_mongrp:
 	if (resctrl_arch_mon_capable()) {
+		mon_put_kn_priv();
 		rdtgroup_unassign_cntrs(&rdtgroup_default);
 		kernfs_remove(kn_mongrp);
 	}
-- 
2.54.0

From nobody Mon May 25 08:11:42 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: Borislav Petkov, x86@kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v2 3/5] fs/resctrl: Fix use-after-free during unmount
Date: Fri, 15 May 2026 12:39:42 -0700
Message-ID: <20260515193944.15114-4-tony.luck@intel.com>
In-Reply-To: <20260515193944.15114-1-tony.luck@intel.com>
References: <20260515193944.15114-1-tony.luck@intel.com>

Sashiko reported[1] this issue:

During unmount or failure teardown, resctrl_fs_teardown() calls
mon_put_kn_priv() (which frees all mon_data structures) followed by
rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
RDT_DELETED flag is never set for rdtgroup_default. If a concurrent
reader (e.g., rdtgroup_mondata_show()) invokes rdtgroup_kn_lock_live(),
it drops kernfs active protection and blocks on rdtgroup_mutex.
resctrl_fs_teardown() (holding the mutex) proceeds to free the private
data and destroy the nodes without waiting for the reader.
When the mutex is released, the reader wakes up, observes that RDT_DELETED
is not set for the default group, and dereferences the already-freed
of->kn->priv pointer.

Set RDT_DELETED for the default group during teardown and clear it again
in rdtgroup_setup_root(). Since the default group is statically allocated,
make rdtgroup_remove() skip it, and fail a new mount with -EBUSY while
operations from a previous mount are still pending.

Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Tony Luck
Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
---
 fs/resctrl/rdtgroup.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 506b40dc9430..97d1a3648b9e 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -593,6 +593,13 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
  */
 static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 {
+	/*
+	 * Groups created with mkdir() have an extra hold, that doesn't
+	 * apply to the default group. It is statically allocated, so
+	 * does not need to be freed.
+	 */
+	if (rdtgrp == &rdtgroup_default)
+		return;
 	kernfs_put(rdtgrp->kn);
 	kfree(rdtgrp);
 }
@@ -2965,6 +2972,7 @@ static void resctrl_fs_teardown(void)
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	rdtgroup_default.flags = RDT_DELETED;
 	closid_exit();
 	schemata_list_destroy();
 	rdtgroup_destroy_root();
@@ -2990,6 +2998,12 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out;
 	}
 
+	/* Avoid races from pending operations from a previous mount */
+	if (atomic_read(&rdtgroup_default.waitcount) != 0) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	ret = setup_rmid_lru_list();
 	if (ret)
 		goto out;
@@ -4265,6 +4279,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
 
 	ctx->kfc.root = rdt_root;
 	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
+	rdtgroup_default.flags = 0;
 
 	return 0;
 }
-- 
2.54.0

From nobody Mon May 25 08:11:42 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: Borislav Petkov, x86@kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v2 4/5] fs/resctrl: Fix deadlock for errors during mount
Date: Fri, 15 May 2026 12:39:43 -0700
Message-ID: <20260515193944.15114-5-tony.luck@intel.com>
In-Reply-To: <20260515193944.15114-1-tony.luck@intel.com>
References: <20260515193944.15114-1-tony.luck@intel.com>

From: Reinette Chatre

Sashiko noticed[1] a deadlock in the resctrl mount code.

rdt_get_tree() acquires rdtgroup_mutex before calling kernfs_get_tree().
If superblock setup fails inside kernfs_get_tree(), the VFS calls
kill_sb on the same thread before the call returns.
rdt_kill_sb() unconditionally attempts to acquire rdtgroup_mutex, so the
thread deadlocks on a mutex it already holds.

Move the call to kernfs_get_tree() outside of the locks. Add a
resctrl_unmount() helper to keep the code consistent between the
rdt_get_tree() failure path and a normal unmount.

If kernfs_get_tree() fails and ctx->kfc.new_sb_created is set, then
rdt_kill_sb() has already been called and no further cleanup is needed.

Add an extra hold in this error path on rdtgroup_default.kn to defend
against other races destroying the root, which is then dereferenced in
kernfs_kill_sb().

Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
Co-developed-by: Tony Luck
Signed-off-by: Tony Luck
Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
---
 fs/resctrl/rdtgroup.c | 82 +++++++++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 27 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 97d1a3648b9e..282a0acedea8 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2978,10 +2978,34 @@ static void resctrl_fs_teardown(void)
 	rdtgroup_destroy_root();
 }
 
+static void resctrl_unmount(void)
+{
+	struct rdt_resource *r;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_disable_ctx();
+
+	/* Put everything back to default values. */
+	for_each_alloc_capable_rdt_resource(r)
+		resctrl_arch_reset_all_ctrls(r);
+
+	resctrl_fs_teardown();
+	if (resctrl_arch_alloc_capable())
+		resctrl_arch_disable_alloc();
+	if (resctrl_arch_mon_capable())
+		resctrl_arch_disable_mon();
+	resctrl_mounted = false;
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
 	unsigned long flags = RFTYPE_CTRL_BASE;
+	struct kernfs_node *rdt_root_kn;
 	struct rdt_l3_mon_domain *dom;
 	struct rdt_resource *r;
 	int ret;
@@ -3057,10 +3081,6 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (ret)
 		goto out_mondata;
 
-	ret = kernfs_get_tree(fc);
-	if (ret < 0)
-		goto out_psl;
-
 	if (resctrl_arch_alloc_capable())
 		resctrl_arch_enable_alloc();
 	if (resctrl_arch_mon_capable())
@@ -3076,10 +3096,37 @@ static int rdt_get_tree(struct fs_context *fc)
 						   RESCTRL_PICK_ANY_CPU);
 	}
 
-	goto out;
+	/*
+	 * Ensure root kn remains accessible after mutex is unlocked so that
+	 * kernfs_kill_sb() can run safely if called by kernfs_get_tree()'s
+	 * failure path after creating a superblock but before taking reference
+	 * on root kn.
+	 */
+	kernfs_get(rdtgroup_default.kn);
+
+	/*
+	 * Make backup of the current root kn being created to be used in kernfs_put().
+	 * The additional reference taken above will prevent the kn from being freed
+	 * before kernfs_kill_sb() can run but rdtgroup_default.kn may be set to NULL
+	 * via rdtgroup_destroy_root() and its backing root (rdt_root) could be overwritten
+	 * before kernfs_put() can run.
+	 */
+	rdt_root_kn = rdtgroup_default.kn;
+
+	rdt_last_cmd_clear();
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	ret = kernfs_get_tree(fc);
+	/*
+	 * resctrl can only be mounted once, new superblock only expected
+	 * to be created once.
+	 */
+	if (!ctx->kfc.new_sb_created)
+		resctrl_unmount();
+	kernfs_put(rdt_root_kn);
+	return ret;
 
-out_psl:
-	rdt_pseudo_lock_release();
 out_mondata:
 	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
@@ -3099,7 +3146,6 @@ static int rdt_get_tree(struct fs_context *fc)
 out_root:
 	rdtgroup_destroy_root();
 out:
-	rdt_last_cmd_clear();
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
 	return ret;
@@ -3186,26 +3232,8 @@ static int rdt_init_fs_context(struct fs_context *fc)
 
 static void rdt_kill_sb(struct super_block *sb)
 {
-	struct rdt_resource *r;
-
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
-
-	rdt_disable_ctx();
-
-	/* Put everything back to default values. */
-	for_each_alloc_capable_rdt_resource(r)
-		resctrl_arch_reset_all_ctrls(r);
-
-	resctrl_fs_teardown();
-	if (resctrl_arch_alloc_capable())
-		resctrl_arch_disable_alloc();
-	if (resctrl_arch_mon_capable())
-		resctrl_arch_disable_mon();
-	resctrl_mounted = false;
+	resctrl_unmount();
 	kernfs_kill_sb(sb);
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
 
 static struct file_system_type rdt_fs_type = {
-- 
2.54.0

From nobody Mon May 25 08:11:42 2026
From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: Borislav Petkov, x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev, Tony Luck
Subject: [PATCH v2 5/5] fs/resctrl: Fix issues with worker threads when CPUs are taken offline
Date: Fri, 15 May 2026 12:39:44 -0700
Message-ID: <20260515193944.15114-6-tony.luck@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260515193944.15114-1-tony.luck@intel.com>
References: <20260515193944.15114-1-tony.luck@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Reinette Chatre

Sashiko noticed[1] a use-after-free in the resctrl worker thread code
where the rdt_l3_mon_domain structure was freed while the worker was
blocked waiting for locks.

The root issue is that cancel_delayed_work() does not block in the case
where the worker thread is executing. This results in the race that Sashiko
noticed, but also causes problems when the CPU that has been chosen to
service the worker thread is taken offline.

Note that worker threads are allowed to delete their own work_struct (see
comment in kernel/workqueue.c:process_one_work()) so there can't be any
problems on the return path from the worker in this case where the
work_struct was deleted by other code while the worker was executing.

Indicate failure of cancel_delayed_work() calls in resctrl_offline_cpu() by
setting d->mbm_work_cpu or d->cqm_work_cpu to nr_cpu_ids.

Make the worker threads check to see if they are no longer bound to the
right CPU.
In this case search the L3 domain list for any domain(s) with the
work cpu set to nr_cpu_ids. In the case where the last CPU was removed from
a domain, the domain has been removed from the list and there is nothing to
do. If the domain still exists, then restart the worker on any of the
remaining CPUs.

Remove redundant cancel_delayed_work() calls from
resctrl_offline_mon_domain().

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Co-developed-by: Tony Luck
Signed-off-by: Tony Luck
Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
---
 fs/resctrl/monitor.c  | 55 +++++++++++++++++++++++++++++++++++++++++++
 fs/resctrl/rdtgroup.c | 27 +++++++++++++++------
 2 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..c422850f044b 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -791,12 +791,38 @@ static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
  */
 void cqm_handle_limbo(struct work_struct *work)
 {
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
 	struct rdt_l3_mon_domain *d;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
+	/*
+	 * Worker was blocked waiting for the CPU it was running on to go
+	 * offline. Handle two scenarios:
+	 * - Worker was running on the last CPU of a domain. The domain and
+	 *   thus the work_struct has been freed so do not attempt to obtain
+	 *   domain via container_of(). All remaining domains have limbo
+	 *   handlers so the loop will not find any domains needing a
+	 *   limbo handler. Just exit.
+	 * - Worker was running on CPU that just went offline with other
+	 *   CPUs in domain still running and available to take over the
+	 *   worker. Offline handler could not schedule a new worker on
+	 *   another CPU in the domain but signaled that this needs to be
+	 *   done by setting cqm_work_cpu to nr_cpu_ids. Find the domain
+	 *   that needs a worker and schedule it after the normal CQM
+	 *   interval.
+	 */
+	if (!is_percpu_thread()) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			if (d->cqm_work_cpu == nr_cpu_ids)
+				cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL,
+							RESCTRL_PICK_ANY_CPU);
+		}
+		goto out_unlock;
+	}
+
 	d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
 
 	__check_limbo(d, false);
@@ -808,6 +834,7 @@ void cqm_handle_limbo(struct work_struct *work)
 					 delay);
 	}
 
+out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
 }
@@ -852,6 +879,34 @@ void mbm_handle_overflow(struct work_struct *work)
 		goto out_unlock;
 
 	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+	/*
+	 * Worker was blocked waiting for the CPU it was running on to go
+	 * offline. Handle two scenarios:
+	 * - Worker was running on the last CPU of a domain. The domain and
+	 *   thus the work_struct has been freed so do not attempt to obtain
+	 *   domain via container_of(). All remaining domains have overflow
+	 *   handlers so the loop will not find any domains needing an
+	 *   overflow handler. Just exit.
+	 * - Worker was running on CPU that just went offline with other
+	 *   CPUs in domain still running and available to take over the
+	 *   worker. Offline handler could not schedule a new worker on
+	 *   another CPU in the domain but signaled that this needs to be
+	 *   done by setting mbm_work_cpu to nr_cpu_ids. Find the domain
+	 *   that needs a worker and schedule it to run after the normal
+	 *   MBM interval. This is completely safe on CPUs with wide MBM
+	 *   counters. Likely OK for old CPUs with narrow counters as the
+	 *   MBM_OVERFLOW_INTERVAL was picked conservatively.
+	 */
+	if (!is_percpu_thread()) {
+		list_for_each_entry(d, &r->mon_domains, hdr.list) {
+			if (d->mbm_work_cpu == nr_cpu_ids)
+				mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL,
+							   RESCTRL_PICK_ANY_CPU);
+		}
+		goto out_unlock;
+	}
+
 	d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 282a0acedea8..fd82fc78b058 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4376,8 +4376,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 		goto out_unlock;
 
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	if (resctrl_is_mbm_enabled())
-		cancel_delayed_work(&d->mbm_over);
+
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
 		/*
 		 * When a package is going down, forcefully
@@ -4388,7 +4387,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 		 * package never comes back.
 		 */
 		__check_limbo(d, true);
-		cancel_delayed_work(&d->cqm_limbo);
 	}
 
 	domain_destroy_l3_mon_state(d);
@@ -4569,13 +4567,28 @@ void resctrl_offline_cpu(unsigned int cpu)
 	d = get_mon_domain_from_cpu(cpu, l3);
 	if (d) {
 		if (resctrl_is_mbm_enabled() && cpu == d->mbm_work_cpu) {
-			cancel_delayed_work(&d->mbm_over);
-			mbm_setup_overflow_handler(d, 0, cpu);
+			if (cancel_delayed_work(&d->mbm_over)) {
+				mbm_setup_overflow_handler(d, 0, cpu);
+			} else {
+				/*
+				 * Unable to schedule work on new CPU if it
+				 * is currently running since the re-schedule
+				 * will just force new work to run on
+				 * current CPU. Mark domain's worker as
+				 * needing to be rescheduled to be handled
+				 * by worker itself.
+				 */
+				d->mbm_work_cpu = nr_cpu_ids;
+			}
 		}
 		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
 		    cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
-			cancel_delayed_work(&d->cqm_limbo);
-			cqm_setup_limbo_handler(d, 0, cpu);
+			if (cancel_delayed_work(&d->cqm_limbo)) {
+				cqm_setup_limbo_handler(d, 0, cpu);
+			} else {
+				/* Same as mbm_work_cpu case above */
+				d->cqm_work_cpu = nr_cpu_ids;
+			}
 		}
 	}
 
-- 
2.54.0