From nobody Sat Jun 13 13:31:36 2026
From: Reinette Chatre
To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com,
 babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de,
 dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org,
 fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev,
 reinette.chatre@intel.com
Subject: [PATCH v3 1/9] fs/resctrl: Move functions to avoid forward references in subsequent fixes
Date: Fri, 22 May 2026 12:15:05 -0700
Message-ID: <872066b8b05a0dc1cbc81d57ec7154f9566de755.1779476724.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tony Luck

rdt_get_tree()
manages resctrl fs mount and rdt_kill_sb() manages resctrl fs unmount.
There is significant overlap between error cleanup during resctrl mount
failure and cleanup on resctrl unmount, yet the cleanup is not done
consistently in these two flows.

Move some cleanup functions ahead of rdt_get_tree() in preparation for a
new helper that can be shared between mount and unmount.

Signed-off-by: Tony Luck
Signed-off-by: Reinette Chatre
Reviewed-by: Ben Horgan
---
Changes since V2:
- Rewrite changelog.
---
 fs/resctrl/rdtgroup.c | 376 +++++++++++++++++++++---------------------
 1 file changed, 188 insertions(+), 188 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index af2cbab14497..91922fe1ea08 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2792,6 +2792,194 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Move tasks from one to the other group. If @from is NULL, then all tasks
+ * in the systems are moved unconditionally (used for teardown).
+ *
+ * If @mask is not NULL the cpus on which moved tasks are running are set
+ * in that mask so the update smp function call is restricted to affected
+ * cpus.
+ */
+static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
+				 struct cpumask *mask)
+{
+	struct task_struct *p, *t;
+
+	read_lock(&tasklist_lock);
+	for_each_process_thread(p, t) {
+		if (!from || is_closid_match(t, from) ||
+		    is_rmid_match(t, from)) {
+			resctrl_arch_set_closid_rmid(t, to->closid,
+						     to->mon.rmid);
+
+			/*
+			 * Order the closid/rmid stores above before the loads
+			 * in task_curr(). This pairs with the full barrier
+			 * between the rq->curr update and
+			 * resctrl_arch_sched_in() during context switch.
+			 */
+			smp_mb();
+
+			/*
+			 * If the task is on a CPU, set the CPU in the mask.
+			 * The detection is inaccurate as tasks might move or
+			 * schedule before the smp function call takes place.
+			 * In such a case the function call is pointless, but
+			 * there is no other side effect.
+			 */
+			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
+				cpumask_set_cpu(task_cpu(t), mask);
+		}
+	}
+	read_unlock(&tasklist_lock);
+}
+
+static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
+{
+	struct rdtgroup *sentry, *stmp;
+	struct list_head *head;
+
+	head = &rdtgrp->mon.crdtgrp_list;
+	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+		rdtgroup_unassign_cntrs(sentry);
+		free_rmid(sentry->closid, sentry->mon.rmid);
+		list_del(&sentry->mon.crdtgrp_list);
+
+		if (atomic_read(&sentry->waitcount) != 0)
+			sentry->flags = RDT_DELETED;
+		else
+			rdtgroup_remove(sentry);
+	}
+}
+
+/*
+ * Forcibly remove all of subdirectories under root.
+ */
+static void rmdir_all_sub(void)
+{
+	struct rdtgroup *rdtgrp, *tmp;
+
+	/* Move all tasks to the default resource group */
+	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
+
+	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+		/* Free any child rmids */
+		free_all_child_rdtgrp(rdtgrp);
+
+		/* Remove each rdtgroup other than root */
+		if (rdtgrp == &rdtgroup_default)
+			continue;
+
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+			rdtgroup_pseudo_lock_remove(rdtgrp);
+
+		/*
+		 * Give any CPUs back to the default group. We cannot copy
+		 * cpu_online_mask because a CPU might have executed the
+		 * offline callback already, but is still marked online.
+		 */
+		cpumask_or(&rdtgroup_default.cpu_mask,
+			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+
+		rdtgroup_unassign_cntrs(rdtgrp);
+
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
+
+		kernfs_remove(rdtgrp->kn);
+		list_del(&rdtgrp->rdtgroup_list);
+
+		if (atomic_read(&rdtgrp->waitcount) != 0)
+			rdtgrp->flags = RDT_DELETED;
+		else
+			rdtgroup_remove(rdtgrp);
+	}
+	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
+	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
+
+	kernfs_remove(kn_info);
+	kernfs_remove(kn_mongrp);
+	kernfs_remove(kn_mondata);
+}
+
+/**
+ * mon_get_kn_priv() - Get the mon_data priv data for this event.
+ *
+ * The same values are used across the mon_data directories of all control and
+ * monitor groups for the same event in the same domain. Keep a list of
+ * allocated structures and re-use an existing one with the same values for
+ * @rid, @domid, etc.
+ *
+ * @rid:    The resource id for the event file being created.
+ * @domid:  The domain id for the event file being created.
+ * @mevt:   The type of event file being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ *	    when @rid == RDT_RESOURCE_L3.
+ *
+ * Return: Pointer to mon_data private data of the event, NULL on failure.
+ */
+static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
+					struct mon_evt *mevt,
+					bool do_sum)
+{
+	struct mon_data *priv;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
+		if (priv->rid == rid && priv->domid == domid &&
+		    priv->sum == do_sum && priv->evt == mevt)
+			return priv;
+	}
+
+	priv = kzalloc_obj(*priv);
+	if (!priv)
+		return NULL;
+
+	priv->rid = rid;
+	priv->domid = domid;
+	priv->sum = do_sum;
+	priv->evt = mevt;
+	list_add_tail(&priv->list, &mon_data_kn_priv_list);
+
+	return priv;
+}
+
+/**
+ * mon_put_kn_priv() - Free all allocated mon_data structures.
+ *
+ * Called when resctrl file system is unmounted.
+ */
+static void mon_put_kn_priv(void)
+{
+	struct mon_data *priv, *tmp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
+		list_del(&priv->list);
+		kfree(priv);
+	}
+}
+
+static void resctrl_fs_teardown(void)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/* Cleared by rdtgroup_destroy_root() */
+	if (!rdtgroup_default.kn)
+		return;
+
+	rmdir_all_sub();
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
+	mon_put_kn_priv();
+	rdt_pseudo_lock_release();
+	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	closid_exit();
+	schemata_list_destroy();
+	rdtgroup_destroy_root();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2991,194 +3179,6 @@ static int rdt_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
-/*
- * Move tasks from one to the other group. If @from is NULL, then all tasks
- * in the systems are moved unconditionally (used for teardown).
- *
- * If @mask is not NULL the cpus on which moved tasks are running are set
- * in that mask so the update smp function call is restricted to affected
- * cpus.
- */
-static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
-				 struct cpumask *mask)
-{
-	struct task_struct *p, *t;
-
-	read_lock(&tasklist_lock);
-	for_each_process_thread(p, t) {
-		if (!from || is_closid_match(t, from) ||
-		    is_rmid_match(t, from)) {
-			resctrl_arch_set_closid_rmid(t, to->closid,
-						     to->mon.rmid);
-
-			/*
-			 * Order the closid/rmid stores above before the loads
-			 * in task_curr(). This pairs with the full barrier
-			 * between the rq->curr update and
-			 * resctrl_arch_sched_in() during context switch.
-			 */
-			smp_mb();
-
-			/*
-			 * If the task is on a CPU, set the CPU in the mask.
-			 * The detection is inaccurate as tasks might move or
-			 * schedule before the smp function call takes place.
-			 * In such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
-				cpumask_set_cpu(task_cpu(t), mask);
-		}
-	}
-	read_unlock(&tasklist_lock);
-}
-
-static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
-{
-	struct rdtgroup *sentry, *stmp;
-	struct list_head *head;
-
-	head = &rdtgrp->mon.crdtgrp_list;
-	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
-		rdtgroup_unassign_cntrs(sentry);
-		free_rmid(sentry->closid, sentry->mon.rmid);
-		list_del(&sentry->mon.crdtgrp_list);
-
-		if (atomic_read(&sentry->waitcount) != 0)
-			sentry->flags = RDT_DELETED;
-		else
-			rdtgroup_remove(sentry);
-	}
-}
-
-/*
- * Forcibly remove all of subdirectories under root.
- */
-static void rmdir_all_sub(void)
-{
-	struct rdtgroup *rdtgrp, *tmp;
-
-	/* Move all tasks to the default resource group */
-	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
-
-	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
-		/* Free any child rmids */
-		free_all_child_rdtgrp(rdtgrp);
-
-		/* Remove each rdtgroup other than root */
-		if (rdtgrp == &rdtgroup_default)
-			continue;
-
-		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
-		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
-			rdtgroup_pseudo_lock_remove(rdtgrp);
-
-		/*
-		 * Give any CPUs back to the default group. We cannot copy
-		 * cpu_online_mask because a CPU might have executed the
-		 * offline callback already, but is still marked online.
-		 */
-		cpumask_or(&rdtgroup_default.cpu_mask,
-			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
-
-		rdtgroup_unassign_cntrs(rdtgrp);
-
-		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
-
-		kernfs_remove(rdtgrp->kn);
-		list_del(&rdtgrp->rdtgroup_list);
-
-		if (atomic_read(&rdtgrp->waitcount) != 0)
-			rdtgrp->flags = RDT_DELETED;
-		else
-			rdtgroup_remove(rdtgrp);
-	}
-	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
-	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
-
-	kernfs_remove(kn_info);
-	kernfs_remove(kn_mongrp);
-	kernfs_remove(kn_mondata);
-}
-
-/**
- * mon_get_kn_priv() - Get the mon_data priv data for this event.
- *
- * The same values are used across the mon_data directories of all control and
- * monitor groups for the same event in the same domain. Keep a list of
- * allocated structures and re-use an existing one with the same values for
- * @rid, @domid, etc.
- *
- * @rid:    The resource id for the event file being created.
- * @domid:  The domain id for the event file being created.
- * @mevt:   The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created. Only set
- *	    when @rid == RDT_RESOURCE_L3.
- *
- * Return: Pointer to mon_data private data of the event, NULL on failure.
- */
-static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
-					struct mon_evt *mevt,
-					bool do_sum)
-{
-	struct mon_data *priv;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
-		if (priv->rid == rid && priv->domid == domid &&
-		    priv->sum == do_sum && priv->evt == mevt)
-			return priv;
-	}
-
-	priv = kzalloc_obj(*priv);
-	if (!priv)
-		return NULL;
-
-	priv->rid = rid;
-	priv->domid = domid;
-	priv->sum = do_sum;
-	priv->evt = mevt;
-	list_add_tail(&priv->list, &mon_data_kn_priv_list);
-
-	return priv;
-}
-
-/**
- * mon_put_kn_priv() - Free all allocated mon_data structures.
- *
- * Called when resctrl file system is unmounted.
- */
-static void mon_put_kn_priv(void)
-{
-	struct mon_data *priv, *tmp;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
-		list_del(&priv->list);
-		kfree(priv);
-	}
-}
-
-static void resctrl_fs_teardown(void)
-{
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	/* Cleared by rdtgroup_destroy_root() */
-	if (!rdtgroup_default.kn)
-		return;
-
-	rmdir_all_sub();
-	rdtgroup_unassign_cntrs(&rdtgroup_default);
-	mon_put_kn_priv();
-	rdt_pseudo_lock_release();
-	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
-	closid_exit();
-	schemata_list_destroy();
-	rdtgroup_destroy_root();
-}
-
 static void rdt_kill_sb(struct super_block *sb)
 {
 	struct rdt_resource *r;
-- 
2.50.1

From nobody Sat Jun 13 13:31:36 2026
From: Reinette Chatre
To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com,
 babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de,
 dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org,
 fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev,
 reinette.chatre@intel.com
Subject: [PATCH v3 2/9] fs/resctrl: Free mon_data structures on rdt_get_tree() failure
Date: Fri, 22 May 2026 12:15:06 -0700
Message-ID: <9ef75c3c853c29e5051c8915901d43edff011a1c.1779476724.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tony Luck

If mkdir_mondata_all() or a subsequent call in rdt_get_tree() fails, the
mon_data structures allocated by mon_get_kn_priv() are leaked.

Add mon_put_kn_priv() to the out_mongrp error path to free the mon_data
structures.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing mon_data_bits")
Reported-by: Reinette Chatre
Signed-off-by: Tony Luck
Signed-off-by: Reinette Chatre
Reviewed-by: Ben Horgan
---
Changes since V2:
- Reword changelog.
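
Reviewer aid (not part of the patch): the leak described in the changelog can be illustrated with a minimal userspace sketch. All names here (priv_entry, priv_get(), priv_put_all(), mount_sketch()) are hypothetical stand-ins for mon_data, mon_get_kn_priv(), mon_put_kn_priv() and the rdt_get_tree() error path; the sketch assumes a plain singly linked list instead of the kernel list API and is not the resctrl implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mon_data: one cached entry per (rid, domid). */
struct priv_entry {
	int rid;
	int domid;
	struct priv_entry *next;
};

/* Hypothetical stand-in for mon_data_kn_priv_list. */
static struct priv_entry *priv_list;

/* Return the cached entry for (rid, domid), allocating it on first use. */
static struct priv_entry *priv_get(int rid, int domid)
{
	struct priv_entry *p;

	for (p = priv_list; p; p = p->next)
		if (p->rid == rid && p->domid == domid)
			return p;

	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;

	p->rid = rid;
	p->domid = domid;
	p->next = priv_list;
	priv_list = p;
	return p;
}

/* Free every cached entry. Safe to call when the list is empty. */
static void priv_put_all(void)
{
	while (priv_list) {
		struct priv_entry *p = priv_list;

		priv_list = p->next;
		free(p);
	}
}

/* Count cached entries, used only to observe the effect of cleanup. */
static int priv_count(void)
{
	struct priv_entry *p;
	int n = 0;

	for (p = priv_list; p; p = p->next)
		n++;
	return n;
}

/*
 * Hypothetical mount step: cache two entries, then optionally fail.
 * Without the priv_put_all() call under the error label, the entries
 * cached before the failure would stay allocated.
 */
static int mount_sketch(bool fail_late)
{
	int ret = 0;

	if (!priv_get(0, 0) || !priv_get(0, 1)) {
		ret = -1;
		goto out_priv;
	}

	if (fail_late) {
		ret = -1;
		goto out_priv;
	}

	return 0;

out_priv:
	priv_put_all();
	return ret;
}
```

Calling mount_sketch(true) leaves the list empty, while mount_sketch(false) leaves two cached entries that a later priv_put_all() releases, which mirrors freeing on both the failure path and unmount.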
---
 fs/resctrl/rdtgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 91922fe1ea08..f573db9e6e84 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3081,6 +3081,7 @@ static int rdt_get_tree(struct fs_context *fc)
 		kernfs_remove(kn_mondata);
 out_mongrp:
 	if (resctrl_arch_mon_capable()) {
+		mon_put_kn_priv();
 		rdtgroup_unassign_cntrs(&rdtgroup_default);
 		kernfs_remove(kn_mongrp);
 	}
-- 
2.50.1

From nobody Sat Jun 13 13:31:36 2026
From: Reinette Chatre
To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com,
 babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de,
 dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org,
 fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev,
 reinette.chatre@intel.com
Subject: [PATCH v3 3/9] fs/resctrl: Fix use-after-free during unmount
Date: Fri, 22 May 2026 12:15:07 -0700
Message-ID: <0a48bc36304da0fce4525672cca5f41c3d8ae678.1779476724.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tony Luck

During unmount or failure teardown, all mon_data structures that contain
monitoring event file private data are freed, after which the kernfs nodes
are removed. However, the RDT_DELETED flag is never set for the statically
allocated default resource group.

A concurrent reader of an event file associated with the default resource
group may, after dropping kernfs active protection, block on rdtgroup_mutex
while unmount proceeds to free the file private data and destroy the kernfs
node without waiting for the reader. When the mutex is released, the reader
wakes up, observes that RDT_DELETED is not set for the default group, and
dereferences the already-freed file private data.

Set RDT_DELETED for the default group unconditionally, since the flag does
not cause this statically allocated group to be freed.

Do not allow a new resctrl mount if there are any waiters on the default
group from a previous mount. A new mount re-initializes the default group,
which would then appear accessible to waiters from the previous mount and
cause them to access mon_data structures from the previous mount that have
already been removed.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing mon_data_bits")
Reported-by: Sashiko
Closes: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
Signed-off-by: Tony Luck
Signed-off-by: Reinette Chatre
Reviewed-by: Chen Yu
---
Changes since V2:
- Rewrite changelog to not describe code as much.
- Rework changelog to switch to "Reported-by/Closes".
- Merge the duplicate rdtgroup_remove() comment with the function comment.
- Fix changelog to not mention that RDT_DELETED flag is set conditionally.
- Change "Fixes:" tag to point to commit that introduced dynamically
  allocated mon_data this bug involves.
---
 fs/resctrl/rdtgroup.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index f573db9e6e84..8a1457825919 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -585,14 +585,20 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
  *
  * On resource group creation via a mkdir, an extra kernfs_node reference is
  * taken to ensure that the rdtgroup structure remains accessible for the
- * rdtgroup_kn_unlock() calls where it is removed.
+ * rdtgroup_kn_unlock() calls where it is removed. The default group is
+ * statically allocated: it does not have an extra reference but will have
+ * RDT_DELETED set on unmount to support safe access to its associated files
+ * via rdtgroup_kn_lock_live/rdtgroup_kn_unlock().
  *
- * Drop the extra reference here, then free the rdtgroup structure.
+ * For all but the default group: drop the extra reference, then free the
+ * rdtgroup structure.
  *
  * Return: void
  */
 static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 {
+	if (rdtgrp == &rdtgroup_default)
+		return;
 	kernfs_put(rdtgrp->kn);
 	kfree(rdtgrp);
 }
@@ -2975,6 +2981,7 @@ static void resctrl_fs_teardown(void)
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	rdtgroup_default.flags = RDT_DELETED;
 	closid_exit();
 	schemata_list_destroy();
 	rdtgroup_destroy_root();
@@ -3000,6 +3007,12 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out;
 	}
 
+	/* Avoid races from pending operations from a previous mount */
+	if (atomic_read(&rdtgroup_default.waitcount) != 0) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	ret = setup_rmid_lru_list();
 	if (ret)
 		goto out;
@@ -4275,6 +4288,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
 
 	ctx->kfc.root = rdt_root;
 	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
+	rdtgroup_default.flags = 0;
 
 	return 0;
 }
-- 
2.50.1

From nobody Sat Jun 13 13:31:36 2026
From: Reinette Chatre
To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com,
 babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de,
 dave.hansen@linux.intel.com
Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org,
 fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev,
 reinette.chatre@intel.com
Subject: [PATCH v3 4/9] fs/resctrl: Fix deadlock for errors during mount
Date: Fri, 22 May 2026 12:15:08 -0700
Message-ID: <2e44b91263f1ccc4403a969524594800bead95c4.1779476724.git.reinette.chatre@intel.com>
X-Mailer: git-send-email 2.50.1
X-Mailing-List: linux-kernel@vger.kernel.org

rdt_get_tree() acquires rdtgroup_mutex before calling kernfs_get_tree().
If superblock setup fails inside kernfs_get_tree(), the VFS calls kill_sb
on the same thread before the call returns. rdt_kill_sb() unconditionally
attempts to acquire rdtgroup_mutex and a deadlock occurs.

Move the call to kernfs_get_tree() outside of the locks. If
kernfs_get_tree() fails and ctx->kfc.new_sb_created is set, then
rdt_kill_sb() has already been called and no further cleanup is needed.

Add an extra hold in this error path on rdtgroup_default.kn to defend
against other races destroying the root, which is then dereferenced in
kernfs_kill_sb().

Add a resctrl_unmount() helper to keep the code consistent between the
rdt_get_tree() failure path and a normal unmount.
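
The cleanup decision described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: ctx_sketch, get_tree_sketch(), unmount_sketch() and mount_flow_sketch() are hypothetical names, a boolean stands in for rdtgroup_mutex, and the callback invocation is simulated rather than driven by the VFS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fs context flag named in the changelog. */
struct ctx_sketch {
	bool new_sb_created;
};

static bool lock_held;		/* models ownership of rdtgroup_mutex */
static int teardown_calls;	/* number of times teardown has run */

/* Models the shared unmount helper: take the lock, tear down, drop it. */
static void unmount_sketch(void)
{
	assert(!lock_held);	/* taking an already held lock would deadlock */
	lock_held = true;
	teardown_calls++;
	lock_held = false;
}

enum tree_result { TREE_OK, TREE_FAIL_BEFORE_SB, TREE_FAIL_AFTER_SB };

/*
 * Models the get_tree step. A failure after the superblock exists runs
 * the kill_sb path (here unmount_sketch()) on the calling thread before
 * returning the error.
 */
static int get_tree_sketch(struct ctx_sketch *ctx, enum tree_result how)
{
	if (how == TREE_FAIL_BEFORE_SB)
		return -1;

	ctx->new_sb_created = true;
	if (how == TREE_FAIL_AFTER_SB) {
		unmount_sketch();
		return -1;
	}
	return 0;
}

/* Models the reordered mount flow: the lock is dropped before get_tree. */
static int mount_flow_sketch(enum tree_result how)
{
	struct ctx_sketch ctx = { .new_sb_created = false };
	int ret;

	lock_held = true;	/* setup runs under the lock */
	lock_held = false;	/* dropped before calling get_tree */

	ret = get_tree_sketch(&ctx, how);
	/* No superblock was created, so kill_sb never ran: clean up here. */
	if (!ctx.new_sb_created)
		unmount_sketch();
	return ret;
}
```

In this model teardown runs exactly once for each failing mount and never for a successful one. Keeping lock_held set across the get_tree_sketch() call would trip the assertion in unmount_sketch(), which corresponds to the deadlock.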
Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support") Reported-by: Sashiko Closes: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40i= ntel.com [1] Co-developed-by: Tony Luck Signed-off-by: Tony Luck Signed-off-by: Reinette Chatre Reviewed-by: Ben Horgan Reviewed-by: Chen Yu --- Changes since V2: - Switch to "Reported-by/Closes" in changelog --- fs/resctrl/rdtgroup.c | 82 +++++++++++++++++++++++++++++-------------- 1 file changed, 55 insertions(+), 27 deletions(-) diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c index 8a1457825919..d323b515060b 100644 --- a/fs/resctrl/rdtgroup.c +++ b/fs/resctrl/rdtgroup.c @@ -2987,10 +2987,34 @@ static void resctrl_fs_teardown(void) rdtgroup_destroy_root(); } =20 +static void resctrl_unmount(void) +{ + struct rdt_resource *r; + + cpus_read_lock(); + mutex_lock(&rdtgroup_mutex); + + rdt_disable_ctx(); + + /* Put everything back to default values. */ + for_each_alloc_capable_rdt_resource(r) + resctrl_arch_reset_all_ctrls(r); + + resctrl_fs_teardown(); + if (resctrl_arch_alloc_capable()) + resctrl_arch_disable_alloc(); + if (resctrl_arch_mon_capable()) + resctrl_arch_disable_mon(); + resctrl_mounted =3D false; + mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); +} + static int rdt_get_tree(struct fs_context *fc) { struct rdt_fs_context *ctx =3D rdt_fc2context(fc); unsigned long flags =3D RFTYPE_CTRL_BASE; + struct kernfs_node *rdt_root_kn; struct rdt_l3_mon_domain *dom; struct rdt_resource *r; int ret; @@ -3066,10 +3090,6 @@ static int rdt_get_tree(struct fs_context *fc) if (ret) goto out_mondata; =20 - ret =3D kernfs_get_tree(fc); - if (ret < 0) - goto out_psl; - if (resctrl_arch_alloc_capable()) resctrl_arch_enable_alloc(); if (resctrl_arch_mon_capable()) @@ -3085,10 +3105,37 @@ static int rdt_get_tree(struct fs_context *fc) RESCTRL_PICK_ANY_CPU); } =20 - goto out; + /* + * Ensure root kn remains accessible after mutex is unlocked so that + * kernfs_kill_sb() can run safely if 
called by kernfs_get_tree()'s + * failure path after creating a superblock but before taking reference + * on root kn. + */ + kernfs_get(rdtgroup_default.kn); + + /* + * Make backup of the current root kn being created to be used in kernfs_= put(). + * The additional reference taken above will prevent the kn from being fr= eed + * before kernfs_kill_sb() can run but rdtgroup_default.kn may be set to = NULL + * via rdtgroup_destroy_root() and its backing root (rdt_root) could be o= verwritten + * before kernfs_put() can run. + */ + rdt_root_kn =3D rdtgroup_default.kn; + + rdt_last_cmd_clear(); + mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); + + ret =3D kernfs_get_tree(fc); + /* + * resctrl can only be mounted once, new superblock only expected + * to be created once. + */ + if (!ctx->kfc.new_sb_created) + resctrl_unmount(); + kernfs_put(rdt_root_kn); + return ret; =20 -out_psl: - rdt_pseudo_lock_release(); out_mondata: if (resctrl_arch_mon_capable()) kernfs_remove(kn_mondata); @@ -3108,7 +3155,6 @@ static int rdt_get_tree(struct fs_context *fc) out_root: rdtgroup_destroy_root(); out: - rdt_last_cmd_clear(); mutex_unlock(&rdtgroup_mutex); cpus_read_unlock(); return ret; @@ -3195,26 +3241,8 @@ static int rdt_init_fs_context(struct fs_context *fc) =20 static void rdt_kill_sb(struct super_block *sb) { - struct rdt_resource *r; - - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); - - rdt_disable_ctx(); - - /* Put everything back to default values. 
*/ - for_each_alloc_capable_rdt_resource(r) - resctrl_arch_reset_all_ctrls(r); - - resctrl_fs_teardown(); - if (resctrl_arch_alloc_capable()) - resctrl_arch_disable_alloc(); - if (resctrl_arch_mon_capable()) - resctrl_arch_disable_mon(); - resctrl_mounted =3D false; + resctrl_unmount(); kernfs_kill_sb(sb); - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); } =20 static struct file_system_type rdt_fs_type =3D { --=20 2.50.1 From nobody Sat Jun 13 13:31:36 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90A9337BE80 for ; Fri, 22 May 2026 19:15:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477330; cv=none; b=Pxz8kDi0EgopSCycggQdiTjkQm8G+QWjRRqZ6m8MQsjHNKEak0acgbCASiJ7WK6wM13vHbnEB8/at0dT2DJcHaHKzxVrXM11DnnrqClIMI49nxKfY8hcXeUAgdDzS7ELYZr8fb0hfln3nd8tm3c+JI6h1IxAdJUOW/KlHHR3+QE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477330; c=relaxed/simple; bh=ew9Innih+s619jXBAPyHY9uGpf+Yd02UdujYYwkihAM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=q9rr05FSat3CzrN80d5IYm7g2gEkoheEwYZMfWBPqD1KoxU7FTm2iAHZmZAbkradxpPb1ozb667UTOogOxKT/BoG+u+r1s/tr20E0stCOi5EmOtzyoEPeG1IC3t1NJZodJ4dL0/WebR8TxD944VWs6WBLlXu/FaSEGPVwUQ814c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=I1eU63OB; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="I1eU63OB" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779477328; x=1811013328; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ew9Innih+s619jXBAPyHY9uGpf+Yd02UdujYYwkihAM=; b=I1eU63OB5YRM5johREKO5x1QwCyg/xGSfMiEFaw6xlBro8Qh7ifxbraf fhSOLyH+49LZ/JpMyT4vHRud9xQctEfOv4N2Bh3zAP1YCzv9/cyz88P4i Q1EzHBnMhzPrxeb/acAPPjbnfgyjSS/a3RxipnK8kZcNZxjeXVsjrNwRC CVi+1lRyAkktjmswj4ALBxdf3abFJ0lvJc7lcouJDV9JURAdMkVhGBLcP kaZ9YCU00RlAs+dG6CKZLYx5WOStLP0ufnqv9vpsj62EalQ+Oqm5CiwU0 kgapauZNEdqaq8x7m1wlLYkKSMyDa2XC9pikZf6lwby9sKIoJ0gpiFoE9 g==; X-CSE-ConnectionGUID: 4l9Yno1iT/SVf8Y3BB1oaA== X-CSE-MsgGUID: FKmwi6AXT4i92TtTTQS+rw== X-IronPort-AV: E=McAfee;i="6800,10657,11794"; a="80140469" X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="80140469" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 X-CSE-ConnectionGUID: 61Rmk+E5QMmOvH4ltkKeJA== X-CSE-MsgGUID: oVDU+U/uThyVX7sY/HeWvA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="271336471" Received: from rchatre-desk1.jf.intel.com ([10.165.154.99]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 From: Reinette Chatre To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com, babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de, dave.hansen@linux.intel.com Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org, fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com, linux-kernel@vger.kernel.org, patches@lists.linux.dev, reinette.chatre@intel.com Subject: [PATCH v3 5/9] fs/resctrl: Prevent use-after-free in rdtgroup_kn_put() Date: Fri, 22 May 2026 12:15:09 -0700 Message-ID: X-Mailer: 
git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A struct rdtgroup is reference counted via rdtgroup::waitcount. Callers that need the structure to remain valid across a sleep (while waiting on acquiring rdtgroup_mutex) take a reference with rdtgroup_kn_get() and release it with rdtgroup_kn_put(). The release path is intended to serve as the fallback freer: if the count drops to zero and the group has already been marked RDT_DELETED, rdtgroup_kn_put() frees the structure. The bulk teardown paths free_all_child_rdtgrp() and rmdir_all_sub() resulting from a resctrl directory remove or resctrl fs unmount act as the primary freer: they hold rdtgroup_mutex and free each rdtgroup whose waitcount is zero, otherwise they set RDT_DELETED and leave the freeing to the last waiter. These two freers race. rdtgroup_kn_put() commits waitcount =3D=3D 0 with atomic_dec_and_test() outside rdtgroup_mutex, then reads rdtgroup::flags. Between those two operations a concurrent caller of free_all_child_rdtgrp() or rmdir_all_sub() (which holds the mutex) can observe waitcount =3D=3D 0 v= ia atomic_read(), call rdtgroup_remove(), and kfree() the structure. The subsequent read of rdtgroup::flags in rdtgroup_kn_put() is then a use-after-free, and the structure may even be freed twice if the freed memory happens to satisfy the RDT_DELETED flag check. Replace the bare atomic_dec_and_test() with atomic_dec_and_mutex_lock() so that the decrement-to-zero takes rdtgroup_mutex before the count becomes globally visible. The inspection of rdtgroup::flags then runs under the same mutex held by the bulk freers, making the two paths mutually exclusive. The common case where the count does not reach zero remains lock-free. 
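The decrement-then-lock idea above can be sketched in userspace. This is a minimal illustration, assuming C11 atomics and pthreads in place of the kernel's atomic_t and struct mutex; the demo_* names are hypothetical and the helper only approximates the semantics of atomic_dec_and_mutex_lock().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names come from the patch. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

#define DEMO_DELETED 1

struct demo_group {
	atomic_int waitcount;
	int flags;
};

/*
 * Decrement *cnt. Return true with the mutex held if the count reached
 * zero, false with the mutex not held otherwise.
 */
static bool demo_dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: lock-free decrement while the result stays above zero. */
	while (old > 1) {
		/* On failure 'old' is reloaded and the loop condition rechecked. */
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}

	/* Slow path: the count may reach zero, so decrement under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}

/* Returns true when the caller dropped the last reference and must free. */
static bool demo_put(struct demo_group *g)
{
	bool needs_free;

	if (!demo_dec_and_mutex_lock(&g->waitcount, &demo_mutex))
		return false;

	/* flags is read under the same mutex that a bulk freer would hold. */
	needs_free = g->flags & DEMO_DELETED;
	pthread_mutex_unlock(&demo_mutex);

	return needs_free;
}
```

Because the transition to zero happens only with demo_mutex held, a bulk freer holding the same mutex either sees a non-zero count and defers, or sees zero after the last waiter has finished reading flags.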
Defer kernfs_unbreak_active_protection() until after the mutex is dropped since kernfs active protections functionally wrap rdtgroup_mutex. Remove resource group, which in turn drops its kernfs reference, after kernfs protection is restored. Fixes: b8511ccc75c0 ("x86/resctrl: Fix use-after-free when deleting resource groups") Reported-by: Sashiko Closes: https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40intel.com?part=3D1 Assisted-by: GitHub_Copilot:gemini-3.1-pro Signed-off-by: Reinette Chatre Reviewed-by: Ben Horgan --- Changes since V2: - New patch --- fs/resctrl/rdtgroup.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c index d323b515060b..6418395877cf 100644 --- a/fs/resctrl/rdtgroup.c +++ b/fs/resctrl/rdtgroup.c @@ -2606,15 +2606,24 @@ static void rdtgroup_kn_get(struct rdtgroup *rdtgrp, struct kernfs_node *kn) =20 static void rdtgroup_kn_put(struct rdtgroup *rdtgrp, struct kernfs_node *kn) { - if (atomic_dec_and_test(&rdtgrp->waitcount) && - (rdtgrp->flags & RDT_DELETED)) { + bool needs_free; + + if (!atomic_dec_and_mutex_lock(&rdtgrp->waitcount, &rdtgroup_mutex)) { + kernfs_unbreak_active_protection(kn); + return; + } + + needs_free =3D rdtgrp->flags & RDT_DELETED; + + mutex_unlock(&rdtgroup_mutex); + + kernfs_unbreak_active_protection(kn); + + if (needs_free) { if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP || rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED) rdtgroup_pseudo_lock_remove(rdtgrp); - kernfs_unbreak_active_protection(kn); rdtgroup_remove(rdtgrp); - } else { - kernfs_unbreak_active_protection(kn); } } =20 --=20 2.50.1 From nobody Sat Jun 13 13:31:36 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB0DA37C101 for ; Fri, 22 May 2026 19:15:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477331; cv=none; b=fVGwl9OAZGYCCl1PT8JQWgWzZSvqv91Yc6DtWjxF9rtEpTc/ybtGKmWJWW83Po3f3uoLAZxWucukMJHONoYHOjLUWszKACnwVJsCO6uz8rH0jTohXuQ1hnn3MgVLCEnhZYr0My+PJ7x+mTy4mOEa37G8UT1DsUOAiIGfiCofSXI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477331; c=relaxed/simple; bh=DejNb5y7kc25szJUthqObC//f69mjqcatZ0NbQJD3ck=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=O6L/ADk54x1N5yurhjpOa48B57atGaemv637avi5srKILRFOfoVuPwcMpS+Ua6lhiFpFxDST12q3Xfh6VxPHMKV3B0vK9UJ7tMqO9unc4s6ykUQW4yLVK70PNNkm01Kfw1RoKnRbsODxdZjkQw/f3bAWaMPUwBBquU2g5QNOdV8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AxtOHCkF; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AxtOHCkF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779477329; x=1811013329; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DejNb5y7kc25szJUthqObC//f69mjqcatZ0NbQJD3ck=; b=AxtOHCkFjGs1Ye5JHN5FapgmPLNCVh5dSKn3Znc6OI/ahkYmRpbB3Rza 4TEkrw0W5AthsqiiCnTV1qj53TxhHJkBS5MsO9zXc73VA/11QPDQUW9wl sTn3ezUGMwiRXNrESIJ1AA/A+8lqBAjBkb5yXeQuIcy5OfNDdGR/mai0v oVgUHT25WIAZmthDopwgHXIzsHnqEsqb/l/VlpOr3Tmt+JT+7bvJqsHKy JctsdE6IZovO+v6wm7iyVrZZu7Rr2yBiUNpXOen8R27YQnnJwjUeGRnyf JiBW+RaCbHuKm/E/K1sIbS7Xmlq838LaW+Veq+4n4Zv+ql1/Pr2PigIMW g==; 
X-CSE-ConnectionGUID: vFhTBS92RuqEjjxjCaeF8Q== X-CSE-MsgGUID: ECnRFs2xRTibZ71GSd7+/Q== X-IronPort-AV: E=McAfee;i="6800,10657,11794"; a="80140479" X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="80140479" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 X-CSE-ConnectionGUID: HZOzsb/eRYGNWXUIAAO50g== X-CSE-MsgGUID: yXuMAaFATK2UBhutWHG4Og== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="271336474" Received: from rchatre-desk1.jf.intel.com ([10.165.154.99]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 From: Reinette Chatre To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com, babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de, dave.hansen@linux.intel.com Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org, fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com, linux-kernel@vger.kernel.org, patches@lists.linux.dev, reinette.chatre@intel.com Subject: [PATCH v3 6/9] fs/resctrl: Fix pseudo-locking lifetime handling Date: Fri, 22 May 2026 12:15:10 -0700 Message-ID: X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" rmdir_all_sub() implements resctrl unmount teardown by iterating through all resource groups. Memory cleanup is coordinated by deferring resource group removal to rdtgroup_kn_put() if there are lingering waiters. Several issues exist in the pseudo-locking lifetime management: 1) rmdir_all_sub() unconditionally invokes rdtgroup_pseudo_lock_remove() on pseudo-locked groups, even if active references remain.
When the deferred rdtgroup_kn_put() is finally executed by the last waiter, it erroneously calls rdtgroup_pseudo_lock_remove() a second time, resulting in a NULL-pointer dereference as the pseudo-lock structures have already been freed. 2) pseudo_lock_dev_release() drops the waitcount reference but does not check whether the resource group has been flagged for deletion. If resctrl is unmounted while a pseudo-lock device file is open, rmdir_all_sub() will observe the active waitcount and defer the deletion to the last reference holder. When the device file descriptor is subsequently closed, pseudo_lock_dev_release() decrements the waitcount without inspecting or acting on the RDT_DELETED flag. Neither the pseudo-lock memory cleanup nor the final resource group removal is executed. 3) rdtgroup_pseudo_lock_remove() calls debugfs_remove_recursive(), which waits for any existing debugfs users to complete before it can do cleanup. Since the pseudo-locking debugfs handlers take rdtgroup_mutex, debugfs_remove_recursive() must not be called with rdtgroup_mutex held, yet rmdir_all_sub() does so. 4) A pseudo-locked group's RMID is freed when the group is created. On unmount rmdir_all_sub() unconditionally frees the RMID of every group, resulting in a double-free of the pseudo-locked group's RMID. An additional consequence is that the original free results in the pseudo-locked group's RMID being added to the rmid_free_lru linked list and the second free then attempts to add the same RMID entry to the rmid_free_lru again. Fix pseudo-locking lifetime handling by separating the pseudo-locking infrastructure removal from the pseudo-locking region memory teardown, and splitting pseudo-locking infrastructure removal into what needs rdtgroup_mutex protection and what does not.
With this separation an unmount of resctrl fs can proceed to remove the pseudo-locking infrastructure of a pseudo-lock= ed region since the global infrastructure it depends on will be removed soon after (resctrl_unmount()-> resctrl_fs_teardown()->rdt_pseudo_lock_release()= ). Any active users of the pseudo-locked region via an open file can complete safely afterwards and only need to do the pseudo-locked region specific memory teardown. The new RDT_DELETED_PLR resource group flag communicates to waiters whether the pseudo-locked region's infrastructure has already been removed. Fixes: 746e08590b86 ("x86/intel_rdt: Create character device exposing pseud= o-locked region") Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-loc= ked region") Reported-by: Sashiko Closes: https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40i= ntel.com?part=3D3 Closes: https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40i= ntel.com?part=3D1 Assisted-by: GitHub_Copilot:gemini-3.1-pro Signed-off-by: Reinette Chatre --- Changes since V2: - New patch --- fs/resctrl/internal.h | 12 +++++++++++ fs/resctrl/pseudo_lock.c | 44 +++++++++++++++++++++++++++++++++------- fs/resctrl/rdtgroup.c | 35 ++++++++++++++++++-------------- 3 files changed, 69 insertions(+), 22 deletions(-) diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h index 48af75b9dc85..e7e415ee7766 100644 --- a/fs/resctrl/internal.h +++ b/fs/resctrl/internal.h @@ -234,6 +234,15 @@ struct rdtgroup { =20 /* rdtgroup.flags */ #define RDT_DELETED 1 +/* + * RDT_DELETED_PLR is set when the pseudo-locked group's infrastructure + * (its associated device, debugfs files, etc.) has been deleted via + * rdtgroup_pseudo_lock_remove(). This can be done while there are + * references to the pseudo-locked region since the pseudo-locked region + * self is freed separately via pseudo_lock_free() after there are no more + * references. 
+ */ +#define RDT_DELETED_PLR 2 =20 /* rftype.flags */ #define RFTYPE_FLAGS_CPUS_LIST 1 @@ -334,6 +343,7 @@ void rdt_last_cmd_puts(const char *s); __printf(1, 2) void rdt_last_cmd_printf(const char *fmt, ...); =20 +void rdtgroup_remove(struct rdtgroup *rdtgrp); struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn); =20 void rdtgroup_kn_unlock(struct kernfs_node *kn); @@ -484,6 +494,7 @@ void rdt_pseudo_lock_release(void); int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp); =20 void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp); +void pseudo_lock_free(struct rdtgroup *rdtgrp); =20 #else static inline int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp) @@ -514,6 +525,7 @@ static inline int rdtgroup_pseudo_lock_create(struct rd= tgroup *rdtgrp) } =20 static inline void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp) { } +static inline void pseudo_lock_free(struct rdtgroup *rdtgrp) { } #endif /* CONFIG_RESCTRL_FS_PSEUDO_LOCK */ =20 #endif /* _FS_RESCTRL_INTERNAL_H */ diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c index d1cb0986006e..f9d180eb699e 100644 --- a/fs/resctrl/pseudo_lock.c +++ b/fs/resctrl/pseudo_lock.c @@ -333,8 +333,10 @@ static int pseudo_lock_region_alloc(struct pseudo_lock= _region *plr) * * Return: void */ -static void pseudo_lock_free(struct rdtgroup *rdtgrp) +void pseudo_lock_free(struct rdtgroup *rdtgrp) { + if (!rdtgrp->plr) + return; pseudo_lock_region_clear(rdtgrp->plr); kfree(rdtgrp->plr); rdtgrp->plr =3D NULL; @@ -928,22 +930,37 @@ void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdt= grp) { struct pseudo_lock_region *plr =3D rdtgrp->plr; =20 + lockdep_assert_held(&rdtgroup_mutex); + + if (rdtgrp->mode !=3D RDT_MODE_PSEUDO_LOCKSETUP && + rdtgrp->mode !=3D RDT_MODE_PSEUDO_LOCKED) + return; + + if (rdtgrp->flags & RDT_DELETED_PLR) + return; + + rdtgrp->flags |=3D RDT_DELETED_PLR; + if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP) { /* * Default group cannot be a pseudo-locked region so we can * 
free closid here. */ closid_free(rdtgrp->closid); - goto free; + return; } =20 pseudo_lock_cstates_relax(plr); - debugfs_remove_recursive(rdtgrp->plr->debugfs_dir); + /* + * Drop rdtgroup_mutex to enable debugfs_remove_recursive() to + * complete as it waits for active users that may be blocked + * waiting on rdtgroup_mutex to complete. + */ + mutex_unlock(&rdtgroup_mutex); + debugfs_remove_recursive(plr->debugfs_dir); + mutex_lock(&rdtgroup_mutex); device_destroy(&pseudo_lock_class, MKDEV(pseudo_lock_major, plr->minor)); pseudo_lock_minor_release(plr->minor); - -free: - pseudo_lock_free(rdtgrp); } =20 static int pseudo_lock_dev_open(struct inode *inode, struct file *filp) @@ -971,6 +988,7 @@ static int pseudo_lock_dev_open(struct inode *inode, st= ruct file *filp) static int pseudo_lock_dev_release(struct inode *inode, struct file *filp) { struct rdtgroup *rdtgrp; + bool needs_free =3D false; =20 mutex_lock(&rdtgroup_mutex); rdtgrp =3D filp->private_data; @@ -980,8 +998,20 @@ static int pseudo_lock_dev_release(struct inode *inode= , struct file *filp) return -ENODEV; } filp->private_data =3D NULL; - atomic_dec(&rdtgrp->waitcount); + + if (atomic_dec_and_test(&rdtgrp->waitcount) && + (rdtgrp->flags & RDT_DELETED)) { + needs_free =3D true; + rdtgroup_pseudo_lock_remove(rdtgrp); + } + mutex_unlock(&rdtgroup_mutex); + + if (needs_free) { + pseudo_lock_free(rdtgrp); + rdtgroup_remove(rdtgrp); + } + return 0; } =20 diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c index 6418395877cf..a8b4ac7dd823 100644 --- a/fs/resctrl/rdtgroup.c +++ b/fs/resctrl/rdtgroup.c @@ -595,7 +595,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_f= ile *of, * * Return: void */ -static void rdtgroup_remove(struct rdtgroup *rdtgrp) +void rdtgroup_remove(struct rdtgroup *rdtgrp) { if (rdtgrp =3D=3D &rdtgroup_default) return; @@ -2606,23 +2606,24 @@ static void rdtgroup_kn_get(struct rdtgroup *rdtgrp= , struct kernfs_node *kn) =20 static void rdtgroup_kn_put(struct rdtgroup 
*rdtgrp, struct kernfs_node *k= n) { - bool needs_free; + bool needs_free =3D false; =20 if (!atomic_dec_and_mutex_lock(&rdtgrp->waitcount, &rdtgroup_mutex)) { kernfs_unbreak_active_protection(kn); return; } =20 - needs_free =3D rdtgrp->flags & RDT_DELETED; + if (rdtgrp->flags & RDT_DELETED) { + needs_free =3D true; + rdtgroup_pseudo_lock_remove(rdtgrp); + } =20 mutex_unlock(&rdtgroup_mutex); =20 kernfs_unbreak_active_protection(kn); =20 if (needs_free) { - if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP || - rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED) - rdtgroup_pseudo_lock_remove(rdtgrp); + pseudo_lock_free(rdtgrp); rdtgroup_remove(rdtgrp); } } @@ -2885,10 +2886,6 @@ static void rmdir_all_sub(void) if (rdtgrp =3D=3D &rdtgroup_default) continue; =20 - if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP || - rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED) - rdtgroup_pseudo_lock_remove(rdtgrp); - /* * Give any CPUs back to the default group. We cannot copy * cpu_online_mask because a CPU might have executed the @@ -2899,15 +2896,23 @@ static void rmdir_all_sub(void) =20 rdtgroup_unassign_cntrs(rdtgrp); =20 - free_rmid(rdtgrp->closid, rdtgrp->mon.rmid); - kernfs_remove(rdtgrp->kn); list_del(&rdtgrp->rdtgroup_list); =20 - if (atomic_read(&rdtgrp->waitcount) !=3D 0) - rdtgrp->flags =3D RDT_DELETED; - else + if (rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKSETUP || + rdtgrp->mode =3D=3D RDT_MODE_PSEUDO_LOCKED) { + rdtgroup_pseudo_lock_remove(rdtgrp); + } else { + /* Pseudo-locked group's RMID is freed during setup. 
*/ + free_rmid(rdtgrp->closid, rdtgrp->mon.rmid); + } + + if (atomic_read(&rdtgrp->waitcount) !=3D 0) { + rdtgrp->flags |=3D RDT_DELETED; + } else { + pseudo_lock_free(rdtgrp); rdtgroup_remove(rdtgrp); + } } /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */ update_closid_rmid(cpu_online_mask, &rdtgroup_default); --=20 2.50.1 From nobody Sat Jun 13 13:31:36 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18A1536B05E for ; Fri, 22 May 2026 19:15:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477331; cv=none; b=hYBo8OrsLLl7DB86mM6kHQrWsH4GoAhx06VyNrpAcztTG+HgPHIFNQZkhpe9SBMnasEce54d+c8YGOBKtsadOF6FccyV+moI56wGXeBFz7SDeZQImB/XsIrZSvRLXdGFqAaOPci8+3ICNjQMqKiXjWoc4ZvbDEZS61CY7cpdHY4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477331; c=relaxed/simple; bh=K1I8MxIbzIiAAPzk/lJsJPaSkWTpswYCOD+0zt+b7Ek=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=k33crgd/kCneENVJ6kDHmG+Wr0fKQQbhu/gwrHBo2vmO/GmIcLAlgCYco7TcCKSGY3z4NZgNZxuJmdyEb+D6K8gO5W/NB4XR8yBmDcaWC08NYcATsajF0xlYp3xC4FyReVAP4AsDZxT8iYNGZ+Ttq1WrYk0Z9QiHYGPf5XZ7SY8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=PqAgJ1+s; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b="PqAgJ1+s" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779477329; x=1811013329; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=K1I8MxIbzIiAAPzk/lJsJPaSkWTpswYCOD+0zt+b7Ek=; b=PqAgJ1+szHvDvNDPvotaOy63dwdSRUOYZHs1HuJ2PURhiXWspdvWaGGH n/OBKGx08AyNkazan8l/auG5vMrny0/WOSb1xCugvwYg6YznvJYWTJAWS +yk3njXYCIyJMnh5itsnhk1no8DZYYspEamw7IlBIBkkUhBTbPWQ3Iu6F mBdoi5hpWHpTIoQbplBO4H7geSMtAGrpFKmoXH8zsUo88AF+KHQlv1Zae 9B2dRZphFYJStCIWGAThsjzvVlZDZlmbU7irLY0Jngcy8p+isOVd31z+U RhIJlsRPvloHZeklDTmAPXKYqqScKhifpANeZg0NYMvlcPBxu/3ZfayFE A==; X-CSE-ConnectionGUID: pUTK6WXxSyyPwHsVovkF6A== X-CSE-MsgGUID: 4z+L6jgDTbq9pr/oFRRcgw== X-IronPort-AV: E=McAfee;i="6800,10657,11794"; a="80140489" X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="80140489" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 X-CSE-ConnectionGUID: f4Nj/o7fRq2xCfUKXEs9GQ== X-CSE-MsgGUID: ygJDEnrOSY23qpP5Oja3TQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="271336478" Received: from rchatre-desk1.jf.intel.com ([10.165.154.99]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 From: Reinette Chatre To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com, babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de, dave.hansen@linux.intel.com Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org, fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com, linux-kernel@vger.kernel.org, patches@lists.linux.dev, reinette.chatre@intel.com Subject: [PATCH v3 7/9] fs/resctrl: Prevent deadlock and use-after-free in info file handlers Date: Fri, 22 May 2026 12:15:11 -0700 Message-ID: <04548591d62d3ae3e9937814d4b3926f8d0424c9.1779476724.git.reinette.chatre@intel.com> X-Mailer: 
git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" resctrl provides files under the info/ directory to expose global configuration and capabilities to userspace. These files are instantiated statically during filesystem mount and expose data associated with internal schema structures via kernfs private pointers. A potential deadlock exists between userspace readers of these info files and the unmount filesystem teardown process. Reading an info file invokes kernfs which acquires an active reference, after which the handler typically attempts to acquire the rdtgroup_mutex. Concurrently, unmounting the filesystem holds the rdtgroup_mutex and then attempts to recursively remove the info kernfs nodes, which involves kernfs_drain() blocking until all active references are released. Another problem exists where info files might be accessed from an outdated mount if the filesystem is unmounted and remounted during a reader's execution, leading to a use-after-free when reading the now-deleted private schema data. Introduce info_kn_lock() and info_kn_unlock() helpers to coordinate locking across all info handlers. These helpers mirror similar logic used by resource group handlers by deliberately breaking the kernfs active protection before attempting to acquire the rdtgroup_mutex, preventing the deadlock. To guard against the vulnerability from rapid mount cycling, info_kn_lock() securely walks the parent lineage of the kernfs node under an RCU section to confirm the node belongs to the globally active root before permitting the operation to proceed. Convert all info file handlers to use this helper and only dereference the schema after it is determined safe to do so.
Make no attempt to output error message to last_cmd_status on failure since failure implies there is no filesystem with which to display error to user space. Reported-by: Sashiko Closes: https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40i= ntel.com?part=3D3 Assisted-by: GitHub_Copilot:gemini-3.1-pro Signed-off-by: Reinette Chatre --- Changes since V2: - New patch --- fs/resctrl/ctrlmondata.c | 38 ++++---- fs/resctrl/internal.h | 3 +- fs/resctrl/monitor.c | 48 +++++----- fs/resctrl/rdtgroup.c | 192 ++++++++++++++++++++++++++++++++------- 4 files changed, 203 insertions(+), 78 deletions(-) diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c index 9a7dfc48cb2e..b95bf6208be2 100644 --- a/fs/resctrl/ctrlmondata.c +++ b/fs/resctrl/ctrlmondata.c @@ -769,10 +769,12 @@ int rdtgroup_mondata_show(struct seq_file *m, void *a= rg) int resctrl_io_alloc_show(struct kernfs_open_file *of, struct seq_file *se= q, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; =20 - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 + r =3D s->res; if (r->cache.io_alloc_capable) { if (resctrl_arch_get_io_alloc_enabled(r)) seq_puts(seq, "enabled\n"); @@ -782,7 +784,7 @@ int resctrl_io_alloc_show(struct kernfs_open_file *of, = struct seq_file *seq, voi seq_puts(seq, "not supported\n"); } =20 - mutex_unlock(&rdtgroup_mutex); + info_kn_unlock(of->kn); =20 return 0; } @@ -847,7 +849,7 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file = *of, char *buf, size_t nbytes, loff_t off) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; char const *grp_name; u32 io_alloc_closid; bool enable; @@ -857,9 +859,10 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file= *of, char *buf, if (ret) return ret; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return 
-ENOENT; =20 + r =3D s->res; rdt_last_cmd_clear(); =20 if (!r->cache.io_alloc_capable) { @@ -907,8 +910,7 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file = *of, char *buf, } =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } @@ -916,14 +918,15 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_fil= e *of, char *buf, int resctrl_io_alloc_cbm_show(struct kernfs_open_file *of, struct seq_file= *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; int ret =3D 0; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 rdt_last_cmd_clear(); =20 + r =3D s->res; if (!r->cache.io_alloc_capable) { rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name); ret =3D -ENODEV; @@ -945,8 +948,7 @@ int resctrl_io_alloc_cbm_show(struct kernfs_open_file *= of, struct seq_file *seq, show_doms(seq, s, NULL, resctrl_io_alloc_closid(r)); =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); return ret; } =20 @@ -1013,7 +1015,7 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open= _file *of, char *buf, size_t nbytes, loff_t off) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; u32 io_alloc_closid; int ret =3D 0; =20 @@ -1023,10 +1025,11 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_op= en_file *of, char *buf, =20 buf[nbytes - 1] =3D '\0'; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; rdt_last_cmd_clear(); =20 + r =3D s->res; if (!r->cache.io_alloc_capable) { rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name); ret =3D -ENODEV; @@ -1051,8 +1054,7 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open= _file *of, char *buf, out_clear_configs: rdt_staged_configs_clear(); 
out_unlock: - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h index e7e415ee7766..4e3173f25e92 100644 --- a/fs/resctrl/internal.h +++ b/fs/resctrl/internal.h @@ -345,8 +345,9 @@ void rdt_last_cmd_printf(const char *fmt, ...); =20 void rdtgroup_remove(struct rdtgroup *rdtgrp); struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn); - void rdtgroup_kn_unlock(struct kernfs_node *kn); +bool info_kn_lock(struct kernfs_node *kn); +void info_kn_unlock(struct kernfs_node *kn); =20 int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name); =20 diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c index 0e6a389a16bf..4565b9864a9e 100644 --- a/fs/resctrl/monitor.c +++ b/fs/resctrl/monitor.c @@ -1052,7 +1052,8 @@ int event_filter_show(struct kernfs_open_file *of, st= ruct seq_file *seq, void *v bool sep =3D false; int ret =3D 0, i; =20 - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; rdt_last_cmd_clear(); =20 r =3D resctrl_arch_get_resource(mevt->rid); @@ -1073,7 +1074,7 @@ int event_filter_show(struct kernfs_open_file *of, st= ruct seq_file *seq, void *v seq_putc(seq, '\n'); =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); + info_kn_unlock(of->kn); =20 return ret; } @@ -1084,7 +1085,8 @@ int resctrl_mbm_assign_on_mkdir_show(struct kernfs_op= en_file *of, struct seq_fil struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn); int ret =3D 0; =20 - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; rdt_last_cmd_clear(); =20 if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { @@ -1096,7 +1098,7 @@ int resctrl_mbm_assign_on_mkdir_show(struct kernfs_op= en_file *of, struct seq_fil seq_printf(s, "%u\n", r->mon.mbm_assign_on_mkdir); =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); + info_kn_unlock(of->kn); =20 return ret; } @@ -1112,7 +1114,8 @@ ssize_t resctrl_mbm_assign_on_mkdir_write(struct kern= 
fs_open_file *of, char *buf if (ret) return ret; =20 - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; rdt_last_cmd_clear(); =20 if (!resctrl_arch_mbm_cntr_assign_enabled(r)) { @@ -1124,7 +1127,7 @@ ssize_t resctrl_mbm_assign_on_mkdir_write(struct kern= fs_open_file *of, char *buf r->mon.mbm_assign_on_mkdir =3D value; =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } @@ -1414,8 +1417,8 @@ ssize_t event_filter_write(struct kernfs_open_file *o= f, char *buf, size_t nbytes =20 buf[nbytes - 1] =3D '\0'; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 rdt_last_cmd_clear(); =20 @@ -1438,8 +1441,7 @@ ssize_t event_filter_write(struct kernfs_open_file *o= f, char *buf, size_t nbytes } =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } @@ -1450,7 +1452,8 @@ int resctrl_mbm_assign_mode_show(struct kernfs_open_f= ile *of, struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn); bool enabled; =20 - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; enabled =3D resctrl_arch_mbm_cntr_assign_enabled(r); =20 if (r->mon.mbm_cntr_assignable) { @@ -1469,7 +1472,7 @@ int resctrl_mbm_assign_mode_show(struct kernfs_open_f= ile *of, seq_puts(s, "[default]\n"); } =20 - mutex_unlock(&rdtgroup_mutex); + info_kn_unlock(of->kn); =20 return 0; } @@ -1488,8 +1491,8 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_o= pen_file *of, char *buf, =20 buf[nbytes - 1] =3D '\0'; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 rdt_last_cmd_clear(); =20 @@ -1547,8 +1550,7 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_o= pen_file *of, char *buf, } =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } @@ -1560,8 +1562,8 @@ int 
resctrl_num_mbm_cntrs_show(struct kernfs_open_fil= e *of, struct rdt_l3_mon_domain *dom; bool sep =3D false; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 list_for_each_entry(dom, &r->mon_domains, hdr.list) { if (sep) @@ -1572,8 +1574,7 @@ int resctrl_num_mbm_cntrs_show(struct kernfs_open_fil= e *of, } seq_putc(s, '\n'); =20 - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); return 0; } =20 @@ -1586,8 +1587,8 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_op= en_file *of, u32 cntrs, i; int ret =3D 0; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 rdt_last_cmd_clear(); =20 @@ -1613,8 +1614,7 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_op= en_file *of, seq_putc(s, '\n'); =20 out_unlock: - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret; } diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c index a8b4ac7dd823..6601b138ac7a 100644 --- a/fs/resctrl/rdtgroup.c +++ b/fs/resctrl/rdtgroup.c @@ -977,13 +977,14 @@ static int rdt_last_cmd_status_show(struct kernfs_ope= n_file *of, { int len; =20 - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; len =3D seq_buf_used(&last_cmd_status); if (len) seq_printf(seq, "%.*s", len, last_cmd_status_buf); else seq_puts(seq, "ok\n"); - mutex_unlock(&rdtgroup_mutex); + info_kn_unlock(of->kn); return 0; } =20 @@ -1002,7 +1003,11 @@ static int rdt_num_closids_show(struct kernfs_open_f= ile *of, { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; seq_printf(seq, "%u\n", s->num_closid); + info_kn_unlock(of->kn); + return 0; } =20 @@ -1010,9 +1015,14 @@ static int rdt_default_ctrl_show(struct kernfs_open_= file *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + 
struct rdt_resource *r; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%x\n", resctrl_get_default_ctrl(r)); + info_kn_unlock(of->kn); + return 0; } =20 @@ -1020,9 +1030,15 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_= file *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; + =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%u\n", r->cache.min_cbm_bits); + info_kn_unlock(of->kn); + return 0; } =20 @@ -1030,9 +1046,14 @@ static int rdt_shareable_bits_show(struct kernfs_ope= n_file *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%x\n", r->cache.shareable_bits); + info_kn_unlock(of->kn); + return 0; } =20 @@ -1060,15 +1081,16 @@ static int rdt_bit_usage_show(struct kernfs_open_fi= le *of, */ unsigned long sw_shareable =3D 0, hw_shareable =3D 0; unsigned long exclusive =3D 0, pseudo_locked =3D 0; - struct rdt_resource *r =3D s->res; struct rdt_ctrl_domain *dom; int i, hwb, swb, excl, psl; + struct rdt_resource *r; enum rdtgrp_mode mode; bool sep =3D false; u32 ctrl_val; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; list_for_each_entry(dom, &r->ctrl_domains, hdr.list) { if (sep) seq_putc(seq, ';'); @@ -1144,8 +1166,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file= *of, sep =3D true; } seq_putc(seq, '\n'); - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); return 0; } =20 @@ -1153,9 +1174,14 @@ static int rdt_min_bw_show(struct kernfs_open_file *= of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct 
rdt_resource *r; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%u\n", r->membw.min_bw); + info_kn_unlock(of->kn); + return 0; } =20 @@ -1164,8 +1190,12 @@ static int rdt_num_rmids_show(struct kernfs_open_fil= e *of, { struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn); =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; seq_printf(seq, "%u\n", r->mon.num_rmid); =20 + info_kn_unlock(of->kn); + return 0; } =20 @@ -1175,6 +1205,8 @@ static int rdt_mon_features_show(struct kernfs_open_f= ile *of, struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn); struct mon_evt *mevt; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; for_each_mon_event(mevt) { if (mevt->rid !=3D r->rid || !mevt->enabled) continue; @@ -1184,6 +1216,8 @@ static int rdt_mon_features_show(struct kernfs_open_f= ile *of, seq_printf(seq, "%s_config\n", mevt->name); } =20 + info_kn_unlock(of->kn); + return 0; } =20 @@ -1191,9 +1225,14 @@ static int rdt_bw_gran_show(struct kernfs_open_file = *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%u\n", r->membw.bw_gran); + info_kn_unlock(of->kn); + return 0; } =20 @@ -1201,16 +1240,24 @@ static int rdt_delay_linear_show(struct kernfs_open= _file *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%u\n", r->membw.delay_linear); + info_kn_unlock(of->kn); + return 0; } =20 static int max_threshold_occ_show(struct kernfs_open_file *of, struct seq_file *seq, void *v) { + if (!info_kn_lock(of->kn)) + return -ENOENT; seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold); + info_kn_unlock(of->kn); =20 return 0; } @@ -1219,22 +1266,28 @@ static int 
rdt_thread_throttle_mode_show(struct ker= nfs_open_file *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; + + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 + r =3D s->res; switch (r->membw.throttle_mode) { case THREAD_THROTTLE_PER_THREAD: seq_puts(seq, "per-thread\n"); - return 0; + break; case THREAD_THROTTLE_MAX: seq_puts(seq, "max\n"); - return 0; + break; case THREAD_THROTTLE_UNDEFINED: seq_puts(seq, "undefined\n"); - return 0; + break; + default: + WARN_ON_ONCE(1); + break; } =20 - WARN_ON_ONCE(1); - + info_kn_unlock(of->kn); return 0; } =20 @@ -1248,12 +1301,20 @@ static ssize_t max_threshold_occ_write(struct kernf= s_open_file *of, if (ret) return ret; =20 - if (bytes > resctrl_rmid_realloc_limit) - return -EINVAL; + if (!info_kn_lock(of->kn)) + return -ENOENT; + + if (bytes > resctrl_rmid_realloc_limit) { + ret =3D -EINVAL; + goto out_unlock; + } =20 resctrl_rmid_realloc_threshold =3D resctrl_arch_round_mon_val(bytes); =20 - return nbytes; +out_unlock: + info_kn_unlock(of->kn); + + return ret ?: nbytes; } =20 /* @@ -1293,10 +1354,15 @@ static int rdt_has_sparse_bitmasks_show(struct kern= fs_open_file *of, struct seq_file *seq, void *v) { struct resctrl_schema *s =3D rdt_kn_parent_priv(of->kn); - struct rdt_resource *r =3D s->res; + struct rdt_resource *r; =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + r =3D s->res; seq_printf(seq, "%u\n", r->cache.arch_has_sparse_bitmasks); =20 + info_kn_unlock(of->kn); + return 0; } =20 @@ -1652,8 +1718,8 @@ static int mbm_config_show(struct seq_file *s, struct= rdt_resource *r, u32 evtid struct rdt_l3_mon_domain *dom; bool sep =3D false; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + lockdep_assert_cpus_held(); + lockdep_assert_held(&rdtgroup_mutex); =20 list_for_each_entry(dom, &r->mon_domains, hdr.list) { if (sep) @@ -1670,8 +1736,6 @@ static int mbm_config_show(struct seq_file *s, struct= 
rdt_resource *r, u32 evtid } seq_puts(s, "\n"); =20 - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); =20 return 0; } @@ -1681,8 +1745,12 @@ static int mbm_total_bytes_config_show(struct kernfs= _open_file *of, { struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn); =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID); =20 + info_kn_unlock(of->kn); return 0; } =20 @@ -1691,8 +1759,12 @@ static int mbm_local_bytes_config_show(struct kernfs= _open_file *of, { struct rdt_resource *r =3D rdt_kn_parent_priv(of->kn); =20 + if (!info_kn_lock(of->kn)) + return -ENOENT; + mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID); =20 + info_kn_unlock(of->kn); return 0; } =20 @@ -1790,8 +1862,8 @@ static ssize_t mbm_total_bytes_config_write(struct ke= rnfs_open_file *of, if (nbytes =3D=3D 0 || buf[nbytes - 1] !=3D '\n') return -EINVAL; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 rdt_last_cmd_clear(); =20 @@ -1799,8 +1871,7 @@ static ssize_t mbm_total_bytes_config_write(struct ke= rnfs_open_file *of, =20 ret =3D mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID); =20 - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } @@ -1816,8 +1887,8 @@ static ssize_t mbm_local_bytes_config_write(struct ke= rnfs_open_file *of, if (nbytes =3D=3D 0 || buf[nbytes - 1] !=3D '\n') return -EINVAL; =20 - cpus_read_lock(); - mutex_lock(&rdtgroup_mutex); + if (!info_kn_lock(of->kn)) + return -ENOENT; =20 rdt_last_cmd_clear(); =20 @@ -1825,8 +1896,7 @@ static ssize_t mbm_local_bytes_config_write(struct ke= rnfs_open_file *of, =20 ret =3D mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID); =20 - mutex_unlock(&rdtgroup_mutex); - cpus_read_unlock(); + info_kn_unlock(of->kn); =20 return ret ?: nbytes; } @@ -2660,6 +2730,58 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn) rdtgroup_kn_put(rdtgrp, kn); } =20 +/* + * Accessing the kn 
after breaking active protection is safe since the open + * of resctrl file holds a kernfs base reference (different from active + * protection) on the kn ensuring that it remains accessible even if it was + * unlinked. Each kn in turn holds base reference to parent so the kn's + * genealogy remains in memory until all base references dropped. + */ +static bool is_active_resctrl_node(struct kernfs_node *kn) +{ + struct kernfs_node *p; + bool match =3D false; + + guard(rcu)(); + p =3D kn; + while (p) { + if (p =3D=3D rdtgroup_default.kn) { + match =3D true; + break; + } + p =3D rcu_dereference(p->__parent); + } + + return match; +} + +bool info_kn_lock(struct kernfs_node *kn) +{ + kernfs_break_active_protection(kn); + cpus_read_lock(); + mutex_lock(&rdtgroup_mutex); + + /* + * Check both if resctrl is torn down (!rdtgroup_default.kn) and + * if the reader's kernfs_node originates from a dead mount. + */ + if (!rdtgroup_default.kn || !is_active_resctrl_node(kn)) { + mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); + kernfs_unbreak_active_protection(kn); + return false; + } + + return true; +} + +void info_kn_unlock(struct kernfs_node *kn) +{ + mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); + kernfs_unbreak_active_protection(kn); +} + static int mkdir_mondata_all(struct kernfs_node *parent_kn, struct rdtgroup *prgrp, struct kernfs_node **mon_data_kn); --=20 2.50.1 From nobody Sat Jun 13 13:31:36 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 13A7B37DE8C for ; Fri, 22 May 2026 19:15:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477332; cv=none; 
b=dSnsH00BoiUx9fKvgyVJuxlebrJGGXCRrWM+maXVWc4UxfiNGzOSSyiojYjzvy3FlAMgL16VGqTme3jgBPG493cQk5uHkHKP4LbsmxNrf2ml3HbAudMAbtJ9seTU0P/NQ8QVsMNNgykKTAQMWd21ZJlSfvTqS1JmsFoA+p6WS9o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477332; c=relaxed/simple; bh=1iXFgQCwrYTU794BU7ayq2EciWf+wRYONptSagDlmxY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=E1FnnKuNqSGXSNbKmPWlDDEQ/dzaevTLjTtX6Z19Q1J5HlKRyL4XZui1Tzv0Mi0ku7zXmG5vxOaHMJ0jQmPT4r96iNADBmDmnj73lyeiKJVqU9eWJOGlg8d+D+dvJF073OhRt+Myfivq8X+YydeRUGHJhoZEiLtRtjfrkVoAbsA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=IT/xQdRg; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="IT/xQdRg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779477331; x=1811013331; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1iXFgQCwrYTU794BU7ayq2EciWf+wRYONptSagDlmxY=; b=IT/xQdRg3bV/2mC6m2KxsORg5RyMJQTXIe6okPONecHKuMI9d/RT7C4g 2aZfgE3+y6OpUDe+rru0aJkqcyQ5ScxvUhID6PC/bgkHYOZVZvUs0jnJR d9FcWRoQ795laccFBjldqlXHy6tgTiZsdHUYvq4lk06XXEgEPQ/9J06Qt L0rIkqVSTToXZOPgisMi0HfpsudJwGzvlHL9bI8Dje11zmBOIoTbFOfKQ 6Ngm3cuGBNR0qmdfsjCXCqSB/ybhfi5gZXQvPYci9Yxw8UPOHlR+5I73Y FhE15/2rpCdAxw9K9q6nyZcpHzE0zGh4Lwf+3qVgM71BZzPQ5HCZBajmW w==; X-CSE-ConnectionGUID: YPwiWrSzQnqK2gPWh+moPA== X-CSE-MsgGUID: esQUH0B4TeK4Ig+7UEb9vQ== X-IronPort-AV: E=McAfee;i="6800,10657,11794"; a="80140499" X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; 
d="scan'208";a="80140499" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 X-CSE-ConnectionGUID: SoH3CDk9TYq/OwejbVrA2A== X-CSE-MsgGUID: cEGOx0u1T3+4V+SbyzM+Rw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="271336480" Received: from rchatre-desk1.jf.intel.com ([10.165.154.99]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 From: Reinette Chatre To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com, babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de, dave.hansen@linux.intel.com Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org, fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com, linux-kernel@vger.kernel.org, patches@lists.linux.dev, reinette.chatre@intel.com Subject: [PATCH v3 8/9] x86/resctrl: Ensure domain fully initialized before placed on RCU list Date: Fri, 22 May 2026 12:15:12 -0700 Message-ID: <3ba2959b1cd3596e1e340eaee6b43487edaec0a4.1779476724.git.reinette.chatre@intel.com> X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

A resctrl domain consists of the domain structure itself, which includes pointers to dynamically allocated filesystem as well as architecture specific data. For example, the L3 monitoring domain structure consists of the architecture specific struct rdt_hw_l3_mon_domain that contains the dynamically allocated rdt_hw_l3_mon_domain::arch_mbm_states architectural state, and the embedded struct rdt_l3_mon_domain that contains the dynamically allocated rdt_l3_mon_domain::mbm_states resctrl fs state.
The domains are placed on an RCU protected list so that readers can access domains while holding cpus_read_lock() or from an RCU read-side critical section. A reader accessing a domain via the RCU list expects that the domain and all its dynamically allocated data are accessible.

Only place a domain on the RCU list when all its dynamically allocated data is ready. Similarly, unlink a domain from the RCU list before removing any of its dynamically allocated data.

There are currently no readers that access a domain via the RCU list. Ensure such access is safe when such a reader arrives.

Signed-off-by: Reinette Chatre Reviewed-by: Chen Yu --- Changes since V2: - New patch --- arch/x86/kernel/cpu/resctrl/core.c | 18 +++++++----------- arch/x86/kernel/cpu/resctrl/intel_aet.c | 5 ++--- 2 files changed, 9 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resct= rl/core.c index 9c01d2562b7a..bca782050198 100644 --- a/arch/x86/kernel/cpu/resctrl/core.c +++ b/arch/x86/kernel/cpu/resctrl/core.c @@ -515,14 +515,12 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_r= esource *r) return; } =20 - list_add_tail_rcu(&d->hdr.list, add_pos); - err =3D resctrl_online_ctrl_domain(r, d); if (err) { - list_del_rcu(&d->hdr.list); - synchronize_rcu(); ctrl_domain_free(hw_dom); + return; } + list_add_tail_rcu(&d->hdr.list, add_pos); } =20 static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, s= truct list_head *add_pos) @@ -556,14 +554,12 @@ static void l3_mon_domain_setup(int cpu, int id, stru= ct rdt_resource *r, struct return; } =20 - list_add_tail_rcu(&d->hdr.list, add_pos); - err =3D resctrl_online_mon_domain(r, &d->hdr); if (err) { - list_del_rcu(&d->hdr.list); - synchronize_rcu(); l3_mon_domain_free(hw_dom); + return; } + list_add_tail_rcu(&d->hdr.list, add_pos); } =20 static void domain_add_cpu_mon(int cpu, struct rdt_resource *r) @@ -642,9 +638,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_= resource *r) d =3D container_of(hdr, struct
rdt_ctrl_domain, hdr); hw_dom =3D resctrl_to_arch_ctrl_dom(d); =20 - resctrl_offline_ctrl_domain(r, d); list_del_rcu(&hdr->list); synchronize_rcu(); + resctrl_offline_ctrl_domain(r, d); =20 /* * rdt_ctrl_domain "d" is going to be freed below, so clear @@ -689,9 +685,9 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_r= esource *r) =20 d =3D container_of(hdr, struct rdt_l3_mon_domain, hdr); hw_dom =3D resctrl_to_arch_mon_dom(d); - resctrl_offline_mon_domain(r, hdr); list_del_rcu(&hdr->list); synchronize_rcu(); + resctrl_offline_mon_domain(r, hdr); l3_mon_domain_free(hw_dom); break; } @@ -702,9 +698,9 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_r= esource *r) return; =20 pkgd =3D container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr); - resctrl_offline_mon_domain(r, hdr); list_del_rcu(&hdr->list); synchronize_rcu(); + resctrl_offline_mon_domain(r, hdr); kfree(pkgd); break; } diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/= resctrl/intel_aet.c index 89b8b619d5d5..c22c3cf5167d 100644 --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c @@ -398,12 +398,11 @@ void intel_aet_mon_domain_setup(int cpu, int id, stru= ct rdt_resource *r, d->hdr.type =3D RESCTRL_MON_DOMAIN; d->hdr.rid =3D RDT_RESOURCE_PERF_PKG; cpumask_set_cpu(cpu, &d->hdr.cpu_mask); - list_add_tail_rcu(&d->hdr.list, add_pos); =20 err =3D resctrl_online_mon_domain(r, &d->hdr); if (err) { - list_del_rcu(&d->hdr.list); - synchronize_rcu(); kfree(d); + return; } + list_add_tail_rcu(&d->hdr.list, add_pos); } --=20 2.50.1 From nobody Sat Jun 13 13:31:36 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E66A737FF6A for ; Fri, 22 May 2026 19:15:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477334; cv=none; b=NrfJQo99yQbw9JNw3vOrwZANRp9TKDhc8xxYVePRtfmmOx+o28hc7/JJR80s08K/5kLr4zHItm5CXhU7kAIiGOGIzCmRuwetYaTJ0BZEYpbiWlEsIwPBo15KokvWqclX8tZJsrMotg6OQpe1vXM1GB+6IywQgTHLUCkBZ4xueV0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779477334; c=relaxed/simple; bh=C+0z1oFt2EtVMPc6oB1Qc1vexbhegTpMehSW92wyaFo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oqtCIKjkJOkkz3feKB06RiczTZJAVKJC/ROZ/xskbEMEYqQr7r38JCNqFbhVyDm1uJHJoKFbO7staM3CUN26jwJXo1gLUpnlI84MjugK8BOSAqEQySn8+YtywlHSGNlAS27Av6UMHMmOb3RycgvCf2m+xDP27vhxy/BWDs4f2VA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=K3sTJgc0; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="K3sTJgc0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779477332; x=1811013332; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C+0z1oFt2EtVMPc6oB1Qc1vexbhegTpMehSW92wyaFo=; b=K3sTJgc0Jw3XndyJPNIYfxNeEvbpOlzMjkiq0RoLFpbHTWNCXP+pqDFL H+c6nWrrnbw2LHmVQkwlw8vfEz8W9xYkcgjV99yApc26D01ccegEFvjZl DFSr3JRRof/QTp6kym2eQkPvdi0SMIHgZRr3wIldJHJiJkvngW+jhmOVa te5E0UAfwE2xxhVxJtjBaFOIxXkqfIbZPCGkj6n4VcAtthud56mhg+WgV HkeoHduShsmS2c+BnbkluM7tNwiZ6CqykNEwaZksHHVKLoObvnT8eco5R uUaKaGbtIHEwO8OSRFCvrKnlDELiWrusDdBdi4qHzXlxJpqp2z63EBiBK g==; X-CSE-ConnectionGUID: /DJVkrytSpO+BZRWft0RSA== X-CSE-MsgGUID: A3tiOXbgRzajoVKaYsL2RA== X-IronPort-AV: 
E=McAfee;i="6800,10657,11794"; a="80140508" X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="80140508" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 X-CSE-ConnectionGUID: DrvbXD6QT7mmeps81PA/dA== X-CSE-MsgGUID: dpnw58TaT/KaVQs1FesIWw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,162,1774335600"; d="scan'208";a="271336483" Received: from rchatre-desk1.jf.intel.com ([10.165.154.99]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 May 2026 12:15:24 -0700 From: Reinette Chatre To: tony.luck@intel.com, james.morse@arm.com, Dave.Martin@arm.com, babu.moger@amd.com, bp@alien8.de, tglx@linutronix.de, dave.hansen@linux.intel.com Cc: x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com, fustini@kernel.org, fenghuay@nvidia.com, peternewman@google.com, yu.c.chen@intel.com, linux-kernel@vger.kernel.org, patches@lists.linux.dev, reinette.chatre@intel.com Subject: [PATCH v3 9/9] fs/resctrl: Fix UAF from worker threads when domains are removed Date: Fri, 22 May 2026 12:15:13 -0700 Message-ID: X-Mailer: git-send-email 2.50.1 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The mbm_handle_overflow() and cqm_handle_limbo() workers read event counters and may sleep while doing so. They are scheduled via delayed_work embedded in struct rdt_l3_mon_domain. Architecture allocates and frees these domains from CPU hotplug callbacks under cpus_write_lock(), and the workers acquire cpus_read_lock() to keep the domain alive across their access. A use-after-free can occur when a worker is blocked waiting for cpus_read_lock() while the hotplug core holds cpus_write_lock(): the architecture frees the rdt_l3_mon_domain that contains the worker's work_struct. 
When the worker unblocks, the container_of() it performs on the embedded work pointer dereferences freed memory.

Drop cpus_read_lock() from the workers and instead drain pending and in-flight work synchronously before the architecture can free the domain. Since the architecture offlines the domain under cpus_write_lock(), after it has been unlinked from the RCU list and a grace period has elapsed, no new work can be scheduled. The cancel only needs to wait out existing work.

Drop rdtgroup_mutex during CPU offline around cancel_delayed_work_sync() so that a worker waiting on the mutex can complete before re-pinning the work on a different CPU.

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Reported-by: Sashiko
Closes: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com # [1]
Co-developed-by: Tony Luck
Signed-off-by: Tony Luck
Signed-off-by: Reinette Chatre
---
Changes since v2:
- Rewrite changelog
- v2 attempted to solve the issue by using is_percpu_thread() within the worker to learn whether the CPU the worker was running on is going offline. Sashiko (https://sashiko.dev/#/patchset/20260515193944.15114-1-tony.luck%40intel.com?part=5) pointed out that this would not be able to handle the scenario where one of the hotplug handlers following the resctrl offline handlers failed.
- Some other fixes attempted that failed:
  - Switch to accessing the domain structure in the handler via RCU so that the CPU hotplug lock is no longer needed. Use cancel_delayed_work_sync() with the mutex dropped to cancel the worker. Running the worker from an RCU read-side critical section is a problem since the worker needs to be able to sleep (mbm_handle_overflow()->mbm_update()->mbm_update_one_event()->resctrl_arch_mon_ctx_alloc()->might_sleep()).
  - Adding a reference count to the domain structure to avoid the worker needing to take the CPU hotplug lock.
This ended up being very complicated with the architecture needing new APIs to manage the reference count which cannot cleanly integrate into MPAM since it uses a single architecture domain structure to contain both the control and monitoring domain structures. Managing the references across mount, unmount, online, offline, as well as worker self exit resulted in several asymmetrical and complicated paths that were error prone. Locking also proved to be complicated since architecture would need to initiate domain free that will need to call back into resctrl that will take rdtgroup_mutex which means that references need to be taken/released without locking. --- fs/resctrl/monitor.c | 52 ++++++++++++++++++++++++++++++++++--------- fs/resctrl/rdtgroup.c | 52 ++++++++++++++++++++++++++++++++++++++----- 2 files changed, 89 insertions(+), 15 deletions(-) diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c index 4565b9864a9e..37df65229109 100644 --- a/fs/resctrl/monitor.c +++ b/fs/resctrl/monitor.c @@ -623,14 +623,22 @@ void mon_event_count(void *info) rr->err =3D 0; } =20 -static struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu, - struct rdt_resource *r) +/* + * Find the software controller's ctrl domain that contains @cpu on resour= ce @r. + * + * Only called from the mbm_over worker via update_mba_bw() where the retu= rned + * domain is kept alive by cancel_delayed_work_sync() in + * resctrl_offline_ctrl_domain(). This drains this worker and then waits on + * rdtgroup_mutex held here before the architecture can free the ctrl doma= in. + * + * Context: Call from RCU read-side critical section. 
+ */
+static struct rdt_ctrl_domain *get_sc_ctrl_domain_from_cpu(int cpu,
+							   struct rdt_resource *r)
 {
 	struct rdt_ctrl_domain *d;
 
-	lockdep_assert_cpus_held();
-
-	list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+	list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list) {
 		/* Find the domain that contains this CPU */
 		if (cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
 			return d;
@@ -691,7 +699,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_m
 	if (WARN_ON_ONCE(!pmbm_data))
 		return;
 
-	dom_mba = get_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
+	guard(rcu)();
+	dom_mba = get_sc_ctrl_domain_from_cpu(smp_processor_id(), r_mba);
 	if (!dom_mba) {
 		pr_warn_once("Failure to get domain for MBA update\n");
 		return;
@@ -794,9 +803,19 @@ void cqm_handle_limbo(struct work_struct *work)
 	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
 	struct rdt_l3_mon_domain *d;
 
-	cpus_read_lock();
+	/*
+	 * Safe to run without CPU hotplug lock. Work is guaranteed to be
+	 * canceled before the domain structure is removed.
+	 */
 	mutex_lock(&rdtgroup_mutex);
 
+	/*
+	 * Ensure the worker is dedicated to a CPU as intended and not
+	 * relocated by workqueue subsystem as part of CPU going offline.
+	 */
+	if (!is_percpu_thread())
+		goto out_unlock;
+
 	d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
 
 	__check_limbo(d, false);
@@ -808,8 +827,8 @@ void cqm_handle_limbo(struct work_struct *work)
 					 delay);
 	}
 
+out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
 
 /**
@@ -841,7 +860,10 @@ void mbm_handle_overflow(struct work_struct *work)
 	struct list_head *head;
 	struct rdt_resource *r;
 
-	cpus_read_lock();
+	/*
+	 * Safe to run without CPU hotplug lock. Work is guaranteed to be
+	 * canceled before the domain structure is removed.
+	 */
 	mutex_lock(&rdtgroup_mutex);
 
 	/*
@@ -851,6 +873,17 @@ void mbm_handle_overflow(struct work_struct *work)
 	if (!resctrl_mounted || !resctrl_arch_mon_capable())
 		goto out_unlock;
 
+	/*
+	 * Ensure the worker is dedicated to a CPU and not relocated by
+	 * workqueue subsystem as part of CPU going offline since reading
+	 * events depend on smp_processor_id(). After passing this check
+	 * smp_processor_id() is valid for entire duration of this worker
+	 * since it runs with rdtgroup_mutex held and the offline handler needs
+	 * rdtgroup_mutex to offline the CPU being run on here.
+	 */
+	if (!is_percpu_thread())
+		goto out_unlock;
+
 	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
 
@@ -875,7 +908,6 @@ void mbm_handle_overflow(struct work_struct *work)
 
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
 
 /**
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 6601b138ac7a..9281c5a71063 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4493,6 +4493,29 @@ static void domain_destroy_l3_mon_state(struct rdt_l3_mon_domain *d)
 
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
 {
+	/*
+	 * mbm_handle_overflow() may dereference this ctrl domain via
+	 * update_mba_bw()->get_sc_ctrl_domain_from_cpu(). The architecture has
+	 * unlinked the domain from the RCU list and waited a grace period, so
+	 * no new worker iteration can find it; drain any worker that already
+	 * holds a pointer to it before the architecture frees the domain.
+	 *
+	 * Software controller is enabled/disabled on mount/unmount with
+	 * cpus_read_lock() held. Running here with cpus_write_lock() so
+	 * there are no concurrent changes to software controller status.
+	 */
+	if (r->rid == RDT_RESOURCE_MBA && is_mba_sc(r)) {
+		struct rdt_resource *l3 = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+		struct rdt_l3_mon_domain *mon_d;
+
+		list_for_each_entry(mon_d, &l3->mon_domains, hdr.list) {
+			if (mon_d->hdr.id == d->hdr.id) {
+				cancel_delayed_work_sync(&mon_d->mbm_over);
+				break;
+			}
+		}
+	}
+
 	mutex_lock(&rdtgroup_mutex);
 
 	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
@@ -4505,6 +4528,24 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 {
 	struct rdt_l3_mon_domain *d;
 
+	/*
+	 * Called by architecture under CPU hotplug lock as it prepares to remove
+	 * the domain which is guaranteed to be accessible here.
+	 * The domain has been unlinked from the RCU list and a grace period
+	 * has elapsed, so no new worker can be scheduled. Drain any worker that
+	 * is in flight or pending before letting architecture proceed to free
+	 * the domain that has the workers' struct delayed_work embedded.
+	 * Do so before taking rdtgroup_mutex since the workers also acquire it.
+	 */
+	if (r->rid == RDT_RESOURCE_L3 &&
+	    domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
+		d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
+		if (resctrl_is_mbm_enabled())
+			cancel_delayed_work_sync(&d->mbm_over);
+		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
+			cancel_delayed_work_sync(&d->cqm_limbo);
+	}
+
 	mutex_lock(&rdtgroup_mutex);
 
 	/*
@@ -4521,8 +4562,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 		goto out_unlock;
 
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	if (resctrl_is_mbm_enabled())
-		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
 		/*
 		 * When a package is going down, forcefully
@@ -4533,7 +4572,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 		 * package never comes back.
 		 */
 		__check_limbo(d, true);
-		cancel_delayed_work(&d->cqm_limbo);
 	}
 
 	domain_destroy_l3_mon_state(d);
@@ -4714,12 +4752,16 @@ void resctrl_offline_cpu(unsigned int cpu)
 	d = get_mon_domain_from_cpu(cpu, l3);
 	if (d) {
 		if (resctrl_is_mbm_enabled() && cpu == d->mbm_work_cpu) {
-			cancel_delayed_work(&d->mbm_over);
+			mutex_unlock(&rdtgroup_mutex);
+			cancel_delayed_work_sync(&d->mbm_over);
+			mutex_lock(&rdtgroup_mutex);
 			mbm_setup_overflow_handler(d, 0, cpu);
 		}
 		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
 		    cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
-			cancel_delayed_work(&d->cqm_limbo);
+			mutex_unlock(&rdtgroup_mutex);
+			cancel_delayed_work_sync(&d->cqm_limbo);
+			mutex_lock(&rdtgroup_mutex);
 			cqm_setup_limbo_handler(d, 0, cpu);
 		}
 	}
-- 
2.50.1