From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Yu-cheng Yu,
	Rik van Riel
Subject: [RFC PATCH v4 1/8] x86/mm: Introduce Remote Action Request MSRs
Date: Thu, 19 Jun 2025 16:03:53 -0400
Message-ID: <20250619200442.1694583-2-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Yu-cheng Yu
Remote Action Request (RAR) is a model-specific feature that speeds up
inter-processor operations by moving parts of those operations from
software to hardware. The current RAR implementation handles TLB flushes
and MSR writes.

This patch only introduces the RAR MSRs; the RAR mechanism itself is
introduced in later patches.

There are five RAR MSRs:

  MSR_CORE_CAPABILITIES
  MSR_IA32_RAR_CTRL
  MSR_IA32_RAR_ACT_VEC
  MSR_IA32_RAR_PAYLOAD_BASE
  MSR_IA32_RAR_INFO

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/msr-index.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..367a62c50aa2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -110,6 +110,8 @@

 /* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
 #define MSR_IA32_CORE_CAPS			0x000000cf
+#define MSR_IA32_CORE_CAPS_RAR_BIT		1
+#define MSR_IA32_CORE_CAPS_RAR			BIT(MSR_IA32_CORE_CAPS_RAR_BIT)
 #define MSR_IA32_CORE_CAPS_INTEGRITY_CAPS_BIT	2
 #define MSR_IA32_CORE_CAPS_INTEGRITY_CAPS	BIT(MSR_IA32_CORE_CAPS_INTEGRITY_CAPS_BIT)
 #define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT	5
@@ -122,6 +124,17 @@
 #define SNB_C3_AUTO_UNDEMOTE		(1UL << 27)
 #define SNB_C1_AUTO_UNDEMOTE		(1UL << 28)

+/*
+ * Remote Action Requests (RAR) MSRs
+ */
+#define MSR_IA32_RAR_CTRL		0x000000ed
+#define MSR_IA32_RAR_ACT_VEC		0x000000ee
+#define MSR_IA32_RAR_PAYLOAD_BASE	0x000000ef
+#define MSR_IA32_RAR_INFO		0x000000f0
+
+#define RAR_CTRL_ENABLE			BIT(31)
+#define RAR_CTRL_IGNORE_IF		BIT(30)
+
 #define MSR_MTRRcap			0x000000fe

 #define MSR_IA32_ARCH_CAPABILITIES	0x0000010a
--
2.49.0
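For orientation, a minimal sketch of how these registers fit together, based on the later patches in this series (the helper names below are illustrative, not part of the patch): the RAR capability is enumerated through MSR_IA32_CORE_CAPS, and RAR is then switched on per CPU through MSR_IA32_RAR_CTRL.

static bool rar_enumerated(struct cpuinfo_x86 *c)
{
	u64 caps;

	/* IA32_CORE_CAPABILITIES itself is enumerated by a CPUID bit. */
	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITIES))
		return false;

	rdmsrl(MSR_IA32_CORE_CAPS, caps);
	return caps & MSR_IA32_CORE_CAPS_RAR;
}

static void rar_enable_this_cpu(void)
{
	/* RAR_CTRL_IGNORE_IF lets a CPU process RAR requests with IRQs off. */
	wrmsrl(MSR_IA32_RAR_CTRL, RAR_CTRL_ENABLE | RAR_CTRL_IGNORE_IF);
}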
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Rik van Riel
Subject: [RFC PATCH v4 2/8] x86/mm: enable BROADCAST_TLB_FLUSH on Intel, too
Date: Thu, 19 Jun 2025 16:03:54 -0400
Message-ID: <20250619200442.1694583-3-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Rik van Riel

Much of the code for Intel RAR and AMD INVLPGB is shared. Place both
under the same config option.
Signed-off-by: Rik van Riel
---
 arch/x86/Kconfig.cpu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f928cf6e3252..ab763f69f54d 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -360,7 +360,7 @@ menuconfig PROCESSOR_SELECT

 config BROADCAST_TLB_FLUSH
 	def_bool y
-	depends on CPU_SUP_AMD && 64BIT
+	depends on (CPU_SUP_AMD || CPU_SUP_INTEL) && 64BIT && SMP

 config CPU_SUP_INTEL
 	default y
--
2.49.0
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Yu-cheng Yu,
	Rik van Riel
Subject: [RFC PATCH v4 3/8] x86/mm: Introduce X86_FEATURE_RAR
Date: Thu, 19 Jun 2025 16:03:55 -0400
Message-ID: <20250619200442.1694583-4-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Yu-cheng Yu

Introduce X86_FEATURE_RAR and enumeration of the feature.

[riel: moved initialization to intel.c and disabling to Kconfig.cpufeatures]

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/Kconfig.cpufeatures       | 4 ++++
 arch/x86/include/asm/cpufeatures.h | 2 +-
 arch/x86/kernel/cpu/intel.c        | 9 +++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index 250c10627ab3..7d459b5f47f7 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -195,3 +195,7 @@ config X86_DISABLED_FEATURE_SEV_SNP
 config X86_DISABLED_FEATURE_INVLPGB
 	def_bool y
 	depends on !BROADCAST_TLB_FLUSH
+
+config X86_DISABLED_FEATURE_RAR
+	def_bool y
+	depends on !BROADCAST_TLB_FLUSH
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ee176236c2be..e6781541ffce 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -76,7 +76,7 @@
 #define X86_FEATURE_K8			( 3*32+ 4) /* Opteron, Athlon64 */
 #define X86_FEATURE_ZEN5		( 3*32+ 5) /* CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_ZEN6		( 3*32+ 6) /* CPU based on Zen6 microarchitecture */
-/* Free					( 3*32+ 7) */
+#define X86_FEATURE_RAR			( 3*32+ 7) /* Intel Remote Action Request */
 #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* "constant_tsc" TSC ticks at a constant rate */
 #define X86_FEATURE_UP			( 3*32+ 9) /* "up" SMP kernel running on UP */
 #define X86_FEATURE_ART			( 3*32+10) /* "art" Always running timer (ART) */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 076eaa41b8c8..0cc4ae27127c 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -719,6 +719,15 @@ static void intel_detect_tlb(struct cpuinfo_x86 *c)
 	cpuid_leaf_0x2(&regs);
 	for_each_cpuid_0x2_desc(regs, ptr, desc)
 		intel_tlb_lookup(desc);
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		u64 msr;
+
+		rdmsrl(MSR_IA32_CORE_CAPS, msr);
+
+		if (msr & MSR_IA32_CORE_CAPS_RAR)
+			setup_force_cpu_cap(X86_FEATURE_RAR);
+	}
 }

 static const struct cpu_dev intel_cpu_dev = {
--
2.49.0
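A short illustration of what the Kconfig.cpufeatures hunk buys (assumed example, not from the patch): with X86_DISABLED_FEATURE_RAR set on !BROADCAST_TLB_FLUSH builds, cpu_feature_enabled(X86_FEATURE_RAR) becomes a compile-time false, so RAR-only code paths can be written unconditionally and are optimized away on kernels that do not enable broadcast TLB flushing.

	/* Safe to test unconditionally; folds to "if (0)" when the
	 * feature is compile-time disabled. */
	if (cpu_feature_enabled(X86_FEATURE_RAR))
		pr_info("CPU supports Remote Action Request\n");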
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Yu-cheng Yu,
	Rik van Riel
Subject: [RFC PATCH v4 4/8] x86/apic: Introduce Remote Action Request Operations
Date: Thu, 19 Jun 2025 16:03:56 -0400
Message-ID: <20250619200442.1694583-5-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Yu-cheng Yu

RAR TLB flushing is started by sending a command to the APIC. This patch
adds the Remote Action Request commands.

Because RAR_VECTOR is hardcoded at 0xe0, POSTED_MSI_NOTIFICATION_VECTOR
has to be lowered to 0xdf, reducing the number of available vectors by 13.
[riel: refactor after 6 years of changes, lower POSTED_MSI_NOTIFICATION_VECTOR]

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/apicdef.h     | 1 +
 arch/x86/include/asm/irq_vectors.h | 7 ++++++-
 arch/x86/include/asm/smp.h         | 1 +
 arch/x86/kernel/apic/ipi.c         | 5 +++++
 arch/x86/kernel/apic/local.h       | 3 +++
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 094106b6a538..b152d45af91a 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -92,6 +92,7 @@
 #define	APIC_DM_LOWEST		0x00100
 #define	APIC_DM_SMI		0x00200
 #define	APIC_DM_REMRD		0x00300
+#define	APIC_DM_RAR		0x00300
 #define	APIC_DM_NMI		0x00400
 #define	APIC_DM_INIT		0x00500
 #define	APIC_DM_STARTUP		0x00600
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 47051871b436..52a0cf56562a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -97,11 +97,16 @@

 #define LOCAL_TIMER_VECTOR		0xec

+/*
+ * RAR (remote action request) TLB flush
+ */
+#define RAR_VECTOR			0xe0
+
 /*
  * Posted interrupt notification vector for all device MSIs delivered to
  * the host kernel.
  */
-#define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
+#define POSTED_MSI_NOTIFICATION_VECTOR	0xdf

 #define NR_VECTORS			256

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 0c1c68039d6f..0e5ad0dc987a 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -120,6 +120,7 @@ void __noreturn mwait_play_dead(unsigned int eax_hint);
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
+void native_send_rar_ipi(const struct cpumask *mask);

 asmlinkage __visible void smp_reboot_interrupt(void);
 __visible void smp_reschedule_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 98a57cb4aa86..9983c42619ef 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -106,6 +106,11 @@ void apic_send_nmi_to_offline_cpu(unsigned int cpu)
 		return;
 	apic->send_IPI(cpu, NMI_VECTOR);
 }
+
+void native_send_rar_ipi(const struct cpumask *mask)
+{
+	__apic_send_IPI_mask(mask, RAR_VECTOR);
+}
 #endif /* CONFIG_SMP */

 static inline int __prepare_ICR2(unsigned int mask)
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index bdcf609eb283..833669174267 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -38,6 +38,9 @@ static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
 	case NMI_VECTOR:
 		icr |= APIC_DM_NMI;
 		break;
+	case RAR_VECTOR:
+		icr |= APIC_DM_RAR;
+		break;
 	}
 	return icr;
 }
--
2.49.0
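A usage sketch (hypothetical caller, not part of the patch): sending the RAR signal is an IPI to a cpumask on the dedicated vector, which __prepare_ICR() above maps to the APIC_DM_RAR delivery mode, so the request is serviced by the target CPUs themselves rather than by a kernel interrupt handler.

static void kick_rar_targets(const struct cpumask *targets)
{
	/* RAR_VECTOR (0xe0) is translated into APIC_DM_RAR by __prepare_ICR(). */
	native_send_rar_ipi(targets);
}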
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Yu-cheng Yu,
	Rik van Riel
Subject: [RFC PATCH v4 5/8] x86/mm: Introduce Remote Action Request
Date: Thu, 19 Jun 2025 16:03:57 -0400
Message-ID: <20250619200442.1694583-6-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Yu-cheng Yu

Remote Action Request (RAR) is a TLB flushing broadcast facility.
To start a TLB flush, the initiator CPU creates a RAR payload and
sends a command to the APIC. The receiving CPUs automatically flush
TLBs as specified in the payload without the kernel's involvement.
[ riel: add pcid parameter to smp_call_rar_many so other mms can be flushed ]

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/rar.h  |  76 ++++++++++++
 arch/x86/kernel/cpu/intel.c |   8 +-
 arch/x86/mm/Makefile        |   1 +
 arch/x86/mm/rar.c           | 236 ++++++++++++++++++++++++++++++++++++
 4 files changed, 320 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/rar.h
 create mode 100644 arch/x86/mm/rar.c

diff --git a/arch/x86/include/asm/rar.h b/arch/x86/include/asm/rar.h
new file mode 100644
index 000000000000..c875b9e9c509
--- /dev/null
+++ b/arch/x86/include/asm/rar.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_RAR_H
+#define _ASM_X86_RAR_H
+
+/*
+ * RAR payload types
+ */
+#define RAR_TYPE_INVPG		0
+#define RAR_TYPE_INVPG_NO_CR3	1
+#define RAR_TYPE_INVPCID	2
+#define RAR_TYPE_INVEPT		3
+#define RAR_TYPE_INVVPID	4
+#define RAR_TYPE_WRMSR		5
+
+/*
+ * Subtypes for RAR_TYPE_INVLPG
+ */
+#define RAR_INVPG_ADDR			0 /* address specific */
+#define RAR_INVPG_ALL			2 /* all, include global */
+#define RAR_INVPG_ALL_NO_GLOBAL		3 /* all, exclude global */
+
+/*
+ * Subtypes for RAR_TYPE_INVPCID
+ */
+#define RAR_INVPCID_ADDR		0 /* address specific */
+#define RAR_INVPCID_PCID		1 /* all of PCID */
+#define RAR_INVPCID_ALL			2 /* all, include global */
+#define RAR_INVPCID_ALL_NO_GLOBAL	3 /* all, exclude global */
+
+/*
+ * Page size for RAR_TYPE_INVLPG
+ */
+#define RAR_INVLPG_PAGE_SIZE_4K		0
+#define RAR_INVLPG_PAGE_SIZE_2M		1
+#define RAR_INVLPG_PAGE_SIZE_1G		2
+
+/*
+ * Max number of pages per payload
+ */
+#define RAR_INVLPG_MAX_PAGES		63
+
+struct rar_payload {
+	u64 for_sw		: 8;
+	u64 type		: 8;
+	u64 must_be_zero_1	: 16;
+	u64 subtype		: 3;
+	u64 page_size		: 2;
+	u64 num_pages		: 6;
+	u64 must_be_zero_2	: 21;
+
+	u64 must_be_zero_3;
+
+	/*
+	 * Starting address
+	 */
+	union {
+		u64 initiator_cr3;
+		struct {
+			u64 pcid	: 12;
+			u64 ignored	: 52;
+		};
+	};
+	u64 linear_address;
+
+	/*
+	 * Padding
+	 */
+	u64 padding[4];
+};
+
+void rar_cpu_init(void);
+void rar_boot_cpu_init(void);
+void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
+		       unsigned long start, unsigned long end);
+
+#endif /* _ASM_X86_RAR_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0cc4ae27127c..ddc5e7d81077 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -624,6 +625,9 @@ static void init_intel(struct cpuinfo_x86 *c)
 	split_lock_init();

 	intel_init_thermal(c);
+
+	if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_cpu_init();
 }

 #ifdef CONFIG_X86_32
@@ -725,8 +729,10 @@ static void intel_detect_tlb(struct cpuinfo_x86 *c)

 		rdmsrl(MSR_IA32_CORE_CAPS, msr);

-		if (msr & MSR_IA32_CORE_CAPS_RAR)
+		if (msr & MSR_IA32_CORE_CAPS_RAR) {
 			setup_force_cpu_cap(X86_FEATURE_RAR);
+			rar_boot_cpu_init();
+		}
 	}
 }

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5b9908f13dcf..f36fc99e8b10 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)	+= pti.o
+obj-$(CONFIG_BROADCAST_TLB_FLUSH)		+= rar.o

 obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
diff --git a/arch/x86/mm/rar.c b/arch/x86/mm/rar.c
new file mode 100644
index 000000000000..76959782fb03
--- /dev/null
+++ b/arch/x86/mm/rar.c
@@ -0,0 +1,236 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * RAR TLB shootdown
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static DEFINE_PER_CPU(struct cpumask, rar_cpu_mask);
+
+#define RAR_SUCCESS	0x00
+#define RAR_PENDING	0x01
+#define RAR_FAILURE	0x80
+
+#define RAR_MAX_PAYLOADS 64UL
+
+/* How many RAR payloads are supported by this CPU */
+static int rar_max_payloads __ro_after_init = RAR_MAX_PAYLOADS;
+
+/*
+ * RAR payloads telling CPUs what to do. This table is shared between
+ * all CPUs; it is possible to have multiple payload tables shared between
+ * different subsets of CPUs, but that adds a lot of complexity.
+ */
+static struct rar_payload rar_payload[RAR_MAX_PAYLOADS] __page_aligned_bss;
+
+/*
+ * Reduce contention for the RAR payloads by having a small number of
+ * CPUs share a RAR payload entry, instead of a free for all with all CPUs.
+ */
+struct rar_lock {
+	union {
+		raw_spinlock_t lock;
+		char __padding[SMP_CACHE_BYTES];
+	};
+};
+
+static struct rar_lock rar_locks[RAR_MAX_PAYLOADS] __cacheline_aligned;
+
+/*
+ * The action vector tells each CPU which payload table entries
+ * have work for that CPU.
+ */
+static DEFINE_PER_CPU_ALIGNED(u8[RAR_MAX_PAYLOADS], rar_action);
+
+/*
+ * TODO: group CPUs together based on locality in the system instead
+ * of CPU number, to further reduce the cost of contention.
+ */
+static int cpu_rar_payload_number(void)
+{
+	int cpu = raw_smp_processor_id();
+	return cpu % rar_max_payloads;
+}
+
+static int get_payload_slot(void)
+{
+	int payload_nr = cpu_rar_payload_number();
+	raw_spin_lock(&rar_locks[payload_nr].lock);
+	return payload_nr;
+}
+
+static void free_payload_slot(unsigned long payload_nr)
+{
+	raw_spin_unlock(&rar_locks[payload_nr].lock);
+}
+
+static void set_payload(struct rar_payload *p, u16 pcid, unsigned long start,
+			long pages)
+{
+	p->must_be_zero_1	= 0;
+	p->must_be_zero_2	= 0;
+	p->must_be_zero_3	= 0;
+	p->page_size		= RAR_INVLPG_PAGE_SIZE_4K;
+	p->type			= RAR_TYPE_INVPCID;
+	p->pcid			= pcid;
+	p->linear_address	= start;
+
+	if (pcid) {
+		/* RAR invalidation of the mapping of a specific process. */
+		if (pages < RAR_INVLPG_MAX_PAGES) {
+			p->num_pages = pages;
+			p->subtype = RAR_INVPCID_ADDR;
+		} else {
+			p->subtype = RAR_INVPCID_PCID;
+		}
+	} else {
+		/*
+		 * Unfortunately RAR_INVPCID_ADDR excludes global translations.
+		 * Always do a full flush for kernel invalidations.
+		 */
+		p->subtype = RAR_INVPCID_ALL;
+	}

+	/* Ensure all writes are visible before the action entry is set. */
+	smp_wmb();
+}
+
+static void set_action_entry(unsigned long payload_nr, int target_cpu)
+{
+	u8 *bitmap = per_cpu(rar_action, target_cpu);
+
+	/*
+	 * Given a remote CPU, "arm" its action vector to ensure it handles
+	 * the request at payload_nr when it receives a RAR signal.
+	 * The remote CPU will overwrite RAR_PENDING when it handles
+	 * the request.
+	 */
+	WRITE_ONCE(bitmap[payload_nr], RAR_PENDING);
+}
+
+static void wait_for_action_done(unsigned long payload_nr, int target_cpu)
+{
+	u8 status;
+	u8 *rar_actions = per_cpu(rar_action, target_cpu);
+
+	status = READ_ONCE(rar_actions[payload_nr]);
+
+	while (status == RAR_PENDING) {
+		cpu_relax();
+		status = READ_ONCE(rar_actions[payload_nr]);
+	}
+
+	WARN_ON_ONCE(rar_actions[payload_nr] != RAR_SUCCESS);
+}
+
+void rar_cpu_init(void)
+{
+	u8 *bitmap;
+	u64 r;
+
+	/* Check if this CPU was already initialized. */
+	rdmsrl(MSR_IA32_RAR_PAYLOAD_BASE, r);
+	if (r == (u64)virt_to_phys(rar_payload))
+		return;
+
+	bitmap = this_cpu_ptr(rar_action);
+	memset(bitmap, 0, RAR_MAX_PAYLOADS);
+	wrmsrl(MSR_IA32_RAR_ACT_VEC, (u64)virt_to_phys(bitmap));
+	wrmsrl(MSR_IA32_RAR_PAYLOAD_BASE, (u64)virt_to_phys(rar_payload));
+
+	/*
+	 * Allow RAR events to be processed while interrupts are disabled on
+	 * a target CPU. This prevents "pileups" where many CPUs are waiting
+	 * on one CPU that has IRQs blocked for too long, and should reduce
+	 * contention on the rar_payload table.
+	 */
+	wrmsrl(MSR_IA32_RAR_CTRL, RAR_CTRL_ENABLE | RAR_CTRL_IGNORE_IF);
+}
+
+void rar_boot_cpu_init(void)
+{
+	int max_payloads;
+	u64 r;
+
+	/* The MSR contains N defining the max [0-N] rar payload slots. */
+	rdmsrl(MSR_IA32_RAR_INFO, r);
+	max_payloads = (r >> 32) + 1;
+
+	/* If this CPU supports less than RAR_MAX_PAYLOADS, lower our limit. */
+	if (max_payloads < rar_max_payloads)
+		rar_max_payloads = max_payloads;
+	pr_info("RAR: support %d payloads\n", max_payloads);
+
+	for (r = 0; r < rar_max_payloads; r++)
+		rar_locks[r].lock = __RAW_SPIN_LOCK_UNLOCKED(rar_lock);
+
+	/* Initialize the boot CPU early to handle early boot flushes. */
+	rar_cpu_init();
+}
+
+/*
+ * Inspired by smp_call_function_many(), but RAR requires a global payload
+ * table rather than per-CPU payloads in the CSD table, because the action
+ * handler is microcode rather than software.
+ */
+void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
+		       unsigned long start, unsigned long end)
+{
+	unsigned long pages = (end - start + PAGE_SIZE) / PAGE_SIZE;
+	int cpu, this_cpu = smp_processor_id();
+	cpumask_t *dest_mask;
+	unsigned long payload_nr;
+
+	/* Catch the "end - start + PAGE_SIZE" overflow above. */
+	if (end == TLB_FLUSH_ALL)
+		pages = RAR_INVLPG_MAX_PAGES + 1;
+
+	/*
+	 * Can deadlock when called with interrupts disabled.
+	 * Allow CPUs that are not yet online though, as no one else can
+	 * send smp call function interrupt to this CPU and as such deadlocks
+	 * can't happen.
+	 */
+	if (cpu_online(this_cpu) && !oops_in_progress && !early_boot_irqs_disabled) {
+		lockdep_assert_irqs_enabled();
+		lockdep_assert_preemption_disabled();
+	}
+
+	/*
+	 * A CPU needs to be initialized in order to process RARs.
+	 * Skip offline CPUs.
+	 *
+	 * TODO:
+	 * - Skip RAR to CPUs that are in a deeper C-state, with an empty TLB
+	 *
+	 * This code cannot use the should_flush_tlb() logic here because
+	 * RAR flushes do not update the tlb_gen, resulting in unnecessary
+	 * flushes at context switch time.
+	 */
+	dest_mask = this_cpu_ptr(&rar_cpu_mask);
+	cpumask_and(dest_mask, mask, cpu_online_mask);
+
+	/* Some callers race with other CPUs changing the passed mask */
+	if (unlikely(!cpumask_weight(dest_mask)))
+		return;
+
+	payload_nr = get_payload_slot();
+	set_payload(&rar_payload[payload_nr], pcid, start, pages);
+
+	for_each_cpu(cpu, dest_mask)
+		set_action_entry(payload_nr, cpu);
+
+	/* Send a message to all CPUs in the map */
+	native_send_rar_ipi(dest_mask);
+
+	for_each_cpu(cpu, dest_mask)
+		wait_for_action_done(payload_nr, cpu);
+
+	free_payload_slot(payload_nr);
+}
+EXPORT_SYMBOL(smp_call_rar_many);
--
2.49.0
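A usage sketch of the new primitive, mirroring how the next patch calls it (helper name assumed): the caller disables preemption, picks the target CPUs and a PCID (0 requests a flush that is not limited to a single PCID), and smp_call_rar_many() fills a payload slot, sends the RAR IPI and waits until every target CPU reports RAR_SUCCESS.

static void flush_kernel_range_with_rar(unsigned long start, unsigned long end)
{
	/* Stay on this CPU while waiting for the targets to finish. */
	guard(preempt)();
	smp_call_rar_many(cpu_online_mask, 0, start, end);
}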
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Rik van Riel
Subject: [RFC PATCH v4 6/8] x86/mm: use RAR for kernel TLB flushes
Date: Thu, 19 Jun 2025 16:03:58 -0400
Message-ID: <20250619200442.1694583-7-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Rik van Riel

Use Intel RAR for kernel TLB flushes, when enabled.

Pass in PCID 0 to smp_call_rar_many() to flush the specified addresses,
regardless of which PCID they might be cached under in any destination CPU.

Signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 39f80111e6f1..8931f7029d6c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "mm_internal.h"
@@ -1468,6 +1469,18 @@ static void do_flush_tlb_all(void *info)
 	__flush_tlb_all();
 }

+static void rar_full_flush(const cpumask_t *cpumask)
+{
+	guard(preempt)();
+	smp_call_rar_many(cpumask, 0, 0, TLB_FLUSH_ALL);
+	invpcid_flush_all();
+}
+
+static void rar_flush_all(void)
+{
+	rar_full_flush(cpu_online_mask);
+}
+
 void flush_tlb_all(void)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
@@ -1475,6 +1488,8 @@ void flush_tlb_all(void)
 	/* First try (faster) hardware-assisted TLB invalidation. */
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_flush_all();
 	else
 		/* Fall back to the IPI-based invalidation. */
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
@@ -1504,15 +1519,36 @@ static void do_kernel_range_flush(void *info)
 	struct flush_tlb_info *f = info;
 	unsigned long addr;

+	/*
+	 * With PTI kernel TLB entries in all PCIDs need to be flushed.
+	 * With RAR the PCID space becomes so large, we might as well flush it all.
+	 *
+	 * Either of the two by itself works with targeted flushes.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_RAR) &&
+	    cpu_feature_enabled(X86_FEATURE_PTI)) {
+		invpcid_flush_all();
+		return;
+	}
+
 	/* flush range by one by one 'invlpg' */
 	for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
 		flush_tlb_one_kernel(addr);
 }

+static void rar_kernel_range_flush(struct flush_tlb_info *info)
+{
+	guard(preempt)();
+	smp_call_rar_many(cpu_online_mask, 0, info->start, info->end);
+	do_kernel_range_flush(info);
+}
+
 static void kernel_tlb_flush_all(struct flush_tlb_info *info)
 {
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_flush_all();
 	else
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
 }
@@ -1521,6 +1557,8 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 {
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_kernel_range_flush(info);
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_kernel_range_flush(info);
 	else
 		on_each_cpu(do_kernel_range_flush, info, 1);
 }
--
2.49.0
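Existing callers need no changes to benefit: anything that already goes through the generic kernel-range flush API now reaches the RAR path on CPUs with X86_FEATURE_RAR. A hypothetical caller:

static void unmap_kernel_buffer(unsigned long addr, unsigned long size)
{
	/* ... clear the kernel PTEs for [addr, addr + size) ... */

	/* Dispatches to INVLPGB, RAR or IPIs depending on CPU features. */
	flush_tlb_kernel_range(addr, addr + size);
}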
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org, Rik van Riel
Subject: [RFC PATCH v4 7/8] x86/mm: userspace & pageout flushing using Intel RAR
Date: Thu, 19 Jun 2025 16:03:59 -0400
Message-ID: <20250619200442.1694583-8-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Rik van Riel

Use Intel RAR to flush userspace mappings.

Because RAR flushes are targeted using a cpu bitmap, the rules are a
little bit different than for true broadcast TLB invalidation.

For true broadcast TLB invalidation, like done with AMD INVLPGB, a global
ASID always has up to date TLB entries on every CPU. The context switch
code never has to flush the TLB when switching to a global ASID on any
CPU with INVLPGB.

For RAR, the TLB mappings for a global ASID are kept up to date only on
CPUs within the mm_cpumask, which lazily follows the threads around the
system. The context switch code does not need to flush the TLB if the
CPU is in the mm_cpumask, and the PCID used stays the same.

However, a CPU that falls outside of the mm_cpumask can have out of date
TLB mappings for this task. When switching to that task on a CPU not in
the mm_cpumask, the TLB does need to be flushed.
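Put as a condition (an illustrative sketch with an assumed helper name, not code from the patch), the rule described above is:

static bool global_asid_needs_flush(struct mm_struct *next, int cpu)
{
	/* INVLPGB keeps a global ASID valid on every CPU. */
	if (!cpu_feature_enabled(X86_FEATURE_RAR))
		return false;

	/* With RAR it is only guaranteed valid on CPUs still in the mm_cpumask. */
	return !cpumask_test_cpu(cpu, mm_cpumask(next));
}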
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/tlbflush.h |   9 +-
 arch/x86/mm/tlb.c               | 217 ++++++++++++++++++++++++++------
 2 files changed, 182 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index e9b81876ebe4..21bd9162df38 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,7 +250,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 {
 	u16 asid;

-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return 0;

 	asid = smp_load_acquire(&mm->context.global_asid);
@@ -263,7 +264,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)

 static inline void mm_init_global_asid(struct mm_struct *mm)
 {
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		mm->context.global_asid = 0;
 		mm->context.asid_transition = false;
 	}
@@ -287,7 +289,8 @@ static inline void mm_clear_asid_transition(struct mm_struct *mm)

 static inline bool mm_in_asid_transition(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;

 	return mm && READ_ONCE(mm->context.asid_transition);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8931f7029d6c..590742838e43 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -222,7 +222,8 @@ struct new_asid {
 	unsigned int need_flush : 1;
 };

-static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
+static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
+				       bool new_cpu)
 {
 	struct new_asid ns;
 	u16 asid;
@@ -235,14 +236,22 @@ static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)

 	/*
 	 * TLB consistency for global ASIDs is maintained with hardware assisted
-	 * remote TLB flushing. Global ASIDs are always up to date.
+	 * remote TLB flushing. Global ASIDs are always up to date with INVLPGB,
+	 * and up to date for CPUs in the mm_cpumask with RAR.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		u16 global_asid = mm_global_asid(next);

 		if (global_asid) {
 			ns.asid = global_asid;
 			ns.need_flush = 0;
+			/*
+			 * If the CPU fell out of the cpumask, it can be
+			 * out of date with RAR, and should be flushed.
+			 */
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				ns.need_flush = new_cpu;
 			return ns;
 		}
 	}
@@ -300,7 +309,14 @@ static void reset_global_asid_space(void)
 {
 	lockdep_assert_held(&global_asid_lock);

-	invlpgb_flush_all_nonglobals();
+	/*
+	 * The global flush ensures that a freshly allocated global ASID
+	 * has no entries in any TLB, and can be used immediately.
+	 * With Intel RAR, the TLB may still need to be flushed at context
+	 * switch time when dealing with a CPU that was not in the mm_cpumask
+	 * for the process, and may have missed flushes along the way.
+	 */
+	flush_tlb_all();

 	/*
 	 * The TLB flush above makes it safe to re-use the previously
@@ -377,7 +393,7 @@ static void use_global_asid(struct mm_struct *mm)
 {
 	u16 asid;

-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);

 	/* This process is already using broadcast TLB invalidation. */
 	if (mm_global_asid(mm))
 		return;
@@ -403,13 +419,14 @@ static void use_global_asid(struct mm_struct *mm)

 void mm_free_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;

 	if (!mm_global_asid(mm))
 		return;

-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);

 	/* The global ASID can be re-used only after flush at wrap-around. */
 #ifdef CONFIG_BROADCAST_TLB_FLUSH
@@ -427,7 +444,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
 {
 	u16 global_asid = mm_global_asid(mm);

-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;

 	/* Process is transitioning to a global ASID */
@@ -445,7 +463,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
  */
 static void consider_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;

 	/* Check every once in a while. */
@@ -490,6 +509,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 		 * that results in a (harmless) extra IPI.
 		 */
 		if (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm_asid, cpu)) != bc_asid) {
+			info->trim_cpumask = true;
 			flush_tlb_multi(mm_cpumask(info->mm), info);
 			return;
 		}
@@ -499,7 +519,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 	mm_clear_asid_transition(mm);
 }

-static void broadcast_tlb_flush(struct flush_tlb_info *info)
+static void invlpgb_tlb_flush(struct flush_tlb_info *info)
 {
 	bool pmd = info->stride_shift == PMD_SHIFT;
 	unsigned long asid = mm_global_asid(info->mm);
@@ -530,8 +550,6 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
 		addr += nr << info->stride_shift;
 	} while (addr < info->end);

-	finish_asid_transition(info);
-
 	/* Wait for the INVLPGBs kicked off above to finish. */
 	__tlbsync();
 }
@@ -862,7 +880,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 	/* Check if the current mm is transitioning to a global ASID */
 	if (mm_needs_global_asid(next, prev_asid)) {
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, true);
 		goto reload_tlb;
 	}

@@ -900,6 +918,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		ns.asid = prev_asid;
 		ns.need_flush = true;
 	} else {
+		bool new_cpu = false;
 		/*
 		 * Apply process to process speculation vulnerability
 		 * mitigations if applicable.
@@ -914,20 +933,25 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();

-		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
-		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
+		/* Start receiving IPIs and RAR invalidations */
+		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next))) {
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				new_cpu = true;
+		}
+
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);

-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, new_cpu);
 	}

 reload_tlb:
 	new_lam = mm_lam_cr3_mask(next);
 	if (ns.need_flush) {
-		VM_WARN_ON_ONCE(is_global_asid(ns.asid));
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		if (is_dyn_asid(ns.asid)) {
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		}
 		load_new_mm_cr3(next->pgd, ns.asid, new_lam, true);

 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -1115,7 +1139,7 @@ static void flush_tlb_func(void *info)
 	const struct flush_tlb_info *f = info;
 	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	u64 local_tlb_gen;
+	u64 local_tlb_gen = 0;
 	bool local = smp_processor_id() == f->initiating_cpu;
 	unsigned long nr_invalidate = 0;
 	u64 mm_tlb_gen;
@@ -1138,19 +1162,6 @@ static void flush_tlb_func(void *info)
 	if (unlikely(loaded_mm == &init_mm))
 		return;

-	/* Reload the ASID if transitioning into or out of a global ASID */
-	if (mm_needs_global_asid(loaded_mm, loaded_mm_asid)) {
-		switch_mm_irqs_off(NULL, loaded_mm, NULL);
-		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	}
-
-	/* Broadcast ASIDs are always kept up to date with INVLPGB. */
-	if (is_global_asid(loaded_mm_asid))
-		return;
-
-	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
-		   loaded_mm->context.ctx_id);
-
 	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
 		/*
 		 * We're in lazy mode. We need to at least flush our
@@ -1161,11 +1172,31 @@ static void flush_tlb_func(void *info)
 		 * This should be rare, with native_flush_tlb_multi() skipping
 		 * IPIs to lazy TLB mode CPUs.
 		 */
+		cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(loaded_mm));
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
 		return;
 	}

-	local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+	/* Reload the ASID if transitioning into or out of a global ASID */
+	if (mm_needs_global_asid(loaded_mm, loaded_mm_asid)) {
+		switch_mm_irqs_off(NULL, loaded_mm, NULL);
+		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	}
+
+	/*
+	 * Broadcast ASIDs are always kept up to date with INVLPGB; with
+	 * Intel RAR IPI based flushes are used periodically to trim the
+	 * mm_cpumask, and flushes that get here should be processed.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    is_global_asid(loaded_mm_asid))
+		return;
+
+	VM_WARN_ON(is_dyn_asid(loaded_mm_asid) && loaded_mm->context.ctx_id !=
+		   this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id));
+
+	if (is_dyn_asid(loaded_mm_asid))
+		local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);

 	if (unlikely(f->new_tlb_gen != TLB_GENERATION_INVALID &&
 		     f->new_tlb_gen <= local_tlb_gen)) {
@@ -1264,7 +1295,8 @@ static void flush_tlb_func(void *info)
 	}

 	/* Both paths above update our state to mm_tlb_gen. */
-	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+	if (is_dyn_asid(loaded_mm_asid))
+		this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);

 	/* Tracing is done in a unified manner to reduce the code size */
 done:
@@ -1305,15 +1337,15 @@ static bool should_flush_tlb(int cpu, void *data)
 	if (loaded_mm == info->mm)
 		return true;

-	/* In cpumask, but not the loaded mm? Periodically remove by flushing. */
-	if (info->trim_cpumask)
-		return true;
-
 	return false;
 }

 static bool should_trim_cpumask(struct mm_struct *mm)
 {
+	/* INVLPGB always goes to all CPUs. No need to trim the mask. */
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && mm_global_asid(mm))
+		return false;
+
 	if (time_after(jiffies, READ_ONCE(mm->context.next_trim_cpumask))) {
 		WRITE_ONCE(mm->context.next_trim_cpumask, jiffies + HZ);
 		return true;
@@ -1324,6 +1356,27 @@ static bool should_trim_cpumask(struct mm_struct *mm)
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
 EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared);

+static bool should_flush_all(const struct flush_tlb_info *info)
+{
+	if (info->freed_tables)
+		return true;
+
+	if (info->trim_cpumask)
+		return true;
+
+	/*
+	 * INVLPGB and RAR do not use this code path normally.
+	 * This call cleans up the cpumask or ASID transition.
+	 */
+	if (mm_global_asid(info->mm))
+		return true;
+
+	if (mm_in_asid_transition(info->mm))
+		return true;
+
+	return false;
+}
+
 STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 					const struct flush_tlb_info *info)
 {
@@ -1349,7 +1402,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * up on the new contents of what used to be page tables, while
 	 * doing a speculative memory access.
 	 */
-	if (info->freed_tables || mm_in_asid_transition(info->mm))
+	if (should_flush_all(info))
 		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
 	else
 		on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
@@ -1380,6 +1433,74 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
 static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
 #endif

+static void trim_cpumask_func(void *data)
+{
+	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	const struct flush_tlb_info *f = data;
+
+	/*
+	 * Clearing this bit from an IRQ handler synchronizes against
+	 * the bit being set in switch_mm_irqs_off, with IRQs disabled.
+	 */
+	if (f->mm != loaded_mm)
+		cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(f->mm));
+}
+
+static bool should_remove_cpu_from_mask(int cpu, void *data)
+{
+	struct mm_struct *loaded_mm = per_cpu(cpu_tlbstate.loaded_mm, cpu);
+	struct flush_tlb_info *info = data;
+
+	if (loaded_mm != info->mm)
+		return true;
+
+	return false;
+}
+
+/* Remove CPUs from the mm_cpumask that are running another mm. */
+static void trim_cpumask(struct flush_tlb_info *info)
+{
+	cpumask_t *cpumask = mm_cpumask(info->mm);
+	on_each_cpu_cond_mask(should_remove_cpu_from_mask, trim_cpumask_func,
+			      (void *)info, 1, cpumask);
+}
+
+static void rar_tlb_flush(struct flush_tlb_info *info)
+{
+	unsigned long asid = mm_global_asid(info->mm);
+	cpumask_t *cpumask = mm_cpumask(info->mm);
+	u16 pcid = kern_pcid(asid);
+
+	if (info->trim_cpumask)
+		trim_cpumask(info);
+
+	/* Only the local CPU needs to be flushed? */
+	if (cpumask_equal(cpumask, cpumask_of(raw_smp_processor_id()))) {
+		lockdep_assert_irqs_enabled();
+		local_irq_disable();
+		flush_tlb_func(info);
+		local_irq_enable();
+		return;
+	}
+
+	/* Flush all the CPUs at once with RAR. */
+	if (cpumask_weight(cpumask)) {
+		smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
+		if (cpu_feature_enabled(X86_FEATURE_PTI))
+			smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
+	}
+}
+
+static void broadcast_tlb_flush(struct flush_tlb_info *info)
+{
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		invlpgb_tlb_flush(info);
+	else /* Intel RAR */
+		rar_tlb_flush(info);
+
+	finish_asid_transition(info);
+}
+
 static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 			unsigned long start, unsigned long end,
 			unsigned int stride_shift, bool freed_tables,
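rar_tlb_flush() sends the invalidation twice when page table isolation is enabled because one mm maps to two hardware PCIDs, one for the kernel and one for the userspace page tables. A simplified sketch of the existing helpers it relies on (the real versions live in arch/x86/mm/tlb.c; the bodies here are an approximation):

static inline u16 kern_pcid(u16 asid)
{
	/* ASID 0 maps to PCID 1, and so on; PCID 0 is left for PCID-unaware code. */
	return asid + 1;
}

static inline u16 user_pcid(u16 asid)
{
	u16 ret = kern_pcid(asid);
#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
	/* The PTI userspace page tables run with the "user" bit set in the PCID. */
	ret |= 1 << X86_CR3_PTI_PCID_USER_BIT;
#endif
	return ret;
}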
@@ -1440,6 +1561,13 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
 				  new_tlb_gen);
 
+	/*
+	 * IPIs and RAR can be targeted to a cpumask. Periodically trim that
+	 * mm_cpumask by sending TLB flush IPIs, even when most TLB flushes
+	 * are done with RAR.
+	 */
+	info->trim_cpumask = should_trim_cpumask(mm);
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
@@ -1448,7 +1576,6 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	if (mm_global_asid(mm)) {
 		broadcast_tlb_flush(info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
@@ -1759,6 +1886,14 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->unmapped_pages) {
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
+	} else if (cpu_feature_enabled(X86_FEATURE_RAR) && cpumask_any(&batch->cpumask) < nr_cpu_ids) {
+		rar_full_flush(&batch->cpumask);
+		if (cpumask_test_cpu(cpu, &batch->cpumask)) {
+			lockdep_assert_irqs_enabled();
+			local_irq_disable();
+			invpcid_flush_all_nonglobals();
+			local_irq_enable();
+		}
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
-- 
2.49.0
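In the arch_tlbbatch_flush() hunk above, the initiating CPU does not rely on the RAR reaching itself: after rar_full_flush() is sent to the batch cpumask, the local CPU flushes its own non-global translations with invpcid_flush_all_nonglobals(), with interrupts disabled as they would be in IPI context. For reference, that helper boils down to a single INVPCID of the "all non-global" type; a sketch of the existing code in arch/x86/include/asm/invpcid.h:

#define INVPCID_TYPE_ALL_NON_GLOBAL	3

static inline void __invpcid(unsigned long pcid, unsigned long addr,
			     unsigned long type)
{
	struct { u64 d[2]; } desc = { { pcid, addr } };

	/* The memory clobber keeps memory accesses from being reordered
	 * around the TLB invalidation. */
	asm volatile("invpcid %[desc], %[type]"
		     :: [desc] "m" (desc), [type] "r" (type) : "memory");
}

/* Flush all mappings, except globals, for all PCIDs on this CPU. */
static inline void invpcid_flush_all_nonglobals(void)
{
	__invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL);
}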
From nobody Thu Oct 9 04:43:55 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org, nadav.amit@gmail.com,
	seanjc@google.com, tglx@linutronix.de, mingo@kernel.org,
	Rik van Riel, Rik van Riel
Subject: [RFC PATCH v4 8/8] x86/tlb: flush the local TLB twice (DEBUG)
Date: Thu, 19 Jun 2025 16:04:00 -0400
Message-ID: <20250619200442.1694583-9-riel@surriel.com>
In-Reply-To: <20250619200442.1694583-1-riel@surriel.com>
References: <20250619200442.1694583-1-riel@surriel.com>

From: Rik van Riel

The RAR code attempts to flush the local TLB in addition to remote TLBs,
if the local CPU is in the cpumask. This can be seen in the fact that the
status for the local CPU changes from RAR_PENDING to RAR_SUCCESS.

However, it appears that the local TLB is not actually getting flushed
when the microcode flips the status to RAR_SUCCESS!

The RAR white paper suggests it should work: "At this point, the ILP may
invalidate its own TLB by signaling RAR to itself in order to invoke the
RAR handler locally as well."

I would really appreciate some guidance from Intel on how to move forward
here. Is the RAR code doing something wrong? Is the CPU not behaving quite
as documented? What is the best way forward?

Not-signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 590742838e43..f12eff2dbcc8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1469,22 +1469,22 @@ static void rar_tlb_flush(struct flush_tlb_info *info)
 {
 	unsigned long asid = mm_global_asid(info->mm);
 	cpumask_t *cpumask = mm_cpumask(info->mm);
+	int cpu = raw_smp_processor_id();
 	u16 pcid = kern_pcid(asid);
 
 	if (info->trim_cpumask)
 		trim_cpumask(info);
 
 	/* Only the local CPU needs to be flushed? */
-	if (cpumask_equal(cpumask, cpumask_of(raw_smp_processor_id()))) {
+	if (cpumask_test_cpu(cpu, cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
 		flush_tlb_func(info);
 		local_irq_enable();
-		return;
 	}
 
 	/* Flush all the CPUs at once with RAR. */
-	if (cpumask_weight(cpumask)) {
+	if (cpumask_any_but(cpumask, cpu)) {
 		smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
 		if (cpu_feature_enabled(X86_FEATURE_PTI))
 			smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
-- 
2.49.0
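For readability, this is roughly what rar_tlb_flush() looks like with the DEBUG change applied, reconstructed from the hunk above and from the earlier patch in this series that introduces the function; context lines outside the quoted hunks are taken from those diffs, not verified independently:

static void rar_tlb_flush(struct flush_tlb_info *info)
{
	unsigned long asid = mm_global_asid(info->mm);
	cpumask_t *cpumask = mm_cpumask(info->mm);
	int cpu = raw_smp_processor_id();
	u16 pcid = kern_pcid(asid);

	if (info->trim_cpumask)
		trim_cpumask(info);

	/* Flush the local TLB in software, without returning early... */
	if (cpumask_test_cpu(cpu, cpumask)) {
		lockdep_assert_irqs_enabled();
		local_irq_disable();
		flush_tlb_func(info);
		local_irq_enable();
	}

	/*
	 * ...and still send the RAR. cpumask_any_but() returns a CPU
	 * number, not a boolean: it is nr_cpu_ids (nonzero) when only the
	 * local CPU is set, so the RAR is sent even in the purely local
	 * case, asking the local CPU to flush a second time.
	 */
	if (cpumask_any_but(cpumask, cpu)) {
		smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
		if (cpu_feature_enabled(X86_FEATURE_PTI))
			smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
	}
}

Note that cpumask_any_but() evaluates to 0, skipping the RAR, in the corner case where the first CPU in the mask other than the local one is CPU 0; for a debug-only change that corner case is presumably acceptable.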