From nobody Fri Dec 19 17:37:52 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Yu-cheng Yu, Rik van Riel
Subject: [RFC PATCH v3 1/7] x86/mm: Introduce Remote Action Request MSRs
Date: Thu, 5 Jun 2025 12:35:10 -0400
Message-ID: <20250605163544.3852565-2-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>

From: Yu-cheng Yu

Remote Action Request (RAR) is a model-specific feature that speeds up
inter-processor operations by moving parts of those operations from
software to hardware. The current RAR implementation handles TLB
flushes and MSR writes. This patch introduces the RAR MSRs; RAR itself
is introduced in later patches.
There are five RAR MSRs:

	MSR_CORE_CAPABILITIES
	MSR_IA32_RAR_CTRL
	MSR_IA32_RAR_ACT_VEC
	MSR_IA32_RAR_PAYLOAD_BASE
	MSR_IA32_RAR_INFO

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/msr-index.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..367a62c50aa2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -110,6 +110,8 @@
 
 /* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
 #define MSR_IA32_CORE_CAPS			0x000000cf
+#define MSR_IA32_CORE_CAPS_RAR_BIT		1
+#define MSR_IA32_CORE_CAPS_RAR			BIT(MSR_IA32_CORE_CAPS_RAR_BIT)
 #define MSR_IA32_CORE_CAPS_INTEGRITY_CAPS_BIT	2
 #define MSR_IA32_CORE_CAPS_INTEGRITY_CAPS	BIT(MSR_IA32_CORE_CAPS_INTEGRITY_CAPS_BIT)
 #define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT 5
@@ -122,6 +124,17 @@
 #define SNB_C3_AUTO_UNDEMOTE		(1UL << 27)
 #define SNB_C1_AUTO_UNDEMOTE		(1UL << 28)
 
+/*
+ * Remote Action Requests (RAR) MSRs
+ */
+#define MSR_IA32_RAR_CTRL		0x000000ed
+#define MSR_IA32_RAR_ACT_VEC		0x000000ee
+#define MSR_IA32_RAR_PAYLOAD_BASE	0x000000ef
+#define MSR_IA32_RAR_INFO		0x000000f0
+
+#define RAR_CTRL_ENABLE			BIT(31)
+#define RAR_CTRL_IGNORE_IF		BIT(30)
+
 #define MSR_MTRRcap			0x000000fe
 
 #define MSR_IA32_ARCH_CAPABILITIES	0x0000010a
-- 
2.49.0

From nobody Fri Dec 19 17:37:52 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Rik van Riel
Subject: [RFC PATCH v3 2/7] x86/mm: enable BROADCAST_TLB_FLUSH on Intel, too
Date: Thu, 5 Jun 2025 12:35:11 -0400
Message-ID: <20250605163544.3852565-3-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>

From: Rik van Riel

Much of the code for Intel RAR and AMD INVLPGB is shared. Place both
under the same config option.
Signed-off-by: Rik van Riel
---
 arch/x86/Kconfig.cpu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f928cf6e3252..ab763f69f54d 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -360,7 +360,7 @@ menuconfig PROCESSOR_SELECT
 
 config BROADCAST_TLB_FLUSH
 	def_bool y
-	depends on CPU_SUP_AMD && 64BIT
+	depends on (CPU_SUP_AMD || CPU_SUP_INTEL) && 64BIT && SMP
 
 config CPU_SUP_INTEL
 	default y
-- 
2.49.0

From nobody Fri Dec 19 17:37:52 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Yu-cheng Yu, Rik van Riel
Subject: [RFC PATCH v3 3/7] x86/mm: Introduce X86_FEATURE_RAR
Date: Thu, 5 Jun 2025 12:35:12 -0400
Message-ID: <20250605163544.3852565-4-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>

From: Yu-cheng Yu

Introduce X86_FEATURE_RAR and enumeration of the feature.

[riel: moved initialization to intel.c and disabling to Kconfig.cpufeatures]

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/Kconfig.cpufeatures       | 4 ++++
 arch/x86/include/asm/cpufeatures.h | 2 +-
 arch/x86/kernel/cpu/intel.c        | 9 +++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index 250c10627ab3..7d459b5f47f7 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -195,3 +195,7 @@ config X86_DISABLED_FEATURE_SEV_SNP
 config X86_DISABLED_FEATURE_INVLPGB
 	def_bool y
 	depends on !BROADCAST_TLB_FLUSH
+
+config X86_DISABLED_FEATURE_RAR
+	def_bool y
+	depends on !BROADCAST_TLB_FLUSH
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ee176236c2be..e6781541ffce 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -76,7 +76,7 @@
 #define X86_FEATURE_K8			( 3*32+ 4) /* Opteron, Athlon64 */
 #define X86_FEATURE_ZEN5		( 3*32+ 5) /* CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_ZEN6		( 3*32+ 6) /* CPU based on Zen6 microarchitecture */
-/* Free					( 3*32+ 7) */
+#define X86_FEATURE_RAR			( 3*32+ 7) /* Intel Remote Action Request */
 #define X86_FEATURE_CONSTANT_TSC	( 3*32+ 8) /* "constant_tsc" TSC ticks at a constant rate */
 #define X86_FEATURE_UP			( 3*32+ 9) /* "up" SMP kernel running on UP */
 #define X86_FEATURE_ART			( 3*32+10) /* "art" Always running timer (ART) */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 076eaa41b8c8..f5cac46e1b91 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -335,6 +335,15 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	 */
 	if (cpu_has(c, X86_FEATURE_TME))
 		detect_tme_early(c);
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		u64 msr;
+
+		rdmsrl(MSR_IA32_CORE_CAPS, msr);
+
+		if (msr & MSR_IA32_CORE_CAPS_RAR)
+			setup_force_cpu_cap(X86_FEATURE_RAR);
+	}
 }
 
 static void bsp_init_intel(struct cpuinfo_x86 *c)
-- 
2.49.0

From nobody Fri Dec 19 17:37:52 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Yu-cheng Yu, Rik van Riel
Subject: [RFC PATCH v3 4/7] x86/apic: Introduce Remote Action Request Operations
Date: Thu, 5 Jun 2025 12:35:13 -0400
Message-ID: <20250605163544.3852565-5-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>

From: Yu-cheng Yu

RAR TLB flushing is started by sending a command to the APIC.
This patch adds the Remote Action Request commands.

Because RAR_VECTOR is hardcoded at 0xe0, POSTED_MSI_NOTIFICATION_VECTOR
has to be lowered to 0xdf, reducing the number of available vectors by 13.
[riel: refactor after 6 years of changes, lower POSTED_MSI_NOTIFICATION_VECTOR]

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/apicdef.h     | 1 +
 arch/x86/include/asm/irq_vectors.h | 7 ++++++-
 arch/x86/include/asm/smp.h         | 1 +
 arch/x86/kernel/apic/ipi.c         | 5 +++++
 arch/x86/kernel/apic/local.h       | 3 +++
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 094106b6a538..b152d45af91a 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -92,6 +92,7 @@
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
+#define		APIC_DM_RAR		0x00300
 #define		APIC_DM_NMI		0x00400
 #define		APIC_DM_INIT		0x00500
 #define		APIC_DM_STARTUP		0x00600
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 47051871b436..52a0cf56562a 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -97,11 +97,16 @@
 
 #define LOCAL_TIMER_VECTOR		0xec
 
+/*
+ * RAR (remote action request) TLB flush
+ */
+#define RAR_VECTOR			0xe0
+
 /*
  * Posted interrupt notification vector for all device MSIs delivered to
  * the host kernel.
  */
-#define POSTED_MSI_NOTIFICATION_VECTOR	0xeb
+#define POSTED_MSI_NOTIFICATION_VECTOR	0xdf
 
 #define NR_VECTORS			256
 
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 0c1c68039d6f..0e5ad0dc987a 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -120,6 +120,7 @@ void __noreturn mwait_play_dead(unsigned int eax_hint);
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
+void native_send_rar_ipi(const struct cpumask *mask);
 
 asmlinkage __visible void smp_reboot_interrupt(void);
 __visible void smp_reschedule_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 98a57cb4aa86..9983c42619ef 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -106,6 +106,11 @@ void apic_send_nmi_to_offline_cpu(unsigned int cpu)
 		return;
 	apic->send_IPI(cpu, NMI_VECTOR);
 }
+
+void native_send_rar_ipi(const struct cpumask *mask)
+{
+	__apic_send_IPI_mask(mask, RAR_VECTOR);
+}
 #endif /* CONFIG_SMP */
 
 static inline int __prepare_ICR2(unsigned int mask)
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index bdcf609eb283..833669174267 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -38,6 +38,9 @@ static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
 	case NMI_VECTOR:
 		icr |= APIC_DM_NMI;
 		break;
+	case RAR_VECTOR:
+		icr |= APIC_DM_RAR;
+		break;
 	}
 	return icr;
 }
-- 
2.49.0

From nobody Fri Dec 19 17:37:52 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Yu-cheng Yu, Rik van Riel
Subject: [RFC PATCH v3 5/7] x86/mm: Introduce Remote Action Request
Date: Thu, 5 Jun 2025 12:35:14 -0400
Message-ID: <20250605163544.3852565-6-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>

From: Yu-cheng Yu

Remote Action Request (RAR) is a TLB flushing broadcast facility.
To start a TLB flush, the initiator CPU creates a RAR payload and
sends a command to the APIC. The receiving CPUs automatically flush
TLBs as specified in the payload, without the kernel's involvement.
[ riel: add pcid parameter to smp_call_rar_many so other mms can be flushed ]

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/rar.h   |  69 +++++++++++
 arch/x86/kernel/cpu/common.c |   4 +
 arch/x86/mm/Makefile         |   1 +
 arch/x86/mm/rar.c            | 217 +++++++++++++++++++++++++++++++++++
 4 files changed, 291 insertions(+)
 create mode 100644 arch/x86/include/asm/rar.h
 create mode 100644 arch/x86/mm/rar.c

diff --git a/arch/x86/include/asm/rar.h b/arch/x86/include/asm/rar.h
new file mode 100644
index 000000000000..78c039e40e81
--- /dev/null
+++ b/arch/x86/include/asm/rar.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_RAR_H
+#define _ASM_X86_RAR_H
+
+/*
+ * RAR payload types
+ */
+#define RAR_TYPE_INVPG		0
+#define RAR_TYPE_INVPG_NO_CR3	1
+#define RAR_TYPE_INVPCID	2
+#define RAR_TYPE_INVEPT		3
+#define RAR_TYPE_INVVPID	4
+#define RAR_TYPE_WRMSR		5
+
+/*
+ * Subtypes for RAR_TYPE_INVLPG
+ */
+#define RAR_INVPG_ADDR			0 /* address specific */
+#define RAR_INVPG_ALL			2 /* all, include global */
+#define RAR_INVPG_ALL_NO_GLOBAL		3 /* all, exclude global */
+
+/*
+ * Subtypes for RAR_TYPE_INVPCID
+ */
+#define RAR_INVPCID_ADDR		0 /* address specific */
+#define RAR_INVPCID_PCID		1 /* all of PCID */
+#define RAR_INVPCID_ALL			2 /* all, include global */
+#define RAR_INVPCID_ALL_NO_GLOBAL	3 /* all, exclude global */
+
+/*
+ * Page size for RAR_TYPE_INVLPG
+ */
+#define RAR_INVLPG_PAGE_SIZE_4K	0
+#define RAR_INVLPG_PAGE_SIZE_2M	1
+#define RAR_INVLPG_PAGE_SIZE_1G	2
+
+/*
+ * Max number of pages per payload
+ */
+#define RAR_INVLPG_MAX_PAGES	63
+
+struct rar_payload {
+	u64 for_sw		: 8;
+	u64 type		: 8;
+	u64 must_be_zero_1	: 16;
+	u64 subtype		: 3;
+	u64 page_size		: 2;
+	u64 num_pages		: 6;
+	u64 must_be_zero_2	: 21;
+
+	u64 must_be_zero_3;
+
+	/*
+	 * Starting address
+	 */
+	u64 initiator_cr3;
+	u64 linear_address;
+
+	/*
+	 * Padding
+	 */
+	u64 padding[4];
+};
+
+void rar_cpu_init(void);
+void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
+		       unsigned long start, unsigned long end);
+
+#endif /* _ASM_X86_RAR_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8feb8fd2957a..d68a0a9b2aa2 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -71,6 +71,7 @@
 #include
 #include
 #include
+#include
 
 #include "cpu.h"
 
@@ -2425,6 +2426,9 @@ void cpu_init(void)
 	if (is_uv_system())
 		uv_cpu_init();
 
+	if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_cpu_init();
+
 	load_fixmap_gdt(cpu);
 }
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5b9908f13dcf..f36fc99e8b10 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_ACPI_NUMA)		+= srat.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)	+= pti.o
+obj-$(CONFIG_BROADCAST_TLB_FLUSH)		+= rar.o
 
 obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
diff --git a/arch/x86/mm/rar.c b/arch/x86/mm/rar.c
new file mode 100644
index 000000000000..f63e68b412de
--- /dev/null
+++ b/arch/x86/mm/rar.c
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * RAR TLB shootdown
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static DEFINE_PER_CPU(struct cpumask, rar_cpu_mask);
+
+#define RAR_ACTION_SUCCESS	0x00
+#define RAR_ACTION_PENDING	0x01
+#define RAR_ACTION_FAILURE	0x80
+
+#define RAR_MAX_PAYLOADS	64UL
+
+/* How many RAR payloads are supported by this CPU */
+static int rar_max_payloads = RAR_MAX_PAYLOADS;
+
+/* Bitmap describing which RAR payload slots are in use. */
+static unsigned long rar_in_use = ~(RAR_MAX_PAYLOADS - 1);
+
+/*
+ * RAR payloads telling CPUs what to do. This table is shared between
+ * all CPUs; it is possible to have multiple payload tables shared between
+ * different subsets of CPUs, but that adds a lot of complexity.
+ */
+static struct rar_payload rar_payload[RAR_MAX_PAYLOADS] __page_aligned_bss;
+
+/*
+ * The action vector tells each CPU which payload table entries
+ * have work for that CPU.
+ */
+static DEFINE_PER_CPU_ALIGNED(u8[RAR_MAX_PAYLOADS], rar_action);
+
+static unsigned long get_payload_slot(void)
+{
+	while (1) {
+		unsigned long bit;
+
+		/*
+		 * Find a free bit and confirm it with test_and_set_bit()
+		 * below. If no slot is free, spin until one becomes free.
+		 */
+		bit = ffz(READ_ONCE(rar_in_use));
+
+		if (bit >= rar_max_payloads)
+			continue;
+
+		if (!test_and_set_bit((long)bit, &rar_in_use))
+			return bit;
+	}
+}
+
+static void free_payload_slot(unsigned long payload_nr)
+{
+	clear_bit(payload_nr, &rar_in_use);
+}
+
+static void set_payload(struct rar_payload *p, u16 pcid, unsigned long start,
+			uint32_t pages)
+{
+	p->must_be_zero_1	= 0;
+	p->must_be_zero_2	= 0;
+	p->must_be_zero_3	= 0;
+	p->page_size		= RAR_INVLPG_PAGE_SIZE_4K;
+	p->type			= RAR_TYPE_INVPCID;
+	p->num_pages		= pages;
+	p->initiator_cr3	= pcid;
+	p->linear_address	= start;
+
+	if (pcid) {
+		/* RAR invalidation of the mapping of a specific process. */
+		if (pages >= RAR_INVLPG_MAX_PAGES)
+			p->subtype = RAR_INVPCID_PCID;
+		else
+			p->subtype = RAR_INVPCID_ADDR;
+	} else {
+		/*
+		 * Unfortunately RAR_INVPCID_ADDR excludes global translations.
+		 * Always do a full flush for kernel invalidations.
+		 */
+		p->subtype = RAR_INVPCID_ALL;
+	}
+
+	/* Ensure all writes are visible before the action entry is set. */
+	smp_wmb();
+}
+
+static void set_action_entry(unsigned long payload_nr, int target_cpu)
+{
+	u8 *bitmap = per_cpu(rar_action, target_cpu);
+
+	/*
+	 * Given a remote CPU, "arm" its action vector to ensure it handles
+	 * the request at payload_nr when it receives a RAR signal.
+	 * The remote CPU will overwrite RAR_ACTION_PENDING when it handles
+	 * the request.
+	 */
+	WRITE_ONCE(bitmap[payload_nr], RAR_ACTION_PENDING);
+}
+
+static void wait_for_action_done(unsigned long payload_nr, int target_cpu)
+{
+	u8 status;
+	u8 *rar_actions = per_cpu(rar_action, target_cpu);
+
+	status = READ_ONCE(rar_actions[payload_nr]);
+
+	while (status == RAR_ACTION_PENDING) {
+		cpu_relax();
+		status = READ_ONCE(rar_actions[payload_nr]);
+	}
+
+	WARN_ON_ONCE(rar_actions[payload_nr] != RAR_ACTION_SUCCESS);
+}
+
+void rar_cpu_init(void)
+{
+	u64 r;
+	u8 *bitmap;
+	int max_payloads;
+	int this_cpu = smp_processor_id();
+
+	cpumask_clear(&per_cpu(rar_cpu_mask, this_cpu));
+
+	/* The MSR contains N defining the max [0-N] rar payload slots. */
+	rdmsrl(MSR_IA32_RAR_INFO, r);
+	max_payloads = (r >> 32) + 1;
+
+	/* If this CPU supports less than RAR_MAX_PAYLOADS, lower our limit. */
+	if (max_payloads < rar_max_payloads)
+		rar_max_payloads = max_payloads;
+	pr_info_once("RAR: support %d payloads\n", max_payloads);
+
+	bitmap = (u8 *)per_cpu(rar_action, this_cpu);
+	memset(bitmap, 0, RAR_MAX_PAYLOADS);
+	wrmsrl(MSR_IA32_RAR_ACT_VEC, (u64)virt_to_phys(bitmap));
+	wrmsrl(MSR_IA32_RAR_PAYLOAD_BASE, (u64)virt_to_phys(rar_payload));
+
+	/*
+	 * Allow RAR events to be processed while interrupts are disabled on
+	 * a target CPU. This prevents "pileups" where many CPUs are waiting
+	 * on one CPU that has IRQs blocked for too long, and should reduce
+	 * contention on the rar_payload table.
+	 */
+	r = RAR_CTRL_ENABLE | RAR_CTRL_IGNORE_IF;
+	wrmsrl(MSR_IA32_RAR_CTRL, r);
+}
+
+/*
+ * Inspired by smp_call_function_many(), but RAR requires a global payload
+ * table rather than per-CPU payloads in the CSD table, because the action
+ * handler is microcode rather than software.
+ */
+void smp_call_rar_many(const struct cpumask *mask, u16 pcid,
+		       unsigned long start, unsigned long end)
+{
+	unsigned long pages = (end - start + PAGE_SIZE) / PAGE_SIZE;
+	int cpu, this_cpu = smp_processor_id();
+	cpumask_t *dest_mask;
+	unsigned long payload_nr;
+
+	if (pages > RAR_INVLPG_MAX_PAGES || end == TLB_FLUSH_ALL)
+		pages = RAR_INVLPG_MAX_PAGES;
+
+	/*
+	 * Can deadlock when called with interrupts disabled.
+	 * Allow CPUs that are not yet online though, as no one else can
+	 * send smp call function interrupt to this CPU and as such deadlocks
+	 * can't happen.
+	 */
+	if (cpu_online(this_cpu) && !oops_in_progress && !early_boot_irqs_disabled) {
+		lockdep_assert_irqs_enabled();
+		lockdep_assert_preemption_disabled();
+	}
+
+	/*
+	 * A CPU needs to be initialized in order to process RARs.
+	 * Skip offline CPUs.
+	 *
+	 * TODO:
+	 * - Use RAR to flush our own TLB so it can all happen in parallel
+	 *   (need to resolve a chicken-egg issue with the boot CPU)
+	 * - Skip RAR to CPUs that are in a deeper C-state, with an empty TLB
+	 *
+	 * This code cannot use the should_flush_tlb() logic here because
+	 * RAR flushes do not update the tlb_gen, resulting in unnecessary
+	 * flushes at context switch time.
+	 */
+	dest_mask = this_cpu_ptr(&rar_cpu_mask);
+	cpumask_and(dest_mask, mask, cpu_online_mask);
+	__cpumask_clear_cpu(this_cpu, dest_mask);
+
+	/* Some callers race with other CPUs changing the passed mask */
+	if (unlikely(!cpumask_weight(dest_mask)))
+		return;
+
+	payload_nr = get_payload_slot();
+	set_payload(&rar_payload[payload_nr], pcid, start, pages);
+
+	for_each_cpu(cpu, dest_mask)
+		set_action_entry(payload_nr, cpu);
+
+	/* Send a message to all CPUs in the map */
+	native_send_rar_ipi(dest_mask);
+
+	for_each_cpu(cpu, dest_mask)
+		wait_for_action_done(payload_nr, cpu);
+
+	free_payload_slot(payload_nr);
+}
+EXPORT_SYMBOL(smp_call_rar_many);
-- 
2.49.0
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Rik van Riel
Subject: [RFC PATCH v3 6/7] x86/mm: use RAR for kernel TLB flushes
Date: Thu, 5 Jun 2025 12:35:15 -0400
Message-ID: <20250605163544.3852565-7-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>
From: Rik van Riel

Use Intel RAR for kernel TLB flushes, when enabled.

Pass in PCID 0 to smp_call_rar_many() to flush the specified
addresses, regardless of which PCID they might be cached under
in any destination CPU.

Signed-off-by: Rik van Riel
---
 arch/x86/mm/tlb.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 39f80111e6f1..8931f7029d6c 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include

 #include "mm_internal.h"
@@ -1468,6 +1469,18 @@ static void do_flush_tlb_all(void *info)
 	__flush_tlb_all();
 }

+static void rar_full_flush(const cpumask_t *cpumask)
+{
+	guard(preempt)();
+	smp_call_rar_many(cpumask, 0, 0, TLB_FLUSH_ALL);
+	invpcid_flush_all();
+}
+
+static void rar_flush_all(void)
+{
+	rar_full_flush(cpu_online_mask);
+}
+
 void flush_tlb_all(void)
 {
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
@@ -1475,6 +1488,8 @@ void flush_tlb_all(void)
 	/* First try (faster) hardware-assisted TLB invalidation. */
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_flush_all();
 	else
 		/* Fall back to the IPI-based invalidation. */
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
@@ -1504,15 +1519,36 @@ static void do_kernel_range_flush(void *info)
 	struct flush_tlb_info *f = info;
 	unsigned long addr;

+	/*
+	 * With PTI, kernel TLB entries in all PCIDs need to be flushed.
+	 * With RAR the PCID space becomes so large, we might as well flush it all.
+	 *
+	 * Either of the two by itself works with targeted flushes.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_RAR) &&
+	    cpu_feature_enabled(X86_FEATURE_PTI)) {
+		invpcid_flush_all();
+		return;
+	}
+
 	/* flush range by one by one 'invlpg' */
 	for (addr = f->start; addr < f->end; addr += PAGE_SIZE)
 		flush_tlb_one_kernel(addr);
 }

+static void rar_kernel_range_flush(struct flush_tlb_info *info)
+{
+	guard(preempt)();
+	smp_call_rar_many(cpu_online_mask, 0, info->start, info->end);
+	do_kernel_range_flush(info);
+}
+
 static void kernel_tlb_flush_all(struct flush_tlb_info *info)
 {
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_flush_all();
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_flush_all();
 	else
 		on_each_cpu(do_flush_tlb_all, NULL, 1);
 }
@@ -1521,6 +1557,8 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 {
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
 		invlpgb_kernel_range_flush(info);
+	else if (cpu_feature_enabled(X86_FEATURE_RAR))
+		rar_kernel_range_flush(info);
 	else
 		on_each_cpu(do_kernel_range_flush, info, 1);
 }
-- 
2.49.0

From nobody Fri Dec 19 17:37:52 2025
From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: kernel-team@meta.com, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, bp@alien8.de, x86@kernel.org,
	nadav.amit@gmail.com, seanjc@google.com, tglx@linutronix.de,
	Rik van Riel
Subject: [RFC PATCH v3 7/7] x86/mm: userspace & pageout flushing using Intel RAR
Date: Thu, 5 Jun 2025 12:35:16 -0400
Message-ID: <20250605163544.3852565-8-riel@surriel.com>
In-Reply-To: <20250605163544.3852565-1-riel@surriel.com>
References: <20250605163544.3852565-1-riel@surriel.com>

From: Rik van Riel

Use Intel RAR to flush userspace mappings.

Because RAR flushes are targeted using a cpu bitmap, the rules are
a little different than for true broadcast TLB invalidation.

With true broadcast TLB invalidation, as done with AMD INVLPGB, a
global ASID always has up to date TLB entries on every CPU. The
context switch code never has to flush the TLB when switching to a
global ASID on any CPU with INVLPGB.

With RAR, the TLB mappings for a global ASID are kept up to date
only on CPUs within the mm_cpumask, which lazily follows the
threads around the system. The context switch code does not need
to flush the TLB if the CPU is in the mm_cpumask and the PCID
used stays the same.

However, a CPU that falls outside of the mm_cpumask can have out
of date TLB mappings for this task. When switching to that task
on a CPU not in the mm_cpumask, the TLB does need to be flushed.
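The rule described above boils down to a small decision table. A minimal userspace sketch, with an illustrative helper name that is not part of the kernel:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Distillation of the context-switch rule above. With INVLPGB a
 * global ASID is up to date on every CPU, so no flush is needed.
 * With RAR it is up to date only on CPUs that stayed in the
 * mm_cpumask; a CPU that fell out of the mask may have missed
 * RAR invalidations and must flush when it picks the task back up.
 */
static bool global_asid_needs_flush(bool have_invlpgb, bool cpu_in_mm_cpumask)
{
	if (have_invlpgb)
		return false;		/* true broadcast: always current */
	return !cpu_in_mm_cpumask;	/* RAR: stale if outside the mask */
}
```

This is the same condition the series threads through choose_new_asid() as the `new_cpu` argument.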
Signed-off-by: Rik van Riel
---
 arch/x86/include/asm/tlbflush.h |   9 +-
 arch/x86/mm/tlb.c               | 177 ++++++++++++++++++++++++--------
 2 files changed, 141 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index e9b81876ebe4..21bd9162df38 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -250,7 +250,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)
 {
 	u16 asid;

-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return 0;

 	asid = smp_load_acquire(&mm->context.global_asid);
@@ -263,7 +264,8 @@ static inline u16 mm_global_asid(struct mm_struct *mm)

 static inline void mm_init_global_asid(struct mm_struct *mm)
 {
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		mm->context.global_asid = 0;
 		mm->context.asid_transition = false;
 	}
@@ -287,7 +289,8 @@ static inline void mm_clear_asid_transition(struct mm_struct *mm)

 static inline bool mm_in_asid_transition(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;

 	return mm && READ_ONCE(mm->context.asid_transition);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8931f7029d6c..75f32f49bef6 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -222,7 +222,8 @@ struct new_asid {
 	unsigned int need_flush : 1;
 };

-static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)
+static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen,
+				       bool new_cpu)
 {
 	struct new_asid ns;
 	u16 asid;
@@ -235,14 +236,22 @@ static struct new_asid choose_new_asid(struct mm_struct *next, u64 next_tlb_gen)

 	/*
 	 * TLB consistency for global ASIDs is maintained with hardware assisted
-	 * remote TLB flushing. Global ASIDs are always up to date.
+	 * remote TLB flushing. Global ASIDs are always up to date with INVLPGB,
+	 * and up to date for CPUs in the mm_cpumask with RAR.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) ||
+	    cpu_feature_enabled(X86_FEATURE_RAR)) {
 		u16 global_asid = mm_global_asid(next);

 		if (global_asid) {
 			ns.asid = global_asid;
 			ns.need_flush = 0;
+			/*
+			 * If the CPU fell out of the cpumask, it can be
+			 * out of date with RAR, and should be flushed.
+			 */
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				ns.need_flush = new_cpu;
 			return ns;
 		}
 	}
@@ -300,7 +309,14 @@ static void reset_global_asid_space(void)
 {
 	lockdep_assert_held(&global_asid_lock);

-	invlpgb_flush_all_nonglobals();
+	/*
+	 * The global flush ensures that a freshly allocated global ASID
+	 * has no entries in any TLB, and can be used immediately.
+	 * With Intel RAR, the TLB may still need to be flushed at context
+	 * switch time when dealing with a CPU that was not in the mm_cpumask
+	 * for the process, and may have missed flushes along the way.
+	 */
+	flush_tlb_all();

 	/*
 	 * The TLB flush above makes it safe to re-use the previously
@@ -377,7 +393,7 @@ static void use_global_asid(struct mm_struct *mm)
 {
 	u16 asid;

-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);

 	/* This process is already using broadcast TLB invalidation. */
 	if (mm_global_asid(mm))
@@ -403,13 +419,14 @@ static void use_global_asid(struct mm_struct *mm)

 void mm_free_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;

 	if (!mm_global_asid(mm))
 		return;

-	guard(raw_spinlock_irqsave)(&global_asid_lock);
+	guard(raw_spinlock)(&global_asid_lock);

 	/* The global ASID can be re-used only after flush at wrap-around. */
 #ifdef CONFIG_BROADCAST_TLB_FLUSH
@@ -427,7 +444,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
 {
 	u16 global_asid = mm_global_asid(mm);

-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return false;

 	/* Process is transitioning to a global ASID */
@@ -445,7 +463,8 @@ static bool mm_needs_global_asid(struct mm_struct *mm, u16 asid)
  */
 static void consider_global_asid(struct mm_struct *mm)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB))
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    !cpu_feature_enabled(X86_FEATURE_RAR))
 		return;

 	/* Check every once in a while. */
@@ -490,6 +509,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 	 * that results in a (harmless) extra IPI.
 	 */
 	if (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm_asid, cpu)) != bc_asid) {
+		info->trim_cpumask = true;
 		flush_tlb_multi(mm_cpumask(info->mm), info);
 		return;
 	}
@@ -499,7 +519,7 @@ static void finish_asid_transition(struct flush_tlb_info *info)
 	mm_clear_asid_transition(mm);
 }

-static void broadcast_tlb_flush(struct flush_tlb_info *info)
+static void invlpgb_tlb_flush(struct flush_tlb_info *info)
 {
 	bool pmd = info->stride_shift == PMD_SHIFT;
 	unsigned long asid = mm_global_asid(info->mm);
@@ -530,8 +550,6 @@ static void broadcast_tlb_flush(struct flush_tlb_info *info)
 		addr += nr << info->stride_shift;
 	} while (addr < info->end);

-	finish_asid_transition(info);
-
 	/* Wait for the INVLPGBs kicked off above to finish. */
 	__tlbsync();
 }
@@ -862,7 +880,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 	/* Check if the current mm is transitioning to a global ASID */
 	if (mm_needs_global_asid(next, prev_asid)) {
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, true);
 		goto reload_tlb;
 	}
@@ -900,6 +918,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		ns.asid = prev_asid;
 		ns.need_flush = true;
 	} else {
+		bool new_cpu = false;
 		/*
 		 * Apply process to process speculation vulnerability
 		 * mitigations if applicable.
@@ -914,20 +933,25 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();

-		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
-		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
+		/* Start receiving IPIs and RAR invalidations */
+		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next))) {
 			cpumask_set_cpu(cpu, mm_cpumask(next));
+			if (cpu_feature_enabled(X86_FEATURE_RAR))
+				new_cpu = true;
+		}
+
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);

-		ns = choose_new_asid(next, next_tlb_gen);
+		ns = choose_new_asid(next, next_tlb_gen, new_cpu);
 	}

 reload_tlb:
 	new_lam = mm_lam_cr3_mask(next);
 	if (ns.need_flush) {
-		VM_WARN_ON_ONCE(is_global_asid(ns.asid));
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
-		this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		if (is_dyn_asid(ns.asid)) {
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].ctx_id, next->context.ctx_id);
+			this_cpu_write(cpu_tlbstate.ctxs[ns.asid].tlb_gen, next_tlb_gen);
+		}
 		load_new_mm_cr3(next->pgd, ns.asid, new_lam, true);

 		trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
@@ -1115,7 +1139,7 @@ static void flush_tlb_func(void *info)
 	const struct flush_tlb_info *f = info;
 	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
 	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	u64 local_tlb_gen;
+	u64 local_tlb_gen = 0;
 	bool local = smp_processor_id() == f->initiating_cpu;
 	unsigned long nr_invalidate = 0;
 	u64 mm_tlb_gen;
@@ -1138,19 +1162,6 @@ static void flush_tlb_func(void *info)
 	if (unlikely(loaded_mm == &init_mm))
 		return;

-	/* Reload the ASID if transitioning into or out of a global ASID */
-	if (mm_needs_global_asid(loaded_mm, loaded_mm_asid)) {
-		switch_mm_irqs_off(NULL, loaded_mm, NULL);
-		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
-	}
-
-	/* Broadcast ASIDs are always kept up to date with INVLPGB. */
-	if (is_global_asid(loaded_mm_asid))
-		return;
-
-	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
-		   loaded_mm->context.ctx_id);
-
 	if (this_cpu_read(cpu_tlbstate_shared.is_lazy)) {
 		/*
 		 * We're in lazy mode. We need to at least flush our
@@ -1161,11 +1172,31 @@ static void flush_tlb_func(void *info)
 		 * This should be rare, with native_flush_tlb_multi() skipping
 		 * IPIs to lazy TLB mode CPUs.
 		 */
+		cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(loaded_mm));
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
 		return;
 	}

-	local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
+	/* Reload the ASID if transitioning into or out of a global ASID */
+	if (mm_needs_global_asid(loaded_mm, loaded_mm_asid)) {
+		switch_mm_irqs_off(NULL, loaded_mm, NULL);
+		loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
+	}
+
+	/*
+	 * Broadcast ASIDs are always kept up to date with INVLPGB; with
+	 * Intel RAR, IPI based flushes are used periodically to trim the
+	 * mm_cpumask, and flushes that get here should be processed.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) &&
+	    is_global_asid(loaded_mm_asid))
+		return;
+
+	VM_WARN_ON(is_dyn_asid(loaded_mm_asid) && loaded_mm->context.ctx_id !=
+		   this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id));
+
+	if (is_dyn_asid(loaded_mm_asid))
+		local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);

 	if (unlikely(f->new_tlb_gen != TLB_GENERATION_INVALID &&
 		     f->new_tlb_gen <= local_tlb_gen)) {
@@ -1264,7 +1295,8 @@ static void flush_tlb_func(void *info)
 	}

 	/* Both paths above update our state to mm_tlb_gen. */
-	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
+	if (is_dyn_asid(loaded_mm_asid))
+		this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);

 	/* Tracing is done in a unified manner to reduce the code size */
 done:
@@ -1305,10 +1337,6 @@ static bool should_flush_tlb(int cpu, void *data)
 	if (loaded_mm == info->mm)
 		return true;

-	/* In cpumask, but not the loaded mm? Periodically remove by flushing. */
-	if (info->trim_cpumask)
-		return true;
-
 	return false;
 }

@@ -1324,6 +1352,27 @@ static bool should_trim_cpumask(struct mm_struct *mm)
 DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared);
 EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared);

+static bool should_flush_all(const struct flush_tlb_info *info)
+{
+	if (info->freed_tables)
+		return true;
+
+	if (info->trim_cpumask)
+		return true;
+
+	/*
+	 * INVLPGB and RAR do not use this code path normally.
+	 * This call cleans up the cpumask or ASID transition.
+	 */
+	if (mm_global_asid(info->mm))
+		return true;
+
+	if (mm_in_asid_transition(info->mm))
+		return true;
+
+	return false;
+}
+
 STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 					const struct flush_tlb_info *info)
 {
@@ -1349,7 +1398,7 @@ STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask,
 	 * up on the new contents of what used to be page tables, while
 	 * doing a speculative memory access.
 	 */
-	if (info->freed_tables || mm_in_asid_transition(info->mm))
+	if (should_flush_all(info))
 		on_each_cpu_mask(cpumask, flush_tlb_func, (void *)info, true);
 	else
 		on_each_cpu_cond_mask(should_flush_tlb, flush_tlb_func,
@@ -1380,6 +1429,35 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
 static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
 #endif

+static void rar_tlb_flush(struct flush_tlb_info *info)
+{
+	unsigned long asid = mm_global_asid(info->mm);
+	u16 pcid = kern_pcid(asid);
+
+	/* Flush the remote CPUs. */
+	smp_call_rar_many(mm_cpumask(info->mm), pcid, info->start, info->end);
+	if (cpu_feature_enabled(X86_FEATURE_PTI))
+		smp_call_rar_many(mm_cpumask(info->mm), user_pcid(asid), info->start, info->end);
+
+	/* Flush the local TLB, if needed. */
+	if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(info->mm))) {
+		lockdep_assert_irqs_enabled();
+		local_irq_disable();
+		flush_tlb_func(info);
+		local_irq_enable();
+	}
+}
+
+static void broadcast_tlb_flush(struct flush_tlb_info *info)
+{
+	if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
+		invlpgb_tlb_flush(info);
+	else /* Intel RAR */
+		rar_tlb_flush(info);
+
+	finish_asid_transition(info);
+}
+
 static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 			unsigned long start, unsigned long end,
 			unsigned int stride_shift, bool freed_tables,
@@ -1440,15 +1518,22 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
 				  new_tlb_gen);

+	/*
+	 * IPIs and RAR can be targeted to a cpumask. Periodically trim that
+	 * mm_cpumask by sending TLB flush IPIs, even when most TLB flushes
+	 * are done with RAR.
+	 */
+	if (!cpu_feature_enabled(X86_FEATURE_INVLPGB) || !mm_global_asid(mm))
+		info->trim_cpumask = should_trim_cpumask(mm);
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
 	 * flush_tlb_func_local() directly in this case.
 	 */
-	if (mm_global_asid(mm)) {
+	if (mm_global_asid(mm) && !info->trim_cpumask) {
 		broadcast_tlb_flush(info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
@@ -1759,6 +1844,14 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->unmapped_pages) {
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
+	} else if (cpu_feature_enabled(X86_FEATURE_RAR) && cpumask_any(&batch->cpumask) < nr_cpu_ids) {
+		rar_full_flush(&batch->cpumask);
+		if (cpumask_test_cpu(cpu, &batch->cpumask)) {
+			lockdep_assert_irqs_enabled();
+			local_irq_disable();
+			invpcid_flush_all_nonglobals();
+			local_irq_enable();
+		}
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
 		flush_tlb_multi(&batch->cpumask, info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
-- 
2.49.0
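The payload-table handling that patch 1 builds smp_call_rar_many() around (find a free slot, claim it atomically, free it when all target CPUs have acknowledged) can be mimicked in plain C11. A userspace sketch under simplified assumptions; the names are illustrative, and the kernel uses ffz()/test_and_set_bit() on rar_in_use instead:

```c
#include <assert.h>
#include <stdatomic.h>

/* Tiny table so a full table is representable without overflow tricks. */
#define MAX_SLOTS 4

static _Atomic unsigned long slots_in_use;

/* Find a zero bit, confirm it with an atomic fetch-or, spin if full. */
static int get_slot(void)
{
	for (;;) {
		unsigned long cur = atomic_load(&slots_in_use);
		int bit = __builtin_ctzl(~cur);	/* lowest zero bit, like ffz() */

		if (bit >= MAX_SLOTS)
			continue;		/* table full: spin until freed */

		/* Only the winner sees the bit clear in the old value. */
		if (!(atomic_fetch_or(&slots_in_use, 1UL << bit) & (1UL << bit)))
			return bit;
	}
}

static void free_slot(int bit)
{
	atomic_fetch_and(&slots_in_use, ~(1UL << bit));
}
```

The optimistic find-then-confirm structure means two racing CPUs can pick the same candidate bit, but only one wins the atomic operation; the loser simply retries, which is why the kernel version loops as well.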