From nobody Tue Jun 30 01:43:40 2026
From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, linux-kernel@vger.kernel.org
Subject: [PATCH V8 01/44] entry: Create an internal irqentry_exit_cond_resched() call
Date: Thu, 27 Jan 2022 09:54:22 -0800
Message-Id: <20220127175505.851391-2-ira.weiny@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Ira Weiny

The call to irqentry_exit_cond_resched() was not properly being
overridden when called from xen_pv_evtchn_do_upcall().

Define __irqentry_exit_cond_resched() as the static call and place the
override logic in irqentry_exit_cond_resched().

Cc: Peter Zijlstra (Intel)
Signed-off-by: Ira Weiny

---
Because this was found via code inspection and it does not actually fix
any seen bug I've not added a fixes tag.

But for reference:
Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call")
---
 include/linux/entry-common.h |  5 ++++-
 kernel/entry/common.c        | 23 +++++++++++++--------
 kernel/sched/core.c          | 40 ++++++++++++++++++------------------
 3 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index 2e2b8d6140ed..ddaffc983e62 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -455,10 +455,13 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs);
  * Conditional reschedule with additional sanity checks.
  */
 void irqentry_exit_cond_resched(void);
+
+void __irqentry_exit_cond_resched(void);
 #ifdef CONFIG_PREEMPT_DYNAMIC
-DECLARE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+DECLARE_STATIC_CALL(__irqentry_exit_cond_resched, __irqentry_exit_cond_resched);
 #endif
 
+
 /**
  * irqentry_exit - Handle return from exception that used irqentry_enter()
  * @regs:	Pointer to pt_regs (exception entry regs)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad713684c2e..490442a48332 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -380,7 +380,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	return ret;
 }
 
-void irqentry_exit_cond_resched(void)
+void __irqentry_exit_cond_resched(void)
 {
 	if (!preempt_count()) {
 		/* Sanity check RCU and thread stack */
@@ -392,9 +392,20 @@ void irqentry_exit_cond_resched(void)
 	}
 }
 #ifdef CONFIG_PREEMPT_DYNAMIC
-DEFINE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+DEFINE_STATIC_CALL(__irqentry_exit_cond_resched, __irqentry_exit_cond_resched);
 #endif
 
+void irqentry_exit_cond_resched(void)
+{
+	if (IS_ENABLED(CONFIG_PREEMPTION)) {
+#ifdef CONFIG_PREEMPT_DYNAMIC
+		static_call(__irqentry_exit_cond_resched)();
+#else
+		__irqentry_exit_cond_resched();
+#endif
+	}
+}
+
 noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
 	lockdep_assert_irqs_disabled();
@@ -420,13 +431,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 		}
 
 		instrumentation_begin();
-		if (IS_ENABLED(CONFIG_PREEMPTION)) {
-#ifdef CONFIG_PREEMPT_DYNAMIC
-			static_call(irqentry_exit_cond_resched)();
-#else
-			irqentry_exit_cond_resched();
-#endif
-		}
+		irqentry_exit_cond_resched();
 		/* Covers both tracing and lockdep */
 		trace_hardirqs_on();
 		instrumentation_end();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 848eaa0efe0e..7197c33beb39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6562,29 +6562,29 @@ EXPORT_STATIC_CALL_TRAMP(preempt_schedule_notrace);
  * SC:might_resched
  * SC:preempt_schedule
  * SC:preempt_schedule_notrace
- * SC:irqentry_exit_cond_resched
+ * SC:__irqentry_exit_cond_resched
  *
  *
  * NONE:
- *   cond_resched               <- __cond_resched
- *   might_resched              <- RET0
- *   preempt_schedule           <- NOP
- *   preempt_schedule_notrace   <- NOP
- *   irqentry_exit_cond_resched <- NOP
+ *   cond_resched                 <- __cond_resched
+ *   might_resched                <- RET0
+ *   preempt_schedule             <- NOP
+ *   preempt_schedule_notrace     <- NOP
+ *   __irqentry_exit_cond_resched <- NOP
  *
  * VOLUNTARY:
- *   cond_resched               <- __cond_resched
- *   might_resched              <- __cond_resched
- *   preempt_schedule           <- NOP
- *   preempt_schedule_notrace   <- NOP
- *   irqentry_exit_cond_resched <- NOP
+ *   cond_resched                 <- __cond_resched
+ *   might_resched                <- __cond_resched
+ *   preempt_schedule             <- NOP
+ *   preempt_schedule_notrace     <- NOP
+ *   __irqentry_exit_cond_resched <- NOP
  *
  * FULL:
- *   cond_resched               <- RET0
- *   might_resched              <- RET0
- *   preempt_schedule           <- preempt_schedule
- *   preempt_schedule_notrace   <- preempt_schedule_notrace
- *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
+ *   cond_resched                 <- RET0
+ *   might_resched                <- RET0
+ *   preempt_schedule             <- preempt_schedule
+ *   preempt_schedule_notrace     <- preempt_schedule_notrace
+ *   __irqentry_exit_cond_resched <- __irqentry_exit_cond_resched
  */
 
 enum {
@@ -6620,7 +6620,7 @@ void sched_dynamic_update(int mode)
 	static_call_update(might_resched, __cond_resched);
 	static_call_update(preempt_schedule, __preempt_schedule_func);
 	static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
-	static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+	static_call_update(__irqentry_exit_cond_resched, __irqentry_exit_cond_resched);
 
 	switch (mode) {
 	case preempt_dynamic_none:
@@ -6628,7 +6628,7 @@ void sched_dynamic_update(int mode)
 		static_call_update(might_resched, (void *)&__static_call_return0);
 		static_call_update(preempt_schedule, NULL);
 		static_call_update(preempt_schedule_notrace, NULL);
-		static_call_update(irqentry_exit_cond_resched, NULL);
+		static_call_update(__irqentry_exit_cond_resched, NULL);
 		pr_info("Dynamic Preempt: none\n");
 		break;
 
@@ -6637,7 +6637,7 @@ void sched_dynamic_update(int mode)
 		static_call_update(might_resched, __cond_resched);
 		static_call_update(preempt_schedule, NULL);
 		static_call_update(preempt_schedule_notrace, NULL);
-		static_call_update(irqentry_exit_cond_resched, NULL);
+		static_call_update(__irqentry_exit_cond_resched, NULL);
 		pr_info("Dynamic Preempt: voluntary\n");
 		break;
 
@@ -6646,7 +6646,7 @@ void sched_dynamic_update(int mode)
 		static_call_update(might_resched, (void *)&__static_call_return0);
 		static_call_update(preempt_schedule, __preempt_schedule_func);
 		static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
-		static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+		static_call_update(__irqentry_exit_cond_resched, __irqentry_exit_cond_resched);
 		pr_info("Dynamic Preempt: full\n");
 		break;
 	}
-- 
2.31.1

From nobody Tue Jun 30 01:43:40 2026
From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, linux-kernel@vger.kernel.org
Subject: [PATCH V8 02/44] Documentation/protection-keys: Clean up documentation for User Space pkeys
Date: Thu, 27 Jan 2022 09:54:23 -0800
Message-Id: <20220127175505.851391-3-ira.weiny@intel.com>
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>

From: Ira Weiny

The documentation for user space pkeys was a bit dated including things
such as Amazon and distribution testing information which is irrelevant
now.

Update the documentation. This also streamlines adding the Supervisor
Pkey documentation later on.

Signed-off-by: Ira Weiny
---
 Documentation/core-api/protection-keys.rst | 43 ++++++++++------------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..12331db474aa 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,31 +4,28 @@
 Memory Protection Keys
 ======================
 
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables when an
+application changes protection domains.
+
+PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Scalable
+Processor" Server CPUs and later. And it will be available in future
+non-server Intel parts and future AMD processors.
+
+pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys.
+
+Protections for each key are defined with a per-CPU user-accessible register
+(PKRU). Each of these is a 32-bit register storing two bits (Access Disable
+and Write Disable) for each of 16 keys.
+
+Being a CPU register, PKRU is inherently thread-local, potentially giving each
 thread a different set of protections from every other thread.
 
-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
+There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
+register. The feature is only available in 64-bit mode, even though there is
+theoretically space in the PAE PTEs. These permissions are enforced on data
+access only and have no effect on instruction fetches.
 
 Syscalls
 ========
-- 
2.31.1

From nobody Tue Jun 30 01:43:40 2026
From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, linux-kernel@vger.kernel.org
Subject: [PATCH V8 03/44] x86/pkeys: Create pkeys_common.h
Date: Thu, 27 Jan 2022 09:54:24 -0800
Message-Id: <20220127175505.851391-4-ira.weiny@intel.com>
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>

From: Ira Weiny

Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in
similar fashions and can share common defines. Specifically PKS and PKU
each have:

	1. A single control register
	2. The same number of keys
	3. The same number of bits in the register per key
	4. Access and Write disable in the same bit locations

Given the above, share all the macros that synthesize and manipulate
register values between the two features.
Share these defines by moving them into a new header, change their names
to reflect the common use, and include the header where needed.

Also while editing the code remove the use of 'we' from comments being
touched.

NOTE the checkpatch errors are ignored for the init_pkru_value to align
the values in the code.

Signed-off-by: Ira Weiny
Acked-by: Dave Hansen
---
Changes from v7:
	Rebased onto latest
---
 arch/x86/include/asm/pkeys_common.h | 11 +++++++++++
 arch/x86/include/asm/pkru.h         | 20 ++++++++------------
 arch/x86/kernel/fpu/xstate.c        | 10 +++++-----
 arch/x86/mm/pkeys.c                 | 14 ++++++--------
 4 files changed, 30 insertions(+), 25 deletions(-)
 create mode 100644 arch/x86/include/asm/pkeys_common.h

diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
new file mode 100644
index 000000000000..08c736669244
--- /dev/null
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKEYS_COMMON_H
+#define _ASM_X86_PKEYS_COMMON_H
+
+#define PKR_AD_BIT 0x1u
+#define PKR_WD_BIT 0x2u
+#define PKR_BITS_PER_PKEY 2
+
+#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+
+#endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/arch/x86/include/asm/pkru.h b/arch/x86/include/asm/pkru.h
index 74f0a2d34ffd..06980dd42946 100644
--- a/arch/x86/include/asm/pkru.h
+++ b/arch/x86/include/asm/pkru.h
@@ -3,10 +3,7 @@
 #define _ASM_X86_PKRU_H
 
 #include
-
-#define PKRU_AD_BIT 0x1u
-#define PKRU_WD_BIT 0x2u
-#define PKRU_BITS_PER_PKEY 2
+#include
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 extern u32 init_pkru_value;
@@ -18,18 +15,17 @@ extern u32 init_pkru_value;
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
-	return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
+	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+
+	return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
 }
 
 static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
-	/*
-	 * Access-disable disables writes too so we need to check
-	 * both bits here.
-	 */
-	return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits));
+	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+
+	/* Access-disable disables writes too so check both bits here. */
+	return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
 }
 
 static inline u32 read_pkru(void)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 02b3ddaf4f75..d8ddd306d225 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1089,19 +1089,19 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
 		return -EINVAL;
 
-	/* Set the bits we need in PKRU: */
+	/* Set the bits needed in PKRU: */
 	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKRU_AD_BIT;
+		new_pkru_bits |= PKR_AD_BIT;
 	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKRU_WD_BIT;
+		new_pkru_bits |= PKR_WD_BIT;
 
 	/* Shift the bits in to the correct place in PKRU for pkey: */
-	pkey_shift = pkey * PKRU_BITS_PER_PKEY;
+	pkey_shift = pkey * PKR_BITS_PER_PKEY;
 	new_pkru_bits <<= pkey_shift;
 
 	/* Get old PKRU and mask off any old bits in place: */
 	old_pkru = read_pkru();
-	old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift);
+	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
 
 	/* Write old part along with new part: */
 	write_pkru(old_pkru | new_pkru_bits);
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e44e938885b7..aa7042f272fb 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -110,19 +110,17 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
 	return vma_pkey(vma);
 }
 
-#define PKRU_AD_KEY(pkey)	(PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
-
 /*
  * Make the default PKRU value (at execve() time) as restrictive
  * as possible. This ensures that any threads clone()'d early
  * in the process's lifetime will not accidentally get access
  * to data which is pkey-protected later on.
  */
-u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
-		      PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
-		      PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
-		      PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
-		      PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
+u32 init_pkru_value = PKR_AD_KEY( 1) | PKR_AD_KEY( 2) | PKR_AD_KEY( 3) |
+		      PKR_AD_KEY( 4) | PKR_AD_KEY( 5) | PKR_AD_KEY( 6) |
+		      PKR_AD_KEY( 7) | PKR_AD_KEY( 8) | PKR_AD_KEY( 9) |
+		      PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) |
+		      PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15);
 
 static ssize_t init_pkru_read_file(struct file *file, char __user *user_buf,
 			     size_t count, loff_t *ppos)
@@ -155,7 +153,7 @@ static ssize_t init_pkru_write_file(struct file *file,
 	 * up immediately if someone attempts to disable access
 	 * or writes to pkey 0.
 	 */
-	if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT))
+	if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT))
 		return -EINVAL;
 
 	WRITE_ONCE(init_pkru_value, new_init_pkru);
-- 
2.31.1

From nobody Tue Jun 30 01:43:40 2026
From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, linux-kernel@vger.kernel.org
Subject: [PATCH V8 04/44] x86/pkeys: Add additional PKEY helper macros
Date: Thu, 27 Jan 2022 09:54:25 -0800
Message-Id: <20220127175505.851391-5-ira.weiny@intel.com>
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>

From: Ira Weiny

Avoid open coding shift and mask operations by defining and using helper
macros for PKey operations.

Suggested-by: Dan Williams
Signed-off-by: Ira Weiny

---
Changes for V8
	Move ahead of other patches.
	Simplify to only the macros used in the series
---
 arch/x86/include/asm/pkeys_common.h | 5 ++++-
 arch/x86/include/asm/pkru.h         | 8 ++------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 08c736669244..d02ab5bc3fff 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -6,6 +6,9 @@
 #define PKR_WD_BIT 0x2u
 #define PKR_BITS_PER_PKEY 2
 
-#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+#define PKR_PKEY_SHIFT(pkey)	(pkey * PKR_BITS_PER_PKEY)
+
+#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << PKR_PKEY_SHIFT(pkey))
+#define PKR_WD_KEY(pkey)	(PKR_WD_BIT << PKR_PKEY_SHIFT(pkey))
 
 #endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/arch/x86/include/asm/pkru.h b/arch/x86/include/asm/pkru.h
index 06980dd42946..81ddf88ac3c9 100644
--- a/arch/x86/include/asm/pkru.h
+++ b/arch/x86/include/asm/pkru.h
@@ -15,17 +15,13 @@ extern u32 init_pkru_value;
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
-
-	return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
+	return !(pkru & PKR_AD_KEY(pkey));
 }
 
 static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
-
 	/* Access-disable disables writes too so check both bits here. */
-	return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
+	return !(pkru & (PKR_AD_KEY(pkey) | PKR_WD_KEY(pkey)));
 }
 
 static inline u32 read_pkru(void)
-- 
2.31.1

From nobody Tue Jun 30 01:43:40 2026
From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, linux-kernel@vger.kernel.org
Subject: [PATCH V8 05/44] x86/fpu: Refactor arch_set_user_pkey_access()
Date: Thu, 27 Jan 2022 09:54:26 -0800
Message-Id: <20220127175505.851391-6-ira.weiny@intel.com>
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>

From: Ira Weiny

Both PKU and PKS update their register values in the same way. They can
therefore share the update code.

Define a helper, pkey_update_pkval(), which will be used to support both
Protection Key User (PKU) and the new Protection Key for Supervisor
(PKS) in subsequent patches.

pkey_update_pkval() contributed by Thomas

Co-developed-by: Thomas Gleixner
Signed-off-by: Thomas Gleixner
Signed-off-by: Ira Weiny
Acked-by: Dave Hansen

---
Update for V8:
	Replace the code Peter provided in update_pkey_reg() for
	Thomas' pkey_update_pkval()
		-- https://lore.kernel.org/lkml/20200717085442.GX10769@hirez.programming.kicks-ass.net/
---
 arch/x86/include/asm/pkeys.h |  2 ++
 arch/x86/kernel/fpu/xstate.c | 22 ++++------------------
 arch/x86/mm/pkeys.c          | 16 ++++++++++++++++
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 1d5f14aff5f6..cc4d4f552f9d 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -131,4 +131,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
 }
 
+u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits);
+
 #endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d8ddd306d225..00d059db4106 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1071,8 +1071,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 			      unsigned long init_val)
 {
-	u32 old_pkru, new_pkru_bits = 0;
-	int pkey_shift;
+	u32 pkru;
 
 	/*
 	 * This check implies XSAVE support. OSPKE only gets
@@ -1089,22 +1088,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
 		return -EINVAL;
 
-	/* Set the bits needed in PKRU: */
-	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKR_AD_BIT;
-	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKR_WD_BIT;
-
-	/* Shift the bits in to the correct place in PKRU for pkey: */
-	pkey_shift = pkey * PKR_BITS_PER_PKEY;
-	new_pkru_bits <<= pkey_shift;
-
-	/* Get old PKRU and mask off any old bits in place: */
-	old_pkru = read_pkru();
-	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
-
-	/* Write old part along with new part: */
-	write_pkru(old_pkru | new_pkru_bits);
+	pkru = read_pkru();
+	pkru = pkey_update_pkval(pkru, pkey, init_val);
+	write_pkru(pkru);
 
 	return 0;
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index aa7042f272fb..cf12d8bf122b 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -190,3 +190,19 @@ static __init int setup_init_pkru(char *opt)
 	return 1;
 }
 __setup("init_pkru=", setup_init_pkru);
+
+/*
+ * Kernel users use the same flags as user space:
+ *     PKEY_DISABLE_ACCESS
+ *     PKEY_DISABLE_WRITE
+ */
+u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
+{
+	int shift = pkey * PKR_BITS_PER_PKEY;
+
+	if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
+		accessbits &= PKEY_ACCESS_MASK;
+
+	pkval &= ~(PKEY_ACCESS_MASK << shift);
+	return pkval | accessbits << shift;
+}
-- 
2.31.1

From nobody Tue Jun 30 01:43:40 2026
([192.55.52.136]:65467 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244850AbiA0RzZ (ORCPT ); Thu, 27 Jan 2022 12:55:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306125; x=1674842125; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EcpsWPf8UJbDeP77dQwv3JjYYAgIDS+icWEivpBmEm8=; b=MNZn1pn6vEKMhRvNuCB9wi5i2syugC8iPViDiJJbBUM8gIMRA6Rvh7nV 3ckDhrAWWOj2XfO26sS0UNR3E7L7ZvTeWUsmdYdrciasjgYc9A+PecVSC /XqYiGOR7WbjfpvQYYeya8IyxQn0SYCQjcv+AN6aNEgszxKsbsFqkWmW/ TluwC7TNUkkYjUbqzyeb4sno5jk+/D1PPgMMltD/52WTNFidnzZleG6iI WxyrWybgw2QcwtOnk8K0JTY7d2XT6MfK8FZxdagsbQ5WSYLHDT8JrphBi vMCbKPQZqHpnGjzBlkMBtqDUMrg4kHbRP1hAyIO3b6ZCWPhv5e4Ylmjp6 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899121" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899121" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796062" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org
Subject: [PATCH V8 06/44] mm/pkeys: Add Kconfig options for PKS
Date: Thu, 27 Jan 2022 09:54:27 -0800
Message-Id: <20220127175505.851391-7-ira.weiny@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Ira Weiny

Protection Keys for Supervisor pages (PKS) is a feature used by kernel
code only. If no kernel users are configured, the PKS code is
unnecessary overhead.

Define a Kconfig structure which lets kernel code detect PKS support in
an architecture and then enable that support within the architecture.

ARCH_HAS_SUPERVISOR_PKEYS indicates to kernel consumers that an
architecture supports supervisor pkeys. PKS users can then select
ARCH_ENABLE_SUPERVISOR_PKEYS to turn on the support within the
architecture.

If ARCH_ENABLE_SUPERVISOR_PKEYS is not selected, architectures avoid
the PKS overhead. ARCH_ENABLE_SUPERVISOR_PKEYS remains off until the
first kernel use case selects it.
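
For illustration only, the effect of leaving ARCH_ENABLE_SUPERVISOR_PKEYS
unselected can be sketched in plain C. This is a user-space sketch, not
kernel code: pks_setup_sketch(), pks_setup_calls and run_boot_path() are
hypothetical names, and the CONFIG_ symbol is assumed to come from Kconfig
in a real build. It shows the usual pattern of a real function on one side
of the #ifdef and an empty inline stub on the other.

```c
#include <assert.h>

/*
 * Assumption: in a kernel build this symbol is generated by Kconfig once a
 * PKS consumer selects ARCH_ENABLE_SUPERVISOR_PKEYS. It is left undefined
 * here to model "no PKS users configured"; define it by hand to see the
 * other branch.
 */
/* #define CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS 1 */

/* Hypothetical counter, present only so the sketch has something to observe. */
static int pks_setup_calls;

#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
/* PKS selected: the setup path does real work. */
static inline void pks_setup_sketch(void)
{
	pks_setup_calls++;
}
#else
/* PKS not selected: the call compiles away to nothing. */
static inline void pks_setup_sketch(void)
{
}
#endif

/* Hypothetical caller standing in for the CPU bring-up path. */
static int run_boot_path(void)
{
	pks_setup_sketch();
	return pks_setup_calls;
}
```

With the symbol undefined, run_boot_path() returns 0 because the stub does
nothing, which is the "avoid the PKS overhead" case described above.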
Signed-off-by: Ira Weiny --- Changes for V8 Split this out to a single change patch --- arch/x86/Kconfig | 1 + mm/Kconfig | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ebe8fc76949a..a30fe85e27ac 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1867,6 +1867,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD) select ARCH_USES_HIGH_VMA_FLAGS select ARCH_HAS_PKEYS + select ARCH_HAS_SUPERVISOR_PKEYS help Memory Protection Keys provides a mechanism for enforcing page-based protections, but without requiring modification of the diff --git a/mm/Kconfig b/mm/Kconfig index 3326ee3903f3..46f2bb15aa4e 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -804,6 +804,10 @@ config ARCH_USES_HIGH_VMA_FLAGS bool config ARCH_HAS_PKEYS bool +config ARCH_HAS_SUPERVISOR_PKEYS + bool +config ARCH_ENABLE_SUPERVISOR_PKEYS + bool =20 config PERCPU_STATS bool "Collect percpu memory statistics" --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24DB1C433F5 for ; Thu, 27 Jan 2022 17:55:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245033AbiA0Rzx (ORCPT ); Thu, 27 Jan 2022 12:55:53 -0500 Received: from mga12.intel.com ([192.55.52.136]:65467 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234701AbiA0RzQ (ORCPT ); Thu, 27 Jan 2022 12:55:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306116; x=1674842116; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4Ac1mcQc1akR3aUY9wZB+gyhGRgjCgtM/2Y+XW9W564=; b=d28gKUVFXMFIEAn9BU42zmVFQCK2Tq22jeIM/6OsMLYqQTIvL0ZSD0t5 
UiNLg909+ZR14OtXH7RYqXyMV4LUJjscwPEQjOH8dJM0DzNR+gTaIXglZ 3kN+lREB5X2QuEVdeD6O4+Hy/EVr4ivJpmyk74XoEdQAe/33vY7YAnKlc qgcRxJDJ13FtltYqIHvq9uBxVCPhV8GJN7XVAlCqvZppG9XImCXkHe+Q/ zYUS4TF4MREbMd0jr60HSIz7nEKsIGVFcHCJZLrQcSaVnmPJJ/j8OcJfj ELt6YGEAiDjdx7d2tKYynj7vwuDfCwcl5zcnyBhyldhXFAnHerwD4GKst w==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899122" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899122" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796065" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 07/44] x86/pkeys: Add PKS CPU feature bit Date: Thu, 27 Jan 2022 09:54:28 -0800 Message-Id: <20220127175505.851391-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Protection Keys for Supervisor pages (PKS) enables fast, hardware thread specific, manipulation of permission restrictions on supervisor page mappings. It uses the same mechanism of Protection Keys as those on User mappings but applies that mechanism to supervisor mappings using a supervisor specific MSR. The CPU indicates support for PKS in bit 31 of the ECX register after a cpuid instruction. Add the defines for this bit and the boilerplate disable infrastructure predicated on the Kconfig option. 
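
As a rough sketch of how the feature number and the disable mask relate,
the arithmetic can be written out in user-space C. The value of
X86_FEATURE_PKS and the (feature & 31) mask shape come from this patch; the
helper names feature_word(), feature_mask(), disable_pks_mask() and
pks_usable() are hypothetical, and the check is a simplification of what
cpu_feature_enabled() does, not its implementation. The CPUID leaf named in
the comment is from the SDM rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: feature word 16, bit 31 (CPUID leaf 7, subleaf 0, ECX). */
#define X86_FEATURE_PKS (16 * 32 + 31)

/* Hypothetical helper: which 32-bit capability word holds the feature. */
static inline unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Hypothetical helper: the feature's bit mask inside its word. */
static inline uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/*
 * Analogue of DISABLE_PKS: 0 when ARCH_ENABLE_SUPERVISOR_PKEYS is selected,
 * otherwise the PKS bit, so the feature is masked off at build time.
 */
static inline uint32_t disable_pks_mask(int pks_selected)
{
	return pks_selected ? 0 : feature_mask(X86_FEATURE_PKS);
}

/* Simplified check: the CPU reports the bit and the build did not disable it. */
static inline int pks_usable(uint32_t cpuid_ecx, uint32_t disabled_mask16)
{
	uint32_t bit = feature_mask(X86_FEATURE_PKS);

	return (cpuid_ecx & bit) != 0 && (disabled_mask16 & bit) == 0;
}

static_assert(X86_FEATURE_PKS / 32 == 16, "PKS lives in feature word 16");
static_assert((X86_FEATURE_PKS & 31) == 31, "PKS is bit 31 of that word");
```

The sketch uses an unsigned shift for bit 31; the kernel header spells the
mask as (1<<(X86_FEATURE_PKS & 31)).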
Signed-off-by: Ira Weiny
---
Changes for V8
	Split this out into its own patch
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6db4e2932b3d..b917605e9915 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -370,6 +370,7 @@
 #define X86_FEATURE_MOVDIR64B (16*32+28) /* MOVDIR64B instruction */
 #define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS instructions */
 #define X86_FEATURE_SGX_LC (16*32+30) /* Software Guard Extensions Launch Control */
+#define X86_FEATURE_PKS (16*32+31) /* Protection Keys for Supervisor pages */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..66fdad8f3941 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
 # define DISABLE_OSPKE (1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+# define DISABLE_PKS 0
+#else
+# define DISABLE_PKS (1<<(X86_FEATURE_PKS & 31))
+#endif
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57 0
 #else
@@ -85,7 +91,7 @@
 #define DISABLED_MASK14 0
 #define DISABLED_MASK15 0
 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-	DISABLE_ENQCMD)
+	DISABLE_ENQCMD|DISABLE_PKS)
 #define DISABLED_MASK17 0
 #define DISABLED_MASK18 0
 #define DISABLED_MASK19 0
-- 
2.31.1

From nobody Tue Jun 30 01:43:40 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by
smtp.lore.kernel.org (Postfix) with ESMTP id 07AB0C433FE for ; Thu, 27 Jan 2022 17:56:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244437AbiA0Rz7 (ORCPT ); Thu, 27 Jan 2022 12:55:59 -0500 Received: from mga12.intel.com ([192.55.52.136]:65473 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244760AbiA0RzR (ORCPT ); Thu, 27 Jan 2022 12:55:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306117; x=1674842117; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5N4sLTNdeUIAZnJLedt6mzC6DSo6VpBb9ZGGJjLgZSQ=; b=OVD/XmczWmbtmQFdtvmVK3ZfQ7WHS2LXFYewgwSnTgjuzQ+mY8NJteAI a36NEXztet5QS5Imp5Kvp9+eBUMOVXVCGKNKSmnT4KATW80n7aVtV8gZc nMrVSitj2dM6DOkU3tWKlR5UAseXvl/OBU26oH6uxwLXDPCiwRTX8Aahi WiVcDCKi4Y2945jTVHwShcU9dZJedX4r9qqjR1WnQvLq8aXNQd134LsoR 03e+dc9cu4pHBNbe9GIGfEiHsKjX5Cj8HIv3/WnI/3REFJwef5kCqhwrj 3umbJ685QLh/brpckS6DrcjBbt1G/ggzk81EaD/cnhu33mTPar9Kfovsq g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899123" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899123" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796070" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 08/44] x86/fault: Adjust WARN_ON for PKey fault Date: Thu, 27 Jan 2022 09:54:29 -0800 Message-Id: <20220127175505.851391-9-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Previously if a Protection key fault occurred it indicated something very wrong because user page mappings are not supposed to be in the kernel address space. Now PKey faults may happen on kernel mappings if the feature is enabled. If PKS is enabled, avoid the warning in the fault path. Cc: Sean Christopherson Cc: Dan Williams Signed-off-by: Ira Weiny --- arch/x86/mm/fault.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index d0074c6ed31a..6ed91b632eac 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1148,11 +1148,15 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned l= ong hw_error_code, unsigned long address) { /* - * Protection keys exceptions only happen on user pages. We - * have no user pages in the kernel portion of the address - * space, so do not expect them here. + * X86_PF_PK (Protection key exceptions) may occur on kernel addresses + * when PKS (PKeys Supervisor) is enabled. + * + * However, if PKS is not enabled WARN if this exception is seen + * because there are no user pages in the kernel portion of the address + * space. 
*/ - WARN_ON_ONCE(hw_error_code & X86_PF_PK); + WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) && + (hw_error_code & X86_PF_PK)); =20 #ifdef CONFIG_X86_32 /* --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 991E5C433EF for ; Thu, 27 Jan 2022 17:56:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245108AbiA0R4J (ORCPT ); Thu, 27 Jan 2022 12:56:09 -0500 Received: from mga12.intel.com ([192.55.52.136]:65462 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244837AbiA0RzZ (ORCPT ); Thu, 27 Jan 2022 12:55:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306125; x=1674842125; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=f7V0dsd3Osql3GdcBex4B2w6PKjtIBuODoy6hMsgF5w=; b=oCIhi6yvjPKAH4nKuoToAGc81tPhmbfZ/UarkmVQLxGstW7ItcXMtvHC Rd3YMz2HH5HIN/hrxGKChrU+K/1DtawOw3ueV3SEQUcuOCoHA7uxQQAVA qCNJS4BdSPAguR0byO1u+otBfsVHB2sBn3QTHix5zO+giiZW8H5wtcauV ij2HObThwTNVOMtZdSrg+sn5XB9VPc2NYijhjUViK1BhT7v8GUJo/x8ZW BhOjc5MirFBYVeHCB2tClijTgkE/1b7xCFQnJ4NQ6shLCcO/5//t0Gef9 rw31NsBrphzQ/+tVfqy69EvDpYwk/FO69LB3mRFl4niXnDpxwEOtW+Qzr Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899124" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899124" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796073" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 From: 
ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org
Subject: [PATCH V8 09/44] x86/pkeys: Enable PKS on cpus which support it
Date: Thu, 27 Jan 2022 09:54:30 -0800
Message-Id: <20220127175505.851391-10-ira.weiny@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Ira Weiny

Protection Keys for Supervisor pages (PKS) enables fast,
hardware-thread-specific manipulation of permission restrictions on
supervisor page mappings. It uses the same Protection Keys mechanism as
user mappings, but applies it to supervisor mappings through a
supervisor-specific MSR.

Software enables the feature by setting bit 24 of CR4.

Define pks_setup() to be called when PKS is configured.

Initially, pks_setup() initializes the per-cpu MSR with 0 to enable all
access on all pkeys.

asm/pks.h is added as a new file to hold new internal functions and
structures such as pks_setup().

Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Signed-off-by: Ira Weiny
---
Changes for V8
	Move setup_pks() into this patch with a default of all access
		for all pkeys.
	From Thomas
		s/setup_pks/pks_setup/
	Update Change log to better reflect exactly what this patch does.
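
Not part of the patch: a small user-space sketch of why a PKRS value of 0
means "all access on all pkeys", and of the CR4 bit involved. The constants
mirror this series and asm/pkeys_common.h (2 bits per key, Access Disable
0x1, Write Disable 0x2, CR4 bit 24); pkrs_key_bits() and pkrs_all_access()
are hypothetical helper names used only here, and nothing below touches a
real MSR or control register.

```c
#include <assert.h>
#include <stdint.h>

/* Register layout: 16 keys, 2 bits per key. */
#define PKR_BITS_PER_PKEY 2
#define PKR_AD_BIT 0x1u /* Access Disable */
#define PKR_WD_BIT 0x2u /* Write Disable */

/* CR4 enable bit, as defined by this patch. */
#define X86_CR4_PKS_BIT 24
#define X86_CR4_PKS (UINT64_C(1) << X86_CR4_PKS_BIT)

static_assert(X86_CR4_PKS == 0x1000000, "CR4.PKS is bit 24");

/* Hypothetical helper: extract the AD/WD bits of one key from a PKRS value. */
static inline uint32_t pkrs_key_bits(uint32_t pkrs, int pkey)
{
	return (pkrs >> (pkey * PKR_BITS_PER_PKEY)) & (PKR_AD_BIT | PKR_WD_BIT);
}

/* Hypothetical helper: true when no key has AD or WD set. */
static inline int pkrs_all_access(uint32_t pkrs)
{
	for (int pkey = 0; pkey < 16; pkey++) {
		if (pkrs_key_bits(pkrs, pkey))
			return 0;
	}
	return 1;
}
```

For example, pkrs_all_access(0) holds, while a value of 0x8 sets Write
Disable for key 1 and no longer qualifies.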
--- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/pks.h | 15 +++++++++++++++ arch/x86/include/uapi/asm/processor-flags.h | 2 ++ arch/x86/kernel/cpu/common.c | 2 ++ arch/x86/mm/pkeys.c | 16 ++++++++++++++++ 5 files changed, 36 insertions(+) create mode 100644 arch/x86/include/asm/pks.h diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-in= dex.h index 3faf0f97edb1..fca56ca646a0 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -786,6 +786,7 @@ =20 #define MSR_IA32_TSC_DEADLINE 0x000006E0 =20 +#define MSR_IA32_PKRS 0x000006E1 =20 #define MSR_TSX_FORCE_ABORT 0x0000010F =20 diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h new file mode 100644 index 000000000000..8180fc59790b --- /dev/null +++ b/arch/x86/include/asm/pks.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PKS_H +#define _ASM_X86_PKS_H + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + +void pks_setup(void); + +#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +static inline void pks_setup(void) { } + +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +#endif /* _ASM_X86_PKS_H */ diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include= /uapi/asm/processor-flags.h index bcba3c643e63..191c574b2390 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) +#define X86_CR4_PKS_BIT 24 /* enable Protection Keys for Supervisor */ +#define X86_CR4_PKS _BITUL(X86_CR4_PKS_BIT) =20 /* * x86-64 Task Priority Register, CR8 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 7b8382c11788..83c1abce7d93 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -59,6 +59,7 @@ #include #include #include +#include =20 
#include "cpu.h" =20 @@ -1632,6 +1633,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) =20 x86_init_rdrand(c); setup_pku(c); + pks_setup(); =20 /* * Clear/Set all flags overridden by options, need do it diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index cf12d8bf122b..02629219e683 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -206,3 +206,19 @@ u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessb= its) pkval &=3D ~(PKEY_ACCESS_MASK << shift); return pkval | accessbits << shift; } + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + +/* + * PKS is independent of PKU and either or both may be supported on a CPU. + */ +void pks_setup(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + wrmsrl(MSR_IA32_PKRS, 0); + cr4_set_bits(X86_CR4_PKS); +} + +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D352C433F5 for ; Thu, 27 Jan 2022 17:56:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245087AbiA0R4G (ORCPT ); Thu, 27 Jan 2022 12:56:06 -0500 Received: from mga12.intel.com ([192.55.52.136]:65458 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244833AbiA0RzY (ORCPT ); Thu, 27 Jan 2022 12:55:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306124; x=1674842124; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=j+f4pB+zxhX52T3Z+EIdgYytL0v+soefFOtz8ajtICA=; b=l9B+LgzeB+CzhyqeewZKg7poc/iX68K4orhbTdiNXRP85yaMH1OWDbl6 c4qIhdgpUA+Zmde95G5MK5okuggBD6wohCX3ok1xkl5h2YT2JbW9XiMbF DK2RdypXFXG3OTTCubl534Dj0CvwzO86LMb/URRGttsSPuZdekcrNjiSr 
iU1ntRlonebua1dt2iWUMo/DW6qA2tjsULs5HwKj4M5gTjLybOt83m/Yz h+4MLxo+9b8PhIhz84SIKdEkIABP3sfwXw181ktso0IDYlJ3m+HzcDGF/ UiVIEHicaJicZb50atGYS1VCkSThf++V1lQeuq1vJCJ5F8LOt8YRL8cc9 g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899125" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899125" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796077" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 10/44] Documentation/pkeys: Add initial PKS documentation Date: Thu, 27 Jan 2022 09:54:31 -0800 Message-Id: <20220127175505.851391-11-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Add initial overview and configuration information about PKS. Signed-off-by: Ira Weiny --- Documentation/core-api/protection-keys.rst | 57 ++++++++++++++++++++-- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index 12331db474aa..58670e3ee39e 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -12,6 +12,9 @@ PKeys Userspace (PKU) is a feature which is found on Inte= l's Skylake "Scalable Processor" Server CPUs and later. 
And it will be available in future non-server Intel parts and future AMD processors.
 
+Protection Keys for Supervisor pages (PKS) has been documented in the SDM since
+May 2020.
+
 pkeys work by dedicating 4 previously Reserved bits in each page table entry to a
 "protection key", giving 16 possible keys.
 
@@ -22,13 +25,20 @@ and Write Disable) for each of 16 keys.
 Being a CPU register, PKRU is inherently thread-local, potentially giving each
 thread a different set of protections from every other thread.
 
-There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
-register. The feature is only available in 64-bit mode, even though there is
+For Userspace (PKU), there are two instructions (RDPKRU/WRPKRU) for reading and
+writing to the register.
+
+For Supervisor (PKS), the register (MSR_IA32_PKRS) is accessible only to the
+kernel through rdmsr and wrmsr.
+
+The feature is only available in 64-bit mode, even though there is
 theoretically space in the PAE PTEs. These permissions are enforced on data
 access only and have no effect on instruction fetches.
 
-Syscalls
-========
+
+
+Syscalls for user space keys
+============================
 
 There are 3 system calls which directly interact with pkeys::
 
@@ -95,3 +105,42 @@ with a read()::
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+Overview
+--------
+
+Similar to user space pkeys, supervisor pkeys allow additional protections to
+be defined for supervisor mappings. Unlike user space pkeys, violations of
+these protections result in a kernel oops.
+
+Supervisor Memory Protection Keys (PKS) is a feature which is found on Intel's
+Sapphire Rapids (and later) "Scalable Processor" Server CPUs. It will also be
+available in future non-server Intel parts.
+
+QEMU also has support: https://www.qemu.org/2021/04/30/qemu-6-0-0/
+
+Kconfig
+-------
+Kernel users intending to use PKS support should depend on
+ARCH_HAS_SUPERVISOR_PKEYS, and select ARCH_ENABLE_SUPERVISOR_PKEYS to turn on
+this support within the core.
+
+
+MSR details
+-----------
+
+It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing
+but still maintains ordering properties similar to WRPKRU.
+
+Older versions of the SDM on PKRS may be wrong with regard to this
+serialization. The text should be the same as that of WRPKRU. From the WRPKRU
+text:
+
+	WRPKRU will never execute transiently. Memory accesses
+	affected by PKRU register will not execute (even transiently)
+	until all prior executions of WRPKRU have completed execution
+	and updated the PKRU register.
--=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02D57C433FE for ; Thu, 27 Jan 2022 17:55:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244668AbiA0RzO (ORCPT ); Thu, 27 Jan 2022 12:55:14 -0500 Received: from mga04.intel.com ([192.55.52.120]:37541 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240185AbiA0RzJ (ORCPT ); Thu, 27 Jan 2022 12:55:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306109; x=1674842109; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=w+BqwH+CQKO+ZVKKUrwBRI90a7JcX6kP/I2yOdwHEMM=; b=aOPYtRtFHK1YqNhG7ED/IlqK+wbesoNNM6KPJn9k620uEE2WFXyWv3bN ITw8+WuQvxWv5Dx9sWtCqHiKvqNSmklzKmP5rUqMJza4jBynix0Nzyd1p B34wOy58+yW02QxZx3Pg/6D5xMtUQ30k2noKKq9MXDA98husHCl23LDj9 WErMZ4u9oDOGxj66316KZslFRRZiyyjIqMDs4FC/VFZpkyzO05dz2ZZwD 7Ag0ZoVF8SgV3Yu9IhfLByq5TK4Ed3yBqrYAebYxwhcyO4CwqpZ9FnTqF 6V9s+Wm2/8C0HgDloD4/FYEwPcK8grVP/ETnjnRSStm0gwiolt24Na9Sf g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="245766389" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="245766389" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:09 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796080" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:08 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org
Subject: [PATCH V8 11/44] mm/pkeys: Define static PKS key array and default values
Date: Thu, 27 Jan 2022 09:54:32 -0800
Message-Id: <20220127175505.851391-12-ira.weiny@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com>
References: <20220127175505.851391-1-ira.weiny@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: Ira Weiny

Kernel users will need a way to allocate a PKS pkey for their use.

Introduce pks-keys.h as a place to define enum pks_pkey_consumers and
the macro PKS_INIT_VALUE. PKS_INIT_VALUE holds the default value for
each key. Kernel users reserve a key by adding an entry to enum
pks_pkey_consumers with a unique value [1-15] and replacing that key's
entry in the PKS_INIT_VALUE macro with the desired default macro:
PKR_RW_KEY(), PKR_WD_KEY(), or PKR_AD_KEY().

Use this value to initialize all CPUs at boot.

pks-keys.h is added as a new header with minimal header dependencies.
This allows the use of PKS_INIT_VALUE within other headers where the
additional includes from pkeys.h caused major conflicts. The main
conflict was using PKS_INIT_VALUE for INIT_THREAD in asm/processor.h.

Add documentation.

Signed-off-by: Ira Weiny
---
Changes for V8
	Create pks-keys.h to solve header conflicts in subsequent
		patches.
	Remove create_initial_pkrs_value() which did not work
		Replace it with PKS_INIT_VALUE
	Fix up documentation to match
	s/PKR_RW_BIT/PKR_RW_KEY()/
	s/PKRS_INIT_VALUE/PKS_INIT_VALUE
	Split this off of the previous patch
	Update documentation and embed it in the code to help ensure it
		is kept up to date.

Changes for V7
	Create a dynamic pkrs_initial_value in early init code.
Clean up comments Add comment to macro guard --- Documentation/core-api/protection-keys.rst | 4 ++ arch/x86/include/asm/pkeys_common.h | 1 + arch/x86/mm/pkeys.c | 2 +- include/linux/pkeys.h | 2 + include/linux/pks-keys.h | 59 ++++++++++++++++++++++ 5 files changed, 67 insertions(+), 1 deletion(-) create mode 100644 include/linux/pks-keys.h diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index 58670e3ee39e..af283a1a9aa0 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -129,6 +129,10 @@ Kernel users intending to use PKS support should depen= d on ARCH_HAS_SUPERVISOR_PKEYS, and select ARCH_ENABLE_SUPERVISOR_PKEYS to turn= on this support within the core. =20 +PKS Key Allocation +------------------ +.. kernel-doc:: include/linux/pks-keys.h + :doc: PKS_KEY_ALLOCATION =20 MSR details ----------- diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pke= ys_common.h index d02ab5bc3fff..efb101dee3aa 100644 --- a/arch/x86/include/asm/pkeys_common.h +++ b/arch/x86/include/asm/pkeys_common.h @@ -8,6 +8,7 @@ =20 #define PKR_PKEY_SHIFT(pkey) (pkey * PKR_BITS_PER_PKEY) =20 +#define PKR_RW_KEY(pkey) (0 << PKR_PKEY_SHIFT(pkey)) #define PKR_AD_KEY(pkey) (PKR_AD_BIT << PKR_PKEY_SHIFT(pkey)) #define PKR_WD_KEY(pkey) (PKR_WD_BIT << PKR_PKEY_SHIFT(pkey)) =20 diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 02629219e683..a5b5b86e97ce 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -217,7 +217,7 @@ void pks_setup(void) if (!cpu_feature_enabled(X86_FEATURE_PKS)) return; =20 - wrmsrl(MSR_IA32_PKRS, 0); + wrmsrl(MSR_IA32_PKRS, PKS_INIT_VALUE); cr4_set_bits(X86_CR4_PKS); } =20 diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 86be8bf27b41..e9ea8f152915 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -48,4 +48,6 @@ static inline bool arch_pkeys_enabled(void) =20 #endif /* ! 
CONFIG_ARCH_HAS_PKEYS */ =20 +#include + #endif /* _LINUX_PKEYS_H */ diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h new file mode 100644 index 000000000000..05fe4a1cf888 --- /dev/null +++ b/include/linux/pks-keys.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PKS_KEYS_H +#define _LINUX_PKS_KEYS_H + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + +#include + +/** + * DOC: PKS_KEY_ALLOCATION + * + * Users reserve a key value by adding an entry to enum pks_pkey_consumers= with + * a unique value from 1 to 15. Then replacing that value in the + * PKS_INIT_VALUE macro using the desired default protection; PKR_RW_KEY(), + * PKR_WD_KEY(), or PKR_AD_KEY(). + * + * PKS_KEY_DEFAULT must remain 0 key with a default of read/write to suppo= rt + * non-pks protected pages. Unused keys should be set (Access Disabled + * PKR_AD_KEY()). + * + * For example to configure a key for 'MY_FEATURE' with a default of Write + * Disabled. + * + * .. code-block:: c + * + * enum pks_pkey_consumers + * { + * PKS_KEY_DEFAULT =3D 0, + * PKS_KEY_MY_FEATURE =3D 1, + * } + * + * #define PKS_INIT_VALUE (PKR_RW_KEY(PKS_KEY_DEFAULT) | + * PKR_WD_KEY(PKS_KEY_MY_FEATURE) | + * PKR_AD_KEY(2) | PKR_AD_KEY(3) | + * PKR_AD_KEY(4) | PKR_AD_KEY(5) | + * PKR_AD_KEY(6) | PKR_AD_KEY(7) | + * PKR_AD_KEY(8) | PKR_AD_KEY(9) | + * PKR_AD_KEY(10) | PKR_AD_KEY(11) | + * PKR_AD_KEY(12) | PKR_AD_KEY(13) | + * PKR_AD_KEY(14) | PKR_AD_KEY(15)) + * + */ +enum pks_pkey_consumers { + PKS_KEY_DEFAULT =3D 0, /* Must be 0 for default PTE values */ +}; + +#define PKS_INIT_VALUE (PKR_RW_KEY(PKS_KEY_DEFAULT) | \ + PKR_AD_KEY(1) | \ + PKR_AD_KEY(2) | PKR_AD_KEY(3) | \ + PKR_AD_KEY(4) | PKR_AD_KEY(5) | \ + PKR_AD_KEY(6) | PKR_AD_KEY(7) | \ + PKR_AD_KEY(8) | PKR_AD_KEY(9) | \ + PKR_AD_KEY(10) | PKR_AD_KEY(11) | \ + PKR_AD_KEY(12) | PKR_AD_KEY(13) | \ + PKR_AD_KEY(14) | PKR_AD_KEY(15)) + +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +#endif /* _LINUX_PKS_KEYS_H */ --=20 2.31.1 From 
nobody Tue Jun 30 01:43:40 2026 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 12/44] mm/pkeys: Define PKS page table macros Date: Thu, 27 Jan 2022 09:54:33 -0800 Message-Id: <20220127175505.851391-13-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Fenghua Yu Kernel users will need a way to assign their pkey to pages. Define _PAGE_PKEY() and PAGE_KERNEL_PKEY() to allow users to set a pkey on a PTE. Add documentation. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Fenghua Yu --- Changes for V8 Split out from the 'Add PKS kernel API' patch Include documentation in this patch --- Documentation/core-api/protection-keys.rst | 7 +++++++ arch/x86/include/asm/pgtable_types.h | 22 ++++++++++++++++++++++ include/linux/pgtable.h | 4 ++++ 3 files changed, 33 insertions(+) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index af283a1a9aa0..794b7dedc544 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -134,6 +134,13 @@ PKS Key Allocation .. kernel-doc:: include/linux/pks-keys.h :doc: PKS_KEY_ALLOCATION =20 +Adding Pages to a PKey protected domain +--------------------------------------- + +.. 
kernel-doc:: arch/x86/include/asm/pgtable_types.h + :doc: PKS_KEY_ASSIGNMENT + + MSR details ----------- =20 diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pg= table_types.h index 40497a9020c6..e1d4535b525e 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -71,6 +71,22 @@ _PAGE_PKEY_BIT2 | \ _PAGE_PKEY_BIT3) =20 +/** + * DOC: PKS_KEY_ASSIGNMENT + * + * The following macros are used to set a pkey value in a supervisor PTE. + * + * .. code-block:: c + * + * #define _PAGE_PKEY(pkey) + * #define PAGE_KERNEL_PKEY(pkey) + */ +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +#define _PAGE_PKEY(pkey) (_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0) +#else +#define _PAGE_PKEY(pkey) (_AT(pteval_t, 0)) +#endif + #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED) #else @@ -226,6 +242,12 @@ enum page_cache_mode { #define PAGE_KERNEL_IO __pgprot_mask(__PAGE_KERNEL_IO) #define PAGE_KERNEL_IO_NOCACHE __pgprot_mask(__PAGE_KERNEL_IO_NOCACHE) =20 +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +#define PAGE_KERNEL_PKEY(pkey) __pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pk= ey)) +#else +#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL +#endif + #endif /* __ASSEMBLY__ */ =20 /* xwr */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index bc8713a76e03..2864066e03ec 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1510,6 +1510,10 @@ static inline bool arch_has_pfn_modify_check(void) # define PAGE_KERNEL_EXEC PAGE_KERNEL #endif =20 +#ifndef PAGE_KERNEL_PKEY +#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL +#endif + /* * Page Table Modification bits for pgtbl_mod_mask. 
* --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 13/44] mm/pkeys: Add initial PKS Test code Date: Thu, 27 Jan 2022 09:54:34 -0800 Message-Id: <20220127175505.851391-14-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

From: Ira Weiny

The core PKS functionality provides an interface for kernel users to
reserve a key and set up a mapping with that key.

Define test code under CONFIG_PKS_TEST which tests that PKS is enabled,
that a pkey can be set on a page, and that all defaults are set
properly.

Assign a pkey to the test code. While this test does waste a pkey value
this should not be a problem while there remains a very limited number
of potential pkey users. If pkeys are exhausted in the future the test
can be made exclusive or shared with another user.

Operation is simple. A test is requested by echoing the number of the
test into the debugfs file. The result of the last test is reported by
reading the file.

	$ echo 0 > /sys/kernel/debug/x86/run_pks
	$ cat /sys/kernel/debug/x86/run_pks
	PASS

Two initial tests are created: one to check that the default values
have been properly assigned and a second which purposely causes a
fault.

Add documentation.

Signed-off-by: Ira Weiny
---
Changes for V8 Ensure that unknown tests are flagged as failures. Split out the various tests into their own patches which test the functionality as the series goes. 
Move this basic test forward in the series Changes for V7 Add testing for pks_abandon_protections() Adjust pkrs_init_value Adjust for new defines Clean up comments Adjust test for static allocation of pkeys Use lookup_address() instead of follow_pte() follow_pte only works on IO and raw PFN mappings, use lookup_address() instead. lookup_address() is constrained to architectures which support it. --- Documentation/core-api/protection-keys.rst | 8 + include/linux/pks-keys.h | 3 +- lib/Kconfig.debug | 12 ++ lib/Makefile | 3 + lib/pks/Makefile | 3 + lib/pks/pks_test.c | 214 +++++++++++++++++++++ 6 files changed, 242 insertions(+), 1 deletion(-) create mode 100644 lib/pks/Makefile create mode 100644 lib/pks/pks_test.c diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index 794b7dedc544..234122e56a92 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -155,3 +155,11 @@ text: affected by PKRU register will not execute (even transiently) until all prior executions of WRPKRU have completed execution and updated the PKRU register. + +Testing +------- + +Example code can be found in lib/pks/pks_test.c + +.. 
kernel-doc:: lib/pks/pks_test.c + :doc: PKS_TEST diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h index 05fe4a1cf888..69a0be979515 100644 --- a/include/linux/pks-keys.h +++ b/include/linux/pks-keys.h @@ -42,10 +42,11 @@ */ enum pks_pkey_consumers { PKS_KEY_DEFAULT =3D 0, /* Must be 0 for default PTE values */ + PKS_KEY_TEST =3D 1, }; =20 #define PKS_INIT_VALUE (PKR_RW_KEY(PKS_KEY_DEFAULT) | \ - PKR_AD_KEY(1) | \ + PKR_AD_KEY(PKS_KEY_TEST) | \ PKR_AD_KEY(2) | PKR_AD_KEY(3) | \ PKR_AD_KEY(4) | PKR_AD_KEY(5) | \ PKR_AD_KEY(6) | PKR_AD_KEY(7) | \ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 14b89aa37c5c..5cab2100c133 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2685,6 +2685,18 @@ config HYPERV_TESTING help Select this option to enable Hyper-V vmbus testing. =20 +config PKS_TEST + bool "PKey (S)upervisor testing" + depends on ARCH_HAS_SUPERVISOR_PKEYS + select ARCH_ENABLE_SUPERVISOR_PKEYS + help + Select this option to enable testing of PKS core software and + hardware. + + Answer N if you don't know what supervisor keys are. + + If unsure, say N. + endmenu # "Kernel Testing and Coverage" =20 source "Documentation/Kconfig" diff --git a/lib/Makefile b/lib/Makefile index 300f569c626b..038a93c89714 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -398,3 +398,6 @@ $(obj)/$(TEST_FORTIFY_LOG): $(addprefix $(obj)/, $(TEST= _FORTIFY_LOGS)) FORCE ifeq ($(CONFIG_FORTIFY_SOURCE),y) $(obj)/string.o: $(obj)/$(TEST_FORTIFY_LOG) endif + +# PKS test +obj-y +=3D pks/ diff --git a/lib/pks/Makefile b/lib/pks/Makefile new file mode 100644 index 000000000000..9daccba4f7c4 --- /dev/null +++ b/lib/pks/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_PKS_TEST) +=3D pks_test.o diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c new file mode 100644 index 000000000000..159576dda47c --- /dev/null +++ b/lib/pks/pks_test.c @@ -0,0 +1,214 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright(c) 2021 Intel Corporation. 
All rights reserved. + */ + +/** + * DOC: PKS_TEST + * + * If CONFIG_PKS_TEST is enabled a debugfs file is created to initiate in + * kernel testing. These can be triggered by: + * + * $ echo X > /sys/kernel/debug/x86/run_pks + * + * where X is: + * + * * 0 Loop through all CPUs, report the msr, and check against the defau= lt. + * * 9 Set up and fault on a PKS protected page. + * + * NOTE: 9 will fault on purpose. Therefore, it requires the option to be + * specified 2 times in a row to ensure the intent to run it. + * + * $ cat /sys/kernel/debug/x86/run_pks + * + * Will print the result of the last test. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include + +#define PKS_TEST_MEM_SIZE (PAGE_SIZE) + +#define CHECK_DEFAULTS 0 +#define RUN_CRASH_TEST 9 + +static struct dentry *pks_test_dentry; +static bool crash_armed; + +static bool last_test_pass; + +struct pks_test_ctx { + int pkey; + char data[64]; +}; + +static void *alloc_test_page(int pkey) +{ + return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_= END, + GFP_KERNEL, PAGE_KERNEL_PKEY(pkey), 0, + NUMA_NO_NODE, __builtin_return_address(0)); +} + +static struct pks_test_ctx *alloc_ctx(u8 pkey) +{ + struct pks_test_ctx *ctx =3D kzalloc(sizeof(*ctx), GFP_KERNEL); + + if (!ctx) { + pr_err("Failed to allocate memory for test context\n"); + return ERR_PTR(-ENOMEM); + } + + ctx->pkey =3D pkey; + sprintf(ctx->data, "%s", "DEADBEEF"); + return ctx; +} + +static void free_ctx(struct pks_test_ctx *ctx) +{ + kfree(ctx); +} + +static void crash_it(void) +{ + struct pks_test_ctx *ctx; + void *ptr; + + pr_warn(" ***** BEGIN: Unhandled fault test *****\n"); + + ctx =3D alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) { + pr_err("Failed to allocate context???\n"); + return; + } + + ptr =3D alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + return; + } + + /* This purposely faults */ + memcpy(ptr, ctx->data, 8); + + /* Should 
never get here if so the test failed */ + last_test_pass =3D false; + + vfree(ptr); + free_ctx(ctx); +} + +static void check_pkey_settings(void *data) +{ + unsigned long long msr =3D 0; + unsigned int cpu =3D smp_processor_id(); + + rdmsrl(MSR_IA32_PKRS, msr); + if (msr !=3D PKS_INIT_VALUE) { + pr_err("cpu %d value incorrect : 0x%llx expected 0x%x\n", + cpu, msr, PKS_INIT_VALUE); + last_test_pass =3D false; + } +} + +static void arm_or_run_crash_test(void) +{ + /* + * WARNING: Test "9" will crash. + * + * Arm the test and print a warning. A second "9" will run the test. + */ + if (!crash_armed) { + pr_warn("CAUTION: The crash test will cause an oops.\n"); + pr_warn(" Specify 9 a second time to run\n"); + pr_warn(" run any other test to clear\n"); + crash_armed =3D true; + return; + } + + crash_it(); + crash_armed =3D false; +} + +static ssize_t pks_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + char buf[64]; + unsigned int len; + + len =3D sprintf(buf, "%s\n", last_test_pass ? 
"PASS" : "FAIL"); + + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +static ssize_t pks_write_file(struct file *file, const char __user *user_b= uf, + size_t count, loff_t *ppos) +{ + int rc; + long option; + char buf[2]; + + if (copy_from_user(buf, user_buf, 1)) { + last_test_pass =3D false; + return -EFAULT; + } + buf[1] =3D '\0'; + + rc =3D kstrtol(buf, 0, &option); + if (rc) { + last_test_pass =3D false; + return count; + } + + last_test_pass =3D true; + + switch (option) { + case RUN_CRASH_TEST: + arm_or_run_crash_test(); + goto skip_arm_clearing; + case CHECK_DEFAULTS: + on_each_cpu(check_pkey_settings, NULL, 1); + break; + default: + last_test_pass =3D false; + break; + } + + /* Clear arming on any test run */ + crash_armed =3D false; + +skip_arm_clearing: + return count; +} + +static int pks_release_file(struct inode *inode, struct file *file) +{ + return 0; +} + +static const struct file_operations fops_init_pks =3D { + .read =3D pks_read_file, + .write =3D pks_write_file, + .llseek =3D default_llseek, + .release =3D pks_release_file, +}; + +static int __init pks_test_init(void) +{ + if (cpu_feature_enabled(X86_FEATURE_PKS)) + pks_test_dentry =3D debugfs_create_file("run_pks", 0600, arch_debugfs_di= r, + NULL, &fops_init_pks); + + return 0; +} +late_initcall(pks_test_init); + +static void __exit pks_test_exit(void) +{ + debugfs_remove(pks_test_dentry); + pr_info("test exit\n"); +} --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34E61C433F5 for ; Thu, 27 Jan 2022 17:55:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244862AbiA0Rze (ORCPT ); Thu, 27 Jan 2022 12:55:34 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243792AbiA0RzL (ORCPT ); Thu, 27 Jan 2022 12:55:11 -0500 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 14/44] x86/pkeys: Introduce pks_write_pkrs() Date: Thu, 27 Jan 2022 09:54:35 -0800 Message-Id: <20220127175505.851391-15-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Writing to MSR's is inefficient. 
Even though the underlying WRMSR(MSR_IA32_PKRS) is not serializing (see
below), writing to the MSR unnecessarily should be avoided. This is
especially true when the value of the PKS protections is unlikely to
change from the default often.

Introduce pks_write_pkrs() which avoids writing the MSR if the pkrs
value has not changed for the CPU. Do this by utilizing a per-cpu
cache. Protect the use of the cached value from preemption by
restricting the use of pks_write_pkrs() to non-preemptible context.
Further restrict its use to callers which have checked
X86_FEATURE_PKS.

The initial value of the MSR is preserved on INIT. While unlikely, the
PKS_INIT_VALUE may be 0 someday. Because the per-cpu cache starts out
as 0, a value of 0 would compare equal to the cache and
pks_write_pkrs() would skip the MSR write. Keep the MSR write in
pks_setup() to ensure the MSR is initialized at least one time. Then
call pks_write_pkrs() to set up the per-cpu cached value so that it is
in sync with the MSR.

It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
serializing but still maintains ordering properties similar to WRPKRU.
The current SDM section on PKRS needs updating but should be the same
as that of WRPKRU. So to quote from the WRPKRU text:

	WRPKRU will never execute transiently. Memory accesses
	affected by PKRU register will not execute (even transiently)
	until all prior executions of WRPKRU have completed execution
	and updated the PKRU register.

Suggested-by: Dave Hansen
Signed-off-by: Ira Weiny
---
Changes for V8 From Thomas Remove get/put_cpu_ptr() and make this a 'lower level' call. This makes it preemption unsafe but it is called mostly where preemption is already disabled. Add this as a predicate of the call and those calls which need to can disable preemption. Add lockdep assert for preemption Ensure MSR gets written even if the PKS_INIT_VALUE is 0. Completely re-write the commit message. s/write_pkrs/pks_write_pkrs/ Split this off into a singular patch Changes for V7 Create a dynamic pkrs_initial_value in early init code. 
Clean up comments Add comment to macro guard --- arch/x86/mm/pkeys.c | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index a5b5b86e97ce..3dce99ef4127 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -209,15 +209,56 @@ u32 pkey_update_pkval(u32 pkval, int pkey, u32 access= bits) =20 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS =20 +static DEFINE_PER_CPU(u32, pkrs_cache); + +/* + * pks_write_pkrs() - Write the pkrs of the current CPU + * @new_pkrs: New value to write to the current CPU register + * + * Optimizes the MSR writes by maintaining a per cpu cache. + * + * Context: must be called with preemption disabled + * Context: must only be called if PKS is enabled + * + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not + * serializing but still maintains ordering properties similar to WRPKRU. + * The current SDM section on PKRS needs updating but should be the same as + * that of WRPKRU. Quote from the WRPKRU text: + * + * WRPKRU will never execute transiently. Memory accesses + * affected by PKRU register will not execute (even transiently) + * until all prior executions of WRPKRU have completed execution + * and updated the PKRU register. + */ +static inline void pks_write_pkrs(u32 new_pkrs) +{ + u32 pkrs =3D __this_cpu_read(pkrs_cache); + + lockdep_assert_preemption_disabled(); + + if (pkrs !=3D new_pkrs) { + __this_cpu_write(pkrs_cache, new_pkrs); + wrmsrl(MSR_IA32_PKRS, new_pkrs); + } +} + /* * PKS is independent of PKU and either or both may be supported on a CPU. + * + * Context: must be called with preemption disabled */ void pks_setup(void) { if (!cpu_feature_enabled(X86_FEATURE_PKS)) return; =20 + /* + * If the PKS_INIT_VALUE is 0 then pks_write_pkrs() could fail to + * initialize the MSR. Do a single write here to ensure the MSR is + * written at least one time. 
+ */ wrmsrl(MSR_IA32_PKRS, PKS_INIT_VALUE); + pks_write_pkrs(PKS_INIT_VALUE); cr4_set_bits(X86_CR4_PKS); } =20 --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 15/44] x86/pkeys: Preserve the PKS MSR on context switch Date: Thu, 27 Jan 2022 09:54:36 -0800 Message-Id: <20220127175505.851391-16-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

From: Ira Weiny

The PKS MSR (PKRS) is defined as a per-logical-processor register. This
isolates memory access by logical CPU. Unfortunately, the MSR is not
managed by XSAVE. Therefore, tasks must save/restore the MSR value on
context switch.

Define pks_saved_pkrs in struct thread_struct. Initialize all tasks,
including the init_task, with the PKS_INIT_VALUE when created. Restore
the CPU's MSR to the saved task value on schedule in.

pks_write_current() is added to ensure that non-supervisor pkey
configurations compile correctly without pks_saved_pkrs in
thread_struct, and that CPUs without PKS support are ignored.

NOTE: The value of pks_saved_pkrs does not change with this patch. That
is left for future patches.

Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Signed-off-by: Ira Weiny
---
Changes for V8 From Thomas Ensure pkrs_write_current() does not suffer the overhead of preempt disable. Fix setting of initial value Remove flawed and broken create_initial_pkrs_value() in favor of a much simpler and robust macro default Update function names to be consistent. s/pkrs_write_current/pks_write_current This is a more consistent name s/saved_pkrs/pks_saved_pkrs s/pkrs_init_value/PKS_INIT_VALUE Remove pks_init_task() This function was added mainly to avoid the header file issue. Adding pks-keys.h solved that and saves the complexity. 
Changes for V7 Move definitions from asm/processor.h to asm/pks.h s/INIT_PKRS_VALUE/pkrs_init_value Change pks_init_task()/pks_sched_in() to functions s/pks_sched_in/pks_write_current to be used more generically later in the series --- arch/x86/include/asm/pks.h | 2 ++ arch/x86/include/asm/processor.h | 17 ++++++++++++++++- arch/x86/kernel/process_64.c | 3 +++ arch/x86/mm/pkeys.c | 13 +++++++++++++ 4 files changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index 8180fc59790b..d211bf36492c 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -5,10 +5,12 @@ #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS =20 void pks_setup(void); +void pks_write_current(void); =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 static inline void pks_setup(void) { } +static inline void pks_write_current(void) { } =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/proces= sor.h index 2c5f12ae7d04..3530a0e50b4f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PROCESSOR_H #define _ASM_X86_PROCESSOR_H =20 +#include + #include =20 /* Forward declaration, a strange C thing */ @@ -502,6 +504,12 @@ struct thread_struct { unsigned long cr2; unsigned long trap_nr; unsigned long error_code; + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + /* Saved Protection key register for supervisor mappings */ + u32 pks_saved_pkrs; +#endif + #ifdef CONFIG_VM86 /* Virtual 86 mode info */ struct vm86 *vm86; @@ -769,7 +777,14 @@ static inline void spin_lock_prefetch(const void *x) #define KSTK_ESP(task) (task_pt_regs(task)->sp) =20 #else -#define INIT_THREAD { } + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +#define INIT_THREAD { \ + .pks_saved_pkrs =3D PKS_INIT_VALUE, \ +} +#else +#define INIT_THREAD { } +#endif =20 extern unsigned long KSTK_ESP(struct task_struct *task); =20 diff --git 
a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 3402edec236c..81fc0b638308 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -59,6 +59,7 @@ /* Not included via unistd.h */ #include #endif +#include =20 #include "process.h" =20 @@ -657,6 +658,8 @@ __switch_to(struct task_struct *prev_p, struct task_str= uct *next_p) /* Load the Intel cache allocation PQR MSR. */ resctrl_sched_in(); =20 + pks_write_current(); + return prev_p; } =20 diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 3dce99ef4127..6d94dfc9a219 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -242,6 +242,19 @@ static inline void pks_write_pkrs(u32 new_pkrs) } } =20 +/** + * pks_write_current() - Write the current thread's saved PKRS value + * + * Context: must be called with preemption disabled + */ +void pks_write_current(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + pks_write_pkrs(current->thread.pks_saved_pkrs); +} + /* * PKS is independent of PKU and either or both may be supported on a CPU. 
* --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 16/44] mm/pkeys: Introduce pks_mk_readwrite() Date: Thu, 27 Jan 2022 09:54:37 -0800 Message-Id: <20220127175505.851391-17-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

From: Ira Weiny

When a user needs valid access to a PKS protected page they will need
to change the protections for their pkey to Read/Write within a thread
of execution.

Define pks_mk_readwrite() to update the specified pkey. Define
pks_update_protection() as a helper to do the heavy lifting and to
allow for subsequent pks_mk_*() calls.

Define PKEY_READ_WRITE rather than use a magic value of '0' in
pks_update_protection().

Finally, ensure preemption is disabled while calling pks_write_pkrs()
in this code.

Add documentation.

Signed-off-by: Ira Weiny --- Changes for V8 Define PKEY_READ_WRITE Make the call inline Clean up the names Use pks_write_pkrs() with preemption disabled Split this out from 'Add PKS kernel API' Include documentation in this patch --- Documentation/core-api/protection-keys.rst | 9 ++++++- arch/x86/mm/pkeys.c | 28 ++++++++++++++++++++++ include/linux/pkeys.h | 25 +++++++++++++++++++ include/uapi/asm-generic/mman-common.h | 1 + 4 files changed, 62 insertions(+), 1 deletion(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index 234122e56a92..e4a27b93f3d4 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -141,11 +141,18 @@ Adding Pages to a PKey protected domain :doc: PKS_KEY_ASSIGNMENT =20 =20 +Changing permissions of individual keys +--------------------------------------- + +.. kernel-doc:: include/linux/pks-keys.h + :identifiers: pks_mk_readwrite + MSR details ----------- =20 It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not seriali= zing -but still maintains ordering properties similar to WRPKRU. +but still maintains ordering properties similar to WRPKRU. Thus it is saf= e to +immediately use a mapping when the pks_mk*() functions return. =20 Older versions of the SDM on PKRS may be wrong with regard to this serialization. The text should be the same as that of WRPKRU. From the W= RPKRU diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 6d94dfc9a219..7c6498fb8f8d 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -10,6 +10,7 @@ =20 #include /* boot_cpu_has, ... */ #include /* vma_pkey() */ +#include =20 int __execute_only_pkey(struct mm_struct *mm) { @@ -275,4 +276,31 @@ void pks_setup(void) cr4_set_bits(X86_CR4_PKS); } =20 +/* + * Do not call this directly, see pks_mk*(). 
+ * + * @pkey: Key for the domain to change + * @protection: protection bits to be used + * + * Protection utilizes the same protection bits specified for User pkeys + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + * + */ +void pks_update_protection(int pkey, u32 protection) +{ + u32 pkrs; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + pkrs =3D current->thread.pks_saved_pkrs; + current->thread.pks_saved_pkrs =3D pkey_update_pkval(pkrs, pkey, + protection); + preempt_disable(); + pks_write_pkrs(current->thread.pks_saved_pkrs); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(pks_update_protection); + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index e9ea8f152915..73b554b99123 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -48,6 +48,31 @@ static inline bool arch_pkeys_enabled(void) =20 #endif /* ! CONFIG_ARCH_HAS_PKEYS */ =20 +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + #include +#include + +#include + +void pks_update_protection(int pkey, u32 protection); + +/** + * pks_mk_readwrite() - Make the domain Read/Write + * @pkey: the pkey for which the access should change. + * + * Allow all access, read and write, to the domain specified by pkey. Thi= s is + * not a global update and only affects the current running thread. 
+ */ +static inline void pks_mk_readwrite(int pkey) +{ + pks_update_protection(pkey, PKEY_READ_WRITE); +} + +#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +static inline void pks_mk_readwrite(int pkey) {} + +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 #endif /* _LINUX_PKEYS_H */ diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-gene= ric/mman-common.h index 1567a3294c3d..3da6ac9e5ded 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -78,6 +78,7 @@ /* compatibility flags */ #define MAP_FILE 0 =20 +#define PKEY_READ_WRITE 0x0 #define PKEY_DISABLE_ACCESS 0x1 #define PKEY_DISABLE_WRITE 0x2 #define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\ --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B68E4C433EF for ; Thu, 27 Jan 2022 17:55:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234738AbiA0Rzb (ORCPT ); Thu, 27 Jan 2022 12:55:31 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244712AbiA0RzP (ORCPT ); Thu, 27 Jan 2022 12:55:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306115; x=1674842115; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=NOpGQHbLg5P47A/NaN991tZuJORFucxdhxqawe4Tf4o=; b=NN1lDplokQSk92k5E8edBTaxR2pCD7yA0s/Oemuzn7osUYaFTdzgBb5b fBgrRapZPfWZ0lC3EpiljD87uypwqHD13XKEvDg/jjwmliNeSa0V/UhvE qtQVQzjTudz0zlXOyZSKz7NPPCsgoraffNgNPBBPx2wJg5TFdiFF8xgUT aGHt0bGcsWwUsIel9IDbYLyP9bUSaZxo7CTNSUJiWNvR2THqx/VPX5b0W 27XiA0BtiENeJLVUK4BYCMxPLweLcMhcWiDWEwf2hPOV6zeCzDmPrsq1V 
JZp9cn1LC9dbPcTAAFGVpw3t/2/UzgZZeaMdWhdXFxEmFhcYVCpCgceMz w==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302427" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302427" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796103" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:09 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 17/44] mm/pkeys: Introduce pks_mk_noaccess() Date: Thu, 27 Jan 2022 09:54:38 -0800 Message-Id: <20220127175505.851391-18-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny After a valid access to a PKS protected page, users will need to change the protections back to No Access for their Pkey. Define pks_mk_noaccess() to update the specified Pkey. Add documentation.
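As a rough sketch of the intended calling pattern (an illustration, not code from this series), a caller opens a short Read/Write window around the access and closes it again afterwards. Everything prefixed model_ below is a hypothetical user space stand-in, and the starting value, in which every key is No Access, is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PKEY_DISABLE_ACCESS 0x1

/* Models the current thread's PKRS value. Assumption: all keys No Access. */
static uint32_t model_pkrs = 0x55555555;

static int model_shift(int pkey)
{
	assert(pkey >= 0 && pkey < 16);
	return pkey * 2;	/* assumption: two bits per key */
}

static void model_mk_readwrite(int pkey)
{
	model_pkrs &= ~(UINT32_C(3) << model_shift(pkey));
}

static void model_mk_noaccess(int pkey)
{
	model_mk_readwrite(pkey);
	model_pkrs |= (uint32_t)PKEY_DISABLE_ACCESS << model_shift(pkey);
}

static bool model_access_allowed(int pkey)
{
	return !((model_pkrs >> model_shift(pkey)) & PKEY_DISABLE_ACCESS);
}

/* Copy out of a protected buffer inside a short Read/Write window. */
static bool model_guarded_read(int pkey, void *dst, const void *src, size_t len)
{
	bool ok;

	model_mk_readwrite(pkey);
	ok = model_access_allowed(pkey);
	if (ok)
		memcpy(dst, src, len);
	model_mk_noaccess(pkey);	/* always close the window again */

	return ok;
}
```

Keeping the window as small as possible limits how long a stray access to the protected domain could succeed.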
Signed-off-by: Ira Weiny --- Changes for V8 Make the call inline Split this patch out from 'Add PKS kernel API' Include documentation in this patch --- Documentation/core-api/protection-keys.rst | 2 +- include/linux/pkeys.h | 13 +++++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index e4a27b93f3d4..115afc67153f 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -145,7 +145,7 @@ Changing permissions of individual keys --------------------------------------- =20 .. kernel-doc:: include/linux/pks-keys.h - :identifiers: pks_mk_readwrite + :identifiers: pks_mk_readwrite pks_mk_noaccess =20 MSR details ----------- diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 73b554b99123..5f4965f5449b 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -57,6 +57,18 @@ static inline bool arch_pkeys_enabled(void) =20 void pks_update_protection(int pkey, u32 protection); =20 +/** + * pks_mk_noaccess() - Disable all access to the domain + * @pkey: the pkey for which the access should change. + * + * Disable all access to the domain specified by pkey. This is not a glob= al + * update and only affects the current running thread. + */ +static inline void pks_mk_noaccess(int pkey) +{ + pks_update_protection(pkey, PKEY_DISABLE_ACCESS); +} + /** * pks_mk_readwrite() - Make the domain Read/Write * @pkey: the pkey for which the access should change. 
@@ -71,6 +83,7 @@ static inline void pks_mk_readwrite(int pkey) =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 +static inline void pks_mk_noaccess(int pkey) {} static inline void pks_mk_readwrite(int pkey) {} =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E317C433F5 for ; Thu, 27 Jan 2022 17:55:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244743AbiA0Rz2 (ORCPT ); Thu, 27 Jan 2022 12:55:28 -0500 Received: from mga02.intel.com ([134.134.136.20]:19413 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244437AbiA0RzP (ORCPT ); Thu, 27 Jan 2022 12:55:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306115; x=1674842115; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Y+ZrmtCf1q4qznLjZH0k13EJLAex2Lre9GU17jNVyaE=; b=cyEO9MFqmMrdeATvEzrXos5LNrqq9u+h5HfL1AFGQiFZcwnY66zhA+v+ DI5W8wExqioVvksjQ9wB4krzjnPM4hnCyDdcVMlWAi5muv40lAHQdYc4F VkHkCgVGI3DN5TL0xObpNilvaEdjpFlF7GralgKGPj6VEIDIXV1GrjhM9 vEIGTIhDXckBlzgusYi1TtD3srTxZ2ZMMW5QaIKIHunRP7Q6skeA2eCiY 7eMCPw1PgMEiJSQyV2zbu1FqYICnTtQhPZ0nszp+N5w61phfak3WcUzQs RNbzl7BCWR7tCGTQAU4yUoZPBpHKwY1iWSjTnFFk3fk8nY1yx0PBFoVaJ g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302428" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302428" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796107" Received: from iweiny-desk2.sc.intel.com (HELO localhost) 
([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 18/44] x86/fault: Add a PKS test fault hook Date: Thu, 27 Jan 2022 09:54:39 -0800 Message-Id: <20220127175505.851391-19-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The PKS test code is going to purposely create faults when testing invalid access. It will need a way to flag those faults as invalid and keep the kernel running properly. Create a hook in the fault handler to call back into the test code such that the test code can track when a test it runs results in a fault. The hook returns true if the fault was caused by the test code so the main handler can consider the fault handled. The hook is also responsible for clearing the condition that caused the fault. Predicate the hook on CONFIG_PKS_TEST.
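The contract between the fault path and the test code can be modeled in user space as follows. This is only a sketch of the idea: the model_* names and state are hypothetical, and the real hook runs inside the kernel page fault handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test-side state. */
static int model_armed_key;	/* 0: no test is running */
static int model_fault_cnt;
static bool model_key_open;	/* stands in for the key's PKRS bits */

/* Models the test callback: claim the fault only while a test is armed. */
static bool model_test_callback(void)
{
	if (!model_armed_key)
		return false;

	model_key_open = true;	/* clear the cause so the access can retry */
	model_fault_cnt++;
	return true;
}

/* Models the hook in the fault path; returns true if the fault was handled. */
static bool model_kern_addr_fault(bool pk_fault)
{
	if (pk_fault && model_test_callback())
		return true;

	return false;	/* normal fault handling would continue here */
}
```

If the callback claimed a fault without clearing its cause, the faulting instruction would simply fault again when retried, which is why both duties sit in the same hook.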
Signed-off-by: Ira Weiny --- arch/x86/include/asm/pks.h | 14 ++++++++++++++ arch/x86/mm/fault.c | 30 ++++++++++++++++++++---------- lib/pks/pks_test.c | 12 ++++++++++++ 3 files changed, 46 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index d211bf36492c..ee9fff5b4b13 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -14,4 +14,18 @@ static inline void pks_write_current(void) { } =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 + +#ifdef CONFIG_PKS_TEST + +bool pks_test_callback(void); + +#else /* !CONFIG_PKS_TEST */ + +static inline bool pks_test_callback(void) +{ + return false; +} + +#endif /* CONFIG_PKS_TEST */ + #endif /* _ASM_X86_PKS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 6ed91b632eac..bef879943260 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -33,6 +33,7 @@ #include /* kvm_handle_async_pf */ #include /* fixup_vdso_exception() */ #include +#include =20 #define CREATE_TRACE_POINTS #include @@ -1147,16 +1148,25 @@ static void do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { - /* - * X86_PF_PK (Protection key exceptions) may occur on kernel addresses - * when PKS (PKeys Supervisor) is enabled. - * - * However, if PKS is not enabled WARN if this exception is seen - * because there are no user pages in the kernel portion of the address - * space. - */ - WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) && - (hw_error_code & X86_PF_PK)); + if (hw_error_code & X86_PF_PK) { + /* + * X86_PF_PK (Protection key exceptions) may occur on kernel + * addresses when PKS (PKeys Supervisor) is enabled. + * + * However, if PKS is not enabled WARN if this exception is + * seen because there are no user pages in the kernel portion + * of the address space. 
+ */ + WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS)); + + /* + * If a protection key exception occurs it could be because a PKS test + * is running. If so, pks_test_callback() will clear the protection + * mechanism and return true to indicate the fault was handled. + */ + if (pks_test_callback()) + return; + } =20 #ifdef CONFIG_X86_32 /* diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 159576dda47c..d84ab6e7a09c 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -47,6 +47,18 @@ struct pks_test_ctx { char data[64]; }; =20 +/* + * pks_test_callback() is called by the fault handler to indicate it saw a= PKey + * fault. + * + * NOTE: The callback is responsible for clearing any condition which would + * cause the fault to re-trigger. + */ +bool pks_test_callback(void) +{ + return false; +} + static void *alloc_test_page(int pkey) { return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_= END, --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 72AB3C433EF for ; Thu, 27 Jan 2022 17:55:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244758AbiA0Rzk (ORCPT ); Thu, 27 Jan 2022 12:55:40 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244595AbiA0RzQ (ORCPT ); Thu, 27 Jan 2022 12:55:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306116; x=1674842116; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VICanE3l/b6iqPaX50y8ZrxI7HlwV5bcXzfijwPRQ+0=; b=nkaehdo6cmmhiXDcfzoQxfMFne9yU3XidQsbX1vZk+Y1bJ0m+0AuqkH4 
21iJU7q1jisEsuUF/vggxI+b3hh9zLO4p5hqI5XIVJevuIMbWTNxwKMlW 3StAHWF6bhY+lvelDxKxCmXCxZpdkuze5s9+DFyUmH+qjGclM5+22tC0c fwna/I11Rf+1QmvFvF1V0IkC/ECZU5OshgoMr3aidWdJUzks1oN5Cipyy EHC2Ezl4TMAAX6onw8lqYZ/RCyCpvIzq/TFfPnQYUeRSkMVcw5iUYhhIA xZySRySjohMJNVT2YPaRE9B1eA+UyVGZyEcAesyExnjsdFQZVMXnTJjY1 g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302430" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302430" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796111" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 19/44] mm/pkeys: PKS Testing, add pks_mk_*() tests Date: Thu, 27 Jan 2022 09:54:40 -0800 Message-Id: <20220127175505.851391-20-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Create a test which runs through both reads and writes in each of the 2 modes a PKS pkey can be set to: No Access and Read/Write. First, fill out pks_test_callback() to track the fault count and make the test key Read/Write to ensure the fault does not trigger again. Second, verify that the pkey was properly set in the PTE. Then add the test itself, which iterates over each of the test cases:
PKS_TEST_NO_ACCESS, WRITE, FAULT_EXPECTED PKS_TEST_NO_ACCESS, READ, FAULT_EXPECTED PKS_TEST_RDWR, WRITE, NO_FAULT_EXPECTED PKS_TEST_RDWR, READ, NO_FAULT_EXPECTED Finally add pks_mk_noaccess() at the end of the test and in the crash test to ensure that the pkey value is reset to the default at the appropriate times. Add documentation. Operation from user space is simple: $ echo 1 > /sys/kernel/debug/x86/run_pks $ cat /sys/kernel/debug/x86/run_pks PASS Signed-off-by: Ira Weiny --- Changes for V8 Remove readonly test, as that patch is not needed for PMEM Split this off into a patch which follows the pks_mk_*() patches. Thus allowing for a better view of how the test works compared to the functionality added with those patches. Remove unneeded prints --- lib/pks/pks_test.c | 168 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 167 insertions(+), 1 deletion(-) diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index d84ab6e7a09c..fad9b996562a 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -14,6 +14,8 @@ * where X is: * * * 0 Loop through all CPUs, report the msr, and check against the defau= lt. + * * 1 Allocate a single key and check all 3 permissions on a page. + * * 8 Loop through all CPUs, report the msr, and check against the defau= lt. * * 9 Set up and fault on a PKS protected page. * * NOTE: 9 will fault on purpose. 
Therefore, it requires the option to be @@ -32,15 +34,21 @@ #include #include =20 +#include + #define PKS_TEST_MEM_SIZE (PAGE_SIZE) =20 #define CHECK_DEFAULTS 0 +#define RUN_SINGLE 1 #define RUN_CRASH_TEST 9 =20 static struct dentry *pks_test_dentry; static bool crash_armed; =20 static bool last_test_pass; +static int test_armed_key; +static int fault_cnt; +static int prev_fault_cnt; =20 struct pks_test_ctx { int pkey; @@ -56,7 +64,102 @@ struct pks_test_ctx { */ bool pks_test_callback(void) { - return false; + bool armed =3D (test_armed_key !=3D 0); + + if (armed) { + pks_mk_readwrite(test_armed_key); + fault_cnt++; + } + + return armed; +} + +static bool fault_caught(void) +{ + bool ret =3D (fault_cnt !=3D prev_fault_cnt); + + prev_fault_cnt =3D fault_cnt; + return ret; +} + +enum pks_access_mode { + PKS_TEST_NO_ACCESS, + PKS_TEST_RDWR, +}; + +#define PKS_WRITE true +#define PKS_READ false +#define PKS_FAULT_EXPECTED true +#define PKS_NO_FAULT_EXPECTED false + +static char *get_mode_str(enum pks_access_mode mode) +{ + switch (mode) { + case PKS_TEST_NO_ACCESS: + return "No Access"; + case PKS_TEST_RDWR: + return "Read Write"; + default: + pr_err("BUG in test invalid mode\n"); + break; + } + + return ""; +} + +struct pks_access_test { + enum pks_access_mode mode; + bool write; + bool fault; +}; + +static struct pks_access_test pkey_test_ary[] =3D { + { PKS_TEST_NO_ACCESS, PKS_WRITE, PKS_FAULT_EXPECTED }, + { PKS_TEST_NO_ACCESS, PKS_READ, PKS_FAULT_EXPECTED }, + + { PKS_TEST_RDWR, PKS_WRITE, PKS_NO_FAULT_EXPECTED }, + { PKS_TEST_RDWR, PKS_READ, PKS_NO_FAULT_EXPECTED }, +}; + +static bool run_access_test(struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + bool fault; + + switch (test->mode) { + case PKS_TEST_NO_ACCESS: + pks_mk_noaccess(ctx->pkey); + break; + case PKS_TEST_RDWR: + pks_mk_readwrite(ctx->pkey); + break; + default: + pr_err("BUG in test invalid mode\n"); + return false; + } + + WRITE_ONCE(test_armed_key, ctx->pkey); + + if 
(test->write) + memcpy(ptr, ctx->data, 8); + else + memcpy(ctx->data, ptr, 8); + + fault =3D fault_caught(); + + WRITE_ONCE(test_armed_key, 0); + + if (test->fault !=3D fault) { + pr_err("pkey test FAILED: mode %s; write %s; fault %s !=3D %s\n", + get_mode_str(test->mode), + test->write ? "TRUE" : "FALSE", + test->fault ? "YES" : "NO", + fault ? "YES" : "NO"); + return false; + } + + return true; } =20 static void *alloc_test_page(int pkey) @@ -66,6 +169,48 @@ static void *alloc_test_page(int pkey) NUMA_NO_NODE, __builtin_return_address(0)); } =20 +static bool test_ctx(struct pks_test_ctx *ctx) +{ + bool rc =3D true; + int i; + u8 pkey; + void *ptr =3D NULL; + pte_t *ptep =3D NULL; + unsigned int level; + + ptr =3D alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + return false; + } + + ptep =3D lookup_address((unsigned long)ptr, &level); + if (!ptep) { + pr_err("Failed to lookup address???\n"); + rc =3D false; + goto done; + } + + pkey =3D pte_flags_pkey(ptep->pte); + if (pkey !=3D ctx->pkey) { + pr_err("invalid pkey found: %u, test_pkey: %u\n", + pkey, ctx->pkey); + rc =3D false; + goto done; + } + + for (i =3D 0; i < ARRAY_SIZE(pkey_test_ary); i++) { + /* sticky fail */ + if (!run_access_test(ctx, &pkey_test_ary[i], ptr)) + rc =3D false; + } + +done: + vfree(ptr); + + return rc; +} + static struct pks_test_ctx *alloc_ctx(u8 pkey) { struct pks_test_ctx *ctx =3D kzalloc(sizeof(*ctx), GFP_KERNEL); @@ -85,6 +230,22 @@ static void free_ctx(struct pks_test_ctx *ctx) kfree(ctx); } =20 +static bool run_single(void) +{ + struct pks_test_ctx *ctx; + bool rc; + + ctx =3D alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) + return false; + + rc =3D test_ctx(ctx); + pks_mk_noaccess(ctx->pkey); + free_ctx(ctx); + + return rc; +} + static void crash_it(void) { struct pks_test_ctx *ctx; @@ -104,6 +265,8 @@ static void crash_it(void) return; } =20 + pks_mk_noaccess(ctx->pkey); + /* This purposely faults */ memcpy(ptr, ctx->data, 8); =20 @@ -185,6 
+348,9 @@ static ssize_t pks_write_file(struct file *file, const = char __user *user_buf, case CHECK_DEFAULTS: on_each_cpu(check_pkey_settings, NULL, 1); break; + case RUN_SINGLE: + last_test_pass =3D run_single(); + break; default: last_test_pass =3D false; break; --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8E4AC433EF for ; Thu, 27 Jan 2022 17:55:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245025AbiA0Rzw (ORCPT ); Thu, 27 Jan 2022 12:55:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:19415 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244602AbiA0RzQ (ORCPT ); Thu, 27 Jan 2022 12:55:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306116; x=1674842116; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uQtrsvC9NzylIZ7djxR+FBTAoptzaOW13uIB28zSQiQ=; b=kfrCOt4/Aw+Eda/6n+Af8AR7QBedxzgXe8BhIWe8Q6UbjM8gtTcF6qGm VDPwieLTNPoipmXB8CkNfwwJIeVWOhbTJLsxgHdrZF6HddfPSSmkaviER szKhms885M85HCgp6olviPdOJUXUB2LEDp8p4rtGzgkIXsF7FGb25B7o0 GoDlU4m4HThCD1N1IkDYzuRh2xsb0REVXMqmSL3x86lEhhLIhU6EWOcrJ Mng+iJkRT0brl2978cuX0xS4gMp2NRZ+Fw9qrOZItfx3eZj9A0bK3cJrE hqyLgHWVRjuBtpdPg8lDqVCMChkjTPqd5zDZcpQOZZrYn2gOu1Ces7EF1 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302431" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302431" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796117" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by 
fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 20/44] mm/pkeys: Add PKS test for context switching Date: Thu, 27 Jan 2022 09:54:41 -0800 Message-Id: <20220127175505.851391-21-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny PKS software must maintain the PKRS value for each thread in the system. It must then restore this value whenever a thread is scheduled in. Create a user space test to test this. The test runs 2 processes simultaneously on the same CPU. One sets up a known PKS value for the test pkey and sleeps while the other runs through all the protections using the same pkey. The first process is then allowed to run and it checks that its PKRS value was properly restored. On the kernel side 2 additional commands are added. One is a mechanism to arm a context and the other checks that context. The kernel maintains this context while the char device remains open. The context is cleaned up when the fd is closed.
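The property under test can be modeled in user space as below. This is a simplified sketch, not the kernel's scheduling code: model_msr_pkrs stands in for MSR_IA32_PKRS on a single CPU, the model_* helpers are hypothetical, and two bits per key are assumed.

```c
#include <assert.h>
#include <stdint.h>

struct model_thread {
	uint32_t saved_pkrs;	/* per-thread copy of the register */
};

static uint32_t model_msr_pkrs;	/* stands in for MSR_IA32_PKRS on one CPU */
static struct model_thread *model_current;

/* Schedule a thread in: its saved value is written back to the "MSR". */
static void model_switch_to(struct model_thread *next)
{
	model_current = next;
	model_msr_pkrs = next->saved_pkrs;
}

/* Update the running thread's key: saved copy first, then the "MSR". */
static void model_update_protection(int pkey, uint32_t prot)
{
	int shift = pkey * 2;	/* assumption: two bits per key */

	assert(model_current && pkey >= 0 && pkey < 16);
	model_current->saved_pkrs &= ~(UINT32_C(3) << shift);
	model_current->saved_pkrs |= (prot & 3) << shift;
	model_msr_pkrs = model_current->saved_pkrs;
}

/* Extract one key's bits from the "MSR", as the check step does. */
static uint32_t model_key_bits(int pkey)
{
	return (model_msr_pkrs >> (pkey * 2)) & 3;
}
```

In this model the first thread sets its key to Read/Write, a second thread changes the same key while it runs, and after switching back the first thread must again see 0 for that key.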
Signed-off-by: Ira Weiny --- Changes for V8 Split this off from the main testing patch Remove unneeded prints --- lib/pks/pks_test.c | 74 +++++++++++ tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/test_pks.c | 168 +++++++++++++++++++++++++ 3 files changed, 243 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_pks.c diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index fad9b996562a..933f1bed4820 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -15,6 +15,8 @@ * * * 0 Loop through all CPUs, report the msr, and check against the defau= lt. * * 1 Allocate a single key and check all 3 permissions on a page. + * * 2 'arm context' for context switch test + * * 3 Check the context armed in '2' to ensure the MSR value was preserv= ed * * 8 Loop through all CPUs, report the msr, and check against the defau= lt. * * 9 Set up and fault on a PKS protected page. * @@ -24,6 +26,11 @@ * $ cat /sys/kernel/debug/x86/run_pks * * Will print the result of the last test. 
+ * + * To automate context switch testing a user space program is provided in: + * + * .../tools/testing/selftests/x86/test_pks.c + * */ =20 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -33,6 +40,9 @@ #include #include #include +#include + +#include =20 #include =20 @@ -40,6 +50,8 @@ =20 #define CHECK_DEFAULTS 0 #define RUN_SINGLE 1 +#define ARM_CTX_SWITCH 2 +#define CHECK_CTX_SWITCH 3 #define RUN_CRASH_TEST 9 =20 static struct dentry *pks_test_dentry; @@ -309,6 +321,55 @@ static void arm_or_run_crash_test(void) crash_armed =3D false; } =20 +static void arm_ctx_switch(struct file *file) +{ + struct pks_test_ctx *ctx; + + ctx =3D alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) { + pr_err("Failed to allocate a context\n"); + last_test_pass =3D false; + return; + } + + /* Store context for later checks */ + if (file->private_data) { + pr_warn("Context already armed\n"); + free_ctx(file->private_data); + } + file->private_data =3D ctx; + + /* Ensure a known state to test context switch */ + pks_mk_readwrite(ctx->pkey); +} + +static void check_ctx_switch(struct file *file) +{ + struct pks_test_ctx *ctx; + unsigned long reg_pkrs; + int access; + + last_test_pass =3D true; + + if (!file->private_data) { + pr_err("No Context switch configured\n"); + last_test_pass =3D false; + return; + } + + ctx =3D file->private_data; + + rdmsrl(MSR_IA32_PKRS, reg_pkrs); + + access =3D (reg_pkrs >> PKR_PKEY_SHIFT(ctx->pkey)) & + PKEY_ACCESS_MASK; + if (access !=3D 0) { + last_test_pass =3D false; + pr_err("Context switch check failed: pkey %d: 0x%x reg: 0x%lx\n", + ctx->pkey, access, reg_pkrs); + } +} + static ssize_t pks_read_file(struct file *file, char __user *user_buf, size_t count, loff_t *ppos) { @@ -351,6 +412,14 @@ static ssize_t pks_write_file(struct file *file, const= char __user *user_buf, case RUN_SINGLE: last_test_pass =3D run_single(); break; + case ARM_CTX_SWITCH: + /* start of context switch test */ + arm_ctx_switch(file); + break; + case CHECK_CTX_SWITCH: + /* After 
context switch MSR should be restored */ + check_ctx_switch(file); + break; default: last_test_pass =3D false; break; @@ -365,6 +434,11 @@ static ssize_t pks_write_file(struct file *file, const= char __user *user_buf, =20 static int pks_release_file(struct inode *inode, struct file *file) { + struct pks_test_ctx *ctx =3D file->private_data; + + if (ctx) + free_ctx(ctx); + return 0; } =20 diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests= /x86/Makefile index 8a1f62ab3c8e..e08670596c14 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -13,7 +13,7 @@ CAN_BUILD_WITH_NOPIE :=3D $(shell ./check_cc.sh $(CC) tri= vial_program.c -no-pie) TARGETS_C_BOTHBITS :=3D single_step_syscall sysret_ss_attrs syscall_nt tes= t_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ test_vsyscall mov_ss_trap \ - syscall_arg_fault fsgsbase_restore sigaltstack + syscall_arg_fault fsgsbase_restore sigaltstack test_pks TARGETS_C_32BIT_ONLY :=3D entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftes= ts/x86/test_pks.c new file mode 100644 index 000000000000..9a24a4a61f28 --- /dev/null +++ b/tools/testing/selftests/x86/test_pks.c @@ -0,0 +1,168 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright(c) 2021 Intel Corporation. All rights reserved. + * + * User space tool to test PKS operations. Accesses test code through + * /x86/run_pks when CONFIG_PKS_TEST is enabled. 
+ */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define PKS_TEST_FILE "/sys/kernel/debug/x86/run_pks" + +#define RUN_SINGLE "1" +#define ARM_CTX_SWITCH "2" +#define CHECK_CTX_SWITCH "3" + +void print_help_and_exit(char *argv0) +{ + printf("Usage: %s [-h] \n", argv0); + printf(" --help,-h This help\n"); + printf("\n"); + printf(" Run a context switch test on (Default: 0)\n"); +} + +int check_context_switch(int cpu) +{ + int switch_done[2]; + int setup_done[2]; + cpu_set_t cpuset; + char result[32]; + int rc =3D 0; + pid_t pid; + int fd; + + CPU_ZERO(&cpuset); + CPU_SET(cpu, &cpuset); + /* + * Ensure the two processes run on the same CPU so that they go through + * a context switch. + */ + sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset); + + if (pipe(setup_done)) { + printf("ERROR: Failed to create pipe\n"); + return -1; + } + if (pipe(switch_done)) { + printf("ERROR: Failed to create pipe\n"); + return -1; + } + + pid =3D fork(); + if (pid =3D=3D 0) { + char done =3D 'y'; + + fd =3D open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + printf("ERROR: cannot open %s\n", PKS_TEST_FILE); + return -1; + } + + cpu =3D sched_getcpu(); + printf("Child running on cpu %d...\n", cpu); + + /* Allocate and run test. */ + write(fd, RUN_SINGLE, 1); + + /* Arm for context switch test */ + write(fd, ARM_CTX_SWITCH, 1); + + printf(" tell parent to go\n"); + write(setup_done[1], &done, sizeof(done)); + + /* Context switch out... 
*/ + printf(" Waiting for parent...\n"); + read(switch_done[0], &done, sizeof(done)); + + /* Check msr restored */ + printf("Checking result\n"); + write(fd, CHECK_CTX_SWITCH, 1); + + read(fd, result, 10); + printf(" #PF, context switch, pkey allocation and free tests: %s\n", r= esult); + if (!strncmp(result, "PASS", 10)) { + rc =3D -1; + done =3D 'F'; + } + + /* Signal result */ + write(setup_done[1], &done, sizeof(done)); + } else { + char done =3D 'y'; + + read(setup_done[0], &done, sizeof(done)); + cpu =3D sched_getcpu(); + printf("Parent running on cpu %d\n", cpu); + + fd =3D open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + printf("ERROR: cannot open %s\n", PKS_TEST_FILE); + return -1; + } + + /* run test with the same pkey */ + write(fd, RUN_SINGLE, 1); + + printf(" Signaling child.\n"); + write(switch_done[1], &done, sizeof(done)); + + /* Wait for result */ + read(setup_done[0], &done, sizeof(done)); + if (done =3D=3D 'F') + rc =3D -1; + } + + close(fd); + + return rc; +} + +int main(int argc, char *argv[]) +{ + int cpu =3D 0; + int rc; + int c; + + while (1) { + int option_index =3D 0; + static struct option long_options[] =3D { + {"help", no_argument, 0, 'h' }, + {0, 0, 0, 0 } + }; + + c =3D getopt_long(argc, argv, "h", long_options, &option_index); + if (c =3D=3D -1) + break; + + switch (c) { + case 'h': + print_help_and_exit(argv[0]); + break; + } + } + + if (optind < argc) + cpu =3D strtoul(argv[optind], NULL, 0); + + if (cpu >=3D sysconf(_SC_NPROCESSORS_ONLN)) { + printf("CPU %d is invalid\n", cpu); + cpu =3D sysconf(_SC_NPROCESSORS_ONLN) - 1; + printf(" running on max CPU: %d\n", cpu); + } + + rc =3D check_context_switch(cpu); + + return rc; +} --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88B5DC433EF for ; Thu, 27 Jan 2022 
17:55:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241496AbiA0Rz4 (ORCPT ); Thu, 27 Jan 2022 12:55:56 -0500 Received: from mga02.intel.com ([134.134.136.20]:19413 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244600AbiA0RzQ (ORCPT ); Thu, 27 Jan 2022 12:55:16 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306116; x=1674842116; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2zhhwydmVa+dY1Q4o9BnDwtd7Hf/BcCgIS1cAgT1A78=; b=eYhRv+GzEMxAZ7QZt9O/hEHcC0UM3Q4wfY6x1G+zR007JiBY0FYOFw5D WiIppzqZzJNHkf1sp0oKMf4fXd6EacUVLKVoCqLefxtyLi5A+we3WPHyZ dFyg6iKQM/nE5j38yGoL7qwMwj21MDVSVhGUuqZBJ+DMP5r7ZksOSjwxI sxLLbWdaob1HpgWsXA9DivWFw5mcHlzljLgkXOjbQeR6h+nzGeAjWzGko iXVi7f7HNjG2kT/EeM4VJMpamKNSWDZjUptPpxbVV3tN0b1SSu3OfOEdW NvA3HL8+SXxOORoPSoupB9+4Y++QrjCbHfr/wN8j057igvfyXzPnRJvon Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302433" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302433" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796120" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 21/44] x86/entry: Add auxiliary pt_regs space Date: Thu, 27 Jan 2022 09:54:42 -0800 Message-Id: <20220127175505.851391-22-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The PKRS MSR is not managed by XSAVE. In order for the MSR to be saved during an exception the current CPU MSR value needs to be saved somewhere during the exception and restored when returning to the previous context. Two possible places for preserving this state were considered, irqentry_state_t or pt_regs.[1] pt_regs was much more complicated and was potentially fraught with unintended consequences.[2] However, Andy came up with a way to hide additional values on the stack which could be accessed as "extended_pt_regs".[3] This method allows any place which has struct pt_regs to get access to the extra information with no extra information being added to irq_state and pt_regs is left intact for compatibility with outside tools like BPF. Prepare the assembly code to add a hidden auxiliary pt_regs space. To simplify, the assembly code only adds space on the stack. The use of this space is left to the C code which is required to select ARCH_HAS_PTREGS_AUXILIARY to enable this support. Each nested exception gets another copy of this auxiliary space allowing for any number of levels of exception handling. Initially the space is left empty and results in no code changes because ARCH_HAS_PTREGS_AUXILIARY is not set. Subsequent patches adding data to pt_regs_auxiliary must set ARCH_HAS_PTREGS_AUXILIARY or a build failure will occur. 
The use of ARCH_HAS_PTREGS_AUXILIARY also avoids the introduction of 2 instructions (addq/subq) on every entry call when the extra space is not needed. 32-bit is specifically excluded.

Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or aided in the development of the patch.

[1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t
[3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/

Cc: Dave Hansen
Cc: Dan Williams
Suggested-by: Dave Hansen
Suggested-by: Dan Williams
Suggested-by: Peter Zijlstra
Suggested-by: Thomas Gleixner
Suggested-by: Andy Lutomirski
Signed-off-by: Ira Weiny
---
Changes for V8:
Exclude 32-bit
Introduce ARCH_HAS_PTREGS_AUXILIARY to optimize this away when not needed.
From Thomas
s/EXTENDED_PT_REGS_SIZE/PT_REGS_AUX_SIZE
Fix up PTREGS_AUX_SIZE macro to be based on the structures and used in assembly code via the nifty asm-offset macros
Bound calls into C code with [PUSH|POP]_PTREGS_AUXILIARY instead of using a macro 'call'
Split this patch out and put the PKS specific stuff in a separate patch

Changes for V7:
Rebased to 5.14 entry code
declare write_pkrs() in pks.h
s/INIT_PKRS_VALUE/pkrs_init_value
Remove unnecessary INIT_PKRS_VALUE def
s/pkrs_save_set_irq/pkrs_save_irq/
The initial value for exceptions is best managed completely within the pkey code.
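Reviewer note, not part of the patch: the hidden auxiliary space described in the commit message can be pictured with a small user-space sketch. It assumes mock types throughout. Every `mock_` name, the two `pt_regs` fields, and the `saved_value` payload are hypothetical stand-ins and not the kernel definitions; the patch itself leaves struct pt_regs_auxiliary empty. The sketch only illustrates how a pointer to the embedded pt_regs can be turned back into the enclosing frame, and how the reserved size falls out of the two structure sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock stand-in for the kernel's struct pt_regs (hypothetical fields). */
struct mock_pt_regs {
	uint64_t ip;
	uint64_t sp;
};

/* Mock auxiliary data; the real struct in this patch is still empty. */
struct mock_pt_regs_auxiliary {
	uint32_t saved_value;	/* hypothetical payload */
};

/* The auxiliary data sits at a lower address than pt_regs in one frame. */
struct mock_pt_regs_extended {
	struct mock_pt_regs_auxiliary aux;
	struct mock_pt_regs pt_regs __attribute__((aligned(8)));
};

/* User-space equivalent of the kernel's container_of(). */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing frame from a plain pt_regs pointer. */
static inline struct mock_pt_regs_extended *
mock_to_extended_pt_regs(struct mock_pt_regs *regs)
{
	return mock_container_of(regs, struct mock_pt_regs_extended, pt_regs);
}

/* Bytes the entry code would reserve; computed like PTREGS_AUX_SIZE. */
#define MOCK_PTREGS_AUX_SIZE \
	(sizeof(struct mock_pt_regs_extended) - sizeof(struct mock_pt_regs))

/* Store a value through the extended view, starting from regs only. */
static uint32_t mock_roundtrip(uint32_t value)
{
	struct mock_pt_regs_extended frame = { 0 };
	struct mock_pt_regs *regs = &frame.pt_regs;

	mock_to_extended_pt_regs(regs)->aux.saved_value = value;
	return frame.aux.saved_value;
}
```

Because the size is derived from the structures, padding introduced by the alignment attribute is counted automatically, which appears to be the reason the V8 changelog moves the macro to the asm-offsets mechanism.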
--- arch/x86/Kconfig | 4 ++++ arch/x86/entry/calling.h | 20 ++++++++++++++++++++ arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++ arch/x86/entry/entry_64_compat.S | 6 ++++++ arch/x86/include/asm/ptrace.h | 19 +++++++++++++++++++ arch/x86/kernel/asm-offsets_64.c | 15 +++++++++++++++ arch/x86/kernel/head_64.S | 6 ++++++ 7 files changed, 92 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a30fe85e27ac..82342f27b218 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1877,6 +1877,10 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS =20 If unsure, say y. =20 +config ARCH_HAS_PTREGS_AUXILIARY + depends on X86_64 + bool + choice prompt "TSX enable mode" depends on CPU_SUP_INTEL diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index a4c061fb7c6e..d0ebf9b069c9 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -63,6 +63,26 @@ For 32-bit we have the following conventions - kernel is= built with * for assembly code: */ =20 + +#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY + +.macro PUSH_PTREGS_AUXILIARY + /* add space for pt_regs_auxiliary */ + subq $PTREGS_AUX_SIZE, %rsp +.endm + +.macro POP_PTREGS_AUXILIARY + /* remove space for pt_regs_auxiliary */ + addq $PTREGS_AUX_SIZE, %rsp +.endm + +#else + +#define PUSH_PTREGS_AUXILIARY +#define POP_PTREGS_AUXILIARY + +#endif + .macro PUSH_REGS rdx=3D%rdx rax=3D%rax save_ret=3D0 .if \save_ret pushq %rsi /* pt_regs->si */ diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 466df3e50276..0684a8093965 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -332,7 +332,9 @@ SYM_CODE_END(ret_from_fork) movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */ .endif =20 + PUSH_PTREGS_AUXILIARY call \cfunc + POP_PTREGS_AUXILIARY =20 jmp error_return .endm @@ -435,7 +437,9 @@ SYM_CODE_START(\asmsym) =20 movq %rsp, %rdi /* pt_regs pointer */ =20 + PUSH_PTREGS_AUXILIARY call \cfunc + POP_PTREGS_AUXILIARY =20 jmp paranoid_exit =20 @@ -496,7 +500,9 @@ 
SYM_CODE_START(\asmsym) * stack. */ movq %rsp, %rdi /* pt_regs pointer */ + PUSH_PTREGS_AUXILIARY call vc_switch_off_ist + POP_PTREGS_AUXILIARY movq %rax, %rsp /* Switch to new stack */ =20 UNWIND_HINT_REGS @@ -507,7 +513,9 @@ SYM_CODE_START(\asmsym) =20 movq %rsp, %rdi /* pt_regs pointer */ =20 + PUSH_PTREGS_AUXILIARY call kernel_\cfunc + POP_PTREGS_AUXILIARY =20 /* * No need to switch back to the IST stack. The current stack is either @@ -542,7 +550,9 @@ SYM_CODE_START(\asmsym) movq %rsp, %rdi /* pt_regs pointer into first argument */ movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/ movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */ + PUSH_PTREGS_AUXILIARY call \cfunc + POP_PTREGS_AUXILIARY =20 jmp paranoid_exit =20 @@ -784,7 +794,9 @@ SYM_CODE_START_LOCAL(exc_xen_hypervisor_callback) movq %rdi, %rsp /* we don't return, adjust the stack frame */ UNWIND_HINT_REGS =20 + PUSH_PTREGS_AUXILIARY call xen_pv_evtchn_do_upcall + POP_PTREGS_AUXILIARY =20 jmp error_return SYM_CODE_END(exc_xen_hypervisor_callback) @@ -984,7 +996,9 @@ SYM_CODE_START_LOCAL(error_entry) /* Put us onto the real thread stack. */ popq %r12 /* save return addr in %12 */ movq %rsp, %rdi /* arg0 =3D pt_regs pointer */ + PUSH_PTREGS_AUXILIARY call sync_regs + POP_PTREGS_AUXILIARY movq %rax, %rsp /* switch stack */ ENCODE_FRAME_POINTER pushq %r12 @@ -1040,7 +1054,9 @@ SYM_CODE_START_LOCAL(error_entry) * as if we faulted immediately after IRET. */ mov %rsp, %rdi + PUSH_PTREGS_AUXILIARY call fixup_bad_iret + POP_PTREGS_AUXILIARY mov %rax, %rsp jmp .Lerror_entry_from_usermode_after_swapgs SYM_CODE_END(error_entry) @@ -1146,7 +1162,9 @@ SYM_CODE_START(asm_exc_nmi) =20 movq %rsp, %rdi movq $-1, %rsi + PUSH_PTREGS_AUXILIARY call exc_nmi + POP_PTREGS_AUXILIARY =20 /* * Return back to user mode. 
We must *not* do the normal exit @@ -1182,6 +1200,8 @@ SYM_CODE_START(asm_exc_nmi) * +---------------------------------------------------------+ * | pt_regs | * +---------------------------------------------------------+ + * | (Optionally) pt_regs_extended | + * +---------------------------------------------------------+ * * The "original" frame is used by hardware. Before re-enabling * NMIs, we need to be done with it, and we need to leave enough @@ -1358,7 +1378,9 @@ end_repeat_nmi: =20 movq %rsp, %rdi movq $-1, %rsi + PUSH_PTREGS_AUXILIARY call exc_nmi + POP_PTREGS_AUXILIARY =20 /* Always restore stashed CR3 value (see paranoid_entry) */ RESTORE_CR3 scratch_reg=3D%r15 save_reg=3D%r14 diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_com= pat.S index 0051cf5c792d..c6859d8acae4 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -136,7 +136,9 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SY= M_L_GLOBAL) .Lsysenter_flags_fixed: =20 movq %rsp, %rdi + PUSH_PTREGS_AUXILIARY call do_SYSENTER_32 + POP_PTREGS_AUXILIARY /* XEN PV guests always use IRET path */ ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_userm= ode", \ "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV @@ -253,7 +255,9 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM= _L_GLOBAL) UNWIND_HINT_REGS =20 movq %rsp, %rdi + PUSH_PTREGS_AUXILIARY call do_fast_syscall_32 + POP_PTREGS_AUXILIARY /* XEN PV guests always use IRET path */ ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_userm= ode", \ "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV @@ -410,6 +414,8 @@ SYM_CODE_START(entry_INT80_compat) cld =20 movq %rsp, %rdi + PUSH_PTREGS_AUXILIARY call do_int80_syscall_32 + POP_PTREGS_AUXILIARY jmp swapgs_restore_regs_and_return_to_usermode SYM_CODE_END(entry_INT80_compat) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 
703663175a5a..79541682e7f7 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -2,11 +2,13 @@ #ifndef _ASM_X86_PTRACE_H #define _ASM_X86_PTRACE_H =20 +#include #include #include #include =20 #ifndef __ASSEMBLY__ + #ifdef __i386__ =20 struct pt_regs { @@ -91,6 +93,23 @@ struct pt_regs { /* top of stack page */ }; =20 +/* + * NOTE: Features which add data to pt_regs_auxiliary must select + * ARCH_HAS_PTREGS_AUXILIARY. Failure to do so will result in a build fai= lure. + */ +struct pt_regs_auxiliary { +}; + +struct pt_regs_extended { + struct pt_regs_auxiliary aux; + struct pt_regs pt_regs __aligned(8); +}; + +static inline struct pt_regs_extended *to_extended_pt_regs(struct pt_regs = *regs) +{ + return container_of(regs, struct pt_regs_extended, pt_regs); +} + #endif /* !__i386__ */ =20 #ifdef CONFIG_PARAVIRT diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets= _64.c index b14533af7676..66f08ac3507a 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -4,6 +4,7 @@ #endif =20 #include +#include =20 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_PARAVIRT_SPINLOCKS) #include @@ -60,5 +61,19 @@ int main(void) DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_cana= ry)); BLANK(); #endif + +#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY + /* Size of Auxiliary pt_regs data */ + DEFINE(PTREGS_AUX_SIZE, sizeof(struct pt_regs_extended) - + sizeof(struct pt_regs)); +#else + /* + * Adding data to struct pt_regs_auxiliary requires setting + * ARCH_HAS_PTREGS_AUXILIARY + */ + BUILD_BUG_ON((sizeof(struct pt_regs_extended) - + sizeof(struct pt_regs)) !=3D 0); +#endif + return 0; } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 9c63fc5988cd..8418d9de8d70 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -336,8 +336,10 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb) movq %rsp, %rdi movq ORIG_RAX(%rsp), %rsi movq initial_vc_handler(%rip), %rax 
+ PUSH_PTREGS_AUXILIARY ANNOTATE_RETPOLINE_SAFE call *%rax + POP_PTREGS_AUXILIARY =20 /* Unwind pt_regs */ POP_REGS @@ -414,7 +416,9 @@ SYM_CODE_START_LOCAL(early_idt_handler_common) UNWIND_HINT_REGS =20 movq %rsp,%rdi /* RDI =3D pt_regs; RSI is already trapnr */ + PUSH_PTREGS_AUXILIARY call do_early_exception + POP_PTREGS_AUXILIARY =20 decl early_recursion_flag(%rip) jmp restore_regs_and_return_to_kernel @@ -438,7 +442,9 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb) /* Call C handler */ movq %rsp, %rdi movq ORIG_RAX(%rsp), %rsi + PUSH_PTREGS_AUXILIARY call do_vc_no_ghcb + POP_PTREGS_AUXILIARY =20 /* Unwind pt_regs */ POP_REGS --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 764A0C433F5 for ; Thu, 27 Jan 2022 17:56:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244938AbiA0R4A (ORCPT ); Thu, 27 Jan 2022 12:56:00 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244829AbiA0RzY (ORCPT ); Thu, 27 Jan 2022 12:55:24 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306124; x=1674842124; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XJw9MAUdxudZN0wWW0EtiDX6K6ENUNzmSElBKkroX8w=; b=edFXZ851E/Z2PnHTgY26jZjl2Yn27toUhBSBkoT0zEgOIqOts1+CAE7R guMbzLUBFqUccbjuHf/hqlGYX4T4+rLB3xrrmIXLEdojNEPmcgUnaNsp4 A2Jw1rD7U75+PXdkVY4jOSVm+QF49F/GoE8GXSHFJF4ipf1I5eLYuCVxe LR2B34PaVrmK7tCal0bfMPiE7uY9QEpH/C9bceTSyNSjK8AcrsJZZI3Jc tqQu0Bpv9Gb22B7qb4neI+idS0OVP0Gi9AWFK0Jx/55rM+Uum9ctgUI7/ wn3l+EYOEaIKvXNE0nzgMjVmiOPDI8MHGL6cDVb8bgu9G2OtWlAi5xsYs g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302435" X-IronPort-AV: 
E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302435" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796124" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 22/44] entry: Pass pt_regs to irqentry_exit_cond_resched() Date: Thu, 27 Jan 2022 09:54:43 -0800 Message-Id: <20220127175505.851391-23-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Auxiliary pt_regs space needs to be manipulated by the generic entry/exit code. Unfortunately, the call to irqentry_exit_cond_resched() from xen_pv_evtchn_do_upcall() bypasses the 'normal' irqentry_exit() call. Normally the irqentry_exit() would take care of handling any auxiliary pt_regs but because of this bypass irqentry_exit_cond_resched() is required to handle it. Add pt_regs to irqentry_exit_cond_resched() so that any auxiliary pt_regs data can be handled. Create an internal exit_cond_resched() call for irqentry_exit() to avoid passing pt_regs because irqentry_exit() will directly handle any auxiliary pt_regs data. 
Signed-off-by: Ira Weiny --- Changes for V8 New Patch --- arch/x86/entry/common.c | 2 +- include/linux/entry-common.h | 3 ++- kernel/entry/common.c | 9 +++++++-- 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 6c2826417b33..f1ba770d035d 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -309,7 +309,7 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct p= t_regs *regs) =20 inhcall =3D get_and_clear_inhcall(); if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) { - irqentry_exit_cond_resched(); + irqentry_exit_cond_resched(regs); instrumentation_end(); restore_inhcall(inhcall); } else { diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index ddaffc983e62..14fd329847e7 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -451,10 +451,11 @@ irqentry_state_t noinstr irqentry_enter(struct pt_reg= s *regs); =20 /** * irqentry_exit_cond_resched - Conditionally reschedule on return from in= terrupt + * @regs: Pointer to pt_regs of interrupted context * * Conditional reschedule with additional sanity checks. 
*/ -void irqentry_exit_cond_resched(void); +void irqentry_exit_cond_resched(struct pt_regs *regs); =20 void __irqentry_exit_cond_resched(void); #ifdef CONFIG_PREEMPT_DYNAMIC diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 490442a48332..f4210a7fc84d 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -395,7 +395,7 @@ void __irqentry_exit_cond_resched(void) DEFINE_STATIC_CALL(__irqentry_exit_cond_resched, __irqentry_exit_cond_resc= hed); #endif =20 -void irqentry_exit_cond_resched(void) +static void exit_cond_resched(void) { if (IS_ENABLED(CONFIG_PREEMPTION)) { #ifdef CONFIG_PREEMPT_DYNAMIC @@ -406,6 +406,11 @@ void irqentry_exit_cond_resched(void) } } =20 +void irqentry_exit_cond_resched(struct pt_regs *regs) +{ + exit_cond_resched(); +} + noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) { lockdep_assert_irqs_disabled(); @@ -431,7 +436,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqent= ry_state_t state) } =20 instrumentation_begin(); - irqentry_exit_cond_resched(); + exit_cond_resched(); /* Covers both tracing and lockdep */ trace_hardirqs_on(); instrumentation_end(); --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B8BDC433F5 for ; Thu, 27 Jan 2022 17:56:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243076AbiA0R4Q (ORCPT ); Thu, 27 Jan 2022 12:56:16 -0500 Received: from mga02.intel.com ([134.134.136.20]:19415 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244846AbiA0RzZ (ORCPT ); Thu, 27 Jan 2022 12:55:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306125; x=1674842125; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=IgMfD3EDPqux6N7a9kZjEZreZVxB0l9Epm9jhn3105k=; b=YUdbiuUELgzJ9uEoxR5Z07p9EQeV19TmBQNS6bxIFnNJSgW6BoE1hpda Kf211LKBOozwxKyr1Iq/h8bH039vw6OzScZIBdt2D1AmSzTe3axNkfs4W FwOr3LD2Qa1PPifhdTvp2p+kRMqwnGpwXvszBf+ZlJNWmDnSDDEjA4Tpk iJvtKkCNTDONdyzm6Td0EjTvi3sIFg317hcPpDuOKgZcLfNNjdlrJYU4e hGfEsv+PSTREG1VtN/ojE+FZzmTLPp8uErT1DBnZLlbCR8i99H5f7vEBq 8ECOOpwkz0ImnChIfGfY9Xt5qF9P8nSrIMOGfEyBNtbiicjEs6sY6cwbM w==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302436" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302436" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796130" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:10 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 23/44] entry: Add architecture auxiliary pt_regs save/restore calls Date: Thu, 27 Jan 2022 09:54:44 -0800 Message-Id: <20220127175505.851391-24-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Some architectures have auxiliary pt_regs space which is available to store extra information on the stack. For ease of implementation the common C code was left to fill in the data when needed. Define C calls for architectures to save and restore any auxiliary data they may need and call those from the common entry code. 
NOTE: Due to the split nature of the Xen exit code, irqentry_exit_cond_resched() requires an unbalanced call to arch_restore_aux_pt_regs() regardless of the nature of the preemption configuration. Signed-off-by: Ira Weiny --- Changes for V8 New patch which introduces generic auxiliary pt_regs save/restore calls. --- include/linux/entry-common.h | 7 +++++++ kernel/entry/common.c | 16 ++++++++++++++-- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 14fd329847e7..b243f1cfd491 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -99,6 +99,13 @@ static inline __must_check int arch_syscall_enter_traceh= ook(struct pt_regs *regs } #endif =20 +#ifndef CONFIG_ARCH_HAS_PTREGS_AUXILIARY + +static inline void arch_save_aux_pt_regs(struct pt_regs *regs) { } +static inline void arch_restore_aux_pt_regs(struct pt_regs *regs) { } + +#endif + /** * enter_from_user_mode - Establish state when coming from user mode * diff --git a/kernel/entry/common.c b/kernel/entry/common.c index f4210a7fc84d..c778e9783361 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -323,7 +323,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs = *regs) =20 if (user_mode(regs)) { irqentry_enter_from_user_mode(regs); - return ret; + goto aux_save; } =20 /* @@ -362,7 +362,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs = *regs) instrumentation_end(); =20 ret.exit_rcu =3D true; - return ret; + goto aux_save; } =20 /* @@ -377,6 +377,11 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs= *regs) trace_hardirqs_off_finish(); instrumentation_end(); =20 +aux_save: + instrumentation_begin(); + arch_save_aux_pt_regs(regs); + instrumentation_end(); + return ret; } =20 @@ -408,6 +413,7 @@ static void exit_cond_resched(void) =20 void irqentry_exit_cond_resched(struct pt_regs *regs) { + arch_restore_aux_pt_regs(regs); exit_cond_resched(); } =20 @@ -415,6 +421,10 @@ noinstr
void irqentry_exit(struct pt_regs *regs, irqen= try_state_t state) { lockdep_assert_irqs_disabled(); =20 + instrumentation_begin(); + arch_restore_aux_pt_regs(regs); + instrumentation_end(); + /* Check whether this returns to user mode */ if (user_mode(regs)) { irqentry_exit_to_user_mode(regs); @@ -464,6 +474,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_r= egs *regs) instrumentation_begin(); trace_hardirqs_off_finish(); ftrace_nmi_enter(); + arch_save_aux_pt_regs(regs); instrumentation_end(); =20 return irq_state; @@ -472,6 +483,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_r= egs *regs) void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_= state) { instrumentation_begin(); + arch_restore_aux_pt_regs(regs); ftrace_nmi_exit(); if (irq_state.lockdep) { trace_hardirqs_on_prepare(); --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C8B2C433FE for ; Thu, 27 Jan 2022 17:56:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245001AbiA0R4P (ORCPT ); Thu, 27 Jan 2022 12:56:15 -0500 Received: from mga02.intel.com ([134.134.136.20]:19413 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244847AbiA0RzZ (ORCPT ); Thu, 27 Jan 2022 12:55:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306125; x=1674842125; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3l7R2+hjC6dyevgRc9pgdsZzcbxeEl2tX9Je9seBHP0=; b=OjJk6Z103+zCRdGAJyTS1O3t6DxE2mhG6n+PtRc7DY8NuTqQ5ZCxByx9 oZhh8ckra1z6qODca4IUqJVh7Z7tnnBipRFU1+7wjMl3PuOlo3I9xmjLL tV25viKz03V+wIKCyyrKUZCxSr1Kf19b0vEEDnXtl90g95x+clofK299U 
FZocbDsUOXAyLtAvVE8Pyc0RRaDKzOjHMnObGjyqEitiAdWq/pq3Cku+l cCF6hKCdFPGYrjEG8KeiKAyPw/lFQzs76tH2uzhu/+u5uDRp4KXLI+F5B oat2DGWYONJcyP+MTgKluz8TQdfF1On69B5DwUn9scdG9k9sG616B0xkm Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302437" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302437" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796134" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 24/44] x86/entry: Define arch_{save|restore}_auxiliary_pt_regs() Date: Thu, 27 Jan 2022 09:54:45 -0800 Message-Id: <20220127175505.851391-25-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The x86 architecture supports the new auxiliary pt_regs space if ARCH_HAS_PTREGS_AUXILIARY is enabled. Define the callbacks within the x86 code required by the core entry code when this support is enabled. 
Signed-off-by: Ira Weiny --- Changes for V8 New patch --- arch/x86/include/asm/entry-common.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/ent= ry-common.h index 43184640b579..5fa5dd2d539c 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -95,4 +95,16 @@ static __always_inline void arch_exit_to_user_mode(void) } #define arch_exit_to_user_mode arch_exit_to_user_mode =20 +#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY + +static inline void arch_save_aux_pt_regs(struct pt_regs *regs) +{ +} + +static inline void arch_restore_aux_pt_regs(struct pt_regs *regs) +{ +} + +#endif + #endif --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E8E3C433F5 for ; Thu, 27 Jan 2022 17:56:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245300AbiA0R4p (ORCPT ); Thu, 27 Jan 2022 12:56:45 -0500 Received: from mga02.intel.com ([134.134.136.20]:19450 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244785AbiA0Rza (ORCPT ); Thu, 27 Jan 2022 12:55:30 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306130; x=1674842130; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lz+GzmfxQCFYBGj/322C04YF6L+IH6KOL/+UXO7S+bE=; b=FhZa5BiU6T1v7DP2WWrMOgJYo48lVeUECIX6dYJ2Bs6qigt6lGiwcKT2 TQiIKIByxAXBcsfsrwOkLy7ZQtnxmMMct1qtael+pSpVHk9Sw9+BguMIB KX2wn3q2W0U2mhSGe6R1vqYnSMg1gwZiJVwwrD6MSxA8KasnhS/64Xevq YKkGXQptuUPP67Fj+iywB4tJfP2xnvvKhMoeSNuoKjjyKXPgesTgZSuUA bRKRV0z7mVWESjYx3YLcaMqhbjTiWIzSdpewhgqqinNULAHTdUIsQ5R8z 4Il42MnCPGzznC/gsCniSuib/GFWgh6X4/k5KfLQG/wua8jDow2JpVPCj g==; 
X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302438" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302438" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796139" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 25/44] x86/pkeys: Preserve PKRS MSR across exceptions Date: Thu, 27 Jan 2022 09:54:46 -0800 Message-Id: <20220127175505.851391-26-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny PKRS is a per-logical-processor MSR which overlays additional protection for pages which have been mapped with a protection key. It is desired to protect PKS pages while executing exception code. While in the exception, code can alter the PKS permissions if necessary for any access it may require. To do this the current thread value must be saved, the CPU MSR value set to the default value, and the saved value restored upon completion of the exception. This can be done with the new auxiliary pt_regs space. Turn on the new auxiliary pt_regs space by triggering ARCH_HAS_PTREGS_AUXILIARY. This is done by making ARCH_HAS_PTREGS_AUXILIARY default yes and then dependent on ARCH_ENABLE_SUPERVISOR_PKEYS. Additional users of the auxiliary space can OR in their Kconfig options as needed.
Then define pks_{save|restore}_pt_regs() to use the auxiliary space to store the thread PKRS value across exceptions. Call pks_*_pt_regs() from arch_{save|restore}_aux_pt_regs(). Update the PKS test code to properly clear the saved thread PKRS value before returning to ensure current tests work with this change. Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or aided in the development of the patch. [1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/ [2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t [3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/ Cc: Dave Hansen Cc: Dan Williams Suggested-by: Dave Hansen Suggested-by: Dan Williams Suggested-by: Peter Zijlstra Suggested-by: Thomas Gleixner Suggested-by: Andy Lutomirski Signed-off-by: Ira Weiny --- Changes for V8: Tie this into the new generic auxiliary pt_regs support. Build this on the new irqentry_*() refactoring patches Split this patch off from the PKS portion of the auxiliary pt_regs functionality. From Thomas Fix noinstr mess s/write_pkrs/pks_write_pkrs s/pkrs_init_value/PKRS_INIT_VALUE Simplify the number and location of the save/restore calls. Cover entry from user space as well. Changes for V7: Rebased to 5.14 entry code declare write_pkrs() in pks.h s/INIT_PKRS_VALUE/pkrs_init_value Remove unnecessary INIT_PKRS_VALUE def s/pkrs_save_set_irq/pkrs_save_irq/ The initial value for exceptions is best managed completely within the pkey code.
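Reviewer illustration (not part of the patch): the save, set-default, restore sequence described in the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions. The model_* structs and functions and the MODEL_PKRS_DEFAULT value are hypothetical stand-ins for the MSR, current->thread.pks_saved_pkrs and the auxiliary pt_regs field; none of them is kernel API.

```c
/* Userspace model of the PKRS flow across an exception; NOT kernel code. */
#include <assert.h>
#include <stdint.h>

/* Assumed default for illustration: every key except key 0 is access-disabled. */
#define MODEL_PKRS_DEFAULT 0x55555554u

struct model_cpu      { uint32_t pkrs_msr; };    /* stands in for the PKRS MSR */
struct model_thread   { uint32_t saved_pkrs; };  /* stands in for thread.pks_saved_pkrs */
struct model_aux_regs { uint32_t thread_pkrs; }; /* stands in for the auxiliary pt_regs slot */

/* Exception entry: stash the thread value, then run with the default value. */
static void model_save(struct model_cpu *cpu, const struct model_thread *t,
		       struct model_aux_regs *aux)
{
	assert(cpu && t && aux);
	aux->thread_pkrs = t->saved_pkrs;
	cpu->pkrs_msr = MODEL_PKRS_DEFAULT;
}

/* Exception exit: whatever is in the auxiliary slot becomes the thread value again. */
static void model_restore(struct model_cpu *cpu, struct model_thread *t,
			  const struct model_aux_regs *aux)
{
	assert(cpu && t && aux);
	t->saved_pkrs = aux->thread_pkrs;
	cpu->pkrs_msr = t->saved_pkrs;
}
```

Calling model_save() followed by model_restore() leaves the model MSR equal to the interrupted thread's value, while code running between the two calls sees only the default.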
--- arch/x86/Kconfig | 3 ++- arch/x86/include/asm/entry-common.h | 3 +++ arch/x86/include/asm/pks.h | 8 ++++++-- arch/x86/include/asm/ptrace.h | 3 +++ arch/x86/mm/fault.c | 2 +- arch/x86/mm/pkeys.c | 32 +++++++++++++++++++++++++++++ lib/pks/pks_test.c | 11 ++++++++-- 7 files changed, 56 insertions(+), 6 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 82342f27b218..62685906f7c3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1878,8 +1878,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS If unsure, say y. =20 config ARCH_HAS_PTREGS_AUXILIARY + def_bool y depends on X86_64 - bool + depends on ARCH_ENABLE_SUPERVISOR_PKEYS =20 choice prompt "TSX enable mode" diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/ent= ry-common.h index 5fa5dd2d539c..803727b95b3a 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -8,6 +8,7 @@ #include #include #include +#include =20 /* Check that the stack and regs on entry from user mode are sane. 
*/ static __always_inline void arch_check_user_regs(struct pt_regs *regs) @@ -99,10 +100,12 @@ static __always_inline void arch_exit_to_user_mode(voi= d) =20 static inline void arch_save_aux_pt_regs(struct pt_regs *regs) { + pks_save_pt_regs(regs); } =20 static inline void arch_restore_aux_pt_regs(struct pt_regs *regs) { + pks_restore_pt_regs(regs); } =20 #endif diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index ee9fff5b4b13..82baa594cb3b 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -6,22 +6,26 @@ =20 void pks_setup(void); void pks_write_current(void); +void pks_save_pt_regs(struct pt_regs *regs); +void pks_restore_pt_regs(struct pt_regs *regs); =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 static inline void pks_setup(void) { } static inline void pks_write_current(void) { } +static inline void pks_save_pt_regs(struct pt_regs *regs) { } +static inline void pks_restore_pt_regs(struct pt_regs *regs) { } =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 =20 #ifdef CONFIG_PKS_TEST =20 -bool pks_test_callback(void); +bool pks_test_callback(struct pt_regs *regs); =20 #else /* !CONFIG_PKS_TEST */ =20 -static inline bool pks_test_callback(void) +static inline bool pks_test_callback(struct pt_regs *regs) { return false; } diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 79541682e7f7..f2527d6451b3 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -98,6 +98,9 @@ struct pt_regs { * ARCH_HAS_PTREGS_AUXILIARY. Failure to do so will result in a build fai= lure. 
*/ struct pt_regs_auxiliary { +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + u32 pks_thread_pkrs; +#endif }; =20 struct pt_regs_extended { diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index bef879943260..030eb3e08550 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1164,7 +1164,7 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned lon= g hw_error_code, * is running. If so, pks_test_callback() will clear the protection * mechanism and return true to indicate the fault was handled. */ - if (pks_test_callback()) + if (pks_test_callback(regs)) return; } =20 diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 7c6498fb8f8d..33b7f84ed33b 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -256,6 +256,38 @@ void pks_write_current(void) pks_write_pkrs(current->thread.pks_saved_pkrs); } =20 +/* + * PKRS is a per-logical-processor MSR which overlays additional protectio= n for + * pages which have been mapped with a protection key. + * + * To protect against exceptions having potentially privileged access to m= emory + * of an interrupted thread, save the current thread value and set the PKRS + * value to be used during the exception. + */ +void pks_save_pt_regs(struct pt_regs *regs) +{ + struct pt_regs_auxiliary *aux_pt_regs; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + aux_pt_regs =3D &to_extended_pt_regs(regs)->aux; + aux_pt_regs->pks_thread_pkrs =3D current->thread.pks_saved_pkrs; + pks_write_pkrs(PKS_INIT_VALUE); +} + +void pks_restore_pt_regs(struct pt_regs *regs) +{ + struct pt_regs_auxiliary *aux_pt_regs; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + aux_pt_regs =3D &to_extended_pt_regs(regs)->aux; + current->thread.pks_saved_pkrs =3D aux_pt_regs->pks_thread_pkrs; + pks_write_pkrs(current->thread.pks_saved_pkrs); +} + /* * PKS is independent of PKU and either or both may be supported on a CPU. 
* diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 933f1bed4820..77f872829300 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -43,6 +43,7 @@ #include =20 #include +#include /* for struct pt_regs */ =20 #include =20 @@ -74,12 +75,18 @@ struct pks_test_ctx { * NOTE: The callback is responsible for clearing any condition which would * cause the fault to re-trigger. */ -bool pks_test_callback(void) +bool pks_test_callback(struct pt_regs *regs) { + struct pt_regs_extended *ept_regs =3D to_extended_pt_regs(regs); + struct pt_regs_auxiliary *aux_pt_regs =3D &ept_regs->aux; bool armed =3D (test_armed_key !=3D 0); + u32 pkrs =3D aux_pt_regs->pks_thread_pkrs; =20 if (armed) { - pks_mk_readwrite(test_armed_key); + /* Enable read and write to stop faults */ + aux_pt_regs->pks_thread_pkrs =3D pkey_update_pkval(pkrs, + test_armed_key, + 0); fault_cnt++; } =20 --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63E35C433EF for ; Thu, 27 Jan 2022 17:56:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245205AbiA0R4a (ORCPT ); Thu, 27 Jan 2022 12:56:30 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244891AbiA0Rzb (ORCPT ); Thu, 27 Jan 2022 12:55:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306131; x=1674842131; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=klKmNgWFiTbZ7mbu6bCq0GUu9+o0tnvTpl/zZa5vWuY=; b=RJITqKgX2xV4bdabKSV/zxj4Bf61FWP5l/6vwC8clBaZICckXRPsoqpK nwEHytSk1HRVAboCjX+QXKqW1mvguM0fCMA1s80L+Hz0Q1wH/LTNgCJ37 nMT9ESTgItFq9WfDO9FyLqEZlhHGYf81xA+1pzm8tQhCI0tuzWgaVIbc1 
dVGNwnlVg1DvRBP3wZpZd4uROaKFDrC7aHRuwZu4uWxeab0xw+fh5ZWgd SDgJNsDH6mWvS3B3j8jyQD8mN2x/Yuk4RSW/PlFatNH2djfWmirhA+PGo JJ9PIvrtSTt4bKYk5lrMYvtJvnYgS/TeYG86f1sfeqkqPvN6eE3xjRnR9 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302441" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302441" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796144" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 26/44] x86/fault: Print PKS MSR on fault Date: Thu, 27 Jan 2022 09:54:47 -0800 Message-Id: <20220127175505.851391-27-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny If a PKS fault occurs it will be easier to debug it if the PKS MSR value at the time of the fault is known. Add pks_dump_fault_info() to dump the PKRS MSR on fault if enabled. Suggested-by: Andy Lutomirski Signed-off-by: Ira Weiny --- Changes for V8 Split this into its own patch.
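Reviewer illustration (not part of the patch): a minimal userspace sketch for reading the value printed as "PKRS: 0x..." in an oops. It assumes the usual protection key register layout of two bits per key, with the access-disable bit (0x1) in the low position and the write-disable bit (0x2) above it, and 16 keys in total. The sketch_* helpers and SKETCH_* constants are hypothetical names for this example and do not exist in the kernel.

```c
/* Decode one key's permission bits from a dumped PKRS value; NOT kernel code. */
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_PKEYS        16
#define SKETCH_BITS_PER_PKEY   2
#define SKETCH_ACCESS_MASK     0x3u
#define SKETCH_DISABLE_ACCESS  0x1u
#define SKETCH_DISABLE_WRITE   0x2u

/* Extract the two permission bits that belong to @pkey. */
static uint32_t sketch_pkey_bits(uint32_t pkrs, int pkey)
{
	assert(pkey >= 0 && pkey < SKETCH_NR_PKEYS);
	return (pkrs >> (pkey * SKETCH_BITS_PER_PKEY)) & SKETCH_ACCESS_MASK;
}

/* Describe the effective permission; access-disable overrides write-disable. */
static const char *sketch_pkey_perm(uint32_t pkrs, int pkey)
{
	uint32_t bits = sketch_pkey_bits(pkrs, pkey);

	if (bits & SKETCH_DISABLE_ACCESS)
		return "no access";
	if (bits & SKETCH_DISABLE_WRITE)
		return "read only";
	return "read/write";
}
```

For example, with a dumped value of 0x8 the bits for key 1 are 0x2, so sketch_pkey_perm(0x8, 1) reports "read only".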
--- arch/x86/include/asm/pks.h | 2 ++ arch/x86/mm/fault.c | 3 +++ arch/x86/mm/pkeys.c | 11 +++++++++++ 3 files changed, 16 insertions(+) diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index 82baa594cb3b..fc3c66f1bb04 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -8,6 +8,7 @@ void pks_setup(void); void pks_write_current(void); void pks_save_pt_regs(struct pt_regs *regs); void pks_restore_pt_regs(struct pt_regs *regs); +void pks_dump_fault_info(struct pt_regs *regs); =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 @@ -15,6 +16,7 @@ static inline void pks_setup(void) { } static inline void pks_write_current(void) { } static inline void pks_save_pt_regs(struct pt_regs *regs) { } static inline void pks_restore_pt_regs(struct pt_regs *regs) { } +static inline void pks_dump_fault_info(struct pt_regs *regs) { } =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 030eb3e08550..697c06f08103 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -549,6 +549,9 @@ show_fault_oops(struct pt_regs *regs, unsigned long err= or_code, unsigned long ad (error_code & X86_PF_PK) ? 
"protection keys violation" : "permissions violation"); =20 + if (error_code & X86_PF_PK) + pks_dump_fault_info(regs); + if (!(error_code & X86_PF_USER) && user_mode(regs)) { struct desc_ptr idt, gdt; u16 ldtr, tr; diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 33b7f84ed33b..bdd700d5ad03 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -288,6 +288,17 @@ void pks_restore_pt_regs(struct pt_regs *regs) pks_write_pkrs(current->thread.pks_saved_pkrs); } =20 +void pks_dump_fault_info(struct pt_regs *regs) +{ + struct pt_regs_auxiliary *aux_pt_regs; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + aux_pt_regs =3D &to_extended_pt_regs(regs)->aux; + pr_alert("PKRS: 0x%x\n", aux_pt_regs->pks_thread_pkrs); +} + /* * PKS is independent of PKU and either or both may be supported on a CPU. * --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 359D7C433F5 for ; Thu, 27 Jan 2022 17:56:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244923AbiA0R4h (ORCPT ); Thu, 27 Jan 2022 12:56:37 -0500 Received: from mga02.intel.com ([134.134.136.20]:19415 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244817AbiA0Rzc (ORCPT ); Thu, 27 Jan 2022 12:55:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306132; x=1674842132; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=K/5ZtoMtZQ5F0Ii5hg3nst9IQO9U2ZVIqR8hleWBK4E=; b=MENEZ2/KX3qJs2+oLfK48h7xEl0nRyHgmbnvwx/bB3PJkDvol0Y0pQu4 whkaGePNYrJ0axRyxWlvoUYUcBvM5RyCXi3YCfbjQx5+tsTBcyS1H2ZTp iejfGlc5uuZif8oV98agMIhoyNNudD1vB8ORCg4YTlFXswW2L6YfXHlHY 
SaUCW+8rvigZpvqKo5Nzq+lNTplAwJgIgauYbdN3HsdszKniCN3TO9SCT howklAqCPve6chFSuZhkHHvIbHqlISi0FgdCJWw52TGjxHwn4j/gAhu3V EOHFoGAdxo6v96qSS73b4SdljvDJHY+rlms4DVOuU9lTt1FGNsLfe6BuP Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302442" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302442" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796148" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 27/44] mm/pkeys: Add PKS exception test Date: Thu, 27 Jan 2022 09:54:48 -0800 Message-Id: <20220127175505.851391-28-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny During an exception the interrupted thread's PKRS value must be preserved and the exception should get the default value for that Pkey. Upon return from exception the thread's PKRS value should be restored. Add a PKS test which forces a fault and checks the values saved as well as tests the ability for code to change the Pkey value during the exception. Do this by changing the interrupted thread Pkey to read only prior to the exception. The default test Pkey is no access and therefore should be seen during the exception. Then switch to read/write during the exception.
Finally ensure that the read only value is restored when the exception is completed. $ echo 4 > /sys/kernel/debug/x86/run_pks $ cat /sys/kernel/debug/x86/run_pks PASS Signed-off-by: Ira Weiny --- Change for V8 Split this test off from the testing patch and place it after the exception saving code. --- arch/x86/include/asm/pks.h | 3 + arch/x86/mm/pkeys.c | 2 +- lib/pks/pks_test.c | 145 +++++++++++++++++++++++++++++++++++++ 3 files changed, 149 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index fc3c66f1bb04..065386c8bf37 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -24,9 +24,12 @@ static inline void pks_dump_fault_info(struct pt_regs *r= egs) { } #ifdef CONFIG_PKS_TEST =20 bool pks_test_callback(struct pt_regs *regs); +#define __static_or_pks_test =20 #else /* !CONFIG_PKS_TEST */ =20 +#define __static_or_pks_test static + static inline bool pks_test_callback(struct pt_regs *regs) { return false; diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index bdd700d5ad03..1da78580d6de 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -210,7 +210,7 @@ u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbi= ts) =20 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS =20 -static DEFINE_PER_CPU(u32, pkrs_cache); +__static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache); =20 /* * pks_write_pkrs() - Write the pkrs of the current CPU diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 77f872829300..008a1079579d 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -17,6 +17,8 @@ * * 1 Allocate a single key and check all 3 permissions on a page. * * 2 'arm context' for context switch test * * 3 Check the context armed in '2' to ensure the MSR value was preserv= ed + * * 4 Test that the exception thread PKRS remains independent of the + * interrupted threads PKRS * * 8 Loop through all CPUs, report the msr, and check against the defau= lt. 
* * 9 Set up and fault on a PKS protected page. * @@ -53,8 +55,11 @@ #define RUN_SINGLE 1 #define ARM_CTX_SWITCH 2 #define CHECK_CTX_SWITCH 3 +#define RUN_EXCEPTION 4 #define RUN_CRASH_TEST 9 =20 +DECLARE_PER_CPU(u32, pkrs_cache); + static struct dentry *pks_test_dentry; static bool crash_armed; =20 @@ -65,8 +70,71 @@ static int prev_fault_cnt; =20 struct pks_test_ctx { int pkey; + bool pass; char data[64]; }; +static struct pks_test_ctx *test_exception_ctx; + +static bool check_pkey_val(u32 pk_reg, int pkey, u32 expected) +{ + pk_reg =3D (pk_reg >> PKR_PKEY_SHIFT(pkey)) & PKEY_ACCESS_MASK; + return (pk_reg =3D=3D expected); +} + +/* + * Check if the register @pkey value matches @expected value + * + * Both the cached and actual MSR must match. + */ +static bool check_pkrs(int pkey, u32 expected) +{ + bool ret =3D true; + u64 pkrs; + u32 *tmp_cache; + + tmp_cache =3D get_cpu_ptr(&pkrs_cache); + if (!check_pkey_val(*tmp_cache, pkey, expected)) + ret =3D false; + put_cpu_ptr(tmp_cache); + + rdmsrl(MSR_IA32_PKRS, pkrs); + if (!check_pkey_val(pkrs, pkey, expected)) + ret =3D false; + + return ret; +} + +static void check_exception(u32 thread_pkrs) +{ + /* Check the thread saved state */ + if (!check_pkey_val(thread_pkrs, test_armed_key, PKEY_DISABLE_WRITE)) { + pr_err(" FAIL: checking ept_regs->thread_pkrs\n"); + test_exception_ctx->pass =3D false; + } + + /* Check that the exception state has disabled access */ + if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) { + pr_err(" FAIL: PKRS cache and MSR\n"); + test_exception_ctx->pass =3D false; + } + + /* + * Ensure an update can occur during exception without affecting the + * interrupted thread. The interrupted thread is checked after + * exception... 
+ */ + pks_mk_readwrite(test_armed_key); + if (!check_pkrs(test_armed_key, 0)) { + pr_err(" FAIL: exception did not change register to 0\n"); + test_exception_ctx->pass =3D false; + } + pks_mk_noaccess(test_armed_key); + if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) { + pr_err(" FAIL: exception did not change register to 0x%x\n", + PKEY_DISABLE_ACCESS); + test_exception_ctx->pass =3D false; + } +} =20 /* * pks_test_callback() is called by the fault handler to indicate it saw a= PKey @@ -82,6 +150,16 @@ bool pks_test_callback(struct pt_regs *regs) bool armed =3D (test_armed_key !=3D 0); u32 pkrs =3D aux_pt_regs->pks_thread_pkrs; =20 + if (test_exception_ctx) { + check_exception(pkrs); + /* + * Stop this check directly within the exception because the + * fault handler clean up code will call again while checking + * the PMD entry and there is no need to check this again. + */ + test_exception_ctx =3D NULL; + } + if (armed) { /* Enable read and write to stop faults */ aux_pt_regs->pks_thread_pkrs =3D pkey_update_pkval(pkrs, @@ -240,6 +318,7 @@ static struct pks_test_ctx *alloc_ctx(u8 pkey) } =20 ctx->pkey =3D pkey; + ctx->pass =3D true; sprintf(ctx->data, "%s", "DEADBEEF"); return ctx; } @@ -265,6 +344,69 @@ static bool run_single(void) return rc; } =20 +static bool run_exception_test(void) +{ + void *ptr =3D NULL; + bool pass =3D true; + struct pks_test_ctx *ctx; + + pr_info(" ***** BEGIN: exception checking\n"); + + ctx =3D alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) { + pr_err(" FAIL: no context\n"); + pass =3D false; + goto result; + } + ctx->pass =3D true; + + ptr =3D alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err(" FAIL: no vmalloc page\n"); + pass =3D false; + goto free_context; + } + + pks_update_protection(ctx->pkey, PKEY_DISABLE_WRITE); + + WRITE_ONCE(test_exception_ctx, ctx); + WRITE_ONCE(test_armed_key, ctx->pkey); + + memcpy(ptr, ctx->data, 8); + + if (!fault_caught()) { + pr_err(" FAIL: did not get an exception\n"); + pass =3D false; + } + + 
/* + * NOTE The exception code has to enable access (b00) to keep the fault + * from looping forever. Therefore full access is seen here rather + * than write disabled. + * + * Furthermore, check_exception() disabled access during the exception + * so this is testing that the thread value was restored back to the + * thread value. + */ + if (!check_pkrs(test_armed_key, 0)) { + pr_err(" FAIL: PKRS not restored\n"); + pass =3D false; + } + + if (!ctx->pass) + pass =3D false; + + WRITE_ONCE(test_armed_key, 0); + + vfree(ptr); +free_context: + free_ctx(ctx); +result: + pr_info(" ***** END: exception checking : %s\n", + pass ? "PASS" : "FAIL"); + return pass; +} + static void crash_it(void) { struct pks_test_ctx *ctx; @@ -427,6 +569,9 @@ static ssize_t pks_write_file(struct file *file, const = char __user *user_buf, /* After context switch MSR should be restored */ check_ctx_switch(file); break; + case RUN_EXCEPTION: + last_test_pass =3D run_exception_test(); + break; default: last_test_pass =3D false; break; --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F61FC433F5 for ; Thu, 27 Jan 2022 17:56:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245229AbiA0R4d (ORCPT ); Thu, 27 Jan 2022 12:56:33 -0500 Received: from mga02.intel.com ([134.134.136.20]:19413 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244818AbiA0Rzc (ORCPT ); Thu, 27 Jan 2022 12:55:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306132; x=1674842132; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C3DkGtAeq/fvKF1BtRuemGQclgEwuaGMDhc79ROPFiw=; 
b=YnQy4VezkywlEXKiqqt/KARNW4vm4HM5YywPKjLLOaR7K6kxI90bck2M 2Zz+fjhg+zSUjJjfnwBbNl0SfKGDqU0h90pxaY0340MheRNGjnZJWABbO 9rHDEIFDaS9+CoWjw54lgiOn+FFjQMdGPe4ldKclRS753r7ehCtyVswLu +zHoPIOe3vmfDdIsE3kZtKdobhry1fsXsTxlbdIss/7M2g5EqvvvxBdsE hyaPe3AGg7YKzGhOfENH9JfIw0U2G7YPqCs34JZ333VzMrVo6kRGWvg5D f2/o4lRu4LcqLumKV73wwLUkGIDADMWSIAL0CTa238HhGV4KM1dSGFJaS A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302444" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302444" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796152" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:11 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 28/44] mm/pkeys: Introduce pks_update_exception() Date: Thu, 27 Jan 2022 09:54:49 -0800 Message-Id: <20220127175505.851391-29-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Some PKS use cases will want to catch permissions violations and optionally allow them. pks_update_protection() updates the protection of the current running context. It will _not_ work to change the protections of a thread which has been interrupted. Therefore updating a thread from within an exception is not possible with pks_update_protection(). 
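Illustration of the limitation just described, as a userspace sketch rather than kernel code. It assumes two permission bits per key and models an exception as a pair of values: the one live while the handler runs and the interrupted thread's copy that is written back on return. All sketch_* names are hypothetical.

```c
/* Why an update made only to the running context is lost on exception return. */
#include <assert.h>
#include <stdint.h>

struct sketch_state {
	uint32_t live_pkrs;   /* value in effect while the exception handler runs */
	uint32_t saved_pkrs;  /* interrupted thread's value, restored on return */
};

/* Replace the two permission bits of @pkey in @pkval with @bits. */
static uint32_t sketch_update_pkval(uint32_t pkval, int pkey, uint32_t bits)
{
	int shift = pkey * 2;

	assert(pkey >= 0 && pkey < 16);
	pkval &= ~(0x3u << shift);
	return pkval | ((bits & 0x3u) << shift);
}

/* Models updating only the currently running context. */
static void sketch_update_current_only(struct sketch_state *s, int pkey, uint32_t bits)
{
	s->live_pkrs = sketch_update_pkval(s->live_pkrs, pkey, bits);
}

/* Models also updating the interrupted thread's saved copy. */
static void sketch_update_exception(struct sketch_state *s, int pkey, uint32_t bits)
{
	sketch_update_current_only(s, pkey, bits);
	s->saved_pkrs = sketch_update_pkval(s->saved_pkrs, pkey, bits);
}

/* Exception return: the saved copy overwrites whatever the handler set. */
static void sketch_exception_return(struct sketch_state *s)
{
	s->live_pkrs = s->saved_pkrs;
}
```

In this model, a handler that calls only sketch_update_current_only() sees its change discarded by sketch_exception_return(), so the interrupted code would fault again on the same access. Updating the saved copy as well is what makes the change visible to the interrupted thread.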
Introduce pks_update_exception() to update the faulted thread's protections in addition to the current context. A PKS fault callback can then be used to adjust the permissions of the faulted thread as necessary. Add documentation. Signed-off-by: Ira Weiny --- Changes for V8 Remove the concept of abandoning a pkey in favor of using the custom fault handler via this new pks_update_exception() call Without an abandon call there is no need for an abandon mask on sched in, new thread creation, or within exceptions... This now lets all invalid accesses fault Ensure that all entry points into PKS have feature checks... Place abandon fault check before the test callback to ensure testing does not detect the double fault of the abandon code and flag it incorrectly as a fault. Change return type of pks_handle_abandoned_pkeys() to bool --- Documentation/core-api/protection-keys.rst | 3 ++ arch/x86/mm/pkeys.c | 49 +++++++++++++++++++--- include/linux/pkeys.h | 5 +++ 3 files changed, 51 insertions(+), 6 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index 115afc67153f..b89308bf117e 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -147,6 +147,9 @@ Changing permissions of individual keys .. kernel-doc:: include/linux/pks-keys.h :identifiers: pks_mk_readwrite pks_mk_noaccess =20 +..
kernel-doc:: arch/x86/mm/pkeys.c + :identifiers: pks_update_exception + MSR details ----------- =20 diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 1da78580d6de..6723ae42732a 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -319,6 +319,15 @@ void pks_setup(void) cr4_set_bits(X86_CR4_PKS); } =20 +static void __pks_update_protection(int pkey, u32 protection) +{ + u32 pkrs =3D current->thread.pks_saved_pkrs; + + current->thread.pks_saved_pkrs =3D pkey_update_pkval(pkrs, pkey, + protection); + pks_write_pkrs(current->thread.pks_saved_pkrs); +} + /* * Do not call this directly, see pks_mk*(). * @@ -332,18 +341,46 @@ void pks_setup(void) */ void pks_update_protection(int pkey, u32 protection) { - u32 pkrs; - if (!cpu_feature_enabled(X86_FEATURE_PKS)) return; =20 - pkrs =3D current->thread.pks_saved_pkrs; - current->thread.pks_saved_pkrs =3D pkey_update_pkval(pkrs, pkey, - protection); preempt_disable(); - pks_write_pkrs(current->thread.pks_saved_pkrs); + __pks_update_protection(pkey, protection); preempt_enable(); } EXPORT_SYMBOL_GPL(pks_update_protection); =20 +/** + * pks_update_exception() - Update the protections of a faulted thread + * + * @regs: Faulting thread registers + * @pkey: pkey to update + * @protection: protection bits to use. + * + * CONTEXT: Exception + * + * pks_update_protection() updates the protection of the current running + * context. It will not work to change the protections of a thread which = has + * been interrupted. If a PKS fault callback fires it may want to update = the + * faulted threads protections in addition to it's own. + * + * Use pks_update_exception() to update the faulted threads protections + * in addition to the current context. 
+ */ +void pks_update_exception(struct pt_regs *regs, int pkey, u32 protection) +{ + struct pt_regs_extended *ept_regs; + u32 old; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + __pks_update_protection(pkey, protection); + + ept_regs =3D to_extended_pt_regs(regs); + old =3D ept_regs->aux.pks_thread_pkrs; + ept_regs->aux.pks_thread_pkrs =3D pkey_update_pkval(old, pkey, protection= ); +} +EXPORT_SYMBOL_GPL(pks_update_exception); + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 5f4965f5449b..c318d97f5da8 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -56,6 +56,7 @@ static inline bool arch_pkeys_enabled(void) #include =20 void pks_update_protection(int pkey, u32 protection); +void pks_update_exception(struct pt_regs *regs, int pkey, u32 protection); =20 /** * pks_mk_noaccess() - Disable all access to the domain @@ -85,6 +86,10 @@ static inline void pks_mk_readwrite(int pkey) =20 static inline void pks_mk_noaccess(int pkey) {} static inline void pks_mk_readwrite(int pkey) {} +static inline void pks_update_exception(struct pt_regs *regs, + int pkey, + u32 protection) +{ } =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0486C433EF for ; Thu, 27 Jan 2022 17:57:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237488AbiA0R5T (ORCPT ); Thu, 27 Jan 2022 12:57:19 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244951AbiA0Rzk (ORCPT ); Thu, 27 Jan 2022 12:55:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; 
s=Intel; t=1643306140; x=1674842140; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eGN63N0a99zN4ja5pEhr049Oij8dwvnWeFfMaFaK4UI=; b=WzA6nD24PUgNKJLp+avNOMP/DutzoK1/M6uTxLi3p7w9dv7Zemmf7BFL Pd4JmHkOjrLyHAfE0NhxwBvjXHUo94U3BQR68N0AkhpOo3f8FrHjghaSs zOHextBSeoHbJOz/9eq8nSzrohrC9oF8tVC56rAvxlmSBhp+PI6EE6FPe 2KvpA9Bkis996+ue0CwQkdvrDw9WZY8zwm5rCyGnYSJqYk3ETMo5I9jWX Bgfrn3Q36hTWdDWzYEUAzhHlz1S1CaINSKnAty72LKbgCCMK+gg+47So1 S6N3W+A//yA1/zcsAL3CNN42YBbJJEz+Y5OKG/sPpOSKdl+RWV96QK/se A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302445" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302445" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796155" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 29/44] mm/pkeys: Introduce PKS fault callbacks Date: Thu, 27 Jan 2022 09:54:50 -0800 Message-Id: <20220127175505.851391-30-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Some PKS keys will want special handling on accesses that violate the Pkey permissions. One of these is PMEM which will want to have a mode that logs the access violation, disables protection, and continues rather than oops'ing the machine. 
Provide an API to set callbacks for individual Pkeys. Call these through pks_handle_key_fault() which is called in the fault handler. Since PKS faults do not provide the key that faulted, this information needs to be recovered by walking the page tables and extracting it from the leaf entry. The key can then be used to call the specific user defined callback. This infrastructure could be used to implement the PKS testing code. Unfortunately, this would limit the ability to test this code itself as well as limit the testing code to a single Pkey. Because pks_test_callback() is zero overhead if CONFIG_PKS_TEST is not specified it is left as a separate hook in the fault handler. Add documentation. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Rick Edgecombe --- Changes for V8: Add pt_regs to the callback signature so that pks_update_exception() can be called if needed. Update commit message Determine if page is large prior to not present Update commit message with more clarity as to why this was kept separate from pks_abandon_protections() and pks_test_callback() Embed documentation in c file. 
Move handle_pks_key_fault() to pkeys.c s/handle_pks_key_fault/pks_handle_key_fault/ This consolidates the PKS code nicely Add feature check to pks_handle_key_fault() From Rick Edgecombe Fix key value check From kernel test robot Add static to handle_pks_key_fault Changes for V7: New patch --- Documentation/core-api/protection-keys.rst | 9 ++- arch/x86/include/asm/pks.h | 9 +++ arch/x86/mm/fault.c | 3 + arch/x86/mm/pkeys.c | 86 ++++++++++++++++++++++ include/linux/pkeys.h | 3 + include/linux/pks-keys.h | 2 + 6 files changed, 111 insertions(+), 1 deletion(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index b89308bf117e..267efa2112e7 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -115,7 +115,8 @@ Overview =20 Similar to user space pkeys, supervisor pkeys allow additional protections= to be defined for a supervisor mappings. Unlike user space pkeys, violations= of -these protections result in a kernel oops. +these protections result in a kernel oops unless a PKS fault handler is +provided which handles the fault. =20 Supervisor Memory Protection Keys (PKS) is a feature which is found on Int= el's Sapphire Rapids (and later) "Scalable Processor" Server CPUs. It will als= o be @@ -150,6 +151,12 @@ Changing permissions of individual keys .. kernel-doc:: arch/x86/mm/pkeys.c :identifiers: pks_update_exception =20 +Overriding Default Fault Behavior +--------------------------------- + +.. 
kernel-doc:: arch/x86/mm/pkeys.c + :doc: DEFINE_PKS_FAULT_CALLBACK + MSR details ----------- =20 diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index 065386c8bf37..55541bb64d08 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -9,6 +9,8 @@ void pks_write_current(void); void pks_save_pt_regs(struct pt_regs *regs); void pks_restore_pt_regs(struct pt_regs *regs); void pks_dump_fault_info(struct pt_regs *regs); +bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_cod= e, + unsigned long address); =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 @@ -18,6 +20,13 @@ static inline void pks_save_pt_regs(struct pt_regs *regs= ) { } static inline void pks_restore_pt_regs(struct pt_regs *regs) { } static inline void pks_dump_fault_info(struct pt_regs *regs) { } =20 +static inline bool pks_handle_key_fault(struct pt_regs *regs, + unsigned long hw_error_code, + unsigned long address) +{ + return false; +} + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 =20 diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 697c06f08103..e378573d97a7 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1162,6 +1162,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned lon= g hw_error_code, */ WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS)); =20 + if (pks_handle_key_fault(regs, hw_error_code, address)) + return; + /* * If a protection key exception occurs it could be because a PKS test * is running. If so, pks_test_callback() will clear the protection diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 6723ae42732a..531cf6c74ad7 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -11,6 +11,7 @@ #include /* boot_cpu_has, ... 
*/ #include /* vma_pkey() */ #include +#include /* X86_PF_WRITE */ =20 int __execute_only_pkey(struct mm_struct *mm) { @@ -212,6 +213,91 @@ u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessb= its) =20 __static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache); =20 +/** + * DOC: DEFINE_PKS_FAULT_CALLBACK + * + * Users may also provide a fault handler which can handle a fault differe= ntly + * than an oops. For example if 'MY_FEATURE' wanted to define a handler t= hey + * can do so by adding the corresponding entry to the pks_key_callbacks arr= ay. + * + * .. code-block:: c + * + * #ifdef CONFIG_MY_FEATURE + * bool my_feature_pks_fault_callback(struct pt_regs *regs, + * unsigned long address, bool write) + * { + * if (my_feature_fault_is_ok) + * return true; + * return false; + * } + * #endif + * + * static const pks_key_callback pks_key_callbacks[PKS_KEY_NR_CONSUMERS] = =3D { + * [PKS_KEY_DEFAULT] =3D NULL, + * #ifdef CONFIG_MY_FEATURE + * [PKS_KEY_MY_FEATURE] =3D my_feature_pks_fault_callback, + * #endif + * }; + */ +static const pks_key_callback pks_key_callbacks[PKS_KEY_NR_CONSUMERS] =3D = { 0 }; + +static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long ad= dress, + bool write, u16 key) +{ + if (key >=3D PKS_KEY_NR_CONSUMERS) + return false; + + if (pks_key_callbacks[key]) + return pks_key_callbacks[key](regs, address, write); + + return false; +} + +bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_cod= e, + unsigned long address) +{ + bool write; + pgd_t pgd; + p4d_t p4d; + pud_t pud; + pmd_t pmd; + pte_t pte; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return false; + + write =3D (hw_error_code & X86_PF_WRITE); + + pgd =3D READ_ONCE(*(init_mm.pgd + pgd_index(address))); + if (!pgd_present(pgd)) + return false; + + p4d =3D READ_ONCE(*p4d_offset(&pgd, address)); + if (p4d_large(p4d)) + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(p4d_val(p4d))); + if (!p4d_present(p4d)) + return false; + + 
pud =3D READ_ONCE(*pud_offset(&p4d, address)); + if (pud_large(pud)) + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(pud_val(pud))); + if (!pud_present(pud)) + return false; + + pmd =3D READ_ONCE(*pmd_offset(&pud, address)); + if (pmd_large(pmd)) + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(pmd_val(pmd))); + if (!pmd_present(pmd)) + return false; + + pte =3D READ_ONCE(*pte_offset_kernel(&pmd, address)); + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(pte_val(pte))); +} + /* * pks_write_pkrs() - Write the pkrs of the current CPU * @new_pkrs: New value to write to the current CPU register diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index c318d97f5da8..a53e4f2c41af 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -82,6 +82,9 @@ static inline void pks_mk_readwrite(int pkey) pks_update_protection(pkey, PKEY_READ_WRITE); } =20 +typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long addre= ss, + bool write); + #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 static inline void pks_mk_noaccess(int pkey) {} diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h index 69a0be979515..a3fcd8df8688 100644 --- a/include/linux/pks-keys.h +++ b/include/linux/pks-keys.h @@ -27,6 +27,7 @@ * { * PKS_KEY_DEFAULT =3D 0, * PKS_KEY_MY_FEATURE =3D 1, + * PKS_KEY_NR_CONSUMERS =3D 2, * } * * #define PKS_INIT_VALUE (PKR_RW_KEY(PKS_KEY_DEFAULT) | @@ -43,6 +44,7 @@ enum pks_pkey_consumers { PKS_KEY_DEFAULT =3D 0, /* Must be 0 for default PTE values */ PKS_KEY_TEST =3D 1, + PKS_KEY_NR_CONSUMERS =3D 2, }; =20 #define PKS_INIT_VALUE (PKR_RW_KEY(PKS_KEY_DEFAULT) | \ --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15EC5C433FE for ; Thu, 27 
Jan 2022 17:57:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245100AbiA0R5M (ORCPT ); Thu, 27 Jan 2022 12:57:12 -0500 Received: from mga02.intel.com ([134.134.136.20]:19450 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244952AbiA0Rzk (ORCPT ); Thu, 27 Jan 2022 12:55:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306140; x=1674842140; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nFif/WsK5jLMynqNca82ezzZaEZ8rp9PXHlxBPSpvCs=; b=gtxZp12x3dOduT82sQyWwAHAg+b+LHL9f1ze/VvwQwA2MemMmKkLfojQ zZzTKexanFnwYlMYo58ndo/ITt1BBCxABHmXRrQ7+SIcisI+oBIDvnx3V G1Q3OehmpVm8Nary88TTGJ3RVWzrR6iJO5/6ejbrGiZiI99yvtgvitVwc kZMUEmx+B7TlEH1cObSGGNOtpuD/5R2xuDMPchV6E/0ITtAurofcZ+v5n BEnqCSuIlGHFaKXddJ2Ut9numBvcwPQdKkWNp9y2PFTdhKHlfoAHfdX1w 9VQR4pcfAExAnxyHqM7N6fY/qFPJ3J6bnzXzn1vhu1FUKZONXVoLH+qWC w==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302447" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302447" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796158" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 30/44] mm/pkeys: Test setting a PKS key in a custom fault callback Date: Thu, 27 Jan 2022 09:54:51 -0800 Message-Id: <20220127175505.851391-31-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny A common use case for the custom fault callbacks will be for the callback to warn of the violation and relax the permissions rather than crash the kernel. For example, non-security use cases may prefer to relax the permissions and flag the invalid access rather than strictly crash the kernel. In this case the user defines a callback which detects this condition, reports the error, and allows for continued operation by handling the fault through pks_update_exception(). Add a test which does this. $ echo 5 > /sys/kernel/debug/x86/run_pks $ cat /sys/kernel/debug/x86/run_pks PASS Signed-off-by: Ira Weiny --- Changes for V8 New test developed just to double check for regressions while reworking the code. 
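Illustration only, not part of this patch: the "report, relax, continue" pattern described in the commit message can be modelled in userspace roughly as follows. Everything here is a hypothetical stand-in made up for the sketch (struct fake_regs, model_update_exception(), key_denies(), the KEY_* constants and the two-bit-per-key layout); only the overall shape, a per-key callback table with a bounds check and a NULL check before dispatch, is taken from the series.

```c
/*
 * Userspace model only: none of this is kernel code. All names below are
 * hypothetical stand-ins chosen for the sketch.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define KEY_DEFAULT      0
#define KEY_DEMO         1
#define KEY_NR_CONSUMERS 2

/* Two bits per key (assumed layout for this model): AD, then WD. */
#define KEY_AD 0x1u /* access disable */
#define KEY_WD 0x2u /* write disable */

struct fake_regs {
	uint32_t pkrs; /* permissions restored when the "exception" returns */
};

typedef bool (*key_callback)(struct fake_regs *regs, unsigned long address,
			     bool write);

unsigned int violations_seen;

/* Stand-in for pks_update_exception(): rewrite one key's bits in the saved value. */
void model_update_exception(struct fake_regs *regs, int key, uint32_t prot)
{
	unsigned int shift;

	assert(key >= 0 && key < KEY_NR_CONSUMERS);
	shift = (unsigned int)key * 2;
	regs->pkrs &= ~(0x3u << shift);
	regs->pkrs |= (prot & 0x3u) << shift;
}

/* Would an access with these saved permissions fault on this key? */
bool key_denies(const struct fake_regs *regs, int key, bool write)
{
	uint32_t bits = (regs->pkrs >> ((unsigned int)key * 2)) & 0x3u;

	return (bits & KEY_AD) || (write && (bits & KEY_WD));
}

/* Report the violation, relax the key to read/write, and claim the fault. */
bool demo_fault_callback(struct fake_regs *regs, unsigned long address,
			 bool write)
{
	violations_seen++;
	fprintf(stderr, "demo: invalid %s at %#lx, relaxing key %d\n",
		write ? "write" : "read", address, KEY_DEMO);
	model_update_exception(regs, KEY_DEMO, 0);
	return true;
}

static const key_callback key_callbacks[KEY_NR_CONSUMERS] = {
	[KEY_DEMO] = demo_fault_callback,
};

/* Bounds check and NULL check before dispatching to the per-key callback. */
bool model_handle_key_fault(struct fake_regs *regs, unsigned long address,
			    bool write, unsigned int key)
{
	if (key >= KEY_NR_CONSUMERS || !key_callbacks[key])
		return false;
	return key_callbacks[key](regs, address, write);
}

/* First access is denied and handled; a retry is then no longer denied. */
bool model_run_abandon_test(void)
{
	struct fake_regs regs = { .pkrs = KEY_AD << (KEY_DEMO * 2) };

	violations_seen = 0;
	if (!key_denies(&regs, KEY_DEMO, true))
		return false;
	if (!model_handle_key_fault(&regs, 0x1000UL, true, KEY_DEMO))
		return false;
	return !key_denies(&regs, KEY_DEMO, true) && violations_seen == 1;
}
```

In this model, keys without a registered callback (KEY_DEFAULT here) and out-of-range key values are reported as unhandled, which in the real fault path would mean falling through to the normal oops handling.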
--- arch/x86/include/asm/pks.h | 2 ++ arch/x86/mm/pkeys.c | 6 +++- lib/pks/pks_test.c | 74 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index 55541bb64d08..e09934c540e2 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -34,6 +34,8 @@ static inline bool pks_handle_key_fault(struct pt_regs *r= egs, =20 bool pks_test_callback(struct pt_regs *regs); #define __static_or_pks_test +bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address, + bool write); =20 #else /* !CONFIG_PKS_TEST */ =20 diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 531cf6c74ad7..f30ac8215785 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -239,7 +239,11 @@ __static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache); * #endif * }; */ -static const pks_key_callback pks_key_callbacks[PKS_KEY_NR_CONSUMERS] =3D = { 0 }; +static const pks_key_callback pks_key_callbacks[PKS_KEY_NR_CONSUMERS] =3D { +#ifdef CONFIG_PKS_TEST + [PKS_KEY_TEST] =3D pks_test_fault_callback, +#endif +}; =20 static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long ad= dress, bool write, u16 key) diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 008a1079579d..1528df0bb283 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -19,6 +19,7 @@ * * 3 Check the context armed in '2' to ensure the MSR value was preserv= ed * * 4 Test that the exception thread PKRS remains independent of the * interrupted threads PKRS + * * 5 Test setting a key to RD/WR in a fault callback to abandon a key * * 8 Loop through all CPUs, report the msr, and check against the defau= lt. * * 9 Set up and fault on a PKS protected page. 
* @@ -56,6 +57,7 @@ #define ARM_CTX_SWITCH 2 #define CHECK_CTX_SWITCH 3 #define RUN_EXCEPTION 4 +#define RUN_FAULT_ABANDON 5 #define RUN_CRASH_TEST 9 =20 DECLARE_PER_CPU(u32, pkrs_cache); @@ -519,6 +521,75 @@ static void check_ctx_switch(struct file *file) } } =20 +struct { + struct pks_test_ctx *ctx; + void *test_page; + bool armed; + bool callback_seen; +} fault_callback_ctx; + +bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address, + bool write) +{ + if (!fault_callback_ctx.armed) + return false; + + fault_callback_ctx.armed =3D false; + fault_callback_ctx.callback_seen =3D true; + + pks_update_exception(regs, fault_callback_ctx.ctx->pkey, 0); + + return true; +} + +static bool run_fault_clear_test(void) +{ + struct pks_test_ctx *ctx; + void *test_page; + bool rc =3D true; + + ctx =3D alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) + return false; + + test_page =3D alloc_test_page(ctx->pkey); + if (!test_page) { + pr_err("Failed to vmalloc page???\n"); + free_ctx(ctx); + return false; + } + + test_armed_key =3D PKS_KEY_TEST; + fault_callback_ctx.ctx =3D ctx; + fault_callback_ctx.test_page =3D test_page; + fault_callback_ctx.armed =3D true; + fault_callback_ctx.callback_seen =3D false; + + pks_mk_noaccess(test_armed_key); + + /* fault */ + memcpy(test_page, ctx->data, 8); + + if (!fault_callback_ctx.callback_seen) { + pr_err("Failed to see the callback\n"); + rc =3D false; + goto done; + } + + /* no fault */ + fault_callback_ctx.callback_seen =3D false; + memcpy(test_page, ctx->data, 8); + + if (fault_caught() || fault_callback_ctx.callback_seen) { + pr_err("The key failed to be set RD/WR in the callback\n"); + rc =3D false; + } + +done: + free_ctx(ctx); + return rc; +} + static ssize_t pks_read_file(struct file *file, char __user *user_buf, size_t count, loff_t *ppos) { @@ -572,6 +643,9 @@ static ssize_t pks_write_file(struct file *file, const = char __user *user_buf, case RUN_EXCEPTION: last_test_pass =3D run_exception_test(); break; + case 
RUN_FAULT_ABANDON: + last_test_pass =3D run_fault_clear_test(); + break; default: last_test_pass =3D false; break; --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B001EC433EF for ; Thu, 27 Jan 2022 17:57:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235148AbiA0R5b (ORCPT ); Thu, 27 Jan 2022 12:57:31 -0500 Received: from mga02.intel.com ([134.134.136.20]:19413 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244887AbiA0Rzl (ORCPT ); Thu, 27 Jan 2022 12:55:41 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306141; x=1674842141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=N0Tv6h4FB/cnby57OzSRsvLwV94sj1hOq8tCorWfjUU=; b=PvEbuGjY3Eo+yEZGX4dvh5M2DEhUMo6dFW0dERwAXKsn4MRNOkapidoT UjxL/2irzlOZG975XrowBC33Ey2HSjvjpoBDuLX3Z8YaVFF43Yub6zI9X qggMGHcpeQBIoMIyX8CyxFvFSfGXAlFlZMH8ncEupbFUgU9wQWRHKuu1I V78Z+DJF0OJBxk5tylwc07lCdCNlTUx77PHec0/Z1LEudKv/pEcTto8EC asKYIs3N97ZJ3AeSITi3OPivc02eEQyE2nm8yl7cJquk9UrWBgVWHqmM3 QYK3YZRMhmiOGw/l3t8dkxw3R0DlEESr9Z0ciyALc8b677Po4UgPcKJun g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302449" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302449" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796162" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 31/44] mm/pkeys: Add pks_available() Date: Thu, 27 Jan 2022 09:54:52 -0800 Message-Id: <20220127175505.851391-32-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The PKS API calls do not fail when they are called on a CPU which does not support the PKS feature. There will be no protection but the API is safe to call. However, adding the overhead of these calls on CPUs which don't support PKS is inefficient. Define pks_available() to allow users to check if PKS is enabled on the current system. If not, they can choose to optimize around the PKS calls. Signed-off-by: Ira Weiny --- Changes for V8 s/pks_enabled/pks_available --- Documentation/core-api/protection-keys.rst | 3 +++ arch/x86/mm/pkeys.c | 10 ++++++++++ include/linux/pkeys.h | 6 ++++++ 3 files changed, 19 insertions(+) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/cor= e-api/protection-keys.rst index 267efa2112e7..27c9701d4aeb 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -151,6 +151,9 @@ Changing permissions of individual keys .. kernel-doc:: arch/x86/mm/pkeys.c :identifiers: pks_update_exception =20 +.. 
kernel-doc:: arch/x86/mm/pkeys.c + :identifiers: pks_available + Overriding Default Fault Behavior --------------------------------- =20 diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index f30ac8215785..fa71037c1dd0 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -418,6 +418,16 @@ static void __pks_update_protection(int pkey, u32 prot= ection) pks_write_pkrs(current->thread.pks_saved_pkrs); } =20 +/** + * pks_available() - Is PKS available on this system + * + * Return if PKS is currently supported and enabled on this system. + */ +bool pks_available(void) +{ + return cpu_feature_enabled(X86_FEATURE_PKS); +} + /* * Do not call this directly, see pks_mk*(). * diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index a53e4f2c41af..ec5463c373a1 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -55,6 +55,7 @@ static inline bool arch_pkeys_enabled(void) =20 #include =20 +bool pks_available(void); void pks_update_protection(int pkey, u32 protection); void pks_update_exception(struct pt_regs *regs, int pkey, u32 protection); =20 @@ -87,6 +88,11 @@ typedef bool (*pks_key_callback)(struct pt_regs *regs, u= nsigned long address, =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 +static inline bool pks_available(void) +{ + return false; +} + static inline void pks_mk_noaccess(int pkey) {} static inline void pks_mk_readwrite(int pkey) {} static inline void pks_update_exception(struct pt_regs *regs, --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10A37C433FE for ; Thu, 27 Jan 2022 17:57:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245037AbiA0R5d (ORCPT ); Thu, 27 Jan 2022 12:57:33 -0500 Received: from mga02.intel.com ([134.134.136.20]:19415 "EHLO 
mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244803AbiA0Rzl (ORCPT ); Thu, 27 Jan 2022 12:55:41 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306141; x=1674842141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=m/ObHfnUjtOOeBMl3JoeXOlyP2CRjZaVHbLDfc2RsE4=; b=EcIYjo5Nz7rrtYAAQgR10uqPrfK+IHGLhZf9VWa9O/Wnwmsjn7ampO9H F/S+x48ielLNX6rEWukLWT4JtaNoxTWnOZKvK9+ZxpSSrROtJQuAJPun3 /je2o2chzq2R0TGjj1WLQlOIwJeZXyjLIT2SuopKrwPCQaSw8xVaUI0IA qB+kT8yp/iU+4OwbIMuZi7wd6Tvtu2e4x99xlKHN51kYlohH1exHsfLTA ledXDdCm1IcQlnTp20/AeHFIt+m8MG6SVLamGUKusWCgeiDtBT2u6VvVi qM94+Fh5tdZpxjMAVWe5LMIQZJbSA+rF39K25UI1y1x4vjzvNGjgGxWgw A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302450" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302450" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796165" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 32/44] memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION Date: Thu, 27 Jan 2022 09:54:53 -0800 Message-Id: <20220127175505.851391-33-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The persistent memory (PMEM) driver uses the memremap_pages facility to provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM capacity may be orders of magnitude higher than System RAM capacity, it presents a large vulnerability surface to stray writes. Unlike stray writes to System RAM, which may result in a crash or other undesirable behavior, stray writes to PMEM are additionally more likely to result in permanent data loss. Reboot is not a remediation for PMEM corruption like it is for System RAM. Given that PMEM access from the kernel is limited to a constrained set of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX page), it is amenable to supervisor pkey protection. Not all systems with PMEM will want additional protections. Therefore, add a Kconfig option for the user to configure the additional devmap protections. Only systems with supervisor protection keys (PKS) are able to support this new protection so depend on ARCH_HAS_SUPERVISOR_PKEYS. Furthermore, select ARCH_ENABLE_SUPERVISOR_PKEYS to ensure that the architecture support is enabled if PMEM is the only use case. Only PMEM which is advertised to the memory subsystem needs this protection. Therefore, the feature depends on NVDIMM_PFN. 
A default of (NVDIMM_PFN && ARCH_HAS_SUPERVISOR_PKEYS) was suggested, but that is logically the same as default 'yes' because both NVDIMM_PFN and ARCH_HAS_SUPERVISOR_PKEYS are already required. Therefore, a default of 'yes' is used. Signed-off-by: Ira Weiny --- Changes for V8 Split this out from [PATCH V7 13/18] memremap_pages: Add access protection via supervisor Pro= tection Keys (PKS) --- mm/Kconfig | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/mm/Kconfig b/mm/Kconfig index 46f2bb15aa4e..67e0264acf7d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -776,6 +776,24 @@ config ZONE_DEVICE =20 If FS_DAX is enabled, then say Y. =20 +config DEVMAP_ACCESS_PROTECTION + bool "Access protection for memremap_pages()" + depends on NVDIMM_PFN + depends on ARCH_HAS_SUPERVISOR_PKEYS + select ARCH_ENABLE_SUPERVISOR_PKEYS + default y + + help + Enable extra protections on device memory. This protects against + unintended access to devices such as stray writes. This feature is + particularly useful to protect against corruption of persistent + memory. + + This depends on architecture support of supervisor PKeys and has no + overhead if the architecture does not support them. + + If you have persistent memory say 'Y'. 
+ config DEV_PAGEMAP_OPS bool =20 --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13F78C433F5 for ; Thu, 27 Jan 2022 17:58:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239139AbiA0R6M (ORCPT ); Thu, 27 Jan 2022 12:58:12 -0500 Received: from mga02.intel.com ([134.134.136.20]:19418 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244940AbiA0R4A (ORCPT ); Thu, 27 Jan 2022 12:56:00 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306160; x=1674842160; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=h0ELtLaLvU8d5+IR+z+6tOFYz1WyWnwDtWsLq4wETwc=; b=UCaenWA+v6F3cRkgIaOLC83zozXc9GYTcNqeiy3IqeKTX2PfqbWmqtXY VghJyNmu4xpcN5wmB+6N0N18/kePRIwXFumDIF+zVFs7ILT7hUb2TlwHL Sc/K5iLmFwMOz85kL0gqaWi3vuUPKT0MwxjwW7JVovPdsU4y8mHrb33PP MzMqPtFnyESDlmvdu7choSs7lsfOqiTI0aWuFZFtxEF8601fAqPJcz17c 0BF/xnElGfDp0VLSrOGsQPf/vjJrmobe++auVYxnfpCQ+vL7wFYx5tu/C alDMuNunr79HXNkn2BxTL9V36LpRi6MODtoJ6HOkG52ys4xTlDVtgVqfb Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="234302451" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="234302451" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796169" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 33/44] memremap_pages: Introduce pgmap_protection_available() Date: Thu, 27 Jan 2022 09:54:54 -0800 Message-Id: <20220127175505.851391-34-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Users will need to specify that they want their dev_pagemap pages protected by specifying a flag in (struct dev_pagemap)->flags. However, it is more efficient to know if that protection is available prior to requesting it and failing the mapping. Define pgmap_protection_available() for users to check if protection is available to be used. The name of pgmap_protection_available() was specifically chosen to isolate the implementation of the protection from higher level users. However, the current implementation simply calls pks_available() to determine if it can support protection. An alternative was considered: have users specify the flag and then check whether the returned dev_pagemap object was protected. This was judged less efficient than a direct check beforehand. Signed-off-by: Ira Weiny --- Changes for V8 Split this out to its own patch. 
s/pgmap_protection_enabled/pgmap_protection_available --- include/linux/mm.h | 13 +++++++++++++ mm/memremap.c | 11 +++++++++++ 2 files changed, 24 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index e1a84b1e6787..2ae99bee6e82 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1143,6 +1143,19 @@ static inline bool is_pci_p2pdma_page(const struct p= age *page) page->pgmap->type =3D=3D MEMORY_DEVICE_PCI_P2PDMA; } =20 +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + +bool pgmap_protection_available(void); + +#else + +static inline bool pgmap_protection_available(void) +{ + return false; +} + +#endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */ + /* 127: arbitrary random number, small enough to assemble well */ #define folio_ref_zero_or_close_to_overflow(folio) \ ((unsigned int) folio_ref_count(folio) + 127u <=3D 127u) diff --git a/mm/memremap.c b/mm/memremap.c index 6aa5f0c2d11f..c13b3b8a0048 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -63,6 +64,16 @@ static void devmap_managed_enable_put(struct dev_pagemap= *pgmap) } #endif /* CONFIG_DEV_PAGEMAP_OPS */ =20 +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + +bool pgmap_protection_available(void) +{ + return pks_available(); +} +EXPORT_SYMBOL_GPL(pgmap_protection_available); + +#endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */ + static void pgmap_array_delete(struct range *range) { xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end), --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8E75C433EF for ; Thu, 27 Jan 2022 17:56:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244801AbiA0R4l (ORCPT ); Thu, 27 Jan 2022 12:56:41 -0500 Received: from 
mga12.intel.com ([192.55.52.136]:65467 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244790AbiA0Rzc (ORCPT ); Thu, 27 Jan 2022 12:55:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306132; x=1674842132; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=puOiWo8xjIdHJixtihA0/Z9lYqbmqzZHdIoTKArXNfE=; b=UT/GE7drUvW6Aw8/Gh2LYeYSxv07oAUE6I6jktmoNlCWRbDo698MoagF 5wJ75W71XxoMGIZJx2Y2JUJD0KK74eBb6wooUG/ncjmZEDXgr7io4cOEd 9QHA01wB5OVuO6FnxcKMAdIgjdT8WFpP8soGj6LZ/qUcukziIpCa6PLCy vkKkOPk5Dh2JfiH/wyy4ll5NvK2CZB7ucWVkCiG3G8zE7/utHZU1vMAGz ZFT9ZgQFIbwjiLaVUcepCqOVJrr1FNy2Mp7mEhi5Q2J25z8ZZT251bcFN 6klyHR76l1q6CYvDMSUwyOlKUsa9hGB9IJole4mgHo7xf3mPABUxsGY2f Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899131" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899131" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796173" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:12 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 34/44] memremap_pages: Introduce a PGMAP_PROTECTION flag Date: Thu, 27 Jan 2022 09:54:55 -0800 Message-Id: <20220127175505.851391-35-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The persistent memory (PMEM) driver uses the memremap_pages facility to provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM capacity may be orders of magnitude higher than System RAM capacity, it presents a large vulnerability surface to stray writes. Unlike stray writes to System RAM, which may result in a crash or other undesirable behavior, stray writes to PMEM are additionally more likely to result in permanent data loss. Reboot is not a remediation for PMEM corruption like it is for System RAM. Given that PMEM access from the kernel is limited to a constrained set of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX page), it is amenable to supervisor pkey protection. Some systems which have enabled DEVMAP_ACCESS_PROTECTION may not have PMEM installed. Or the PMEM may not be mapped into the direct map. Also, users of memremap_pages() other than PMEM will not want their pages protected. Define a new PGMAP flag, PGMAP_PROTECTION. This can be passed in (struct dev_pagemap)->flags when calling memremap_pages() to request that the pages be protected. Then use the flag to enable a static key. The static key is used to optimize the protection away if no callers are currently using protections. Specifying this flag on a system which can't support protections will fail. 
Users are expected to check whether protections are supported via pgmap_protection_available() prior to asking for them. Signed-off-by: Ira Weiny --- Changes for V8 Split this out into its own patch --- include/linux/memremap.h | 1 + mm/memremap.c | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 1fafcc38acba..84402f73712c 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -80,6 +80,7 @@ struct dev_pagemap_ops { }; #define PGMAP_ALTMAP_VALID (1 << 0) +#define PGMAP_PROTECTION (1 << 1) /** * struct dev_pagemap - metadata for ZONE_DEVICE mappings diff --git a/mm/memremap.c b/mm/memremap.c index c13b3b8a0048..a74d985a1908 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -66,12 +66,39 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap) #ifdef CONFIG_DEVMAP_ACCESS_PROTECTION +/* + * Note: all devices which have asked for protections share the same key. The + * key may, or may not, have been provided by the core. If not, protection + * will be disabled. The key acquisition is attempted when the first ZONE + * DEVICE requests it and freed when all zones have been unmapped. + * + * Also this must be EXPORT_SYMBOL rather than EXPORT_SYMBOL_GPL because it is + * intended to be used in the kmap API. 
+ */ +DEFINE_STATIC_KEY_FALSE(dev_pgmap_protection_static_key); +EXPORT_SYMBOL(dev_pgmap_protection_static_key); + +static void devmap_protection_enable(void) +{ + static_branch_inc(&dev_pgmap_protection_static_key); +} + +static void devmap_protection_disable(void) +{ + static_branch_dec(&dev_pgmap_protection_static_key); +} + bool pgmap_protection_available(void) { return pks_available(); } EXPORT_SYMBOL_GPL(pgmap_protection_available); =20 +#else /* !CONFIG_DEVMAP_ACCESS_PROTECTION */ + +static void devmap_protection_enable(void) { } +static void devmap_protection_disable(void) { } + #endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */ =20 static void pgmap_array_delete(struct range *range) @@ -173,6 +200,9 @@ void memunmap_pages(struct dev_pagemap *pgmap) =20 WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n"); devmap_managed_enable_put(pgmap); + + if (pgmap->flags & PGMAP_PROTECTION) + devmap_protection_disable(); } EXPORT_SYMBOL_GPL(memunmap_pages); =20 @@ -319,6 +349,12 @@ void *memremap_pages(struct dev_pagemap *pgmap, int ni= d) if (WARN_ONCE(!nr_range, "nr_range must be specified\n")) return ERR_PTR(-EINVAL); =20 + if (pgmap->flags & PGMAP_PROTECTION) { + if (!pgmap_protection_available()) + return ERR_PTR(-EINVAL); + devmap_protection_enable(); + } + switch (pgmap->type) { case MEMORY_DEVICE_PRIVATE: if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) { --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 99DE6C433FE for ; Thu, 27 Jan 2022 17:56:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245187AbiA0R42 (ORCPT ); Thu, 27 Jan 2022 12:56:28 -0500 Received: from mga12.intel.com ([192.55.52.136]:65473 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S244871AbiA0Rz2 (ORCPT ); Thu, 27 Jan 2022 12:55:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306128; x=1674842128; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/5kL3YPno2knWFgAR17WabOcz7J2U7N6UL4fD+Thy3U=; b=JRXOIhlIl4KHc3SCLLhQ5lcYtAh3zmBGXWR8LZuFEVQNpcCm5u3ZUaQi nagUaIbl23Utpv2aPSaWi3Sq6F35jT/tU86yVELADh3gBJZnG8jkdScfZ 3XY3f/gBKqu+QIzduB6c5umb4NDiNUoxWdLqhiuPYvkFzZr0kFNV+elFL 0dSzFpCDKAcSR+u5VROyHQABDkahtIqVej0JlRlrIOGNTlEIfcoS2m6dA EntaeO4HBuE03psqRpPzkfXQCHKn3O+k4GD/nRqQMkLYVrA+Da+NcCVcM JuItDueeZuP9GxyaPdU1+HEND4qUXK6migGpAsrLTpG5Q78Ig9NyQ/1XP g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899133" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899133" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796177" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 35/44] memremap_pages: Introduce devmap_protected() Date: Thu, 27 Jan 2022 09:54:56 -0800 Message-Id: <20220127175505.851391-36-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Users of protected dev_pagemaps can check the PGMAP_PROTECTION flag to see if the devmap is protected. 
However, most callers operate on struct pages, not on the pagemap directly. Define devmap_protected() to determine whether a page is part of a dev_pagemap mapping and, if so, whether the page is protected by the additional protections. Signed-off-by: Ira Weiny --- include/linux/mm.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2ae99bee6e82..6e4a2758e3d3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1145,6 +1145,23 @@ static inline bool is_pci_p2pdma_page(const struct page *page) #ifdef CONFIG_DEVMAP_ACCESS_PROTECTION +DECLARE_STATIC_KEY_FALSE(dev_pgmap_protection_static_key); + +/* + * devmap_protected() requires a reference on the page to ensure there are no + * races with dev_pagemap tear down. + */ +static inline bool devmap_protected(struct page *page) +{ + if (!static_branch_unlikely(&dev_pgmap_protection_static_key)) + return false; + if (!is_zone_device_page(page)) + return false; + if (page->pgmap->flags & PGMAP_PROTECTION) + return true; + return false; +} + bool pgmap_protection_available(void); #else -- 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50D69C433FE for ; Thu, 27 Jan 2022 17:56:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245242AbiA0R4f (ORCPT ); Thu, 27 Jan 2022 12:56:35 -0500 Received: from mga12.intel.com ([192.55.52.136]:65462 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244797AbiA0Rzc (ORCPT ); Thu, 27 Jan 2022 12:55:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306132; x=1674842132; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=hMk3LMURScwSXrvqat2n+0LCReDi2A4lEtzX2po89nQ=; b=ILI/p5427iT3toT7E/iTCKDvA3tZVWowkKROGmlfIGpBJPKoN8rJCC+t mVp6i1b/7Ci1OJdDGdAAFbc/zjvX8BQB4v44EL+4GF2QlHMGiYCRSBmOk MgVL1zWfgsG5APGDvW3Cg/L7lFKwtv8b1/DiarrmE4AoqvkZ3HIeI3bZB Zt0/03mhlCnQL0QDrpE7KZ3j3CZwGKSK1wS6qcMf0f2H5Hw5Cfrkpkiyr g743vCTISrYeq+3zPuR3PCSHy2COzRORhXPXpgpPK/aRHf/nBO3RF0twM IL8J7KGXNw4W5yHKcv6XlCg6Q2WvYeJMRtj4wgngYw6aBzstGOI4+PNpa g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899135" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899135" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796181" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 36/44] memremap_pages: Reserve a PKS PKey for eventual use by PMEM Date: Thu, 27 Jan 2022 09:54:57 -0800 Message-Id: <20220127175505.851391-37-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The persistent memory (PMEM) driver uses the memremap_pages facility to provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM capacity maybe orders of magnitude higher capacity than System RAM it presents a large vulnerability surface to stray writes. 
Unlike stray writes to System RAM, which may result in a crash or other undesirable behavior, stray writes to PMEM are additionally more likely to result in permanent data loss. Reboot is not a remediation for PMEM corruption like it is for System RAM. Given that PMEM access from the kernel is limited to a constrained set of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX page), it is amenable to supervisor pkey protection. PMEM uses the memmap facility to map its pages into the direct map. Reserve a PKey for use by the memmap facility. Signed-off-by: Ira Weiny --- include/linux/pks-keys.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h index a3fcd8df8688..46bb9a18da5a 100644 --- a/include/linux/pks-keys.h +++ b/include/linux/pks-keys.h @@ -42,14 +42,16 @@ * */ enum pks_pkey_consumers { - PKS_KEY_DEFAULT = 0, /* Must be 0 for default PTE values */ - PKS_KEY_TEST = 1, - PKS_KEY_NR_CONSUMERS = 2, + PKS_KEY_DEFAULT = 0, /* Must be 0 for default PTE values */ + PKS_KEY_TEST = 1, + PKS_KEY_PGMAP_PROTECTION = 2, + PKS_KEY_NR_CONSUMERS = 3, }; #define PKS_INIT_VALUE (PKR_RW_KEY(PKS_KEY_DEFAULT) | \ PKR_AD_KEY(PKS_KEY_TEST) | \ - PKR_AD_KEY(2) | PKR_AD_KEY(3) | \ + PKR_AD_KEY(PKS_KEY_PGMAP_PROTECTION) | \ + PKR_AD_KEY(3) | \ PKR_AD_KEY(4) | PKR_AD_KEY(5) | \ PKR_AD_KEY(6) | PKR_AD_KEY(7) | \ PKR_AD_KEY(8) | PKR_AD_KEY(9) | \ -- 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75A7EC433FE for ; Thu, 27 Jan 2022 17:56:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245212AbiA0R4b (ORCPT ); Thu, 27 Jan 2022 12:56:31 -0500 Received: from mga12.intel.com ([192.55.52.136]:65458 "EHLO mga12.intel.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244799AbiA0Rzb (ORCPT ); Thu, 27 Jan 2022 12:55:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306131; x=1674842131; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+kpcQgxJ7fszxV7l5EYVD/+VKUecWGueqOZwFpwV6EQ=; b=ENM+2H1v+0DQtzBLi6mY3dwgFraZIgyi2HLHlaZ6JVbffvBH+Eg1Qv0D oUbdb86CFew6Pe63rYrxBAmjljsD8vdq4ZR2AEPCeAQLayikWcA0SUhNv xKjfuYcusVKkZ1MZ0hs/8J0idDXq9A+6xb/3p0DXSQNDSzmnPMtUZ9/nS VDitRuJIPWWtER+AfB1lYKm0b/8HPpBhmLk6iVuQruUM4ZyGr7MkJm6eY BsaRwgiKn02I0bWYk2IA6KYnQ09FhXQIU6Kgs+/aXJOXt9QVTPTspscVM 3Q5RfTWrVmqSOa5LB6V7fnrAnU1HOwbhwJBKkq73qLyBc/KkyvIRsTT1b A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899137" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899137" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796185" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 37/44] memremap_pages: Set PKS PKey in PTEs if PGMAP_PROTECTIONS is requested Date: Thu, 27 Jan 2022 09:54:58 -0800 Message-Id: <20220127175505.851391-38-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny When the user requests protections, the dev_pagemap mappings need to have a PKey set. Define devmap_protection_adjust_pgprot() to add the PKey to the page protections. Call it when remapping pages if PGMAP_PROTECTION is requested. Signed-off-by: Ira Weiny --- mm/memremap.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/mm/memremap.c b/mm/memremap.c index a74d985a1908..d3e6f328a711 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -83,6 +83,14 @@ static void devmap_protection_enable(void) static_branch_inc(&dev_pgmap_protection_static_key); } +static pgprot_t devmap_protection_adjust_pgprot(pgprot_t prot) +{ + pgprotval_t val; + + val = pgprot_val(prot); + return __pgprot(val | _PAGE_PKEY(PKS_KEY_PGMAP_PROTECTION)); +} + static void devmap_protection_disable(void) { static_branch_dec(&dev_pgmap_protection_static_key); @@ -99,6 +107,10 @@ EXPORT_SYMBOL_GPL(pgmap_protection_available); static void devmap_protection_enable(void) { } static void devmap_protection_disable(void) { } +static pgprot_t devmap_protection_adjust_pgprot(pgprot_t prot) +{ + return prot; +} #endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */ static void pgmap_array_delete(struct range *range) @@ -353,6 +365,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) if (!pgmap_protection_available()) return ERR_PTR(-EINVAL); 
devmap_protection_enable(); + params.pgprot =3D devmap_protection_adjust_pgprot(params.pgprot); } =20 switch (pgmap->type) { --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D346BC433F5 for ; Thu, 27 Jan 2022 17:56:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245335AbiA0R4u (ORCPT ); Thu, 27 Jan 2022 12:56:50 -0500 Received: from mga12.intel.com ([192.55.52.136]:65504 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244769AbiA0Rzd (ORCPT ); Thu, 27 Jan 2022 12:55:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306133; x=1674842133; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OakK5azFAZtRgEn1SDscsdM2loliiFcrxNWheUTy0fE=; b=f7sM8AlOPADn4RQw2g7vqXZowDO2IoWn3erMqYmwEtNTOlXEBS4xYm2X m8vHjGfY0A8eefwVAzKhpHqh7nApzXmL0W7D5Hpr+xWBakrwqt69VMhzI 7FxGvIqDtsOMTqmpxE2j1MQiardWlmGsXQv1o9+3H5oC7GTRYoZgafb8r /txNCd5MDmn/NQGtaKGSXRDTaBGGs8AnRPiDXAUXSWX2GqPoEMVY6tcWq pW8dm/6TlkP9qMcBwr0vou9TbSy+Wdn8hUoeo0vdu6s2T6n3lanFdujoC 9qwlU8pC4iXAtx/dHspz66hQjrkQiqwaWYV28oBcLhqBNtCcmtPMTWc24 A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899138" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899138" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796189" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 From: ira.weiny@intel.com To: Dave Hansen 
, "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 38/44] memremap_pages: Define pgmap_mk_{readwrite|noaccess}() calls Date: Thu, 27 Jan 2022 09:54:59 -0800 Message-Id: <20220127175505.851391-39-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Users will need a way to flag valid access to pages which have been protected with PGMAP protections. Provide this by defining pgmap_mk_*() accessor functions. pgmap_mk_{readwrite|noaccess}() take a struct page for convenience. They determine if the page is protected by dev_pagemap protections. If so, they perform the requested operation. In addition, the lower level __pgmap_* functions are exported. They take the dev_pagemap object directly for internal users who have knowledge of the dev_pagemap. All changes in the protections must be made through the above calls. They abstract the protection implementation (currently the PKS API) from the upper layer users. Furthermore, the calls are nestable by the use of a per-task reference count. This ensures that the first call to re-enable protection does not 'break' the last access of the device memory. Access to device memory during exceptions (#PF) is expected only from user faults. Therefore there is no need to maintain the reference count when entering or exiting exceptions. However, reference counting will occur during the exception. Recall that protection is automatically enabled during exceptions by the PKS core.[1] NOTE: It is not anticipated that any code paths will directly nest these calls. 
For this reason, multiple reviewers, including Dan and Thomas, asked why this reference counting was needed at this level rather than in a higher level call such as kmap_{atomic,local_page}(). The reason is that pgmap_mk_readwrite() could nest with regard to other callers of pgmap_mk_*() such as kmap_{atomic,local_page}(). Therefore push this reference counting to the lower level and just ensure that these calls are nestable. [1] https://lore.kernel.org/lkml/20210401225833.566238-9-ira.weiny@intel.com/ Signed-off-by: Ira Weiny --- Changes for V8 Split these functions into their own patch. This helps to clarify the commit message and usage. --- include/linux/mm.h | 34 ++++++++++++++++++++++++++++++++++ include/linux/sched.h | 7 +++++++ init/init_task.c | 3 +++ mm/memremap.c | 14 ++++++++++++++ 4 files changed, 58 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 6e4a2758e3d3..60044de77c54 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1162,10 +1162,44 @@ static inline bool devmap_protected(struct page *page) return false; } +void __pgmap_mk_readwrite(struct dev_pagemap *pgmap); +void __pgmap_mk_noaccess(struct dev_pagemap *pgmap); + +static inline bool pgmap_check_pgmap_prot(struct page *page) +{ + if (!devmap_protected(page)) + return false; + + /* + * There is no known use case to change permissions in an irq for pgmap + * pages + */ + lockdep_assert_in_irq(); + return true; +} + +static inline void pgmap_mk_readwrite(struct page *page) +{ + if (!pgmap_check_pgmap_prot(page)) + return; + __pgmap_mk_readwrite(page->pgmap); +} +static inline void pgmap_mk_noaccess(struct page *page) +{ + if (!pgmap_check_pgmap_prot(page)) + return; + __pgmap_mk_noaccess(page->pgmap); +} + bool pgmap_protection_available(void); #else +static inline void __pgmap_mk_readwrite(struct dev_pagemap *pgmap) { } +static inline void __pgmap_mk_noaccess(struct dev_pagemap *pgmap) { } +static inline void pgmap_mk_readwrite(struct page *page) 
{ } +static inline void pgmap_mk_noaccess(struct page *page) { } + static inline bool pgmap_protection_available(void) { return false; diff --git a/include/linux/sched.h b/include/linux/sched.h index f5b2be39a78c..5020ed7e67b7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1492,6 +1492,13 @@ struct task_struct { struct callback_head l1d_flush_kill; #endif =20 +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + /* + * NOTE: pgmap_prot_count is modified within a single thread of + * execution. So it does not need to be atomic_t. + */ + u32 pgmap_prot_count; +#endif /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. diff --git a/init/init_task.c b/init/init_task.c index 73cc8f03511a..948b32cf8139 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -209,6 +209,9 @@ struct task_struct init_task #ifdef CONFIG_SECCOMP_FILTER .seccomp =3D { .filter_count =3D ATOMIC_INIT(0) }, #endif +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + .pgmap_prot_count =3D 0, +#endif }; EXPORT_SYMBOL(init_task); =20 diff --git a/mm/memremap.c b/mm/memremap.c index d3e6f328a711..b75c4f778c59 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -96,6 +96,20 @@ static void devmap_protection_disable(void) static_branch_dec(&dev_pgmap_protection_static_key); } =20 +void __pgmap_mk_readwrite(struct dev_pagemap *pgmap) +{ + if (!current->pgmap_prot_count++) + pks_mk_readwrite(PKS_KEY_PGMAP_PROTECTION); +} +EXPORT_SYMBOL_GPL(__pgmap_mk_readwrite); + +void __pgmap_mk_noaccess(struct dev_pagemap *pgmap) +{ + if (!--current->pgmap_prot_count) + pks_mk_noaccess(PKS_KEY_PGMAP_PROTECTION); +} +EXPORT_SYMBOL_GPL(__pgmap_mk_noaccess); + bool pgmap_protection_available(void) { return pks_available(); --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) 
by smtp.lore.kernel.org (Postfix) with ESMTP id DD55CC433FE for ; Thu, 27 Jan 2022 17:57:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245393AbiA0R5E (ORCPT ); Thu, 27 Jan 2022 12:57:04 -0500 Received: from mga12.intel.com ([192.55.52.136]:65473 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244930AbiA0Rzi (ORCPT ); Thu, 27 Jan 2022 12:55:38 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306138; x=1674842138; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KVtLp/kvaCle0VCg5h7SYtbJ5kNcsLDbkqOlQWCypfM=; b=j4BIUMhw9GalxFhhEoMeDUxptUg94IybbZA+ISTjkGEqLMzivjbZeI0C /hbW8zNI16xJyTXz2B+kshMlmDBvglP3Z4NOvBOFvQc+DLZiuJ3PhEpFF NDgDrhNzuRC8ZcsoAW+J8z9yRfZ0A+fOKUOF5N3JsQgA/7dUZGL0RTGRl 2L08eNvzpYB0b24N5bIdio5vgnSL9QXVqQYZM2qxc/RCkZB5+sw45i76N welHaaZdQiG+0t+/mJuZ0MMvMGFT3Km4gq1qfVS1EdnoJhgXWDJ73VXhL Oshy/bxy6di0+HmQt3pOHKFHkN+4BfuGTopRJSkyqCC6IwaGjAVLcY6GH A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899140" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899140" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796194" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 39/44] memremap_pages: Add memremap.pks_fault_mode Date: Thu, 27 Jan 2022 09:55:00 -0800 Message-Id: <20220127175505.851391-40-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Some systems may be using pmem in unanticipated ways. As such, it is possible for an unforeseen code path to violate the restrictions of the PMEM PKS protections. In order to provide a more seamless integration of the PMEM PKS feature, provide a pks_fault_mode that allows for a relaxed mode should a previously working feature fault on the PKS protected PMEM. Two modes are available: 'relaxed' (default) -- WARN_ONCE, remove the protections, and continue to operate. 'strict' -- BUG_ON or fault indicating the error. This is the most protective of the PMEM but may be undesirable in some configurations. NOTE: The typedef of pks_fault_modes is required to allow param_check_pks_fault_modes() to work automatically for us. So the typedef checkpatch warning is ignored. NOTE: There was some debate about whether a third mode called 'silent' should be available. 'silent' would be the same as 'relaxed' but not print any output. While 'silent' is nice for admins to reduce console/log output, it would result in less motivation to fix invalid access to the protected pmem pages. Therefore, 'silent' is left out. Signed-off-by: Ira Weiny --- Changes for V8 Use pks_update_exception() instead of abandoning the pkey. Split out pgmap_protection_flag_invalid() into a separate patch for clarity. 
From Rick Edgecombe Fix sysfs_streq() checks From Randy Dunlap Fix Documentation closing parans Changes for V7 Leverage Rick Edgecombe's fault callback infrastructure to relax invalid uses and prevent crashes From Dan Williams Use sysfs_* calls for parameter Make pgmap_disable_protection inline Remove pfn from warn output Remove silent parameter option --- .../admin-guide/kernel-parameters.txt | 14 ++++ arch/x86/mm/pkeys.c | 4 ++ include/linux/mm.h | 3 + mm/memremap.c | 67 +++++++++++++++++++ 4 files changed, 88 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentatio= n/admin-guide/kernel-parameters.txt index f5a27f067db9..3e70a6194831 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4158,6 +4158,20 @@ pirq=3D [SMP,APIC] Manual mp-table setup See Documentation/x86/i386/IO-APIC.rst. =20 + memremap.pks_fault_mode=3D [X86] Control the behavior of page map + protection violations. Violations may not be an actual + use of the memory but simply an attempt to map it in an + incompatible way. + (depends on CONFIG_DEVMAP_ACCESS_PROTECTION) + + Format: { relaxed | strict } + + relaxed - Print a warning, disable the protection and + continue execution. + strict - Stop kernel execution via BUG_ON or fault + + default: relaxed + plip=3D [PPT,NET] Parallel port network link Format: { parport | timid | 0 } See also Documentation/admin-guide/parport.rst. diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index fa71037c1dd0..e864a9b7828a 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -6,6 +6,7 @@ #include /* debugfs_create_u32() */ #include /* mm_struct, vma, etc... */ #include /* PKEY_* */ +#include /* fault callback */ #include =20 #include /* boot_cpu_has, ... 
*/ @@ -243,6 +244,9 @@ static const pks_key_callback pks_key_callbacks[PKS_KEY= _NR_CONSUMERS] =3D { #ifdef CONFIG_PKS_TEST [PKS_KEY_TEST] =3D pks_test_fault_callback, #endif +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + [PKS_KEY_PGMAP_PROTECTION] =3D pgmap_pks_fault_callback, +#endif }; =20 static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long ad= dress, diff --git a/include/linux/mm.h b/include/linux/mm.h index 60044de77c54..e900df563437 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1193,6 +1193,9 @@ static inline void pgmap_mk_noaccess(struct page *pag= e) =20 bool pgmap_protection_available(void); =20 +bool pgmap_pks_fault_callback(struct pt_regs *regs, unsigned long address, + bool write); + #else =20 static inline void __pgmap_mk_readwrite(struct dev_pagemap *pgmap) { } diff --git a/mm/memremap.c b/mm/memremap.c index b75c4f778c59..783b1cd4bb42 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -96,6 +96,73 @@ static void devmap_protection_disable(void) static_branch_dec(&dev_pgmap_protection_static_key); } =20 +/* + * Ignore the checkpatch warning because the typedef allows + * param_check_pks_fault_modes to automatically check the passed value. 
+ */ +typedef enum { + PKS_MODE_STRICT =3D 0, + PKS_MODE_RELAXED =3D 1, +} pks_fault_modes; + +pks_fault_modes pks_fault_mode =3D PKS_MODE_RELAXED; + +static int param_set_pks_fault_mode(const char *val, const struct kernel_p= aram *kp) +{ + int ret =3D -EINVAL; + + if (sysfs_streq(val, "relaxed")) { + pks_fault_mode =3D PKS_MODE_RELAXED; + ret =3D 0; + } else if (sysfs_streq(val, "strict")) { + pks_fault_mode =3D PKS_MODE_STRICT; + ret =3D 0; + } + + return ret; +} + +static int param_get_pks_fault_mode(char *buffer, const struct kernel_para= m *kp) +{ + int ret =3D 0; + + switch (pks_fault_mode) { + case PKS_MODE_STRICT: + ret =3D sysfs_emit(buffer, "strict\n"); + break; + case PKS_MODE_RELAXED: + ret =3D sysfs_emit(buffer, "relaxed\n"); + break; + default: + ret =3D sysfs_emit(buffer, "\n"); + break; + } + + return ret; +} + +static const struct kernel_param_ops param_ops_pks_fault_modes =3D { + .set =3D param_set_pks_fault_mode, + .get =3D param_get_pks_fault_mode, +}; + +#define param_check_pks_fault_modes(name, p) \ + __param_check(name, p, pks_fault_modes) +module_param(pks_fault_mode, pks_fault_modes, 0644); + +bool pgmap_pks_fault_callback(struct pt_regs *regs, unsigned long address, + bool write) +{ + /* In strict mode just let the fault handler oops */ + if (pks_fault_mode =3D=3D PKS_MODE_STRICT) + return false; + + WARN_ONCE(1, "Page map protection being disabled"); + pks_update_exception(regs, PKS_KEY_PGMAP_PROTECTION, 0); + return true; +} +EXPORT_SYMBOL_GPL(pgmap_pks_fault_callback); + void __pgmap_mk_readwrite(struct dev_pagemap *pgmap) { if (!current->pgmap_prot_count++) --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBFABC433EF for ; Thu, 27 Jan 2022 17:57:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S244973AbiA0R5I (ORCPT ); Thu, 27 Jan 2022 12:57:08 -0500 Received: from mga12.intel.com ([192.55.52.136]:65458 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244949AbiA0Rzk (ORCPT ); Thu, 27 Jan 2022 12:55:40 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306140; x=1674842140; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hafndCqd1tn+/XA563qswaQWce691LD6SXk8i5RzkWQ=; b=YOLWhpUVyja22Gr5Jyns1uDWa8dLdxKrrcSALJKUFwniQMBW7a4Bb9Oj gguGZpQAGE64Kv3QkctAqxtkKse0YmcFpUbNUwI4Mm4KgTRmk+TMxzD1t xdhgJswKLA++MNvhcW7h6uX46tecPa5ccQ7TuAHdqwiK0rQ4AXCkxJzah bkHl7e89xPr8fqb1uWQSaBj9FOferMngEsSh/pOvZX15BFKORDP2Kxp2O U5+s1ADFg6nu3Q/KPmxo408bG4I+xzifIrMahjhNgI+9HxHOIHArd+BLA EjYPeeEOTzZC6AD0RfiLZAJmVSXydvo1h9NVmO+NhspTDgyYCpdG3Qfxw g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899143" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899143" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796200" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:13 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 40/44] memremap_pages: Add pgmap_protection_flag_invalid() Date: Thu, 27 Jan 2022 09:55:01 -0800 Message-Id: <20220127175505.851391-41-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Some systems may be using pmem in ways that are known to be incompatible with the PKS implementation. One such example is the use of kmap() to create 'global' mappings. Rather than only reporting the invalid access on fault, provide a call to flag those uses immediately. This produces a much better splat for debugging. This is also nice because even if no invalid access actually occurs, the invalid mapping can be fixed with kmap_local_page() rather than having to look for a different solution. Define pgmap_protection_flag_invalid() and have it follow the policy set by pks_fault_mode.
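For illustration only, the policy described above can be modelled in plain userspace C. This is a sketch, not the kernel code: every model_* name is hypothetical, model_streq() is only a rough stand-in for sysfs_streq(), and a counter plus fprintf() stand in for WARN_ONCE(). It assumes, as the code in this series does, that strict mode stays silent at the flagging site and leaves reporting to the fault handler, while relaxed mode warns once and lets the caller continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace model only; every model_* name is hypothetical. */
enum model_fault_mode {
	MODEL_MODE_STRICT = 0,
	MODEL_MODE_RELAXED = 1,
};

enum model_fault_mode model_mode = MODEL_MODE_RELAXED;
int model_warn_count;	/* how often the WARN_ONCE() stand-in fired */

/* Rough stand-in for sysfs_streq(): equal, ignoring one trailing newline. */
bool model_streq(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la && a[la - 1] == '\n')
		la--;
	if (lb && b[lb - 1] == '\n')
		lb--;
	return la == lb && memcmp(a, b, la) == 0;
}

/* Models the parameter setter: 0 on success, -1 for an unknown mode. */
int model_set_mode(const char *val)
{
	if (model_streq(val, "relaxed")) {
		model_mode = MODEL_MODE_RELAXED;
		return 0;
	}
	if (model_streq(val, "strict")) {
		model_mode = MODEL_MODE_STRICT;
		return 0;
	}
	return -1;	/* mode is left unchanged */
}

/* Models the parameter getter. */
const char *model_get_mode(void)
{
	assert(model_mode == MODEL_MODE_STRICT ||
	       model_mode == MODEL_MODE_RELAXED);
	return model_mode == MODEL_MODE_STRICT ? "strict" : "relaxed";
}

/* Models the flagging call: silent when strict, warn once when relaxed. */
void model_flag_invalid(void)
{
	static bool warned;

	if (model_mode == MODEL_MODE_STRICT)
		return;
	if (!warned) {
		warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING: Invalid page map use\n");
	}
}
```

In this model an unknown string leaves the current mode untouched, and repeated invalid uses in relaxed mode produce a single warning.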
Signed-off-by: Ira Weiny --- Changes for V8 Split this from the fault mode patch --- include/linux/mm.h | 23 +++++++++++++++++++++++ mm/memremap.c | 9 +++++++++ 2 files changed, 32 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index e900df563437..3c0aa686b5bd 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1162,6 +1162,7 @@ static inline bool devmap_protected(struct page *page) return false; } =20 +void __pgmap_protection_flag_invalid(struct dev_pagemap *pgmap); void __pgmap_mk_readwrite(struct dev_pagemap *pgmap); void __pgmap_mk_noaccess(struct dev_pagemap *pgmap); =20 @@ -1178,6 +1179,27 @@ static inline bool pgmap_check_pgmap_prot(struct pag= e *page) return true; } =20 +/* + * pgmap_protection_flag_invalid - Check and flag an invalid use of a pgmap + * protected page + * + * There are code paths which are known to not be compatible with pgmap + * protections. pgmap_protection_flag_invalid() is provided as a 'relief + * valve' to be used in those functions which are known to be incompatible. + * + * Thus an invalid use case can be flagged with more precise data rather th= an + * just flagging a fault. Like the fault handler code this abandons the u= se of + * the PKS key and optionally allows the calling code path to continue bas= ed on + * the configuration of the memremap.pks_fault_mode command line + * (and/or sysfs) option.
+ */ +static inline void pgmap_protection_flag_invalid(struct page *page) +{ + if (!pgmap_check_pgmap_prot(page)) + return; + __pgmap_protection_flag_invalid(page->pgmap); +} + static inline void pgmap_mk_readwrite(struct page *page) { if (!pgmap_check_pgmap_prot(page)) @@ -1200,6 +1222,7 @@ bool pgmap_pks_fault_callback(struct pt_regs *regs, u= nsigned long address, =20 static inline void __pgmap_mk_readwrite(struct dev_pagemap *pgmap) { } static inline void __pgmap_mk_noaccess(struct dev_pagemap *pgmap) { } +static inline void pgmap_protection_flag_invalid(struct page *page) { } static inline void pgmap_mk_readwrite(struct page *page) { } static inline void pgmap_mk_noaccess(struct page *page) { } =20 diff --git a/mm/memremap.c b/mm/memremap.c index 783b1cd4bb42..fd4b9b83b770 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -150,6 +150,15 @@ static const struct kernel_param_ops param_ops_pks_fau= lt_modes =3D { __param_check(name, p, pks_fault_modes) module_param(pks_fault_mode, pks_fault_modes, 0644); =20 +void __pgmap_protection_flag_invalid(struct dev_pagemap *pgmap) +{ + if (pks_fault_mode =3D=3D PKS_MODE_STRICT) + return; + + WARN_ONCE(1, "Invalid page map use"); +} +EXPORT_SYMBOL_GPL(__pgmap_protection_flag_invalid); + bool pgmap_pks_fault_callback(struct pt_regs *regs, unsigned long address, bool write) { --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6D2AC433EF for ; Thu, 27 Jan 2022 17:57:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245098AbiA0R5V (ORCPT ); Thu, 27 Jan 2022 12:57:21 -0500 Received: from mga12.intel.com ([192.55.52.136]:65462 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244955AbiA0Rzl (ORCPT ); Thu, 27 Jan 2022 12:55:41 -0500 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306141; x=1674842141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ikBmsVH9URVUPs2BMCiled8q5lxWUT7V1EgXuZO4LQk=; b=LciAOr7xX54kw2XFuLEspXwXgh/hHhkro2OPN2dwqbj8tsgn2re3bIkR fv3LVPUZDDlkI4PobHXWRHARTRfxf49kG7I4DmUuU/3STEaKfH+6Ek/D/ gMeLaB+RzCkRlZmGpTlal16iC2ds8EJ0FiMgFIKNFsp53phsUeQTE6N5i mZXjggKgHZIbCaMkZpEq5CahV2/kYFvf0DLRpeDLo7VhRzodDNleGFm4K lR6CruPF6r/1tUCtIPLoz4Kk3C9I92c34wPTFUPWDCTQ7jO1GCslJX9nQ 2BMBE68lYyXmAkFd/lfgSuRs9NYr/iKzU18+btpK4t5kev5oMtf3wg6WJ A==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899145" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899145" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796204" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 41/44] kmap: Ensure kmap works for devmap pages Date: Thu, 27 Jan 2022 09:55:02 -0800 Message-Id: <20220127175505.851391-42-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Users of devmap pages should not have to know that the pages they are operating on are special. 
Co-opt kmap_{local_page,atomic}() to mediate access to PKS protected pages via the devmap facility. kmap_{local_page,atomic}() are both thread local mappings so they work well with the thread specific protections available. kmap(), on the other hand, allows for global mappings to be established, which is incompatible with the underlying PKS facility. For this reason kmap() is not supported. Rather than leave the kmap mappings to fault at random times when users may access them, call pgmap_protection_flag_invalid() to show kmap() users the call stack of where the mapping was created. This allows better debugging. This behavior is safe because neither of the two current DAX-capable filesystems (ext4 and xfs) performs such global mappings, and known device drivers that would handle devmap pages are not using kmap(). Any future filesystems that gain DAX support, or device drivers wanting to support devmap protected pages, will need to use kmap_local_page(). Direct-map exposure is already mitigated by default on HIGHMEM systems because by definition HIGHMEM systems do not have large capacities of memory in the direct map, and using kmap in those systems actually creates a separate mapping. Therefore, to reduce complexity, HIGHMEM systems are not supported.
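As a rough userspace sketch of why thread local mappings pair well with thread specific protections, the nesting behaviour can be modelled with a per-thread counter. The counter mirrors the `if (!current->pgmap_prot_count++)` test visible in __pgmap_mk_readwrite(); the matching decrement in the noaccess path is an assumption made for this model, and all model_* names are hypothetical. A boolean stands in for the per-thread PKS key state.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; every model_* name is hypothetical. */

/* Models current->pgmap_prot_count: one counter per thread. */
static _Thread_local unsigned int model_prot_count;
/* Models whether this thread's protection key currently allows access. */
static _Thread_local bool model_access_enabled;

/* Models the kmap_local_page() side: only the first caller opens access. */
void model_kmap_mk_readwrite(void)
{
	if (!model_prot_count++)
		model_access_enabled = true;
}

/*
 * Models the kunmap_local() side: only the last caller closes access.
 * The decrement-to-zero rule is an assumption of this sketch.
 */
void model_kmap_mk_noaccess(void)
{
	assert(model_prot_count > 0);	/* unbalanced unmap is a caller bug */
	if (!--model_prot_count)
		model_access_enabled = false;
}

/* Reports whether the calling thread may touch the protected pages. */
bool model_kmap_can_access(void)
{
	return model_access_enabled;
}
```

Because the state is per thread, a nested map/unmap pair in the same thread keeps access open until the outermost unmap, and nothing one thread does relaxes protection for another. A global kmap() has no such owner, which is why it cannot be expressed in this model.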
Cc: Dan Williams Cc: Dave Hansen Signed-off-by: Ira Weiny --- Changes for V8 Reword commit message --- include/linux/highmem-internal.h | 5 +++++ mm/Kconfig | 1 + 2 files changed, 6 insertions(+) diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-inter= nal.h index 0a0b2b09b1b8..1a006558734c 100644 --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -159,6 +159,7 @@ static inline struct page *kmap_to_page(void *addr) static inline void *kmap(struct page *page) { might_sleep(); + pgmap_protection_flag_invalid(page); return page_address(page); } =20 @@ -174,6 +175,7 @@ static inline void kunmap(struct page *page) =20 static inline void *kmap_local_page(struct page *page) { + pgmap_mk_readwrite(page); return page_address(page); } =20 @@ -197,6 +199,7 @@ static inline void __kunmap_local(void *addr) #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); #endif + pgmap_mk_noaccess(kmap_to_page(addr)); } =20 static inline void *kmap_atomic(struct page *page) @@ -206,6 +209,7 @@ static inline void *kmap_atomic(struct page *page) else preempt_disable(); pagefault_disable(); + pgmap_mk_readwrite(page); return page_address(page); } =20 @@ -224,6 +228,7 @@ static inline void __kunmap_atomic(void *addr) #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); #endif + pgmap_mk_noaccess(kmap_to_page(addr)); pagefault_enable(); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_enable(); diff --git a/mm/Kconfig b/mm/Kconfig index 67e0264acf7d..d537679448ae 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -779,6 +779,7 @@ config ZONE_DEVICE config DEVMAP_ACCESS_PROTECTION bool "Access protection for memremap_pages()" depends on NVDIMM_PFN + depends on !HIGHMEM depends on ARCH_HAS_SUPERVISOR_PKEYS select ARCH_ENABLE_SUPERVISOR_PKEYS default y --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F411C433EF for ; Thu, 27 Jan 2022 17:57:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245422AbiA0R5f (ORCPT ); Thu, 27 Jan 2022 12:57:35 -0500 Received: from mga12.intel.com ([192.55.52.136]:65467 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244796AbiA0Rzl (ORCPT ); Thu, 27 Jan 2022 12:55:41 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306141; x=1674842141; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5PNbZiFfcGOHVsut66vPdPMpwN8Bxyu2O4inZ1cSArk=; b=dlt905LV9MBr0qoh5Naryw0LXMXrkdIJi0j2yheM77mQ+Q7o5z+CUvgW FHOE4PadWcFt4/a2Olf5tzHazsG5wdpT8sHNlks4SwuCeA3agBZEEcski ZhEZt96OCDvsUVX7EbwpvLPIv4T530mIiZCn7TWnbFvAMv0hW+naUnQGJ YbtBQ8y6IzpNEfjVXRm8bJYjgYiBAjrkmrIiJ7eGVjqp4oAtT7clXv8JV byiQkO0mlav7KSQEeDl+miBiQ8sMe5Bi35vUlCDRRDHlzhYV0MevxuCu/ ClXKbUu9otQ6SZVPwEO8A8pArIOsGANPv4/IU0C8mFoQyR2PEA9+klaTF w==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899147" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899147" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796208" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 42/44] dax: Stray access protection for dax_direct_access() Date: Thu, 27 Jan 2022 09:55:03 -0800 Message-Id: <20220127175505.851391-43-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny dax_direct_access() provides a way to obtain the direct map address of PMEM memory. Coordinate PKS protection with dax_direct_access() of protected devmap pages. Introduce 3 new dax_operations calls: .map_protected, .mk_readwrite, and .mk_noaccess. These 3 calls do not have to be implemented by the dax provider if no protection is implemented. Threads of execution can use dax_mk_{readwrite,noaccess}() to relax the protection of the dax device and allow direct use of the kaddr returned from dax_direct_access(). The dax_mk_{readwrite,noaccess}() calls only need to be used to guard actual access to the memory. Other uses of dax_direct_access() do not need to use these guards. For users who require a permanent address to the dax device, such as the DM write cache, dax_map_protected() indicates that the dax device has additional protections and that the user should create its own permanent mapping of the memory. Update the DM write cache code to create this permanent mapping. Signed-off-by: Ira Weiny --- Changes for V8 Rebase changes on 5.17-rc1 Clean up the cover letter dax_read_lock() is not required s/dax_protected()/dax_map_protected()/ Testing revealed a dax_flush() which was not properly protected. Changes for V7 Rework cover letter. Do not include a FS_DAX_LIMITED restriction for dcss.
It will simply not implement the protection and there is no need to special case this. Clean up commit message because I did not originally understand the nuance of the s390 device. Introduce dax_{protected,mk_readwrite,mk_noaccess}() From Dan Williams Remove old clean up cruft from previous versions Remove map_protected Remove 'global' parameters all calls --- drivers/dax/super.c | 54 ++++++++++++++++++++++++++++++++++++++ drivers/md/dm-writecache.c | 8 +++++- fs/dax.c | 8 ++++++ fs/fuse/virtio_fs.c | 2 ++ include/linux/dax.h | 8 ++++++ 5 files changed, 79 insertions(+), 1 deletion(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index e3029389d809..705b2e736200 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -117,6 +117,8 @@ enum dax_device_flags { * @pgoff: offset in pages from the start of the device to translate * @nr_pages: number of consecutive pages caller can handle relative to @p= fn * @kaddr: output parameter that returns a virtual address mapping of pfn + * Direct access through this pointer must be guarded by calls to + * dax_mk_{readwrite,noaccess}() * @pfn: output parameter that returns an absolute pfn translation of @pgo= ff * * Return: negative errno if an error occurs, otherwise the number of @@ -209,6 +211,58 @@ void dax_flush(struct dax_device *dax_dev, void *addr,= size_t size) #endif EXPORT_SYMBOL_GPL(dax_flush); =20 +bool dax_map_protected(struct dax_device *dax_dev) +{ + if (!dax_alive(dax_dev)) + return false; + + if (dax_dev->ops->map_protected) + return dax_dev->ops->map_protected(dax_dev); + return false; +} +EXPORT_SYMBOL_GPL(dax_map_protected); + +/** + * dax_mk_readwrite() - make protected dax devices read/write + * @dax_dev: the dax device representing the memory to access + * + * Any access of the kaddr memory returned from dax_direct_access() must be + * guarded by dax_mk_readwrite() and dax_mk_noaccess(). 
This ensures that= any + * dax devices which have additional protections are allowed to relax those + * protections for the thread using this memory. + * + * NOTE these calls must be contained within a single thread of execution = and + * both must be guarded by dax_read_lock(), which is also a requirement for + * dax_direct_access() anyway. + */ +void dax_mk_readwrite(struct dax_device *dax_dev) +{ + if (!dax_alive(dax_dev)) + return; + + if (dax_dev->ops->mk_readwrite) + dax_dev->ops->mk_readwrite(dax_dev); +} +EXPORT_SYMBOL_GPL(dax_mk_readwrite); + +/** + * dax_mk_noaccess() - restore protection to dax devices if needed + * @dax_dev: the dax device representing the memory to access + * + * See dax_direct_access() and dax_mk_readwrite() + * + * NOTE Must be called prior to dax_read_unlock() + */ +void dax_mk_noaccess(struct dax_device *dax_dev) +{ + if (!dax_alive(dax_dev)) + return; + + if (dax_dev->ops->mk_noaccess) + dax_dev->ops->mk_noaccess(dax_dev); +} +EXPORT_SYMBOL_GPL(dax_mk_noaccess); + void dax_write_cache(struct dax_device *dax_dev, bool wc) { if (wc) diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index 4f31591d2d25..5d6d7b6bad30 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -297,7 +297,13 @@ static int persistent_memory_claim(struct dm_writecach= e *wc) r =3D -EOPNOTSUPP; goto err2; } - if (da !=3D p) { + + /* + * Force the write cache to map the pages directly if the dax device + * mapping is protected or if the number of pages returned was not what + * was requested.
+ */ + if (dax_map_protected(wc->ssd_dev->dax_dev) || da !=3D p) { long i; wc->memory_map =3D NULL; pages =3D kvmalloc_array(p, sizeof(struct page *), GFP_KERNEL); diff --git a/fs/dax.c b/fs/dax.c index cd03485867a7..0b22a1091fe2 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -728,7 +728,9 @@ static int copy_cow_page_dax(struct vm_fault *vmf, cons= t struct iomap_iter *iter return rc; } vto =3D kmap_atomic(vmf->cow_page); + dax_mk_readwrite(iter->iomap.dax_dev); copy_user_page(vto, kaddr, vmf->address, vmf->cow_page); + dax_mk_noaccess(iter->iomap.dax_dev); kunmap_atomic(vto); dax_read_unlock(id); return 0; @@ -937,8 +939,10 @@ static int dax_writeback_one(struct xa_state *xas, str= uct dax_device *dax_dev, count =3D 1UL << dax_entry_order(entry); index =3D xas->xa_index & ~(count - 1); =20 + dax_mk_readwrite(dax_dev); dax_entry_mkclean(mapping, index, pfn); dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE); + dax_mk_noaccess(dax_dev); /* * After we have flushed the cache, we can clear the dirty tag. 
There * cannot be new dirty data in the pfn after the flush has completed as @@ -1125,8 +1129,10 @@ static int dax_memzero(struct dax_device *dax_dev, p= goff_t pgoff, =20 ret =3D dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL); if (ret > 0) { + dax_mk_readwrite(dax_dev); memset(kaddr + offset, 0, size); dax_flush(dax_dev, kaddr + offset, size); + dax_mk_noaccess(dax_dev); } return ret; } @@ -1260,12 +1266,14 @@ static loff_t dax_iomap_iter(const struct iomap_ite= r *iomi, if (map_len > end - pos) map_len =3D end - pos; =20 + dax_mk_readwrite(dax_dev); if (iov_iter_rw(iter) =3D=3D WRITE) xfer =3D dax_copy_from_iter(dax_dev, pgoff, kaddr, map_len, iter); else xfer =3D dax_copy_to_iter(dax_dev, pgoff, kaddr, map_len, iter); + dax_mk_noaccess(dax_dev); =20 pos +=3D xfer; length -=3D xfer; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 9d737904d07c..c748218fe70c 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -774,8 +774,10 @@ static int virtio_fs_zero_page_range(struct dax_device= *dax_dev, rc =3D dax_direct_access(dax_dev, pgoff, nr_pages, &kaddr, NULL); if (rc < 0) return rc; + dax_mk_readwrite(dax_dev); memset(kaddr, 0, nr_pages << PAGE_SHIFT); dax_flush(dax_dev, kaddr, nr_pages << PAGE_SHIFT); + dax_mk_noaccess(dax_dev); return 0; } =20 diff --git a/include/linux/dax.h b/include/linux/dax.h index 9fc5f99a0ae2..261af298f89f 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -30,6 +30,10 @@ struct dax_operations { sector_t, sector_t); /* zero_page_range: required operation. 
Zero page range */ int (*zero_page_range)(struct dax_device *, pgoff_t, size_t); + + bool (*map_protected)(struct dax_device *dax_dev); + void (*mk_readwrite)(struct dax_device *dax_dev); + void (*mk_noaccess)(struct dax_device *dax_dev); }; =20 #if IS_ENABLED(CONFIG_DAX) @@ -187,6 +191,10 @@ int dax_zero_page_range(struct dax_device *dax_dev, pg= off_t pgoff, size_t nr_pages); void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); =20 +bool dax_map_protected(struct dax_device *dax_dev); +void dax_mk_readwrite(struct dax_device *dax_dev); +void dax_mk_noaccess(struct dax_device *dax_dev); + ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_s= ize, --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 703FEC433EF for ; Thu, 27 Jan 2022 17:59:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245446AbiA0R5m (ORCPT ); Thu, 27 Jan 2022 12:57:42 -0500 Received: from mga12.intel.com ([192.55.52.136]:65504 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244998AbiA0Rzs (ORCPT ); Thu, 27 Jan 2022 12:55:48 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306148; x=1674842148; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TPB2LrmlUVsHBLkS8/W1ZpxXOGwMXL8e5PUJSM7dUI8=; b=mQFPB2LlCR7Kocqq2sbAjxLPjhj3YvSTrpLYYd3eMhPUykgby1VJncT9 8q8jro8VaTekqbUKPiDwyFzvd4BauvxxmaNyWJ0pvmz4R0YKUzsND2Xii lF33ssx1+umUNlwGdMoDp4f1V5frKHgCQA+3bUsE/sjrc4kRPSH9wZEOg RSZA8JHjdyB74SHOuP7iYARlhBMmOhnUhejhvPqwN4k0FrptBA5ijMxdF 
gavLPMDbe5KdRkxMhUB2N/FqQufS3oa5ktJUcANe5l/GJp0B0t/eoLrGd UfQCCoWc5nAqcJP92SWfixX9E/8REkbJw/kyvjq/92xF8eG0GibN0Elck Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="226899150" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="226899150" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796214" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 43/44] nvdimm/pmem: Enable stray access protection Date: Thu, 27 Jan 2022 09:55:04 -0800 Message-Id: <20220127175505.851391-44-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Now that all valid kernel accesses to PMEM have been annotated with {__}pgmap_mk_{readwrite,noaccess}(), PGMAP_PROTECTION is safe to enable in the pmem layer. Implement pmem_map_protected() and pmem_mk_{readwrite,noaccess}() to communicate to the upper layers that this memory has extra protection if PGMAP_PROTECTION is specified. Internally, the pmem driver uses a cached virtual address, pmem->virt_addr (pmem_addr). Use __pgmap_mk_{readwrite,noaccess}() directly when PGMAP_PROTECTION is active on the device.
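The split between a plain and a protected operations table, with the protection callbacks left optional, can be sketched in userspace C as follows. This is a model under stated assumptions rather than the driver code: the model_* and prot_* names are hypothetical, an `alive` flag stands in for dax_alive(), and an `open_count` field stands in for the real protection state so that the effect of each call is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every model_* and prot_* name is hypothetical. */
struct model_dev;

/* Optional callbacks, modelled on the three new dax_operations entries. */
struct model_ops {
	bool (*map_protected)(struct model_dev *dev);
	void (*mk_readwrite)(struct model_dev *dev);
	void (*mk_noaccess)(struct model_dev *dev);
};

struct model_dev {
	const struct model_ops *ops;
	bool alive;		/* stand-in for dax_alive() */
	int open_count;		/* stand-in for the protection state */
};

static bool prot_map_protected(struct model_dev *dev)
{
	(void)dev;
	return true;
}

static void prot_mk_readwrite(struct model_dev *dev)
{
	dev->open_count++;
}

static void prot_mk_noaccess(struct model_dev *dev)
{
	assert(dev->open_count > 0);	/* calls must be paired */
	dev->open_count--;
}

/* A provider without protection leaves all three callbacks NULL. */
const struct model_ops model_plain_ops = {
	.map_protected = NULL,
	.mk_readwrite = NULL,
	.mk_noaccess = NULL,
};

const struct model_ops model_protected_ops = {
	.map_protected = prot_map_protected,
	.mk_readwrite = prot_mk_readwrite,
	.mk_noaccess = prot_mk_noaccess,
};

/* Wrappers: do nothing for dead devices or missing callbacks. */
bool model_dax_map_protected(struct model_dev *dev)
{
	if (!dev->alive)
		return false;
	if (dev->ops->map_protected)
		return dev->ops->map_protected(dev);
	return false;
}

void model_dax_mk_readwrite(struct model_dev *dev)
{
	if (dev->alive && dev->ops->mk_readwrite)
		dev->ops->mk_readwrite(dev);
}

void model_dax_mk_noaccess(struct model_dev *dev)
{
	if (dev->alive && dev->ops->mk_noaccess)
		dev->ops->mk_noaccess(dev);
}
```

Callers use the same wrappers for either table, so code paths that guard their accesses need no knowledge of whether the underlying device is protected.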
Signed-off-by: Ira Weiny --- Changes for V8 Rebase to 5.17-rc1 Remove global param Add internal structure which uses the pmem device and pgmap device directly in the *_mk_*() calls. Add pmem dax ops callbacks Use pgmap_protection_available() s/PGMAP_PKEY_PROTECT/PGMAP_PROTECTION --- drivers/nvdimm/pmem.c | 52 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 51 insertions(+), 1 deletion(-) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 58d95242a836..2afff8157233 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -138,6 +138,18 @@ static blk_status_t read_pmem(struct page *page, unsig= ned int off, return BLK_STS_OK; } =20 +static void __pmem_mk_readwrite(struct pmem_device *pmem) +{ + if (pmem->pgmap.flags & PGMAP_PROTECTION) + __pgmap_mk_readwrite(&pmem->pgmap); +} + +static void __pmem_mk_noaccess(struct pmem_device *pmem) +{ + if (pmem->pgmap.flags & PGMAP_PROTECTION) + __pgmap_mk_noaccess(&pmem->pgmap); +} + static blk_status_t pmem_do_read(struct pmem_device *pmem, struct page *page, unsigned int page_off, sector_t sector, unsigned int len) @@ -149,7 +161,10 @@ static blk_status_t pmem_do_read(struct pmem_device *p= mem, if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) return BLK_STS_IOERR; =20 + __pmem_mk_readwrite(pmem); rc =3D read_pmem(page, page_off, pmem_addr, len); + __pmem_mk_noaccess(pmem); + flush_dcache_page(page); return rc; } @@ -181,11 +196,14 @@ static blk_status_t pmem_do_write(struct pmem_device = *pmem, * after clear poison. 
*/ flush_dcache_page(page); + + __pmem_mk_readwrite(pmem); write_pmem(pmem_addr, page, page_off, len); if (unlikely(bad_pmem)) { rc =3D pmem_clear_poison(pmem, pmem_off, len); write_pmem(pmem_addr, page, page_off, len); } + __pmem_mk_noaccess(pmem); =20 return rc; } @@ -301,11 +319,36 @@ static long pmem_dax_direct_access(struct dax_device = *dax_dev, return __pmem_direct_access(pmem, pgoff, nr_pages, kaddr, pfn); } =20 +static bool pmem_map_protected(struct dax_device *dax_dev) +{ + struct pmem_device *pmem =3D dax_get_private(dax_dev); + + return (pmem->pgmap.flags & PGMAP_PROTECTION); +} + +static void pmem_mk_readwrite(struct dax_device *dax_dev) +{ + __pmem_mk_readwrite(dax_get_private(dax_dev)); +} + +static void pmem_mk_noaccess(struct dax_device *dax_dev) +{ + __pmem_mk_noaccess(dax_get_private(dax_dev)); +} + static const struct dax_operations pmem_dax_ops =3D { .direct_access =3D pmem_dax_direct_access, .zero_page_range =3D pmem_dax_zero_page_range, }; =20 +static const struct dax_operations pmem_protected_dax_ops =3D { + .direct_access =3D pmem_dax_direct_access, + .zero_page_range =3D pmem_dax_zero_page_range, + .map_protected =3D pmem_map_protected, + .mk_readwrite =3D pmem_mk_readwrite, + .mk_noaccess =3D pmem_mk_noaccess, +}; + static ssize_t write_cache_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -427,6 +470,8 @@ static int pmem_attach_disk(struct device *dev, pmem->pfn_flags =3D PFN_DEV; if (is_nd_pfn(dev)) { pmem->pgmap.type =3D MEMORY_DEVICE_FS_DAX; + if (pgmap_protection_available()) + pmem->pgmap.flags |=3D PGMAP_PROTECTION; addr =3D devm_memremap_pages(dev, &pmem->pgmap); pfn_sb =3D nd_pfn->pfn_sb; pmem->data_offset =3D le64_to_cpu(pfn_sb->dataoff); @@ -440,6 +485,8 @@ static int pmem_attach_disk(struct device *dev, pmem->pgmap.range.end =3D res->end; pmem->pgmap.nr_range =3D 1; pmem->pgmap.type =3D MEMORY_DEVICE_FS_DAX; + if (pgmap_protection_available()) + pmem->pgmap.flags |=3D PGMAP_PROTECTION; addr =3D 
devm_memremap_pages(dev, &pmem->pgmap); pmem->pfn_flags |=3D PFN_MAP; bb_range =3D pmem->pgmap.range; @@ -474,7 +521,10 @@ static int pmem_attach_disk(struct device *dev, nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_range); disk->bb =3D &pmem->bb; =20 - dax_dev =3D alloc_dax(pmem, &pmem_dax_ops); + if (pmem->pgmap.flags & PGMAP_PROTECTION) + dax_dev =3D alloc_dax(pmem, &pmem_protected_dax_ops); + else + dax_dev =3D alloc_dax(pmem, &pmem_dax_ops); if (IS_ERR(dax_dev)) { rc =3D PTR_ERR(dax_dev); goto out; --=20 2.31.1 From nobody Tue Jun 30 01:43:40 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21FE2C433F5 for ; Thu, 27 Jan 2022 17:56:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245171AbiA0R4Z (ORCPT ); Thu, 27 Jan 2022 12:56:25 -0500 Received: from mga06.intel.com ([134.134.136.31]:16774 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244858AbiA0Rz1 (ORCPT ); Thu, 27 Jan 2022 12:55:27 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643306127; x=1674842127; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=0+NRkt4jFtorhkBaictzxf0Z79rbBKCqDV6xhB7GtYA=; b=PVowpxuDFLOb5IinzrjYUdYJv8hftxyJ+XS0dMGSX+PxXuQzz+EXuJuD 69lT4Dvl6Dl/P0YmQla1Od0XRrDYz2LAU6y1wnw4Di/94hlCVKpqnJScZ dFuspR1NLIWcdP3OCfkUTyX+EOXHizCoc4XWAstPyeLljdHIG5m9lmyUn 4K90nPgx8K6qB4BYuExdSMLuzOjiuOszWTrNneSGJmandbc5HUYdQxr+0 3ZZG7BkgRDgpdgOT//jIsKT5DVJI+xgIhJ2YuHab/Jn3WCvYyT7VNm2sL nZaLkdMlQpCWpxh7bfOm+GqYerPsR3Up3LaG1YFVnDP6RXhJ9BcK+LfQs g==; X-IronPort-AV: E=McAfee;i="6200,9189,10239"; a="307637000" X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="307637000" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) 
by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 X-IronPort-AV: E=Sophos;i="5.88,321,1635231600"; d="scan'208";a="674796218" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2022 09:55:14 -0800 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , linux-kernel@vger.kernel.org Subject: [PATCH V8 44/44] devdax: Enable stray access protection Date: Thu, 27 Jan 2022 09:55:05 -0800 Message-Id: <20220127175505.851391-45-ira.weiny@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20220127175505.851391-1-ira.weiny@intel.com> References: <20220127175505.851391-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Device dax is primarily accessed through user space and kernel access is controlled through the kmap interfaces. Now that all valid kernel-initiated accesses to dax devices have been accounted for, turn on PGMAP_PROTECTION for device dax. Signed-off-by: Ira Weiny Reviewed-by: Dan Williams --- Changes for V8 Rebase to 5.17-rc1 Use pgmap_protection_available() s/PGMAP_PKEYS_PROTECT/PGMAP_PROTECTION/ --- drivers/dax/device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index d33a0613ed0c..cee375ef2cac 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -452,6 +452,8 @@ int dev_dax_probe(struct dev_dax *dev_dax) if (dev_dax->align > PAGE_SIZE) pgmap->vmemmap_shift =3D order_base_2(dev_dax->align >> PAGE_SHIFT); + if (pgmap_protection_available()) + pgmap->flags |=3D PGMAP_PROTECTION; addr =3D devm_memremap_pages(dev, pgmap); if (IS_ERR(addr)) return PTR_ERR(addr); --=20 2.31.1