From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 01/44] Documentation/protection-keys: Clean up documentation for User Space pkeys
Date: Tue, 19 Apr 2022 10:06:06 -0700
Message-Id: <20220419170649.1022246-2-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

The documentation for user space pkeys was a bit dated, including things
such as Amazon and distribution testing information, which is irrelevant
now.

Update the documentation.  This also streamlines adding the Supervisor
pkey documentation later on.

Cc: "Moger, Babu"
Signed-off-by: Ira Weiny
---
Changes for V9:
	use pkey
	Change information on which CPUs have PKU
---
 Documentation/core-api/protection-keys.rst | 44 +++++++++++-----------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..bf28ac0401f3 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,31 +4,29 @@
 Memory Protection Keys
 ======================
 
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs. It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
+Memory Protection Keys provide a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables when an
+application changes protection domains.
+
+Pkeys Userspace (PKU) is a feature which can be found on:
+  * Intel server CPUs, Skylake and later
+  * Intel client CPUs, Tiger Lake (11th Gen Core) and later
+  * Future AMD CPUs
+
+Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
+a "protection key", giving 16 possible keys.
+
+Protections for each key are defined with a per-CPU user-accessible register
+(PKRU). Each of these is a 32-bit register storing two bits (Access Disable
+and Write Disable) for each of 16 keys.
+
+Being a CPU register, PKRU is inherently thread-local, potentially giving each
 thread a different set of protections from every other thread.
 
-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
+There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
+register. The feature is only available in 64-bit mode, even though there is
+theoretically space in the PAE PTEs. These permissions are enforced on data
+access only and have no effect on instruction fetches.
 
 Syscalls
 ========
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 02/44] x86/pkeys: Clarify PKRU_AD_KEY macro
Date: Tue, 19 Apr 2022 10:06:07 -0700
Message-Id: <20220419170649.1022246-3-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

When changing the PKRU_AD_KEY macro to be used for PKS, the name came into
question.[1]

The intent of PKRU_AD_KEY is to set an initial value for the PKRU register,
but what the macro produces is just a mask value.  Clarify this by changing
the name to PKRU_AD_MASK().

NOTE: the checkpatch errors are ignored for init_pkru_value in order to keep
the values aligned in the code.
[1] https://lore.kernel.org/lkml/eff862e2-bfaa-9e12-42b5-a12467d72a22@intel.com/

Suggested-by: Dave Hansen
Signed-off-by: Ira Weiny
---
Changes for V9
	New Patch
---
 arch/x86/mm/pkeys.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e44e938885b7..7418c367e328 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -110,7 +110,7 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
 	return vma_pkey(vma);
 }
 
-#define PKRU_AD_KEY(pkey)	(PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
+#define PKRU_AD_MASK(pkey)	(PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
 
 /*
  * Make the default PKRU value (at execve() time) as restrictive
@@ -118,11 +118,14 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
  * in the process's lifetime will not accidentally get access
  * to data which is pkey-protected later on.
  */
-u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) |
-		      PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) |
-		      PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) |
-		      PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) |
-		      PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15);
+u32 init_pkru_value = PKRU_AD_MASK( 1) | PKRU_AD_MASK( 2) |
+		      PKRU_AD_MASK( 3) | PKRU_AD_MASK( 4) |
+		      PKRU_AD_MASK( 5) | PKRU_AD_MASK( 6) |
+		      PKRU_AD_MASK( 7) | PKRU_AD_MASK( 8) |
+		      PKRU_AD_MASK( 9) | PKRU_AD_MASK(10) |
+		      PKRU_AD_MASK(11) | PKRU_AD_MASK(12) |
+		      PKRU_AD_MASK(13) | PKRU_AD_MASK(14) |
+		      PKRU_AD_MASK(15);
 
 static ssize_t init_pkru_read_file(struct file *file, char __user *user_buf,
 			     size_t count, loff_t *ppos)
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 03/44] x86/pkeys: Make PKRU macros generic
Date: Tue, 19 Apr 2022 10:06:08 -0700
Message-Id: <20220419170649.1022246-4-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in a
similar fashion and can share common defines.  Specifically, PKS and PKU
each have:

	1. A single control register
	2. The same number of keys
	3. The same number of bits in the register per key
	4. Access and Write disable in the same bit locations

Given the above, share all the macros that synthesize and manipulate
register values between the two features.  Share these defines by moving
them into a new header, change their names to reflect the common use, and
include the header where needed.

This mostly takes the form of converting names from the PKU-specific
"PKRU" to a user/supervisor agnostic "PKR".

Also, while editing the code, remove the use of 'we' from the comments
being touched.

NOTE: the checkpatch errors are ignored for init_pkru_value in order to keep
the values aligned in the code.
Acked-by: Dave Hansen
Signed-off-by: Ira Weiny
---
Changes for V9:
	From Dave Hansen
		Add detail to commit message
		Add Ack
		s/PKR_AD_KEY/PKR_AD_MASK/

Changes from v7:
	Rebased onto latest
---
 arch/x86/include/asm/pkeys_common.h | 11 +++++++++++
 arch/x86/include/asm/pkru.h         | 20 ++++++++------------
 arch/x86/kernel/fpu/xstate.c        | 10 +++++-----
 arch/x86/mm/pkeys.c                 | 20 +++++++++-----------
 4 files changed, 33 insertions(+), 28 deletions(-)
 create mode 100644 arch/x86/include/asm/pkeys_common.h

diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
new file mode 100644
index 000000000000..359b94cdcc0c
--- /dev/null
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKEYS_COMMON_H
+#define _ASM_X86_PKEYS_COMMON_H
+
+#define PKR_AD_BIT 0x1u
+#define PKR_WD_BIT 0x2u
+#define PKR_BITS_PER_PKEY 2
+
+#define PKR_AD_MASK(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+
+#endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/arch/x86/include/asm/pkru.h b/arch/x86/include/asm/pkru.h
index 74f0a2d34ffd..06980dd42946 100644
--- a/arch/x86/include/asm/pkru.h
+++ b/arch/x86/include/asm/pkru.h
@@ -3,10 +3,7 @@
 #define _ASM_X86_PKRU_H
 
 #include 
-
-#define PKRU_AD_BIT 0x1u
-#define PKRU_WD_BIT 0x2u
-#define PKRU_BITS_PER_PKEY 2
+#include 
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 extern u32 init_pkru_value;
@@ -18,18 +15,17 @@ extern u32 init_pkru_value;
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
-	return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits));
+	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+
+	return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
 }
 
 static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY;
-	/*
-	 * Access-disable disables writes too so we need to check
-	 * both bits here.
-	 */
-	return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits));
+	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
+
+	/* Access-disable disables writes too so check both bits here. */
+	return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
 }
 
 static inline u32 read_pkru(void)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 39e1c8626ab9..e525bfee7e07 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1002,19 +1002,19 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
 		return -EINVAL;
 
-	/* Set the bits we need in PKRU: */
+	/* Set the bits needed in PKRU: */
 	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKRU_AD_BIT;
+		new_pkru_bits |= PKR_AD_BIT;
 	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKRU_WD_BIT;
+		new_pkru_bits |= PKR_WD_BIT;
 
 	/* Shift the bits in to the correct place in PKRU for pkey: */
-	pkey_shift = pkey * PKRU_BITS_PER_PKEY;
+	pkey_shift = pkey * PKR_BITS_PER_PKEY;
 	new_pkru_bits <<= pkey_shift;
 
 	/* Get old PKRU and mask off any old bits in place: */
 	old_pkru = read_pkru();
-	old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift);
+	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
 
 	/* Write old part along with new part: */
 	write_pkru(old_pkru | new_pkru_bits);
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 7418c367e328..e1527b4619e1 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -110,22 +110,20 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey
 	return vma_pkey(vma);
 }
 
-#define PKRU_AD_MASK(pkey)	(PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY))
-
 /*
  * Make the default PKRU value (at execve() time) as restrictive
  * as possible. This ensures that any threads clone()'d early
  * in the process's lifetime will not accidentally get access
  * to data which is pkey-protected later on.
  */
-u32 init_pkru_value = PKRU_AD_MASK( 1) | PKRU_AD_MASK( 2) |
-		      PKRU_AD_MASK( 3) | PKRU_AD_MASK( 4) |
-		      PKRU_AD_MASK( 5) | PKRU_AD_MASK( 6) |
-		      PKRU_AD_MASK( 7) | PKRU_AD_MASK( 8) |
-		      PKRU_AD_MASK( 9) | PKRU_AD_MASK(10) |
-		      PKRU_AD_MASK(11) | PKRU_AD_MASK(12) |
-		      PKRU_AD_MASK(13) | PKRU_AD_MASK(14) |
-		      PKRU_AD_MASK(15);
+u32 init_pkru_value = PKR_AD_MASK( 1) | PKR_AD_MASK( 2) |
+		      PKR_AD_MASK( 3) | PKR_AD_MASK( 4) |
+		      PKR_AD_MASK( 5) | PKR_AD_MASK( 6) |
+		      PKR_AD_MASK( 7) | PKR_AD_MASK( 8) |
+		      PKR_AD_MASK( 9) | PKR_AD_MASK(10) |
+		      PKR_AD_MASK(11) | PKR_AD_MASK(12) |
+		      PKR_AD_MASK(13) | PKR_AD_MASK(14) |
+		      PKR_AD_MASK(15);
 
 static ssize_t init_pkru_read_file(struct file *file, char __user *user_buf,
 			     size_t count, loff_t *ppos)
@@ -158,7 +156,7 @@ static ssize_t init_pkru_write_file(struct file *file,
 	 * up immediately if someone attempts to disable access
 	 * or writes to pkey 0.
 	 */
-	if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT))
+	if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT))
 		return -EINVAL;
 
 	WRITE_ONCE(init_pkru_value, new_init_pkru);
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 04/44] x86/fpu: Refactor arch_set_user_pkey_access()
Date: Tue, 19 Apr 2022 10:06:09 -0700
Message-Id: <20220419170649.1022246-5-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Both PKU and PKS update their register values in the same way.  They can
therefore share the update code.
Define a helper, pkey_update_pkval(), which will be used to support both
Protection Key User (PKU) and the new Protection Key for Supervisor (PKS)
in subsequent patches.

pkey_update_pkval() was contributed by Thomas.

Acked-by: Dave Hansen
Co-developed-by: Thomas Gleixner
Signed-off-by: Thomas Gleixner
Signed-off-by: Ira Weiny
---
Update for V8:
	From Rick Edgecombe
		Change pkey type to u8
	Replace the code Peter provided in update_pkey_reg() for
	Thomas' pkey_update_pkval()
		-- https://lore.kernel.org/lkml/20200717085442.GX10769@hirez.programming.kicks-ass.net/
---
 arch/x86/include/asm/pkeys.h |  2 ++
 arch/x86/kernel/fpu/xstate.c | 22 ++++------------------
 arch/x86/mm/pkeys.c          | 16 ++++++++++++++++
 3 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 1d5f14aff5f6..26616cbe19e2 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -131,4 +131,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
 }
 
+u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits);
+
 #endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e525bfee7e07..ea9207b12863 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -984,8 +984,7 @@ void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 			      unsigned long init_val)
 {
-	u32 old_pkru, new_pkru_bits = 0;
-	int pkey_shift;
+	u32 pkru;
 
 	/*
 	 * This check implies XSAVE support. OSPKE only gets
@@ -1002,22 +1001,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	if (WARN_ON_ONCE(pkey >= arch_max_pkey()))
 		return -EINVAL;
 
-	/* Set the bits needed in PKRU: */
-	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKR_AD_BIT;
-	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKR_WD_BIT;
-
-	/* Shift the bits in to the correct place in PKRU for pkey: */
-	pkey_shift = pkey * PKR_BITS_PER_PKEY;
-	new_pkru_bits <<= pkey_shift;
-
-	/* Get old PKRU and mask off any old bits in place: */
-	old_pkru = read_pkru();
-	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
-
-	/* Write old part along with new part: */
-	write_pkru(old_pkru | new_pkru_bits);
+	pkru = read_pkru();
+	pkru = pkey_update_pkval(pkru, pkey, init_val);
+	write_pkru(pkru);
 
 	return 0;
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index e1527b4619e1..7c90b2188c5f 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -193,3 +193,19 @@ static __init int setup_init_pkru(char *opt)
 	return 1;
 }
 __setup("init_pkru=", setup_init_pkru);
+
+/*
+ * Kernel users use the same flags as user space:
+ *     PKEY_DISABLE_ACCESS
+ *     PKEY_DISABLE_WRITE
+ */
+u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits)
+{
+	int shift = pkey * PKR_BITS_PER_PKEY;
+
+	if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
+		accessbits &= PKEY_ACCESS_MASK;
+
+	pkval &= ~(PKEY_ACCESS_MASK << shift);
+	return pkval | accessbits << shift;
+}
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 05/44] mm/pkeys: Add Kconfig options for PKS
Date: Tue, 19 Apr 2022 10:06:10 -0700
Message-Id: <20220419170649.1022246-6-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Consumers wishing to implement additional protections on memory pages can
use PKS.  However, PKS is only available on some architectures.

For this reason, PKS code, both in the core and in the consumers, is dead
code unless PKS is both available and used.  Add Kconfig options to allow
for the elimination of unneeded code by detecting architecture PKS support
(ARCH_HAS_SUPERVISOR_PKEYS) and requiring an indication of consumer need
(ARCH_ENABLE_SUPERVISOR_PKEYS).

In this patch ARCH_ENABLE_SUPERVISOR_PKEYS remains off until the first
kernel consumer sets it.

Cc: "Moger, Babu"
Signed-off-by: Ira Weiny
---
Changes for V9
	Dave Hansen
		Don't exclude AMD, CPU supported bits will properly turn
		the feature off.
		Clarify commit message
		Depend on CPU_SUP_INTEL

Changes for V8
	Split this out to a single change patch
---
 arch/x86/Kconfig | 1 +
 mm/Kconfig       | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b0142e01002e..c53deda2ea25 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1879,6 +1879,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_PKEYS
+	select ARCH_HAS_SUPERVISOR_PKEYS
 	help
 	  Memory Protection Keys provides a mechanism for enforcing
 	  page-based protections, but without requiring modification of the
diff --git a/mm/Kconfig b/mm/Kconfig
index 034d87953600..29c272974aa9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -821,6 +821,10 @@ config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
 	bool
+config ARCH_HAS_SUPERVISOR_PKEYS
+	bool
+config ARCH_ENABLE_SUPERVISOR_PKEYS
+	bool
 
 config PERCPU_STATS
 	bool "Collect percpu memory statistics"
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 06/44] x86/pkeys: Add PKS CPU feature bit
Date: Tue, 19 Apr 2022 10:06:11 -0700
Message-Id: <20220419170649.1022246-7-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Memory Protection Keys (pkeys) provides a mechanism for enforcing
page-based protections, but without requiring modification of the page
tables when an application changes protection domains.

The supervisor support for memory protection keys is referred to as PKS
(Protection Keys Supervisor).

Add the defines for the CPU support bit and the boilerplate disable
infrastructure predicated on the new ARCH_ENABLE_SUPERVISOR_PKEYS Kconfig
option.

Signed-off-by: Ira Weiny
---
Changes for V9
	Dave Hansen
		New commit message

Changes for V8
	Split this out into its own patch
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 73e643ae94b6..a98a9aa2b845 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -368,6 +368,7 @@
 #define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
 #define X86_FEATURE_ENQCMD		(16*32+29) /* ENQCMD and ENQCMDS instructions */
 #define X86_FEATURE_SGX_LC		(16*32+30) /* Software Guard Extensions Launch Control */
+#define X86_FEATURE_PKS			(16*32+31) /* Protection Keys for Supervisor pages */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 1231d63f836d..cc73453bc218 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+# define DISABLE_PKS		0
+#else
+# define DISABLE_PKS		(1<<(X86_FEATURE_PKS & 31))
+#endif
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -88,7 +94,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_PKS)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 07/44] x86/fault: Adjust WARN_ON for pkey fault
Date: Tue, 19 Apr 2022 10:06:12 -0700
Message-Id: <20220419170649.1022246-8-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Previously, a protection key fault on a kernel address indicated that
something was wrong, because user page mappings are not supposed to be
in the kernel address space.

With the addition of PKS, pkey faults may now happen on kernel mappings.

If PKS is enabled, avoid the warning in the fault path.  Simplify the
comment.

Cc: Sean Christopherson
Cc: Dan Williams
Signed-off-by: Ira Weiny
---
Changes for V9
	From Dave Hansen
		Clarify the comment and commit message
---
 arch/x86/mm/fault.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d0074c6ed31a..5599109d1124 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1148,11 +1148,11 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		   unsigned long address)
 {
 	/*
-	 * Protection keys exceptions only happen on user pages.  We
-	 * have no user pages in the kernel portion of the address
-	 * space, so do not expect them here.
+	 * X86_PF_PK faults should only occur on kernel
+	 * addresses when supervisor pkeys are enabled.
 	 */
-	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+	WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) &&
+		     (hw_error_code & X86_PF_PK));
 
 #ifdef CONFIG_X86_32
 	/*
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 08/44] Documentation/pkeys: Add initial PKS documentation
Date: Tue, 19 Apr 2022 10:06:13 -0700
Message-Id: <20220419170649.1022246-9-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Add initial overview and configuration information about PKS.

Cc: "Moger, Babu"
Signed-off-by: Ira Weiny
---
Changes for V9
	Feedback from Dave Hansen
		Remove overview and move relevant text to the main pkey
		overview, which covers both user and kernel keys.
		Add an example of using Kconfig
		Move MSR details to later patches
---
 Documentation/core-api/protection-keys.rst | 43 ++++++++++++++++++++--
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index bf28ac0401f3..13eedb0119e1 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -13,6 +13,11 @@ Pkeys Userspace (PKU) is a feature which can be found on:
  * Intel client CPUs, Tiger Lake (11th Gen Core) and later
  * Future AMD CPUs
 
+Protection Keys Supervisor (PKS) is a feature which can be found on:
+ * Sapphire Rapids (and later) "Scalable Processor" Server CPUs
+ * Future non-server Intel parts.
+ * qemu: https://www.qemu.org/2021/04/30/qemu-6-0-0/
+
 Pkeys work by dedicating 4 previously Reserved bits in each page table entry to a
 "protection key", giving 16 possible keys.
 
@@ -23,13 +28,20 @@ and Write Disable) for each of 16 keys.
 Being a CPU register, PKRU is inherently thread-local, potentially giving each
 thread a different set of protections from every other thread.
 
-There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
-register.  The feature is only available in 64-bit mode, even though there is
+For Userspace (PKU), there are two instructions (RDPKRU/WRPKRU) for reading and
+writing to the register.
+
+For Supervisor (PKS), the register (MSR_IA32_PKRS) is accessible only to the
+kernel through rdmsr and wrmsr.
+
+The feature is only available in 64-bit mode, even though there is
 theoretically space in the PAE PTEs.  These permissions are enforced on data
 access only and have no effect on instruction fetches.
 
-Syscalls
-========
+
+
+Syscalls for user space keys
+============================
 
 There are 3 system calls which directly interact with pkeys::
 
@@ -96,3 +108,26 @@ with a read()::
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+Kconfig
+-------
+
+Kernel users intending to use PKS support should depend on
+ARCH_HAS_SUPERVISOR_PKEYS, and select ARCH_ENABLE_SUPERVISOR_PKEYS to turn on
+this support within the core.  For example:
+
+.. code-block:: c
+
+        config MY_NEW_FEATURE
+                depends on ARCH_HAS_SUPERVISOR_PKEYS
+                select ARCH_ENABLE_SUPERVISOR_PKEYS
+
+This will make "MY_NEW_FEATURE" unavailable unless the architecture sets
+ARCH_HAS_SUPERVISOR_PKEYS.  It also makes it possible for multiple independent
+features to "select ARCH_ENABLE_SUPERVISOR_PKEYS".  If no features enable PKS
+by selecting ARCH_ENABLE_SUPERVISOR_PKEYS, PKS support will not be compiled
+into the kernel.
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 09/44] mm/pkeys: Provide for PKS key allocation
Date: Tue, 19 Apr 2022 10:06:14 -0700
Message-Id: <20220419170649.1022246-10-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Kernel consumers of PKS need a way to allocate a PKS pkey and assign the
initial permissions for that key.  It is desirable not to allocate keys
for consumers which are not configured.

Introduce a macro to allocate keys sequentially based on which consumers
are configured.  In addition, define a macro to set the proper permission
bits based on the actual pkey value allocated.

pks-keys.h is added as a new header with minimal header dependencies.
This allows PKS_INIT_VALUE to be used within other headers where the
additional includes from other pkey headers caused major conflicts.  The
main conflict was using PKS_INIT_VALUE for INIT_THREAD in
asm/processor.h.

Add documentation.
Suggested-by: Dan Williams
Signed-off-by: Ira Weiny
---
Changes for V9
	Reword the commit message
	Move this patch ahead of the enable patch so that the enable
	patch can use PKS_INIT_VALUE
	From Dan Williams
		Use Dan's macro magic, enhanced to account for the max
		number of keys
		Update documentation for the change
	From Dave Hansen
		use pkey
		s/PKR_RW_KEY/PKR_RW_MASK

Changes for V8
	Create pks-keys.h to solve header conflicts in subsequent
	patches.
	Remove create_initial_pkrs_value() which did not work
		Replace it with PKS_INIT_VALUE
		Fix up documentation to match
	s/PKR_RW_BIT/PKR_RW_KEY()/
	s/PKRS_INIT_VALUE/PKS_INIT_VALUE
	Split this off of the previous patch
	Update documentation and embed it in the code to help ensure it
	is kept up to date.

Changes for V7
	Create a dynamic pkrs_initial_value in early init code.
	Clean up comments
	Add comment to macro guard
---
 Documentation/core-api/protection-keys.rst |  5 ++
 arch/x86/include/asm/pkeys_common.h        |  9 ++-
 include/linux/pks-keys.h                   | 78 ++++++++++++++++++++++
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pks-keys.h

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index 13eedb0119e1..d501bd27ee29 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -131,3 +131,8 @@ ARCH_HAS_SUPERVISOR_PKEYS.  It also makes it possible for multiple independent
 features to "select ARCH_ENABLE_SUPERVISOR_PKEYS".  If no features enable PKS
 by selecting ARCH_ENABLE_SUPERVISOR_PKEYS, PKS support will not be compiled
 into the kernel.
+
+PKS Key Allocation
+------------------
+.. kernel-doc:: include/linux/pks-keys.h
+   :doc: PKS_KEY_ALLOCATION
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 359b94cdcc0c..b28a72dea22b 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -2,10 +2,17 @@
 #ifndef _ASM_X86_PKEYS_COMMON_H
 #define _ASM_X86_PKEYS_COMMON_H
 
+#define PKS_NUM_PKEYS		16
+#define PKS_ALL_AD		(0x55555555UL)
+
 #define PKR_AD_BIT 0x1u
 #define PKR_WD_BIT 0x2u
 #define PKR_BITS_PER_PKEY 2
 
-#define PKR_AD_MASK(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+#define PKR_PKEY_SHIFT(pkey)	((pkey) * PKR_BITS_PER_PKEY)
+
+#define PKR_RW_MASK(pkey)	(0          << PKR_PKEY_SHIFT(pkey))
+#define PKR_AD_MASK(pkey)	(PKR_AD_BIT << PKR_PKEY_SHIFT(pkey))
+#define PKR_WD_MASK(pkey)	(PKR_WD_BIT << PKR_PKEY_SHIFT(pkey))
 
 #endif /*_ASM_X86_PKEYS_COMMON_H */
diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h
new file mode 100644
index 000000000000..c914afecb2d3
--- /dev/null
+++ b/include/linux/pks-keys.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PKS_KEYS_H
+#define _LINUX_PKS_KEYS_H
+
+/*
+ * The contents of this header should be limited to assigning PKS keys and
+ * default values to avoid intricate header dependencies.
+ */
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+#include
+
+#define PKS_NEW_KEY(prev, config) \
+	(prev + __is_defined(config))
+#define PKS_DECLARE_INIT_VALUE(pkey, value, config) \
+	(PKR_##value##_MASK(pkey) * __is_defined(config))
+
+/**
+ * DOC: PKS_KEY_ALLOCATION
+ *
+ * Users reserve a key value in 5 steps.
+ *	1) Use PKS_NEW_KEY to create a new key
+ *	2) Ensure that the last key value is specified in the PKS_NEW_KEY macro
+ *	3) Adjust PKS_KEY_MAX to use the newly defined key value
+ *	4) Use PKS_DECLARE_INIT_VALUE to define an initial value
+ *	5) Add the new PKS default value to PKS_INIT_VALUE
+ *
+ * The PKS_NEW_KEY and PKS_DECLARE_INIT_VALUE macros require the Kconfig
+ * option to be specified to automatically adjust the number of keys used.
+ *
+ * PKS_KEY_DEFAULT must remain 0 with a default of PKS_DECLARE_INIT_VALUE(...,
+ * RW, ...) to support non-PKS-protected pages.
+ *
+ * Example: to configure a key for 'MY_FEATURE' with a default of Write
+ * Disabled.
+ *
+ * .. code-block:: c
+ *
+ *	#define PKS_KEY_DEFAULT 0
+ *
+ *	// 1) Use PKS_NEW_KEY to create a new key
+ *	// 2) Ensure that the last key value is specified (e.g. PKS_KEY_DEFAULT)
+ *	#define PKS_KEY_MY_FEATURE PKS_NEW_KEY(PKS_KEY_DEFAULT, CONFIG_MY_FEATURE)
+ *
+ *	// 3) Adjust PKS_KEY_MAX
+ *	#define PKS_KEY_MAX PKS_NEW_KEY(PKS_KEY_MY_FEATURE, 1)
+ *
+ *	// 4) Define initial value
+ *	#define PKS_KEY_MY_FEATURE_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_MY_FEATURE, \
+ *								WD, CONFIG_MY_FEATURE)
+ *
+ *
+ *	// 5) Add initial value to PKS_INIT_VALUE
+ *	#define PKS_INIT_VALUE ((PKS_ALL_AD & PKS_ALL_AD_MASK) | \
+ *				PKS_KEY_DEFAULT_INIT | \
+ *				PKS_KEY_MY_FEATURE_INIT \
+ *				)
+ */
+
+/* PKS_KEY_DEFAULT must be 0 */
+#define PKS_KEY_DEFAULT		0
+#define PKS_KEY_MAX		PKS_NEW_KEY(PKS_KEY_DEFAULT, 1)
+
+/* PKS_KEY_DEFAULT_INIT must be RW */
+#define PKS_KEY_DEFAULT_INIT	PKS_DECLARE_INIT_VALUE(PKS_KEY_DEFAULT, RW, 1)
+
+#define PKS_ALL_AD_MASK \
+	GENMASK(PKS_NUM_PKEYS * PKR_BITS_PER_PKEY, \
+		PKS_KEY_MAX * PKR_BITS_PER_PKEY)
+
+#define PKS_INIT_VALUE ((PKS_ALL_AD & PKS_ALL_AD_MASK) | \
+			PKS_KEY_DEFAULT_INIT \
+			)
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#endif /* _LINUX_PKS_KEYS_H */
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 10/44] x86/pkeys: Enable PKS on cpus which support it
Date: Tue, 19 Apr 2022 10:06:15 -0700
Message-Id: <20220419170649.1022246-11-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

Protection Keys for Supervisor pages (PKS) enables fast,
hardware-thread-specific manipulation of permission restrictions on
supervisor page mappings.  It uses a supervisor-specific MSR to assign
permissions to the pkeys.

When PKS is configured and the CPU supports PKS, initialize the MSR and
enable the hardware.

Add asm/pks.h to store new internal functions and structures such as
pks_setup().

Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Signed-off-by: Ira Weiny
---
Changes for V10
	Update to latest master branch

Changes for V9
	Reword commit message
	Move this after the patch defining PKS_INIT_VALUE

Changes for V8
	Move setup_pks() into this patch with a default of all access
	for all pkeys.
	From Thomas
		s/setup_pks/pks_setup/
	Update Change log to better reflect exactly what this patch does.
---
 arch/x86/include/asm/msr-index.h            |  1 +
 arch/x86/include/asm/pks.h                  | 15 +++++++++++++++
 arch/x86/include/uapi/asm/processor-flags.h |  2 ++
 arch/x86/kernel/cpu/common.c                |  2 ++
 arch/x86/mm/pkeys.c                         | 17 +++++++++++++++++
 5 files changed, 37 insertions(+)
 create mode 100644 arch/x86/include/asm/pks.h

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ee15311b6be1..e8e33b5ed507 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -809,6 +809,7 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+#define MSR_IA32_PKRS			0x000006E1
 
 #define MSR_TSX_FORCE_ABORT		0x0000010F
 
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
new file mode 100644
index 000000000000..8180fc59790b
--- /dev/null
+++ b/arch/x86/include/asm/pks.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PKS_H
+#define _ASM_X86_PKS_H
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void pks_setup(void);
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+static inline void pks_setup(void) { }
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#endif /* _ASM_X86_PKS_H */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index c47cc7f2feeb..21b7783885b3 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -132,6 +132,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_PKS_BIT		24 /* enable Protection Keys for Supervisor */
+#define X86_CR4_PKS		_BITUL(X86_CR4_PKS_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e342ae4db3c4..4c0623783bd8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -60,6 +60,7 @@
 #include
 #include
 #include
+#include
 
 #include "cpu.h"
 
@@ -1764,6 +1765,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	x86_init_rdrand(c);
 	setup_pku(c);
 	setup_cet(c);
+	pks_setup();
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 7c90b2188c5f..f904376570f4 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -6,6 +6,7 @@
 #include /* debugfs_create_u32() */
 #include /* mm_struct, vma, etc... */
 #include /* PKEY_* */
+#include
 #include
 
 #include /* boot_cpu_has, ... */
@@ -209,3 +210,19 @@ u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits)
 	pkval &= ~(PKEY_ACCESS_MASK << shift);
 	return pkval | accessbits << shift;
 }
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+/*
+ * PKS is independent of PKU and either or both may be supported on a CPU.
+ */
+void pks_setup(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	wrmsrl(MSR_IA32_PKRS, PKS_INIT_VALUE);
+	cr4_set_bits(X86_CR4_PKS);
+}
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 11/44] mm/pkeys: Define PKS page table macros
Date: Tue, 19 Apr 2022 10:06:16 -0700
Message-Id: <20220419170649.1022246-12-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Fenghua Yu

Kernel PKS consumers will need a way to assign their pkey to pages.

Define _PAGE_PKEY() and PAGE_KERNEL_PKEY() to allow users to set a pkey
on a PTE.

Add documentation.
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Signed-off-by: Fenghua Yu
---
Changes for V9
	From Dave Hansen
		s/PKey/pkey

Changes for V8
	Split out from the 'Add PKS kernel API' patch
	Include documentation in this patch
---
 Documentation/core-api/protection-keys.rst |  6 ++++++
 arch/x86/include/asm/pgtable_types.h       | 22 ++++++++++++++++++++++
 include/linux/pgtable.h                    |  4 ++++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index d501bd27ee29..fe63acf5abbe 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -136,3 +136,9 @@ PKS Key Allocation
 ------------------
 .. kernel-doc:: include/linux/pks-keys.h
    :doc: PKS_KEY_ALLOCATION
+
+Adding pages to a pkey protected domain
+---------------------------------------
+
+.. kernel-doc:: arch/x86/include/asm/pgtable_types.h
+   :doc: PKS_KEY_ASSIGNMENT
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..e1d4535b525e 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -71,6 +71,22 @@
 			 _PAGE_PKEY_BIT2 | \
 			 _PAGE_PKEY_BIT3)
 
+/**
+ * DOC: PKS_KEY_ASSIGNMENT
+ *
+ * The following macros are used to set a pkey value in a supervisor PTE.
+ *
+ * .. code-block:: c
+ *
+ *         #define _PAGE_PKEY(pkey)
+ *         #define PAGE_KERNEL_PKEY(pkey)
+ */
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define _PAGE_PKEY(pkey)	(_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0)
+#else
+#define _PAGE_PKEY(pkey)	(_AT(pteval_t, 0))
+#endif
+
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
 #else
@@ -226,6 +242,12 @@ enum page_cache_mode {
 #define PAGE_KERNEL_IO			__pgprot_mask(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE		__pgprot_mask(__PAGE_KERNEL_IO_NOCACHE)
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define PAGE_KERNEL_PKEY(pkey)	__pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey))
+#else
+#define PAGE_KERNEL_PKEY(pkey)	PAGE_KERNEL
+#endif
+
 #endif	/* __ASSEMBLY__ */
 
 /* xwr */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f4f4077b97aa..bcef6b306fcb 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1511,6 +1511,10 @@ static inline bool arch_has_pfn_modify_check(void)
 # define PAGE_KERNEL_EXEC PAGE_KERNEL
 #endif
 
+#ifndef PAGE_KERNEL_PKEY
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
 /*
  * Page Table Modification bits for pgtbl_mod_mask.
* --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D034C433F5 for ; Tue, 19 Apr 2022 17:08:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355902AbiDSRLF (ORCPT ); Tue, 19 Apr 2022 13:11:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355655AbiDSRJ4 (ORCPT ); Tue, 19 Apr 2022 13:09:56 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 006F611A30 for ; Tue, 19 Apr 2022 10:07:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388030; x=1681924030; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=99E8DhfEk0Ekd5my6kcIYaxIlJz7qBZKJBWjSrZUA94=; b=N+RwRk573jPr9h/dhD2px3bnJdJm5iKkO9dtX0u2FQa4MXAqAULgfbY2 +J5e7ABfWZIsgqH3qetCl2Z3QO9QHireuEGnBlpRSI6VF69xrtmjUvfy2 5SAdVO1iPHnS7YyylX+Iwq7p150PSqwpn1LlvcBay82hXtp5l8LqvOT5K HkdgmBcwmAfnlpQaLh/A+cLjbKrkeLPZBF19KzKwzIbODT/NyGR2A3Mci Z5+MPR0iIPG2oUqGEsWmugqYCfLyGZfKH0jUuYXe5KrVqDply9YcmXHNl FR7ww7dFVinAZvPYrZNdXXDXOr9AE+gh29UTJPY+IKr4YrsVxYrFztu5C w==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="261420786" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="261420786" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:01 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="657714575" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by fmsmga002-auth.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:01 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 12/44] x86/pkeys: Introduce pks_write_pkrs() Date: Tue, 19 Apr 2022 10:06:17 -0700 Message-Id: <20220419170649.1022246-13-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Writing to MSRs is inefficient. Even though the underlying PKS register, MSR_IA32_PKRS, is not serializing, writes to it should be avoided where possible, especially when updates are made in critical paths such as the scheduler or the entry code. Introduce pks_write_pkrs(), which avoids writing MSR_IA32_PKRS if the pkrs value has not changed for the current CPU. Most callers are in non-preemptible code paths. Therefore, do not call preempt_{disable,enable}() to protect the per-cpu cache; instead, rely on the callers for this protection. Likewise, leave the X86_FEATURE_PKS check to the callers. On startup, while unlikely, PKS_INIT_VALUE may be 0. Because the per-cpu cache also starts out as 0, pks_write_pkrs() would then skip the initial write of the MSR. Therefore, keep the unconditional MSR write in pks_setup() to ensure the MSR is initialized at least once. Suggested-by: Dave Hansen Signed-off-by: Ira Weiny --- Changes for V9 From Dave Hansen Update commit message with a bit more detail about why this optimization is needed Update the code comments as well. Changes for V8 From Thomas Remove get/put_cpu_ptr() and make this a 'lower level' call.
This makes it preemption unsafe, but it is called mostly where preemption is already disabled. Make this a precondition of the call; callers that need to can disable preemption. Add lockdep assert for preemption Ensure MSR gets written even if the PKS_INIT_VALUE is 0. Completely re-write the commit message. s/write_pkrs/pks_write_pkrs/ Split this off into a singular patch Changes for V7 Create a dynamic pkrs_initial_value in early init code. Clean up comments Add comment to macro guard --- arch/x86/mm/pkeys.c | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index f904376570f4..10521f1a292e 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -213,15 +213,56 @@ u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits) #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +static DEFINE_PER_CPU(u32, pkrs_cache); + +/* + * pks_write_pkrs() - Write the pkrs of the current CPU + * @new_pkrs: New value to write to the current CPU register + * + * Optimizes the MSR writes by maintaining a per cpu cache. + * + * Context: must be called with preemption disabled + * Context: must only be called if PKS is enabled + * + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not + * serializing but still maintains ordering properties similar to WRPKRU. + * The current SDM section on PKRS needs updating but should be the same as + * that of WRPKRU. Quote from the WRPKRU text: + * + * WRPKRU will never execute transiently. Memory accesses + * affected by PKRU register will not execute (even transiently) + * until all prior executions of WRPKRU have completed execution + * and updated the PKRU register.
+ */ +static inline void pks_write_pkrs(u32 new_pkrs) +{ + u32 pkrs = __this_cpu_read(pkrs_cache); + + lockdep_assert_preemption_disabled(); + + if (pkrs != new_pkrs) { + __this_cpu_write(pkrs_cache, new_pkrs); + wrmsrl(MSR_IA32_PKRS, new_pkrs); + } +} + /* * PKS is independent of PKU and either or both may be supported on a CPU. + * + * Context: must be called with preemption disabled */ void pks_setup(void) { if (!cpu_feature_enabled(X86_FEATURE_PKS)) return; + /* + * If the PKS_INIT_VALUE is 0 then pks_write_pkrs() will fail to + * initialize the MSR. Do a single write here to ensure the MSR is + * written at least one time. + */ wrmsrl(MSR_IA32_PKRS, PKS_INIT_VALUE); + pks_write_pkrs(PKS_INIT_VALUE); cr4_set_bits(X86_CR4_PKS); } -- 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4272BC433FE for ; Tue, 19 Apr 2022 17:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355860AbiDSRKt (ORCPT ); Tue, 19 Apr 2022 13:10:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60038 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355633AbiDSRJz (ORCPT ); Tue, 19 Apr 2022 13:09:55 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7D6525CE for ; Tue, 19 Apr 2022 10:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388028; x=1681924028; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=IUlIcjFdu0Dobv9fAtc/uX0neIYthgVEegS00Ix8hIg=; b=dmRqoDbZnfQk/G6/1nJXiMymlJgMQek+SuumVgsn2r2VhIuwmHipEA9T
/DLzozVGT9inA1kFTt1DVgWe1U9A1USJpQty28uhuSsZFYkmHmozidkT+ OHxOrpiaqb3ewejSv0eFfxnyBGo61/dZCdYeMkijxKPcS4xHBjao0WgOo dDxqgGnvNSF+thtRYC6L8iNGpeW1lGIngq+VAnQKBn6O6PkvmQxdorrim X+0qao69hlsOLtQ7MiaotWEnhs13UqJmzY4qQdiSF0CilE62lizTY/bQE qD2IVANSaGZNoQlW4QiFMJHYBy6VqidGEgNdX+dKTZyOawWI+Fp3629/8 A==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="324261459" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="324261459" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:02 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="561781853" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:01 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 13/44] x86/pkeys: Preserve the PKS MSR on context switch Date: Tue, 19 Apr 2022 10:06:18 -0700 Message-Id: <20220419170649.1022246-14-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The PKS MSR (PKRS) is a per-logical-processor register. Unfortunately, the MSR is not managed by XSAVE. Therefore, software must save/restore the MSR value on context switch. Allocate space in thread_struct to hold the saved MSR value. Ensure all tasks, including the init_task are properly initialized. Set the CPU PKRS value when a task is scheduled. 
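To make the save/restore flow concrete, here is a user-space model of it, combined with the per-CPU write cache that pks_write_pkrs() introduces. It is a hypothetical sketch, not kernel code: fake_wrmsr(), the single global `pkrs_cache` standing in for the per-CPU variable, the `msr_writes` counter and SKETCH_PKS_INIT_VALUE are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical init value; the series defines the real PKS_INIT_VALUE elsewhere. */
#define SKETCH_PKS_INIT_VALUE 0x55555554u

/* Models the new thread_struct::pkrs field. */
struct thread_sketch {
	uint32_t pkrs;
};

static uint32_t pkrs_cache;     /* models the per-CPU pkrs_cache; starts at 0 */
static uint32_t pkrs_msr;       /* models MSR_IA32_PKRS on a single CPU */
static unsigned int msr_writes; /* counts simulated WRMSR instructions */

static void fake_wrmsr(uint32_t val)
{
	pkrs_msr = val;
	msr_writes++;
}

/* Models pks_write_pkrs(): skip the expensive write if nothing changed. */
static void sketch_write_pkrs(uint32_t new_pkrs)
{
	if (pkrs_cache != new_pkrs) {
		pkrs_cache = new_pkrs;
		fake_wrmsr(new_pkrs);
	}
}

/*
 * Models pks_setup(): write the MSR unconditionally first, because if the
 * init value were 0 it would equal the zeroed cache and the cached write
 * below would be skipped.
 */
static void sketch_pks_setup(void)
{
	fake_wrmsr(SKETCH_PKS_INIT_VALUE);
	sketch_write_pkrs(SKETCH_PKS_INIT_VALUE);
	assert(pkrs_cache == pkrs_msr); /* cache and register now agree */
}

/* Models x86_pkrs_load(next) as called from __switch_to(). */
static void sketch_pkrs_load(const struct thread_sketch *next)
{
	sketch_write_pkrs(next->pkrs);
}
```

In this model, switching between two tasks that hold the same saved value costs no MSR write at all; only a change in the incoming task's value reaches the (simulated) register.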
Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- Changes for V9 From Dave Hansen Clarify the commit message s/pks_saved_pkrs/pkrs/ s/pks_write_current/x86_pkrs_load/ Change x86_pkrs_load to take the next thread instead of 'current' Changes for V8 From Thomas Ensure pkrs_write_current() does not suffer the overhead of preempt disable. Fix setting of initial value Remove flawed and broken create_initial_pkrs_value() in favor of a much simpler and robust macro default Update function names to be consistent. s/pkrs_write_current/pks_write_current This is a more consistent name s/saved_pkrs/pks_saved_pkrs s/pkrs_init_value/PKS_INIT_VALUE Remove pks_init_task() This function was added mainly to avoid the header file issue. Adding pks-keys.h solved that and saves the complexity. Changes for V7 Move definitions from asm/processor.h to asm/pks.h s/INIT_PKRS_VALUE/pkrs_init_value Change pks_init_task()/pks_sched_in() to functions s/pks_sched_in/pks_write_current to be used more generically later in the series --- arch/x86/include/asm/pks.h | 2 ++ arch/x86/include/asm/processor.h | 15 ++++++++++++++- arch/x86/kernel/process_64.c | 2 ++ arch/x86/mm/pkeys.c | 9 +++++++++ 4 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index 8180fc59790b..a7bad7301783 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -5,10 +5,12 @@ #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS void pks_setup(void); +void x86_pkrs_load(struct thread_struct *thread); #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ static inline void pks_setup(void) { } +static inline void x86_pkrs_load(struct thread_struct *thread) { } #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 91d0f93a00c7..d52970816594 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@
-2,6 +2,8 @@ #ifndef _ASM_X86_PROCESSOR_H #define _ASM_X86_PROCESSOR_H +#include + #include /* Forward declaration, a strange C thing */ @@ -529,6 +531,10 @@ struct thread_struct { * PKRU is the hardware itself. */ u32 pkru; +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + /* Saved Protection key register for supervisor mappings */ + u32 pkrs; +#endif /* Floating point and extended processor state */ struct fpu fpu; @@ -771,7 +777,14 @@ static inline void spin_lock_prefetch(const void *x) #define KSTK_ESP(task) (task_pt_regs(task)->sp) #else -#define INIT_THREAD { } + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +#define INIT_THREAD { \ + .pkrs = PKS_INIT_VALUE, \ +} +#else +#define INIT_THREAD { } +#endif extern unsigned long KSTK_ESP(struct task_struct *task); diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index e459253649be..5cfa1f8c8465 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -59,6 +59,7 @@ /* Not included via unistd.h */ #include #endif +#include #include "process.h" @@ -612,6 +613,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) x86_fsgsbase_load(prev, next); x86_pkru_load(prev, next); + x86_pkrs_load(next); /* * Switch the PDA and FPU contexts. diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 10521f1a292e..39e4c2cbc279 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -246,6 +246,15 @@ static inline void pks_write_pkrs(u32 new_pkrs) } } +/* x86_pkrs_load() - Update CPU with the incoming thread pkrs value */ +void x86_pkrs_load(struct thread_struct *thread) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + pks_write_pkrs(thread->pkrs); +} + /* * PKS is independent of PKU and either or both may be supported on a CPU.
* --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A4A7C43217 for ; Tue, 19 Apr 2022 17:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355893AbiDSRKy (ORCPT ); Tue, 19 Apr 2022 13:10:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59840 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355604AbiDSRJr (ORCPT ); Tue, 19 Apr 2022 13:09:47 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2E2762C2 for ; Tue, 19 Apr 2022 10:07:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388022; x=1681924022; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yN0yRbdkpZNX+e/QEdD0Iwhz8NeQzBwyhB1u8C+FbRs=; b=Jmmu8xTpkoaKgQlbIMTn/N+SAyUxEEjebGUVMl/MTjNaYnK3BsSns0gv hI/GS4kIRmx5vnlViONryVhvFblJnQ0le/qmMIbGmas51O1+kxgkoQFBk kB6M536ZqovX7NFGqueOFYJz9dhE4oK/XX1QUR0jv7YRyZT4iNyd5BQOO 9do84tEHbN3BxcRLptK0SnwIZPfbmqBNmzaGxgSNCEfvPpLRmOrb/SQpz WwbRz4GsGwrolbsnoYxTr98ujcCE/dJC2NsoDF2uC/A/xzy+tpXtLJrKu jVKb1cShtxxNF8UaSAyL4Moocf7aNtEW5mKfKeSaesrd1uoWm7Iazvy2Z g==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="263991825" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="263991825" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:02 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="861588291" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by fmsmga005-auth.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:02 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 14/44] mm/pkeys: Introduce pks_set_readwrite() Date: Tue, 19 Apr 2022 10:06:19 -0700 Message-Id: <20220419170649.1022246-15-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny When kernel code needs access to a PKS-protected page, it must change the protections for the pkey to Read/Write. Define pks_set_readwrite() to update the specified pkey. Define pks_update_protection() as a helper that does the heavy lifting and can be reused by subsequent pks_set_*() calls. Define PKEY_READ_WRITE rather than using a magic value of '0' in pks_update_protection(). Finally, disable preemption around pks_write_pkrs() because the calling context cannot generally be predicted. Create pks.h to avoid conflicts and header dependencies with the user space pkey code. Add documentation.
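The per-key update that pks_update_protection() delegates to pkey_update_pkval() can be sketched in user space as follows. This assumes the PKRS layout of two bits per key (access-disable in the low bit, write-disable in the high bit) for 16 keys; SKETCH_PKS_KEY_MAX, current_pkrs and the sketch_* helpers are hypothetical stand-ins, and the real MSR write with preemption disabled is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/uapi/asm-generic/mman-common.h by this series. */
#define PKEY_READ_WRITE     0x0
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2
#define PKEY_ACCESS_MASK    (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)

/* Assumed layout: 2 bits (AD, WD) per key, 16 keys in a 32-bit register. */
#define SKETCH_BITS_PER_PKEY 2
#define SKETCH_PKS_KEY_MAX   16

static uint32_t current_pkrs; /* models current->thread.pkrs */

/* Models pkey_update_pkval(): replace one key's 2-bit field in pkval. */
static uint32_t sketch_update_pkval(uint32_t pkval, uint8_t pkey, uint32_t accessbits)
{
	int shift = pkey * SKETCH_BITS_PER_PKEY;

	assert(pkey < SKETCH_PKS_KEY_MAX); /* callers range-check first */
	accessbits &= PKEY_ACCESS_MASK;
	pkval &= ~((uint32_t)PKEY_ACCESS_MASK << shift);
	return pkval | (accessbits << shift);
}

/* Models pks_update_protection(); the MSR write itself is omitted here. */
static void sketch_update_protection(uint8_t pkey, uint32_t protection)
{
	if (pkey >= SKETCH_PKS_KEY_MAX) /* the kernel WARNs once and returns */
		return;
	current_pkrs = sketch_update_pkval(current_pkrs, pkey, protection);
}

/* Models pks_set_readwrite(). */
static void sketch_set_readwrite(uint8_t pkey)
{
	sketch_update_protection(pkey, PKEY_READ_WRITE);
}
```

For example, making key 2 Read/Write in an all-ones register clears only bits 4 and 5, leaving the other keys' restrictions untouched.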
Signed-off-by: Ira Weiny --- Changes for V9 Move MSR documentation note to this patch Move declarations to include/linux/pks.h From Rick Edgecombe Change pkey type to u8 Validate pkey range in pks_update_protection From 0day Fix documentation link From Dave Hansen s/pks_mk_*/pks_set_*/ Use pkey s/pks_saved_pkrs/pkrs/ Changes for V8 Define PKEY_READ_WRITE Make the call inline Clean up the names Use pks_write_pkrs() with preemption disabled Split this out from 'Add PKS kernel API' Include documentation in this patch --- Documentation/core-api/protection-keys.rst | 15 +++++++++++ arch/x86/mm/pkeys.c | 31 ++++++++++++++++++++++ include/linux/pks.h | 31 ++++++++++++++++++++++ include/uapi/asm-generic/mman-common.h | 1 + 4 files changed, 78 insertions(+) create mode 100644 include/linux/pks.h diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index fe63acf5abbe..3af92e1cbffd 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -142,3 +142,18 @@ Adding pages to a pkey protected domain .. kernel-doc:: arch/x86/include/asm/pgtable_types.h :doc: PKS_KEY_ASSIGNMENT + +Changing permissions of individual keys +--------------------------------------- + +.. kernel-doc:: include/linux/pks.h + :identifiers: pks_set_readwrite + +MSR details +~~~~~~~~~~~ + +WRMSR is typically an architecturally serializing instruction. However, +WRMSR(MSR_IA32_PKRS) is an exception. It is not a serializing instruction and +instead maintains ordering properties similar to WRPKRU. Thus it is safe to +immediately use a mapping once the pks_set*() functions return. Check the +latest SDM for details. diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 39e4c2cbc279..e4cbc79686ea 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -6,6 +6,7 @@ #include /* debugfs_create_u32() */ #include /* mm_struct, vma, etc...
*/ #include /* PKEY_* */ +#include #include #include @@ -275,4 +276,34 @@ void pks_setup(void) cr4_set_bits(X86_CR4_PKS); } +/* + * Do not call this directly, see pks_set*(). + * + * @pkey: Key for the domain to change + * @protection: protection bits to be used + * + * Protection utilizes the same protection bits specified for User pkeys + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + * + */ +void pks_update_protection(u8 pkey, u8 protection) +{ + u32 pkrs; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + if (WARN_ON_ONCE(pkey >= PKS_KEY_MAX)) + return; + + pkrs = current->thread.pkrs; + current->thread.pkrs = pkey_update_pkval(pkrs, pkey, + protection); + preempt_disable(); + pks_write_pkrs(current->thread.pkrs); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(pks_update_protection); + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ diff --git a/include/linux/pks.h b/include/linux/pks.h new file mode 100644 index 000000000000..8b705a937b19 --- /dev/null +++ b/include/linux/pks.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PKS_H +#define _LINUX_PKS_H + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + +#include + +#include + +void pks_update_protection(u8 pkey, u8 protection); + +/** + * pks_set_readwrite() - Make the domain Read/Write + * @pkey: the pkey for which the access should change. + * + * Allow all access, read and write, to the domain specified by pkey. This is + * not a global update and only affects the current running thread.
+ */ +static inline void pks_set_readwrite(u8 pkey) +{ + pks_update_protection(pkey, PKEY_READ_WRITE); +} + +#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +static inline void pks_set_readwrite(u8 pkey) {} + +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +#endif /* _LINUX_PKS_H */ diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6c1aa92a92e4..f179544bd33a 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -80,6 +80,7 @@ /* compatibility flags */ #define MAP_FILE 0 +#define PKEY_READ_WRITE 0x0 #define PKEY_DISABLE_ACCESS 0x1 #define PKEY_DISABLE_WRITE 0x2 #define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\ -- 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C5B6FC433EF for ; Tue, 19 Apr 2022 17:07:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355786AbiDSRKb (ORCPT ); Tue, 19 Apr 2022 13:10:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355552AbiDSRJr (ORCPT ); Tue, 19 Apr 2022 13:09:47 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B5C663FE for ; Tue, 19 Apr 2022 10:07:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388024; x=1681924024; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+5b3IZ2C5Hrv6G06EbPOOwQd6d5B5Nj2B016UVv3rJ8=; b=XqY1aV5zQ9Ph/ZicmT9XrhToK0wx5AhoVazf08z8j32Igsb4ZiJKQT12 Gj6fxqmYtpjiyR3pnTV8Ezd6vgpXAOzWVcsIYSQPDOn73/Pd0mGx3RvJ+
Kph+NAN1y6LBRUvge1RpBbOv+nBH3Gh772MQvBmC0FU37oZymY8uAOA1C MjkyeuZ6scrP2g9K5kEPkmJJKNyR0aqxJ8kvRKwchnGx5X89cks2xuvKc 6ED4DJ/XvCccENTu9EF531BfTNYBSnlmm37N6+pDgSq4flBJU7f2E1WcD g6xdcQiM7ExxMHHZLZoOCK8GFxfrHiLy5WNHdCaMlwY0Ek2vBAQc9IU9L g==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="263991827" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="263991827" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:03 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="861588314" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:02 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 15/44] mm/pkeys: Introduce pks_set_noaccess() Date: Tue, 19 Apr 2022 10:06:20 -0700 Message-Id: <20220419170649.1022246-16-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny After a valid access consumers will want to change PKS protections back to No Access for their pkey. Define pks_set_noaccess() to update the specified pkey. Add documentation. 
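The intended usage is a short bracket: open the domain with pks_set_readwrite(), access the protected pages, then close it again with pks_set_noaccess(). The model below sketches that bracket in user space together with a simplified permission check. SKETCH_PKEY_MY_FEATURE, current_pkrs and the sketch_* helpers are hypothetical, and the check looks only at the AD/WD bits, ignoring any further architectural conditions described in the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_READ_WRITE     0x0
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2
#define PKEY_ACCESS_MASK    (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)

/* Hypothetical key owned by an imaginary consumer. */
#define SKETCH_PKEY_MY_FEATURE 3

static uint32_t current_pkrs; /* models current->thread.pkrs */

/* Replace the 2-bit field for pkey (models pks_update_protection()). */
static void sketch_update_protection(uint8_t pkey, uint32_t prot)
{
	int shift = pkey * 2;

	assert(pkey < 16);
	current_pkrs &= ~((uint32_t)PKEY_ACCESS_MASK << shift);
	current_pkrs |= (prot & PKEY_ACCESS_MASK) << shift;
}

static void sketch_set_noaccess(uint8_t pkey)
{
	sketch_update_protection(pkey, PKEY_DISABLE_ACCESS);
}

static void sketch_set_readwrite(uint8_t pkey)
{
	sketch_update_protection(pkey, PKEY_READ_WRITE);
}

/* Simplified check: AD denies all data access, WD denies writes. */
static bool sketch_access_ok(uint32_t pkrs, uint8_t pkey, bool write)
{
	uint32_t bits = (pkrs >> (pkey * 2)) & PKEY_ACCESS_MASK;

	if (bits & PKEY_DISABLE_ACCESS)
		return false;
	if (write && (bits & PKEY_DISABLE_WRITE))
		return false;
	return true;
}

/* Open the domain, check that a write would be allowed, then close it. */
static bool sketch_consumer_write(void)
{
	bool ok;

	sketch_set_readwrite(SKETCH_PKEY_MY_FEATURE);
	ok = sketch_access_ok(current_pkrs, SKETCH_PKEY_MY_FEATURE, true);
	sketch_set_noaccess(SKETCH_PKEY_MY_FEATURE);
	return ok;
}
```

Because both calls only touch the current thread's saved value, another thread using the same pkey keeps its own restrictions throughout the bracket.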
Signed-off-by: Ira Weiny --- Changes for V9 Move to pks.h Change pkey type to u8 From 0day Fix documentation link From Dave Hansen use pkey s/pks_mk*/pks_set*/ Changes for V8 Make the call inline Split this patch out from 'Add PKS kernel API' Include documentation in this patch --- Documentation/core-api/protection-keys.rst | 2 +- include/linux/pks.h | 13 +++++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index 3af92e1cbffd..78904d98519b 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -147,7 +147,7 @@ Changing permissions of individual keys --------------------------------------- .. kernel-doc:: include/linux/pks.h - :identifiers: pks_set_readwrite + :identifiers: pks_set_readwrite pks_set_noaccess MSR details ~~~~~~~~~~~ diff --git a/include/linux/pks.h b/include/linux/pks.h index 8b705a937b19..9f18f8b4cbb1 100644 --- a/include/linux/pks.h +++ b/include/linux/pks.h @@ -10,6 +10,18 @@ void pks_update_protection(u8 pkey, u8 protection); +/** + * pks_set_noaccess() - Disable all access to the domain + * @pkey: the pkey for which the access should change. + * + * Disable all access to the domain specified by pkey. This is not a global + * update and only affects the current running thread. + */ +static inline void pks_set_noaccess(u8 pkey) +{ + pks_update_protection(pkey, PKEY_DISABLE_ACCESS); +} + /** * pks_set_readwrite() - Make the domain Read/Write * @pkey: the pkey for which the access should change.
@@ -24,6 +36,7 @@ static inline void pks_set_readwrite(u8 pkey) =20 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 +static inline void pks_set_noaccess(u8 pkey) {} static inline void pks_set_readwrite(u8 pkey) {} =20 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 953EBC433EF for ; Tue, 19 Apr 2022 17:07:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355712AbiDSRKd (ORCPT ); Tue, 19 Apr 2022 13:10:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355606AbiDSRJr (ORCPT ); Tue, 19 Apr 2022 13:09:47 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4835762DA for ; Tue, 19 Apr 2022 10:07:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388024; x=1681924024; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HcDeh0eiynxcYfiW5+zztgOa1CKZmyghXifdTBUHbIQ=; b=jgqMzJ9/5Fu/RBp2WNjR5HauWyiIrzDrnUYjCl1iyqGwLdM9VbGjV3wc B0rkhK52LIDhZpMV8hy5hxppzW0lfJx/w87XnDZ7M4NHHJ6S0cNUlKty9 YWsDByXFlPo8hWCEdr8BNavmZHunSE+gNNFjNgiXCb8gueckN0xl0t47X Y1XLT7H71gI7HGz9YfcOY5Qlu8U+R/TiC7Zim/ww6GmuiAj/WEWKL6m34 zs+CPfkNHJ0W/tSOKoHjgLMzkwtePHJPZDz9VIsUqc+18R8u/EEsDQGR9 t7iybGvAtic6YKQtJnSPvbj1I8rUOT9jDXMV3W/1vK9X0JKXhq3vt7Jgz A==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="251123599" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="251123599" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:04 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="727145538" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:03 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 16/44] mm/pkeys: Introduce PKS fault callbacks Date: Tue, 19 Apr 2022 10:06:21 -0700 Message-Id: <20220419170649.1022246-17-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Some PKS consumers will want special handling on violations of pkey permissions. One such consumer is PMEM, which will want a mode that logs the access violation, disables protection, and continues rather than oopsing the machine. Provide an API to assign callbacks for individual pkeys. Since PKS faults do not report which key faulted, the key must be recovered by walking the page tables and extracting it from the leaf entry. The key can then be used to call the proper callback. Add documentation. Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Signed-off-by: Rick Edgecombe --- Changes for V9: Rework commit message Adjust for the use of linux/pks.h From the new key allocation: s/PKS_NR_CONSUMERS/PKS_KEY_MAX From Dave Hansen use pkey Fix conflicts with other users in the test code by moving this forward in the series Changes for V8: Add pt_regs to the callback signature so that pks_update_exception() can be called if needed.
Update commit message Determine if page is large prior to not present Update commit message with more clarity as to why this was kept separate from pks_abandon_protections() and pks_test_callback() Embed documentation in c file. Move handle_pks_key_fault() to pkeys.c s/handle_pks_key_fault/pks_handle_key_fault/ This consolidates the PKS code nicely Add feature check to pks_handle_key_fault() From Rick Edgecombe Fix key value check From kernel test robot Add static to handle_pks_key_fault Changes for V7: New patch --- Documentation/core-api/protection-keys.rst | 6 ++ arch/x86/include/asm/pks.h | 10 +++ arch/x86/mm/fault.c | 17 +++-- arch/x86/mm/pkeys.c | 86 ++++++++++++++++++++++ include/linux/pks.h | 3 + 5 files changed, 116 insertions(+), 6 deletions(-) diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index 78904d98519b..f309cecc3915 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -149,6 +149,12 @@ Changing permissions of individual keys .. kernel-doc:: include/linux/pks.h :identifiers: pks_set_readwrite pks_set_noaccess +Overriding Default Fault Behavior +--------------------------------- + +.. 
kernel-doc:: arch/x86/mm/pkeys.c + :doc: DEFINE_PKS_FAULT_CALLBACK + MSR details ~~~~~~~~~~~ diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index a7bad7301783..e9ad3ecd7ed0 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -7,11 +7,21 @@ void pks_setup(void); void x86_pkrs_load(struct thread_struct *thread); +bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_code, + unsigned long address); + #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ static inline void pks_setup(void) { } static inline void x86_pkrs_load(struct thread_struct *thread) { } +static inline bool pks_handle_key_fault(struct pt_regs *regs, + unsigned long hw_error_code, + unsigned long address) +{ + return false; +} + #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ #endif /* _ASM_X86_PKS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 5599109d1124..e8934df1b886 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -33,6 +33,7 @@ #include /* kvm_handle_async_pf */ #include /* fixup_vdso_exception() */ #include +#include /* pks_handle_key_fault() */ #define CREATE_TRACE_POINTS #include @@ -1147,12 +1148,16 @@ static void do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { - /* - * PF_PF faults should only occur on kernel - * addresses when supervisor pkeys are enabled. - */ - WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) && - (hw_error_code & X86_PF_PK)); + if (hw_error_code & X86_PF_PK) { + /* + * PF_PK faults should only occur on kernel + * addresses when supervisor pkeys are enabled.
+ */ + WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS)); + + if (pks_handle_key_fault(regs, hw_error_code, address)) + return; + } =20 #ifdef CONFIG_X86_32 /* diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index e4cbc79686ea..a3b27b7811da 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -12,6 +12,7 @@ =20 #include /* boot_cpu_has, ... */ #include /* vma_pkey() */ +#include /* X86_PF_WRITE */ =20 int __execute_only_pkey(struct mm_struct *mm) { @@ -216,6 +217,91 @@ u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbi= ts) =20 static DEFINE_PER_CPU(u32, pkrs_cache); =20 +/** + * DOC: DEFINE_PKS_FAULT_CALLBACK + * + * Users may also provide a fault handler which can handle a fault differe= ntly + * than an oops. For example, if 'MY_FEATURE' wanted to define a handler t= hey + * can do so by adding the corresponding entry to the pks_key_callbacks arr= ay. + * + * .. code-block:: c + * + * #ifdef CONFIG_MY_FEATURE + * bool my_feature_pks_fault_callback(struct pt_regs *regs, + * unsigned long address, bool write) + * { + * if (my_feature_fault_is_ok) + * return true; + * return false; + * } + * #endif + * + * static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] =3D { + * [PKS_KEY_DEFAULT] =3D NULL, + * #ifdef CONFIG_MY_FEATURE + * [PKS_KEY_MY_FEATURE] =3D my_feature_pks_fault_callback, + * #endif + * }; + */ +static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] =3D { 0 }; + +static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long ad= dress, + bool write, u16 key) +{ + if (key >=3D PKS_KEY_MAX) + return false; + + if (pks_key_callbacks[key]) + return pks_key_callbacks[key](regs, address, write); + + return false; +} + +bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_cod= e, + unsigned long address) +{ + bool write; + pgd_t pgd; + p4d_t p4d; + pud_t pud; + pmd_t pmd; + pte_t pte; + + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return false; + + write =3D (hw_error_code & X86_PF_WRITE); + + pgd
=3D READ_ONCE(*(init_mm.pgd + pgd_index(address))); + if (!pgd_present(pgd)) + return false; + + p4d =3D READ_ONCE(*p4d_offset(&pgd, address)); + if (p4d_large(p4d)) + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(p4d_val(p4d))); + if (!p4d_present(p4d)) + return false; + + pud =3D READ_ONCE(*pud_offset(&p4d, address)); + if (pud_large(pud)) + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(pud_val(pud))); + if (!pud_present(pud)) + return false; + + pmd =3D READ_ONCE(*pmd_offset(&pud, address)); + if (pmd_large(pmd)) + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(pmd_val(pmd))); + if (!pmd_present(pmd)) + return false; + + pte =3D READ_ONCE(*pte_offset_kernel(&pmd, address)); + return pks_call_fault_callback(regs, address, write, + pte_flags_pkey(pte_val(pte))); +} + /* * pks_write_pkrs() - Write the pkrs of the current CPU * @new_pkrs: New value to write to the current CPU register diff --git a/include/linux/pks.h b/include/linux/pks.h index 9f18f8b4cbb1..d0d8bf1aaa1d 100644 --- a/include/linux/pks.h +++ b/include/linux/pks.h @@ -34,6 +34,9 @@ static inline void pks_set_readwrite(u8 pkey) pks_update_protection(pkey, PKEY_READ_WRITE); } =20 +typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long addre= ss, + bool write); + #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ =20 static inline void pks_set_noaccess(u8 pkey) {} --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C8F2C433EF for ; Tue, 19 Apr 2022 17:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355842AbiDSRKo (ORCPT ); Tue, 19 Apr 2022 13:10:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59842 "EHLO lindbergh.monkeyblade.net" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355357AbiDSRJt (ORCPT ); Tue, 19 Apr 2022 13:09:49 -0400 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 472BBC27 for ; Tue, 19 Apr 2022 10:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388025; x=1681924025; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mw75AWpj9H6CsXSMMg1BomLVYazNvgvNALkRDIB8pQM=; b=NLuIY18JotFIGHcRzhRfmZIrDUO4H75ss8qaqZq5ZWfcmgIOYnM6u7Ld WQYPANpP7JQbltc/bmm6OVmTG+u2zaYXklGYmgqULLzIXq2Y0IlJwyPVN xzGXoMv4jVW3NpROEqDYfj0nMuGp1CQeywKo/LHr3IF7Bz8jF5U3fuqg7 VJ5dWNXGMkk8GiYNB82YsgCkry8rkCduxXj+IiFAZr1zJMUuuOU7xMbYO FV1SYd5TxBc/dpyYS1wBxbjL15y/qVyAkO263xZd++y5QIwH8bTkisKrl RSROCKw0ehjSMtja56E3A7TmIquA0vKKWLUY0LcUqGsfrfpyhBFsxuaXx Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="326720539" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="326720539" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:04 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="804734108" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:04 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 17/44] x86/entry: Add auxiliary pt_regs space Date: Tue, 19 Apr 2022 10:06:22 -0700 Message-Id: <20220419170649.1022246-18-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The PKRS MSR is not managed by XSAVE. In order for the MSR to be saved during an exception the current CPU MSR value needs to be saved somewhere during the exception and restored when returning to the previous context. Two possible places for preserving this state were considered, irqentry_state_t or pt_regs.[1] pt_regs was much more complicated and was potentially fraught with unintended consequences.[2] However, Andy Lutomirski came up with a way to hide additional values on the stack which could be accessed as "extended_pt_regs".[3] This method allows any function with current access to pt_regs to obtain access to the extra information without expanding the use of irqentry_state_t and leaving pt_regs intact for compatibility with outside tools like BPF. Prepare the assembly code to add a hidden auxiliary pt_regs space. To simplify, the assembly code only adds space on the stack as defined by the C code which needs it. The use of this space is left to the C code which is required to select ARCH_HAS_PTREGS_AUXILIARY to enable this support. Each nested exception gets another copy of this auxiliary space allowing for any number of levels of exception handling. Initially the space is left empty and results in no code changes because ARCH_HAS_PTREGS_AUXILIARY is not set. 
Subsequent patches adding data to pt_regs_auxiliary must set ARCH_HAS_PTREGS_AUXILIARY or a build failure will occur. The use of ARCH_HAS_PTREGS_AUXILIARY also avoids the introduction of 2 instructions (addq/subq) on every entry call when the extra space is not needed. 32bit is specifically excluded because the current consumer of this, PKS, will not support 32bit either. Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or aided in the development of the patch. [1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F= 577M36g+w@mail.gmail.com/ [2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t [3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3= =3DPGW+iW5=3Dw@mail.gmail.com/ Cc: Dave Hansen Cc: Dan Williams Suggested-by: Dave Hansen Suggested-by: Dan Williams Suggested-by: Peter Zijlstra Suggested-by: Thomas Gleixner Suggested-by: Andy Lutomirski Signed-off-by: Ira Weiny --- Changes for V9: Update commit message Changes for V8: Exclude 32bit Introduce ARCH_HAS_PTREGS_AUXILIARY to optimize this away when not needed. From Thomas s/EXTENDED_PT_REGS_SIZE/PT_REGS_AUX_SIZE Fix up PTREGS_AUX_SIZE macro to be based on the structures and used in assembly code via the nifty asm-offset macros Bound calls into C code with [PUSH|POP]_PTREGS_AUXILIARY instead of using a macro 'call' Split this patch out and put the PKS specific stuff in a separate patch Changes for V7: Rebased to 5.14 entry code declare write_pkrs() in pks.h s/INIT_PKRS_VALUE/pkrs_init_value Remove unnecessary INIT_PKRS_VALUE def s/pkrs_save_set_irq/pkrs_save_irq/ The initial value for exceptions is best managed completely within the pkey code.
--- arch/x86/Kconfig | 4 ++++ arch/x86/entry/calling.h | 20 ++++++++++++++++++++ arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++ arch/x86/entry/entry_64_compat.S | 6 ++++++ arch/x86/include/asm/ptrace.h | 18 ++++++++++++++++++ arch/x86/kernel/asm-offsets_64.c | 15 +++++++++++++++ arch/x86/kernel/head_64.S | 6 ++++++ 7 files changed, 91 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c53deda2ea25..69e611d3b8ef 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1889,6 +1889,10 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS =20 If unsure, say y. =20 +config ARCH_HAS_PTREGS_AUXILIARY + depends on X86_64 + bool + choice prompt "TSX enable mode" depends on CPU_SUP_INTEL diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index a4c061fb7c6e..d0ebf9b069c9 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -63,6 +63,26 @@ For 32-bit we have the following conventions - kernel is= built with * for assembly code: */ =20 + +#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY + +.macro PUSH_PTREGS_AUXILIARY + /* add space for pt_regs_auxiliary */ + subq $PTREGS_AUX_SIZE, %rsp +.endm + +.macro POP_PTREGS_AUXILIARY + /* remove space for pt_regs_auxiliary */ + addq $PTREGS_AUX_SIZE, %rsp +.endm + +#else + +#define PUSH_PTREGS_AUXILIARY +#define POP_PTREGS_AUXILIARY + +#endif + .macro PUSH_REGS rdx=3D%rdx rax=3D%rax save_ret=3D0 .if \save_ret pushq %rsi /* pt_regs->si */ diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 4faac48ebec5..5a037a56814d 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -335,7 +335,9 @@ SYM_CODE_END(ret_from_fork) movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */ .endif =20 + PUSH_PTREGS_AUXILIARY call \cfunc + POP_PTREGS_AUXILIARY =20 jmp error_return .endm @@ -440,7 +442,9 @@ SYM_CODE_START(\asmsym) =20 movq %rsp, %rdi /* pt_regs pointer */ =20 + PUSH_PTREGS_AUXILIARY call \cfunc + POP_PTREGS_AUXILIARY =20 jmp paranoid_exit =20 @@ -502,7 +506,9 @@ 
SYM_CODE_START(\asmsym) * stack. */ movq %rsp, %rdi /* pt_regs pointer */ + PUSH_PTREGS_AUXILIARY call vc_switch_off_ist + POP_PTREGS_AUXILIARY movq %rax, %rsp /* Switch to new stack */ =20 UNWIND_HINT_REGS @@ -513,7 +519,9 @@ SYM_CODE_START(\asmsym) =20 movq %rsp, %rdi /* pt_regs pointer */ =20 + PUSH_PTREGS_AUXILIARY call kernel_\cfunc + POP_PTREGS_AUXILIARY =20 /* * No need to switch back to the IST stack. The current stack is either @@ -549,7 +557,9 @@ SYM_CODE_START(\asmsym) movq %rsp, %rdi /* pt_regs pointer into first argument */ movq ORIG_RAX(%rsp), %rsi /* get error code into 2nd argument*/ movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */ + PUSH_PTREGS_AUXILIARY call \cfunc + POP_PTREGS_AUXILIARY =20 /* For some configurations \cfunc ends up being a noreturn. */ REACHABLE @@ -802,7 +812,9 @@ SYM_CODE_START_LOCAL(exc_xen_hypervisor_callback) movq %rdi, %rsp /* we don't return, adjust the stack frame */ UNWIND_HINT_REGS =20 + PUSH_PTREGS_AUXILIARY call xen_pv_evtchn_do_upcall + POP_PTREGS_AUXILIARY =20 jmp error_return SYM_CODE_END(exc_xen_hypervisor_callback) @@ -1003,7 +1015,9 @@ SYM_CODE_START_LOCAL(error_entry) /* Put us onto the real thread stack. */ popq %r12 /* save return addr in %12 */ movq %rsp, %rdi /* arg0 =3D pt_regs pointer */ + PUSH_PTREGS_AUXILIARY call sync_regs + POP_PTREGS_AUXILIARY movq %rax, %rsp /* switch stack */ ENCODE_FRAME_POINTER pushq %r12 @@ -1059,7 +1073,9 @@ SYM_CODE_START_LOCAL(error_entry) * as if we faulted immediately after IRET. */ mov %rsp, %rdi + PUSH_PTREGS_AUXILIARY call fixup_bad_iret + POP_PTREGS_AUXILIARY mov %rax, %rsp jmp .Lerror_entry_from_usermode_after_swapgs SYM_CODE_END(error_entry) @@ -1166,7 +1182,9 @@ SYM_CODE_START(asm_exc_nmi) =20 movq %rsp, %rdi movq $-1, %rsi + PUSH_PTREGS_AUXILIARY call exc_nmi + POP_PTREGS_AUXILIARY =20 /* * Return back to user mode. 
We must *not* do the normal exit @@ -1202,6 +1220,8 @@ SYM_CODE_START(asm_exc_nmi) * +---------------------------------------------------------+ * | pt_regs | * +---------------------------------------------------------+ + * | (Optionally) pt_regs_extended | + * +---------------------------------------------------------+ * * The "original" frame is used by hardware. Before re-enabling * NMIs, we need to be done with it, and we need to leave enough @@ -1380,7 +1400,9 @@ end_repeat_nmi: =20 movq %rsp, %rdi movq $-1, %rsi + PUSH_PTREGS_AUXILIARY call exc_nmi + POP_PTREGS_AUXILIARY =20 /* Always restore stashed CR3 value (see paranoid_entry) */ RESTORE_CR3 scratch_reg=3D%r15 save_reg=3D%r14 diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_com= pat.S index 4fdb007cddbd..cf6c88eb384d 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -137,7 +137,9 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SY= M_L_GLOBAL) .Lsysenter_flags_fixed: =20 movq %rsp, %rdi + PUSH_PTREGS_AUXILIARY call do_SYSENTER_32 + POP_PTREGS_AUXILIARY /* XEN PV guests always use IRET path */ ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_userm= ode", \ "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV @@ -257,7 +259,9 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM= _L_GLOBAL) UNWIND_HINT_REGS =20 movq %rsp, %rdi + PUSH_PTREGS_AUXILIARY call do_fast_syscall_32 + POP_PTREGS_AUXILIARY /* XEN PV guests always use IRET path */ ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_userm= ode", \ "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV @@ -415,6 +419,8 @@ SYM_CODE_START(entry_INT80_compat) cld =20 movq %rsp, %rdi + PUSH_PTREGS_AUXILIARY call do_int80_syscall_32 + POP_PTREGS_AUXILIARY jmp swapgs_restore_regs_and_return_to_usermode SYM_CODE_END(entry_INT80_compat) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 
4357e0f2cd5f..0889045b3a6f 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -2,6 +2,7 @@ #ifndef _ASM_X86_PTRACE_H #define _ASM_X86_PTRACE_H =20 +#include #include #include #include @@ -91,6 +92,23 @@ struct pt_regs { /* top of stack page */ }; =20 +/* + * NOTE: Features which add data to pt_regs_auxiliary must select + * ARCH_HAS_PTREGS_AUXILIARY. Failure to do so will result in a build fai= lure. + */ +struct pt_regs_auxiliary { +}; + +struct pt_regs_extended { + struct pt_regs_auxiliary aux; + struct pt_regs pt_regs __aligned(8); +}; + +static inline struct pt_regs_extended *to_extended_pt_regs(struct pt_regs = *regs) +{ + return container_of(regs, struct pt_regs_extended, pt_regs); +} + #endif /* !__i386__ */ =20 #ifdef CONFIG_PARAVIRT diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets= _64.c index b14533af7676..66f08ac3507a 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -4,6 +4,7 @@ #endif =20 #include +#include =20 #if defined(CONFIG_KVM_GUEST) && defined(CONFIG_PARAVIRT_SPINLOCKS) #include @@ -60,5 +61,19 @@ int main(void) DEFINE(stack_canary_offset, offsetof(struct fixed_percpu_data, stack_cana= ry)); BLANK(); #endif + +#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY + /* Size of Auxiliary pt_regs data */ + DEFINE(PTREGS_AUX_SIZE, sizeof(struct pt_regs_extended) - + sizeof(struct pt_regs)); +#else + /* + * Adding data to struct pt_regs_auxiliary requires setting + * ARCH_HAS_PTREGS_AUXILIARY + */ + BUILD_BUG_ON((sizeof(struct pt_regs_extended) - + sizeof(struct pt_regs)) !=3D 0); +#endif + return 0; } diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index b8e3019547a5..00bc3a74efb7 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -341,8 +341,10 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb) movq %rsp, %rdi movq ORIG_RAX(%rsp), %rsi movq initial_vc_handler(%rip), %rax + PUSH_PTREGS_AUXILIARY ANNOTATE_RETPOLINE_SAFE call *%rax + 
POP_PTREGS_AUXILIARY =20 /* Unwind pt_regs */ POP_REGS @@ -421,7 +423,9 @@ SYM_CODE_START_LOCAL(early_idt_handler_common) UNWIND_HINT_REGS =20 movq %rsp,%rdi /* RDI =3D pt_regs; RSI is already trapnr */ + PUSH_PTREGS_AUXILIARY call do_early_exception + POP_PTREGS_AUXILIARY =20 decl early_recursion_flag(%rip) jmp restore_regs_and_return_to_kernel @@ -448,7 +452,9 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb) /* Call C handler */ movq %rsp, %rdi movq ORIG_RAX(%rsp), %rsi + PUSH_PTREGS_AUXILIARY call do_vc_no_ghcb + POP_PTREGS_AUXILIARY =20 /* Unwind pt_regs */ POP_REGS --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21073C433EF for ; Tue, 19 Apr 2022 17:09:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355991AbiDSRL6 (ORCPT ); Tue, 19 Apr 2022 13:11:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60052 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355705AbiDSRKU (ORCPT ); Tue, 19 Apr 2022 13:10:20 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 093D713D38 for ; Tue, 19 Apr 2022 10:07:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388048; x=1681924048; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=MdeiJnA9pcMmVctsMXdiSwINVFM1oa0cvnMci6E9E0w=; b=QhYvBksSt16mr3Q71wdISUnfLemXhueqjMUuZlUPWek1br0l95wZQ9i7 7zEzWaF7/xiGIm5VfKfaGwPX/AJFFKYarD/2QforTiLlSIHbRqrMUHdkW E7wRA0b/zfOHxjSu+ywD7RnO6xzzHtzFqUJZVcQ5vnnZe7Zn0xjeZ/zd4 uiziLULBAe5gtG716YOEH0+k6kUsaQDy2Q/XNb5B+6XA2WjZiQSI9/EVs K2czhlO8m36Kwlakg5XfOY7fMuwBE7LDOBJe+56T73DTyyor/7Qo1/Tta 
Njr21x13LNQPsrVarZcSIPFEiWIx5ySf+/Ebc4CGTdLUhwAKeWBB1eMZb A==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="243750587" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="243750587" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:05 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="530498666" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:05 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 18/44] entry: Pass pt_regs to irqentry_exit_cond_resched() Date: Tue, 19 Apr 2022 10:06:23 -0700 Message-Id: <20220419170649.1022246-19-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Auxiliary pt_regs space needs to be manipulated by the generic entry/exit code. Ideally irqentry_exit() would take care of handling any auxiliary pt_regs on exit. Unfortunately, irqentry_exit() is not the only exit from the exception path. The call to irqentry_exit_cond_resched() from xen_pv_evtchn_do_upcall() bypasses irqentry_exit(). Make irqentry_exit_cond_resched() symmetrical with irqentry_enter() by passing pt_regs to it. This makes irqentry_exit_cond_resched() capable of handling auxiliary pt_regs in future patches. Separate out the internal functionality of irqentry_exit_cond_resched() and call that directly from irqentry_exit().
Signed-off-by: Ira Weiny --- Changes for V10 Patch used to be: entry: Split up irqentry_exit_cond_resched() Upstream changes forced this change. Changes for V9 Update commit message Changes for V8 New Patch --- arch/arm64/include/asm/preempt.h | 2 +- arch/arm64/kernel/entry-common.c | 4 ++-- arch/x86/entry/common.c | 2 +- include/linux/entry-common.h | 17 ++++++++------ kernel/entry/common.c | 13 +++++++---- kernel/sched/core.c | 40 ++++++++++++++++---------------- 6 files changed, 43 insertions(+), 35 deletions(-) diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/pree= mpt.h index 0159b625cc7f..bd185a214096 100644 --- a/arch/arm64/include/asm/preempt.h +++ b/arch/arm64/include/asm/preempt.h @@ -87,7 +87,7 @@ void preempt_schedule_notrace(void); =20 #ifdef CONFIG_PREEMPT_DYNAMIC =20 -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched_internal); void dynamic_preempt_schedule(void); #define __preempt_schedule() dynamic_preempt_schedule() void dynamic_preempt_schedule_notrace(void); diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-com= mon.c index 878c65aa7206..593d31154a62 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -224,9 +224,9 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *= regs) } =20 #ifdef CONFIG_PREEMPT_DYNAMIC -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched_internal); #define need_irq_preemption() \ - (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched)) + (static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched_internal)) #else #define need_irq_preemption() (IS_ENABLED(CONFIG_PREEMPTION)) #endif diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 6c2826417b33..f1ba770d035d 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -309,7 +309,7 @@ __visible noinstr 
void xen_pv_evtchn_do_upcall(struct p= t_regs *regs) =20 inhcall =3D get_and_clear_inhcall(); if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) { - irqentry_exit_cond_resched(); + irqentry_exit_cond_resched(regs); instrumentation_end(); restore_inhcall(inhcall); } else { diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index ab78bd4c2eb0..f35086d2a258 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -412,23 +412,26 @@ irqentry_state_t noinstr irqentry_enter(struct pt_reg= s *regs); =20 /** * irqentry_exit_cond_resched - Conditionally reschedule on return from in= terrupt + * @regs: Pointer to pt_regs of interrupted context * * Conditional reschedule with additional sanity checks. */ +void irqentry_exit_cond_resched(struct pt_regs *regs); + void raw_irqentry_exit_cond_resched(void); #ifdef CONFIG_PREEMPT_DYNAMIC #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) -#define irqentry_exit_cond_resched_dynamic_enabled raw_irqentry_exit_cond_= resched -#define irqentry_exit_cond_resched_dynamic_disabled NULL -DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_res= ched); -#define irqentry_exit_cond_resched() static_call(irqentry_exit_cond_resche= d)() +#define irqentry_exit_cond_resched_internal_dynamic_enabled raw_irqentry_e= xit_cond_resched +#define irqentry_exit_cond_resched_internal_dynamic_disabled NULL +DECLARE_STATIC_CALL(irqentry_exit_cond_resched_internal, raw_irqentry_exit= _cond_resched); +#define irqentry_exit_cond_resched_internal() static_call(irqentry_exit_co= nd_resched_internal)() #elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) -DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched_internal); void dynamic_irqentry_exit_cond_resched(void); -#define irqentry_exit_cond_resched() dynamic_irqentry_exit_cond_resched() +#define irqentry_exit_cond_resched_internal() dynamic_irqentry_exit_cond_r= esched() #endif #else /* 
CONFIG_PREEMPT_DYNAMIC */ -#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() +#define irqentry_exit_cond_resched_internal() raw_irqentry_exit_cond_resch= ed() #endif /* CONFIG_PREEMPT_DYNAMIC */ =20 /** diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 93c3b86e781c..8f73b54bfa56 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -387,18 +387,23 @@ void raw_irqentry_exit_cond_resched(void) } #ifdef CONFIG_PREEMPT_DYNAMIC #if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) -DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resc= hed); +DEFINE_STATIC_CALL(irqentry_exit_cond_resched_internal, raw_irqentry_exit_= cond_resched); #elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) -DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched_internal); void dynamic_irqentry_exit_cond_resched(void) { - if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched)) + if (!static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched_intern= al)) return; raw_irqentry_exit_cond_resched(); } #endif #endif =20 +void irqentry_exit_cond_resched(struct pt_regs *regs) +{ + irqentry_exit_cond_resched_internal(); +} + noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) { lockdep_assert_irqs_disabled(); @@ -425,7 +430,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqent= ry_state_t state) =20 instrumentation_begin(); if (IS_ENABLED(CONFIG_PREEMPTION)) - irqentry_exit_cond_resched(); + irqentry_exit_cond_resched_internal(); =20 /* Covers both tracing and lockdep */ trace_hardirqs_on(); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 51efaabac3e4..139ccd2c4b66 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8284,29 +8284,29 @@ EXPORT_SYMBOL(__cond_resched_rwlock_write); * SC:might_resched * SC:preempt_schedule * SC:preempt_schedule_notrace - * SC:irqentry_exit_cond_resched + * 
SC:irqentry_exit_cond_resched_internal * * * NONE: - * cond_resched <- __cond_resched - * might_resched <- RET0 - * preempt_schedule <- NOP - * preempt_schedule_notrace <- NOP - * irqentry_exit_cond_resched <- NOP + * cond_resched <- __cond_resched + * might_resched <- RET0 + * preempt_schedule <- NOP + * preempt_schedule_notrace <- NOP + * irqentry_exit_cond_resched_internal <- NOP * * VOLUNTARY: - * cond_resched <- __cond_resched - * might_resched <- __cond_resched - * preempt_schedule <- NOP - * preempt_schedule_notrace <- NOP - * irqentry_exit_cond_resched <- NOP + * cond_resched <- __cond_resched + * might_resched <- __cond_resched + * preempt_schedule <- NOP + * preempt_schedule_notrace <- NOP + * irqentry_exit_cond_resched_internal <- NOP * * FULL: - * cond_resched <- RET0 - * might_resched <- RET0 - * preempt_schedule <- preempt_schedule - * preempt_schedule_notrace <- preempt_schedule_notrace - * irqentry_exit_cond_resched <- irqentry_exit_cond_resched + * cond_resched <- RET0 + * might_resched <- RET0 + * preempt_schedule <- preempt_schedule + * preempt_schedule_notrace <- preempt_schedule_notrace + * irqentry_exit_cond_resched_internal <- irqentry_exit_cond_resched_int= ernal */ =20 enum { @@ -8352,7 +8352,7 @@ void sched_dynamic_update(int mode) preempt_dynamic_enable(might_resched); preempt_dynamic_enable(preempt_schedule); preempt_dynamic_enable(preempt_schedule_notrace); - preempt_dynamic_enable(irqentry_exit_cond_resched); + preempt_dynamic_enable(irqentry_exit_cond_resched_internal); =20 switch (mode) { case preempt_dynamic_none: @@ -8360,7 +8360,7 @@ void sched_dynamic_update(int mode) preempt_dynamic_disable(might_resched); preempt_dynamic_disable(preempt_schedule); preempt_dynamic_disable(preempt_schedule_notrace); - preempt_dynamic_disable(irqentry_exit_cond_resched); + preempt_dynamic_disable(irqentry_exit_cond_resched_internal); pr_info("Dynamic Preempt: none\n"); break; =20 @@ -8369,7 +8369,7 @@ void sched_dynamic_update(int mode) 
preempt_dynamic_enable(might_resched); preempt_dynamic_disable(preempt_schedule); preempt_dynamic_disable(preempt_schedule_notrace); - preempt_dynamic_disable(irqentry_exit_cond_resched); + preempt_dynamic_disable(irqentry_exit_cond_resched_internal); pr_info("Dynamic Preempt: voluntary\n"); break; =20 @@ -8378,7 +8378,7 @@ void sched_dynamic_update(int mode) preempt_dynamic_disable(might_resched); preempt_dynamic_enable(preempt_schedule); preempt_dynamic_enable(preempt_schedule_notrace); - preempt_dynamic_enable(irqentry_exit_cond_resched); + preempt_dynamic_enable(irqentry_exit_cond_resched_internal); pr_info("Dynamic Preempt: full\n"); break; } --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78336C4332F for ; Tue, 19 Apr 2022 17:10:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355882AbiDSRKw (ORCPT ); Tue, 19 Apr 2022 13:10:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355612AbiDSRJu (ORCPT ); Tue, 19 Apr 2022 13:09:50 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E8141138 for ; Tue, 19 Apr 2022 10:07:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388027; x=1681924027; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OvMcHVtcJvbvw4mAw0A77nXekx1g9G9EavzzKuKHygk=; b=Xq2FZQgRlEaUq802TILiJZkv3LWUVXlGahX/6kNiur3gGsQJMo6jHfYZ U1jgJaPN17hbukbFPWVXM3Gn+kpJL1KfoCpMNyE87H+q8W3dc/9EK8sXi EV3uL9E70blmmRV4mVgFUmKZwyLLSX3w/CTlFKowklaR+/VSQHwWnok+h 
N7HeXfjhBQcQo4a61+bgufgx6zhX4X8QOOi/L7hz+s4Lh9+PY+7fhF3RJ bXna2kf9y0kpl4z04UEZrqky6ZqaZrXnRJcGiorZKdF2jJVrkckYHTLfz BH+yxkxumA1Zi46nRR+sonXhmM88jqWbhOp2PWila7aAn9No7LxY+KBw1 g==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="263991849" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="263991849" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:07 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="861588499" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:06 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 19/44] entry: Add calls for save/restore auxiliary pt_regs Date: Tue, 19 Apr 2022 10:06:24 -0700 Message-Id: <20220419170649.1022246-20-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Some architectures have auxiliary pt_regs space which is available to store extra information on the stack. For ease of implementation the common C code was left to fill in the data when needed. Add calls to the architecture save and restore auxiliary pt_regs functions. Define empty calls for any architecture which does not have auxiliary pt_regs. NOTE: Due to the split nature of the Xen exit code irqentry_exit_cond_resched() requires an unbalanced call to arch_restore_aux_pt_regs(). 
Signed-off-by: Ira Weiny
---
Changes for V9
	Update commit message

Changes for V8
	New patch which introduces a generic auxiliary pt_register save
	restore.
---
 include/linux/entry-common.h |  7 +++++++
 kernel/entry/common.c        | 16 ++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index f35086d2a258..15b35ca937f2 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -79,6 +79,13 @@ static __always_inline void arch_check_user_regs(struct pt_regs *regs);
 static __always_inline void arch_check_user_regs(struct pt_regs *regs) {}
 #endif
 
+#ifndef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+static inline void arch_save_aux_pt_regs(struct pt_regs *regs) { }
+static inline void arch_restore_aux_pt_regs(struct pt_regs *regs) { }
+
+#endif
+
 /**
  * enter_from_user_mode - Establish state when coming from user mode
  *
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 8f73b54bfa56..9a02b517c7e7 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -317,7 +317,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 
 	if (user_mode(regs)) {
 		irqentry_enter_from_user_mode(regs);
-		return ret;
+		goto aux_save;
 	}
 
 	/*
@@ -356,7 +356,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 		instrumentation_end();
 
 		ret.exit_rcu = true;
-		return ret;
+		goto aux_save;
 	}
 
 	/*
@@ -371,6 +371,11 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 
+aux_save:
+	instrumentation_begin();
+	arch_save_aux_pt_regs(regs);
+	instrumentation_end();
+
 	return ret;
 }
 
@@ -401,6 +406,7 @@ void dynamic_irqentry_exit_cond_resched(void)
 
 void irqentry_exit_cond_resched(struct pt_regs *regs)
 {
+	arch_restore_aux_pt_regs(regs);
 	irqentry_exit_cond_resched_internal();
 }
 
@@ -408,6 +414,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
 	lockdep_assert_irqs_disabled();
 
+	instrumentation_begin();
+	arch_restore_aux_pt_regs(regs);
+	instrumentation_end();
+
 	/* Check whether this returns to user mode */
 	if (user_mode(regs)) {
 		irqentry_exit_to_user_mode(regs);
@@ -459,6 +469,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
+	arch_save_aux_pt_regs(regs);
 	instrumentation_end();
 
 	return irq_state;
@@ -467,6 +478,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
 {
 	instrumentation_begin();
+	arch_restore_aux_pt_regs(regs);
 	ftrace_nmi_exit();
 	if (irq_state.lockdep) {
 		trace_hardirqs_on_prepare();
-- 
2.35.1
From: ira.weiny@intel.com
Subject: [PATCH V10 20/44] x86/entry: Define arch_{save|restore}_auxiliary_pt_regs()
Date: Tue, 19 Apr 2022 10:06:25 -0700
Message-Id: <20220419170649.1022246-21-ira.weiny@intel.com>

From: Ira Weiny

The x86 architecture supports the new auxiliary pt_regs space if
ARCH_HAS_PTREGS_AUXILIARY is enabled.

Define the callbacks within the x86 code required by the core entry
code when this support is enabled.
Signed-off-by: Ira Weiny
---
Changes for V8
	New patch
---
 arch/x86/include/asm/entry-common.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 43184640b579..5fa5dd2d539c 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -95,4 +95,16 @@ static __always_inline void arch_exit_to_user_mode(void)
 }
 #define arch_exit_to_user_mode arch_exit_to_user_mode
 
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+static inline void arch_save_aux_pt_regs(struct pt_regs *regs)
+{
+}
+
+static inline void arch_restore_aux_pt_regs(struct pt_regs *regs)
+{
+}
+
+#endif
+
 #endif
-- 
2.35.1
From: ira.weiny@intel.com
Subject: [PATCH V10 21/44] x86/pkeys: Preserve PKRS MSR across exceptions
Date: Tue, 19 Apr 2022 10:06:26 -0700
Message-Id: <20220419170649.1022246-22-ira.weiny@intel.com>

From: Ira Weiny

PKRS is a per-logical-processor MSR which overlays additional
protection for pages which have been mapped with a protection key.

PKS pages should stay protected while exception code runs, while still
allowing exception code to access them through the proper pks_set_*()
calls. To do this, the current thread value must be saved, the CPU MSR
set to the default value for the duration of the exception, and the
saved thread value restored upon completion. The new auxiliary pt_regs
space provides the storage for the saved value.
When PKS is configured, enable auxiliary pt_regs, add space to
pt_regs_auxiliary, and define the save/restore functions.

Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or
aided in its development.

[1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t
[3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/

Cc: Dave Hansen
Cc: Dan Williams
Suggested-by: Dave Hansen
Suggested-by: Dan Williams
Suggested-by: Peter Zijlstra
Suggested-by: Thomas Gleixner
Suggested-by: Andy Lutomirski
Signed-off-by: Ira Weiny
---
Changes for V10:
	Remove test changes.

Changes for V9:
	Update commit message
	s/pks_thread_pkrs/pkrs/
	From Dave Hansen
		s/pks_saved_pkrs/pkrs/

Changes for V8:
	Tie this into the new generic auxiliary pt_regs support.
	Build this on the new irqentry_*() refactoring patches
	Split this patch off from the PKS portion of the auxiliary
		pt_regs functionality.
	From Thomas
		Fix noinstr mess
		s/write_pkrs/pks_write_pkrs
		s/pkrs_init_value/PKRS_INIT_VALUE
		Simplify the number and location of the save/restore calls.
			Cover entry from user space as well.

Changes for V7:
	Rebased to 5.14 entry code
	declare write_pkrs() in pks.h
	s/INIT_PKRS_VALUE/pkrs_init_value
	Remove unnecessary INIT_PKRS_VALUE def
	s/pkrs_save_set_irq/pkrs_save_irq/
		The initial value for exceptions is best managed
		completely within the pkey code.
---
 arch/x86/Kconfig                    |  3 ++-
 arch/x86/include/asm/entry-common.h |  3 +++
 arch/x86/include/asm/pks.h          |  4 ++++
 arch/x86/include/asm/ptrace.h       |  3 +++
 arch/x86/mm/pkeys.c                 | 32 +++++++++++++++++++++++++++++
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 69e611d3b8ef..43464511ea9d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1890,8 +1890,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 	  If unsure, say y.
 
 config ARCH_HAS_PTREGS_AUXILIARY
+	def_bool y
 	depends on X86_64
-	bool
+	depends on ARCH_ENABLE_SUPERVISOR_PKEYS
 
 choice
 	prompt "TSX enable mode"
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 5fa5dd2d539c..803727b95b3a 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 /* Check that the stack and regs on entry from user mode are sane. */
 static __always_inline void arch_check_user_regs(struct pt_regs *regs)
@@ -99,10 +100,12 @@ static __always_inline void arch_exit_to_user_mode(void)
 
 static inline void arch_save_aux_pt_regs(struct pt_regs *regs)
 {
+	pks_save_pt_regs(regs);
 }
 
 static inline void arch_restore_aux_pt_regs(struct pt_regs *regs)
 {
+	pks_restore_pt_regs(regs);
 }
 
 #endif
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index e9ad3ecd7ed0..b69e03a141fe 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -6,6 +6,8 @@
 
 void pks_setup(void);
 void x86_pkrs_load(struct thread_struct *thread);
+void pks_save_pt_regs(struct pt_regs *regs);
+void pks_restore_pt_regs(struct pt_regs *regs);
 
 bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_code,
 			  unsigned long address);
@@ -14,6 +16,8 @@ bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_code,
 
 static inline void pks_setup(void) { }
 static inline void x86_pkrs_load(struct thread_struct *thread) { }
+static inline void pks_save_pt_regs(struct pt_regs *regs) { }
+static inline void pks_restore_pt_regs(struct pt_regs *regs) { }
 
 static inline bool pks_handle_key_fault(struct pt_regs *regs,
 					unsigned long hw_error_code,
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0889045b3a6f..73936739c7e7 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -97,6 +97,9 @@ struct pt_regs {
  * ARCH_HAS_PTREGS_AUXILIARY. Failure to do so will result in a build failure.
  */
 struct pt_regs_auxiliary {
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	u32 pkrs;
+#endif
 };
 
 struct pt_regs_extended {
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index a3b27b7811da..dd02e76d0359 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -342,6 +342,38 @@ void x86_pkrs_load(struct thread_struct *thread)
 	pks_write_pkrs(thread->pkrs);
 }
 
+/*
+ * PKRS is a per-logical-processor MSR which overlays additional protection for
+ * pages which have been mapped with a protection key.
+ *
+ * To protect against exceptions having potentially privileged access to memory
+ * of an interrupted thread, save the current thread value and set the PKRS
+ * value to be used during the exception.
+ */
+void pks_save_pt_regs(struct pt_regs *regs)
+{
+	struct pt_regs_auxiliary *aux_pt_regs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+	aux_pt_regs->pkrs = current->thread.pkrs;
+	pks_write_pkrs(PKS_INIT_VALUE);
+}
+
+void pks_restore_pt_regs(struct pt_regs *regs)
+{
+	struct pt_regs_auxiliary *aux_pt_regs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+	current->thread.pkrs = aux_pt_regs->pkrs;
+	pks_write_pkrs(current->thread.pkrs);
+}
+
 /*
  * PKS is independent of PKU and either or both may be supported on a CPU.
  *
-- 
2.35.1
From: ira.weiny@intel.com
Subject: [PATCH V10 22/44] x86/fault: Print PKS MSR on fault
Date: Tue, 19 Apr 2022 10:06:27 -0700
Message-Id: <20220419170649.1022246-23-ira.weiny@intel.com>

From: Ira Weiny

If a PKS fault occurs, it will be easier to debug if the PKS MSR value
at the time of the fault is known.

Add pks_show_regs() to __show_regs() to show the PKRS MSR on fault if
PKS is enabled.

An 'executive summary' of the pt_regs is saved in __die_header(), which
ensures that the registers of the first fault are preserved in the
event of multiple faults. Teach this code about the extended pt_regs so
that the PKS code can also get to the original pkrs value.

Suggested-by: Andy Lutomirski
Suggested-by: Dave Hansen
Signed-off-by: Ira Weiny
---
Changes for V9
	From Dave Hansen
		Move this output to __show_regs() next to the PKRU
		register dump

Changes for V8
	Split this into its own patch.
---
 arch/x86/include/asm/pks.h   |  3 +++
 arch/x86/kernel/dumpstack.c  | 32 ++++++++++++++++++++++++++++++--
 arch/x86/kernel/process_64.c |  1 +
 arch/x86/mm/pkeys.c          | 11 +++++++++++
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index b69e03a141fe..de67d5b5a2af 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -8,6 +8,7 @@ void pks_setup(void);
 void x86_pkrs_load(struct thread_struct *thread);
 void pks_save_pt_regs(struct pt_regs *regs);
 void pks_restore_pt_regs(struct pt_regs *regs);
+void pks_show_regs(struct pt_regs *regs, const char *log_lvl);
 
 bool pks_handle_key_fault(struct pt_regs *regs, unsigned long hw_error_code,
 			  unsigned long address);
@@ -18,6 +19,8 @@ static inline void pks_setup(void) { }
 static inline void x86_pkrs_load(struct thread_struct *thread) { }
 static inline void pks_save_pt_regs(struct pt_regs *regs) { }
 static inline void pks_restore_pt_regs(struct pt_regs *regs) { }
+static inline void pks_show_regs(struct pt_regs *regs,
+				 const char *log_lvl) { }
 
 static inline bool pks_handle_key_fault(struct pt_regs *regs,
 					unsigned long hw_error_code,
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index afae4dd77495..5fae75113def 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -27,8 +27,36 @@ int panic_on_unrecovered_nmi;
 int panic_on_io_nmi;
 static int die_counter;
 
+#ifdef CONFIG_ARCH_HAS_PTREGS_AUXILIARY
+
+static struct pt_regs_extended exec_summary_regs;
+
+static void save_exec_summary(struct pt_regs *regs)
+{
+	exec_summary_regs = *(to_extended_pt_regs(regs));
+}
+
+static struct pt_regs *retrieve_exec_summary(void)
+{
+	return &exec_summary_regs.pt_regs;
+}
+
+#else /* !CONFIG_ARCH_HAS_PTREGS_AUXILIARY */
+
 static struct pt_regs exec_summary_regs;
 
+static void save_exec_summary(struct pt_regs *regs)
+{
+	exec_summary_regs = *regs;
+}
+
+static struct pt_regs *retrieve_exec_summary(void)
+{
+	return &exec_summary_regs;
+}
+
+#endif /* CONFIG_ARCH_HAS_PTREGS_AUXILIARY */
+
 bool noinstr in_task_stack(unsigned long *stack, struct task_struct *task,
 			   struct stack_info *info)
 {
@@ -363,7 +391,7 @@ void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
 	oops_exit();
 
 	/* Executive summary in case the oops scrolled away */
-	__show_regs(&exec_summary_regs, SHOW_REGS_ALL, KERN_DEFAULT);
+	__show_regs(retrieve_exec_summary(), SHOW_REGS_ALL, KERN_DEFAULT);
 
 	if (!signr)
 		return;
@@ -390,7 +418,7 @@ static void __die_header(const char *str, struct pt_regs *regs, long err)
 
 	/* Save the regs of the first oops for the executive summary later. */
 	if (!die_counter)
-		exec_summary_regs = *regs;
+		save_exec_summary(regs);
 
 	if (IS_ENABLED(CONFIG_PREEMPTION))
 		pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5cfa1f8c8465..cd8f362a83c4 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -140,6 +140,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode,
 
 	if (cpu_feature_enabled(X86_FEATURE_OSPKE))
 		printk("%sPKRU: %08x\n", log_lvl, read_pkru());
+	pks_show_regs(regs, log_lvl);
 }
 
 void release_thread(struct task_struct *dead_task)
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index dd02e76d0359..a993c9b23815 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -374,6 +374,17 @@ void pks_restore_pt_regs(struct pt_regs *regs)
 	pks_write_pkrs(current->thread.pkrs);
 }
 
+void pks_show_regs(struct pt_regs *regs, const char *log_lvl)
+{
+	struct pt_regs_auxiliary *aux_pt_regs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	aux_pt_regs = &to_extended_pt_regs(regs)->aux;
+	printk("%sPKRS: 0x%x\n", log_lvl, aux_pt_regs->pkrs);
+}
+
 /*
  * PKS is independent of PKU and either or both may be supported on a CPU.
  *
-- 
2.35.1
From: ira.weiny@intel.com
Subject: [PATCH V10 23/44] mm/pkeys: Introduce pks_update_exception()
Date: Tue, 19 Apr 2022 10:06:28 -0700
Message-Id: <20220419170649.1022246-24-ira.weiny@intel.com>

From: Ira Weiny

Some PKS use cases will want to catch permission violations with the
fault callback mechanism and optionally allow the access.

The pks_set_*() calls update the protections of the currently running
context only; they cannot change the protections of a thread which has
been interrupted. Therefore, updating a thread from within an exception
requires a different method.

Introduce pks_update_exception(), which updates the faulted thread's
protections in addition to those of the current context.

Add documentation.

Signed-off-by: Ira Weiny
---
Changes for V9
	Add preemption disable around pkrs per cpu cache
	Update commit message
	Change pkey type to u8
	s/pks_saved_pkrs/pkrs

Changes for V8
	Remove the concept of abandoning a pkey in favor of using the
		custom fault handler via this new pks_update_exception()
		call
	Without an abandon call there is no need for an abandon mask on
		sched in, new thread creation, or within exceptions...
	This now lets all invalid accesses fault
	Ensure that all entry points into the pks code have feature checks...
	Place abandon fault check before the test callback to ensure
		testing does not detect the double fault of the abandon
		code and flag it incorrectly as a fault.
	Change return type of pks_handle_abandoned_pkeys() to bool
---
 Documentation/core-api/protection-keys.rst |  3 ++
 arch/x86/mm/pkeys.c                        | 58 +++++++++++++++++++---
 include/linux/pks.h                        |  5 ++
 3 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index f309cecc3915..c5f0f5d39929 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -149,6 +149,9 @@ Changing permissions of individual keys
 .. kernel-doc:: include/linux/pks.h
         :identifiers: pks_set_readwrite pks_set_noaccess
 
+.. kernel-doc:: arch/x86/mm/pkeys.c
+        :identifiers: pks_update_exception
+
 Overriding Default Fault Behavior
 ---------------------------------
 
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index a993c9b23815..975ed206d957 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -405,6 +405,18 @@ void pks_setup(void)
 	cr4_set_bits(X86_CR4_PKS);
 }
 
+static void __pks_update_protection(u8 pkey, u8 protection)
+{
+	u32 pkrs;
+
+	pkrs = current->thread.pkrs;
+	current->thread.pkrs = pkey_update_pkval(pkrs, pkey, protection);
+
+	preempt_disable();
+	pks_write_pkrs(current->thread.pkrs);
+	preempt_enable();
+}
+
 /*
  * Do not call this directly, see pks_set*().
  *
@@ -418,21 +430,51 @@ void pks_setup(void)
  */
 void pks_update_protection(u8 pkey, u8 protection)
 {
-	u32 pkrs;
-
 	if (!cpu_feature_enabled(X86_FEATURE_PKS))
 		return;
 
 	if (WARN_ON_ONCE(pkey >= PKS_KEY_MAX))
 		return;
 
-	pkrs = current->thread.pkrs;
-	current->thread.pkrs = pkey_update_pkval(pkrs, pkey,
-						 protection);
-	preempt_disable();
-	pks_write_pkrs(current->thread.pkrs);
-	preempt_enable();
+	__pks_update_protection(pkey, protection);
 }
 EXPORT_SYMBOL_GPL(pks_update_protection);
 
+/**
+ * pks_update_exception() - Update the protections of a faulted thread
+ *
+ * @regs: Faulting thread registers
+ * @pkey: pkey to update
+ * @protection: protection bits to use.
+ *
+ * CONTEXT: Exception
+ *
+ * pks_update_exception() updates the faulted thread's protections in addition
+ * to the protections within the exception.
+ *
+ * This is useful because the pks_set_*() functions will not work to change the
+ * protections of a thread which has been interrupted. Only the current
+ * context is updated by those functions. Therefore, if a PKS fault callback
+ * wants to update the faulted thread's protections it must call
+ * pks_update_exception().
+ */
+void pks_update_exception(struct pt_regs *regs, u8 pkey, u8 protection)
+{
+	struct pt_regs_extended *ept_regs;
+	u32 old;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	if (WARN_ON_ONCE(pkey >= PKS_KEY_MAX))
+		return;
+
+	__pks_update_protection(pkey, protection);
+
+	ept_regs = to_extended_pt_regs(regs);
+	old = ept_regs->aux.pkrs;
+	ept_regs->aux.pkrs = pkey_update_pkval(old, pkey, protection);
+}
+EXPORT_SYMBOL_GPL(pks_update_exception);
+
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
diff --git a/include/linux/pks.h b/include/linux/pks.h
index d0d8bf1aaa1d..2ea5fb57f2dc 100644
--- a/include/linux/pks.h
+++ b/include/linux/pks.h
@@ -9,6 +9,7 @@
 #include
 
 void pks_update_protection(u8 pkey, u8 protection);
+void pks_update_exception(struct pt_regs *regs, u8 pkey, u8 protection);
 
 /**
  * pks_set_noaccess() - Disable all access to the domain
@@ -41,6 +42,10 @@ typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long address,
 
 static inline void pks_set_noaccess(u8 pkey) {}
 static inline void pks_set_readwrite(u8 pkey) {}
+static inline void pks_update_exception(struct pt_regs *regs,
+					u8 pkey,
+					u8 protection)
+{ }
 
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
-- 
2.35.1
From: ira.weiny@intel.com
Subject: [PATCH V10 24/44] mm/pkeys: Add pks_available()
Date: Tue, 19 Apr 2022 10:06:29 -0700
Message-Id: <20220419170649.1022246-25-ira.weiny@intel.com>

From: Ira Weiny

If PKS is configured within the kernel but the CPU does not support
PKS, the PKS calls remain safe to execute, although they provide no
protection. However, incurring the overhead of these calls on CPUs
which don't support PKS is inefficient and best avoided.

Define pks_available() to allow users to check whether PKS is enabled
on the current system.

The implementation of pks_available() is placed in the asm headers but
is exposed directly via linux/pks.h, which allows consumers outside of
the architecture code to call cpu_feature_enabled() inline.

Signed-off-by: Ira Weiny
---
Changes for V9
	Driven by a request by Dan Williams to make this static inline
	Place this in pks.h to avoid header conflicts while allowing for
		an optimized call to cpu_feature_enabled()

Changes for V8
	s/pks_enabled/pks_available
---
 Documentation/core-api/protection-keys.rst |  3 +++
 arch/x86/include/asm/pks.h                 | 12 ++++++++++++
 include/linux/pks.h                        |  8 ++++++++
 3 files changed, 23 insertions(+)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index c5f0f5d39929..47bcb38fff4f 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -152,6 +152,9 @@ Changing permissions of individual keys
 .. kernel-doc:: arch/x86/mm/pkeys.c
         :identifiers: pks_update_exception
 
+.. kernel-doc:: arch/x86/include/asm/pks.h
+        :identifiers: pks_available
+
 Overriding Default Fault Behavior
 ---------------------------------
 
diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index de67d5b5a2af..cab42aadea07 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -2,8 +2,20 @@
 #ifndef _ASM_X86_PKS_H
 #define _ASM_X86_PKS_H
 
+#include
+
 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
 
+/**
+ * pks_available() - Is PKS available on this system
+ *
+ * Return if PKS is currently supported and enabled on this system.
+ */
+static inline bool pks_available(void)
+{
+	return cpu_feature_enabled(X86_FEATURE_PKS);
+}
+
 void pks_setup(void);
 void x86_pkrs_load(struct thread_struct *thread);
 void pks_save_pt_regs(struct pt_regs *regs);
diff --git a/include/linux/pks.h b/include/linux/pks.h
index 2ea5fb57f2dc..151a3fda9de4 100644
--- a/include/linux/pks.h
+++ b/include/linux/pks.h
@@ -8,6 +8,9 @@
 
 #include
 
+#include
+
+bool pks_available(void);
 void pks_update_protection(u8 pkey, u8 protection);
 void pks_update_exception(struct pt_regs *regs, u8 pkey, u8 protection);
 
@@ -40,6 +43,11 @@ typedef bool (*pks_key_callback)(struct pt_regs *regs, unsigned long address,
 
 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
+static inline bool pks_available(void)
+{
+	return false;
+}
+
 static inline void pks_set_noaccess(u8 pkey) {}
 static inline void pks_set_readwrite(u8 pkey) {}
 static inline void pks_update_exception(struct pt_regs *regs,
-- 
2.35.1
from lindbergh.monkeyblade.net ([23.128.96.19]:59842 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1355658AbiDSRJ5 (ORCPT ); Tue, 19 Apr 2022 13:09:57 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33E33DED9 for ; Tue, 19 Apr 2022 10:07:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650388034; x=1681924034; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ql2r7AqKnHApM/EHR/Pff0tcNQI0JekmA3IUxRweYz0=; b=JWymLZw2iN/fu/UnU9KDK1Ev49ZQ0YD22Nx4Yy9+nFJFLuZaeNU7ikFq LTIyZ+rFp5K5dBfQaUDv5QM8nyEtQ/qr/qelwVpsIFsWlyxLvbq+zvBDV 7D7KFX1dZvEZJ4qPHR1yY1KzIZuAIyg+ichRv4OgYOfP9v1Shx6XDyWmr VbT1MjJiyApB36ZQjKYO7l9qc6tOX+2Q/tCDTT2Q8iidwcZautNB8hDzw Jqng7ncuIhEC0d8apjUz+6DSRGc5L9J1nYGEGwyn3T0L8eQ9++/YkTLOU CGGHFJwNJe2zV/0b61PDaKl01dS15SH2rY4GxlTONvQ5tAowbUL6YWgTK Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10322"; a="288918165" X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="288918165" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:13 -0700 X-IronPort-AV: E=Sophos;i="5.90,273,1643702400"; d="scan'208";a="576192246" Received: from ajacosta-mobl1.amr.corp.intel.com (HELO localhost) ([10.212.11.4]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Apr 2022 10:07:11 -0700 From: ira.weiny@intel.com To: Dave Hansen , "H. 
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 25/44] memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION Date: Tue, 19 Apr 2022 10:06:30 -0700 Message-Id: <20220419170649.1022246-26-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny The persistent memory (PMEM) driver uses the memremap_pages facility to provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM capacity may be orders of magnitude higher capacity than System RAM it presents a large vulnerability surface to stray writes. Unlike stray writes to System RAM, which may result in a crash or other undesirable behavior, stray writes to PMEM additionally are more likely to result in permanent data loss. Reboot is not a remediation for PMEM corruption like it is for System RAM. Given that PMEM access from the kernel is limited to a constrained set of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX page), it is amenable to supervisor pkey protection. Add a Kconfig option to configure additional devmap protections using PKS. Only PMEM which is advertised to the memory subsystem needs this protection. Therefore, the feature depends on NVDIMM_PFN. 
Signed-off-by: Ira Weiny

---
Changes for V10
	Rebased to latest

Changes for V9
	Change this to enable arch pks consumer for mutual exclusion with
	testing all pkeys
	From Dan Williams
		Default to no
		Clean up commit message

Changes for V8
	Split this out from
	[PATCH V7 13/18] memremap_pages: Add access protection via supervisor Protection Keys (PKS)
---
 mm/Kconfig | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 29c272974aa9..fe1752e6e76c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -797,6 +797,24 @@ config ZONE_DEVICE
 
 	  If FS_DAX is enabled, then say Y.
 
+config DEVMAP_ACCESS_PROTECTION
+	bool "Access protection for memremap_pages()"
+	depends on NVDIMM_PFN
+	depends on ARCH_HAS_SUPERVISOR_PKEYS
+	select ARCH_ENABLE_PKS_CONSUMER
+	default n
+
+	help
+	  Enable extra protections on device memory.  This protects against
+	  unintended access to devices such as a stray writes.  This feature is
+	  particularly useful to protect against corruption of persistent
+	  memory.
+
+	  This depends on architecture support of supervisor PKeys and has no
+	  overhead if the architecture does not support them.
+
+	  If you have persistent memory say 'Y'.
+
 #
 # Helpers to mirror range of the CPU page tables of a process into device page
 # tables.
-- 
2.35.1

From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, "Shankar, Ravi V", linux-kernel@vger.kernel.org
Subject: [PATCH V10 26/44] memremap_pages: Introduce pgmap_protection_available()
Date: Tue, 19 Apr 2022 10:06:31 -0700
Message-Id: <20220419170649.1022246-27-ira.weiny@intel.com>

From: Ira Weiny

PMEM will flag additional dev_pagemap protection through (struct
dev_pagemap)->flags. However, it is more efficient to know if that
protection is available prior to requesting it and failing the mapping.

Define pgmap_protection_available() to check if protection is available
prior to being requested. The name of pgmap_protection_available() was
specifically chosen to isolate the implementation of the protection from
higher level users.

Signed-off-by: Ira Weiny

---
Changes for V10
	Move code from mm.h to memremap.h
	Upstream separated memremap.h functionality from mm.h
	dc90f0846df4 ("mm: don't include in ")

Changes for V9
	Clean up commit message
	From Dan Williams
		make call stack static inline throughout this call and
		pks_available() such that callers call cpu_feature_enabled()
		directly

Changes for V8
	Split this out to its own patch.
	s/pgmap_protection_enabled/pgmap_protection_available
---
 include/linux/memremap.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 8af304f6b504..7980d0db8617 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 
 struct resource;
 struct device;
@@ -214,4 +215,20 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
 		percpu_ref_put(&pgmap->ref);
 }
 
+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+
+static inline bool pgmap_protection_available(void)
+{
+	return pks_available();
+}
+
+#else
+
+static inline bool pgmap_protection_available(void)
+{
+	return false;
+}
+
+#endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */
+
 #endif /* _LINUX_MEMREMAP_H_ */
-- 
2.35.1

From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, "Shankar, Ravi V", linux-kernel@vger.kernel.org
Subject: [PATCH V10 27/44] memremap_pages: Introduce a PGMAP_PROTECTION flag
Date: Tue, 19 Apr 2022 10:06:32 -0700
Message-Id: <20220419170649.1022246-28-ira.weiny@intel.com>

From: Ira Weiny

The persistent memory (PMEM) driver uses the memremap_pages facility to
provide 'struct page' metadata (vmemmap) for PMEM. Given that PMEM
capacity may be orders of magnitude higher than that of System RAM, it
presents a large vulnerability surface to stray writes.
Unlike stray writes to System RAM, which may result in a crash or other
undesirable behavior, stray writes to PMEM are additionally more likely
to result in permanent data loss. Reboot is not a remediation for PMEM
corruption like it is for System RAM.

Given that PMEM access from the kernel is limited to a constrained set
of locations (PMEM driver, Filesystem-DAX, and direct-I/O to a DAX
page), it is amenable to supervisor pkey protection.

Some systems which have configured DEVMAP_ACCESS_PROTECTION may not have
PMEM installed, or the PMEM may not be mapped into the direct map. In
addition, some callers of memremap_pages() will not want the mapped
pages protected.

Define a new PGMAP flag to distinguish page maps which are protected.
Use this flag to enable runtime protection support. A static key is
used to optimize the runtime support.

Specifying this flag on a system which can't support protections will
fail. Callers are expected to check if protections are supported via
pgmap_protection_available(). An alternative that was considered was to
have callers specify the flag and then check whether the returned
dev_pagemap object was protected. This was judged less efficient than
a direct check beforehand.
Signed-off-by: Ira Weiny

---
Changes for V9
	Clean up commit message

Changes for V8
	Split this out into its own patch
---
 include/linux/memremap.h |  1 +
 mm/memremap.c            | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 7980d0db8617..02c415b1b278 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -83,6 +83,7 @@ struct dev_pagemap_ops {
 };
 
 #define PGMAP_ALTMAP_VALID	(1 << 0)
+#define PGMAP_PROTECTION	(1 << 1)
 
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
diff --git a/mm/memremap.c b/mm/memremap.c
index af0223605e69..4dfb3025cee3 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -62,6 +62,37 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
 }
 #endif /* CONFIG_FS_DAX */
 
+#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
+
+/*
+ * Note; all devices which have asked for protections share the same key.  The
+ * key may, or may not, have been provided by the core.  If not, protection
+ * will be disabled.  The key acquisition is attempted when the first ZONE
+ * DEVICE requests it and freed when all zones have been unmapped.
+ *
+ * Also this must be EXPORT_SYMBOL rather than EXPORT_SYMBOL_GPL because it is
+ * intended to be used in the kmap API.
+ */
+DEFINE_STATIC_KEY_FALSE(dev_pgmap_protection_static_key);
+EXPORT_SYMBOL(dev_pgmap_protection_static_key);
+
+static void devmap_protection_enable(void)
+{
+	static_branch_inc(&dev_pgmap_protection_static_key);
+}
+
+static void devmap_protection_disable(void)
+{
+	static_branch_dec(&dev_pgmap_protection_static_key);
+}
+
+#else /* !CONFIG_DEVMAP_ACCESS_PROTECTION */
+
+static void devmap_protection_enable(void) { }
+static void devmap_protection_disable(void) { }
+
+#endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */
+
 static void pgmap_array_delete(struct range *range)
 {
 	xa_store_range(&pgmap_array, PHYS_PFN(range->start), PHYS_PFN(range->end),
@@ -148,6 +179,9 @@ void memunmap_pages(struct dev_pagemap *pgmap)
 
 	WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n");
 	devmap_managed_enable_put(pgmap);
+
+	if (pgmap->flags & PGMAP_PROTECTION)
+		devmap_protection_disable();
 }
 EXPORT_SYMBOL_GPL(memunmap_pages);
 
@@ -295,6 +329,12 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 	if (WARN_ONCE(!nr_range, "nr_range must be specified\n"))
 		return ERR_PTR(-EINVAL);
 
+	if (pgmap->flags & PGMAP_PROTECTION) {
+		if (!pgmap_protection_available())
+			return ERR_PTR(-EINVAL);
+		devmap_protection_enable();
+	}
+
 	switch (pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
 		if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
-- 
2.35.1

From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, "Shankar, Ravi V", linux-kernel@vger.kernel.org
Subject: [PATCH V10 28/44] memremap_pages: Introduce devmap_protected()
Date: Tue, 19 Apr 2022 10:06:33 -0700
Message-Id: <20220419170649.1022246-29-ira.weiny@intel.com>

From: Ira Weiny

Consumers of protected dev_pagemaps can check the PGMAP_PROTECTION flag
to see if the devmap is protected. However, most contexts have a struct
page rather than the pagemap structure itself.

Define devmap_protected() to determine if a page is part of a
dev_pagemap mapping and if the page is protected by additional
protections.

Signed-off-by: Ira Weiny

---
Changes for V10
	Move code from mm.h to memremap.h
	Upstream separated memremap.h functionality from mm.h
	dc90f0846df4 ("mm: don't include in ")
---
 include/linux/memremap.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 02c415b1b278..6325f00096ec 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -223,6 +223,23 @@ static inline bool pgmap_protection_available(void)
 	return pks_available();
 }
 
+DECLARE_STATIC_KEY_FALSE(dev_pgmap_protection_static_key);
+
+/*
+ * devmap_protected() requires a reference on the page to ensure there is no
+ * races with dev_pagemap tear down.
+ */
+static inline bool devmap_protected(struct page *page)
+{
+	if (!static_branch_unlikely(&dev_pgmap_protection_static_key))
+		return false;
+	if (!is_zone_device_page(page))
+		return false;
+	if (page->pgmap->flags & PGMAP_PROTECTION)
+		return true;
+	return false;
+}
+
 #else
 
 static inline bool pgmap_protection_available(void)
-- 
2.35.1

From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, "Shankar, Ravi V", linux-kernel@vger.kernel.org
Subject: [PATCH V10 29/44] memremap_pages: Reserve a PKS pkey for eventual use by PMEM
Date: Tue, 19 Apr 2022 10:06:34 -0700
Message-Id: <20220419170649.1022246-30-ira.weiny@intel.com>

From: Ira Weiny

Reserve a pkey for use by the memmap facility and set the default
protections to Access Disabled.

Signed-off-by: Ira Weiny

---
Changes for V10
	This patch now reserves a key before the PKS testing does, so
	adjust for this being the only key at this point in the series.
Changes for V9
	Adjust for new key allocation
	From Dave Hansen
		use pkey
---
 include/linux/pks-keys.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h
index c914afecb2d3..4e63c8061e55 100644
--- a/include/linux/pks-keys.h
+++ b/include/linux/pks-keys.h
@@ -60,17 +60,22 @@
 
 /* PKS_KEY_DEFAULT must be 0 */
 #define PKS_KEY_DEFAULT		0
-#define PKS_KEY_MAX		PKS_NEW_KEY(PKS_KEY_DEFAULT, 1)
+#define PKS_KEY_PGMAP_PROTECTION	\
+		PKS_NEW_KEY(PKS_KEY_DEFAULT, CONFIG_DEVMAP_ACCESS_PROTECTION)
+#define PKS_KEY_MAX		PKS_NEW_KEY(PKS_KEY_PGMAP_PROTECTION, 1)
 
 /* PKS_KEY_DEFAULT_INIT must be RW */
 #define PKS_KEY_DEFAULT_INIT	PKS_DECLARE_INIT_VALUE(PKS_KEY_DEFAULT, RW, 1)
+#define PKS_KEY_PGMAP_INIT	PKS_DECLARE_INIT_VALUE(PKS_KEY_PGMAP_PROTECTION, \
+					AD, CONFIG_DEVMAP_ACCESS_PROTECTION)
 
 #define PKS_ALL_AD_MASK \
 	GENMASK(PKS_NUM_PKEYS * PKR_BITS_PER_PKEY, \
 		PKS_KEY_MAX * PKR_BITS_PER_PKEY)
 
 #define PKS_INIT_VALUE ((PKS_ALL_AD & PKS_ALL_AD_MASK) | \
-			PKS_KEY_DEFAULT_INIT \
+			PKS_KEY_DEFAULT_INIT | \
+			PKS_KEY_PGMAP_INIT \
 			)
 
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
-- 
2.35.1

From: ira.weiny@intel.com
To: Dave Hansen, "H. Peter Anvin", Dan Williams
Cc: Ira Weiny, Fenghua Yu, Rick Edgecombe, "Shankar, Ravi V", linux-kernel@vger.kernel.org
Subject: [PATCH V10 30/44] memremap_pages: Set PKS pkey in PTEs if requested
Date: Tue, 19 Apr 2022 10:06:35 -0700
Message-Id: <20220419170649.1022246-31-ira.weiny@intel.com>

From: Ira Weiny

When a devmap caller requests protections, the dev_pagemap PTEs need to
have a pkey set.
When PGMAP_PROTECTIONS is requested add the pkey to the page protections. Signed-off-by: Ira Weiny --- Changes for V9 From Dave Hansen use pkey --- mm/memremap.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/mm/memremap.c b/mm/memremap.c index 4dfb3025cee3..215ab9c51917 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -81,6 +81,14 @@ static void devmap_protection_enable(void) static_branch_inc(&dev_pgmap_protection_static_key); } =20 +static pgprot_t devmap_protection_adjust_pgprot(pgprot_t prot) +{ + pgprotval_t val; + + val =3D pgprot_val(prot); + return __pgprot(val | _PAGE_PKEY(PKS_KEY_PGMAP_PROTECTION)); +} + static void devmap_protection_disable(void) { static_branch_dec(&dev_pgmap_protection_static_key); @@ -91,6 +99,10 @@ static void devmap_protection_disable(void) static void devmap_protection_enable(void) { } static void devmap_protection_disable(void) { } =20 +static pgprot_t devmap_protection_adjust_pgprot(pgprot_t prot) +{ + return prot; +} #endif /* CONFIG_DEVMAP_ACCESS_PROTECTION */ =20 static void pgmap_array_delete(struct range *range) @@ -333,6 +345,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) if (!pgmap_protection_available()) return ERR_PTR(-EINVAL); devmap_protection_enable(); + params.pgprot =3D devmap_protection_adjust_pgprot(params.pgprot); } =20 switch (pgmap->type) { --=20 2.35.1 From nobody Thu Jun 18 15:45:34 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24A73C433F5 for ; Tue, 19 Apr 2022 17:08:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355897AbiDSRLW (ORCPT ); Tue, 19 Apr 2022 13:11:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343836AbiDSRKA (ORCPT 
From: ira.weiny@intel.com To: Dave Hansen , "H.
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 31/44] memremap_pages: Define pgmap_set_{readwrite|noaccess}() calls Date: Tue, 19 Apr 2022 10:06:36 -0700 Message-Id: <20220419170649.1022246-32-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny A thread that wants to access memory protected by PGMAP protections must first enable access, and then disable access when it is done. Introduce pgmap_set_{readwrite|noaccess}() for this purpose. The two calls are destined to be used by the kmap API and take a struct page for convenience. They determine if the page is protected and, if so, perform the requested operation. Toggling between Read/Write and No Access was chosen as it fits well with the accessibility of a kmap'ed page. Discussions did occur regarding a finer-grained Read Only mapping, but that can be added at a later date. In addition, two lower-level functions are exported. They take the dev_pagemap object directly for internal consumers who have knowledge of the dev_pagemap. All changes in the protections must be made through the above calls. They abstract the protection implementation (currently the PKS API) from upper-layer consumers. The calls are made nestable by the use of a per-task reference count. This ensures that the first call to re-enable protection does not 'break' the last access of the device memory. Expansion of the task struct is unavoidable due to the desire to maintain kmap_local_page() as non-atomic and migratable. The only other option considered for tracking the reference count was a per-CPU variable.
However, doing so would make kmap_local_page() equivalent to kmap_atomic(), which is undesirable. Access to device memory during exceptions (#PF) is expected only from user faults. Therefore, there is no need to maintain the reference count during exceptions. NOTE: It is not anticipated that any code path will directly nest these calls. For this reason, multiple reviewers, including Dan and Thomas, asked why this reference counting was needed at this level rather than in a higher-level call such as kmap_local_page(). The reason is that pgmap_set_readwrite() can nest with kmap_{atomic,local_page}(). Therefore, this reference counting is pushed to the lower level to ensure that any combination of calls is nestable. Signed-off-by: Ira Weiny --- Changes for V10 Move code from mm.h to memremap.h Upstream separated memremap.h functionality from mm.h dc90f0846df4 ("mm: don't include in ") Changes for V9 From Dan Williams Update the commit message with details on why the thread struct needs to be expanded. Following on Dave Hansen's suggestion for pks_mk s/pgmap_mk_*/pgmap_set_*/ Changes for V8 Split these functions into their own patch. This helps to clarify the commit message and usage.
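The nesting rule above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: struct model_task, model_set_readwrite() and model_set_noaccess() are hypothetical names, and the bool key_writable stands in for the PKS key state that pks_set_readwrite() and pks_set_noaccess() change for the current thread. Only the outermost call touches the key, so an inner release cannot revoke access that an outer caller still depends on.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the per-task nesting count.
 * key_writable stands in for the PKS key permission; this is NOT
 * the kernel's task_struct or PKRS handling.
 */
struct model_task {
	uint32_t pgmap_prot_count; /* nesting depth, owned by one thread */
	bool key_writable;         /* stand-in for the PKS key state */
};

/* Mirrors __pgmap_set_readwrite(): only the outermost call opens the key. */
static void model_set_readwrite(struct model_task *t)
{
	if (!t->pgmap_prot_count++)
		t->key_writable = true;
}

/* Mirrors __pgmap_set_noaccess(): only the matching outermost call closes it. */
static void model_set_noaccess(struct model_task *t)
{
	if (!--t->pgmap_prot_count)
		t->key_writable = false;
}
```

For example, a pgmap_set_readwrite() nested inside a kmap_local_page() mapping raises the count to 2; the inner release drops it to 1, and the key stays writable until the outer unmap.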
--- include/linux/memremap.h | 35 +++++++++++++++++++++++++++++++++++ include/linux/sched.h | 7 +++++++ init/init_task.c | 3 +++ mm/memremap.c | 14 ++++++++++++++ 4 files changed, 59 insertions(+) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 6325f00096ec..1012c6c4c664 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -240,8 +240,43 @@ static inline bool devmap_protected(struct page *page) return false; } +void __pgmap_set_readwrite(struct dev_pagemap *pgmap); +void __pgmap_set_noaccess(struct dev_pagemap *pgmap); + +static inline bool pgmap_check_pgmap_prot(struct page *page) +{ + if (!devmap_protected(page)) + return false; + + /* + * There is no known use case to change permissions in an irq for pgmap + * pages + */ + lockdep_assert_in_irq(); + return true; +} + +static inline void pgmap_set_readwrite(struct page *page) +{ + if (!pgmap_check_pgmap_prot(page)) + return; + __pgmap_set_readwrite(page->pgmap); +} + +static inline void pgmap_set_noaccess(struct page *page) +{ + if (!pgmap_check_pgmap_prot(page)) + return; + __pgmap_set_noaccess(page->pgmap); +} + #else +static inline void __pgmap_set_readwrite(struct dev_pagemap *pgmap) { } +static inline void __pgmap_set_noaccess(struct dev_pagemap *pgmap) { } +static inline void pgmap_set_readwrite(struct page *page) { } +static inline void pgmap_set_noaccess(struct page *page) { } + static inline bool pgmap_protection_available(void) { return false; diff --git a/include/linux/sched.h b/include/linux/sched.h index d5e3c00b74e1..7da0d2a0ac74 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1498,6 +1498,13 @@ struct task_struct { struct callback_head l1d_flush_kill; #endif +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + /* + * NOTE: pgmap_prot_count is modified within a single thread of + * execution. So it does not need to be atomic_t.
+ */ + u32 pgmap_prot_count; +#endif /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. diff --git a/init/init_task.c b/init/init_task.c index 73cc8f03511a..948b32cf8139 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -209,6 +209,9 @@ struct task_struct init_task #ifdef CONFIG_SECCOMP_FILTER .seccomp = { .filter_count = ATOMIC_INIT(0) }, #endif +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + .pgmap_prot_count = 0, +#endif }; EXPORT_SYMBOL(init_task); diff --git a/mm/memremap.c b/mm/memremap.c index 215ab9c51917..491bb49255ae 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -94,6 +94,20 @@ static void devmap_protection_disable(void) static_branch_dec(&dev_pgmap_protection_static_key); } +void __pgmap_set_readwrite(struct dev_pagemap *pgmap) +{ + if (!current->pgmap_prot_count++) + pks_set_readwrite(PKS_KEY_PGMAP_PROTECTION); +} +EXPORT_SYMBOL_GPL(__pgmap_set_readwrite); + +void __pgmap_set_noaccess(struct dev_pagemap *pgmap) +{ + if (!--current->pgmap_prot_count) + pks_set_noaccess(PKS_KEY_PGMAP_PROTECTION); +} +EXPORT_SYMBOL_GPL(__pgmap_set_noaccess); + #else /* !CONFIG_DEVMAP_ACCESS_PROTECTION */ static void devmap_protection_enable(void) { } -- 2.35.1
From: ira.weiny@intel.com To: Dave Hansen , "H.
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 32/44] memremap_pages: Add memremap.pks_fault_mode Date: Tue, 19 Apr 2022 10:06:37 -0700 Message-Id: <20220419170649.1022246-33-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny When PKS protections for PMEM are enabled, the kernel may capture stray writes, or it may capture false-positive access violations. An example of a false-positive access violation is a code path that neglects to call kmap_{atomic,local_page}, but is otherwise a valid access. In the false-positive scenario there is little risk to data integrity, but the kernel still needs to decide whether to report the access violation and continue, or to treat the violation as fatal. That policy decision is captured in a new pks_fault_mode kernel parameter. Two modes are available: 'relaxed' (default) -- WARN_ONCE, remove the protections, and continue to operate. 'strict' -- Stop kernel execution via fault. This is the most protective of PMEM but may be undesirable in some configurations. NOTE: There was some debate about whether a third mode called 'silent' should be available. 'silent' would be the same as 'relaxed' but would not print any output. While 'silent' is nice for admins who want to reduce console/log output, it would result in less motivation to fix invalid accesses to the protected pmem pages. Therefore, 'silent' is left out. NOTE: The __param_check macro requires a type to correctly verify the values passed as the module parameter. Therefore, a typedef is made of pks_fault_modes, and the checkpatch warning regarding new typedefs is ignored.
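The fault policy described above can be sketched as a user-space model. All model_* names are hypothetical; model_streq() imitates sysfs_streq() by ignoring a single trailing newline (as left by writing a value into a sysfs file), -1 stands in for -EINVAL, and setting *access_granted stands in for the pks_update_exception() call that relaxes the key. WARN_ONCE() output is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PKS_MODE_STRICT / PKS_MODE_RELAXED. */
enum model_fault_mode { MODEL_STRICT = 0, MODEL_RELAXED = 1 };

/* String equality that ignores one trailing '\n' on either side. */
static bool model_streq(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return true;
	if (*a == '\n' && a[1] == '\0' && *b == '\0')
		return true;
	if (*b == '\n' && b[1] == '\0' && *a == '\0')
		return true;
	return false;
}

/* Parses the parameter; returns 0 on success, -1 (for -EINVAL) otherwise. */
static int model_set_mode(enum model_fault_mode *mode, const char *val)
{
	if (model_streq(val, "relaxed"))
		*mode = MODEL_RELAXED;
	else if (model_streq(val, "strict"))
		*mode = MODEL_STRICT;
	else
		return -1; /* unknown value: mode is left unchanged */
	return 0;
}

/*
 * Fault policy: false means "not handled, let the fault handler oops";
 * true means "handled, continue with access granted".
 */
static bool model_fault_callback(enum model_fault_mode mode, bool *access_granted)
{
	if (mode == MODEL_STRICT)
		return false;
	*access_granted = true; /* stand-in for pks_update_exception(..., PKEY_READ_WRITE) */
	return true;
}
```

Rejecting unknown strings without touching the current mode matches the -EINVAL path in the proposed param_set_pks_fault_mode().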
Signed-off-by: Ira Weiny --- Changes for V10 Move code from mm.h to memremap.h Upstream separated memremap.h functionality from mm.h dc90f0846df4 ("mm: don't include in ") Adjust pkey allocation around test code being moved to the end of the series. Changes for V9 From Dan Williams Clarify commit message Remove code comment regarding checkpatch From Rick Edgecombe Remove unnecessary initialization Changes for V8 Use pks_update_exception() instead of abandoning the pkey. Split out pgmap_protection_flag_invalid() into a separate patch for clarity. From Rick Edgecombe Fix sysfs_streq() checks From Randy Dunlap Fix Documentation closing parens Changes for V7 Leverage Rick Edgecombe's fault callback infrastructure to relax invalid uses and prevent crashes From Dan Williams Use sysfs_* calls for parameter Make pgmap_disable_protection inline Remove pfn from warn output Remove silent parameter option --- .../admin-guide/kernel-parameters.txt | 12 ++++ arch/x86/mm/pkeys.c | 7 +- include/linux/memremap.h | 3 + mm/memremap.c | 65 +++++++++++++++++++ 4 files changed, 86 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 3f1cc5e317ed..a1ab60eba72a 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4229,6 +4229,18 @@ pirq= [SMP,APIC] Manual mp-table setup See Documentation/x86/i386/IO-APIC.rst. + memremap.pks_fault_mode= [X86] Control the behavior of page map + protection violations. + (depends on CONFIG_DEVMAP_ACCESS_PROTECTION) + + Format: { relaxed | strict } + + relaxed - Print a warning, disable the protection and + continue execution. + strict - Stop kernel execution via fault + + default: relaxed + plip= [PPT,NET] Parallel port network link Format: { parport | timid | 0 } See also Documentation/admin-guide/parport.rst.
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 975ed206d957..e9a8c67f6b66 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -8,6 +8,7 @@ #include /* PKEY_* */ #include #include +#include /* fault callback */ #include #include /* boot_cpu_has, ... */ @@ -243,7 +244,11 @@ static DEFINE_PER_CPU(u32, pkrs_cache); * #endif * }; */ -static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] = { 0 }; +static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] = { +#ifdef CONFIG_DEVMAP_ACCESS_PROTECTION + [PKS_KEY_PGMAP_PROTECTION] = pgmap_pks_fault_callback, +#endif +}; static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long address, bool write, u16 key) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 1012c6c4c664..47e0d102e194 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -270,6 +270,9 @@ static inline void pgmap_set_noaccess(struct page *page) __pgmap_set_noaccess(page->pgmap); } +bool pgmap_pks_fault_callback(struct pt_regs *regs, unsigned long address, + bool write); + #else static inline void __pgmap_set_readwrite(struct dev_pagemap *pgmap) { } diff --git a/mm/memremap.c b/mm/memremap.c index 491bb49255ae..d289ba304032 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -14,6 +14,8 @@ #include #include "internal.h" +#include + static DEFINE_XARRAY(pgmap_array); /* @@ -94,6 +96,69 @@ static void devmap_protection_disable(void) static_branch_dec(&dev_pgmap_protection_static_key); } +typedef enum { + PKS_MODE_STRICT = 0, + PKS_MODE_RELAXED = 1, +} pks_fault_modes; + +pks_fault_modes pks_fault_mode = PKS_MODE_RELAXED; + +static int param_set_pks_fault_mode(const char *val, const struct kernel_param *kp) +{ + int ret = -EINVAL; + + if (sysfs_streq(val, "relaxed")) { + pks_fault_mode = PKS_MODE_RELAXED; + ret = 0; + } else if (sysfs_streq(val, "strict")) { + pks_fault_mode = PKS_MODE_STRICT; + ret = 0; + } + + return ret; +} +
+static int param_get_pks_fault_mode(char *buffer, const struct kernel_param *kp) +{ + int ret; + + switch (pks_fault_mode) { + case PKS_MODE_STRICT: + ret = sysfs_emit(buffer, "strict\n"); + break; + case PKS_MODE_RELAXED: + ret = sysfs_emit(buffer, "relaxed\n"); + break; + default: + ret = sysfs_emit(buffer, "\n"); + break; + } + + return ret; +} + +static const struct kernel_param_ops param_ops_pks_fault_modes = { + .set = param_set_pks_fault_mode, + .get = param_get_pks_fault_mode, +}; + +#define param_check_pks_fault_modes(name, p) \ + __param_check(name, p, pks_fault_modes) +module_param(pks_fault_mode, pks_fault_modes, 0644); + +bool pgmap_pks_fault_callback(struct pt_regs *regs, unsigned long address, + bool write) +{ + /* In strict mode just let the fault handler oops */ + if (pks_fault_mode == PKS_MODE_STRICT) + return false; + + WARN_ONCE(1, "Page map protection being disabled"); + pks_update_exception(regs, PKS_KEY_PGMAP_PROTECTION, PKEY_READ_WRITE); + return true; +} +EXPORT_SYMBOL_GPL(pgmap_pks_fault_callback); + void __pgmap_set_readwrite(struct dev_pagemap *pgmap) { if (!current->pgmap_prot_count++) -- 2.35.1
From: ira.weiny@intel.com To: Dave Hansen , "H.
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 33/44] kmap: Make kmap work for devmap protected pages Date: Tue, 19 Apr 2022 10:06:38 -0700 Message-Id: <20220419170649.1022246-34-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny Today, kmap_{local_page,atomic}() handle granting access to HIGHMEM pages without the caller needing to know whether the page is HIGHMEM or not. Use that existing infrastructure to grant access to PGMAP (PKS) protected pages. kmap_{local_page,atomic}() both create thread-local mappings, so they work well with the thread-specific protections available within PKS. On the other hand, the kmap() call is not changed. kmap() allows a mapping to be shared with other threads, while PKS protections operate on a thread-local basis. For this reason, and because of the desire to move away from mappings like this, kmap() is left unsupported. This behavior is safe because neither of the two current DAX-capable filesystems (ext4 and xfs) performs such global mappings, and known device drivers that would handle devmap pages do not use kmap(). Any future filesystems that gain DAX support, or device drivers wanting to support devmap protected pages, will need to use kmap_local_page(). Note: HIGHMEM support is mutually exclusive with PGMAP protection. The rationale is mainly to reduce complexity, but also that direct-map exposure is already mitigated by default on HIGHMEM systems, because by definition HIGHMEM systems do not have large capacities of memory in the direct map.
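A minimal user-space model of the !HIGHMEM kmap_local_page()/kunmap_local() bracketing described above, assuming per-thread protection state. struct model_page, model_kmap_local_page(), model_kunmap_local(), model_kmap_to_page() and the model_pgmap_set_*() helpers are hypothetical stand-ins, not kernel APIs; the _Thread_local variables stand in for the per-thread PKRS value and the task's pgmap_prot_count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page; not struct page. */
struct model_page {
	bool devmap_protected; /* stand-in for devmap_protected(page) */
	char data[64];         /* stand-in for the direct-map contents */
};

/* Per-thread state, standing in for the PKRS value and pgmap_prot_count. */
static _Thread_local unsigned int model_prot_count;
static _Thread_local bool model_key_writable;

static void model_pgmap_set_readwrite(struct model_page *page)
{
	if (!page->devmap_protected)
		return; /* ordinary pages are left alone */
	if (!model_prot_count++)
		model_key_writable = true;
}

static void model_pgmap_set_noaccess(struct model_page *page)
{
	if (!page->devmap_protected)
		return;
	if (!--model_prot_count)
		model_key_writable = false;
}

/* Like kmap_to_page() in the !HIGHMEM case; only valid for addresses
 * returned by model_kmap_local_page(). */
static struct model_page *model_kmap_to_page(void *addr)
{
	return (struct model_page *)((char *)addr - offsetof(struct model_page, data));
}

/* !HIGHMEM kmap_local_page(): grant access, then return the direct-map address. */
static void *model_kmap_local_page(struct model_page *page)
{
	model_pgmap_set_readwrite(page);
	return page->data;
}

/* !HIGHMEM __kunmap_local(): revoke access for the page behind addr. */
static void model_kunmap_local(void *addr)
{
	model_pgmap_set_noaccess(model_kmap_to_page(addr));
}
```

Because the state is per thread, a mapping handed to another thread, as kmap() permits, would find that thread's key still closed. This is the mismatch cited above as the reason for leaving kmap() unchanged.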
Cc: Dan Williams Cc: Dave Hansen Signed-off-by: Ira Weiny --- Changes for V10 Include memremap.h because of upstream rework Changes for V9 From Dan Williams Update commit message Clarify why kmap is not 'compatible' with PKS Explain the HIGHMEM system exclusion more Remove pgmap_protection_flag_invalid() from kmap s/pks_mk*/pks_set*/ Changes for V8 Reword commit message --- include/linux/highmem-internal.h | 6 ++++++ mm/Kconfig | 1 + 2 files changed, 7 insertions(+) diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index a77be5630209..32ed07c2994b 100644 --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -151,6 +151,8 @@ static inline void totalhigh_pages_add(long count) #else /* CONFIG_HIGHMEM */ +#include + static inline struct page *kmap_to_page(void *addr) { return virt_to_page(addr); @@ -174,6 +176,7 @@ static inline void kunmap(struct page *page) static inline void *kmap_local_page(struct page *page) { + pgmap_set_readwrite(page); return page_address(page); } @@ -197,6 +200,7 @@ static inline void __kunmap_local(void *addr) #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); #endif + pgmap_set_noaccess(kmap_to_page(addr)); } static inline void *kmap_atomic(struct page *page) @@ -206,6 +210,7 @@ static inline void *kmap_atomic(struct page *page) else preempt_disable(); pagefault_disable(); + pgmap_set_readwrite(page); return page_address(page); } @@ -224,6 +229,7 @@ static inline void __kunmap_atomic(void *addr) #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); #endif + pgmap_set_noaccess(kmap_to_page(addr)); pagefault_enable(); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_enable(); diff --git a/mm/Kconfig b/mm/Kconfig index fe1752e6e76c..616baee3f62d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -800,6 +800,7 @@ config ZONE_DEVICE config DEVMAP_ACCESS_PROTECTION bool "Access protection for memremap_pages()" depends on NVDIMM_PFN + depends on !HIGHMEM depends
on ARCH_HAS_SUPERVISOR_PKEYS select ARCH_ENABLE_PKS_CONSUMER default n -- 2.35.1
From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 34/44] dax: Stray access protection for dax_direct_access() Date: Tue, 19 Apr 2022 10:06:39 -0700 Message-Id: <20220419170649.1022246-35-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny dax_direct_access() provides a way to obtain the direct-map address of PMEM. With the new devmap protections, use of this address needs to be bracketed by calls that grant and then revoke access to those pages. These calls only need to guard actual access to the memory; other uses of dax_direct_access() do not need these guards. Introduce two new calls, dax_set_readwrite() and dax_set_noaccess(). Bracket all uses of the address returned by dax_direct_access() with those calls. For consumers who require a permanent address to the dax device, such as the DM write cache, dax_map_protected() is used to query for additional protections. Update the DM write cache code to create a permanent mapping if dax_map_protected() is true. Cc: Jane Chu Signed-off-by: Ira Weiny --- Changes for V9 Do not add a new dax operation. Instead, teach struct dax_device about the dev_pagemap PGMAP_PROTECTION flag and call the ops directly if needed. s/dax_mk_*/dax_set_*/ Changes for V8 Rebase changes on 5.17-rc1 Clean up the cover letter dax_read_lock() is not required s/dax_protected()/dax_map_protected()/ Testing revealed a dax_flush() which was not properly protected.
Changes for V7 Rework cover letter. Do not include a FS_DAX_LIMITED restriction for dcss. It will simply not implement the protection and there is no need to special-case this. Clean up commit message because I did not originally understand the nuance of the s390 device. Introduce dax_{protected,mk_readwrite,mk_noaccess}() From Dan Williams Remove old clean up cruft from previous versions Remove map_protected Remove 'global' parameters from all calls --- drivers/dax/super.c | 60 ++++++++++++++++++++++++++++++++++++++ drivers/md/dm-writecache.c | 8 ++++- fs/dax.c | 8 +++++ fs/fuse/virtio_fs.c | 2 ++ include/linux/dax.h | 5 ++++ 5 files changed, 82 insertions(+), 1 deletion(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 0211e6f7b47a..3105794f55f7 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "dax-private.h" /** @@ -29,6 +30,7 @@ struct dax_device { void *private; unsigned long flags; const struct dax_operations *ops; + struct dev_pagemap *pgmap; }; static dev_t dax_devt; @@ -118,6 +120,8 @@ enum dax_device_flags { * @pgoff: offset in pages from the start of the device to translate * @nr_pages: number of consecutive pages caller can handle relative to @pfn * @kaddr: output parameter that returns a virtual address mapping of pfn + * Direct access through this pointer must be guarded by calls to + * dax_set_{readwrite,noaccess}() * @pfn: output parameter that returns an absolute pfn translation of @pgoff * * Return: negative errno if an error occurs, otherwise the number of @@ -210,6 +214,56 @@ void dax_flush(struct dax_device *dax_dev, void *addr, size_t size) #endif EXPORT_SYMBOL_GPL(dax_flush); +bool dax_map_protected(struct dax_device *dax_dev) +{ + struct dev_pagemap *pgmap = dax_dev->pgmap; + + if (!dax_alive(dax_dev)) + return false; + + return pgmap && (pgmap->flags & PGMAP_PROTECTION); +} +EXPORT_SYMBOL_GPL(dax_map_protected); + +/** + *
dax_set_readwrite() - make protected dax devices read/write + * @dax_dev: the dax device representing the memory to access + * + * Any access of the kaddr memory returned from dax_direct_access() must be + * guarded by dax_set_readwrite() and dax_set_noaccess(). This ensures that any + * dax devices which have additional protections are allowed to relax those + * protections for the thread using this memory. + * + * NOTE these calls must be contained within a single thread of execution and + * both must be guarded by dax_read_lock(), which is also a requirement for + * dax_direct_access() anyway. + */ +void dax_set_readwrite(struct dax_device *dax_dev) +{ + if (!dax_map_protected(dax_dev)) + return; + + __pgmap_set_readwrite(dax_dev->pgmap); +} +EXPORT_SYMBOL_GPL(dax_set_readwrite); + +/** + * dax_set_noaccess() - restore protection to dax devices if needed + * @dax_dev: the dax device representing the memory to access + * + * See dax_direct_access() and dax_set_readwrite() + * + * NOTE Must be called prior to dax_read_unlock() + */ +void dax_set_noaccess(struct dax_device *dax_dev) +{ + if (!dax_map_protected(dax_dev)) + return; + + __pgmap_set_noaccess(dax_dev->pgmap); +} +EXPORT_SYMBOL_GPL(dax_set_noaccess); + void dax_write_cache(struct dax_device *dax_dev, bool wc) { if (wc) @@ -249,6 +303,12 @@ void set_dax_nomc(struct dax_device *dax_dev) } EXPORT_SYMBOL_GPL(set_dax_nomc); +void set_dax_pgmap(struct dax_device *dax_dev, struct dev_pagemap *pgmap) +{ + dax_dev->pgmap = pgmap; +} +EXPORT_SYMBOL_GPL(set_dax_pgmap); + bool dax_alive(struct dax_device *dax_dev) { lockdep_assert_held(&dax_srcu); diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c index 5630b470ba42..8fd26a237de3 100644 --- a/drivers/md/dm-writecache.c +++ b/drivers/md/dm-writecache.c @@ -297,7 +297,13 @@ static int persistent_memory_claim(struct dm_writecache *wc) r = -EOPNOTSUPP; goto err2; } - if (da != p) { + + /* + * Force the write cache to map the pages
directly if the dax device + * mapping is protected or if the number of pages returned was not what + * was requested. + */ + if (dax_map_protected(wc->ssd_dev->dax_dev) || da != p) { long i; wc->memory_map = NULL; pages = kvmalloc_array(p, sizeof(struct page *), GFP_KERNEL); diff --git a/fs/dax.c b/fs/dax.c index 67a08a32fccb..7cc76c6752ae 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -727,7 +727,9 @@ static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter return rc; } vto = kmap_atomic(vmf->cow_page); + dax_set_readwrite(iter->iomap.dax_dev); copy_user_page(vto, kaddr, vmf->address, vmf->cow_page); + dax_set_noaccess(iter->iomap.dax_dev); kunmap_atomic(vto); dax_read_unlock(id); return 0; @@ -936,8 +938,10 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, count = 1UL << dax_entry_order(entry); index = xas->xa_index & ~(count - 1); + dax_set_readwrite(dax_dev); dax_entry_mkclean(mapping, index, pfn); dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE); + dax_set_noaccess(dax_dev); /* * After we have flushed the cache, we can clear the dirty tag.
There * cannot be new dirty data in the pfn after the flush has completed as @@ -1124,8 +1128,10 @@ static int dax_memzero(struct dax_device *dax_dev, pgoff_t pgoff, ret = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL); if (ret > 0) { + dax_set_readwrite(dax_dev); memset(kaddr + offset, 0, size); dax_flush(dax_dev, kaddr + offset, size); + dax_set_noaccess(dax_dev); } return ret; } @@ -1259,12 +1265,14 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, if (map_len > end - pos) map_len = end - pos; + dax_set_readwrite(dax_dev); if (iov_iter_rw(iter) == WRITE) xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr, map_len, iter); else xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr, map_len, iter); + dax_set_noaccess(dax_dev); pos += xfer; length -= xfer; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 86b7dbb6a0d4..58bb949dcdfc 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -775,8 +775,10 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev, rc = dax_direct_access(dax_dev, pgoff, nr_pages, &kaddr, NULL); if (rc < 0) return rc; + dax_set_readwrite(dax_dev); memset(kaddr, 0, nr_pages << PAGE_SHIFT); dax_flush(dax_dev, kaddr, nr_pages << PAGE_SHIFT); + dax_set_noaccess(dax_dev); return 0; } diff --git a/include/linux/dax.h b/include/linux/dax.h index 9fc5f99a0ae2..30fe49f9ec9d 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -91,6 +91,7 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma, void set_dax_nocache(struct dax_device *dax_dev); void set_dax_nomc(struct dax_device *dax_dev); +void set_dax_pgmap(struct dax_device *dax_dev, struct dev_pagemap *pgmap); struct writeback_control; #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX) @@ -187,6 +188,10 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, size_t nr_pages); void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); +bool
dax_map_protected(struct dax_device *dax_dev); +void dax_set_readwrite(struct dax_device *dax_dev); +void dax_set_noaccess(struct dax_device *dax_dev); + ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops); vm_fault_t dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size, -- 2.35.1 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 35/44] nvdimm/pmem: Enable stray access protection Date: Tue, 19 Apr 2022 10:06:40 -0700 Message-Id: <20220419170649.1022246-36-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny The persistent memory (PMEM) driver uses the memremap_pages facility to provide 'struct page' metadata (vmemmap) for PMEM. Because PMEM capacity may be orders of magnitude larger than System RAM, it presents a large vulnerability surface to stray writes. Unlike stray writes to System RAM, which may result in a crash or other undesirable behavior, stray writes to PMEM are additionally more likely to result in permanent data loss. Reboot is not a remediation for PMEM corruption as it is for System RAM. Now that all valid kernel accesses to PMEM have been annotated with {__}pgmap_set_{readwrite,noaccess}(), PGMAP_PROTECTION is safe to enable in the pmem layer. Set PGMAP_PROTECTION if pgmap protections are available, and set the pgmap property of the dax device for its use. Internally, the pmem driver uses a cached virtual address, pmem->virt_addr (pmem_addr).
Call __pgmap_set_{readwrite,noaccess}() directly when PGMAP_PROTECTION is active on those mappings. Signed-off-by: Ira Weiny --- Changes for V9 Remove the dax operations and pass the pgmap to the dax_device for its use. s/pgmap_mk_*/pgmap_set_*/ s/pmem_mk_*/pmem_set_*/ Changes for V8 Rebase to 5.17-rc1 Remove global param Add internal structure which uses the pmem device and pgmap device directly in the *_mk_*() calls. Add pmem dax ops callbacks Use pgmap_protection_available() s/PGMAP_PKEY_PROTECT/PGMAP_PROTECTION --- drivers/nvdimm/pmem.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 58d95242a836..2c7b18da7974 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -138,6 +138,18 @@ static blk_status_t read_pmem(struct page *page, unsigned int off, return BLK_STS_OK; } +static void pmem_set_readwrite(struct pmem_device *pmem) +{ + if (pmem->pgmap.flags & PGMAP_PROTECTION) + __pgmap_set_readwrite(&pmem->pgmap); +} + +static void pmem_set_noaccess(struct pmem_device *pmem) +{ + if (pmem->pgmap.flags & PGMAP_PROTECTION) + __pgmap_set_noaccess(&pmem->pgmap); +} + static blk_status_t pmem_do_read(struct pmem_device *pmem, struct page *page, unsigned int page_off, sector_t sector, unsigned int len) @@ -149,7 +161,11 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem, if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) return BLK_STS_IOERR; + /* Enable direct use of pmem->virt_addr */ + pmem_set_readwrite(pmem); rc = read_pmem(page, page_off, pmem_addr, len); + pmem_set_noaccess(pmem); + flush_dcache_page(page); return rc; } @@ -181,11 +197,15 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem, * after clear poison.
*/ flush_dcache_page(page); + + /* Enable direct use of pmem->virt_addr */ + pmem_set_readwrite(pmem); write_pmem(pmem_addr, page, page_off, len); if (unlikely(bad_pmem)) { rc = pmem_clear_poison(pmem, pmem_off, len); write_pmem(pmem_addr, page, page_off, len); } + pmem_set_noaccess(pmem); return rc; } @@ -427,6 +447,8 @@ static int pmem_attach_disk(struct device *dev, pmem->pfn_flags = PFN_DEV; if (is_nd_pfn(dev)) { pmem->pgmap.type = MEMORY_DEVICE_FS_DAX; + if (pgmap_protection_available()) + pmem->pgmap.flags |= PGMAP_PROTECTION; addr = devm_memremap_pages(dev, &pmem->pgmap); pfn_sb = nd_pfn->pfn_sb; pmem->data_offset = le64_to_cpu(pfn_sb->dataoff); @@ -440,6 +462,8 @@ static int pmem_attach_disk(struct device *dev, pmem->pgmap.range.end = res->end; pmem->pgmap.nr_range = 1; pmem->pgmap.type = MEMORY_DEVICE_FS_DAX; + if (pgmap_protection_available()) + pmem->pgmap.flags |= PGMAP_PROTECTION; addr = devm_memremap_pages(dev, &pmem->pgmap); pmem->pfn_flags |= PFN_MAP; bb_range = pmem->pgmap.range; @@ -481,6 +505,8 @@ static int pmem_attach_disk(struct device *dev, } set_dax_nocache(dax_dev); set_dax_nomc(dax_dev); + if (pmem->pgmap.flags & PGMAP_PROTECTION) + set_dax_pgmap(dax_dev, &pmem->pgmap); if (is_nvdimm_sync(nd_region)) set_dax_synchronous(dax_dev); rc = dax_add_host(dax_dev, disk); -- 2.35.1 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 36/44] devdax: Enable stray access protection Date: Tue, 19 Apr 2022 10:06:41 -0700 Message-Id: <20220419170649.1022246-37-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny Device dax is primarily accessed through user space, and kernel access is controlled through the kmap interfaces. Now that all valid kernel-initiated accesses to dax devices have been accounted for, turn on PGMAP_PROTECTION for device dax. Reviewed-by: Dan Williams Signed-off-by: Ira Weiny --- Changes for V9 Add Review tag Changes for V8 Rebase to 5.17-rc1 Use pgmap_protection_available() s/PGMAP_PKEYS_PROTECT/PGMAP_PROTECTION/ --- drivers/dax/device.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 5494d745ced5..045854ba3855 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -451,6 +451,8 @@ int dev_dax_probe(struct dev_dax *dev_dax) if (dev_dax->align > PAGE_SIZE) pgmap->vmemmap_shift = order_base_2(dev_dax->align >> PAGE_SHIFT); + if (pgmap_protection_available()) + pgmap->flags |= PGMAP_PROTECTION; addr = devm_memremap_pages(dev, pgmap); if (IS_ERR(addr)) return PTR_ERR(addr); -- 2.35.1 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 37/44] mm/pkeys: PKS testing, add initial test code Date: Tue, 19 Apr 2022 10:06:42 -0700 Message-Id: <20220419170649.1022246-38-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny Define a PKS consumer for testing. Two initial tests are created: one to check that the default values have been properly assigned, and a second which purposely causes a fault. Add documentation. Signed-off-by: Ira Weiny --- Changes for V10 Organize the test patches together and at the end of the larger series Change the key allocation based on the PMEM use case being added first. Changes for V9 Simplify the commit message Simplify documentation in favor of using test_pks Complete re-arch of test code... Return -ENOENT for unknown tests Adjust the key allocation Reduce the globals used during fault detection Introduce a session structure to track information as long as the debugfs file remains open. Use pr_debug() for internal debug output. Document how to run tests from debugfs with trace_printk() output. Feedback from Rick Edgecombe Change pkey type to u8 remove pks_test_exit set file data within the crash test to be cleaned up on file close Resolve when memory barriers are needed From Dave Hansen Place a lock around the execution of tests so that only a single thread executes at a time. Changes for V8 Ensure that unknown tests are flagged as failures. Split out the various tests into their own patches which test the functionality as the series goes.
Move this basic test forward in the series Changes for V7 Add testing for pks_abandon_protections() Adjust pkrs_init_value Adjust for new defines Clean up comments Adjust test for static allocation of pkeys Use lookup_address() instead of follow_pte() follow_pte only works on IO and raw PFN mappings, use lookup_address() instead. lookup_address() is constrained to architectures which support it. --- Documentation/core-api/protection-keys.rst | 6 + include/linux/pks-keys.h | 9 +- lib/Kconfig.debug | 12 + lib/Makefile | 3 + lib/pks/Makefile | 3 + lib/pks/pks_test.c | 301 +++++++++++++++++++++ 6 files changed, 332 insertions(+), 2 deletions(-) create mode 100644 lib/pks/Makefile create mode 100644 lib/pks/pks_test.c diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index 47bcb38fff4f..361c6b7e1b93 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -169,3 +169,9 @@ WRMSR(MSR_IA32_PKRS) is an exception. It is not a serializing instruction and instead maintains ordering properties similar to WRPKRU. Thus it is safe to immediately use a mapping when the pks_set*() functions returns. Check the latest SDM for details. + +Testing +------- + +.. 
kernel-doc:: lib/pks/pks_test.c + :doc: PKS_TEST diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h index 4e63c8061e55..380bc999cbe3 100644 --- a/include/linux/pks-keys.h +++ b/include/linux/pks-keys.h @@ -62,12 +62,16 @@ #define PKS_KEY_DEFAULT 0 #define PKS_KEY_PGMAP_PROTECTION \ PKS_NEW_KEY(PKS_KEY_DEFAULT, CONFIG_DEVMAP_ACCESS_PROTECTION) -#define PKS_KEY_MAX PKS_NEW_KEY(PKS_KEY_PGMAP_PROTECTION, 1) +#define PKS_KEY_TEST PKS_NEW_KEY(PKS_KEY_PGMAP_PROTECTION, \ + CONFIG_PKS_TEST) +#define PKS_KEY_MAX PKS_NEW_KEY(PKS_KEY_TEST, 1) /* PKS_KEY_DEFAULT_INIT must be RW */ #define PKS_KEY_DEFAULT_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_DEFAULT, RW, 1) #define PKS_KEY_PGMAP_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_PGMAP_PROTECTION, \ AD, CONFIG_DEVMAP_ACCESS_PROTECTION) +#define PKS_KEY_TEST_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_TEST, AD, \ + CONFIG_PKS_TEST) #define PKS_ALL_AD_MASK \ GENMASK(PKS_NUM_PKEYS * PKR_BITS_PER_PKEY, \ @@ -75,7 +79,8 @@ #define PKS_INIT_VALUE ((PKS_ALL_AD & PKS_ALL_AD_MASK) | \ PKS_KEY_DEFAULT_INIT | \ - PKS_KEY_PGMAP_INIT \ + PKS_KEY_PGMAP_INIT | \ + PKS_KEY_TEST_INIT \ ) #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 075cd25363ac..7ac43b78c7bb 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2758,6 +2758,18 @@ config HYPERV_TESTING help Select this option to enable Hyper-V vmbus testing. +config PKS_TEST + bool "PKey (S)upervisor testing" + depends on ARCH_HAS_SUPERVISOR_PKEYS + select ARCH_ENABLE_SUPERVISOR_PKEYS + help + Select this option to enable testing of PKS core software and + hardware. + + Answer N if you don't know what supervisor keys are. + + If unsure, say N.
+ endmenu # "Kernel Testing and Coverage" source "Documentation/Kconfig" diff --git a/lib/Makefile b/lib/Makefile index 6b9ffc1bd1ee..67f88d92aa00 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -402,3 +402,6 @@ $(obj)/$(TEST_FORTIFY_LOG): $(addprefix $(obj)/, $(TEST_FORTIFY_LOGS)) FORCE ifeq ($(CONFIG_FORTIFY_SOURCE),y) $(obj)/string.o: $(obj)/$(TEST_FORTIFY_LOG) endif + +# PKS test +obj-y += pks/ diff --git a/lib/pks/Makefile b/lib/pks/Makefile new file mode 100644 index 000000000000..9daccba4f7c4 --- /dev/null +++ b/lib/pks/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_PKS_TEST) += pks_test.o diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c new file mode 100644 index 000000000000..2fc92aaa54e8 --- /dev/null +++ b/lib/pks/pks_test.c @@ -0,0 +1,301 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright(c) 2022 Intel Corporation. All rights reserved. + */ + +/** + * DOC: PKS_TEST + * + * When CONFIG_PKS_TEST is enabled, a debugfs file is created to facilitate + * in-kernel testing. Tests can be triggered by writing a test number to + * /sys/kernel/debug/x86/run_pks + * + * Results and debug output can be seen through dynamic debug. + * + * Example: + * + * .. 
code-block:: sh + * + * # Enable kernel debug + * echo "file pks_test.c +pflm" > /sys/kernel/debug/dynamic_debug/control + * + * # Run test + * echo 0 > /sys/kernel/debug/x86/run_pks + * + * # Turn off kernel debug + * echo "file pks_test.c -p" > /sys/kernel/debug/dynamic_debug/control + * + * # view kernel debugging output + * dmesg -H | grep pks_test + */ + +#include +#include +#include +#include +#include + +#define PKS_TEST_MEM_SIZE (PAGE_SIZE) + +#define CHECK_DEFAULTS 0 +#define RUN_CRASH_TEST 9 + +static struct dentry *pks_test_dentry; + +DEFINE_MUTEX(test_run_lock); + +struct pks_test_ctx { + u8 pkey; + char data[64]; + void *test_page; +}; + +static void debug_context(const char *label, struct pks_test_ctx *ctx) +{ + pr_debug("%s [%d] %s <-> %p\n", + label, + ctx->pkey, + ctx->data, + ctx->test_page); +} + +struct pks_session_data { + struct pks_test_ctx *ctx; + bool need_unlock; + bool crash_armed; + bool last_test_pass; +}; + +static void debug_session(const char *label, struct pks_session_data *sd) +{ + pr_debug("%s ctx %p; unlock %d; crash %d; last test %s\n", + label, + sd->ctx, + sd->need_unlock, + sd->crash_armed, + sd->last_test_pass ? "PASS" : "FAIL"); + +} + +static void debug_result(const char *label, int test_num, + struct pks_session_data *sd) +{ + pr_debug("%s [%d]: %s\n", + label, test_num, + sd->last_test_pass ? "PASS" : "FAIL"); +} + +static void *alloc_test_page(u8 pkey) +{ + return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, + VMALLOC_END, GFP_KERNEL, + PAGE_KERNEL_PKEY(pkey), 0, + NUMA_NO_NODE, __builtin_return_address(0)); +} + +static void free_ctx(struct pks_test_ctx *ctx) +{ + if (!ctx) + return; + + vfree(ctx->test_page); + kfree(ctx); +} + +static struct pks_test_ctx *alloc_ctx(u8 pkey) +{ + struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + + if (!ctx) + return ERR_PTR(-ENOMEM); + + ctx->pkey = pkey; + sprintf(ctx->data, "%s", "DEADBEEF"); + + ctx->test_page = alloc_test_page(ctx->pkey); + if (!ctx->test_page) { + pr_debug("Test page allocation failed\n"); + kfree(ctx); + return ERR_PTR(-ENOMEM); + } + + debug_context("Context allocated", ctx); + return ctx; +} + +static void set_ctx_data(struct pks_session_data *sd, struct pks_test_ctx *ctx) +{ + if (sd->ctx) { + pr_debug("Context data already set\n"); + free_ctx(sd->ctx); + } + pr_debug("Setting context data; %p\n", ctx); + sd->ctx = ctx; +} + +static void crash_it(struct pks_session_data *sd) +{ + struct pks_test_ctx *ctx; + + ctx = alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) { + pr_err("Failed to allocate context???\n"); + sd->last_test_pass = false; + return; + } + set_ctx_data(sd, ctx); + + pr_debug("Purposely faulting...\n"); + memcpy(ctx->test_page, ctx->data, 8); + + pr_err("ERROR: Should never get here...\n"); + sd->last_test_pass = false; +} + +static void check_pkey_settings(void *data) +{ + struct pks_session_data *sd = data; + unsigned long long msr = 0; + unsigned int cpu = smp_processor_id(); + + rdmsrl(MSR_IA32_PKRS, msr); + pr_debug("cpu %d 0x%llx\n", cpu, msr); + if (msr != PKS_INIT_VALUE) { + pr_err("cpu %d value incorrect : 0x%llx expected 0x%lx\n", + cpu, msr, PKS_INIT_VALUE); + sd->last_test_pass = false; + } +} + +static void arm_or_run_crash_test(struct pks_session_data *sd) +{ + + /* + * WARNING: Test "9" will crash. + * Arm the test.
+ * A second "9" will run the test. + */ + if (!sd->crash_armed) { + pr_debug("Arming crash test\n"); + sd->crash_armed = true; + return; + } + + sd->crash_armed = false; + crash_it(sd); +} + +static ssize_t pks_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct pks_session_data *sd = file->private_data; + char buf[64]; + unsigned int len; + + len = sprintf(buf, "%s\n", sd->last_test_pass ? "PASS" : "FAIL"); + + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +static ssize_t pks_write_file(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct pks_session_data *sd = file->private_data; + long test_num; + char buf[2]; + + pr_debug("Begin...\n"); + sd->last_test_pass = false; + + if (copy_from_user(buf, user_buf, 1)) + return -EFAULT; + buf[1] = '\0'; + + if (kstrtol(buf, 0, &test_num)) + return -EINVAL; + + if (mutex_lock_interruptible(&test_run_lock)) + return -EBUSY; + + sd->need_unlock = true; + sd->last_test_pass = true; + + switch (test_num) { + case RUN_CRASH_TEST: + pr_debug("crash test\n"); + arm_or_run_crash_test(file->private_data); + goto unlock_test; + case CHECK_DEFAULTS: + pr_debug("check defaults test: 0x%lx\n", PKS_INIT_VALUE); + on_each_cpu(check_pkey_settings, file->private_data, 1); + break; + default: + pr_debug("Unknown test\n"); + sd->last_test_pass = false; + count = -ENOENT; + break; + } + + /* Clear arming on any test run */ + pr_debug("Clearing crash test arm\n"); + sd->crash_armed = false; + +unlock_test: + /* + * Normal exit; clear up the locking flag + */ + sd->need_unlock = false; + mutex_unlock(&test_run_lock); + debug_result("Test complete", test_num, sd); + return count; +} + +static int pks_open_file(struct inode *inode, struct file *file) +{ + struct pks_session_data *sd = kzalloc(sizeof(*sd), GFP_KERNEL); + + if (!sd) + return -ENOMEM; + + debug_session("Allocated session", sd); + file->private_data =
sd; + + return 0; +} + +static int pks_release_file(struct inode *inode, struct file *file) +{ + struct pks_session_data *sd = file->private_data; + + debug_session("Freeing session", sd); + + /* + * Some tests may fault and not return through the normal write + * syscall. The crash test is specifically designed to do this. Clean + * up the run lock when the file is closed if the write syscall does + * not exit normally. + */ + if (sd->need_unlock) + mutex_unlock(&test_run_lock); + free_ctx(sd->ctx); + kfree(sd); + return 0; +} + +static const struct file_operations fops_init_pks = { + .read = pks_read_file, + .write = pks_write_file, + .llseek = default_llseek, + .open = pks_open_file, + .release = pks_release_file, +}; + +static int __init pks_test_init(void) +{ + if (cpu_feature_enabled(X86_FEATURE_PKS)) + pks_test_dentry = debugfs_create_file("run_pks", 0600, arch_debugfs_dir, + NULL, &fops_init_pks); + + return 0; +} +late_initcall(pks_test_init); -- 2.35.1 From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 38/44] x86/selftests: Add test_pks Date: Tue, 19 Apr 2022 10:06:43 -0700 Message-Id: <20220419170649.1022246-39-ira.weiny@intel.com> In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> From: Ira Weiny The PKS kernel tests are clumsy to run using debugfs directly. It is much nicer to have a user space application trigger the execution of those tests. Create test_pks as a selftest. Output is as follows.
$ ./test_pks_64 -h Usage: ./test_pks_64 [-h,-d] [test] --help,-h This help --debug,-d Output kernel debug via dynamic debug if available Run all PKS tests or the [test] specified. [test] can be one of: 'check_defaults' 'create_fault' (Not included in run all) $ ./test_pks_64 [RUN] check_defaults [OK] Suggested-by: Rick Edgecombe Signed-off-by: Ira Weiny --- Changes for V9: New Patch --- Documentation/core-api/protection-keys.rst | 3 + tools/testing/selftests/x86/Makefile | 2 +- tools/testing/selftests/x86/test_pks.c | 353 +++++++++++++++++++++ 3 files changed, 357 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_pks.c diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index 361c6b7e1b93..d492ec194e2a 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -175,3 +175,6 @@ Testing .. kernel-doc:: lib/pks/pks_test.c :doc: PKS_TEST + +.. kernel-doc:: tools/testing/selftests/x86/test_pks.c + :doc: PKS_TEST_USER diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 0388c4d60af0..f24252d2cbfb 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -13,7 +13,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ test_vsyscall mov_ss_trap \ - syscall_arg_fault fsgsbase_restore sigaltstack + syscall_arg_fault fsgsbase_restore sigaltstack test_pks TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c new file mode 100644 index 000000000000..df5bde9bfdbe --- /dev/null +++ b/tools/testing/selftests/x86/test_pks.c @@ -0,0
+1,353 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright(c) 2022 Intel Corporation. All rights reserved. + */ + +/** + * DOC: PKS_TEST_USER + * + * To assist in executing the tests, 'test_pks' can be built from the + * tools/testing directory. See the help output for details. + * + * .. code-block:: sh + * + * $ cd tools/testing/selftests/x86 + * $ make test_pks + * $ ./test_pks_64 -h + * ... + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include + +#define DYN_DBG_CNT_FILE "/sys/kernel/debug/dynamic_debug/control" +#define PKS_TEST_FILE "/sys/kernel/debug/x86/run_pks" + +/* Values from the kernel */ +#define CHECK_DEFAULTS "0" +#define RUN_CRASH_TEST "9" + +time_t g_start_time; +int g_debug; + +#define PRINT_DEBUG(fmt, ...) \ + do { \ + if (g_debug) \ + printf("%s: " fmt, __func__, ##__VA_ARGS__); \ + } while (0) + +#define PRINT_ERROR(fmt, ...) \ + fprintf(stderr, "%s: " fmt, __func__, ##__VA_ARGS__) + +static int do_simple_test(const char *debugfs_str); + +/* + * The crash test is a special case which is not included in the run all + * option. Do not add it here.
+ */ +enum { + TEST_DEFAULTS = 0, + MAX_TESTS, +} tests; + +/* Special */ +#define CREATE_FAULT_TEST_NAME "create_fault" + +struct test_item { + char *name; + const char *debugfs_str; + int (*test_fn)(const char *debugfs_str); +} test_list[] = { + { "check_defaults", CHECK_DEFAULTS, do_simple_test } +}; + +static char *get_test_name(int test_num) +{ + if (test_num > MAX_TESTS) + return ""; + /* Special: not in run all */ + if (test_num == MAX_TESTS) + return CREATE_FAULT_TEST_NAME; + return test_list[test_num].name; +} + +static int get_test_num(char *test_name) +{ + int i; + + /* Special: not in run all */ + if (strcmp(test_name, CREATE_FAULT_TEST_NAME) == 0) + return MAX_TESTS; + + for (i = 0; i < MAX_TESTS; i++) + if (strcmp(test_name, test_list[i].name) == 0) + return i; + return -1; +} + +static void print_help_and_exit(char *argv0) +{ + int i; + + printf("Usage: %s [-h,-d] [test]\n", argv0); + printf(" --help,-h This help\n"); + printf(" --debug,-d Output kernel debug via dynamic debug if available\n"); + printf("\n"); + printf(" Run all PKS tests or the [test] specified.\n"); + printf("\n"); + printf(" [test] can be one of:\n"); + + for (i = 0; i < MAX_TESTS; i++) + printf(" '%s'\n", get_test_name(i)); + + /* Special: not in run all */ + printf(" '%s' (Not included in run all)\n", + CREATE_FAULT_TEST_NAME); + + printf("\n"); +} + +/* + * Do a simple test of writing the debugfs value and reading back for 'PASS' + */ +static int do_simple_test(const char *debugfs_str) +{ + char str[16]; + int fd, rc = 0; + + fd = open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + PRINT_DEBUG("Failed to open test file : %s\n", PKS_TEST_FILE); + return -ENOENT; + } + + rc = write(fd, debugfs_str, strlen(debugfs_str)); + if (rc < 0) { + rc = -errno; + goto close_file; + } + + rc = read(fd, str, 16); + if (rc < 0) + goto close_file; + + str[15] = '\0'; + + if (strncmp(str, "PASS", 4)) { + PRINT_ERROR("result: %s\n", str); + rc = -EFAULT; + goto
close_file; + } + + rc = 0; + +close_file: + close(fd); + return rc; +} + +/* + * This test is special in that it requires the option to be written 2 times. + * In addition because it creates a fault it is not included in the run all + * test suite. + */ +static int create_fault(void) +{ + char str[16]; + int fd, rc = 0; + + fd = open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + PRINT_DEBUG("Failed to open test file : %s\n", PKS_TEST_FILE); + return -ENOENT; + } + + rc = write(fd, "9", 1); + if (rc < 0) { + rc = -errno; + goto close_file; + } + + rc = write(fd, "9", 1); + if (rc < 0) + goto close_file; + + rc = read(fd, str, 16); + if (rc < 0) + goto close_file; + + str[15] = '\0'; + + if (strncmp(str, "PASS", 4)) { + PRINT_ERROR("result: %s\n", str); + rc = -EFAULT; + goto close_file; + } + + rc = 0; + +close_file: + close(fd); + return rc; +} + +static int run_one(int test_num) +{ + int ret; + + printf("[RUN]\t%s\n", get_test_name(test_num)); + + if (test_num == MAX_TESTS) + /* Special: not in run all */ + ret = create_fault(); + else + ret = test_list[test_num].test_fn(test_list[test_num].debugfs_str); + + if (ret == -ENOENT) { + printf("[SKIP] Test not supported\n"); + return 0; + } else if (ret) { + printf("[FAIL]\n"); + return 1; + } + + printf("[OK]\n"); + return 0; +} + +static int run_all(void) +{ + int i, rc = 0; + + for (i = 0; i < MAX_TESTS; i++) { + int ret = run_one(i); + + /* sticky fail */ + if (ret) + rc = ret; + } + + return rc; +} + +#define STR_LEN 256 + +/* Debug output in the kernel is through dynamic debug */ +static void setup_debug(void) +{ + char str[STR_LEN]; + int fd, rc; + + g_start_time = time(NULL); + + fd = open(DYN_DBG_CNT_FILE, O_RDWR); + if (fd < 0) { + PRINT_ERROR("Dynamic debug not available: Failed to open: %s\n", + DYN_DBG_CNT_FILE); + return; + } + + snprintf(str, STR_LEN, "file pks_test.c +pflm"); + + rc = write(fd, str, strlen(str)); + if (rc != strlen(str)) +
PRINT_ERROR("ERROR: Failed to set up dynamic debug...\n"); + + close(fd); +} + +static void print_debug(void) +{ + char str[STR_LEN]; + struct tm *tm; + int fd, rc; + + fd = open(DYN_DBG_CNT_FILE, O_RDWR); + if (fd < 0) + return; + + snprintf(str, STR_LEN, "file pks_test.c -p"); + + rc = write(fd, str, strlen(str)); + if (rc != strlen(str)) + PRINT_ERROR("ERROR: Failed to turn off dynamic debug...\n"); + + close(fd); + + /* + * dmesg is not accurate with time stamps so back up the start time a + * bit to ensure all the output from this run is dumped. + */ + g_start_time -= 5; + tm = localtime(&g_start_time); + + snprintf(str, STR_LEN, + "dmesg -H --since '%d-%d-%d %d:%d:%d' | grep pks_test", + tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday, + tm->tm_hour, tm->tm_min, tm->tm_sec); + system(str); + printf("\tDebug output command (approximate start time):\n\t\t%s\n", + str); +} + +int main(int argc, char *argv[]) +{ + int flag_all = 1; + int test_num = 0; + int rc; + + while (1) { + static struct option long_options[] = { + {"help", no_argument, 0, 'h' }, + {"debug", no_argument, 0, 'd' }, + {0, 0, 0, 0 } + }; + int option_index = 0; + int c; + + c = getopt_long(argc, argv, "hd", long_options, &option_index); + if (c == -1) + break; + + switch (c) { + case 'h': + print_help_and_exit(argv[0]); + return 0; + case 'd': + g_debug++; + break; + default: + print_help_and_exit(argv[0]); + exit(-1); + } + } + + if (optind < argc) { + test_num = get_test_num(argv[optind]); + if (test_num < 0) { + printf("[RUN]\t'%s'\n[SKIP]\tInvalid test\n", argv[optind]); + return 1; + } + + flag_all = 0; + } + + if (g_debug) + setup_debug(); + + if (flag_all) + rc = run_all(); + else + rc = run_one(test_num); + + if (g_debug) + print_debug(); + + return rc; +} -- 2.35.1 From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com To: Dave Hansen , "H.
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 39/44] mm/pkeys: PKS testing, add a fault call back Date: Tue, 19 Apr 2022 10:06:44 -0700 Message-Id: <20220419170649.1022246-40-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny PKS testing needs to know when a fault occurs as a result of its own actions so that it can properly verify functionality. Install a PKS fault handler for the PKS test pkey. Signed-off-by: Ira Weiny --- Changes for V10 Adjust for the PMEM use case being added first. Changes for V9 New Patch --- arch/x86/mm/pkeys.c | 3 +++ include/linux/pks.h | 7 +++++++ lib/pks/pks_test.c | 6 ++++++ 3 files changed, 16 insertions(+) diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index e9a8c67f6b66..9e0948766427 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -248,6 +248,9 @@ static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] = { #ifdef CONFIG_DEVMAP_ACCESS_PROTECTION [PKS_KEY_PGMAP_PROTECTION] = pgmap_pks_fault_callback, #endif +#ifdef CONFIG_PKS_TEST + [PKS_KEY_TEST] = pks_test_fault_callback, +#endif }; static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long address, diff --git a/include/linux/pks.h b/include/linux/pks.h index 151a3fda9de4..fd0ed09dd143 100644 --- a/include/linux/pks.h +++ b/include/linux/pks.h @@ -57,4 +57,11 @@ static inline void pks_update_exception(struct pt_regs *regs, #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ +#ifdef CONFIG_PKS_TEST + +bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address, + bool write); + +#endif /* CONFIG_PKS_TEST */ +
#endif /* _LINUX_PKS_H */ diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 2fc92aaa54e8..37f2cd7d0f56 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -85,6 +85,12 @@ static void debug_result(const char *label, int test_num, sd->last_test_pass ? "PASS" : "FAIL"); } +bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address, + bool write) +{ + return false; +} + static void *alloc_test_page(u8 pkey) { return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, -- 2.35.1 From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com To: Dave Hansen , "H. Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 40/44] mm/pkeys: PKS testing, add pks_set_*() tests Date: Tue, 19 Apr 2022 10:06:45 -0700 Message-Id: <20220419170649.1022246-41-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny Test that the pks_set_*() functions operate as intended. First, verify that the pkey was properly set in the PTE. Second, use the fault callback mechanism to detect whether a fault occurred when expected and, if so, clear the fault. The test iterates over each of the following test cases: PKS_TEST_NO_ACCESS, WRITE, FAULT_EXPECTED PKS_TEST_NO_ACCESS, READ, FAULT_EXPECTED PKS_TEST_RDWR, WRITE, NO_FAULT_EXPECTED PKS_TEST_RDWR, READ, NO_FAULT_EXPECTED Add documentation. Signed-off-by: Ira Weiny --- Changes for V10 Test patches moved together in the series, which required the exception handling modification to be squashed into this test.
Changes for V9 Update commit message Clarify use of global state for faults to be used by all tests Add test to test_pks user app Remove an incorrect comment in the kdoc Change pkey type to u8 From Dave Hansen s/pks_mk*/pks_set*/ From Rick Edgecombe Use standard fault callback instead of the custom PKS test one Changes for V8 Remove readonly test, as that patch is not needed for PMEM Split this off into a patch which follows the pks_mk_*() patches, thus allowing for a better view of how the test works compared to the functionality added with those patches. Remove unneeded prints --- lib/pks/pks_test.c | 168 ++++++++++++++++++++++++- tools/testing/selftests/x86/test_pks.c | 5 +- 2 files changed, 169 insertions(+), 4 deletions(-) diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 37f2cd7d0f56..dc309dd941be 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -33,11 +33,17 @@ #include #include #include +#include +#include +#include #include +#include + #define PKS_TEST_MEM_SIZE (PAGE_SIZE) #define CHECK_DEFAULTS 0 +#define RUN_SINGLE 1 #define RUN_CRASH_TEST 9 static struct dentry *pks_test_dentry; @@ -48,6 +54,7 @@ struct pks_test_ctx { u8 pkey; char data[64]; void *test_page; + bool fault_seen; }; static void debug_context(const char *label, struct pks_test_ctx *ctx) @@ -85,10 +92,107 @@ static void debug_result(const char *label, int test_num, sd->last_test_pass ? "PASS" : "FAIL"); } +/* Global data protected by test_run_lock */ +struct pks_test_ctx *g_ctx_under_test; + +/* + * Call set_context_for_fault() after the context has been set up and prior to + * the expected fault.
+ */ +static void set_context_for_fault(struct pks_test_ctx *ctx) +{ + g_ctx_under_test = ctx; + /* Ensure the state of the global context is correct prior to a fault */ + barrier(); +} + bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address, bool write) { - return false; + struct pt_regs_extended *ept_regs = to_extended_pt_regs(regs); + struct pt_regs_auxiliary *aux_pt_regs = &ept_regs->aux; + u32 pkrs = aux_pt_regs->pkrs; + + pr_debug("PKS Fault callback: ctx %p\n", g_ctx_under_test); + + if (!g_ctx_under_test) + return false; + + aux_pt_regs->pkrs = pkey_update_pkval(pkrs, g_ctx_under_test->pkey, 0); + g_ctx_under_test->fault_seen = true; + return true; +} + +enum pks_access_mode { + PKS_TEST_NO_ACCESS, + PKS_TEST_RDWR, +}; + +#define PKS_WRITE true +#define PKS_READ false +#define PKS_FAULT_EXPECTED true +#define PKS_NO_FAULT_EXPECTED false + +static char *get_mode_str(enum pks_access_mode mode) +{ + switch (mode) { + case PKS_TEST_NO_ACCESS: + return "No Access"; + case PKS_TEST_RDWR: + return "Read Write"; + } + + return ""; +} + +struct pks_access_test { + enum pks_access_mode mode; + bool write; + bool fault; +}; + +static struct pks_access_test pkey_test_ary[] = { + { PKS_TEST_NO_ACCESS, PKS_WRITE, PKS_FAULT_EXPECTED }, + { PKS_TEST_NO_ACCESS, PKS_READ, PKS_FAULT_EXPECTED }, + + { PKS_TEST_RDWR, PKS_WRITE, PKS_NO_FAULT_EXPECTED }, + { PKS_TEST_RDWR, PKS_READ, PKS_NO_FAULT_EXPECTED }, +}; + +static bool run_access_test(struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + switch (test->mode) { + case PKS_TEST_NO_ACCESS: + pks_set_noaccess(ctx->pkey); + break; + case PKS_TEST_RDWR: + pks_set_readwrite(ctx->pkey); + break; + default: + pr_debug("BUG in test, invalid mode\n"); + return false; + } + + ctx->fault_seen = false; + set_context_for_fault(ctx); + + if (test->write) + memcpy(ptr, ctx->data, 8); + else + memcpy(ctx->data, ptr, 8); + + if (test->fault != ctx->fault_seen) { + pr_err("pkey test
FAILED: mode %s; write %s; fault %s != %s\n", + get_mode_str(test->mode), + test->write ? "TRUE" : "FALSE", + test->fault ? "YES" : "NO", + ctx->fault_seen ? "YES" : "NO"); + return false; + } + + return true; } static void *alloc_test_page(u8 pkey) @@ -108,6 +212,37 @@ static void free_ctx(struct pks_test_ctx *ctx) kfree(ctx); } +static bool test_ctx(struct pks_test_ctx *ctx) +{ + bool rc = true; + int i; + u8 pkey; + void *ptr = ctx->test_page; + pte_t *ptep = NULL; + unsigned int level; + + ptep = lookup_address((unsigned long)ptr, &level); + if (!ptep) { + pr_err("Failed to lookup address???\n"); + return false; + } + + pkey = pte_flags_pkey(ptep->pte); + if (pkey != ctx->pkey) { + pr_err("invalid pkey found: %u, test_pkey: %u\n", + pkey, ctx->pkey); + return false; + } + + for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) { + /* sticky fail */ + if (!run_access_test(ctx, &pkey_test_ary[i], ptr)) + rc = false; + } + + return rc; +} + static struct pks_test_ctx *alloc_ctx(u8 pkey) { struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); @@ -139,6 +274,23 @@ static void set_ctx_data(struct pks_session_data *sd, struct pks_test_ctx *ctx) sd->ctx = ctx; } +static bool run_single(struct pks_session_data *sd) +{ + struct pks_test_ctx *ctx; + bool rc; + + ctx = alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) + return false; + + set_ctx_data(sd, ctx); + + rc = test_ctx(ctx); + pks_set_noaccess(ctx->pkey); + + return rc; +} + static void crash_it(struct pks_session_data *sd) { struct pks_test_ctx *ctx; @@ -203,6 +355,12 @@ static ssize_t pks_read_file(struct file *file, char __user *user_buf, return simple_read_from_buffer(user_buf, count, ppos, buf, len); } +static void cleanup_test(void) +{ + g_ctx_under_test = NULL; + mutex_unlock(&test_run_lock); +} + static ssize_t pks_write_file(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) { @@ -235,6 +393,10 @@ static ssize_t pks_write_file(struct
file *file, const char __user *user_buf, pr_debug("check defaults test: 0x%lx\n", PKS_INIT_VALUE); on_each_cpu(check_pkey_settings, file->private_data, 1); break; + case RUN_SINGLE: + pr_debug("Single key\n"); + sd->last_test_pass = run_single(file->private_data); + break; default: pr_debug("Unknown test\n"); sd->last_test_pass = false; @@ -251,7 +413,7 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf, * Normal exit; clear up the locking flag */ sd->need_unlock = false; - mutex_unlock(&test_run_lock); + cleanup_test(); debug_result("Test complete", test_num, sd); return count; } @@ -282,7 +444,7 @@ static int pks_release_file(struct inode *inode, struct file *file) * not exit normally. */ if (sd->need_unlock) - mutex_unlock(&test_run_lock); + cleanup_test(); free_ctx(sd->ctx); kfree(sd); return 0; diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c index df5bde9bfdbe..2c10b6c50416 100644 --- a/tools/testing/selftests/x86/test_pks.c +++ b/tools/testing/selftests/x86/test_pks.c @@ -31,6 +31,7 @@ /* Values from the kernel */ #define CHECK_DEFAULTS "0" +#define RUN_SINGLE "1" #define RUN_CRASH_TEST "9" time_t g_start_time; @@ -53,6 +54,7 @@ static int do_simple_test(const char *debugfs_str); */ enum { TEST_DEFAULTS = 0, + TEST_SINGLE, MAX_TESTS, } tests; @@ -64,7 +66,8 @@ struct test_item { const char *debugfs_str; int (*test_fn)(const char *debugfs_str); } test_list[] = { - { "check_defaults", CHECK_DEFAULTS, do_simple_test } + { "check_defaults", CHECK_DEFAULTS, do_simple_test }, + { "single", RUN_SINGLE, do_simple_test } }; static char *get_test_name(int test_num) -- 2.35.1 From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com To: Dave Hansen , "H.
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 41/44] mm/pkeys: PKS testing, test context switching Date: Tue, 19 Apr 2022 10:06:46 -0700 Message-Id: <20220419170649.1022246-42-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny PKS software must maintain the PKRS value across a context switch. Test this by using test_pks to run two processes, pinned to the same CPU, that use different permissions for the same pkey. On the kernel side, create two commands: one to set up the pkey prior to the context switch (arm context switch) and a second to check the pkey after the context switch (check context switch).
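The property this test checks can be modelled in a few lines of user-space C. The sketch below is illustrative only and rests on stated assumptions: the two-bits-per-key PKRS layout that check_ctx_switch() decodes with PKR_PKEY_SHIFT() and PKEY_ACCESS_MASK (bit 0 access-disable, bit 1 write-disable), a plain variable (model_cpu_pkrs) standing in for the MSR, and a hypothetical model_switch_to() that reduces the context-switch path to "save the outgoing value, load the incoming one". None of the model_* names exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 2 bits per key; bit 0 = access-disable, bit 1 = write-disable. */
#define MODEL_BITS_PER_PKEY 2
#define MODEL_ACCESS_MASK   0x3u
#define MODEL_AD_BIT        0x1u
#define MODEL_NUM_PKEYS     16

/* Hypothetical stand-in for the per-CPU PKRS MSR; not the real register. */
static uint32_t model_cpu_pkrs;

/* Hypothetical per-task state: the value reloaded when the task is switched in. */
struct model_task {
	uint32_t saved_pkrs;
};

/* Replace the access bits of @pkey in @pkval (same idea as pkey_update_pkval()). */
static uint32_t model_update_pkval(uint32_t pkval, uint8_t pkey, uint32_t bits)
{
	unsigned int shift = pkey * MODEL_BITS_PER_PKEY;

	pkval &= ~(MODEL_ACCESS_MASK << shift);
	return pkval | ((bits & MODEL_ACCESS_MASK) << shift);
}

/* Extract the access bits of @pkey, as check_ctx_switch() does with the MSR value. */
static uint32_t model_access_bits(uint32_t pkrs, uint8_t pkey)
{
	return (pkrs >> (pkey * MODEL_BITS_PER_PKEY)) & MODEL_ACCESS_MASK;
}

/* Context switch: save the outgoing task's value, load the incoming task's value. */
static void model_switch_to(struct model_task *prev, struct model_task *next)
{
	prev->saved_pkrs = model_cpu_pkrs;
	model_cpu_pkrs = next->saved_pkrs;
}

/*
 * Mirror of the test flow: the child arms (read/write), the parent runs on
 * the same CPU and leaves the key with access disabled, then the child is
 * switched back in and checks its own value. Returns the child's access bits.
 */
static uint32_t model_ctx_switch_scenario(uint8_t pkey)
{
	struct model_task child = { 0 }, parent = { 0 };

	assert(pkey < MODEL_NUM_PKEYS);

	/* Child: "arm context switch" grants read/write access. */
	model_cpu_pkrs = model_update_pkval(model_cpu_pkrs, pkey, 0);
	model_switch_to(&child, &parent);

	/* Parent: same pkey, left with access disabled. */
	model_cpu_pkrs = model_update_pkval(model_cpu_pkrs, pkey, MODEL_AD_BIT);
	model_switch_to(&parent, &child);

	/* Child: "check context switch"; 0 means read/write was restored. */
	return model_access_bits(model_cpu_pkrs, pkey);
}
```

If model_switch_to() skipped reloading next->saved_pkrs, model_ctx_switch_scenario() would return the parent's access-disable bit (1) instead of 0, which corresponds to the failure check_ctx_switch() reports.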
Signed-off-by: Ira Weiny --- Changes for V10 Fix CPU command line option Changes for V9 From Rick Edgecombe Ensure the parent/child threads don't cause each other to hang if one experiences a failure Adjust for the new test_pks user space component Adjust the debug output for '-d' option s/pks_mk_*/pks_set_*/ Use new set_file_data() call Changes for V8 Split this off from the main testing patch Remove unneeded prints --- lib/pks/pks_test.c | 54 +++++++++ tools/testing/selftests/x86/test_pks.c | 159 ++++++++++++++++++++++++- 2 files changed, 208 insertions(+), 5 deletions(-) diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index dc309dd941be..86af2f61393d 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -38,12 +38,16 @@ #include #include +#include + #include #define PKS_TEST_MEM_SIZE (PAGE_SIZE) #define CHECK_DEFAULTS 0 #define RUN_SINGLE 1 +#define ARM_CTX_SWITCH 2 +#define CHECK_CTX_SWITCH 3 #define RUN_CRASH_TEST 9 static struct dentry *pks_test_dentry; @@ -343,6 +347,48 @@ static void arm_or_run_crash_test(struct pks_session_data *sd) crash_it(sd); } +static void arm_ctx_switch(struct pks_session_data *sd) +{ + struct pks_test_ctx *ctx; + + ctx = alloc_ctx(PKS_KEY_TEST); + if (IS_ERR(ctx)) { + pr_err("Failed to allocate a context\n"); + sd->last_test_pass = false; + return; + } + + set_ctx_data(sd, ctx); + + /* Ensure a known state to test context switch */ + pks_set_readwrite(ctx->pkey); +} + +static void check_ctx_switch(struct pks_session_data *sd) +{ + struct pks_test_ctx *ctx = sd->ctx; + unsigned long reg_pkrs; + int access; + + sd->last_test_pass = true; + + if (!ctx) { + pr_err("No Context switch configured\n"); + sd->last_test_pass = false; + return; + } + + rdmsrl(MSR_IA32_PKRS, reg_pkrs); + + access = (reg_pkrs >> PKR_PKEY_SHIFT(ctx->pkey)) & + PKEY_ACCESS_MASK; + if (access != 0) { + pr_err("Context switch check failed: pkey %u: 0x%x reg: 0x%lx\n", + ctx->pkey, access, reg_pkrs); + sd->last_test_pass
= false; + } +} + static ssize_t pks_read_file(struct file *file, char __user *user_buf, size_t count, loff_t *ppos) { @@ -397,6 +443,14 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf, pr_debug("Single key\n"); sd->last_test_pass = run_single(file->private_data); break; + case ARM_CTX_SWITCH: + pr_debug("Arming Context switch test\n"); + arm_ctx_switch(file->private_data); + break; + case CHECK_CTX_SWITCH: + pr_debug("Checking Context switch test\n"); + check_ctx_switch(file->private_data); + break; default: pr_debug("Unknown test\n"); sd->last_test_pass = false; diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c index 2c10b6c50416..626421fa8ed8 100644 --- a/tools/testing/selftests/x86/test_pks.c +++ b/tools/testing/selftests/x86/test_pks.c @@ -17,6 +17,7 @@ * ... */ #define _GNU_SOURCE +#include #include #include #include @@ -32,10 +33,13 @@ /* Values from the kernel */ #define CHECK_DEFAULTS "0" #define RUN_SINGLE "1" +#define ARM_CTX_SWITCH "2" +#define CHECK_CTX_SWITCH "3" #define RUN_CRASH_TEST "9" time_t g_start_time; int g_debug; +unsigned long g_cpu; #define PRINT_DEBUG(fmt, ...)
\ do { \ @@ -47,6 +51,7 @@ int g_debug; fprintf(stderr, "%s: " fmt, __func__, ##__VA_ARGS__) static int do_simple_test(const char *debugfs_str); +static int do_context_switch(const char *debugfs_str); /* * The crash test is a special case which is not included in the run all @@ -55,6 +60,7 @@ static int do_simple_test(const char *debugfs_str); enum { TEST_DEFAULTS = 0, TEST_SINGLE, + TEST_CTX_SWITCH, MAX_TESTS, } tests; @@ -67,7 +73,8 @@ struct test_item { int (*test_fn)(const char *debugfs_str); } test_list[] = { { "check_defaults", CHECK_DEFAULTS, do_simple_test }, - { "single", RUN_SINGLE, do_simple_test } + { "single", RUN_SINGLE, do_simple_test }, + { "context_switch", ARM_CTX_SWITCH, do_context_switch } }; static char *get_test_name(int test_num) @@ -101,6 +108,7 @@ static void print_help_and_exit(char *argv0) printf("Usage: %s [-h,-d] [test]\n", argv0); printf(" --help,-h This help\n"); printf(" --debug,-d Output kernel debug via dynamic debug if available\n"); + printf(" --cpu,-c Use 'cpu' for context switch default 0\n"); printf("\n"); printf(" Run all PKS tests or the [test] specified.\n"); printf("\n"); @@ -116,6 +124,143 @@ static void print_help_and_exit(char *argv0) printf("\n"); } +/* + * debugfs_str is ignored for this test. + */ +static int do_context_switch(const char *debugfs_str) +{ + int switch_done[2]; + int setup_done[2]; + cpu_set_t cpuset; + char result[32] = { 0 }; + char done = 'P'; + int rc = 0; + pid_t pid; + int fd; + + if (g_cpu >= sysconf(_SC_NPROCESSORS_ONLN)) { + PRINT_ERROR("CPU %lu is invalid\n", g_cpu); + g_cpu = sysconf(_SC_NPROCESSORS_ONLN) - 1; + PRINT_ERROR(" running on max CPU: %lu\n", g_cpu); + } + + CPU_ZERO(&cpuset); + CPU_SET(g_cpu, &cpuset); + /* + * Ensure the two processes run on the same CPU so that they go through + * a context switch.
+ */ + sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset); + + if (pipe(setup_done)) { + PRINT_ERROR("ERROR: Failed to create pipe\n"); + return -EIO; + } + if (pipe(switch_done)) { + PRINT_ERROR("ERROR: Failed to create pipe\n"); + return -EIO; + } + + fd = open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + PRINT_DEBUG("Failed to open test file : %s\n", PKS_TEST_FILE); + return -ENOENT; + } + + /* Avoid duplicated output after fork */ + fflush(stderr); + fflush(stdout); + + pid = fork(); + if (pid == 0) { + char done = 'P'; + + g_cpu = sched_getcpu(); + PRINT_DEBUG("Child: running on cpu %lu...\n", g_cpu); + + /* Allocate and run test. */ + write(fd, RUN_SINGLE, 1); + + /* Arm for context switch test */ + write(fd, ARM_CTX_SWITCH, 1); + + PRINT_DEBUG("Child: Tell parent to go\n"); + write(setup_done[1], &done, sizeof(done)); + + /* Context switch out... */ + PRINT_DEBUG("Child: Waiting for parent...\n"); + read(switch_done[0], &done, sizeof(done)); + + /* Check msr restored */ + PRINT_DEBUG("Child: Checking result\n"); + rc = write(fd, CHECK_CTX_SWITCH, 1); + if (rc < 0) { + if (errno == ENOENT) { + sprintf(result, "SKIP"); + done = 'S'; + } else { + sprintf(result, "FAIL"); + done = 'F'; + } + goto child_exit; + } + + read(fd, result, 10); + if (strncmp(result, "PASS", 4)) + done = 'F'; + +child_exit: + PRINT_DEBUG("Child: Result (%c) %s\n", done, result); + + /* Signal result */ + write(setup_done[1], &done, sizeof(done)); + close(fd); + + exit(0); + } + + PRINT_DEBUG("Parent: Waiting for child\n"); + read(setup_done[0], &done, sizeof(done)); + g_cpu = sched_getcpu(); + PRINT_DEBUG("Parent: running on cpu %lu\n", g_cpu); + + /* The parent needs a unique file context within the kernel */ + close(fd); + fd = open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + PRINT_ERROR("FATAL ERROR: cannot open %s\n", PKS_TEST_FILE); + PRINT_DEBUG("Parent: Signaling child 'fail'\n"); + done = 'F'; + write(switch_done[1], &done, sizeof(done)); + return
-ENOENT; + } + + /* run test with the same pkey */ + rc = write(fd, RUN_SINGLE, 1); + + PRINT_DEBUG("Parent: Signaling child\n"); + write(switch_done[1], &done, sizeof(done)); + + if (rc < 0) { + rc = -errno; + goto close_file; + } + rc = 0; + + /* Wait for result */ + read(setup_done[0], &done, sizeof(done)); + if (done == 'S') + rc = -ENOENT; + if (done == 'F') + rc = -EFAULT; + + PRINT_DEBUG("Parent: exiting with rc (%c) %d\n", done, rc); + +close_file: + close(fd); + return rc; +} + /* * Do a simple test of writing the debugfs value and reading back for 'PASS' */ @@ -307,14 +452,15 @@ int main(int argc, char *argv[]) while (1) { static struct option long_options[] = { - {"help", no_argument, 0, 'h' }, - {"debug", no_argument, 0, 'd' }, - {0, 0, 0, 0 } + {"help", no_argument, 0, 'h' }, + {"debug", no_argument, 0, 'd' }, + {"cpu", required_argument, 0, 'c' }, + {0, 0, 0, 0 } }; int option_index = 0; int c; - c = getopt_long(argc, argv, "hd", long_options, &option_index); + c = getopt_long(argc, argv, "hdc:", long_options, &option_index); if (c == -1) break; @@ -325,6 +471,9 @@ int main(int argc, char *argv[]) case 'd': g_debug++; break; + case 'c': + g_cpu = strtoul(optarg, NULL, 0); + break; default: print_help_and_exit(argv[0]); exit(-1); -- 2.35.1 From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com To: Dave Hansen , "H.
Peter Anvin" , Dan Williams Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org Subject: [PATCH V10 42/44] mm/pkeys: PKS testing, Add exception test Date: Tue, 19 Apr 2022 10:06:47 -0700 Message-Id: <20220419170649.1022246-43-ira.weiny@intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com> References: <20220419170649.1022246-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Ira Weiny During an exception the interrupted threads PKRS value is preserved and the exception receives the default value for that pkey. Upon return from exception the threads PKRS value is restored. Add a PKS test which forces a fault to check that this works as intended. Check that both the thread as well as the exception PKRS state is correct at the beginning, during, and after the exception. Add the test to the test_pks app. Signed-off-by: Ira Weiny --- Change for V9 Add test to test_pks Clean up the globals shared with the fault handler Use the PKS Test specific fault callback s/pks_mk*/pks_set*/ Change pkey type to u8 From Dave Hansen use pkey Change for V8 Split this test off from the testing patch and place it after the exception saving code. 
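The save/restore behaviour this test checks can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: it assumes 16 keys with 2 bits each (bit 0 access-disable, bit 1 write-disable), mirroring how check_pkey_val() extracts a key's field with PKR_PKEY_SHIFT() and PKEY_ACCESS_MASK. `struct model_cpu` and every `model_*` helper are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model only. Assumed encoding: 16 keys, 2 bits per key,
 * bit 0 = access disable, bit 1 = write disable.
 */
#define PKEY_DISABLE_ACCESS 0x1u
#define PKEY_DISABLE_WRITE  0x2u
#define PKEY_ACCESS_MASK    (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
#define MODEL_NUM_PKEYS     16u

/* Bit offset of @pkey's 2-bit field. */
static unsigned int model_pkey_shift(unsigned int pkey)
{
	assert(pkey < MODEL_NUM_PKEYS);
	return pkey * 2u;
}

/* Same idea as check_pkey_val(): does @pkey's field equal @expected? */
static int model_check_pkey_val(uint32_t pk_reg, unsigned int pkey,
				uint32_t expected)
{
	return ((pk_reg >> model_pkey_shift(pkey)) & PKEY_ACCESS_MASK) == expected;
}

/* Rewrite only @pkey's field; all other keys keep their bits. */
static uint32_t model_update_pkval(uint32_t pkval, unsigned int pkey,
				   uint32_t bits)
{
	unsigned int shift = model_pkey_shift(pkey);

	pkval &= ~(PKEY_ACCESS_MASK << shift);
	return pkval | ((bits & PKEY_ACCESS_MASK) << shift);
}

/* Hypothetical CPU state: @pkrs stands in for MSR_IA32_PKRS. */
struct model_cpu {
	uint32_t pkrs;
};

/* Entry: save the interrupted thread's value, load the default. */
static uint32_t model_exception_entry(struct model_cpu *cpu,
				      uint32_t exception_default)
{
	uint32_t saved = cpu->pkrs;

	cpu->pkrs = exception_default;
	return saved;
}

/* Exit: restore whatever value was saved for the thread. */
static void model_exception_exit(struct model_cpu *cpu, uint32_t saved)
{
	cpu->pkrs = saved;
}
```

In this model, changes made while "in" the exception (like validate_exception() switching the key to read/write and back to no-access) never touch the saved value. The real test additionally has the fault callback rewrite the saved value to 0 so the faulting access can complete, which is why run_exception_test() expects 0 rather than PKEY_DISABLE_WRITE afterwards.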
---
 arch/x86/mm/pkeys.c                    |   2 +-
 include/linux/pks.h                    |   6 ++
 lib/pks/pks_test.c                     | 133 +++++++++++++++++++++++++
 tools/testing/selftests/x86/test_pks.c |   5 +-
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 9e0948766427..ee5eff6bdbf3 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -216,7 +216,7 @@ u32 pkey_update_pkval(u32 pkval, u8 pkey, u32 accessbits)
 
 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
 
-static DEFINE_PER_CPU(u32, pkrs_cache);
+__static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache);
 
 /**
  * DOC: DEFINE_PKS_FAULT_CALLBACK
diff --git a/include/linux/pks.h b/include/linux/pks.h
index fd0ed09dd143..163c75992a8a 100644
--- a/include/linux/pks.h
+++ b/include/linux/pks.h
@@ -59,9 +59,15 @@ static inline void pks_update_exception(struct pt_regs *regs,
 
 #ifdef CONFIG_PKS_TEST
 
+#define __static_or_pks_test
+
 bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address,
 			     bool write);
 
+#else /* !CONFIG_PKS_TEST */
+
+#define __static_or_pks_test static
+
 #endif /* CONFIG_PKS_TEST */
 
 #endif /* _LINUX_PKS_H */
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
index 86af2f61393d..762f4a19cb7d 100644
--- a/lib/pks/pks_test.c
+++ b/lib/pks/pks_test.c
@@ -48,19 +48,30 @@
 #define RUN_SINGLE 1
 #define ARM_CTX_SWITCH 2
 #define CHECK_CTX_SWITCH 3
+#define RUN_EXCEPTION 4
 #define RUN_CRASH_TEST 9
 
+DECLARE_PER_CPU(u32, pkrs_cache);
+
 static struct dentry *pks_test_dentry;
 
 DEFINE_MUTEX(test_run_lock);
 
 struct pks_test_ctx {
 	u8 pkey;
+	bool pass;
 	char data[64];
 	void *test_page;
 	bool fault_seen;
+	bool validate_exp_handling;
 };
 
+static bool check_pkey_val(u32 pk_reg, u8 pkey, u32 expected)
+{
+	pk_reg = (pk_reg >> PKR_PKEY_SHIFT(pkey)) & PKEY_ACCESS_MASK;
+	return (pk_reg == expected);
+}
+
 static void debug_context(const char *label, struct pks_test_ctx *ctx)
 {
 	pr_debug("%s [%d] %s <-> %p\n",
@@ -96,6 +107,63 @@ static void debug_result(const char *label, int test_num,
 		 sd->last_test_pass ? "PASS" : "FAIL");
 }
 
+/*
+ * Check if the register @pkey value matches @expected value
+ *
+ * Both the cached and actual MSR must match.
+ */
+static bool check_pkrs(u8 pkey, u8 expected)
+{
+	bool ret = true;
+	u64 pkrs;
+	u32 *tmp_cache;
+
+	tmp_cache = get_cpu_ptr(&pkrs_cache);
+	if (!check_pkey_val(*tmp_cache, pkey, expected))
+		ret = false;
+	put_cpu_ptr(tmp_cache);
+
+	rdmsrl(MSR_IA32_PKRS, pkrs);
+	if (!check_pkey_val(pkrs, pkey, expected))
+		ret = false;
+
+	return ret;
+}
+
+static void validate_exception(struct pks_test_ctx *ctx, u32 thread_pkrs)
+{
+	u8 pkey = ctx->pkey;
+
+	/* Check that the thread state was saved */
+	if (!check_pkey_val(thread_pkrs, pkey, PKEY_DISABLE_WRITE)) {
+		pr_err(" FAIL: checking aux_pt_regs->thread_pkrs\n");
+		ctx->pass = false;
+	}
+
+	/* Check that the exception received the default of disabled access */
+	if (!check_pkrs(pkey, PKEY_DISABLE_ACCESS)) {
+		pr_err(" FAIL: PKRS cache and MSR\n");
+		ctx->pass = false;
+	}
+
+	/*
+	 * Ensure an update can occur during exception without affecting the
+	 * interrupted thread. The interrupted thread is verified after the
+	 * exception returns.
+	 */
+	pks_set_readwrite(pkey);
+	if (!check_pkrs(pkey, 0)) {
+		pr_err(" FAIL: exception did not change register to 0\n");
+		ctx->pass = false;
+	}
+	pks_set_noaccess(pkey);
+	if (!check_pkrs(pkey, PKEY_DISABLE_ACCESS)) {
+		pr_err(" FAIL: exception did not change register to 0x%x\n",
+			PKEY_DISABLE_ACCESS);
+		ctx->pass = false;
+	}
+}
+
 /* Global data protected by test_run_lock */
 struct pks_test_ctx *g_ctx_under_test;
 
@@ -122,6 +190,16 @@ bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address,
 	if (!g_ctx_under_test)
 		return false;
 
+	if (g_ctx_under_test->validate_exp_handling) {
+		validate_exception(g_ctx_under_test, pkrs);
+		/*
+		 * Stop this check directly within the exception because the
+		 * fault handler clean up code will call again while checking
+		 * the PMD entry and there is no need to check this again.
+		 */
+		g_ctx_under_test->validate_exp_handling = false;
+	}
+
 	aux_pt_regs->pkrs = pkey_update_pkval(pkrs, g_ctx_under_test->pkey, 0);
 	g_ctx_under_test->fault_seen = true;
 	return true;
@@ -255,6 +333,7 @@ static struct pks_test_ctx *alloc_ctx(u8 pkey)
 		return ERR_PTR(-ENOMEM);
 
 	ctx->pkey = pkey;
+	ctx->pass = true;
 	sprintf(ctx->data, "%s", "DEADBEEF");
 
 	ctx->test_page = alloc_test_page(ctx->pkey);
@@ -295,6 +374,56 @@ static bool run_single(struct pks_session_data *sd)
 	return rc;
 }
 
+static bool run_exception_test(struct pks_session_data *sd)
+{
+	bool pass = true;
+	struct pks_test_ctx *ctx;
+
+	ctx = alloc_ctx(PKS_KEY_TEST);
+	if (IS_ERR(ctx)) {
+		pr_debug(" FAIL: no context\n");
+		return false;
+	}
+
+	set_ctx_data(sd, ctx);
+
+	/*
+	 * Set the thread pkey value to something other than the default of
+	 * access disable but something which still causes a fault, disable
+	 * writes.
+	 */
+	pks_update_protection(ctx->pkey, PKEY_DISABLE_WRITE);
+
+	ctx->validate_exp_handling = true;
+	set_context_for_fault(ctx);
+
+	memcpy(ctx->test_page, ctx->data, 8);
+
+	if (!ctx->fault_seen) {
+		pr_err(" FAIL: did not get an exception\n");
+		pass = false;
+	}
+
+	/*
+	 * The exception code has to enable access to keep the fault from
+	 * looping forever. Therefore full access is seen here rather than
+	 * write disabled.
+	 *
+	 * However, this does verify that the exception state was independent
+	 * of the interrupted thread's state because validate_exception()
+	 * disabled access during the exception.
+	 */
+	if (!check_pkrs(ctx->pkey, 0)) {
+		pr_err(" FAIL: PKRS not restored\n");
+		pass = false;
+	}
+
+	if (!ctx->pass)
+		pass = false;
+
+	return pass;
+}
+
 static void crash_it(struct pks_session_data *sd)
 {
 	struct pks_test_ctx *ctx;
@@ -451,6 +580,10 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
 		pr_debug("Checking Context switch test\n");
 		check_ctx_switch(file->private_data);
 		break;
+	case RUN_EXCEPTION:
+		pr_debug("Exception checking\n");
+		sd->last_test_pass = run_exception_test(file->private_data);
+		break;
 	default:
 		pr_debug("Unknown test\n");
 		sd->last_test_pass = false;
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
index 626421fa8ed8..c40035803e38 100644
--- a/tools/testing/selftests/x86/test_pks.c
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -35,6 +35,7 @@
 #define RUN_SINGLE "1"
 #define ARM_CTX_SWITCH "2"
 #define CHECK_CTX_SWITCH "3"
+#define RUN_EXCEPTION "4"
 #define RUN_CRASH_TEST "9"
 
 time_t g_start_time;
@@ -61,6 +62,7 @@ enum {
 	TEST_DEFAULTS = 0,
 	TEST_SINGLE,
 	TEST_CTX_SWITCH,
+	TEST_EXCEPTION,
 	MAX_TESTS,
 } tests;
 
@@ -74,7 +76,8 @@ struct test_item {
 } test_list[] = {
 	{ "check_defaults", CHECK_DEFAULTS, do_simple_test },
 	{ "single", RUN_SINGLE, do_simple_test },
-	{ "context_switch", ARM_CTX_SWITCH, do_context_switch }
+	{ "context_switch", ARM_CTX_SWITCH, do_context_switch },
+	{ "exception", RUN_EXCEPTION, do_simple_test }
 };
 
 static char *get_test_name(int test_num)
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 43/44] mm/pkeys: PKS testing, test pks_update_exception()
Date: Tue, 19 Apr 2022 10:06:48 -0700
Message-Id: <20220419170649.1022246-44-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

A common use case for custom fault callbacks is expected to be warning
about the violation and relaxing the permissions rather than crashing
the kernel. pks_update_exception() was added for this purpose.

Add a test which uses pks_update_exception() to clear the pkey
permissions. Verify that the permissions are changed in the interrupted
thread.

Signed-off-by: Ira Weiny

---
Changes for V9
	Update the commit message
	Clean up test name
	Add test_pks support
	s/pks_mk_*/pks_set_*/
	Simplify the use of globals for the faults
	From Rick Edgecombe
		Use WRITE_ONCE to protect against races with the fault handler
		s/RUN_FAULT_ABANDON/RUN_FAULT_CALLBACK

Changes for V8
	New test developed just to double check for regressions while
	reworking the code.
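The two phases of run_exception_update() (first access faults and the callback clears the key; second access must not fault) can be modelled in user space. This is only a sketch under assumptions: the same 2-bits-per-key encoding, a write being blocked if either bit of the key is set, and a retry loop standing in for the kernel re-executing the faulting instruction. `struct model_regs` and all `model_*` helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model only; assumed 2-bits-per-key encoding. */
#define PKEY_DISABLE_ACCESS 0x1u
#define PKEY_DISABLE_WRITE  0x2u
#define PKEY_ACCESS_MASK    (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)

/* Hypothetical stand-in for the PKRS value saved in aux_pt_regs. */
struct model_regs {
	uint32_t saved_pkrs;
};

static uint32_t model_update_pkval(uint32_t pkval, unsigned int pkey,
				   uint32_t bits)
{
	unsigned int shift = pkey * 2u;

	pkval &= ~(PKEY_ACCESS_MASK << shift);
	return pkval | ((bits & PKEY_ACCESS_MASK) << shift);
}

/*
 * Models the purpose of pks_update_exception(): change the value the
 * interrupted thread gets back on return, not the exception's own value.
 */
static void model_update_exception(struct model_regs *regs,
				   unsigned int pkey, uint32_t bits)
{
	regs->saved_pkrs = model_update_pkval(regs->saved_pkrs, pkey, bits);
}

/* Warn-and-relax policy, like handle_update_exception() in the test. */
static bool model_fault_callback(struct model_regs *regs, unsigned int pkey,
				 int *faults)
{
	(*faults)++;
	model_update_exception(regs, pkey, 0);
	return true;	/* handled: the faulting access is retried */
}

/*
 * Model a write to memory tagged with @pkey by a thread whose live PKRS
 * is *pkrs. Any set bit (AD or WD) blocks a write. Returns true once the
 * write succeeds, false otherwise.
 */
static bool model_write(uint32_t *pkrs, unsigned int pkey, int *faults)
{
	for (int tries = 0; tries < 2; tries++) {
		struct model_regs regs;

		if (((*pkrs >> (pkey * 2u)) & PKEY_ACCESS_MASK) == 0)
			return true;

		regs.saved_pkrs = *pkrs;	/* exception entry: save */
		if (!model_fault_callback(&regs, pkey, faults))
			return false;
		*pkrs = regs.saved_pkrs;	/* exception exit: restore */
	}
	return false;
}
```

The key design point the model tries to capture is that the callback edits the saved thread state; editing the live register inside the exception would be undone on return.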
---
 lib/pks/pks_test.c                     | 60 ++++++++++++++++++++++++++
 tools/testing/selftests/x86/test_pks.c |  5 ++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
index 762f4a19cb7d..a9cd2a49abfa 100644
--- a/lib/pks/pks_test.c
+++ b/lib/pks/pks_test.c
@@ -49,6 +49,7 @@
 #define ARM_CTX_SWITCH 2
 #define CHECK_CTX_SWITCH 3
 #define RUN_EXCEPTION 4
+#define RUN_EXCEPTION_UPDATE 5
 #define RUN_CRASH_TEST 9
 
 DECLARE_PER_CPU(u32, pkrs_cache);
@@ -64,6 +65,7 @@ struct pks_test_ctx {
 	void *test_page;
 	bool fault_seen;
 	bool validate_exp_handling;
+	bool validate_update_exp;
 };
 
 static bool check_pkey_val(u32 pk_reg, u8 pkey, u32 expected)
@@ -164,6 +166,16 @@ static void validate_exception(struct pks_test_ctx *ctx, u32 thread_pkrs)
 	}
 }
 
+static bool handle_update_exception(struct pt_regs *regs, struct pks_test_ctx *ctx)
+{
+	pr_debug("Updating pkey %d during exception\n", ctx->pkey);
+
+	ctx->fault_seen = true;
+	pks_update_exception(regs, ctx->pkey, 0);
+
+	return true;
+}
+
 /* Global data protected by test_run_lock */
 struct pks_test_ctx *g_ctx_under_test;
 
@@ -190,6 +202,9 @@ bool pks_test_fault_callback(struct pt_regs *regs, unsigned long address,
 	if (!g_ctx_under_test)
 		return false;
 
+	if (g_ctx_under_test->validate_update_exp)
+		return handle_update_exception(regs, g_ctx_under_test);
+
 	if (g_ctx_under_test->validate_exp_handling) {
 		validate_exception(g_ctx_under_test, pkrs);
 		/*
@@ -518,6 +533,47 @@ static void check_ctx_switch(struct pks_session_data *sd)
 	}
 }
 
+static bool run_exception_update(struct pks_session_data *sd)
+{
+	struct pks_test_ctx *ctx;
+
+	ctx = alloc_ctx(PKS_KEY_TEST);
+	if (IS_ERR(ctx))
+		return false;
+
+	set_ctx_data(sd, ctx);
+
+	ctx->fault_seen = false;
+	ctx->validate_update_exp = true;
+	pks_set_noaccess(ctx->pkey);
+
+	set_context_for_fault(ctx);
+
+	/* fault */
+	memcpy(ctx->test_page, ctx->data, 8);
+
+	if (!ctx->fault_seen) {
+		pr_err("Failed to see the callback\n");
+		return false;
+	}
+
+	ctx->fault_seen = false;
+	ctx->validate_update_exp = false;
+
+	set_context_for_fault(ctx);
+
+	/* no fault */
+	memcpy(ctx->test_page, ctx->data, 8);
+
+	if (ctx->fault_seen) {
+		pr_err("Pkey %d failed to be set RD/WR in the callback\n",
+			ctx->pkey);
+		return false;
+	}
+
+	return true;
+}
+
 static ssize_t pks_read_file(struct file *file, char __user *user_buf,
 			     size_t count, loff_t *ppos)
 {
@@ -584,6 +640,10 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
 		pr_debug("Exception checking\n");
 		sd->last_test_pass = run_exception_test(file->private_data);
 		break;
+	case RUN_EXCEPTION_UPDATE:
+		pr_debug("Fault clear test\n");
+		sd->last_test_pass = run_exception_update(file->private_data);
+		break;
 	default:
 		pr_debug("Unknown test\n");
 		sd->last_test_pass = false;
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
index c40035803e38..194c9dd9a211 100644
--- a/tools/testing/selftests/x86/test_pks.c
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -36,6 +36,7 @@
 #define ARM_CTX_SWITCH "2"
 #define CHECK_CTX_SWITCH "3"
 #define RUN_EXCEPTION "4"
+#define RUN_EXCEPTION_UPDATE "5"
 #define RUN_CRASH_TEST "9"
 
 time_t g_start_time;
@@ -63,6 +64,7 @@ enum {
 	TEST_SINGLE,
 	TEST_CTX_SWITCH,
 	TEST_EXCEPTION,
+	TEST_FAULT_CALLBACK,
 	MAX_TESTS,
 } tests;
 
@@ -77,7 +79,8 @@ struct test_item {
 	{ "check_defaults", CHECK_DEFAULTS, do_simple_test },
 	{ "single", RUN_SINGLE, do_simple_test },
 	{ "context_switch", ARM_CTX_SWITCH, do_context_switch },
-	{ "exception", RUN_EXCEPTION, do_simple_test }
+	{ "exception", RUN_EXCEPTION, do_simple_test },
+	{ "exception_update", RUN_EXCEPTION_UPDATE, do_simple_test }
 };
 
 static char *get_test_name(int test_num)
-- 
2.35.1

From nobody Thu Jun 18 15:45:34 2026
From: ira.weiny@intel.com
To: Dave Hansen , "H. Peter Anvin" , Dan Williams
Cc: Ira Weiny , Fenghua Yu , Rick Edgecombe , "Shankar, Ravi V" , linux-kernel@vger.kernel.org
Subject: [PATCH V10 44/44] mm/pkeys: PKS testing, add test for all keys
Date: Tue, 19 Apr 2022 10:06:49 -0700
Message-Id: <20220419170649.1022246-45-ira.weiny@intel.com>
In-Reply-To: <20220419170649.1022246-1-ira.weiny@intel.com>
References: <20220419170649.1022246-1-ira.weiny@intel.com>

From: Ira Weiny

To help test hardware and QEMU, it is necessary to be able to run
through all the available pkeys and run the access checks. However,
running this test conflicts with normal PKS consumers because it
consumes every key.

Add a test, mutually exclusive with all other PKS consumers, that loops
through all the pkeys and tests the various access modes.

Update the documentation.

Signed-off-by: Ira Weiny

---
Changes for V9
	Update commit message
	Create ENABLE_PKS_CONSUMER Kconfig to make this test mutually
	exclusive with any other pks consumer

Changes for V8
	Split this off from the large testing patch
	Remove debugging version
---
 Documentation/core-api/protection-keys.rst | 12 +++----
 arch/x86/mm/pkeys.c                        | 10 ++++++
 include/linux/pks-keys.h                   |  5 +++
 lib/Kconfig.debug                          | 21 +++++++++++
 lib/pks/pks_test.c                         | 41 +++++++++++++++++++++-
 mm/Kconfig                                 |  9 +++++
 tools/testing/selftests/x86/test_pks.c     |  5 ++-
 7 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index d492ec194e2a..36621cbc2cc6 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -117,20 +117,20 @@ Kconfig
 -------
 
 Kernel users intending to use PKS support should depend on
-ARCH_HAS_SUPERVISOR_PKEYS, and select ARCH_ENABLE_SUPERVISOR_PKEYS to turn on
-this support within the core. For example:
+ARCH_HAS_SUPERVISOR_PKEYS, and select ARCH_ENABLE_PKS_CONSUMER to turn on this
+support within the core. For example:
 
 .. code-block:: c
 
         config MY_NEW_FEATURE
                 depends on ARCH_HAS_SUPERVISOR_PKEYS
-                select ARCH_ENABLE_SUPERVISOR_PKEYS
+                select ARCH_ENABLE_PKS_CONSUMER
 
 This will make "MY_NEW_FEATURE" unavailable unless the architecture sets
 ARCH_HAS_SUPERVISOR_PKEYS. It also makes it possible for multiple independent
-features to "select ARCH_ENABLE_SUPERVISOR_PKEYS". If no features enable PKS
-by selecting ARCH_ENABLE_SUPERVISOR_PKEYS, PKS support will not be compiled
-into the kernel.
+features to "select ARCH_ENABLE_PKS_CONSUMER". If no features enable PKS by
+selecting ARCH_ENABLE_PKS_CONSUMER, PKS support will not be compiled into the
+kernel.
 
 PKS Key Allocation
 ------------------
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index ee5eff6bdbf3..74ba51b9853b 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -244,6 +244,8 @@ __static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache);
  * #endif
  * };
  */
+#ifndef CONFIG_PKS_TEST_ALL_KEYS
+
 static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] = {
 #ifdef CONFIG_DEVMAP_ACCESS_PROTECTION
 	[PKS_KEY_PGMAP_PROTECTION] = pgmap_pks_fault_callback,
@@ -253,6 +255,14 @@ static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] = {
 #endif
 };
 
+#else /* CONFIG_PKS_TEST_ALL_KEYS */
+
+static const pks_key_callback pks_key_callbacks[PKS_KEY_MAX] = {
+	[1 ... (PKS_KEY_MAX-1)] = pks_test_fault_callback,
+};
+
+#endif
+
 static bool pks_call_fault_callback(struct pt_regs *regs, unsigned long address,
 				    bool write, u16 key)
 {
diff --git a/include/linux/pks-keys.h b/include/linux/pks-keys.h
index 380bc999cbe3..aef1cb3c0f7f 100644
--- a/include/linux/pks-keys.h
+++ b/include/linux/pks-keys.h
@@ -66,6 +66,11 @@
 				  CONFIG_PKS_TEST)
 #define PKS_KEY_MAX PKS_NEW_KEY(PKS_KEY_TEST, 1)
 
+#ifdef CONFIG_PKS_TEST_ALL_KEYS
+#undef PKS_KEY_MAX
+#define PKS_KEY_MAX PKS_NUM_PKEYS
+#endif
+
 /* PKS_KEY_DEFAULT_INIT must be RW */
 #define PKS_KEY_DEFAULT_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_DEFAULT, RW, 1)
 #define PKS_KEY_PGMAP_INIT PKS_DECLARE_INIT_VALUE(PKS_KEY_PGMAP_PROTECTION, \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7ac43b78c7bb..57a76c096ea7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2758,6 +2758,12 @@ config HYPERV_TESTING
 	help
 	  Select this option to enable Hyper-V vmbus testing.
 
+# PKS_TEST is a special PKS consumer and therefore sets
+# ARCH_ENABLE_SUPERVISOR_PKEYS directly rather than through
+# ARCH_ENABLE_PKS_CONSUMER
+#
+# This allows PKS_TEST_ALL_KEYS to remain mutually exclusive to any real PKS
+# consumer
 config PKS_TEST
 	bool "PKey (S)upervisor testing"
 	depends on ARCH_HAS_SUPERVISOR_PKEYS
@@ -2770,6 +2776,21 @@ config PKS_TEST
 
 	  If unsure, say N.
 
+config PKS_TEST_ALL_KEYS
+	bool "PKS test all keys"
+	depends on (PKS_TEST && !ARCH_ENABLE_PKS_CONSUMER)
+	help
+	  Select this option to enable testing of all the PKS keys available in
+	  the architecture. This option is mutually exclusive with PKS
+	  consumers other than PKS_TEST. This is because it will consume all
+	  PKS keys for testing purposes.
+
+	  Answer N if you don't know what supervisor keys are or want to have
+	  supervisor keys available for other consumers.
+
+	  If unsure, say N.
+
+
 endmenu # "Kernel Testing and Coverage"
 
 source "Documentation/Kconfig"
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
index a9cd2a49abfa..e38a487c7065 100644
--- a/lib/pks/pks_test.c
+++ b/lib/pks/pks_test.c
@@ -50,12 +50,12 @@
 #define CHECK_CTX_SWITCH 3
 #define RUN_EXCEPTION 4
 #define RUN_EXCEPTION_UPDATE 5
+#define RUN_ALL_KEYS 6
 #define RUN_CRASH_TEST 9
 
 DECLARE_PER_CPU(u32, pkrs_cache);
 
 static struct dentry *pks_test_dentry;
-
 DEFINE_MUTEX(test_run_lock);
 
 struct pks_test_ctx {
@@ -439,6 +439,39 @@ static bool run_exception_test(struct pks_session_data *sd)
 	return pass;
 }
 
+#ifdef CONFIG_PKS_TEST_ALL_KEYS
+
+static bool run_all_keys(void)
+{
+	struct pks_test_ctx *ctx[PKS_NUM_PKEYS];
+	static char name[PKS_NUM_PKEYS][64];
+	int i;
+	bool rc = true;
+
+	for (i = 1; i < PKS_NUM_PKEYS; i++) {
+		sprintf(name[i], "pks ctx %d", i);
+		ctx[i] = alloc_ctx(i);
+	}
+
+	for (i = 1; i < PKS_NUM_PKEYS; i++) {
+		pr_debug("Running pkey '%d'\n", i);
+		if (!IS_ERR(ctx[i])) {
+			/* sticky fail */
+			if (!test_ctx(ctx[i]))
+				rc = false;
+		}
+	}
+
+	for (i = 1; i < PKS_NUM_PKEYS; i++) {
+		if (!IS_ERR(ctx[i]))
+			free_ctx(ctx[i]);
+	}
+
+	return rc;
+}
+
+#endif
+
 static void crash_it(struct pks_session_data *sd)
 {
 	struct pks_test_ctx *ctx;
@@ -644,6 +677,12 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf,
 		pr_debug("Fault clear test\n");
 		sd->last_test_pass = run_exception_update(file->private_data);
 		break;
+#ifdef CONFIG_PKS_TEST_ALL_KEYS
+	case RUN_ALL_KEYS:
+		pr_debug("Run all\n");
+		sd->last_test_pass = run_all_keys();
+		goto unlock_test;
+#endif
 	default:
 		pr_debug("Unknown test\n");
 		sd->last_test_pass = false;
diff --git a/mm/Kconfig b/mm/Kconfig
index 616baee3f62d..a25217f2729d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -842,6 +842,15 @@ config ARCH_HAS_PKEYS
 	bool
 config ARCH_HAS_SUPERVISOR_PKEYS
 	bool
+
+config ARCH_ENABLE_PKS_CONSUMER
+	select ARCH_ENABLE_SUPERVISOR_PKEYS
+	bool
+
+# WARNING Do not set ARCH_ENABLE_SUPERVISOR_PKEYS directly; use
+# ARCH_ENABLE_PKS_CONSUMER instead.
+#
+# See the PKey (S)upervisor testing (PKS_TEST) config option for details.
 config ARCH_ENABLE_SUPERVISOR_PKEYS
 	bool
 
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
index 194c9dd9a211..8ffe4596de1f 100644
--- a/tools/testing/selftests/x86/test_pks.c
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -37,6 +37,7 @@
 #define CHECK_CTX_SWITCH "3"
 #define RUN_EXCEPTION "4"
 #define RUN_EXCEPTION_UPDATE "5"
+#define RUN_ALL_KEYS "6"
 #define RUN_CRASH_TEST "9"
 
 time_t g_start_time;
@@ -65,6 +66,7 @@ enum {
 	TEST_CTX_SWITCH,
 	TEST_EXCEPTION,
 	TEST_FAULT_CALLBACK,
+	TEST_ALL,
 	MAX_TESTS,
 } tests;
 
@@ -80,7 +82,8 @@ struct test_item {
 	{ "single", RUN_SINGLE, do_simple_test },
 	{ "context_switch", ARM_CTX_SWITCH, do_context_switch },
 	{ "exception", RUN_EXCEPTION, do_simple_test },
-	{ "exception_update", RUN_EXCEPTION_UPDATE, do_simple_test }
+	{ "exception_update", RUN_EXCEPTION_UPDATE, do_simple_test },
+	{ "run_all", RUN_ALL_KEYS, do_simple_test }
 };
 
 static char *get_test_name(int test_num)
-- 
2.35.1
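The loop structure of run_all_keys() in patch 44 (walk keys 1 to PKS_NUM_PKEYS-1, keep testing after a failure, report a sticky result) can be sketched in user space. This is a model under stated assumptions, not the kernel test: it assumes PKS_NUM_PKEYS is 16 and the 2-bits-per-key encoding used elsewhere in this series, and it replaces test_ctx()'s real memory accesses with pure register arithmetic. All `model_*` and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model only: 16 keys, 2 bits each (assumed encoding). */
#define MODEL_NUM_PKEYS     16u
#define PKEY_DISABLE_ACCESS 0x1u
#define PKEY_DISABLE_WRITE  0x2u
#define PKEY_ACCESS_MASK    (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)

/* Mask covering @pkey's 2-bit field. */
static uint32_t key_field_mask(unsigned int pkey)
{
	return PKEY_ACCESS_MASK << (pkey * 2u);
}

static uint32_t get_bits(uint32_t reg, unsigned int pkey)
{
	return (reg >> (pkey * 2u)) & PKEY_ACCESS_MASK;
}

static uint32_t set_bits(uint32_t reg, unsigned int pkey, uint32_t bits)
{
	return (reg & ~key_field_mask(pkey)) |
	       ((bits & PKEY_ACCESS_MASK) << (pkey * 2u));
}

/*
 * Exercise one key: every access mode must land in that key's field
 * and leave all other keys' fields unchanged. Ends in read/write.
 */
static bool model_test_key(uint32_t *reg, unsigned int pkey)
{
	static const uint32_t modes[] = {
		PKEY_DISABLE_ACCESS, PKEY_DISABLE_WRITE, 0,
	};
	const uint32_t others = *reg & ~key_field_mask(pkey);

	for (unsigned int i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		*reg = set_bits(*reg, pkey, modes[i]);
		if (get_bits(*reg, pkey) != modes[i])
			return false;
		if ((*reg & ~key_field_mask(pkey)) != others)
			return false;
	}
	return true;
}

/*
 * Walk keys 1..MODEL_NUM_PKEYS-1, skipping key 0 as run_all_keys()
 * does, and keep going after a failure (sticky result).
 */
static bool model_run_all_keys(uint32_t *reg)
{
	bool rc = true;

	for (unsigned int pkey = 1; pkey < MODEL_NUM_PKEYS; pkey++) {
		if (!model_test_key(reg, pkey))
			rc = false;
	}
	return rc;
}
```

Starting from a register where key 0 is fully disabled and every other key is access-disabled, the walk leaves key 0 untouched and every tested key at read/write.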