From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna <faresx@amazon.de>
Subject: [RFC PATCH 1/7] mseal: expose interface to seal / unseal user memory ranges
Date: Wed, 11 Sep 2024 14:34:00 +0000
Message-ID: <20240911143421.85612-2-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

To make sure the kernel mm-local mapping is untouched by the user, we will
seal the VMA before changing its protection for kernel use. This guarantees
that userspace can neither unmap nor alter this VMA while it is being used
by the kernel.

Once the kernel is done with the secret memory, it will unseal the VMA so it
can be unmapped and freed. The unseal operation is not exposed to userspace.
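As a rough usage sketch (not part of this patch), a kernel-side caller would
bracket its use of the mapping with a seal/unseal pair, holding the mmap
write lock as do_mseal() requires; the function names here are hypothetical:

	/* Seal a user range so userspace can't unmap or remap it. */
	static int kernel_seal_range(struct mm_struct *mm,
				     unsigned long start, unsigned long end)
	{
		int ret;

		if (mmap_write_lock_killable(mm))
			return -EINTR;
		ret = do_mseal(start, end, true);	/* seal */
		mmap_write_unlock(mm);
		return ret;
	}

	/* Unseal, then unmap, once the kernel is done with the range. */
	static int kernel_unseal_range(struct mm_struct *mm,
				       unsigned long start, unsigned long end)
	{
		int ret;

		mmap_write_lock(mm);
		ret = do_mseal(start, end, false);	/* unseal */
		if (!ret)
			ret = do_munmap(mm, start, end - start, NULL);
		mmap_write_unlock(mm);
		return ret;
	}

This mirrors how the following patch in the series uses the interface.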
Signed-off-by: Fares Mehanna
Signed-off-by: Roman Kagan
---
 mm/internal.h |  7 +++++
 mm/mseal.c    | 81 ++++++++++++++++++++++++++++++++-------------------
 2 files changed, 58 insertions(+), 30 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index b4d86436565b..cf7280d101e9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1501,6 +1501,8 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start,
 		unsigned long end);
 bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
 		unsigned long end, int behavior);
+/* mm's mmap write lock must be taken before seal/unseal operation */
+int do_mseal(unsigned long start, unsigned long end, bool seal);
 #else
 static inline int can_do_mseal(unsigned long flags)
 {
@@ -1518,6 +1520,11 @@ static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start,
 {
 	return true;
 }
+
+static inline int do_mseal(unsigned long start, unsigned long end, bool seal)
+{
+	return -EINVAL;
+}
 #endif
 
 #ifdef CONFIG_SHRINKER_DEBUG
diff --git a/mm/mseal.c b/mm/mseal.c
index 15bba28acc00..aac9399ffd5d 100644
--- a/mm/mseal.c
+++ b/mm/mseal.c
@@ -26,6 +26,11 @@ static inline void set_vma_sealed(struct vm_area_struct *vma)
 	vm_flags_set(vma, VM_SEALED);
 }
 
+static inline void clear_vma_sealed(struct vm_area_struct *vma)
+{
+	vm_flags_clear(vma, VM_SEALED);
+}
+
 /*
  * check if a vma is sealed for modification.
  * return true, if modification is allowed.
@@ -117,7 +122,7 @@ bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long
 
 static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		struct vm_area_struct **prev, unsigned long start,
-		unsigned long end, vm_flags_t newflags)
+		unsigned long end, vm_flags_t newflags, bool seal)
 {
 	int ret = 0;
 	vm_flags_t oldflags = vma->vm_flags;
@@ -131,7 +136,10 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		goto out;
 	}
 
-	set_vma_sealed(vma);
+	if (seal)
+		set_vma_sealed(vma);
+	else
+		clear_vma_sealed(vma);
 out:
 	*prev = vma;
 	return ret;
@@ -167,9 +175,9 @@ static int check_mm_seal(unsigned long start, unsigned long end)
 }
 
 /*
- * Apply sealing.
+ * Apply sealing / unsealing.
  */
-static int apply_mm_seal(unsigned long start, unsigned long end)
+static int apply_mm_seal(unsigned long start, unsigned long end, bool seal)
 {
 	unsigned long nstart;
 	struct vm_area_struct *vma, *prev;
@@ -191,11 +199,14 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
 		unsigned long tmp;
 		vm_flags_t newflags;
 
-		newflags = vma->vm_flags | VM_SEALED;
+		if (seal)
+			newflags = vma->vm_flags | VM_SEALED;
+		else
+			newflags = vma->vm_flags & ~(VM_SEALED);
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
-		error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
+		error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags, seal);
 		if (error)
 			return error;
 		nstart = vma_iter_end(&vmi);
@@ -204,6 +215,37 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
 	return 0;
 }
 
+int do_mseal(unsigned long start, unsigned long end, bool seal)
+{
+	int ret;
+
+	if (end < start)
+		return -EINVAL;
+
+	if (end == start)
+		return 0;
+
+	/*
+	 * First pass, this helps to avoid
+	 * partial sealing in case of error in input address range,
+	 * e.g. ENOMEM error.
+	 */
+	ret = check_mm_seal(start, end);
+	if (ret)
+		goto out;
+
+	/*
+	 * Second pass, this should succeed, unless there are errors
+	 * from vma_modify_flags, e.g. merge/split error, or process
+	 * reaching the max supported VMAs, however, those cases shall
+	 * be rare.
+	 */
+	ret = apply_mm_seal(start, end, seal);
+
+out:
+	return ret;
+}
+
 /*
  * mseal(2) seals the VM's meta data from
  * selected syscalls.
@@ -256,7 +298,7 @@ static int apply_mm_seal(unsigned long start, unsigned long end)
  *
  *  unseal() is not supported.
  */
-static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
+static int __do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 {
 	size_t len;
 	int ret = 0;
@@ -277,33 +319,12 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 		return -EINVAL;
 
 	end = start + len;
-	if (end < start)
-		return -EINVAL;
-
-	if (end == start)
-		return 0;
 
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
-	/*
-	 * First pass, this helps to avoid
-	 * partial sealing in case of error in input address range,
-	 * e.g. ENOMEM error.
-	 */
-	ret = check_mm_seal(start, end);
-	if (ret)
-		goto out;
-
-	/*
-	 * Second pass, this should success, unless there are errors
-	 * from vma_modify_flags, e.g. merge/split error, or process
-	 * reaching the max supported VMAs, however, those cases shall
-	 * be rare.
-	 */
-	ret = apply_mm_seal(start, end);
+	ret = do_mseal(start, end, true);
 
-out:
 	mmap_write_unlock(current->mm);
 	return ret;
 }
@@ -311,5 +332,5 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags)
 SYSCALL_DEFINE3(mseal, unsigned long, start, size_t, len,
 		unsigned long, flags)
 {
-	return do_mseal(start, len, flags);
+	return __do_mseal(start, len, flags);
 }
--
2.40.1

From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna <faresx@amazon.de>
Subject: [RFC PATCH 2/7] mm/secretmem: implement mm-local kernel allocations
Date: Wed, 11 Sep 2024 14:34:01 +0000
Message-ID: <20240911143421.85612-3-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

To be resilient against cross-process speculation-based attacks, it makes
sense to store certain (secret) items in kernel memory local to the owning
mm. Implement such allocations on top of the secretmem infrastructure.

Specifically, on allocation:

1. Create a secretmem file.
2. To distinguish it from a conventional memfd_secret()-created file, and to
   maintain the associated mm-local allocation context, store that context in
   ->private_data of the file.
3. Create a virtual mapping in the user virtual address space using mmap().
4. Seal the virtual mapping to disallow the user from affecting it in any
   way.
5. Fault the pages in, which invokes the secretmem fault handler to remove
   the pages from the kernel linear map and make them local to the process
   mm.
6. Change the PTEs from user mode to kernel mode; any access from userspace
   will then result in a segmentation fault, while the kernel can access the
   virtual address directly.
7. Return the secure area as a struct containing the pointer to the actual
   memory and the context needed by the release function later.

On release:

- if called while the mm is still in use, remove the mapping;
- otherwise, at mm teardown, no unmapping is necessary.

The rest is taken care of by the secretmem file cleanup, including returning
the pages to the kernel direct map.
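A minimal usage sketch of the new interface (hypothetical caller, error
handling trimmed; must run in a task context with a valid mm):

	#include <linux/secretmem.h>

	static struct secretmem_area *area;

	static int store_secret(u64 secret)
	{
		/* order 0: one page, removed from the kernel direct map */
		area = secretmem_allocate_pages(0);
		if (!area)
			return -ENOMEM;

		/* kernel-accessible, but only through this process' mm */
		*(u64 *)area->ptr = secret;
		return 0;
	}

	static void drop_secret(void)
	{
		secretmem_release_pages(area);
		area = NULL;
	}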
Signed-off-by: Fares Mehanna
Signed-off-by: Roman Kagan
---
 include/linux/secretmem.h |  29 ++++
 mm/Kconfig                |  10 ++
 mm/gup.c                  |   4 +-
 mm/secretmem.c            | 213 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 254 insertions(+), 2 deletions(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index e918f96881f5..39cc73a0e4bd 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -2,6 +2,10 @@
 #ifndef _LINUX_SECRETMEM_H
 #define _LINUX_SECRETMEM_H
 
+struct secretmem_area {
+	void *ptr;
+};
+
 #ifdef CONFIG_SECRETMEM
 
 extern const struct address_space_operations secretmem_aops;
@@ -33,4 +37,29 @@ static inline bool secretmem_active(void)
 
 #endif /* CONFIG_SECRETMEM */
 
+#ifdef CONFIG_KERNEL_SECRETMEM
+
+bool can_access_secretmem_vma(struct vm_area_struct *vma);
+struct secretmem_area *secretmem_allocate_pages(unsigned int order);
+void secretmem_release_pages(struct secretmem_area *data);
+
+#else
+
+static inline bool can_access_secretmem_vma(struct vm_area_struct *vma)
+{
+	return true;
+}
+
+static inline struct secretmem_area *secretmem_allocate_pages(unsigned int order)
+{
+	return NULL;
+}
+
+static inline void secretmem_release_pages(struct secretmem_area *data)
+{
+	WARN_ONCE(1, "Called secret memory release page without support\n");
+}
+
+#endif /* CONFIG_KERNEL_SECRETMEM */
+
 #endif /* _LINUX_SECRETMEM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index b72e7d040f78..a327d8def179 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1168,6 +1168,16 @@ config SECRETMEM
 	  memory areas visible only in the context of the owning process and
 	  not mapped to other processes and other kernel page tables.
 
+config KERNEL_SECRETMEM
+	default y
+	bool "Enable kernel usage of memfd_secret()" if EXPERT
+	depends on SECRETMEM
+	depends on MMU
+	help
+	  Enable kernel usage of memfd_secret() for kernel memory allocations.
+	  The allocated memory is visible only to the kernel in the context of
+	  the owning process.
+
 config ANON_VMA_NAME
 	bool "Anonymous VMA name support"
 	depends on PROC_FS && ADVISE_SYSCALLS && MMU
diff --git a/mm/gup.c b/mm/gup.c
index 54d0dc3831fb..6c2c6a0cbe2a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1076,7 +1076,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	struct follow_page_context ctx = { NULL };
 	struct page *page;
 
-	if (vma_is_secretmem(vma))
+	if (!can_access_secretmem_vma(vma))
 		return NULL;
 
 	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
@@ -1281,7 +1281,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
 		return -EOPNOTSUPP;
 
-	if (vma_is_secretmem(vma))
+	if (!can_access_secretmem_vma(vma))
 		return -EFAULT;
 
 	if (write) {
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 3afb5ad701e1..86afedc65889 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -13,13 +13,17 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 
+#include
 #include
+#include
 
 #include
 
@@ -42,6 +46,16 @@ MODULE_PARM_DESC(secretmem_enable,
 
 static atomic_t secretmem_users;
 
+/* secretmem file private context */
+struct secretmem_ctx {
+	struct secretmem_area _area;
+	struct page **_pages;
+	unsigned long _nr_pages;
+	struct file *_file;
+	struct mm_struct *_mm;
+};
+
+
 bool secretmem_active(void)
 {
 	return !!atomic_read(&secretmem_users);
@@ -116,6 +130,7 @@ static const struct vm_operations_struct secretmem_vm_ops = {
 
 static int secretmem_release(struct inode *inode, struct file *file)
 {
+	kfree(file->private_data);
 	atomic_dec(&secretmem_users);
 	return 0;
 }
@@ -123,13 +138,23 @@ static int secretmem_release(struct inode *inode, struct file *file)
 static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	unsigned long len = vma->vm_end - vma->vm_start;
+	struct secretmem_ctx *ctx = file->private_data;
+	unsigned long kernel_no_permissions;
+
+	kernel_no_permissions = (VM_READ | VM_WRITE | VM_EXEC | VM_MAYEXEC);
 
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;
 
+	if (ctx && (vma->vm_flags & kernel_no_permissions))
+		return -EINVAL;
+
 	if (!mlock_future_ok(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
 		return -EAGAIN;
 
+	if (ctx)
+		vm_flags_set(vma, VM_MIXEDMAP);
+
 	vm_flags_set(vma, VM_LOCKED | VM_DONTDUMP);
 	vma->vm_ops = &secretmem_vm_ops;
 
@@ -230,6 +255,194 @@ static struct file *secretmem_file_create(unsigned long flags)
 	return file;
 }
 
+#ifdef CONFIG_KERNEL_SECRETMEM
+
+struct secretmem_area *secretmem_allocate_pages(unsigned int order)
+{
+	unsigned long uvaddr, uvaddr_inc, unused, nr_pages, bytes_length;
+	struct file *kernel_secfile;
+	struct vm_area_struct *vma;
+	struct secretmem_ctx *ctx;
+	struct page **sec_pages;
+	struct mm_struct *mm;
+	long nr_pinned_pages;
+	pte_t pte, old_pte;
+	spinlock_t *ptl;
+	pte_t *upte;
+	int rc;
+
+	nr_pages = (1 << order);
+	bytes_length = nr_pages * PAGE_SIZE;
+	mm = current->mm;
+
+	if (!mm || !mmget_not_zero(mm))
+		return NULL;
+
+	/* Create secret memory file / truncate it */
+	kernel_secfile = secretmem_file_create(0);
+	if (IS_ERR(kernel_secfile))
+		goto put_mm;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (IS_ERR(ctx))
+		goto close_secfile;
+	kernel_secfile->private_data = ctx;
+
+	rc = do_truncate(file_mnt_idmap(kernel_secfile),
+			 file_dentry(kernel_secfile), bytes_length, 0, NULL);
+	if (rc)
+		goto close_secfile;
+
+	if (mmap_write_lock_killable(mm))
+		goto close_secfile;
+
+	/* Map pages to the secretmem file */
+	uvaddr = do_mmap(kernel_secfile, 0, bytes_length, PROT_NONE,
+			 MAP_SHARED, 0, 0, &unused, NULL);
+	if (IS_ERR_VALUE(uvaddr))
+		goto unlock_mmap;
+
+	/* mseal() the VMA to make sure it won't change */
+	rc = do_mseal(uvaddr, uvaddr + bytes_length, true);
+	if (rc)
+		goto unmap_pages;
+
+	/* Make sure VMA is there, and is kernel-secure */
+	vma = find_vma(current->mm, uvaddr);
+	if (!vma)
+		goto unseal_vma;
+
+	if (!vma_is_secretmem(vma) ||
+	    !can_access_secretmem_vma(vma))
+		goto unseal_vma;
+
+	/* Pin user pages; fault them in */
+	sec_pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL);
+	if (!sec_pages)
+		goto unseal_vma;
+
+	nr_pinned_pages = pin_user_pages(uvaddr, nr_pages, FOLL_FORCE | FOLL_LONGTERM, sec_pages);
+	if (nr_pinned_pages < 0)
+		goto free_sec_pages;
+	if (nr_pinned_pages != nr_pages)
+		goto unpin_pages;
+
+	/* Modify the existing mapping to be kernel accessible, local to this process mm */
+	uvaddr_inc = uvaddr;
+	while (uvaddr_inc < uvaddr + bytes_length) {
+		upte = get_locked_pte(mm, uvaddr_inc, &ptl);
+		if (!upte)
+			goto unpin_pages;
+		old_pte = ptep_modify_prot_start(vma, uvaddr_inc, upte);
+		pte = pte_modify(old_pte, PAGE_KERNEL);
+		ptep_modify_prot_commit(vma, uvaddr_inc, upte, old_pte, pte);
+		pte_unmap_unlock(upte, ptl);
+		uvaddr_inc += PAGE_SIZE;
+	}
+	flush_tlb_range(vma, uvaddr, uvaddr + bytes_length);
+
+	/* Return data */
+	mmgrab(mm);
+	ctx->_area.ptr = (void *) uvaddr;
+	ctx->_pages = sec_pages;
+	ctx->_nr_pages = nr_pages;
+	ctx->_mm = mm;
+	ctx->_file = kernel_secfile;
+
+	mmap_write_unlock(mm);
+	mmput(mm);
+
+	return &ctx->_area;
+
+unpin_pages:
+	unpin_user_pages(sec_pages, nr_pinned_pages);
+free_sec_pages:
+	kfree(sec_pages);
+unseal_vma:
+	rc = do_mseal(uvaddr, uvaddr + bytes_length, false);
+	if (rc)
+		BUG();
+unmap_pages:
+	rc = do_munmap(mm, uvaddr, bytes_length, NULL);
+	if (rc)
+		BUG();
+unlock_mmap:
+	mmap_write_unlock(mm);
+close_secfile:
+	fput(kernel_secfile);
+put_mm:
+	mmput(mm);
+	return NULL;
+}
+
+void secretmem_release_pages(struct secretmem_area *data)
+{
+	unsigned long uvaddr, bytes_length;
+	struct secretmem_ctx *ctx;
+	int rc;
+
+	if (!data || !data->ptr)
+		BUG();
+
+	ctx = container_of(data, struct secretmem_ctx, _area);
+	if (!ctx || !ctx->_file || !ctx->_pages || !ctx->_mm)
+		BUG();
+
+	bytes_length = ctx->_nr_pages * PAGE_SIZE;
+	uvaddr = (unsigned long) data->ptr;
+
+	/*
+	 * Remove the mapping if mm is still in use.
+	 * Not secure to continue if unmapping failed.
+	 */
+	if (mmget_not_zero(ctx->_mm)) {
+		mmap_write_lock(ctx->_mm);
+		rc = do_mseal(uvaddr, uvaddr + bytes_length, false);
+		if (rc) {
+			mmap_write_unlock(ctx->_mm);
+			BUG();
+		}
+		rc = do_munmap(ctx->_mm, uvaddr, bytes_length, NULL);
+		if (rc) {
+			mmap_write_unlock(ctx->_mm);
+			BUG();
+		}
+		mmap_write_unlock(ctx->_mm);
+		mmput(ctx->_mm);
+	}
+
+	mmdrop(ctx->_mm);
+	unpin_user_pages(ctx->_pages, ctx->_nr_pages);
+	fput(ctx->_file);
+	kfree(ctx->_pages);
+
+	ctx->_nr_pages = 0;
+	ctx->_pages = NULL;
+	ctx->_file = NULL;
+	ctx->_mm = NULL;
+	ctx->_area.ptr = NULL;
+}
+
+bool can_access_secretmem_vma(struct vm_area_struct *vma)
+{
+	struct secretmem_ctx *ctx;
+
+	if (!vma_is_secretmem(vma))
+		return true;
+
+	/*
+	 * If VMA is owned by running process, and marked for kernel
+	 * usage, then allow access.
+	 */
+	ctx = vma->vm_file->private_data;
+	if (ctx && current->mm == vma->vm_mm)
+		return true;
+
+	return false;
+}
+
+#endif /* CONFIG_KERNEL_SECRETMEM */
+
 SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
 {
 	struct file *file;
--
2.40.1

From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna <faresx@amazon.de>
Subject: [RFC PATCH 3/7] arm64: KVM: Refactor C-code to access vCPU gp-registers through macros
Date: Wed, 11 Sep 2024 14:34:02 +0000
Message-ID: <20240911143421.85612-4-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

Unify how KVM accesses vCPU gp-regs by using the two macros vcpu_gp_regs()
and ctxt_gp_regs(). This is a prerequisite for later moving the gp-regs to
be dynamically allocated for vCPUs.
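For illustration, the accessor change looks like this at a call site (a
sketch; the macro bodies are the ones added below, the helper function is
hypothetical):

	#define ctxt_gp_regs(ctxt)	(&(ctxt)->regs)
	#define vcpu_gp_regs(v)		(ctxt_gp_regs(&(v)->arch.ctxt))

	static u64 read_guest_pc(struct kvm_vcpu *vcpu)
	{
		/* old: return vcpu->arch.ctxt.regs.pc; */
		return vcpu_gp_regs(vcpu)->pc;
	}

Once every access funnels through ctxt_gp_regs(), only the macro has to
change when the registers move out of the context struct.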
Signed-off-by: Fares Mehanna
---
 arch/arm64/include/asm/kvm_emulate.h           |  2 +-
 arch/arm64/include/asm/kvm_host.h              |  3 ++-
 arch/arm64/kvm/guest.c                         |  8 ++++----
 arch/arm64/kvm/hyp/include/hyp/switch.h        |  2 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h     | 10 +++++-----
 arch/arm64/kvm/hyp/include/nvhe/trap_handler.h |  2 +-
 6 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index a601a9305b10..cabfb76ca514 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -170,7 +170,7 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 
 static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context *ctxt)
 {
-	switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
+	switch (ctxt_gp_regs(ctxt)->pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
 	case PSR_MODE_EL2h:
 	case PSR_MODE_EL2t:
 		return true;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a33f5996ca9f..31cbd62a5d06 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -946,7 +946,8 @@ struct kvm_vcpu_arch {
 #define vcpu_clear_on_unsupported_cpu(vcpu)				\
 	vcpu_clear_flag(vcpu, ON_UNSUPPORTED_CPU)
 
-#define vcpu_gp_regs(v)		(&(v)->arch.ctxt.regs)
+#define ctxt_gp_regs(ctxt)	(&(ctxt)->regs)
+#define vcpu_gp_regs(v)		(ctxt_gp_regs(&(v)->arch.ctxt))
 
 /*
  * Only use __vcpu_sys_reg/ctxt_sys_reg if you know you want the
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 11098eb7eb44..821a2b7de388 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -134,16 +134,16 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	     KVM_REG_ARM_CORE_REG(regs.regs[30]):
 		off -= KVM_REG_ARM_CORE_REG(regs.regs[0]);
 		off /= 2;
-		return &vcpu->arch.ctxt.regs.regs[off];
+		return &vcpu_gp_regs(vcpu)->regs[off];
 
 	case KVM_REG_ARM_CORE_REG(regs.sp):
-		return &vcpu->arch.ctxt.regs.sp;
+		return &vcpu_gp_regs(vcpu)->sp;
 
 	case KVM_REG_ARM_CORE_REG(regs.pc):
-		return &vcpu->arch.ctxt.regs.pc;
+		return &vcpu_gp_regs(vcpu)->pc;
 
 	case KVM_REG_ARM_CORE_REG(regs.pstate):
-		return &vcpu->arch.ctxt.regs.pstate;
+		return &vcpu_gp_regs(vcpu)->pstate;
 
 	case KVM_REG_ARM_CORE_REG(sp_el1):
 		return __ctxt_sys_reg(&vcpu->arch.ctxt, SP_EL1);
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 37ff87d782b6..d2ed0938fc90 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -649,7 +649,7 @@ static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu, u64 *exit_code
 	    ESR_ELx_EC(read_sysreg_el2(SYS_ESR)) == ESR_ELx_EC_PAC)
 		write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR);
 
-	vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR);
+	vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
 }
 
 /*
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 4c0fdabaf8ae..d17033766010 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -105,13 +105,13 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 
 static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->regs.pc = read_sysreg_el2(SYS_ELR);
+	ctxt_gp_regs(ctxt)->pc = read_sysreg_el2(SYS_ELR);
 	/*
 	 * Guest PSTATE gets saved at guest fixup time in all
 	 * cases.
	 * We still need to handle the nVHE host side here.
	 */
	if (!has_vhe() && ctxt->__hyp_running_vcpu)
-		ctxt->regs.pstate = read_sysreg_el2(SYS_SPSR);
+		ctxt_gp_regs(ctxt)->pstate = read_sysreg_el2(SYS_SPSR);
 
	if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
		ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_VDISR_EL2);
@@ -202,7 +202,7 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 /* Read the VCPU state's PSTATE, but translate (v)EL2 to EL1. */
 static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt)
 {
-	u64 mode = ctxt->regs.pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+	u64 mode = ctxt_gp_regs(ctxt)->pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
 
 	switch (mode) {
 	case PSR_MODE_EL2t:
@@ -213,7 +213,7 @@ static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt)
 		break;
 	}
 
-	return (ctxt->regs.pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+	return (ctxt_gp_regs(ctxt)->pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
 }
 
 static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
@@ -235,7 +235,7 @@ static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctx
 	if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t)
 		pstate = PSR_MODE_EL2h | PSR_IL_BIT;
 
-	write_sysreg_el2(ctxt->regs.pc, SYS_ELR);
+	write_sysreg_el2(ctxt_gp_regs(ctxt)->pc, SYS_ELR);
 	write_sysreg_el2(pstate, SYS_SPSR);
 
	if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
index 45a84f0ade04..dfe5be0d70ef 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
@@ -11,7 +11,7 @@
 
 #include
 
-#define cpu_reg(ctxt, r)	(ctxt)->regs.regs[r]
+#define cpu_reg(ctxt, r)	(ctxt_gp_regs((ctxt))->regs[r])
 #define DECLARE_REG(type, name, ctxt, reg)	\
 	type name = (type)cpu_reg(ctxt, (reg))
 
--
2.40.1
From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna <faresx@amazon.de>
Subject: [RFC PATCH 4/7] KVM: Refactor Assembly-code to access vCPU gp-registers through a macro
Date: Wed, 11 Sep 2024 14:34:03 +0000
Message-ID: <20240911143421.85612-5-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

Right now, assembly code accesses vCPU gp-regs directly from the context
struct "struct kvm_cpu_context" using "CPU_XREG_OFFSET()". Since we want to
move the gp-regs to dynamic memory, we can no longer assume that the gp-regs
are embedded in the context struct, so split the access into two steps. The
first gets the gp-regs pointer from the context using the new assembly macro
"get_ctxt_gp_regs". The second accesses the gp-registers directly within
"struct user_pt_regs", by dropping the "CPU_USER_PT_REGS" offset from the
access macro "CPU_XREG_OFFSET()".

Variable names and comments are also adjusted where appropriate.
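To see why the CPU_USER_PT_REGS base can be dropped, note that the new
offsets are relative to struct user_pt_regs itself. A host-side C sketch
(hypothetical check, assuming arm64's user_pt_regs layout):

	#include <stddef.h>

	/* arm64 layout: 31 general-purpose regs, then sp, pc, pstate */
	struct user_pt_regs {
		unsigned long long regs[31];
		unsigned long long sp;
		unsigned long long pc;
		unsigned long long pstate;
	};

	/* the offsets as redefined by this patch */
	#define CPU_XREG_OFFSET(x)	(8 * (x))
	#define CPU_LR_OFFSET		CPU_XREG_OFFSET(30)
	#define CPU_SP_EL0_OFFSET	(CPU_LR_OFFSET + 8)

	/* lr (x30) and sp (holding sp_el0) land exactly on the struct fields */
	_Static_assert(CPU_LR_OFFSET == offsetof(struct user_pt_regs, regs[30]), "lr");
	_Static_assert(CPU_SP_EL0_OFFSET == offsetof(struct user_pt_regs, sp), "sp_el0");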
Signed-off-by: Fares Mehanna
---
 arch/arm64/include/asm/kvm_asm.h | 48 +++++++++++++++++---------------
 arch/arm64/kvm/hyp/entry.S       | 15 ++++++++++
 arch/arm64/kvm/hyp/nvhe/host.S   | 20 ++++++++++---
 3 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2181a11b9d92..fa4fb642a5f5 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -313,6 +313,10 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
 	str	\vcpu, [\ctxt, #HOST_CONTEXT_VCPU]
 .endm
 
+.macro get_ctxt_gp_regs ctxt, regs
+	add	\regs, \ctxt, #CPU_USER_PT_REGS
+.endm
+
 /*
  * KVM extable for unexpected exceptions.
  * Create a struct kvm_exception_table_entry output to a section that can be
@@ -329,7 +333,7 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
 	.popsection
 .endm
 
-#define CPU_XREG_OFFSET(x)	(CPU_USER_PT_REGS + 8*x)
+#define CPU_XREG_OFFSET(x)	(8 * (x))
 #define CPU_LR_OFFSET		CPU_XREG_OFFSET(30)
 #define CPU_SP_EL0_OFFSET	(CPU_LR_OFFSET + 8)
 
@@ -337,34 +341,34 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
  * We treat x18 as callee-saved as the host may use it as a platform
  * register (e.g. for shadow call stack).
  */
-.macro save_callee_saved_regs ctxt
-	str	x18,      [\ctxt, #CPU_XREG_OFFSET(18)]
-	stp	x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
-	stp	x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
-	stp	x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
-	stp	x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
-	stp	x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
-	stp	x29, lr,  [\ctxt, #CPU_XREG_OFFSET(29)]
+.macro save_callee_saved_regs regs
+	str	x18,      [\regs, #CPU_XREG_OFFSET(18)]
+	stp	x19, x20, [\regs, #CPU_XREG_OFFSET(19)]
+	stp	x21, x22, [\regs, #CPU_XREG_OFFSET(21)]
+	stp	x23, x24, [\regs, #CPU_XREG_OFFSET(23)]
+	stp	x25, x26, [\regs, #CPU_XREG_OFFSET(25)]
+	stp	x27, x28, [\regs, #CPU_XREG_OFFSET(27)]
+	stp	x29, lr,  [\regs, #CPU_XREG_OFFSET(29)]
 .endm
 
-.macro restore_callee_saved_regs ctxt
-	// We require \ctxt is not x18-x28
-	ldr	x18,      [\ctxt, #CPU_XREG_OFFSET(18)]
-	ldp	x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
-	ldp	x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
-	ldp	x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
-	ldp	x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
-	ldp	x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
-	ldp	x29, lr,  [\ctxt, #CPU_XREG_OFFSET(29)]
+.macro restore_callee_saved_regs regs
+	// We require \regs is not x18-x28
+	ldr	x18,      [\regs, #CPU_XREG_OFFSET(18)]
+	ldp	x19, x20, [\regs, #CPU_XREG_OFFSET(19)]
+	ldp	x21, x22, [\regs, #CPU_XREG_OFFSET(21)]
+	ldp	x23, x24, [\regs, #CPU_XREG_OFFSET(23)]
+	ldp	x25, x26, [\regs, #CPU_XREG_OFFSET(25)]
+	ldp	x27, x28, [\regs, #CPU_XREG_OFFSET(27)]
+	ldp	x29, lr,  [\regs, #CPU_XREG_OFFSET(29)]
 .endm
 
-.macro save_sp_el0 ctxt, tmp
+.macro save_sp_el0 regs, tmp
 	mrs	\tmp, sp_el0
-	str	\tmp, [\ctxt, #CPU_SP_EL0_OFFSET]
+	str	\tmp, [\regs, #CPU_SP_EL0_OFFSET]
 .endm
 
-.macro restore_sp_el0 ctxt, tmp
-	ldr	\tmp, [\ctxt, #CPU_SP_EL0_OFFSET]
+.macro restore_sp_el0 regs, tmp
+	ldr	\tmp, [\regs, #CPU_SP_EL0_OFFSET]
 	msr	sp_el0, \tmp
 .endm
 
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 4433a234aa9b..628a123bcdc1 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -28,6 +28,9 @@ SYM_FUNC_START(__guest_enter)
 
 	adr_this_cpu x1, kvm_hyp_ctxt, x2
 
+	// Get gp-regs pointer from the context
+	get_ctxt_gp_regs x1, x1
+
 	// Store the hyp regs
 	save_callee_saved_regs x1
 
@@ -62,6 +65,9 @@ alternative_else_nop_endif
 	// when this feature is enabled for kernel code.
 	ptrauth_switch_to_guest x29, x0, x1, x2
 
+	// Get gp-regs pointer from the context
+	get_ctxt_gp_regs x29, x29
+
 	// Restore the guest's sp_el0
 	restore_sp_el0 x29, x0
 
@@ -108,6 +114,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
 	// current state is saved to the guest context but it will only be
 	// accurate if the guest had been completely restored.
 	adr_this_cpu x0, kvm_hyp_ctxt, x1
+	get_ctxt_gp_regs x0, x0
 	adr_l	x1, hyp_panic
 	str	x1, [x0, #CPU_XREG_OFFSET(30)]
 
@@ -120,6 +127,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// vcpu x0-x1 on the stack
 
 	add	x1, x1, #VCPU_CONTEXT
+	get_ctxt_gp_regs x1, x1
 
 	ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
 
@@ -145,6 +153,10 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// Store the guest's sp_el0
 	save_sp_el0 x1, x2
 
+	// Recover vCPU context to x1
+	get_vcpu_ptr x1, x2
+	add	x1, x1, #VCPU_CONTEXT
+
 	adr_this_cpu x2, kvm_hyp_ctxt, x3
 
 	// Macro ptrauth_switch_to_hyp format:
@@ -157,6 +169,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// mte_switch_to_hyp(g_ctxt, h_ctxt, reg1)
 	mte_switch_to_hyp x1, x2, x3
 
+	// Get gp-regs pointer from the context
+	get_ctxt_gp_regs x2, x2
+
 	// Restore hyp's sp_el0
 	restore_sp_el0 x2, x3
 
diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d610fc51f4d..31afa7396294 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -17,6 +17,12 @@ SYM_FUNC_START(__host_exit)
 	get_host_ctxt	x0, x1
 
+	/* Keep host context in x1 */
+	mov	x1, x0
+
+	/* Get gp-regs pointer from the context */
+	get_ctxt_gp_regs x0, x0
+
 	/* Store the host regs x2 and x3 */
 	stp	x2, x3,   [x0, #CPU_XREG_OFFSET(2)]
 
@@ -36,7 +42,10 @@ SYM_FUNC_START(__host_exit)
 	/* Store the host regs x18-x29, lr */
 	save_callee_saved_regs x0
 
-	/* Save the host context pointer in x29 across the function call */
+	/* Save the host context pointer in x28 across the function call */
+	mov	x28, x1
+
+	/* Save the host gp-regs pointer in x29 across the function call */
 	mov	x29, x0
 
 #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL
@@ -46,7 +55,7 @@ alternative_else_nop_endif
 
 alternative_if ARM64_KVM_PROTECTED_MODE
 	/* Save kernel ptrauth keys. */
-	add	x18, x29, #CPU_APIAKEYLO_EL1
+	add	x18, x28, #CPU_APIAKEYLO_EL1
 	ptrauth_save_state x18, x19, x20
 
 	/* Use hyp keys. */
@@ -58,6 +67,7 @@ alternative_else_nop_endif
__skip_pauth_save:
 #endif /* CONFIG_ARM64_PTR_AUTH_KERNEL */
 
+	mov	x0, x28
 	bl	handle_trap
 
__host_enter_restore_full:
@@ -68,7 +78,7 @@
 	b	__skip_pauth_restore
alternative_else_nop_endif
 
alternative_if ARM64_KVM_PROTECTED_MODE
-	add	x18, x29, #CPU_APIAKEYLO_EL1
+	add	x18, x28, #CPU_APIAKEYLO_EL1
 	ptrauth_restore_state x18, x19, x20
alternative_else_nop_endif
__skip_pauth_restore:
@@ -101,7 +111,8 @@ SYM_FUNC_END(__host_exit)
  * void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
  */
 SYM_FUNC_START(__host_enter)
-	mov	x29, x0
+	mov	x28, x0
+	get_ctxt_gp_regs x0, x29
 	b	__host_enter_restore_full
 SYM_FUNC_END(__host_enter)
 
@@ -141,6 +152,7 @@ SYM_FUNC_START(__hyp_do_panic)
 
 	/* Enter the host, conditionally restoring the host context. */
 	cbz	x29, __host_enter_without_restoring
+	get_ctxt_gp_regs x29, x29
 	b	__host_enter_for_panic
 SYM_FUNC_END(__hyp_do_panic)
 
--
2.40.1
From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna <faresx@amazon.de>
Subject: [RFC PATCH 5/7] arm64: KVM: Allocate vCPU gp-regs dynamically on VHE and KERNEL_SECRETMEM enabled systems
Date: Wed, 11 Sep 2024 14:34:04 +0000
Message-ID: <20240911143421.85612-6-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

To allocate the vCPU gp-regs using secret memory, we need to dynamically
allocate the vCPU gp-regs first. This is tricky with nVHE (non-VHE), since
it would require adjusting the virtual address on every access. With a large
shared codebase between the OS and the hypervisor, it would be cumbersome to
duplicate the code with one version using `kern_hyp_va()`.

To avoid this issue, and since the secret memory feature will not be enabled
on nVHE systems, we introduce the following changes:

1. Maintain a `struct user_pt_regs regs_storage` in the vCPU context struct
   as fallback storage for the vCPU gp-regs.
2. Introduce a pointer `struct user_pt_regs *regs` in the vCPU context
   struct to hold the dynamically allocated vCPU gp-regs.

If we are on an nVHE system, or on a VHE (Virtualization Host Extensions)
system without `KERNEL_SECRETMEM` support, we use `regs_storage`. Accessing
the gp-regs in this case requires no dereference.

If we are on a VHE system with `KERNEL_SECRETMEM` support, we use the `regs`
pointer, adding one dereference every time the vCPU gp-regs are accessed.

Accessing the gp-regs embedded in the vCPU context, without a dereference,
is done as:

   add \regs, \ctxt, #CPU_USER_PT_REGS_STRG

Accessing the dynamically allocated gp-regs, with a dereference, is done as:

   ldr \regs, [\ctxt, #CPU_USER_PT_REGS]

By default we use the first form; when booting on a system that supports VHE
and `KERNEL_SECRETMEM`, the alternative is patched to the second form. We
also allocate the needed gp-regs storage for the vCPU, kvm_hyp_ctxt and
kvm_host_data structs when needed.
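The vCPU allocation below rounds the register block up to whole pages and
converts that to an order for secretmem_allocate_pages(). A standalone
sketch of that arithmetic (a hypothetical helper, assuming 4K pages and the
kernel's fls(), for which fls(0) == 0):

	#include <linux/bitops.h>
	#include <asm/ptrace.h>

	static unsigned int gp_regs_order(void)
	{
		/* sizeof(struct user_pt_regs) is 272 bytes on arm64 */
		unsigned long pages_needed =
			(sizeof(struct user_pt_regs) + PAGE_SIZE - 1) / PAGE_SIZE;

		/* pages_needed == 1, so fls(0) == 0: a single (order-0) page */
		return fls(pages_needed - 1);
	}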
Signed-off-by: Fares Mehanna
---
 arch/arm64/include/asm/kvm_asm.h  |  4 +-
 arch/arm64/include/asm/kvm_host.h | 24 +++++++++++-
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kernel/image-vars.h    |  1 +
 arch/arm64/kvm/arm.c              | 63 ++++++++++++++++++++++++++++++-
 arch/arm64/kvm/va_layout.c        | 23 +++++++++++
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index fa4fb642a5f5..1d6de0806dbd 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -314,7 +314,9 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt,
 .endm
 
 .macro get_ctxt_gp_regs ctxt, regs
-	add	\regs, \ctxt, #CPU_USER_PT_REGS
+alternative_cb ARM64_HAS_VIRT_HOST_EXTN, kvm_update_ctxt_gp_regs
+	add	\regs, \ctxt, #CPU_USER_PT_REGS_STRG
+alternative_cb_end
 .endm
 
 /*
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 31cbd62a5d06..23a10178d1b0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -541,7 +541,9 @@ struct kvm_sysreg_masks {
 };
 
 struct kvm_cpu_context {
-	struct user_pt_regs regs;	/* sp = sp_el0 */
+	struct user_pt_regs *regs;	/* sp = sp_el0 */
+	struct user_pt_regs regs_storage;
+	struct secretmem_area *regs_area;
 
 	u64	spsr_abt;
 	u64	spsr_und;
@@ -946,7 +948,25 @@ struct kvm_vcpu_arch {
 #define vcpu_clear_on_unsupported_cpu(vcpu)				\
 	vcpu_clear_flag(vcpu, ON_UNSUPPORTED_CPU)
 
-#define ctxt_gp_regs(ctxt)	(&(ctxt)->regs)
+/* Static allocation is used if NVHE-host or if KERNEL_SECRETMEM is not enabled */
+static __inline bool kvm_use_dynamic_regs(void)
+{
+#ifndef CONFIG_KERNEL_SECRETMEM
+	return false;
+#endif
+	return cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN);
+}
+
+static __always_inline struct user_pt_regs *ctxt_gp_regs(const struct kvm_cpu_context *ctxt)
+{
+	struct user_pt_regs *regs = (void *) ctxt;
+	asm volatile(ALTERNATIVE_CB("add %0, %0, %1\n",
+				    ARM64_HAS_VIRT_HOST_EXTN,
+				    kvm_update_ctxt_gp_regs)
+		     : "+r" (regs)
+		     : "I" (offsetof(struct kvm_cpu_context, regs_storage)));
+	return regs;
+}
 #define vcpu_gp_regs(v)		(ctxt_gp_regs(&(v)->arch.ctxt))
 
 /*
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 27de1dddb0ab..275d480f5e65 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -128,6 +128,7 @@ int main(void)
   DEFINE(VCPU_FAULT_DISR,	offsetof(struct kvm_vcpu, arch.fault.disr_el1));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_cpu_context, regs));
+  DEFINE(CPU_USER_PT_REGS_STRG,	offsetof(struct kvm_cpu_context, regs_storage));
   DEFINE(CPU_ELR_EL2,		offsetof(struct kvm_cpu_context, sys_regs[ELR_EL2]));
   DEFINE(CPU_RGSR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[RGSR_EL1]));
   DEFINE(CPU_GCR_EL1,		offsetof(struct kvm_cpu_context, sys_regs[GCR_EL1]));
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 8f5422ed1b75..e3bb626e299c 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -86,6 +86,7 @@ KVM_NVHE_ALIAS(kvm_patch_vector_branch);
 KVM_NVHE_ALIAS(kvm_update_va_mask);
 KVM_NVHE_ALIAS(kvm_get_kimage_voffset);
 KVM_NVHE_ALIAS(kvm_compute_final_ctr_el0);
+KVM_NVHE_ALIAS(kvm_update_ctxt_gp_regs);
 KVM_NVHE_ALIAS(spectre_bhb_patch_loop_iter);
 KVM_NVHE_ALIAS(spectre_bhb_patch_loop_mitigation_enable);
 KVM_NVHE_ALIAS(spectre_bhb_patch_wa3);
index 9bef7638342e..78c562a060de 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -452,6 +453,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 
 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 {
+	unsigned long pages_needed;
 	int err;
 
 	spin_lock_init(&vcpu->arch.mp_state_lock);
@@ -469,6 +471,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
 
+	if (kvm_use_dynamic_regs()) {
+		pages_needed = (sizeof(*vcpu_gp_regs(vcpu)) + PAGE_SIZE - 1) / PAGE_SIZE;
+		vcpu->arch.ctxt.regs_area = secretmem_allocate_pages(fls(pages_needed - 1));
+		if (!vcpu->arch.ctxt.regs_area)
+			return -ENOMEM;
+		vcpu->arch.ctxt.regs = vcpu->arch.ctxt.regs_area->ptr;
+	}
+
 	/* Set up the timer */
 	kvm_timer_vcpu_init(vcpu);
 
@@ -489,9 +499,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	err = kvm_vgic_vcpu_init(vcpu);
 	if (err)
-		return err;
+		goto free_vcpu_ctxt;
 
 	return kvm_share_hyp(vcpu, vcpu + 1);
+
+free_vcpu_ctxt:
+	if (kvm_use_dynamic_regs())
+		secretmem_release_pages(vcpu->arch.ctxt.regs_area);
+	return err;
 }
 
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
@@ -508,6 +523,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_pmu_vcpu_destroy(vcpu);
 	kvm_vgic_vcpu_destroy(vcpu);
 	kvm_arm_vcpu_destroy(vcpu);
+
+	if (kvm_use_dynamic_regs())
+		secretmem_release_pages(vcpu->arch.ctxt.regs_area);
 }
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -2683,6 +2701,45 @@ static int __init init_hyp_mode(void)
 	return err;
 }
 
+static int init_hyp_hve_mode(void)
+{
+	int cpu;
+	int err = 0;
+
+	if (!kvm_use_dynamic_regs())
+		return 0;
+
+	/* Allocate gp-regs */
+	for_each_possible_cpu(cpu) {
+		void *hyp_ctxt_regs;
+		void *kvm_host_data_regs;
+
+		hyp_ctxt_regs = kzalloc(sizeof(struct user_pt_regs), GFP_KERNEL);
+		if (!hyp_ctxt_regs) {
+			err = -ENOMEM;
+			goto free_regs;
+		}
+		per_cpu(kvm_hyp_ctxt, cpu).regs = hyp_ctxt_regs;
+
+		kvm_host_data_regs = kzalloc(sizeof(struct user_pt_regs), GFP_KERNEL);
+		if (!kvm_host_data_regs) {
+			err = -ENOMEM;
+			goto free_regs;
+		}
+		per_cpu(kvm_host_data, cpu).host_ctxt.regs = kvm_host_data_regs;
+	}
+
+	return 0;
+
+free_regs:
+	for_each_possible_cpu(cpu) {
+		kfree(per_cpu(kvm_hyp_ctxt, cpu).regs);
+		kfree(per_cpu(kvm_host_data, cpu).host_ctxt.regs);
+	}
+
+	return err;
+}
+
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 {
 	struct kvm_vcpu *vcpu = NULL;
@@ -2806,6 +2863,10 @@ static __init int kvm_arm_init(void)
 		err = init_hyp_mode();
 		if (err)
 			goto out_err;
+	} else {
+		err = init_hyp_hve_mode();
+		if (err)
+			goto out_err;
 	}
 
 	err = kvm_init_vector_slots();
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index 91b22a014610..fcef7e89d042 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -185,6 +185,29 @@ void __init kvm_update_va_mask(struct alt_instr *alt,
 	}
 }
 
+void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt,
+				    __le32 *origptr, __le32 *updptr, int nr_inst)
+{
+	u32 rd, rn, imm, insn, oinsn;
+
+	BUG_ON(nr_inst != 1);
+
+	if (!kvm_use_dynamic_regs())
+		return;
+
+	oinsn = le32_to_cpu(origptr[0]);
+	rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn);
+	rn = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RN, oinsn);
+	imm = offsetof(struct kvm_cpu_context, regs);
+
+	insn = aarch64_insn_gen_load_store_imm(rd, rn, imm,
+					       AARCH64_INSN_SIZE_64,
+					       AARCH64_INSN_LDST_LOAD_IMM_OFFSET);
+	BUG_ON(insn == AARCH64_BREAK_FAULT);
+
+	updptr[0] = cpu_to_le32(insn);
+}
+
 void kvm_patch_vector_branch(struct alt_instr *alt,
 			     __le32 *origptr, __le32 *updptr, int nr_inst)
 {
-- 
2.40.1

From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna
Subject: [RFC PATCH 6/7] arm64: KVM: Refactor C-code to access vCPU fp-registers through macros
Date: Wed, 11 Sep 2024 14:34:05 +0000
Message-ID: <20240911143421.85612-7-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>

Unify how KVM accesses vCPU fp-regs by using vcpu_fp_regs(). This is a prerequisite for moving the fp-regs to dynamically allocated memory for vCPUs later.
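To make the refactor concrete, here is the before/after shape of a typical call site (an illustration mirroring the reset.c hunk below): the accessor hides where the registers live, so a later patch can repoint the storage without touching callers again:

	/* Before: hard-wired to the field embedded in the context. */
	memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));

	/* After: storage-agnostic accessor; still the embedded field for now. */
	memset(vcpu_fp_regs(vcpu), 0, sizeof(*vcpu_fp_regs(vcpu)));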
Signed-off-by: Fares Mehanna <faresx@amazon.de>
---
 arch/arm64/include/asm/kvm_host.h       | 2 ++
 arch/arm64/kvm/arm.c                    | 2 +-
 arch/arm64/kvm/fpsimd.c                 | 2 +-
 arch/arm64/kvm/guest.c                  | 6 +++---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 4 ++--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      | 4 ++--
 arch/arm64/kvm/reset.c                  | 2 +-
 7 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 23a10178d1b0..e8ed2c12479f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -968,6 +968,8 @@ static __always_inline struct user_pt_regs *ctxt_gp_regs(const struct kvm_cpu_co
 	return regs;
 }
 #define vcpu_gp_regs(v)		(ctxt_gp_regs(&(v)->arch.ctxt))
+#define ctxt_fp_regs(ctxt)	(&(ctxt)->fp_regs)
+#define vcpu_fp_regs(v)		(ctxt_fp_regs(&(v)->arch.ctxt))
 
 /*
  * Only use __vcpu_sys_reg/ctxt_sys_reg if you know you want the
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 78c562a060de..7542af3f766a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2507,7 +2507,7 @@ static void finalize_init_hyp_mode(void)
 	for_each_possible_cpu(cpu) {
 		struct user_fpsimd_state *fpsimd_state;
 
-		fpsimd_state = &per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->host_ctxt.fp_regs;
+		fpsimd_state = ctxt_fp_regs(&per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->host_ctxt);
 		per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->fpsimd_state =
 			kern_hyp_va(fpsimd_state);
 	}
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index c53e5b14038d..c27c96ae22e1 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -130,7 +130,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 	 * Currently we do not support SME guests so SVCR is
 	 * always 0 and we just need a variable to point to.
 	 */
-	fp_state.st = &vcpu->arch.ctxt.fp_regs;
+	fp_state.st = vcpu_fp_regs(vcpu);
 	fp_state.sve_state = vcpu->arch.sve_state;
 	fp_state.sve_vl = vcpu->arch.sve_max_vl;
 	fp_state.sme_state = NULL;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 821a2b7de388..3474874a00a7 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -170,13 +170,13 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	     KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]):
 		off -= KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]);
 		off /= 4;
-		return &vcpu->arch.ctxt.fp_regs.vregs[off];
+		return &vcpu_fp_regs(vcpu)->vregs[off];
 
 	case KVM_REG_ARM_CORE_REG(fp_regs.fpsr):
-		return &vcpu->arch.ctxt.fp_regs.fpsr;
+		return &vcpu_fp_regs(vcpu)->fpsr;
 
 	case KVM_REG_ARM_CORE_REG(fp_regs.fpcr):
-		return &vcpu->arch.ctxt.fp_regs.fpcr;
+		return &vcpu_fp_regs(vcpu)->fpcr;
 
 	default:
 		return NULL;
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index d2ed0938fc90..1444bad519db 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -319,7 +319,7 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu)
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
 	__sve_restore_state(vcpu_sve_pffr(vcpu),
-			    &vcpu->arch.ctxt.fp_regs.fpsr,
+			    &vcpu_fp_regs(vcpu)->fpsr,
 			    true);
 
 	/*
@@ -401,7 +401,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (sve_guest)
 		__hyp_sve_restore_guest(vcpu);
 	else
-		__fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs);
+		__fpsimd_restore_state(vcpu_fp_regs(vcpu));
 
 	/* Skip restoring fpexc32 for AArch64 guests */
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f43d845f3c4e..feb1dd37f2a5 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -32,7 +32,7 @@ static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu)
 	 * on the VL, so use a consistent (i.e., the maximum) guest VL.
 	 */
 	sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2);
-	__sve_save_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr, true);
+	__sve_save_state(vcpu_sve_pffr(vcpu), &vcpu_fp_regs(vcpu)->fpsr, true);
 	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
 }
 
@@ -71,7 +71,7 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu)
 	if (vcpu_has_sve(vcpu))
 		__hyp_sve_save_guest(vcpu);
 	else
-		__fpsimd_save_state(&vcpu->arch.ctxt.fp_regs);
+		__fpsimd_save_state(vcpu_fp_regs(vcpu));
 
 	if (system_supports_sve())
 		__hyp_sve_restore_host();
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 0b0ae5ae7bc2..5f38acf5d156 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -229,7 +229,7 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	/* Reset core registers */
 	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
-	memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
+	memset(vcpu_fp_regs(vcpu), 0, sizeof(*vcpu_fp_regs(vcpu)));
 	vcpu->arch.ctxt.spsr_abt = 0;
 	vcpu->arch.ctxt.spsr_und = 0;
 	vcpu->arch.ctxt.spsr_irq = 0;
-- 
2.40.1
From nobody Sat Nov 30 03:52:21 2024
From: Fares Mehanna
Subject: [RFC PATCH 7/7] arm64: KVM: Allocate vCPU fp-regs dynamically on VHE and KERNEL_SECRETMEM enabled systems
Date: Wed, 11 Sep 2024 14:34:06 +0000
Message-ID: <20240911143421.85612-8-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>

Similar to what was done in the earlier commit "arm64: KVM: Allocate vCPU gp-regs dynamically on VHE and KERNEL_SECRETMEM enabled systems", we're moving the fp-regs to dynamic memory on systems that support VHE and are compiled with KERNEL_SECRETMEM support.

Otherwise, we use the "fp_regs_storage" struct embedded in the vCPU context.

Accessing the fp-regs embedded in the vCPU context, without a dereference, is done as:

	add	\regs, \ctxt, #offsetof(struct kvm_cpu_context, fp_regs_storage)

Accessing the dynamically allocated fp-regs, with a dereference, is done as:

	ldr	\regs, [\ctxt, #offsetof(struct kvm_cpu_context, fp_regs)]

Signed-off-by: Fares Mehanna <faresx@amazon.de>
---
 arch/arm64/include/asm/kvm_host.h | 16 ++++++++++++++--
 arch/arm64/kernel/image-vars.h    |  1 +
 arch/arm64/kvm/arm.c              | 29 +++++++++++++++++++++++++++--
 arch/arm64/kvm/va_layout.c        | 23 +++++++++++++++++----
 4 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e8ed2c12479f..4132c57d7e69 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -550,7 +550,9 @@ struct kvm_cpu_context {
 	u64	spsr_irq;
 	u64	spsr_fiq;
 
-	struct user_fpsimd_state fp_regs;
+	struct user_fpsimd_state *fp_regs;
+	struct user_fpsimd_state fp_regs_storage;
+	struct secretmem_area *fp_regs_area;
 
 	u64 sys_regs[NR_SYS_REGS];
 
@@ -968,7 +970,17 @@ static __always_inline struct user_pt_regs *ctxt_gp_regs(const struct kvm_cpu_co
 	return regs;
 }
 #define vcpu_gp_regs(v)		(ctxt_gp_regs(&(v)->arch.ctxt))
-#define ctxt_fp_regs(ctxt)	(&(ctxt)->fp_regs)
+
+static __always_inline struct user_fpsimd_state *ctxt_fp_regs(const struct kvm_cpu_context *ctxt)
+{
+	struct user_fpsimd_state *fp_regs = (void *) ctxt;
+	asm volatile(ALTERNATIVE_CB("add %0, %0, %1\n",
+				    ARM64_HAS_VIRT_HOST_EXTN,
+				    kvm_update_ctxt_fp_regs)
+		     : "+r" (fp_regs)
+		     : "I" (offsetof(struct kvm_cpu_context, fp_regs_storage)));
+	return fp_regs;
+}
 #define vcpu_fp_regs(v)		(ctxt_fp_regs(&(v)->arch.ctxt))
 
 /*
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index e3bb626e299c..904573598e0f 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -87,6 +87,7 @@ KVM_NVHE_ALIAS(kvm_update_va_mask);
 KVM_NVHE_ALIAS(kvm_get_kimage_voffset);
 KVM_NVHE_ALIAS(kvm_compute_final_ctr_el0);
 KVM_NVHE_ALIAS(kvm_update_ctxt_gp_regs);
+KVM_NVHE_ALIAS(kvm_update_ctxt_fp_regs);
 KVM_NVHE_ALIAS(spectre_bhb_patch_loop_iter);
 KVM_NVHE_ALIAS(spectre_bhb_patch_loop_mitigation_enable);
 KVM_NVHE_ALIAS(spectre_bhb_patch_wa3);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7542af3f766a..17b42e9099c3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -477,6 +477,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		if (!vcpu->arch.ctxt.regs_area)
 			return -ENOMEM;
 		vcpu->arch.ctxt.regs = vcpu->arch.ctxt.regs_area->ptr;
+
+		pages_needed = (sizeof(*vcpu_fp_regs(vcpu)) + PAGE_SIZE - 1) / PAGE_SIZE;
+		vcpu->arch.ctxt.fp_regs_area = secretmem_allocate_pages(fls(pages_needed - 1));
+		if (!vcpu->arch.ctxt.fp_regs_area) {
+			err = -ENOMEM;
+			goto free_vcpu_ctxt;
+		}
+		vcpu->arch.ctxt.fp_regs = vcpu->arch.ctxt.fp_regs_area->ptr;
 	}
 
 	/* Set up the timer */
@@ -504,8 +512,10 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	return kvm_share_hyp(vcpu, vcpu + 1);
 
 free_vcpu_ctxt:
-	if (kvm_use_dynamic_regs())
+	if (kvm_use_dynamic_regs()) {
 		secretmem_release_pages(vcpu->arch.ctxt.regs_area);
+		secretmem_release_pages(vcpu->arch.ctxt.fp_regs_area);
+	}
 	return err;
 }
 
@@ -524,8 +534,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_vgic_vcpu_destroy(vcpu);
 	kvm_arm_vcpu_destroy(vcpu);
 
-	if (kvm_use_dynamic_regs())
+	if (kvm_use_dynamic_regs()) {
 		secretmem_release_pages(vcpu->arch.ctxt.regs_area);
+		secretmem_release_pages(vcpu->arch.ctxt.fp_regs_area);
+	}
 }
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -2729,12 +2741,25 @@ static int init_hyp_hve_mode(void)
 		per_cpu(kvm_host_data, cpu).host_ctxt.regs = kvm_host_data_regs;
 	}
 
+	/* Allocate fp-regs */
+	for_each_possible_cpu(cpu) {
+		void *kvm_host_data_regs;
+
+		kvm_host_data_regs = kzalloc(sizeof(struct user_fpsimd_state), GFP_KERNEL);
+		if (!kvm_host_data_regs) {
+			err = -ENOMEM;
+			goto free_regs;
+		}
+		per_cpu(kvm_host_data, cpu).host_ctxt.fp_regs = kvm_host_data_regs;
+	}
+
 	return 0;
 
 free_regs:
 	for_each_possible_cpu(cpu) {
 		kfree(per_cpu(kvm_hyp_ctxt, cpu).regs);
 		kfree(per_cpu(kvm_host_data, cpu).host_ctxt.regs);
+		kfree(per_cpu(kvm_host_data, cpu).host_ctxt.fp_regs);
 	}
 
 	return err;
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index fcef7e89d042..ba1030fa5b08 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -185,10 +185,12 @@ void __init kvm_update_va_mask(struct alt_instr *alt,
 	}
 }
 
-void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt,
-				    __le32 *origptr, __le32 *updptr, int nr_inst)
+static __always_inline void __init kvm_update_ctxt_regs(struct alt_instr *alt,
+							__le32 *origptr,
+							__le32 *updptr,
+							int nr_inst, u32 imm)
 {
-	u32 rd, rn, imm, insn, oinsn;
+	u32 rd, rn, insn, oinsn;
 
 	BUG_ON(nr_inst != 1);
 
@@ -198,7 +200,6 @@ void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt,
 	oinsn = le32_to_cpu(origptr[0]);
 	rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn);
 	rn = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RN, oinsn);
-	imm = offsetof(struct kvm_cpu_context, regs);
 
 	insn = aarch64_insn_gen_load_store_imm(rd, rn, imm,
 					       AARCH64_INSN_SIZE_64,
@@ -208,6 +209,20 @@
 	updptr[0] = cpu_to_le32(insn);
 }
 
+void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt,
+				    __le32 *origptr, __le32 *updptr, int nr_inst)
+{
+	u32 offset = offsetof(struct kvm_cpu_context, regs);
+	kvm_update_ctxt_regs(alt, origptr, updptr, nr_inst, offset);
+}
+
+void __init kvm_update_ctxt_fp_regs(struct alt_instr *alt,
+				    __le32 *origptr, __le32 *updptr, int nr_inst)
+{
+	u32 offset = offsetof(struct kvm_cpu_context, fp_regs);
+	kvm_update_ctxt_regs(alt, origptr, updptr, nr_inst, offset);
+}
+
 void kvm_patch_vector_branch(struct alt_instr *alt,
 			     __le32 *origptr, __le32 *updptr, int nr_inst)
 {
-- 
2.40.1