From nobody Fri Nov 29 23:56:26 2024
From: Alexey Gladkov
To: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
Cc: "Alexey Gladkov (Intel)", Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Kirill A. Shutemov", Andrew Morton, Yuan Yao, Geert Uytterhoeven, Yuntao Wang, Kai Huang, Baoquan He, Oleg Nesterov, cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com, stable@vger.kernel.org
Subject: [PATCH v7 1/6] x86/tdx: Fix "in-kernel MMIO" check
Date: Fri, 13 Sep 2024 19:05:56 +0200
Message-ID: <565a804b80387970460a4ebc67c88d1380f61ad1.1726237595.git.legion@kernel.org>

From: "Alexey Gladkov (Intel)"

TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks whether the #VE exception occurred in the kernel and
rejects the operation if it did not.

However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall at an MMIO address,
the syscall does get_user() or put_user() on it, triggering an MMIO #VE.
The kernel will treat that #VE as in-kernel MMIO.

Ensure that the target MMIO address is within the kernel address space
before decoding the instruction.
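For illustration only, a minimal userspace sketch of the scenario described above (the sysfs path and device are hypothetical, error handling is omitted, and this is not part of the patch): the kernel's own copy to the user buffer lands on an MMIO address, so the resulting #VE is taken from kernel context.

    /* Hypothetical sketch: make a syscall write into an MMIO mapping. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical BAR of a passed-through device */
        int fd = open("/sys/bus/pci/devices/0000:00:05.0/resource0", O_RDWR);
        uint32_t *mmio = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
        int pipefd[2];

        pipe(pipefd);
        write(pipefd[1], "1234", 4);

        /*
         * The kernel copies the pipe data into the "user" buffer with
         * put_user()/copy_to_user(); the destination is MMIO, so the
         * copy triggers an MMIO #VE from kernel context.
         */
        read(pipefd[0], mmio, 4);
        return 0;
    }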
Cc: stable@vger.kernel.org
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Alexey Gladkov (Intel)
Acked-by: Dave Hansen
---
 arch/x86/coco/tdx/tdx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 078e2bac2553..d6e6407e3999 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include

 /* MMIO direction */
 #define EPT_READ 0
@@ -434,6 +435,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
         return -EINVAL;
     }

+    if (!fault_in_kernel_space(ve->gla)) {
+        WARN_ONCE(1, "Access to userspace address is not supported");
+        return -EINVAL;
+    }
+
     /*
      * Reject EPT violation #VEs that split pages.
      *
--
2.46.0

From nobody Fri Nov 29 23:56:26 2024
From: Alexey Gladkov
To: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
Cc: "Alexey Gladkov (Intel)", Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Kirill A. Shutemov", Andrew Morton, Yuan Yao, Geert Uytterhoeven, Yuntao Wang, Kai Huang, Baoquan He, Oleg Nesterov, cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com
Shutemov" , Andrew Morton , Yuan Yao , Geert Uytterhoeven , Yuntao Wang , Kai Huang , Baoquan He , Oleg Nesterov , cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com Subject: [PATCH v7 2/6] x86/tdx: Split MMIO read and write operations Date: Fri, 13 Sep 2024 19:05:57 +0200 Message-ID: X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Alexey Gladkov (Intel)" To implement MMIO in userspace, additional memory checks need to be implemented. To avoid overly complicating the handle_mmio() function and to separate checks from actions, it would be better to split this function into two separate functions to handle read and write operations. Reviewed-by: Kirill A. Shutemov Signed-off-by: Alexey Gladkov (Intel) --- arch/x86/coco/tdx/tdx.c | 136 ++++++++++++++++++++++++---------------- 1 file changed, 83 insertions(+), 53 deletions(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index d6e6407e3999..008840ac1191 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -406,14 +406,91 @@ static bool mmio_write(int size, unsigned long addr, = unsigned long val) EPT_WRITE, addr, val); } =20 +static int handle_mmio_write(struct insn *insn, enum insn_mmio_type mmio, = int size, + struct pt_regs *regs, struct ve_info *ve) +{ + unsigned long *reg, val; + + switch (mmio) { + case INSN_MMIO_WRITE: + reg =3D insn_get_modrm_reg_ptr(insn, regs); + if (!reg) + return -EINVAL; + memcpy(&val, reg, size); + if (!mmio_write(size, ve->gpa, val)) + return -EIO; + return insn->length; + case INSN_MMIO_WRITE_IMM: + val =3D insn->immediate.value; + if (!mmio_write(size, ve->gpa, val)) + return -EIO; + return insn->length; + case INSN_MMIO_MOVS: + /* + * MMIO was accessed with an instruction that could not be + * decoded or handled properly. It was likely not using io.h + * helpers or accessed MMIO accidentally. + */ + return -EINVAL; + default: + WARN_ON_ONCE(1); + return -EINVAL; + } + + return insn->length; +} + +static int handle_mmio_read(struct insn *insn, enum insn_mmio_type mmio, i= nt size, + struct pt_regs *regs, struct ve_info *ve) +{ + unsigned long *reg, val; + int extend_size; + u8 extend_val; + + reg =3D insn_get_modrm_reg_ptr(insn, regs); + if (!reg) + return -EINVAL; + + if (!mmio_read(size, ve->gpa, &val)) + return -EIO; + + extend_val =3D 0; + + switch (mmio) { + case INSN_MMIO_READ: + /* Zero-extend for 32-bit operation */ + extend_size =3D size =3D=3D 4 ? 
+        break;
+    case INSN_MMIO_READ_ZERO_EXTEND:
+        /* Zero extend based on operand size */
+        extend_size = insn->opnd_bytes;
+        break;
+    case INSN_MMIO_READ_SIGN_EXTEND:
+        /* Sign extend based on operand size */
+        extend_size = insn->opnd_bytes;
+        if (size == 1 && val & BIT(7))
+            extend_val = 0xFF;
+        else if (size > 1 && val & BIT(15))
+            extend_val = 0xFF;
+        break;
+    default:
+        WARN_ON_ONCE(1);
+        return -EINVAL;
+    }
+
+    if (extend_size)
+        memset(reg, extend_val, extend_size);
+    memcpy(reg, &val, size);
+    return insn->length;
+}
+
 static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 {
-    unsigned long *reg, val, vaddr;
     char buffer[MAX_INSN_SIZE];
     enum insn_mmio_type mmio;
     struct insn insn = {};
-    int size, extend_size;
-    u8 extend_val = 0;
+    unsigned long vaddr;
+    int size;

     /* Only in-kernel MMIO is supported */
     if (WARN_ON_ONCE(user_mode(regs)))
@@ -429,12 +506,6 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
     if (WARN_ON_ONCE(mmio == INSN_MMIO_DECODE_FAILED))
         return -EINVAL;

-    if (mmio != INSN_MMIO_WRITE_IMM && mmio != INSN_MMIO_MOVS) {
-        reg = insn_get_modrm_reg_ptr(&insn, regs);
-        if (!reg)
-            return -EINVAL;
-    }
-
     if (!fault_in_kernel_space(ve->gla)) {
         WARN_ONCE(1, "Access to userspace address is not supported");
         return -EINVAL;
@@ -453,24 +524,15 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
     if (vaddr / PAGE_SIZE != (vaddr + size - 1) / PAGE_SIZE)
         return -EFAULT;

-    /* Handle writes first */
     switch (mmio) {
     case INSN_MMIO_WRITE:
-        memcpy(&val, reg, size);
-        if (!mmio_write(size, ve->gpa, val))
-            return -EIO;
-        return insn.length;
     case INSN_MMIO_WRITE_IMM:
-        val = insn.immediate.value;
-        if (!mmio_write(size, ve->gpa, val))
-            return -EIO;
-        return insn.length;
+    case INSN_MMIO_MOVS:
+        return handle_mmio_write(&insn, mmio, size, regs, ve);
     case INSN_MMIO_READ:
     case INSN_MMIO_READ_ZERO_EXTEND:
     case INSN_MMIO_READ_SIGN_EXTEND:
-        /* Reads are handled below */
-        break;
-    case INSN_MMIO_MOVS:
+        return handle_mmio_read(&insn, mmio, size, regs, ve);
     case INSN_MMIO_DECODE_FAILED:
         /*
          * MMIO was accessed with an instruction that could not be
@@ -482,38 +544,6 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
         WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?");
         return -EINVAL;
     }
-
-    /* Handle reads */
-    if (!mmio_read(size, ve->gpa, &val))
-        return -EIO;
-
-    switch (mmio) {
-    case INSN_MMIO_READ:
-        /* Zero-extend for 32-bit operation */
-        extend_size = size == 4 ? sizeof(*reg) : 0;
-        break;
-    case INSN_MMIO_READ_ZERO_EXTEND:
-        /* Zero extend based on operand size */
-        extend_size = insn.opnd_bytes;
-        break;
-    case INSN_MMIO_READ_SIGN_EXTEND:
-        /* Sign extend based on operand size */
-        extend_size = insn.opnd_bytes;
-        if (size == 1 && val & BIT(7))
-            extend_val = 0xFF;
-        else if (size > 1 && val & BIT(15))
-            extend_val = 0xFF;
-        break;
-    default:
-        /* All other cases has to be covered with the first switch() */
-        WARN_ON_ONCE(1);
-        return -EINVAL;
-    }
-
-    if (extend_size)
-        memset(reg, extend_val, extend_size);
-    memcpy(reg, &val, size);
-    return insn.length;
 }

 static bool handle_in(struct pt_regs *regs, int size, int port)
--
2.46.0

From nobody Fri Nov 29 23:56:26 2024
From: Alexey Gladkov
To: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
Cc: "Alexey Gladkov (Intel)", Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Kirill A. Shutemov", Andrew Morton, Yuan Yao, Geert Uytterhoeven, Yuntao Wang, Kai Huang, Baoquan He, Oleg Nesterov, cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com
Shutemov" , Andrew Morton , Yuan Yao , Geert Uytterhoeven , Yuntao Wang , Kai Huang , Baoquan He , Oleg Nesterov , cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com Subject: [PATCH v7 3/6] x86/tdx: Add validation of userspace MMIO instructions Date: Fri, 13 Sep 2024 19:05:58 +0200 Message-ID: <5d3787ea8947a41fb5a71c9e606e8e9a99e76feb.1726237595.git.legion@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Alexey Gladkov (Intel)" Instructions from kernel space are considered trusted. If the MMIO instruction is from userspace it must be checked. For userspace instructions, it is need to check that the INSN has not changed at the time of #VE and before the execution of the instruction. Once the userspace instruction parsed is enforced that the address points to mapped memory of current process and that address does not point to private memory. After parsing the userspace instruction, it is necessary to ensure that: 1. the operation direction (read/write) corresponds to #VE info; 2. the address still points to mapped memory of current process; 3. the address does not point to private memory. Reviewed-by: Kirill A. Shutemov Signed-off-by: Alexey Gladkov (Intel) --- arch/x86/coco/tdx/tdx.c | 131 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 118 insertions(+), 13 deletions(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 008840ac1191..30651a5af180 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -406,6 +407,87 @@ static bool mmio_write(int size, unsigned long addr, u= nsigned long val) EPT_WRITE, addr, val); } =20 +static inline bool is_private_gpa(u64 gpa) +{ + return gpa =3D=3D cc_mkenc(gpa); +} + +static int get_phys_addr(unsigned long addr, phys_addr_t *phys_addr, bool = *writable) +{ + unsigned int level; + pgd_t *pgdp; + pte_t *ptep; + + /* + * Address validation only makes sense for a user process. The lock must + * be obtained before validation can begin. + */ + mmap_assert_locked(current->mm); + + pgdp =3D pgd_offset(current->mm, addr); + + if (!pgd_none(*pgdp)) { + ptep =3D lookup_address_in_pgd(pgdp, addr, &level); + if (ptep) { + unsigned long offset; + + if (!pte_decrypted(*ptep)) + return -EFAULT; + + offset =3D addr & ~page_level_mask(level); + *phys_addr =3D PFN_PHYS(pte_pfn(*ptep)); + *phys_addr |=3D offset; + + *writable =3D pte_write(*ptep); + + return 0; + } + } + + return -EFAULT; +} + +static int valid_vaddr(struct ve_info *ve, enum insn_mmio_type mmio, int s= ize, + unsigned long vaddr) +{ + phys_addr_t phys_addr; + bool writable =3D false; + + /* It's not fatal. This can happen due to swap out or page migration. */ + if (get_phys_addr(vaddr, &phys_addr, &writable) || (ve->gpa !=3D cc_mkdec= (phys_addr))) + return -EAGAIN; + + /* + * Re-check whether #VE info matches the instruction that was decoded. + * + * The ve->gpa was valid at the time ve_info was received. But this code + * executed with interrupts enabled, allowing tlb shootdown and therefore + * munmap() to be executed in the parallel thread. 
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Alexey Gladkov (Intel)
---
 arch/x86/coco/tdx/tdx.c | 131 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 118 insertions(+), 13 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 008840ac1191..30651a5af180 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -406,6 +407,87 @@ static bool mmio_write(int size, unsigned long addr, unsigned long val)
                    EPT_WRITE, addr, val);
 }

+static inline bool is_private_gpa(u64 gpa)
+{
+    return gpa == cc_mkenc(gpa);
+}
+
+static int get_phys_addr(unsigned long addr, phys_addr_t *phys_addr, bool *writable)
+{
+    unsigned int level;
+    pgd_t *pgdp;
+    pte_t *ptep;
+
+    /*
+     * Address validation only makes sense for a user process. The lock must
+     * be obtained before validation can begin.
+     */
+    mmap_assert_locked(current->mm);
+
+    pgdp = pgd_offset(current->mm, addr);
+
+    if (!pgd_none(*pgdp)) {
+        ptep = lookup_address_in_pgd(pgdp, addr, &level);
+        if (ptep) {
+            unsigned long offset;
+
+            if (!pte_decrypted(*ptep))
+                return -EFAULT;
+
+            offset = addr & ~page_level_mask(level);
+            *phys_addr = PFN_PHYS(pte_pfn(*ptep));
+            *phys_addr |= offset;
+
+            *writable = pte_write(*ptep);
+
+            return 0;
+        }
+    }
+
+    return -EFAULT;
+}
+
+static int valid_vaddr(struct ve_info *ve, enum insn_mmio_type mmio, int size,
+                       unsigned long vaddr)
+{
+    phys_addr_t phys_addr;
+    bool writable = false;
+
+    /* It's not fatal. This can happen due to swap out or page migration. */
+    if (get_phys_addr(vaddr, &phys_addr, &writable) || (ve->gpa != cc_mkdec(phys_addr)))
+        return -EAGAIN;
+
+    /*
+     * Re-check whether #VE info matches the instruction that was decoded.
+     *
+     * The ve->gpa was valid at the time ve_info was received. But this code
+     * executed with interrupts enabled, allowing tlb shootdown and therefore
+     * munmap() to be executed in the parallel thread.
+     *
+     * By the time MMIO emulation is performed, ve->gpa may be already
+     * unmapped from the process, the device it belongs to removed from
+     * system and something else could be plugged in its place.
+     */
+    switch (mmio) {
+    case INSN_MMIO_WRITE:
+    case INSN_MMIO_WRITE_IMM:
+        if (!writable || !(ve->exit_qual & EPT_VIOLATION_ACC_WRITE))
+            return -EFAULT;
+        break;
+    case INSN_MMIO_READ:
+    case INSN_MMIO_READ_ZERO_EXTEND:
+    case INSN_MMIO_READ_SIGN_EXTEND:
+        if (!(ve->exit_qual & EPT_VIOLATION_ACC_READ))
+            return -EFAULT;
+        break;
+    default:
+        WARN_ONCE(1, "Unsupported mmio instruction: %d", mmio);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 static int handle_mmio_write(struct insn *insn, enum insn_mmio_type mmio, int size,
                              struct pt_regs *regs, struct ve_info *ve)
 {
@@ -490,7 +572,7 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
     enum insn_mmio_type mmio;
     struct insn insn = {};
     unsigned long vaddr;
-    int size;
+    int size, ret;

     /* Only in-kernel MMIO is supported */
     if (WARN_ON_ONCE(user_mode(regs)))
@@ -511,6 +593,17 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
         return -EINVAL;
     }

+    vaddr = (unsigned long)insn_get_addr_ref(&insn, regs);
+
+    if (user_mode(regs)) {
+        if (mmap_read_lock_killable(current->mm))
+            return -EINTR;
+
+        ret = valid_vaddr(ve, mmio, size, vaddr);
+        if (ret)
+            goto unlock;
+    }
+
     /*
      * Reject EPT violation #VEs that split pages.
      *
@@ -520,30 +613,39 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
      *
      * load_unaligned_zeropad() will recover using exception fixups.
      */
-    vaddr = (unsigned long)insn_get_addr_ref(&insn, regs);
-    if (vaddr / PAGE_SIZE != (vaddr + size - 1) / PAGE_SIZE)
-        return -EFAULT;
+    if (vaddr / PAGE_SIZE != (vaddr + size - 1) / PAGE_SIZE) {
+        ret = -EFAULT;
+        goto unlock;
+    }

     switch (mmio) {
     case INSN_MMIO_WRITE:
     case INSN_MMIO_WRITE_IMM:
     case INSN_MMIO_MOVS:
-        return handle_mmio_write(&insn, mmio, size, regs, ve);
+        ret = handle_mmio_write(&insn, mmio, size, regs, ve);
+        break;
     case INSN_MMIO_READ:
     case INSN_MMIO_READ_ZERO_EXTEND:
     case INSN_MMIO_READ_SIGN_EXTEND:
-        return handle_mmio_read(&insn, mmio, size, regs, ve);
+        ret = handle_mmio_read(&insn, mmio, size, regs, ve);
+        break;
     case INSN_MMIO_DECODE_FAILED:
         /*
          * MMIO was accessed with an instruction that could not be
          * decoded or handled properly. It was likely not using io.h
          * helpers or accessed MMIO accidentally.
          */
-        return -EINVAL;
+        ret = -EINVAL;
+        break;
     default:
         WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?");
-        return -EINVAL;
+        ret = -EINVAL;
     }
+unlock:
+    if (user_mode(regs))
+        mmap_read_unlock(current->mm);
+
+    return ret;
 }

 static bool handle_in(struct pt_regs *regs, int size, int port)
@@ -687,11 +789,6 @@ static int virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
     }
 }

-static inline bool is_private_gpa(u64 gpa)
-{
-    return gpa == cc_mkenc(gpa);
-}
-
 /*
  * Handle the kernel #VE.
  *
@@ -729,6 +826,14 @@ bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve)
         insn_len = virt_exception_user(regs, ve);
     else
         insn_len = virt_exception_kernel(regs, ve);
+
+    /*
+     * A special case to return to userspace without increasing regs->ip
+     * to repeat the instruction once again.
+     */
+    if (insn_len == -EAGAIN)
+        return true;
+
     if (insn_len < 0)
         return false;

--
2.46.0

From nobody Fri Nov 29 23:56:26 2024
From: Alexey Gladkov
To: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
Cc: "Alexey Gladkov (Intel)", Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Kirill A. Shutemov", Andrew Morton, Yuan Yao, Geert Uytterhoeven, Yuntao Wang, Kai Huang, Baoquan He, Oleg Nesterov, cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com
Subject: [PATCH v7 4/6] x86/tdx: Allow MMIO from userspace
Date: Fri, 13 Sep 2024 19:05:59 +0200
Message-ID: <5a94e08959fc9360eddd30a5743f16165353282b.1726237595.git.legion@kernel.org>

From: "Alexey Gladkov (Intel)"

MMIO emulation is currently allowed only for kernel code. It is carried
out through a special API that uses only certain instructions. This does
not allow userspace to work with virtual devices.

Allow userspace to use the same instructions as the kernel to access
MMIO. The additional checks were added by the previous patches.
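For illustration, once this patch is applied a TDX guest process can in principle drive a device BAR directly with ordinary loads and stores; each access traps with #VE and is emulated by the kernel. A minimal hypothetical sketch (made-up sysfs path, no error handling):

    /* Hypothetical sketch of direct userspace MMIO after this patch. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>

    int main(void)
    {
        int fd = open("/sys/bus/pci/devices/0000:00:05.0/resource0", O_RDWR);
        volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);

        bar[0] = 0x1;           /* emulated MMIO write */
        return (int)bar[1];     /* emulated MMIO read */
    }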
Reviewed-by: Thomas Gleixner
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Alexey Gladkov (Intel)
---
 arch/x86/coco/tdx/tdx.c | 45 +++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 30651a5af180..dffc343e64d7 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -488,6 +488,32 @@ static int valid_vaddr(struct ve_info *ve, enum insn_mmio_type mmio, int size,
     return 0;
 }

+static int decode_insn_struct(struct insn *insn, struct pt_regs *regs)
+{
+    char buffer[MAX_INSN_SIZE];
+
+    if (user_mode(regs)) {
+        int nr_copied = insn_fetch_from_user(regs, buffer);
+
+        if (nr_copied <= 0)
+            return -EFAULT;
+
+        if (!insn_decode_from_regs(insn, regs, buffer, nr_copied))
+            return -EINVAL;
+    } else {
+        if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
+            return -EFAULT;
+
+        if (insn_decode(insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
+            return -EINVAL;
+    }
+
+    if (!insn->immediate.got)
+        return -EINVAL;
+
+    return 0;
+}
+
 static int handle_mmio_write(struct insn *insn, enum insn_mmio_type mmio, int size,
                              struct pt_regs *regs, struct ve_info *ve)
 {
@@ -568,27 +594,20 @@ static int handle_mmio_read(struct insn *insn, enum insn_mmio_type mmio, int size,

 static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 {
-    char buffer[MAX_INSN_SIZE];
     enum insn_mmio_type mmio;
     struct insn insn = {};
     unsigned long vaddr;
     int size, ret;

-    /* Only in-kernel MMIO is supported */
-    if (WARN_ON_ONCE(user_mode(regs)))
-        return -EFAULT;
-
-    if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
-        return -EFAULT;
-
-    if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
-        return -EINVAL;
+    ret = decode_insn_struct(&insn, regs);
+    if (ret)
+        return ret;

     mmio = insn_decode_mmio(&insn, &size);
     if (WARN_ON_ONCE(mmio == INSN_MMIO_DECODE_FAILED))
         return -EINVAL;

-    if (!fault_in_kernel_space(ve->gla)) {
+    if (!user_mode(regs) && !fault_in_kernel_space(ve->gla)) {
         WARN_ONCE(1, "Access to userspace address is not supported");
         return -EINVAL;
     }
@@ -783,6 +802,10 @@ static int virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
     switch (ve->exit_reason) {
     case EXIT_REASON_CPUID:
         return handle_cpuid(regs, ve);
+    case EXIT_REASON_EPT_VIOLATION:
+        if (is_private_gpa(ve->gpa))
+            panic("Unexpected EPT-violation on private memory.");
+        return handle_mmio(regs, ve);
     default:
         pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
         return -EIO;
--
2.46.0

From nobody Fri Nov 29 23:56:26 2024
From: Alexey Gladkov
To: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
Cc: "Alexey Gladkov (Intel)", Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Kirill A. Shutemov", Andrew Morton, Yuan Yao, Geert Uytterhoeven, Yuntao Wang, Kai Huang, Baoquan He, Oleg Nesterov, cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com
Subject: [PATCH v7 5/6] x86/tdx: Move MMIO helpers to common library
Date: Fri, 13 Sep 2024 19:06:00 +0200

From: "Alexey Gladkov (Intel)"

The AMD SEV code has helpers that are used to emulate MOVS instructions.
To be able to reuse this code in the MOVS implementation for Intel TDX,
move the helpers to a common location.

Acked-by: Kirill A. Shutemov
Signed-off-by: Alexey Gladkov (Intel)
---
 arch/x86/coco/sev/core.c  | 139 ++++++--------------------------------
 arch/x86/include/asm/io.h |   3 +
 arch/x86/lib/iomem.c      | 115 +++++++++++++++++++++++++++++++
 3 files changed, 140 insertions(+), 117 deletions(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 082d61d85dfc..07e9a6f15fba 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -369,72 +369,24 @@ static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
 static enum es_result vc_write_mem(struct es_em_ctxt *ctxt,
                                    char *dst, char *buf, size_t size)
 {
-    unsigned long error_code = X86_PF_PROT | X86_PF_WRITE;
+    unsigned long error_code;
+    int ret;

     /*
-     * This function uses __put_user() independent of whether kernel or user
-     * memory is accessed. This works fine because __put_user() does no
-     * sanity checks of the pointer being accessed. All that it does is
-     * to report when the access failed.
-     *
-     * Also, this function runs in atomic context, so __put_user() is not
-     * allowed to sleep. The page-fault handler detects that it is running
-     * in atomic context and will not try to take mmap_sem and handle the
-     * fault, so additional pagefault_enable()/disable() calls are not
-     * needed.
-     *
-     * The access can't be done via copy_to_user() here because
-     * vc_write_mem() must not use string instructions to access unsafe
-     * memory. The reason is that MOVS is emulated by the #VC handler by
-     * splitting the move up into a read and a write and taking a nested #VC
-     * exception on whatever of them is the MMIO access. Using string
-     * instructions here would cause infinite nesting.
+     * This function runs in atomic context, so __put_iomem() is not allowed
+     * to sleep. The page-fault handler detects that it is running in atomic
+     * context and will not try to take mmap_lock and handle the fault, so
+     * additional pagefault_enable()/disable() calls are not needed.
      */
-    switch (size) {
-    case 1: {
-        u8 d1;
-        u8 __user *target = (u8 __user *)dst;
-
-        memcpy(&d1, buf, 1);
-        if (__put_user(d1, target))
-            goto fault;
-        break;
-    }
-    case 2: {
-        u16 d2;
-        u16 __user *target = (u16 __user *)dst;
-
-        memcpy(&d2, buf, 2);
-        if (__put_user(d2, target))
-            goto fault;
-        break;
-    }
-    case 4: {
-        u32 d4;
-        u32 __user *target = (u32 __user *)dst;
-
-        memcpy(&d4, buf, 4);
-        if (__put_user(d4, target))
-            goto fault;
-        break;
-    }
-    case 8: {
-        u64 d8;
-        u64 __user *target = (u64 __user *)dst;
+    ret = __put_iomem(dst, buf, size);
+    if (!ret)
+        return ES_OK;

-        memcpy(&d8, buf, 8);
-        if (__put_user(d8, target))
-            goto fault;
-        break;
-    }
-    default:
-        WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size);
+    if (ret == -EIO)
         return ES_UNSUPPORTED;
-    }

-    return ES_OK;
+    error_code = X86_PF_PROT | X86_PF_WRITE;

-fault:
     if (user_mode(ctxt->regs))
         error_code |= X86_PF_USER;

@@ -448,71 +400,24 @@ static enum es_result vc_write_mem(struct es_em_ctxt *ctxt,
 static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
                                   char *src, char *buf, size_t size)
 {
-    unsigned long error_code = X86_PF_PROT;
+    unsigned long error_code;
+    int ret;

     /*
-     * This function uses __get_user() independent of whether kernel or user
-     * memory is accessed. This works fine because __get_user() does no
-     * sanity checks of the pointer being accessed. All that it does is
-     * to report when the access failed.
-     *
-     * Also, this function runs in atomic context, so __get_user() is not
-     * allowed to sleep. The page-fault handler detects that it is running
-     * in atomic context and will not try to take mmap_sem and handle the
-     * fault, so additional pagefault_enable()/disable() calls are not
-     * needed.
-     *
-     * The access can't be done via copy_from_user() here because
-     * vc_read_mem() must not use string instructions to access unsafe
-     * memory. The reason is that MOVS is emulated by the #VC handler by
-     * splitting the move up into a read and a write and taking a nested #VC
-     * exception on whatever of them is the MMIO access. Using string
-     * instructions here would cause infinite nesting.
+     * This function runs in atomic context, so __get_iomem() is not allowed
+     * to sleep. The page-fault handler detects that it is running in atomic
+     * context and will not try to take mmap_lock and handle the fault, so
+     * additional pagefault_enable()/disable() calls are not needed.
      */
-    switch (size) {
-    case 1: {
-        u8 d1;
-        u8 __user *s = (u8 __user *)src;
-
-        if (__get_user(d1, s))
-            goto fault;
-        memcpy(buf, &d1, 1);
-        break;
-    }
-    case 2: {
-        u16 d2;
-        u16 __user *s = (u16 __user *)src;
-
-        if (__get_user(d2, s))
-            goto fault;
-        memcpy(buf, &d2, 2);
-        break;
-    }
-    case 4: {
-        u32 d4;
-        u32 __user *s = (u32 __user *)src;
+    ret = __get_iomem(src, buf, size);
+    if (!ret)
+        return ES_OK;

-        if (__get_user(d4, s))
-            goto fault;
-        memcpy(buf, &d4, 4);
-        break;
-    }
-    case 8: {
-        u64 d8;
-        u64 __user *s = (u64 __user *)src;
-        if (__get_user(d8, s))
-            goto fault;
-        memcpy(buf, &d8, 8);
-        break;
-    }
-    default:
-        WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size);
+    if (ret == -EIO)
         return ES_UNSUPPORTED;
-    }

-    return ES_OK;
+    error_code = X86_PF_PROT;

-fault:
     if (user_mode(ctxt->regs))
         error_code |= X86_PF_USER;

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 1d60427379c9..ac01d53466cb 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -402,4 +402,7 @@ static inline void iosubmit_cmds512(void __iomem *dst, const void *src,
     }
 }

+int __get_iomem(char *src, char *buf, size_t size);
+int __put_iomem(char *src, char *buf, size_t size);
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index 5eecb45d05d5..3ab146edddea 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include

 #define movs(type,to,from) \
     asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
@@ -124,3 +125,117 @@ void memset_io(volatile void __iomem *a, int b, size_t c)
     }
 }
 EXPORT_SYMBOL(memset_io);
+
+int __get_iomem(char *src, char *buf, size_t size)
+{
+    /*
+     * This function uses __get_user() independent of whether kernel or user
+     * memory is accessed. This works fine because __get_user() does no
+     * sanity checks of the pointer being accessed. All that it does is
+     * to report when the access failed.
+     *
+     * The access can't be done via copy_from_user() here because
+     * __get_iomem() must not use string instructions to access unsafe
+     * memory. The reason is that MOVS is emulated by the exception handler
+     * for SEV and TDX by splitting the move up into read and write
+     * operations and taking a nested exception on whichever of them is the
+     * MMIO access. Using string instructions here would cause infinite
+     * nesting.
+     */
+    switch (size) {
+    case 1: {
+        u8 d1, __user *s = (u8 __user *)src;
+
+        if (__get_user(d1, s))
+            return -EFAULT;
+        memcpy(buf, &d1, 1);
+        break;
+    }
+    case 2: {
+        u16 d2, __user *s = (u16 __user *)src;
+
+        if (__get_user(d2, s))
+            return -EFAULT;
+        memcpy(buf, &d2, 2);
+        break;
+    }
+    case 4: {
+        u32 d4, __user *s = (u32 __user *)src;
+
+        if (__get_user(d4, s))
+            return -EFAULT;
+        memcpy(buf, &d4, 4);
+        break;
+    }
+    case 8: {
+        u64 d8, __user *s = (u64 __user *)src;
+
+        if (__get_user(d8, s))
+            return -EFAULT;
+        memcpy(buf, &d8, 8);
+        break;
+    }
+    default:
+        WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size);
+        return -EIO;
+    }
+
+    return 0;
+}
+
+int __put_iomem(char *dst, char *buf, size_t size)
+{
+    /*
+     * This function uses __put_user() independent of whether kernel or user
+     * memory is accessed. This works fine because __put_user() does no
+     * sanity checks of the pointer being accessed. All that it does is
+     * to report when the access failed.
+     *
+     * The access can't be done via copy_to_user() here because
+     * __put_iomem() must not use string instructions to access unsafe
+     * memory. The reason is that MOVS is emulated by the exception handler
+     * for SEV and TDX by splitting the move up into read and write
+     * operations and taking a nested exception on whichever of them is the
+     * MMIO access. Using string instructions here would cause infinite
+     * nesting.
+     */
+    switch (size) {
+    case 1: {
+        u8 d1, __user *target = (u8 __user *)dst;
+
+        memcpy(&d1, buf, 1);
+        if (__put_user(d1, target))
+            return -EFAULT;
+        break;
+    }
+    case 2: {
+        u16 d2, __user *target = (u16 __user *)dst;
+
+        memcpy(&d2, buf, 2);
+        if (__put_user(d2, target))
+            return -EFAULT;
+        break;
+    }
+    case 4: {
+        u32 d4, __user *target = (u32 __user *)dst;
+
+        memcpy(&d4, buf, 4);
+        if (__put_user(d4, target))
+            return -EFAULT;
+        break;
+    }
+    case 8: {
+        u64 d8, __user *target = (u64 __user *)dst;
+
+        memcpy(&d8, buf, 8);
+        if (__put_user(d8, target))
+            return -EFAULT;
+        break;
+    }
+    default:
+        WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size);
+        return -EIO;
+    }
+
+    return 0;
+}
--
2.46.0

From nobody Fri Nov 29 23:56:26 2024
From: Alexey Gladkov
To: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev
Cc: "Alexey Gladkov (Intel)", Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Kirill A. Shutemov", Andrew Morton, Yuan Yao, Geert Uytterhoeven, Yuntao Wang, Kai Huang, Baoquan He, Oleg Nesterov, cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com
Shutemov" , Andrew Morton , Yuan Yao , Geert Uytterhoeven , Yuntao Wang , Kai Huang , Baoquan He , Oleg Nesterov , cho@microsoft.com, decui@microsoft.com, John.Starks@microsoft.com Subject: [PATCH v7 6/6] x86/tdx: Implement MOVS for MMIO Date: Fri, 13 Sep 2024 19:06:01 +0200 Message-ID: <6f901828e66b30d7aff5c64dd7c57f2cf922be7e.1726237595.git.legion@kernel.org> X-Mailer: git-send-email 2.46.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Alexey Gladkov (Intel)" Add emulation of the MOVS instruction on MMIO regions. MOVS emulation consists of dividing it into a series of read and write operations, which in turn will be validated separately. This implementation is based on the same principle as in SEV. It splits MOVS into separate read and write operations, which in turn can cause nested #VEs depending on which of the arguments caused the first #VE. The difference with the SEV implementation is the execution context. SEV code is executed in atomic context. Exception handler in TDX is executed with interrupts enabled. That's why the approach to locking is different. In TDX, mmap_lock is taken to verify and emulate the instruction. Another difference is how the read and write instructions are executed for MOVS emulation. While in SEV each read/write operation returns to user space, in TDX these operations are performed from the kernel context. It may be possible to achieve more code reuse at this point, but it would require confirmation from SEV that such a thing wouldn't break anything. Reviewed-by: Kirill A. Shutemov Signed-off-by: Alexey Gladkov (Intel) --- arch/x86/coco/tdx/tdx.c | 82 ++++++++++++++++++++++++++++---- arch/x86/include/asm/processor.h | 1 + 2 files changed, 75 insertions(+), 8 deletions(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index dffc343e64d7..151e63083a13 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -514,6 +514,60 @@ static int decode_insn_struct(struct insn *insn, struc= t pt_regs *regs) return 0; } =20 +static int handle_mmio_movs(struct insn *insn, struct pt_regs *regs, int s= ize, struct ve_info *ve) +{ + unsigned long ds_base, es_base; + unsigned char *src, *dst; + unsigned char buffer[8]; + int off, ret; + bool rep; + + /* + * The in-kernel code must use a special API that does not use MOVS. + * If the MOVS instruction is received from in-kernel, then something + * is broken. + */ + if (WARN_ON_ONCE(!user_mode(regs))) + return -EFAULT; + + ds_base =3D insn_get_seg_base(regs, INAT_SEG_REG_DS); + es_base =3D insn_get_seg_base(regs, INAT_SEG_REG_ES); + + if (ds_base =3D=3D -1L || es_base =3D=3D -1L) + return -EINVAL; + + current->thread.in_mmio_emul =3D 1; + + rep =3D insn_has_rep_prefix(insn); + + do { + src =3D ds_base + (unsigned char *) regs->si; + dst =3D es_base + (unsigned char *) regs->di; + + ret =3D __get_iomem(src, buffer, size); + if (ret) + goto out; + + ret =3D __put_iomem(dst, buffer, size); + if (ret) + goto out; + + off =3D (regs->flags & X86_EFLAGS_DF) ? 
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Alexey Gladkov (Intel)
---
 arch/x86/coco/tdx/tdx.c          | 82 ++++++++++++++++++++++++++++----
 arch/x86/include/asm/processor.h |  1 +
 2 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index dffc343e64d7..151e63083a13 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -514,6 +514,60 @@ static int decode_insn_struct(struct insn *insn, struct pt_regs *regs)
     return 0;
 }

+static int handle_mmio_movs(struct insn *insn, struct pt_regs *regs, int size, struct ve_info *ve)
+{
+    unsigned long ds_base, es_base;
+    unsigned char *src, *dst;
+    unsigned char buffer[8];
+    int off, ret;
+    bool rep;
+
+    /*
+     * The in-kernel code must use a special API that does not use MOVS.
+     * If the MOVS instruction is received from in-kernel, then something
+     * is broken.
+     */
+    if (WARN_ON_ONCE(!user_mode(regs)))
+        return -EFAULT;
+
+    ds_base = insn_get_seg_base(regs, INAT_SEG_REG_DS);
+    es_base = insn_get_seg_base(regs, INAT_SEG_REG_ES);
+
+    if (ds_base == -1L || es_base == -1L)
+        return -EINVAL;
+
+    current->thread.in_mmio_emul = 1;
+
+    rep = insn_has_rep_prefix(insn);
+
+    do {
+        src = ds_base + (unsigned char *) regs->si;
+        dst = es_base + (unsigned char *) regs->di;
+
+        ret = __get_iomem(src, buffer, size);
+        if (ret)
+            goto out;
+
+        ret = __put_iomem(dst, buffer, size);
+        if (ret)
+            goto out;
+
+        off = (regs->flags & X86_EFLAGS_DF) ? -size : size;
+
+        regs->si += off;
+        regs->di += off;
+
+        if (rep)
+            regs->cx -= 1;
+    } while (rep || regs->cx > 0);
+
+    ret = insn->length;
+out:
+    current->thread.in_mmio_emul = 0;
+
+    return ret;
+}
+
 static int handle_mmio_write(struct insn *insn, enum insn_mmio_type mmio, int size,
                              struct pt_regs *regs, struct ve_info *ve)
 {
@@ -535,9 +589,8 @@ static int handle_mmio_write(struct insn *insn, enum insn_mmio_type mmio, int size,
         return insn->length;
     case INSN_MMIO_MOVS:
         /*
-         * MMIO was accessed with an instruction that could not be
-         * decoded or handled properly. It was likely not using io.h
-         * helpers or accessed MMIO accidentally.
+         * MOVS is processed through higher level emulation which breaks
+         * this instruction into a sequence of reads and writes.
         */
         return -EINVAL;
     default:
@@ -596,6 +649,7 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 {
     enum insn_mmio_type mmio;
     struct insn insn = {};
+    int need_validation;
     unsigned long vaddr;
     int size, ret;

@@ -607,14 +661,27 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
     if (WARN_ON_ONCE(mmio == INSN_MMIO_DECODE_FAILED))
         return -EINVAL;

+    if (mmio == INSN_MMIO_MOVS)
+        return handle_mmio_movs(&insn, regs, size, ve);
+
+    need_validation = user_mode(regs);
+
     if (!user_mode(regs) && !fault_in_kernel_space(ve->gla)) {
-        WARN_ONCE(1, "Access to userspace address is not supported");
-        return -EINVAL;
+        /*
+         * Access from kernel to userspace addresses is not allowed
+         * unless it is a nested exception during MOVS emulation.
+         */
+        if (!current->thread.in_mmio_emul || !current->mm) {
+            WARN_ONCE(1, "Access to userspace address is not supported");
+            return -EINVAL;
+        }
+
+        need_validation = 1;
     }

     vaddr = (unsigned long)insn_get_addr_ref(&insn, regs);

-    if (user_mode(regs)) {
+    if (need_validation) {
         if (mmap_read_lock_killable(current->mm))
             return -EINTR;

@@ -640,7 +707,6 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
     switch (mmio) {
     case INSN_MMIO_WRITE:
     case INSN_MMIO_WRITE_IMM:
-    case INSN_MMIO_MOVS:
         ret = handle_mmio_write(&insn, mmio, size, regs, ve);
         break;
     case INSN_MMIO_READ:
@@ -661,7 +727,7 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
         ret = -EINVAL;
     }
 unlock:
-    if (user_mode(regs))
+    if (need_validation)
         mmap_read_unlock(current->mm);

     return ret;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a75a07f4931f..57605b11b06c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -486,6 +486,7 @@ struct thread_struct {
     unsigned long iopl_emul;

     unsigned int iopl_warn:1;
+    unsigned int in_mmio_emul:1;

     /*
      * Protection Keys Register for Userspace. Loaded immediately on
--
2.46.0