From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:19 +0200
Subject: [PATCH v4 01/11] x86/asm: Make rip_rel_ptr() usable from fPIC code
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin
Message-ID: <20250410134117.3713574-14-ardb+git@google.com>
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

RIP_REL_REF() is used in non-PIC C code that is called very early,
before the kernel virtual mapping is up, which is the mapping that the
linker expects. It is currently used in two different ways:

- to refer to the value of a global variable, including as an lvalue in
  assignments;

- to take the address of a global variable via the mapping that the
  code currently executes at.

The former case is only needed in non-PIC code, as PIC code will never
use absolute symbol references when the address of the symbol is not
being used. But taking the address of a variable in PIC code may still
require extra care, as a stack allocated struct assignment may be
emitted as a memcpy() from a statically allocated copy in .rodata.

For instance, this

  void startup_64_setup_gdt_idt(void)
  {
	struct desc_ptr startup_gdt_descr = {
		.address = (__force unsigned long)gdt_page.gdt,
		.size	 = GDT_SIZE - 1,
	};

may result in an absolute symbol reference in PIC code, even though the
struct is allocated on the stack and populated at runtime.

To address this case, make rip_rel_ptr() accessible in PIC code, and
update any existing uses where the address of a global variable is
taken using RIP_REL_REF.

Once all code of this nature has been moved into arch/x86/boot/startup
and built with -fPIC, RIP_REL_REF() can be retired, and only
rip_rel_ptr() will remain.
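
As a minimal sketch of the two usage patterns (not part of the patch;
some_var is a made-up symbol name):

	extern unsigned long some_var;

	/* value access: only meaningful in non-PIC code */
	unsigned long val = RIP_REL_REF(some_var);

	/* address access via the currently active mapping: safe in PIC code too */
	unsigned long *ptr = rip_rel_ptr(&some_var);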
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/coco/sev/core.c           |  2 +-
 arch/x86/coco/sev/shared.c         |  4 ++--
 arch/x86/include/asm/asm.h         |  2 +-
 arch/x86/kernel/head64.c           | 24 ++++++++++----------
 arch/x86/mm/mem_encrypt_identity.c |  6 +++---
 5 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index b0c1a7a57497..832f7a7b10b2 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -2400,7 +2400,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
	 * kernel was loaded (physbase), so the get the CA address using
	 * RIP-relative addressing.
	 */
-	pa = (u64)&RIP_REL_REF(boot_svsm_ca_page);
+	pa = (u64)rip_rel_ptr(&boot_svsm_ca_page);
 
	/*
	 * Switch over to the boot SVSM CA while the current CA is still
diff --git a/arch/x86/coco/sev/shared.c b/arch/x86/coco/sev/shared.c
index 2e4122f8aa6b..04982d356803 100644
--- a/arch/x86/coco/sev/shared.c
+++ b/arch/x86/coco/sev/shared.c
@@ -475,7 +475,7 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
  */
 static const struct snp_cpuid_table *snp_cpuid_get_table(void)
 {
-	return &RIP_REL_REF(cpuid_table_copy);
+	return rip_rel_ptr(&cpuid_table_copy);
 }
 
 /*
@@ -1681,7 +1681,7 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
	 * routine is running identity mapped when called, both by the decompressor
	 * code and the early kernel code.
	 */
-	if (!rmpadjust((unsigned long)&RIP_REL_REF(boot_ghcb_page), RMP_PG_SIZE_4K, 1))
+	if (!rmpadjust((unsigned long)rip_rel_ptr(&boot_ghcb_page), RMP_PG_SIZE_4K, 1))
 		return false;
 
	/*
diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index cc2881576c2c..a9f07799e337 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -114,13 +114,13 @@
 #endif
 
 #ifndef __ASSEMBLER__
-#ifndef __pic__
 static __always_inline __pure void *rip_rel_ptr(void *p)
 {
	asm("leaq %c1(%%rip), %0" : "=r"(p) : "i"(p));
 
	return p;
 }
+#ifndef __pic__
 #define RIP_REL_REF(var)	(*(typeof(&(var)))rip_rel_ptr(&(var)))
 #else
 #define RIP_REL_REF(var)	(var)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index fa9b6339975f..954d093f187b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -106,8 +106,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
	 * attribute.
	 */
	if (sme_get_me_mask()) {
-		paddr = (unsigned long)&RIP_REL_REF(__start_bss_decrypted);
-		paddr_end = (unsigned long)&RIP_REL_REF(__end_bss_decrypted);
+		paddr = (unsigned long)rip_rel_ptr(__start_bss_decrypted);
+		paddr_end = (unsigned long)rip_rel_ptr(__end_bss_decrypted);
 
 		for (; paddr < paddr_end; paddr += PMD_SIZE) {
 			/*
@@ -144,8 +144,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
 unsigned long __head __startup_64(unsigned long p2v_offset,
				   struct boot_params *bp)
 {
-	pmd_t (*early_pgts)[PTRS_PER_PMD] = RIP_REL_REF(early_dynamic_pgts);
-	unsigned long physaddr = (unsigned long)&RIP_REL_REF(_text);
+	pmd_t (*early_pgts)[PTRS_PER_PMD] = rip_rel_ptr(early_dynamic_pgts);
+	unsigned long physaddr = (unsigned long)rip_rel_ptr(_text);
 	unsigned long va_text, va_end;
 	unsigned long pgtable_flags;
 	unsigned long load_delta;
@@ -174,18 +174,18 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 		for (;;);
 
 	va_text = physaddr - p2v_offset;
-	va_end = (unsigned long)&RIP_REL_REF(_end) - p2v_offset;
+	va_end = (unsigned long)rip_rel_ptr(_end) - p2v_offset;
 
 	/* Include the SME encryption mask in the fixup value */
 	load_delta += sme_get_me_mask();
 
 	/* Fixup the physical addresses in the page table */
 
-	pgd = &RIP_REL_REF(early_top_pgt)->pgd;
+	pgd = rip_rel_ptr(early_top_pgt);
 	pgd[pgd_index(__START_KERNEL_map)] += load_delta;
 
 	if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
-		p4d = (p4dval_t *)&RIP_REL_REF(level4_kernel_pgt);
+		p4d = (p4dval_t *)rip_rel_ptr(level4_kernel_pgt);
 		p4d[MAX_PTRS_PER_P4D - 1] += load_delta;
 
 		pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;
@@ -258,7 +258,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
	 * error, causing the BIOS to halt the system.
	 */
 
-	pmd = &RIP_REL_REF(level2_kernel_pgt)->pmd;
+	pmd = rip_rel_ptr(level2_kernel_pgt);
 
 	/* invalidate pages before the kernel image */
 	for (i = 0; i < pmd_index(va_text); i++)
@@ -531,7 +531,7 @@ static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
 static void __head startup_64_load_idt(void *vc_handler)
 {
 	struct desc_ptr desc = {
-		.address = (unsigned long)&RIP_REL_REF(bringup_idt_table),
+		.address = (unsigned long)rip_rel_ptr(bringup_idt_table),
 		.size = sizeof(bringup_idt_table) - 1,
 	};
 	struct idt_data data;
@@ -565,11 +565,11 @@ void early_setup_idt(void)
  */
 void __head startup_64_setup_gdt_idt(void)
 {
-	struct desc_struct *gdt = (void *)(__force unsigned long)gdt_page.gdt;
+	struct gdt_page *gp = rip_rel_ptr((void *)(__force unsigned long)&gdt_page);
 	void *handler = NULL;
 
 	struct desc_ptr startup_gdt_descr = {
-		.address = (unsigned long)&RIP_REL_REF(*gdt),
+		.address = (unsigned long)gp->gdt,
 		.size = GDT_SIZE - 1,
 	};
 
@@ -582,7 +582,7 @@ void __head startup_64_setup_gdt_idt(void)
		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
 
 	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
-		handler = &RIP_REL_REF(vc_no_ghcb);
+		handler = rip_rel_ptr(vc_no_ghcb);
 
 	startup_64_load_idt(handler);
 }
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 5eecdd92da10..e7fb3779b35f 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -318,8 +318,8 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
	 * memory from being cached.
	 */
 
-	kernel_start = (unsigned long)RIP_REL_REF(_text);
-	kernel_end = ALIGN((unsigned long)RIP_REL_REF(_end), PMD_SIZE);
+	kernel_start = (unsigned long)rip_rel_ptr(_text);
+	kernel_end = ALIGN((unsigned long)rip_rel_ptr(_end), PMD_SIZE);
 	kernel_len = kernel_end - kernel_start;
 
 	initrd_start = 0;
@@ -345,7 +345,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
	 * pagetable structures for the encryption of the kernel
	 * pagetable structures for workarea (in case not currently mapped)
	 */
-	execute_start = workarea_start = (unsigned long)RIP_REL_REF(sme_workarea);
+	execute_start = workarea_start = (unsigned long)rip_rel_ptr(sme_workarea);
 	execute_end = execute_start + (PAGE_SIZE * 2) + PMD_SIZE;
 	execute_len = execute_end - execute_start;
 
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:20 +0200
Subject: [PATCH v4 02/11] x86/boot: Move the early GDT/IDT setup code into startup/
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin
Message-ID: <20250410134117.3713574-15-ardb+git@google.com>
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

Move the early GDT/IDT setup code that runs long before the kernel
virtual mapping is up into arch/x86/boot/startup/, and build it in a
way that ensures that the code tolerates being called from the 1:1
mapping of memory. The code itself is left unchanged by this patch.

Also tweak the sed symbol matching pattern in the decompressor to
match on lower case 't' or 'b', as these will be emitted by Clang for
symbols with hidden linkage.
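
To make the pattern change concrete: nm prints upper case type letters
for global symbols and lower case for local ones, so next to the
existing form (address hypothetical)

	ffffffff81000000 T _text

the pattern now also needs to accept

	ffffffff81000000 t _text

which is what Clang emits once the symbols get hidden linkage.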
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/Makefile |  2 +-
 arch/x86/boot/startup/Makefile    | 15 ++++
 arch/x86/boot/startup/gdt_idt.c   | 84 ++++++++++++++++++++
 arch/x86/kernel/head64.c          | 74 -----------------
 4 files changed, 100 insertions(+), 75 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 37b85ce9b2a3..0fcad7b7e007 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -73,7 +73,7 @@ LDFLAGS_vmlinux += -T
 hostprogs	:= mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__start_rodata\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__start_rodata\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
 
 quiet_cmd_voffset = VOFFSET $@
       cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index 8919a1cbcb5a..1beb5de30735 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -1,6 +1,21 @@
 # SPDX-License-Identifier: GPL-2.0
 
 KBUILD_AFLAGS	+= -D__DISABLE_EXPORTS
+KBUILD_CFLAGS	+= -D__DISABLE_EXPORTS -mcmodel=small -fPIC \
+		   -Os -DDISABLE_BRANCH_PROFILING \
+		   $(DISABLE_STACKLEAK_PLUGIN) \
+		   -fno-stack-protector -D__NO_FORTIFY \
+		   -include $(srctree)/include/linux/hidden.h
+
+# disable ftrace hooks
+KBUILD_CFLAGS	:= $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS))
+KASAN_SANITIZE	:= n
+KCSAN_SANITIZE	:= n
+KMSAN_SANITIZE	:= n
+UBSAN_SANITIZE	:= n
+KCOV_INSTRUMENT	:= n
+
+obj-$(CONFIG_X86_64) += gdt_idt.o
 
 lib-$(CONFIG_X86_64)	+= la57toggle.o
 lib-$(CONFIG_EFI_MIXED)	+= efi-mixed.o
diff --git a/arch/x86/boot/startup/gdt_idt.c b/arch/x86/boot/startup/gdt_idt.c
new file mode 100644
index 000000000000..7e34d0b426b1
--- /dev/null
+++ b/arch/x86/boot/startup/gdt_idt.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+/*
+ * Data structures and code used for IDT setup in head_64.S. The bringup-IDT is
+ * used until the idt_table takes over. On the boot CPU this happens in
+ * x86_64_start_kernel(), on secondary CPUs in start_secondary(). In both cases
+ * this happens in the functions called from head_64.S.
+ *
+ * The idt_table can't be used that early because all the code modifying it is
+ * in idt.c and can be instrumented by tracing or KASAN, which both don't work
+ * during early CPU bringup. Also the idt_table has the runtime vectors
+ * configured which require certain CPU state to be setup already (like TSS),
+ * which also hasn't happened yet in early CPU bringup.
+ */
+static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
+
+/* This may run while still in the direct mapping */
+static void __head startup_64_load_idt(void *vc_handler)
+{
+	struct desc_ptr desc = {
+		.address = (unsigned long)rip_rel_ptr(bringup_idt_table),
+		.size	 = sizeof(bringup_idt_table) - 1,
+	};
+	struct idt_data data;
+	gate_desc idt_desc;
+
+	/* @vc_handler is set only for a VMM Communication Exception */
+	if (vc_handler) {
+		init_idt_data(&data, X86_TRAP_VC, vc_handler);
+		idt_init_desc(&idt_desc, &data);
+		native_write_idt_entry((gate_desc *)desc.address, X86_TRAP_VC, &idt_desc);
+	}
+
+	native_load_idt(&desc);
+}
+
+/* This is used when running on kernel addresses */
+void early_setup_idt(void)
+{
+	void *handler = NULL;
+
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
+		setup_ghcb();
+		handler = vc_boot_ghcb;
+	}
+
+	startup_64_load_idt(handler);
+}
+
+/*
+ * Setup boot CPU state needed before kernel switches to virtual addresses.
+ */
+void __head startup_64_setup_gdt_idt(void)
+{
+	struct gdt_page *gp = rip_rel_ptr((void *)(__force unsigned long)&gdt_page);
+	void *handler = NULL;
+
+	struct desc_ptr startup_gdt_descr = {
+		.address = (unsigned long)gp->gdt,
+		.size	 = GDT_SIZE - 1,
+	};
+
+	/* Load GDT */
+	native_load_gdt(&startup_gdt_descr);
+
+	/* New GDT is live - reload data segment registers */
+	asm volatile("movl %%eax, %%ds\n"
+		     "movl %%eax, %%ss\n"
+		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
+
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+		handler = rip_rel_ptr(vc_no_ghcb);
+
+	startup_64_load_idt(handler);
+}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 954d093f187b..9b2ffec4bbad 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -512,77 +512,3 @@ void __init __noreturn x86_64_start_reservations(char *real_mode_data)
 
 	start_kernel();
 }
-
-/*
- * Data structures and code used for IDT setup in head_64.S. The bringup-IDT is
- * used until the idt_table takes over. On the boot CPU this happens in
- * x86_64_start_kernel(), on secondary CPUs in start_secondary(). In both cases
- * this happens in the functions called from head_64.S.
- *
- * The idt_table can't be used that early because all the code modifying it is
- * in idt.c and can be instrumented by tracing or KASAN, which both don't work
- * during early CPU bringup. Also the idt_table has the runtime vectors
- * configured which require certain CPU state to be setup already (like TSS),
- * which also hasn't happened yet in early CPU bringup.
- */
-static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
-
-/* This may run while still in the direct mapping */
-static void __head startup_64_load_idt(void *vc_handler)
-{
-	struct desc_ptr desc = {
-		.address = (unsigned long)rip_rel_ptr(bringup_idt_table),
-		.size	 = sizeof(bringup_idt_table) - 1,
-	};
-	struct idt_data data;
-	gate_desc idt_desc;
-
-	/* @vc_handler is set only for a VMM Communication Exception */
-	if (vc_handler) {
-		init_idt_data(&data, X86_TRAP_VC, vc_handler);
-		idt_init_desc(&idt_desc, &data);
-		native_write_idt_entry((gate_desc *)desc.address, X86_TRAP_VC, &idt_desc);
-	}
-
-	native_load_idt(&desc);
-}
-
-/* This is used when running on kernel addresses */
-void early_setup_idt(void)
-{
-	void *handler = NULL;
-
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
-		setup_ghcb();
-		handler = vc_boot_ghcb;
-	}
-
-	startup_64_load_idt(handler);
-}
-
-/*
- * Setup boot CPU state needed before kernel switches to virtual addresses.
- */
-void __head startup_64_setup_gdt_idt(void)
-{
-	struct gdt_page *gp = rip_rel_ptr((void *)(__force unsigned long)&gdt_page);
-	void *handler = NULL;
-
-	struct desc_ptr startup_gdt_descr = {
-		.address = (unsigned long)gp->gdt,
-		.size	 = GDT_SIZE - 1,
-	};
-
-	/* Load GDT */
-	native_load_gdt(&startup_gdt_descr);
-
-	/* New GDT is live - reload data segment registers */
-	asm volatile("movl %%eax, %%ds\n"
-		     "movl %%eax, %%ss\n"
-		     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
-
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
-		handler = rip_rel_ptr(vc_no_ghcb);
-
-	startup_64_load_idt(handler);
-}
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:21 +0200
Subject: [PATCH v4 03/11] x86/boot: Move early kernel mapping code into startup/
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin
Message-ID: <20250410134117.3713574-16-ardb+git@google.com>
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

The startup code that constructs the kernel virtual mapping runs from
the 1:1 mapping of memory itself, and therefore, cannot use absolute
symbol references. Before making changes in subsequent patches, move
this code into a separate source file under arch/x86/boot/startup/,
where all such code will be kept from now on.
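
As a sketch of the invariant this code relies on (using the names that
appear in __startup_64() below), the 1:1 mapping places every symbol at
its physical address, so for any symbol:

	pa = (unsigned long)rip_rel_ptr(sym);	/* where it lives right now    */
	va = pa - p2v_offset;			/* where the linker placed it  */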
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/startup/Makefile     |   2 +-
 arch/x86/boot/startup/map_kernel.c | 224 ++++++++++++++++++++
 arch/x86/kernel/head64.c           | 211 +-----------------
 3 files changed, 226 insertions(+), 211 deletions(-)

diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index 1beb5de30735..10319aee666b 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -15,7 +15,7 @@ KMSAN_SANITIZE	:= n
 UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
 
-obj-$(CONFIG_X86_64) += gdt_idt.o
+obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o
 
 lib-$(CONFIG_X86_64)	+= la57toggle.o
 lib-$(CONFIG_EFI_MIXED)	+= efi-mixed.o
diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
new file mode 100644
index 000000000000..5f1b7e0ba26e
--- /dev/null
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+extern unsigned int next_early_pgt;
+
+static inline bool check_la57_support(void)
+{
+	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+		return false;
+
+	/*
+	 * 5-level paging is detected and enabled at kernel decompression
+	 * stage. Only check if it has been enabled there.
+	 */
+	if (!(native_read_cr4() & X86_CR4_LA57))
+		return false;
+
+	RIP_REL_REF(__pgtable_l5_enabled) = 1;
+	RIP_REL_REF(pgdir_shift)	  = 48;
+	RIP_REL_REF(ptrs_per_p4d)	  = 512;
+	RIP_REL_REF(page_offset_base)	  = __PAGE_OFFSET_BASE_L5;
+	RIP_REL_REF(vmalloc_base)	  = __VMALLOC_BASE_L5;
+	RIP_REL_REF(vmemmap_base)	  = __VMEMMAP_BASE_L5;
+
+	return true;
+}
+
+static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
+						    pmdval_t *pmd,
+						    unsigned long p2v_offset)
+{
+	unsigned long paddr, paddr_end;
+	int i;
+
+	/* Encrypt the kernel and related (if SME is active) */
+	sme_encrypt_kernel(bp);
+
+	/*
+	 * Clear the memory encryption mask from the .bss..decrypted section.
+	 * The bss section will be memset to zero later in the initialization so
+	 * there is no need to zero it after changing the memory encryption
+	 * attribute.
+	 */
+	if (sme_get_me_mask()) {
+		paddr = (unsigned long)rip_rel_ptr(__start_bss_decrypted);
+		paddr_end = (unsigned long)rip_rel_ptr(__end_bss_decrypted);
+
+		for (; paddr < paddr_end; paddr += PMD_SIZE) {
+			/*
+			 * On SNP, transition the page to shared in the RMP table so that
+			 * it is consistent with the page table attribute change.
+			 *
+			 * __start_bss_decrypted has a virtual address in the high range
+			 * mapping (kernel .text). PVALIDATE, by way of
+			 * early_snp_set_memory_shared(), requires a valid virtual
+			 * address but the kernel is currently running off of the identity
+			 * mapping so use the PA to get a *currently* valid virtual address.
+			 */
+			early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD);
+
+			i = pmd_index(paddr - p2v_offset);
+			pmd[i] -= sme_get_me_mask();
+		}
+	}
+
+	/*
+	 * Return the SME encryption mask (if SME is active) to be used as a
+	 * modifier for the initial pgdir entry programmed into CR3.
+	 */
+	return sme_get_me_mask();
+}
+
+/* Code in __startup_64() can be relocated during execution, but the compiler
+ * doesn't have to generate PC-relative relocations when accessing globals from
+ * that function. Clang actually does not generate them, which leads to
+ * boot-time crashes.
+ * To work around this problem, every global pointer must
+ * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined
+ * by subtracting p2v_offset from the RIP-relative address.
+ */
+unsigned long __head __startup_64(unsigned long p2v_offset,
+				  struct boot_params *bp)
+{
+	pmd_t (*early_pgts)[PTRS_PER_PMD] = rip_rel_ptr(early_dynamic_pgts);
+	unsigned long physaddr = (unsigned long)rip_rel_ptr(_text);
+	unsigned long va_text, va_end;
+	unsigned long pgtable_flags;
+	unsigned long load_delta;
+	pgdval_t *pgd;
+	p4dval_t *p4d;
+	pudval_t *pud;
+	pmdval_t *pmd, pmd_entry;
+	bool la57;
+	int i;
+
+	la57 = check_la57_support();
+
+	/* Is the address too large? */
+	if (physaddr >> MAX_PHYSMEM_BITS)
+		for (;;);
+
+	/*
+	 * Compute the delta between the address I am compiled to run at
+	 * and the address I am actually running at.
+	 */
+	load_delta = __START_KERNEL_map + p2v_offset;
+	RIP_REL_REF(phys_base) = load_delta;
+
+	/* Is the address not 2M aligned? */
+	if (load_delta & ~PMD_MASK)
+		for (;;);
+
+	va_text = physaddr - p2v_offset;
+	va_end	= (unsigned long)rip_rel_ptr(_end) - p2v_offset;
+
+	/* Include the SME encryption mask in the fixup value */
+	load_delta += sme_get_me_mask();
+
+	/* Fixup the physical addresses in the page table */
+
+	pgd = rip_rel_ptr(early_top_pgt);
+	pgd[pgd_index(__START_KERNEL_map)] += load_delta;
+
+	if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
+		p4d = (p4dval_t *)rip_rel_ptr(level4_kernel_pgt);
+		p4d[MAX_PTRS_PER_P4D - 1] += load_delta;
+
+		pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;
+	}
+
+	RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 2].pud += load_delta;
+	RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 1].pud += load_delta;
+
+	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
+		RIP_REL_REF(level2_fixmap_pgt)[i].pmd += load_delta;
+
+	/*
+	 * Set up the identity mapping for the switchover. These
+	 * entries should *NOT* have the global bit set! This also
+	 * creates a bunch of nonsense entries but that is fine --
+	 * it avoids problems around wraparound.
+	 */
+
+	pud = &early_pgts[0]->pmd;
+	pmd = &early_pgts[1]->pmd;
+	RIP_REL_REF(next_early_pgt) = 2;
+
+	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
+
+	if (la57) {
+		p4d = &early_pgts[RIP_REL_REF(next_early_pgt)++]->pmd;
+
+		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
+		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
+		pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
+
+		i = physaddr >> P4D_SHIFT;
+		p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
+		p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
+	} else {
+		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
+		pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
+		pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
+	}
+
+	i = physaddr >> PUD_SHIFT;
+	pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
+	pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
+
+	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+	/* Filter out unsupported __PAGE_KERNEL_* bits: */
+	pmd_entry &= RIP_REL_REF(__supported_pte_mask);
+	pmd_entry += sme_get_me_mask();
+	pmd_entry += physaddr;
+
+	for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) {
+		int idx = i + (physaddr >> PMD_SHIFT);
+
+		pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
+	}
+
+	/*
+	 * Fixup the kernel text+data virtual addresses. Note that
+	 * we might write invalid pmds, when the kernel is relocated
+	 * cleanup_highmap() fixes this up along with the mappings
+	 * beyond _end.
+	 *
+	 * Only the region occupied by the kernel image has so far
+	 * been checked against the table of usable memory regions
+	 * provided by the firmware, so invalidate pages outside that
+	 * region. A page table entry that maps to a reserved area of
+	 * memory would allow processor speculation into that area,
+	 * and on some hardware (particularly the UV platform) even
+	 * speculative access to some reserved areas is caught as an
+	 * error, causing the BIOS to halt the system.
+	 */
+
+	pmd = rip_rel_ptr(level2_kernel_pgt);
+
+	/* invalidate pages before the kernel image */
+	for (i = 0; i < pmd_index(va_text); i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	/* fixup pages that are part of the kernel image */
+	for (; i <= pmd_index(va_end); i++)
+		if (pmd[i] & _PAGE_PRESENT)
+			pmd[i] += load_delta;
+
+	/* invalidate pages after the kernel image */
+	for (; i < PTRS_PER_PMD; i++)
+		pmd[i] &= ~_PAGE_PRESENT;
+
+	return sme_postprocess_startup(bp, pmd, p2v_offset);
+}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 9b2ffec4bbad..6b68a206fa7f 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -47,7 +47,7 @@
  * Manage page tables very early on.
  */
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
-static unsigned int __initdata next_early_pgt;
+unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #ifdef CONFIG_X86_5LEVEL
@@ -67,215 +67,6 @@ unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
 
-static inline bool check_la57_support(void)
-{
-	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
-		return false;
-
-	/*
-	 * 5-level paging is detected and enabled at kernel decompression
-	 * stage. Only check if it has been enabled there.
-	 */
-	if (!(native_read_cr4() & X86_CR4_LA57))
-		return false;
-
-	RIP_REL_REF(__pgtable_l5_enabled) = 1;
-	RIP_REL_REF(pgdir_shift)	  = 48;
-	RIP_REL_REF(ptrs_per_p4d)	  = 512;
-	RIP_REL_REF(page_offset_base)	  = __PAGE_OFFSET_BASE_L5;
-	RIP_REL_REF(vmalloc_base)	  = __VMALLOC_BASE_L5;
-	RIP_REL_REF(vmemmap_base)	  = __VMEMMAP_BASE_L5;
-
-	return true;
-}
-
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
-						    pmdval_t *pmd,
-						    unsigned long p2v_offset)
-{
-	unsigned long paddr, paddr_end;
-	int i;
-
-	/* Encrypt the kernel and related (if SME is active) */
-	sme_encrypt_kernel(bp);
-
-	/*
-	 * Clear the memory encryption mask from the .bss..decrypted section.
-	 * The bss section will be memset to zero later in the initialization so
-	 * there is no need to zero it after changing the memory encryption
-	 * attribute.
-	 */
-	if (sme_get_me_mask()) {
-		paddr = (unsigned long)rip_rel_ptr(__start_bss_decrypted);
-		paddr_end = (unsigned long)rip_rel_ptr(__end_bss_decrypted);
-
-		for (; paddr < paddr_end; paddr += PMD_SIZE) {
-			/*
-			 * On SNP, transition the page to shared in the RMP table so that
-			 * it is consistent with the page table attribute change.
-			 *
-			 * __start_bss_decrypted has a virtual address in the high range
-			 * mapping (kernel .text). PVALIDATE, by way of
-			 * early_snp_set_memory_shared(), requires a valid virtual
-			 * address but the kernel is currently running off of the identity
-			 * mapping so use the PA to get a *currently* valid virtual address.
-			 */
-			early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD);
-
-			i = pmd_index(paddr - p2v_offset);
-			pmd[i] -= sme_get_me_mask();
-		}
-	}
-
-	/*
-	 * Return the SME encryption mask (if SME is active) to be used as a
-	 * modifier for the initial pgdir entry programmed into CR3.
-	 */
-	return sme_get_me_mask();
-}
-
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined
- * by subtracting p2v_offset from the RIP-relative address.
- */
-unsigned long __head __startup_64(unsigned long p2v_offset,
-				  struct boot_params *bp)
-{
-	pmd_t (*early_pgts)[PTRS_PER_PMD] = rip_rel_ptr(early_dynamic_pgts);
-	unsigned long physaddr = (unsigned long)rip_rel_ptr(_text);
-	unsigned long va_text, va_end;
-	unsigned long pgtable_flags;
-	unsigned long load_delta;
-	pgdval_t *pgd;
-	p4dval_t *p4d;
-	pudval_t *pud;
-	pmdval_t *pmd, pmd_entry;
-	bool la57;
-	int i;
-
-	la57 = check_la57_support();
-
-	/* Is the address too large? */
-	if (physaddr >> MAX_PHYSMEM_BITS)
-		for (;;);
-
-	/*
-	 * Compute the delta between the address I am compiled to run at
-	 * and the address I am actually running at.
-	 */
-	load_delta = __START_KERNEL_map + p2v_offset;
-	RIP_REL_REF(phys_base) = load_delta;
-
-	/* Is the address not 2M aligned? */
-	if (load_delta & ~PMD_MASK)
-		for (;;);
-
-	va_text = physaddr - p2v_offset;
-	va_end	= (unsigned long)rip_rel_ptr(_end) - p2v_offset;
-
-	/* Include the SME encryption mask in the fixup value */
-	load_delta += sme_get_me_mask();
-
-	/* Fixup the physical addresses in the page table */
-
-	pgd = rip_rel_ptr(early_top_pgt);
-	pgd[pgd_index(__START_KERNEL_map)] += load_delta;
-
-	if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
-		p4d = (p4dval_t *)rip_rel_ptr(level4_kernel_pgt);
-		p4d[MAX_PTRS_PER_P4D - 1] += load_delta;
-
-		pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;
-	}
-
-	RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 2].pud += load_delta;
-	RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 1].pud += load_delta;
-
-	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-		RIP_REL_REF(level2_fixmap_pgt)[i].pmd += load_delta;
-
-	/*
-	 * Set up the identity mapping for the switchover. These
-	 * entries should *NOT* have the global bit set! This also
-	 * creates a bunch of nonsense entries but that is fine --
-	 * it avoids problems around wraparound.
-	 */
-
-	pud = &early_pgts[0]->pmd;
-	pmd = &early_pgts[1]->pmd;
-	RIP_REL_REF(next_early_pgt) = 2;
-
-	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
-
-	if (la57) {
-		p4d = &early_pgts[RIP_REL_REF(next_early_pgt)++]->pmd;
-
-		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
-		pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
-
-		i = physaddr >> P4D_SHIFT;
-		p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
-		p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
-	} else {
-		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-		pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
-		pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
-	}
-
-	i = physaddr >> PUD_SHIFT;
-	pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
-	pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
-
-	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
-	/* Filter out unsupported __PAGE_KERNEL_* bits: */
-	pmd_entry &= RIP_REL_REF(__supported_pte_mask);
-	pmd_entry += sme_get_me_mask();
-	pmd_entry += physaddr;
-
-	for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) {
-		int idx = i + (physaddr >> PMD_SHIFT);
-
-		pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
-	}
-
-	/*
-	 * Fixup the kernel text+data virtual addresses. Note that
-	 * we might write invalid pmds, when the kernel is relocated
-	 * cleanup_highmap() fixes this up along with the mappings
-	 * beyond _end.
-	 *
-	 * Only the region occupied by the kernel image has so far
-	 * been checked against the table of usable memory regions
-	 * provided by the firmware, so invalidate pages outside that
-	 * region. A page table entry that maps to a reserved area of
-	 * memory would allow processor speculation into that area,
-	 * and on some hardware (particularly the UV platform) even
-	 * speculative access to some reserved areas is caught as an
-	 * error, causing the BIOS to halt the system.
-	 */
-
-	pmd = rip_rel_ptr(level2_kernel_pgt);
-
-	/* invalidate pages before the kernel image */
-	for (i = 0; i < pmd_index(va_text); i++)
-		pmd[i] &= ~_PAGE_PRESENT;
-
-	/* fixup pages that are part of the kernel image */
-	for (; i <= pmd_index(va_end); i++)
-		if (pmd[i] & _PAGE_PRESENT)
-			pmd[i] += load_delta;
-
-	/* invalidate pages after the kernel image */
-	for (; i < PTRS_PER_PMD; i++)
-		pmd[i] &= ~_PAGE_PRESENT;
-
-	return sme_postprocess_startup(bp, pmd, p2v_offset);
-}
-
 /* Wipe all early page tables except for the kernel symbol map */
 static void __init reset_early_page_tables(void)
 {
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:22 +0200
Subject: [PATCH v4 04/11] x86/boot: Drop RIP_REL_REF() uses from early mapping code
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin
Message-ID: <20250410134117.3713574-17-ardb+git@google.com>
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>

From: Ard Biesheuvel <ardb@kernel.org>

Now that __startup_64() is built using -fPIC, RIP_REL_REF() has become
a NOP and can be removed. Only some occurrences of rip_rel_ptr() will
remain, to explicitly take the address of certain global structures in
the 1:1 mapping of memory.

While at it, update the code comment to describe why this is needed.
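
For reference, with the asm.h change from the first patch in this
series, the macro is (abbreviated):

	#ifndef __pic__
	#define RIP_REL_REF(var)	(*(typeof(&(var)))rip_rel_ptr(&(var)))
	#else
	#define RIP_REL_REF(var)	(var)	/* NOP when built with -fPIC */
	#endif

so in this -fPIC translation unit every RIP_REL_REF(x) already expands
to plain x, and dropping it does not change the generated code.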
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/startup/map_kernel.c | 41 ++++++++++----------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 5f1b7e0ba26e..0eac3f17dbd3 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -26,12 +26,12 @@ static inline bool check_la57_support(void)
 	if (!(native_read_cr4() & X86_CR4_LA57))
 		return false;
 
-	RIP_REL_REF(__pgtable_l5_enabled) = 1;
-	RIP_REL_REF(pgdir_shift)	  = 48;
-	RIP_REL_REF(ptrs_per_p4d)	  = 512;
-	RIP_REL_REF(page_offset_base)	  = __PAGE_OFFSET_BASE_L5;
-	RIP_REL_REF(vmalloc_base)	  = __VMALLOC_BASE_L5;
-	RIP_REL_REF(vmemmap_base)	  = __VMEMMAP_BASE_L5;
+	__pgtable_l5_enabled = 1;
+	pgdir_shift	     = 48;
+	ptrs_per_p4d	     = 512;
+	page_offset_base     = __PAGE_OFFSET_BASE_L5;
+	vmalloc_base	     = __VMALLOC_BASE_L5;
+	vmemmap_base	     = __VMEMMAP_BASE_L5;
 
 	return true;
 }
@@ -81,12 +81,14 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
 	return sme_get_me_mask();
 }
 
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined
- * by subtracting p2v_offset from the RIP-relative address.
+/*
+ * This code is compiled using PIC codegen because it will execute from the
+ * early 1:1 mapping of memory, which deviates from the mapping expected by the
+ * linker. Due to this deviation, taking the address of a global variable will
+ * produce an ambiguous result when using the plain & operator. Instead,
+ * rip_rel_ptr() must be used, which will return the RIP-relative address in
+ * the 1:1 mapping of memory. Kernel virtual addresses can be determined by
+ * subtracting p2v_offset from the RIP-relative address.
  */
 unsigned long __head __startup_64(unsigned long p2v_offset,
 				  struct boot_params *bp)
@@ -113,8 +115,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = __START_KERNEL_map + p2v_offset;
-	RIP_REL_REF(phys_base) = load_delta;
+	phys_base = load_delta = __START_KERNEL_map + p2v_offset;
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_MASK)
@@ -138,11 +139,11 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 		pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;
 	}
 
-	RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 2].pud += load_delta;
-	RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 1].pud += load_delta;
+	level3_kernel_pgt[PTRS_PER_PUD - 2].pud += load_delta;
+	level3_kernel_pgt[PTRS_PER_PUD - 1].pud += load_delta;
 
 	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-		RIP_REL_REF(level2_fixmap_pgt)[i].pmd += load_delta;
+		level2_fixmap_pgt[i].pmd += load_delta;
 
 	/*
 	 * Set up the identity mapping for the switchover. These
@@ -153,12 +154,12 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 
 	pud = &early_pgts[0]->pmd;
 	pmd = &early_pgts[1]->pmd;
-	RIP_REL_REF(next_early_pgt) = 2;
+	next_early_pgt = 2;
 
 	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
 
 	if (la57) {
-		p4d = &early_pgts[RIP_REL_REF(next_early_pgt)++]->pmd;
+		p4d = &early_pgts[next_early_pgt++]->pmd;
 
 		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
 		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -179,7 +180,7 @@ unsigned long __head __startup_64(unsigned long p2v_offset,
 
 	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
 	/* Filter out unsupported __PAGE_KERNEL_* bits: */
-	pmd_entry &= RIP_REL_REF(__supported_pte_mask);
+	pmd_entry &= __supported_pte_mask;
 	pmd_entry += sme_get_me_mask();
 	pmd_entry += physaddr;
 
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:23 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-18-ardb+git@google.com>
Subject: [PATCH v4 05/11] x86/boot: Move early SME init code into startup/
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

Move the SME initialization code, which runs from the 1:1 mapping of
memory as it operates on the kernel virtual mapping, into the new
sub-directory arch/x86/boot/startup/, where all startup code that needs
to tolerate executing from the 1:1 mapping will reside.
Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/startup/Makefile                             | 1 +
 arch/x86/{mm/mem_encrypt_identity.c => boot/startup/sme.c} | 2 --
 arch/x86/mm/Makefile                                       | 6 ------
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index 10319aee666b..ccdfc42a4d59 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -16,6 +16,7 @@ UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
 
 obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += sme.o
 
 lib-$(CONFIG_X86_64) += la57toggle.o
 lib-$(CONFIG_EFI_MIXED) += efi-mixed.o

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/boot/startup/sme.c
similarity index 99%
rename from arch/x86/mm/mem_encrypt_identity.c
rename to arch/x86/boot/startup/sme.c
index e7fb3779b35f..23d10cda5b58 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/boot/startup/sme.c
@@ -45,8 +45,6 @@
 #include
 #include
 
-#include "mm_internal.h"
-
 #define PGD_FLAGS	_KERNPG_TABLE_NOENC
 #define P4D_FLAGS	_KERNPG_TABLE_NOENC
 #define PUD_FLAGS	_KERNPG_TABLE_NOENC

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 32035d5be5a0..3faa60f13a61 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -3,12 +3,10 @@ KCOV_INSTRUMENT_tlb.o			:= n
 KCOV_INSTRUMENT_mem_encrypt.o		:= n
 KCOV_INSTRUMENT_mem_encrypt_amd.o	:= n
-KCOV_INSTRUMENT_mem_encrypt_identity.o	:= n
 KCOV_INSTRUMENT_pgprot.o		:= n
 
 KASAN_SANITIZE_mem_encrypt.o		:= n
 KASAN_SANITIZE_mem_encrypt_amd.o	:= n
-KASAN_SANITIZE_mem_encrypt_identity.o	:= n
 KASAN_SANITIZE_pgprot.o			:= n
 
 # Disable KCSAN entirely, because otherwise we get warnings that some functions
@@ -16,12 +14,10 @@ KASAN_SANITIZE_pgprot.o	:= n
 KCSAN_SANITIZE := n
 # Avoid recursion by not calling KMSAN hooks for CEA code.
 KMSAN_SANITIZE_cpu_entry_area.o		:= n
-KMSAN_SANITIZE_mem_encrypt_identity.o	:= n
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_mem_encrypt.o		= -pg
 CFLAGS_REMOVE_mem_encrypt_amd.o		= -pg
-CFLAGS_REMOVE_mem_encrypt_identity.o	= -pg
 CFLAGS_REMOVE_pgprot.o			= -pg
 endif
 
@@ -32,7 +28,6 @@ obj-y += pat/
 
 # Make sure __phys_addr has no stackprotector
 CFLAGS_physaddr.o		:= -fno-stack-protector
-CFLAGS_mem_encrypt_identity.o	:= -fno-stack-protector
 
 CFLAGS_fault.o := -I $(src)/../include/asm/trace
 
@@ -63,5 +58,4 @@ obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)	+= pti.o
 obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
 
-obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:24 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-19-ardb+git@google.com>
Subject: [PATCH v4 06/11] x86/boot: Drop RIP_REL_REF() uses from SME startup code
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

RIP_REL_REF() has no effect on code residing in arch/x86/boot/startup,
as it is built with -fPIC. So remove any occurrences from the SME
startup code.

Note that SME is the only caller of cc_set_mask() that requires this, so
drop it from there as well.

Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/startup/sme.c        | 11 +++++------
 arch/x86/include/asm/coco.h        |  2 +-
 arch/x86/include/asm/mem_encrypt.h |  2 +-
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 23d10cda5b58..5738b31c8e60 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -297,8 +297,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
	 * instrumentation or checking boot_cpu_data in the cc_platform_has()
	 * function.
	 */
-	if (!sme_get_me_mask() ||
-	    RIP_REL_REF(sev_status) & MSR_AMD64_SEV_ENABLED)
+	if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
 		return;
 
 	/*
@@ -524,7 +523,7 @@ void __head sme_enable(struct boot_params *bp)
 	me_mask = 1UL << (ebx & 0x3f);
 
 	/* Check the SEV MSR whether SEV or SME is enabled */
-	RIP_REL_REF(sev_status) = msr = __rdmsr(MSR_AMD64_SEV);
+	sev_status = msr = __rdmsr(MSR_AMD64_SEV);
 	feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
 
 	/*
@@ -560,8 +559,8 @@ void __head sme_enable(struct boot_params *bp)
 		return;
 	}
 
-	RIP_REL_REF(sme_me_mask)   = me_mask;
-	RIP_REL_REF(physical_mask) &= ~me_mask;
-	RIP_REL_REF(cc_vendor)     = CC_VENDOR_AMD;
+	sme_me_mask   = me_mask;
+	physical_mask &= ~me_mask;
+	cc_vendor = CC_VENDOR_AMD;
 	cc_set_mask(me_mask);
 }

diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index e7225452963f..e1dbf8df1b69 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -22,7 +22,7 @@ static inline u64 cc_get_mask(void)
 
 static inline void cc_set_mask(u64 mask)
 {
-	RIP_REL_REF(cc_mask) = mask;
+	cc_mask = mask;
 }
 
 u64 cc_mkenc(u64 val);

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 1530ee301dfe..ea6494628cb0 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -61,7 +61,7 @@ void __init sev_es_init_vc_handling(void);
 
 static inline u64 sme_get_me_mask(void)
 {
-	return RIP_REL_REF(sme_me_mask);
+	return sme_me_mask;
 }
 
 #define __bss_decrypted __section(".bss..decrypted")
-- 
2.49.0.504.g3bcea36a83-goog
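[Editor's note: for readers without the tree at hand, RIP_REL_REF() is a thin
wrapper around rip_rel_ptr() along the following lines -- a sketch from
memory, not a verbatim quote of <asm/asm.h>:]

	#define RIP_REL_REF(var)	(*(typeof(&(var)))rip_rel_ptr(&(var)))

[Under -fPIC the compiler already emits &(var) as a RIP-relative reference,
so the wrapper adds nothing there, which is why this patch can drop it from
the startup objects without changing the generated code.]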
From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:25 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-20-ardb+git@google.com>
Subject: [PATCH v4 07/11] x86/sev: Prepare for splitting off early SEV code
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

Prepare for splitting off parts of the SEV core.c source file into a
file that carries code that must tolerate being called from the early
1:1 mapping. This will allow special build-time handling of this code,
to ensure that it gets generated in a way that is compatible with the
early execution context.
So create a de facto internal SEV API and put the definitions into
sev-internal.h. No attempt is made to allow this header file to be
included in arbitrary other sources - this is explicitly not the
intent.

Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/compressed/sev.c      |  15 ++-
 arch/x86/coco/sev/core.c            | 108 +++-------------
 arch/x86/coco/sev/shared.c          |  64 ++--------
 arch/x86/include/asm/sev-internal.h | 122 ++++++++++++++++++++
 arch/x86/include/asm/sev.h          |  37 ++++++
 5 files changed, 194 insertions(+), 152 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 89ba168f4f0f..478eca4f7180 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -126,20 +126,25 @@ static bool fault_in_kernel_space(unsigned long address)
 #include "../../lib/inat.c"
 #include "../../lib/insn.c"
 
-/* Include code for early handlers */
-#include "../../coco/sev/shared.c"
+extern struct svsm_ca *boot_svsm_caa;
+extern u64 boot_svsm_caa_pa;
 
-static struct svsm_ca *svsm_get_caa(void)
+struct svsm_ca *svsm_get_caa(void)
 {
 	return boot_svsm_caa;
 }
 
-static u64 svsm_get_caa_pa(void)
+u64 svsm_get_caa_pa(void)
 {
 	return boot_svsm_caa_pa;
 }
 
-static int svsm_perform_call_protocol(struct svsm_call *call)
+int svsm_perform_call_protocol(struct svsm_call *call);
+
+/* Include code for early handlers */
+#include "../../coco/sev/shared.c"
+
+int svsm_perform_call_protocol(struct svsm_call *call)
 {
 	struct ghcb *ghcb;
 	int ret;

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 832f7a7b10b2..aeb7731862c0 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -44,8 +45,6 @@
 #include
 #include
 
-#define DR7_RESET_VALUE        0x400
-
 /* AP INIT values as documented in the APM2 section "Processor Initialization State" */
 #define AP_INIT_CS_LIMIT	0xffff
 #define AP_INIT_DS_LIMIT	0xffff
@@ -82,16 +81,16 @@ static const char * const sev_status_feat_names[] = {
 };
 
 /* For early boot hypervisor communication in SEV-ES enabled guests */
-static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
+struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
 
 /*
  * Needs to be in the .data section because we need it NULL before bss is
  * cleared
  */
-static struct ghcb *boot_ghcb __section(".data");
+struct ghcb *boot_ghcb __section(".data");
 
 /* Bitmap of SEV features supported by the hypervisor */
-static u64 sev_hv_features __ro_after_init;
+u64 sev_hv_features __ro_after_init;
 
 /* Secrets page physical address from the CC blob */
 static u64 secrets_pa __ro_after_init;
@@ -105,54 +104,14 @@ static u64 snp_tsc_scale __ro_after_init;
 static u64 snp_tsc_offset __ro_after_init;
 static u64 snp_tsc_freq_khz __ro_after_init;
 
-/* #VC handler runtime per-CPU data */
-struct sev_es_runtime_data {
-	struct ghcb ghcb_page;
-
-	/*
-	 * Reserve one page per CPU as backup storage for the unencrypted GHCB.
-	 * It is needed when an NMI happens while the #VC handler uses the real
-	 * GHCB, and the NMI handler itself is causing another #VC exception. In
-	 * that case the GHCB content of the first handler needs to be backed up
-	 * and restored.
-	 */
-	struct ghcb backup_ghcb;
-
-	/*
-	 * Mark the per-cpu GHCBs as in-use to detect nested #VC exceptions.
-	 * There is no need for it to be atomic, because nothing is written to
-	 * the GHCB between the read and the write of ghcb_active.
-	 * So it is safe to use it when a nested #VC exception happens before
-	 * the write.
-	 *
-	 * This is necessary for example in the #VC->NMI->#VC case when the NMI
-	 * happens while the first #VC handler uses the GHCB. When the NMI code
-	 * raises a second #VC handler it might overwrite the contents of the
-	 * GHCB written by the first handler. To avoid this the content of the
-	 * GHCB is saved and restored when the GHCB is detected to be in use
-	 * already.
-	 */
-	bool ghcb_active;
-	bool backup_ghcb_active;
-
-	/*
-	 * Cached DR7 value - write it on DR7 writes and return it on reads.
-	 * That value will never make it to the real hardware DR7 as debugging
-	 * is currently unsupported in SEV-ES guests.
-	 */
-	unsigned long dr7;
-};
-
-struct ghcb_state {
-	struct ghcb *ghcb;
-};
 
 /* For early boot SVSM communication */
-static struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
+struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
 
-static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
-static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
-static DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
-static DEFINE_PER_CPU(u64, svsm_caa_pa);
+DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
+DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
+DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
+DEFINE_PER_CPU(u64, svsm_caa_pa);
 
 static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
@@ -231,7 +190,7 @@ void noinstr __sev_es_ist_exit(void)
  *
  * Callers must disable local interrupts around it.
  */
-static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
@@ -274,21 +233,6 @@ static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
 	return ghcb;
 }
 
-static inline u64 sev_es_rd_ghcb_msr(void)
-{
-	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
-}
-
-static __always_inline void sev_es_wr_ghcb_msr(u64 val)
-{
-	u32 low, high;
-
-	low  = (u32)(val);
-	high = (u32)(val >> 32);
-
-	native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
-}
-
 static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
 				unsigned char *buffer)
 {
@@ -601,33 +545,7 @@ static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt)
 /* Include code shared with pre-decompression boot stage */
 #include "shared.c"
 
-static inline struct svsm_ca *svsm_get_caa(void)
-{
-	/*
-	 * Use rIP-relative references when called early in the boot. If
-	 * ->use_cas is set, then it is late in the boot and no need
-	 * to worry about rIP-relative references.
-	 */
-	if (RIP_REL_REF(sev_cfg).use_cas)
-		return this_cpu_read(svsm_caa);
-	else
-		return RIP_REL_REF(boot_svsm_caa);
-}
-
-static u64 svsm_get_caa_pa(void)
-{
-	/*
-	 * Use rIP-relative references when called early in the boot. If
-	 * ->use_cas is set, then it is late in the boot and no need
-	 * to worry about rIP-relative references.
-	 */
-	if (RIP_REL_REF(sev_cfg).use_cas)
-		return this_cpu_read(svsm_caa_pa);
-	else
-		return RIP_REL_REF(boot_svsm_caa_pa);
-}
-
-static noinstr void __sev_put_ghcb(struct ghcb_state *state)
+noinstr void __sev_put_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
@@ -652,7 +570,7 @@ static noinstr void __sev_put_ghcb(struct ghcb_state *state)
 	}
 }
 
-static int svsm_perform_call_protocol(struct svsm_call *call)
+int svsm_perform_call_protocol(struct svsm_call *call)
 {
 	struct ghcb_state state;
 	unsigned long flags;
@@ -761,7 +679,7 @@ static u64 __init get_jump_table_addr(void)
 	return ret;
 }
 
-static void __head
+void __head
 early_set_pages_state(unsigned long vaddr, unsigned long paddr,
 		      unsigned long npages, enum psc_op op)
 {

diff --git a/arch/x86/coco/sev/shared.c b/arch/x86/coco/sev/shared.c
index 04982d356803..a7c94020e384 100644
--- a/arch/x86/coco/sev/shared.c
+++ b/arch/x86/coco/sev/shared.c
@@ -38,12 +38,8 @@
  */
 u8 snp_vmpl __ro_after_init;
 EXPORT_SYMBOL_GPL(snp_vmpl);
-static struct svsm_ca *boot_svsm_caa __ro_after_init;
-static u64 boot_svsm_caa_pa __ro_after_init;
-
-static struct svsm_ca *svsm_get_caa(void);
-static u64 svsm_get_caa_pa(void);
-static int svsm_perform_call_protocol(struct svsm_call *call);
+struct svsm_ca *boot_svsm_caa __ro_after_init;
+u64 boot_svsm_caa_pa __ro_after_init;
 
 /* I/O parameters for CPUID-related helpers */
 struct cpuid_leaf {
@@ -55,36 +51,6 @@ struct cpuid_leaf {
 	u32 edx;
 };
 
-/*
- * Individual entries of the SNP CPUID table, as defined by the SNP
- * Firmware ABI, Revision 0.9, Section 7.1, Table 14.
- */
-struct snp_cpuid_fn {
-	u32 eax_in;
-	u32 ecx_in;
-	u64 xcr0_in;
-	u64 xss_in;
-	u32 eax;
-	u32 ebx;
-	u32 ecx;
-	u32 edx;
-	u64 __reserved;
-} __packed;
-
-/*
- * SNP CPUID table, as defined by the SNP Firmware ABI, Revision 0.9,
- * Section 8.14.2.6. Also noted there is the SNP firmware-enforced limit
- * of 64 entries per CPUID table.
- */
-#define SNP_CPUID_COUNT_MAX 64
-
-struct snp_cpuid_table {
-	u32 count;
-	u32 __reserved1;
-	u64 __reserved2;
-	struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
-} __packed;
-
 /*
  * Since feature negotiation related variables are set early in the boot
  * process they must reside in the .data section so as not to be zeroed
@@ -107,7 +73,7 @@ static u32 cpuid_std_range_max __ro_after_init;
 static u32 cpuid_hyp_range_max __ro_after_init;
 static u32 cpuid_ext_range_max __ro_after_init;
 
-static bool __init sev_es_check_cpu_features(void)
+bool __init sev_es_check_cpu_features(void)
 {
 	if (!has_cpuflag(X86_FEATURE_RDRAND)) {
 		error("RDRAND instruction not supported - no trusted source of randomness available\n");
@@ -117,7 +83,7 @@ static bool __init sev_es_check_cpu_features(void)
 	return true;
 }
 
-static void __head __noreturn
+void __head __noreturn
 sev_es_terminate(unsigned int set, unsigned int reason)
 {
 	u64 val = GHCB_MSR_TERM_REQ;
@@ -136,7 +102,7 @@ sev_es_terminate(unsigned int set, unsigned int reason)
 /*
  * The hypervisor features are available from GHCB version 2 onward.
  */
-static u64 get_hv_features(void)
+u64 get_hv_features(void)
 {
 	u64 val;
 
@@ -153,7 +119,7 @@ static u64 get_hv_features(void)
 	return GHCB_MSR_HV_FT_RESP_VAL(val);
 }
 
-static void snp_register_ghcb_early(unsigned long paddr)
+void snp_register_ghcb_early(unsigned long paddr)
 {
 	unsigned long pfn = paddr >> PAGE_SHIFT;
 	u64 val;
@@ -169,7 +135,7 @@ static void snp_register_ghcb_early(unsigned long paddr)
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
 }
 
-static bool sev_es_negotiate_protocol(void)
+bool sev_es_negotiate_protocol(void)
 {
 	u64 val;
 
@@ -190,12 +156,6 @@ static bool sev_es_negotiate_protocol(void)
 	return true;
 }
 
-static __always_inline void vc_ghcb_invalidate(struct ghcb *ghcb)
-{
-	ghcb->save.sw_exit_code = 0;
-	__builtin_memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
-}
-
 static bool vc_decoding_needed(unsigned long exit_code)
 {
 	/* Exceptions don't require to decode the instruction */
@@ -371,10 +331,10 @@ static int svsm_perform_ghcb_protocol(struct ghcb *ghcb, struct svsm_call *call)
 	return svsm_process_result_codes(call);
 }
 
-static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
-					  struct es_em_ctxt *ctxt,
-					  u64 exit_code, u64 exit_info_1,
-					  u64 exit_info_2)
+enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+				   struct es_em_ctxt *ctxt,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2)
 {
 	/* Fill in protocol and format specifiers */
 	ghcb->protocol_version = ghcb_version;
@@ -473,7 +433,7 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
  * while running with the initial identity mapping as well as the
 * switch-over to kernel virtual addresses later.
 */
-static const struct snp_cpuid_table *snp_cpuid_get_table(void)
+const struct snp_cpuid_table *snp_cpuid_get_table(void)
 {
 	return rip_rel_ptr(&cpuid_table_copy);
 }

diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
new file mode 100644
index 000000000000..73cb774c3639
--- /dev/null
+++ b/arch/x86/include/asm/sev-internal.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#define DR7_RESET_VALUE        0x400
+
+extern struct ghcb boot_ghcb_page;
+extern struct ghcb *boot_ghcb;
+extern u64 sev_hv_features;
+
+/* #VC handler runtime per-CPU data */
+struct sev_es_runtime_data {
+	struct ghcb ghcb_page;
+
+	/*
+	 * Reserve one page per CPU as backup storage for the unencrypted GHCB.
+	 * It is needed when an NMI happens while the #VC handler uses the real
+	 * GHCB, and the NMI handler itself is causing another #VC exception. In
+	 * that case the GHCB content of the first handler needs to be backed up
+	 * and restored.
+	 */
+	struct ghcb backup_ghcb;
+
+	/*
+	 * Mark the per-cpu GHCBs as in-use to detect nested #VC exceptions.
+	 * There is no need for it to be atomic, because nothing is written to
+	 * the GHCB between the read and the write of ghcb_active. So it is safe
+	 * to use it when a nested #VC exception happens before the write.
+	 *
+	 * This is necessary for example in the #VC->NMI->#VC case when the NMI
+	 * happens while the first #VC handler uses the GHCB. When the NMI code
+	 * raises a second #VC handler it might overwrite the contents of the
+	 * GHCB written by the first handler. To avoid this the content of the
+	 * GHCB is saved and restored when the GHCB is detected to be in use
+	 * already.
+	 */
+	bool ghcb_active;
+	bool backup_ghcb_active;
+
+	/*
+	 * Cached DR7 value - write it on DR7 writes and return it on reads.
+	 * That value will never make it to the real hardware DR7 as debugging
+	 * is currently unsupported in SEV-ES guests.
+	 */
+	unsigned long dr7;
+};
+
+struct ghcb_state {
+	struct ghcb *ghcb;
+};
+
+extern struct svsm_ca boot_svsm_ca_page;
+
+struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
+void __sev_put_ghcb(struct ghcb_state *state);
+
+DECLARE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
+DECLARE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
+
+void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
+			   unsigned long npages, enum psc_op op);
+
+void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
+
+DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
+DECLARE_PER_CPU(u64, svsm_caa_pa);
+
+extern struct svsm_ca *boot_svsm_caa;
+extern u64 boot_svsm_caa_pa;
+
+static __always_inline struct svsm_ca *svsm_get_caa(void)
+{
+	/*
+	 * Use rIP-relative references when called early in the boot. If
+	 * ->use_cas is set, then it is late in the boot and no need
+	 * to worry about rIP-relative references.
+	 */
+	if (RIP_REL_REF(sev_cfg).use_cas)
+		return this_cpu_read(svsm_caa);
+	else
+		return RIP_REL_REF(boot_svsm_caa);
+}
+
+static __always_inline u64 svsm_get_caa_pa(void)
+{
+	/*
+	 * Use rIP-relative references when called early in the boot. If
+	 * ->use_cas is set, then it is late in the boot and no need
+	 * to worry about rIP-relative references.
+	 */
+	if (RIP_REL_REF(sev_cfg).use_cas)
+		return this_cpu_read(svsm_caa_pa);
+	else
+		return RIP_REL_REF(boot_svsm_caa_pa);
+}
+
+int svsm_perform_call_protocol(struct svsm_call *call);
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static __always_inline void sev_es_wr_ghcb_msr(u64 val)
+{
+	u32 low, high;
+
+	low  = (u32)(val);
+	high = (u32)(val >> 32);
+
+	native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+				   struct es_em_ctxt *ctxt,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2);
+
+void snp_register_ghcb_early(unsigned long paddr);
+bool sev_es_negotiate_protocol(void);
+bool sev_es_check_cpu_features(void);
+u64 get_hv_features(void);
+
+const struct snp_cpuid_table *snp_cpuid_get_table(void);

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ba7999f66abe..a8661dfc9a9a 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 
 #define GHCB_PROTOCOL_MIN	1ULL
 #define GHCB_PROTOCOL_MAX	2ULL
@@ -83,6 +84,36 @@ extern void vc_no_ghcb(void);
 extern void vc_boot_ghcb(void);
 extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
+/*
+ * Individual entries of the SNP CPUID table, as defined by the SNP
+ * Firmware ABI, Revision 0.9, Section 7.1, Table 14.
+ */
+struct snp_cpuid_fn {
+	u32 eax_in;
+	u32 ecx_in;
+	u64 xcr0_in;
+	u64 xss_in;
+	u32 eax;
+	u32 ebx;
+	u32 ecx;
+	u32 edx;
+	u64 __reserved;
+} __packed;
+
+/*
+ * SNP CPUID table, as defined by the SNP Firmware ABI, Revision 0.9,
+ * Section 8.14.2.6. Also noted there is the SNP firmware-enforced limit
+ * of 64 entries per CPUID table.
+ */
+#define SNP_CPUID_COUNT_MAX 64
+
+struct snp_cpuid_table {
+	u32 count;
+	u32 __reserved1;
+	u64 __reserved2;
+	struct snp_cpuid_fn fn[SNP_CPUID_COUNT_MAX];
+} __packed;
+
 /* PVALIDATE return codes */
 #define PVALIDATE_FAIL_SIZEMISMATCH	6
 
@@ -484,6 +515,12 @@ int snp_send_guest_request(struct snp_msg_desc *mdesc, struct snp_guest_req *req
 void __init snp_secure_tsc_prepare(void);
 void __init snp_secure_tsc_init(void);
 
+static __always_inline void vc_ghcb_invalidate(struct ghcb *ghcb)
+{
+	ghcb->save.sw_exit_code = 0;
+	__builtin_memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
+}
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 #define snp_vmpl 0
-- 
2.49.0.504.g3bcea36a83-goog
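[Editor's note: a minimal usage sketch of the GHCB get/put pairing that
sev-internal.h now exposes, modelled on get_jump_table_addr() in the patch
above; the function name and the elided GHCB setup are illustrative only.]

	static u64 example_vmgexit(void)
	{
		struct ghcb_state state;
		unsigned long flags;
		struct ghcb *ghcb;
		u64 ret = 0;

		local_irq_save(flags);		/* callers must not be interrupted */
		ghcb = __sev_get_ghcb(&state);	/* may hand out the backup GHCB */

		vc_ghcb_invalidate(ghcb);	/* start from a clean GHCB */
		/* ... set exit code/info fields, then: */
		sev_es_wr_ghcb_msr(__pa(ghcb));
		VMGEXIT();

		__sev_put_ghcb(&state);		/* restores the backup if one was taken */
		local_irq_restore(flags);

		return ret;
	}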
From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:26 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-21-ardb+git@google.com>
Subject: [PATCH v4 08/11] x86/sev: Split off startup code from core code
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

Disentangle the SEV core code and the SEV code that is called during
early boot. The latter piece will be moved into startup/ in a
subsequent patch.
Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/compressed/sev.c |    2 +
 arch/x86/coco/sev/Makefile     |   12 +-
 arch/x86/coco/sev/core.c       | 1574 ++++----------------
 arch/x86/coco/sev/shared.c     |  281 ----
 arch/x86/coco/sev/startup.c    | 1395 +++++++++++++++++
 5 files changed, 1658 insertions(+), 1606 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 478eca4f7180..714e30c66eae 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -141,6 +141,8 @@ u64 svsm_get_caa_pa(void)
 
 int svsm_perform_call_protocol(struct svsm_call *call);
 
+u8 snp_vmpl;
+
 /* Include code for early handlers */
 #include "../../coco/sev/shared.c"
 
diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
index dcb06dc8b5ae..7d7d2aee62f0 100644
--- a/arch/x86/coco/sev/Makefile
+++ b/arch/x86/coco/sev/Makefile
@@ -1,18 +1,18 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-y += core.o
+obj-y += core.o startup.o
 
 # jump tables are emitted using absolute references in non-PIC code
 # so they cannot be used in the early SEV startup code
-CFLAGS_core.o += -fno-jump-tables
+CFLAGS_startup.o += -fno-jump-tables
 
 ifdef CONFIG_FUNCTION_TRACER
-CFLAGS_REMOVE_core.o = -pg
+CFLAGS_REMOVE_startup.o = -pg
 endif
 
-KASAN_SANITIZE_core.o		:= n
-KMSAN_SANITIZE_core.o		:= n
-KCOV_INSTRUMENT_core.o		:= n
+KASAN_SANITIZE_startup.o	:= n
+KMSAN_SANITIZE_startup.o	:= n
+KCOV_INSTRUMENT_startup.o	:= n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.

diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index aeb7731862c0..26e3cf28c4c1 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -80,18 +80,6 @@ static const char * const sev_status_feat_names[] = {
 	[MSR_AMD64_SNP_SMT_PROT_BIT]	= "SMTProt",
 };
 
-/* For early boot hypervisor communication in SEV-ES enabled guests */
-struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
-
-/*
- * Needs to be in the .data section because we need it NULL before bss is
- * cleared
- */
-struct ghcb *boot_ghcb __section(".data");
-
-/* Bitmap of SEV features supported by the hypervisor */
-u64 sev_hv_features __ro_after_init;
-
 /* Secrets page physical address from the CC blob */
 static u64 secrets_pa __ro_after_init;
 
@@ -104,14 +92,16 @@ static u64 snp_tsc_scale __ro_after_init;
 static u64 snp_tsc_offset __ro_after_init;
 static u64 snp_tsc_freq_khz __ro_after_init;
 
-
-/* For early boot SVSM communication */
-struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
-
 DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
 DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
-DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
-DEFINE_PER_CPU(u64, svsm_caa_pa);
+
+/*
+ * SVSM related information:
+ *   When running under an SVSM, the VMPL that Linux is executing at must be
+ *   non-zero. The VMPL is therefore used to indicate the presence of an SVSM.
+ */
+u8 snp_vmpl __ro_after_init;
+EXPORT_SYMBOL_GPL(snp_vmpl);
 
 static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
@@ -128,6 +118,7 @@ static __always_inline bool on_vc_stack(struct pt_regs *regs)
 	return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < __this_cpu_ist_top_va(VC)));
 }
 
+
 /*
  * This function handles the case when an NMI is raised in the #VC
  * exception handler entry code, before the #VC handler has switched off
@@ -184,397 +175,203 @@ void noinstr __sev_es_ist_exit(void)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
 }
 
-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+static u64 __init get_snp_jump_table_addr(void)
 {
-	struct sev_es_runtime_data *data;
-	struct ghcb *ghcb;
-
-	WARN_ON(!irqs_disabled());
-
-	data = this_cpu_read(runtime_data);
-	ghcb = &data->ghcb_page;
-
-	if (unlikely(data->ghcb_active)) {
-		/* GHCB is already in use - save its contents */
-
-		if (unlikely(data->backup_ghcb_active)) {
-			/*
-			 * Backup-GHCB is also already in use. There is no way
-			 * to continue here so just kill the machine. To make
-			 * panic() work, mark GHCBs inactive so that messages
-			 * can be printed out.
-			 */
-			data->ghcb_active = false;
-			data->backup_ghcb_active = false;
-
-			instrumentation_begin();
-			panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
-			instrumentation_end();
-		}
-
-		/* Mark backup_ghcb active before writing to it */
-		data->backup_ghcb_active = true;
-
-		state->ghcb = &data->backup_ghcb;
+	struct snp_secrets_page *secrets;
+	void __iomem *mem;
+	u64 addr;
 
-		/* Backup GHCB content */
-		*state->ghcb = *ghcb;
-	} else {
-		state->ghcb = NULL;
-		data->ghcb_active = true;
+	mem = ioremap_encrypted(secrets_pa, PAGE_SIZE);
+	if (!mem) {
+		pr_err("Unable to locate AP jump table address: failed to map the SNP secrets page.\n");
+		return 0;
 	}
 
-	return ghcb;
-}
+	secrets = (__force struct snp_secrets_page *)mem;
 
-static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
-				unsigned char *buffer)
-{
-	return copy_from_kernel_nofault(buffer, (unsigned char *)ctxt->regs->ip, MAX_INSN_SIZE);
+	addr = secrets->os_area.ap_jump_table_pa;
+	iounmap(mem);
+
+	return addr;
 }
 
-static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt)
+void noinstr __sev_es_nmi_complete(void)
 {
-	char buffer[MAX_INSN_SIZE];
-	int insn_bytes;
-
-	insn_bytes = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
-	if (insn_bytes == 0) {
-		/* Nothing could be copied */
-		ctxt->fi.vector = X86_TRAP_PF;
-		ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
-		ctxt->fi.cr2 = ctxt->regs->ip;
-		return ES_EXCEPTION;
-	} else if (insn_bytes == -EINVAL) {
-		/* Effective RIP could not be calculated */
-		ctxt->fi.vector = X86_TRAP_GP;
-		ctxt->fi.error_code = 0;
-		ctxt->fi.cr2 = 0;
-		return ES_EXCEPTION;
-	}
-
-	if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes))
-		return ES_DECODE_FAILED;
+	struct ghcb_state state;
+	struct ghcb *ghcb;
 
-	if (ctxt->insn.immediate.got)
-		return ES_OK;
-	else
-		return ES_DECODE_FAILED;
-}
+	ghcb = __sev_get_ghcb(&state);
 
-static enum es_result __vc_decode_kern_insn(struct es_em_ctxt *ctxt)
-{
-	char buffer[MAX_INSN_SIZE];
-	int res, ret;
+	vc_ghcb_invalidate(ghcb);
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
+	ghcb_set_sw_exit_info_1(ghcb, 0);
+	ghcb_set_sw_exit_info_2(ghcb, 0);
 
-	res = vc_fetch_insn_kernel(ctxt, buffer);
-	if (res) {
-		ctxt->fi.vector = X86_TRAP_PF;
-		ctxt->fi.error_code = X86_PF_INSTR;
-		ctxt->fi.cr2 = ctxt->regs->ip;
-		return ES_EXCEPTION;
-	}
+	sev_es_wr_ghcb_msr(__pa_nodebug(ghcb));
+	VMGEXIT();
 
-	ret = insn_decode(&ctxt->insn, buffer, MAX_INSN_SIZE, INSN_MODE_64);
-	if (ret < 0)
-		return ES_DECODE_FAILED;
-	else
-		return ES_OK;
+	__sev_put_ghcb(&state);
 }
 
-static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
+static u64 __init get_jump_table_addr(void)
 {
-	if (user_mode(ctxt->regs))
-		return __vc_decode_user_insn(ctxt);
-	else
-		return __vc_decode_kern_insn(ctxt);
-}
+	struct ghcb_state state;
+	unsigned long flags;
+	struct ghcb *ghcb;
+	u64 ret = 0;
 
-static enum es_result vc_write_mem(struct es_em_ctxt *ctxt,
-				   char *dst, char *buf, size_t size)
-{
-	unsigned long error_code = X86_PF_PROT | X86_PF_WRITE;
+	if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+		return get_snp_jump_table_addr();
 
-	/*
-	 * This function uses __put_user() independent of whether kernel or user
-	 * memory is accessed. This works fine because __put_user() does no
-	 * sanity checks of the pointer being accessed. All that it does is
-	 * to report when the access failed.
-	 *
-	 * Also, this function runs in atomic context, so __put_user() is not
-	 * allowed to sleep. The page-fault handler detects that it is running
-	 * in atomic context and will not try to take mmap_sem and handle the
-	 * fault, so additional pagefault_enable()/disable() calls are not
-	 * needed.
-	 *
-	 * The access can't be done via copy_to_user() here because
-	 * vc_write_mem() must not use string instructions to access unsafe
-	 * memory. The reason is that MOVS is emulated by the #VC handler by
-	 * splitting the move up into a read and a write and taking a nested #VC
-	 * exception on whatever of them is the MMIO access. Using string
-	 * instructions here would cause infinite nesting.
-	 */
-	switch (size) {
-	case 1: {
-		u8 d1;
-		u8 __user *target = (u8 __user *)dst;
-
-		memcpy(&d1, buf, 1);
-		if (__put_user(d1, target))
-			goto fault;
-		break;
-	}
-	case 2: {
-		u16 d2;
-		u16 __user *target = (u16 __user *)dst;
+	local_irq_save(flags);
 
-		memcpy(&d2, buf, 2);
-		if (__put_user(d2, target))
-			goto fault;
-		break;
-	}
-	case 4: {
-		u32 d4;
-		u32 __user *target = (u32 __user *)dst;
+	ghcb = __sev_get_ghcb(&state);
 
-		memcpy(&d4, buf, 4);
-		if (__put_user(d4, target))
-			goto fault;
-		break;
-	}
-	case 8: {
-		u64 d8;
-		u64 __user *target = (u64 __user *)dst;
+	vc_ghcb_invalidate(ghcb);
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_JUMP_TABLE);
+	ghcb_set_sw_exit_info_1(ghcb, SVM_VMGEXIT_GET_AP_JUMP_TABLE);
+	ghcb_set_sw_exit_info_2(ghcb, 0);
 
-		memcpy(&d8, buf, 8);
-		if (__put_user(d8, target))
-			goto fault;
-		break;
-	}
-	default:
-		WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size);
-		return ES_UNSUPPORTED;
-	}
+	sev_es_wr_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
 
-	return ES_OK;
+	if (ghcb_sw_exit_info_1_is_valid(ghcb) &&
+	    ghcb_sw_exit_info_2_is_valid(ghcb))
+		ret = ghcb->save.sw_exit_info_2;
 
-fault:
-	if (user_mode(ctxt->regs))
-		error_code |= X86_PF_USER;
+	__sev_put_ghcb(&state);
 
-	ctxt->fi.vector = X86_TRAP_PF;
-	ctxt->fi.error_code = error_code;
-	ctxt->fi.cr2 = (unsigned long)dst;
+	local_irq_restore(flags);
 
-	return ES_EXCEPTION;
+	return ret;
 }
 
-static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
-				  char *src, char *buf, size_t size)
+static inline void __pval_terminate(u64 pfn, bool action, unsigned int page_size,
+				    int ret, u64 svsm_ret)
 {
-	unsigned long error_code = X86_PF_PROT;
-
-	/*
-	 * This function uses __get_user() independent of whether kernel or user
-	 * memory is accessed. This works fine because __get_user() does no
-	 * sanity checks of the pointer being accessed. All that it does is
-	 * to report when the access failed.
-	 *
-	 * Also, this function runs in atomic context, so __get_user() is not
-	 * allowed to sleep. The page-fault handler detects that it is running
-	 * in atomic context and will not try to take mmap_sem and handle the
-	 * fault, so additional pagefault_enable()/disable() calls are not
-	 * needed.
-	 *
-	 * The access can't be done via copy_from_user() here because
-	 * vc_read_mem() must not use string instructions to access unsafe
-	 * memory. The reason is that MOVS is emulated by the #VC handler by
-	 * splitting the move up into a read and a write and taking a nested #VC
-	 * exception on whatever of them is the MMIO access. Using string
-	 * instructions here would cause infinite nesting.
-	 */
-	switch (size) {
-	case 1: {
-		u8 d1;
-		u8 __user *s = (u8 __user *)src;
-
-		if (__get_user(d1, s))
-			goto fault;
-		memcpy(buf, &d1, 1);
-		break;
-	}
-	case 2: {
-		u16 d2;
-		u16 __user *s = (u16 __user *)src;
-
-		if (__get_user(d2, s))
-			goto fault;
-		memcpy(buf, &d2, 2);
-		break;
-	}
-	case 4: {
-		u32 d4;
-		u32 __user *s = (u32 __user *)src;
-
-		if (__get_user(d4, s))
-			goto fault;
-		memcpy(buf, &d4, 4);
-		break;
-	}
-	case 8: {
-		u64 d8;
-		u64 __user *s = (u64 __user *)src;
-		if (__get_user(d8, s))
-			goto fault;
-		memcpy(buf, &d8, 8);
-		break;
-	}
-	default:
-		WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size);
-		return ES_UNSUPPORTED;
-	}
+	WARN(1, "PVALIDATE failure: pfn: 0x%llx, action: %u, size: %u, ret: %d, svsm_ret: 0x%llx\n",
+	     pfn, action, page_size, ret, svsm_ret);
 
-	return ES_OK;
+	sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
+}
 
-fault:
-	if (user_mode(ctxt->regs))
-		error_code |= X86_PF_USER;
+static void svsm_pval_terminate(struct svsm_pvalidate_call *pc, int ret, u64 svsm_ret)
+{
+	unsigned int page_size;
+	bool action;
+	u64 pfn;
 
-	ctxt->fi.vector = X86_TRAP_PF;
-	ctxt->fi.error_code = error_code;
-	ctxt->fi.cr2 = (unsigned long)src;
+	pfn = pc->entry[pc->cur_index].pfn;
+	action = pc->entry[pc->cur_index].action;
+	page_size = pc->entry[pc->cur_index].page_size;
 
-	return ES_EXCEPTION;
+	__pval_terminate(pfn, action, page_size, ret, svsm_ret);
 }
 
-static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
-					   unsigned long vaddr, phys_addr_t *paddr)
+static void pval_pages(struct snp_psc_desc *desc)
 {
-	unsigned long va = (unsigned long)vaddr;
-	unsigned int level;
-	phys_addr_t pa;
-	pgd_t *pgd;
-	pte_t *pte;
-
-	pgd = __va(read_cr3_pa());
-	pgd = &pgd[pgd_index(va)];
-	pte = lookup_address_in_pgd(pgd, va, &level);
-	if (!pte) {
-		ctxt->fi.vector = X86_TRAP_PF;
-		ctxt->fi.cr2 = vaddr;
-		ctxt->fi.error_code = 0;
-
-		if (user_mode(ctxt->regs))
-			ctxt->fi.error_code |= X86_PF_USER;
+	struct psc_entry *e;
+	unsigned long vaddr;
+	unsigned int size;
+	unsigned int i;
+	bool validate;
+	u64 pfn;
+	int rc;
 
-		return ES_EXCEPTION;
-	}
+	for (i = 0; i <= desc->hdr.end_entry; i++) {
+		e = &desc->entries[i];
 
-	if (WARN_ON_ONCE(pte_val(*pte) & _PAGE_ENC))
-		/* Emulated MMIO to/from encrypted memory not supported */
-		return ES_UNSUPPORTED;
+		pfn = e->gfn;
+		vaddr = (unsigned long)pfn_to_kaddr(pfn);
+		size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
+		validate = e->operation == SNP_PAGE_STATE_PRIVATE;
 
-	pa = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
-	pa |= va & ~page_level_mask(level);
+		rc = pvalidate(vaddr, size, validate);
+		if (!rc)
+			continue;
 
-	*paddr = pa;
+		if (rc == PVALIDATE_FAIL_SIZEMISMATCH && size == RMP_PG_SIZE_2M) {
+			unsigned long vaddr_end = vaddr + PMD_SIZE;
 
-	return ES_OK;
+			for (; vaddr < vaddr_end; vaddr += PAGE_SIZE, pfn++) {
+				rc = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
+				if (rc)
+					__pval_terminate(pfn, validate, RMP_PG_SIZE_4K, rc, 0);
+			}
+		} else {
+			__pval_terminate(pfn, validate, size, rc, 0);
+		}
+	}
 }
 
-static enum es_result vc_ioio_check(struct es_em_ctxt *ctxt, u16 port, size_t size)
+static u64 svsm_build_ca_from_pfn_range(u64 pfn, u64 pfn_end, bool action,
					struct svsm_pvalidate_call *pc)
 {
-	BUG_ON(size > 4);
+	struct svsm_pvalidate_entry *pe;
 
-	if (user_mode(ctxt->regs)) {
-		struct thread_struct *t = &current->thread;
-		struct io_bitmap *iobm = t->io_bitmap;
-		size_t idx;
+	/* Nothing in the CA yet */
+	pc->num_entries = 0;
+	pc->cur_index   = 0;
 
-		if (!iobm)
-			goto fault;
+	pe = &pc->entry[0];
 
-		for (idx = port; idx < port + size; ++idx) {
-			if (test_bit(idx, iobm->bitmap))
-				goto fault;
-		}
-	}
+	while (pfn < pfn_end) {
+		pe->page_size = RMP_PG_SIZE_4K;
+		pe->action    = action;
+		pe->ignore_cf = 0;
+		pe->pfn       = pfn;
 
-	return ES_OK;
+		pe++;
+		pfn++;
 
-fault:
-	ctxt->fi.vector = X86_TRAP_GP;
-	ctxt->fi.error_code = 0;
+		pc->num_entries++;
+		if (pc->num_entries == SVSM_PVALIDATE_MAX_COUNT)
+			break;
+	}
 
-	return ES_EXCEPTION;
+	return pfn;
 }
 
-static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt)
+static int svsm_build_ca_from_psc_desc(struct snp_psc_desc *desc, unsigned int desc_entry,
				       struct svsm_pvalidate_call *pc)
 {
-	long error_code = ctxt->fi.error_code;
-	int trapnr = ctxt->fi.vector;
-
-	ctxt->regs->orig_ax = ctxt->fi.error_code;
-
-	switch (trapnr) {
-	case X86_TRAP_GP:
-		exc_general_protection(ctxt->regs, error_code);
-		break;
-	case X86_TRAP_UD:
-		exc_invalid_op(ctxt->regs);
-		break;
-	case X86_TRAP_PF:
-		write_cr2(ctxt->fi.cr2);
-		exc_page_fault(ctxt->regs, error_code);
-		break;
-	case X86_TRAP_AC:
-		exc_alignment_check(ctxt->regs, error_code);
-		break;
-	default:
-		pr_emerg("Unsupported exception in #VC instruction emulation - can't continue\n");
-		BUG();
-	}
-}
+	struct svsm_pvalidate_entry *pe;
+	struct psc_entry *e;
 
-/* Include code shared with pre-decompression boot stage */
-#include "shared.c"
+	/* Nothing in the CA yet */
+	pc->num_entries = 0;
+	pc->cur_index   = 0;
 
-noinstr void __sev_put_ghcb(struct ghcb_state *state)
-{
-	struct sev_es_runtime_data *data;
-	struct ghcb *ghcb;
+	pe = &pc->entry[0];
+	e  = &desc->entries[desc_entry];
 
-	WARN_ON(!irqs_disabled());
+	while (desc_entry <= desc->hdr.end_entry) {
+		pe->page_size = e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K;
+		pe->action    = e->operation == SNP_PAGE_STATE_PRIVATE;
+		pe->ignore_cf = 0;
+		pe->pfn       = e->gfn;
 
-	data = this_cpu_read(runtime_data);
-	ghcb = &data->ghcb_page;
+		pe++;
+		e++;
 
-	if (state->ghcb) {
-		/* Restore GHCB from Backup */
-		*ghcb = *state->ghcb;
-		data->backup_ghcb_active = false;
-		state->ghcb = NULL;
-	} else {
-		/*
-		 * Invalidate the GHCB so a VMGEXIT instruction issued
-		 * from userspace won't appear to be valid.
- */ - vc_ghcb_invalidate(ghcb); - data->ghcb_active =3D false; + desc_entry++; + pc->num_entries++; + if (pc->num_entries =3D=3D SVSM_PVALIDATE_MAX_COUNT) + break; } + + return desc_entry; } =20 -int svsm_perform_call_protocol(struct svsm_call *call) +static void svsm_pval_pages(struct snp_psc_desc *desc) { - struct ghcb_state state; + struct svsm_pvalidate_entry pv_4k[VMGEXIT_PSC_MAX_ENTRY]; + unsigned int i, pv_4k_count =3D 0; + struct svsm_pvalidate_call *pc; + struct svsm_call call =3D {}; unsigned long flags; - struct ghcb *ghcb; + bool action; + u64 pc_pa; int ret; =20 /* @@ -584,184 +381,149 @@ int svsm_perform_call_protocol(struct svsm_call *ca= ll) flags =3D native_local_irq_save(); =20 /* - * Use rip-relative references when called early in the boot. If - * ghcbs_initialized is set, then it is late in the boot and no need - * to worry about rip-relative references in called functions. + * The SVSM calling area (CA) can support processing 510 entries at a + * time. Loop through the Page State Change descriptor until the CA is + * full or the last entry in the descriptor is reached, at which time + * the SVSM is invoked. This repeats until all entries in the descriptor + * are processed. */ - if (RIP_REL_REF(sev_cfg).ghcbs_initialized) - ghcb =3D __sev_get_ghcb(&state); - else if (RIP_REL_REF(boot_ghcb)) - ghcb =3D RIP_REL_REF(boot_ghcb); - else - ghcb =3D NULL; + call.caa =3D svsm_get_caa(); =20 - do { - ret =3D ghcb ? svsm_perform_ghcb_protocol(ghcb, call) - : svsm_perform_msr_protocol(call); - } while (ret =3D=3D -EAGAIN); + pc =3D (struct svsm_pvalidate_call *)call.caa->svsm_buffer; + pc_pa =3D svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer); =20 - if (RIP_REL_REF(sev_cfg).ghcbs_initialized) - __sev_put_ghcb(&state); + /* Protocol 0, Call ID 1 */ + call.rax =3D SVSM_CORE_CALL(SVSM_CORE_PVALIDATE); + call.rcx =3D pc_pa; =20 - native_local_irq_restore(flags); + for (i =3D 0; i <=3D desc->hdr.end_entry;) { + i =3D svsm_build_ca_from_psc_desc(desc, i, pc); =20 - return ret; -} + do { + ret =3D svsm_perform_call_protocol(&call); + if (!ret) + continue; =20 -void noinstr __sev_es_nmi_complete(void) -{ - struct ghcb_state state; - struct ghcb *ghcb; + /* + * Check if the entry failed because of an RMP mismatch (a + * PVALIDATE at 2M was requested, but the page is mapped in + * the RMP as 4K). 
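As an illustration of the bookkeeping that follows (toy_* types invented here, array size arbitrary): parking a failed 2M entry for a later 4K pass and deciding whether the SVSM call should be resumed for the remaining entries.

#include <errno.h>

struct toy_entry { unsigned int page_size; unsigned long long pfn; };
struct toy_pc {
	unsigned int num_entries, cur_index;
	struct toy_entry entry[510];
};

/* Save the SIZEMISMATCH entry for 4K post-processing; return -EAGAIN
 * when entries remain so the caller re-issues the SVSM call. */
static int defer_sizemismatch(struct toy_pc *pc, struct toy_entry *pv_4k,
			      unsigned int *pv_4k_count)
{
	pv_4k[(*pv_4k_count)++] = pc->entry[pc->cur_index++];

	return pc->cur_index < pc->num_entries ? -EAGAIN : 0;
}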
+ */ =20 - ghcb =3D __sev_get_ghcb(&state); + if (call.rax_out =3D=3D SVSM_PVALIDATE_FAIL_SIZEMISMATCH && + pc->entry[pc->cur_index].page_size =3D=3D RMP_PG_SIZE_2M) { + /* Save this entry for post-processing at 4K */ + pv_4k[pv_4k_count++] =3D pc->entry[pc->cur_index]; + + /* Skip to the next one unless at the end of the list */ + pc->cur_index++; + if (pc->cur_index < pc->num_entries) + ret =3D -EAGAIN; + else + ret =3D 0; + } + } while (ret =3D=3D -EAGAIN); =20 - vc_ghcb_invalidate(ghcb); - ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE); - ghcb_set_sw_exit_info_1(ghcb, 0); - ghcb_set_sw_exit_info_2(ghcb, 0); + if (ret) + svsm_pval_terminate(pc, ret, call.rax_out); + } =20 - sev_es_wr_ghcb_msr(__pa_nodebug(ghcb)); - VMGEXIT(); + /* Process any entries that failed to be validated at 2M and validate the= m at 4K */ + for (i =3D 0; i < pv_4k_count; i++) { + u64 pfn, pfn_end; =20 - __sev_put_ghcb(&state); -} + action =3D pv_4k[i].action; + pfn =3D pv_4k[i].pfn; + pfn_end =3D pfn + 512; =20 -static u64 __init get_snp_jump_table_addr(void) -{ - struct snp_secrets_page *secrets; - void __iomem *mem; - u64 addr; + while (pfn < pfn_end) { + pfn =3D svsm_build_ca_from_pfn_range(pfn, pfn_end, action, pc); =20 - mem =3D ioremap_encrypted(secrets_pa, PAGE_SIZE); - if (!mem) { - pr_err("Unable to locate AP jump table address: failed to map the SNP se= crets page.\n"); - return 0; + ret =3D svsm_perform_call_protocol(&call); + if (ret) + svsm_pval_terminate(pc, ret, call.rax_out); + } } =20 - secrets =3D (__force struct snp_secrets_page *)mem; - - addr =3D secrets->os_area.ap_jump_table_pa; - iounmap(mem); - - return addr; + native_local_irq_restore(flags); } =20 -static u64 __init get_jump_table_addr(void) +static void pvalidate_pages(struct snp_psc_desc *desc) { - struct ghcb_state state; - unsigned long flags; - struct ghcb *ghcb; - u64 ret =3D 0; - - if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) - return get_snp_jump_table_addr(); - - local_irq_save(flags); + if (snp_vmpl) + svsm_pval_pages(desc); + else + pval_pages(desc); +} =20 - ghcb =3D __sev_get_ghcb(&state); +static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc) +{ + int cur_entry, end_entry, ret =3D 0; + struct snp_psc_desc *data; + struct es_em_ctxt ctxt; =20 vc_ghcb_invalidate(ghcb); - ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_JUMP_TABLE); - ghcb_set_sw_exit_info_1(ghcb, SVM_VMGEXIT_GET_AP_JUMP_TABLE); - ghcb_set_sw_exit_info_2(ghcb, 0); - - sev_es_wr_ghcb_msr(__pa(ghcb)); - VMGEXIT(); - - if (ghcb_sw_exit_info_1_is_valid(ghcb) && - ghcb_sw_exit_info_2_is_valid(ghcb)) - ret =3D ghcb->save.sw_exit_info_2; - - __sev_put_ghcb(&state); =20 - local_irq_restore(flags); - - return ret; -} + /* Copy the input desc into GHCB shared buffer */ + data =3D (struct snp_psc_desc *)ghcb->shared_buffer; + memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof= (*desc))); =20 -void __head -early_set_pages_state(unsigned long vaddr, unsigned long paddr, - unsigned long npages, enum psc_op op) -{ - unsigned long paddr_end; - u64 val; - - vaddr =3D vaddr & PAGE_MASK; + /* + * As per the GHCB specification, the hypervisor can resume the guest + * before processing all the entries. Check whether all the entries + * are processed. If not, then keep retrying. Note, the hypervisor + * will update the data memory directly to indicate the status, so + * reference the data->hdr everywhere. 
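For reference, the sanity rule applied after each PSC VMGEXIT further down can be stated as a small predicate. A sketch with invented toy_* names; saved_cur and saved_end are the header values snapshotted before the loop.

#include <stdbool.h>

struct toy_hdr { int cur_entry, end_entry, reserved; };

/* The header lives in shared memory that the hypervisor advances in
 * place, so progress must be monotonic and reserved bits must stay
 * clear; anything else is treated as a hostile hypervisor. */
static bool psc_progress_ok(const struct toy_hdr *hdr,
			    int saved_cur, int saved_end)
{
	if (hdr->reserved)
		return false;
	if (hdr->end_entry > saved_end || saved_cur > hdr->cur_entry)
		return false;
	return true;
}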
+ * + * The strategy here is to wait for the hypervisor to change the page + * state in the RMP table before guest accesses the memory pages. If the + * page state change was not successful, then later memory access will + * result in a crash. + */ + cur_entry =3D data->hdr.cur_entry; + end_entry =3D data->hdr.end_entry; =20 - paddr =3D paddr & PAGE_MASK; - paddr_end =3D paddr + (npages << PAGE_SHIFT); + while (data->hdr.cur_entry <=3D data->hdr.end_entry) { + ghcb_set_sw_scratch(ghcb, (u64)__pa(data)); =20 - while (paddr < paddr_end) { - /* Page validation must be rescinded before changing to shared */ - if (op =3D=3D SNP_PAGE_STATE_SHARED) - pvalidate_4k_page(vaddr, paddr, false); + /* This will advance the shared buffer data points to. */ + ret =3D sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0); =20 /* - * Use the MSR protocol because this function can be called before - * the GHCB is established. + * Page State Change VMGEXIT can pass error code through + * exit_info_2. */ - sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op)); - VMGEXIT(); - - val =3D sev_es_rd_ghcb_msr(); - - if (GHCB_RESP_CODE(val) !=3D GHCB_MSR_PSC_RESP) - goto e_term; - - if (GHCB_MSR_PSC_RESP_VAL(val)) - goto e_term; + if (WARN(ret || ghcb->save.sw_exit_info_2, + "SNP: PSC failed ret=3D%d exit_info_2=3D%llx\n", + ret, ghcb->save.sw_exit_info_2)) { + ret =3D 1; + goto out; + } =20 - /* Page validation must be performed after changing to private */ - if (op =3D=3D SNP_PAGE_STATE_PRIVATE) - pvalidate_4k_page(vaddr, paddr, true); + /* Verify that reserved bit is not set */ + if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n"))= { + ret =3D 1; + goto out; + } =20 - vaddr +=3D PAGE_SIZE; - paddr +=3D PAGE_SIZE; + /* + * Sanity check that entry processing is not going backwards. + * This will happen only if hypervisor is tricking us. + */ + if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_en= try, +"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (g= ot %d)\n", + end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) { + ret =3D 1; + goto out; + } } =20 - return; - -e_term: - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); +out: + return ret; } =20 -void __head early_snp_set_memory_private(unsigned long vaddr, unsigned lon= g paddr, - unsigned long npages) -{ - /* - * This can be invoked in early boot while running identity mapped, so - * use an open coded check for SNP instead of using cc_platform_has(). - * This eliminates worries about jump tables or checking boot_cpu_data - * in the cc_platform_has() function. - */ - if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED)) - return; - - /* - * Ask the hypervisor to mark the memory pages as private in the RMP - * table. - */ - early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE); -} - -void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long= paddr, - unsigned long npages) -{ - /* - * This can be invoked in early boot while running identity mapped, so - * use an open coded check for SNP instead of using cc_platform_has(). - * This eliminates worries about jump tables or checking boot_cpu_data - * in the cc_platform_has() function. - */ - if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED)) - return; - - /* Ask hypervisor to mark the memory pages shared in the RMP table. 
 */
-	early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
-}
-
-static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
-				       unsigned long vaddr_end, int op)
+static unsigned long __set_pages_state(struct snp_psc_desc *data, unsigned long vaddr,
+				       unsigned long vaddr_end, int op)
 {
 	struct ghcb_state state;
 	bool use_large_entry;
@@ -1335,90 +1097,6 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
 	return 0;
 }
 
-/* Writes to the SVSM CAA MSR are ignored */
-static enum es_result __vc_handle_msr_caa(struct pt_regs *regs, bool write)
-{
-	if (write)
-		return ES_OK;
-
-	regs->ax = lower_32_bits(this_cpu_read(svsm_caa_pa));
-	regs->dx = upper_32_bits(this_cpu_read(svsm_caa_pa));
-
-	return ES_OK;
-}
-
-/*
- * TSC related accesses should not exit to the hypervisor when a guest is
- * executing with Secure TSC enabled, so special handling is required for
- * accesses of MSR_IA32_TSC and MSR_AMD64_GUEST_TSC_FREQ.
- */
-static enum es_result __vc_handle_secure_tsc_msrs(struct pt_regs *regs, bool write)
-{
-	u64 tsc;
-
-	/*
-	 * GUEST_TSC_FREQ should not be intercepted when Secure TSC is enabled.
-	 * Terminate the SNP guest when the interception is enabled.
-	 */
-	if (regs->cx == MSR_AMD64_GUEST_TSC_FREQ)
-		return ES_VMM_ERROR;
-
-	/*
-	 * Writes: Writing to MSR_IA32_TSC can cause subsequent reads of the TSC
-	 * to return undefined values, so ignore all writes.
-	 *
-	 * Reads: Reads of MSR_IA32_TSC should return the current TSC value, use
-	 * the value returned by rdtsc_ordered().
-	 */
-	if (write) {
-		WARN_ONCE(1, "TSC MSR writes are verboten!\n");
-		return ES_OK;
-	}
-
-	tsc = rdtsc_ordered();
-	regs->ax = lower_32_bits(tsc);
-	regs->dx = upper_32_bits(tsc);
-
-	return ES_OK;
-}
-
-static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
-{
-	struct pt_regs *regs = ctxt->regs;
-	enum es_result ret;
-	bool write;
-
-	/* Is it a WRMSR?
*/ - write =3D ctxt->insn.opcode.bytes[1] =3D=3D 0x30; - - switch (regs->cx) { - case MSR_SVSM_CAA: - return __vc_handle_msr_caa(regs, write); - case MSR_IA32_TSC: - case MSR_AMD64_GUEST_TSC_FREQ: - if (sev_status & MSR_AMD64_SNP_SECURE_TSC) - return __vc_handle_secure_tsc_msrs(regs, write); - break; - default: - break; - } - - ghcb_set_rcx(ghcb, regs->cx); - if (write) { - ghcb_set_rax(ghcb, regs->ax); - ghcb_set_rdx(ghcb, regs->dx); - } - - ret =3D sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, write, 0); - - if ((ret =3D=3D ES_OK) && !write) { - regs->ax =3D ghcb->save.rax; - regs->dx =3D ghcb->save.rdx; - } - - return ret; -} - static void snp_register_per_cpu_ghcb(void) { struct sev_es_runtime_data *data; @@ -1631,748 +1309,6 @@ void __init sev_es_init_vc_handling(void) initial_vc_handler =3D (unsigned long)kernel_exc_vmm_communication; } =20 -static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt) -{ - int trapnr =3D ctxt->fi.vector; - - if (trapnr =3D=3D X86_TRAP_PF) - native_write_cr2(ctxt->fi.cr2); - - ctxt->regs->orig_ax =3D ctxt->fi.error_code; - do_early_exception(ctxt->regs, trapnr); -} - -static long *vc_insn_get_rm(struct es_em_ctxt *ctxt) -{ - long *reg_array; - int offset; - - reg_array =3D (long *)ctxt->regs; - offset =3D insn_get_modrm_rm_off(&ctxt->insn, ctxt->regs); - - if (offset < 0) - return NULL; - - offset /=3D sizeof(long); - - return reg_array + offset; -} -static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctx= t, - unsigned int bytes, bool read) -{ - u64 exit_code, exit_info_1, exit_info_2; - unsigned long ghcb_pa =3D __pa(ghcb); - enum es_result res; - phys_addr_t paddr; - void __user *ref; - - ref =3D insn_get_addr_ref(&ctxt->insn, ctxt->regs); - if (ref =3D=3D (void __user *)-1L) - return ES_UNSUPPORTED; - - exit_code =3D read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE; - - res =3D vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr); - if (res !=3D ES_OK) { - if (res =3D=3D ES_EXCEPTION && !read) - ctxt->fi.error_code |=3D X86_PF_WRITE; - - return res; - } - - exit_info_1 =3D paddr; - /* Can never be greater than 8 */ - exit_info_2 =3D bytes; - - ghcb_set_sw_scratch(ghcb, ghcb_pa + offsetof(struct ghcb, shared_buffer)); - - return sev_es_ghcb_hv_call(ghcb, ctxt, exit_code, exit_info_1, exit_info_= 2); -} - -/* - * The MOVS instruction has two memory operands, which raises the - * problem that it is not known whether the access to the source or the - * destination caused the #VC exception (and hence whether an MMIO read - * or write operation needs to be emulated). - * - * Instead of playing games with walking page-tables and trying to guess - * whether the source or destination is an MMIO range, split the move - * into two operations, a read and a write with only one memory operand. - * This will cause a nested #VC exception on the MMIO address which can - * then be handled. - * - * This implementation has the benefit that it also supports MOVS where - * source _and_ destination are MMIO regions. - * - * It will slow MOVS on MMIO down a lot, but in SEV-ES guests it is a - * rare operation. If it turns out to be a performance problem the split - * operations can be moved to memcpy_fromio() and memcpy_toio(). 
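A compact user-space restatement of the register bookkeeping in vc_handle_mmio_movs() below, once the split read and write have both succeeded. The toy_* names are invented; DF is EFLAGS bit 10.

#include <stdbool.h>

struct toy_regs {
	unsigned long flags, si, di, cx;
};

#define TOY_EFLAGS_DF (1UL << 10)

/* Returns true when the (REP) MOVS is complete, false when the
 * instruction must be restarted for the next iteration. */
static bool movs_advance(struct toy_regs *r, unsigned int bytes, bool rep)
{
	long off = (r->flags & TOY_EFLAGS_DF) ? -(long)bytes : (long)bytes;

	r->si += off;
	r->di += off;

	if (rep)
		r->cx--;

	return !rep || r->cx == 0;
}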
- */ -static enum es_result vc_handle_mmio_movs(struct es_em_ctxt *ctxt, - unsigned int bytes) -{ - unsigned long ds_base, es_base; - unsigned char *src, *dst; - unsigned char buffer[8]; - enum es_result ret; - bool rep; - int off; - - ds_base =3D insn_get_seg_base(ctxt->regs, INAT_SEG_REG_DS); - es_base =3D insn_get_seg_base(ctxt->regs, INAT_SEG_REG_ES); - - if (ds_base =3D=3D -1L || es_base =3D=3D -1L) { - ctxt->fi.vector =3D X86_TRAP_GP; - ctxt->fi.error_code =3D 0; - return ES_EXCEPTION; - } - - src =3D ds_base + (unsigned char *)ctxt->regs->si; - dst =3D es_base + (unsigned char *)ctxt->regs->di; - - ret =3D vc_read_mem(ctxt, src, buffer, bytes); - if (ret !=3D ES_OK) - return ret; - - ret =3D vc_write_mem(ctxt, dst, buffer, bytes); - if (ret !=3D ES_OK) - return ret; - - if (ctxt->regs->flags & X86_EFLAGS_DF) - off =3D -bytes; - else - off =3D bytes; - - ctxt->regs->si +=3D off; - ctxt->regs->di +=3D off; - - rep =3D insn_has_rep_prefix(&ctxt->insn); - if (rep) - ctxt->regs->cx -=3D 1; - - if (!rep || ctxt->regs->cx =3D=3D 0) - return ES_OK; - else - return ES_RETRY; -} - -static enum es_result vc_handle_mmio(struct ghcb *ghcb, struct es_em_ctxt = *ctxt) -{ - struct insn *insn =3D &ctxt->insn; - enum insn_mmio_type mmio; - unsigned int bytes =3D 0; - enum es_result ret; - u8 sign_byte; - long *reg_data; - - mmio =3D insn_decode_mmio(insn, &bytes); - if (mmio =3D=3D INSN_MMIO_DECODE_FAILED) - return ES_DECODE_FAILED; - - if (mmio !=3D INSN_MMIO_WRITE_IMM && mmio !=3D INSN_MMIO_MOVS) { - reg_data =3D insn_get_modrm_reg_ptr(insn, ctxt->regs); - if (!reg_data) - return ES_DECODE_FAILED; - } - - if (user_mode(ctxt->regs)) - return ES_UNSUPPORTED; - - switch (mmio) { - case INSN_MMIO_WRITE: - memcpy(ghcb->shared_buffer, reg_data, bytes); - ret =3D vc_do_mmio(ghcb, ctxt, bytes, false); - break; - case INSN_MMIO_WRITE_IMM: - memcpy(ghcb->shared_buffer, insn->immediate1.bytes, bytes); - ret =3D vc_do_mmio(ghcb, ctxt, bytes, false); - break; - case INSN_MMIO_READ: - ret =3D vc_do_mmio(ghcb, ctxt, bytes, true); - if (ret) - break; - - /* Zero-extend for 32-bit operation */ - if (bytes =3D=3D 4) - *reg_data =3D 0; - - memcpy(reg_data, ghcb->shared_buffer, bytes); - break; - case INSN_MMIO_READ_ZERO_EXTEND: - ret =3D vc_do_mmio(ghcb, ctxt, bytes, true); - if (ret) - break; - - /* Zero extend based on operand size */ - memset(reg_data, 0, insn->opnd_bytes); - memcpy(reg_data, ghcb->shared_buffer, bytes); - break; - case INSN_MMIO_READ_SIGN_EXTEND: - ret =3D vc_do_mmio(ghcb, ctxt, bytes, true); - if (ret) - break; - - if (bytes =3D=3D 1) { - u8 *val =3D (u8 *)ghcb->shared_buffer; - - sign_byte =3D (*val & 0x80) ? 0xff : 0x00; - } else { - u16 *val =3D (u16 *)ghcb->shared_buffer; - - sign_byte =3D (*val & 0x8000) ? 
0xff : 0x00; - } - - /* Sign extend based on operand size */ - memset(reg_data, sign_byte, insn->opnd_bytes); - memcpy(reg_data, ghcb->shared_buffer, bytes); - break; - case INSN_MMIO_MOVS: - ret =3D vc_handle_mmio_movs(ctxt, bytes); - break; - default: - ret =3D ES_UNSUPPORTED; - break; - } - - return ret; -} - -static enum es_result vc_handle_dr7_write(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - struct sev_es_runtime_data *data =3D this_cpu_read(runtime_data); - long val, *reg =3D vc_insn_get_rm(ctxt); - enum es_result ret; - - if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP) - return ES_VMM_ERROR; - - if (!reg) - return ES_DECODE_FAILED; - - val =3D *reg; - - /* Upper 32 bits must be written as zeroes */ - if (val >> 32) { - ctxt->fi.vector =3D X86_TRAP_GP; - ctxt->fi.error_code =3D 0; - return ES_EXCEPTION; - } - - /* Clear out other reserved bits and set bit 10 */ - val =3D (val & 0xffff23ffL) | BIT(10); - - /* Early non-zero writes to DR7 are not supported */ - if (!data && (val & ~DR7_RESET_VALUE)) - return ES_UNSUPPORTED; - - /* Using a value of 0 for ExitInfo1 means RAX holds the value */ - ghcb_set_rax(ghcb, val); - ret =3D sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WRITE_DR7, 0, 0); - if (ret !=3D ES_OK) - return ret; - - if (data) - data->dr7 =3D val; - - return ES_OK; -} - -static enum es_result vc_handle_dr7_read(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - struct sev_es_runtime_data *data =3D this_cpu_read(runtime_data); - long *reg =3D vc_insn_get_rm(ctxt); - - if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP) - return ES_VMM_ERROR; - - if (!reg) - return ES_DECODE_FAILED; - - if (data) - *reg =3D data->dr7; - else - *reg =3D DR7_RESET_VALUE; - - return ES_OK; -} - -static enum es_result vc_handle_wbinvd(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - return sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WBINVD, 0, 0); -} - -static enum es_result vc_handle_rdpmc(struct ghcb *ghcb, struct es_em_ctxt= *ctxt) -{ - enum es_result ret; - - ghcb_set_rcx(ghcb, ctxt->regs->cx); - - ret =3D sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_RDPMC, 0, 0); - if (ret !=3D ES_OK) - return ret; - - if (!(ghcb_rax_is_valid(ghcb) && ghcb_rdx_is_valid(ghcb))) - return ES_VMM_ERROR; - - ctxt->regs->ax =3D ghcb->save.rax; - ctxt->regs->dx =3D ghcb->save.rdx; - - return ES_OK; -} - -static enum es_result vc_handle_monitor(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - /* - * Treat it as a NOP and do not leak a physical address to the - * hypervisor. - */ - return ES_OK; -} - -static enum es_result vc_handle_mwait(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - /* Treat the same as MONITOR/MONITORX */ - return ES_OK; -} - -static enum es_result vc_handle_vmmcall(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - enum es_result ret; - - ghcb_set_rax(ghcb, ctxt->regs->ax); - ghcb_set_cpl(ghcb, user_mode(ctxt->regs) ? 3 : 0); - - if (x86_platform.hyper.sev_es_hcall_prepare) - x86_platform.hyper.sev_es_hcall_prepare(ghcb, ctxt->regs); - - ret =3D sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_VMMCALL, 0, 0); - if (ret !=3D ES_OK) - return ret; - - if (!ghcb_rax_is_valid(ghcb)) - return ES_VMM_ERROR; - - ctxt->regs->ax =3D ghcb->save.rax; - - /* - * Call sev_es_hcall_finish() after regs->ax is already set. - * This allows the hypervisor handler to overwrite it again if - * necessary. 
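Looking back at the DR7 handlers earlier in this hunk, their caching scheme reduces to the following sketch; a plain static stands in for the per-CPU runtime data, and the toy_* names are invented.

#include <stdbool.h>

#define TOY_DR7_RESET 0x400UL	/* bit 10 reads as one */

static unsigned long toy_dr7 = TOY_DR7_RESET;

/* Writes mask the reserved bits and remember the result. */
static bool toy_dr7_write(unsigned long val)
{
	if (val >> 32)
		return false;	/* upper 32 bits must be written as zeroes */

	toy_dr7 = (val & 0xffff23ffUL) | (1UL << 10);
	return true;
}

/* Reads are served from the cache, never from the hypervisor. */
static unsigned long toy_dr7_read(void)
{
	return toy_dr7;
}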
- */ - if (x86_platform.hyper.sev_es_hcall_finish && - !x86_platform.hyper.sev_es_hcall_finish(ghcb, ctxt->regs)) - return ES_VMM_ERROR; - - return ES_OK; -} - -static enum es_result vc_handle_trap_ac(struct ghcb *ghcb, - struct es_em_ctxt *ctxt) -{ - /* - * Calling ecx_alignment_check() directly does not work, because it - * enables IRQs and the GHCB is active. Forward the exception and call - * it later from vc_forward_exception(). - */ - ctxt->fi.vector =3D X86_TRAP_AC; - ctxt->fi.error_code =3D 0; - return ES_EXCEPTION; -} - -static enum es_result vc_handle_exitcode(struct es_em_ctxt *ctxt, - struct ghcb *ghcb, - unsigned long exit_code) -{ - enum es_result result =3D vc_check_opcode_bytes(ctxt, exit_code); - - if (result !=3D ES_OK) - return result; - - switch (exit_code) { - case SVM_EXIT_READ_DR7: - result =3D vc_handle_dr7_read(ghcb, ctxt); - break; - case SVM_EXIT_WRITE_DR7: - result =3D vc_handle_dr7_write(ghcb, ctxt); - break; - case SVM_EXIT_EXCP_BASE + X86_TRAP_AC: - result =3D vc_handle_trap_ac(ghcb, ctxt); - break; - case SVM_EXIT_RDTSC: - case SVM_EXIT_RDTSCP: - result =3D vc_handle_rdtsc(ghcb, ctxt, exit_code); - break; - case SVM_EXIT_RDPMC: - result =3D vc_handle_rdpmc(ghcb, ctxt); - break; - case SVM_EXIT_INVD: - pr_err_ratelimited("#VC exception for INVD??? Seriously???\n"); - result =3D ES_UNSUPPORTED; - break; - case SVM_EXIT_CPUID: - result =3D vc_handle_cpuid(ghcb, ctxt); - break; - case SVM_EXIT_IOIO: - result =3D vc_handle_ioio(ghcb, ctxt); - break; - case SVM_EXIT_MSR: - result =3D vc_handle_msr(ghcb, ctxt); - break; - case SVM_EXIT_VMMCALL: - result =3D vc_handle_vmmcall(ghcb, ctxt); - break; - case SVM_EXIT_WBINVD: - result =3D vc_handle_wbinvd(ghcb, ctxt); - break; - case SVM_EXIT_MONITOR: - result =3D vc_handle_monitor(ghcb, ctxt); - break; - case SVM_EXIT_MWAIT: - result =3D vc_handle_mwait(ghcb, ctxt); - break; - case SVM_EXIT_NPF: - result =3D vc_handle_mmio(ghcb, ctxt); - break; - default: - /* - * Unexpected #VC exception - */ - result =3D ES_UNSUPPORTED; - } - - return result; -} - -static __always_inline bool is_vc2_stack(unsigned long sp) -{ - return (sp >=3D __this_cpu_ist_bottom_va(VC2) && sp < __this_cpu_ist_top_= va(VC2)); -} - -static __always_inline bool vc_from_invalid_context(struct pt_regs *regs) -{ - unsigned long sp, prev_sp; - - sp =3D (unsigned long)regs; - prev_sp =3D regs->sp; - - /* - * If the code was already executing on the VC2 stack when the #VC - * happened, let it proceed to the normal handling routine. This way the - * code executing on the VC2 stack can cause #VC exceptions to get handle= d. 
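The stack check described above boils down to two interval tests; a standalone restatement with invented names:

#include <stdbool.h>

/* sp lies on the VC2 IST stack iff bottom <= sp < top. */
static bool on_vc2(unsigned long sp, unsigned long bot, unsigned long top)
{
	return sp >= bot && sp < top;
}

/* The context is unsupported when the handler's own frame sits on
 * the VC2 stack but the interrupted code was not already there. */
static bool from_invalid_context(unsigned long frame_sp,
				 unsigned long interrupted_sp,
				 unsigned long bot, unsigned long top)
{
	return on_vc2(frame_sp, bot, top) && !on_vc2(interrupted_sp, bot, top);
}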
- */ - return is_vc2_stack(sp) && !is_vc2_stack(prev_sp); -} - -static bool vc_raw_handle_exception(struct pt_regs *regs, unsigned long er= ror_code) -{ - struct ghcb_state state; - struct es_em_ctxt ctxt; - enum es_result result; - struct ghcb *ghcb; - bool ret =3D true; - - ghcb =3D __sev_get_ghcb(&state); - - vc_ghcb_invalidate(ghcb); - result =3D vc_init_em_ctxt(&ctxt, regs, error_code); - - if (result =3D=3D ES_OK) - result =3D vc_handle_exitcode(&ctxt, ghcb, error_code); - - __sev_put_ghcb(&state); - - /* Done - now check the result */ - switch (result) { - case ES_OK: - vc_finish_insn(&ctxt); - break; - case ES_UNSUPPORTED: - pr_err_ratelimited("Unsupported exit-code 0x%02lx in #VC exception (IP: = 0x%lx)\n", - error_code, regs->ip); - ret =3D false; - break; - case ES_VMM_ERROR: - pr_err_ratelimited("Failure in communication with VMM (exit-code 0x%02lx= IP: 0x%lx)\n", - error_code, regs->ip); - ret =3D false; - break; - case ES_DECODE_FAILED: - pr_err_ratelimited("Failed to decode instruction (exit-code 0x%02lx IP: = 0x%lx)\n", - error_code, regs->ip); - ret =3D false; - break; - case ES_EXCEPTION: - vc_forward_exception(&ctxt); - break; - case ES_RETRY: - /* Nothing to do */ - break; - default: - pr_emerg("Unknown result in %s():%d\n", __func__, result); - /* - * Emulating the instruction which caused the #VC exception - * failed - can't continue so print debug information - */ - BUG(); - } - - return ret; -} - -static __always_inline bool vc_is_db(unsigned long error_code) -{ - return error_code =3D=3D SVM_EXIT_EXCP_BASE + X86_TRAP_DB; -} - -/* - * Runtime #VC exception handler when raised from kernel mode. Runs in NMI= mode - * and will panic when an error happens. - */ -DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication) -{ - irqentry_state_t irq_state; - - /* - * With the current implementation it is always possible to switch to a - * safe stack because #VC exceptions only happen at known places, like - * intercepted instructions or accesses to MMIO areas/IO ports. They can - * also happen with code instrumentation when the hypervisor intercepts - * #DB, but the critical paths are forbidden to be instrumented, so #DB - * exceptions currently also only happen in safe places. - * - * But keep this here in case the noinstr annotations are violated due - * to bug elsewhere. - */ - if (unlikely(vc_from_invalid_context(regs))) { - instrumentation_begin(); - panic("Can't handle #VC exception from unsupported context\n"); - instrumentation_end(); - } - - /* - * Handle #DB before calling into !noinstr code to avoid recursive #DB. - */ - if (vc_is_db(error_code)) { - exc_debug(regs); - return; - } - - irq_state =3D irqentry_nmi_enter(regs); - - instrumentation_begin(); - - if (!vc_raw_handle_exception(regs, error_code)) { - /* Show some debug info */ - show_regs(regs); - - /* Ask hypervisor to sev_es_terminate */ - sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ); - - /* If that fails and we get here - just panic */ - panic("Returned from Terminate-Request to Hypervisor\n"); - } - - instrumentation_end(); - irqentry_nmi_exit(regs, irq_state); -} - -/* - * Runtime #VC exception handler when raised from user mode. Runs in IRQ m= ode - * and will kill the current task with SIGBUS when an error happens. - */ -DEFINE_IDTENTRY_VC_USER(exc_vmm_communication) -{ - /* - * Handle #DB before calling into !noinstr code to avoid recursive #DB. 
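For orientation: intercepted exceptions arrive with the SVM exit code in error_code, i.e. SVM_EXIT_EXCP_BASE plus the vector, which is what the vc_is_db() test relies on. A sketch; the 0x40 base is taken as an assumption from the AMD APM exit-code table.

#define TOY_SVM_EXIT_EXCP_BASE 0x40UL

/* Recover the exception vector from a #VC error code. */
static unsigned int toy_vc_vector(unsigned long error_code)
{
	return (unsigned int)(error_code - TOY_SVM_EXIT_EXCP_BASE);
}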
- */
-	if (vc_is_db(error_code)) {
-		noist_exc_debug(regs);
-		return;
-	}
-
-	irqentry_enter_from_user_mode(regs);
-	instrumentation_begin();
-
-	if (!vc_raw_handle_exception(regs, error_code)) {
-		/*
-		 * Do not kill the machine if user-space triggered the
-		 * exception. Send SIGBUS instead and let user-space deal with
-		 * it.
-		 */
-		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
-	}
-
-	instrumentation_end();
-	irqentry_exit_to_user_mode(regs);
-}
-
-bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
-{
-	unsigned long exit_code = regs->orig_ax;
-	struct es_em_ctxt ctxt;
-	enum es_result result;
-
-	vc_ghcb_invalidate(boot_ghcb);
-
-	result = vc_init_em_ctxt(&ctxt, regs, exit_code);
-	if (result == ES_OK)
-		result = vc_handle_exitcode(&ctxt, boot_ghcb, exit_code);
-
-	/* Done - now check the result */
-	switch (result) {
-	case ES_OK:
-		vc_finish_insn(&ctxt);
-		break;
-	case ES_UNSUPPORTED:
-		early_printk("PANIC: Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
-			     exit_code, regs->ip);
-		goto fail;
-	case ES_VMM_ERROR:
-		early_printk("PANIC: Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
-			     exit_code, regs->ip);
-		goto fail;
-	case ES_DECODE_FAILED:
-		early_printk("PANIC: Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
-			     exit_code, regs->ip);
-		goto fail;
-	case ES_EXCEPTION:
-		vc_early_forward_exception(&ctxt);
-		break;
-	case ES_RETRY:
-		/* Nothing to do */
-		break;
-	default:
-		BUG();
-	}
-
-	return true;
-
-fail:
-	show_regs(regs);
-
-	sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-}
-
-/*
- * Initial set up of SNP relies on information provided by the
- * Confidential Computing blob, which can be passed to the kernel
- * in the following ways, depending on how it is booted:
- *
- * - when booted via the boot/decompress kernel:
- *   - via boot_params
- *
- * - when booted directly by firmware/bootloader (e.g. CONFIG_PVH):
- *   - via a setup_data entry, as defined by the Linux Boot Protocol
- *
- * Scan for the blob in that order.
- */
-static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
-{
-	struct cc_blob_sev_info *cc_info;
-
-	/* Boot kernel would have passed the CC blob via boot_params. */
-	if (bp->cc_blob_address) {
-		cc_info = (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;
-		goto found_cc_info;
-	}
-
-	/*
-	 * If kernel was booted directly, without the use of the
-	 * boot/decompression kernel, the CC blob may have been passed via
-	 * setup_data instead.
-	 */
-	cc_info = find_cc_blob_setup_data(bp);
-	if (!cc_info)
-		return NULL;
-
-found_cc_info:
-	if (cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
-		snp_abort();
-
-	return cc_info;
-}
-
-static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
-{
-	struct svsm_call call = {};
-	int ret;
-	u64 pa;
-
-	/*
-	 * Record the SVSM Calling Area address (CAA) if the guest is not
-	 * running at VMPL0. The CA will be used to communicate with the
-	 * SVSM to perform the SVSM services.
-	 */
-	if (!svsm_setup_ca(cc_info))
-		return;
-
-	/*
-	 * It is very early in the boot and the kernel is running identity
-	 * mapped but without having adjusted the pagetables to where the
-	 * kernel was loaded (physbase), so the get the CA address using
-	 * RIP-relative addressing.
-	 */
-	pa = (u64)rip_rel_ptr(&boot_svsm_ca_page);
-
-	/*
-	 * Switch over to the boot SVSM CA while the current CA is still
-	 * addressable. There is no GHCB at this point so use the MSR protocol.
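An aside on the call number used just below: SVSM call identifiers pack the protocol number into the upper 32 bits of RAX and the call ID into the lower 32, so the core protocol's REMAP_CA (protocol 0, call 0) ends up as plain zero in RAX. A sketch of the packing, matching the layout the SVSM_CORE_CALL() macro encodes; the toy_ prefix is invented.

#include <stdint.h>

static uint64_t toy_svsm_call(uint32_t protocol, uint32_t call_id)
{
	return ((uint64_t)protocol << 32) | call_id;
}

/* toy_svsm_call(0, 0) corresponds to SVSM_CORE_CALL(SVSM_CORE_REMAP_CA) */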
- * - * SVSM_CORE_REMAP_CA call: - * RAX =3D 0 (Protocol=3D0, CallID=3D0) - * RCX =3D New CA GPA - */ - call.caa =3D svsm_get_caa(); - call.rax =3D SVSM_CORE_CALL(SVSM_CORE_REMAP_CA); - call.rcx =3D pa; - ret =3D svsm_perform_call_protocol(&call); - if (ret) - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL); - - RIP_REL_REF(boot_svsm_caa) =3D (struct svsm_ca *)pa; - RIP_REL_REF(boot_svsm_caa_pa) =3D pa; -} - -bool __head snp_init(struct boot_params *bp) -{ - struct cc_blob_sev_info *cc_info; - - if (!bp) - return false; - - cc_info =3D find_cc_blob(bp); - if (!cc_info) - return false; - - if (cc_info->secrets_phys && cc_info->secrets_len =3D=3D PAGE_SIZE) - secrets_pa =3D cc_info->secrets_phys; - else - return false; - - setup_cpuid_table(cc_info); - - svsm_setup(cc_info); - - /* - * The CC blob will be used later to access the secrets page. Cache - * it here like the boot kernel does. - */ - bp->cc_blob_address =3D (u32)(unsigned long)cc_info; - - return true; -} - -void __head __noreturn snp_abort(void) -{ - sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED); -} - /* * SEV-SNP guests should only execute dmi_setup() if EFI_CONFIG_TABLES are * enabled, as the alternative (fallback) logic for DMI probing in the leg= acy diff --git a/arch/x86/coco/sev/shared.c b/arch/x86/coco/sev/shared.c index a7c94020e384..815542295f16 100644 --- a/arch/x86/coco/sev/shared.c +++ b/arch/x86/coco/sev/shared.c @@ -27,17 +27,12 @@ =20 /* * SVSM related information: - * When running under an SVSM, the VMPL that Linux is executing at must = be - * non-zero. The VMPL is therefore used to indicate the presence of an S= VSM. - * * During boot, the page tables are set up as identity mapped and later * changed to use kernel virtual addresses. Maintain separate virtual and * physical addresses for the CAA to allow SVSM functions to be used dur= ing * early boot, both with identity mapped virtual addresses and proper ke= rnel * virtual addresses. */ -u8 snp_vmpl __ro_after_init; -EXPORT_SYMBOL_GPL(snp_vmpl); struct svsm_ca *boot_svsm_caa __ro_after_init; u64 boot_svsm_caa_pa __ro_after_init; =20 @@ -1192,28 +1187,6 @@ static void __head setup_cpuid_table(const struct cc= _blob_sev_info *cc_info) } } =20 -static inline void __pval_terminate(u64 pfn, bool action, unsigned int pag= e_size, - int ret, u64 svsm_ret) -{ - WARN(1, "PVALIDATE failure: pfn: 0x%llx, action: %u, size: %u, ret: %d, s= vsm_ret: 0x%llx\n", - pfn, action, page_size, ret, svsm_ret); - - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE); -} - -static void svsm_pval_terminate(struct svsm_pvalidate_call *pc, int ret, u= 64 svsm_ret) -{ - unsigned int page_size; - bool action; - u64 pfn; - - pfn =3D pc->entry[pc->cur_index].pfn; - action =3D pc->entry[pc->cur_index].action; - page_size =3D pc->entry[pc->cur_index].page_size; - - __pval_terminate(pfn, action, page_size, ret, svsm_ret); -} - static void __head svsm_pval_4k_page(unsigned long paddr, bool validate) { struct svsm_pvalidate_call *pc; @@ -1269,260 +1242,6 @@ static void __head pvalidate_4k_page(unsigned long = vaddr, unsigned long paddr, } } =20 -static void pval_pages(struct snp_psc_desc *desc) -{ - struct psc_entry *e; - unsigned long vaddr; - unsigned int size; - unsigned int i; - bool validate; - u64 pfn; - int rc; - - for (i =3D 0; i <=3D desc->hdr.end_entry; i++) { - e =3D &desc->entries[i]; - - pfn =3D e->gfn; - vaddr =3D (unsigned long)pfn_to_kaddr(pfn); - size =3D e->pagesize ? 
RMP_PG_SIZE_2M : RMP_PG_SIZE_4K; - validate =3D e->operation =3D=3D SNP_PAGE_STATE_PRIVATE; - - rc =3D pvalidate(vaddr, size, validate); - if (!rc) - continue; - - if (rc =3D=3D PVALIDATE_FAIL_SIZEMISMATCH && size =3D=3D RMP_PG_SIZE_2M)= { - unsigned long vaddr_end =3D vaddr + PMD_SIZE; - - for (; vaddr < vaddr_end; vaddr +=3D PAGE_SIZE, pfn++) { - rc =3D pvalidate(vaddr, RMP_PG_SIZE_4K, validate); - if (rc) - __pval_terminate(pfn, validate, RMP_PG_SIZE_4K, rc, 0); - } - } else { - __pval_terminate(pfn, validate, size, rc, 0); - } - } -} - -static u64 svsm_build_ca_from_pfn_range(u64 pfn, u64 pfn_end, bool action, - struct svsm_pvalidate_call *pc) -{ - struct svsm_pvalidate_entry *pe; - - /* Nothing in the CA yet */ - pc->num_entries =3D 0; - pc->cur_index =3D 0; - - pe =3D &pc->entry[0]; - - while (pfn < pfn_end) { - pe->page_size =3D RMP_PG_SIZE_4K; - pe->action =3D action; - pe->ignore_cf =3D 0; - pe->pfn =3D pfn; - - pe++; - pfn++; - - pc->num_entries++; - if (pc->num_entries =3D=3D SVSM_PVALIDATE_MAX_COUNT) - break; - } - - return pfn; -} - -static int svsm_build_ca_from_psc_desc(struct snp_psc_desc *desc, unsigned= int desc_entry, - struct svsm_pvalidate_call *pc) -{ - struct svsm_pvalidate_entry *pe; - struct psc_entry *e; - - /* Nothing in the CA yet */ - pc->num_entries =3D 0; - pc->cur_index =3D 0; - - pe =3D &pc->entry[0]; - e =3D &desc->entries[desc_entry]; - - while (desc_entry <=3D desc->hdr.end_entry) { - pe->page_size =3D e->pagesize ? RMP_PG_SIZE_2M : RMP_PG_SIZE_4K; - pe->action =3D e->operation =3D=3D SNP_PAGE_STATE_PRIVATE; - pe->ignore_cf =3D 0; - pe->pfn =3D e->gfn; - - pe++; - e++; - - desc_entry++; - pc->num_entries++; - if (pc->num_entries =3D=3D SVSM_PVALIDATE_MAX_COUNT) - break; - } - - return desc_entry; -} - -static void svsm_pval_pages(struct snp_psc_desc *desc) -{ - struct svsm_pvalidate_entry pv_4k[VMGEXIT_PSC_MAX_ENTRY]; - unsigned int i, pv_4k_count =3D 0; - struct svsm_pvalidate_call *pc; - struct svsm_call call =3D {}; - unsigned long flags; - bool action; - u64 pc_pa; - int ret; - - /* - * This can be called very early in the boot, use native functions in - * order to avoid paravirt issues. - */ - flags =3D native_local_irq_save(); - - /* - * The SVSM calling area (CA) can support processing 510 entries at a - * time. Loop through the Page State Change descriptor until the CA is - * full or the last entry in the descriptor is reached, at which time - * the SVSM is invoked. This repeats until all entries in the descriptor - * are processed. - */ - call.caa =3D svsm_get_caa(); - - pc =3D (struct svsm_pvalidate_call *)call.caa->svsm_buffer; - pc_pa =3D svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer); - - /* Protocol 0, Call ID 1 */ - call.rax =3D SVSM_CORE_CALL(SVSM_CORE_PVALIDATE); - call.rcx =3D pc_pa; - - for (i =3D 0; i <=3D desc->hdr.end_entry;) { - i =3D svsm_build_ca_from_psc_desc(desc, i, pc); - - do { - ret =3D svsm_perform_call_protocol(&call); - if (!ret) - continue; - - /* - * Check if the entry failed because of an RMP mismatch (a - * PVALIDATE at 2M was requested, but the page is mapped in - * the RMP as 4K). 
- */ - - if (call.rax_out =3D=3D SVSM_PVALIDATE_FAIL_SIZEMISMATCH && - pc->entry[pc->cur_index].page_size =3D=3D RMP_PG_SIZE_2M) { - /* Save this entry for post-processing at 4K */ - pv_4k[pv_4k_count++] =3D pc->entry[pc->cur_index]; - - /* Skip to the next one unless at the end of the list */ - pc->cur_index++; - if (pc->cur_index < pc->num_entries) - ret =3D -EAGAIN; - else - ret =3D 0; - } - } while (ret =3D=3D -EAGAIN); - - if (ret) - svsm_pval_terminate(pc, ret, call.rax_out); - } - - /* Process any entries that failed to be validated at 2M and validate the= m at 4K */ - for (i =3D 0; i < pv_4k_count; i++) { - u64 pfn, pfn_end; - - action =3D pv_4k[i].action; - pfn =3D pv_4k[i].pfn; - pfn_end =3D pfn + 512; - - while (pfn < pfn_end) { - pfn =3D svsm_build_ca_from_pfn_range(pfn, pfn_end, action, pc); - - ret =3D svsm_perform_call_protocol(&call); - if (ret) - svsm_pval_terminate(pc, ret, call.rax_out); - } - } - - native_local_irq_restore(flags); -} - -static void pvalidate_pages(struct snp_psc_desc *desc) -{ - if (snp_vmpl) - svsm_pval_pages(desc); - else - pval_pages(desc); -} - -static int vmgexit_psc(struct ghcb *ghcb, struct snp_psc_desc *desc) -{ - int cur_entry, end_entry, ret =3D 0; - struct snp_psc_desc *data; - struct es_em_ctxt ctxt; - - vc_ghcb_invalidate(ghcb); - - /* Copy the input desc into GHCB shared buffer */ - data =3D (struct snp_psc_desc *)ghcb->shared_buffer; - memcpy(ghcb->shared_buffer, desc, min_t(int, GHCB_SHARED_BUF_SIZE, sizeof= (*desc))); - - /* - * As per the GHCB specification, the hypervisor can resume the guest - * before processing all the entries. Check whether all the entries - * are processed. If not, then keep retrying. Note, the hypervisor - * will update the data memory directly to indicate the status, so - * reference the data->hdr everywhere. - * - * The strategy here is to wait for the hypervisor to change the page - * state in the RMP table before guest accesses the memory pages. If the - * page state change was not successful, then later memory access will - * result in a crash. - */ - cur_entry =3D data->hdr.cur_entry; - end_entry =3D data->hdr.end_entry; - - while (data->hdr.cur_entry <=3D data->hdr.end_entry) { - ghcb_set_sw_scratch(ghcb, (u64)__pa(data)); - - /* This will advance the shared buffer data points to. */ - ret =3D sev_es_ghcb_hv_call(ghcb, &ctxt, SVM_VMGEXIT_PSC, 0, 0); - - /* - * Page State Change VMGEXIT can pass error code through - * exit_info_2. - */ - if (WARN(ret || ghcb->save.sw_exit_info_2, - "SNP: PSC failed ret=3D%d exit_info_2=3D%llx\n", - ret, ghcb->save.sw_exit_info_2)) { - ret =3D 1; - goto out; - } - - /* Verify that reserved bit is not set */ - if (WARN(data->hdr.reserved, "Reserved bit is set in the PSC header\n"))= { - ret =3D 1; - goto out; - } - - /* - * Sanity check that entry processing is not going backwards. - * This will happen only if hypervisor is tricking us. 
- */ - if (WARN(data->hdr.end_entry > end_entry || cur_entry > data->hdr.cur_en= try, -"SNP: PSC processing going backward, end_entry %d (got %d) cur_entry %d (g= ot %d)\n", - end_entry, data->hdr.end_entry, cur_entry, data->hdr.cur_entry)) { - ret =3D 1; - goto out; - } - } - -out: - return ret; -} - static enum es_result vc_check_opcode_bytes(struct es_em_ctxt *ctxt, unsigned long exit_code) { diff --git a/arch/x86/coco/sev/startup.c b/arch/x86/coco/sev/startup.c new file mode 100644 index 000000000000..9f5dc70cfb44 --- /dev/null +++ b/arch/x86/coco/sev/startup.c @@ -0,0 +1,1395 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * AMD Memory Encryption Support + * + * Copyright (C) 2019 SUSE + * + * Author: Joerg Roedel + */ + +#define pr_fmt(fmt) "SEV: " fmt + +#include /* For show_regs() */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* For early boot hypervisor communication in SEV-ES enabled guests */ +struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE); + +/* + * Needs to be in the .data section because we need it NULL before bss is + * cleared + */ +struct ghcb *boot_ghcb __section(".data"); + +/* Bitmap of SEV features supported by the hypervisor */ +u64 sev_hv_features __ro_after_init; + +/* Secrets page physical address from the CC blob */ +static u64 secrets_pa __ro_after_init; + +/* For early boot SVSM communication */ +struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE); + +DEFINE_PER_CPU(struct svsm_ca *, svsm_caa); +DEFINE_PER_CPU(u64, svsm_caa_pa); + +/* + * Nothing shall interrupt this code path while holding the per-CPU + * GHCB. The backup GHCB is only for NMIs interrupting this path. + * + * Callers must disable local interrupts around it. + */ +noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state) +{ + struct sev_es_runtime_data *data; + struct ghcb *ghcb; + + WARN_ON(!irqs_disabled()); + + data =3D this_cpu_read(runtime_data); + ghcb =3D &data->ghcb_page; + + if (unlikely(data->ghcb_active)) { + /* GHCB is already in use - save its contents */ + + if (unlikely(data->backup_ghcb_active)) { + /* + * Backup-GHCB is also already in use. There is no way + * to continue here so just kill the machine. To make + * panic() work, mark GHCBs inactive so that messages + * can be printed out. + */ + data->ghcb_active =3D false; + data->backup_ghcb_active =3D false; + + instrumentation_begin(); + panic("Unable to handle #VC exception! 
GHCB and Backup GHCB are already= in use"); + instrumentation_end(); + } + + /* Mark backup_ghcb active before writing to it */ + data->backup_ghcb_active =3D true; + + state->ghcb =3D &data->backup_ghcb; + + /* Backup GHCB content */ + *state->ghcb =3D *ghcb; + } else { + state->ghcb =3D NULL; + data->ghcb_active =3D true; + } + + return ghcb; +} + +static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt, + unsigned char *buffer) +{ + return copy_from_kernel_nofault(buffer, (unsigned char *)ctxt->regs->ip, = MAX_INSN_SIZE); +} + +static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt) +{ + char buffer[MAX_INSN_SIZE]; + int insn_bytes; + + insn_bytes =3D insn_fetch_from_user_inatomic(ctxt->regs, buffer); + if (insn_bytes =3D=3D 0) { + /* Nothing could be copied */ + ctxt->fi.vector =3D X86_TRAP_PF; + ctxt->fi.error_code =3D X86_PF_INSTR | X86_PF_USER; + ctxt->fi.cr2 =3D ctxt->regs->ip; + return ES_EXCEPTION; + } else if (insn_bytes =3D=3D -EINVAL) { + /* Effective RIP could not be calculated */ + ctxt->fi.vector =3D X86_TRAP_GP; + ctxt->fi.error_code =3D 0; + ctxt->fi.cr2 =3D 0; + return ES_EXCEPTION; + } + + if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes)) + return ES_DECODE_FAILED; + + if (ctxt->insn.immediate.got) + return ES_OK; + else + return ES_DECODE_FAILED; +} + +static enum es_result __vc_decode_kern_insn(struct es_em_ctxt *ctxt) +{ + char buffer[MAX_INSN_SIZE]; + int res, ret; + + res =3D vc_fetch_insn_kernel(ctxt, buffer); + if (res) { + ctxt->fi.vector =3D X86_TRAP_PF; + ctxt->fi.error_code =3D X86_PF_INSTR; + ctxt->fi.cr2 =3D ctxt->regs->ip; + return ES_EXCEPTION; + } + + ret =3D insn_decode(&ctxt->insn, buffer, MAX_INSN_SIZE, INSN_MODE_64); + if (ret < 0) + return ES_DECODE_FAILED; + else + return ES_OK; +} + +static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt) +{ + if (user_mode(ctxt->regs)) + return __vc_decode_user_insn(ctxt); + else + return __vc_decode_kern_insn(ctxt); +} + +static enum es_result vc_write_mem(struct es_em_ctxt *ctxt, + char *dst, char *buf, size_t size) +{ + unsigned long error_code =3D X86_PF_PROT | X86_PF_WRITE; + + /* + * This function uses __put_user() independent of whether kernel or user + * memory is accessed. This works fine because __put_user() does no + * sanity checks of the pointer being accessed. All that it does is + * to report when the access failed. + * + * Also, this function runs in atomic context, so __put_user() is not + * allowed to sleep. The page-fault handler detects that it is running + * in atomic context and will not try to take mmap_sem and handle the + * fault, so additional pagefault_enable()/disable() calls are not + * needed. + * + * The access can't be done via copy_to_user() here because + * vc_write_mem() must not use string instructions to access unsafe + * memory. The reason is that MOVS is emulated by the #VC handler by + * splitting the move up into a read and a write and taking a nested #VC + * exception on whatever of them is the MMIO access. Using string + * instructions here would cause infinite nesting. 
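The switch below has a direct user-space analogue; the point is that a store through a correctly sized scalar compiles to a single MOV of 1/2/4/8 bytes, never a string instruction (the __put_user() fault handling has no equivalent here, and the compiler behaviour is the usual fixed-size memcpy lowering, stated as an assumption).

#include <stdint.h>
#include <stddef.h>
#include <string.h>

static int toy_scalar_write(void *dst, const void *buf, size_t size)
{
	switch (size) {
	case 1: { uint8_t  v; memcpy(&v, buf, 1); *(volatile uint8_t  *)dst = v; break; }
	case 2: { uint16_t v; memcpy(&v, buf, 2); *(volatile uint16_t *)dst = v; break; }
	case 4: { uint32_t v; memcpy(&v, buf, 4); *(volatile uint32_t *)dst = v; break; }
	case 8: { uint64_t v; memcpy(&v, buf, 8); *(volatile uint64_t *)dst = v; break; }
	default: return -1;	/* unsupported operand size */
	}
	return 0;
}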
+ */ + switch (size) { + case 1: { + u8 d1; + u8 __user *target =3D (u8 __user *)dst; + + memcpy(&d1, buf, 1); + if (__put_user(d1, target)) + goto fault; + break; + } + case 2: { + u16 d2; + u16 __user *target =3D (u16 __user *)dst; + + memcpy(&d2, buf, 2); + if (__put_user(d2, target)) + goto fault; + break; + } + case 4: { + u32 d4; + u32 __user *target =3D (u32 __user *)dst; + + memcpy(&d4, buf, 4); + if (__put_user(d4, target)) + goto fault; + break; + } + case 8: { + u64 d8; + u64 __user *target =3D (u64 __user *)dst; + + memcpy(&d8, buf, 8); + if (__put_user(d8, target)) + goto fault; + break; + } + default: + WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size); + return ES_UNSUPPORTED; + } + + return ES_OK; + +fault: + if (user_mode(ctxt->regs)) + error_code |=3D X86_PF_USER; + + ctxt->fi.vector =3D X86_TRAP_PF; + ctxt->fi.error_code =3D error_code; + ctxt->fi.cr2 =3D (unsigned long)dst; + + return ES_EXCEPTION; +} + +static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, + char *src, char *buf, size_t size) +{ + unsigned long error_code =3D X86_PF_PROT; + + /* + * This function uses __get_user() independent of whether kernel or user + * memory is accessed. This works fine because __get_user() does no + * sanity checks of the pointer being accessed. All that it does is + * to report when the access failed. + * + * Also, this function runs in atomic context, so __get_user() is not + * allowed to sleep. The page-fault handler detects that it is running + * in atomic context and will not try to take mmap_sem and handle the + * fault, so additional pagefault_enable()/disable() calls are not + * needed. + * + * The access can't be done via copy_from_user() here because + * vc_read_mem() must not use string instructions to access unsafe + * memory. The reason is that MOVS is emulated by the #VC handler by + * splitting the move up into a read and a write and taking a nested #VC + * exception on whatever of them is the MMIO access. Using string + * instructions here would cause infinite nesting. 
+ */ + switch (size) { + case 1: { + u8 d1; + u8 __user *s =3D (u8 __user *)src; + + if (__get_user(d1, s)) + goto fault; + memcpy(buf, &d1, 1); + break; + } + case 2: { + u16 d2; + u16 __user *s =3D (u16 __user *)src; + + if (__get_user(d2, s)) + goto fault; + memcpy(buf, &d2, 2); + break; + } + case 4: { + u32 d4; + u32 __user *s =3D (u32 __user *)src; + + if (__get_user(d4, s)) + goto fault; + memcpy(buf, &d4, 4); + break; + } + case 8: { + u64 d8; + u64 __user *s =3D (u64 __user *)src; + if (__get_user(d8, s)) + goto fault; + memcpy(buf, &d8, 8); + break; + } + default: + WARN_ONCE(1, "%s: Invalid size: %zu\n", __func__, size); + return ES_UNSUPPORTED; + } + + return ES_OK; + +fault: + if (user_mode(ctxt->regs)) + error_code |=3D X86_PF_USER; + + ctxt->fi.vector =3D X86_TRAP_PF; + ctxt->fi.error_code =3D error_code; + ctxt->fi.cr2 =3D (unsigned long)src; + + return ES_EXCEPTION; +} + +static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em= _ctxt *ctxt, + unsigned long vaddr, phys_addr_t *paddr) +{ + unsigned long va =3D (unsigned long)vaddr; + unsigned int level; + phys_addr_t pa; + pgd_t *pgd; + pte_t *pte; + + pgd =3D __va(read_cr3_pa()); + pgd =3D &pgd[pgd_index(va)]; + pte =3D lookup_address_in_pgd(pgd, va, &level); + if (!pte) { + ctxt->fi.vector =3D X86_TRAP_PF; + ctxt->fi.cr2 =3D vaddr; + ctxt->fi.error_code =3D 0; + + if (user_mode(ctxt->regs)) + ctxt->fi.error_code |=3D X86_PF_USER; + + return ES_EXCEPTION; + } + + if (WARN_ON_ONCE(pte_val(*pte) & _PAGE_ENC)) + /* Emulated MMIO to/from encrypted memory not supported */ + return ES_UNSUPPORTED; + + pa =3D (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT; + pa |=3D va & ~page_level_mask(level); + + *paddr =3D pa; + + return ES_OK; +} + +static enum es_result vc_ioio_check(struct es_em_ctxt *ctxt, u16 port, siz= e_t size) +{ + BUG_ON(size > 4); + + if (user_mode(ctxt->regs)) { + struct thread_struct *t =3D ¤t->thread; + struct io_bitmap *iobm =3D t->io_bitmap; + size_t idx; + + if (!iobm) + goto fault; + + for (idx =3D port; idx < port + size; ++idx) { + if (test_bit(idx, iobm->bitmap)) + goto fault; + } + } + + return ES_OK; + +fault: + ctxt->fi.vector =3D X86_TRAP_GP; + ctxt->fi.error_code =3D 0; + + return ES_EXCEPTION; +} + +static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt) +{ + long error_code =3D ctxt->fi.error_code; + int trapnr =3D ctxt->fi.vector; + + ctxt->regs->orig_ax =3D ctxt->fi.error_code; + + switch (trapnr) { + case X86_TRAP_GP: + exc_general_protection(ctxt->regs, error_code); + break; + case X86_TRAP_UD: + exc_invalid_op(ctxt->regs); + break; + case X86_TRAP_PF: + write_cr2(ctxt->fi.cr2); + exc_page_fault(ctxt->regs, error_code); + break; + case X86_TRAP_AC: + exc_alignment_check(ctxt->regs, error_code); + break; + default: + pr_emerg("Unsupported exception in #VC instruction emulation - can't con= tinue\n"); + BUG(); + } +} + +/* Include code shared with pre-decompression boot stage */ +#include "shared.c" + +noinstr void __sev_put_ghcb(struct ghcb_state *state) +{ + struct sev_es_runtime_data *data; + struct ghcb *ghcb; + + WARN_ON(!irqs_disabled()); + + data =3D this_cpu_read(runtime_data); + ghcb =3D &data->ghcb_page; + + if (state->ghcb) { + /* Restore GHCB from Backup */ + *ghcb =3D *state->ghcb; + data->backup_ghcb_active =3D false; + state->ghcb =3D NULL; + } else { + /* + * Invalidate the GHCB so a VMGEXIT instruction issued + * from userspace won't appear to be valid. 
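The svsm_perform_call_protocol() definition a short way below reduces to this retry skeleton: use the GHCB protocol when a GHCB is usable (per-CPU late in boot, the boot GHCB earlier), otherwise fall back to the MSR protocol, and retry for as long as the SVSM asks. The toy_* stubs are invented.

#include <errno.h>

struct toy_call { int dummy; };
struct toy_ghcb;

static int toy_ghcb_proto(struct toy_ghcb *g, struct toy_call *c) { return 0; }
static int toy_msr_proto(struct toy_call *c) { return 0; }

static int toy_svsm_call_protocol(struct toy_ghcb *ghcb, struct toy_call *call)
{
	int ret;

	do {
		ret = ghcb ? toy_ghcb_proto(ghcb, call)
			   : toy_msr_proto(call);
	} while (ret == -EAGAIN);	/* SVSM requested a retry */

	return ret;
}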
+ */ + vc_ghcb_invalidate(ghcb); + data->ghcb_active =3D false; + } +} + +int svsm_perform_call_protocol(struct svsm_call *call) +{ + struct ghcb_state state; + unsigned long flags; + struct ghcb *ghcb; + int ret; + + /* + * This can be called very early in the boot, use native functions in + * order to avoid paravirt issues. + */ + flags =3D native_local_irq_save(); + + /* + * Use rip-relative references when called early in the boot. If + * ghcbs_initialized is set, then it is late in the boot and no need + * to worry about rip-relative references in called functions. + */ + if (RIP_REL_REF(sev_cfg).ghcbs_initialized) + ghcb =3D __sev_get_ghcb(&state); + else if (RIP_REL_REF(boot_ghcb)) + ghcb =3D RIP_REL_REF(boot_ghcb); + else + ghcb =3D NULL; + + do { + ret =3D ghcb ? svsm_perform_ghcb_protocol(ghcb, call) + : svsm_perform_msr_protocol(call); + } while (ret =3D=3D -EAGAIN); + + if (RIP_REL_REF(sev_cfg).ghcbs_initialized) + __sev_put_ghcb(&state); + + native_local_irq_restore(flags); + + return ret; +} + +void __head +early_set_pages_state(unsigned long vaddr, unsigned long paddr, + unsigned long npages, enum psc_op op) +{ + unsigned long paddr_end; + u64 val; + + vaddr =3D vaddr & PAGE_MASK; + + paddr =3D paddr & PAGE_MASK; + paddr_end =3D paddr + (npages << PAGE_SHIFT); + + while (paddr < paddr_end) { + /* Page validation must be rescinded before changing to shared */ + if (op =3D=3D SNP_PAGE_STATE_SHARED) + pvalidate_4k_page(vaddr, paddr, false); + + /* + * Use the MSR protocol because this function can be called before + * the GHCB is established. + */ + sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op)); + VMGEXIT(); + + val =3D sev_es_rd_ghcb_msr(); + + if (GHCB_RESP_CODE(val) !=3D GHCB_MSR_PSC_RESP) + goto e_term; + + if (GHCB_MSR_PSC_RESP_VAL(val)) + goto e_term; + + /* Page validation must be performed after changing to private */ + if (op =3D=3D SNP_PAGE_STATE_PRIVATE) + pvalidate_4k_page(vaddr, paddr, true); + + vaddr +=3D PAGE_SIZE; + paddr +=3D PAGE_SIZE; + } + + return; + +e_term: + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); +} + +void __head early_snp_set_memory_private(unsigned long vaddr, unsigned lon= g paddr, + unsigned long npages) +{ + /* + * This can be invoked in early boot while running identity mapped, so + * use an open coded check for SNP instead of using cc_platform_has(). + * This eliminates worries about jump tables or checking boot_cpu_data + * in the cc_platform_has() function. + */ + if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED)) + return; + + /* + * Ask the hypervisor to mark the memory pages as private in the RMP + * table. + */ + early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE); +} + +void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long= paddr, + unsigned long npages) +{ + /* + * This can be invoked in early boot while running identity mapped, so + * use an open coded check for SNP instead of using cc_platform_has(). + * This eliminates worries about jump tables or checking boot_cpu_data + * in the cc_platform_has() function. + */ + if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED)) + return; + + /* Ask hypervisor to mark the memory pages shared in the RMP table. 
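For readers decoding the MSR writes in early_set_pages_state() above, this is the request layout apparently encoded by GHCB_MSR_PSC_REQ_GFN(); treat the exact bit positions as an assumption taken from the GHCB MSR protocol, and the TOY_ names as invented.

#include <stdint.h>

#define TOY_PSC_REQ		0x014ULL		/* request code, low 12 bits */
#define TOY_PSC_GFN_MASK	((1ULL << 40) - 1)	/* GFN in bits 51:12 */

static uint64_t toy_psc_req(uint64_t gfn, unsigned int op)
{
	return ((uint64_t)(op & 0xf) << 52) |		/* operation in bits 55:52 */
	       ((gfn & TOY_PSC_GFN_MASK) << 12) |
	       TOY_PSC_REQ;
}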
*/ + early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED); +} + +/* Writes to the SVSM CAA MSR are ignored */ +static enum es_result __vc_handle_msr_caa(struct pt_regs *regs, bool write) +{ + if (write) + return ES_OK; + + regs->ax =3D lower_32_bits(this_cpu_read(svsm_caa_pa)); + regs->dx =3D upper_32_bits(this_cpu_read(svsm_caa_pa)); + + return ES_OK; +} + +/* + * TSC related accesses should not exit to the hypervisor when a guest is + * executing with Secure TSC enabled, so special handling is required for + * accesses of MSR_IA32_TSC and MSR_AMD64_GUEST_TSC_FREQ. + */ +static enum es_result __vc_handle_secure_tsc_msrs(struct pt_regs *regs, bo= ol write) +{ + u64 tsc; + + /* + * GUEST_TSC_FREQ should not be intercepted when Secure TSC is enabled. + * Terminate the SNP guest when the interception is enabled. + */ + if (regs->cx =3D=3D MSR_AMD64_GUEST_TSC_FREQ) + return ES_VMM_ERROR; + + /* + * Writes: Writing to MSR_IA32_TSC can cause subsequent reads of the TSC + * to return undefined values, so ignore all writes. + * + * Reads: Reads of MSR_IA32_TSC should return the current TSC value, use + * the value returned by rdtsc_ordered(). + */ + if (write) { + WARN_ONCE(1, "TSC MSR writes are verboten!\n"); + return ES_OK; + } + + tsc =3D rdtsc_ordered(); + regs->ax =3D lower_32_bits(tsc); + regs->dx =3D upper_32_bits(tsc); + + return ES_OK; +} + +static enum es_result vc_handle_msr(struct ghcb *ghcb, struct es_em_ctxt *= ctxt) +{ + struct pt_regs *regs =3D ctxt->regs; + enum es_result ret; + bool write; + + /* Is it a WRMSR? */ + write =3D ctxt->insn.opcode.bytes[1] =3D=3D 0x30; + + switch (regs->cx) { + case MSR_SVSM_CAA: + return __vc_handle_msr_caa(regs, write); + case MSR_IA32_TSC: + case MSR_AMD64_GUEST_TSC_FREQ: + if (sev_status & MSR_AMD64_SNP_SECURE_TSC) + return __vc_handle_secure_tsc_msrs(regs, write); + break; + default: + break; + } + + ghcb_set_rcx(ghcb, regs->cx); + if (write) { + ghcb_set_rax(ghcb, regs->ax); + ghcb_set_rdx(ghcb, regs->dx); + } + + ret =3D sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, write, 0); + + if ((ret =3D=3D ES_OK) && !write) { + regs->ax =3D ghcb->save.rax; + regs->dx =3D ghcb->save.rdx; + } + + return ret; +} + +static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt) +{ + int trapnr =3D ctxt->fi.vector; + + if (trapnr =3D=3D X86_TRAP_PF) + native_write_cr2(ctxt->fi.cr2); + + ctxt->regs->orig_ax =3D ctxt->fi.error_code; + do_early_exception(ctxt->regs, trapnr); +} + +static long *vc_insn_get_rm(struct es_em_ctxt *ctxt) +{ + long *reg_array; + int offset; + + reg_array =3D (long *)ctxt->regs; + offset =3D insn_get_modrm_rm_off(&ctxt->insn, ctxt->regs); + + if (offset < 0) + return NULL; + + offset /=3D sizeof(long); + + return reg_array + offset; +} +static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctx= t, + unsigned int bytes, bool read) +{ + u64 exit_code, exit_info_1, exit_info_2; + unsigned long ghcb_pa =3D __pa(ghcb); + enum es_result res; + phys_addr_t paddr; + void __user *ref; + + ref =3D insn_get_addr_ref(&ctxt->insn, ctxt->regs); + if (ref =3D=3D (void __user *)-1L) + return ES_UNSUPPORTED; + + exit_code =3D read ? 
+static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+				 unsigned int bytes, bool read)
+{
+	u64 exit_code, exit_info_1, exit_info_2;
+	unsigned long ghcb_pa = __pa(ghcb);
+	enum es_result res;
+	phys_addr_t paddr;
+	void __user *ref;
+
+	ref = insn_get_addr_ref(&ctxt->insn, ctxt->regs);
+	if (ref == (void __user *)-1L)
+		return ES_UNSUPPORTED;
+
+	exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE;
+
+	res = vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr);
+	if (res != ES_OK) {
+		if (res == ES_EXCEPTION && !read)
+			ctxt->fi.error_code |= X86_PF_WRITE;
+
+		return res;
+	}
+
+	exit_info_1 = paddr;
+	/* Can never be greater than 8 */
+	exit_info_2 = bytes;
+
+	ghcb_set_sw_scratch(ghcb, ghcb_pa + offsetof(struct ghcb, shared_buffer));
+
+	return sev_es_ghcb_hv_call(ghcb, ctxt, exit_code, exit_info_1, exit_info_2);
+}
+
+/*
+ * The MOVS instruction has two memory operands, which raises the
+ * problem that it is not known whether the access to the source or the
+ * destination caused the #VC exception (and hence whether an MMIO read
+ * or write operation needs to be emulated).
+ *
+ * Instead of playing games with walking page-tables and trying to guess
+ * whether the source or destination is an MMIO range, split the move
+ * into two operations, a read and a write with only one memory operand.
+ * This will cause a nested #VC exception on the MMIO address which can
+ * then be handled.
+ *
+ * This implementation has the benefit that it also supports MOVS where
+ * source _and_ destination are MMIO regions.
+ *
+ * It will slow MOVS on MMIO down a lot, but in SEV-ES guests it is a
+ * rare operation. If it turns out to be a performance problem the split
+ * operations can be moved to memcpy_fromio() and memcpy_toio().
+ */
+static enum es_result vc_handle_mmio_movs(struct es_em_ctxt *ctxt,
+					  unsigned int bytes)
+{
+	unsigned long ds_base, es_base;
+	unsigned char *src, *dst;
+	unsigned char buffer[8];
+	enum es_result ret;
+	bool rep;
+	int off;
+
+	ds_base = insn_get_seg_base(ctxt->regs, INAT_SEG_REG_DS);
+	es_base = insn_get_seg_base(ctxt->regs, INAT_SEG_REG_ES);
+
+	if (ds_base == -1L || es_base == -1L) {
+		ctxt->fi.vector = X86_TRAP_GP;
+		ctxt->fi.error_code = 0;
+		return ES_EXCEPTION;
+	}
+
+	src = ds_base + (unsigned char *)ctxt->regs->si;
+	dst = es_base + (unsigned char *)ctxt->regs->di;
+
+	ret = vc_read_mem(ctxt, src, buffer, bytes);
+	if (ret != ES_OK)
+		return ret;
+
+	ret = vc_write_mem(ctxt, dst, buffer, bytes);
+	if (ret != ES_OK)
+		return ret;
+
+	if (ctxt->regs->flags & X86_EFLAGS_DF)
+		off = -bytes;
+	else
+		off = bytes;
+
+	ctxt->regs->si += off;
+	ctxt->regs->di += off;
+
+	rep = insn_has_rep_prefix(&ctxt->insn);
+	if (rep)
+		ctxt->regs->cx -= 1;
+
+	if (!rep || ctxt->regs->cx == 0)
+		return ES_OK;
+	else
+		return ES_RETRY;
+}
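/*
 * Illustrative trace of the split described above (instruction and counts
 * are made up): the guest executes "rep movsl" with %rcx == 4 and %rdi
 * pointing into MMIO.
 *
 * iteration 1: vc_read_mem(src) copies 4 bytes into the buffer, then
 *              vc_write_mem(dst) takes a nested #VC on the MMIO address,
 *              which is emulated as a plain MMIO write; %rsi/%rdi advance
 *              by 4, %rcx drops to 3, and ES_RETRY leaves RIP unchanged,
 *              so the MOVS traps again and the handler is re-entered
 * iterations 2-3: same as above with the updated registers
 * iteration 4: %rcx reaches 0, ES_OK is returned, and RIP is finally
 *              advanced past the MOVS
 */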
+
+static enum es_result vc_handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	struct insn *insn = &ctxt->insn;
+	enum insn_mmio_type mmio;
+	unsigned int bytes = 0;
+	enum es_result ret;
+	u8 sign_byte;
+	long *reg_data;
+
+	mmio = insn_decode_mmio(insn, &bytes);
+	if (mmio == INSN_MMIO_DECODE_FAILED)
+		return ES_DECODE_FAILED;
+
+	if (mmio != INSN_MMIO_WRITE_IMM && mmio != INSN_MMIO_MOVS) {
+		reg_data = insn_get_modrm_reg_ptr(insn, ctxt->regs);
+		if (!reg_data)
+			return ES_DECODE_FAILED;
+	}
+
+	if (user_mode(ctxt->regs))
+		return ES_UNSUPPORTED;
+
+	switch (mmio) {
+	case INSN_MMIO_WRITE:
+		memcpy(ghcb->shared_buffer, reg_data, bytes);
+		ret = vc_do_mmio(ghcb, ctxt, bytes, false);
+		break;
+	case INSN_MMIO_WRITE_IMM:
+		memcpy(ghcb->shared_buffer, insn->immediate1.bytes, bytes);
+		ret = vc_do_mmio(ghcb, ctxt, bytes, false);
+		break;
+	case INSN_MMIO_READ:
+		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		/* Zero-extend for 32-bit operation */
+		if (bytes == 4)
+			*reg_data = 0;
+
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+	case INSN_MMIO_READ_ZERO_EXTEND:
+		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		/* Zero extend based on operand size */
+		memset(reg_data, 0, insn->opnd_bytes);
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+	case INSN_MMIO_READ_SIGN_EXTEND:
+		ret = vc_do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		if (bytes == 1) {
+			u8 *val = (u8 *)ghcb->shared_buffer;
+
+			sign_byte = (*val & 0x80) ? 0xff : 0x00;
+		} else {
+			u16 *val = (u16 *)ghcb->shared_buffer;
+
+			sign_byte = (*val & 0x8000) ? 0xff : 0x00;
+		}
+
+		/* Sign extend based on operand size */
+		memset(reg_data, sign_byte, insn->opnd_bytes);
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+	case INSN_MMIO_MOVS:
+		ret = vc_handle_mmio_movs(ctxt, bytes);
+		break;
+	default:
+		ret = ES_UNSUPPORTED;
+		break;
+	}
+
+	return ret;
+}
+
+static enum es_result vc_handle_dr7_write(struct ghcb *ghcb,
+					  struct es_em_ctxt *ctxt)
+{
+	struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
+	long val, *reg = vc_insn_get_rm(ctxt);
+	enum es_result ret;
+
+	if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP)
+		return ES_VMM_ERROR;
+
+	if (!reg)
+		return ES_DECODE_FAILED;
+
+	val = *reg;
+
+	/* Upper 32 bits must be written as zeroes */
+	if (val >> 32) {
+		ctxt->fi.vector = X86_TRAP_GP;
+		ctxt->fi.error_code = 0;
+		return ES_EXCEPTION;
+	}
+
+	/* Clear out other reserved bits and set bit 10 */
+	val = (val & 0xffff23ffL) | BIT(10);
+
+	/* Early non-zero writes to DR7 are not supported */
+	if (!data && (val & ~DR7_RESET_VALUE))
+		return ES_UNSUPPORTED;
+
+	/* Using a value of 0 for ExitInfo1 means RAX holds the value */
+	ghcb_set_rax(ghcb, val);
+	ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WRITE_DR7, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (data)
+		data->dr7 = val;
+
+	return ES_OK;
+}
+
+static enum es_result vc_handle_dr7_read(struct ghcb *ghcb,
+					 struct es_em_ctxt *ctxt)
+{
+	struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
+	long *reg = vc_insn_get_rm(ctxt);
+
+	if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP)
+		return ES_VMM_ERROR;
+
+	if (!reg)
+		return ES_DECODE_FAILED;
+
+	if (data)
+		*reg = data->dr7;
+	else
+		*reg = DR7_RESET_VALUE;
+
+	return ES_OK;
+}
+
+static enum es_result vc_handle_wbinvd(struct ghcb *ghcb,
+				       struct es_em_ctxt *ctxt)
+{
+	return sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WBINVD, 0, 0);
+}
+
+static enum es_result vc_handle_rdpmc(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	enum es_result ret;
+
+	ghcb_set_rcx(ghcb, ctxt->regs->cx);
+
+	ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_RDPMC, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (!(ghcb_rax_is_valid(ghcb) && ghcb_rdx_is_valid(ghcb)))
+		return ES_VMM_ERROR;
+
+	ctxt->regs->ax = ghcb->save.rax;
+	ctxt->regs->dx = ghcb->save.rdx;
+
+	return ES_OK;
+}
+
+static enum es_result vc_handle_monitor(struct ghcb *ghcb,
+					struct es_em_ctxt *ctxt)
+{
+	/*
+	 * Treat it as a NOP and do not leak a physical address to the
+	 * hypervisor.
+	 */
+	return ES_OK;
+}
+
+static enum es_result vc_handle_mwait(struct ghcb *ghcb,
+				      struct es_em_ctxt *ctxt)
+{
+	/* Treat the same as MONITOR/MONITORX */
+	return ES_OK;
+}
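/*
 * Worked example for the sign-extend path above (values illustrative):
 * a "movsbl" that reads the byte 0x80 from MMIO arrives here with
 * bytes == 1 and insn->opnd_bytes == 4, so
 *
 *	sign_byte = (0x80 & 0x80) ? 0xff : 0x00;	// 0xff
 *	memset(reg_data, 0xff, 4);			// reg: 0xffffffff
 *	memcpy(reg_data, ghcb->shared_buffer, 1);	// reg: 0xffffff80
 *
 * leaving the sign-extended value 0xffffff80 in the target register.
 */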
+
+static enum es_result vc_handle_vmmcall(struct ghcb *ghcb,
+					struct es_em_ctxt *ctxt)
+{
+	enum es_result ret;
+
+	ghcb_set_rax(ghcb, ctxt->regs->ax);
+	ghcb_set_cpl(ghcb, user_mode(ctxt->regs) ? 3 : 0);
+
+	if (x86_platform.hyper.sev_es_hcall_prepare)
+		x86_platform.hyper.sev_es_hcall_prepare(ghcb, ctxt->regs);
+
+	ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_VMMCALL, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (!ghcb_rax_is_valid(ghcb))
+		return ES_VMM_ERROR;
+
+	ctxt->regs->ax = ghcb->save.rax;
+
+	/*
+	 * Call sev_es_hcall_finish() after regs->ax is already set.
+	 * This allows the hypervisor handler to overwrite it again if
+	 * necessary.
+	 */
+	if (x86_platform.hyper.sev_es_hcall_finish &&
+	    !x86_platform.hyper.sev_es_hcall_finish(ghcb, ctxt->regs))
+		return ES_VMM_ERROR;
+
+	return ES_OK;
+}
+
+static enum es_result vc_handle_trap_ac(struct ghcb *ghcb,
+					struct es_em_ctxt *ctxt)
+{
+	/*
+	 * Calling exc_alignment_check() directly does not work, because it
+	 * enables IRQs and the GHCB is active. Forward the exception and call
+	 * it later from vc_forward_exception().
+	 */
+	ctxt->fi.vector = X86_TRAP_AC;
+	ctxt->fi.error_code = 0;
+	return ES_EXCEPTION;
+}
+
+static enum es_result vc_handle_exitcode(struct es_em_ctxt *ctxt,
+					 struct ghcb *ghcb,
+					 unsigned long exit_code)
+{
+	enum es_result result = vc_check_opcode_bytes(ctxt, exit_code);
+
+	if (result != ES_OK)
+		return result;
+
+	switch (exit_code) {
+	case SVM_EXIT_READ_DR7:
+		result = vc_handle_dr7_read(ghcb, ctxt);
+		break;
+	case SVM_EXIT_WRITE_DR7:
+		result = vc_handle_dr7_write(ghcb, ctxt);
+		break;
+	case SVM_EXIT_EXCP_BASE + X86_TRAP_AC:
+		result = vc_handle_trap_ac(ghcb, ctxt);
+		break;
+	case SVM_EXIT_RDTSC:
+	case SVM_EXIT_RDTSCP:
+		result = vc_handle_rdtsc(ghcb, ctxt, exit_code);
+		break;
+	case SVM_EXIT_RDPMC:
+		result = vc_handle_rdpmc(ghcb, ctxt);
+		break;
+	case SVM_EXIT_INVD:
+		pr_err_ratelimited("#VC exception for INVD??? Seriously???\n");
+		result = ES_UNSUPPORTED;
+		break;
+	case SVM_EXIT_CPUID:
+		result = vc_handle_cpuid(ghcb, ctxt);
+		break;
+	case SVM_EXIT_IOIO:
+		result = vc_handle_ioio(ghcb, ctxt);
+		break;
+	case SVM_EXIT_MSR:
+		result = vc_handle_msr(ghcb, ctxt);
+		break;
+	case SVM_EXIT_VMMCALL:
+		result = vc_handle_vmmcall(ghcb, ctxt);
+		break;
+	case SVM_EXIT_WBINVD:
+		result = vc_handle_wbinvd(ghcb, ctxt);
+		break;
+	case SVM_EXIT_MONITOR:
+		result = vc_handle_monitor(ghcb, ctxt);
+		break;
+	case SVM_EXIT_MWAIT:
+		result = vc_handle_mwait(ghcb, ctxt);
+		break;
+	case SVM_EXIT_NPF:
+		result = vc_handle_mmio(ghcb, ctxt);
+		break;
+	default:
+		/*
+		 * Unexpected #VC exception
+		 */
+		result = ES_UNSUPPORTED;
+	}
+
+	return result;
+}
+
+static __always_inline bool is_vc2_stack(unsigned long sp)
+{
+	return (sp >= __this_cpu_ist_bottom_va(VC2) && sp < __this_cpu_ist_top_va(VC2));
+}
+
+static __always_inline bool vc_from_invalid_context(struct pt_regs *regs)
+{
+	unsigned long sp, prev_sp;
+
+	sp = (unsigned long)regs;
+	prev_sp = regs->sp;
+
+	/*
+	 * If the code was already executing on the VC2 stack when the #VC
+	 * happened, let it proceed to the normal handling routine. This way the
+	 * code executing on the VC2 stack can cause #VC exceptions to get handled.
+	 */
+	return is_vc2_stack(sp) && !is_vc2_stack(prev_sp);
+}
+
+static bool vc_raw_handle_exception(struct pt_regs *regs, unsigned long error_code)
+{
+	struct ghcb_state state;
+	struct es_em_ctxt ctxt;
+	enum es_result result;
+	struct ghcb *ghcb;
+	bool ret = true;
+
+	ghcb = __sev_get_ghcb(&state);
+
+	vc_ghcb_invalidate(ghcb);
+	result = vc_init_em_ctxt(&ctxt, regs, error_code);
+
+	if (result == ES_OK)
+		result = vc_handle_exitcode(&ctxt, ghcb, error_code);
+
+	__sev_put_ghcb(&state);
+
+	/* Done - now check the result */
+	switch (result) {
+	case ES_OK:
+		vc_finish_insn(&ctxt);
+		break;
+	case ES_UNSUPPORTED:
+		pr_err_ratelimited("Unsupported exit-code 0x%02lx in #VC exception (IP: 0x%lx)\n",
+				   error_code, regs->ip);
+		ret = false;
+		break;
+	case ES_VMM_ERROR:
+		pr_err_ratelimited("Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
+				   error_code, regs->ip);
+		ret = false;
+		break;
+	case ES_DECODE_FAILED:
+		pr_err_ratelimited("Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
+				   error_code, regs->ip);
+		ret = false;
+		break;
+	case ES_EXCEPTION:
+		vc_forward_exception(&ctxt);
+		break;
+	case ES_RETRY:
+		/* Nothing to do */
+		break;
+	default:
+		pr_emerg("Unknown result in %s():%d\n", __func__, result);
+		/*
+		 * Emulating the instruction which caused the #VC exception
+		 * failed - can't continue so print debug information
+		 */
+		BUG();
+	}
+
+	return ret;
+}
+
+static __always_inline bool vc_is_db(unsigned long error_code)
+{
+	return error_code == SVM_EXIT_EXCP_BASE + X86_TRAP_DB;
+}
+
+/*
+ * Runtime #VC exception handler when raised from kernel mode. Runs in NMI mode
+ * and will panic when an error happens.
+ */
+DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
+{
+	irqentry_state_t irq_state;
+
+	/*
+	 * With the current implementation it is always possible to switch to a
+	 * safe stack because #VC exceptions only happen at known places, like
+	 * intercepted instructions or accesses to MMIO areas/IO ports. They can
+	 * also happen with code instrumentation when the hypervisor intercepts
+	 * #DB, but the critical paths are forbidden to be instrumented, so #DB
+	 * exceptions currently also only happen in safe places.
+	 *
+	 * But keep this here in case the noinstr annotations are violated due
+	 * to a bug elsewhere.
+	 */
+	if (unlikely(vc_from_invalid_context(regs))) {
+		instrumentation_begin();
+		panic("Can't handle #VC exception from unsupported context\n");
+		instrumentation_end();
+	}
+
+	/*
+	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
+	 */
+	if (vc_is_db(error_code)) {
+		exc_debug(regs);
+		return;
+	}
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	instrumentation_begin();
+
+	if (!vc_raw_handle_exception(regs, error_code)) {
+		/* Show some debug info */
+		show_regs(regs);
+
+		/* Ask hypervisor to sev_es_terminate */
+		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+		/* If that fails and we get here - just panic */
+		panic("Returned from Terminate-Request to Hypervisor\n");
+	}
+
+	instrumentation_end();
+	irqentry_nmi_exit(regs, irq_state);
+}
+
+/*
+ * Runtime #VC exception handler when raised from user mode. Runs in IRQ mode
+ * and will kill the current task with SIGBUS when an error happens.
+ */
+DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
+{
+	/*
+	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
+	 */
+	if (vc_is_db(error_code)) {
+		noist_exc_debug(regs);
+		return;
+	}
+
+	irqentry_enter_from_user_mode(regs);
+	instrumentation_begin();
+
+	if (!vc_raw_handle_exception(regs, error_code)) {
+		/*
+		 * Do not kill the machine if user-space triggered the
+		 * exception. Send SIGBUS instead and let user-space deal with
+		 * it.
+		 */
+		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+	}
+
+	instrumentation_end();
+	irqentry_exit_to_user_mode(regs);
+}
+
+bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
+{
+	unsigned long exit_code = regs->orig_ax;
+	struct es_em_ctxt ctxt;
+	enum es_result result;
+
+	vc_ghcb_invalidate(boot_ghcb);
+
+	result = vc_init_em_ctxt(&ctxt, regs, exit_code);
+	if (result == ES_OK)
+		result = vc_handle_exitcode(&ctxt, boot_ghcb, exit_code);
+
+	/* Done - now check the result */
+	switch (result) {
+	case ES_OK:
+		vc_finish_insn(&ctxt);
+		break;
+	case ES_UNSUPPORTED:
+		early_printk("PANIC: Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
+			     exit_code, regs->ip);
+		goto fail;
+	case ES_VMM_ERROR:
+		early_printk("PANIC: Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
+			     exit_code, regs->ip);
+		goto fail;
+	case ES_DECODE_FAILED:
+		early_printk("PANIC: Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
+			     exit_code, regs->ip);
+		goto fail;
+	case ES_EXCEPTION:
+		vc_early_forward_exception(&ctxt);
+		break;
+	case ES_RETRY:
+		/* Nothing to do */
+		break;
+	default:
+		BUG();
+	}
+
+	return true;
+
+fail:
+	show_regs(regs);
+
+	sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+}
+
+/*
+ * Initial setup of SNP relies on information provided by the
+ * Confidential Computing blob, which can be passed to the kernel
+ * in the following ways, depending on how it is booted:
+ *
+ * - when booted via the boot/decompress kernel:
+ *   - via boot_params
+ *
+ * - when booted directly by firmware/bootloader (e.g. CONFIG_PVH):
+ *   - via a setup_data entry, as defined by the Linux Boot Protocol
+ *
+ * Scan for the blob in that order.
+ */
+static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
+{
+	struct cc_blob_sev_info *cc_info;
+
+	/* Boot kernel would have passed the CC blob via boot_params. */
+	if (bp->cc_blob_address) {
+		cc_info = (struct cc_blob_sev_info *)(unsigned long)bp->cc_blob_address;
+		goto found_cc_info;
+	}
+
+	/*
+	 * If kernel was booted directly, without the use of the
+	 * boot/decompression kernel, the CC blob may have been passed via
+	 * setup_data instead.
+	 */
+	cc_info = find_cc_blob_setup_data(bp);
+	if (!cc_info)
+		return NULL;
+
+found_cc_info:
+	if (cc_info->magic != CC_BLOB_SEV_HDR_MAGIC)
+		snp_abort();
+
+	return cc_info;
+}
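/*
 * find_cc_blob_setup_data() is defined elsewhere; as a hedged sketch of
 * what such a scan looks like, assuming the standard setup_data linked
 * list and a SETUP_CC_BLOB entry type (the struct and field names here
 * are assumptions, not taken from this patch):
 *
 *	for (pa = bp->hdr.setup_data; pa; pa = hdr->next) {
 *		hdr = (struct setup_data *)pa;
 *		if (hdr->type == SETUP_CC_BLOB)
 *			return (struct cc_blob_sev_info *)(unsigned long)
 *			       ((struct cc_setup_data *)hdr)->cc_blob_address;
 *	}
 *	return NULL;
 */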
+
+static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
+{
+	struct svsm_call call = {};
+	int ret;
+	u64 pa;
+
+	/*
+	 * Record the SVSM Calling Area address (CAA) if the guest is not
+	 * running at VMPL0. The CA will be used to communicate with the
+	 * SVSM to perform the SVSM services.
+	 */
+	if (!svsm_setup_ca(cc_info))
+		return;
+
+	/*
+	 * It is very early in the boot and the kernel is running identity
+	 * mapped but without having adjusted the pagetables to where the
+	 * kernel was loaded (physbase), so get the CA address using
+	 * RIP-relative addressing.
+	 */
+	pa = (u64)rip_rel_ptr(&boot_svsm_ca_page);
+
+	/*
+	 * Switch over to the boot SVSM CA while the current CA is still
+	 * addressable. There is no GHCB at this point so use the MSR protocol.
+	 *
+	 * SVSM_CORE_REMAP_CA call:
+	 *   RAX = 0 (Protocol=0, CallID=0)
+	 *   RCX = New CA GPA
+	 */
+	call.caa = svsm_get_caa();
+	call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
+	call.rcx = pa;
+	ret = svsm_perform_call_protocol(&call);
+	if (ret)
+		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
+
+	RIP_REL_REF(boot_svsm_caa) = (struct svsm_ca *)pa;
+	RIP_REL_REF(boot_svsm_caa_pa) = pa;
+}
+
+bool __head snp_init(struct boot_params *bp)
+{
+	struct cc_blob_sev_info *cc_info;
+
+	if (!bp)
+		return false;
+
+	cc_info = find_cc_blob(bp);
+	if (!cc_info)
+		return false;
+
+	if (cc_info->secrets_phys && cc_info->secrets_len == PAGE_SIZE)
+		secrets_pa = cc_info->secrets_phys;
+	else
+		return false;
+
+	setup_cpuid_table(cc_info);
+
+	svsm_setup(cc_info);
+
+	/*
+	 * The CC blob will be used later to access the secrets page. Cache
+	 * it here like the boot kernel does.
+	 */
+	bp->cc_blob_address = (u32)(unsigned long)cc_info;
+
+	return true;
+}
+
+void __head __noreturn snp_abort(void)
+{
+	sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+}
-- 
2.49.0.504.g3bcea36a83-goog
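A note on the RAX encoding used by the remap call above: the
"RAX = 0 (Protocol=0, CallID=0)" comment implies that SVSM_CORE_CALL()
packs a 32-bit protocol number and a call ID into a single register value,
presumably along these lines (a sketch consistent with that comment, not
the verbatim macro definitions):

	#define SVSM_CORE_CALL(x)	(((u64)0 << 32) | (x))	/* core protocol is 0 */
	#define SVSM_CORE_REMAP_CA	0

so call.rax evaluates to 0 for the remap call, exactly as the comment in
svsm_setup() states.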
From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:27 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-22-ardb+git@google.com>
Subject: [PATCH v4 09/11] x86/boot: Move SEV startup code into startup/
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

Move the SEV startup code into arch/x86/boot/startup/, where it will
reside along with other code that executes extremely early, and
therefore needs to be built in a special manner.
Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/compressed/sev.c                               |  2 +-
 arch/x86/boot/startup/Makefile                               |  2 +-
 arch/x86/{coco/sev/shared.c => boot/startup/sev-shared.c}    |  0
 arch/x86/{coco/sev/startup.c => boot/startup/sev-startup.c}  |  2 +-
 arch/x86/coco/sev/Makefile                                   | 21 +--------------------
 5 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 714e30c66eae..478c65149cf0 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -144,7 +144,7 @@ int svsm_perform_call_protocol(struct svsm_call *call);
 u8 snp_vmpl;
 
 /* Include code for early handlers */
-#include "../../coco/sev/shared.c"
+#include "../../boot/startup/sev-shared.c"
 
 int svsm_perform_call_protocol(struct svsm_call *call)
 {
diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index ccdfc42a4d59..b56facb9091a 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -16,7 +16,7 @@ UBSAN_SANITIZE	:= n
 KCOV_INSTRUMENT	:= n
 
 obj-$(CONFIG_X86_64)		+= gdt_idt.o map_kernel.o
-obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= sme.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= sme.o sev-startup.o
 
 lib-$(CONFIG_X86_64)		+= la57toggle.o
 lib-$(CONFIG_EFI_MIXED)		+= efi-mixed.o
diff --git a/arch/x86/coco/sev/shared.c b/arch/x86/boot/startup/sev-shared.c
similarity index 100%
rename from arch/x86/coco/sev/shared.c
rename to arch/x86/boot/startup/sev-shared.c
diff --git a/arch/x86/coco/sev/startup.c b/arch/x86/boot/startup/sev-startup.c
similarity index 99%
rename from arch/x86/coco/sev/startup.c
rename to arch/x86/boot/startup/sev-startup.c
index 9f5dc70cfb44..10b636009d1c 100644
--- a/arch/x86/coco/sev/startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -422,7 +422,7 @@ static __always_inline void vc_forward_exception(struct es_em_ctxt *ctxt)
 }
 
 /* Include code shared with pre-decompression boot stage */
-#include "shared.c"
+#include "sev-shared.c"
 
 noinstr void __sev_put_ghcb(struct ghcb_state *state)
 {
diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
index 7d7d2aee62f0..b89ba3fba343 100644
--- a/arch/x86/coco/sev/Makefile
+++ b/arch/x86/coco/sev/Makefile
@@ -1,22 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 
-obj-y += core.o startup.o
-
-# jump tables are emitted using absolute references in non-PIC code
-# so they cannot be used in the early SEV startup code
-CFLAGS_startup.o += -fno-jump-tables
-
-ifdef CONFIG_FUNCTION_TRACER
-CFLAGS_REMOVE_startup.o = -pg
-endif
-
-KASAN_SANITIZE_startup.o	:= n
-KMSAN_SANITIZE_startup.o	:= n
-KCOV_INSTRUMENT_startup.o	:= n
-
-# With some compiler versions the generated code results in boot hangs, caused
-# by several compilation units. To be safe, disable all instrumentation.
-KCSAN_SANITIZE := n
-
-# Clang 14 and older may fail to respect __no_sanitize_undefined when inlining
-UBSAN_SANITIZE := n
+obj-y += core.o
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:28 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-23-ardb+git@google.com>
Subject: [PATCH v4 10/11] x86/boot: Drop RIP_REL_REF() uses from early SEV code
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

Now that the early SEV code is built with -fPIC, RIP_REL_REF() has no
effect and can be dropped.
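The "no effect" claim is visible in the final patch of this series: with
-fPIC in effect, __pic__ is defined and the preprocessor selects the
trivial definition of the macro, so the two spellings below compile to
the same RIP-relative access:

	#define RIP_REL_REF(var)	(var)	/* the __pic__ definition */

	if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED))	/* before */
	if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))			/* after */

which is what makes this patch a purely textual cleanup.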
Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/startup/sev-shared.c  | 26 +++++++++-----------
 arch/x86/boot/startup/sev-startup.c | 16 ++++++------
 arch/x86/include/asm/sev-internal.h | 18 +++-----------
 3 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 815542295f16..173f3d1f777a 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -299,7 +299,7 @@ static int svsm_perform_ghcb_protocol(struct ghcb *ghcb, struct svsm_call *call)
 	 * Fill in protocol and format specifiers. This can be called very early
 	 * in the boot, so use rip-relative references as needed.
 	 */
-	ghcb->protocol_version = RIP_REL_REF(ghcb_version);
+	ghcb->protocol_version = ghcb_version;
 	ghcb->ghcb_usage       = GHCB_DEFAULT_USAGE;
 
 	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_SNP_RUN_VMPL);
@@ -656,9 +656,9 @@ snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
 		leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
 
 		/* Skip post-processing for out-of-range zero leafs. */
-		if (!(leaf->fn <= RIP_REL_REF(cpuid_std_range_max) ||
-		      (leaf->fn >= 0x40000000 && leaf->fn <= RIP_REL_REF(cpuid_hyp_range_max)) ||
-		      (leaf->fn >= 0x80000000 && leaf->fn <= RIP_REL_REF(cpuid_ext_range_max))))
+		if (!(leaf->fn <= cpuid_std_range_max ||
+		      (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
+		      (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
 			return 0;
 	}
 
@@ -1179,11 +1179,11 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
 		const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
 
 		if (fn->eax_in == 0x0)
-			RIP_REL_REF(cpuid_std_range_max) = fn->eax;
+			cpuid_std_range_max = fn->eax;
 		else if (fn->eax_in == 0x40000000)
-			RIP_REL_REF(cpuid_hyp_range_max) = fn->eax;
+			cpuid_hyp_range_max = fn->eax;
 		else if (fn->eax_in == 0x80000000)
-			RIP_REL_REF(cpuid_ext_range_max) = fn->eax;
+			cpuid_ext_range_max = fn->eax;
 	}
 }
 
@@ -1229,11 +1229,7 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
 {
 	int ret;
 
-	/*
-	 * This can be called very early during boot, so use rIP-relative
-	 * references as needed.
-	 */
-	if (RIP_REL_REF(snp_vmpl)) {
+	if (snp_vmpl) {
 		svsm_pval_4k_page(paddr, validate);
 	} else {
 		ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
@@ -1377,7 +1373,7 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
 	if (!secrets_page->svsm_guest_vmpl)
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_VMPL0);
 
-	RIP_REL_REF(snp_vmpl) = secrets_page->svsm_guest_vmpl;
+	snp_vmpl = secrets_page->svsm_guest_vmpl;
 
 	caa = secrets_page->svsm_caa;
 
@@ -1392,8 +1388,8 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
 	 * The CA is identity mapped when this routine is called, both by the
 	 * decompressor code and the early kernel code.
 	 */
-	RIP_REL_REF(boot_svsm_caa) = (struct svsm_ca *)caa;
-	RIP_REL_REF(boot_svsm_caa_pa) = caa;
+	boot_svsm_caa = (struct svsm_ca *)caa;
+	boot_svsm_caa_pa = caa;
 
 	/* Advertise the SVSM presence via CPUID. */
 	cpuid_table = (struct snp_cpuid_table *)snp_cpuid_get_table();
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 10b636009d1c..e376a340b629 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -467,10 +467,10 @@ int svsm_perform_call_protocol(struct svsm_call *call)
 	 * ghcbs_initialized is set, then it is late in the boot and no need
 	 * to worry about rip-relative references in called functions.
 	 */
-	if (RIP_REL_REF(sev_cfg).ghcbs_initialized)
+	if (sev_cfg.ghcbs_initialized)
 		ghcb = __sev_get_ghcb(&state);
-	else if (RIP_REL_REF(boot_ghcb))
-		ghcb = RIP_REL_REF(boot_ghcb);
+	else if (boot_ghcb)
+		ghcb = boot_ghcb;
 	else
 		ghcb = NULL;
 
@@ -479,7 +479,7 @@ int svsm_perform_call_protocol(struct svsm_call *call)
 		    : svsm_perform_msr_protocol(call);
 	} while (ret == -EAGAIN);
 
-	if (RIP_REL_REF(sev_cfg).ghcbs_initialized)
+	if (sev_cfg.ghcbs_initialized)
 		__sev_put_ghcb(&state);
 
 	native_local_irq_restore(flags);
@@ -542,7 +542,7 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
 	 * This eliminates worries about jump tables or checking boot_cpu_data
 	 * in the cc_platform_has() function.
 	 */
-	if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED))
+	if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
 		return;
 
 	/*
@@ -561,7 +561,7 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
 	 * This eliminates worries about jump tables or checking boot_cpu_data
 	 * in the cc_platform_has() function.
 	 */
-	if (!(RIP_REL_REF(sev_status) & MSR_AMD64_SEV_SNP_ENABLED))
+	if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
 		return;
 
 	/* Ask hypervisor to mark the memory pages shared in the RMP table. */
@@ -1356,8 +1356,8 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
 	if (ret)
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
 
-	RIP_REL_REF(boot_svsm_caa) = (struct svsm_ca *)pa;
-	RIP_REL_REF(boot_svsm_caa_pa) = pa;
+	boot_svsm_caa = (struct svsm_ca *)pa;
+	boot_svsm_caa_pa = pa;
 }
 
 bool __head snp_init(struct boot_params *bp)
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 73cb774c3639..e54847a69107 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -68,28 +68,18 @@ extern u64 boot_svsm_caa_pa;
 
 static __always_inline struct svsm_ca *svsm_get_caa(void)
 {
-	/*
-	 * Use rIP-relative references when called early in the boot. If
-	 * ->use_cas is set, then it is late in the boot and no need
-	 * to worry about rIP-relative references.
-	 */
-	if (RIP_REL_REF(sev_cfg).use_cas)
+	if (sev_cfg.use_cas)
 		return this_cpu_read(svsm_caa);
 	else
-		return RIP_REL_REF(boot_svsm_caa);
+		return boot_svsm_caa;
 }
 
 static __always_inline u64 svsm_get_caa_pa(void)
 {
-	/*
-	 * Use rIP-relative references when called early in the boot. If
-	 * ->use_cas is set, then it is late in the boot and no need
-	 * to worry about rIP-relative references.
-	 */
-	if (RIP_REL_REF(sev_cfg).use_cas)
+	if (sev_cfg.use_cas)
 		return this_cpu_read(svsm_caa_pa);
 	else
-		return RIP_REL_REF(boot_svsm_caa_pa);
+		return boot_svsm_caa_pa;
 }
 
 int svsm_perform_call_protocol(struct svsm_call *call);
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Thu Dec 18 13:15:19 2025
Date: Thu, 10 Apr 2025 15:41:29 +0200
In-Reply-To: <20250410134117.3713574-13-ardb+git@google.com>
References: <20250410134117.3713574-13-ardb+git@google.com>
Message-ID: <20250410134117.3713574-24-ardb+git@google.com>
Subject: [PATCH v4 11/11] x86/asm: Retire RIP_REL_REF()
From: Ard Biesheuvel
To: linux-efi@vger.kernel.org
Cc: x86@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org, Ard Biesheuvel, Tom Lendacky, Dionna Amalie Glaze, Kevin Loughlin

From: Ard Biesheuvel

Now that all users have been moved into startup/ where PIC codegen is
used, RIP_REL_REF() is no longer needed. Remove it.

Signed-off-by: Ard Biesheuvel
---
 arch/x86/include/asm/asm.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index a9f07799e337..eef0771512de 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -120,11 +120,6 @@ static __always_inline __pure void *rip_rel_ptr(void *p)
 
 	return p;
 }
-#ifndef __pic__
-#define RIP_REL_REF(var)	(*(typeof(&(var)))rip_rel_ptr(&(var)))
-#else
-#define RIP_REL_REF(var)	(var)
-#endif
 #endif
 
 /*
-- 
2.49.0.504.g3bcea36a83-goog
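What survives the series is rip_rel_ptr() itself: PIC code may still need
to take the address of a global through the mapping it currently executes
from, before the kernel virtual mapping is up. The remaining usage
pattern, as seen in svsm_setup() earlier in the thread (a sketch; only
the call shown appears verbatim in the series):

	u64 pa = (u64)rip_rel_ptr(&boot_svsm_ca_page);	/* address via the current (identity) mapping */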