Date: Wed, 25 Feb 2026 22:07:14 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.53.0.414.gf7e9f6c205-goog
Message-ID: <20260226060714.1636773-1-changyuanl@google.com>
Subject: [RFC] x86/boot: Fix early boot SEV-SNP panic in direct kernel boot
From: Changyuan Lyu
To: Ard Biesheuvel, Borislav Petkov
Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Tom Lendacky, Kishon Vijay Abraham I,
 Neeraj Upadhyay, linux-kernel@vger.kernel.org, Changyuan Lyu

Hi all,

I'm writing to report a regression introduced by commit 68a501d7fd82
("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check") and to
request feedback on the best approach to fix it.

== The Bug ==

Commit 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence
check") introduced a regression that causes SEV-SNP guests to panic during
early boot under specific booting conditions.

By design, snp_vmpl should only be assigned a non-zero value when a Secure VM
Service Module (SVSM) is enabled and the guest is running at a VMPL other than
0. The commit refactored the VMPL0 enforcement check in sev_enable() to rely
exclusively on this variable:

    if (snp_vmpl && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
        sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);

The panic manifests only when there is *no* SVSM present and the kernel is
booted via a direct kernel boot rather than the EFI stub.

When booting via the EFI stub (drivers/firmware/efi/libstub/x86-stub.c), the
environment and zeroed memory are set up by the EFI loader before calling
sev_enable().

However, lightweight firmware such as Project Oak's stage0
(https://github.com/project-oak/oak/tree/main/stage0_bin) jumps straight
to the kernel's 64-bit entry point following the 64-bit Linux boot protocol,
bypassing the EFI stub entirely.
During this direct boot path, head_64.S calls sev_enable() very early in the
compressed kernel boot sequence, well before the .bss section is cleared by
the rep stosq routine in .Lrelocated.

Because snp_vmpl is declared as an uninitialized global (u8 snp_vmpl;), it is
placed in the .bss section. When sev_enable() reads it during a direct boot,
the memory contains uninitialized garbage data. If this garbage data happens
to be non-zero, the kernel erroneously assumes it is running at a non-zero
VMPL. Because there is no SVSM present, the guest forcefully terminates
itself.

== Reproduction ==

The issue was reproduced and tested on an AMD EPYC 7B13 64-Core Processor.
The stage0_sev.bin firmware used for testing can be built from
https://github.com/project-oak/oak/ via:

    $ bazel build //stage0_bin:stage0_bin

1. Reproducing with QEMU:

    $ ./qemu-system-x86_64 -nodefaults -nographic -vga none \
        -M q35,confidential-guest-support=cgs \
        -accel kvm,kernel-irqchip=split \
        -bios stage0_sev.bin \
        -append "console=ttyS0" \
        -initrd initramfs.linux_amd64.cpio \
        -kernel ./vmlinuz-x86 \
        -m size=1024m \
        -smp 2 \
        -serial stdio \
        -cpu host,x2apic \
        -object sev-snp-guest,id=cgs,cbitpos=51,reduced-phys-bits=1

QEMU panic log:

    stage0 INFO: jumping to kernel at 0x0000000002000200
    EAX=00000000 EBX=00000000 ECX=00000000 EDX=00a00f11
    ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
    EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ...
    Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 6d f0 00 00 00 00 00 00 00 00 00 00 00 00 00 ?? ?? ?? ?? ?? ?? ??

2. Reproducing with Alioth (https://github.com/google/alioth):

    $ ./alioth --log-to-file -l trace boot \
        --cmdline "console=ttyS0" \
        --kernel ./vmlinuz-x86 \
        --cpu count=2 \
        --initramfs initramfs.linux_amd64.cpio \
        --memory size=1g,backend=memfd \
        --coco snp,policy=0x30000 \
        --firmware stage0_sev.bin

Alioth panic log:

    stage0 INFO: jumping to kernel at 0x0000000002000200
    Error: VM did not shutdown peacefully
    0: Failed to handle VM exit: KvmRunExitSystemEvent { type_: KvmSystemEvent(0x6), flags: 0x31100, }, at alioth/src/hv/kvm/vcpu/vmexit.rs:84:14
    1: Failed to run VCPU-0, at /alioth/src/board/board.rs:381:46
    2: VCPU-0 error, at alioth/src/vm/vm.rs:275:25
    3: VM did not shutdown peacefully, at alioth-cli/src/boot/boot.rs:474:15

== A simple fix ==

I tried moving snp_vmpl to .data:

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c8c1464b3a56e..a1a1cb47e7b93 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -35,7 +35,7 @@ struct ghcb *boot_ghcb;
 
 #define __BOOT_COMPRESSED
 
-u8 snp_vmpl;
+u8 snp_vmpl __section(".data") = 0;
 u16 ghcb_version;
 
 u64 boot_svsm_caa_pa;
--

I tested this approach locally, and both VMMs can boot the kernel
successfully. However, Gemini identified that this approach breaks SVSM
guests due to how the decompressor handles the .bss section during
relocation. Below is its analysis.

== The Complication (.bss wiping) ==

The seemingly obvious fix is to move `snp_vmpl` to `.data` (e.g.,
`u8 snp_vmpl __section(".data") = 0;`). However, doing this alone breaks SVSM
guests due to how the decompressor handles the `.bss` section during
relocation.

At .Lrelocated in arch/x86/boot/compressed/head_64.S, .bss is wiped to 0.
Currently, for SVSM guests, both snp_vmpl and boot_svsm_caa_pa (which are
populated in sev_enable()) are wiped to 0.
The kernel accidentally survives this wipe because extract_kernel() later
calls early_is_sevsnp_guest(), which contains a fallback:

    if (!snp_vmpl) {
        /* ... CPUID checks ... */
        raw_rdmsr(MSR_SVSM_CAA, &m);
        boot_svsm_caa_pa = m.q;
        snp_vmpl = U8_MAX;
    }

Because snp_vmpl was wiped to 0, this fallback triggers and successfully
recovers the physical address of the SVSM Calling Area into boot_svsm_caa_pa.

If we move *only* snp_vmpl to .data, it survives the wipe (e.g., snp_vmpl = 1).
But boot_svsm_caa_pa is still in .bss and gets wiped to 0. The fallback in
early_is_sevsnp_guest() is skipped (since snp_vmpl != 0), leaving
boot_svsm_caa_pa == 0. Shortly after, when extract_kernel() attempts to accept
memory, the guest crashes when it tries to use physical address 0 for the SVSM
CAA.

== Proposed Solutions ==

I reviewed the AI's analysis above to the best of my ability, and I think it
is correct. But I do not have an SVSM environment to test it. To safely
resolve this, we have two options. I'd like to ask the maintainers which
approach is preferred:

Option 1: Revert commit 68a501d7fd82

By reverting the commit and bringing back the RMPADJUST check, we avoid
reading uninitialized .bss memory to determine the VMPL level. This sidesteps
the .bss initialization order issue entirely.

Option 2: Move early SEV variables to .data (Proposed by Gemini)

We can explicitly move snp_vmpl, boot_svsm_caa_pa, and ghcb_version to .data:

    u8 snp_vmpl __section(".data") = 0;
    u16 ghcb_version __section(".data") = 0;
    u64 boot_svsm_caa_pa __section(".data") = 0;

This protects them from the garbage read during direct boot, and preserves
their initialized SVSM state across the .bss wipe at .Lrelocated, so the boot
no longer depends on the accidental MSR fallback recovery.

Does anyone have a preference between reverting the original commit versus
moving the affected global variables to .data?
Or please let me know if the AI's alert is a false positive. I am happy to
submit a formal patch for whichever route is preferred.

Thanks,
Changyuan Lyu