From nobody Mon Feb 9 07:20:26 2026
Reply-To: Sean Christopherson
Date: Fri, 31 Jan 2025 16:50:47 -0800
In-Reply-To: <20250201005048.657470-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250201005048.657470-1-seanjc@google.com>
X-Mailer: git-send-email 2.48.1.362.g079036d154-goog
Message-ID: <20250201005048.657470-2-seanjc@google.com>
Subject: [PATCH 1/2] x86/mtrr: Return success vs. "failure" from guest_force_mtrr_state()
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 Sean Christopherson, Dionna Glaze, Peter Gonda, Jürgen Groß,
 Kirill Shutemov, Vitaly Kuznetsov, "H . Peter Anvin", Binbin Wu,
 Tom Lendacky

When *potentially* forcing MTRRs to a single memory type, return whether
or not MTRRs were indeed overridden so that the caller can take additional
action when necessary. E.g. KVM-as-a-guest will use the information to
also force the PAT memtype for legacy devices to be WB.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/mtrr.h        |  5 +++--
 arch/x86/kernel/cpu/mtrr/generic.c | 11 +++++++----
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index c69e269937c5..598753189f19 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -58,7 +58,7 @@ struct mtrr_state_type {
  */
 # ifdef CONFIG_MTRR
 void mtrr_bp_init(void);
-void guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
+bool guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
 			    mtrr_type def_type);
 extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
@@ -75,10 +75,11 @@ void mtrr_disable(void);
 void mtrr_enable(void);
 void mtrr_generic_set_state(void);
 # else
-static inline void guest_force_mtrr_state(struct mtrr_var_range *var,
+static inline bool guest_force_mtrr_state(struct mtrr_var_range *var,
 					  unsigned int num_var,
 					  mtrr_type def_type)
 {
+	return false;
 }
 
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 2fdfda2b60e4..4fd704907dbc 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -435,19 +435,21 @@ void __init mtrr_copy_map(void)
  * @var: MTRR variable range array to use
  * @num_var: length of the @var array
  * @def_type: default caching type
+ *
+ * Returns %true if MTRRs were overridden, %false if they were not.
  */
-void guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
+bool guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
 			    mtrr_type def_type)
 {
 	unsigned int i;
 
 	/* Only allowed to be called once before mtrr_bp_init(). */
 	if (WARN_ON_ONCE(mtrr_state_set))
-		return;
+		return false;
 
 	/* Only allowed when running virtualized. */
 	if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
-		return;
+		return false;
 
 	/*
 	 * Only allowed for special virtualization cases:
@@ -460,7 +462,7 @@ void guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
 	    !hv_is_isolation_supported() &&
 	    !cpu_feature_enabled(X86_FEATURE_XENPV) &&
 	    !cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
-		return;
+		return false;
 
 	/* Disable MTRR in order to disable MTRR modifications. */
 	setup_clear_cpu_cap(X86_FEATURE_MTRR);
@@ -480,6 +482,7 @@ void guest_force_mtrr_state(struct mtrr_var_range *var, unsigned int num_var,
 	mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED;
 
 	mtrr_state_set = 1;
+	return true;
 }
 
 static u8 type_merge(u8 type, u8 new_type, u8 *uniform)
-- 
2.48.1.362.g079036d154-goog

From nobody Mon Feb 9 07:20:26 2026
Reply-To: Sean Christopherson
Date: Fri, 31 Jan 2025 16:50:48 -0800
In-Reply-To: <20250201005048.657470-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20250201005048.657470-1-seanjc@google.com>
X-Mailer: git-send-email 2.48.1.362.g079036d154-goog
Message-ID: <20250201005048.657470-3-seanjc@google.com>
Subject: [PATCH 2/2] x86/kvm: Override low memory above TOLUD to WB when MTRRs are forced WB
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 Sean Christopherson, Dionna Glaze, Peter Gonda, Jürgen Groß,
 Kirill Shutemov, Vitaly Kuznetsov, "H . Peter Anvin", Binbin Wu,
 Tom Lendacky

When running as an SNP or TDX guest under KVM, treat the legacy PCI hole,
i.e. memory between Top of Lower Usable DRAM and 4GiB, as an untracked PAT
range to work around issues with mapping legacy devices when MTRRs are
forced to WB.

In most KVM-based setups, legacy devices such as the HPET and TPM are
enumerated via ACPI. For unknown reasons, ACPI auto-maps such devices as
WB, whereas the dedicated device drivers map memory as WC or UC. In normal
setups, the entire mess "works" as firmware configures the PCI hole (and
other device memory) to be UC in the MTRRs. As a result, the ACPI mappings
end up UC, which is compatible with the drivers' requested WC/UC-.

With WB MTRRs, the ACPI mappings get their requested WB.
If acpi_init() runs before the corresponding device driver is probed,
ACPI's WB mapping will "win", and result in the driver's ioremap() failing
because the existing WB mapping isn't compatible with the requested WC/UC-.

E.g. when a TPM is emulated by the hypervisor (ignoring the security
implications of relying on what is allegedly an untrusted entity to store
measurements), the TPM driver will request UC and fail:

  [ 1.730459] ioremap error for 0xfed40000-0xfed45000, requested 0x2, got 0x0
  [ 1.732780] tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -12

Note, the '0x2' and '0x0' values refer to "enum page_cache_mode", not x86's
memtypes (which frustratingly are an almost pure inversion; 2 == WB,
0 == UC).

The above trace is from a Google-VMM based VM, but the same behavior
happens with a QEMU based VM. E.g. tracing mapping requests for HPET under
QEMU yields:

  Mapping HPET, req_type = 0
  WARNING: CPU: 5 PID: 1 at arch/x86/mm/pat/memtype.c:528 memtype_reserve+0x22f/0x3f0
  Call Trace:
   __ioremap_caller.constprop.0+0xd6/0x330
   acpi_os_map_iomem+0x195/0x1b0
   acpi_ex_system_memory_space_handler+0x11c/0x2f0
   acpi_ev_address_space_dispatch+0x168/0x3b0
   acpi_ex_access_region+0xd7/0x280
   acpi_ex_field_datum_io+0x73/0x210
   acpi_ex_extract_from_field+0x267/0x2a0
   acpi_ex_read_data_from_field+0x8e/0x220
   acpi_ex_resolve_node_to_value+0xe2/0x2b0
   acpi_ds_evaluate_name_path+0xa9/0x100
   acpi_ds_exec_end_op+0x21f/0x4c0
   acpi_ps_parse_loop+0xf4/0x670
   acpi_ps_parse_aml+0x17b/0x3d0
   acpi_ps_execute_method+0x137/0x260
   acpi_ns_evaluate+0x1f0/0x2d0
   acpi_evaluate_object+0x13d/0x2e0
   acpi_evaluate_integer+0x50/0xe0
   acpi_bus_get_status+0x7b/0x140
   acpi_add_single_object+0x3f8/0x750
   acpi_bus_check_add+0xb2/0x340
   acpi_ns_walk_namespace+0xfe/0x200
   acpi_walk_namespace+0xbb/0xe0
   acpi_bus_scan+0x1b5/0x1d0
   acpi_scan_init+0xd5/0x290
   acpi_init+0x1fc/0x520
   do_one_initcall+0x41/0x1d0
   kernel_init_freeable+0x164/0x260
   kernel_init+0x16/0x1a0
   ret_from_fork+0x2d/0x50
   ret_from_fork_asm+0x11/0x20
  ---[ end trace 0000000000000000 ]---

The only reason this doesn't cause problems for HPET is because HPET gets
special treatment via x86_init.timers.timer_init(), and so gets a chance to
create its UC- mapping before acpi_init() clobbers things. Disabling the
early call to hpet_time_init() yields the same behavior for HPET:

  [ 0.318264] ioremap error for 0xfed00000-0xfed01000, requested 0x2, got 0x0

Hack around the mess by forcing such mappings to WB, as the memory type is
irrelevant. Even in a theoretical setup where such devices are passed
through by the host, i.e. point at real MMIO memory, it is KVM's (as the
hypervisor) responsibility to force the memory to be WC/UC, e.g. via EPT
memtype under TDX or real hardware MTRRs under SNP. Not doing so cannot
work, and the hypervisor is highly motivated to do the right thing as
letting the guest access hardware MMIO with WB would likely result in a
variety of fatal #MCs.

Limit the hack to the legacy PCI hole on the off chance that there are use
cases that want to map virtual devices with WC/UC. E.g. in theory, it
would be possible to expose hardware GPU buffers to an SNP or TDX guest.
Extending the hack, e.g. if there are use cases for memory above 4GiB that
are affected by ACPI, is far easier than debugging memory corruption if a
driver requests WC/UC and silently gets WB.

Double down on forcing everything to WB, e.g. instead of fixing the CR0.CD
issue and reverting to a "normal" model, as OVMF has also been taught to
ignore MTRRs when running as a TDX guest:

  3a3b12cbda ("UefiCpuPkg/MtrrLib: MtrrLibIsMtrrSupported always return FALSE in TD-Guest")
  071d2cfab8 ("OvmfPkg/Sec: Skip setup MTRR early in TD-Guest")

And running with firmware that doesn't program MTRRs would likely put the
kernel back into the conundrum of ACPI mapping devices WB, with drivers
wanting WC/UC-.

Fixes: 8e690b817e38 ("x86/kvm: Override default caching mode for SEV-SNP and TDX")
Cc: stable@vger.kernel.org
Cc: Dionna Glaze
Cc: Peter Gonda
Cc: Jürgen Groß
Cc: Kirill Shutemov
Cc: Vitaly Kuznetsov
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: H. Peter Anvin
Cc: Binbin Wu
Cc: Tom Lendacky
Signed-off-by: Sean Christopherson
---
 arch/x86/kernel/kvm.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7a422a6c5983..7ae294fe99c3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -931,6 +931,23 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
 			   KVM_MAP_GPA_RANGE_ENC_STAT(enc) | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
 }
 
+static u64 kvm_tolud __ro_after_init;
+
+static bool kvm_is_forced_wb_range(u64 start, u64 end)
+{
+	/*
+	 * In addition to the standard ISA override, force all low memory above
+	 * TOLUD to WB so that legacy devices are mapped with WB when running
+	 * as an SNP or TDX guest. The memtype itself is completely irrelevant
+	 * as the devices are emulated, the override^Whack is needed purely to
+	 * avoid failures due to ACPI mapping device memory as WB in advance of
+	 * device drivers requesting WC or UC. In a system with MTRRs, ACPI's
+	 * mappings get forced to UC via MTRRs (programmed sanely by firmware).
+	 */
+	return is_ISA_range(start, end) ||
+	       (start >= kvm_tolud && end <= SZ_4G);
+}
+
 static void __init kvm_init_platform(void)
 {
 	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
@@ -982,8 +999,18 @@ static void __init kvm_init_platform(void)
 	kvmclock_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 
-	/* Set WB as the default cache mode for SEV-SNP and TDX */
-	guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);
+	/*
+	 * Set WB as the default cache mode for SEV-SNP and TDX. MTRRs may be
+	 * enumerated as supported, but neither the TDX-Module (Secure EPT) nor
+	 * KVM (normal EPT for TDX, virtual MTRRs for NPT) actually virtualizes
+	 * MTRR memory types. If MTRRs are forced to writeback, register KVM's
+	 * range-based WB override to handle cases where device drivers try to
+	 * map an emulated device's memory as WC, and fail because it's all WB.
+	 */
+	if (guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK)) {
+		kvm_tolud = (e820__end_of_low_ram_pfn() << PAGE_SHIFT);
+		x86_platform.is_untracked_pat_range = kvm_is_forced_wb_range;
+	}
 }
 
 #if defined(CONFIG_AMD_MEM_ENCRYPT)
-- 
2.48.1.362.g079036d154-goog
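
For anyone who wants to poke at the range check outside the kernel, the
predicate added to kvm.c can be modeled in userspace. This is only a sketch
under stated assumptions: the model_* names are hypothetical, the ISA window
of 0xa0000 up to (but excluding) 0x100000 is assumed to match what
is_ISA_range() accepts, and the 2 GiB TOLUD is an arbitrary example value
rather than anything taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; none of these names exist in the kernel. */
#define MODEL_ISA_START	0x000a0000ULL	/* assumed start of the legacy ISA window */
#define MODEL_ISA_END	0x00100000ULL	/* assumed exclusive end of the ISA window */
#define MODEL_SZ_4G	0x100000000ULL	/* 4 GiB, the top of the legacy PCI hole */

/* Hypothetical TOLUD for a guest with 2 GiB of RAM below 4 GiB. */
static uint64_t model_tolud = 0x80000000ULL;

/* Stand-in for is_ISA_range(): the whole range must sit in the ISA window. */
static bool model_is_isa_range(uint64_t start, uint64_t end)
{
	return start >= MODEL_ISA_START && end <= MODEL_ISA_END;
}

/*
 * Same shape as kvm_is_forced_wb_range() in the patch: accept the ISA
 * window, plus any range that lies entirely between TOLUD and 4 GiB.
 */
static bool model_is_forced_wb_range(uint64_t start, uint64_t end)
{
	return model_is_isa_range(start, end) ||
	       (start >= model_tolud && end <= MODEL_SZ_4G);
}
```

With these example values, the TPM window from the log in the changelog
(0xfed40000-0xfed45000) is treated as untracked, while a range that
straddles TOLUD or sits above 4GiB is not.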