From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C2B42C159A for ; Thu, 30 Apr 2026 15:07:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561683; cv=none; b=oQ43gBpDHUULcCZNr2Upa4Pzp2Y3Kk04ZYxSYyHq0ruxZAWypTNABRAN5oWEtU6P+epeA+dHAG9yu9xnMmAvjLsio2ZsFLqOMzQ0J+P6crL5vqWmKKDvhyopm43x4VRUSilAGAeHKzGh6imp+U9Yqyl2Oy5EO/o4Z/Al37uioTI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561683; c=relaxed/simple; bh=qN6uNsXXrdxPWeiUyZUCQi9CQmefkvVmCbWD7S5KIeA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FxYIpJgUjCdSHnrzOupnLPLZzfanC8TIFQgbMEaBtxg75Zsob493YK0sC0vlgcgeAdvyVki5leOcko+KNgAFCUR6cgEItqyANeg7fNiiHZTLbkhcc+gdSoAXf940pz4YzXEOdROybmvvEv9umSi6aeUJ1xQd16wE4N9IqSaL7xk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=B70tK9mm; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="B70tK9mm" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561678; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mgCNJ6QbPXfdRFucAOWtH1ZqPtP2sYd3GhwHotAeOtc=; b=B70tK9mm911nvmMS+DHN2+KCAEQrOXOIgnBBcv+b59G+W5+x+wi7eEKkMmAYKLNQ8Gu6Vk 9zhF0PRGgCYWkgcYiuEE746/jx6x/c6CK2n2d64NVYLVDIuVx7vmtVTm5HvJNkPmsUeuwZ IP4sfOaD8q0+ssbmKGDiPSP3FSJUlrI= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-314-lYvAiO0cNxWV0EBGXJWXaQ-1; Thu, 30 Apr 2026 11:07:51 -0400 X-MC-Unique: lYvAiO0cNxWV0EBGXJWXaQ-1 X-Mimecast-MFC-AGG-ID: lYvAiO0cNxWV0EBGXJWXaQ_1777561670 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 24EF919560A5; Thu, 30 Apr 2026 15:07:50 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3EE001800480; Thu, 30 Apr 2026 15:07:49 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com, Sean Christopherson Subject: [PATCH 01/28] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Date: Thu, 30 Apr 2026 11:07:20 -0400 Message-ID: <20260430150747.76749-2-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable
X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93
Content-Type: text/plain; charset="utf-8"

From: Jon Kohler

EPT exit qualification bit 6 is used when mode-based execute control
is enabled, and reflects user executable addresses. Rework name to
reflect the intention and add to EPT_VIOLATION_PROT_MASK, which allows
simplifying the return evaluation in
tdx_is_sept_violation_unexpected_pending a pinch.

Rework handling in __vmx_handle_ept_violation to unconditionally clear
EPT_VIOLATION_PROT_USER_EXEC until MBEC is implemented, as suggested by
Sean [1].

Note: Intel SDM Table 29-7 defines bit 6 as:
  If the "mode-based execute control" VM-execution control is 0, the
  value of this bit is undefined. If that control is 1, this bit is the
  logical-AND of bit 10 in the EPT paging-structure entries used to
  translate the guest-physical address of the access causing the EPT
  violation. In this case, it indicates whether the guest-physical
  address was executable for user-mode linear addresses.

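For readers who want to see the effect of the mask change in isolation,
here is a minimal userspace sketch, not the kernel code itself. The
EPT_VIOLATION_* bit positions are the ones named in this patch; the
PFERR_* values are assumed to mirror the kernel's x86 page-fault error
code bits, and the sketch_* helper names are hypothetical. Only the
"fetch" and "present" parts of the error code are modeled.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Exit-qualification bits, as named in this patch. */
#define EPT_VIOLATION_ACC_INSTR        BIT(2)
#define EPT_VIOLATION_PROT_READ        BIT(3)
#define EPT_VIOLATION_PROT_WRITE       BIT(4)
#define EPT_VIOLATION_PROT_EXEC        BIT(5)
#define EPT_VIOLATION_PROT_USER_EXEC   BIT(6)
#define EPT_VIOLATION_PROT_MASK        (EPT_VIOLATION_PROT_READ |  \
                                        EPT_VIOLATION_PROT_WRITE | \
                                        EPT_VIOLATION_PROT_EXEC |  \
                                        EPT_VIOLATION_PROT_USER_EXEC)

/* Assumption: same bit positions as the kernel's page-fault error code. */
#define PFERR_PRESENT_MASK             BIT(0)
#define PFERR_FETCH_MASK               BIT(4)

/*
 * Hypothetical helper modeling only the two statements touched by the
 * patch: bit 6 is masked out before deciding whether the entry counts
 * as "present", because it is undefined while MBEC is disabled.
 */
static uint64_t sketch_ept_error_code(uint64_t exit_qualification)
{
	uint64_t error_code = 0;

	/* Is it a fetch fault? */
	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
		      ? PFERR_FETCH_MASK : 0;

	/* Present only if R, W or X is set; USER_EXEC is ignored. */
	error_code |= (exit_qualification &
		       (EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC))
		      ? PFERR_PRESENT_MASK : 0;

	return error_code;
}

/*
 * Hypothetical helper modeling the simplified TDX return expression:
 * with bit 6 folded into PROT_MASK, a single test covers all four bits.
 */
static int sketch_no_prot_bits(uint64_t eq)
{
	return !(eq & EPT_VIOLATION_PROT_MASK);
}
```

In this sketch an exit qualification with only bit 6 set does not
produce PFERR_PRESENT_MASK on the VMX path, while the TDX-style check
does treat bit 6 as a protection bit.
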
[1] https://lore.kernel.org/all/aCJDzU1p_SFNRIJd@google.com/ Suggested-by: Sean Christopherson Signed-off-by: Jon Kohler Message-ID: <20251223054806.1611168-2-jon@nutanix.com> Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/vmx.h | 5 +++-- arch/x86/kvm/vmx/common.h | 9 +++++++-- arch/x86/kvm/vmx/tdx.c | 2 +- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 37080382df54..b2291a766e3f 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -608,10 +608,11 @@ enum vm_entry_failure_code { #define EPT_VIOLATION_PROT_READ BIT(3) #define EPT_VIOLATION_PROT_WRITE BIT(4) #define EPT_VIOLATION_PROT_EXEC BIT(5) -#define EPT_VIOLATION_EXEC_FOR_RING3_LIN BIT(6) +#define EPT_VIOLATION_PROT_USER_EXEC BIT(6) #define EPT_VIOLATION_PROT_MASK (EPT_VIOLATION_PROT_READ | \ EPT_VIOLATION_PROT_WRITE | \ - EPT_VIOLATION_PROT_EXEC) + EPT_VIOLATION_PROT_EXEC | \ + EPT_VIOLATION_PROT_USER_EXEC) #define EPT_VIOLATION_GVA_IS_VALID BIT(7) #define EPT_VIOLATION_GVA_TRANSLATED BIT(8) =20 diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 412d0829d7a2..adf925500b9e 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -94,8 +94,13 @@ static inline int __vmx_handle_ept_violation(struct kvm_= vcpu *vcpu, gpa_t gpa, /* Is it a fetch fault? */ error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_INSTR) ? PFERR_FETCH_MASK : 0; - /* ept page table entry is present? */ - error_code |=3D (exit_qualification & EPT_VIOLATION_PROT_MASK) + /* + * ept page table entry is present? + * note: unconditionally clear USER_EXEC until mode-based + * execute control is implemented + */ + error_code |=3D (exit_qualification & + (EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC)) ? 
PFERR_PRESENT_MASK : 0; =20 if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 1e47c194af53..89f9fe30435d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1845,7 +1845,7 @@ static inline bool tdx_is_sept_violation_unexpected_p= ending(struct kvm_vcpu *vcp if (eeq_type !=3D TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION) return false; =20 - return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_R= ING3_LIN); + return !(eq & EPT_VIOLATION_PROT_MASK); } =20 static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2A632C21FE for ; Thu, 30 Apr 2026 15:07:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561676; cv=none; b=G+250ihAZSkGH/Yqmkl3HK2wLU8qqdiofjBzEqkBGokz98wKBfEFrrM0BpJlkzv8fHOdK/2MvLD7BcTlhfidORi07q20aMewWUj3463Xn3VzW3E7Djt/saZeZfrFzZ7vvL5ZjEtx2KS22V8fRVr+cJ1CKnRhIRt2ujG8i1lfXuc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561676; c=relaxed/simple; bh=uoR0FXK9mGMvTToC/wIA4BhxmTH60bS/XJ0L3zm8OHw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=cnjhdzRfCln6lplZ3zINQXPLbnlXnSKBjm9FdIhI91qivi+0ocmLcPPNjJK9bSwoBXWyBtVvKlFOF8ya2732OIXay1ol2KPEOy7mB0EzG8wUqNZ77zM5zE/Vf+S491NQi1aZ1/WxRQGq2sFbl/vCLOEEe0wnhWQFA0S3DDhGzrw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com 
header.b=cPZc0cP3; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="cPZc0cP3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561673; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6obxhOSXhhEiKG2m9zciOCfCB3jjUQpgTbvMPVwc6To=; b=cPZc0cP3NWVxLBEIcSr6qm74m9fapuIwHh1vArOzjDY772bhSM4pfdxyKtNkWci/a91E4N c4eNmOkU8DYd4IUC+ZlCxpZ40pRqFRcgYNQuU26Qj1XyrIFHqKuYjVZp//xAjTX0CW72e4 nY5fP47TwYOJ8YzeJU/uRfjHBmP4mXc= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-590-p_kAsxbhMw-csVOf-VgdoA-1; Thu, 30 Apr 2026 11:07:52 -0400 X-MC-Unique: p_kAsxbhMw-csVOf-VgdoA-1 X-Mimecast-MFC-AGG-ID: p_kAsxbhMw-csVOf-VgdoA_1777561671 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 118C11800282; Thu, 30 Apr 2026 15:07:51 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP 
id 4B3721800347; Thu, 30 Apr 2026 15:07:50 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 02/28] KVM: x86/mmu: remove SPTE_PERM_MASK Date: Thu, 30 Apr 2026 11:07:21 -0400 Message-ID: <20260430150747.76749-3-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" From: Jon Kohler SPTE_PERM_MASK is no longer referenced by anything in the kernel. Signed-off-by: Jon Kohler Message-ID: <20251223054806.1611168-3-jon@nutanix.com> Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/spte.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 91ce29fd6f1b..28086fa86fe0 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -42,9 +42,6 @@ static_assert(SPTE_TDP_AD_ENABLED =3D=3D 0); #define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) #endif =20 -#define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_m= ask \ - | shadow_x_mask | shadow_nx_mask | shadow_me_mask) - #define ACC_EXEC_MASK 1 #define ACC_WRITE_MASK PT_WRITABLE_MASK #define ACC_USER_MASK PT_USER_MASK --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EEE7C2D0C64 for ; Thu, 30 Apr 2026 15:07:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561683; cv=none; b=jyTvNj165f0z4tJ2kO9C70WV81M39OlLgyrOm129aKYAvLAK1augPk01VQ/aCVwAl9D8NtCK6FTB2RAar+JWG52Ca8L0qQtA124RDrNFkcazjTnO2nmlZ+Qp3+GKtFBboDjk/E4BrIaM7bjRexcRpWjEMzupEW2H1rKazObsaD4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561683; c=relaxed/simple; bh=zZVoVFJ/nOF7y9rXzfbb0RWNmNB2de94MyMjpNi5bVk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=dGCBEX/ZI2u3bbMliBB+XB1ZLwt+zrja7KXnxWWdSAJv6Sgd6TdZEyf8knfSm1tf2vu4RFvj0Y/4L8mggwfJKjMJX8pATF+48G/88zd0sxJYebNoLorchzglvy40T28F1vwF3CKFT6NlxKgQK4yJJsyiw5YmU9QxSYCDnsfiWUs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dJ+XF93C; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dJ+XF93C" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561678; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GAkplb3fEQUh//5D4yri74NW8yHpfRyCt1Rq8wCX+vc=; b=dJ+XF93C5dY/ScjNFuBI4GFJa/5NmBOCW8vQUI6xtl+WeyBHknkovQKYXXVUOFdYLQ8LLj BA+FUFRMzN1nmn8k1agkTz1cV/WzpeZKIUywW8IFmI7h2PpGcuP6FRd4ZEDn6sB/mOeR6V Fhql+NXnSwR4cLuBw/nzYXAodFPcFI0= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by 
relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-687-_i1dfoj1Ov2WrIZApdNmZA-1; Thu, 30 Apr 2026 11:07:54 -0400 X-MC-Unique: _i1dfoj1Ov2WrIZApdNmZA-1 X-Mimecast-MFC-AGG-ID: _i1dfoj1Ov2WrIZApdNmZA_1777561672 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 235CF19560B4; Thu, 30 Apr 2026 15:07:52 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 39D901800347; Thu, 30 Apr 2026 15:07:51 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com, Kai Huang Subject: [PATCH 03/28] KVM: x86/mmu: free up bit 10 of PTEs in preparation for MBEC Date: Thu, 30 Apr 2026 11:07:22 -0400 Message-ID: <20260430150747.76749-4-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" From: Jon Kohler Update SPTE_MMIO_ALLOWED_MASK to allow EPT user executable (bit 10) to be treated like EPT RWX bit2:0, as when mode-based execute control is enabled, bit 10 can act like a "present" bit. Likewise do not include it in FROZEN_SPTE. 
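
The bit arithmetic behind this change can be checked with a small
standalone sketch. The mask values are copied from the diff in this
patch; BIT_ULL/GENMASK_ULL are re-implemented locally, the sketch_*
helpers are hypothetical, and SHADOW_NONPRESENT_VALUE is left out, so
only the low constant of FROZEN_SPTE is examined.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's bit helpers. */
#define BIT_ULL(n)         (1ULL << (n))
#define GENMASK_ULL(h, l)  (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Values as they appear in the diff of this patch. */
#define VMX_EPT_RWX_MASK              0x7ULL
#define VMX_EPT_USER_EXECUTABLE_MASK  BIT_ULL(10)
#define SPTE_MMU_PRESENT_MASK         BIT_ULL(11)

#define MMIO_SPTE_GEN_LOW_MASK        GENMASK_ULL(9, 3)    /* was 10:3 */
#define MMIO_SPTE_GEN_HIGH_MASK       GENMASK_ULL(62, 52)
#define SPTE_MMIO_ALLOWED_MASK        (BIT_ULL(63) | GENMASK_ULL(51, 12) | \
                                       BIT_ULL(10) | GENMASK_ULL(2, 0))

/* Low constants OR-ed into FROZEN_SPTE before and after the patch. */
#define OLD_FROZEN_LOW_BITS           0x5a0ULL
#define NEW_FROZEN_LOW_BITS           0x1a0ULL

/* Hypothetical helper: number of set bits in a 64-bit value. */
static unsigned int sketch_popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: returns 1 if the given low bits cannot be
 * mistaken for a present entry, counting bit 10 as a "present" bit
 * the way MBEC would.
 */
static int sketch_frozen_bits_ok(uint64_t low_bits)
{
	return !(low_bits & (SPTE_MMU_PRESENT_MASK | VMX_EPT_RWX_MASK |
			     VMX_EPT_USER_EXECUTABLE_MASK));
}
```

With these definitions the generation shrinks to 7 + 11 = 18 bits, the
old 0x5a0 constant overlaps bit 10, and the new 0x1a0 constant does not.
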
No functional changes intended, other than the reduction of the maximum MMIO generation that is stored in page tables. Cc: Kai Huang Signed-off-by: Jon Kohler Message-ID: <20251223054806.1611168-4-jon@nutanix.com> Reviewed-by: Kai Huang Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/vmx.h | 2 ++ arch/x86/kvm/mmu/spte.h | 20 +++++++++++--------- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index b2291a766e3f..2b30b921b375 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -560,10 +560,12 @@ enum vmcs_field { #define VMX_EPT_ACCESS_BIT (1ull << 8) #define VMX_EPT_DIRTY_BIT (1ull << 9) #define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63) + #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | = \ VMX_EPT_WRITABLE_MASK | \ VMX_EPT_EXECUTABLE_MASK) #define VMX_EPT_MT_MASK (7ull << VMX_EPT_MT_EPTE_SHIFT) +#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10) =20 static inline u8 vmx_eptp_page_walk_level(u64 eptp) { diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 28086fa86fe0..4283cea3e66c 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -96,11 +96,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRAC= K_SAVED_MASK)); #undef SHADOW_ACC_TRACK_SAVED_MASK =20 /* - * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of + * Due to limited space in PTEs, the MMIO generation is an 18 bit subset of * the memslots generation and is derived as follows: * - * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10 - * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62 + * Bits 0-6 of the MMIO generation are propagated to spte bits 3-9 + * Bits 7-17 of the MMIO generation are propagated to spte bits 52-62 * * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not includ= ed in * the MMIO generation number, as doing so would require stealing a bit fr= om @@ -111,7 +111,7 
@@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRAC= K_SAVED_MASK)); */ =20 #define MMIO_SPTE_GEN_LOW_START 3 -#define MMIO_SPTE_GEN_LOW_END 10 +#define MMIO_SPTE_GEN_LOW_END 9 =20 #define MMIO_SPTE_GEN_HIGH_START 52 #define MMIO_SPTE_GEN_HIGH_END 62 @@ -133,7 +133,8 @@ static_assert(!(SPTE_MMU_PRESENT_MASK & * and so they're off-limits for generation; additional checks ensure the = mask * doesn't overlap legal PA bits), and bit 63 (carved out for future usage= ). */ -#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | GENMAS= K_ULL(2, 0)) +#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | \ + BIT_ULL(10) | GENMASK_ULL(2, 0)) static_assert(!(SPTE_MMIO_ALLOWED_MASK & (SPTE_MMU_PRESENT_MASK | MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MAS= K))); =20 @@ -141,7 +142,7 @@ static_assert(!(SPTE_MMIO_ALLOWED_MASK & #define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_H= IGH_START + 1) =20 /* remember to adjust the comment above as well if you change these */ -static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 8 && MMIO_SPTE_GEN_HIGH_BITS = =3D=3D 11); +static_assert(MMIO_SPTE_GEN_LOW_BITS =3D=3D 7 && MMIO_SPTE_GEN_HIGH_BITS = =3D=3D 11); =20 #define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0) #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN= _LOW_BITS) @@ -217,10 +218,11 @@ extern u64 __read_mostly shadow_nonpresent_or_rsvd_ma= sk; * * Only used by the TDP MMU. */ -#define FROZEN_SPTE (SHADOW_NONPRESENT_VALUE | 0x5a0ULL) +#define FROZEN_SPTE (SHADOW_NONPRESENT_VALUE | 0x1a0ULL) =20 -/* Frozen SPTEs must not be misconstrued as shadow present PTEs. */ -static_assert(!(FROZEN_SPTE & SPTE_MMU_PRESENT_MASK)); +/* Frozen SPTEs must not be misconstrued as shadow or MMU present PTEs. 
*/ +static_assert(!(FROZEN_SPTE & (SPTE_MMU_PRESENT_MASK | + VMX_EPT_RWX_MASK | VMX_EPT_USER_EXECUTABLE_MASK))); =20 static inline bool is_frozen_spte(u64 spte) { --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 646412BFC7B for ; Thu, 30 Apr 2026 15:07:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561677; cv=none; b=UY2RAVED65WG8fKmynkjNrK1iB4MOFBtUTmjKRtR2AKknxlIfL9rSWfYY9C5f9AnDRKct+CkT7RWDuCEhSWY0OXrJ/oxgSdNnVi+0l6MKe5bTOpDVN5cxoTChYw4Dc7piPRacealU/fNqon7JrWii7oCOXql3H944ovj3c6Dl+M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561677; c=relaxed/simple; bh=O5n71Bz8uoSCSBkrgkRDI29Njuo0IhYKHEcXNw41XFs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=p0qpIhcBySQoA9ORsHI8rvmVYjJVEgAEshaLlT6PNiUMXzTsUn0PTy610p5Blx5PEEZdkChGn45jcxkUDw/szm/9660d1ocp6jnN1efBAJKV9CnRHDbE8llnrmp6LOixzfrJ00rNAHeeR/lk+Co/tVjhFryvGDKfw6K5+e2GY6Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ahavsuM5; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ahavsuM5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; 
s=mimecast20190719; t=1777561675; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sVOkHVzh1v/WWfM+UAvEnyZaqIJvUCbqkVDpCLFz8Ew=; b=ahavsuM5NbL1mcnP8y/sMESxlNJ2aoYjGK+Czx0//q8y8rJFEL2Dp7xPiIhXRmt5JAqgW+ 67dQ6qMsOxwfJwLQjdAyF0cgxKlEDBJvujr9F0UOybBaMmanyJ0gKe5yki658sjIx2DVJ5 HPXBxWA8oQuMD45gMePwySwE6T8y2HQ= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-711-q8vQ8q6WP9-HGm5JRq7yTQ-1; Thu, 30 Apr 2026 11:07:54 -0400 X-MC-Unique: q8vQ8q6WP9-HGm5JRq7yTQ-1 X-Mimecast-MFC-AGG-ID: q8vQ8q6WP9-HGm5JRq7yTQ_1777561673 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E717E1956052; Thu, 30 Apr 2026 15:07:52 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 342E01800577; Thu, 30 Apr 2026 15:07:52 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 04/28] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC Date: Thu, 30 Apr 2026 11:07:23 -0400 Message-ID: <20260430150747.76749-5-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: 
<20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Access tracking will need to save bit 10 when MBEC is enabled. Right now it is simply shifting the R and X bits into bits 54 and 56, but bit 10 would not fit with the same scheme. Reorganize the high bits so that access tracking will use bits 52, 54 and 62. As a side effect, the free bits are compacted slightly, with 56-59 still unused. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/spte.h | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 4283cea3e66c..317b9cd1537c 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -17,10 +17,20 @@ */ #define SPTE_MMU_PRESENT_MASK BIT_ULL(11) =20 +/* + * The ignored high bits are allocated as follows: + * - bits 52, 54: saved X-R bits for access tracking when EPT does not hav= e A/D + * - bits 53 (EPT only): host writable + * - bits 55 (EPT only): MMU-writable + * - bits 56-59: unused + * - bits 60-61: type of A/D tracking + * - bits 62: unused + */ + /* * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may= also * be restricted to using write-protection (for L2 when CPU dirty logging,= i.e. - * PML, is enabled). Use bits 52 and 53 to hold the type of A/D tracking = that + * PML, is enabled). Use bits 60 and 61 to hold the type of A/D tracking = that * is must be employed for a given TDP SPTE. * * Note, the "enabled" mask must be '0', as bits 62:52 are _reserved_ for = PAE @@ -29,7 +39,7 @@ * TDP with CPU dirty logging (PML). If NPT ever gains PML-like support, = it * must be restricted to 64-bit KVM. 
*/ -#define SPTE_TDP_AD_SHIFT 52 +#define SPTE_TDP_AD_SHIFT 60 #define SPTE_TDP_AD_MASK (3ULL << SPTE_TDP_AD_SHIFT) #define SPTE_TDP_AD_ENABLED (0ULL << SPTE_TDP_AD_SHIFT) #define SPTE_TDP_AD_DISABLED (1ULL << SPTE_TDP_AD_SHIFT) @@ -65,7 +75,7 @@ static_assert(SPTE_TDP_AD_ENABLED =3D=3D 0); */ #define SHADOW_ACC_TRACK_SAVED_BITS_MASK (SPTE_EPT_READABLE_MASK | \ SPTE_EPT_EXECUTABLE_MASK) -#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 54 +#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52 #define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \ SHADOW_ACC_TRACK_SAVED_BITS_SHIFT) static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK)); @@ -84,8 +94,8 @@ static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED= _MASK)); * to not overlap the A/D type mask or the saved access bits of access-tra= cked * SPTEs when A/D bits are disabled. */ -#define EPT_SPTE_HOST_WRITABLE BIT_ULL(57) -#define EPT_SPTE_MMU_WRITABLE BIT_ULL(58) +#define EPT_SPTE_HOST_WRITABLE BIT_ULL(53) +#define EPT_SPTE_MMU_WRITABLE BIT_ULL(55) =20 static_assert(!(EPT_SPTE_HOST_WRITABLE & SPTE_TDP_AD_MASK)); static_assert(!(EPT_SPTE_MMU_WRITABLE & SPTE_TDP_AD_MASK)); --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E16CC2DC783 for ; Thu, 30 Apr 2026 15:08:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561682; cv=none; b=B4DGtu1Sd2xX4pru3BY4RvzOCS+f/xsNZvF/iNlmMoSLiRX/Zr8vKYrU932W3TethMXuSsshvY0peeSQNRyJqpeXA3Ken6vTkfQxMwqtdBv8nR9GvR1bsz8jhL4oq+GgAPsfScCWnMYrKsqpdV62e4XhVc/yqOJZyXmXG2hzzrU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561682; c=relaxed/simple; 
bh=SFwAvEejykGMbbJiQ9F6uJQL21XJV9Dx4ZbJgpCL2gM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QLqtTKjxtovt+kH/ul4UQ4wEI43m1c47tjHlHcZMB4oQoUhHmjg5eDY6gyGlbzdXCjV+x+NJ7TLtwI+LS9fu07GJSz9TzJXAy81HTaQiLUHuFbAvngO9f5cm0Y63Nlb80Bqw90eEZfisSBNoUQidDRZ/Dhk4GeaKQoCoKmzvuYw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bd4pthEr; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bd4pthEr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561680; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f02Crf6R+TdJvUvPULZfzFF8yNDEWWCjLIu/znZeFPk=; b=bd4pthEr1iytHWnYTB1xzwgOYeMna6rfhn8MIPRe93NK6u1vUjeI4Ag0mwldHBnh4UmmBX e1UposZm+uwqZerX8jlMpb/uySX8/9b9alsa0a+vRZez1k8uKOLYLNoFNNqukvqVjZjvmz o/E4Y9onnpHT7Yxh1r124Wwtwk5oTg8= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-37-Yao7u9fdNQSBfZeKHZ6ayw-1; Thu, 30 Apr 2026 11:07:54 -0400 X-MC-Unique: Yao7u9fdNQSBfZeKHZ6ayw-1 X-Mimecast-MFC-AGG-ID: Yao7u9fdNQSBfZeKHZ6ayw_1777561673 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com 
(mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9F9E8195608F; Thu, 30 Apr 2026 15:07:53 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 19ACC1800577; Thu, 30 Apr 2026 15:07:53 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 05/28] KVM: x86/mmu: remove SPTE_EPT_* Date: Thu, 30 Apr 2026 11:07:24 -0400 Message-ID: <20260430150747.76749-6-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" spte.h is already including vmx.h, use the constants it defines. 
Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/spte.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 317b9cd1537c..bc02a2e89a31 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -57,10 +57,6 @@ static_assert(SPTE_TDP_AD_ENABLED =3D=3D 0); #define ACC_USER_MASK PT_USER_MASK #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) =20 -/* The mask for the R/X bits in EPT PTEs */ -#define SPTE_EPT_READABLE_MASK 0x1ull -#define SPTE_EPT_EXECUTABLE_MASK 0x4ull - #define SPTE_LEVEL_BITS 9 #define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS) #define SPTE_INDEX(address, level) __PT_INDEX(address, level, SPTE_LEVEL_B= ITS) @@ -73,8 +69,8 @@ static_assert(SPTE_TDP_AD_ENABLED =3D=3D 0); * restored only when a write is attempted to the page. This mask obvious= ly * must not overlap the A/D type mask. */ -#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (SPTE_EPT_READABLE_MASK | \ - SPTE_EPT_EXECUTABLE_MASK) +#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (VMX_EPT_READABLE_MASK | \ + VMX_EPT_EXECUTABLE_MASK) #define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52 #define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \ SHADOW_ACC_TRACK_SAVED_BITS_SHIFT) --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3219A2C3252 for ; Thu, 30 Apr 2026 15:08:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561683; cv=none; 
b=oqdjP32szcjuRBPnmAz72gcWdLHaru06eIuJMoqq30GSfvVWSAxyhYwlvB93rPzx28H/9kYfmF7YQbSGhUqoN+02JAEgC43cnmeDKIVJw/63EYg+iQ+q9voQKck60obUNBc7YlMdjFe+mF+SQIHaJI0Bb5iAINEKv8IKyAiDpjw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561683; c=relaxed/simple; bh=vQvPgt0Mq2Gsbr1Skf087n28m4o4mK3xVgjfrnI90KM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=oFpNlS392GtOWn+3HQRpUVUpwm/0D9FFg/1GNd8kHVbw/ztl5kd0xi1olP5vA1e0QlznP1+0qUi0rC70Y0qPdRB8LPLbxHVXdMLox5ajLB9zuYvbqax///8AZfpLYbEu7kOygRSNPNSWT8nTo8yzD5TMp7VjpCIqT0vOzP80q7A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QO2BS0VI; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QO2BS0VI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561680; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pMPlsOACadmgUCBsX+kfM3jcuhxlgiVMWFCf2O3a/w0=; b=QO2BS0VI9q6U4ajGibVzatgrBqJ7AAB7TJjCO/qekig2N862nTcEDnlfTupsGg3GOaUhfU 83gdW4aCuOi3f6zLtus4B4hUd6fUdNoe1xDd6hcnvQ1oGfjTbLjlbPeZBGk8EOGSMVxeFj JMS0bVvx0aTCmgt35l+vbrJ+FMdfK70= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) 
id us-mta-50-JjgX3ONNOwumryolFVpoZQ-1; Thu, 30 Apr 2026 11:07:55 -0400 X-MC-Unique: JjgX3ONNOwumryolFVpoZQ-1 X-Mimecast-MFC-AGG-ID: JjgX3ONNOwumryolFVpoZQ_1777561674 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 581AA1800245; Thu, 30 Apr 2026 15:07:54 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C65D91800577; Thu, 30 Apr 2026 15:07:53 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 06/28] KVM: x86/mmu: merge make_spte_{non,}executable Date: Thu, 30 Apr 2026 11:07:25 -0400 Message-ID: <20260430150747.76749-7-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" As the logic will become more complicated with the introduction of MBEC, at least write it only once. 
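For readers following along, here is a minimal standalone sketch of the merged helper's set/clear computation. It is not the kernel code: struct exec_masks, modify_protections_sketch() and make_spte_executable_sketch() are hypothetical names, the ACC_EXEC_MASK value is a placeholder, and the real modify_spte_protections() does more than the plain "set then clear" shown here. The point is only that XOR-ing the chosen mask with (nx | x) yields the opposite mask, so a single function covers both directions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration; only "zero vs. non-zero" matters here. */
#define ACC_EXEC_MASK 1u

/* Hypothetical stand-in for the shadow_x_mask/shadow_nx_mask globals. */
struct exec_masks {
	uint64_t x_mask;	/* bit that grants execute (EPT style) */
	uint64_t nx_mask;	/* bit that forbids execute (NX style) */
};

/* Simplified assumption: set the requested bits, then drop the others. */
static uint64_t modify_protections_sketch(uint64_t spte, uint64_t set,
					  uint64_t clear)
{
	return (spte | set) & ~clear;
}

static uint64_t make_spte_executable_sketch(const struct exec_masks *m,
					    uint64_t spte, unsigned int access)
{
	uint64_t set, clear;

	if (access & ACC_EXEC_MASK)
		set = m->x_mask;
	else
		set = m->nx_mask;

	/* Whichever of the two masks was not chosen gets cleared. */
	clear = set ^ (m->nx_mask | m->x_mask);
	return modify_protections_sketch(spte, set, clear);
}
```

With EPT-style masks (x = 0x4, nx = 0), passing ACC_EXEC_MASK sets bit 2 and passing 0 clears it. With NX-style masks (x = 0, nx = bit 63, assumed here for illustration) the same call toggles the NX bit instead.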
Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/spte.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 85a0473809b0..e9dc0ae44274 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -317,14 +317,16 @@ static u64 modify_spte_protections(u64 spte, u64 set,= u64 clear) return spte; } =20 -static u64 make_spte_executable(u64 spte) +static u64 make_spte_executable(u64 spte, u8 access) { - return modify_spte_protections(spte, shadow_x_mask, shadow_nx_mask); -} + u64 set, clear; =20 -static u64 make_spte_nonexecutable(u64 spte) -{ - return modify_spte_protections(spte, shadow_nx_mask, shadow_x_mask); + if (access & ACC_EXEC_MASK) + set =3D shadow_x_mask; + else + set =3D shadow_nx_mask; + clear =3D set ^ (shadow_nx_mask | shadow_x_mask); + return modify_spte_protections(spte, set, clear); } =20 /* @@ -356,8 +358,8 @@ u64 make_small_spte(struct kvm *kvm, u64 huge_spte, * the page executable as the NX hugepage mitigation no longer * applies. 
*/ - if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm)) - child_spte =3D make_spte_executable(child_spte); + if (is_nx_huge_page_enabled(kvm)) + child_spte =3D make_spte_executable(child_spte, role.access); } =20 return child_spte; @@ -379,7 +381,7 @@ u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int= level) huge_spte &=3D KVM_HPAGE_MASK(level) | ~PAGE_MASK; =20 if (is_nx_huge_page_enabled(kvm)) - huge_spte =3D make_spte_nonexecutable(huge_spte); + huge_spte =3D make_spte_executable(huge_spte, 0); =20 return huge_spte; } --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6CE372D97B8 for ; Thu, 30 Apr 2026 15:08:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561684; cv=none; b=FL9dszTLjg91XmlqFqDn3YT1k/7VZhF2rKQP1pAkqhrL8/BeueosW4KF6ss+XLzec114dG2QVE9n4FVsc7pkLnYn0Z4nVd/yr2fsCnV19DEXgVPGQsnl682RLCJe1zP07YClzcXdcAa77aQVKTHf3+BMn8JYe8YiY9hXVIcnlXA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561684; c=relaxed/simple; bh=pYxi/j+e+ki5JQsTiKeKDXJukerh7jXKmGy6I5ArLY0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=kTRclQr3kNXNEcjyDmbwU1qkW2NSicmdgNwhhnVIM9ewPsZyGdKJ2daz35a1Qacdg/xNHAmdZUjl4zw86neMMKMj6gASUbrzEPhkx6uHyVWHSBfCG2FL1G9mct4xN6p7HMo/bYw/4as/jK+HeNcVWWXR4A9gPfRkryNF9nz2Z2o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=Z0hsBsvw; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: 
smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Z0hsBsvw" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561681; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IqH4mMwuSaq5xJDDbQpqV0WrUZp6tq9mDH2+yND/Ss0=; b=Z0hsBsvwcGtTEKIyCAq0Qd2jVVMQLwpPNSRJicVup6UPA1wNhba6I5Ko+cx5Z+5f8z0ReE XbGun51HtVCYqg1Uba66BU/ehERi8IwEMoKjidKlAsBYgw/tQs1lSR3/gzXx3JHyGabmCr 9F5kVf6e0Usj4L77pJEeUOvEoHiBV14= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-493-8e-_lmkvPlqTTPP7ksHpqw-1; Thu, 30 Apr 2026 11:07:56 -0400 X-MC-Unique: 8e-_lmkvPlqTTPP7ksHpqw-1 X-Mimecast-MFC-AGG-ID: 8e-_lmkvPlqTTPP7ksHpqw_1777561675 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 47F65195605A; Thu, 30 Apr 2026 15:07:55 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 7F6B81800906; Thu, 30 Apr 2026 15:07:54 +0000 (UTC) From: Paolo Bonzini To: 
linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 07/28] KVM: x86/mmu: rename and clarify BYTE_MASK Date: Thu, 30 Apr 2026 11:07:26 -0400 Message-ID: <20260430150747.76749-8-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" The BYTE_MASK macro is the central point of the black magic in update_permission_bitmask(). Rename it to something that relates to how it is used, and add a comment explaining how it works. Using shifts instead of powers of two was actually suggested by David Hildenbrand back in 2017 for clarity[1] but I evidently forgot his suggestion when applying to kvm.git. [1] https://lore.kernel.org/kvm/e4b5df86-31ae-2f4e-0666-393753e256df@redhat= .com/ Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 55 ++++++++++++++++++++++++++++++------------ 1 file changed, 39 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 24fbc9ea502a..70fd6868a555 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5529,29 +5529,53 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *con= text, bool execonly) max_huge_page_level); } =20 -#define BYTE_MASK(access) \ - ((1 & (access) ? 2 : 0) | \ - (2 & (access) ? 4 : 0) | \ - (3 & (access) ? 8 : 0) | \ - (4 & (access) ? 16 : 0) | \ - (5 & (access) ? 32 : 0) | \ - (6 & (access) ? 64 : 0) | \ - (7 & (access) ? 128 : 0)) - +/* + * Build a mask with all combinations of PTE access rights that + * include the given access bit. The mask can be queried with + * "mask & (1 << access)", where access is a combination of + * ACC_* bits. 
+ * + * By mixing and matching multiple masks returned by ACC_BITS_MASK, + * update_permission_bitmask() builds what is effectively a + * two-dimensional array of bools. The second dimension is + * provided by individual bits of permissions[pfec >> 1], and + * logical &, | and ~ operations operate on all the 8 possible + * combinations of ACC_* bits. + */ +#define ACC_BITS_MASK(access) \ + ((1 & (access) ? 1 << 1 : 0) | \ + (2 & (access) ? 1 << 2 : 0) | \ + (3 & (access) ? 1 << 3 : 0) | \ + (4 & (access) ? 1 << 4 : 0) | \ + (5 & (access) ? 1 << 5 : 0) | \ + (6 & (access) ? 1 << 6 : 0) | \ + (7 & (access) ? 1 << 7 : 0)) =20 static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept) { unsigned byte; =20 - const u8 x =3D BYTE_MASK(ACC_EXEC_MASK); - const u8 w =3D BYTE_MASK(ACC_WRITE_MASK); - const u8 u =3D BYTE_MASK(ACC_USER_MASK); + const u8 x =3D ACC_BITS_MASK(ACC_EXEC_MASK); + const u8 w =3D ACC_BITS_MASK(ACC_WRITE_MASK); + const u8 u =3D ACC_BITS_MASK(ACC_USER_MASK); =20 bool cr4_smep =3D is_cr4_smep(mmu); bool cr4_smap =3D is_cr4_smap(mmu); bool cr0_wp =3D is_cr0_wp(mmu); bool efer_nx =3D is_efer_nx(mmu); =20 + /* + * In hardware, page fault error codes are generated (as the name + * suggests) on any kind of page fault. permission_fault() and + * paging_tmpl.h already use the same bits after a successful page + * table walk, to indicate the kind of access being performed. + * + * However, PFERR_PRESENT_MASK and PFERR_RSVD_MASK are never set here, + * exactly because the page walk is successful. PFERR_PRESENT_MASK is + * removed by the shift, while PFERR_RSVD_MASK is repurposed in + * permission_fault() to indicate accesses that are *not* subject to + * SMAP restrictions. 
+ */ for (byte =3D 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) { unsigned pfec =3D byte << 1; =20 @@ -5598,10 +5622,9 @@ static void update_permission_bitmask(struct kvm_mmu= *mmu, bool ept) * - The access is supervisor mode * - If implicit supervisor access or X86_EFLAGS_AC is clear * - * Here, we cover the first four conditions. - * The fifth is computed dynamically in permission_fault(); - * PFERR_RSVD_MASK bit will be set in PFEC if the access is - * *not* subject to SMAP restrictions. + * Here, we cover the first four conditions. The fifth + * is computed dynamically in permission_fault() and + * communicated by setting PFERR_RSVD_MASK. */ if (cr4_smap) smapf =3D (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf; --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F1DF42D949F for ; Thu, 30 Apr 2026 15:08:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561684; cv=none; b=UFUXujiOt2YU3eSlqZi8CcQCoH/Pl9CPfu8C1PfRwiFQDqYGjCJjz9xvNCFs299+sHObC69z02bLSvZ0XCj6D5AnsqeoB9H/8CmXBkDQQd3iaRrbOpwA5UCuvjXqGt8Q7ILm17TnKekarD1dfvdag04GVeGPn4SdnQL7V9W4EZo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561684; c=relaxed/simple; bh=yBqFMsLCGznmhPZ1vRAb0i1RZX9nxHjWBWVouZ/55D4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nV3oD78x+yDBSHAa8+g9zTXarg017fdH3lojSUhopaFu2RdZuRECNkpXgK33cVxFc0cJ91AXQVyhfmPezba7Zol4p+YwrMcDi4JjIJd2M2NLollnMmc+CQdSSzFjCdIPVSObmqWabEUWAnE68GdT60d9dTiE3VCGS6obWNzL554= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; 
spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QNz1Jvuk; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QNz1Jvuk" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561681; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zRsefiZnpFr+R4AGPBx9bGqTQT8DYDaseYAVtgOfyBI=; b=QNz1JvukdK3XINiyxdaiDUE1mlMWd6Q/5b7NAISTIBR/zKIRPzKEj5KTKCVvMbxCJlLJ/2 PL0QA28PbG5FlZFAGk1adlNenYehzU4t2UgBSXTNz+gcWoHN223f4P3bQ/g9v+z4xyuP+E wbyri24BjqjB9wcIZ/Y+zyEKUTBk04A= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-482-1PXFkvQ7PzalK3jBrwgqjg-1; Thu, 30 Apr 2026 11:07:57 -0400 X-MC-Unique: 1PXFkvQ7PzalK3jBrwgqjg-1 X-Mimecast-MFC-AGG-ID: 1PXFkvQ7PzalK3jBrwgqjg_1777561676 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DE1DE180059B; Thu, 30 Apr 2026 15:07:55 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com 
(virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 578801800347; Thu, 30 Apr 2026 15:07:55 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 08/28] KVM: x86/mmu: separate more EPT/non-EPT permission_fault() Date: Thu, 30 Apr 2026 11:07:27 -0400 Message-ID: <20260430150747.76749-9-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Move more of EPT handling entirely in the existing "if (!ept)" conditional. Use a new "rf" variable instead of uf for read permissions for clarity. Merge smepf and ff into a single variable because EPT's "SMEP" (actually MBEC) is defined differently and does not need smepf. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 70fd6868a555..8bbda4684338 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5584,24 +5584,28 @@ static void update_permission_bitmask(struct kvm_mm= u *mmu, bool ept) * that causes a fault with the given PFEC. */ =20 + /* Faults from reads to non-readable pages */ + u8 rf =3D 0; /* Faults from writes to non-writable pages */ u8 wf =3D (pfec & PFERR_WRITE_MASK) ? (u8)~w : 0; /* Faults from user mode accesses to supervisor pages */ - u8 uf =3D (pfec & PFERR_USER_MASK) ? (u8)~u : 0; - /* Faults from fetches of non-executable pages*/ - u8 ff =3D (pfec & PFERR_FETCH_MASK) ? 
(u8)~x : 0; - /* Faults from kernel mode fetches of user pages */ - u8 smepf =3D 0; + u8 uf =3D 0; + /* Faults from fetches of non-executable pages */ + u8 ff =3D 0; /* Faults from kernel mode accesses of user pages */ u8 smapf =3D 0; =20 - if (!ept) { + if (ept) { + rf =3D (pfec & PFERR_USER_MASK) ? (u8)~u : 0; + ff =3D (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0; + } else { /* Faults from kernel mode accesses to user pages */ u8 kf =3D (pfec & PFERR_USER_MASK) ? 0 : u; =20 - /* Not really needed: !nx will cause pte.nx to fault */ - if (!efer_nx) - ff =3D 0; + uf =3D (pfec & PFERR_USER_MASK) ? (u8)~u : 0; + + if (efer_nx) + ff =3D (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0; =20 /* Allow supervisor writes if !cr0.wp */ if (!cr0_wp) @@ -5609,7 +5613,7 @@ static void update_permission_bitmask(struct kvm_mmu = *mmu, bool ept) =20 /* Disallow supervisor fetches of user code if cr4.smep */ if (cr4_smep) - smepf =3D (pfec & PFERR_FETCH_MASK) ? kf : 0; + ff |=3D (pfec & PFERR_FETCH_MASK) ? kf : 0; =20 /* * SMAP:kernel-mode data accesses from user-mode @@ -5630,7 +5634,7 @@ static void update_permission_bitmask(struct kvm_mmu = *mmu, bool ept) smapf =3D (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 
0 : kf; } =20 - mmu->permissions[byte] =3D ff | uf | wf | smepf | smapf; + mmu->permissions[byte] =3D ff | uf | wf | rf | smapf; } } =20 --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45A5F2571C0 for ; Thu, 30 Apr 2026 15:08:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561685; cv=none; b=qJq/iD/KoRnPjrUkli2VD1j4R0QU2XFK2i3lDEuovEmcSg7IHO77mqDOF8wGAtOwobM1WBKpOjtxfUIhuB1sDVaDzw3ff15O1TDlwQbNanDcp9MWBbJ+uhcoHRUnzKZgv9XwlETIDjErYR81sb7UZiSBjFlYE5FDwFMbe9a6+TM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561685; c=relaxed/simple; bh=HdRjLsp38qMX1nPGa5So/zrbyjxUIqxmmXaR+77mntg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=lmtYosb0/ElL8dk8jhDZKUBm0a5fcNr/A76LkvFPtXVAQjI2U9eZHoaj+HIQmlFU0Eqo3b1FVKi0KFKXlo4WF9yc2BUgWOsgbcYawo5KudcUohg3t8mdDRH87UeLYqrCrr0CS2qHF6kyxoWMAwzonkez0C0nSacFvGOdWUdefnE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=PTfyBB1r; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PTfyBB1r" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561681; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gB41uWGvnX8Bo1MLIC1H0NeEUJ17Qu4p2PJU1qv629Y=; b=PTfyBB1rqUa4+74oqSHslk9YqMD8K+cPDRSNRNjG0oTa2Fe2bXoi/oELipkWM4Hysy7LU7 TUSlwmg44E7H2nbqX8CQHA8XKa/5BgJvL7fPMCTKIpcBzQ3GKbRHAMJ5J5SePFCYVMumF5 oYkIT4mn4wkLvGvjzIF7IrWO+dGJyto= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-224-jVkWoRA4PxCi-bUnbYorFA-1; Thu, 30 Apr 2026 11:07:57 -0400 X-MC-Unique: jVkWoRA4PxCi-bUnbYorFA-1 X-Mimecast-MFC-AGG-ID: jVkWoRA4PxCi-bUnbYorFA_1777561676 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 96F541800364; Thu, 30 Apr 2026 15:07:56 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 1075F1800577; Thu, 30 Apr 2026 15:07:55 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 09/28] KVM: x86/mmu: introduce ACC_READ_MASK Date: Thu, 30 Apr 2026 11:07:28 -0400 Message-ID: <20260430150747.76749-10-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" Read permissions so far were only needed for EPT, which does not need ACC_USER_MASK. Therefore, for EPT page tables ACC_USER_MASK was repurposed as a read permission bit. In order to implement nested MBEC, EPT will genuinely have four kinds of accesses, and there will be no room for such hacks; bite the bullet at last, enlarging ACC_ALL to four bits and permissions[] to 2^4 bits (u16). The new code does not enforce that the XWR bits on non-execonly processors have their R bit set, even when running nested: none of the shadow_*_mask values have bit 0 set, and make_spte() genuinely relies on ACC_READ_MASK being requested! This works because, if execonly is not supported by the processor, shadow EPT will generate an EPT misconfig vmexit if the XWR bits represent a non-readable page, and therefore the pte_access argument to make_spte() will also always have ACC_READ_MASK set. 
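As an illustration of the four-bit layout described above, here is a small standalone sketch of how a 16-bit permission word for the EPT case can be built and queried. It is a model, not the kernel code: acc_bits_mask(), ept_permissions_sketch() and ept_access_faults() are hypothetical names, the ACC_* values are the ones implied by the BUILD_BUG_ON()s and the access_str[] table in this patch, and the PFERR_* values are the architectural page fault error code bits.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_READ_MASK	1u
#define ACC_WRITE_MASK	2u
#define ACC_USER_MASK	4u
#define ACC_EXEC_MASK	8u

#define PFERR_WRITE_MASK	(1u << 1)
#define PFERR_FETCH_MASK	(1u << 4)

/*
 * Loop form of ACC_BITS_MASK(): bit i of the result is set if the
 * 4-bit ACC_* combination i includes any bit of "access".
 */
static uint16_t acc_bits_mask(unsigned int access)
{
	uint16_t mask = 0;
	unsigned int i;

	for (i = 0; i < 16; i++)
		if (i & access)
			mask |= (uint16_t)(1u << i);
	return mask;
}

/* EPT branch only: one bit per ACC_* combo that faults for this PFEC. */
static uint16_t ept_permissions_sketch(unsigned int pfec)
{
	const uint16_t x = acc_bits_mask(ACC_EXEC_MASK);
	const uint16_t w = acc_bits_mask(ACC_WRITE_MASK);
	const uint16_t r = acc_bits_mask(ACC_READ_MASK);

	/* Neither a write nor a fetch: treat the access as a read. */
	uint16_t rf = (pfec & (PFERR_WRITE_MASK | PFERR_FETCH_MASK)) ? 0 : (uint16_t)~r;
	uint16_t wf = (pfec & PFERR_WRITE_MASK) ? (uint16_t)~w : 0;
	uint16_t ff = (pfec & PFERR_FETCH_MASK) ? (uint16_t)~x : 0;

	return rf | wf | ff;
}

/* Query the "second dimension": does this ACC_* combination fault? */
static int ept_access_faults(unsigned int pfec, unsigned int pte_access)
{
	return (ept_permissions_sketch(pfec) >> pte_access) & 1;
}
```

In this model a plain read (pfec 0) of an exec-only page (ACC_EXEC_MASK alone) faults, while a fetch from the same page does not. That case could not be expressed cleanly while ACC_USER_MASK doubled as the read bit.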
Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 12 +++++----- arch/x86/kvm/mmu.h | 2 +- arch/x86/kvm/mmu/mmu.c | 41 ++++++++++++++++++++------------- arch/x86/kvm/mmu/mmutrace.h | 3 ++- arch/x86/kvm/mmu/paging_tmpl.h | 35 +++++++++++++++++----------- arch/x86/kvm/mmu/spte.c | 18 ++++++--------- arch/x86/kvm/mmu/spte.h | 5 ++-- arch/x86/kvm/vmx/capabilities.h | 5 ---- arch/x86/kvm/vmx/common.h | 5 +--- arch/x86/kvm/vmx/vmx.c | 3 +-- 10 files changed, 67 insertions(+), 62 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index c470e40a00aa..8f2a1b915df9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -328,11 +328,11 @@ struct kvm_kernel_irq_routing_entry; * the number of unique SPs that can theoretically be created is 2^n, wher= e n * is the number of bits that are used to compute the role. * - * But, even though there are 20 bits in the mask below, not all combinati= ons + * But, even though there are 21 bits in the mask below, not all combinati= ons * of modes and flags are possible: * * - invalid shadow pages are not accounted, mirror pages are not shadow= ed, - * so the bits are effectively 18. + * so the bits are effectively 19. * * - quadrant will only be used if has_4_byte_gpte=3D1 (non-PAE paging); * execonly and ad_disabled are only used for nested EPT which has @@ -347,7 +347,7 @@ struct kvm_kernel_irq_routing_entry; * cr0_wp=3D0, therefore these three bits only give rise to 5 possibil= ities. * * Therefore, the maximum number of possible upper-level shadow pages for a - * single gfn is a bit less than 2^13. + * single gfn is a bit less than 2^14. 
*/ union kvm_mmu_page_role { u32 word; @@ -356,7 +356,7 @@ union kvm_mmu_page_role { unsigned has_4_byte_gpte:1; unsigned quadrant:2; unsigned direct:1; - unsigned access:3; + unsigned access:4; unsigned invalid:1; unsigned efer_nx:1; unsigned cr0_wp:1; @@ -366,7 +366,7 @@ union kvm_mmu_page_role { unsigned guest_mode:1; unsigned passthrough:1; unsigned is_mirror:1; - unsigned :4; + unsigned:3; =20 /* * This is left at the top of the word so that @@ -492,7 +492,7 @@ struct kvm_mmu { * Byte index: page fault error code [4:1] * Bit index: pte permissions in ACC_* format */ - u8 permissions[16]; + u16 permissions[16]; =20 u64 *pae_root; u64 *pml4_root; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 830f46145692..23f37535c0ce 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -81,7 +81,7 @@ u8 kvm_mmu_get_max_tdp_level(void); void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_= mask); void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value); void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only); +void kvm_mmu_set_ept_masks(bool has_ad_bits); =20 void kvm_init_mmu(struct kvm_vcpu *vcpu); void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 8bbda4684338..fc1b17e22ea2 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2033,7 +2033,7 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu= , struct kvm_mmu_page *sp) */ const union kvm_mmu_page_role sync_role_ign =3D { .level =3D 0xf, - .access =3D 0x7, + .access =3D ACC_ALL, .quadrant =3D 0x3, .passthrough =3D 0x1, }; @@ -5539,7 +5539,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *conte= xt, bool execonly) * update_permission_bitmask() builds what is effectively a * two-dimensional array of bools. 
The second dimension is * provided by individual bits of permissions[pfec >> 1], and - * logical &, | and ~ operations operate on all the 8 possible + * logical &, | and ~ operations operate on all the 16 possible * combinations of ACC_* bits. */ #define ACC_BITS_MASK(access) \ @@ -5549,15 +5549,23 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *con= text, bool execonly) (4 & (access) ? 1 << 4 : 0) | \ (5 & (access) ? 1 << 5 : 0) | \ (6 & (access) ? 1 << 6 : 0) | \ - (7 & (access) ? 1 << 7 : 0)) + (7 & (access) ? 1 << 7 : 0) | \ + (8 & (access) ? 1 << 8 : 0) | \ + (9 & (access) ? 1 << 9 : 0) | \ + (10 & (access) ? 1 << 10 : 0) | \ + (11 & (access) ? 1 << 11 : 0) | \ + (12 & (access) ? 1 << 12 : 0) | \ + (13 & (access) ? 1 << 13 : 0) | \ + (14 & (access) ? 1 << 14 : 0) | \ + (15 & (access) ? 1 << 15 : 0)) =20 static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept) { unsigned byte; =20 - const u8 x =3D ACC_BITS_MASK(ACC_EXEC_MASK); - const u8 w =3D ACC_BITS_MASK(ACC_WRITE_MASK); - const u8 u =3D ACC_BITS_MASK(ACC_USER_MASK); + const u16 x =3D ACC_BITS_MASK(ACC_EXEC_MASK); + const u16 w =3D ACC_BITS_MASK(ACC_WRITE_MASK); + const u16 r =3D ACC_BITS_MASK(ACC_READ_MASK); =20 bool cr4_smep =3D is_cr4_smep(mmu); bool cr4_smap =3D is_cr4_smap(mmu); @@ -5580,29 +5588,30 @@ static void update_permission_bitmask(struct kvm_mm= u *mmu, bool ept) unsigned pfec =3D byte << 1; =20 /* - * Each "*f" variable has a 1 bit for each UWX value + * Each "*f" variable has a 1 bit for each ACC_* combo * that causes a fault with the given PFEC. */ =20 /* Faults from reads to non-readable pages */ - u8 rf =3D 0; + u16 rf =3D (pfec & (PFERR_WRITE_MASK|PFERR_FETCH_MASK)) ? 0 : (u16)~r; /* Faults from writes to non-writable pages */ - u8 wf =3D (pfec & PFERR_WRITE_MASK) ? (u8)~w : 0; + u16 wf =3D (pfec & PFERR_WRITE_MASK) ? 
(u16)~w : 0; /* Faults from user mode accesses to supervisor pages */ - u8 uf =3D 0; + u16 uf =3D 0; /* Faults from fetches of non-executable pages */ - u8 ff =3D 0; + u16 ff =3D 0; /* Faults from kernel mode accesses of user pages */ - u8 smapf =3D 0; + u16 smapf =3D 0; =20 if (ept) { - rf =3D (pfec & PFERR_USER_MASK) ? (u8)~u : 0; ff =3D (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0; } else { - /* Faults from kernel mode accesses to user pages */ - u8 kf =3D (pfec & PFERR_USER_MASK) ? 0 : u; + const u16 u =3D ACC_BITS_MASK(ACC_USER_MASK); =20 - uf =3D (pfec & PFERR_USER_MASK) ? (u8)~u : 0; + /* Faults from kernel mode accesses to user pages */ + u16 kf =3D (pfec & PFERR_USER_MASK) ? 0 : u; + + uf =3D (pfec & PFERR_USER_MASK) ? (u16)~u : 0; =20 if (efer_nx) ff =3D (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0; diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h index 764e3015d021..dcfdfedfc4e9 100644 --- a/arch/x86/kvm/mmu/mmutrace.h +++ b/arch/x86/kvm/mmu/mmutrace.h @@ -25,7 +25,8 @@ #define KVM_MMU_PAGE_PRINTK() ({ \ const char *saved_ptr =3D trace_seq_buffer_ptr(p); \ static const char *access_str[] =3D { \ - "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" \ + "----", "r---", "-w--", "rw--", "--u-", "r-u-", "-wu-", "rwu-", \ + "---x", "r--x", "-w-x", "rw-x", "--ux", "r-ux", "-wux", "rwux" \ }; \ union kvm_mmu_page_role role; \ \ diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 901cd2bd40b8..fb1b5d8b23e5 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -170,25 +170,24 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_v= cpu *vcpu, return true; } =20 -/* - * For PTTYPE_EPT, a page table can be executable but not readable - * on supported processors. Therefore, set_spte does not automatically - * set bit 0 if execute only is supported. 
Here, we repurpose ACC_USER_MASK - * to signify readability since it isn't used in the EPT case - */ static inline unsigned FNAME(gpte_access)(u64 gpte) { unsigned access; #if PTTYPE =3D=3D PTTYPE_EPT access =3D ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) | ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) | - ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0); + ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0); #else - BUILD_BUG_ON(ACC_EXEC_MASK !=3D PT_PRESENT_MASK); - BUILD_BUG_ON(ACC_EXEC_MASK !=3D 1); + /* + * P is set here, so the page is always readable and W/U/!NX represent + * allowed accesses. + */ + BUILD_BUG_ON(ACC_READ_MASK !=3D PT_PRESENT_MASK); + BUILD_BUG_ON(ACC_WRITE_MASK !=3D PT_WRITABLE_MASK); + BUILD_BUG_ON(ACC_USER_MASK !=3D PT_USER_MASK); + BUILD_BUG_ON(ACC_EXEC_MASK & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESEN= T_MASK)); access =3D gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK); - /* Combine NX with P (which is set here) to get ACC_EXEC_MASK. */ - access ^=3D (gpte >> PT64_NX_SHIFT); + access |=3D gpte & PT64_NX_MASK ? 0 : ACC_EXEC_MASK; #endif =20 return access; @@ -501,10 +500,18 @@ static int FNAME(walk_addr_generic)(struct guest_walk= er *walker, =20 if (write_fault) walker->fault.exit_qualification |=3D EPT_VIOLATION_ACC_WRITE; - if (user_fault) - walker->fault.exit_qualification |=3D EPT_VIOLATION_ACC_READ; - if (fetch_fault) + else if (fetch_fault) walker->fault.exit_qualification |=3D EPT_VIOLATION_ACC_INSTR; + else + walker->fault.exit_qualification |=3D EPT_VIOLATION_ACC_READ; + + /* + * Accesses to guest paging structures are either "reads" or + * "read+write" accesses, so consider them the latter if write_fault + * is true. 
+ */ + if (access & PFERR_GUEST_PAGE_MASK) + walker->fault.exit_qualification |=3D EPT_VIOLATION_ACC_READ; =20 /* * Note, pte_access holds the raw RWX bits from the EPTE, not diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index e9dc0ae44274..7b5f118ae211 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -194,12 +194,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_p= age *sp, int is_host_mmio =3D -1; bool wrprot =3D false; =20 - /* - * For the EPT case, shadow_present_mask has no RWX bits set if - * exec-only page table entries are supported. In that case, - * ACC_USER_MASK and shadow_user_mask are used to represent - * read access. See FNAME(gpte_access) in paging_tmpl.h. - */ WARN_ON_ONCE((pte_access | shadow_present_mask) =3D=3D SHADOW_NONPRESENT_= VALUE); =20 if (sp->role.ad_disabled) @@ -228,6 +222,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_pa= ge *sp, pte_access &=3D ~ACC_EXEC_MASK; } =20 + if (pte_access & ACC_READ_MASK) + spte |=3D PT_PRESENT_MASK; /* or VMX_EPT_READABLE_MASK */ + if (pte_access & ACC_EXEC_MASK) spte |=3D shadow_x_mask; else @@ -391,6 +388,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled) u64 spte =3D SPTE_MMU_PRESENT_MASK; =20 spte |=3D __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK | + PT_PRESENT_MASK /* or VMX_EPT_READABLE_MASK */ | shadow_user_mask | shadow_x_mask | shadow_me_value; =20 if (ad_disabled) @@ -491,18 +489,16 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_ma= sk) } EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_me_spte_mask); =20 -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only) +void kvm_mmu_set_ept_masks(bool has_ad_bits) { kvm_ad_enabled =3D has_ad_bits; =20 - shadow_user_mask =3D VMX_EPT_READABLE_MASK; + shadow_user_mask =3D 0; shadow_accessed_mask =3D VMX_EPT_ACCESS_BIT; shadow_dirty_mask =3D VMX_EPT_DIRTY_BIT; shadow_nx_mask =3D 0ull; shadow_x_mask =3D VMX_EPT_EXECUTABLE_MASK; - /* VMX_EPT_SUPPRESS_VE_BIT is needed for W 
or X violation. */ - shadow_present_mask =3D - (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT; + shadow_present_mask =3D VMX_EPT_SUPPRESS_VE_BIT; =20 shadow_acc_track_mask =3D VMX_EPT_RWX_MASK; shadow_host_writable_mask =3D EPT_SPTE_HOST_WRITABLE; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index bc02a2e89a31..121bfb2217e8 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -52,10 +52,11 @@ static_assert(SPTE_TDP_AD_ENABLED =3D=3D 0); #define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) #endif =20 -#define ACC_EXEC_MASK 1 +#define ACC_READ_MASK PT_PRESENT_MASK #define ACC_WRITE_MASK PT_WRITABLE_MASK #define ACC_USER_MASK PT_USER_MASK -#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK) +#define ACC_EXEC_MASK 8 +#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK |= ACC_READ_MASK) =20 #define SPTE_LEVEL_BITS 9 #define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS) diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilitie= s.h index 56cacc06225e..7e59eb0f41bb 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -300,11 +300,6 @@ static inline bool cpu_has_vmx_flexpriority(void) cpu_has_vmx_virtualize_apic_accesses(); } =20 -static inline bool cpu_has_vmx_ept_execute_only(void) -{ - return vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT; -} - static inline bool cpu_has_vmx_ept_4levels(void) { return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT; diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index adf925500b9e..1afbf272efae 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -85,11 +85,8 @@ static inline int __vmx_handle_ept_violation(struct kvm_= vcpu *vcpu, gpa_t gpa, { u64 error_code; =20 - /* Is it a read fault? */ - error_code =3D (exit_qualification & EPT_VIOLATION_ACC_READ) - ? PFERR_USER_MASK : 0; /* Is it a write fault? 
*/ - error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_WRITE) + error_code =3D (exit_qualification & EPT_VIOLATION_ACC_WRITE) ? PFERR_WRITE_MASK : 0; /* Is it a fetch fault? */ error_code |=3D (exit_qualification & EPT_VIOLATION_ACC_INSTR) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a29896a9ef14..337bbfecc021 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -8683,8 +8683,7 @@ __init int vmx_hardware_setup(void) set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ =20 if (enable_ept) - kvm_mmu_set_ept_masks(enable_ept_ad_bits, - cpu_has_vmx_ept_execute_only()); + kvm_mmu_set_ept_masks(enable_ept_ad_bits); else vt_x86_ops.get_mt_mask =3D NULL; =20 --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B209530EF74 for ; Thu, 30 Apr 2026 15:08:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561689; cv=none; b=u+2ScNuFR+chMREAUHtJXQBg8XjDr/3ieX4Bo+If5Brof/YfOhoGspKzjFCEjxfuyTLuMsF2BkqH11jSP4X8uwlNux1wlGU+ICjAQqvuYc5YmHFmDpAzfu1g3zy5So756qPLD5fqds/6ykrbq098MTJgUecPwRcsJpszQuulw7k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561689; c=relaxed/simple; bh=wXjuhSz6nVLjdUFmzOE30XGcCV0H2i3wuwB7AhkrVDs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=gZnr1aWk+dy+vonbQgjIjKXvY7HXyzswVRjU5cohryWBh8d4oDpZn7ybOCcvkD3758GSsFiIqLkM5ipghC6UZZyE8yQlDN3p281aa9rVS5GmyeoPzMp4YYL2zB/mLVVSwOunA/pvVz2liSAifWIzGtsV/Q4s3Qw4lcrGySHOdpU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass 
From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 10/28] KVM: x86/mmu: pass PFERR_GUEST_PAGE/FINAL_MASK to kvm_translate_gpa Date: Thu, 30 Apr 2026 11:07:29 -0400 Message-ID: <20260430150747.76749-11-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" The XS/XU bits for EPT are only applied to final accesses, and use the U bit from the page walk itself. While not strictly necessary (any value of PFERR_USER_MASK would behave the same for page table accesses, because they're reads and writes only), it is clearer and less hackish to only apply MBEC to PFERR_GUEST_FINAL_MASK. Allow kvm-intel.ko to distinguish the two cases. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/hyperv.c | 3 ++- arch/x86/kvm/mmu/mmu.c | 3 ++- arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++-- arch/x86/kvm/x86.c | 3 ++- 4 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index 9b140bbdc1d8..cf9dd565b894 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -2041,7 +2041,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, st= ruct kvm_hv_hcall *hc) * read with kvm_read_guest().
*/ if (!hc->fast && is_guest_mode(vcpu)) { - hc->ingpa =3D translate_nested_gpa(vcpu, hc->ingpa, 0, NULL); + hc->ingpa =3D translate_nested_gpa(vcpu, hc->ingpa, + PFERR_GUEST_FINAL_MASK, NULL); if (unlikely(hc->ingpa =3D=3D INVALID_GPA)) return HV_STATUS_INVALID_HYPERCALL_INPUT; } diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index fc1b17e22ea2..0fc362508a19 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4348,7 +4348,8 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vc= pu, struct kvm_mmu *mmu, { if (exception) exception->error_code =3D 0; - return kvm_translate_gpa(vcpu, mmu, vaddr, access, exception); + return kvm_translate_gpa(vcpu, mmu, vaddr, access | PFERR_GUEST_FINAL_MAS= K, + exception); } =20 static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direc= t) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index fb1b5d8b23e5..567f8b77ffe0 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -376,7 +376,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, walker->pte_gpa[walker->level - 1] =3D pte_gpa; =20 real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn), - nested_access, &walker->fault); + nested_access | PFERR_GUEST_PAGE_MASK, + &walker->fault); =20 /* * FIXME: This can happen if emulation (for of an INS/OUTS @@ -444,7 +445,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, gfn +=3D pse36_gfn_delta(pte); #endif =20 - real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walke= r->fault); + real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), + access | PFERR_GUEST_FINAL_MASK, + &walker->fault); if (real_gpa =3D=3D INVALID_GPA) return 0; =20 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0a1b63c63d1a..ef1e3ae13887 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1072,7 +1072,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long = cr3) * to an L1 
GPA. */ real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn), - PFERR_USER_MASK | PFERR_WRITE_MASK, NULL); + PFERR_USER_MASK | PFERR_WRITE_MASK | + PFERR_GUEST_PAGE_MASK, NULL); if (real_gpa =3D=3D INVALID_GPA) return 0; =20 --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 639EC2F290B for ; Thu, 30 Apr 2026 15:08:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561686; cv=none; b=eLdu4eU/j5UqQL1ZtOVYjBxRU7Fo99QfPWa4klSYb8Hr28trj8GH3vnNgZJb8ikKveCHY54m+QQ8ySwX9mqbZ0z8nUDs5+w1NnWrZRPVkhTfwnMuRfG4KHgEau7R56g5X+IT1BQjkBYWUFE3csjXn0Mk/TjchuUiEi6ZiaufIHs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561686; c=relaxed/simple; bh=WKPXK6K/tHb5Na74AIzSoHWXqpAuSXduR4rzaixRAPg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=vDCJWf5y6YKfOsyIYoE2uCasOq8yP6iwRNCzhd8b1X5A08aqgHdKcpaNaBjeYYhDcLrP3TewBS8XXZRVMS5KGP/I9XGSVLRfudiAhutFSAuHDQoCRstaFkCjoXfv+XC1QzQXq8rVJrBVlSP/vw99xxqB3CveKXXIlJJ6hC/pMjY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=TXfzMtaz; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="TXfzMtaz" 
From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 11/28] KVM: x86/mmu: pass pte_access for final nGPA->GPA walk Date: Thu, 30 Apr 2026 11:07:30 -0400 Message-ID: <20260430150747.76749-12-pbonzini@redhat.com> In-Reply-To:
<20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" The XS/XU bits for EPT are only applied to final accesses, and use the U bit from the page walk itself. That bit is available in the page walker as pte_access & ACC_USER_MASK but not to translate_nested_gpa, so pass it down. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/hyperv.c | 2 +- arch/x86/kvm/mmu.h | 15 ++++++++++++--- arch/x86/kvm/mmu/mmu.c | 8 +++++++- arch/x86/kvm/mmu/paging_tmpl.h | 4 ++-- arch/x86/kvm/mmu/spte.h | 6 ------ arch/x86/kvm/x86.c | 5 +++-- 6 files changed, 25 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index cf9dd565b894..53688f7b76eb 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -2042,7 +2042,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, st= ruct kvm_hv_hcall *hc) */ if (!hc->fast && is_guest_mode(vcpu)) { hc->ingpa =3D translate_nested_gpa(vcpu, hc->ingpa, - PFERR_GUEST_FINAL_MASK, NULL); + PFERR_GUEST_FINAL_MASK, NULL, 0); if (unlikely(hc->ingpa =3D=3D INVALID_GPA)) return HV_STATUS_INVALID_HYPERCALL_INPUT; } diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 23f37535c0ce..635c2e5d8513 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -37,6 +37,12 @@ extern bool __read_mostly enable_mmio_caching; #define PT32_ROOT_LEVEL 2 #define PT32E_ROOT_LEVEL 3 =20 +#define ACC_READ_MASK PT_PRESENT_MASK +#define ACC_WRITE_MASK PT_WRITABLE_MASK +#define ACC_USER_MASK PT_USER_MASK +#define ACC_EXEC_MASK 8 +#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK |= ACC_READ_MASK) + #define KVM_MMU_CR4_ROLE_BITS (X86_CR4_PSE | X86_CR4_PAE | X86_CR4_LA57 | \ X86_CR4_SMEP |
X86_CR4_SMAP | X86_CR4_PKE) =20 @@ -289,16 +295,19 @@ static inline void kvm_update_page_stats(struct kvm *= kvm, int level, int count) } =20 gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access, - struct x86_exception *exception); + struct x86_exception *exception, + u64 pte_access); =20 static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gpa_t gpa, u64 access, - struct x86_exception *exception) + struct x86_exception *exception, + u64 pte_access) { if (mmu !=3D &vcpu->arch.nested_mmu) return gpa; - return translate_nested_gpa(vcpu, gpa, access, exception); + return translate_nested_gpa(vcpu, gpa, access, exception, + pte_access); } =20 static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 0fc362508a19..88d0ff95fc8c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4348,8 +4348,14 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *v= cpu, struct kvm_mmu *mmu, { if (exception) exception->error_code =3D 0; + /* + * EPT MBEC uses the effective access bits from the PTE to distinguish + * user and supervisor accesses, and treats every linear address as a + * user-mode address if CR0.PG=3D0. Therefore *include* ACC_USER_MASK in + * the last argument to kvm_translate_gpa (which NPT does not use). 
+ */ return kvm_translate_gpa(vcpu, mmu, vaddr, access | PFERR_GUEST_FINAL_MAS= K, - exception); + exception, ACC_ALL); } =20 static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direc= t) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 567f8b77ffe0..8dd9d510fc34 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -377,7 +377,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, =20 real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn), nested_access | PFERR_GUEST_PAGE_MASK, - &walker->fault); + &walker->fault, 0); =20 /* * FIXME: This can happen if emulation (for of an INS/OUTS @@ -447,7 +447,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, =20 real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access | PFERR_GUEST_FINAL_MASK, - &walker->fault); + &walker->fault, walker->pte_access); if (real_gpa =3D=3D INVALID_GPA) return 0; =20 diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 121bfb2217e8..8a4c09c5cdbf 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -52,12 +52,6 @@ static_assert(SPTE_TDP_AD_ENABLED =3D=3D 0); #define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)) #endif =20 -#define ACC_READ_MASK PT_PRESENT_MASK -#define ACC_WRITE_MASK PT_WRITABLE_MASK -#define ACC_USER_MASK PT_USER_MASK -#define ACC_EXEC_MASK 8 -#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK |= ACC_READ_MASK) - #define SPTE_LEVEL_BITS 9 #define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS) #define SPTE_INDEX(address, level) __PT_INDEX(address, level, SPTE_LEVEL_B= ITS) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ef1e3ae13887..67979b7de5d6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1073,7 +1073,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long = cr3) */ real_gpa =3D kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn), 
PFERR_USER_MASK | PFERR_WRITE_MASK | - PFERR_GUEST_PAGE_MASK, NULL); + PFERR_GUEST_PAGE_MASK, NULL, 0); if (real_gpa =3D=3D INVALID_GPA) return 0; =20 @@ -7849,7 +7849,8 @@ void kvm_get_segment(struct kvm_vcpu *vcpu, } =20 gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access, - struct x86_exception *exception) + struct x86_exception *exception, + u64 pte_access) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; gpa_t t_gpa; --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA8892F5468 for ; Thu, 30 Apr 2026 15:08:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561686; cv=none; b=GpDr6qx30J38vxLtYBQq77x94JCvOb3i94uFlFZlDpZNedqOVcaTRpVBRqlAs9tG0JUgWK3M65r92PFHiytaddW4WktcFIgI+zxBkDtYcx17H1E7ikfxJPIzyIKc+zg/kb1Id3eISsf1nnR1Jv8vBnYayIqMvX0+pQyD4iw3TSU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561686; c=relaxed/simple; bh=bSg7XmRbrruQa2PqC5StFoRIwS95D3bE1el8XO+L7IM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ZGWpEg+Eff52wi5XnYJaSqJ0k7IB6Gx9PyOt4RE7vquhVG6NW2F5Gm1oRNBcHrQOiyTH3NGZIFQ630K3pjDm3Swq8qHZDHQZYpDvIAU9Ud6VTjug/kQCP4RorelTtCNHAMnGSqXff168LqEoUmmHvdhMRDRptAQAkNe5KWj7mXI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=X6mSuSla; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: 
From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH
12/28] KVM: x86: make translate_nested_gpa vendor-specific Date: Thu, 30 Apr 2026 11:07:31 -0400 Message-ID: <20260430150747.76749-13-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" EPT and NPT have different rules for passing PFERR_USER_MASK to the nested page table walk. In particular, for final addresses EPT uses the U bit of the guest (nGVA->nGPA) walk. While at it, remove PFERR_USER_MASK from the VMX version of the function, since it is actually ignored by the tables that update_permission_bitmask() generates for EPT. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 4 ++++ arch/x86/kvm/hyperv.c | 3 ++- arch/x86/kvm/mmu.h | 9 +++------ arch/x86/kvm/svm/nested.c | 15 +++++++++++++++ arch/x86/kvm/vmx/nested.c | 12 ++++++++++++ arch/x86/kvm/x86.c | 16 ---------------- 6 files changed, 36 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 8f2a1b915df9..62dc782b2dd3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2010,6 +2010,10 @@ struct kvm_x86_nested_ops { struct kvm_nested_state *kvm_state); bool (*get_nested_state_pages)(struct kvm_vcpu *vcpu); int (*write_log_dirty)(struct kvm_vcpu *vcpu, gpa_t l2_gpa); + gpa_t (*translate_nested_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, + u64 access, + struct x86_exception *exception, + u64 pte_access); =20 int (*enable_evmcs)(struct kvm_vcpu *vcpu, uint16_t *vmcs_version); diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c index 53688f7b76eb..f35fae3a7b3d 100644 --- a/arch/x86/kvm/hyperv.c +++ b/arch/x86/kvm/hyperv.c @@ -2041,7 +2041,8 @@ 
static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, st= ruct kvm_hv_hcall *hc) * read with kvm_read_guest(). */ if (!hc->fast && is_guest_mode(vcpu)) { - hc->ingpa =3D translate_nested_gpa(vcpu, hc->ingpa, + hc->ingpa =3D kvm_x86_ops.nested_ops->translate_nested_gpa( + vcpu, hc->ingpa, PFERR_GUEST_FINAL_MASK, NULL, 0); if (unlikely(hc->ingpa =3D=3D INVALID_GPA)) return HV_STATUS_INVALID_HYPERCALL_INPUT; diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 635c2e5d8513..63be5c5efed9 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -294,10 +294,6 @@ static inline void kvm_update_page_stats(struct kvm *k= vm, int level, int count) atomic64_add(count, &kvm->stat.pages[level - 1]); } =20 -gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access, - struct x86_exception *exception, - u64 pte_access); - static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gpa_t gpa, u64 access, @@ -306,8 +302,9 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *= vcpu, { if (mmu !=3D &vcpu->arch.nested_mmu) return gpa; - return translate_nested_gpa(vcpu, gpa, access, exception, - pte_access); + return kvm_x86_ops.nested_ops->translate_nested_gpa(vcpu, gpa, access, + exception, + pte_access); } =20 static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 961804df5f45..df232153eb24 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -2071,8 +2071,23 @@ static bool svm_get_nested_state_pages(struct kvm_vc= pu *vcpu) return true; } =20 +static gpa_t svm_translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, + u64 access, + struct x86_exception *exception, + u64 pte_access) +{ + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + + BUG_ON(!mmu_is_nested(vcpu)); + + /* NPT walks are always user-walks */ + access |=3D PFERR_USER_MASK; + return mmu->gva_to_gpa(vcpu, mmu, gpa, access, exception); +} + struct kvm_x86_nested_ops svm_nested_ops 
=3D { .leave_nested =3D svm_leave_nested, + .translate_nested_gpa =3D svm_translate_nested_gpa, .is_exception_vmexit =3D nested_svm_is_exception_vmexit, .check_events =3D svm_check_nested_events, .triple_fault =3D nested_svm_triple_fault, diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 3fe88f29be7a..cd1924c6e075 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -7438,8 +7438,20 @@ __init int nested_vmx_hardware_setup(int (*exit_hand= lers[])(struct kvm_vcpu *)) return 0; } =20 +static gpa_t vmx_translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, + u64 access, + struct x86_exception *exception, + u64 pte_access) +{ + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + + BUG_ON(!mmu_is_nested(vcpu)); + return mmu->gva_to_gpa(vcpu, mmu, gpa, access, exception); +} + struct kvm_x86_nested_ops vmx_nested_ops =3D { .leave_nested =3D vmx_leave_nested, + .translate_nested_gpa =3D vmx_translate_nested_gpa, .is_exception_vmexit =3D nested_vmx_is_exception_vmexit, .check_events =3D vmx_check_nested_events, .has_events =3D vmx_has_nested_events, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 67979b7de5d6..7c6942afae81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7848,22 +7848,6 @@ void kvm_get_segment(struct kvm_vcpu *vcpu, kvm_x86_call(get_segment)(vcpu, var, seg); } =20 -gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u64 access, - struct x86_exception *exception, - u64 pte_access) -{ - struct kvm_mmu *mmu =3D vcpu->arch.mmu; - gpa_t t_gpa; - - BUG_ON(!mmu_is_nested(vcpu)); - - /* NPT walks are always user-walks */ - access |=3D PFERR_USER_MASK; - t_gpa =3D mmu->gva_to_gpa(vcpu, mmu, gpa, access, exception); - - return t_gpa; -} - gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, struct x86_exception *exception) { --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 
b=EniMyZ/4kdpQ0P6OInfxgT0kROjwWXvvqec6v2Vzi2NDRrj+U6Sz0giRuys1dVg7vHnspC +cuH4bMCZDccLX5kpf36JTYR3wJQ9hOKFijNn2z6gkor+kRjuS0MMxaiFjWQnrFLhUHXsl B6wRVXq2Dy0bHeBQJqxaTsm762nbykI= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-651-iSsvD2yxO3GYI6f-dyVA8g-1; Thu, 30 Apr 2026 11:08:01 -0400 X-MC-Unique: iSsvD2yxO3GYI6f-dyVA8g-1 X-Mimecast-MFC-AGG-ID: iSsvD2yxO3GYI6f-dyVA8g_1777561680 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0416A1956095; Thu, 30 Apr 2026 15:08:00 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 6FD0F1955D84; Thu, 30 Apr 2026 15:07:59 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 13/28] KVM: x86/mmu: split XS/XU bits for EPT Date: Thu, 30 Apr 2026 11:07:32 -0400 Message-ID: <20260430150747.76749-14-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" When EPT is in use, replace ACC_USER_MASK with ACC_USER_EXEC_MASK, so that 
supervisor and user-mode execution can be controlled independently
(ACC_USER_MASK would not allow a setting similar to XU=0 XS=1 W=1 R=1).
Replace shadow_x_mask with shadow_xs_mask/shadow_xu_mask, to allow
setting XS and XU bits separately in EPT entries. Note that
ACC_USER_EXEC_MASK is already set through ACC_ALL in the kvm_mmu_page
roles, but it does not propagate to the XU bit because (for now)
shadow_xs_mask == shadow_xu_mask.

Tested-by: David Riley
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu.h          |  3 +-
 arch/x86/kvm/mmu/mmu.c      |  2 +-
 arch/x86/kvm/mmu/mmutrace.h |  6 ++--
 arch/x86/kvm/mmu/spte.c     | 60 ++++++++++++++++++++++++++-----------
 arch/x86/kvm/mmu/spte.h     | 11 +++++--
 5 files changed, 58 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 63be5c5efed9..d8c13e43c2d7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -39,7 +39,8 @@ extern bool __read_mostly enable_mmio_caching;
 
 #define ACC_READ_MASK PT_PRESENT_MASK
 #define ACC_WRITE_MASK PT_WRITABLE_MASK
-#define ACC_USER_MASK PT_USER_MASK
+#define ACC_USER_MASK PT_USER_MASK /* non EPT */
+#define ACC_USER_EXEC_MASK ACC_USER_MASK /* EPT only */
 #define ACC_EXEC_MASK 8
 #define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 88d0ff95fc8c..617a3204a5e0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5491,7 +5491,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 static inline bool boot_cpu_is_amd(void)
 {
 	WARN_ON_ONCE(!tdp_enabled);
-	return shadow_x_mask == 0;
+	return shadow_xs_mask == 0;
 }
 
 /*
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index dcfdfedfc4e9..3429c1413f42 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -357,8 +357,8 @@ TRACE_EVENT(
 		__entry->sptep = virt_to_phys(sptep);
 		__entry->level = level;
 		__entry->r = shadow_present_mask
|| (__entry->spte & PT_PRESENT_MASK);
-		__entry->x = is_executable_pte(__entry->spte);
-		__entry->u = shadow_user_mask ? !!(__entry->spte & shadow_user_mask) : -1;
+		__entry->x = (__entry->spte & (shadow_xs_mask | shadow_nx_mask)) == shadow_xs_mask;
+		__entry->u = !!(__entry->spte & (shadow_xu_mask | shadow_user_mask));
 	),
 
 	TP_printk("gfn %llx spte %llx (%s%s%s%s) level %d at %llx",
@@ -366,7 +366,7 @@ TRACE_EVENT(
 		  __entry->r ? "r" : "-",
 		  __entry->spte & PT_WRITABLE_MASK ? "w" : "-",
 		  __entry->x ? "x" : "-",
-		  __entry->u == -1 ? "" : (__entry->u ? "u" : "-"),
+		  __entry->u ? "u" : "-",
 		  __entry->level, __entry->sptep
 	)
 );
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 7b5f118ae211..4575dd77f854 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -29,8 +29,9 @@ bool __read_mostly kvm_ad_enabled;
 u64 __read_mostly shadow_host_writable_mask;
 u64 __read_mostly shadow_mmu_writable_mask;
 u64 __read_mostly shadow_nx_mask;
-u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 u64 __read_mostly shadow_user_mask;
+u64 __read_mostly shadow_xs_mask; /* mutual exclusive with nx_mask and user_mask */
+u64 __read_mostly shadow_xu_mask; /* mutual exclusive with nx_mask and user_mask */
 u64 __read_mostly shadow_accessed_mask;
 u64 __read_mostly shadow_dirty_mask;
 u64 __read_mostly shadow_mmio_value;
@@ -217,21 +218,26 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * would tie make_spte() further to vCPU/MMU state, and add complexity
 	 * just to optimize a mode that is anything but performance critical.
 	 */
-	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
-	    is_nx_huge_page_enabled(vcpu->kvm)) {
+	if (level > PG_LEVEL_4K && is_nx_huge_page_enabled(vcpu->kvm)) {
 		pte_access &= ~ACC_EXEC_MASK;
+		if (shadow_xu_mask)
+			pte_access &= ~ACC_USER_EXEC_MASK;
 	}
 
 	if (pte_access & ACC_READ_MASK)
 		spte |= PT_PRESENT_MASK; /* or VMX_EPT_READABLE_MASK */
 
-	if (pte_access & ACC_EXEC_MASK)
-		spte |= shadow_x_mask;
-	else
-		spte |= shadow_nx_mask;
-
-	if (pte_access & ACC_USER_MASK)
-		spte |= shadow_user_mask;
+	if (shadow_nx_mask) {
+		if (!(pte_access & ACC_EXEC_MASK))
+			spte |= shadow_nx_mask;
+		if (pte_access & ACC_USER_MASK)
+			spte |= shadow_user_mask;
+	} else {
+		if (pte_access & ACC_EXEC_MASK)
+			spte |= shadow_xs_mask;
+		if (pte_access & ACC_USER_EXEC_MASK)
+			spte |= shadow_xu_mask;
+	}
 
 	if (level > PG_LEVEL_4K)
 		spte |= PT_PAGE_SIZE_MASK;
@@ -318,11 +324,13 @@ static u64 make_spte_executable(u64 spte, u8 access)
 {
 	u64 set, clear;
 
-	if (access & ACC_EXEC_MASK)
-		set = shadow_x_mask;
+	if (shadow_nx_mask)
+		set = (access & ACC_EXEC_MASK) ? 0 : shadow_nx_mask;
 	else
-		set = shadow_nx_mask;
-	clear = set ^ (shadow_nx_mask | shadow_x_mask);
+		set =
+			(access & ACC_EXEC_MASK ? shadow_xs_mask : 0) |
+			(access & ACC_USER_EXEC_MASK ?
shadow_xu_mask : 0);
+	clear = set ^ (shadow_nx_mask | shadow_xs_mask | shadow_xu_mask);
 	return modify_spte_protections(spte, set, clear);
 }
 
@@ -389,7 +397,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
 
 	spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
 		PT_PRESENT_MASK /* or VMX_EPT_READABLE_MASK */ |
-		shadow_user_mask | shadow_x_mask | shadow_me_value;
+		shadow_user_mask | shadow_xs_mask | shadow_xu_mask | shadow_me_value;
 
 	if (ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED;
@@ -497,7 +505,24 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
 	shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
 	shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
 	shadow_nx_mask = 0ull;
-	shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
+	shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
+
+	/*
+	 * The MMU always maps ACC_EXEC_MASK and ACC_USER_EXEC_MASK to the
+	 * XS and XU bits of shadow EPT entries, regardless of whether MBEC
+	 * is available on the host or enabled by the L1 hypervisor's EPTP.
+	 *
+	 * For the non-nested case, pages are mapped with ACC_EXEC_MASK
+	 * and ACC_USER_EXEC_MASK set in tandem, so XS == XU and the
+	 * host's MBEC setting does not matter. On hardware without MBEC
+	 * the XU bit is reserved-as-ignored, and setting it does no harm.
+	 *
+	 * For nested EPT MBEC is not supported, but bit 10 of the gPTE has
+	 * no effect because (a) is_present_gpte() does not treat it as a
+	 * present bit, and (b) permission_fault() uses an mmu->permissions[]
+	 * array that effectively ignores ACC_USER_EXEC_MASK.
+	 */
+	shadow_xu_mask = VMX_EPT_USER_EXECUTABLE_MASK;
 	shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
 
 	shadow_acc_track_mask = VMX_EPT_RWX_MASK;
@@ -548,7 +573,8 @@ void kvm_mmu_reset_all_pte_masks(void)
 	shadow_accessed_mask = PT_ACCESSED_MASK;
 	shadow_dirty_mask = PT_DIRTY_MASK;
 	shadow_nx_mask = PT64_NX_MASK;
-	shadow_x_mask = 0;
+	shadow_xs_mask = 0;
+	shadow_xu_mask = 0;
 	shadow_present_mask = PT_PRESENT_MASK;
 
 	shadow_acc_track_mask = 0;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 8a4c09c5cdbf..0ed690f78e17 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -178,8 +178,9 @@ extern bool __read_mostly kvm_ad_enabled;
 extern u64 __read_mostly shadow_host_writable_mask;
 extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
-extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
+extern u64 __read_mostly shadow_xs_mask; /* mutual exclusive with nx_mask and user_mask */
+extern u64 __read_mostly shadow_xu_mask; /* mutual exclusive with nx_mask and user_mask */
 extern u64 __read_mostly shadow_accessed_mask;
 extern u64 __read_mostly shadow_dirty_mask;
 extern u64 __read_mostly shadow_mmio_value;
@@ -357,7 +358,13 @@ static inline bool is_last_spte(u64 pte, int level)
 
 static inline bool is_executable_pte(u64 spte)
 {
-	return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
+	/*
+	 * For now, return true if either the XS or XU bit is set
+	 * This function is only used for fast_page_fault,
+	 * which never processes shadow EPT, and regular page
+	 * tables always have XS==XU.
+	 */
+	return (spte & (shadow_xs_mask | shadow_xu_mask | shadow_nx_mask)) != shadow_nx_mask;
 }
 
 static inline kvm_pfn_t spte_to_pfn(u64 pte)
-- 
2.52.0

From nobody Tue Jun 16 17:01:53 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: d.riley@proxmox.com, jon@nutanix.com
Subject: [PATCH 14/28] KVM: x86/mmu: move cr4_smep to base role
Date: Thu, 30 Apr 2026 11:07:33 -0400
Message-ID: <20260430150747.76749-15-pbonzini@redhat.com>
In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com>
References: <20260430150747.76749-1-pbonzini@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Guest page tables can be reused independent of the value of CR4.SMEP
(at least if WP=1). However, this is not true of EPT MBEC pages,
because presence of EPT entries is signaled by bits 0-2 when MBEC is
off, and bits 0-2 + bit 10 when MBEC is on. In preparation for enabling
MBEC, move cr4_smep to the base role. This makes the smep_andnot_wp bit
redundant, so remove it.

Tested-by: David Riley
Signed-off-by: Paolo Bonzini
---
 Documentation/virt/kvm/x86/mmu.rst | 10 ++++------
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    | 23 +++++++++++++++--------
 arch/x86/kvm/mmu/mmu.c             |  6 +++---
 4 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 2b3b6d442302..666aa179601a 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -184,10 +184,8 @@ Shadow pages contain the following information:
     Contains the value of efer.nx for which the page is valid.
   role.cr0_wp:
     Contains the value of cr0.wp for which the page is valid.
-  role.smep_andnot_wp:
-    Contains the value of cr4.smep && !cr0.wp for which the page is valid
-    (pages for which this is true are different from other pages; see the
-    treatment of cr0.wp=0 below).
+  role.cr4_smep:
+    Contains the value of cr4.smep for which the page is valid.
   role.smap_andnot_wp:
     Contains the value of cr4.smap && !cr0.wp for which the page is valid
     (pages for which this is true are different from other pages; see the
@@ -435,8 +433,8 @@ from being written by the kernel after cr0.wp has changed to 1, we make
 the value of cr0.wp part of the page role.
This means that an spte created
 with one value of cr0.wp cannot be used when cr0.wp has a different value -
 it will simply be missed by the shadow page lookup code. A similar issue
-exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
-changing cr4.smep to 1. To avoid this, the value of !cr0.wp && cr4.smep
+exists when an spte created with cr0.wp=0 and cr4.smap=0 is used after
+changing cr4.smap to 1. To avoid this, the value of !cr0.wp && cr4.smap
 is also made a part of the page role.
 
 Large pages
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 3776cf5382a2..e4fca997ec79 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,6 +94,7 @@ KVM_X86_OP_OPTIONAL(sync_pir_to_irr)
 KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
+KVM_X86_OP_OPTIONAL_RET0(tdp_has_smep)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL(link_external_spt)
 KVM_X86_OP_OPTIONAL(set_external_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62dc782b2dd3..23a7ac8d7fbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -343,8 +343,8 @@ struct kvm_kernel_irq_routing_entry;
  * paging has exactly one upper level, making level completely redundant
  * when has_4_byte_gpte=1.
  *
- * - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
- * cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
+ * - on top of this, smap_andnot_wp is only set if cr0_wp=0,
+ * therefore these two bits only give rise to 3 possibilities.
  *
  * Therefore, the maximum number of possible upper-level shadow pages for a
  * single gfn is a bit less than 2^14.
@@ -360,12 +360,19 @@ union kvm_mmu_page_role {
 		unsigned invalid:1;
 		unsigned efer_nx:1;
 		unsigned cr0_wp:1;
-		unsigned smep_andnot_wp:1;
 		unsigned smap_andnot_wp:1;
 		unsigned ad_disabled:1;
 		unsigned guest_mode:1;
 		unsigned passthrough:1;
 		unsigned is_mirror:1;
+
+		/*
+		 * cr4_smep is also set for EPT MBEC. Because it affects
+		 * which pages are considered non-present (bit 10 additionally
+		 * must be zero if MBEC is on) it has to be in the base role.
+		 */
+		unsigned cr4_smep:1;
+
 		unsigned:3;
 
 		/*
@@ -392,10 +399,10 @@ union kvm_mmu_page_role {
  * tables (because KVM doesn't support Protection Keys with shadow paging), and
  * CR0.PG, CR4.PAE, and CR4.PSE are indirectly reflected in role.level.
  *
- * Note, SMEP and SMAP are not redundant with sm*p_andnot_wp in the page role.
- * If CR0.WP=1, KVM can reuse shadow pages for the guest regardless of SMEP and
- * SMAP, but the MMU's permission checks for software walks need to be SMEP and
- * SMAP aware regardless of CR0.WP.
+ * Note, SMAP is not redundant with smap_andnot_wp in the page role. If
+ * CR0.WP=1, KVM can reuse shadow pages for the guest regardless of SMAP,
+ * but the MMU's permission checks for software walks need to be SMAP
+ * aware regardless of CR0.WP.
  */
 union kvm_mmu_extended_role {
 	u32 word;
@@ -405,7 +412,6 @@ union kvm_mmu_extended_role {
 		unsigned int cr4_pse:1;
 		unsigned int cr4_pke:1;
 		unsigned int cr4_smap:1;
-		unsigned int cr4_smep:1;
 		unsigned int cr4_la57:1;
 		unsigned int efer_lma:1;
 	};
@@ -1887,6 +1893,7 @@ struct kvm_x86_ops {
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
 	u8 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+	bool (*tdp_has_smep)(struct kvm *kvm);
 
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 617a3204a5e0..245a2e92793d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -227,7 +227,7 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu) \
 }
 BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
 BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pse);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep);
+BUILD_MMU_ROLE_ACCESSOR(base, cr4, smep);
 BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smap);
 BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pke);
 BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57);
@@ -5764,7 +5764,7 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
 
 	role.base.efer_nx = ____is_efer_nx(regs);
 	role.base.cr0_wp = ____is_cr0_wp(regs);
-	role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
+	role.base.cr4_smep = ____is_cr4_smep(regs);
 	role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
 	role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
 
@@ -5776,7 +5776,6 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
 	else
 		role.base.level = PT32_ROOT_LEVEL;
 
-	role.ext.cr4_smep = ____is_cr4_smep(regs);
 	role.ext.cr4_smap = ____is_cr4_smap(regs);
 	role.ext.cr4_pse = ____is_cr4_pse(regs);
 
@@ -5835,6 +5834,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
 
 	role.access = ACC_ALL;
 	role.cr0_wp = true;
+
role.cr4_smep = kvm_x86_call(tdp_has_smep)(vcpu->kvm);
 	role.efer_nx = true;
 	role.smm = cpu_role.base.smm;
 	role.guest_mode = cpu_role.base.guest_mode;
-- 
2.52.0

From nobody Tue Jun 16 17:01:53 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: d.riley@proxmox.com, jon@nutanix.com
Subject: [PATCH 15/28] KVM: VMX: enable use of MBEC
Date: Thu, 30 Apr 2026 11:07:34 -0400
Message-ID: <20260430150747.76749-16-pbonzini@redhat.com>
In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com>
References: <20260430150747.76749-1-pbonzini@redhat.com>
Precedence: bulk
X-Mailing-List:
linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

If available, set SECONDARY_EXEC_MODE_BASED_EPT_EXEC in the secondary
execution controls. XS and XU are configured separately even if host
does not support MBEC, to avoid confusion where an L1 hypervisor sets
bit 10 in its EPT (knowing the non-MBEC hardware ignores it) but clears
bit 2 to make the page non-executable, and KVM sets the X bit via
shadow_xu_mask.

Tested-by: David Riley
Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/vmx.h      |  3 +++
 arch/x86/kvm/mmu.h              |  5 +++++
 arch/x86/kvm/mmu/spte.c         |  2 +-
 arch/x86/kvm/mmu/spte.h         |  5 +++--
 arch/x86/kvm/vmx/capabilities.h |  7 +++++++
 arch/x86/kvm/vmx/common.h       | 10 +++++-----
 arch/x86/kvm/vmx/main.c         |  9 +++++++++
 arch/x86/kvm/vmx/nested.c       |  1 +
 arch/x86/kvm/vmx/vmx.c          | 16 +++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  1 +
 arch/x86/kvm/vmx/x86_ops.h      |  1 +
 11 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 2b30b921b375..54aa5be50df9 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -619,9 +619,12 @@ enum vm_entry_failure_code {
 #define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
 
 #define EPT_VIOLATION_RWX_TO_PROT(__epte) (((__epte) & VMX_EPT_RWX_MASK) << 3)
+#define EPT_VIOLATION_USER_EXEC_TO_PROT(__epte) (((__epte) & VMX_EPT_USER_EXECUTABLE_MASK) >> 4)
 
 static_assert(EPT_VIOLATION_RWX_TO_PROT(VMX_EPT_RWX_MASK) ==
 	      (EPT_VIOLATION_PROT_READ | EPT_VIOLATION_PROT_WRITE | EPT_VIOLATION_PROT_EXEC));
+static_assert(EPT_VIOLATION_USER_EXEC_TO_PROT(VMX_EPT_USER_EXECUTABLE_MASK) ==
+	      (EPT_VIOLATION_PROT_USER_EXEC));
 
 /*
  * Exit Qualifications for NOTIFY VM EXIT
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d8c13e43c2d7..23bc5b18efd0 100644
--- a/arch/x86/kvm/mmu.h
+++
b/arch/x86/kvm/mmu.h
@@ -83,6 +83,11 @@ static inline gfn_t kvm_mmu_max_gfn(void)
 	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
 }
 
+static inline bool mmu_has_mbec(struct kvm_mmu *mmu)
+{
+	return mmu->root_role.cr4_smep;
+}
+
 u8 kvm_mmu_get_max_tdp_level(void);
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4575dd77f854..09e6f494dcf4 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -525,7 +525,7 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
 	shadow_xu_mask = VMX_EPT_USER_EXECUTABLE_MASK;
 	shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
 
-	shadow_acc_track_mask = VMX_EPT_RWX_MASK;
+	shadow_acc_track_mask = VMX_EPT_RWX_MASK | VMX_EPT_USER_EXECUTABLE_MASK;
 	shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
 	shadow_mmu_writable_mask = EPT_SPTE_MMU_WRITABLE;
 
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 0ed690f78e17..f5261d993eac 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -24,7 +24,7 @@
  * - bits 55 (EPT only): MMU-writable
  * - bits 56-59: unused
  * - bits 60-61: type of A/D tracking
- * - bits 62: unused
+ * - bits 62 (EPT only): saved XU bit for disabled AD
  */
 
 /*
@@ -65,7 +65,8 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
  * must not overlap the A/D type mask.
  */
 #define SHADOW_ACC_TRACK_SAVED_BITS_MASK (VMX_EPT_READABLE_MASK | \
-					  VMX_EPT_EXECUTABLE_MASK)
+					  VMX_EPT_EXECUTABLE_MASK | \
+					  VMX_EPT_USER_EXECUTABLE_MASK)
 #define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52
 #define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
 				     SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 7e59eb0f41bb..07469d1cfe74 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -15,6 +15,7 @@ extern bool __read_mostly enable_ept;
 extern bool __read_mostly enable_unrestricted_guest;
 extern bool __read_mostly enable_ept_ad_bits;
 extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_mbec;
 extern int __read_mostly pt_mode;
 
 #define PT_MODE_SYSTEM 0
@@ -406,4 +407,10 @@ static inline bool cpu_has_notify_vmexit(void)
 		SECONDARY_EXEC_NOTIFY_VM_EXITING;
 }
 
+static inline bool cpu_has_ept_mbec(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
+}
+
 #endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 1afbf272efae..40fa72f31fc7 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -91,15 +91,15 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
 	/* Is it a fetch fault? */
 	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
 		      ? PFERR_FETCH_MASK : 0;
-	/*
-	 * ept page table entry is present?
-	 * note: unconditionally clear USER_EXEC until mode-based
-	 * execute control is implemented
-	 */
+	/* ept page table entry is present? */
 	error_code |= (exit_qualification &
 		       (EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC))
 		      ? PFERR_PRESENT_MASK : 0;
 
+	if (mmu_has_mbec(vcpu->arch.mmu))
+		error_code |= (exit_qualification & EPT_VIOLATION_PROT_USER_EXEC)
+			      ?
PFERR_PRESENT_MASK : 0;
+
 	if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
 		error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
 			      PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index dbebddf648be..83d9921277ea 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -755,6 +755,14 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
 	return vmx_set_identity_map_addr(kvm, ident_addr);
 }
 
+static bool vt_tdp_has_smep(struct kvm *kvm)
+{
+	if (is_td(kvm))
+		return false;
+
+	return vmx_tdp_has_smep(kvm);
+}
+
 static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
 {
 	/* TDX doesn't support L2 guest at the moment. */
@@ -966,6 +974,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.set_tss_addr = vt_op(set_tss_addr),
 	.set_identity_map_addr = vt_op(set_identity_map_addr),
 	.get_mt_mask = vmx_get_mt_mask,
+	.tdp_has_smep = vt_op(tdp_has_smep),
 
 	.get_exit_info = vt_op(get_exit_info),
 	.get_entry_info = vt_op(get_entry_info),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cd1924c6e075..299d4ca60fb3 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2440,6 +2440,7 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 				  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
 				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
 				  SECONDARY_EXEC_ENABLE_VMFUNC |
+				  SECONDARY_EXEC_MODE_BASED_EPT_EXEC |
 				  SECONDARY_EXEC_DESC);
 
 	if (nested_cpu_has(vmcs12,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 337bbfecc021..72a75fa33c93 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -114,6 +114,9 @@ module_param(emulate_invalid_guest_state, bool, 0444);
 static bool __read_mostly fasteoi = 1;
 module_param(fasteoi, bool, 0444);
 
+bool __read_mostly enable_mbec = 1;
+module_param_named(mbec, enable_mbec, bool, 0444);
+
 module_param(enable_apicv, bool, 0444);
module_param(enable_ipiv, bool, 0444); =20 @@ -2773,6 +2776,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs= _conf, return -EIO; =20 vmx_cap->ept =3D 0; + _cpu_based_2nd_exec_control &=3D ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; _cpu_based_2nd_exec_control &=3D ~SECONDARY_EXEC_EPT_VIOLATION_VE; } if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) && @@ -4735,6 +4739,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx= *vmx) */ exec_control &=3D ~SECONDARY_EXEC_ENABLE_VMFUNC; =20 + if (!enable_mbec) + exec_control &=3D ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; + /* SECONDARY_EXEC_DESC is enabled/disabled on writes to CR4.UMIP, * in vmx_set_cr4. */ exec_control &=3D ~SECONDARY_EXEC_DESC; @@ -7823,6 +7830,11 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn,= bool is_mmio) return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT); } =20 +bool vmx_tdp_has_smep(struct kvm *kvm) +{ + return enable_mbec; +} + static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx, u32 new_= ctl) { /* @@ -8622,6 +8634,8 @@ __init int vmx_hardware_setup(void) =20 if (!cpu_has_vmx_ept_ad_bits() || !enable_ept) enable_ept_ad_bits =3D 0; + if (!cpu_has_ept_mbec() || !enable_ept) + enable_mbec =3D 0; =20 if (!cpu_has_vmx_unrestricted_guest() || !enable_ept) enable_unrestricted_guest =3D 0; @@ -8683,7 +8697,7 @@ __init int vmx_hardware_setup(void) set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ =20 if (enable_ept) - kvm_mmu_set_ept_masks(enable_ept_ad_bits); + kvm_mmu_set_ept_masks(enable_ept_ad_bits, enable_mbec); else vt_x86_ops.get_mt_mask =3D NULL; =20 diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index db84e8001da5..0a4e263c4095 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -567,6 +567,7 @@ static inline u8 vmx_get_rvi(void) SECONDARY_EXEC_ENABLE_VMFUNC | \ SECONDARY_EXEC_BUS_LOCK_DETECTION | \ SECONDARY_EXEC_NOTIFY_VM_EXITING | \ + SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \ SECONDARY_EXEC_ENCLS_EXITING | \ 
SECONDARY_EXEC_EPT_VIOLATION_VE) =20 diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h index d09abeac2b56..69cf276be88e 100644 --- a/arch/x86/kvm/vmx/x86_ops.h +++ b/arch/x86/kvm/vmx/x86_ops.h @@ -103,6 +103,7 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *e= oi_exit_bitmap); int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr); int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr); u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); +bool vmx_tdp_has_smep(struct kvm *kvm); =20 void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code); --=20 2.52.0
From nobody Tue Jun 16 17:01:53 2026 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 16/28] KVM: nVMX: pass advanced EPT violation vmexit info to guest Date: Thu, 30 Apr 2026 11:07:35 -0400 Message-ID: <20260430150747.76749-17-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" KVM will use advanced vmexit information for EPT violations to virtualize MBEC. Pass it to the guest since it is easy and allows testing nested nested. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/vmx.h | 4 ++++ arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/vmx/nested.c | 13 +++++++++---- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 54aa5be50df9..ed2ded531e55 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -535,6 +535,7 @@ enum vmcs_field { #define VMX_EPT_1GB_PAGE_BIT (1ull << 17) #define VMX_EPT_INVEPT_BIT (1ull << 20) #define VMX_EPT_AD_BIT (1ull << 21) +#define VMX_EPT_ADVANCED_VMEXIT_INFO_BIT (1ull << 22) #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25) #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26) =20 @@ -617,6 +618,9 @@ enum vm_entry_failure_code { EPT_VIOLATION_PROT_USER_EXEC) #define EPT_VIOLATION_GVA_IS_VALID BIT(7) #define EPT_VIOLATION_GVA_TRANSLATED BIT(8) +#define EPT_VIOLATION_GVA_USER BIT(9) +#define EPT_VIOLATION_GVA_WRITABLE BIT(10) +#define EPT_VIOLATION_GVA_NX BIT(11) =20 #define EPT_VIOLATION_RWX_TO_PROT(__epte) (((__epte) &
VMX_EPT_RWX_MASK) <= < 3) #define EPT_VIOLATION_USER_EXEC_TO_PROT(__epte) (((__epte) & VMX_EPT_USER_= EXECUTABLE_MASK) >> 4) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 8dd9d510fc34..d4ce55195a7c 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -494,7 +494,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, * [2:0] - Derive from the access bits. The exit_qualification might be * out of date if it is serving an EPT misconfiguration. * [5:3] - Calculated by the page walk of the guest EPT page tables - * [7:8] - Derived from [7:8] of real exit_qualification + * [7:11] - Derived from [7:11] of real exit_qualification * * The other bits are set to 0. */ diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 299d4ca60fb3..46b65475765d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -443,10 +443,14 @@ static void nested_ept_inject_page_fault(struct kvm_v= cpu *vcpu, vm_exit_reason =3D EXIT_REASON_EPT_MISCONFIG; exit_qualification =3D 0; } else { + u64 mask =3D EPT_VIOLATION_GVA_IS_VALID | + EPT_VIOLATION_GVA_TRANSLATED; + if (vmx->nested.msrs.ept_caps & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT) + mask |=3D EPT_VIOLATION_GVA_USER | + EPT_VIOLATION_GVA_WRITABLE | + EPT_VIOLATION_GVA_NX; exit_qualification =3D fault->exit_qualification; - exit_qualification |=3D vmx_get_exit_qual(vcpu) & - (EPT_VIOLATION_GVA_IS_VALID | - EPT_VIOLATION_GVA_TRANSLATED); + exit_qualification |=3D vmx_get_exit_qual(vcpu) & mask; vm_exit_reason =3D EXIT_REASON_EPT_VIOLATION; } =20 @@ -7240,7 +7244,8 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_c= aps, VMX_EPT_PAGE_WALK_5_BIT | VMX_EPTP_WB_BIT | VMX_EPT_INVEPT_BIT | - VMX_EPT_EXECUTE_ONLY_BIT; + VMX_EPT_EXECUTE_ONLY_BIT | + VMX_EPT_ADVANCED_VMEXIT_INFO_BIT; =20 msrs->ept_caps &=3D ept_caps; msrs->ept_caps |=3D VMX_EPT_EXTENT_GLOBAL_BIT | --=20 2.52.0
From nobody Tue Jun 16 17:01:53 2026 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 17/28] KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations Date: Thu, 30 Apr 2026 11:07:36 -0400 Message-ID: <20260430150747.76749-18-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17
Content-Type: text/plain; charset="utf-8" For EPT, PFERR_USER_MASK refers not to the CPL of the guest, but to the AND of the U bits encountered while walking guest page tables; this is consistent with how MBEC differentiates between XS and XU. This is available through the "advanced vmexit information for EPT violations" feature. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/common.h | 6 +++++- arch/x86/kvm/vmx/vmx.c | 10 ++++++++++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 40fa72f31fc7..48520fa1c8e8 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -100,9 +100,13 @@ static inline int __vmx_handle_ept_violation(struct kv= m_vcpu *vcpu, gpa_t gpa, error_code |=3D (exit_qualification & EPT_VIOLATION_PROT_USER_EXEC) ? PFERR_PRESENT_MASK : 0; =20 - if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID) + if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID) { error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ? PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; + if ((exit_qualification & (EPT_VIOLATION_GVA_TRANSLATED|EPT_VIOLATION_GV= A_USER)) + =3D=3D (EPT_VIOLATION_GVA_TRANSLATED|EPT_VIOLATION_GVA_USER)) + error_code |=3D PFERR_USER_MASK; + } =20 if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) error_code |=3D PFERR_PRIVATE_ACCESS; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 72a75fa33c93..5e4aa519423c 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2790,6 +2790,16 @@ static int setup_vmcs_config(struct vmcs_config *vmc= s_conf, vmx_cap->vpid =3D 0; } =20 + /* + * Virtualizing MBEC requires advanced vmexit information in order to + * distinguish supervisor and user accesses. For simplicity and clarity + * disable MBEC entirely if advanced vmexit information is not available, + * this way mbec=3D1 in the kvm_intel module parameters implies availabil= ity + * to nested guests as well. 
+ */ + if (!(vmx_cap->ept & VMX_EPT_ADVANCED_VMEXIT_INFO_BIT)) + _cpu_based_2nd_exec_control &=3D ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC; + if (!cpu_has_sgx()) _cpu_based_2nd_exec_control &=3D ~SECONDARY_EXEC_ENCLS_EXITING; =20 --=20 2.52.0
From nobody Tue Jun 16 17:01:53 2026 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 18/28] KVM: x86/mmu: add support for MBEC to EPT page table walks Date: Thu, 30 Apr 2026 11:07:37 -0400 Message-ID: <20260430150747.76749-19-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com>
References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" Extend the page walker to support moving bit 10 of the PTEs into ACC_USER_EXEC_MASK and bit 6 of the exit qualification of EPT violation VM exits. Note that while mmu_has_mbec()/cr4_smep affect the interpretation of ACC_USER_EXEC_MASK and add bit 10 as a "present bit" in guest EPT page table entries, they do not affect how KVM operates on SPTEs. That's because the MMU uses explicit ACC_USER_EXEC_MASK/shadow_xu_mask even for the non-nested EPT; the only difference is that ACC_USER_EXEC_MASK and ACC_EXEC_MASK will always be set in tandem outside the nested scenario. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 13 +++++++++++-- arch/x86/kvm/mmu/paging_tmpl.h | 27 +++++++++++++++++++++------ arch/x86/kvm/mmu/spte.h | 2 ++ arch/x86/kvm/vmx/nested.c | 9 +++++++++ 4 files changed, 43 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 245a2e92793d..fe87eee43b09 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5570,7 +5570,6 @@ static void update_permission_bitmask(struct kvm_mmu = *mmu, bool ept) { unsigned byte; =20 - const u16 x =3D ACC_BITS_MASK(ACC_EXEC_MASK); const u16 w =3D ACC_BITS_MASK(ACC_WRITE_MASK); const u16 r =3D ACC_BITS_MASK(ACC_READ_MASK); =20 @@ -5611,8 +5610,18 @@ static void update_permission_bitmask(struct kvm_mmu= *mmu, bool ept) u16 smapf =3D 0; =20 if (ept) { - ff =3D (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0; + const u16 xs =3D ACC_BITS_MASK(ACC_EXEC_MASK); + const u16 xu =3D ACC_BITS_MASK(ACC_USER_EXEC_MASK); + + if (pfec & PFERR_FETCH_MASK) { + /* Ignore XU unless MBEC is enabled. */ + if (cr4_smep) + ff =3D pfec & PFERR_USER_MASK ? 
(u16)~xu : (u16)~xs; + else + ff =3D (u16)~xs; + } } else { + const u16 x =3D ACC_BITS_MASK(ACC_EXEC_MASK); const u16 u =3D ACC_BITS_MASK(ACC_USER_MASK); =20 /* Faults from kernel mode accesses to user pages */ diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index d4ce55195a7c..f741f7d4cc2d 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -124,12 +124,17 @@ static inline void FNAME(protect_clean_gpte)(struct k= vm_mmu *mmu, unsigned *acce *access &=3D mask; } =20 -static inline int FNAME(is_present_gpte)(unsigned long pte) +static inline int FNAME(is_present_gpte)(struct kvm_mmu *mmu, + unsigned long pte) { #if PTTYPE !=3D PTTYPE_EPT return pte & PT_PRESENT_MASK; #else - return pte & 7; + /* + * For EPT, an entry is present if any of bits 2:0 are set. + * With mode-based execute control, bit 10 also indicates presence. + */ + return pte & (7 | (mmu_has_mbec(mmu) ? VMX_EPT_USER_EXECUTABLE_MASK : 0)); #endif } =20 @@ -152,7 +157,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcp= u *vcpu, struct kvm_mmu_page *sp, u64 *spte, u64 gpte) { - if (!FNAME(is_present_gpte)(gpte)) + if (!FNAME(is_present_gpte)(vcpu->arch.mmu, gpte)) goto no_present; =20 /* Prefetch only accessed entries (unless A/D bits are disabled). */ @@ -173,10 +178,17 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_v= cpu *vcpu, static inline unsigned FNAME(gpte_access)(u64 gpte) { unsigned access; + /* + * Set bits in ACC_*_MASK even if they might not be used in the + * actual checks. For example, if EFER.NX is clear permission_fault() + * will ignore ACC_EXEC_MASK, and if MBEC is disabled it will + * ignore ACC_USER_EXEC_MASK. + */ #if PTTYPE =3D=3D PTTYPE_EPT access =3D ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) | ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) | - ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0); + ((gpte & VMX_EPT_READABLE_MASK) ? 
ACC_READ_MASK : 0) | + ((gpte & VMX_EPT_USER_EXECUTABLE_MASK) ? ACC_USER_EXEC_MASK : 0); #else /* * P is set here, so the page is always readable and W/U/!NX represent @@ -331,7 +343,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, if (walker->level =3D=3D PT32E_ROOT_LEVEL) { pte =3D mmu->get_pdptr(vcpu, (addr >> 30) & 3); trace_kvm_mmu_paging_element(pte, walker->level); - if (!FNAME(is_present_gpte)(pte)) + if (!FNAME(is_present_gpte)(mmu, pte)) goto error; --walker->level; } @@ -414,7 +426,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, */ pte_access =3D pt_access & (pte ^ walk_nx_mask); =20 - if (unlikely(!FNAME(is_present_gpte)(pte))) + if (unlikely(!FNAME(is_present_gpte)(mmu, pte))) goto error; =20 if (unlikely(FNAME(is_rsvd_bits_set)(mmu, pte, walker->level))) { @@ -521,6 +533,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, * ACC_*_MASK flags! */ walker->fault.exit_qualification |=3D EPT_VIOLATION_RWX_TO_PROT(pte_acce= ss); + if (mmu_has_mbec(mmu)) + walker->fault.exit_qualification |=3D + EPT_VIOLATION_USER_EXEC_TO_PROT(pte_access); } #endif walker->fault.address =3D addr; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index f5261d993eac..fe9571837fee 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -395,6 +395,8 @@ static inline bool __is_rsvd_bits_set(struct rsvd_bits_= validate *rsvd_check, static inline bool __is_bad_mt_xwr(struct rsvd_bits_validate *rsvd_check, u64 pte) { + if (pte & VMX_EPT_USER_EXECUTABLE_MASK) + pte |=3D VMX_EPT_EXECUTABLE_MASK; return rsvd_check->bad_mt_xwr & BIT_ULL(pte & 0x3f); } =20 diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 46b65475765d..84f5c25a1f12 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -7452,6 +7452,15 @@ static gpa_t vmx_translate_nested_gpa(struct kvm_vcp= u *vcpu, gpa_t gpa, struct kvm_mmu *mmu =3D vcpu->arch.mmu; =20 BUG_ON(!mmu_is_nested(vcpu)); + + /* 
+ * MBEC differentiates based on the effective U/S bit of + * the guest page tables; not the processor CPL. + */ + access &=3D ~PFERR_USER_MASK; + if ((pte_access & ACC_USER_MASK) && (access & PFERR_GUEST_FINAL_MASK)) + access |=3D PFERR_USER_MASK; + return mmu->gva_to_gpa(vcpu, mmu, gpa, access, exception); } =20 --=20 2.52.0
From nobody Tue Jun 16 17:01:53 2026 From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 19/28] KVM: nVMX: advertise MBEC to nested guests Date: Thu, 30 Apr 2026 11:07:38 -0400 Message-ID:
<20260430150747.76749-20-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Content-Type: text/plain; charset="utf-8" From: Jon Kohler Advertise SECONDARY_EXEC_MODE_BASED_EPT_EXEC (MBEC) to userspace, which allows userspace to expose and advertise the feature to the guest. When MBEC is enabled by the guest, it is passed to the MMU via cr4_smep, and to the processor by the merging of vmcs12->secondary_vm_exec_control into the VMCS02's secondary VM execution controls. Signed-off-by: Jon Kohler Message-ID: <20251223054806.1611168-9-jon@nutanix.com> Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu.h | 2 +- arch/x86/kvm/mmu/mmu.c | 7 ++++--- arch/x86/kvm/mmu/spte.c | 10 ++++++---- arch/x86/kvm/vmx/nested.c | 11 +++++++++++ 4 files changed, 22 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 23bc5b18efd0..e1e3869f568b 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -100,7 +100,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, uns= igned long cr0, unsigned long cr4, u64 efer, gpa_t nested_cr3); void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, int huge_page_level, bool accessed_dirty, - gpa_t new_eptp); + bool mbec, gpa_t new_eptp); bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu); int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code, u64 fault_address, char *insn, int insn_len); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index fe87eee43b09..27776d0b2ad9 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5959,7 +5959,7 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_shadow_npt_mm= u); =20 static union kvm_cpu_role 
kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_di= rty, - bool execonly, u8 level) + bool execonly, u8 level, bool mbec) { union kvm_cpu_role role =3D {0}; =20 @@ -5969,6 +5969,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *v= cpu, bool accessed_dirty, */ WARN_ON_ONCE(is_smm(vcpu)); role.base.level =3D level; + role.base.cr4_smep =3D mbec; role.base.has_4_byte_gpte =3D false; role.base.direct =3D false; role.base.ad_disabled =3D !accessed_dirty; @@ -5984,13 +5985,13 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu = *vcpu, bool accessed_dirty, =20 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, int huge_page_level, bool accessed_dirty, - gpa_t new_eptp) + bool mbec, gpa_t new_eptp) { struct kvm_mmu *context =3D &vcpu->arch.guest_mmu; u8 level =3D vmx_eptp_page_walk_level(new_eptp); union kvm_cpu_role new_mode =3D kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty, - execonly, level); + execonly, level, mbec); =20 if (new_mode.as_u64 !=3D context->cpu_role.as_u64) { /* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */ diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 09e6f494dcf4..9b5ce4d1fa65 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -517,10 +517,12 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits) * host's MBEC setting does not matter. On hardware without MBEC * the XU bit is reserved-as-ignored, and setting it does no harm. * - * For nested EPT MBEC is not supported, but bit 10 of the gPTE has - * no effect because (a) is_present_gpte() does not treat it as a - * present bit, and (b) permission_fault() uses an mmu->permissions[] - * array that effectively ignores ACC_USER_EXEC_MASK. 
+ * For nested EPT, when MBEC is disabled by L1, correctness relies + * on (a) ignoring bit 10 of the gPTE in is_present_gpte(), rather + * than treating it as a present bit, and (b) permission_fault() + * using an mmu->permissions[] array that effectively ignores + * ACC_USER_EXEC_MASK. Bit 10 of the gPTE does end up mirrored + * in the sPTEs but is ignored because L2 runs with MBEC disabled. */ shadow_xu_mask =3D VMX_EPT_USER_EXECUTABLE_MASK; shadow_present_mask =3D VMX_EPT_SUPPRESS_VE_BIT; diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 84f5c25a1f12..bc1046f32ebc 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -469,6 +469,13 @@ static void nested_ept_inject_page_fault(struct kvm_vc= pu *vcpu, vmcs12->guest_physical_address =3D fault->address; } =20 +static inline bool nested_ept_mbec_enabled(struct kvm_vcpu *vcpu) +{ + struct vmcs12 *vmcs12 =3D get_vmcs12(vcpu); + + return nested_cpu_has2(vmcs12, SECONDARY_EXEC_MODE_BASED_EPT_EXEC); +} + static void nested_ept_new_eptp(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx =3D to_vmx(vcpu); @@ -477,6 +484,7 @@ static void nested_ept_new_eptp(struct kvm_vcpu *vcpu) =20 kvm_init_shadow_ept_mmu(vcpu, execonly, ept_lpage_level, nested_ept_ad_enabled(vcpu), + nested_ept_mbec_enabled(vcpu), nested_ept_get_eptp(vcpu)); } =20 @@ -7257,6 +7265,9 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_c= aps, msrs->ept_caps |=3D VMX_EPT_AD_BIT; } =20 + if (enable_mbec) + msrs->secondary_ctls_high |=3D + SECONDARY_EXEC_MODE_BASED_EPT_EXEC; /* * Advertise EPTP switching irrespective of hardware support, * KVM emulates it in software so long as VMFUNC is supported. 
--=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4764427A1D for ; Thu, 30 Apr 2026 15:08:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561694; cv=none; b=ZF2DOkGGoPIgyk110s+hbVIt6Hbk298ZnEEuOcYb0nDKdaS4Hl4JPHTVQ7FaX7MNghWakbls638e/+U6E36hxgRaV0BMVYfeWa455wgMHpcBipYLDuPnwEEHwQvJiyOsXzz45sL46IAHstfxZj4nhKMLV/fiTJJrCWcH2bnk0vE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561694; c=relaxed/simple; bh=i3jQrAyLxnhA/t/BHsxb3+sqwKCc2Ml3j5/foser9aU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=KCFGBOvGnFFVFwD9PnqZpXcOEUJgdzcG/6Onf7nWtoPme4oLqmNI5sjvpZz64htLHsD7j9BkoX2ecxykgN+cIqAtqGhkawzmtpCm9GgbgJqsCG0g1gMWPwA70ZaE+/iSLLc+9YQz6tjfuV2B+JZpj9NiU6lFqkhVPdrqiejizno= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=IKYfotp4; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="IKYfotp4" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561690; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+YJGyvsFomEhH/bKnG38bbVfFx34TKz1rxdBHf//6Lw=; b=IKYfotp4SvyDHm7re02DojocqM+1pmwJ8wAFU6rrYI1I4creu4RXGxp+vHPx+tMDSiwjoc ZLXG0N1Ws4tHbZac3O/9D2x9zvpA4roW0ZuZigWeVUA3qTjDxALytZHDwxutBiyDhiTRgn REX/NkIYfDDBiYM3F34icLE4triWZ/o= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-63-oMXG1ngJMIWpAgjX4mEO2A-1; Thu, 30 Apr 2026 11:08:06 -0400 X-MC-Unique: oMXG1ngJMIWpAgjX4mEO2A-1 X-Mimecast-MFC-AGG-ID: oMXG1ngJMIWpAgjX4mEO2A_1777561685 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8CBE019560A1; Thu, 30 Apr 2026 15:08:05 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 04E6F180034F; Thu, 30 Apr 2026 15:08:04 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 20/28] KVM: nVMX: allow MBEC with EVMCS Date: Thu, 30 Apr 2026 11:07:39 -0400 Message-ID: <20260430150747.76749-21-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: 
MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" From: Jon Kohler Extend EVMCS1_SUPPORTED_2NDEXEC to allow MBEC and EVMCS to coexist. Presenting both EVMCS and MBEC simultaneously causes KVM to filter out MBEC and not present it as a supported control to the guest, preventing performance gains from MBEC when Windows HVCI is enabled. The guest may choose not to use MBEC (e.g., if the admin does not enable Windows HVCI / Memory Integrity), but if they use traditional nested virt (Hyper-V, WSL2, etc.), having EVMCS exposed is important for improving nested guest performance. IOW allowing MBEC and EVMCS to coexist provides maximum optionality to Windows users without overcomplicating VM administration. Signed-off-by: Jon Kohler Message-ID: <20251223054806.1611168-8-jon@nutanix.com> Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/vmx/hyperv_evmcs.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kvm/vmx/hyperv_evmcs.h b/arch/x86/kvm/vmx/hyperv_evmc= s.h index fc7c4e7bd1bf..bc08fe40590e 100644 --- a/arch/x86/kvm/vmx/hyperv_evmcs.h +++ b/arch/x86/kvm/vmx/hyperv_evmcs.h @@ -87,6 +87,7 @@ SECONDARY_EXEC_PT_CONCEAL_VMX | \ SECONDARY_EXEC_BUS_LOCK_DETECTION | \ SECONDARY_EXEC_NOTIFY_VM_EXITING | \ + SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \ SECONDARY_EXEC_ENCLS_EXITING) =20 #define EVMCS1_SUPPORTED_3RDEXEC (0ULL) --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9178E328B62 for ; Thu, 30 Apr 2026 15:08:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561693; cv=none; 
b=DEFgVxDS4k0BUOLo9i+uP051js8wZgclpFgAsEvIbGB1A682WVKiOlDaJPkznfScT2UuaA1mn6vMMYr808fA9W4oCkf3LMGs6gkeaSZm0QA1kGSlZ4lf7AEHnyMtBU+7pfgsjr221WI65/OaoR0INOdVNnteoHurB7ufBteuhY0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561693; c=relaxed/simple; bh=POoeonNKM+9smTwZUSI9eZc9XummIWOu9dGydt9Q9Xo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=F8xR2tNg26y62mG6qCOhzTEcmEQmgbBcPR/AkC/RFuyElyc4ea3vi16tJRhwItW2yQN0ItI0xYiSQYxcmPk172/siRBd0NYFSE0U4nT5MlUToLmYk8+scqSz+w5DEsSZZ3XDRcQdi8cLY9KdAo44zFPuF4Pbnsb7Zao4xD108aI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ZTxRBlTn; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZTxRBlTn" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561689; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pi6Mxveh+nSgltzKstw/k+E1UcHUk+4daKGVjEoWrG8=; b=ZTxRBlTnSkLELlRFNdzRqk+QWBQ13AUZn1dtntnvQwipmLR+6bdu6fgjaAgkMjlFAE5khh r+XSzvBahBpxfMg2rBvBwbvZFXnqkIeFwnp672YRUeKkNvRxqxwis1g0CRFwzHoym+oVQo ULZiAZfpwDXYUCuyRtcGMBQEyTqD1PU= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) 
id us-mta-632-hbhtxzKANxG1R2RDN2UWow-1; Thu, 30 Apr 2026 11:08:07 -0400 X-MC-Unique: hbhtxzKANxG1R2RDN2UWow-1 X-Mimecast-MFC-AGG-ID: hbhtxzKANxG1R2RDN2UWow_1777561686 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4856719560BD; Thu, 30 Apr 2026 15:08:06 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B53AB180045E; Thu, 30 Apr 2026 15:08:05 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 21/28] KVM: x86/mmu: propagate access mask from root pages down Date: Thu, 30 Apr 2026 11:07:40 -0400 Message-ID: <20260430150747.76749-22-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Until now, all SPTEs have had all kinds of access allowed; however, for GMET to be enabled all the pages have to have ACC_USER_MASK disabled. By marking them as supervisor pages, the processor allows execution from either user or supervisor mode (unlike for normal paging, NPT ignores the U bit for reads and writes). That will mean that the root page's role has ACC_USER_MASK cleared and that has to be propagated down through the kvm_mmu_page tree. 
Do that, and pass the required access to the kvm_mmu_spte_requested tracepoint since it's not ACC_ALL anymore. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 9 +++++---- arch/x86/kvm/mmu/mmutrace.h | 10 ++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 6 +++--- 4 files changed, 15 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 27776d0b2ad9..5836ff595e32 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3446,12 +3446,13 @@ static int direct_map(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) { struct kvm_shadow_walk_iterator it; struct kvm_mmu_page *sp; - int ret; + int ret, access; gfn_t base_gfn =3D fault->gfn; =20 kvm_mmu_hugepage_adjust(vcpu, fault); =20 - trace_kvm_mmu_spte_requested(fault); + access =3D vcpu->arch.mmu->root_role.access; + trace_kvm_mmu_spte_requested(fault, access); for_each_shadow_entry(vcpu, fault->addr, it) { /* * We cannot overwrite existing page tables with an NX @@ -3464,7 +3465,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) if (it.level =3D=3D fault->goal_level) break; =20 - sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL); + sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, access); if (sp =3D=3D ERR_PTR(-EEXIST)) continue; =20 @@ -3477,7 +3478,7 @@ static int direct_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) if (WARN_ON_ONCE(it.level !=3D fault->goal_level)) return -EFAULT; =20 - ret =3D mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL, + ret =3D mmu_set_spte(vcpu, fault->slot, it.sptep, access, base_gfn, fault->pfn, fault); if (ret =3D=3D RET_PF_SPURIOUS) return ret; diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h index 3429c1413f42..fa01719baf8d 100644 --- a/arch/x86/kvm/mmu/mmutrace.h +++ b/arch/x86/kvm/mmu/mmutrace.h @@ -373,23 +373,25 @@ TRACE_EVENT( =20 TRACE_EVENT( kvm_mmu_spte_requested, 
- TP_PROTO(struct kvm_page_fault *fault), - TP_ARGS(fault), + TP_PROTO(struct kvm_page_fault *fault, u8 access), + TP_ARGS(fault, access), =20 TP_STRUCT__entry( __field(u64, gfn) __field(u64, pfn) __field(u8, level) + __field(u8, access) ), =20 TP_fast_assign( __entry->gfn =3D fault->gfn; __entry->pfn =3D fault->pfn | (fault->gfn & (KVM_PAGES_PER_HPAGE(fault->= goal_level) - 1)); __entry->level =3D fault->goal_level; + __entry->access =3D access; ), =20 - TP_printk("gfn %llx pfn %llx level %d", - __entry->gfn, __entry->pfn, __entry->level + TP_printk("gfn %llx pfn %llx level %d access %x", + __entry->gfn, __entry->pfn, __entry->level, __entry->access ) ); =20 diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index f741f7d4cc2d..047400af924d 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -734,7 +734,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, */ kvm_mmu_hugepage_adjust(vcpu, fault); =20 - trace_kvm_mmu_spte_requested(fault); + trace_kvm_mmu_spte_requested(fault, gw->pte_access); =20 for (; shadow_walk_okay(&it); shadow_walk_next(&it)) { /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 7b1102d26f9c..5a2f8ce9a32b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1185,9 +1185,9 @@ static int tdp_mmu_map_handle_target_level(struct kvm= _vcpu *vcpu, } =20 if (unlikely(!fault->slot)) - new_spte =3D make_mmio_spte(vcpu, iter->gfn, ACC_ALL); + new_spte =3D make_mmio_spte(vcpu, iter->gfn, sp->role.access); else - wrprot =3D make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, + wrprot =3D make_spte(vcpu, sp, fault->slot, sp->role.access, iter->gfn, fault->pfn, iter->old_spte, fault->prefetch, false, fault->map_writable, &new_spte); =20 @@ -1272,7 +1272,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) =20 kvm_mmu_hugepage_adjust(vcpu, fault); =20 - trace_kvm_mmu_spte_requested(fault); + 
trace_kvm_mmu_spte_requested(fault, root->role.access); =20 rcu_read_lock(); =20 --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 500BC43D50C for ; Thu, 30 Apr 2026 15:08:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561697; cv=none; b=iWwjGh/P5vCJH6ZeL98og4+c0eshuf9xcwKpEjSe/x5wPxNhHeL9v1jdC5S/wCdXzSLlzkcpIRbyBdGEnxGyB8/DVzPtjuOf5xpmn0GTCdbeOCL17WnRIwFsLaiMxAcg1fuh43YmFUNMtsUdQwZW3sxsKm83hrvW6Xa/jy2LW78= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561697; c=relaxed/simple; bh=9YNpdEUdkuiS04q7Of+MvfTKX9DIsMX/RVhBzzglv8k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=YI4By/JJfDjGk0iSkoG4CyspbZrlf2MkLsajt9YGrvsxlKXgfKJXHisCYgeARvYT1ZBJlAwV1xvwvz+1FR7oY50efH6C15bI1In9uydjYuJoGWE8cRTGHQsckg57NYJjKWksVmEELy7lISeVr+ykBg1tqnq9mwWSRPk9spVef4w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=d6JTSZtN; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="d6JTSZtN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561694; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4Le+YTyora37ZygRRJFpMnAlaGoD4oOtJ4ONn7Rk0JQ=; b=d6JTSZtN3pHCNU0joIA0TIDLtcXAfe14LFwtlLNzgdpZJdNLTkx0Ut1nVSFiICBw3XZoxM zJ2ikvgdAqqybnxbFRWWb/ia5vrSC27VxRHZHHAl+69qmCOpoeAQ8AvbDr8JK+uvpxqSDR v1Z+Yrj+i/jqJSQGz0rDQaW41mXyP8A= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-326-Cc7OluurMnWjFTniwuIsTA-1; Thu, 30 Apr 2026 11:08:10 -0400 X-MC-Unique: Cc7OluurMnWjFTniwuIsTA-1 X-Mimecast-MFC-AGG-ID: Cc7OluurMnWjFTniwuIsTA_1777561687 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0106D19560AA; Thu, 30 Apr 2026 15:08:07 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 6E4FB180058B; Thu, 30 Apr 2026 15:08:06 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 22/28] KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D Date: Thu, 30 Apr 2026 11:07:41 -0400 Message-ID: <20260430150747.76749-23-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" While GMET looks a lot like SMEP, it has several annoying differences. The main one is that the availability of the I/D bit in the page fault error code still depends on the host CR4.SMEP and EFER.NXE bits. If the base.cr4_smep bit of the cpu_role is (ab)used to enable GMET, there needs to be another place where the host CR4.SMEP is read from; just merge it with EFER.NXE into a new cpu_role bit that tells paging_tmpl.h whether to set the I/D bit at all. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 7 +++++++ arch/x86/kvm/mmu/mmu.c | 8 ++++++++ arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 23a7ac8d7fbe..7dde4ca87752 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -414,6 +414,13 @@ union kvm_mmu_extended_role { unsigned int cr4_smap:1; unsigned int cr4_la57:1; unsigned int efer_lma:1; + + /* + * True if either CR4.SMEP or EFER.NXE are set. For AMD NPT + * this is the "real" host CR4.SMEP whereas cr4_smep is + * actually GMET. 
+ */ + unsigned int has_pferr_fetch:1; }; }; =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5836ff595e32..93f96673d02a 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -234,6 +234,11 @@ BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57); BUILD_MMU_ROLE_ACCESSOR(base, efer, nx); BUILD_MMU_ROLE_ACCESSOR(ext, efer, lma); =20 +static inline bool has_pferr_fetch(struct kvm_mmu *mmu) +{ + return mmu->cpu_role.ext.has_pferr_fetch; +} + static inline bool is_cr0_pg(struct kvm_mmu *mmu) { return mmu->cpu_role.base.level > 0; @@ -5793,6 +5798,8 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kv= m_vcpu *vcpu, role.ext.cr4_pke =3D ____is_efer_lma(regs) && ____is_cr4_pke(regs); role.ext.cr4_la57 =3D ____is_efer_lma(regs) && ____is_cr4_la57(regs); role.ext.efer_lma =3D ____is_efer_lma(regs); + + role.ext.has_pferr_fetch =3D role.base.efer_nx | role.base.cr4_smep; return role; } =20 @@ -5946,6 +5953,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u= nsigned long cr0, =20 /* NPT requires CR0.PG=3D1. 
*/ WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode); + cpu_role.base.cr4_smep =3D false; =20 root_role =3D cpu_role.base; root_role.level =3D kvm_mmu_get_tdp_level(vcpu); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 047400af924d..07100bbfc270 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -489,7 +489,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker= *walker, =20 error: errcode |=3D write_fault | user_fault; - if (fetch_fault && (is_efer_nx(mmu) || is_cr4_smep(mmu))) + if (fetch_fault && has_pferr_fetch(mmu)) errcode |=3D PFERR_FETCH_MASK; =20 walker->fault.vector =3D PF_VECTOR; --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0093A43C05D for ; Thu, 30 Apr 2026 15:08:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561696; cv=none; b=dTGfyApTcbSwB0ekpdvCv2ZXo4zpZWRnyWb25tVagUTA9/wl+4aNJVRRE4U1eWitxNecyT8D451wV0dKbzpLZlyF3Js790cfEWmDqFBRAprDAIbs1XAgX5D3SInMjHJ5K0cgVCGT1/LK9LWbuXCjx8b9tm/CS5AkeyWosZ4sfl8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561696; c=relaxed/simple; bh=yWXdoiQVh8ATDy6+edjM/jh5GvM/cnaOKN/EbxBl6do=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=KOW/VMfsX5U62e/GZ0Ygsc+WQQTrmO7W1DOo3uCk+ZYPpCx7H2cnd7dlfyt/bK57wwhTVuGod45UiMtYerSEu4ksEkXpPEpsbM5XBFHINwlUA1gPujzVvSlCZ+HL1Dd3ic5nzyODhoDmRuwVWGVmUAl6xZdXhGXwT97aWn3JdLo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass 
(1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=OowoG+QT; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OowoG+QT" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561692; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eMXRbG836TtpgBO9IHiswDkh+14f6bX9mDfcP8XvT0M=; b=OowoG+QTFrzTYExnfrk9ZKY03Vo+kOP3mkmyfh8D/WQ0/svJsTUUg3NOhzZwUHBWZofe01 IurSPhSw/pvJ556zGXXHuTFexY5wi13NKsodUrNqvp3Jq5Nqjg/oVlX+UmtdIMKZ2ZdTxX IOcE9QMIgvxz/WQNTlD1AzSkNiy8+zE= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-147-eHQvP40FM5SU0V5qIRzPTQ-1; Thu, 30 Apr 2026 11:08:09 -0400 X-MC-Unique: eHQvP40FM5SU0V5qIRzPTQ-1 X-Mimecast-MFC-AGG-ID: eHQvP40FM5SU0V5qIRzPTQ_1777561688 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3208A18003FC; Thu, 30 Apr 2026 15:08:08 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by 
mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 27BB6180045E; Thu, 30 Apr 2026 15:08:07 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com, "Borislav Petkov (AMD)" Subject: [PATCH 23/28] KVM: SVM: add GMET bit definitions Date: Thu, 30 Apr 2026 11:07:42 -0400 Message-ID: <20260430150747.76749-24-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" GMET (Guest Mode Execute Trap) is an AMD virtualization feature, essentially the nested paging version of SMEP. Hyper-V uses it; add it in preparation for making it available to hypervisors running under KVM. 
Acked-by: Borislav Petkov (AMD) Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpuf= eatures.h index de7bd88e539d..d58dbce83f45 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -379,6 +379,7 @@ #define X86_FEATURE_AVIC (15*32+13) /* "avic" Virtual Interrupt Controlle= r */ #define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* "v_vmsave_vmload" Virtua= l VMSAVE VMLOAD */ #define X86_FEATURE_VGIF (15*32+16) /* "vgif" Virtual GIF */ +#define X86_FEATURE_GMET (15*32+17) /* Guest Mode Execution Trap */ #define X86_FEATURE_X2AVIC (15*32+18) /* "x2avic" Virtual x2apic */ #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_= CTRL */ #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */ diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index bcfeb5e7c0ed..aa63431ba92c 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -243,6 +243,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area { #define SVM_MISC_ENABLE_NP BIT(0) #define SVM_MISC_ENABLE_SEV BIT(1) #define SVM_MISC_ENABLE_SEV_ES BIT(2) +#define SVM_MISC_ENABLE_GMET BIT(3) =20 #define SVM_MISC2_ENABLE_V_LBR BIT_ULL(0) #define SVM_MISC2_ENABLE_V_VMLOAD_VMSAVE BIT_ULL(1) --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C46B12F6184 for ; Thu, 30 Apr 2026 15:09:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561767; cv=none; 
b=fqe1fCnpYTi+ITZLi4CjYDfy7RGBTxQyzdO2qwKgwSjIGiJJyyAa89LTPJ2cxDx9c1ORdR93fX372BNN8ilX7KnBypwcGUi84zpsvWzt5W5olDtd3gFYZR5YVJdwmv1758g0qN5JgsP9AAlIGJjWFtoyapLDXn+PGereEneQJ0Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561767; c=relaxed/simple; bh=L4cH2dT588gSAN7U+hs6x9Lbsg5mnNLb6qtEQXg9v+U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=IKFu6RMFeMePAxR80jmmL9GWLvctqh2KyhYAKUtlRz1KLydZFzuVkk4P3zhNk1CuHEIgCMIuhHAphGPyUg0sO4oi/iAE9PTvoZaVD16BhH9HOjTtQcrUB06RGV/4pPD8JBaSXoStk1nOOMCjcbSKRCcrtjMfjX5Kc4+D31ba9F4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=DmnByGXF; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DmnByGXF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561749; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TqgTbHba1w1SDsi4GMvguZqLQWEvB+kbaiyrWxLdXpU=; b=DmnByGXFGZN5SD8v0Uzc8V+Tl3Rz9gdvSteKlJWQDCCJXC3zDF/wtbl4QnAcce8FMY0l6f B9UCoopzx6OqBSGPiMTnrJgOnji8SUWVt25ZtbvzeIhd1zmVLKd01qAgV0nqfMGQDm4Gtf pCf784LxOJWLfEVIizAJbO/iNX3Bteg= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) 
id us-mta-694-eFQN5iRiP0a6mZoOAArfoQ-1; Thu, 30 Apr 2026 11:08:25 -0400 X-MC-Unique: eFQN5iRiP0a6mZoOAArfoQ-1 X-Mimecast-MFC-AGG-ID: eFQN5iRiP0a6mZoOAArfoQ_1777561690 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DED2018001FA; Thu, 30 Apr 2026 15:08:08 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 584B2180034F; Thu, 30 Apr 2026 15:08:08 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 24/28] KVM: x86/mmu: hard code more bits in kvm_init_shadow_npt_mmu Date: Thu, 30 Apr 2026 11:07:43 -0400 Message-ID: <20260430150747.76749-25-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" The host CR0 does not really reflect onto the NPT format because hCR0.PG=3D1 must be set and hCR0.WP is ignored. Carve that in stone by removing the cr0 argument from kvm_init_shadow_npt_mmu. Pass in WP=3D1 as well; it does not matter for GMET disabled because PFERR_USER_MASK is always set, but a cleared W bit in the nested page tables cannot be overridden in supervisor mode when GMET is enabled, either. 
In fact, since CR0.WP=3D0 is the weird "extra accesses allowed" mode, it is actually easier to think about it as being always set. Likewise, clear X86_CR4_SMAP so that KVM does not erroneously fault on supervisor accesses to a U=3D1 page. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu.h | 4 ++-- arch/x86/kvm/mmu/mmu.c | 8 ++++---- arch/x86/kvm/svm/nested.c | 2 +- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index e1e3869f568b..1b354e1f2d81 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -96,8 +96,8 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask); void kvm_mmu_set_ept_masks(bool has_ad_bits); =20 void kvm_init_mmu(struct kvm_vcpu *vcpu); -void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, - unsigned long cr4, u64 efer, gpa_t nested_cr3); +void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr4, + u64 efer, gpa_t nested_cr3); void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, int huge_page_level, bool accessed_dirty, bool mbec, gpa_t new_eptp); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 93f96673d02a..32845edd14fa 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5939,13 +5939,13 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vc= pu, shadow_mmu_init_context(vcpu, context, cpu_role, root_role); } =20 -void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, - unsigned long cr4, u64 efer, gpa_t nested_cr3) +void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr4, + u64 efer, gpa_t nested_cr3) { struct kvm_mmu *context =3D &vcpu->arch.guest_mmu; struct kvm_mmu_role_regs regs =3D { - .cr0 =3D cr0, - .cr4 =3D cr4 & ~X86_CR4_PKE, + .cr0 =3D X86_CR0_PG | X86_CR0_WP, + .cr4 =3D cr4 & ~(X86_CR4_PKE | X86_CR4_SMAP), .efer =3D efer, }; union kvm_cpu_role cpu_role =3D kvm_calc_cpu_role(vcpu, &regs); diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index
df232153eb24..a1cffd274000 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -93,7 +93,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *= vcpu) * when called via KVM_SET_NESTED_STATE, that state may _not_ match curre= nt * vCPU state. CR0.WP is explicitly ignored, while CR0.PG is required. */ - kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01.ptr->save.cr4, + kvm_init_shadow_npt_mmu(vcpu, svm->vmcb01.ptr->save.cr4, svm->vmcb01.ptr->save.efer, svm->nested.ctl.nested_cr3); vcpu->arch.mmu->get_guest_pgd =3D nested_svm_get_tdp_cr3; --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C008C43901B for ; Thu, 30 Apr 2026 15:08:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561696; cv=none; b=hFAVb0mYEEg0OH/sSXEzMtLE1ccfOsVVv5wxsTK1Sm+PAMwDnxVSd3UWnO2vcZbdYXwJTES4TTOxIWu+M5+Kk8ylKJfGLVFpMdkQbwpG4EzCfmWJAtkxRtoerBk7BIhVQO9s+RJQvb/nlO4lGsAY8lnOnBA/zMpHR9Dm7iF1/48= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561696; c=relaxed/simple; bh=JuQ5ifpSezq9RPetdtgiaNYLdjC6VLx7Jds/OSAOh30=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ADI/IApP/XCR9zM3GV2ZXbR0NbMNz4Q2hg/W9FkzrgTjZKZuhE2wKatjt63dk/8vYDVICK2xvMAP6PsdffoTkxJddd6Vbrbl187QyHtMcmuTnas5FdX8xPEgqxTv2ZwQnj20HrxeY8gWJy8f2lMjFx7KbMPyytlxdfDTl7k98zk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=BW73ck3D; arc=none smtp.client-ip=170.10.129.124 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="BW73ck3D" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561692; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C1YGIFeSK9cmskuSDD+3fm+A3buUMN10rubEx5F9QP0=; b=BW73ck3DR5T+Rq1y4QZDuSuQKd7oSoNUurTd+EwbadtowKboLwZ5OYKCsVL40dXjFQyHlJ VwGPJjjubXAsn7g/5vK7DDASy8kJLNtxf3FAK1p6eGlssF/A5p+q5tSPFsY0uyt6dubQJA NBiOru5th1vR+wSyG02EzDMDwnBVZe4= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-281-kh3IWg7oPT-O4hkP7f2kjA-1; Thu, 30 Apr 2026 11:08:10 -0400 X-MC-Unique: kh3IWg7oPT-O4hkP7f2kjA-1 X-Mimecast-MFC-AGG-ID: kh3IWg7oPT-O4hkP7f2kjA_1777561689 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 97A4918004A9; Thu, 30 Apr 2026 15:08:09 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 114A3180045E; Thu, 30 Apr 2026 15:08:08 +0000 (UTC) 
From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 25/28] KVM: x86/mmu: add support for GMET to NPT page table walks Date: Thu, 30 Apr 2026 11:07:44 -0400 Message-ID: <20260430150747.76749-26-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" GMET allows page table entries to be created with U=3D0 in NPT. However, when GMET=3D1, U=3D0 only affects execution, not reads or writes. Ignore user faults on non-fetch accesses for NPT GMET. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.h | 2 +- arch/x86/kvm/mmu/mmu.c | 18 ++++++++++++------ arch/x86/kvm/svm/nested.c | 10 +++++++--- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 7dde4ca87752..1da3d5c59e15 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -370,6 +370,8 @@ union kvm_mmu_page_role { * cr4_smep is also set for EPT MBEC. Because it affects * which pages are considered non-present (bit 10 additionally * must be zero if MBEC is on) it has to be in the base role. + * It also has to be in the base role for AMD GMET because + * kernel-executable pages need to have U=3D0 with GMET enabled.
*/ unsigned cr4_smep:1; =20 diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 1b354e1f2d81..ddf4e467c071 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -97,7 +97,7 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits); =20 void kvm_init_mmu(struct kvm_vcpu *vcpu); void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr4, - u64 efer, gpa_t nested_cr3); + u64 efer, gpa_t nested_cr3, u64 misc_ctl); void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, int huge_page_level, bool accessed_dirty, bool mbec, gpa_t new_eptp); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 32845edd14fa..015085ef6e46 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -55,6 +55,7 @@ #include #include #include +#include #include =20 #include "trace.h" @@ -5572,7 +5573,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *conte= xt, bool execonly) (14 & (access) ? 1 << 14 : 0) | \ (15 & (access) ? 1 << 15 : 0)) =20 -static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept) +static void update_permission_bitmask(struct kvm_mmu *mmu, bool tdp, bool = ept) { unsigned byte; =20 @@ -5633,7 +5634,12 @@ static void update_permission_bitmask(struct kvm_mmu= *mmu, bool ept) /* Faults from kernel mode accesses to user pages */ u16 kf =3D (pfec & PFERR_USER_MASK) ? 0 : u; =20 - uf =3D (pfec & PFERR_USER_MASK) ? (u16)~u : 0; + /* + * For NPT GMET, U=3D0 does not affect reads and writes. Fetches + * are handled below via cr4_smep. + */ + if (!(tdp && cr4_smep)) + uf =3D (pfec & PFERR_USER_MASK) ? (u16)~u : 0; =20 if (efer_nx) ff =3D (pfec & PFERR_FETCH_MASK) ? 
(u16)~x : 0; @@ -5744,7 +5750,7 @@ static void reset_guest_paging_metadata(struct kvm_vc= pu *vcpu, return; =20 reset_guest_rsvds_bits_mask(vcpu, mmu); - update_permission_bitmask(mmu, false); + update_permission_bitmask(mmu, mmu =3D=3D &vcpu->arch.guest_mmu, false); update_pkru_bitmask(mmu); } =20 @@ -5940,7 +5946,7 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, } =20 void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr4, - u64 efer, gpa_t nested_cr3) + u64 efer, gpa_t nested_cr3, u64 misc_ctl) { struct kvm_mmu *context =3D &vcpu->arch.guest_mmu; struct kvm_mmu_role_regs regs =3D { @@ -5953,7 +5959,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u= nsigned long cr4, =20 /* NPT requires CR0.PG=3D1. */ WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode); - cpu_role.base.cr4_smep =3D false; + cpu_role.base.cr4_smep =3D (misc_ctl & SVM_MISC_ENABLE_GMET) !=3D 0; =20 root_role =3D cpu_role.base; root_role.level =3D kvm_mmu_get_tdp_level(vcpu); @@ -6011,7 +6017,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, b= ool execonly, context->gva_to_gpa =3D ept_gva_to_gpa; context->sync_spte =3D ept_sync_spte; =20 - update_permission_bitmask(context, true); + update_permission_bitmask(context, true, true); context->pkru_mask =3D 0; reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level); reset_ept_shadow_zero_bits_mask(context, execonly); diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index a1cffd274000..7adfa7da210d 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -95,7 +95,8 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *= vcpu) */ kvm_init_shadow_npt_mmu(vcpu, svm->vmcb01.ptr->save.cr4, svm->vmcb01.ptr->save.efer, - svm->nested.ctl.nested_cr3); + svm->nested.ctl.nested_cr3, + svm->nested.ctl.misc_ctl); vcpu->arch.mmu->get_guest_pgd =3D nested_svm_get_tdp_cr3; vcpu->arch.mmu->get_pdptr =3D nested_svm_get_tdp_pdptr; vcpu->arch.mmu->inject_page_fault =3D 
nested_svm_inject_npf_exit; @@ -2076,12 +2077,15 @@ static gpa_t svm_translate_nested_gpa(struct kvm_vc= pu *vcpu, gpa_t gpa, struct x86_exception *exception, u64 pte_access) { + struct vcpu_svm *svm =3D to_svm(vcpu); struct kvm_mmu *mmu =3D vcpu->arch.mmu; =20 BUG_ON(!mmu_is_nested(vcpu)); =20 - /* NPT walks are always user-walks */ - access |=3D PFERR_USER_MASK; + /* Non-GMET walks are always user-walks */ + if (!(svm->nested.ctl.misc_ctl & SVM_MISC_ENABLE_GMET)) + access |=3D PFERR_USER_MASK; + return mmu->gva_to_gpa(vcpu, mmu, gpa, access, exception); } =20 --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BBD2843E482 for ; Thu, 30 Apr 2026 15:08:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561698; cv=none; b=b//Q/8hjKhpcxpTBarFr/UidYnoshl6cBtp43uKbzUHbt13NVKOGMe+7sYzo7zsvwfHuoigQOqgzcV8MRrDYP7961+6TSIGWcQbdiKxqmvBeVNcVCI+vzVnVoaTI86eLBJwLUbHpHnSrqsAit3LAciGjBdxAj0jgYP/n6LeswIk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561698; c=relaxed/simple; bh=i3LHV97Zl4iJ8M728cR5ScJiyeerLmUokWWhG56QOA4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=QYoI88skb/xTbfw72x3ewSejzVDAXOmHOJKyOeIXSC/YtYLh7vglzqRrkz6S+6t0jSPzmEQMkBTSOkeNZ+OjIwcGrzjhFldZQQRJ1+Wvru9dC+D01NrKiUriJVYPGFDpFqEkYsl4kQbNifZ0BVAUgOl/G1y3o4eI06Owc5hmvUo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=FnGlbkMr; arc=none smtp.client-ip=170.10.133.124 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FnGlbkMr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561694; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mTPB8p8VgH0WxNWI3+gmQlsmxKMQQtISoOqn9ka2noU=; b=FnGlbkMrW5A2PUAsqOlyymZkMCdfX2qPnXYNPVu18TzOUMS8699tR3p2U+R5D94phby9O8 0wW2pHkXNIuv4RrpN9SNJoq92/Irmn6fnFJddkQkj98yaR3FZaAq4XJRJvNaraud87sprb 4Ligzo3Og79BoWXtS2NrbderfnDNj7U= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-172-e_aF1ppaOnGspVLdF_O2Pw-1; Thu, 30 Apr 2026 11:08:11 -0400 X-MC-Unique: e_aF1ppaOnGspVLdF_O2Pw-1 X-Mimecast-MFC-AGG-ID: e_aF1ppaOnGspVLdF_O2Pw_1777561690 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 51DB019560AF; Thu, 30 Apr 2026 15:08:10 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id BFA0D180045E; Thu, 30 Apr 2026 15:08:09 +0000 (UTC) 
From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 26/28] KVM: SVM: enable GMET and set it in MMU role Date: Thu, 30 Apr 2026 11:07:45 -0400 Message-ID: <20260430150747.76749-27-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" Set the GMET bit in the nested control field. This has effectively no impact as long as NPT page tables are changed to have U=3D0. Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 6 +++++- arch/x86/kvm/svm/nested.c | 9 ++++++--- arch/x86/kvm/svm/svm.c | 16 ++++++++++++++++ arch/x86/kvm/svm/svm.h | 1 + 4 files changed, 28 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 015085ef6e46..31c1803d9d15 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5855,7 +5855,6 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, { union kvm_mmu_page_role role =3D {0}; =20 - role.access =3D ACC_ALL; role.cr0_wp =3D true; role.cr4_smep =3D kvm_x86_call(tdp_has_smep)(vcpu->kvm); role.efer_nx =3D true; @@ -5866,6 +5865,11 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcp= u, role.direct =3D true; role.has_4_byte_gpte =3D false; =20 + /* All TDP pages are supervisor-executable */ + role.access =3D ACC_ALL; + if (role.cr4_smep && shadow_user_mask) + role.access &=3D ~ACC_USER_MASK; + return role; } =20 diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 7adfa7da210d..74a1df1cb84f 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -858,7 +858,7 @@ static void nested_vmcb02_prepare_control(struct 
vcpu_s= vm *svm) * the latter, L1 runs L2 with shadow page tables that translate L2 GVAs * to L1 GPAs, so the same NPTs can be used for L1 and L2. */ - vmcb02->control.misc_ctl =3D vmcb01->control.misc_ctl & SVM_MISC_ENABLE_N= P; + vmcb02->control.misc_ctl =3D vmcb01->control.misc_ctl & (SVM_MISC_ENABLE_= NP | SVM_MISC_ENABLE_GMET); vmcb02->control.iopm_base_pa =3D vmcb01->control.iopm_base_pa; vmcb02->control.msrpm_base_pa =3D vmcb01->control.msrpm_base_pa; vmcb_mark_dirty(vmcb02, VMCB_PERM_MAP); @@ -895,9 +895,12 @@ static void nested_vmcb02_prepare_control(struct vcpu_= svm *svm) /* Also overwritten later if necessary. */ vmcb02->control.tlb_ctl =3D TLB_CONTROL_DO_NOTHING; =20 - /* nested_cr3. */ - if (nested_npt_enabled(svm)) + /* Use vmcb01 MMU and format if guest does not use nNPT */ + if (nested_npt_enabled(svm)) { + vmcb02->control.misc_ctl &=3D ~SVM_MISC_ENABLE_GMET; + nested_svm_init_mmu_context(vcpu); + } =20 vcpu->arch.tsc_offset =3D kvm_calc_nested_tsc_offset(vcpu->arch.l1_tsc_of= fset, vmcb12_ctrl->tsc_offset, diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index e7fdd7a9c280..3895d8794366 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -138,6 +138,9 @@ module_param(pause_filter_count_max, ushort, 0444); bool __ro_after_init npt_enabled =3D true; module_param_named(npt, npt_enabled, bool, 0444); =20 +bool gmet_enabled =3D true; +module_param_named(gmet, gmet_enabled, bool, 0444); + /* allow nested virtualization in KVM/SVM */ static int __ro_after_init nested =3D true; module_param(nested, int, 0444); @@ -1209,6 +1212,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool in= it_event) save->g_pat =3D vcpu->arch.pat; save->cr3 =3D 0; } + + if (gmet_enabled) + control->misc_ctl |=3D SVM_MISC_ENABLE_GMET; + svm->current_vmcb->asid_generation =3D 0; svm->asid =3D 0; =20 @@ -4612,6 +4619,11 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned = char *hypercall) hypercall[2] =3D 0xd9; } =20 +static bool 
svm_tdp_has_smep(struct kvm *kvm) +{ + return gmet_enabled; +} + /* * The kvm parameter can be NULL (module initialization, or invocation bef= ore * VM creation). Be sure to check the kvm parameter before using it. @@ -5355,6 +5367,7 @@ struct kvm_x86_ops svm_x86_ops __initdata =3D { .write_tsc_multiplier =3D svm_write_tsc_multiplier, =20 .load_mmu_pgd =3D svm_load_mmu_pgd, + .tdp_has_smep =3D svm_tdp_has_smep, =20 .check_intercept =3D svm_check_intercept, .handle_exit_irqoff =3D svm_handle_exit_irqoff, @@ -5588,6 +5601,9 @@ static __init int svm_hardware_setup(void) if (!boot_cpu_has(X86_FEATURE_NPT)) npt_enabled =3D false; =20 + if (!npt_enabled || !boot_cpu_has(X86_FEATURE_GMET)) + gmet_enabled =3D false; + /* Force VM NPT level equal to the host's paging level */ kvm_configure_mmu(npt_enabled, get_npt_level(), get_npt_level(), PG_LEVEL_1G); diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index a10668d17a16..dd93b3daefa9 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -44,6 +44,7 @@ static inline struct page *__sme_pa_to_page(unsigned long= pa) #define IOPM_SIZE PAGE_SIZE * 3 #define MSRPM_SIZE PAGE_SIZE * 2 =20 +extern bool gmet_enabled; extern bool npt_enabled; extern int nrips; extern int vgif; --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6423843E48C for ; Thu, 30 Apr 2026 15:08:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561699; cv=none; b=O14iFlLqJLHfktmceV5s8LBI6yNXCHN42GjhO49w3sO02pup1/sfFsRZJHi+yh2FBZFszY+1v5h0yWUzmnGBi+KJMoSrGhOvC7eLp9Z+kH+owwU62AVmZEGcbOlbdVjzXeEwkpYyqRnEsJH7dXGKE2kHzSno7Ea46H0Q5VksAj0= ARC-Message-Signature: 
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561699; c=relaxed/simple; bh=GR++9H5p0i1D5sIKFL2DvhkQ9RXdFtz3TIvO90pRMw0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=sSOnEEkC8lHPtVW4U6ktPezuoAY+MtGBA0ib1BvS/9DyNVl/vJDDmQKVjQLhzW/+yQn6I7QI+ru0l+mja9ojLZ4S1fljTU3LNUOvFYB9SDbD/dNc0R/8uOvtnPWfhIT9dyyZ+I9nA6WhA2kQmmRSdqM3j2AWv1TQyM2pPNi/lL8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=dwwoeTpr; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dwwoeTpr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561695; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RQb5rHk5eMELl/+vuZzClPskfe+ABko0k3vSQpQtlOY=; b=dwwoeTprHv1IE4SXUUiwD80T2C9Ry3uV/uGnLgDGuHpNQU6rOfbYvymLyjtLiTXxwDWgRS 5/s0L8Ndu27m84fM/jMkH0TLu9cmrTGlg39dGkprpEbdLjygHgaBGnytDhaEuSUEfQMQGr txZAfSaejSLwBZaHjONMIdcyXLkwSeI= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-144-MLPCkoqHOc-0nWAHvwyKRg-1; Thu, 30 Apr 2026 11:08:12 -0400 X-MC-Unique: MLPCkoqHOc-0nWAHvwyKRg-1 X-Mimecast-MFC-AGG-ID: MLPCkoqHOc-0nWAHvwyKRg_1777561691 Received: from 
mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0ABF8195608E; Thu, 30 Apr 2026 15:08:11 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 786C518001ED; Thu, 30 Apr 2026 15:08:10 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 27/28] KVM: SVM: work around errata 1218 Date: Thu, 30 Apr 2026 11:07:46 -0400 Message-ID: <20260430150747.76749-28-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" According to AMD, the hypervisor may not be able to determine whether a fault was a GMET fault or an NX fault based on EXITINFO1, and software "must read the relevant VMCB to determine whether a fault was a GMET fault or an NX fault". The APM further details that they meant the CPL field. KVM uses the page fault error code to distinguish the causes of a nested page fault, so recalculate the PFERR_USER_MASK bit of the vmexit information. Only do it for fetches and only if GMET is in use, because KVM does not differentiate based on PFERR_USER_MASK for other nested NPT page faults. 
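For illustration only (this is not part of the patch), the recalculation described above can be sketched as standalone C. The helper name fixup_npf_error_code and its parameters are hypothetical; the two error code bits use the architectural x86 page fault positions. The sketch assumes the caller already knows whether the guest is SEV-ES, whether GMET is set in the VMCB, and the current CPL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page fault error code bits. */
#define PFERR_USER_MASK   (1ULL << 2)
#define PFERR_FETCH_MASK  (1ULL << 4)

/*
 * Hypothetical helper, not taken from the patch: rebuild the U/S bit of
 * a nested page fault error code from the guest CPL.  The fixup is only
 * applied to instruction fetches, only when GMET is in use, and never
 * for SEV-ES guests, mirroring the condition in the diff below.
 */
static uint64_t fixup_npf_error_code(uint64_t error_code, bool sev_es_guest,
				     bool gmet_in_use, unsigned int cpl)
{
	if (sev_es_guest || !gmet_in_use || !(error_code & PFERR_FETCH_MASK))
		return error_code;

	/* Do not trust EXITINFO1[2]; derive it from CPL instead. */
	if (cpl == 3)
		return error_code | PFERR_USER_MASK;

	return error_code & ~PFERR_USER_MASK;
}
```

With this shape, a supervisor fetch that the hardware reported with the user bit set comes back with the bit cleared, while non-fetch faults pass through unchanged.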
Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/svm.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 3895d8794366..fd79874c5f4b 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1993,6 +1993,18 @@ static int npf_interception(struct kvm_vcpu *vcpu) } } =20 + if (!is_sev_es_guest(vcpu) && + (svm->vmcb->control.misc_ctl & SVM_MISC_ENABLE_GMET) && + (error_code & PFERR_FETCH_MASK)) { + /* + * Work around errata 1218: EXITINFO1[2] May Be Incorrectly Set + * When GMET (Guest Mode Execute Trap extension) is Enabled + */ + error_code |=3D PFERR_USER_MASK; + if (svm_get_cpl(vcpu) !=3D 3) + error_code &=3D ~PFERR_USER_MASK; + } + if (is_sev_snp_guest(vcpu) && (error_code & PFERR_GUEST_ENC_MASK)) error_code |=3D PFERR_PRIVATE_ACCESS; =20 --=20 2.52.0 From nobody Tue Jun 16 17:01:53 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 94D7E4418DB for ; Thu, 30 Apr 2026 15:08:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561700; cv=none; b=R/OPKrxawu5iPNKXf1zEHVAEXN+Qx1i8EdUXQTIUc0h/S7p1Ho2mxU1nNLmQlk/yQ6qiTNCJpt+9hwTZD12UI6LCx9TyHOdslE34iBBhR+cBo8gi01XsMtE8dbI/tEWRS3M2DrH3SBRNJpEjKBQS6+AUk1I2tjHpOQwfmeFNeZM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777561700; c=relaxed/simple; bh=Ikxx8wYPGa5Z+o5s0fjJ6ZzD5n/6leMlsJt/lKwgm7w=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=khDWk7xC+9Sb35s9wFzJiiAWlhklBlP30GGMAmj32l+PR/lGSem6SiFuR746Bb0bpEtSHiB8zDdckp4F6fiTUc0lksAB84FnQ36VX3wzu7Sho+i8IVohX48ubJWh3t46Sj/DTTxmEe1cx0ssWEqupQo7r8bC8O/DIRFWxuyLGSc= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=P0ZJsbVI; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="P0ZJsbVI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1777561696; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ROsMQfT03fFELQ1ZZYQ/4nB7HH3HdhzPEBDKlF7YMiI=; b=P0ZJsbVI9JKgULI4Nom41/aR99HauzNjVwL4cCPsDwWtzgsgEl6MkKNgB5cxq+H2hoEeI2 eGJaLkFoqfq+WsdN4Ljy+Ip2SekR1LxRavm9BtoxKYgtrX5DsRwzWHO7LxvT+EBoux9r+d uHWIRvDzqVrlu9H1PmNNxAqzZwfMUXI= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-258-hxVhi1gXPvinpFabibuGXg-1; Thu, 30 Apr 2026 11:08:12 -0400 X-MC-Unique: hxVhi1gXPvinpFabibuGXg-1 X-Mimecast-MFC-AGG-ID: hxVhi1gXPvinpFabibuGXg_1777561691 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B7EAB1956048; Thu, 30 Apr 2026 
15:08:11 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 31582180034F; Thu, 30 Apr 2026 15:08:11 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: d.riley@proxmox.com, jon@nutanix.com Subject: [PATCH 28/28] KVM: nSVM: enable GMET for guests Date: Thu, 30 Apr 2026 11:07:47 -0400 Message-ID: <20260430150747.76749-29-pbonzini@redhat.com> In-Reply-To: <20260430150747.76749-1-pbonzini@redhat.com> References: <20260430150747.76749-1-pbonzini@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 Content-Type: text/plain; charset="utf-8" All that needs to be done is moving the GMET bit from vmcb12 to vmcb02. The only new thing is that __nested_copy_vmcb_control_to_cache now ensures that ignored-if-unavailable bits are zero in svm->nested.ctl. 
Tested-by: David Riley Signed-off-by: Paolo Bonzini --- arch/x86/kvm/svm/nested.c | 6 +++++- arch/x86/kvm/svm/svm.c | 3 +++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 74a1df1cb84f..3d1fd1776e19 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -489,11 +489,14 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_v= cpu *vcpu, nested_svm_sanitize_intercept(vcpu, to, SKINIT); nested_svm_sanitize_intercept(vcpu, to, RDPRU); =20 - /* Always clear SVM_MISC_ENABLE_NP if the guest cannot use NPTs */ + /* Always clear misc_ctl bits that the guest cannot use */ to->misc_ctl =3D from->misc_ctl; if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NPT)) to->misc_ctl &=3D ~SVM_MISC_ENABLE_NP; =20 + if (!gmet_enabled || !guest_cpu_cap_has(vcpu, X86_FEATURE_GMET)) + to->misc_ctl &=3D ~SVM_MISC_ENABLE_GMET; + to->iopm_base_pa =3D from->iopm_base_pa & PAGE_MASK; to->msrpm_base_pa =3D from->msrpm_base_pa & PAGE_MASK; to->tsc_offset =3D from->tsc_offset; @@ -898,6 +901,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_s= vm *svm) /* Use vmcb01 MMU and format if guest does not use nNPT */ if (nested_npt_enabled(svm)) { vmcb02->control.misc_ctl &=3D ~SVM_MISC_ENABLE_GMET; + vmcb02->control.misc_ctl |=3D (svm->nested.ctl.misc_ctl & SVM_MISC_ENABL= E_GMET); =20 nested_svm_init_mmu_context(vcpu); } diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index fd79874c5f4b..a82471a6d3ea 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -5504,6 +5504,9 @@ static __init void svm_set_cpu_caps(void) if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD)) kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD); =20 + if (gmet_enabled) + kvm_cpu_cap_set(X86_FEATURE_GMET); + if (vgif) kvm_cpu_cap_set(X86_FEATURE_VGIF); =20 --=20 2.52.0
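As a reading aid for patches 26 and 28 (not part of the series), the way the GMET bit travels from vmcb12 into the cached controls and then into vmcb02 can be modelled in standalone C. The function names cache_misc_ctl and vmcb02_misc_ctl are hypothetical, the bit positions are illustrative values chosen for this sketch (the real definitions live in the kernel's SVM header), and nested NPT state is passed in as a plain flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this sketch only. */
#define SVM_MISC_ENABLE_NP    (1ULL << 0)
#define SVM_MISC_ENABLE_GMET  (1ULL << 3)

/*
 * Hypothetical model of the sanitizing step: bits that the guest
 * cannot use are cleared when vmcb12 controls are cached.
 */
static uint64_t cache_misc_ctl(uint64_t vmcb12_misc_ctl, bool guest_has_npt,
			       bool host_gmet, bool guest_has_gmet)
{
	uint64_t misc_ctl = vmcb12_misc_ctl;

	if (!guest_has_npt)
		misc_ctl &= ~SVM_MISC_ENABLE_NP;
	if (!host_gmet || !guest_has_gmet)
		misc_ctl &= ~SVM_MISC_ENABLE_GMET;

	return misc_ctl;
}

/*
 * Hypothetical model of the vmcb02 setup: start from the vmcb01 bits
 * and, when L1 uses nested NPT, take GMET from the cached vmcb12
 * controls instead.
 */
static uint64_t vmcb02_misc_ctl(uint64_t vmcb01_misc_ctl,
				uint64_t cached_misc_ctl, bool nested_npt)
{
	uint64_t misc_ctl = vmcb01_misc_ctl &
			    (SVM_MISC_ENABLE_NP | SVM_MISC_ENABLE_GMET);

	if (nested_npt) {
		misc_ctl &= ~SVM_MISC_ENABLE_GMET;
		misc_ctl |= cached_misc_ctl & SVM_MISC_ENABLE_GMET;
	}

	return misc_ctl;
}
```

In this model, an L1 hypervisor that enables nested NPT without GMET gets a vmcb02 with GMET clear even when the host itself runs with GMET, and an L1 that does not use nested NPT simply inherits the vmcb01 setting.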