Date: Fri, 13 Mar 2026 07:10:31 +0000
In-Reply-To: <20260313071033.4153209-1-chengkev@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260313071033.4153209-1-chengkev@google.com>
X-Mailer: git-send-email 2.53.0.851.ga537e3e6e9-goog
Message-ID: <20260313071033.4153209-3-chengkev@google.com>
Subject: [PATCH V3 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
From: Kevin Cheng
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry@kernel.org, Kevin Cheng

Fix nested_svm_inject_npf_exit() to correctly set the fault stage bits
(PFERR_GUEST_PAGE_MASK vs PFERR_GUEST_FINAL_MASK) in exit_info_1 when
injecting an NPF to L1.

There are two paths into nested_svm_inject_npf_exit(): hardware NPF
exits (guest_mmu walker) and emulation-triggered faults (nested_mmu
walker). For emulation, the nested_mmu walker knows whether the fault
occurred on a page table page or the final translation, and sets the
appropriate bit in fault->error_code via paging_tmpl.h. For hardware
NPF exits, the guest_mmu walker cannot determine this. Only hardware
knows, via exit_info_1 bits 32-33.

The old code hardcoded (1ULL << 32) for the emulation path, always
setting PFERR_GUEST_FINAL_MASK even for page table walk faults. For the
hardware NPF path, it preserved exit_info_1's upper bits and replaced
the lower 32 bits with fault->error_code, which was correct but
convoluted.

Introduce hardware_nested_page_fault in struct x86_exception to
distinguish the two paths. For hardware NPF exits, take the fault stage
bits from exit_info_1. For emulation faults, take them from
fault->error_code. The lower 32 bits always come from
fault->error_code, which reflects L1's NPT state (L0's NPT may differ
since KVM only populates it when the full translation succeeds).

Add a WARN_ON_ONCE() that fires unless exactly one of
PFERR_GUEST_FINAL_MASK or PFERR_GUEST_PAGE_MASK is set in the final
exit_info_1, as anything else would indicate a bug in the fault
handling code.

Signed-off-by: Kevin Cheng
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/kvm_emulate.h      |  1 +
 arch/x86/kvm/mmu/paging_tmpl.h  | 26 +++++++++++------------
 arch/x86/kvm/svm/nested.c       | 37 +++++++++++++++++++++++----------
 4 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d3bdc9828133..134394dc09e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -281,6 +281,8 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_RMP_MASK	BIT_ULL(31)
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
+#define PFERR_GUEST_FAULT_STAGE_MASK \
+	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)
 #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
 #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
 #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index ff4f9b0a01ff..e67982f4da40 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -24,6 +24,7 @@ struct x86_exception {
 	bool error_code_valid;
 	u64 error_code;
 	bool nested_page_fault;
+	bool hardware_nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
 	u8 async_page_fault;
 	unsigned long exit_qualification;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 37eba7dafd14..ea2b7569f8a4 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -385,18 +385,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
 					     nested_access, &walker->fault);
 
-		/*
-		 * FIXME: This can happen if emulation (for of an INS/OUTS
-		 * instruction) triggers a nested page fault. The exit
-		 * qualification / exit info field will incorrectly have
-		 * "guest page access" as the nested page fault's cause,
-		 * instead of "guest page structure access". To fix this,
-		 * the x86_exception struct should be augmented with enough
-		 * information to fix the exit_qualification or exit_info_1
-		 * fields.
-		 */
-		if (unlikely(real_gpa == INVALID_GPA))
+		if (unlikely(real_gpa == INVALID_GPA)) {
+#if PTTYPE != PTTYPE_EPT
+			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
+#endif
 			return 0;
+		}
 
 		slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa));
 		if (!kvm_is_visible_memslot(slot))
@@ -452,8 +446,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 #endif
 
 	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
-	if (real_gpa == INVALID_GPA)
+	if (real_gpa == INVALID_GPA) {
+#if PTTYPE != PTTYPE_EPT
+		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
+#endif
 		return 0;
+	}
 
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
@@ -787,8 +785,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The page is not mapped by the guest. Let the guest handle it.
 	 */
 	if (!r) {
-		if (!fault->prefetch)
+		if (!fault->prefetch) {
+			walker.fault.hardware_nested_page_fault = walker.fault.nested_page_fault;
 			kvm_inject_emulated_page_fault(vcpu, &walker.fault);
+		}
 
 		return RET_PF_RETRY;
 	}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5ff01d2ac85e..62904ec08dda 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -38,19 +38,34 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
+	u64 fault_stage;
 
-	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
-		/*
-		 * TODO: track the cause of the nested page fault, and
-		 * correctly fill in the high bits of exit_info_1.
-		 */
-		vmcb->control.exit_code = SVM_EXIT_NPF;
-		vmcb->control.exit_info_1 = (1ULL << 32);
-		vmcb->control.exit_info_2 = fault->address;
-	}
+	/*
+	 * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
+	 * available in the hardware exit_info_1, since the guest_mmu
+	 * walker doesn't know whether the faulting GPA was a page table
+	 * page or final page from L2's perspective.
+	 */
+	if (fault->hardware_nested_page_fault)
+		fault_stage = vmcb->control.exit_info_1 &
+			      PFERR_GUEST_FAULT_STAGE_MASK;
+	else
+		fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
+
+	vmcb->control.exit_code = SVM_EXIT_NPF;
+	vmcb->control.exit_info_1 = fault_stage | fault->error_code;
+	vmcb->control.exit_info_2 = fault->address;
 
-	vmcb->control.exit_info_1 &= ~0xffffffffULL;
-	vmcb->control.exit_info_1 |= fault->error_code;
+	/*
+	 * All nested page faults should be annotated as occurring on the
+	 * final translation *or* the page walk. Arbitrarily choose "final"
+	 * if KVM is buggy and enumerated both or neither.
+	 */
+	if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
+				   PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
+		vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
+		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
+	}
 
 	nested_svm_vmexit(svm);
 }
-- 
2.53.0.851.ga537e3e6e9-goog
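
For anyone who wants to poke at the bit handling outside the kernel
tree, the exit_info_1 composition described in the changelog can be
modeled in plain userspace C. This is only an illustrative sketch, not
the kernel code: compose_exit_info_1() and its parameters are
hypothetical names, the WARN_ON_ONCE() is reduced to a silent fallback,
and only the PFERR_* bit positions are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in kvm_host.h by this patch. */
#define PFERR_GUEST_FINAL_MASK	(1ULL << 32)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)
#define PFERR_GUEST_FAULT_STAGE_MASK \
	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)

/*
 * Hypothetical userspace model of the new exit_info_1 computation.
 *
 * hw_exit_info_1: exit_info_1 as reported by hardware for the NPF exit
 * error_code:     fault->error_code produced by the software walker
 * hardware_npf:   models fault->hardware_nested_page_fault
 */
static uint64_t compose_exit_info_1(uint64_t hw_exit_info_1,
				    uint64_t error_code, bool hardware_npf)
{
	uint64_t fault_stage, info, stage;

	/* Only hardware knows the stage for a hardware NPF exit. */
	if (hardware_npf)
		fault_stage = hw_exit_info_1 & PFERR_GUEST_FAULT_STAGE_MASK;
	else
		fault_stage = error_code & PFERR_GUEST_FAULT_STAGE_MASK;

	info = fault_stage | error_code;

	/*
	 * Exactly one stage bit must be set. The patch warns and falls
	 * back to "final" when both or neither are set; this model keeps
	 * only the fallback.
	 */
	stage = info & PFERR_GUEST_FAULT_STAGE_MASK;
	if (stage != PFERR_GUEST_FINAL_MASK && stage != PFERR_GUEST_PAGE_MASK) {
		info &= ~PFERR_GUEST_FAULT_STAGE_MASK;
		info |= PFERR_GUEST_FINAL_MASK;
	}

	return info;
}
```

In this model an emulation fault whose error_code carries
PFERR_GUEST_PAGE_MASK keeps that bit, a hardware NPF takes its stage
bit from the hardware value while the low bits come from error_code,
and a result with both or neither stage bit collapses to "final".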