From nobody Tue Apr  7 12:54:03 2026
Date: Fri, 13 Mar 2026 07:10:30 +0000
In-Reply-To: <20260313071033.4153209-1-chengkev@google.com>
References: <20260313071033.4153209-1-chengkev@google.com>
Message-ID: <20260313071033.4153209-2-chengkev@google.com>
Subject: [PATCH V3 1/4] KVM: x86: Widen x86_exception's error_code to 64 bits
From: Kevin Cheng
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry@kernel.org, Kevin Cheng

Widen the error_code field in struct x86_exception from u16 to u64 to
accommodate AMD's NPF error code, which defines information bits above
bit 31, e.g. PFERR_GUEST_FINAL_MASK (bit 32) and PFERR_GUEST_PAGE_MASK
(bit 33).

Retain the u16 type for the local errcode variable in walk_addr_generic,
as the walker synthesizes conventional #PF error codes that are
architecturally limited to bits 15:0.

Signed-off-by: Kevin Cheng
---
 arch/x86/kvm/kvm_emulate.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index fb3dab4b5a53..ff4f9b0a01ff 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -22,7 +22,7 @@ enum x86_intercept_stage;
 struct x86_exception {
 	u8 vector;
 	bool error_code_valid;
-	u16 error_code;
+	u64 error_code;
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
 	u8 async_page_fault;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 901cd2bd40b8..37eba7dafd14 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -317,6 +317,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault  = access & PFERR_USER_MASK;
 	const int fetch_fault = access & PFERR_FETCH_MASK;
+	/*
+	 * Note!
+	 * Track the error_code that's common to legacy shadow paging
+	 * and NPT shadow paging as a u16 to guard against unintentionally
+	 * setting any of bits 63:16.  Architecturally, the #PF error code
+	 * is 32 bits, and Intel CPUs don't support setting bits 31:16.
+	 */
 	u16 errcode = 0;
 	gpa_t real_gpa;
 	gfn_t gfn;
--
2.53.0.851.ga537e3e6e9-goog

From nobody Tue Apr  7 12:54:03 2026
Date: Fri, 13 Mar 2026 07:10:31 +0000
In-Reply-To: <20260313071033.4153209-1-chengkev@google.com>
References: <20260313071033.4153209-1-chengkev@google.com>
Message-ID: <20260313071033.4153209-3-chengkev@google.com>
Subject: [PATCH V3 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
From: Kevin Cheng
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry@kernel.org, Kevin Cheng

Fix nested_svm_inject_npf_exit() to correctly set the fault stage bits
(PFERR_GUEST_PAGE_MASK vs. PFERR_GUEST_FINAL_MASK) in exit_info_1 when
injecting an NPF to L1.

There are two paths into nested_svm_inject_npf_exit(): hardware NPF
exits (guest_mmu walker) and emulation-triggered faults (nested_mmu
walker).  For emulation, the nested_mmu walker knows whether the fault
occurred on a page table page or the final translation, and sets the
appropriate bit in fault->error_code via paging_tmpl.h.  For hardware
NPF exits, the guest_mmu walker cannot determine this; only hardware
knows, via exit_info_1 bits 32-33.

The old code hardcoded (1ULL << 32) for the emulation path, always
setting PFERR_GUEST_FINAL_MASK even for page table walk faults.  For
the hardware NPF path, it preserved exit_info_1's upper bits and
replaced the lower 32 bits with fault->error_code, which was correct
but convoluted.

Introduce hardware_nested_page_fault in struct x86_exception to
distinguish the two paths.  For hardware NPF exits, take the fault
stage bits from exit_info_1.  For emulation faults, take them from
fault->error_code.
The lower 32 bits always come from fault->error_code, which reflects
L1's NPT state (L0's NPT may differ since KVM only populates it when
the full translation succeeds).

Add a WARN_ON_ONCE if exactly one of PFERR_GUEST_FINAL_MASK or
PFERR_GUEST_PAGE_MASK is not set in the final exit_info_1, as this
would indicate a bug in the fault handling code.

Signed-off-by: Kevin Cheng
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/kvm_emulate.h      |  1 +
 arch/x86/kvm/mmu/paging_tmpl.h  | 26 +++++++++++-------------
 arch/x86/kvm/svm/nested.c       | 37 +++++++++++++++++++++++----------
 4 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d3bdc9828133..134394dc09e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -281,6 +281,8 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_RMP_MASK	BIT_ULL(31)
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
+#define PFERR_GUEST_FAULT_STAGE_MASK \
+	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)
 #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
 #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
 #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index ff4f9b0a01ff..e67982f4da40 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -24,6 +24,7 @@ struct x86_exception {
 	bool error_code_valid;
 	u64 error_code;
 	bool nested_page_fault;
+	bool hardware_nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
 	u8 async_page_fault;
 	unsigned long exit_qualification;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 37eba7dafd14..ea2b7569f8a4 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -385,18 +385,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
 				     nested_access,
 				     &walker->fault);
 
-	/*
-	 * FIXME: This can happen if emulation (for of an INS/OUTS
-	 * instruction) triggers a nested page fault.  The exit
-	 * qualification / exit info field will incorrectly have
-	 * "guest page access" as the nested page fault's cause,
-	 * instead of "guest page structure access".  To fix this,
-	 * the x86_exception struct should be augmented with enough
-	 * information to fix the exit_qualification or exit_info_1
-	 * fields.
-	 */
-	if (unlikely(real_gpa == INVALID_GPA))
+	if (unlikely(real_gpa == INVALID_GPA)) {
+#if PTTYPE != PTTYPE_EPT
+		walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
+#endif
 		return 0;
+	}
 
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa));
 	if (!kvm_is_visible_memslot(slot))
@@ -452,8 +446,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 #endif
 
 	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
-	if (real_gpa == INVALID_GPA)
+	if (real_gpa == INVALID_GPA) {
+#if PTTYPE != PTTYPE_EPT
+		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
+#endif
 		return 0;
+	}
 
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
@@ -787,8 +785,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The page is not mapped by the guest.  Let the guest handle it.
 	 */
 	if (!r) {
-		if (!fault->prefetch)
+		if (!fault->prefetch) {
+			walker.fault.hardware_nested_page_fault = walker.fault.nested_page_fault;
 			kvm_inject_emulated_page_fault(vcpu, &walker.fault);
+		}
 
 		return RET_PF_RETRY;
 	}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 5ff01d2ac85e..62904ec08dda 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -38,19 +38,34 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb *vmcb = svm->vmcb;
+	u64 fault_stage;
 
-	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
-		/*
-		 * TODO: track the cause of the nested page fault, and
-		 * correctly fill in the high bits of exit_info_1.
-		 */
-		vmcb->control.exit_code = SVM_EXIT_NPF;
-		vmcb->control.exit_info_1 = (1ULL << 32);
-		vmcb->control.exit_info_2 = fault->address;
-	}
+	/*
+	 * For hardware NPF exits, the GUEST_FAULT_STAGE bits are only
+	 * available in the hardware exit_info_1, since the guest_mmu
+	 * walker doesn't know whether the faulting GPA was a page table
+	 * page or final page from L2's perspective.
+	 */
+	if (fault->hardware_nested_page_fault)
+		fault_stage = vmcb->control.exit_info_1 &
+			      PFERR_GUEST_FAULT_STAGE_MASK;
+	else
+		fault_stage = fault->error_code & PFERR_GUEST_FAULT_STAGE_MASK;
+
+	vmcb->control.exit_code = SVM_EXIT_NPF;
+	vmcb->control.exit_info_1 = fault_stage | fault->error_code;
+	vmcb->control.exit_info_2 = fault->address;
 
-	vmcb->control.exit_info_1 &= ~0xffffffffULL;
-	vmcb->control.exit_info_1 |= fault->error_code;
+	/*
+	 * All nested page faults should be annotated as occurring on the
+	 * final translation *or* the page walk.  Arbitrarily choose "final"
+	 * if KVM is buggy and enumerated both or neither.
+	 */
+	if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
+				   PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
+		vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
+		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
+	}
 
 	nested_svm_vmexit(svm);
 }
--
2.53.0.851.ga537e3e6e9-goog

From nobody Tue Apr  7 12:54:03 2026
Date: Fri, 13 Mar 2026 07:10:32 +0000
In-Reply-To: <20260313071033.4153209-1-chengkev@google.com>
References: <20260313071033.4153209-1-chengkev@google.com>
Message-ID: <20260313071033.4153209-4-chengkev@google.com>
Subject: [PATCH V3 3/4] KVM: VMX: Fix nested EPT violation injection of GVA_IS_VALID/GVA_TRANSLATED bits
From: Kevin Cheng
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry@kernel.org, Kevin Cheng

Make the OR of EPT_VIOLATION_GVA_IS_VALID and EPT_VIOLATION_GVA_TRANSLATED
from the hardware exit qualification conditional on the fault originating
from a hardware EPT violation exit.

The hardware exit qualification reflects the original VM exit, which may
not be an EPT violation at all, e.g. if KVM is emulating an I/O
instruction and the memory operand's translation through L1's EPT fails.
In that case, bits 7-8 of the exit qualification have completely
different semantics (or are simply zero), and OR'ing them into the
injected EPT violation corrupts the GVA_IS_VALID/GVA_TRANSLATED
information.

Use the hardware_nested_page_fault flag introduced in the previous patch
to distinguish hardware EPT violation exits from emulation-triggered
faults.  For hardware exits, take the GVA_IS_VALID/GVA_TRANSLATED bits
from the hardware exit qualification.  For emulation faults, take them
from fault->exit_qualification, which is populated by the nested_mmu
walker in paging_tmpl.h.
Replace the #if PTTYPE != PTTYPE_EPT preprocessor guards in
paging_tmpl.h with a runtime kvm_nested_fault_is_ept() helper that
checks guest_mmu to determine whether the nested fault is EPT vs. NPT,
and sets the appropriate field (exit_qualification for EPT, error_code
for NPF) accordingly.

Signed-off-by: Kevin Cheng
---
 arch/x86/kvm/mmu/mmu.c         | 10 ++++++++++
 arch/x86/kvm/mmu/paging_tmpl.h | 22 +++++++++++++-------
 arch/x86/kvm/vmx/nested.c      |  9 +++++----
 3 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3dce38ffee76..aabf4ac39c43 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5272,6 +5272,9 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 	return false;
 }
 
+static bool kvm_nested_fault_is_ept(struct kvm_vcpu *vcpu,
+				    struct x86_exception *exception);
+
 #define PTTYPE_EPT 18 /* arbitrary */
 #define PTTYPE PTTYPE_EPT
 #include "paging_tmpl.h"
@@ -5285,6 +5288,13 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 #include "paging_tmpl.h"
 #undef PTTYPE
 
+static bool kvm_nested_fault_is_ept(struct kvm_vcpu *vcpu,
+				    struct x86_exception *exception)
+{
+	WARN_ON_ONCE(!exception->nested_page_fault);
+	return vcpu->arch.guest_mmu.page_fault == ept_page_fault;
+}
+
 static void __reset_rsvds_bits_mask(struct rsvd_bits_validate *rsvd_check,
 				    u64 pa_bits_rsvd, int level, bool nx,
 				    bool gbpages, bool pse, bool amd)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ea2b7569f8a4..15be93d735ab 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -386,9 +386,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 				     nested_access, &walker->fault);
 
 	if (unlikely(real_gpa == INVALID_GPA)) {
-#if PTTYPE != PTTYPE_EPT
-		walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
-#endif
+		/*
+		 * Set EPT Violation flags even if the fault is an
+		 * EPT
+		 * Misconfig; fault.exit_qualification is ignored
+		 * for EPT Misconfigs.
+		 */
+		if (kvm_nested_fault_is_ept(vcpu, &walker->fault))
+			walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID;
+		else
+			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
 		return 0;
 	}
 
@@ -447,9 +453,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 
 	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
 	if (real_gpa == INVALID_GPA) {
-#if PTTYPE != PTTYPE_EPT
-		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
-#endif
+		if (kvm_nested_fault_is_ept(vcpu, &walker->fault))
+			walker->fault.exit_qualification |= EPT_VIOLATION_GVA_IS_VALID |
+							    EPT_VIOLATION_GVA_TRANSLATED;
+		else
+			walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
 		return 0;
 	}
 
@@ -496,7 +504,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
  * [2:0] - Derive from the access bits. The exit_qualification might be
  *         out of date if it is serving an EPT misconfiguration.
  * [5:3] - Calculated by the page walk of the guest EPT page tables
- * [7:8] - Derived from [7:8] of real exit_qualification
+ * [7:8] - Set at the kvm_translate_gpa() call sites above
  *
  * The other bits are set to 0.
  */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 937aeb474af7..39f8504f5cf2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -443,11 +443,12 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		vm_exit_reason = EXIT_REASON_EPT_MISCONFIG;
 		exit_qualification = 0;
 	} else {
-		exit_qualification = fault->exit_qualification;
-		exit_qualification |= vmx_get_exit_qual(vcpu) &
-				      (EPT_VIOLATION_GVA_IS_VALID |
-				       EPT_VIOLATION_GVA_TRANSLATED);
 		vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
+		exit_qualification = fault->exit_qualification;
+		if (fault->hardware_nested_page_fault)
+			exit_qualification |= vmx_get_exit_qual(vcpu) &
+					      (EPT_VIOLATION_GVA_IS_VALID |
+					       EPT_VIOLATION_GVA_TRANSLATED);
 	}
 
 	/*
--
2.53.0.851.ga537e3e6e9-goog

From nobody Tue Apr  7 12:54:03 2026
Date: Fri, 13 Mar 2026 07:10:33 +0000
In-Reply-To: <20260313071033.4153209-1-chengkev@google.com>
References: <20260313071033.4153209-1-chengkev@google.com>
Message-ID: <20260313071033.4153209-5-chengkev@google.com>
Subject: [PATCH V3 4/4] KVM: selftests: Add nested page fault injection test
From: Kevin Cheng
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry@kernel.org, Kevin Cheng

Add a test that exercises nested page fault injection during L2
execution.  L2 executes I/O string instructions (OUTSB/INSB) that
access memory restricted in L1's nested page tables (NPT/EPT),
triggering a nested page fault that L0 must inject to L1.
The test supports both AMD SVM (NPF) and Intel VMX (EPT violation) and
verifies that:

  - The exit reason is an NPF/EPT violation
  - The access type and permission bits are correct
  - The faulting GPA is correct

Four test cases are implemented:

  - Unmap the final data page (final translation fault, OUTSB read)
  - Unmap a PT page (page walk fault, OUTSB read)
  - Write-protect the final data page (protection violation, INSB write)
  - Write-protect a PT page (protection violation on A/D update, OUTSB read)

Signed-off-by: Kevin Cheng
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/x86/nested_npf_test.c       | 374 ++++++++++++++++++
 2 files changed, 375 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/nested_npf_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 3d372d78a275..9308e6100f27 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -94,6 +94,7 @@ TEST_GEN_PROGS_x86 += x86/nested_dirty_log_test
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
 TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
+TEST_GEN_PROGS_x86 += x86/nested_npf_test
 TEST_GEN_PROGS_x86 += x86/nested_set_state_test
 TEST_GEN_PROGS_x86 += x86/nested_tsc_adjust_test
 TEST_GEN_PROGS_x86 += x86/nested_tsc_scaling_test
diff --git a/tools/testing/selftests/kvm/x86/nested_npf_test.c b/tools/testing/selftests/kvm/x86/nested_npf_test.c
new file mode 100644
index 000000000000..7725e5dc3a38
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/nested_npf_test.c
@@ -0,0 +1,374 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, Google, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "vmx.h"
+
+#define L2_GUEST_STACK_SIZE 64
+
+#define EPT_VIOLATION_ACC_READ		BIT(0)
+#define EPT_VIOLATION_ACC_WRITE		BIT(1)
+#define EPT_VIOLATION_ACC_INSTR		BIT(2)
+#define EPT_VIOLATION_PROT_READ		BIT(3)
+#define EPT_VIOLATION_PROT_WRITE	BIT(4)
+#define EPT_VIOLATION_PROT_EXEC		BIT(5)
+#define EPT_VIOLATION_GVA_IS_VALID	BIT(7)
+#define EPT_VIOLATION_GVA_TRANSLATED	BIT(8)
+
+enum test_type {
+	TEST_FINAL_PAGE_UNMAPPED,		/* Final data page not present */
+	TEST_PT_PAGE_UNMAPPED,			/* Page table page not present */
+	TEST_FINAL_PAGE_WRITE_PROTECTED,	/* Final data page read-only */
+	TEST_PT_PAGE_WRITE_PROTECTED,		/* Page table page read-only */
+};
+
+static vm_vaddr_t l2_test_page;
+static void (*l2_entry)(void);
+
+#define TEST_IO_PORT	0x80
+#define TEST1_VADDR	0x8000000ULL
+#define TEST2_VADDR	0x10000000ULL
+#define TEST3_VADDR	0x18000000ULL
+#define TEST4_VADDR	0x20000000ULL
+
+/*
+ * L2 executes OUTS reading from l2_test_page, triggering a nested page
+ * fault on the read access.
+ */
+static void l2_guest_code_outs(void)
+{
+	asm volatile("outsb" ::"S"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+	GUEST_FAIL("L2 should not reach here");
+}
+
+/*
+ * L2 executes INS writing to l2_test_page, triggering a nested page
+ * fault on the write access.
+ */
+static void l2_guest_code_ins(void)
+{
+	asm volatile("insb" ::"D"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+	GUEST_FAIL("L2 should not reach here");
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx, uint64_t expected_fault_gpa,
+			uint64_t test_type)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint64_t exit_qual;
+
+	GUEST_ASSERT(vmx->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+	GUEST_ASSERT(load_vmcs(vmx));
+
+	prepare_vmcs(vmx, l2_entry, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	GUEST_ASSERT(!vmlaunch());
+
+	/* Verify we got an EPT violation exit */
+	__GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION,
+		       "Expected EPT violation (0x%x), got 0x%lx",
+		       EXIT_REASON_EPT_VIOLATION,
+		       vmreadz(VM_EXIT_REASON));
+
+	exit_qual = vmreadz(EXIT_QUALIFICATION);
+
+	switch (test_type) {
+	case TEST_FINAL_PAGE_UNMAPPED:
+		/* Read access, final translation, page not present */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_READ,
+			       "Expected ACC_READ set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID,
+			       "Expected GVA_IS_VALID set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_TRANSLATED,
+			       "Expected GVA_TRANSLATED set, exit_qual 0x%lx",
+			       exit_qual);
+		break;
+	case TEST_PT_PAGE_UNMAPPED:
+		/* Read access, page walk fault, page not present */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_READ,
+			       "Expected ACC_READ set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID,
+			       "Expected GVA_IS_VALID set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_GVA_TRANSLATED),
+			       "Expected GVA_TRANSLATED clear, exit_qual 0x%lx",
+			       exit_qual);
+		break;
+	case TEST_FINAL_PAGE_WRITE_PROTECTED:
+		/* Write access, final translation, page present but read-only */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_WRITE,
+			       "Expected ACC_WRITE set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_PROT_READ,
+			       "Expected PROT_READ set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_PROT_WRITE),
+			       "Expected PROT_WRITE clear, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID,
+			       "Expected GVA_IS_VALID set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_TRANSLATED,
+			       "Expected GVA_TRANSLATED set, exit_qual 0x%lx",
+			       exit_qual);
+		break;
+	case TEST_PT_PAGE_WRITE_PROTECTED:
+		/* Write access (A/D update), page walk, page present but read-only */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_WRITE,
+			       "Expected ACC_WRITE set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_PROT_READ,
+			       "Expected PROT_READ set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_PROT_WRITE),
+			       "Expected PROT_WRITE clear, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID,
+			       "Expected GVA_IS_VALID set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_GVA_TRANSLATED),
+			       "Expected GVA_TRANSLATED clear, exit_qual 0x%lx",
+			       exit_qual);
+		break;
+	}
+
+	__GUEST_ASSERT(vmreadz(GUEST_PHYSICAL_ADDRESS) == expected_fault_gpa,
+		       "Expected guest_physical_address = 0x%lx, got 0x%lx",
+		       expected_fault_gpa,
+		       vmreadz(GUEST_PHYSICAL_ADDRESS));
+
+	GUEST_DONE();
+}
+
+static void l1_svm_code(struct svm_test_data *svm, uint64_t expected_fault_gpa,
+			uint64_t test_type)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+	uint64_t exit_info_1;
+
+	generic_svm_setup(svm, l2_entry,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	run_guest(vmcb, svm->vmcb_gpa);
+
+	/* Verify we got an NPF exit */
+	__GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_NPF,
+		       "Expected NPF exit (0x%x), got 0x%lx", SVM_EXIT_NPF,
+		       vmcb->control.exit_code);
+
+	exit_info_1 = vmcb->control.exit_info_1;
+
+	switch (test_type) {
+	case TEST_FINAL_PAGE_UNMAPPED:
+		/* Read access, final translation, page not present */
+		__GUEST_ASSERT(exit_info_1 & PFERR_GUEST_FINAL_MASK,
+			       "Expected GUEST_FINAL set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_PAGE_MASK),
+			       "Expected GUEST_PAGE clear, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(!(exit_info_1 & PFERR_PRESENT_MASK),
+			       "Expected PRESENT clear, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		break;
+	case TEST_PT_PAGE_UNMAPPED:
+		/* Read access, page walk fault, page not present */
+		__GUEST_ASSERT(exit_info_1 & PFERR_GUEST_PAGE_MASK,
+			       "Expected GUEST_PAGE set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_FINAL_MASK),
+			       "Expected GUEST_FINAL clear, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(!(exit_info_1 & PFERR_PRESENT_MASK),
+			       "Expected PRESENT clear, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		break;
+	case TEST_FINAL_PAGE_WRITE_PROTECTED:
+		/* Write access, final translation, page present but read-only */
+		__GUEST_ASSERT(exit_info_1 & PFERR_GUEST_FINAL_MASK,
+			       "Expected GUEST_FINAL set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_PAGE_MASK),
+			       "Expected GUEST_PAGE clear, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(exit_info_1 & PFERR_PRESENT_MASK,
+			       "Expected PRESENT set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(exit_info_1 & PFERR_WRITE_MASK,
+			       "Expected WRITE set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		break;
+	case TEST_PT_PAGE_WRITE_PROTECTED:
+		/* Write access (A/D update), page walk, page present but read-only */
+		__GUEST_ASSERT(exit_info_1 & PFERR_GUEST_PAGE_MASK,
+			       "Expected GUEST_PAGE set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_FINAL_MASK),
+			       "Expected GUEST_FINAL clear, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(exit_info_1 & PFERR_PRESENT_MASK,
+			       "Expected PRESENT set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		__GUEST_ASSERT(exit_info_1 & PFERR_WRITE_MASK,
+			       "Expected WRITE set, exit_info_1 0x%lx",
+			       (unsigned long)exit_info_1);
+		break;
+	}
+
+	__GUEST_ASSERT(vmcb->control.exit_info_2 == expected_fault_gpa,
+		       "Expected exit_info_2 = 0x%lx, got 0x%lx",
+		       expected_fault_gpa,
+		       vmcb->control.exit_info_2);
+
+	GUEST_DONE();
+}
+
+static void l1_guest_code(void *data, uint64_t expected_fault_gpa,
+			  uint64_t test_type)
+{
+	if (this_cpu_has(X86_FEATURE_VMX))
+		l1_vmx_code(data, expected_fault_gpa, test_type);
+	else
+		l1_svm_code(data, expected_fault_gpa, test_type);
+}
+
+/* Returns the GPA of the PT page that maps @vaddr. */
+static uint64_t get_pt_gpa_for_vaddr(struct kvm_vm *vm, uint64_t vaddr)
+{
+	uint64_t *pte;
+
+	pte = vm_get_pte(vm, vaddr);
+	TEST_ASSERT(pte && (*pte & 0x1), "PTE not present for vaddr 0x%lx",
+		    (unsigned long)vaddr);
+
+	return addr_hva2gpa(vm, (void *)((uint64_t)pte & ~0xFFFULL));
+}
+
+static void run_test(enum test_type type)
+{
+	vm_paddr_t expected_fault_gpa;
+	vm_vaddr_t nested_gva;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+	vm_enable_tdp(vm);
+
+	if (kvm_cpu_has(X86_FEATURE_VMX))
+		vcpu_alloc_vmx(vm, &nested_gva);
+	else
+		vcpu_alloc_svm(vm, &nested_gva);
+
+	switch (type) {
+	case TEST_FINAL_PAGE_UNMAPPED:
+		/*
+		 * Unmap the final data page from NPT/EPT. The guest page
+		 * table walk succeeds, but the final GPA->HPA translation
+		 * fails. L2 reads from the page via OUTS.
+		 */
+		l2_entry = l2_guest_code_outs;
+		l2_test_page = vm_vaddr_alloc(vm, vm->page_size, TEST1_VADDR);
+		expected_fault_gpa = addr_gva2gpa(vm, l2_test_page);
+		break;
+	case TEST_PT_PAGE_UNMAPPED:
+		/*
+		 * Unmap a page table page from NPT/EPT. The hardware page
+		 * table walk fails when translating the PT page's GPA
+		 * through NPT/EPT. L2 reads from the page via OUTS.
+		 */
+		l2_entry = l2_guest_code_outs;
+		l2_test_page = vm_vaddr_alloc(vm, vm->page_size, TEST2_VADDR);
+		expected_fault_gpa = get_pt_gpa_for_vaddr(vm, l2_test_page);
+		break;
+	case TEST_FINAL_PAGE_WRITE_PROTECTED:
+		/*
+		 * Write-protect the final data page in NPT/EPT. The page
+		 * is present and readable, but not writable. L2 writes to
+		 * the page via INS, triggering a protection violation.
+		 */
+		l2_entry = l2_guest_code_ins;
+		l2_test_page = vm_vaddr_alloc(vm, vm->page_size, TEST3_VADDR);
+		expected_fault_gpa = addr_gva2gpa(vm, l2_test_page);
+		break;
+	case TEST_PT_PAGE_WRITE_PROTECTED:
+		/*
+		 * Write-protect a page table page in NPT/EPT. The page is
+		 * present and readable, but not writable. The guest page
+		 * table walk needs write access to set A/D bits, so it
+		 * triggers a protection violation on the PT page.
+		 * L2 reads from the page via OUTS.
+		 */
+		l2_entry = l2_guest_code_outs;
+		l2_test_page = vm_vaddr_alloc(vm, vm->page_size, TEST4_VADDR);
+		expected_fault_gpa = get_pt_gpa_for_vaddr(vm, l2_test_page);
+		break;
+	}
+
+	tdp_identity_map_default_memslots(vm);
+
+	if (type == TEST_FINAL_PAGE_WRITE_PROTECTED ||
+	    type == TEST_PT_PAGE_WRITE_PROTECTED)
+		*tdp_get_pte(vm, expected_fault_gpa) &= ~PTE_WRITABLE_MASK(&vm->stage2_mmu);
+	else
+		*tdp_get_pte(vm, expected_fault_gpa) &= ~(PTE_PRESENT_MASK(&vm->stage2_mmu) |
+							  PTE_READABLE_MASK(&vm->stage2_mmu) |
+							  PTE_WRITABLE_MASK(&vm->stage2_mmu) |
+							  PTE_EXECUTABLE_MASK(&vm->stage2_mmu));
+
+	sync_global_to_guest(vm, l2_entry);
+	sync_global_to_guest(vm, l2_test_page);
+	vcpu_args_set(vcpu, 3, nested_gva, expected_fault_gpa, (uint64_t)type);
+
+	/*
+	 * For the INS-based write test, KVM emulates the instruction and
+	 * first reads from the I/O port, which exits to userspace.
+	 * Re-enter the guest so emulation can proceed to the memory
+	 * write, where the nested page fault is triggered.
+	 */
+	for (;;) {
+		vcpu_run(vcpu);
+
+		if (vcpu->run->exit_reason == KVM_EXIT_IO &&
+		    vcpu->run->io.port == TEST_IO_PORT &&
+		    vcpu->run->io.direction == KVM_EXIT_IO_IN)
+			continue;
+		break;
+	}
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_DONE:
+		break;
+	case UCALL_ABORT:
+		REPORT_GUEST_ASSERT(uc);
+	default:
+		TEST_FAIL("Unexpected exit reason: %d", vcpu->run->exit_reason);
+	}
+
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) || kvm_cpu_has(X86_FEATURE_SVM));
+	TEST_REQUIRE(kvm_cpu_has_tdp());
+
+	run_test(TEST_FINAL_PAGE_UNMAPPED);
+	run_test(TEST_PT_PAGE_UNMAPPED);
+	run_test(TEST_FINAL_PAGE_WRITE_PROTECTED);
+	run_test(TEST_PT_PAGE_WRITE_PROTECTED);
+
+	return 0;
+}
-- 
2.53.0.851.ga537e3e6e9-goog