From nobody Fri Apr 17 07:13:59 2026
Date: Tue, 24 Feb 2026 07:18:19 +0000
In-Reply-To: <20260224071822.369326-1-chengkev@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260224071822.369326-1-chengkev@google.com>
Message-ID: <20260224071822.369326-2-chengkev@google.com>
Subject: [PATCH V2 1/4] KVM: x86: Widen x86_exception's error_code to 64 bits
From: Kevin Cheng <chengkev@google.com>
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry.ahmed@linux.dev,
	Kevin Cheng <chengkev@google.com>
Content-Type: text/plain; charset="utf-8"

Widen the error_code field in struct x86_exception from u16 to u64 to
accommodate AMD's NPF error code, which defines information bits above
bit 31, e.g. PFERR_GUEST_FINAL_MASK (bit 32) and PFERR_GUEST_PAGE_MASK
(bit 33).

Retain the u16 type for the local errcode variable in walk_addr_generic,
as the walker synthesizes conventional #PF error codes that are
architecturally limited to bits 15:0.

Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 arch/x86/kvm/kvm_emulate.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index fb3dab4b5a53e..ff4f9b0a01ff7 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -22,7 +22,7 @@ enum x86_intercept_stage;
 struct x86_exception {
 	u8 vector;
 	bool error_code_valid;
-	u16 error_code;
+	u64 error_code;
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
 	u8 async_page_fault;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 901cd2bd40b84..37eba7dafd14f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -317,6 +317,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	const int write_fault = access & PFERR_WRITE_MASK;
 	const int user_fault = access & PFERR_USER_MASK;
 	const int fetch_fault = access & PFERR_FETCH_MASK;
+	/*
+	 * Note!  Track the error_code that's common to legacy shadow paging
+	 * and NPT shadow paging as a u16 to guard against unintentionally
+	 * setting any of bits 63:16.  Architecturally, the #PF error code is
+	 * 32 bits, and Intel CPUs don't support setting bits 31:16.
+	 */
 	u16 errcode = 0;
 	gpa_t real_gpa;
 	gfn_t gfn;
-- 
2.53.0.414.gf7e9f6c205-goog

From nobody Fri Apr 17 07:13:59 2026
Date: Tue, 24 Feb 2026 07:18:20 +0000
In-Reply-To: <20260224071822.369326-1-chengkev@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260224071822.369326-1-chengkev@google.com>
Message-ID: <20260224071822.369326-3-chengkev@google.com>
Subject: [PATCH V2 2/4] KVM: SVM: Fix nested NPF injection to set PFERR_GUEST_{PAGE,FINAL}_MASK
From: Kevin Cheng <chengkev@google.com>
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry.ahmed@linux.dev,
	Kevin Cheng <chengkev@google.com>
Content-Type: text/plain; charset="utf-8"

When KVM emulates an instruction for L2 and encounters a nested page
fault (e.g. during string I/O emulation), nested_svm_inject_npf_exit()
injects an NPF to L1.  However, the code incorrectly hardcodes
(1ULL << 32) for exit_info_1's upper bits when the original exit was
not an NPF.  This always sets PFERR_GUEST_FINAL_MASK, even when the
fault occurred on a page table page, preventing L1 from correctly
identifying the cause of the fault.

Set PFERR_GUEST_PAGE_MASK in the error code when a nested page fault
occurs during a guest page table walk, and PFERR_GUEST_FINAL_MASK when
the fault occurs on the final GPA-to-HPA translation.  Widen error_code
in struct x86_exception from u16 to u64 to accommodate the
PFERR_GUEST_* bits (bits 32 and 33).  Update
nested_svm_inject_npf_exit() to use fault->error_code directly instead
of hardcoding the upper bits.

Also add a WARN_ON_ONCE if the error code doesn't have exactly one of
PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK set, as that would
indicate a bug in the page fault handling code.
Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu/paging_tmpl.h  | 22 ++++++++++------------
 arch/x86/kvm/svm/nested.c       | 19 +++++++++++++------
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ff07c45e3c731..454f84660edfc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -280,6 +280,8 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_RMP_MASK	BIT_ULL(31)
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
+#define PFERR_GUEST_FAULT_STAGE_MASK \
+	(PFERR_GUEST_FINAL_MASK | PFERR_GUEST_PAGE_MASK)
 #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
 #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
 #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 37eba7dafd14f..f148c92b606ba 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -385,18 +385,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 		real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
 					     nested_access, &walker->fault);
 
-		/*
-		 * FIXME: This can happen if emulation (for of an INS/OUTS
-		 * instruction) triggers a nested page fault.  The exit
-		 * qualification / exit info field will incorrectly have
-		 * "guest page access" as the nested page fault's cause,
-		 * instead of "guest page structure access".  To fix this,
-		 * the x86_exception struct should be augmented with enough
-		 * information to fix the exit_qualification or exit_info_1
-		 * fields.
-		 */
-		if (unlikely(real_gpa == INVALID_GPA))
+		if (unlikely(real_gpa == INVALID_GPA)) {
+#if PTTYPE != PTTYPE_EPT
+			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
+#endif
 			return 0;
+		}
 
 		slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa));
 		if (!kvm_is_visible_memslot(slot))
@@ -452,8 +446,12 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 #endif
 
 	real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
-	if (real_gpa == INVALID_GPA)
+	if (real_gpa == INVALID_GPA) {
+#if PTTYPE != PTTYPE_EPT
+		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
+#endif
 		return 0;
+	}
 
 	walker->gfn = real_gpa >> PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..1013e814168b5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -40,18 +40,25 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 	struct vmcb *vmcb = svm->vmcb;
 
 	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
-		/*
-		 * TODO: track the cause of the nested page fault, and
-		 * correctly fill in the high bits of exit_info_1.
-		 */
-		vmcb->control.exit_code = SVM_EXIT_NPF;
-		vmcb->control.exit_info_1 = (1ULL << 32);
+		vmcb->control.exit_info_1 = fault->error_code;
 		vmcb->control.exit_info_2 = fault->address;
 	}
 
+	vmcb->control.exit_code = SVM_EXIT_NPF;
 	vmcb->control.exit_info_1 &= ~0xffffffffULL;
 	vmcb->control.exit_info_1 |= fault->error_code;
 
+	/*
+	 * All nested page faults should be annotated as occurring on the
+	 * final translation *or* the page walk.  Arbitrarily choose "final"
+	 * if KVM is buggy and enumerated both or neither.
+	 */
+	if (WARN_ON_ONCE(hweight64(vmcb->control.exit_info_1 &
+				   PFERR_GUEST_FAULT_STAGE_MASK) != 1)) {
+		vmcb->control.exit_info_1 &= ~PFERR_GUEST_FAULT_STAGE_MASK;
+		vmcb->control.exit_info_1 |= PFERR_GUEST_FINAL_MASK;
+	}
+
 	nested_svm_vmexit(svm);
 }
 
-- 
2.53.0.414.gf7e9f6c205-goog

From nobody Fri Apr 17 07:13:59 2026
Date: Tue, 24 Feb 2026 07:18:21 +0000
In-Reply-To: <20260224071822.369326-1-chengkev@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260224071822.369326-1-chengkev@google.com>
Message-ID: <20260224071822.369326-4-chengkev@google.com>
Subject: [PATCH V2 3/4] KVM: VMX: Don't consult original exit qualification for nested EPT violation injection
From: Kevin Cheng <chengkev@google.com>
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry.ahmed@linux.dev,
	Kevin Cheng <chengkev@google.com>
Content-Type: text/plain; charset="utf-8"

Remove the OR of EPT_VIOLATION_GVA_IS_VALID and
EPT_VIOLATION_GVA_TRANSLATED from the hardware exit qualification when
injecting a synthesized EPT violation to L1.  The hardware exit
qualification reflects the original VM-Exit, which may not be an EPT
violation at all, e.g. if KVM is emulating an I/O instruction and the
memory operand's translation through L1's EPT fails.  In that case,
bits 7:8 of the exit qualification have completely different semantics
(or are simply zero), and OR'ing them into the injected EPT violation
corrupts the GVA_IS_VALID/GVA_TRANSLATED information.

Even when the original exit is an EPT violation, the hardware bits may
not match the current fault.  For example, if an EPT violation happened
while walking L2's page tables, it's possible that the EPT violation
injected by KVM into L1 is for the final address translation, if L1
already had the mappings for L2's page tables in its EPTs but KVM did
not have shadow EPTs for them.
Populate EPT_VIOLATION_GVA_IS_VALID and EPT_VIOLATION_GVA_TRANSLATED
directly in the page table walker at the kvm_translate_gpa() failure
sites, mirroring the existing PFERR_GUEST_PAGE_MASK and
PFERR_GUEST_FINAL_MASK population for NPT.

Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 arch/x86/kvm/mmu/paging_tmpl.h | 16 +++++++++++++++-
 arch/x86/kvm/vmx/nested.c      |  3 ---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f148c92b606ba..a084b5e50effc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -386,8 +386,19 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 					     nested_access, &walker->fault);
 
 		if (unlikely(real_gpa == INVALID_GPA)) {
+			/*
+			 * Unconditionally set the NPF error_code bits and
+			 * EPT exit_qualification bits for nested page
+			 * faults.  The walker doesn't know whether L1 uses
+			 * NPT or EPT, and each injection handler consumes
+			 * only the field it cares about (error_code for
+			 * NPF, exit_qualification for EPT violations), so
+			 * setting both is harmless.
+			 */
 #if PTTYPE != PTTYPE_EPT
 			walker->fault.error_code |= PFERR_GUEST_PAGE_MASK;
+			walker->fault.exit_qualification |=
+				EPT_VIOLATION_GVA_IS_VALID;
 #endif
 			return 0;
 		}
@@ -449,6 +460,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	if (real_gpa == INVALID_GPA) {
 #if PTTYPE != PTTYPE_EPT
 		walker->fault.error_code |= PFERR_GUEST_FINAL_MASK;
+		walker->fault.exit_qualification |=
+			EPT_VIOLATION_GVA_IS_VALID |
+			EPT_VIOLATION_GVA_TRANSLATED;
 #endif
 		return 0;
 	}
@@ -496,7 +510,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	 * [2:0] - Derive from the access bits.  The exit_qualification might be
 	 *         out of date if it is serving an EPT misconfiguration.
 	 * [5:3] - Calculated by the page walk of the guest EPT page tables
 	 * [7:8] - Set at the kvm_translate_gpa() call sites above
 	 *
 	 * The other bits are set to 0.
 	 */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 248635da67661..6a167b1d51595 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -444,9 +444,6 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
 		exit_qualification = 0;
 	} else {
 		exit_qualification = fault->exit_qualification;
-		exit_qualification |= vmx_get_exit_qual(vcpu) &
-				      (EPT_VIOLATION_GVA_IS_VALID |
-				       EPT_VIOLATION_GVA_TRANSLATED);
 		vm_exit_reason = EXIT_REASON_EPT_VIOLATION;
 	}
 
-- 
2.53.0.414.gf7e9f6c205-goog

From nobody Fri Apr 17 07:13:59 2026
Date: Tue, 24 Feb 2026 07:18:22 +0000
In-Reply-To: <20260224071822.369326-1-chengkev@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260224071822.369326-1-chengkev@google.com>
Message-ID: <20260224071822.369326-5-chengkev@google.com>
Subject: [PATCH V2 4/4] KVM: selftests: Add nested page fault injection test
From: Kevin Cheng <chengkev@google.com>
To: seanjc@google.com, pbonzini@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, yosry.ahmed@linux.dev,
	Kevin Cheng <chengkev@google.com>
Content-Type: text/plain; charset="utf-8"

Add a test that exercises nested page fault injection during L2
execution.  L2 executes I/O string instructions (OUTSB/INSB) that
access memory restricted in L1's nested page tables (NPT/EPT),
triggering a nested page fault that L0 must inject to L1.
The test supports both AMD SVM (NPF) and Intel VMX (EPT violation) and
verifies that:

- The exit reason is an NPF/EPT violation
- The access type and permission bits are correct
- The faulting GPA is correct

Four test cases are implemented:

- Unmap the final data page (final translation fault, OUTSB read)
- Unmap a PT page (page walk fault, OUTSB read)
- Write-protect the final data page (protection violation, INSB write)
- Write-protect a PT page (protection violation on A/D update, OUTSB read)

Signed-off-by: Kevin Cheng <chengkev@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../selftests/kvm/x86/nested_npf_test.c       | 374 ++++++++++++++++++
 2 files changed, 375 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/nested_npf_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index fdec90e854671..55703d6be5e7a 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -93,6 +93,7 @@ TEST_GEN_PROGS_x86 += x86/nested_dirty_log_test
 TEST_GEN_PROGS_x86 += x86/nested_emulation_test
 TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
 TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
+TEST_GEN_PROGS_x86 += x86/nested_npf_test
 TEST_GEN_PROGS_x86 += x86/nested_set_state_test
 TEST_GEN_PROGS_x86 += x86/nested_tsc_adjust_test
 TEST_GEN_PROGS_x86 += x86/nested_tsc_scaling_test
diff --git a/tools/testing/selftests/kvm/x86/nested_npf_test.c b/tools/testing/selftests/kvm/x86/nested_npf_test.c
new file mode 100644
index 0000000000000..7725e5dc3a386
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/nested_npf_test.c
@@ -0,0 +1,374 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, Google, Inc.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "vmx.h"
+
+#define L2_GUEST_STACK_SIZE 64
+
+#define EPT_VIOLATION_ACC_READ		BIT(0)
+#define EPT_VIOLATION_ACC_WRITE		BIT(1)
+#define EPT_VIOLATION_ACC_INSTR		BIT(2)
+#define EPT_VIOLATION_PROT_READ		BIT(3)
+#define EPT_VIOLATION_PROT_WRITE	BIT(4)
+#define EPT_VIOLATION_PROT_EXEC		BIT(5)
+#define EPT_VIOLATION_GVA_IS_VALID	BIT(7)
+#define EPT_VIOLATION_GVA_TRANSLATED	BIT(8)
+
+enum test_type {
+	TEST_FINAL_PAGE_UNMAPPED,		/* Final data page not present */
+	TEST_PT_PAGE_UNMAPPED,			/* Page table page not present */
+	TEST_FINAL_PAGE_WRITE_PROTECTED,	/* Final data page read-only */
+	TEST_PT_PAGE_WRITE_PROTECTED,		/* Page table page read-only */
+};
+
+static vm_vaddr_t l2_test_page;
+static void (*l2_entry)(void);
+
+#define TEST_IO_PORT	0x80
+#define TEST1_VADDR	0x8000000ULL
+#define TEST2_VADDR	0x10000000ULL
+#define TEST3_VADDR	0x18000000ULL
+#define TEST4_VADDR	0x20000000ULL
+
+/*
+ * L2 executes OUTS reading from l2_test_page, triggering a nested page
+ * fault on the read access.
+ */
+static void l2_guest_code_outs(void)
+{
+	asm volatile("outsb" ::"S"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+	GUEST_FAIL("L2 should not reach here");
+}
+
+/*
+ * L2 executes INS writing to l2_test_page, triggering a nested page
+ * fault on the write access.
+ */
+static void l2_guest_code_ins(void)
+{
+	asm volatile("insb" ::"D"(l2_test_page), "d"(TEST_IO_PORT) : "memory");
+	GUEST_FAIL("L2 should not reach here");
+}
+
+static void l1_vmx_code(struct vmx_pages *vmx, uint64_t expected_fault_gpa,
+			uint64_t test_type)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	uint64_t exit_qual;
+
+	GUEST_ASSERT(vmx->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx));
+	GUEST_ASSERT(load_vmcs(vmx));
+
+	prepare_vmcs(vmx, l2_entry, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	GUEST_ASSERT(!vmlaunch());
+
+	/* Verify we got an EPT violation exit */
+	__GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION,
+		       "Expected EPT violation (0x%x), got 0x%lx",
+		       EXIT_REASON_EPT_VIOLATION,
+		       vmreadz(VM_EXIT_REASON));
+
+	exit_qual = vmreadz(EXIT_QUALIFICATION);
+
+	switch (test_type) {
+	case TEST_FINAL_PAGE_UNMAPPED:
+		/* Read access, final translation, page not present */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_READ,
+			       "Expected ACC_READ set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID,
+			       "Expected GVA_IS_VALID set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_TRANSLATED,
+			       "Expected GVA_TRANSLATED set, exit_qual 0x%lx",
+			       exit_qual);
+		break;
+	case TEST_PT_PAGE_UNMAPPED:
+		/* Read access, page walk fault, page not present */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_READ,
+			       "Expected ACC_READ set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID,
+			       "Expected GVA_IS_VALID set, exit_qual 0x%lx",
+			       exit_qual);
+		__GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_GVA_TRANSLATED),
+			       "Expected GVA_TRANSLATED clear, exit_qual 0x%lx",
+			       exit_qual);
+		break;
+	case TEST_FINAL_PAGE_WRITE_PROTECTED:
+		/* Write access, final translation, page present but read-only */
+		__GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_WRITE,
+			       "Expected ACC_WRITE set, exit_qual 0x%lx",
+			       exit_qual);
__GUEST_ASSERT(exit_qual & EPT_VIOLATION_PROT_READ, + "Expected PROT_READ set, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_PROT_WRITE), + "Expected PROT_WRITE clear, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID, + "Expected GVA_IS_VALID set, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_TRANSLATED, + "Expected GVA_TRANSLATED set, exit_qual 0x%lx", + exit_qual); + break; + case TEST_PT_PAGE_WRITE_PROTECTED: + /* Write access (A/D update), page walk, page present but read-only */ + __GUEST_ASSERT(exit_qual & EPT_VIOLATION_ACC_WRITE, + "Expected ACC_WRITE set, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(exit_qual & EPT_VIOLATION_PROT_READ, + "Expected PROT_READ set, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_PROT_WRITE), + "Expected PROT_WRITE clear, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(exit_qual & EPT_VIOLATION_GVA_IS_VALID, + "Expected GVA_IS_VALID set, exit_qual 0x%lx", + exit_qual); + __GUEST_ASSERT(!(exit_qual & EPT_VIOLATION_GVA_TRANSLATED), + "Expected GVA_TRANSLATED clear, exit_qual 0x%lx", + exit_qual); + break; + } + + __GUEST_ASSERT(vmreadz(GUEST_PHYSICAL_ADDRESS) =3D=3D expected_fault_gpa, + "Expected guest_physical_address =3D 0x%lx, got 0x%lx", + expected_fault_gpa, + vmreadz(GUEST_PHYSICAL_ADDRESS)); + + GUEST_DONE(); +} + +static void l1_svm_code(struct svm_test_data *svm, uint64_t expected_fault= _gpa, + uint64_t test_type) +{ + unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE]; + struct vmcb *vmcb =3D svm->vmcb; + uint64_t exit_info_1; + + generic_svm_setup(svm, l2_entry, + &l2_guest_stack[L2_GUEST_STACK_SIZE]); + + run_guest(vmcb, svm->vmcb_gpa); + + /* Verify we got an NPF exit */ + __GUEST_ASSERT(vmcb->control.exit_code =3D=3D SVM_EXIT_NPF, + "Expected NPF exit (0x%x), got 0x%lx", SVM_EXIT_NPF, + vmcb->control.exit_code); + + exit_info_1 =3D vmcb->control.exit_info_1; + + switch (test_type) 
{ + case TEST_FINAL_PAGE_UNMAPPED: + /* Read access, final translation, page not present */ + __GUEST_ASSERT(exit_info_1 & PFERR_GUEST_FINAL_MASK, + "Expected GUEST_FINAL set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_PAGE_MASK), + "Expected GUEST_PAGE clear, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(!(exit_info_1 & PFERR_PRESENT_MASK), + "Expected PRESENT clear, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + break; + case TEST_PT_PAGE_UNMAPPED: + /* Read access, page walk fault, page not present */ + __GUEST_ASSERT(exit_info_1 & PFERR_GUEST_PAGE_MASK, + "Expected GUEST_PAGE set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_FINAL_MASK), + "Expected GUEST_FINAL clear, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(!(exit_info_1 & PFERR_PRESENT_MASK), + "Expected PRESENT clear, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + break; + case TEST_FINAL_PAGE_WRITE_PROTECTED: + /* Write access, final translation, page present but read-only */ + __GUEST_ASSERT(exit_info_1 & PFERR_GUEST_FINAL_MASK, + "Expected GUEST_FINAL set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_PAGE_MASK), + "Expected GUEST_PAGE clear, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(exit_info_1 & PFERR_PRESENT_MASK, + "Expected PRESENT set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(exit_info_1 & PFERR_WRITE_MASK, + "Expected WRITE set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + break; + case TEST_PT_PAGE_WRITE_PROTECTED: + /* Write access (A/D update), page walk, page present but read-only */ + __GUEST_ASSERT(exit_info_1 & PFERR_GUEST_PAGE_MASK, + "Expected GUEST_PAGE set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(!(exit_info_1 & PFERR_GUEST_FINAL_MASK), + "Expected GUEST_FINAL clear, exit_info_1 0x%lx", + 
(unsigned long)exit_info_1); + __GUEST_ASSERT(exit_info_1 & PFERR_PRESENT_MASK, + "Expected PRESENT set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + __GUEST_ASSERT(exit_info_1 & PFERR_WRITE_MASK, + "Expected WRITE set, exit_info_1 0x%lx", + (unsigned long)exit_info_1); + break; + } + + __GUEST_ASSERT(vmcb->control.exit_info_2 =3D=3D expected_fault_gpa, + "Expected exit_info_2 =3D 0x%lx, got 0x%lx", + expected_fault_gpa, + vmcb->control.exit_info_2); + + GUEST_DONE(); +} + +static void l1_guest_code(void *data, uint64_t expected_fault_gpa, + uint64_t test_type) +{ + if (this_cpu_has(X86_FEATURE_VMX)) + l1_vmx_code(data, expected_fault_gpa, test_type); + else + l1_svm_code(data, expected_fault_gpa, test_type); +} + +/* Returns the GPA of the PT page that maps @vaddr. */ +static uint64_t get_pt_gpa_for_vaddr(struct kvm_vm *vm, uint64_t vaddr) +{ + uint64_t *pte; + + pte =3D vm_get_pte(vm, vaddr); + TEST_ASSERT(pte && (*pte & 0x1), "PTE not present for vaddr 0x%lx", + (unsigned long)vaddr); + + return addr_hva2gpa(vm, (void *)((uint64_t)pte & ~0xFFFULL)); +} + +static void run_test(enum test_type type) +{ + vm_paddr_t expected_fault_gpa; + vm_vaddr_t nested_gva; + + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + struct ucall uc; + + vm =3D vm_create_with_one_vcpu(&vcpu, l1_guest_code); + vm_enable_tdp(vm); + + if (kvm_cpu_has(X86_FEATURE_VMX)) + vcpu_alloc_vmx(vm, &nested_gva); + else + vcpu_alloc_svm(vm, &nested_gva); + + switch (type) { + case TEST_FINAL_PAGE_UNMAPPED: + /* + * Unmap the final data page from NPT/EPT. The guest page + * table walk succeeds, but the final GPA->HPA translation + * fails. L2 reads from the page via OUTS. + */ + l2_entry =3D l2_guest_code_outs; + l2_test_page =3D vm_vaddr_alloc(vm, vm->page_size, TEST1_VADDR); + expected_fault_gpa =3D addr_gva2gpa(vm, l2_test_page); + break; + case TEST_PT_PAGE_UNMAPPED: + /* + * Unmap a page table page from NPT/EPT. 
The hardware page + * table walk fails when translating the PT page's GPA + * through NPT/EPT. L2 reads from the page via OUTS. + */ + l2_entry =3D l2_guest_code_outs; + l2_test_page =3D vm_vaddr_alloc(vm, vm->page_size, TEST2_VADDR); + expected_fault_gpa =3D get_pt_gpa_for_vaddr(vm, l2_test_page); + break; + case TEST_FINAL_PAGE_WRITE_PROTECTED: + /* + * Write-protect the final data page in NPT/EPT. The page + * is present and readable, but not writable. L2 writes to + * the page via INS, triggering a protection violation. + */ + l2_entry =3D l2_guest_code_ins; + l2_test_page =3D vm_vaddr_alloc(vm, vm->page_size, TEST3_VADDR); + expected_fault_gpa =3D addr_gva2gpa(vm, l2_test_page); + break; + case TEST_PT_PAGE_WRITE_PROTECTED: + /* + * Write-protect a page table page in NPT/EPT. The page is + * present and readable, but not writable. The guest page + * table walk needs write access to set A/D bits, so it + * triggers a protection violation on the PT page. + * L2 reads from the page via OUTS. + */ + l2_entry =3D l2_guest_code_outs; + l2_test_page =3D vm_vaddr_alloc(vm, vm->page_size, TEST4_VADDR); + expected_fault_gpa =3D get_pt_gpa_for_vaddr(vm, l2_test_page); + break; + } + + tdp_identity_map_default_memslots(vm); + + if (type =3D=3D TEST_FINAL_PAGE_WRITE_PROTECTED || + type =3D=3D TEST_PT_PAGE_WRITE_PROTECTED) + *tdp_get_pte(vm, expected_fault_gpa) &=3D ~PTE_WRITABLE_MASK(&vm->stage2= _mmu); + else + *tdp_get_pte(vm, expected_fault_gpa) &=3D ~(PTE_PRESENT_MASK(&vm->stage2= _mmu) | + PTE_READABLE_MASK(&vm->stage2_mmu) | + PTE_WRITABLE_MASK(&vm->stage2_mmu) | + PTE_EXECUTABLE_MASK(&vm->stage2_mmu)); + + sync_global_to_guest(vm, l2_entry); + sync_global_to_guest(vm, l2_test_page); + vcpu_args_set(vcpu, 3, nested_gva, expected_fault_gpa, (uint64_t)type); + + /* + * For the INS-based write test, KVM emulates the instruction and + * first reads from the I/O port, which exits to userspace. 
+ * Re-enter the guest so emulation can proceed to the memory + * write, where the nested page fault is triggered. + */ + for (;;) { + vcpu_run(vcpu); + + if (vcpu->run->exit_reason =3D=3D KVM_EXIT_IO && + vcpu->run->io.port =3D=3D TEST_IO_PORT && + vcpu->run->io.direction =3D=3D KVM_EXIT_IO_IN) { + continue; + } + break; + } + + switch (get_ucall(vcpu, &uc)) { + case UCALL_DONE: + break; + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + default: + TEST_FAIL("Unexpected exit reason: %d", vcpu->run->exit_reason); + } + + kvm_vm_free(vm); +} + +int main(int argc, char *argv[]) +{ + TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) || kvm_cpu_has(X86_FEATURE_SVM)= ); + TEST_REQUIRE(kvm_cpu_has_tdp()); + + run_test(TEST_FINAL_PAGE_UNMAPPED); + run_test(TEST_PT_PAGE_UNMAPPED); + run_test(TEST_FINAL_PAGE_WRITE_PROTECTED); + run_test(TEST_PT_PAGE_WRITE_PROTECTED); + + return 0; +} --=20 2.53.0.414.gf7e9f6c205-goog