Date: Thu, 31 Jul 2025 20:58:42 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/3] KVM: arm64: VM exit to userspace to handle SEA
From: Jiaqi Yan
To: maz@kernel.org, oliver.upton@linux.dev
Cc: joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
    catalin.marinas@arm.com, will@kernel.org, pbonzini@redhat.com,
    corbet@lwn.net, shuah@kernel.org, kvm@vger.kernel.org,
    kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org, duenwen@google.com, rananta@google.com,
    jthoughton@google.com, Jiaqi Yan

When APEI fails to handle a stage-2 synchronous external abort (SEA),
KVM today directly injects an async SError into the vCPU and then resumes
it, which usually results in an unpleasant guest kernel panic.

One major source of guest SEAs is a vCPU consuming a recoverable
uncorrected memory error (UER). Although the SError and the resulting
guest kernel panic effectively stop the propagation of corrupted memory,
the guest may reuse the corrupted memory if it is auto-rebooted; in the
worst case, the guest boot itself may run into the poisoned memory. So
there is room to recover from a UER in a more graceful manner.

Alternatively, KVM can redirect the synchronous SEA event to the VMM to:

- Reduce the blast radius if possible. The VMM can inject a SEA into the
  vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
  consumption or fault does not come from the guest kernel, the blast
  radius can be limited to the triggering thread in guest userspace, so
  the VM can keep running.
- Allow the VMM to protect against future memory poison consumption by
  unmapping the page from stage-2, or to interrupt the guest about the
  poisoned page so the guest kernel can unmap it from its stage-1 page
  table.
- Allow the VMM to track SEA events that VM customers care about, to
  restart the VM when a certain number of distinct poison events have
  happened, and to provide observability to customers in a log
  management UI.

Introduce a userspace-visible feature to enable the VMM to handle SEA:

- KVM_CAP_ARM_SEA_TO_USER. As an alternative fallback behavior when host
  APEI fails to claim a SEA, userspace can opt in to this new capability
  to let KVM exit to userspace during a SEA if the SEA is not owned by
  the host.
- KVM_EXIT_ARM_SEA. A new exit reason is introduced for this. KVM fills
  kvm_run.arm_sea with as much information about the SEA as possible,
  enabling the VMM to emulate the SEA to the guest by itself:
  - Sanitized ESR_EL2. The general rule is to keep only the bits useful
    for userspace and relevant to guest memory.
  - Flags indicating whether the faulting guest physical address is
    valid.
  - Faulting guest physical and virtual addresses, if valid.
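As an illustration of the opt-in described above, a VMM could query and enable the capability on its VM file descriptor using only the existing KVM_CHECK_EXTENSION and KVM_ENABLE_CAP ioctls. This is a minimal sketch, not part of the patch: enable_sea_to_user() is a hypothetical helper, vm_fd is assumed to come from KVM_CREATE_VM, and the fallback value 244 is the one assigned in the uapi hunk below, so it only matters for headers that predate this series and may change if the number is reassigned on merge.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_ARM_SEA_TO_USER
#define KVM_CAP_ARM_SEA_TO_USER 244	/* assumption: value from this series */
#endif

/*
 * Hypothetical helper: opt a VM in to KVM_EXIT_ARM_SEA.
 * Returns 0 on success or a negative errno value.
 */
static int enable_sea_to_user(int vm_fd)
{
	struct kvm_enable_cap cap;
	int ret;

	/* On a VM fd, KVM_CHECK_EXTENSION returns > 0 when the cap exists. */
	ret = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SEA_TO_USER);
	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -EOPNOTSUPP;

	/* The capability takes no arguments; flags and args stay zero. */
	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_SEA_TO_USER;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -errno;

	return 0;
}
```

A VMM would typically call this right after KVM_CREATE_VM; if it returns -EOPNOTSUPP, KVM keeps the old behavior of injecting an SError.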
Signed-off-by: Jiaqi Yan
Co-developed-by: Oliver Upton
Signed-off-by: Oliver Upton
---
 arch/arm64/include/asm/kvm_host.h |  2 +
 arch/arm64/kvm/arm.c              |  5 +++
 arch/arm64/kvm/mmu.c              | 68 ++++++++++++++++++++++++++++++-
 include/uapi/linux/kvm.h          | 10 +++++
 4 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d373d555a69ba..8b4133a5aacf3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -349,6 +349,8 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_GUEST_HAS_SVE			9
 	/* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
 #define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS		10
+	/* Unhandled SEAs are taken to userspace */
+#define KVM_ARCH_FLAG_EXIT_SEA				11
 	unsigned long flags;
 
 	/* VM-wide vCPU feature set */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7a1a8210ff918..aec6034db1e75 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -133,6 +133,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_ARM_SEA_TO_USER:
+		r = 0;
+		set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
+		break;
 	default:
 		break;
 	}
@@ -322,6 +326,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_COUNTER_OFFSET:
 	case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
+	case KVM_CAP_ARM_SEA_TO_USER:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9a45daf817bfd..f6a545700c15b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1812,8 +1812,48 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	read_unlock(&vcpu->kvm->mmu_lock);
 }
 
+/*
+ * Returns true if the SEA should be handled locally within KVM if the abort
+ * is caused by a kernel memory allocation (e.g. stage-2 table memory).
+ */
+static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
+{
+	/*
+	 * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
+	 * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
+	 * stage-2 PTW).
+	 */
+	if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
+		return true;
+
+	/* KVM owns the VNCR when the vCPU isn't in a nested context. */
+	if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
+		return true;
+
+	/*
+	 * Determining whether an external abort during a table walk happened
+	 * at stage-2 is only possible when S1PTW is set. Otherwise, since KVM
+	 * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the
+	 * PA of the stage-1 descriptor) can reach here and are reported
+	 * with a TTW ESR value.
+	 */
+	return (esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW));
+}
+
 int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
 {
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_run *run = vcpu->run;
+	u64 esr = kvm_vcpu_get_esr(vcpu);
+	u64 esr_mask = ESR_ELx_EC_MASK	|
+		       ESR_ELx_IL	|
+		       ESR_ELx_FnV	|
+		       ESR_ELx_EA	|
+		       ESR_ELx_CM	|
+		       ESR_ELx_WNR	|
+		       ESR_ELx_FSC;
+	u64 ipa;
+
 	/*
 	 * Give APEI the opportunity to claim the abort before handling it
 	 * within KVM. apei_claim_sea() expects to be called with IRQs enabled.
@@ -1822,7 +1862,33 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
 	if (apei_claim_sea(NULL) == 0)
 		return 1;
 
-	return kvm_inject_serror(vcpu);
+	if (host_owns_sea(vcpu, esr) ||
+	    !test_bit(KVM_ARCH_FLAG_EXIT_SEA, &vcpu->kvm->arch.flags))
+		return kvm_inject_serror(vcpu);
+
+	/* ESR_ELx.SET is RES0 when FEAT_RAS isn't implemented. */
+	if (kvm_has_ras(kvm))
+		esr_mask |= ESR_ELx_SET_MASK;
+
+	/*
+	 * Exit to userspace, and provide faulting guest virtual and physical
+	 * addresses in case userspace wants to emulate SEA to guest by
+	 * writing to FAR_ELx and HPFAR_ELx registers.
+	 */
+	memset(&run->arm_sea, 0, sizeof(run->arm_sea));
+	run->exit_reason = KVM_EXIT_ARM_SEA;
+	run->arm_sea.esr = esr & esr_mask;
+
+	if (!(esr & ESR_ELx_FnV))
+		run->arm_sea.gva = kvm_vcpu_get_hfar(vcpu);
+
+	ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	if (ipa != INVALID_GPA) {
+		run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GPA_VALID;
+		run->arm_sea.gpa = ipa;
+	}
+
+	return 0;
 }
 
 /**
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e4e566ff348b0..b2cc3d74d769c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -179,6 +179,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
 #define KVM_EXIT_TDX              40
+#define KVM_EXIT_ARM_SEA          41
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -469,6 +470,14 @@ struct kvm_run {
 				} get_tdvmcall_info;
 			};
 		} tdx;
+		/* KVM_EXIT_ARM_SEA */
+		struct {
+#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID	(1ULL << 0)
+			__u64 flags;
+			__u64 esr;
+			__u64 gva;
+			__u64 gpa;
+		} arm_sea;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -957,6 +966,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_EL2_E2H0 241
 #define KVM_CAP_RISCV_MP_STATE_RESET 242
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
+#define KVM_CAP_ARM_SEA_TO_USER 244
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.50.1.565.gc32cd1483b-goog
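The ESR sanitization in kvm_handle_guest_sea() can be modelled in plain C to see which syndrome bits actually reach userspace. This is an illustrative sketch only: the SEA_ESR_* names and sanitize_sea_esr() are hypothetical, and the bit positions are assumed to match the ESR_ELx_* definitions in arch/arm64/include/asm/esr.h (EC[31:26], IL[25], SET[12:11], FnV[10], EA[9], CM[8], WnR[6], FSC[5:0]).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the ESR_ELx field masks used by kvm_handle_guest_sea(). */
#define SEA_ESR_EC_MASK		(0x3FULL << 26)
#define SEA_ESR_IL		(1ULL << 25)
#define SEA_ESR_SET_MASK	(3ULL << 11)
#define SEA_ESR_FNV		(1ULL << 10)
#define SEA_ESR_EA		(1ULL << 9)
#define SEA_ESR_CM		(1ULL << 8)
#define SEA_ESR_WNR		(1ULL << 6)
#define SEA_ESR_FSC		0x3FULL

/* Keep only the ESR_EL2 bits that the patch forwards in kvm_run.arm_sea.esr. */
static uint64_t sanitize_sea_esr(uint64_t esr, bool has_ras)
{
	uint64_t mask = SEA_ESR_EC_MASK | SEA_ESR_IL | SEA_ESR_FNV |
			SEA_ESR_EA | SEA_ESR_CM | SEA_ESR_WNR | SEA_ESR_FSC;

	/* ESR_ELx.SET is RES0 without FEAT_RAS, so it is only forwarded with RAS. */
	if (has_ras)
		mask |= SEA_ESR_SET_MASK;

	return esr & mask;
}
```

Everything outside the mask, such as the instruction syndrome (ISV, SAS, SSE, SRT, SF, AR), VNCR, S1PTW and ISS2, is cleared; the selftest in this series checks that ISS2, the instruction syndrome and VNCR read as zero.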
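On the VMM side, the fields KVM places in kvm_run.arm_sea could be decoded roughly as follows before deciding whether to inject a SEA through KVM_SET_VCPU_EVENTS or to unmap the poisoned page. This is a hedged sketch: struct arm_sea_exit and decode_arm_sea() are hypothetical (the struct mirrors the uapi field order flags, esr, gva, gpa), the EC values 0x20 and 0x24 (instruction and data abort from a lower EL) and the WnR/FnV bit positions are taken from the Arm architecture, and page_size is assumed to be the guest page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror of kvm_run.arm_sea. */
struct arm_sea_exit {
	uint64_t flags;
	uint64_t esr;
	uint64_t gva;
	uint64_t gpa;
};

#define SEA_FLAG_GPA_VALID	(1ULL << 0)	/* KVM_EXIT_ARM_SEA_FLAG_GPA_VALID */
#define SEA_ESR_EC(esr)		(((esr) >> 26) & 0x3F)
#define SEA_ESR_FNV		(1ULL << 10)	/* FAR not valid */
#define SEA_ESR_WNR		(1ULL << 6)	/* write, not read (data aborts) */
#define SEA_EC_IABT_LOW		0x20
#define SEA_EC_DABT_LOW		0x24

struct sea_report {
	bool data_abort;	/* data abort rather than instruction abort */
	bool write;		/* faulting access was a write */
	bool gva_valid;		/* FnV clear: gva holds the faulting VA */
	bool gpa_valid;		/* KVM reported a faulting IPA */
	uint64_t page_gpa;	/* faulting IPA rounded down to page_size */
};

/* Returns false if the exit does not describe a lower-EL abort. */
static bool decode_arm_sea(const struct arm_sea_exit *sea, uint64_t page_size,
			   struct sea_report *out)
{
	uint64_t ec = SEA_ESR_EC(sea->esr);

	assert(page_size && (page_size & (page_size - 1)) == 0);
	if (ec != SEA_EC_DABT_LOW && ec != SEA_EC_IABT_LOW)
		return false;

	out->data_abort = (ec == SEA_EC_DABT_LOW);
	out->write = out->data_abort && (sea->esr & SEA_ESR_WNR);
	out->gva_valid = !(sea->esr & SEA_ESR_FNV);
	out->gpa_valid = sea->flags & SEA_FLAG_GPA_VALID;
	out->page_gpa = out->gpa_valid ? (sea->gpa & ~(page_size - 1)) : 0;
	return true;
}
```

The page-aligned GPA is what a VMM would track when counting distinct poison events or unmapping the page from stage-2.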
Date: Thu, 31 Jul 2025 20:58:43 +0000
Subject: [PATCH v3 2/3] KVM: selftests: Test for KVM_EXIT_ARM_SEA
From: Jiaqi Yan

Test how KVM handles a guest SEA when APEI is unable to claim it and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming a recoverable uncorrected memory
error (UER) injected via EINJ. The test asserts two major things:

1. KVM returns to userspace with the KVM_EXIT_ARM_SEA exit reason and
   provides the expected fault information, i.e. esr, flags, gva and gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting a SEA into
   the guest, and KVM injects the expected SEA into the vCPU.

Tested on a data center server running a Siryn AmpereOne processor with
RAS support.

Several things to note before attempting to run this selftest:

- The test relies on EINJ support in both firmware and kernel to inject
  a UER.
  Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platforms don't support notrigger in EINJ, which may cause APEI
  and GHES to offline the memory before the guest can consume the
  injected UER, making the test unable to trigger a SEA.

Signed-off-by: Jiaqi Yan
---
 tools/arch/arm64/include/asm/esr.h            |   2 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/arm64/sea_to_user.c | 327 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 4 files changed, 331 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c

diff --git a/tools/arch/arm64/include/asm/esr.h b/tools/arch/arm64/include/asm/esr.h
index bd592ca815711..0fa17b3af1f78 100644
--- a/tools/arch/arm64/include/asm/esr.h
+++ b/tools/arch/arm64/include/asm/esr.h
@@ -141,6 +141,8 @@
 #define ESR_ELx_SF 		(UL(1) << ESR_ELx_SF_SHIFT)
 #define ESR_ELx_AR_SHIFT	(14)
 #define ESR_ELx_AR 		(UL(1) << ESR_ELx_AR_SHIFT)
+#define ESR_ELx_VNCR_SHIFT	(13)
+#define ESR_ELx_VNCR		(UL(1) << ESR_ELx_VNCR_SHIFT)
 #define ESR_ELx_CM_SHIFT	(8)
 #define ESR_ELx_CM 		(UL(1) << ESR_ELx_CM_SHIFT)
 
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index ce817a975e50a..8512e790f45fa 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -159,6 +159,7 @@ TEST_GEN_PROGS_arm64 += arm64/hypercalls
 TEST_GEN_PROGS_arm64 += arm64/external_aborts
 TEST_GEN_PROGS_arm64 += arm64/page_fault_test
 TEST_GEN_PROGS_arm64 += arm64/psci_test
+TEST_GEN_PROGS_arm64 += arm64/sea_to_user
 TEST_GEN_PROGS_arm64 += arm64/set_id_regs
 TEST_GEN_PROGS_arm64 += arm64/smccc_filter
 TEST_GEN_PROGS_arm64 += arm64/vcpu_width_config
diff --git a/tools/testing/selftests/kvm/arm64/sea_to_user.c b/tools/testing/selftests/kvm/arm64/sea_to_user.c
new file mode 100644
index 0000000000000..3ddc890e8a8b5
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/sea_to_user.c
@@ -0,0 +1,327 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test KVM returns to userspace with KVM_EXIT_ARM_SEA if host APEI fails
+ * to handle SEA and userspace has opted in to KVM_CAP_ARM_SEA_TO_USER.
+ *
+ * After reaching userspace with expected arm_sea info, also test userspace
+ * injecting a synchronous external data abort into the guest.
+ *
+ * This test utilizes EINJ to generate a REAL synchronous external data
+ * abort by consuming a recoverable uncorrectable memory error. Therefore
+ * the device under test must support EINJ in both firmware and host kernel,
+ * including the notrigger feature. Otherwise the test will be skipped.
+ * The under-test platform's APEI should be unable to claim SEA. Otherwise
+ * the test will also be skipped.
+ */
+
+#include
+#include
+#include
+#include
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "guest_modes.h"
+
+#define PAGE_PRESENT		(1ULL << 63)
+#define PAGE_PHYSICAL		0x007fffffffffffffULL
+#define PAGE_ADDR_MASK		(~(0xfffULL))
+
+/* Group ISV and ISS[23:14]. */
+#define ESR_ELx_INST_SYNDROME	((ESR_ELx_ISV) | (ESR_ELx_SAS) | \
+				 (ESR_ELx_SSE) | (ESR_ELx_SRT_MASK) | \
+				 (ESR_ELx_SF) | (ESR_ELx_AR))
+
+#define EINJ_ETYPE		"/sys/kernel/debug/apei/einj/error_type"
+#define EINJ_ADDR		"/sys/kernel/debug/apei/einj/param1"
+#define EINJ_MASK		"/sys/kernel/debug/apei/einj/param2"
+#define EINJ_FLAGS		"/sys/kernel/debug/apei/einj/flags"
+#define EINJ_NOTRIGGER		"/sys/kernel/debug/apei/einj/notrigger"
+#define EINJ_DOIT		"/sys/kernel/debug/apei/einj/error_inject"
+/* Memory Uncorrectable non-fatal. */
+#define ERROR_TYPE_MEMORY_UER	0x10
+/* Memory address and mask valid (param1 and param2). */
+#define MASK_MEMORY_UER		0b10
+
+/* Guest virtual address region = [2G, 3G). */
+#define START_GVA		0x80000000UL
+#define VM_MEM_SIZE		0x40000000UL
+/* Note: EINJ_OFFSET must be < VM_MEM_SIZE. */
+#define EINJ_OFFSET		0x01234badUL
+#define EINJ_GVA		((START_GVA) + (EINJ_OFFSET))
+
+static vm_paddr_t einj_gpa;
+static void *einj_hva;
+static uint64_t einj_hpa;
+static bool far_invalid;
+
+static uint64_t translate_to_host_paddr(unsigned long vaddr)
+{
+	uint64_t pinfo;
+	int64_t offset = vaddr / getpagesize() * sizeof(pinfo);
+	int fd;
+	uint64_t page_addr;
+	uint64_t paddr;
+
+	fd = open("/proc/self/pagemap", O_RDONLY);
+	if (fd < 0)
+		ksft_exit_fail_perror("Failed to open /proc/self/pagemap");
+	if (pread(fd, &pinfo, sizeof(pinfo), offset) != sizeof(pinfo)) {
+		close(fd);
+		ksft_exit_fail_perror("Failed to read /proc/self/pagemap");
+	}
+
+	close(fd);
+
+	if ((pinfo & PAGE_PRESENT) == 0)
+		ksft_exit_fail_msg("Page not present\n");
+
+	page_addr = (pinfo & PAGE_PHYSICAL) * (uint64_t)getpagesize();
+	paddr = page_addr + (vaddr & (getpagesize() - 1));
+	return paddr;
+}
+
+static void write_einj_entry(const char *einj_path, uint64_t val)
+{
+	char cmd[256] = {0};
+	FILE *cmdfile = NULL;
+
+	snprintf(cmd, sizeof(cmd), "echo %#lx > %s", val, einj_path);
+	cmdfile = popen(cmd, "r");
+
+	if (cmdfile && pclose(cmdfile) == 0)
+		ksft_print_msg("echo %#lx > %s - done\n", val, einj_path);
+	else
+		ksft_exit_fail_perror("Failed to write EINJ entry");
+}
+
+static void inject_uer(uint64_t paddr)
+{
+	if (access("/sys/firmware/acpi/tables/EINJ", R_OK) == -1)
+		ksft_exit_skip("EINJ table not available in firmware\n");
+
+	if (access(EINJ_ETYPE, R_OK | W_OK) == -1)
+		ksft_exit_skip("EINJ module probably not loaded?\n");
+
+	write_einj_entry(EINJ_ETYPE, ERROR_TYPE_MEMORY_UER);
+	write_einj_entry(EINJ_FLAGS, MASK_MEMORY_UER);
+	write_einj_entry(EINJ_ADDR, paddr);
+	write_einj_entry(EINJ_MASK, ~0x0UL);
+	write_einj_entry(EINJ_NOTRIGGER, 1);
+	write_einj_entry(EINJ_DOIT, 1);
+}
+
+/*
+ * When host APEI successfully claims the SEA caused by guest_code, the kernel
+ * will send a SIGBUS signal with BUS_MCEERR_AR to the test thread.
+ *
+ * We set up this SIGBUS handler to skip the test for that case.
+ */
+static void sigbus_signal_handler(int sig, siginfo_t *si, void *v)
+{
+	ksft_print_msg("SIGBUS (%d) received, dumping siginfo...\n", sig);
+	ksft_print_msg("si_signo=%d, si_errno=%d, si_code=%d, si_addr=%p\n",
+		       si->si_signo, si->si_errno, si->si_code, si->si_addr);
+	if (si->si_code == BUS_MCEERR_AR)
+		ksft_test_result_skip("SEA is claimed by host APEI\n");
+	else
+		ksft_test_result_fail("Exit with signal unhandled\n");
+
+	exit(0);
+}
+
+static void setup_sigbus_handler(void)
+{
+	struct sigaction act;
+
+	memset(&act, 0, sizeof(act));
+	sigemptyset(&act.sa_mask);
+	act.sa_sigaction = sigbus_signal_handler;
+	act.sa_flags = SA_SIGINFO;
+	TEST_ASSERT(sigaction(SIGBUS, &act, NULL) == 0,
+		    "Failed to setup SIGBUS handler");
+}
+
+static void guest_code(void)
+{
+	uint64_t guest_data;
+
+	/* Consuming the error will cause a SEA. */
+	guest_data = *(uint64_t *)EINJ_GVA;
+
+	GUEST_FAIL("Poison not protected by SEA: gva=%#lx, guest_data=%#lx\n",
+		   EINJ_GVA, guest_data);
+}
+
+static void expect_sea_handler(struct ex_regs *regs)
+{
+	u64 esr = read_sysreg(esr_el1);
+	u64 far = read_sysreg(far_el1);
+	bool expect_far_invalid = far_invalid;
+
+	GUEST_PRINTF("Handling Guest SEA\n");
+	GUEST_PRINTF("ESR_EL1=%#lx, FAR_EL1=%#lx\n", esr, far);
+
+	GUEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_DABT_CUR);
+	GUEST_ASSERT_EQ(esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
+
+	if (expect_far_invalid) {
+		GUEST_ASSERT_EQ(esr & ESR_ELx_FnV, ESR_ELx_FnV);
+		GUEST_PRINTF("Guest observed garbage value in FAR\n");
+	} else {
+		GUEST_ASSERT_EQ(esr & ESR_ELx_FnV, 0);
+		GUEST_ASSERT_EQ(far, EINJ_GVA);
+	}
+
+	GUEST_DONE();
+}
+
+static void vcpu_inject_sea(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_events events = {};
+
+	events.exception.ext_dabt_pending = true;
+	vcpu_events_set(vcpu, &events);
+}
+
+static void run_vm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+	bool guest_done = false;
+	struct kvm_run *run = vcpu->run;
+	u64 esr;
+
+	/* Resume the vCPU after error injection to consume the error. */
+	vcpu_run(vcpu);
+
+	ksft_print_msg("Dump kvm_run info about KVM_EXIT_%s\n",
+		       exit_reason_str(run->exit_reason));
+	ksft_print_msg("kvm_run.arm_sea: esr=%#llx, flags=%#llx\n",
+		       run->arm_sea.esr, run->arm_sea.flags);
+	ksft_print_msg("kvm_run.arm_sea: gva=%#llx, gpa=%#llx\n",
+		       run->arm_sea.gva, run->arm_sea.gpa);
+
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_ARM_SEA);
+
+	esr = run->arm_sea.esr;
+	TEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_DABT_LOW);
+	TEST_ASSERT_EQ(esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
+	TEST_ASSERT_EQ(ESR_ELx_ISS2(esr), 0);
+	TEST_ASSERT_EQ((esr & ESR_ELx_INST_SYNDROME), 0);
+	TEST_ASSERT_EQ(esr & ESR_ELx_VNCR, 0);
+
+	if (!(esr & ESR_ELx_FnV))
+		TEST_ASSERT_EQ(run->arm_sea.gva, EINJ_GVA);
+
+	if (run->arm_sea.flags & KVM_EXIT_ARM_SEA_FLAG_GPA_VALID)
+		TEST_ASSERT_EQ(run->arm_sea.gpa, einj_gpa & PAGE_ADDR_MASK);
+
+	far_invalid = esr & ESR_ELx_FnV;
+	sync_global_to_guest(vm, far_invalid);
+	/* Inject a SEA into guest and expect handled in SEA handler. */
+	vcpu_inject_sea(vcpu);
+
+	/* Expect the guest to reach GUEST_DONE gracefully. */
+	do {
+		vcpu_run(vcpu);
+		switch (get_ucall(vcpu, &uc)) {
+		case UCALL_PRINTF:
+			ksft_print_msg("From guest: %s", uc.buffer);
+			break;
+		case UCALL_DONE:
+			ksft_print_msg("Guest done gracefully!\n");
+			guest_done = 1;
+			break;
+		case UCALL_ABORT:
+			ksft_print_msg("Guest aborted!\n");
+			guest_done = 1;
+			REPORT_GUEST_ASSERT(uc);
+			break;
+		default:
+			TEST_FAIL("Unexpected ucall: %lu\n", uc.cmd);
+		}
+	} while (!guest_done);
+}
+
+static struct kvm_vm *vm_create_with_sea_handler(struct kvm_vcpu **vcpu)
+{
+	size_t backing_page_size;
+	size_t guest_page_size;
+	size_t alignment;
+	uint64_t num_guest_pages;
+	vm_paddr_t start_gpa;
+	enum vm_mem_backing_src_type src_type = VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB;
+	struct kvm_vm *vm;
+
+	backing_page_size = get_backing_src_pagesz(src_type);
+	guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
+	alignment = max(backing_page_size, guest_page_size);
+	num_guest_pages = VM_MEM_SIZE / guest_page_size;
+
+	vm = __vm_create_with_one_vcpu(vcpu, num_guest_pages, guest_code);
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(*vcpu);
+
+	vm_install_sync_handler(vm,
+		/*vector=*/VECTOR_SYNC_CURRENT,
+		/*ec=*/ESR_ELx_EC_DABT_CUR,
+		/*handler=*/expect_sea_handler);
+
+	start_gpa = (vm->max_gfn - num_guest_pages) * guest_page_size;
+	start_gpa = align_down(start_gpa, alignment);
+
+	vm_userspace_mem_region_add(
+		/*vm=*/vm,
+		/*src_type=*/src_type,
+		/*guest_paddr=*/start_gpa,
+		/*slot=*/1,
+		/*npages=*/num_guest_pages,
+		/*flags=*/0);
+
+	virt_map(vm, START_GVA, start_gpa, num_guest_pages);
+
+	ksft_print_msg("Mapped %#lx pages: gva=%#lx to gpa=%#lx\n",
+		       num_guest_pages, START_GVA, start_gpa);
+	return vm;
+}
+
+static void vm_inject_memory_uer(struct kvm_vm *vm)
+{
+	uint64_t guest_data;
+
+	einj_gpa = addr_gva2gpa(vm, EINJ_GVA);
+	einj_hva = addr_gva2hva(vm, EINJ_GVA);
+
+	/* Populate certain data before injecting UER. */
+	*(uint64_t *)einj_hva = 0xBAADCAFE;
+	guest_data = *(uint64_t *)einj_hva;
+	ksft_print_msg("Before EINJect: data=%#lx\n",
+		       guest_data);
+
+	einj_hpa = translate_to_host_paddr((unsigned long)einj_hva);
+
+	ksft_print_msg("EINJ_GVA=%#lx, einj_gpa=%#lx, einj_hva=%p, einj_hpa=%#lx\n",
+		       EINJ_GVA, einj_gpa, einj_hva, einj_hpa);
+
+	inject_uer(einj_hpa);
+	ksft_print_msg("Memory UER EINJected\n");
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_SEA_TO_USER));
+
+	setup_sigbus_handler();
+
+	vm = vm_create_with_sea_handler(&vcpu);
+	vm_enable_cap(vm, KVM_CAP_ARM_SEA_TO_USER, 0);
+	vm_inject_memory_uer(vm);
+	run_vm(vm, vcpu);
+	kvm_vm_free(vm);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index a055343a7bf75..d3234239b9845 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2041,6 +2041,7 @@ static struct exit_reason {
 	KVM_EXIT_STRING(NOTIFY),
 	KVM_EXIT_STRING(LOONGARCH_IOCSR),
 	KVM_EXIT_STRING(MEMORY_FAULT),
+	KVM_EXIT_STRING(ARM_SEA),
 };
 
 /*
-- 
2.50.1.565.gc32cd1483b-goog
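translate_to_host_paddr() in the selftest depends on the /proc/self/pagemap format: one 64-bit entry per virtual page, with bit 63 marking the page as present and bits 0-54 holding the page frame number. A minimal sketch of that decoding, separated from the file I/O, is shown here; pagemap_offset() and pagemap_entry_to_paddr() are hypothetical names. The sketch scales the PFN by the host page size instead of a fixed 4 KiB shift, on the assumption that pagemap PFNs count the kernel's PAGE_SIZE units, which matters on arm64 hosts with 16K or 64K pages. It also treats a zero PFN as a failure, since PFNs read as zero without CAP_SYS_ADMIN.

```c
#include <stdbool.h>
#include <stdint.h>

#define PM_PRESENT	(1ULL << 63)		/* bit 63: page present */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54: page frame number */

/* Byte offset in /proc/self/pagemap of the 8-byte entry for vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Convert a pagemap entry into the host physical address of vaddr.
 * Returns false if the page is not present or the PFN is hidden (zero).
 */
static bool pagemap_entry_to_paddr(uint64_t entry, uint64_t vaddr,
				   uint64_t page_size, uint64_t *paddr)
{
	uint64_t pfn = entry & PM_PFN_MASK;

	if (!(entry & PM_PRESENT) || pfn == 0)
		return false;

	/* PFNs count host pages, so multiply by page_size, not by 4 KiB. */
	*paddr = pfn * page_size + (vaddr & (page_size - 1));
	return true;
}
```

In the selftest, page_size would be getpagesize() and the entry would come from a pread() of /proc/self/pagemap at pagemap_offset(vaddr, page_size).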
Date: Thu, 31 Jul 2025 20:58:44 +0000
In-Reply-To: <20250731205844.1346839-1-jiaqiyan@google.com>
Mime-Version: 1.0
References: <20250731205844.1346839-1-jiaqiyan@google.com>
Message-ID: <20250731205844.1346839-4-jiaqiyan@google.com>
Subject: [PATCH v3 3/3] Documentation: kvm: new UAPI for handling SEA
From: Jiaqi Yan
To: maz@kernel.org, oliver.upton@linux.dev
Cc: joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
  catalin.marinas@arm.com, will@kernel.org, pbonzini@redhat.com,
  corbet@lwn.net, shuah@kernel.org, kvm@vger.kernel.org,
  kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
  linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
  linux-kselftest@vger.kernel.org, duenwen@google.com, rananta@google.com,
  jthoughton@google.com, Jiaqi Yan
Content-Type: text/plain; charset="utf-8"

Document the new userspace-visible features and APIs for handling
synchronous external aborts (SEA):
- KVM_CAP_ARM_SEA_TO_USER: how userspace enables the new feature.
- KVM_EXIT_ARM_SEA: the exit userspace receives when it needs to handle
  an SEA, and the information it gets about that SEA.

Signed-off-by: Jiaqi Yan
---
 Documentation/virt/kvm/api.rst | 61 ++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 53e0179d52949..b61bca10a3a10 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7231,6 +7231,55 @@ exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case,
 it will enter with output fields already valid; in the common case, the
 ``unknown.ret`` field of the union will be ``TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED``.
 Userspace need not do anything if it does not wish to support a TDVMCALL.
+
+::
+
+		/* KVM_EXIT_ARM_SEA */
+		struct {
+			__u64 esr;
+  #define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID	(1ULL << 0)
+			__u64 flags;
+			__u64 gva;
+			__u64 gpa;
+		} arm_sea;
+
+Used on arm64 systems. When the VM capability KVM_CAP_ARM_SEA_TO_USER is
+enabled, KVM exits to userspace if the guest causes a synchronous external
+abort (SEA) and the host APEI fails to handle it.
+
+KVM handles an SEA by first delegating it to host APEI, as there is a high
+chance that the SEA is caused by consuming an uncorrected memory error.
+However, not all platforms support SEA handling in APEI, and KVM's fallback
+is to inject an asynchronous SError into the guest, which usually panics the
+guest kernel. As an alternative, userspace can participate in SEA handling
+by enabling KVM_CAP_ARM_SEA_TO_USER at VM creation, after checking that the
+capability is supported. Once it is enabled, whenever KVM has to handle a
+guest-caused SEA, it exits to userspace with KVM_EXIT_ARM_SEA and reports
+the details of the SEA in 'arm_sea'.
+
+The 'esr' field holds the value of the Exception Syndrome Register (ESR) at
+the time KVM takes the SEA. It describes the SEA to userspace, including its
+Exception Class, Synchronous Error Type and Fault Status Code. For more
+details on ESR, see the Arm Architecture Registers documentation.
+
+The following values are defined for the 'flags' field:
+
+  - KVM_EXIT_ARM_SEA_FLAG_GPA_VALID -- the faulting guest physical address
+    is valid, and userspace can read it from the 'gpa' field.
+
+Userspace can tell whether the faulting guest virtual address is valid from
+the FnV bit in the 'esr' field: if FnV is clear, the 'gva' field holds the
+faulting guest virtual address.
+
+Userspace needs to handle the guest SEA synchronously, that is, in the same
+thread that runs KVM_RUN and receives KVM_EXIT_ARM_SEA. One encouraged
+approach is to use KVM_SET_VCPU_EVENTS to inject the SEA into the faulting
+vCPU. This way, the guest has the opportunity to keep running and to limit
+the blast radius of the SEA to the particular guest application that caused
+it. Userspace may also emulate the SEA itself using the KVM_SET_ONE_REG API.
+In this case, it can use the valid values in the 'gva' and 'gpa' fields to
+set the vCPU's registers (e.g. FAR_EL1, HPFAR_EL2).
+
 ::
 
 		/* Fix the size of the union. */
@@ -8625,6 +8674,18 @@ This capability indicate to the userspace whether a PFNMAP memory region
 can be safely mapped as cacheable. This relies on the presence of
 force write back (FWB) feature support on the hardware.
 
+7.45 KVM_CAP_ARM_SEA_TO_USER
+----------------------------
+
+:Architecture: arm64
+:Target: VM
+:Parameters: none
+:Returns: 0 on success, -EINVAL if unsupported.
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
+that userspace can participate in handling synchronous external aborts caused
+by the VM, through KVM_EXIT_ARM_SEA exits.
+
 8. Other capabilities.
 ======================
 
-- 
2.50.1.565.gc32cd1483b-goog
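The following minimal C sketch shows how a VMM might decode the 'arm_sea' fields after KVM_RUN returns with KVM_EXIT_ARM_SEA. It is not part of this patch and is not an existing kernel or libc API. The sketch defines a local struct arm_sea_exit that mirrors the layout documented in api.rst above. The names decode_arm_sea(), sea_set_name(), struct sea_summary and the ESR_* macros are hypothetical. The ESR bit positions are assumed from the Arm Data/Instruction Abort syndrome encoding:

- EC in bits [31:26]
- SET in bits [12:11]
- FnV in bit 10
- DFSC/IFSC in bits [5:0]

Check them against the Arm Architecture Registers documentation before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical local mirror of the 'arm_sea' member of struct kvm_run as
 * documented in this patch; a real VMM would use <linux/kvm.h> instead.
 */
struct arm_sea_exit {
	uint64_t esr;
	uint64_t flags;
	uint64_t gva;
	uint64_t gpa;
};
static_assert(sizeof(struct arm_sea_exit) == 4 * sizeof(uint64_t),
	      "layout should match four __u64 fields");

#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID	(1ULL << 0)	/* value from the patch */

/* ESR_ELx fields for Data/Instruction Aborts (assumed from the Arm ARM). */
#define ESR_EC_SHIFT		26
#define ESR_EC_MASK		0x3fULL
#define ESR_EC_IABT_LOW		0x20	/* Instruction Abort from a lower EL */
#define ESR_EC_DABT_LOW		0x24	/* Data Abort from a lower EL */
#define ESR_SET_SHIFT		11
#define ESR_SET_MASK		0x3ULL
#define ESR_FNV			(1ULL << 10)
#define ESR_FSC_MASK		0x3fULL
#define ESR_FSC_EXTABT		0x10	/* SEA, not on a translation table walk */

struct sea_summary {
	unsigned int ec;	/* Exception Class */
	unsigned int fsc;	/* DFSC or IFSC */
	unsigned int set;	/* Synchronous Error Type (if fsc == 0x10) */
	bool is_abort;		/* EC is a Data/Instruction Abort from the guest */
	bool gva_valid;		/* FnV clear: 'gva' is meaningful */
	bool gpa_valid;		/* GPA_VALID flag set: 'gpa' is meaningful */
	uint64_t gva;		/* 0 when !gva_valid (choice of this sketch) */
	uint64_t gpa;		/* 0 when !gpa_valid (choice of this sketch) */
};

static struct sea_summary decode_arm_sea(const struct arm_sea_exit *sea)
{
	struct sea_summary s = { 0 };

	s.ec = (unsigned int)((sea->esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
	s.fsc = (unsigned int)(sea->esr & ESR_FSC_MASK);
	s.set = (unsigned int)((sea->esr >> ESR_SET_SHIFT) & ESR_SET_MASK);
	s.is_abort = s.ec == ESR_EC_IABT_LOW || s.ec == ESR_EC_DABT_LOW;

	/*
	 * Per the documentation, 'gva' is valid only when FnV is clear.
	 * FnV is defined only for abort exception classes, so this sketch
	 * also requires one of those.
	 */
	s.gva_valid = s.is_abort && !(sea->esr & ESR_FNV);
	s.gpa_valid = (sea->flags & KVM_EXIT_ARM_SEA_FLAG_GPA_VALID) != 0;
	s.gva = s.gva_valid ? sea->gva : 0;
	s.gpa = s.gpa_valid ? sea->gpa : 0;
	return s;
}

/* Human-readable Synchronous Error Type; SET is only defined for FSC 0x10. */
static const char *sea_set_name(const struct sea_summary *s)
{
	if (s->fsc != ESR_FSC_EXTABT)
		return "n/a";
	switch (s->set) {
	case 0:
		return "UER (recoverable)";
	case 2:
		return "UC (uncontainable)";
	case 3:
		return "UEO (restartable)";
	default:
		return "reserved";
	}
}
```

A VMM could use the sketch on the vCPU thread as follows:

1. When run->exit_reason is KVM_EXIT_ARM_SEA, copy run->arm_sea into struct arm_sea_exit.
2. Call decode_arm_sea().
3. Follow the documented approach of injecting the SEA back into the faulting vCPU with KVM_SET_VCPU_EVENTS.

Zeroing gva/gpa when they are not valid is only a choice made by this sketch. Callers should test gva_valid and gpa_valid rather than the address values.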