From nobody Sun Oct 5 20:02:15 2025 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 582DE255F5C for ; Tue, 29 Jul 2025 19:33:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817631; cv=none; b=kZUxZGiLFMTGCy2X8VdmHLBEUJlqcKTSFFAv9p6yLHI+SzI/o9a8FW5aTtSBDWCAlO8tZaOad2aHW7Al7lIzXjphgGsEl7NjpFmR2LAYKFzs+WmasA21QQwyLoP9bGLOfCpOQ+qYYgW02ucv+kassBXBsgBKcX8lBiAUViuSMLY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817631; c=relaxed/simple; bh=L+2rXbfnP5UHLH6J8UIbSI41QWjZkhL1tyySVaOK3Ek=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Y7das+qUVLOpLfw0V14HyKpb+EPdTw2W/Tx8xRP5LwASWQC4x0JoGFvT3X7C6wJ84yk8xGjCxPXJTdymd2k6hVfxT/CXjwxWHEWEJqiz1cnrszGbXuqgyQA++7o/p6CwmDf9v97K5gIHBSQ5bKRN3BFyCHD3OTlm8BrWniScprk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=cAIeG4DQ; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="cAIeG4DQ" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-74ae13e99d6so140343b3a.0 for ; Tue, 29 Jul 2025 12:33:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1753817629; x=1754422429; 
darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=nK97iU3U8P3+PseGfflZpbVtPQJIWPB5aDB6ZsfbPi4=; b=cAIeG4DQv2bCnZtYJeyxx7gR0jV9adUofUysQDi6b84n3r+ASEU/Xdchc0t2VZ143p 1d//X8/CtoZECr9B6+p2Y9vU+odLylKXoJqdIEnCk4RTBZSWk/oAiSEQMCLakl160dNN 0c9OA4j4jnJbp2/v1LRF1U1eI2pR8quVaq1pnPPQKkdTSDWqA9eHsgDPd69F6IlgoMe6 M/TTbzU3sFPhfBJGjeaE6wKrjDx4VmGRg82UGe2NyaRNWz40q9rm+CDl1HhoNyELiQHK bnDv+0P9yQK/0lmtl1TxBRaol6SjoTfqqHO/L6P45YcCEVI2iNe+HjtV5DyQ9xAw2+9P Idfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753817629; x=1754422429; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=nK97iU3U8P3+PseGfflZpbVtPQJIWPB5aDB6ZsfbPi4=; b=tZpsekAZZSiEi7Ljaluvi+eKdTz4j3GpU16fkPBz2cHlHtaIUGNmysZAFuRlIDZHfF 7232Qz2gaJaFxGGEy0qSrDSPQ152qwYcRcpvd7hAvlY3wBiqCi4Z457dcWqgXXkb5zso qSzroJggMYoEsYnVunctG0adY3j/I/uuBcO1bh/uUgAVPReeRnccEckQYAy1R9qDgdPg LBxPZbBqmrHfCSiyF7/E6vOIjI63A8y5R6PAK+3Jsmbusp3fS2wExtLe8AdYLLofvkF1 Y1WRirErXwxv75zij0jSrvG1Cjpmij/6pXEWDWBMDUKC0YR6L6Ah2CvAssGPx1x4LFwv uY4w== X-Forwarded-Encrypted: i=1; AJvYcCWJDBu3Gx5Nm/7k2TwHwIoddlHC7NC1U1XHUx23xC17J80raQ6HxtLn5jXtAyp1j4J33VuuBWTFum37AOA=@vger.kernel.org X-Gm-Message-State: AOJu0YylJ3NkXqHFLQoR7WJM1PEq4+QzNeLt2exs1RC38x/woPdKnkWg qcJyAOrWCgWZ7/E3qD8bGYUM3EYSFRt8EbniOUCK/6ZCqsiQ1cN3airLie8OzTwaGFtyZE3+KYe RfKE7qQ== X-Google-Smtp-Source: AGHT+IH4QbqgK/rjqLq7feIazM0nAtyNNAHCeByH/06YNfkwg9k7lbaoNX81vVU/L2yD55+9XIqmw/waN38= X-Received: from pgac7.prod.google.com ([2002:a05:6a02:2947:b0:b36:36f4:9862]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:914f:b0:23d:659c:aadd with SMTP id adf61e73a8af0-23dadeb68c0mr7268679637.22.1753817628625; Tue, 29 Jul 2025 12:33:48 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 29 Jul 2025 12:33:36 -0700 In-Reply-To: 
<20250729193341.621487-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20250729193341.621487-1-seanjc@google.com>
X-Mailer: git-send-email 2.50.1.552.g942d659e1b-goog
Message-ID: <20250729193341.621487-2-seanjc@google.com>
Subject: [PATCH 1/5] KVM: Never clear KVM_REQ_VM_DEAD from a vCPU's requests
From: Sean Christopherson
To: Marc Zyngier, Oliver Upton, Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Adrian Hunter, Vishal Annapurve, Xiaoyao Li, Rick Edgecombe, Nikolay Borisov
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Use kvm_test_request() instead of kvm_check_request() when querying KVM_REQ_VM_DEAD, i.e. don't clear KVM_REQ_VM_DEAD, as the entire purpose of KVM_REQ_VM_DEAD is to prevent the vCPU from entering the guest ever again, even if userspace insists on redoing KVM_RUN.

Ensuring KVM_REQ_VM_DEAD is never cleared will allow relaxing KVM's rule that ioctls can't be invoked on dead VMs, to only disallow ioctls if the VM is bugged, i.e. if KVM hit a KVM_BUG_ON().

Opportunistically add compile-time assertions to guard against clearing KVM_REQ_VM_DEAD through the standard APIs.
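
For readers unfamiliar with the two helpers, the difference between a query that leaves a request pending and one that consumes it can be modeled in userspace. The following is only a sketch and not kernel code: struct vcpu_model, the REQ_*_BIT values, and the model_* helpers are hypothetical stand-ins, and C11 atomics substitute for the kernel's test_bit()/clear_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bits; the real values live in kvm_host.h. */
enum { REQ_VM_DEAD_BIT = 0, REQ_SLEEP_BIT = 1 };

/* Hypothetical stand-in for the per-vCPU request bitmap. */
struct vcpu_model {
	atomic_ulong requests;
};

/* Post a request by setting its bit. */
static void model_make_request(struct vcpu_model *v, int bit)
{
	atomic_fetch_or(&v->requests, 1UL << bit);
}

/* Query only: the bit stays set (the "test" flavor). */
static bool model_test_request(struct vcpu_model *v, int bit)
{
	return atomic_load(&v->requests) & (1UL << bit);
}

/*
 * Query and consume: the bit is cleared if it was set (the "check"
 * flavor).  Simplification: a single atomic fetch-and replaces the
 * kernel's separate test and clear steps.
 */
static bool model_check_request(struct vcpu_model *v, int bit)
{
	unsigned long mask = 1UL << bit;

	return atomic_fetch_and(&v->requests, ~mask) & mask;
}

/* Returns 0 if the guest may be entered, -1 if the VM is dead. */
static int model_run(struct vcpu_model *v)
{
	/* "test", not "check": a dead VM must stay dead across retries. */
	if (model_test_request(v, REQ_VM_DEAD_BIT))
		return -1;
	return 0;
}
```

In this model, calling model_run() repeatedly after the dead bit is set keeps failing, whereas a "check"-style query would let the second attempt through.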
Signed-off-by: Sean Christopherson
---
 arch/arm64/kvm/arm.c     | 2 +-
 arch/x86/kvm/mmu/mmu.c   | 2 +-
 arch/x86/kvm/vmx/tdx.c   | 2 +-
 arch/x86/kvm/x86.c       | 2 +-
 include/linux/kvm_host.h | 9 +++++++--
 5 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f946926716b0..2fdc48c0fc4d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1013,7 +1013,7 @@ static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
 static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
 	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
+		if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu))
 			return -EIO;
 
 		if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6e838cb6c9e1..d09bd236a92d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4915,7 +4915,7 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level
 		if (signal_pending(current))
 			return -EINTR;
 
-		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu))
+		if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu))
 			return -EIO;
 
 		cond_resched();
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 66744f5768c8..3e0d4edee849 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2010,7 +2010,7 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_has_events(vcpu) || signal_pending(current))
 			break;
 
-		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
+		if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) {
 			ret = -EIO;
 			break;
 		}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1c49bc681c4..1700df68f12a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10649,7 +10649,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	bool req_immediate_exit = false;
 
 	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
+		if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
 		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 15656b7fba6c..627054d27222 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2261,13 +2261,18 @@ static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
 	return test_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
 }
 
-static inline void kvm_clear_request(int req, struct kvm_vcpu *vcpu)
+static __always_inline void kvm_clear_request(int req, struct kvm_vcpu *vcpu)
 {
+	BUILD_BUG_ON(req == KVM_REQ_VM_DEAD);
+
 	clear_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests);
 }
 
-static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 {
+	/* Once a VM is dead, it needs to stay dead. */
+	BUILD_BUG_ON(req == KVM_REQ_VM_DEAD);
+
 	if (kvm_test_request(req, vcpu)) {
 		kvm_clear_request(req, vcpu);
 
-- 
2.50.1.552.g942d659e1b-goog

From nobody Sun Oct 5 20:02:15 2025
Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F3AC260582 for ; Tue, 29 Jul 2025 19:33:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817632; cv=none; b=oZng6ktOWaFFSs7/BMW3mFnzyyWmzecMZDS2InF0s+WMTYAjg8Tck3jXz/SoEyvdkXhD1lCszZJbAWnzEskSYEG0GDE5tOwFgyZiFOYmh56yt9pvMT1vk1qEm4+O45DQ9YmmnSiboZ9ykaLLLVkn9NT4oCbnU6V8xhoCWPTw8yk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817632; c=relaxed/simple; bh=sqgm1EIxKlNrAgPRJQWz+ccC/dJ7P7wFp1H9ZQzuLhw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=iRdrfHNj4MnjAeewjTeQyLovO85FTWGrGPPXIyQkTRtJDal//a6GZyABYB1l+icqS+hQQwTEPCfkSM4UvDmYrtmbiUqrhrW1a9Rapp2C8bZZubQaC+W5GhPEVraaK/g/l2Hpr0u3V8lfxuZrbfZNKwvbL+k4fhXRYv2W+wrMMh8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Dw5dQutD; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Dw5dQutD" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-31ed2a7d475so2822225a91.1 for ; Tue, 29 Jul 2025 12:33:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1753817630; x=1754422430; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=JJX8dU8+n+0QNuI44F28R1+g564ctZSa7EPvc3ZFM1U=; b=Dw5dQutD1AQktBJZnfWwu/psiv4pYLyZN9TSImEP3IRqHg5D3v8tlrSWUhtf7Ys+Bl 8X6h+hVLebqlKjiJiuxuaIilf/14PPMh7P9xELldDIQpuRp6OYncSi0tzunnKB4jy2RA ESWYhaA1EzW6JiqvSaGP1zLMa0EUO3PYkTM/5ASfEkzRsVRte0xwjawlVlITWLK4njSj UzsQKcLgkROUFMCSbHAjoLWwDOkWZIT/GJJNiGP5lmch4woQEN4r+kzv1Xgx1gDuz4GJ st9eT8lk/QPVHD7i0X9+OXGLCnuOIKO7nRBQlq8M0KVSdmByao/bfs4Zbq6HFQiiJcr3 PY1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753817630; x=1754422430; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=JJX8dU8+n+0QNuI44F28R1+g564ctZSa7EPvc3ZFM1U=; b=dLU050eBNy+cO7+Lz7gGqv6qGT0JmzQWntzv2VWAueU4QFynylIV1NgcjA2p8rqh4J 
mDRJUo/OlMH89BRvDPpPdpo4tgsbeqc1lSU/iFG1O0f3yCV//sgclUBdtSoHQj/fLvw6 gfRD41uW5rUGS5bqtDGZC41dh1DQpdKDz9mk1r8O3ag4EWTlTh/CTIXToX6Hz89hLRz7 RGWru27VmqQHk65CEpng3wQ5P70hnTdDxahPKK70I/ZGQ9h/a5nQ4757HjZ5uOYc7qZE lfwtV4yZReRm3RvOBeugkafnNGHMP391jmiKCPrMIcXWMXNeHwfbaHeTW2975WclJ/Sw P05A== X-Forwarded-Encrypted: i=1; AJvYcCVcAl4VLtk2TkR9spiKtAtf8a8oI0EQeHb5Z0Kp7ycStP9DiCVukHc/IrdIRY86uV4aWJSS3YcMvDaxT6c=@vger.kernel.org X-Gm-Message-State: AOJu0YySf0DKN/KxKs4yzVNh79Q/iPLWANwzL6ockLJOUDX8PPHtCjSC 2lcqJBWOj61ll1v+tHJjcEmnMwiPRFjg0kEHMaCIVtUnDGuE9oPysyCZoALXI9Q3h2x0avZ1SZI +zSGpMA== X-Google-Smtp-Source: AGHT+IHfEWMEuUfBMeiKYE9aLfHbALVYy3IIrBllmmLaQcn7q2hIwc6mBe+teoBHRd1eD2W+YlhcAuMLoXU= X-Received: from pjbpv8.prod.google.com ([2002:a17:90b:3c88:b0:31e:998f:7b79]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:33d2:b0:31e:c8fc:e630 with SMTP id 98e67ed59e1d1-31f5de73bf9mr824146a91.26.1753817630463; Tue, 29 Jul 2025 12:33:50 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 29 Jul 2025 12:33:37 -0700 In-Reply-To: <20250729193341.621487-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250729193341.621487-1-seanjc@google.com> X-Mailer: git-send-email 2.50.1.552.g942d659e1b-goog Message-ID: <20250729193341.621487-3-seanjc@google.com> Subject: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation From: Sean Christopherson To: Marc Zyngier , Oliver Upton , Sean Christopherson , Paolo Bonzini Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Adrian Hunter , Vishal Annapurve , Xiaoyao Li , Rick Edgecombe , Nikolay Borisov Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Exit to userspace with -EFAULT and a valid MEMORY_FAULT exit if a vCPU hits an unexpected pending S-EPT Violation instead of marking the VM dead. 
While it's unlikely the VM can continue on, whether or not to terminate the VM is not KVM's decision to make.

Set memory_fault.size to zero to communicate to userspace that the reported fault is "bad", and to effectively terminate the VM if userspace blindly treats the exit as a conversion attempt (KVM_SET_MEMORY_ATTRIBUTES will fail with -EINVAL if the size is zero).

Opportunistically delete the pr_warn(), which could be abused to spam the kernel log, and is largely useless outside of interactive debug as it doesn't specify which VM encountered a failure.

Signed-off-by: Sean Christopherson
Reviewed-by: Rick Edgecombe
---
 arch/x86/kvm/vmx/tdx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3e0d4edee849..c2ef03f39c32 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
 		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
-			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
-				gpa, vcpu->vcpu_id);
-			kvm_vm_dead(vcpu->kvm);
-			return -EIO;
+			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
+			return -EFAULT;
 		}
 		/*
 		 * Always treat SEPT violations as write faults.
Ignore the --=20 2.50.1.552.g942d659e1b-goog From nobody Sun Oct 5 20:02:15 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF22F288CB7 for ; Tue, 29 Jul 2025 19:33:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817634; cv=none; b=UW+gKc99MuVrGrESnNXis8SYueBTUDCGdsVHibaOArASVmDHXiwJpx0Js82U38rK5iOxAhEbV8rsom65CmCcm2QvBuasd+gJCQsgezzW3b9eLhMi7VMP3aIadjauTZ2azsQwdVREp5Ydo/Ks7IWKgAQhLUqceJB1HyUpCh8TLoY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817634; c=relaxed/simple; bh=X6kSoWARgQhTBHyJo7xEWPtfNUe15LYs+m14KA9l8uM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=jUxugBWf63p4X/bYN2qgTH1R3xBSfbNQ6ROfBjVTP2S+QoZBY9VOdoIrLFH6Ph38vSe6b7FLk2TIKFUbP7eiEeyJP0PMMDVuUfnrsqvhOJ+MfK6w73nF7QthgPe6BQrpqO3rrcYznLFxeC8tcEj/SMQ6WzkGZDnCR4xQdg+1GSI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Q4vCi3Sh; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Q4vCi3Sh" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-31366819969so142188a91.0 for ; Tue, 29 Jul 2025 12:33:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=20230601; t=1753817632; x=1754422432; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=ZdHlnZ98+ZVsmpjhh1EatY5FQnyPvSy7oQ52FgoL3AU=; b=Q4vCi3ShYDN/hQBFDzkRgCk804dRCJSsavcu7aTrtlRlNwdUYKQqnC2WKbixHIXDDW m07oWVcjqYcufQps6ZxR4W3CyamVgQIuMT1yXEf8BOFharChSdtynUg4bTyYHCPpNWfw XerbyrcI1jnW36QsmiI5u4zIl8HZxjpwvmEJs+2d7wMUayGGcafxTfFUvedJ/fegJ/Gi nnaB53HhbjLFOdVlcaeamaEmjMrl8YNs8K9X7QRrpFUqR5z25MG31x6K2KgmyrckSh8a 7ya8GtgL32KbBXFLZLsFqwQJanWeGzZOAddHjVxyl7DTzeCrlTwn+CTVkSWfkaGij6d/ Kdug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753817632; x=1754422432; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ZdHlnZ98+ZVsmpjhh1EatY5FQnyPvSy7oQ52FgoL3AU=; b=CTUKnTYu6bGwZb/tMzXXoxLi8e9Wy0js/MMmwWwXjumUtig7hAQdAgzofpneEOwvqi VtaIZGnBhhyu6AxC0zFfpVKJnaC6lhVFvxdN0pk2xevXMf3k9izrM6ZT+GC7uBBfxke3 IVvik+VrYYgLAK57fqfwYJEppUN/GtWAVV33hllpYjwuaJjWWjhknlKSFNngwmKNXhkL nhQwlxDG+udo8ebFYbem9YAmxk01CCDjNdzyXz2mzydLbmkul1C9wbRuHAzGxZoHOZZp B/jymxIrSuUGL62fGVnEZjKrEmDLiODcQLHmM4uaCVFLm7O/n0Ql2okOUaCW9+YmJ7U/ 4ONQ== X-Forwarded-Encrypted: i=1; AJvYcCWHubzTm/yTzJPKwcKr58B0Q00rvC/SlZDk6vfty/8jgKVMtm/q659XOjIKQLmzTB2iFaHxCKa+1+oCkMg=@vger.kernel.org X-Gm-Message-State: AOJu0YxSm0utc/cuQGruJjCfJO8Z/YHQVp36eICrUzmpchh3tNaZj/nB QoA+Ax1hv8RUHBKnfyiIGZPPfB1ASshgyZ4kxHvbYYLIfvy2PfYaxc2R9nOUYMNcCJ6R5eFIh2w 1jQtlpA== X-Google-Smtp-Source: AGHT+IFFe8P3g6K0zh0qtvZDIKe8VSkXT7Rqbzi2+TCDvK005OkWz+Mz8UVq7jUspvUvHpMWKrpsZ+VbdgQ= X-Received: from pjbqc14.prod.google.com ([2002:a17:90b:288e:b0:31c:32f8:3f88]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:17c2:b0:313:b1a:3939 with SMTP id 98e67ed59e1d1-31f5ddcd3c6mr813328a91.15.1753817631999; Tue, 29 Jul 2025 12:33:51 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 29 Jul 2025 12:33:38 
-0700
In-Reply-To: <20250729193341.621487-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20250729193341.621487-1-seanjc@google.com>
X-Mailer: git-send-email 2.50.1.552.g942d659e1b-goog
Message-ID: <20250729193341.621487-4-seanjc@google.com>
Subject: [PATCH 3/5] KVM: Reject ioctls only if the VM is bugged, not simply marked dead
From: Sean Christopherson
To: Marc Zyngier, Oliver Upton, Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Adrian Hunter, Vishal Annapurve, Xiaoyao Li, Rick Edgecombe, Nikolay Borisov
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Relax the protection against interacting with a buggy KVM to only reject ioctls if the VM is bugged, i.e. allow userspace to invoke ioctls if KVM deliberately terminated the VM. Drop kvm.vm_dead as there are no longer any readers, and KVM shouldn't rely on vm_dead for functional correctness. The only functional guarantees provided by kvm_vm_dead() come by way of KVM_REQ_VM_DEAD, which ensures that vCPUs won't re-enter the guest.

Practically speaking, this only affects x86, which uses kvm_vm_dead() to prevent running a VM whose resources have been partially freed or that has run one or more of its vCPUs into an architecturally defined state. In these cases, there is no (known) danger to KVM; the goal is purely to prevent entering the guest.

As evidenced by commit ecf371f8b02d ("KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight"), the restriction on invoking ioctls only blocks _new_ ioctls. I.e. KVM mustn't rely on blocking ioctls for functional safety (whereas KVM_REQ_VM_DEAD is guaranteed to prevent vCPUs from entering the guest).
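
The resulting split between "bugged" and "dead" can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel implementation: struct vm_model and the model_* functions are hypothetical names, the dead_requested flag stands in for KVM_REQ_VM_DEAD being pending on the vCPUs, and the mm ownership check performed by the real ioctl handlers is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two independent "VM is unusable" signals. */
struct vm_model {
	bool vm_bugged;      /* set when KVM hit a KVM_BUG_ON() */
	bool dead_requested; /* stands in for KVM_REQ_VM_DEAD on the vCPUs */
};

/* Gate applied to every ioctl: only a bugged VM is rejected. */
static int model_ioctl_gate(const struct vm_model *vm)
{
	return vm->vm_bugged ? -EIO : 0;
}

/* Running a vCPU additionally refuses to enter a dead VM's guest. */
static int model_vcpu_run(const struct vm_model *vm)
{
	int r = model_ioctl_gate(vm);

	if (r)
		return r;
	return vm->dead_requested ? -EIO : 0;
}

/* Deliberate termination: dead, but not flagged as a KVM bug. */
static void model_vm_dead(struct vm_model *vm)
{
	vm->dead_requested = true;
}

/* A KVM bug: the VM is marked bugged and is also made dead. */
static void model_vm_bugged(struct vm_model *vm)
{
	vm->vm_bugged = true;
	model_vm_dead(vm);
}
```

In this model a deliberately terminated VM still accepts ioctls other than running a vCPU, while a bugged VM rejects everything.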
Signed-off-by: Sean Christopherson
---
 arch/arm64/kvm/vgic/vgic-init.c                     |  2 +-
 include/linux/kvm_host.h                            |  2 --
 tools/testing/selftests/kvm/x86/sev_migrate_tests.c |  5 +----
 virt/kvm/kvm_main.c                                 | 10 +++++-----
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index eb1205654ac8..c2033bae73b2 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -612,7 +612,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
 	mutex_unlock(&kvm->arch.config_lock);
 out_slots:
 	if (ret)
-		kvm_vm_dead(kvm);
+		kvm_vm_bugged(kvm);
 
 	mutex_unlock(&kvm->slots_lock);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 627054d27222..fa97d71577b5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -854,7 +854,6 @@ struct kvm {
 	u32 dirty_ring_size;
 	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
-	bool vm_dead;
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
 	struct notifier_block pm_notifier;
@@ -894,7 +893,6 @@ struct kvm {
 
 static inline void kvm_vm_dead(struct kvm *kvm)
 {
-	kvm->vm_dead = true;
 	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
 }
 
diff --git a/tools/testing/selftests/kvm/x86/sev_migrate_tests.c b/tools/testing/selftests/kvm/x86/sev_migrate_tests.c
index 0a6dfba3905b..0580bee5888e 100644
--- a/tools/testing/selftests/kvm/x86/sev_migrate_tests.c
+++ b/tools/testing/selftests/kvm/x86/sev_migrate_tests.c
@@ -87,10 +87,7 @@ static void test_sev_migrate_from(bool es)
 		sev_migrate_from(dst_vms[i], dst_vms[i - 1]);
 
 	/* Migrate the guest back to the original VM. */
-	ret = __sev_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
-	TEST_ASSERT(ret == -1 && errno == EIO,
-		    "VM that was migrated from should be dead. ret %d, errno: %d", ret,
-		    errno);
+	sev_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
 
 	kvm_vm_free(src_vm);
 	for (i = 0; i < NR_MIGRATE_TEST_VMS; ++i)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c07dd423458..f1f69e10a371 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4408,7 +4408,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
 	struct kvm_fpu *fpu = NULL;
 	struct kvm_sregs *kvm_sregs = NULL;
 
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
 		return -EIO;
 
 	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -4651,7 +4651,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
 	void __user *argp = compat_ptr(arg);
 	int r;
 
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
 		return -EIO;
 
 	switch (ioctl) {
@@ -4717,7 +4717,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
 {
 	struct kvm_device *dev = filp->private_data;
 
-	if (dev->kvm->mm != current->mm || dev->kvm->vm_dead)
+	if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
 		return -EIO;
 
 	switch (ioctl) {
@@ -5139,7 +5139,7 @@ static long kvm_vm_ioctl(struct file *filp,
 	void __user *argp = (void __user *)arg;
 	int r;
 
-	if (kvm->mm != current->mm || kvm->vm_dead)
+	if (kvm->mm != current->mm || kvm->vm_bugged)
 		return -EIO;
 	switch (ioctl) {
 	case KVM_CREATE_VCPU:
@@ -5403,7 +5403,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
 	struct kvm *kvm = filp->private_data;
 	int r;
 
-	if (kvm->mm != current->mm || kvm->vm_dead)
+	if (kvm->mm != current->mm || kvm->vm_bugged)
 		return -EIO;
 
 	r = kvm_arch_vm_compat_ioctl(filp, ioctl, arg);
-- 
2.50.1.552.g942d659e1b-goog

From nobody Sun Oct 5 20:02:15 2025
Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No 
client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6916B298981 for ; Tue, 29 Jul 2025 19:33:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817635; cv=none; b=IW5WM3yqbSku5KqBaOa2RZKsLF+aeX4d9NfPyfMDua82KW3GkAwzIY+lIZitVCbU4GBw39C9yw/EV5E/t0kSUvO8eoM1Wi5c8/AQUgpc9OqI+DkjoDaQzdU0dCR+wllCVB32TBovcIo6Ju6yarrkv1oeqY207zv5sqeUsQdKGsU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817635; c=relaxed/simple; bh=fdpTasDFylH0aih/WYUrRlBApDM32hVkvXUFxrZnj7c=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NWJs59+HXUPBeI/z+WSHfVFHYBz3BXv6PNx07Aap92D973ocNMXkQLKYv0Xn8KBIXkXOilZFEHOP6Db+nlx8MowFUBox2rv/u8AevHpFVvyVk2z6M+dFAtlEySzIXLY8AdNb1lk2SHixMCvVkhjprqoAggLmjBeHqAsT4vVG3uU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=IP91EMDy; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="IP91EMDy" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-b2c37558eccso4071718a12.1 for ; Tue, 29 Jul 2025 12:33:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1753817634; x=1754422434; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=7kW936wFYM0Ary5B3+n4kBp/4+0suFDEkHTdVVXpkj4=; 
b=IP91EMDyNjw3k8Z+1ElEQJ5TP/ZpTwQv5kdxpk2JhS2i5zL4sAxu1SnQjKzj77w88i GtuuU03qz5x/U3wf6wPsDNc4QogKfpU4DmmaWPaqpMqfp/GHK46eiAATRhpmXAJ+HQLw kxFW0OC73ichLTJ+xf4va2DRjlopBxG60VGb8QlwXcFAN+iwQyR1WLLq4PNETqnhEeuy wSvr4Z8nmK2sIwc9zs0GWylpURovu8D97veDNPWjXO7V8lgvLgtG9yGXM2dBSDayV4iy 8yhpzFg6zosqgoxm14OVtgF5qxk0STb9kzCQ4ZLP6qKVEmfx8LxtfFEEi9SsJUgKo4+r GW/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753817634; x=1754422434; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=7kW936wFYM0Ary5B3+n4kBp/4+0suFDEkHTdVVXpkj4=; b=EIqmGYliAO412W3MB9MEssjDtxrPkwz8QjS8nAUPoRYCrul0jD4Arrv8H3bzMiP8f1 a8cn/h4Tf0U8GAMF1TYYQ2AYIfAUyOSxJPLE0O4Queattz+lNA2sxCJiEvZrGuxevyXZ at9IQ64N/KDV/6h23I60FPTxXJQHz18ZHPFcCihX5vV7JFqRbCCQKUe/vutstE3tDbYO JqxP0zoE3PJ7IOW9pAQPFfaSQ62qRliyVlfoH1W1Wqit6cViBcaRCE0dZDdX/Ypt4ooQ qUtv3Ltpj3cMb/FbgztT6DJg27Zjez9dLzvX4hJw0258do0SXmGaUP6JHLJf2DBAR5Gd qOww== X-Forwarded-Encrypted: i=1; AJvYcCWDnY2uq5Hz3GEkniFRveYrkk2C5yC0WkatX0MNiBhdXHnnm4y5sfJt1L8rR+fhNFz6pjDQ8Cll40oteMo=@vger.kernel.org X-Gm-Message-State: AOJu0YyXkgtf9xKDHUX/gD2nb7PYQJ4WJYLyIbt2XCkJUcB2B8xSBMo3 2icSLa652OGM5/GEfyLTN13JoX9kDQmi8Txn7nNHbQpemu8CWDx+0JQhTIZx85NMyAxTI3MbS72 /w40c4g== X-Google-Smtp-Source: AGHT+IELFOFWc8vYkYZRQRf3VfKYyLdceEz1T/Y/HkOfoDthT5RfeT3LHFZMzx5elvj9USSUE7rYmRIpqRk= X-Received: from pjsc3.prod.google.com ([2002:a17:90a:bf03:b0:31f:a0:fad4]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2684:b0:313:20d2:c99b with SMTP id 98e67ed59e1d1-31f5dd9de5emr917579a91.9.1753817633690; Tue, 29 Jul 2025 12:33:53 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 29 Jul 2025 12:33:39 -0700 In-Reply-To: <20250729193341.621487-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250729193341.621487-1-seanjc@google.com> 
X-Mailer: git-send-email 2.50.1.552.g942d659e1b-goog
Message-ID: <20250729193341.621487-5-seanjc@google.com>
Subject: [PATCH 4/5] KVM: selftests: Use for-loop to handle all successful SEV migrations
From: Sean Christopherson
To: Marc Zyngier, Oliver Upton, Sean Christopherson, Paolo Bonzini
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Adrian Hunter, Vishal Annapurve, Xiaoyao Li, Rick Edgecombe, Nikolay Borisov
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Use the main for-loop in the "SEV migrate from" testcase to handle all successful migrations, as there is nothing inherently unique about the original source VM beyond it needing to be created as an SEV VM.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 .../selftests/kvm/x86/sev_migrate_tests.c | 31 +++++++++----------
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/sev_migrate_tests.c b/tools/testing/selftests/kvm/x86/sev_migrate_tests.c
index 0580bee5888e..b501c916edf5 100644
--- a/tools/testing/selftests/kvm/x86/sev_migrate_tests.c
+++ b/tools/testing/selftests/kvm/x86/sev_migrate_tests.c
@@ -14,7 +14,7 @@
 #include "kselftest.h"
 
 #define NR_MIGRATE_TEST_VCPUS 4
-#define NR_MIGRATE_TEST_VMS 3
+#define NR_MIGRATE_TEST_VMS 4
 #define NR_LOCK_TESTING_THREADS 3
 #define NR_LOCK_TESTING_ITERATIONS 10000
 
@@ -72,26 +72,23 @@ static void sev_migrate_from(struct kvm_vm *dst, struct kvm_vm *src)
 
 static void test_sev_migrate_from(bool es)
 {
-	struct kvm_vm *src_vm;
-	struct kvm_vm *dst_vms[NR_MIGRATE_TEST_VMS];
-	int i, ret;
+	struct kvm_vm *vms[NR_MIGRATE_TEST_VMS];
+	int i;
 
-	src_vm = sev_vm_create(es);
-	for (i = 0; i < NR_MIGRATE_TEST_VMS; ++i)
-		dst_vms[i] = aux_vm_create(true);
-
-	/* Initial migration from the src to the first dst. */
-	sev_migrate_from(dst_vms[0], src_vm);
-
-	for (i = 1; i < NR_MIGRATE_TEST_VMS; i++)
-		sev_migrate_from(dst_vms[i], dst_vms[i - 1]);
+	vms[0] = sev_vm_create(es);
+	for (i = 1; i < NR_MIGRATE_TEST_VMS; ++i)
+		vms[i] = aux_vm_create(true);
 
-	/* Migrate the guest back to the original VM. */
-	sev_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
+	/*
+	 * Migrate in N times, in a chain from the initial SEV VM to each "aux"
+	 * VM, and finally back to the original SEV VM. KVM disallows KVM_RUN
+	 * on the source after migration, but all other ioctls should succeed.
+	 */
+	for (i = 0; i < NR_MIGRATE_TEST_VMS; i++)
+		sev_migrate_from(vms[(i + 1) % NR_MIGRATE_TEST_VMS], vms[i]);
 
-	kvm_vm_free(src_vm);
 	for (i = 0; i < NR_MIGRATE_TEST_VMS; ++i)
-		kvm_vm_free(dst_vms[i]);
+		kvm_vm_free(vms[i]);
 }
 
 struct locking_thread_input {
-- 
2.50.1.552.g942d659e1b-goog

From nobody Sun Oct 5 20:02:15 2025
Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0F43255F5C for ; Tue, 29 Jul 2025 19:33:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817637; cv=none; b=dECnlawVLgLqns+aBqPq8YKA4MlnMcZp88dbW2Knj9KF0XCX/CPRHaDFuBmUYkaJp0SWX3JBBuzAezqY6oAZAMhF5mmxIwgSrmX3ZGRUGP2wf9kcUOh2xcS77U5l7/ubnJsRAlmPH5OMI89ckdfG2WlYjcEh6XRU0BnnaWmk8mM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753817637; c=relaxed/simple; bh=GezEG6D+dGW2ASk1t1mkWRtt3ji/dNz8mwQjkZHk1gk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=WSEKsgLBFfU1PtzXhoPPUXFCnDYFOxOtBK9fbmrblio2E1u3ygozhKTbgCJKSrI83UL35a0Dry8QYHohdVqjU6NTaJI0YOUTT+Io2KGM3vw/FH8MBPpt097KOivDapIQa6kXeFFXGxaYqgSI/whufI2WjZE0/MrhehfuYtBCJiA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=JXE87pYp; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="JXE87pYp" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-31ed2a7d475so2822276a91.1 for ; Tue, 29 Jul 2025 12:33:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1753817635; x=1754422435; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=LRgTVb6HxivGsODMp4kE86IHTk8ygg+wPbdHcPP2xOg=; b=JXE87pYphuTyhvNjf1jT2DbiwheL7uzaf7PQ1fnD0sQZ27L0EVodT+iOz7M+5njpZr aqTOfS7al0xe4aph6tcM7+bIF2lB7k5i9PvI7Z993z6m7s2yL13ghoejYGtTngonEPew xUpSPTtzv93Xph8wEtVeLRC9GvBXeWY02Cp4kzQTj+bkdx2EjfDPWHqD7iCco2BP+L86 48/lzoXXGopOD23gc8veDndoiG9b56WS4fsbPO2eoUnrACMhpkiwjQtnEnZSpb8PFqvo lSbnSVCDKj6VIXRKGzg10Tjj0KpnTwh7fT1Km0gwrU+Jg/vSx3fc1bi0Ffz142QbZizF 0WCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753817635; x=1754422435; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=LRgTVb6HxivGsODMp4kE86IHTk8ygg+wPbdHcPP2xOg=; b=PI+26FMIjFuzzkE9Un7w6GP1ZOfSZ0mVQPdyl/tSowEsqQQ6w7WQ4675bxV2ocBjf0 
11PHtL0o9umTXa+bZtw0FDpP/R85sREcAajV6p8JXjOI9flBLuVw1Eu6wtDIdleLekiV 0G0298gyrJplJ0tI3/dcnUNTFXVjwGPqDdcRoLRlDRAp/tzulIJTQUh+adoRlC1Zmo2l 6ik3vlSw+uqTUksiXpa1GhAQ8NqWN0N6tpN+mxOVzYwSzEIgH+0lteOXZ2Yzer3FT9bP 8jQdusQAfkmoT44FhJzoNa8/DhPvteIqe4XIgog68D2WNamSI4PCZtUMi4+wgWKEvqc4 a2bQ== X-Forwarded-Encrypted: i=1; AJvYcCW5jkA+HWhfJawLd+Rq+ToOl6rgOgKYQi01vriu5k+Su1yc7q76qNaOzCqpZByUspMPhTBNKRfAQ72VaQ8=@vger.kernel.org X-Gm-Message-State: AOJu0YzppZdERBGNxLuwm8fGp83+cbk4T1XTxWVu6K+rb29wzbQFz2tk xijEeBZVuuTpZARJFaP6khW0KOk/Sf91BTwecdWP6yZvRaV7m/jxLhBwukEftgYJ+u/2atAqLXd MopM67g== X-Google-Smtp-Source: AGHT+IG/gL3d24hNw5SUEuxjxjut5eSNrEkkhi5cedOb5e+QFzIszFgoNw0OLcITFvyVJnRdvLf93b3+R4g= X-Received: from pjqo23.prod.google.com ([2002:a17:90a:ac17:b0:31f:f3:f8c3]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:5285:b0:312:f650:c795 with SMTP id 98e67ed59e1d1-31f5de54b6dmr784023a91.21.1753817635200; Tue, 29 Jul 2025 12:33:55 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 29 Jul 2025 12:33:40 -0700 In-Reply-To: <20250729193341.621487-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250729193341.621487-1-seanjc@google.com> X-Mailer: git-send-email 2.50.1.552.g942d659e1b-goog Message-ID: <20250729193341.621487-6-seanjc@google.com> Subject: [PATCH 5/5] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM From: Sean Christopherson To: Marc Zyngier , Oliver Upton , Sean Christopherson , Paolo Bonzini Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Adrian Hunter , Vishal Annapurve , Xiaoyao Li , Rick Edgecombe , Nikolay Borisov Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown, which enables more efficient reclaim of private memory. 

Private memory is removed from MMU/TDP when guest_memfds are closed.  If
the HKID has not been released, the TDX VM is still in the RUNNABLE state,
and so pages must be removed using "Dynamic Page Removal" procedure (refer
to the TDX Module Base spec) which involves a number of steps:

	Block further address translation
	Exit each VCPU
	Clear Secure EPT entry
	Flush/write-back/invalidate relevant caches

However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state,
where all TDX VM pages are effectively unmapped, so pages can be reclaimed
directly.

Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
reclaim time.  For example:

	VCPUs	Size (GB)	Before (secs)	After (secs)
	 4	 18		  72		 24
	32	107		 517		134
	64	400		5539		467

Add kvm_tdx_capabilities.supported_caps along with KVM_TDX_CAP_TERMINATE_VM
to advertise support to userspace.  Use a new field in kvm_tdx_capabilities
instead of adding yet another generic KVM_CAP to avoid bleeding TDX details
into common code (and #ifdefs), and so that userspace can query TDX
capabilities in one shot.  Enumerating capabilities as a mask of bits does
limit supported_caps to 64 capabilities, but in the unlikely event KVM
needs to enumerate more than 64 TDX capabilities, there are another 249 u64
entries reserved for future expansion.

To preserve the KVM_BUG_ON() sanity check that deals with HKID assignment,
track if a TD is terminated and assert that, when an S-EPT entry is
removed, either the TD has an assigned HKID or the TD was explicitly
terminated.

Link: https://lore.kernel.org/r/Z-V0qyTn2bXdrPF7@google.com
Link: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com
Co-developed-by: Adrian Hunter
Signed-off-by: Adrian Hunter
Acked-by: Vishal Annapurve
Tested-by: Vishal Annapurve
Tested-by: Xiaoyao Li
Cc: Rick Edgecombe
Cc: Nikolay Borisov
Signed-off-by: Sean Christopherson
---
 Documentation/virt/kvm/x86/intel-tdx.rst | 22 +++++++++++++-
 arch/x86/include/uapi/asm/kvm.h          |  7 ++++-
 arch/x86/kvm/vmx/tdx.c                   | 37 +++++++++++++++++++-----
 arch/x86/kvm/vmx/tdx.h                   |  1 +
 4 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/Documentation/virt/kvm/x86/intel-tdx.rst b/Documentation/virt/kvm/x86/intel-tdx.rst
index 5efac62c92c7..bcfa97e0c9e7 100644
--- a/Documentation/virt/kvm/x86/intel-tdx.rst
+++ b/Documentation/virt/kvm/x86/intel-tdx.rst
@@ -38,6 +38,7 @@ ioctl with TDX specific sub-ioctl() commands.
           KVM_TDX_INIT_MEM_REGION,
           KVM_TDX_FINALIZE_VM,
           KVM_TDX_GET_CPUID,
+          KVM_TDX_TERMINATE_VM,
 
           KVM_TDX_CMD_NR_MAX,
   };
@@ -92,7 +93,10 @@ to be configured to the TDX guest.
         __u64 kernel_tdvmcallinfo_1_r12;
         __u64 user_tdvmcallinfo_1_r12;
 
-        __u64 reserved[250];
+        /* Misc capabilities enumerated via the KVM_TDX_CAP_* namespace. */
+        __u64 supported_caps;
+
+        __u64 reserved[249];
 
         /* Configurable CPUID bits for userspace */
         struct kvm_cpuid2 cpuid;
@@ -227,6 +231,22 @@ struct kvm_cpuid2.
           __u32 padding[3];
   };
 
+KVM_TDX_TERMINATE_VM
+--------------------
+:Capability: KVM_TDX_CAP_TERMINATE_VM
+:Type: vm ioctl
+:Returns: 0 on success, <0 on error
+
+Release Host Key ID (HKID) to allow more efficient reclaim of private memory.
+After this, the TD is no longer in a runnable state.
+
+Using KVM_TDX_TERMINATE_VM is optional.
+
+- id: KVM_TDX_TERMINATE_VM
+- flags: must be 0
+- data: must be 0
+- hw_error: must be 0
+
 KVM TDX creation flow
 =====================
 In addition to the standard KVM flow, new TDX ioctls need to be called.  The
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..e019111e2150 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -940,6 +940,7 @@ enum kvm_tdx_cmd_id {
 	KVM_TDX_INIT_MEM_REGION,
 	KVM_TDX_FINALIZE_VM,
 	KVM_TDX_GET_CPUID,
+	KVM_TDX_TERMINATE_VM,
 
 	KVM_TDX_CMD_NR_MAX,
 };
@@ -962,6 +963,8 @@ struct kvm_tdx_cmd {
 	__u64 hw_error;
 };
 
+#define KVM_TDX_CAP_TERMINATE_VM	_BITULL(0)
+
 struct kvm_tdx_capabilities {
 	__u64 supported_attrs;
 	__u64 supported_xfam;
@@ -971,7 +974,9 @@ struct kvm_tdx_capabilities {
 	__u64 kernel_tdvmcallinfo_1_r12;
 	__u64 user_tdvmcallinfo_1_r12;
 
-	__u64 reserved[250];
+	__u64 supported_caps;
+
+	__u64 reserved[249];
 
 	/* Configurable CPUID bits for userspace */
 	struct kvm_cpuid2 cpuid;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c2ef03f39c32..ae059daf1a20 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -188,6 +188,8 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf,
 	if (!caps->supported_xfam)
 		return -EIO;
 
+	caps->supported_caps = KVM_TDX_CAP_TERMINATE_VM;
+
 	caps->cpuid.nent = td_conf->num_cpuid_config;
 
 	caps->user_tdvmcallinfo_1_r11 =
@@ -520,6 +522,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 		goto out;
 	}
 
+	write_lock(&kvm->mmu_lock);
 	for_each_online_cpu(i) {
 		if (packages_allocated &&
 		    cpumask_test_and_set_cpu(topology_physical_package_id(i),
@@ -544,7 +547,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	} else {
 		tdx_hkid_free(kvm_tdx);
 	}
-
+	write_unlock(&kvm->mmu_lock);
 out:
 	mutex_unlock(&tdx_lock);
 	cpus_read_unlock();
@@ -1884,13 +1887,13 @@ static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	struct page *page = pfn_to_page(pfn);
 	int ret;
 
-	/*
-	 * HKID is released after all private pages have been removed, and set
-	 * before any might be populated. Warn if zapping is attempted when
-	 * there can't be anything populated in the private EPT.
-	 */
-	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
-		return -EINVAL;
+	if (!is_hkid_assigned(to_kvm_tdx(kvm))) {
+		KVM_BUG_ON(!to_kvm_tdx(kvm)->vm_terminated, kvm);
+		ret = tdx_reclaim_page(page);
+		if (!ret)
+			tdx_unpin(kvm, page);
+		return ret;
+	}
 
 	ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
 	if (ret <= 0)
@@ -2884,6 +2887,21 @@ static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 	return 0;
 }
 
+static int tdx_terminate_vm(struct kvm *kvm)
+{
+	if (kvm_trylock_all_vcpus(kvm))
+		return -EBUSY;
+
+	kvm_vm_dead(kvm);
+	to_kvm_tdx(kvm)->vm_terminated = true;
+
+	kvm_unlock_all_vcpus(kvm);
+
+	tdx_mmu_release_hkid(kvm);
+
+	return 0;
+}
+
 int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_tdx_cmd tdx_cmd;
@@ -2911,6 +2929,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
 	case KVM_TDX_FINALIZE_VM:
 		r = tdx_td_finalize(kvm, &tdx_cmd);
 		break;
+	case KVM_TDX_TERMINATE_VM:
+		r = tdx_terminate_vm(kvm);
+		break;
 	default:
 		r = -EINVAL;
 		goto out;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ca39a9391db1..0abe70aa1644 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -45,6 +45,7 @@ struct kvm_tdx {
 	 * Set/unset is protected with kvm->mmu_lock.
 	 */
 	bool wait_for_sept_zap;
+	bool vm_terminated;
 };
 
 /* TDX module vCPU states */
-- 
2.50.1.552.g942d659e1b-goog
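
For readers following along from userspace, the capability check and command setup described in the changelog and in the intel-tdx.rst hunk might look roughly like the sketch below. It is not part of the patch. The sketch_* names and the two helpers are hypothetical, and the struct is a local mirror of struct kvm_tdx_cmd so that the example builds against headers that predate this series. Real code would presumably include <linux/kvm.h> and use KVM_TDX_CAP_TERMINATE_VM and KVM_TDX_TERMINATE_VM directly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors "#define KVM_TDX_CAP_TERMINATE_VM _BITULL(0)" from the patch. */
#define SKETCH_TDX_CAP_TERMINATE_VM	(1ULL << 0)

/* Assumption: same field order as struct kvm_tdx_cmd in the x86 uapi header. */
struct sketch_tdx_cmd {
	uint32_t id;
	uint32_t flags;
	uint64_t data;
	uint64_t hw_error;
};

/*
 * Return true if the supported_caps mask reported in kvm_tdx_capabilities
 * advertises the terminate sub-ioctl.
 */
static inline bool sketch_tdx_can_terminate_vm(uint64_t supported_caps)
{
	return (supported_caps & SKETCH_TDX_CAP_TERMINATE_VM) != 0;
}

/*
 * Prepare a terminate command.  The documentation hunk requires flags, data
 * and hw_error to be 0, so zero the whole struct and set only the id.
 * terminate_id is whatever value KVM_TDX_TERMINATE_VM has in the installed
 * uapi header; it is passed in rather than hard-coded here.
 */
static inline void sketch_tdx_fill_terminate_cmd(struct sketch_tdx_cmd *cmd,
						 uint32_t terminate_id)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->id = terminate_id;
}
```

The filled command would then be handed to the VM file descriptor through the KVM_MEMORY_ENCRYPT_OP ioctl, like the other TDX sub-ioctls. Going by tdx_terminate_vm() above, callers should expect EBUSY when the vCPU locks cannot all be taken, and since the call is optional, userspace can skip it when the capability bit is clear.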