From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:30 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-2-seanjc@google.com>
Subject: [PATCH 1/4] KVM: Always flush async #PF workqueue when vCPU is being destroyed
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs are being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

	trace_kvm_async_pf_completed(addr, cr2_or_gpa);

	__kvm_vcpu_wake_up(vcpu);

	mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU
is still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in
a workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).
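
[Editor's aside, not part of the posted commit message: the hung task
above is a work item waiting on its own completion.  A minimal userspace
analogy in C using POSIX threads -- the self-join below stands in for
async_pf_execute() putting the last VM reference and thereby flushing
itself; glibc detects this trivial cycle and returns EDEADLK, whereas the
kernel workqueue simply hangs:

	#include <pthread.h>
	#include <stdio.h>
	#include <string.h>

	static void *worker(void *arg)
	{
		/* Final act: wait for our own completion, mirroring
		 * async_pf_execute() -> kvm_put_kvm() -> flush of self. */
		int ret = pthread_join(pthread_self(), NULL);

		printf("self-join: %s\n", strerror(ret));
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		if (pthread_create(&t, NULL, worker, NULL))
			return 1;
		return pthread_join(t, NULL);
	}
]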
Note, commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack
Cc: Xu Yilun
Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
---
 virt/kvm/async_pf.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index e033c79d528e..876927a558ad 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -87,7 +87,25 @@ static void async_pf_execute(struct work_struct *work)
 	__kvm_vcpu_wake_up(vcpu);
 
 	mmput(mm);
-	kvm_put_kvm(vcpu->kvm);
+}
+
+static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work)
+{
+	/*
+	 * The async #PF is "done", but KVM must wait for the work item itself,
+	 * i.e. async_pf_execute(), to run to completion.  If KVM is a module,
+	 * KVM must ensure *no* code owned by the KVM (the module) can be run
+	 * after the last call to module_put(), i.e. after the last reference
+	 * to the last vCPU's file is put.
+	 *
+	 * Wake all events skip the queue and go straight done, i.e. don't
+	 * need to be flushed (but sanity check that the work wasn't queued).
+	 */
+	if (work->wakeup_all)
+		WARN_ON_ONCE(work->work.func);
+	else
+		flush_work(&work->work);
+	kmem_cache_free(async_pf_cache, work);
 }
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
@@ -114,7 +132,6 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 #else
 		if (cancel_work_sync(&work->work)) {
 			mmput(work->mm);
-			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
 			kmem_cache_free(async_pf_cache, work);
 		}
 #endif
@@ -126,7 +143,18 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 			list_first_entry(&vcpu->async_pf.done,
 					 typeof(*work), link);
 		list_del(&work->link);
-		kmem_cache_free(async_pf_cache, work);
+
+		spin_unlock(&vcpu->async_pf.lock);
+
+		/*
+		 * The async #PF is "done", but KVM must wait for the work item
+		 * itself, i.e. async_pf_execute(), to run to completion.  If
+		 * KVM is a module, KVM must ensure *no* code owned by the KVM
+		 * (the module) can be run after the last call to module_put(),
+		 * i.e. after the last reference to the last vCPU's file is put.
+		 */
+		kvm_flush_and_free_async_pf_work(work);
+		spin_lock(&vcpu->async_pf.lock);
 	}
 	spin_unlock(&vcpu->async_pf.lock);
 
@@ -151,7 +179,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 
 		list_del(&work->queue);
 		vcpu->async_pf.queued--;
-		kmem_cache_free(async_pf_cache, work);
+		kvm_flush_and_free_async_pf_work(work);
 	}
 }
 
@@ -186,7 +214,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->arch = *arch;
 	work->mm = current->mm;
 	mmget(work->mm);
-	kvm_get_kvm(work->vcpu->kvm);
 
 	INIT_WORK(&work->work, async_pf_execute);
 
-- 
2.43.0.472.g3155946c3a-goog
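
[Editor's aside, not part of the posted series: the ordering rule patch 1
enforces, reduced to a toy kernel module.  All demo_* names are
hypothetical; the point is that flush_work() in the exit path is what
guarantees no module-owned code can still be running once the module is
gone:

	#include <linux/module.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct demo {
		struct work_struct work;
	};

	static struct demo *d;

	static void demo_fn(struct work_struct *work)
	{
		pr_info("demo work ran\n");
	}

	static int __init demo_init(void)
	{
		d = kzalloc(sizeof(*d), GFP_KERNEL);
		if (!d)
			return -ENOMEM;
		INIT_WORK(&d->work, demo_fn);
		schedule_work(&d->work);
		return 0;
	}

	static void __exit demo_exit(void)
	{
		/*
		 * Wait for demo_fn() to finish before freeing the object
		 * that embeds the work item and before the module text can
		 * be unloaded; skipping this is the use-after-free flavor
		 * of bug the patch above fixes in KVM.
		 */
		flush_work(&d->work);
		kfree(d);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");
]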
From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:31 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-3-seanjc@google.com>
Subject: [PATCH 2/4] KVM: Put mm immediately after async #PF worker completes remote gup()
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Put the async #PF worker's reference to the VM's address space as soon as
the worker is done with the mm.  This will allow deferring acquisition of
the mm reference to the worker itself, without having to track whether or
not getting a reference succeeded.

Note, if the vCPU is still alive, there is no danger of the worker getting
stuck tearing down the host page tables, as userspace also holds a
reference (obviously), i.e. there is no risk of delaying the page-present
notification due to triggering the slow path in mmput().

Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
Reviewed-by: Xu Yilun
---
 virt/kvm/async_pf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 876927a558ad..d5dc50318aa6 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -64,6 +64,7 @@ static void async_pf_execute(struct work_struct *work)
 	get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
 	if (locked)
 		mmap_read_unlock(mm);
+	mmput(mm);
 
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
@@ -85,8 +86,6 @@ static void async_pf_execute(struct work_struct *work)
 	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
 
 	__kvm_vcpu_wake_up(vcpu);
-
-	mmput(mm);
 }
 
 static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work)
-- 
2.43.0.472.g3155946c3a-goog
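
[Editor's aside, not part of the posted series: why the early mmput() is
cheap while the vCPU is alive -- only the *last* put pays for teardown,
mirroring mmput()'s fast path vs. __mmput().  A userspace sketch; struct
space and its fields are made up:

	#include <stdatomic.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct space {
		atomic_int users;
		char *pages;	/* stand-in for the page tables */
	};

	static void put_space(struct space *s)
	{
		if (atomic_fetch_sub(&s->users, 1) == 1) {
			/* last reference: tear everything down */
			free(s->pages);
			free(s);
			puts("slow path: address space destroyed");
		} else {
			puts("fast path: just a decrement");
		}
	}

	int main(void)
	{
		struct space *s = calloc(1, sizeof(*s));

		if (!s)
			return 1;
		s->pages = malloc(4096);
		atomic_init(&s->users, 2);	/* worker + userspace */

		put_space(s);	/* the worker's early put: fast path */
		put_space(s);	/* userspace's final put: slow path */
		return 0;
	}
]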
From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:32 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-4-seanjc@google.com>
Subject: [PATCH 3/4] KVM: Get reference to VM's address space in the async #PF worker
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Get a reference to the target VM's address space in async_pf_execute()
instead of gifting a reference from kvm_setup_async_pf().  Keeping the
address space alive just to service an async #PF is counter-productive,
i.e. if the process is exiting and all vCPUs are dead, then NOT doing
get_user_pages_remote() and freeing the address space asap is desirable.

Handling the mm reference entirely within async_pf_execute() also
simplifies the async #PF flows as a whole, e.g. it's not immediately
obvious when the worker task vs. the vCPU task is responsible for putting
the gifted mm reference.

Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
Reviewed-by: Xu Yilun
---
 include/linux/kvm_host.h |  1 -
 virt/kvm/async_pf.c      | 32 ++++++++++++++++++--------------
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..bbfefd7e612f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -238,7 +238,6 @@ struct kvm_async_pf {
 	struct list_head link;
 	struct list_head queue;
 	struct kvm_vcpu *vcpu;
-	struct mm_struct *mm;
 	gpa_t cr2_or_gpa;
 	unsigned long addr;
 	struct kvm_arch_async_pf arch;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index d5dc50318aa6..c3f4f351a2ae 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -46,8 +46,8 @@ static void async_pf_execute(struct work_struct *work)
 {
 	struct kvm_async_pf *apf =
 		container_of(work, struct kvm_async_pf, work);
-	struct mm_struct *mm = apf->mm;
 	struct kvm_vcpu *vcpu = apf->vcpu;
+	struct mm_struct *mm = vcpu->kvm->mm;
 	unsigned long addr = apf->addr;
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
@@ -56,16 +56,24 @@ static void async_pf_execute(struct work_struct *work)
 	might_sleep();
 
 	/*
-	 * This work is run asynchronously to the task which owns
-	 * mm and might be done in another context, so we must
-	 * access remotely.
+	 * Attempt to pin the VM's host address space, and simply skip gup() if
+	 * acquiring a pin fails, i.e. if the process is exiting.  Note, KVM
+	 * holds a reference to its associated mm_struct until the very end of
+	 * kvm_destroy_vm(), i.e. the struct itself won't be freed before this
+	 * work item is fully processed.
 	 */
-	mmap_read_lock(mm);
-	get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
-	if (locked)
-		mmap_read_unlock(mm);
-	mmput(mm);
+	if (mmget_not_zero(mm)) {
+		mmap_read_lock(mm);
+		get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
+		if (locked)
+			mmap_read_unlock(mm);
+		mmput(mm);
+	}
 
+	/*
+	 * Notify and kick the vCPU even if faulting in the page failed, e.g.
+	 * so that the vCPU can retry the fault synchronously.
+	 */
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
 
@@ -129,10 +137,8 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_KVM_ASYNC_PF_SYNC
 		flush_work(&work->work);
 #else
-		if (cancel_work_sync(&work->work)) {
-			mmput(work->mm);
+		if (cancel_work_sync(&work->work))
 			kmem_cache_free(async_pf_cache, work);
-		}
 #endif
 		spin_lock(&vcpu->async_pf.lock);
 	}
@@ -211,8 +217,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->cr2_or_gpa = cr2_or_gpa;
 	work->addr = hva;
 	work->arch = *arch;
-	work->mm = current->mm;
-	mmget(work->mm);
 
 	INIT_WORK(&work->work, async_pf_execute);
 
-- 
2.43.0.472.g3155946c3a-goog
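
[Editor's aside, not part of the posted series: the heart of patch 3 is
mmget_not_zero(), a "tryget" that pins the mm only if its refcount hasn't
already hit zero, i.e. only if the process isn't already exiting.  A
userspace sketch of that pattern with C11 atomics; tryget() is a made-up
name:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	static bool tryget(atomic_int *refcount)
	{
		int old = atomic_load(refcount);

		/* Succeed only if the count is still non-zero. */
		while (old != 0) {
			if (atomic_compare_exchange_weak(refcount, &old,
							 old + 1))
				return true;
			/* "old" was reloaded on failure; re-check for zero. */
		}
		return false;
	}

	int main(void)
	{
		atomic_int live = 1, dying = 0;

		printf("live:  %s\n", tryget(&live) ? "pinned" : "skipped");
		printf("dying: %s\n", tryget(&dying) ? "pinned" : "skipped");
		return 0;
	}
]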
From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:33 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-5-seanjc@google.com>
Subject: [PATCH 4/4] KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Nullify the async #PF worker's local "apf" pointer immediately after the
point where the structure can be freed by the vCPU.  The existing comment
is helpful, but easy to overlook as there is no associated code.

Update the comment to clarify that the structure can be freed as soon as
the lock is dropped, as "after this point" isn't strictly accurate, nor
does it help understand what prevents the structure from being freed
earlier.

Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
Reviewed-by: Xu Yilun
---
 virt/kvm/async_pf.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index c3f4f351a2ae..1088c6628de9 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -83,13 +83,14 @@ static void async_pf_execute(struct work_struct *work)
 		apf->vcpu = NULL;
 	spin_unlock(&vcpu->async_pf.lock);
 
-	if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first)
-		kvm_arch_async_page_present_queued(vcpu);
-
 	/*
-	 * apf may be freed by kvm_check_async_pf_completion() after
-	 * this point
+	 * The apf struct may be freed by kvm_check_async_pf_completion() as
+	 * soon as the lock is dropped.  Nullify it to prevent improper usage.
 	 */
+	apf = NULL;
+
+	if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first)
+		kvm_arch_async_page_present_queued(vcpu);
 
 	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
 
-- 
2.43.0.472.g3155946c3a-goog
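
[Editor's aside, not part of the posted series: patch 4 is the classic
"poison the pointer once ownership is gone" defense.  A self-contained
userspace sketch; struct item and hand_off() are made up, and hand_off()
frees the object to model the vCPU freeing the apf struct:

	#include <stdio.h>
	#include <stdlib.h>

	struct item {
		int payload;
	};

	/* After this returns, some other context owns (and here, has
	 * already freed) the object. */
	static void hand_off(struct item *it)
	{
		free(it);
	}

	static void finish(struct item *it)
	{
		printf("payload: %d\n", it->payload);	/* last legal use */
		hand_off(it);				/* may free it */
		it = NULL;	/* mirror of "apf = NULL": a later deref
				 * crashes loudly instead of silently
				 * reading freed memory */
	}

	int main(void)
	{
		struct item *it = malloc(sizeof(*it));

		if (!it)
			return 1;
		it->payload = 42;
		finish(it);
		return 0;
	}
]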