From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:30 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-2-seanjc@google.com>
Subject: [PATCH 1/4] KVM: Always flush async #PF workqueue when vCPU is being destroyed
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its
completion queue, e.g. when a VM and all its vCPUs are being destroyed.
KVM must ensure that none of its workqueue callbacks is running when the
last reference to the KVM _module_ is put.  Gifting a reference to the
associated VM prevents the workqueue callback from dereferencing freed
vCPU/VM memory, but does not prevent the KVM module from being unloaded
before the callback completes.

Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from
async_pf_execute() if kvm_put_kvm() flushes the async #PF workqueue will
result in deadlock.  async_pf_execute() can't return until kvm_put_kvm()
finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes:

 WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x320 [kvm]
 Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass
 CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
 Workqueue: events async_pf_execute [kvm]
 RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm]
 Call Trace:
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20
 ---[ end trace 0000000000000000 ]---
 INFO: task kworker/8:1:251 blocked for more than 120 seconds.
       Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:kworker/8:1     state:D stack:0     pid:251   ppid:2      flags:0x00004000
 Workqueue: events async_pf_execute [kvm]
 Call Trace:
  __schedule+0x33f/0xa40
  schedule+0x53/0xc0
  schedule_timeout+0x12a/0x140
  __wait_for_common+0x8d/0x1d0
  __flush_work.isra.0+0x19f/0x2c0
  kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm]
  kvm_arch_destroy_vm+0x78/0x1b0 [kvm]
  kvm_put_kvm+0x1c1/0x320 [kvm]
  async_pf_execute+0x198/0x260 [kvm]
  process_one_work+0x145/0x2d0
  worker_thread+0x27e/0x3a0
  kthread+0xba/0xe0
  ret_from_fork+0x2d/0x50
  ret_from_fork_asm+0x11/0x20

If kvm_clear_async_pf_completion_queue() actually flushes the workqueue,
then there's no need to gift async_pf_execute() a reference because all
invocations of async_pf_execute() will be forced to complete before the
vCPU and its VM are destroyed/freed.  And that in turn fixes the module
unloading bug as __fput() won't do module_put() on the last vCPU reference
until the vCPU has been freed, e.g. if closing the vCPU file also puts the
last reference to the KVM module.

Note that kvm_check_async_pf_completion() may also take the work item off
the completion queue and so also needs to flush the work queue, as the
work will not be seen by kvm_clear_async_pf_completion_queue().  Waiting
on the workqueue could theoretically delay a vCPU due to waiting for the
work to complete, but that's a very, very small chance, and likely a very
small delay.  kvm_arch_async_page_present_queued() unconditionally makes a
new request, i.e. will effectively delay entering the guest, so the
remaining work is really just:

	trace_kvm_async_pf_completed(addr, cr2_or_gpa);

	__kvm_vcpu_wake_up(vcpu);

	mmput(mm);

and mmput() can't drop the last reference to the page tables if the vCPU
is still alive, i.e. the vCPU won't get stuck tearing down page tables.

Add a helper to do the flushing, specifically to deal with "wakeup all"
work items, as they aren't actually work items, i.e. are never placed in
a workqueue.  Trying to flush a bogus workqueue entry rightly makes
__flush_work() complain (kudos to whoever added that sanity check).
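
[Editor's aside, not part of the posted commit message: the hung task
above is a work item waiting on its own completion.  A minimal userspace
analogy in C using POSIX threads -- the self-join below stands in for
async_pf_execute() putting the last VM reference and thereby flushing
itself; glibc detects this trivial cycle and returns EDEADLK, whereas the
kernel workqueue simply hangs:

	#include <pthread.h>
	#include <stdio.h>
	#include <string.h>

	static void *worker(void *arg)
	{
		/* Final act: wait for our own completion, mirroring
		 * async_pf_execute() -> kvm_put_kvm() -> flush of self. */
		int ret = pthread_join(pthread_self(), NULL);

		printf("self-join: %s\n", strerror(ret));
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		if (pthread_create(&t, NULL, worker, NULL))
			return 1;
		return pthread_join(t, NULL);
	}
]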
Note, commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are
freed") *tried* to fix the module refcounting issue by having VMs grab a
reference to the module, but that only made the bug slightly harder to hit
as it gave async_pf_execute() a bit more time to complete before the KVM
module could be unloaded.

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Cc: stable@vger.kernel.org
Cc: David Matlack
Cc: Xu Yilun
Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
---
 virt/kvm/async_pf.c | 37 ++++++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index e033c79d528e..876927a558ad 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -87,7 +87,25 @@ static void async_pf_execute(struct work_struct *work)
 	__kvm_vcpu_wake_up(vcpu);
 
 	mmput(mm);
-	kvm_put_kvm(vcpu->kvm);
+}
+
+static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work)
+{
+	/*
+	 * The async #PF is "done", but KVM must wait for the work item itself,
+	 * i.e. async_pf_execute(), to run to completion.  If KVM is a module,
+	 * KVM must ensure *no* code owned by the KVM (the module) can be run
+	 * after the last call to module_put(), i.e. after the last reference
+	 * to the last vCPU's file is put.
+	 *
+	 * Wake all events skip the queue and go straight done, i.e. don't
+	 * need to be flushed (but sanity check that the work wasn't queued).
+	 */
+	if (work->wakeup_all)
+		WARN_ON_ONCE(work->work.func);
+	else
+		flush_work(&work->work);
+	kmem_cache_free(async_pf_cache, work);
 }
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
@@ -114,7 +132,6 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 #else
 		if (cancel_work_sync(&work->work)) {
 			mmput(work->mm);
-			kvm_put_kvm(vcpu->kvm); /* == work->vcpu->kvm */
 			kmem_cache_free(async_pf_cache, work);
 		}
 #endif
@@ -126,7 +143,18 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 			list_first_entry(&vcpu->async_pf.done,
 					 typeof(*work), link);
 		list_del(&work->link);
-		kmem_cache_free(async_pf_cache, work);
+
+		spin_unlock(&vcpu->async_pf.lock);
+
+		/*
+		 * The async #PF is "done", but KVM must wait for the work item
+		 * itself, i.e. async_pf_execute(), to run to completion.  If
+		 * KVM is a module, KVM must ensure *no* code owned by the KVM
+		 * (the module) can be run after the last call to module_put(),
+		 * i.e. after the last reference to the last vCPU's file is put.
+		 */
+		kvm_flush_and_free_async_pf_work(work);
+		spin_lock(&vcpu->async_pf.lock);
 	}
 	spin_unlock(&vcpu->async_pf.lock);
 
@@ -151,7 +179,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 
 		list_del(&work->queue);
 		vcpu->async_pf.queued--;
-		kmem_cache_free(async_pf_cache, work);
+		kvm_flush_and_free_async_pf_work(work);
 	}
 }
 
@@ -186,7 +214,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->arch = *arch;
 	work->mm = current->mm;
 	mmget(work->mm);
-	kvm_get_kvm(work->vcpu->kvm);
 
 	INIT_WORK(&work->work, async_pf_execute);
 
-- 
2.43.0.472.g3155946c3a-goog
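
[Editor's aside, not part of the posted series: the ordering rule patch 1
enforces, reduced to a toy kernel module.  All demo_* names are
hypothetical; the point is that flush_work() in the exit path is what
guarantees no module-owned code can still be running once the module is
gone:

	#include <linux/module.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct demo {
		struct work_struct work;
	};

	static struct demo *d;

	static void demo_fn(struct work_struct *work)
	{
		pr_info("demo work ran\n");
	}

	static int __init demo_init(void)
	{
		d = kzalloc(sizeof(*d), GFP_KERNEL);
		if (!d)
			return -ENOMEM;
		INIT_WORK(&d->work, demo_fn);
		schedule_work(&d->work);
		return 0;
	}

	static void __exit demo_exit(void)
	{
		/*
		 * Wait for demo_fn() to finish before freeing the object
		 * that embeds the work item and before the module text can
		 * be unloaded; skipping this is the use-after-free flavor
		 * of bug the patch above fixes in KVM.
		 */
		flush_work(&d->work);
		kfree(d);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");
]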
From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:31 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-3-seanjc@google.com>
Subject: [PATCH 2/4] KVM: Put mm immediately after async #PF worker completes remote gup()
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Put the async #PF worker's reference to the VM's address space as soon as
the worker is done with the mm.  This will allow deferring acquisition of
the mm reference to the worker itself, without having to track whether or
not getting a reference succeeded.

Note, if the vCPU is still alive, there is no danger of the worker getting
stuck tearing down the host page tables, as userspace also holds a
reference (obviously), i.e. there is no risk of delaying the page-present
notification due to triggering the slow path in mmput().

Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
Reviewed-by: Xu Yilun
---
 virt/kvm/async_pf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 876927a558ad..d5dc50318aa6 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -64,6 +64,7 @@ static void async_pf_execute(struct work_struct *work)
 	get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
 	if (locked)
 		mmap_read_unlock(mm);
+	mmput(mm);
 
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
@@ -85,8 +86,6 @@ static void async_pf_execute(struct work_struct *work)
 	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
 
 	__kvm_vcpu_wake_up(vcpu);
-
-	mmput(mm);
 }
 
 static void kvm_flush_and_free_async_pf_work(struct kvm_async_pf *work)
-- 
2.43.0.472.g3155946c3a-goog
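
[Editor's aside, not part of the posted series: why the early mmput() is
cheap while the vCPU is alive -- only the *last* put pays for teardown,
mirroring mmput()'s fast path vs. __mmput().  A userspace sketch; struct
space and its fields are made up:

	#include <stdatomic.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct space {
		atomic_int users;
		char *pages;	/* stand-in for the page tables */
	};

	static void put_space(struct space *s)
	{
		if (atomic_fetch_sub(&s->users, 1) == 1) {
			/* last reference: tear everything down */
			free(s->pages);
			free(s);
			puts("slow path: address space destroyed");
		} else {
			puts("fast path: just a decrement");
		}
	}

	int main(void)
	{
		struct space *s = calloc(1, sizeof(*s));

		if (!s)
			return 1;
		s->pages = malloc(4096);
		atomic_init(&s->users, 2);	/* worker + userspace */

		put_space(s);	/* the worker's early put: fast path */
		put_space(s);	/* userspace's final put: slow path */
		return 0;
	}
]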
From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:32 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-4-seanjc@google.com>
Subject: [PATCH 3/4] KVM: Get reference to VM's address space in the async #PF worker
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Get a reference to the target VM's address space in async_pf_execute()
instead of gifting a reference from kvm_setup_async_pf().  Keeping the
address space alive just to service an async #PF is counter-productive,
i.e. if the process is exiting and all vCPUs are dead, then NOT doing
get_user_pages_remote() and freeing the address space asap is desirable.

Handling the mm reference entirely within async_pf_execute() also
simplifies the async #PF flows as a whole, e.g. it's not immediately
obvious when the worker task vs. the vCPU task is responsible for putting
the gifted mm reference.

Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
Reviewed-by: Xu Yilun
---
 include/linux/kvm_host.h |  1 -
 virt/kvm/async_pf.c      | 32 ++++++++++++++++++--------------
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25b09b3..bbfefd7e612f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -238,7 +238,6 @@ struct kvm_async_pf {
 	struct list_head link;
 	struct list_head queue;
 	struct kvm_vcpu *vcpu;
-	struct mm_struct *mm;
 	gpa_t cr2_or_gpa;
 	unsigned long addr;
 	struct kvm_arch_async_pf arch;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index d5dc50318aa6..c3f4f351a2ae 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -46,8 +46,8 @@ static void async_pf_execute(struct work_struct *work)
 {
 	struct kvm_async_pf *apf =
 		container_of(work, struct kvm_async_pf, work);
-	struct mm_struct *mm = apf->mm;
 	struct kvm_vcpu *vcpu = apf->vcpu;
+	struct mm_struct *mm = vcpu->kvm->mm;
 	unsigned long addr = apf->addr;
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
@@ -56,16 +56,24 @@ static void async_pf_execute(struct work_struct *work)
 	might_sleep();
 
 	/*
-	 * This work is run asynchronously to the task which owns
-	 * mm and might be done in another context, so we must
-	 * access remotely.
+	 * Attempt to pin the VM's host address space, and simply skip gup() if
+	 * acquiring a pin fails, i.e. if the process is exiting.  Note, KVM
+	 * holds a reference to its associated mm_struct until the very end of
+	 * kvm_destroy_vm(), i.e. the struct itself won't be freed before this
+	 * work item is fully processed.
 	 */
-	mmap_read_lock(mm);
-	get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
-	if (locked)
-		mmap_read_unlock(mm);
-	mmput(mm);
+	if (mmget_not_zero(mm)) {
+		mmap_read_lock(mm);
+		get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
+		if (locked)
+			mmap_read_unlock(mm);
+		mmput(mm);
+	}
 
+	/*
+	 * Notify and kick the vCPU even if faulting in the page failed, e.g.
+	 * so that the vCPU can retry the fault synchronously.
+	 */
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
 
@@ -129,10 +137,8 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_KVM_ASYNC_PF_SYNC
 		flush_work(&work->work);
 #else
-		if (cancel_work_sync(&work->work)) {
-			mmput(work->mm);
+		if (cancel_work_sync(&work->work))
 			kmem_cache_free(async_pf_cache, work);
-		}
 #endif
 		spin_lock(&vcpu->async_pf.lock);
 	}
@@ -211,8 +217,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	work->cr2_or_gpa = cr2_or_gpa;
 	work->addr = hva;
 	work->arch = *arch;
-	work->mm = current->mm;
-	mmget(work->mm);
 
 	INIT_WORK(&work->work, async_pf_execute);
 
-- 
2.43.0.472.g3155946c3a-goog
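
[Editor's aside, not part of the posted series: the heart of patch 3 is
mmget_not_zero(), a "tryget" that pins the mm only if its refcount hasn't
already hit zero, i.e. only if the process isn't already exiting.  A
userspace sketch of that pattern with C11 atomics; tryget() is a made-up
name:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	static bool tryget(atomic_int *refcount)
	{
		int old = atomic_load(refcount);

		/* Succeed only if the count is still non-zero. */
		while (old != 0) {
			if (atomic_compare_exchange_weak(refcount, &old,
							 old + 1))
				return true;
			/* "old" was reloaded on failure; re-check for zero. */
		}
		return false;
	}

	int main(void)
	{
		atomic_int live = 1, dying = 0;

		printf("live:  %s\n", tryget(&live) ? "pinned" : "skipped");
		printf("dying: %s\n", tryget(&dying) ? "pinned" : "skipped");
		return 0;
	}
]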
From nobody Fri Dec 26 03:31:14 2025
Reply-To: Sean Christopherson
Date: Tue, 9 Jan 2024 17:15:33 -0800
In-Reply-To: <20240110011533.503302-1-seanjc@google.com>
References: <20240110011533.503302-1-seanjc@google.com>
Message-ID: <20240110011533.503302-5-seanjc@google.com>
Subject: [PATCH 4/4] KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed
From: Sean Christopherson
To: Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack,
 Xu Yilun, Sean Christopherson

Nullify the async #PF worker's local "apf" pointer immediately after the
point where the structure can be freed by the vCPU.  The existing comment
is helpful, but easy to overlook as there is no associated code.

Update the comment to clarify that the structure can be freed as soon as
the lock is dropped, as "after this point" isn't strictly accurate, nor
does it help understand what prevents the structure from being freed
earlier.

Signed-off-by: Sean Christopherson
Reviewed-by: Vitaly Kuznetsov
Reviewed-by: Xu Yilun
---
 virt/kvm/async_pf.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index c3f4f351a2ae..1088c6628de9 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -83,13 +83,14 @@ static void async_pf_execute(struct work_struct *work)
 		apf->vcpu = NULL;
 	spin_unlock(&vcpu->async_pf.lock);
 
-	if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first)
-		kvm_arch_async_page_present_queued(vcpu);
-
 	/*
-	 * apf may be freed by kvm_check_async_pf_completion() after
-	 * this point
+	 * The apf struct may be freed by kvm_check_async_pf_completion() as
+	 * soon as the lock is dropped.  Nullify it to prevent improper usage.
 	 */
+	apf = NULL;
+
+	if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first)
+		kvm_arch_async_page_present_queued(vcpu);
 
 	trace_kvm_async_pf_completed(addr, cr2_or_gpa);
 
-- 
2.43.0.472.g3155946c3a-goog
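
[Editor's aside, not part of the posted series: patch 4 is the classic
"poison the pointer once ownership is gone" defense.  A self-contained
userspace sketch; struct item and hand_off() are made up, and hand_off()
frees the object to model the vCPU freeing the apf struct:

	#include <stdio.h>
	#include <stdlib.h>

	struct item {
		int payload;
	};

	/* After this returns, some other context owns (and here, has
	 * already freed) the object. */
	static void hand_off(struct item *it)
	{
		free(it);
	}

	static void finish(struct item *it)
	{
		printf("payload: %d\n", it->payload);	/* last legal use */
		hand_off(it);				/* may free it */
		it = NULL;	/* mirror of "apf = NULL": a later deref
				 * crashes loudly instead of silently
				 * reading freed memory */
	}

	int main(void)
	{
		struct item *it = malloc(sizeof(*it));

		if (!it)
			return 1;
		it->payload = 42;
		finish(it);
		return 0;
	}
]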