From nobody Wed Dec 17 12:57:52 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 579C5CDB47E for ; Wed, 18 Oct 2023 20:46:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232358AbjJRUql (ORCPT ); Wed, 18 Oct 2023 16:46:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232082AbjJRUqf (ORCPT ); Wed, 18 Oct 2023 16:46:35 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0A3679B for ; Wed, 18 Oct 2023 13:46:34 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d9a3a98b34dso10019765276.3 for ; Wed, 18 Oct 2023 13:46:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1697661993; x=1698266793; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=0NC3yqMcGjhIZrhf7725DJ1Gf8NkzsIha09da8LPGz0=; b=SmAaQ7putaQ3J9cjAp0JVUn0Kcd12FMuMq4Sl67CevF4CgsKROoWDvmmhp3kRKmGLQ 3QMt7nCvyNMn7/TFbNaLtmuM7W+jWyrJx6vhg9hOs//AB0sYdKhmYfjuvEDYhfHy+/ju hqMbQ99fiBxOGENlnGXhYSN1A2GlE19mlS2vnmOs0GdbeNYKPaYrs/2LZWJ/ohWYLZ6R RGqPv03UwwsBO6mI5aqAQQYJa8ot68Pq5M12HKzDhajzxXBa2DsN/Bxc0gXxGr7gxWgn eR6jmqBHFekWlaHqzNduMj7a5id4JgQewCu1IAA474hxqINCASw/k/HGprLiHz7ytne+ Rtsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697661993; x=1698266793; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=0NC3yqMcGjhIZrhf7725DJ1Gf8NkzsIha09da8LPGz0=; 
b=ce50dm3SPBGswXVCg5wQu8yAUZsBtp+mTHAbIyj816Q9lT7tcR3wgj/knt3XbdccTX vYWEApDz+MzSCez7N4rcHsoDMene+dZHQipCouSrxprIq64tPvJKGyG/UhdfGB22d56C RDgHbjOZmnadJxsFVKDDMXiKqn87WWKQpeoTsOOcRArirNd0a/FLfq2m2ZDzT+225TOF ocg1jlsMZBQnXwrm9MNcLXt8X0SQLp5UglgjNnusssZby2//ObnhkI4HV/bZsST/YNR+ 5UiSAXpY02bXom48S3LHdfvepVdsEhIMyi4/ByIOaNURZ51IlxsPglCyjJCG15bYwKXI deYA== X-Gm-Message-State: AOJu0YxRu0sE6Fr9B1b48m3vvfpcgRzRAC/5HLCunaEH4YxA3jhZaehT ZvN3ZTH+foN9nvmpB/efbfP42eYTFzQ= X-Google-Smtp-Source: AGHT+IG+aew8QqUeKJhMjafkL3+AfUCVwG2CQNeHZsBiKSQcnY9Z1RYn+qlTT/sPLTnhnGPz3wPUHZUiY+E= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:134a:b0:d9a:58e0:c7c7 with SMTP id g10-20020a056902134a00b00d9a58e0c7c7mr11788ybu.1.1697661993300; Wed, 18 Oct 2023 13:46:33 -0700 (PDT) Reply-To: Sean Christopherson Date: Wed, 18 Oct 2023 13:46:22 -0700 In-Reply-To: <20231018204624.1905300-1-seanjc@google.com> Mime-Version: 1.0 References: <20231018204624.1905300-1-seanjc@google.com> X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog Message-ID: <20231018204624.1905300-2-seanjc@google.com> Subject: [PATCH 1/3] KVM: Set file_operations.owner appropriately for all such structures From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro , David Matlack Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Set .owner for all KVM-owned file types so that the KVM module is pinned until any files with callbacks back into KVM are completely freed. Using "struct kvm" as a proxy for the module, i.e. keeping KVM-the-module alive while there are active VMs, doesn't provide full protection. Userspace can invoke delete_module() the instant the last reference to KVM is put. If KVM itself puts the last reference, e.g. 
via kvm_destroy_vm(), then it's possible for KVM to be preempted and deleted/unloaded before KVM fully exits, e.g. when the task running kvm_destroy_vm() is scheduled back in, it will jump to a code page that is no longer mapped. Note, file types that can call into sub-module code, e.g. kvm-intel.ko or kvm-amd.ko on x86, must use the module pointer passed to kvm_init(), not THIS_MODULE (which points at kvm.ko). KVM assumes that if /dev/kvm is reachable, e.g. VMs are active, then the vendor module is loaded. To reduce the probability of forgetting to set .owner entirely, use THIS_MODULE for stats files where KVM does not call back into vendor code. This reverts commit 70375c2d8fa3fb9b0b59207a9c5df1e2e1205c10, and fixes several other file types that have been buggy since their introduction. Fixes: 70375c2d8fa3 ("Revert "KVM: set owner of cpu and vm file operations"= ") Fixes: 3bcd0662d66f ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs fil= e") Reported-by: Al Viro Link: https://lore.kernel.org/all/20231010003746.GN800259@ZenIV Signed-off-by: Sean Christopherson --- arch/x86/kvm/debugfs.c | 1 + virt/kvm/kvm_main.c | 11 ++++++++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c index ee8c4c3496ed..eea6ea7f14af 100644 --- a/arch/x86/kvm/debugfs.c +++ b/arch/x86/kvm/debugfs.c @@ -182,6 +182,7 @@ static int kvm_mmu_rmaps_stat_release(struct inode *ino= de, struct file *file) } =20 static const struct file_operations mmu_rmaps_stat_fops =3D { + .owner =3D THIS_MODULE, .open =3D kvm_mmu_rmaps_stat_open, .read =3D seq_read, .llseek =3D seq_lseek, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 486800a7024b..1e65a506985f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3887,7 +3887,7 @@ static int kvm_vcpu_release(struct inode *inode, stru= ct file *filp) return 0; } =20 -static const struct file_operations kvm_vcpu_fops =3D { +static struct file_operations kvm_vcpu_fops =3D { 
.release =3D kvm_vcpu_release, .unlocked_ioctl =3D kvm_vcpu_ioctl, .mmap =3D kvm_vcpu_mmap, @@ -4081,6 +4081,7 @@ static int kvm_vcpu_stats_release(struct inode *inode= , struct file *file) } =20 static const struct file_operations kvm_vcpu_stats_fops =3D { + .owner =3D THIS_MODULE, .read =3D kvm_vcpu_stats_read, .release =3D kvm_vcpu_stats_release, .llseek =3D noop_llseek, @@ -4431,7 +4432,7 @@ static int kvm_device_release(struct inode *inode, st= ruct file *filp) return 0; } =20 -static const struct file_operations kvm_device_fops =3D { +static struct file_operations kvm_device_fops =3D { .unlocked_ioctl =3D kvm_device_ioctl, .release =3D kvm_device_release, KVM_COMPAT(kvm_device_ioctl), @@ -4759,6 +4760,7 @@ static int kvm_vm_stats_release(struct inode *inode, = struct file *file) } =20 static const struct file_operations kvm_vm_stats_fops =3D { + .owner =3D THIS_MODULE, .read =3D kvm_vm_stats_read, .release =3D kvm_vm_stats_release, .llseek =3D noop_llseek, @@ -5060,7 +5062,7 @@ static long kvm_vm_compat_ioctl(struct file *filp, } #endif =20 -static const struct file_operations kvm_vm_fops =3D { +static struct file_operations kvm_vm_fops =3D { .release =3D kvm_vm_release, .unlocked_ioctl =3D kvm_vm_ioctl, .llseek =3D noop_llseek, @@ -6095,6 +6097,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align,= struct module *module) goto err_async_pf; =20 kvm_chardev_ops.owner =3D module; + kvm_vm_fops.owner =3D module; + kvm_vcpu_fops.owner =3D module; + kvm_device_fops.owner =3D module; =20 kvm_preempt_ops.sched_in =3D kvm_sched_in; kvm_preempt_ops.sched_out =3D kvm_sched_out; --=20 2.42.0.655.g421f12c284-goog From nobody Wed Dec 17 12:57:52 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E650ECDB483 for ; Wed, 18 Oct 2023 20:46:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) 
by vger.kernel.org via listexpand id S232221AbjJRUqp (ORCPT ); Wed, 18 Oct 2023 16:46:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51042 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232149AbjJRUqh (ORCPT ); Wed, 18 Oct 2023 16:46:37 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2467A4 for ; Wed, 18 Oct 2023 13:46:35 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-5a7af53bde4so117957207b3.0 for ; Wed, 18 Oct 2023 13:46:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1697661995; x=1698266795; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=w+Tq8ZbDqZnNACP1ZGd4eGNbYVTAqWG4yMn2szlYYbk=; b=T98kk+jNDwt/ZiuSK4thE7SNRmSmHKpNUjbVFTUOUn1zp7roA0T/GxYtAivo2NfBRp iSok0N2azlFKcVN4CBiS6e/ST5N5Qfbc2K6a9737JC+ra4WSrvNb74LU20wMm4A8emdm uPM8dmBoUr9ZFKz4C9oiTP9DIR01oYLTL6QGFfMMIwG3CFyGBUYGU3CGT0y6L05aA1ay 0OikiUefL/VdGER5iJKu+15WKaK6Xu6B6Nm+qkr/7lJETspWuQDyLSsoXwdkBcqSkpWA 5jQ52I8t7dt9ka/SHKX7tCKAOfhYpCtpkPL1qrqY/B10qsAAQIAp0wSbn4jKeWl/80nC rcvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697661995; x=1698266795; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=w+Tq8ZbDqZnNACP1ZGd4eGNbYVTAqWG4yMn2szlYYbk=; b=AdbN3jfaN1un/qGJl2BUb9+YdsMmhxQ69mz3Z7Vk1894uxDhUmXydyo82hczVOgp5f BCznRIaCDno+GIIPNQrZps0/zAqzv+FLbl2Cn7uKLZKIVC/BDyv/V/84SrI4//y1zSmR UlHRMHfaH4jXU+O31M2p2DL2zTGO9OC2Y/aY+AWlZhFEJqmykOxaMdJhTkE1tGjVAVhR YWwsPl4w1PJtiIqwPsT1FsQ/d2E+OQdPy9Lax59cBYoixJszF9GIbUiF+0RzlH/n1/wg e9+uu4J/AAudvyJUu5lfL4oSTqk0gh7lzw96KOgGQRiiqsgzv8+R5YuBL1S1r26LT1c/ oO+Q== X-Gm-Message-State: 
AOJu0YwAJC2/4qBmvY+uo/caJ6w+JVjKp7xO8KKvP9v6snNP27X8RWp4 Y4JF/BqNZ9oqS4pYwSNlTGO+NuACurA= X-Google-Smtp-Source: AGHT+IExHueZ801GZYJL9D6wPlfFWNcPEOuWicVaYleJkmF6NjxQ6GVvUBqt4HcS+UjO0Bnn4V8/N0OPneU= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a0d:cb89:0:b0:5a7:7683:995d with SMTP id n131-20020a0dcb89000000b005a77683995dmr11570ywd.5.1697661995065; Wed, 18 Oct 2023 13:46:35 -0700 (PDT) Reply-To: Sean Christopherson Date: Wed, 18 Oct 2023 13:46:23 -0700 In-Reply-To: <20231018204624.1905300-1-seanjc@google.com> Mime-Version: 1.0 References: <20231018204624.1905300-1-seanjc@google.com> X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog Message-ID: <20231018204624.1905300-3-seanjc@google.com> Subject: [PATCH 2/3] KVM: Always flush async #PF workqueue when vCPU is being destroyed From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro , David Matlack Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Always flush the per-vCPU async #PF workqueue when a vCPU is clearing its completion queue, i.e. when a VM and all its vCPUs are being destroyed. KVM must ensure that none of its workqueue callbacks is running when the last reference to the KVM _module_ is put. Gifting a reference to the associated VM prevents the workqueue callback from dereferencing freed vCPU/VM memory, but does not prevent the KVM module from being unloaded before the callback completes. Drop the misguided VM refcount gifting, as calling kvm_put_kvm() from async_pf_execute() will deadlock if kvm_put_kvm() flushes the async #PF workqueue. 
async_pf_execute() can't return until kvm_put_kvm() finishes, and kvm_put_kvm() can't return until async_pf_execute() finishes: WARNING: CPU: 8 PID: 251 at virt/kvm/kvm_main.c:1435 kvm_put_kvm+0x2d/0x32= 0 [kvm] Modules linked in: vhost_net vhost vhost_iotlb tap kvm_intel kvm irqbypass CPU: 8 PID: 251 Comm: kworker/8:1 Tainted: G W 6.6.0-rc1-e= 7af8d17224a-x86/gmem-vm #119 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: events async_pf_execute [kvm] RIP: 0010:kvm_put_kvm+0x2d/0x320 [kvm] Call Trace: async_pf_execute+0x198/0x260 [kvm] process_one_work+0x145/0x2d0 worker_thread+0x27e/0x3a0 kthread+0xba/0xe0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 ---[ end trace 0000000000000000 ]--- INFO: task kworker/8:1:251 blocked for more than 120 seconds. Tainted: G W 6.6.0-rc1-e7af8d17224a-x86/gmem-vm #119 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/8:1 state:D stack:0 pid:251 ppid:2 flags:0x000= 04000 Workqueue: events async_pf_execute [kvm] Call Trace: __schedule+0x33f/0xa40 schedule+0x53/0xc0 schedule_timeout+0x12a/0x140 __wait_for_common+0x8d/0x1d0 __flush_work.isra.0+0x19f/0x2c0 kvm_clear_async_pf_completion_queue+0x129/0x190 [kvm] kvm_arch_destroy_vm+0x78/0x1b0 [kvm] kvm_put_kvm+0x1c1/0x320 [kvm] async_pf_execute+0x198/0x260 [kvm] process_one_work+0x145/0x2d0 worker_thread+0x27e/0x3a0 kthread+0xba/0xe0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 If kvm_clear_async_pf_completion_queue() actually flushes the workqueue, then there's no need to gift async_pf_execute() a reference because all invocations of async_pf_execute() will be forced to complete before the vCPU and its VM are destroyed/freed. And that in turn fixes the module unloading bug as __fput() won't do module_put() on the last vCPU reference until the vCPU has been freed, e.g. if closing the vCPU file also puts the last reference to the KVM module. 
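As a rough illustration of the ordering argued for above, here is a single-threaded userspace toy model in C. It is not kernel code: every toy_* name is hypothetical, "freed" is just a flag, and toy_flush_work() only mimics the guarantee that flush_work() gives (the callback has finished by the time it returns) by running a still-pending item inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU; "freed" marks simulated teardown. */
struct toy_vcpu {
    bool freed;
    int wakeups;
};

/* Hypothetical stand-in for an async #PF work item. */
struct toy_work {
    struct toy_vcpu *vcpu;
    bool pending;            /* queued, callback not yet run */
    bool touched_freed_vcpu; /* set if the callback ran after teardown */
};

/* Models async_pf_execute(): the callback dereferences its vCPU. */
static void toy_async_pf_execute(struct toy_work *work)
{
    if (work->vcpu->freed)
        work->touched_freed_vcpu = true; /* would be a use-after-free */
    else
        work->vcpu->wakeups++;
    work->pending = false;
}

/* Models flush_work(): on return the callback is guaranteed to be done. */
static void toy_flush_work(struct toy_work *work)
{
    if (work->pending)
        toy_async_pf_execute(work);
}

/* Teardown in the order the patch argues for: flush first, then free. */
static void toy_destroy_vcpu_flushing(struct toy_vcpu *vcpu,
                                      struct toy_work *work)
{
    toy_flush_work(work);
    vcpu->freed = true;
}

/* Teardown without a flush: the work item may run after the vCPU is gone. */
static void toy_destroy_vcpu_no_flush(struct toy_vcpu *vcpu,
                                      struct toy_work *work)
{
    vcpu->freed = true;
    toy_async_pf_execute(work); /* the late callback */
}

/* Returns true if the callback only ever saw a live vCPU. */
static bool toy_run(bool flush)
{
    struct toy_vcpu vcpu = { .freed = false, .wakeups = 0 };
    struct toy_work work = { .vcpu = &vcpu, .pending = true,
                             .touched_freed_vcpu = false };

    if (flush)
        toy_destroy_vcpu_flushing(&vcpu, &work);
    else
        toy_destroy_vcpu_no_flush(&vcpu, &work);
    return !work.touched_freed_vcpu;
}
```

In this model toy_run(true) reports that the callback only saw a live vCPU, while toy_run(false) reports the late access. The model says nothing about real concurrency; it only shows why no extra VM reference is needed once teardown waits for the work item.
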
Note, commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed") *tried* to fix the module refcounting issue by having VMs grab a reference to the module, but that only made the bug slightly harder to hit as it gave async_pf_execute() a bit more time to complete before the KVM module could be unloaded. Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped = out") Cc: stable@vger.kernel.org Cc: David Matlack Signed-off-by: Sean Christopherson Reviewed-by: David Matlack --- virt/kvm/async_pf.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index e033c79d528e..7aeb9d1f43b1 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -87,7 +87,6 @@ static void async_pf_execute(struct work_struct *work) __kvm_vcpu_wake_up(vcpu); =20 mmput(mm); - kvm_put_kvm(vcpu->kvm); } =20 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu) @@ -114,7 +113,6 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcp= u *vcpu) #else if (cancel_work_sync(&work->work)) { mmput(work->mm); - kvm_put_kvm(vcpu->kvm); /* =3D=3D work->vcpu->kvm */ kmem_cache_free(async_pf_cache, work); } #endif @@ -126,7 +124,19 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vc= pu *vcpu) list_first_entry(&vcpu->async_pf.done, typeof(*work), link); list_del(&work->link); + + spin_unlock(&vcpu->async_pf.lock); + + /* + * The async #PF is "done", but KVM must wait for the work item + * itself, i.e. async_pf_execute(), to run to completion. If + * KVM is a module, KVM must ensure *no* code owned by the KVM + * (the module) can be run after the last call to module_put(), + * i.e. after the last reference to the last vCPU's file is put. 
+ */ + flush_work(&work->work); kmem_cache_free(async_pf_cache, work); + spin_lock(&vcpu->async_pf.lock); } spin_unlock(&vcpu->async_pf.lock); =20 @@ -186,7 +196,6 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr= 2_or_gpa, work->arch =3D *arch; work->mm =3D current->mm; mmget(work->mm); - kvm_get_kvm(work->vcpu->kvm); =20 INIT_WORK(&work->work, async_pf_execute); =20 --=20 2.42.0.655.g421f12c284-goog From nobody Wed Dec 17 12:57:52 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90CC2CDB47E for ; Wed, 18 Oct 2023 20:46:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232396AbjJRUqt (ORCPT ); Wed, 18 Oct 2023 16:46:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51026 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232262AbjJRUqj (ORCPT ); Wed, 18 Oct 2023 16:46:39 -0400 Received: from mail-oi1-x249.google.com (mail-oi1-x249.google.com [IPv6:2607:f8b0:4864:20::249]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 601DDFA for ; Wed, 18 Oct 2023 13:46:37 -0700 (PDT) Received: by mail-oi1-x249.google.com with SMTP id 5614622812f47-3af6a12b2a8so12070975b6e.1 for ; Wed, 18 Oct 2023 13:46:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1697661996; x=1698266796; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=AxQtUaL497jm8Kvv+JzkijCCW6oqSLgiPCMTJq7eWjY=; b=qHUULQsSo3Ub018clPnEeoGy1KFO9nj/b12yYOWutZRQBYALkbLj/B2C+oF0XuLbcR NLeJjUPJAz0tFXBku3c22stWEspn5R2O4ncJx0EvYY7DolE9X5pB3I+LJWgZcuLKloqt PiT7noNmG96kEbTH0XsmnICSTFzgyoRdU1F3MhtJe4koukZUBoikGDcr1D+WJX1mmb2h 
73HmsFByuSSZnPR6AQQdzUeqpc7x54czTNbpa5Zk9BjWN8+l3z/jO9xXwPI60qqAp8OW +Yqed/Fmzo0XRFg1nG4AVH8v3UiV6MTSiMx+oZ36n8hhz7F7kVQ28Wwvk+5en1RPGiGn IkHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697661996; x=1698266796; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=AxQtUaL497jm8Kvv+JzkijCCW6oqSLgiPCMTJq7eWjY=; b=AMUD6a3OWq71IRP4ruauqs7Ta93yENCBa5F9XLCZ5qxAVfOEFmoTap2wI4MUJUGJd0 g+p9WJGN5I6FgF+pRirOrgVrnrRmaaAgIDYGB7lpwelzvFmNId3aunPUi/Rk5lZVL3Db ulz+/geyVMi8gsK9C8wWYQoBpL6jCBWwxCA9Xz/1U3JsqholUF270zKHj38bAL6njz3A RZ0yUZ5Ye6pvmxCBV8sB7FGpaR0/Li5+9cJmGcvzmoAvGL44Ths4SXeQ8NuBitDXEEb/ fFtrclHatyqLRbzKWCiO2JbKiBaHI5g2K8AswNaViykYtWccPJWcoAjrvrqFCjo/XW+0 JdOg== X-Gm-Message-State: AOJu0YxJ7rCjDG2yezmqOa5qDjmXGJA5LjKyuSlr6s7RC4o5Xe0ucfRB O+jODfvskNLnmfjN1mDHS8JDXFp5hII= X-Google-Smtp-Source: AGHT+IE+HSvc8NfeaZoWmMmsZSX78eAKPODu7osocTukqqL8LuYcEH25gkg/9rvnxgtnl9ETHnlyRSBM6Fo= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6808:1884:b0:3a9:d030:5023 with SMTP id bi4-20020a056808188400b003a9d0305023mr94020oib.3.1697661996735; Wed, 18 Oct 2023 13:46:36 -0700 (PDT) Reply-To: Sean Christopherson Date: Wed, 18 Oct 2023 13:46:24 -0700 In-Reply-To: <20231018204624.1905300-1-seanjc@google.com> Mime-Version: 1.0 References: <20231018204624.1905300-1-seanjc@google.com> X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog Message-ID: <20231018204624.1905300-4-seanjc@google.com> Subject: [PATCH 3/3] Revert "KVM: Prevent module exit until all VMs are freed" From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro , David Matlack Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Revert KVM's 
misguided attempt to "fix" a use-after-module-unload bug that was actually due to failure to flush a workqueue, not a lack of module refcounting. Pinning the KVM module until kvm_destroy_vm() doesn't prevent use-after-free due to the module being unloaded, as userspace can invoke delete_module() the instant the last reference to KVM is put, i.e. can cause all KVM code to be unmapped while KVM is actively executing said code. Generally speaking, the many instances of module_put(THIS_MODULE) notwithstanding, outside of a few special paths, a module can never safely put the last reference to itself without creating deadlock, i.e. something external to the module *must* put the last reference. In other words, having VMs grab a reference to the KVM module is futile, pointless, and as evidenced by the now-reverted commit 70375c2d8fa3 ("Revert "KVM: set owner of cpu and vm file operations""), actively dangerous. This reverts commit 405294f29faee5de8c10cb9d4a90e229c2835279 and commit 5f6de5cbebee925a612856fce6f9182bb3eee0db. Fixes: 405294f29fae ("KVM: Unconditionally get a ref to /dev/kvm module whe= n creating a VM") Fixes: 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed") Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1e65a506985f..3b1b9e8dd70c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -115,8 +115,6 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir); =20 static const struct file_operations stat_fops_per_vm; =20 -static struct file_operations kvm_chardev_ops; - static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, unsigned long arg); #ifdef CONFIG_KVM_COMPAT @@ -1157,9 +1155,6 @@ static struct kvm *kvm_create_vm(unsigned long type, = const char *fdname) if (!kvm) return ERR_PTR(-ENOMEM); =20 - /* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). 
*/ - __module_get(kvm_chardev_ops.owner); - KVM_MMU_LOCK_INIT(kvm); mmgrab(current->mm); kvm->mm =3D current->mm; @@ -1279,7 +1274,6 @@ static struct kvm *kvm_create_vm(unsigned long type, = const char *fdname) out_err_no_srcu: kvm_arch_free_vm(kvm); mmdrop(current->mm); - module_put(kvm_chardev_ops.owner); return ERR_PTR(r); } =20 @@ -1348,7 +1342,6 @@ static void kvm_destroy_vm(struct kvm *kvm) preempt_notifier_dec(); hardware_disable_all(); mmdrop(mm); - module_put(kvm_chardev_ops.owner); } =20 void kvm_get_kvm(struct kvm *kvm) --=20 2.42.0.655.g421f12c284-goog
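The refcounting rule the series converges on (the file's owner pins the module, and code external to the module drops the last reference only after ->release() has returned) can be sketched as a userspace toy model in C. This is an illustration under stated assumptions, not kernel code: all toy_* names are hypothetical, and the racing unload is simulated by calling toy_delete_module() from inside the release callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable module. */
struct toy_module {
    int refcount;
    bool executing;                /* module code is currently running */
    bool unmapped_while_executing; /* unload succeeded mid-callback */
};

struct toy_file;

/* Hypothetical stand-in for struct file_operations. */
struct toy_file_operations {
    struct toy_module *owner;      /* NULL models a missing .owner */
    void (*release)(struct toy_file *file);
};

struct toy_file {
    const struct toy_file_operations *f_op;
};

/* The module whose code toy_release() belongs to. */
static struct toy_module toy_kvm;

/* Models delete_module(): succeeds only when nobody holds a reference. */
static bool toy_delete_module(struct toy_module *mod)
{
    if (mod->refcount != 0)
        return false;
    if (mod->executing)
        mod->unmapped_while_executing = true;
    return true;
}

/* Models a ->release() callback implemented inside the module. */
static void toy_release(struct toy_file *file)
{
    (void)file;
    toy_kvm.executing = true;
    toy_delete_module(&toy_kvm);   /* simulated racing unload attempt */
    toy_kvm.executing = false;
}

/* Models open: the caller takes a module reference through .owner. */
static void toy_open(struct toy_file *file,
                     const struct toy_file_operations *fops)
{
    file->f_op = fops;
    if (fops->owner)
        fops->owner->refcount++;
}

/* Models the final fput: run ->release() first, drop the reference last. */
static void toy_fput(struct toy_file *file)
{
    file->f_op->release(file);
    if (file->f_op->owner)
        file->f_op->owner->refcount--;
}

/* Returns true if the module was never unloaded while its code ran. */
static bool toy_close_is_safe(bool set_owner)
{
    toy_kvm = (struct toy_module){ 0 };

    struct toy_file_operations fops = {
        .owner = set_owner ? &toy_kvm : NULL,
        .release = toy_release,
    };
    struct toy_file file;

    toy_open(&file, &fops);
    toy_fput(&file);
    return !toy_kvm.unmapped_while_executing;
}
```

With .owner set, the unload attempt inside the callback fails because the caller of toy_fput() still holds the reference; with .owner left NULL, the unload succeeds while module code is still executing.
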