From nobody Tue Dec 16 22:29:49 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F758239E9E for ; Sat, 31 May 2025 09:58:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748685503; cv=none; b=jwTuOT1ZWYYsm5zzEO632brAoE7LEdU/C9cfxSHB7ETdD7lT62QAlfFrl6bnOAzEt6lbWn75J+F3zY7Wt0L+4CZUDdSaAVPPSbeTBuhwYm6mRDnyiSf6hgXwP2UFPJjOZb8AerssFyi7YjrbgAb8r+YHHOFyvg5MHE/vgweUcQE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748685503; c=relaxed/simple; bh=mw/Qer8CeHGw4edMlvresJ7xkZve9Gz7YMI12DPZe2E=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GIMTd8VU5P6QkonnR+4jvQlcP/LC+rCAjQ+sh3Uz3LR4x9URXxhsMM3q0YFsABG1Ff+ikUS1rGeIcGaGsrY2XGe2czx3C6KtFH+oC58tSISjKHBgccBmjfxVXjuUhg6uUbJ9FzF3sn3WQ/KrnYgTYx8ezmGa4Qz7NVRorx2yz+Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=QFmPgYvo; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="QFmPgYvo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1748685499; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=nrPgdD64vKDtkpdzZC+la9VolCi4JsMi5slKhfemFX0=; b=QFmPgYvoMq6pMASuIBiAZtX5xMiHWfjWWglnXoXib9vVAm/jd3fR0efSiEGNN2Vil3HV/N fRJq5YDwQHztuQ57paihT7XDSLgRFvgwIvAsECUhpvnkZ5coy8F5NtRGHWiWnfu1Mlq8Jo kzLoTFGMAM9/9MamYWX/OTTSa17N99M= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-244-5PmC2kcHMq6FaqEdTUVv3g-1; Sat, 31 May 2025 05:58:16 -0400 X-MC-Unique: 5PmC2kcHMq6FaqEdTUVv3g-1 X-Mimecast-MFC-AGG-ID: 5PmC2kcHMq6FaqEdTUVv3g_1748685495 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 243C01956086; Sat, 31 May 2025 09:58:15 +0000 (UTC) Received: from server.redhat.com (unknown [10.72.112.30]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 03631180049D; Sat, 31 May 2025 09:58:10 +0000 (UTC) From: Cindy Lu To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com, michael.christie@oracle.com, sgarzare@redhat.com, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Subject: [PATCH RESEND v10 1/3] vhost: Add a new modparam to allow userspace select kthread Date: Sat, 31 May 2025 17:57:26 +0800 Message-ID: <20250531095800.160043-2-lulu@redhat.com> In-Reply-To: <20250531095800.160043-1-lulu@redhat.com> References: <20250531095800.160043-1-lulu@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" vhost now uses vhost_task and creates its workers as children of the owner thread. While this aligns with containerization principles, it confuses some legacy userspace applications, so we are reintroducing kthread API support. Add a new module parameter that lets userspace choose between kthread and task behavior. By default, this parameter is set to true (task mode), so this patch leaves the default behavior unchanged. Signed-off-by: Cindy Lu Tested-by: Lei Yang --- drivers/vhost/vhost.c | 5 +++++ drivers/vhost/vhost.h | 10 ++++++++++ 2 files changed, 15 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 3a5ebb973dba..240ba78b1e3f 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -41,6 +41,10 @@ static int max_iotlb_entries =3D 2048; module_param(max_iotlb_entries, int, 0444); MODULE_PARM_DESC(max_iotlb_entries, "Maximum number of iotlb entries. (default: 2048)"); +bool inherit_owner_default =3D true; +module_param(inherit_owner_default, bool, 0444); +MODULE_PARM_DESC(inherit_owner_default, + "Set task mode as the default (default: Y)"); =20 enum { VHOST_MEMORY_F_LOG =3D 0x1, @@ -552,6 +556,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->byte_weight =3D byte_weight; dev->use_worker =3D use_worker; dev->msg_handler =3D msg_handler; + dev->inherit_owner =3D inherit_owner_default; init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); INIT_LIST_HEAD(&dev->pending_list); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index bb75a292d50c..c1ff4a92b925 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -176,6 +176,16 @@ struct vhost_dev { int byte_weight; struct xarray worker_xa; bool use_worker; + /* + * If inherit_owner is true we use vhost_tasks to create + * the worker so all settings/limits like cgroups, NPROC, + * scheduler, etc are inherited from the owner. 
If false, + * we use kthreads and only attach to the same cgroups + * as the owner for compat with older kernels. + * here we use true as default value. + * The default value is set by modparam inherit_owner_default + */ + bool inherit_owner; int (*msg_handler)(struct vhost_dev *dev, u32 asid, struct vhost_iotlb_msg *msg); }; --=20 2.45.0 From nobody Tue Dec 16 22:29:49 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5562223A987 for ; Sat, 31 May 2025 09:58:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748685506; cv=none; b=nJEWMaOQ3qJN77EnnLlDzT9EKMxMUJgm+Gz5o1vKA9LkndnxBBeRaKr/uv864USiktBouyWfPQijeddEQ0MYrQDe+eHOh2lTFQeEhmaqnReiD3bka1+g4wnZ91UQm0E22Sx/FNVwYcyXHiDMzcqWEIpdi1AMXPFJsQuQzFRQ4qM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748685506; c=relaxed/simple; bh=CcOpk8Xyc4VMwwKi2VpRLuGbaZey2YlSpo+8aANMhkg=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=p3MPkbbY9fFaouL5bxIkiHnA0eLGUmuHzvPkAoS8z6oCEj/7WIKWKfNgd1VR71qF5RcVX23fpeds0RRUq1sSBVMUsnT+tObxLFWJYe3+wBZ/7934OjKqtQg34I4etUFC1Q2jIwEs5Cv2xsQqC+mWh1YGZsnU7MpWIk9YpySt0Gg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ilhO1XPr; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass 
(1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ilhO1XPr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1748685502; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/eIhXFNCsuT8fyAjm28qjbco9E563cRybykc6t/pZX4=; b=ilhO1XPrPkKkirBijvkIjMbSr148MgLS3J6fAaEqT7wyh9triHgx5n9KOZshuyTpD/LYHV 3kqhfAdqX6JVFxdWTD9k7HGatoMD8eV7pIiaihQ8CMpDbE6Hpakgu6Lzrj85bcTgC+klbG TWUEX37dGR+L/21nrJFx7GtFbRLLo5I= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-246-0PY6_7P0MN-afYLu4e4gTA-1; Sat, 31 May 2025 05:58:21 -0400 X-MC-Unique: 0PY6_7P0MN-afYLu4e4gTA-1 X-Mimecast-MFC-AGG-ID: 0PY6_7P0MN-afYLu4e4gTA_1748685500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D317C180034E; Sat, 31 May 2025 09:58:19 +0000 (UTC) Received: from server.redhat.com (unknown [10.72.112.30]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id DD067180049D; Sat, 31 May 2025 09:58:15 +0000 (UTC) From: Cindy Lu To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com, michael.christie@oracle.com, sgarzare@redhat.com, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Subject: [PATCH RESEND v10 2/3] vhost: Reintroduce kthread mode support in vhost Date: Sat, 31 May 2025 
17:57:27 +0800 Message-ID: <20250531095800.160043-3-lulu@redhat.com> In-Reply-To: <20250531095800.160043-1-lulu@redhat.com> References: <20250531095800.160043-1-lulu@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 This patch reintroduces kthread mode support in vhost. It also introduces struct vhost_worker_ops to abstract worker create/stop/wakeup operations. * Bring back the original vhost_worker() implementation and rename it to vhost_run_work_kthread_list(). * Add cgroup support for the kthread. * Introduce struct vhost_worker_ops: - Encapsulates create / stop / wake=E2=80=91up callbacks. - vhost_worker_create() selects the proper ops according to inherit_owner. This partially reverts or improves upon: commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") commit 1cdaafa1b8b4 ("vhost: replace single worker pointer with xarray") Signed-off-by: Cindy Lu Tested-by: Lei Yang --- drivers/vhost/vhost.c | 188 ++++++++++++++++++++++++++++++++++++++---- drivers/vhost/vhost.h | 12 +++ 2 files changed, 182 insertions(+), 18 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 240ba78b1e3f..2d2909be1bb2 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -246,7 +247,7 @@ static void vhost_worker_queue(struct vhost_worker *wor= ker, * test_and_set_bit() implies a memory barrier. 
*/ llist_add(&work->node, &worker->work_list); - vhost_task_wake(worker->vtsk); + worker->ops->wakeup(worker); } } =20 @@ -392,6 +393,44 @@ static void vhost_vq_reset(struct vhost_dev *dev, __vhost_vq_meta_reset(vq); } =20 +static int vhost_run_work_kthread_list(void *data) +{ + struct vhost_worker *worker =3D data; + struct vhost_work *work, *work_next; + struct vhost_dev *dev =3D worker->dev; + struct llist_node *node; + + kthread_use_mm(dev->mm); + + for (;;) { + /* mb paired w/ kthread_stop */ + set_current_state(TASK_INTERRUPTIBLE); + + if (kthread_should_stop()) { + __set_current_state(TASK_RUNNING); + break; + } + node =3D llist_del_all(&worker->work_list); + if (!node) + schedule(); + + node =3D llist_reverse_order(node); + /* make sure flag is seen after deletion */ + smp_wmb(); + llist_for_each_entry_safe(work, work_next, node, node) { + clear_bit(VHOST_WORK_QUEUED, &work->flags); + __set_current_state(TASK_RUNNING); + kcov_remote_start_common(worker->kcov_handle); + work->fn(work); + kcov_remote_stop(); + cond_resched(); + } + } + kthread_unuse_mm(dev->mm); + + return 0; +} + static bool vhost_run_work_list(void *data) { struct vhost_worker *worker =3D data; @@ -586,6 +625,46 @@ long vhost_dev_check_owner(struct vhost_dev *dev) } EXPORT_SYMBOL_GPL(vhost_dev_check_owner); =20 +struct vhost_attach_cgroups_struct { + struct vhost_work work; + struct task_struct *owner; + int ret; +}; + +static void vhost_attach_cgroups_work(struct vhost_work *work) +{ + struct vhost_attach_cgroups_struct *s; + + s =3D container_of(work, struct vhost_attach_cgroups_struct, work); + s->ret =3D cgroup_attach_task_all(s->owner, current); +} + +static int vhost_attach_task_to_cgroups(struct vhost_worker *worker) +{ + struct vhost_attach_cgroups_struct attach; + int saved_cnt; + + attach.owner =3D current; + + vhost_work_init(&attach.work, vhost_attach_cgroups_work); + vhost_worker_queue(worker, &attach.work); + + mutex_lock(&worker->mutex); + + /* + * Bypass attachment_cnt check 
in __vhost_worker_flush: + * Temporarily change it to INT_MAX to bypass the check + */ + saved_cnt =3D worker->attachment_cnt; + worker->attachment_cnt =3D INT_MAX; + __vhost_worker_flush(worker); + worker->attachment_cnt =3D saved_cnt; + + mutex_unlock(&worker->mutex); + + return attach.ret; +} + /* Caller should have device mutex */ bool vhost_dev_has_owner(struct vhost_dev *dev) { @@ -631,7 +710,7 @@ static void vhost_worker_destroy(struct vhost_dev *dev, =20 WARN_ON(!llist_empty(&worker->work_list)); xa_erase(&dev->worker_xa, worker->id); - vhost_task_stop(worker->vtsk); + worker->ops->stop(worker); kfree(worker); } =20 @@ -654,42 +733,115 @@ static void vhost_workers_free(struct vhost_dev *dev) xa_destroy(&dev->worker_xa); } =20 +static void vhost_task_wakeup(struct vhost_worker *worker) +{ + return vhost_task_wake(worker->vtsk); +} + +static void vhost_kthread_wakeup(struct vhost_worker *worker) +{ + wake_up_process(worker->kthread_task); +} + +static void vhost_task_do_stop(struct vhost_worker *worker) +{ + return vhost_task_stop(worker->vtsk); +} + +static void vhost_kthread_do_stop(struct vhost_worker *worker) +{ + kthread_stop(worker->kthread_task); +} + +static int vhost_task_worker_create(struct vhost_worker *worker, + struct vhost_dev *dev, const char *name) +{ + struct vhost_task *vtsk; + u32 id; + int ret; + + vtsk =3D vhost_task_create(vhost_run_work_list, vhost_worker_killed, + worker, name); + if (IS_ERR(vtsk)) + return PTR_ERR(vtsk); + + worker->vtsk =3D vtsk; + vhost_task_start(vtsk); + ret =3D xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); + if (ret < 0) { + vhost_task_do_stop(worker); + return ret; + } + worker->id =3D id; + return 0; +} + +static int vhost_kthread_worker_create(struct vhost_worker *worker, + struct vhost_dev *dev, const char *name) +{ + struct task_struct *task; + u32 id; + int ret; + + task =3D kthread_create(vhost_run_work_kthread_list, worker, "%s", name); + if (IS_ERR(task)) + return PTR_ERR(task); + + 
worker->kthread_task =3D task; + wake_up_process(task); + ret =3D xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); + if (ret < 0) + goto stop_worker; + + ret =3D vhost_attach_task_to_cgroups(worker); + if (ret) + goto stop_worker; + + worker->id =3D id; + return 0; + +stop_worker: + vhost_kthread_do_stop(worker); + return ret; +} + +static const struct vhost_worker_ops kthread_ops =3D { + .create =3D vhost_kthread_worker_create, + .stop =3D vhost_kthread_do_stop, + .wakeup =3D vhost_kthread_wakeup, +}; + +static const struct vhost_worker_ops vhost_task_ops =3D { + .create =3D vhost_task_worker_create, + .stop =3D vhost_task_do_stop, + .wakeup =3D vhost_task_wakeup, +}; + static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev) { struct vhost_worker *worker; - struct vhost_task *vtsk; char name[TASK_COMM_LEN]; int ret; - u32 id; + const struct vhost_worker_ops *ops =3D + dev->inherit_owner ? &vhost_task_ops : &kthread_ops; =20 worker =3D kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT); if (!worker) return NULL; =20 worker->dev =3D dev; + worker->ops =3D ops; snprintf(name, sizeof(name), "vhost-%d", current->pid); =20 - vtsk =3D vhost_task_create(vhost_run_work_list, vhost_worker_killed, - worker, name); - if (IS_ERR(vtsk)) - goto free_worker; - mutex_init(&worker->mutex); init_llist_head(&worker->work_list); worker->kcov_handle =3D kcov_common_handle(); - worker->vtsk =3D vtsk; - - vhost_task_start(vtsk); - - ret =3D xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL); + ret =3D ops->create(worker, dev, name); if (ret < 0) - goto stop_worker; - worker->id =3D id; + goto free_worker; =20 return worker; =20 -stop_worker: - vhost_task_stop(vtsk); free_worker: kfree(worker); return NULL; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index c1ff4a92b925..bef42ed51485 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -26,7 +26,18 @@ struct vhost_work { unsigned long flags; }; =20 +struct vhost_worker; 
+struct vhost_dev; + +struct vhost_worker_ops { + int (*create)(struct vhost_worker *worker, struct vhost_dev *dev, + const char *name); + void (*stop)(struct vhost_worker *worker); + void (*wakeup)(struct vhost_worker *worker); +}; + struct vhost_worker { + struct task_struct *kthread_task; struct vhost_task *vtsk; struct vhost_dev *dev; /* Used to serialize device wide flushing with worker swapping. */ @@ -36,6 +47,7 @@ struct vhost_worker { u32 id; int attachment_cnt; bool killed; + const struct vhost_worker_ops *ops; }; =20 /* Poll a file (eventfd or socket) */ --=20 2.45.0 From nobody Tue Dec 16 22:29:49 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F1D0239E69 for ; Sat, 31 May 2025 09:58:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748685510; cv=none; b=lE0AP46lflmh7bA9zCwpCOGDNpUv/6DZsfsKegsJhZRtjGjgXMkkxt4jqDA30xi26S5zPr0IBbTFezv5h3A1KuIkG65AihzGnDGnp6f+g2rAqWnf7E6O8cIY4pLiQKw06ZDd+TYHl4Fc8VBa2fGl64bB5dtdLvIRMEnLiyNM2xM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748685510; c=relaxed/simple; bh=MH2G2hfDbvU36nbcXul/7o/guuWSQyBF7jNQJARY0eE=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ACktSST+OER9KQOBzZRfZZloR74nlzEwYHCjeLcvA9yQrQu2HRjDSa62/voYhN5ZomziWeXpYnTk9dVp7es/BJXH1wxtL2H63acAjMuhJPx0Cn8SNMdFoonzsrzgKR81VSqn7DT9k9LM3TtaYkckWb9azu3YnXUP+BaXw/lI9uE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bw6PoCFw; arc=none smtp.client-ip=170.10.133.124 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bw6PoCFw" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1748685507; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7QVzfJTKlAfS/h4/7JwNwWZ1w2FS+9cWThTFrETjmZk=; b=bw6PoCFwbrqtEP06rQZJIYlkc3pkhO2sqePoJjN8icTHInU1CC7nifVEEB9tjoVaw9kqf+ wNAFxk6T7Uz5bRVZDBF0ZFEB7utfUq3CUIeDi3sJfEa4H4PDKZNsPSoR9VCXjofKXoEluU +2JcNNTHgqeIjXgfogZWWeVE/uCg+ss= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-394-O6hR8a-POuaRKRIkNYS4ZQ-1; Sat, 31 May 2025 05:58:25 -0400 X-MC-Unique: O6hR8a-POuaRKRIkNYS4ZQ-1 X-Mimecast-MFC-AGG-ID: O6hR8a-POuaRKRIkNYS4ZQ_1748685504 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 894621800446; Sat, 31 May 2025 09:58:24 +0000 (UTC) Received: from server.redhat.com (unknown [10.72.112.30]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 99DF3180049D; Sat, 31 May 2025 09:58:20 +0000 (UTC) From: Cindy Lu To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com, 
michael.christie@oracle.com, sgarzare@redhat.com, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org Subject: [PATCH RESEND v10 3/3] vhost: Add new UAPI to select kthread mode and KConfig to enable this IOCTL Date: Sat, 31 May 2025 17:57:28 +0800 Message-ID: <20250531095800.160043-4-lulu@redhat.com> In-Reply-To: <20250531095800.160043-1-lulu@redhat.com> References: <20250531095800.160043-1-lulu@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Content-Type: text/plain; charset="utf-8" This patch introduces a new UAPI that lets userspace select kthread mode for a vhost device. Userspace applications can use the VHOST_FORK_FROM_OWNER ioctl to select between task and kthread modes. It must be invoked before the VHOST_SET_OWNER ioctl, because the worker is created during that call. VHOST_NEW_WORKER requires inherit_owner to be true, and a check has been added to enforce this. Additionally, a new Kconfig option, CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL, controls the availability of the VHOST_FORK_FROM_OWNER ioctl. When CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL is set to n, the ioctl is disabled and any attempt to use it fails. Signed-off-by: Cindy Lu Tested-by: Lei Yang --- drivers/vhost/Kconfig | 13 +++++++++++++ drivers/vhost/vhost.c | 30 +++++++++++++++++++++++++++++- include/uapi/linux/vhost.h | 16 ++++++++++++++++ 3 files changed, 58 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 020d4fbb947c..300e474b60fd 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -96,3 +96,16 @@ config VHOST_CROSS_ENDIAN_LEGACY If unsure, say "N". 
=20 endif + +config VHOST_ENABLE_FORK_OWNER_IOCTL + bool "Enable IOCTL VHOST_FORK_FROM_OWNER" + default n + help + This option enables the IOCTL VHOST_FORK_FROM_OWNER, allowing + userspace applications to modify the thread mode for vhost devices. + + By default, `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL` is set to `n`, + which disables the IOCTL. When enabled (y), the IOCTL allows + users to set the mode as needed. + + If unsure, say "N". diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 2d2909be1bb2..cfa60dc438f9 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1022,6 +1022,13 @@ long vhost_worker_ioctl(struct vhost_dev *dev, unsig= ned int ioctl, switch (ioctl) { /* dev worker ioctls */ case VHOST_NEW_WORKER: + /* + * vhost_tasks will account for worker threads under the parent's + * NPROC value but kthreads do not. To avoid userspace overflowing + * the system with worker threads inherit_owner must be true. + */ + if (!dev->inherit_owner) + return -EFAULT; ret =3D vhost_new_worker(dev, &state); if (!ret && copy_to_user(argp, &state, sizeof(state))) ret =3D -EFAULT; @@ -1138,7 +1145,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, str= uct vhost_iotlb *umem) int i; =20 vhost_dev_cleanup(dev); - + dev->inherit_owner =3D inherit_owner_default; dev->umem =3D umem; /* We don't need VQ locks below since vhost_dev_cleanup makes sure * VQs aren't running. 
@@ -2292,6 +2299,27 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned i= nt ioctl, void __user *argp) goto done; } =20 +#ifdef CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL + if (ioctl =3D=3D VHOST_FORK_FROM_OWNER) { + u8 inherit_owner; + /* inherit_owner can only be modified before owner is set */ + if (vhost_dev_has_owner(d)) { + r =3D -EBUSY; + goto done; + } + if (copy_from_user(&inherit_owner, argp, sizeof(u8))) { + r =3D -EFAULT; + goto done; + } + if (inherit_owner > 1) { + r =3D -EINVAL; + goto done; + } + d->inherit_owner =3D (bool)inherit_owner; + r =3D 0; + goto done; + } +#endif /* You must be the owner to do anything else */ r =3D vhost_dev_check_owner(d); if (r) diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h index d4b3e2ae1314..d2692c7ef450 100644 --- a/include/uapi/linux/vhost.h +++ b/include/uapi/linux/vhost.h @@ -235,4 +235,20 @@ */ #define VHOST_VDPA_GET_VRING_SIZE _IOWR(VHOST_VIRTIO, 0x82, \ struct vhost_vring_state) + +/** + * VHOST_FORK_FROM_OWNER - Set the inherit_owner flag for the vhost device. + * This ioctl must be called before VHOST_SET_OWNER. + * + * @param inherit_owner: An 8-bit value that determines the vhost thread m= ode + * + * When inherit_owner is set to 1 (default value): + * - Vhost will create tasks similar to processes forked from the owner, + * inheriting all of the owner's attributes. + * + * When inherit_owner is set to 0: + * - Vhost will create tasks as kernel threads. + */ +#define VHOST_FORK_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, __u8) + #endif --=20 2.45.0
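
Usage note (not part of the patch): the call order described in 3/3 can be sketched from userspace roughly as follows. This is a minimal sketch, not a reference implementation. It assumes a kernel built with CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL=y and an already opened vhost device fd. The helper names vhost_select_worker_mode() and vhost_claim_with_mode() are hypothetical and exist only for illustration. The ioctl numbers are copied from include/uapi/linux/vhost.h as modified by this series and are defined locally because installed headers may predate it.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Values taken from include/uapi/linux/vhost.h as changed by this series.
 * They are defined here only when the installed headers do not provide them.
 */
#ifndef VHOST_VIRTIO
#define VHOST_VIRTIO 0xAF
#endif
#ifndef VHOST_SET_OWNER
#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
#endif
#ifndef VHOST_FORK_FROM_OWNER
#define VHOST_FORK_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, uint8_t)
#endif

/*
 * Hypothetical helper: choose the worker mode for a vhost fd.
 * inherit_owner == 1 selects task mode, 0 selects kthread mode.
 * Returns 0 on success or a negative errno value.
 */
int vhost_select_worker_mode(int fd, uint8_t inherit_owner)
{
	/* Mirror the kernel-side range check so bad input fails early. */
	if (inherit_owner > 1)
		return -EINVAL;

	if (ioctl(fd, VHOST_FORK_FROM_OWNER, &inherit_owner) < 0)
		return -errno;

	return 0;
}

/*
 * Hypothetical helper: select the mode first, then take ownership.
 * The order matters because the worker is created by VHOST_SET_OWNER.
 */
int vhost_claim_with_mode(int fd, uint8_t inherit_owner)
{
	int ret = vhost_select_worker_mode(fd, inherit_owner);

	if (ret)
		return ret;

	if (ioctl(fd, VHOST_SET_OWNER) < 0)
		return -errno;

	return 0;
}
```

A caller would open the vhost device (for example /dev/vhost-net), then call vhost_claim_with_mode(fd, 0) to request kthread mode. Per the patch, the selection ioctl returns -EBUSY once an owner is set, and it fails on kernels where the Kconfig option is disabled, so callers should be prepared to fall back to the default mode.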