From nobody Sun Dec 14 19:21:37 2025
From: Cindy Lu
To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com,
 michael.christie@oracle.com, sgarzare@redhat.com,
 linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org
Subject: [PATCH v9 1/4] vhost: Add a new parameter in vhost_dev to allow user select kthread
Date: Mon, 21 Apr 2025 10:44:07 +0800
Message-ID: <20250421024457.112163-2-lulu@redhat.com>
In-Reply-To: <20250421024457.112163-1-lulu@redhat.com>
References: <20250421024457.112163-1-lulu@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

vhost now uses vhost_task and creates its workers as children of the
owner thread. While this aligns with containerization principles, it
confuses some legacy userspace applications, so we are reintroducing
kthread API support.

Introduce a new parameter, inherit_owner, that lets users choose between
kthread and task mode. The parameter defaults to true, so this patch
does not change the default behavior.

Signed-off-by: Cindy Lu
Acked-by: Jason Wang
Reviewed-by: Stefano Garzarella
---
 drivers/vhost/vhost.c | 1 +
 drivers/vhost/vhost.h | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 63612faeab72..250dc43f1786 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -552,6 +552,7 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->byte_weight = byte_weight;
 	dev->use_worker = use_worker;
 	dev->msg_handler = msg_handler;
+	dev->inherit_owner = true;
 	init_waitqueue_head(&dev->wait);
 	INIT_LIST_HEAD(&dev->read_list);
 	INIT_LIST_HEAD(&dev->pending_list);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bb75a292d50c..19bb94922a0e 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -176,6 +176,15 @@ struct vhost_dev {
 	int byte_weight;
 	struct xarray worker_xa;
 	bool use_worker;
+	/*
+	 * If inherit_owner is true we use vhost_tasks to create
+	 * the worker so all settings/limits like cgroups, NPROC,
+	 * scheduler, etc are inherited from the owner. If false,
+	 * we use kthreads and only attach to the same cgroups
+	 * as the owner for compat with older kernels.
+	 * here we use true as default value
+	 */
+	bool inherit_owner;
 	int (*msg_handler)(struct vhost_dev *dev, u32 asid,
 			   struct vhost_iotlb_msg *msg);
 };
-- 
2.45.0

From nobody Sun Dec 14 19:21:37 2025
From: Cindy Lu
To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com,
 michael.christie@oracle.com, sgarzare@redhat.com,
 linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org
Subject: [PATCH v9 2/4] vhost: Reintroduce kthread mode support in vhost
Date: Mon, 21 Apr 2025 10:44:08 +0800
Message-ID: <20250421024457.112163-3-lulu@redhat.com>
In-Reply-To: <20250421024457.112163-1-lulu@redhat.com>
References: <20250421024457.112163-1-lulu@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This patch reintroduces kthread mode support in vhost. It also introduces
struct vhost_worker_ops to abstract the worker create/stop/wakeup
operations.

* Bring back the original vhost_worker() implementation, renamed to
  vhost_run_work_kthread_list().

* Add cgroup support for the kthread.

* Introduce struct vhost_worker_ops:
  - Encapsulates the create / stop / wake-up callbacks.
  - vhost_worker_create() selects the proper ops according to
    inherit_owner.

This partially reverts or improves upon:
commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
commit 1cdaafa1b8b4 ("vhost: replace single worker pointer with xarray")

Signed-off-by: Cindy Lu
---
 drivers/vhost/vhost.c | 188 ++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.h |  12 +++
 2 files changed, 182 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 250dc43f1786..be97028a8baf 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -242,7 +243,7 @@ static void vhost_worker_queue(struct vhost_worker *worker,
 		 * test_and_set_bit() implies a memory barrier.
 		 */
 		llist_add(&work->node, &worker->work_list);
-		vhost_task_wake(worker->vtsk);
+		worker->ops->wakeup(worker);
 	}
 }
 
@@ -388,6 +389,44 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	__vhost_vq_meta_reset(vq);
 }
 
+static int vhost_run_work_kthread_list(void *data)
+{
+	struct vhost_worker *worker = data;
+	struct vhost_work *work, *work_next;
+	struct vhost_dev *dev = worker->dev;
+	struct llist_node *node;
+
+	kthread_use_mm(dev->mm);
+
+	for (;;) {
+		/* mb paired w/ kthread_stop */
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		if (kthread_should_stop()) {
+			__set_current_state(TASK_RUNNING);
+			break;
+		}
+		node = llist_del_all(&worker->work_list);
+		if (!node)
+			schedule();
+
+		node = llist_reverse_order(node);
+		/* make sure flag is seen after deletion */
+		smp_wmb();
+		llist_for_each_entry_safe(work, work_next, node, node) {
+			clear_bit(VHOST_WORK_QUEUED, &work->flags);
+			__set_current_state(TASK_RUNNING);
+			kcov_remote_start_common(worker->kcov_handle);
+			work->fn(work);
+			kcov_remote_stop();
+			cond_resched();
+		}
+	}
+	kthread_unuse_mm(dev->mm);
+
+	return 0;
+}
+
 static bool vhost_run_work_list(void *data)
 {
 	struct vhost_worker *worker = data;
@@ -582,6 +621,46 @@ long vhost_dev_check_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_check_owner);
 
+struct vhost_attach_cgroups_struct {
+	struct vhost_work work;
+	struct task_struct *owner;
+	int ret;
+};
+
+static void vhost_attach_cgroups_work(struct vhost_work *work)
+{
+	struct vhost_attach_cgroups_struct *s;
+
+	s = container_of(work, struct vhost_attach_cgroups_struct, work);
+	s->ret = cgroup_attach_task_all(s->owner, current);
+}
+
+static int vhost_attach_task_to_cgroups(struct vhost_worker *worker)
+{
+	struct vhost_attach_cgroups_struct attach;
+	int saved_cnt;
+
+	attach.owner = current;
+
+	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
+	vhost_worker_queue(worker, &attach.work);
+
+	mutex_lock(&worker->mutex);
+
+	/*
+	 * Bypass attachment_cnt check in __vhost_worker_flush:
+	 * Temporarily change it to INT_MAX to bypass the check
+	 */
+	saved_cnt = worker->attachment_cnt;
+	worker->attachment_cnt = INT_MAX;
+	__vhost_worker_flush(worker);
+	worker->attachment_cnt = saved_cnt;
+
+	mutex_unlock(&worker->mutex);
+
+	return attach.ret;
+}
+
 /* Caller should have device mutex */
 bool vhost_dev_has_owner(struct vhost_dev *dev)
 {
@@ -627,7 +706,7 @@ static void vhost_worker_destroy(struct vhost_dev *dev,
 
 	WARN_ON(!llist_empty(&worker->work_list));
 	xa_erase(&dev->worker_xa, worker->id);
-	vhost_task_stop(worker->vtsk);
+	worker->ops->stop(worker);
 	kfree(worker);
 }
 
@@ -650,42 +729,115 @@ static void vhost_workers_free(struct vhost_dev *dev)
 	xa_destroy(&dev->worker_xa);
 }
 
+static void vhost_task_wakeup(struct vhost_worker *worker)
+{
+	return vhost_task_wake(worker->vtsk);
+}
+
+static void vhost_kthread_wakeup(struct vhost_worker *worker)
+{
+	wake_up_process(worker->kthread_task);
+}
+
+static void vhost_task_do_stop(struct vhost_worker *worker)
+{
+	return vhost_task_stop(worker->vtsk);
+}
+
+static void vhost_kthread_do_stop(struct vhost_worker *worker)
+{
+	kthread_stop(worker->kthread_task);
+}
+
+static int vhost_task_worker_create(struct vhost_worker *worker,
+				    struct vhost_dev *dev, const char *name)
+{
+	struct vhost_task *vtsk;
+	u32 id;
+	int ret;
+
+	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
+				 worker, name);
+	if (IS_ERR(vtsk))
+		return PTR_ERR(vtsk);
+
+	worker->vtsk = vtsk;
+	vhost_task_start(vtsk);
+	ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL);
+	if (ret < 0) {
+		vhost_task_do_stop(worker);
+		return ret;
+	}
+	worker->id = id;
+	return 0;
+}
+
+static int vhost_kthread_worker_create(struct vhost_worker *worker,
+				       struct vhost_dev *dev, const char *name)
+{
+	struct task_struct *task;
+	u32 id;
+	int ret;
+
+	task = kthread_create(vhost_run_work_kthread_list, worker, "%s", name);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+
+	worker->kthread_task = task;
+	wake_up_process(task);
+	ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL);
+	if (ret < 0)
+		goto stop_worker;
+
+	ret = vhost_attach_task_to_cgroups(worker);
+	if (ret)
+		goto stop_worker;
+
+	worker->id = id;
+	return 0;
+
+stop_worker:
+	vhost_kthread_do_stop(worker);
+	return ret;
+}
+
+static const struct vhost_worker_ops kthread_ops = {
+	.create = vhost_kthread_worker_create,
+	.stop = vhost_kthread_do_stop,
+	.wakeup = vhost_kthread_wakeup,
+};
+
+static const struct vhost_worker_ops vhost_task_ops = {
+	.create = vhost_task_worker_create,
+	.stop = vhost_task_do_stop,
+	.wakeup = vhost_task_wakeup,
+};
+
 static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker;
-	struct vhost_task *vtsk;
 	char name[TASK_COMM_LEN];
 	int ret;
-	u32 id;
+	const struct vhost_worker_ops *ops =
+		dev->inherit_owner ? &vhost_task_ops : &kthread_ops;
 
 	worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
 	if (!worker)
 		return NULL;
 
 	worker->dev = dev;
+	worker->ops = ops;
 	snprintf(name, sizeof(name), "vhost-%d", current->pid);
 
-	vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
-				 worker, name);
-	if (IS_ERR(vtsk))
-		goto free_worker;
-
 	mutex_init(&worker->mutex);
 	init_llist_head(&worker->work_list);
 	worker->kcov_handle = kcov_common_handle();
-	worker->vtsk = vtsk;
-
-	vhost_task_start(vtsk);
-
-	ret = xa_alloc(&dev->worker_xa, &id, worker, xa_limit_32b, GFP_KERNEL);
+	ret = ops->create(worker, dev, name);
 	if (ret < 0)
-		goto stop_worker;
-	worker->id = id;
+		goto free_worker;
 
 	return worker;
 
-stop_worker:
-	vhost_task_stop(vtsk);
 free_worker:
 	kfree(worker);
 	return NULL;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 19bb94922a0e..af4b2f7d3b91 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -26,7 +26,18 @@ struct vhost_work {
 	unsigned long		flags;
 };
 
+struct vhost_worker;
+struct vhost_dev;
+
+struct vhost_worker_ops {
+	int (*create)(struct vhost_worker *worker, struct vhost_dev *dev,
+		      const char *name);
+	void (*stop)(struct vhost_worker *worker);
+	void (*wakeup)(struct vhost_worker *worker);
+};
+
 struct vhost_worker {
+	struct task_struct *kthread_task;
 	struct vhost_task	*vtsk;
 	struct vhost_dev	*dev;
 	/* Used to serialize device wide flushing with worker swapping. */
@@ -36,6 +47,7 @@ struct vhost_worker {
 	u32			id;
 	int			attachment_cnt;
 	bool			killed;
+	const struct vhost_worker_ops *ops;
 };
 
 /* Poll a file (eventfd or socket) */
-- 
2.45.0

From nobody Sun Dec 14 19:21:37 2025
From: Cindy Lu
To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com,
 michael.christie@oracle.com, sgarzare@redhat.com,
 linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org
Subject: [PATCH v9 3/4] vhost: add VHOST_FORK_FROM_OWNER ioctl and validate inherit_owner
Date: Mon, 21 Apr 2025 10:44:09 +0800
Message-ID: <20250421024457.112163-4-lulu@redhat.com>
In-Reply-To: <20250421024457.112163-1-lulu@redhat.com>
References: <20250421024457.112163-1-lulu@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a new UAPI to configure the vhost device to use kthread mode.
A userspace application can use the VHOST_FORK_FROM_OWNER ioctl to
choose between owner and kthread mode if necessary.
This setting must be applied before VHOST_SET_OWNER, because the worker
is created in the VHOST_SET_OWNER path.

In addition, VHOST_NEW_WORKER requires inherit_owner to be true, so add
a check for this.

Signed-off-by: Cindy Lu
Acked-by: Jason Wang
---
 drivers/vhost/vhost.c      | 29 +++++++++++++++++++++++++++--
 include/uapi/linux/vhost.h | 16 ++++++++++++++++
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index be97028a8baf..fb0c7fb43f78 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1018,6 +1018,13 @@ long vhost_worker_ioctl(struct vhost_dev *dev, unsigned int ioctl,
 	switch (ioctl) {
 	/* dev worker ioctls */
 	case VHOST_NEW_WORKER:
+		/*
+		 * vhost_tasks will account for worker threads under the parent's
+		 * NPROC value but kthreads do not. To avoid userspace overflowing
+		 * the system with worker threads inherit_owner must be true.
+		 */
+		if (!dev->inherit_owner)
+			return -EFAULT;
 		ret = vhost_new_worker(dev, &state);
 		if (!ret && copy_to_user(argp, &state, sizeof(state)))
 			ret = -EFAULT;
@@ -1134,7 +1141,7 @@ void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
 	int i;
 
 	vhost_dev_cleanup(dev);
-
+	dev->inherit_owner = true;
 	dev->umem = umem;
 	/* We don't need VQ locks below since vhost_dev_cleanup makes sure
 	 * VQs aren't running.
@@ -2287,7 +2294,25 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 		r = vhost_dev_set_owner(d);
 		goto done;
 	}
-
+	if (ioctl == VHOST_FORK_FROM_OWNER) {
+		u8 inherit_owner;
+		/*inherit_owner can only be modified before owner is set*/
+		if (vhost_dev_has_owner(d)) {
+			r = -EBUSY;
+			goto done;
+		}
+		if (copy_from_user(&inherit_owner, argp, sizeof(u8))) {
+			r = -EFAULT;
+			goto done;
+		}
+		if (inherit_owner > 1) {
+			r = -EINVAL;
+			goto done;
+		}
+		d->inherit_owner = (bool)inherit_owner;
+		r = 0;
+		goto done;
+	}
 	/* You must be the owner to do anything else */
 	r = vhost_dev_check_owner(d);
 	if (r)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index b95dd84eef2d..1ae0917bfeca 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -235,4 +235,20 @@
  */
 #define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
 					      struct vhost_vring_state)
+
+/**
+ * VHOST_FORK_FROM_OWNER - Set the inherit_owner flag for the vhost device.
+ * This ioctl must be called before VHOST_SET_OWNER.
+ *
+ * @param inherit_owner: An 8-bit value that determines the vhost thread mode
+ *
+ * When inherit_owner is set to 1 (default value):
+ *   - Vhost will create tasks similar to processes forked from the owner,
+ *     inheriting all of the owner's attributes.
+ *
+ * When inherit_owner is set to 0:
+ *   - Vhost will create tasks as kernel threads.
+ */
+#define VHOST_FORK_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, __u8)
+
 #endif
-- 
2.45.0

From nobody Sun Dec 14 19:21:37 2025
From: Cindy Lu
To: lulu@redhat.com, jasowang@redhat.com, mst@redhat.com,
 michael.christie@oracle.com, sgarzare@redhat.com,
 linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org
Subject: [PATCH v9 4/4] vhost: Add a KConfig knob to enable IOCTL VHOST_FORK_FROM_OWNER
Date: Mon, 21 Apr 2025 10:44:10 +0800
Message-ID: <20250421024457.112163-5-lulu@redhat.com>
In-Reply-To: <20250421024457.112163-1-lulu@redhat.com>
References: <20250421024457.112163-1-lulu@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Introduce a new config knob, `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL`,
to control the availability of the `VHOST_FORK_FROM_OWNER` ioctl.
When CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL is set to n, the ioctl
is disabled and any attempt to use it fails.

Signed-off-by: Cindy Lu
---
 drivers/vhost/Kconfig | 15 +++++++++++++++
 drivers/vhost/vhost.c |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 020d4fbb947c..bc8fadb06f98 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -96,3 +96,18 @@ config VHOST_CROSS_ENDIAN_LEGACY
 	  If unsure, say "N".
 
 endif
+
+config VHOST_ENABLE_FORK_OWNER_IOCTL
+	bool "Enable IOCTL VHOST_FORK_FROM_OWNER"
+	default n
+	help
+	  This option enables the IOCTL VHOST_FORK_FROM_OWNER, which allows
+	  userspace applications to modify the thread mode for vhost devices.
+
+	  By default, `CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL` is set to `n`,
+	  meaning the ioctl is disabled and any operation using this ioctl
+	  will fail.
+	  When the configuration is enabled (y), the ioctl becomes
+	  available, allowing users to set the mode if needed.
+
+	  If unsure, say "N".
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index fb0c7fb43f78..568e43cb54a9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2294,6 +2294,8 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 		r = vhost_dev_set_owner(d);
 		goto done;
 	}
+
+#ifdef CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL
 	if (ioctl == VHOST_FORK_FROM_OWNER) {
 		u8 inherit_owner;
 		/*inherit_owner can only be modified before owner is set*/
@@ -2313,6 +2315,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
 		r = 0;
 		goto done;
 	}
+#endif
 	/* You must be the owner to do anything else */
 	r = vhost_dev_check_owner(d);
 	if (r)
-- 
2.45.0
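
For reference, a minimal userspace sketch of how an application might select
kthread mode with the ioctl added in patch 3/4. It is an illustration only and
not part of the series. The helper name vhost_set_thread_mode() is
hypothetical, the VHOST_FORK_FROM_OWNER definition is copied from the UAPI
hunk in patch 3/4 (with uint8_t standing in for __u8), and the fallback
defines are assumptions that only exist so the sketch builds against headers
that predate this series. The mode has to be chosen before VHOST_SET_OWNER,
because that is where the worker is created.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Fallback definitions (assumption: headers without this series). */
#ifndef VHOST_VIRTIO
#define VHOST_VIRTIO 0xAF
#endif
#ifndef VHOST_SET_OWNER
#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
#endif
#ifndef VHOST_FORK_FROM_OWNER
/* Copied from the include/uapi/linux/vhost.h hunk in patch 3/4. */
#define VHOST_FORK_FROM_OWNER _IOW(VHOST_VIRTIO, 0x83, uint8_t)
#endif

/*
 * Hypothetical helper: pick the worker mode, then take ownership.
 * inherit_owner == true keeps the default vhost_task mode,
 * inherit_owner == false requests kthread mode.
 * Returns 0 on success or a negative errno value.
 */
static int vhost_set_thread_mode(int vhost_fd, bool inherit_owner)
{
	uint8_t val = inherit_owner ? 1 : 0;

	/* Must come first: patch 3/4 returns -EBUSY once an owner is set. */
	if (ioctl(vhost_fd, VHOST_FORK_FROM_OWNER, &val) < 0)
		return -errno;

	/* The worker is created here, using the mode selected above. */
	if (ioctl(vhost_fd, VHOST_SET_OWNER, NULL) < 0)
		return -errno;

	return 0;
}
```

A caller would first open a vhost device node, for example
open("/dev/vhost-net", O_RDWR), and call vhost_set_thread_mode(fd, false)
before issuing any other vhost ioctl. On kernels built without
CONFIG_VHOST_ENABLE_FORK_OWNER_IOCTL (patch 4/4) the first ioctl fails, so a
caller that can live with the default mode may prefer to treat that failure
as non-fatal and continue with VHOST_SET_OWNER.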