From nobody Thu Apr 9 17:57:47 2026
From: Christian Brauner <brauner@kernel.org>
Date: Tue, 03 Mar 2026 14:49:12 +0100
Subject: [PATCH RFC DRAFT POC 01/11] kthread: refactor __kthread_create_on_node() to take a struct argument
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20260303-work-kthread-nullfs-v1-1-87e559b94375@kernel.org>
References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org>
In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org>
To: linux-fsdevel@vger.kernel.org, Linus Torvalds
Cc: linux-kernel@vger.kernel.org, Alexander Viro, Jens Axboe, Jan Kara, Tejun Heo, Jann Horn, Christian Brauner
X-Mailer: b4 0.15-dev-47773

Refactor __kthread_create_on_node() to take a const struct
kthread_create_info pointer instead of individual parameters. The caller
fills in the relevant fields in a stack-local struct and the helper
heap-copies it, making it trivial to add new kthread creation options
without changing the function signature.

As part of this, collapse __kthread_create_worker_on_node() into
__kthread_create_on_node() by adding a kthread_worker:1 bitfield to
struct kthread_create_info.
When set, the unified helper allocates and initializes the
kthread_worker internally, removing the need for a separate helper.

Also switch create_kthread() from the kernel_thread() wrapper to
constructing struct kernel_clone_args directly and calling
kernel_clone(). This makes the clone flags explicit and prepares for
passing richer per-kthread arguments through kernel_clone_args in
subsequent patches.

No functional change.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 kernel/kthread.c | 87 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 791210daf8b4..84d535c7a635 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -45,6 +45,7 @@ struct kthread_create_info
 	int (*threadfn)(void *data);
 	void *data;
 	int node;
+	u32 kthread_worker:1;
 
 	/* Result passed back to kthread_create() from kthreadd. */
 	struct task_struct *result;
@@ -451,13 +452,20 @@ int tsk_fork_get_node(struct task_struct *tsk)
 static void create_kthread(struct kthread_create_info *create)
 {
 	int pid;
+	struct kernel_clone_args args = {
+		.flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_UNTRACED,
+		.exit_signal = SIGCHLD,
+		.fn = kthread,
+		.fn_arg = create,
+		.name = create->full_name,
+		.kthread = 1,
+	};
 
 #ifdef CONFIG_NUMA
 	current->pref_node_fork = create->node;
 #endif
 	/* We want our own signal handler (we take no signals by default). */
-	pid = kernel_thread(kthread, create, create->full_name,
-			    CLONE_FS | CLONE_FILES | SIGCHLD);
+	pid = kernel_clone(&args);
 	if (pid < 0) {
 		/* Release the structure when caller killed by a fatal signal. */
 		struct completion *done = xchg(&create->done, NULL);
@@ -472,21 +480,32 @@ static void create_kthread(struct kthread_create_info *create)
 	}
 }
 
-static __printf(4, 0)
-struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
-						    void *data, int node,
+static struct task_struct *__kthread_create_on_node(const struct kthread_create_info *info,
 						    const char namefmt[],
 						    va_list args)
 {
 	DECLARE_COMPLETION_ONSTACK(done);
+	struct kthread_worker *worker = NULL;
 	struct task_struct *task;
-	struct kthread_create_info *create = kmalloc_obj(*create);
+	struct kthread_create_info *create;
 
+	create = kmalloc_obj(*create);
 	if (!create)
 		return ERR_PTR(-ENOMEM);
-	create->threadfn = threadfn;
-	create->data = data;
-	create->node = node;
+
+	*create = *info;
+
+	if (create->kthread_worker) {
+		worker = kzalloc_obj(*worker);
+		if (!worker) {
+			kfree(create);
+			return ERR_PTR(-ENOMEM);
+		}
+		kthread_init_worker(worker);
+		create->threadfn = kthread_worker_fn;
+		create->data = worker;
+	}
+
 	create->done = &done;
 	create->full_name = kvasprintf(GFP_KERNEL, namefmt, args);
 	if (!create->full_name) {
@@ -520,6 +539,8 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
 	}
 	task = create->result;
 free_create:
+	if (IS_ERR(task))
+		kfree(worker);
 	kfree(create);
 	return task;
 }
@@ -552,11 +573,16 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
 					   const char namefmt[],
 					   ...)
 {
+	struct kthread_create_info info = {
+		.threadfn = threadfn,
+		.data = data,
+		.node = node,
+	};
 	struct task_struct *task;
 	va_list args;
 
 	va_start(args, namefmt);
-	task = __kthread_create_on_node(threadfn, data, node, namefmt, args);
+	task = __kthread_create_on_node(&info, namefmt, args);
 	va_end(args);
 
 	return task;
@@ -1045,34 +1071,6 @@ int kthread_worker_fn(void *worker_ptr)
 }
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
-static __printf(3, 0) struct kthread_worker *
-__kthread_create_worker_on_node(unsigned int flags, int node,
-				const char namefmt[], va_list args)
-{
-	struct kthread_worker *worker;
-	struct task_struct *task;
-
-	worker = kzalloc_obj(*worker);
-	if (!worker)
-		return ERR_PTR(-ENOMEM);
-
-	kthread_init_worker(worker);
-
-	task = __kthread_create_on_node(kthread_worker_fn, worker,
-					node, namefmt, args);
-	if (IS_ERR(task))
-		goto fail_task;
-
-	worker->flags = flags;
-	worker->task = task;
-
-	return worker;
-
-fail_task:
-	kfree(worker);
-	return ERR_CAST(task);
-}
-
 /**
  * kthread_create_worker_on_node - create a kthread worker
  * @flags: flags modifying the default behavior of the worker
@@ -1086,13 +1084,24 @@ __kthread_create_worker_on_node(unsigned int flags, int node,
 struct kthread_worker *
 kthread_create_worker_on_node(unsigned int flags, int node, const char namefmt[], ...)
 {
+	struct kthread_create_info info = {
+		.node = node,
+		.kthread_worker = 1,
+	};
 	struct kthread_worker *worker;
+	struct task_struct *task;
 	va_list args;
 
 	va_start(args, namefmt);
-	worker = __kthread_create_worker_on_node(flags, node, namefmt, args);
+	task = __kthread_create_on_node(&info, namefmt, args);
 	va_end(args);
 
+	if (IS_ERR(task))
+		return ERR_CAST(task);
+
+	worker = kthread_data(task);
+	worker->flags = flags;
+	worker->task = task;
 	return worker;
 }
 EXPORT_SYMBOL(kthread_create_worker_on_node);
-- 
2.47.3

From nobody Thu Apr 9 17:57:47 2026
From: Christian Brauner <brauner@kernel.org>
Date: Tue, 03 Mar 2026 14:49:13 +0100
Subject: [PATCH RFC DRAFT POC 02/11] kthread: remove unused flags argument from kthread worker creation API
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20260303-work-kthread-nullfs-v1-2-87e559b94375@kernel.org>
References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org>
In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org>
To: linux-fsdevel@vger.kernel.org, Linus Torvalds
Cc: linux-kernel@vger.kernel.org, Alexander Viro, Jens Axboe, Jan Kara, Tejun Heo, Jann Horn, Christian Brauner
X-Mailer: b4 0.15-dev-47773

Every caller of kthread_create_worker(), kthread_run_worker(),
kthread_create_worker_on_cpu(), and kthread_run_worker_on_cpu() passes 0
for the flags argument. The only defined flag, KTW_FREEZABLE, has no
users anywhere in the tree.

Remove the flags parameter from the entire kthread worker creation API,
the KTW_FREEZABLE enum, the flags field from struct kthread_worker, and
the dead set_freezable() call in kthread_worker_fn().

No functional change.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 arch/x86/kvm/i8254.c                               |  2 +-
 crypto/crypto_engine.c                             |  2 +-
 drivers/cpufreq/cppc_cpufreq.c                     |  2 +-
 drivers/dpll/zl3073x/core.c                        |  2 +-
 drivers/gpu/drm/drm_vblank_work.c                  |  6 ++---
 .../gpu/drm/i915/gem/selftests/i915_gem_context.c  |  4 ++--
 drivers/gpu/drm/i915/gt/selftest_execlists.c       |  2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c       |  4 ++--
 drivers/gpu/drm/i915/gt/selftest_slpc.c            |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c      | 12 +++++-----
 drivers/gpu/drm/msm/disp/msm_disp_snapshot.c       |  2 +-
 drivers/gpu/drm/msm/msm_atomic.c                   |  2 +-
 drivers/gpu/drm/msm/msm_gpu.c                      |  2 +-
 drivers/gpu/drm/msm/msm_kms.c                      |  2 +-
 .../media/platform/chips-media/wave5/wave5-vpu.c   |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c                   |  2 +-
 drivers/net/ethernet/intel/ice/ice_dpll.c          |  4 ++--
 drivers/net/ethernet/intel/ice/ice_gnss.c          |  2 +-
 drivers/net/ethernet/intel/ice/ice_ptp.c           |  4 ++--
 drivers/platform/chrome/cros_ec_spi.c              |  2 +-
 drivers/ptp/ptp_clock.c                            |  2 +-
 drivers/spi/spi.c                                  |  2 +-
 drivers/usb/gadget/function/uvc_video.c            |  2 +-
 drivers/usb/typec/tcpm/tcpm.c                      |  2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c                   |  4 ++--
 drivers/watchdog/watchdog_dev.c                    |  2 +-
 fs/erofs/zdata.c                                   |  2 +-
 include/linux/kthread.h                            | 28 +++++++---------------
 kernel/kthread.c                                   | 13 +++-------
 kernel/rcu/tree.c                                  |  4 ++--
 kernel/sched/ext.c                                 |  2 +-
 kernel/workqueue.c                                 |  2 +-
 net/dsa/tag_ksz.c                                  |  4 ++--
 net/dsa/tag_ocelot_8021q.c                         |  2 +-
 net/dsa/tag_sja1105.c                              |  4 ++--
 35 files changed, 60 insertions(+), 77 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 1982b0077ddd..4f1065c96e78 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -750,7 +750,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 	pid_nr = pid_vnr(pid);
 	put_pid(pid);
 
-	pit->worker = kthread_run_worker(0, "kvm-pit/%d", pid_nr);
+	pit->worker = kthread_run_worker("kvm-pit/%d", pid_nr);
 	if (IS_ERR(pit->worker))
 		goto fail_kthread;
 
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index 3d07dd5de4fa..60023f485c7f 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -456,7 +456,7 @@ struct crypto_engine *crypto_engine_alloc_init_and_set(struct device *dev,
 	guard(spinlock_init)(&engine->queue_lock);
 	crypto_init_queue(&engine->queue, qlen);
 
-	engine->kworker = kthread_run_worker(0, "%s", engine->name);
+	engine->kworker = kthread_run_worker("%s", engine->name);
 	if (IS_ERR(engine->kworker)) {
 		dev_err(dev, "failed to create crypto request pump task\n");
 		return NULL;
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 011f35cb47b9..1cdd3ed9e7a3 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -225,7 +225,7 @@ static void cppc_fie_kworker_init(void)
 	};
 	int ret;
 
-	kworker_fie = kthread_run_worker(0, "cppc_fie");
+	kworker_fie = kthread_run_worker("cppc_fie");
 	if (IS_ERR(kworker_fie)) {
 		pr_warn("%s: failed to create kworker_fie: %ld\n", __func__,
 			PTR_ERR(kworker_fie));
diff --git a/drivers/dpll/zl3073x/core.c b/drivers/dpll/zl3073x/core.c
index 63bd97181b9e..55d0ee934246 100644
--- a/drivers/dpll/zl3073x/core.c
+++ b/drivers/dpll/zl3073x/core.c
@@ -966,7 +966,7 @@ zl3073x_devm_dpll_init(struct zl3073x_dev *zldev, u8 num_dplls)
 
 	/* Initialize monitoring thread */
 	kthread_init_delayed_work(&zldev->work, zl3073x_dev_periodic_work);
-	kworker = kthread_run_worker(0, "zl3073x-%s", dev_name(zldev->dev));
+	kworker = kthread_run_worker("zl3073x-%s", dev_name(zldev->dev));
 	if (IS_ERR(kworker)) {
 		rc = PTR_ERR(kworker);
 		goto error;
diff --git a/drivers/gpu/drm/drm_vblank_work.c b/drivers/gpu/drm/drm_vblank_work.c
index 70f0199251ea..f5a95dc5bb05 100644
--- a/drivers/gpu/drm/drm_vblank_work.c
+++ b/drivers/gpu/drm/drm_vblank_work.c
@@ -279,9 +279,9 @@ int drm_vblank_worker_init(struct drm_vblank_crtc *vblank)
 
 	INIT_LIST_HEAD(&vblank->pending_work);
 	init_waitqueue_head(&vblank->work_wait_queue);
-	worker = kthread_run_worker(0, "card%d-crtc%d",
-				       vblank->dev->primary->index,
-				       vblank->pipe);
+	worker = kthread_run_worker("card%d-crtc%d",
+				    vblank->dev->primary->index,
+				    vblank->pipe);
 	if (IS_ERR(worker))
 		return PTR_ERR(worker);
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 9d405098f9e7..8b55eeeabe8c 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -369,8 +369,8 @@ static int live_parallel_switch(void *arg)
 		if (!data[n].ce[0])
 			continue;
 
-		worker = kthread_run_worker(0, "igt/parallel:%s",
-					       data[n].ce[0]->engine->name);
+		worker = kthread_run_worker("igt/parallel:%s",
+					    data[n].ce[0]->engine->name);
 		if (IS_ERR(worker)) {
 			err = PTR_ERR(worker);
 			goto out;
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 21e5ed9f72a3..a6edb922b7e2 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3577,7 +3577,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
 		arg[id].batch = NULL;
 		arg[id].count = 0;
 
-		worker[id] = kthread_run_worker(0, "igt/smoke:%d", id);
+		worker[id] = kthread_run_worker("igt/smoke:%d", id);
 		if (IS_ERR(worker[id])) {
 			err = PTR_ERR(worker[id]);
 			break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 00dfc37221fa..91a0ab9d6158 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1025,8 +1025,8 @@ static int __igt_reset_engines(struct intel_gt *gt,
 				threads[tmp].engine = other;
 				threads[tmp].flags = flags;
 
-				worker = kthread_run_worker(0, "igt/%s",
-							       other->name);
+				worker = kthread_run_worker("igt/%s",
+							    other->name);
 				if (IS_ERR(worker)) {
 					err = PTR_ERR(worker);
 					pr_err("[%s] Worker create failed: %d!\n",
diff --git a/drivers/gpu/drm/i915/gt/selftest_slpc.c b/drivers/gpu/drm/i915/gt/selftest_slpc.c
index c3c918248989..fb69773e89d4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_slpc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_slpc.c
@@ -504,7 +504,7 @@ static int live_slpc_tile_interaction(void *arg)
 		return -ENOMEM;
 
 	for_each_gt(gt, i915, i) {
-		threads[i].worker = kthread_run_worker(0, "igt/slpc_parallel:%d", gt->info.id);
+		threads[i].worker = kthread_run_worker("igt/slpc_parallel:%d", gt->info.id);
 
 		if (IS_ERR(threads[i].worker)) {
 			ret = PTR_ERR(threads[i].worker);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index e1a7c454a0a9..54b8f7be0bdd 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -493,7 +493,7 @@ static int mock_breadcrumbs_smoketest(void *arg)
 	for (n = 0; n < ncpus; n++) {
 		struct kthread_worker *worker;
 
-		worker = kthread_run_worker(0, "igt/%d", n);
+		worker = kthread_run_worker("igt/%d", n);
 		if (IS_ERR(worker)) {
 			ret = PTR_ERR(worker);
 			ncpus = n;
@@ -1646,8 +1646,8 @@ static int live_parallel_engines(void *arg)
 		for_each_uabi_engine(engine, i915) {
 			struct kthread_worker *worker;
 
-			worker = kthread_run_worker(0, "igt/parallel:%s",
-						       engine->name);
+			worker = kthread_run_worker("igt/parallel:%s",
+						    engine->name);
 			if (IS_ERR(worker)) {
 				err = PTR_ERR(worker);
 				break;
@@ -1805,7 +1805,7 @@ static int live_breadcrumbs_smoketest(void *arg)
 			unsigned int i = idx * ncpus + n;
 			struct kthread_worker *worker;
 
-			worker = kthread_run_worker(0, "igt/%d.%d", idx, n);
+			worker = kthread_run_worker("igt/%d.%d", idx, n);
 			if (IS_ERR(worker)) {
 				ret = PTR_ERR(worker);
 				goto out_flush;
@@ -3218,8 +3218,8 @@ static int perf_parallel_engines(void *arg)
 
 		memset(&engines[idx].p, 0, sizeof(engines[idx].p));
 
-		worker = kthread_run_worker(0, "igt:%s",
-					       engine->name);
+		worker = kthread_run_worker("igt:%s",
+					    engine->name);
 		if (IS_ERR(worker)) {
 			err = PTR_ERR(worker);
 			intel_engine_pm_put(engine);
diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
index d99771684728..87f8063b7390 100644
--- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
+++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c
@@ -109,7 +109,7 @@ int msm_disp_snapshot_init(struct drm_device *drm_dev)
 
 	mutex_init(&kms->dump_mutex);
 
-	kms->dump_worker = kthread_run_worker(0, "%s", "disp_snapshot");
+	kms->dump_worker = kthread_run_worker("%s", "disp_snapshot");
 	if (IS_ERR(kms->dump_worker))
 		DRM_ERROR("failed to create disp state task\n");
 
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index 87a91148a731..4c7d5fb0d914 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -115,7 +115,7 @@ int msm_atomic_init_pending_timer(struct msm_pending_timer *timer,
 	timer->kms = kms;
 	timer->crtc_idx = crtc_idx;
 
-	timer->worker = kthread_run_worker(0, "atomic-worker-%d", crtc_idx);
+	timer->worker = kthread_run_worker("atomic-worker-%d", crtc_idx);
 	if (IS_ERR(timer->worker)) {
 		int ret = PTR_ERR(timer->worker);
 		timer->worker = NULL;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 84d6c7f50c8d..7b5cf071d0f3 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -989,7 +989,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	gpu->funcs = funcs;
 	gpu->name = name;
 
-	gpu->worker = kthread_run_worker(0, "gpu-worker");
+	gpu->worker = kthread_run_worker("gpu-worker");
 	if (IS_ERR(gpu->worker)) {
 		ret = PTR_ERR(gpu->worker);
 		gpu->worker = NULL;
diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c
index e5d0ea629448..69df2b46402d 100644
--- a/drivers/gpu/drm/msm/msm_kms.c
+++ b/drivers/gpu/drm/msm/msm_kms.c
@@ -306,7 +306,7 @@ int msm_drm_kms_init(struct device *dev, const struct drm_driver *drv)
 		/* initialize event thread */
 		ev_thread = &kms->event_thread[drm_crtc_index(crtc)];
 		ev_thread->dev = ddev;
-		ev_thread->worker = kthread_run_worker(0, "crtc_event:%d", crtc->base.id);
+		ev_thread->worker = kthread_run_worker("crtc_event:%d", crtc->base.id);
 		if (IS_ERR(ev_thread->worker)) {
 			ret = PTR_ERR(ev_thread->worker);
 			DRM_DEV_ERROR(dev, "failed to create crtc_event kthread\n");
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
index 76d57c6b636a..fea52a23b8c2 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
@@ -342,7 +342,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
 		dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
 		hrtimer_setup(&dev->hrtimer, &wave5_vpu_timer_callback, CLOCK_MONOTONIC,
 			      HRTIMER_MODE_REL_PINNED);
-		dev->worker = kthread_run_worker(0, "vpu_irq_thread");
+		dev->worker = kthread_run_worker("vpu_irq_thread");
 		if (IS_ERR(dev->worker)) {
 			dev_err(&pdev->dev, "failed to create vpu irq worker\n");
 			ret = PTR_ERR(dev->worker);
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6fcd7181116a..a7a59e5e99a2 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -394,7 +394,7 @@ static int mv88e6xxx_irq_poll_setup(struct mv88e6xxx_chip *chip)
 	kthread_init_delayed_work(&chip->irq_poll_work,
 				  mv88e6xxx_irq_poll);
 
-	chip->kworker = kthread_run_worker(0, "%s", dev_name(chip->dev));
+	chip->kworker = kthread_run_worker("%s", dev_name(chip->dev));
 	if (IS_ERR(chip->kworker))
 		return PTR_ERR(chip->kworker);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 62f75701d652..8c03d14d8f83 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -3776,8 +3776,8 @@ static int ice_dpll_init_worker(struct ice_pf *pf)
 	struct kthread_worker *kworker;
 
 	kthread_init_delayed_work(&d->work, ice_dpll_periodic_work);
-	kworker = kthread_run_worker(0, "ice-dplls-%s",
-					dev_name(ice_pf_to_dev(pf)));
+	kworker = kthread_run_worker("ice-dplls-%s",
+				     dev_name(ice_pf_to_dev(pf)));
 	if (IS_ERR(kworker))
 		return PTR_ERR(kworker);
 	d->kworker = kworker;
diff --git a/drivers/net/ethernet/intel/ice/ice_gnss.c b/drivers/net/ethernet/intel/ice/ice_gnss.c
index 8fd954f1ebd6..b85a96d7cac8 100644
--- a/drivers/net/ethernet/intel/ice/ice_gnss.c
+++ b/drivers/net/ethernet/intel/ice/ice_gnss.c
@@ -182,7 +182,7 @@ static struct gnss_serial *ice_gnss_struct_init(struct ice_pf *pf)
 	pf->gnss_serial = gnss;
 
 	kthread_init_delayed_work(&gnss->read_work, ice_gnss_read);
-	kworker = kthread_run_worker(0, "ice-gnss-%s", dev_name(dev));
+	kworker = kthread_run_worker("ice-gnss-%s", dev_name(dev));
 	if (IS_ERR(kworker)) {
 		kfree(gnss);
 		return NULL;
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index 094e96219f45..cfc8daec3d50 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -3207,8 +3207,8 @@ static int ice_ptp_init_work(struct ice_pf *pf, struct ice_ptp *ptp)
 	/* Allocate a kworker for handling work required for the ports
 	 * connected to the PTP hardware clock.
 	 */
-	kworker = kthread_run_worker(0, "ice-ptp-%s",
-					dev_name(ice_pf_to_dev(pf)));
+	kworker = kthread_run_worker("ice-ptp-%s",
+				     dev_name(ice_pf_to_dev(pf)));
 	if (IS_ERR(kworker))
 		return PTR_ERR(kworker);
 
diff --git a/drivers/platform/chrome/cros_ec_spi.c b/drivers/platform/chrome/cros_ec_spi.c
index 28fa82f8cb07..0009659712ca 100644
--- a/drivers/platform/chrome/cros_ec_spi.c
+++ b/drivers/platform/chrome/cros_ec_spi.c
@@ -715,7 +715,7 @@ static int cros_ec_spi_devm_high_pri_alloc(struct device *dev,
 	int err;
 
 	ec_spi->high_pri_worker =
-		kthread_run_worker(0, "cros_ec_spi_high_pri");
+		kthread_run_worker("cros_ec_spi_high_pri");
 
 	if (IS_ERR(ec_spi->high_pri_worker)) {
 		err = PTR_ERR(ec_spi->high_pri_worker);
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index d6f54ccaf93b..b9811ccc9147 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -382,7 +382,7 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info *info,
 
 	if (ptp->info->do_aux_work) {
 		kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
-		ptp->kworker = kthread_run_worker(0, "ptp%d", ptp->index);
+		ptp->kworker = kthread_run_worker("ptp%d", ptp->index);
 		if (IS_ERR(ptp->kworker)) {
 			err = PTR_ERR(ptp->kworker);
 			pr_err("failed to create ptp aux_worker %d\n", err);
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 61f7bde8c7fb..c0a742290207 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2046,7 +2046,7 @@ static int spi_init_queue(struct spi_controller *ctlr)
 	ctlr->busy = false;
 	ctlr->queue_empty = true;
 
-	ctlr->kworker = kthread_run_worker(0, dev_name(&ctlr->dev));
+	ctlr->kworker = kthread_run_worker(dev_name(&ctlr->dev));
 	if (IS_ERR(ctlr->kworker)) {
 		dev_err(&ctlr->dev, "failed to create message pump kworker\n");
 		return PTR_ERR(ctlr->kworker);
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 7cea641b06b4..83a745e9b820 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -819,7 +819,7 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
 		return -EINVAL;
 
 	/* Allocate a kthread for asynchronous hw submit handler. */
-	video->kworker = kthread_run_worker(0, "UVCG");
+	video->kworker = kthread_run_worker("UVCG");
 	if (IS_ERR(video->kworker)) {
 		uvcg_err(&video->uvc->func, "failed to create UVCG kworker\n");
 		return PTR_ERR(video->kworker);
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 1d2f3af034c5..9d9b8c202ffb 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -7836,7 +7836,7 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
 	mutex_init(&port->lock);
 	mutex_init(&port->swap_lock);
 
-	port->wq = kthread_run_worker(0, dev_name(dev));
+	port->wq = kthread_run_worker(dev_name(dev));
 	if (IS_ERR(port->wq))
 		return ERR_CAST(port->wq);
 	sched_set_fifo(port->wq->task);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 8cb1cc2ea139..78434262bb49 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -229,8 +229,8 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	dev = &vdpasim->vdpa.dev;
 
 	kthread_init_work(&vdpasim->work, vdpasim_work_fn);
-	vdpasim->worker = kthread_run_worker(0, "vDPA sim worker: %s",
-						dev_attr->name);
+	vdpasim->worker = kthread_run_worker("vDPA sim worker: %s",
+					     dev_attr->name);
 	if (IS_ERR(vdpasim->worker))
 		goto err_iommu;
 
diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 834f65f4b59a..13fb68728022 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -1224,7 +1224,7 @@ int __init watchdog_dev_init(void)
 {
 	int err;
 
-	watchdog_kworker = kthread_run_worker(0, "watchdogd");
+	watchdog_kworker = kthread_run_worker("watchdogd");
 	if (IS_ERR(watchdog_kworker)) {
 		pr_err("Failed to create watchdog kworker\n");
 		return PTR_ERR(watchdog_kworker);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 3977e42b9516..2f68e2cf393a 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -309,7 +309,7 @@ static void erofs_destroy_percpu_workers(void)
 static struct kthread_worker *erofs_init_percpu_worker(int cpu)
 {
 	struct kthread_worker *worker =
-		kthread_run_worker_on_cpu(cpu, 0, "erofs_worker/%u");
+		kthread_run_worker_on_cpu(cpu, "erofs_worker/%u");
 
 	if (IS_ERR(worker))
 		return worker;
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a01a474719a7..2630791295ac 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -137,12 +137,7 @@ struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);
 void kthread_delayed_work_timer_fn(struct timer_list *t);
 
-enum {
-	KTW_FREEZABLE = 1 << 0, /* freeze during suspend */
-};
-
 struct kthread_worker {
-	unsigned int flags;
 	raw_spinlock_t lock;
 	struct list_head work_list;
 	struct list_head delayed_work_list;
@@ -207,39 +202,35 @@ extern void __kthread_init_worker(struct kthread_worker *worker,
 
 int kthread_worker_fn(void *worker_ptr);
 
-__printf(3, 4)
-struct kthread_worker *kthread_create_worker_on_node(unsigned int flags,
-						     int node,
+__printf(2, 3)
+struct kthread_worker *kthread_create_worker_on_node(int node,
 						     const char namefmt[], ...);
 
-#define kthread_create_worker(flags, namefmt, ...) \
-	kthread_create_worker_on_node(flags, NUMA_NO_NODE, namefmt, ## __VA_ARGS__);
+#define kthread_create_worker(namefmt, ...) \
+	kthread_create_worker_on_node(NUMA_NO_NODE, namefmt, ## __VA_ARGS__)
 
 /**
  * kthread_run_worker - create and wake a kthread worker.
- * @flags: flags modifying the default behavior of the worker
  * @namefmt: printf-style name for the thread.
  *
  * Description: Convenient wrapper for kthread_create_worker() followed by
  * wake_up_process(). Returns the kthread_worker or ERR_PTR(-ENOMEM).
  */
-#define kthread_run_worker(flags, namefmt, ...) \
+#define kthread_run_worker(namefmt, ...) \
 ({ \
 	struct kthread_worker *__kw \
-		= kthread_create_worker(flags, namefmt, ## __VA_ARGS__); \
+		= kthread_create_worker(namefmt, ## __VA_ARGS__); \
 	if (!IS_ERR(__kw)) \
 		wake_up_process(__kw->task); \
 	__kw; \
 })
 
 struct kthread_worker *
-kthread_create_worker_on_cpu(int cpu, unsigned int flags,
-			     const char namefmt[]);
+kthread_create_worker_on_cpu(int cpu, const char namefmt[]);
 
 /**
  * kthread_run_worker_on_cpu - create and wake a cpu bound kthread worker.
  * @cpu: CPU number
- * @flags: flags modifying the default behavior of the worker
  * @namefmt: printf-style name for the thread. Format is restricted
  *	     to "name.*%u". Code fills in cpu number.
  *
@@ -248,12 +239,11 @@ kthread_create_worker_on_cpu(int cpu, unsigned int flags,
  * ERR_PTR(-ENOMEM).
  */
 static inline struct kthread_worker *
-kthread_run_worker_on_cpu(int cpu, unsigned int flags,
-			  const char namefmt[])
+kthread_run_worker_on_cpu(int cpu, const char namefmt[])
 {
 	struct kthread_worker *kw;
 
-	kw = kthread_create_worker_on_cpu(cpu, flags, namefmt);
+	kw = kthread_create_worker_on_cpu(cpu, namefmt);
 	if (!IS_ERR(kw))
 		wake_up_process(kw->task);
 
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 84d535c7a635..4c60c8082126 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1020,9 +1020,6 @@ int kthread_worker_fn(void *worker_ptr)
 	WARN_ON(worker->task && worker->task != current);
 	worker->task = current;
 
-	if (worker->flags & KTW_FREEZABLE)
-		set_freezable();
-
 repeat:
 	set_current_state(TASK_INTERRUPTIBLE);	/* mb paired w/ kthread_stop */
 
@@ -1073,7 +1070,6 @@ EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
 /**
  * kthread_create_worker_on_node - create a kthread worker
- * @flags: flags modifying the default behavior of the worker
  * @node: task structure for the thread is allocated on this node
  * @namefmt: printf-style name for the
kthread worker (task). * @@ -1082,7 +1078,7 @@ EXPORT_SYMBOL_GPL(kthread_worker_fn); * when the caller was killed by a fatal signal. */ struct kthread_worker * -kthread_create_worker_on_node(unsigned int flags, int node, const char nam= efmt[], ...) +kthread_create_worker_on_node(int node, const char namefmt[], ...) { struct kthread_create_info info =3D { .node =3D node, @@ -1100,7 +1096,6 @@ kthread_create_worker_on_node(unsigned int flags, int= node, const char namefmt[] return ERR_CAST(task); =20 worker =3D kthread_data(task); - worker->flags =3D flags; worker->task =3D task; return worker; } @@ -1110,7 +1105,6 @@ EXPORT_SYMBOL(kthread_create_worker_on_node); * kthread_create_worker_on_cpu - create a kthread worker and bind it * to a given CPU and the associated NUMA node. * @cpu: CPU number - * @flags: flags modifying the default behavior of the worker * @namefmt: printf-style name for the thread. Format is restricted * to "name.*%u". Code fills in cpu number. * @@ -1143,12 +1137,11 @@ EXPORT_SYMBOL(kthread_create_worker_on_node); * when the caller was killed by a fatal signal. 
*/ struct kthread_worker * -kthread_create_worker_on_cpu(int cpu, unsigned int flags, - const char namefmt[]) +kthread_create_worker_on_cpu(int cpu, const char namefmt[]) { struct kthread_worker *worker; =20 - worker =3D kthread_create_worker_on_node(flags, cpu_to_node(cpu), namefmt= , cpu); + worker =3D kthread_create_worker_on_node(cpu_to_node(cpu), namefmt, cpu); if (!IS_ERR(worker)) kthread_bind(worker->task, cpu); =20 diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 55df6d37145e..7d8c6de2a232 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4186,7 +4186,7 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_n= ode *rnp) if (rnp->exp_kworker) return; =20 - kworker =3D kthread_create_worker(0, name, rnp_index); + kworker =3D kthread_create_worker(name, rnp_index); if (IS_ERR_OR_NULL(kworker)) { pr_err("Failed to create par gp kworker on %d/%d\n", rnp->grplo, rnp->grphi); @@ -4206,7 +4206,7 @@ static void __init rcu_start_exp_gp_kworker(void) const char *name =3D "rcu_exp_gp_kthread_worker"; struct sched_param param =3D { .sched_priority =3D kthread_prio }; =20 - rcu_exp_gp_kworker =3D kthread_run_worker(0, name); + rcu_exp_gp_kworker =3D kthread_run_worker(name); if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) { pr_err("Failed to create %s!\n", name); rcu_exp_gp_kworker =3D NULL; diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 62b1f3ac5630..4d2fd73de353 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -4863,7 +4863,7 @@ static struct scx_sched *scx_alloc_and_add_sched(stru= ct sched_ext_ops *ops) goto err_free_gdsqs; } =20 - sch->helper =3D kthread_run_worker(0, "sched_ext_helper"); + sch->helper =3D kthread_run_worker("sched_ext_helper"); if (IS_ERR(sch->helper)) { ret =3D PTR_ERR(sch->helper); goto err_free_pcpu; diff --git a/kernel/workqueue.c b/kernel/workqueue.c index aeaec79bc09c..3670ea197327 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -7954,7 +7954,7 @@ static void __init 
wq_cpu_intensive_thresh_init(void) unsigned long thresh; unsigned long bogo; =20 - pwq_release_worker =3D kthread_run_worker(0, "pool_workqueue_release"); + pwq_release_worker =3D kthread_run_worker("pool_workqueue_release"); BUG_ON(IS_ERR(pwq_release_worker)); =20 /* if the user set it to a specific value, keep it */ diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index d2475c3bbb7d..5285a076476c 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -66,8 +66,8 @@ static int ksz_connect(struct dsa_switch *ds) if (!priv) return -ENOMEM; =20 - xmit_worker =3D kthread_run_worker(0, "dsa%d:%d_xmit", - ds->dst->index, ds->index); + xmit_worker =3D kthread_run_worker("dsa%d:%d_xmit", + ds->dst->index, ds->index); if (IS_ERR(xmit_worker)) { ret =3D PTR_ERR(xmit_worker); kfree(priv); diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c index e89d9254e90a..c3d294a5149e 100644 --- a/net/dsa/tag_ocelot_8021q.c +++ b/net/dsa/tag_ocelot_8021q.c @@ -110,7 +110,7 @@ static int ocelot_connect(struct dsa_switch *ds) if (!priv) return -ENOMEM; =20 - priv->xmit_worker =3D kthread_run_worker(0, "felix_xmit"); + priv->xmit_worker =3D kthread_run_worker("felix_xmit"); if (IS_ERR(priv->xmit_worker)) { err =3D PTR_ERR(priv->xmit_worker); kfree(priv); diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c index de6d4ce8668b..50c7f8fe7a5e 100644 --- a/net/dsa/tag_sja1105.c +++ b/net/dsa/tag_sja1105.c @@ -707,8 +707,8 @@ static int sja1105_connect(struct dsa_switch *ds) =20 spin_lock_init(&priv->meta_lock); =20 - xmit_worker =3D kthread_run_worker(0, "dsa%d:%d_xmit", - ds->dst->index, ds->index); + xmit_worker =3D kthread_run_worker("dsa%d:%d_xmit", + ds->dst->index, ds->index); if (IS_ERR(xmit_worker)) { err =3D PTR_ERR(xmit_worker); kfree(priv); --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No 
client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0FC9335BDCB; Tue, 3 Mar 2026 13:49:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545769; cv=none; b=ueVPNPzT11VN+is64JTV8A+MXBHF2E1bvTSqA39KkHoN/5JNil8tCxfhhNsM32VEHe9AxEFU1PEGyYzrHhRNIdAbyFY6j3UrBbh9YtvSdUWZOtHApCR+w9l2nSSBt6rVr5WscCRx2zw9b9QDfIushVAdtp7A6YfPIlJYvQ8QuLg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545769; c=relaxed/simple; bh=/f8YATYhS0KcmnbHrVL7/juOKyb+2GwTV5/gywiNp5o=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Gz35jA+nh5ptPOohJBZj9Vqa8oIrpdMXuTu5TR+JDfy6PKWS4lqQuhMDTia1Nxx9eoUf4z7YVV+ITKHJKzXDKi6M7r5tsiDz4GH5ZfZ//lN3ntRmqU5tuxZUVr17gao589gPet5GdwsyOjMycEcBBr0fYHhC2IZFQlu1rMPekt4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NA8BjHMy; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NA8BjHMy" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 65797C116C6; Tue, 3 Mar 2026 13:49:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545768; bh=/f8YATYhS0KcmnbHrVL7/juOKyb+2GwTV5/gywiNp5o=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=NA8BjHMym9Yb2haF8WM9w0xcIXLD5G2JVRqS38O8cenvLpqLGQSfRUvDEl+/ANEBL TWAa7GYUpx4rpMVEpW0BTh/6qGd/Li55L7VxyxyLJORmeAVmZ8Aurp7uOKbUyl4pi0 uv8usSYsYtSLI81zUjeTMHw39KZVRV2JwvzExLf0S0Q0RIXsEM0AwZQ63JgkWEwZR2 mP46lt8V2EGXfK2XeY3hHe+wFCpGACEbghMxVfE0go1QuHUuFOjmuMRYoQOj5i6GZw UXiBD1el/o79JbhTWZCuKha8q0xPr7uvHUkqLJWCUPay5wuL9c2K9g8ByY3p8AsAtf KVRrNEcW48EPg== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:14 +0100 Subject: [PATCH RFC DRAFT POC 03/11] kthread: add 
extensible kthread_create()/kthread_run() pattern Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-3-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=9767; i=brauner@kernel.org; h=from:subject:message-id; bh=/f8YATYhS0KcmnbHrVL7/juOKyb+2GwTV5/gywiNp5o=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3aPaf7B6+kBrY0Tzjrf0l/tkvSl8o1e+qwKw5vZ9 onC/2fv7ihlYRDjYpAVU2RxaDcJl1vOU7HZKFMDZg4rE8gQBi5OAZgIy1KGv1Ityxbyvr2/cRej +fsVh7d+vmP0NqFHwblko0cGn0x7rjjDP4t+X7c5+df3mO81MthsN/VqyN2Nl0qFdhQw2eme65g lwQMA X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 This is similar to what I did for kmem_cache_create() in b2e7456b5c25 ("slab: create kmem_cache_create() compatibility layer"). Instead of piling on new variants of the functions, add a struct kthread_args variant that just passes the relevant parameters. Signed-off-by: Christian Brauner --- include/linux/kthread.h | 69 +++++++++++++++++++++++++++------------- kernel/kthread.c | 83 +++++++++++++++++++++++++++++++++++++++++----= ---- 2 files changed, 118 insertions(+), 34 deletions(-) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 2630791295ac..972cb2960b61 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -25,26 +25,53 @@ static inline struct kthread *tsk_is_kthread(struct tas= k_struct *p) return NULL; } =20 +/** + * struct kthread_args - kthread creation parameters.
+ * @threadfn: the function to run in the kthread. + * @data: data pointer passed to @threadfn. + * @node: NUMA node for stack/task allocation (NUMA_NO_NODE for any). + * @kthread_worker: set to 1 to create a kthread worker. + * + * Pass a pointer to this struct as the first argument of kthread_create() + * or kthread_run() to use the struct-based creation path. Legacy callers + * that pass a function pointer as the first argument continue to work + * unchanged via _Generic dispatch. + */ +struct kthread_args { + int (*threadfn)(void *data); + void *data; + int node; + u32 kthread_worker:1; +}; + __printf(4, 5) struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), void *data, int node, const char namefmt[], ...); =20 +__printf(2, 3) +struct task_struct *kthread_create_on_info(struct kthread_args *kargs, + const char namefmt[], ...); + +__printf(3, 4) +struct task_struct *__kthread_create(int (*threadfn)(void *data), + void *data, + const char namefmt[], ...); + /** - * kthread_create - create a kthread on the current node - * @threadfn: the function to run in the thread - * @data: data pointer for @threadfn() - * @namefmt: printf-style format string for the thread name - * @arg: arguments for @namefmt. + * kthread_create - create a kthread on the current node. + * @first: either a function pointer (legacy) or a &struct kthread_args + * pointer (struct-based). * - * This macro will create a kthread on the current node, leaving it in - * the stopped state. This is just a helper for kthread_create_on_node(); - * see the documentation there for more details. + * _Generic dispatch: when @first is a &struct kthread_args pointer the + * call is forwarded to kthread_create_on_info(); otherwise it goes through + * __kthread_create() which wraps kthread_create_on_node() with NUMA_NO_NO= DE. */ -#define kthread_create(threadfn, data, namefmt, arg...) 
\ - kthread_create_on_node(threadfn, data, NUMA_NO_NODE, namefmt, ##arg) - +#define kthread_create(__first, ...) \ + _Generic((__first), \ + struct kthread_args *: kthread_create_on_info, \ + default: __kthread_create)(__first, __VA_ARGS__) =20 struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data), void *data, @@ -59,20 +86,20 @@ bool kthread_is_per_cpu(struct task_struct *k); =20 /** * kthread_run - create and wake a thread. - * @threadfn: the function to run until signal_pending(current). - * @data: data ptr for @threadfn. - * @namefmt: printf-style name for the thread. + * @first: either a function pointer (legacy) or a &struct kthread_args + * pointer (struct-based). Remaining arguments are forwarded to + * kthread_create(). * * Description: Convenient wrapper for kthread_create() followed by * wake_up_process(). Returns the kthread or ERR_PTR(-ENOMEM). */ -#define kthread_run(threadfn, data, namefmt, ...) \ -({ \ - struct task_struct *__k \ - =3D kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \ - if (!IS_ERR(__k)) \ - wake_up_process(__k); \ - __k; \ +#define kthread_run(__first, ...) \ +({ \ + struct task_struct *__k \ + =3D kthread_create(__first, __VA_ARGS__); \ + if (!IS_ERR(__k)) \ + wake_up_process(__k); \ + __k; \ }) =20 /** diff --git a/kernel/kthread.c b/kernel/kthread.c index 4c60c8082126..20ec96142ce6 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -38,8 +38,7 @@ struct task_struct *kthreadd_task; static LIST_HEAD(kthread_affinity_list); static DEFINE_MUTEX(kthread_affinity_lock); =20 -struct kthread_create_info -{ +struct kthread_create_req { /* Information passed to kthread() from kthreadd. 
*/ char *full_name; int (*threadfn)(void *data); @@ -382,7 +381,7 @@ static int kthread(void *_create) { static const struct sched_param param =3D { .sched_priority =3D 0 }; /* Copy data: it's on kthread's stack */ - struct kthread_create_info *create =3D _create; + struct kthread_create_req *create =3D _create; int (*threadfn)(void *data) =3D create->threadfn; void *data =3D create->data; struct completion *done; @@ -449,7 +448,7 @@ int tsk_fork_get_node(struct task_struct *tsk) return NUMA_NO_NODE; } =20 -static void create_kthread(struct kthread_create_info *create) +static void create_kthread(struct kthread_create_req *create) { int pid; struct kernel_clone_args args =3D { @@ -480,20 +479,23 @@ static void create_kthread(struct kthread_create_info= *create) } } =20 -static struct task_struct *__kthread_create_on_node(const struct kthread_c= reate_info *info, +static struct task_struct *__kthread_create_on_node(const struct kthread_a= rgs *kargs, const char namefmt[], va_list args) { DECLARE_COMPLETION_ONSTACK(done); struct kthread_worker *worker =3D NULL; struct task_struct *task; - struct kthread_create_info *create; + struct kthread_create_req *create; =20 create =3D kmalloc_obj(*create); if (!create) return ERR_PTR(-ENOMEM); =20 - *create =3D *info; + create->threadfn =3D kargs->threadfn; + create->data =3D kargs->data; + create->node =3D kargs->node; + create->kthread_worker =3D kargs->kthread_worker; =20 if (create->kthread_worker) { worker =3D kzalloc_obj(*worker); @@ -573,7 +575,7 @@ struct task_struct *kthread_create_on_node(int (*thread= fn)(void *data), const char namefmt[], ...) 
{ - struct kthread_create_info info =3D { + struct kthread_args kargs =3D { .threadfn =3D threadfn, .data =3D data, .node =3D node, @@ -582,13 +584,68 @@ struct task_struct *kthread_create_on_node(int (*thre= adfn)(void *data), va_list args; =20 va_start(args, namefmt); - task =3D __kthread_create_on_node(&info, namefmt, args); + task =3D __kthread_create_on_node(&kargs, namefmt, args); va_end(args); =20 return task; } EXPORT_SYMBOL(kthread_create_on_node); =20 +/** + * kthread_create_on_info - create a kthread from a struct kthread_args. + * @kargs: kthread creation parameters. + * @namefmt: printf-style name for the thread. + * + * This is the struct-based kthread creation path, dispatched via the + * kthread_create() _Generic macro when the first argument is a + * &struct kthread_args pointer. + * + * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR). + */ +struct task_struct *kthread_create_on_info(struct kthread_args *kargs, + const char namefmt[], ...) +{ + struct task_struct *task; + va_list args; + + va_start(args, namefmt); + task =3D __kthread_create_on_node(kargs, namefmt, args); + va_end(args); + + return task; +} +EXPORT_SYMBOL(kthread_create_on_info); + +/** + * __kthread_create - create a kthread (legacy positional-argument path). + * @threadfn: the function to run until signal_pending(current). + * @data: data ptr for @threadfn. + * @namefmt: printf-style name for the thread. + * + * _Generic dispatch target for kthread_create() when the first argument + * is a function pointer rather than a &struct kthread_args. + * + * Returns a task_struct or ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR). + */ +struct task_struct *__kthread_create(int (*threadfn)(void *data), + void *data, const char namefmt[], ...) 
+{ + struct kthread_args kargs =3D { + .threadfn =3D threadfn, + .data =3D data, + .node =3D NUMA_NO_NODE, + }; + struct task_struct *task; + va_list args; + + va_start(args, namefmt); + task =3D __kthread_create_on_node(&kargs, namefmt, args); + va_end(args); + + return task; +} +EXPORT_SYMBOL(__kthread_create); + static void __kthread_bind_mask(struct task_struct *p, const struct cpumas= k *mask, unsigned int state) { if (!wait_task_inactive(p, state)) { @@ -833,10 +890,10 @@ int kthreadd(void *unused) =20 spin_lock(&kthread_create_lock); while (!list_empty(&kthread_create_list)) { - struct kthread_create_info *create; + struct kthread_create_req *create; =20 create =3D list_entry(kthread_create_list.next, - struct kthread_create_info, list); + struct kthread_create_req, list); list_del_init(&create->list); spin_unlock(&kthread_create_lock); =20 @@ -1080,7 +1137,7 @@ EXPORT_SYMBOL_GPL(kthread_worker_fn); struct kthread_worker * kthread_create_worker_on_node(int node, const char namefmt[], ...) { - struct kthread_create_info info =3D { + struct kthread_args kargs =3D { .node =3D node, .kthread_worker =3D 1, }; @@ -1089,7 +1146,7 @@ kthread_create_worker_on_node(int node, const char na= mefmt[], ...) 
va_list args; =20 va_start(args, namefmt); - task =3D __kthread_create_on_node(&info, namefmt, args); + task =3D __kthread_create_on_node(&kargs, namefmt, args); va_end(args); =20 if (IS_ERR(task)) --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 781DE35D5E2; Tue, 3 Mar 2026 13:49:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545771; cv=none; b=aMv8OyluWDEv8Y2nRl3Avte5p7Og93qThGwYvRXBtrQ7XbwU7GSH5oiUPgqftHLkKudxZLwP4rGaRXT4dA5vd5QOEei/DP6qUHNPeD/UAu2+p+Ue4o5FupUou8mNFDkOH+BhuyOK/1tBBaAxQ3Lp0SIa8eIpQ58KxOuQoeyeT2U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545771; c=relaxed/simple; bh=2s3ijDSMxoTDVKzFWvkN9Jlo/jKZ0E04ThBDqqjO0fg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=rxhTFQurZZ+eGCp2B7ip+2KxYpQ/hgRNG+4kN5S81VF3OT+DSO1FSOER1nQR6VfkTwmzddMHPksnx3U++pSS5Fvn8MG0mL8gUCWcqtzSWWm0X3Q6r491GC0FlLx65QQBHeKZk8naTJPrFZ9HJNg4pRfSdzWTTs7w+RappC4JgVM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=MarP8QcJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MarP8QcJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D3D41C2BC9E; Tue, 3 Mar 2026 13:49:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545771; bh=2s3ijDSMxoTDVKzFWvkN9Jlo/jKZ0E04ThBDqqjO0fg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; 
b=MarP8QcJaq/s30v4+pGU45MYOTVYQ3wyEUF7c6ZER2/qsRxCOzpbt2a1H+M2benVc qrHFgPuGNuVZWK60moeglRdh/bFv6lFeuSqPlNGV+V9MaZdViGHpF1TqSmVYqIWo8l 9AKoysRyOjVN6cnTCe9ce/3/WicnakjwvONVLx8OZoU+FjbBDHXR4aRbaH86hhZ69w V2p2wjDCeG4MOg8A9B3KBwHNUnDXt/OJs1fmekTYu5PoWHIbItkJARDAXWmQUU3WwV i4VsA6pDljLJP6/60MkO+voPXVrzs8SWiqakD88gx8llbezypwAC5+coak7UVQEysz 6XhMfOtE38i6w== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:15 +0100 Subject: [PATCH RFC DRAFT POC 04/11] fs: notice when init abandons fs sharing Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-4-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=3873; i=brauner@kernel.org; h=from:subject:message-id; bh=2s3ijDSMxoTDVKzFWvkN9Jlo/jKZ0E04ThBDqqjO0fg=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3Yv51LyFv6kZe16MvF5M8XupOq6R1+bdFrIZNOtP xn+c2yZO0pZGMS4GGTFFFkc2k3C5ZbzVGw2ytSAmcPKBDKEgYtTACby5ysjQ1O8FzO/uvyKtX9W 7ZMwjml13Rp96kJrQ/wtqQ+7nT8JzmZkuOMr8HvDp5U8vxh6xPKCf53Z+3P5kb/mD9auZV7nte3 YFC4A X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 PID 1 may choose to stop sharing fs_struct state with us, either via unshare(CLONE_FS) or unshare(CLONE_NEWNS). Of course, PID 1 could have chosen to create arbitrary process trees that all share fs_struct state via CLONE_FS. This is a strong statement: We only care about PID 1 aka the thread-group leader so subthread's fs_struct state doesn't matter. PID 1 unsharing fs_struct state is a bug.
PID 1 relies on various kthreads to be able to perform work based on its fs_struct state. Breaking that contract sucks for both sides. So just don't bother with extra work for this. No sane init system should ever do this. Signed-off-by: Christian Brauner --- fs/fs_struct.c | 43 +++++++++++++++++++++++++++++++++++++++++++ include/linux/fs_struct.h | 2 ++ kernel/fork.c | 14 +++----------- 3 files changed, 48 insertions(+), 11 deletions(-) diff --git a/fs/fs_struct.c b/fs/fs_struct.c index 394875d06fd6..ab6826d7a6a9 100644 --- a/fs/fs_struct.c +++ b/fs/fs_struct.c @@ -147,6 +147,49 @@ int unshare_fs_struct(void) } EXPORT_SYMBOL_GPL(unshare_fs_struct); =20 +/* + * PID 1 may choose to stop sharing fs_struct state with us. + * Either via unshare(CLONE_FS) or unshare(CLONE_NEWNS). Of + * course, PID 1 could have chosen to create arbitrary process + * trees that all share fs_struct state via CLONE_FS. This is a + * strong statement: We only care about PID 1 aka the thread-group + * leader so subthread's fs_struct state doesn't matter. + * + * PID 1 unsharing fs_struct state is a bug. PID 1 relies on + * various kthreads to be able to perform work based on its + * fs_struct state. Breaking that contract sucks for both sides. + * So just don't bother with extra work for this. No sane init + * system should ever do this.
+ */ +static inline bool nullfs_userspace_init(void) +{ + struct fs_struct *fs =3D current->fs; + + if (unlikely(current->pid =3D=3D 1) && fs !=3D &init_fs) { + pr_warn("VFS: Pid 1 stopped sharing filesystem state\n"); + return true; + } + + return false; +} + +struct fs_struct *switch_fs_struct(struct fs_struct *new_fs) +{ + struct fs_struct *fs; + + fs =3D current->fs; + read_seqlock_excl(&fs->seq); + current->fs =3D new_fs; + if (--fs->users) + new_fs =3D NULL; + else + new_fs =3D fs; + read_sequnlock_excl(&fs->seq); + + nullfs_userspace_init(); + return new_fs; +} + /* to be mentioned only in INIT_TASK */ struct fs_struct init_fs =3D { .users =3D 1, diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h index 0070764b790a..ade459383f92 100644 --- a/include/linux/fs_struct.h +++ b/include/linux/fs_struct.h @@ -40,6 +40,8 @@ static inline void get_fs_pwd(struct fs_struct *fs, struc= t path *pwd) read_sequnlock_excl(&fs->seq); } =20 +struct fs_struct *switch_fs_struct(struct fs_struct *new_fs); + extern bool current_chrooted(void); =20 static inline int current_umask(void) diff --git a/kernel/fork.c b/kernel/fork.c index 65113a304518..583078c69bbd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3123,7 +3123,7 @@ static int unshare_fd(unsigned long unshare_flags, st= ruct files_struct **new_fdp */ int ksys_unshare(unsigned long unshare_flags) { - struct fs_struct *fs, *new_fs =3D NULL; + struct fs_struct *new_fs =3D NULL; struct files_struct *new_fd =3D NULL; struct cred *new_cred =3D NULL; struct nsproxy *new_nsproxy =3D NULL; @@ -3200,16 +3200,8 @@ int ksys_unshare(unsigned long unshare_flags) =20 task_lock(current); =20 - if (new_fs) { - fs =3D current->fs; - read_seqlock_excl(&fs->seq); - current->fs =3D new_fs; - if (--fs->users) - new_fs =3D NULL; - else - new_fs =3D fs; - read_sequnlock_excl(&fs->seq); - } + if (new_fs) + new_fs =3D switch_fs_struct(new_fs); =20 if (new_fd) swap(current->files, new_fd); --=20 2.47.3 From nobody Thu Apr 9 
17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1ABC35C181; Tue, 3 Mar 2026 13:49:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545774; cv=none; b=mTaUszaynMKTARULP+hqGFDo8yQF2BLW8320Ucd2u8U9b+uab0ODBjC76vdd3FQ8wA9gMwry/rdwG5g7RH9SluRt/fTp0BpySPEIhzJoyC72MDeaKYAX7znQw0LVrgZXVh9EP+ILz8KCNoHFiicllIFi1vmaAvTrn8bkfPhF2Mg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545774; c=relaxed/simple; bh=Kp7R1oLP8DpihtIuEU4nA7CAO+wMVu/giG6qsMQq6nY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=PiFZZJ6O9HuN2guvwk/f42xHVXq2zMI1Hxirg97KmWu84uy73fjEyUe9l0ukERXJbavUkMtKq4qLiEfg6WaUbKSkwGLCO7CJDmedSzvaPDlgeVIWovCjzEHyzDBgd9174J3KxeTqP0UalFbuesqydhplOTpKVqCBw++zUAlJF5Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RifPClnR; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RifPClnR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 6B4F4C116C6; Tue, 3 Mar 2026 13:49:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545773; bh=Kp7R1oLP8DpihtIuEU4nA7CAO+wMVu/giG6qsMQq6nY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=RifPClnRKQYstvijQswg9rYEDpTcV3ujtDZ6smdNHs+WmoAychNwIIKFQtgj1x5J1 uylv1V9Ggt/hZYvB0vFKidjKRQ1aTlssnOa/99SCz04SPMV9pr8+Vq3ftdYWWhWmLY 4NO+2ObuSWEGDC2ADlBXglRJXNJLk17zK4EjoGJig4qawHlMlgEg06M3x4ZCsGn97D StMDUn15GCU4beWgE3v7+gFemKdgN5ds+p+HyfHrUxpTJiDVhFmWzIEV1Eshus/zTI 
9sJjZJXp3feF0m/NEkW3vQO8uW8nKDg9lT63Kd3CK2VvP7rjJ4X7sSyodkPW4acmhL GysrsiEwjTR1w== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:16 +0100 Subject: [PATCH RFC DRAFT POC 05/11] fs: add LOOKUP_IN_INIT Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-5-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=2453; i=brauner@kernel.org; h=from:subject:message-id; bh=Kp7R1oLP8DpihtIuEU4nA7CAO+wMVu/giG6qsMQq6nY=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3bvsf7cvyym0r/n13Z9n2iVLrJugegnIwaPrcocr zY+9p56p6OUhUGMi0FWTJHFod0kXG45T8Vmo0wNmDmsTCBDGLg4BWAiLDsZGSZOvGz3df/m199y vjx4V3Qi5Iqk7u5pVzz2OvJnS4eX/7nA8IcjgGF5ne6kvjuq23Ytjfr9VyZlpcvO07fXmfRLaUf 0ynIDAA== X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Add a new LOOKUP_IN_INIT flag that causes the lookup to be performed relative to userspace init's root or working directory. This will be used to force kthreads to be isolated in nullfs and to explicitly opt in to looking up paths in init's filesystem state.
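As a rough userspace illustration of the selection logic (not part of this patch): a lookup starts from init's filesystem state when the flag is set and from the caller's own state otherwise. All toy_* names and the struct layout are made up for this sketch; only the bit position (BIT(29)) and the init_fs versus current->fs choice are taken from the diff.

```c
#include <assert.h>

/* Toy stand-in for struct fs_struct; not the kernel definition. */
struct toy_fs {
	const char *root;	/* where absolute paths start */
	const char *pwd;	/* where AT_FDCWD-relative paths start */
};

/* Same bit position as LOOKUP_IN_INIT in the patch (BIT(29)). */
#define TOY_LOOKUP_IN_INIT (1u << 29)

/* Plays the role of init_fs, i.e. userspace init's filesystem state. */
static const struct toy_fs toy_init_fs = { "/", "/" };

/*
 * Choose the fs state a lookup starts from: init's when the flag is
 * set, otherwise the caller's own.
 */
static const struct toy_fs *toy_pick_fs(unsigned int flags,
					const struct toy_fs *current_fs)
{
	if (flags & TOY_LOOKUP_IN_INIT)
		return &toy_init_fs;
	return current_fs;
}

/* Self-check: a thread rooted elsewhere sees init's state only on opt-in. */
static int toy_selfcheck(void)
{
	const struct toy_fs kthread_fs = { "nullfs", "nullfs" };

	assert(toy_pick_fs(0, &kthread_fs) == &kthread_fs);
	assert(toy_pick_fs(TOY_LOOKUP_IN_INIT, &kthread_fs) == &toy_init_fs);
	return 0;
}
```

The point of the sketch is that the default stays the caller's own state, so a kthread isolated in nullfs reaches init's root or working directory only when the caller passes the flag deliberately.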
Signed-off-by: Christian Brauner --- fs/namei.c | 17 ++++++++++++++--- include/linux/namei.h | 3 ++- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 58f715f7657e..dd2710d5f5df 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1099,7 +1099,12 @@ static int complete_walk(struct nameidata *nd) =20 static int set_root(struct nameidata *nd) { - struct fs_struct *fs =3D current->fs; + struct fs_struct *fs; + + if (nd->flags & LOOKUP_IN_INIT) + fs =3D &init_fs; + else + fs =3D current->fs; =20 /* * Jumping to the real root in a scoped-lookup is a BUG in namei, but we @@ -2716,8 +2721,14 @@ static const char *path_init(struct nameidata *nd, u= nsigned flags) =20 /* Relative pathname -- get the starting-point it is relative to. */ if (nd->dfd =3D=3D AT_FDCWD) { + struct fs_struct *fs; + + if (nd->flags & LOOKUP_IN_INIT) + fs =3D &init_fs; + else + fs =3D current->fs; + if (flags & LOOKUP_RCU) { - struct fs_struct *fs =3D current->fs; unsigned seq; =20 do { @@ -2727,7 +2738,7 @@ static const char *path_init(struct nameidata *nd, un= signed flags) nd->seq =3D __read_seqcount_begin(&nd->path.dentry->d_seq); } while (read_seqretry(&fs->seq, seq)); } else { - get_fs_pwd(current->fs, &nd->path); + get_fs_pwd(fs, &nd->path); nd->inode =3D nd->path.dentry->d_inode; } } else { diff --git a/include/linux/namei.h b/include/linux/namei.h index 58600cf234bc..072533ec367b 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -46,9 +46,10 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT}; #define LOOKUP_NO_XDEV BIT(26) /* No mountpoint crossing. */ #define LOOKUP_BENEATH BIT(27) /* No escaping from starting point. */ #define LOOKUP_IN_ROOT BIT(28) /* Treat dirfd as fs root. */ +#define LOOKUP_IN_INIT BIT(29) /* Lookup in init's namespace. */ /* LOOKUP_* flags which do scope-related checks based on the dirfd. 
*/ #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT) -/* 3 spare bits for scoping */ +/* 2 spare bits for scoping */ =20 extern int path_pts(struct path *path); =20 --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A05C358393; Tue, 3 Mar 2026 13:49:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545776; cv=none; b=Cyo4pYQYLUf7iUB/I6qksA6iQ+3lQUWR8P0xXu0Gw45S7pLRdNxsthUNMxgoY/WZFMo308B3O3tSV9KnY0/8gJ+QVf7IDZsawg9BP40Xaf7FRpH1+IodwIfbjh2SOfFP6cszl6Fq+8Uwcfq6OxB0VQBIdX2Yn+9ojBBET/H2kRA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545776; c=relaxed/simple; bh=h1wcd65W5zs2eQUWT7gXxS1QFuH0m7DMN6iquDR9cEQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=JL3or9JykTGo5iIuhwuqkNceReL3zab0uJvGf+Yogr3NqlSMJ222yvQKy7bymm64uRXHwmpBhVW4pIcU9adnRGb+V7f2XTOY85qu88eQagSgbewL68uSg/Sh+UUWRvGCjCZWFH7OvTNTSM0HsLOeYInbt5MepA1gxHwCR99y61Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JIIw2wAB; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="JIIw2wAB" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B835DC19422; Tue, 3 Mar 2026 13:49:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545775; bh=h1wcd65W5zs2eQUWT7gXxS1QFuH0m7DMN6iquDR9cEQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=JIIw2wABtVNItboNQ0RYihXT/bfcrwy+i9dXNGc0aLX2w/N8wW0KHf2O4FabvWi1R 
zbARXVU91ISnbWbPbjqD+/rufBRtclUsxXRUjIe5k3yIvVZfpXjDbWWK3PuNwL8+W1 wuEG42FUzA54RFcbFutjY68wnLXyeNQLK+mus4xArwExgVaYuL0GVo6r/eDG3HPpwo fYSWprGuw4gv/FUdXMmNLA5/zhu68pAwv0QmCDup0tcYP5AJl8/9dG43KfWt1yQ48l ub9jht1jfFaakRv4O6XhyHsxn5k+wnnVErnA2ROwVh397mySVuJZVFGaSVf8leypd4 IBIUnivXy8GpQ== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:17 +0100 Subject: [PATCH RFC DRAFT POC 06/11] fs: add file_open_init() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-6-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=2156; i=brauner@kernel.org; h=from:subject:message-id; bh=h1wcd65W5zs2eQUWT7gXxS1QFuH0m7DMN6iquDR9cEQ=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3ZPmFfwWX7tcZszeYoPP/bbaTZFhl1bpFy5ZP25e 29Ohy5M7ShlYRDjYpAVU2RxaDcJl1vOU7HZKFMDZg4rE8gQBi5OAZjIyZkMf6WFp2VeLPe7522y L/pO8j7Hvr1FXS/PaTt8WTtFk++v6HGG/+VHb8hPrr7vIBA5qbZ+fsVX3UWCOUvLlY6IJJi9tg2 eyQYA X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Add a helper to allow the few users that need it to open a file in init's fs_struct from a kernel thread. 
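The choice of fs_struct that this helper relies on can be modelled in userspace. The sketch below is not kernel code: `struct model_fs`, `model_init_fs` and `pick_fs()` are hypothetical stand-ins invented for illustration. Only the LOOKUP_IN_INIT bit value and the shape of the branch come from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct fs_struct (hypothetical, simplified). */
struct model_fs {
	const char *root;	/* textual root, for illustration only */
};

/* Same bit the series assigns to LOOKUP_IN_INIT in include/linux/namei.h. */
#define LOOKUP_IN_INIT (1U << 29)

/* Models init_fs, i.e. init's filesystem state. */
static struct model_fs model_init_fs = { .root = "/" };

/*
 * Mirrors the branch added to set_root() and path_init(): resolve against
 * init's fs state when LOOKUP_IN_INIT is set, against the caller's otherwise.
 */
static struct model_fs *pick_fs(unsigned int flags, struct model_fs *current_fs)
{
	if (flags & LOOKUP_IN_INIT)
		return &model_init_fs;

	/* Without the flag the caller's own state must be usable. */
	assert(current_fs != NULL);
	return current_fs;
}
```

In this model a kthread rooted in nullfs would pass LOOKUP_IN_INIT and get `model_init_fs` back, while every other caller keeps resolving against its own state.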
Signed-off-by: Christian Brauner --- fs/open.c | 25 +++++++++++++++++++++++++ include/linux/fs.h | 1 + 2 files changed, 26 insertions(+) diff --git a/fs/open.c b/fs/open.c index 91f1139591ab..bc97d66b6348 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1342,6 +1342,31 @@ struct file *filp_open(const char *filename, int fla= gs, umode_t mode) } EXPORT_SYMBOL(filp_open); =20 +/** + * filp_open_init - open file resolving paths against init's root + * + * @filename: path to open + * @flags: open flags as per the open(2) second argument + * @mode: mode for the new file if O_CREAT is set, else ignored + * + * Same as filp_open() but path resolution is done relative to init's + * root (using pid1_fs) instead of current->fs. Intended for kernel + * threads that need to open files by absolute path after being rooted + * in nullfs. + */ +struct file *filp_open_init(const char *filename, int flags, umode_t mode) +{ + struct open_flags op; + struct open_how how =3D build_open_how(flags, mode); + int err =3D build_open_flags(&how, &op); + if (err) + return ERR_PTR(err); + op.lookup_flags |=3D LOOKUP_IN_INIT; + CLASS(filename_kernel, name)(filename); + return do_file_open(AT_FDCWD, name, &op); +} +EXPORT_SYMBOL(filp_open_init); + struct file *file_open_root(const struct path *root, const char *filename, int flags, umode_t mode) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 8b3dd145b25e..bc0430e72c74 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2459,6 +2459,7 @@ int do_sys_open(int dfd, const char __user *filename,= int flags, umode_t mode); extern struct file *file_open_name(struct filename *, int, umode_t); extern struct file *filp_open(const char *, int, umode_t); +extern struct file *filp_open_init(const char *, int, umode_t); extern struct file *file_open_root(const struct path *, const char *, int, umode_t); static inline struct file *file_open_root_mnt(struct vfsmount *mnt, --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from 
smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B981282F3C; Tue, 3 Mar 2026 13:49:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545778; cv=none; b=X1ZSxtTl+KnADLd13F9XsB22mo1Yavj7hDshz831MHdT7bf/2n4ser9lzRAflFZDDXi28yxzGMiRPGEn+crlhsSNpG5p+LyJZwni8Fg56/ewKTdWklOFZ3WUDPPLEegOqNk9GsPfU0B4x89dzF2BzDgEoVEmDDjb97NeU71P1jg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545778; c=relaxed/simple; bh=Rd97hp5Sb2SQumjj0ys3LJKvu37X6SG0d3207zpR8cQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=ItWmUBmCVM75CxjD+fEcu+sBYzTjGTDhLLTOENTCFBtcVenI+pRS5rwdOkiUFHdDFP7qb06Vx603yPqUr+yd8ubkecCLE0zjc79JY7C5wPto6+gBKJn1tWYjJn6Q3Iq0h8D04+NDV/CXaIKVjQLedkEnVeyKVz0u7ClW11KQxAo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=cF4jAPtt; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="cF4jAPtt" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1942EC2BC9E; Tue, 3 Mar 2026 13:49:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545778; bh=Rd97hp5Sb2SQumjj0ys3LJKvu37X6SG0d3207zpR8cQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=cF4jAPttDUp96GpnVaXGr7kvIHQ1LKriskkdYgfy3c8pMqt4nrJJ2UaiBLuGggzpT oc0DtU2byev6zS0ZZumwQ+FncTxzcIhVIWpJxpfHh5aQEuheg7lLJ9BJzHE2Pg239M YWV9CWdcsav1hIOlD/JlzVslDPy+vk/NVD9tXxQ85gjS7YeGxB+Ww9rVldpesUiWUG VFRm6fuEuaoMmdGlSfk/uJcwNes0a11mRjwc72m6C3RUW16jepuwWmvVpexHJE3F7x 
z1wG/PLN+nZtqU0f9qOO9hj9p+ccHgANWg9OYBsWrC6+n4n8zkAB5dJ1YIjiBfTmQM puQOZf5lq6fnA== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:18 +0100 Subject: [PATCH RFC DRAFT POC 07/11] block: add bdev_file_open_init() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-7-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=3393; i=brauner@kernel.org; h=from:subject:message-id; bh=Rd97hp5Sb2SQumjj0ys3LJKvu37X6SG0d3207zpR8cQ=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3bvica7Sl2JXrnbhXdqnkTMk5q0eeZsg8Spd4KbQ rVbFnUe7ChlYRDjYpAVU2RxaDcJl1vOU7HZKFMDZg4rE8gQBi5OAZhImhLD/+JMw1f/+p5KWc9n Tvo+VzJ33Wb1w69Xmj6Mel+tHO9pMpWR4aONfOmsLaKNigk+yq1TtnJWGgTpl61/H9MSzyO5PTi YFwA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Add a helper to open a block device from a kthread. 
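The block-device check that this patch factors out into validate_bdev() has a rough userspace analogue, sketched here with stat(2). The sketch is for illustration only: `validate_bdev_path()` is a hypothetical name, it has no equivalent of may_open_dev(), and it resolves the path in the calling process's own filesystem state rather than init's.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Userspace analogue of validate_bdev(): resolve the path (following
 * symlinks, as LOOKUP_FOLLOW does), reject anything that is not a block
 * device, and report the device number on success.
 */
static int validate_bdev_path(const char *path, dev_t *dev)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -errno;		/* lookup failed, e.g. -ENOENT */
	if (!S_ISBLK(st.st_mode))
		return -ENOTBLK;	/* same error the kernel helper returns */

	*dev = st.st_rdev;
	return 0;
}
```

As in the kernel helper, the device number is only written on success, so the caller's `dev` is left untouched on any error path.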
Signed-off-by: Christian Brauner --- block/bdev.c | 60 +++++++++++++++++++++++++++++++++++++---------= ---- include/linux/blkdev.h | 2 ++ 2 files changed, 47 insertions(+), 15 deletions(-) diff --git a/block/bdev.c b/block/bdev.c index ed022f8c48c7..79152c3ffa76 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -1083,6 +1083,20 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mo= de_t mode, void *holder, } EXPORT_SYMBOL(bdev_file_open_by_dev); =20 +static int validate_bdev(const struct path *path, dev_t *dev) +{ + struct inode *inode; + + inode =3D d_backing_inode(path->dentry); + if (!S_ISBLK(inode->i_mode)) + return -ENOTBLK; + if (!may_open_dev(path)) + return -EACCES; + + *dev =3D inode->i_rdev; + return 0; +} + struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode, void *holder, const struct blk_holder_ops *hops) @@ -1107,6 +1121,35 @@ struct file *bdev_file_open_by_path(const char *path= , blk_mode_t mode, } EXPORT_SYMBOL(bdev_file_open_by_path); =20 +struct file *bdev_file_open_init(const char *path, blk_mode_t mode, + void *holder, + const struct blk_holder_ops *hops) +{ + struct path p __free(path_put) =3D {}; + struct file *file; + dev_t dev; + int error; + + error =3D kern_path(path, LOOKUP_FOLLOW | LOOKUP_IN_INIT, &p); + if (error) + return ERR_PTR(error); + + error =3D validate_bdev(&p, &dev); + if (error) + return ERR_PTR(error); + + file =3D bdev_file_open_by_dev(dev, mode, holder, hops); + if (!IS_ERR(file) && (mode & BLK_OPEN_WRITE)) { + if (bdev_read_only(file_bdev(file))) { + fput(file); + file =3D ERR_PTR(-EACCES); + } + } + + return file; +} +EXPORT_SYMBOL(bdev_file_open_init); + static inline void bd_yield_claim(struct file *bdev_file) { struct block_device *bdev =3D file_bdev(bdev_file); @@ -1211,8 +1254,7 @@ EXPORT_SYMBOL(bdev_fput); */ int lookup_bdev(const char *pathname, dev_t *dev) { - struct inode *inode; - struct path path; + struct path path __free(path_put) =3D {}; int error; =20 if (!pathname || !*pathname) @@ 
-1222,19 +1264,7 @@ int lookup_bdev(const char *pathname, dev_t *dev) if (error) return error; =20 - inode =3D d_backing_inode(path.dentry); - error =3D -ENOTBLK; - if (!S_ISBLK(inode->i_mode)) - goto out_path_put; - error =3D -EACCES; - if (!may_open_dev(&path)) - goto out_path_put; - - *dev =3D inode->i_rdev; - error =3D 0; -out_path_put: - path_put(&path); - return error; + return validate_bdev(&path, dev); } EXPORT_SYMBOL(lookup_bdev); =20 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d463b9b5a0a5..9070979b6616 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1773,6 +1773,8 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mod= e_t mode, void *holder, const struct blk_holder_ops *hops); struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode, void *holder, const struct blk_holder_ops *hops); +struct file *bdev_file_open_init(const char *path, blk_mode_t mode, + void *holder, const struct blk_holder_ops *hops); int bd_prepare_to_claim(struct block_device *bdev, void *holder, const struct blk_holder_ops *hops); void bd_abort_claiming(struct block_device *bdev, void *holder); --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E2C135C19B; Tue, 3 Mar 2026 13:49:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545781; cv=none; b=fz/MrpSvxCJQ2Ut81rUqp7DFdliC7smEmtSTZD8Myxf0fw9d2V0FhA/HdPb7SOn9FymB2tAkjQiugrpqvj5JauWAvgKU6GpWjxTMoJ/TH58xrz2mbZxBLxJZ/m330cYynWr4Ze+U4kDxoiHCSNfIy1b5Ruu0TBqPorSbbJrluU0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545781; c=relaxed/simple; 
bh=Ikabcn8EKfLA8KEpW72CmxbJ0nkIh821mCuZpNy2Fkk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=lU85WjT7Pcc2Kpd0EemXbk/pqgHFfDo3hDypMrlOgh9IOil2HLJJ9420qeMTWbbaoKdyLqNK9lGQXBvFntoH46Dh09Tiwq/HE1mty1rYsxmJ8Elboxt0XdfnawGw6n6GtDQcNTiAmkbMgiN+mxT54RrsuvxozzU+OzicMOGvhq4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=XctvIxld; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="XctvIxld" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 96BDDC116C6; Tue, 3 Mar 2026 13:49:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545780; bh=Ikabcn8EKfLA8KEpW72CmxbJ0nkIh821mCuZpNy2Fkk=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=XctvIxldCVUOSOoPBZXpuAX2bMob7wOqPcujevyse76Jpya/f2x9byYNewl2TU6oF BDyQFTkPDYQUBO1pB9DoA3HPGJf0Pml4ieiou2/qOm6BKYEuqc5a+oAK7J4gkSGkBD JGU54KVKLS/F7skBmn3611vO7hM9n+L6ZvsfYL7uDsW1T2+JW8qKNuJj7LUZVFTsjT a6nC5FppW5ZuGi6HtplqCd2hWqkeeBsgKyM0L9GCN71XcgWrYwvRhh+Rf86wBZ0BL+ Y4w6USo2BL+4SDIDjjZ3JujyseeKidji4fwXqxoFfu/qJ3BJdjWVrbxWs0iM5QR++D ZzDG3/Yh34tBQ== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:19 +0100 Subject: [PATCH RFC DRAFT POC 08/11] fs: allow to pass lookup flags to filename_*() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-8-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann 
Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=12595; i=brauner@kernel.org; h=from:subject:message-id; bh=Ikabcn8EKfLA8KEpW72CmxbJ0nkIh821mCuZpNy2Fkk=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3Zvyx3L5wkRnedZPjmfv/79fPfZh5LT5WdUNUmKn H9zgnPbxo5SFgYxLgZZMUUWh3aTcLnlPBWbjTI1YOawMoEMYeDiFICJ/N3MyNA+/4LN//2XOnY/ kjZJFfKem7j70av5tf6LxWsP354zzXABw3+ndG3WC83slnwX8qqmnA25Lnb8z2vVD1H767k0zjw TUWYCAA== X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Allow lookup flags to be passed to filename_*() so callers can pass LOOKUP_IN_INIT to explicitly opt into performing lookups in init's filesystem state. Signed-off-by: Christian Brauner --- fs/coredump.c | 2 +- fs/init.c | 12 ++++++------ fs/internal.h | 18 ++++++++++++------ fs/namei.c | 52 +++++++++++++++++++++++++++------------------------- io_uring/fs.c | 10 +++++----- 5 files changed, 51 insertions(+), 43 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 29df8aa19e2e..550a1553f6cb 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -900,7 +900,7 @@ static bool coredump_file(struct core_name *cn, struct = coredump_params *cprm, * If it doesn't exist, that's fine. If there's some * other problem, we'll catch it at the filp_open(). 
*/ - filename_unlinkat(AT_FDCWD, name); + filename_unlinkat(AT_FDCWD, name, 0); } =20 /* diff --git a/fs/init.c b/fs/init.c index 33e312d74f58..a79872d5af3b 100644 --- a/fs/init.c +++ b/fs/init.c @@ -158,39 +158,39 @@ int __init init_stat(const char *filename, struct kst= at *stat, int flags) int __init init_mknod(const char *filename, umode_t mode, unsigned int dev) { CLASS(filename_kernel, name)(filename); - return filename_mknodat(AT_FDCWD, name, mode, dev); + return filename_mknodat(AT_FDCWD, name, mode, dev, 0); } =20 int __init init_link(const char *oldname, const char *newname) { CLASS(filename_kernel, old)(oldname); CLASS(filename_kernel, new)(newname); - return filename_linkat(AT_FDCWD, old, AT_FDCWD, new, 0); + return filename_linkat(AT_FDCWD, old, AT_FDCWD, new, 0, 0); } =20 int __init init_symlink(const char *oldname, const char *newname) { CLASS(filename_kernel, old)(oldname); CLASS(filename_kernel, new)(newname); - return filename_symlinkat(old, AT_FDCWD, new); + return filename_symlinkat(old, AT_FDCWD, new, 0); } =20 int __init init_unlink(const char *pathname) { CLASS(filename_kernel, name)(pathname); - return filename_unlinkat(AT_FDCWD, name); + return filename_unlinkat(AT_FDCWD, name, 0); } =20 int __init init_mkdir(const char *pathname, umode_t mode) { CLASS(filename_kernel, name)(pathname); - return filename_mkdirat(AT_FDCWD, name, mode); + return filename_mkdirat(AT_FDCWD, name, mode, 0); } =20 int __init init_rmdir(const char *pathname) { CLASS(filename_kernel, name)(pathname); - return filename_rmdir(AT_FDCWD, name); + return filename_rmdir(AT_FDCWD, name, 0); } =20 int __init init_utimes(char *filename, struct timespec64 *ts) diff --git a/fs/internal.h b/fs/internal.h index cbc384a1aa09..7302badcae69 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -53,16 +53,22 @@ extern int finish_clean_context(struct fs_context *fc); */ extern int filename_lookup(int dfd, struct filename *name, unsigned flags, struct path *path, const struct path *root); 
-int filename_rmdir(int dfd, struct filename *name); -int filename_unlinkat(int dfd, struct filename *name); +int filename_rmdir(int dfd, struct filename *name, + unsigned int lookup_flags); +int filename_unlinkat(int dfd, struct filename *name, + unsigned int lookup_flags); int may_linkat(struct mnt_idmap *idmap, const struct path *link); int filename_renameat2(int olddfd, struct filename *oldname, int newdfd, struct filename *newname, unsigned int flags); -int filename_mkdirat(int dfd, struct filename *name, umode_t mode); -int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigne= d int dev); -int filename_symlinkat(struct filename *from, int newdfd, struct filename = *to); +int filename_mkdirat(int dfd, struct filename *name, umode_t mode, + unsigned int lookup_flags); +int filename_mknodat(int dfd, struct filename *name, umode_t mode, + unsigned int dev, unsigned int lookup_flags); +int filename_symlinkat(struct filename *from, int newdfd, struct filename = *to, + unsigned int lookup_flags); int filename_linkat(int olddfd, struct filename *old, int newdfd, - struct filename *new, int flags); + struct filename *new, int flags, + unsigned int lookup_flags); int vfs_tmpfile(struct mnt_idmap *idmap, const struct path *parentpath, struct file *file, umode_t mode); diff --git a/fs/namei.c b/fs/namei.c index dd2710d5f5df..5cf407aad5b3 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -5125,14 +5125,13 @@ static int may_mknod(umode_t mode) } =20 int filename_mknodat(int dfd, struct filename *name, umode_t mode, - unsigned int dev) + unsigned int dev, unsigned int lookup_flags) { struct delegated_inode di =3D { }; struct mnt_idmap *idmap; struct dentry *dentry; struct path path; int error; - unsigned int lookup_flags =3D 0; =20 error =3D may_mknod(mode); if (error) @@ -5181,13 +5180,13 @@ SYSCALL_DEFINE4(mknodat, int, dfd, const char __use= r *, filename, umode_t, mode, unsigned int, dev) { CLASS(filename, name)(filename); - return filename_mknodat(dfd, 
name, mode, dev); + return filename_mknodat(dfd, name, mode, dev, 0); } =20 SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsig= ned, dev) { CLASS(filename, name)(filename); - return filename_mknodat(AT_FDCWD, name, mode, dev); + return filename_mknodat(AT_FDCWD, name, mode, dev, 0); } =20 /** @@ -5258,14 +5257,16 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, s= truct inode *dir, } EXPORT_SYMBOL(vfs_mkdir); =20 -int filename_mkdirat(int dfd, struct filename *name, umode_t mode) +int filename_mkdirat(int dfd, struct filename *name, umode_t mode, + unsigned int lookup_flags) { struct dentry *dentry; struct path path; int error; - unsigned int lookup_flags =3D LOOKUP_DIRECTORY; struct delegated_inode delegated_inode =3D { }; =20 + lookup_flags |=3D LOOKUP_DIRECTORY; + retry: dentry =3D filename_create(dfd, name, &path, lookup_flags); if (IS_ERR(dentry)) @@ -5295,13 +5296,13 @@ int filename_mkdirat(int dfd, struct filename *name= , umode_t mode) SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t,= mode) { CLASS(filename, name)(pathname); - return filename_mkdirat(dfd, name, mode); + return filename_mkdirat(dfd, name, mode, 0); } =20 SYSCALL_DEFINE2(mkdir, const char __user *, pathname, umode_t, mode) { CLASS(filename, name)(pathname); - return filename_mkdirat(AT_FDCWD, name, mode); + return filename_mkdirat(AT_FDCWD, name, mode, 0); } =20 /** @@ -5364,14 +5365,14 @@ int vfs_rmdir(struct mnt_idmap *idmap, struct inode= *dir, } EXPORT_SYMBOL(vfs_rmdir); =20 -int filename_rmdir(int dfd, struct filename *name) +int filename_rmdir(int dfd, struct filename *name, + unsigned int lookup_flags) { int error; struct dentry *dentry; struct path path; struct qstr last; int type; - unsigned int lookup_flags =3D 0; struct delegated_inode delegated_inode =3D { }; retry: error =3D filename_parentat(dfd, name, lookup_flags, &path, &last, &type); @@ -5424,7 +5425,7 @@ int filename_rmdir(int dfd, struct filename *name) 
SYSCALL_DEFINE1(rmdir, const char __user *, pathname) { CLASS(filename, name)(pathname); - return filename_rmdir(AT_FDCWD, name); + return filename_rmdir(AT_FDCWD, name, 0); } =20 /** @@ -5506,7 +5507,8 @@ EXPORT_SYMBOL(vfs_unlink); * writeout happening, and we don't want to prevent access to the directory * while waiting on the I/O. */ -int filename_unlinkat(int dfd, struct filename *name) +int filename_unlinkat(int dfd, struct filename *name, + unsigned int lookup_flags) { int error; struct dentry *dentry; @@ -5515,7 +5517,6 @@ int filename_unlinkat(int dfd, struct filename *name) int type; struct inode *inode; struct delegated_inode delegated_inode =3D { }; - unsigned int lookup_flags =3D 0; retry: error =3D filename_parentat(dfd, name, lookup_flags, &path, &last, &type); if (error) @@ -5576,14 +5577,14 @@ SYSCALL_DEFINE3(unlinkat, int, dfd, const char __us= er *, pathname, int, flag) =20 CLASS(filename, name)(pathname); if (flag & AT_REMOVEDIR) - return filename_rmdir(dfd, name); - return filename_unlinkat(dfd, name); + return filename_rmdir(dfd, name, 0); + return filename_unlinkat(dfd, name, 0); } =20 SYSCALL_DEFINE1(unlink, const char __user *, pathname) { CLASS(filename, name)(pathname); - return filename_unlinkat(AT_FDCWD, name); + return filename_unlinkat(AT_FDCWD, name, 0); } =20 /** @@ -5630,12 +5631,12 @@ int vfs_symlink(struct mnt_idmap *idmap, struct ino= de *dir, } EXPORT_SYMBOL(vfs_symlink); =20 -int filename_symlinkat(struct filename *from, int newdfd, struct filename = *to) +int filename_symlinkat(struct filename *from, int newdfd, struct filename = *to, + unsigned int lookup_flags) { int error; struct dentry *dentry; struct path path; - unsigned int lookup_flags =3D 0; struct delegated_inode delegated_inode =3D { }; =20 if (IS_ERR(from)) @@ -5668,14 +5669,14 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, old= name, { CLASS(filename, old)(oldname); CLASS(filename, new)(newname); - return filename_symlinkat(old, newdfd, new); + return 
filename_symlinkat(old, newdfd, new, 0); } =20 SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *= , newname) { CLASS(filename, old)(oldname); CLASS(filename, new)(newname); - return filename_symlinkat(old, AT_FDCWD, new); + return filename_symlinkat(old, AT_FDCWD, new, 0); } =20 /** @@ -5779,13 +5780,14 @@ EXPORT_SYMBOL(vfs_link); * and other special files. --ADM */ int filename_linkat(int olddfd, struct filename *old, - int newdfd, struct filename *new, int flags) + int newdfd, struct filename *new, int flags, + unsigned int lookup_flags) { struct mnt_idmap *idmap; struct dentry *new_dentry; struct path old_path, new_path; struct delegated_inode delegated_inode =3D { }; - int how =3D 0; + int how =3D lookup_flags; int error; =20 if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) !=3D 0) @@ -5807,7 +5809,7 @@ int filename_linkat(int olddfd, struct filename *old, return error; =20 new_dentry =3D filename_create(newdfd, new, &new_path, - (how & LOOKUP_REVAL)); + (how & (LOOKUP_REVAL | LOOKUP_IN_INIT))); error =3D PTR_ERR(new_dentry); if (IS_ERR(new_dentry)) goto out_putpath; @@ -5848,14 +5850,14 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __u= ser *, oldname, { CLASS(filename_uflags, old)(oldname, flags); CLASS(filename, new)(newname); - return filename_linkat(olddfd, old, newdfd, new, flags); + return filename_linkat(olddfd, old, newdfd, new, flags, 0); } =20 SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, n= ewname) { CLASS(filename, old)(oldname); CLASS(filename, new)(newname); - return filename_linkat(AT_FDCWD, old, AT_FDCWD, new, 0); + return filename_linkat(AT_FDCWD, old, AT_FDCWD, new, 0, 0); } =20 /** diff --git a/io_uring/fs.c b/io_uring/fs.c index d0580c754bf8..1d9b2939f5ae 100644 --- a/io_uring/fs.c +++ b/io_uring/fs.c @@ -140,9 +140,9 @@ int io_unlinkat(struct io_kiocb *req, unsigned int issu= e_flags) WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK); =20 if (un->flags & AT_REMOVEDIR) - ret =3D 
filename_rmdir(un->dfd, name); + ret =3D filename_rmdir(un->dfd, name, 0); else - ret =3D filename_unlinkat(un->dfd, name); + ret =3D filename_unlinkat(un->dfd, name, 0); =20 req->flags &=3D ~REQ_F_NEED_CLEANUP; io_req_set_res(req, ret, 0); @@ -188,7 +188,7 @@ int io_mkdirat(struct io_kiocb *req, unsigned int issue= _flags) =20 WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK); =20 - ret =3D filename_mkdirat(mkd->dfd, name, mkd->mode); + ret =3D filename_mkdirat(mkd->dfd, name, mkd->mode, 0); =20 req->flags &=3D ~REQ_F_NEED_CLEANUP; io_req_set_res(req, ret, 0); @@ -241,7 +241,7 @@ int io_symlinkat(struct io_kiocb *req, unsigned int iss= ue_flags) =20 WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK); =20 - ret =3D filename_symlinkat(old, sl->new_dfd, new); + ret =3D filename_symlinkat(old, sl->new_dfd, new, 0); =20 req->flags &=3D ~REQ_F_NEED_CLEANUP; io_req_set_res(req, ret, 0); @@ -289,7 +289,7 @@ int io_linkat(struct io_kiocb *req, unsigned int issue_= flags) =20 WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK); =20 - ret =3D filename_linkat(lnk->old_dfd, old, lnk->new_dfd, new, lnk->flags); + ret =3D filename_linkat(lnk->old_dfd, old, lnk->new_dfd, new, lnk->flags,= 0); =20 req->flags &=3D ~REQ_F_NEED_CLEANUP; io_req_set_res(req, ret, 0); --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D44635F195; Tue, 3 Mar 2026 13:49:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545783; cv=none; b=XBalG5WJNCnI2mBYfwvd+md0WRVtY8HgZEr2l3ZdjuzJLhzfhM0D5xewarkD/Kc4sTwb7LE0s+5ob49a9cT+sYkz2WvdcWtcomsnXjkVAfnP3bilt6iy2xfsMqZEuCM4WP8lSHmh0MPcLIYhRBIYMvyb5/LsnInp28lvFp7bJJc= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545783; c=relaxed/simple; bh=aEnsbJuZOrSqcCTr8229rBmTXM8YIUu3ak8cRkJ0pSc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=jZTqyDQRgRT6CvdKtxerYno5Zhjl5dByuHa07SvuBXxkHwfBmFZwUgl4Xk1V2nADP4t5xYGzAJOSiZv41RpzeVJ1GZI4rAO+Sf0G4A8vVgvVpzUSy58mj7h/oATMOHo9XNZrtmTWjPgQSYDUeobZY1GUKYg3pU2m5IRuE4CzKM8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=I8aY9l3/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="I8aY9l3/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0D995C2BC9E; Tue, 3 Mar 2026 13:49:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545782; bh=aEnsbJuZOrSqcCTr8229rBmTXM8YIUu3ak8cRkJ0pSc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=I8aY9l3/gA3hzodBHVs87JDsbQrbtwLWlTvyBut6R7Myzgi1dlQJvrnXCqT/6Rf+9 yygr8CLRQCzjlMwBgBHIIzDtlg2n2FAQyhmh11xhekYAntgY/X4bVzN/8o0+iOYuvb GUWsyOB3V5hWJG3Rg65q0kPCPAVv6jmee4lqtmwAuWAmGDyAPmsXGkxzN3u412T2U/ S54c2h34nYuJCfAvMVfZPyr4F/WslC4y1ZgjtNncEyT814VhLmv3insDjTuvn0+yZq nkdiqRIwug5dwS1vjkgUN771TI7zWtFfto1GTI4OwHruxQkb4u3LrUD37HVeY+Hqon /jFW/a+LaTVWA== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:20 +0100 Subject: [PATCH RFC DRAFT POC 09/11] fs: add init_root() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-9-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, 
Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=1060; i=brauner@kernel.org; h=from:subject:message-id; bh=aEnsbJuZOrSqcCTr8229rBmTXM8YIUu3ak8cRkJ0pSc=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3ZvS8EGAbtlT620lkzMNGne6Bc9S2ah3YGGyrc/D XYV7Pct6ihlYRDjYpAVU2RxaDcJl1vOU7HZKFMDZg4rE8gQBi5OAZjIsWhGhl7T7LSURj2XGm91 7vM3DDisTy1v5upf3Syzxytzb0C9CcP/6o9/ZSRe/40828y1xGXO7JdvOoxfr+iP1fG5lbSd/ft jNgA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624 Add an init_root() helper that lets callers grab init's current filesystem root. Callers can use it to perform tasks relative to that root. Signed-off-by: Christian Brauner --- fs/fs_struct.c | 6 ++++++ include/linux/fs_struct.h | 2 ++ 2 files changed, 8 insertions(+) diff --git a/fs/fs_struct.c b/fs/fs_struct.c index ab6826d7a6a9..64b5840131cb 100644 --- a/fs/fs_struct.c +++ b/fs/fs_struct.c @@ -196,3 +196,9 @@ struct fs_struct init_fs =3D { .seq =3D __SEQLOCK_UNLOCKED(init_fs.seq), .umask =3D 0022, }; + +void init_root(struct path *root) +{ + get_fs_root(&init_fs, root); +} +EXPORT_SYMBOL_GPL(init_root); diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h index ade459383f92..8ff1acd8389d 100644 --- a/include/linux/fs_struct.h +++ b/include/linux/fs_struct.h @@ -49,4 +49,6 @@ static inline int current_umask(void) return current->fs->umask; } =20 +void init_root(struct path *root); + #endif /* _LINUX_FS_STRUCT_H */ --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CACA3364EB6; Tue, 3 Mar 2026 13:49:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545785; cv=none; b=GU74ru1KyH12vbnqNcbPl9xcLL3t+MekciykuOcjVQrlj1UGd6mW6yde382D3G8geM5BIcjygFFuCS38CZoedsPknegXWtXrS8+uC+CUM616TmcGnrr6S4ZatVFTYtcSjlMIAA17xVx8Xj2wb2nv/7H4IO6bu8bc6aqzxSeQEqs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545785; c=relaxed/simple; bh=aX5gO+5Iccj0yV9V7P8CAMTUIqqZ5nyE5IExGpNekTs=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=XylETVTSoELyG4DP7/qGy2gFDKiLWbZ8fBPiQn0Nc0XTJMUw4+01lwSXiLVvjrY3Rolw4KuuZSljh+/Xi76bWyDuu2uzTOjUaFhSboRSZek6OLCsFd0ZB8bQIvEYwQkxvBrnyWAqIgPo6O4vIVgXwxx4ffz13+lHXe7TjH4FEro= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QNFivJpM; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QNFivJpM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 531C0C19422; Tue, 3 Mar 2026 13:49:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545785; bh=aX5gO+5Iccj0yV9V7P8CAMTUIqqZ5nyE5IExGpNekTs=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=QNFivJpM+sQAYYx7A87jfUjL+KMzaYWq4mkENqshivLyyPIwBOLwWsPgRw7/wsB3p AjavOKSZb3bBecr7N5SohB578iL3sbWa0szbCVaKxdd/y8t8WRHrNTiw/BBvXXF4o0 HbzdiPGzGJfw9j/kQclF9k7e0piJfv+mevs+Xa1uXMGwtqHuvUnC9NdrrtxRFYZjQL 09wEoXwIrYD+c+fqANzERhLR7/9TZByCAHLWNcBwS80VQp2N4sxO5cC3nHh08yR5MS QvgssH7l+uW9Ruy9rcDL0v4RkFW980HRJ9BG2WUo4Xyt3SWju/D7ZsBOivATzGTbvv WBi7MyhuZ6ZnA== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:21 +0100 Subject: [PATCH RFC DRAFT POC 10/11] tree-wide: make all kthread path lookups to use LOOKUP_IN_INIT Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: 
text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-10-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=14803; i=brauner@kernel.org; h=from:subject:message-id; bh=aX5gO+5Iccj0yV9V7P8CAMTUIqqZ5nyE5IExGpNekTs=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3bvzLr6rfOSt7wwYwr7Ue+yzpmBv4HLaf89v21TG VqYPhV3dZSyMIhxMciKKbI4tJuEyy3nqdhslKkBM4eVCWQIAxenAExkuycjwxf7vt41vzTyIz7y rrxsfE7YJTHoxMfMmZ+dnt/VMTqxYRUjw2HR+ULdAftN1drnHFm7+VOaZXf66knip5X3OT+s4FF bzgcA X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624

In preparation for isolating all kthreads in nullfs, convert all lookups performed from kthread context to use LOOKUP_IN_INIT. This makes each of them perform the relevant lookup operation in init's filesystem state.

This should be split into individual commits for easy bisectability, but right now a single patch serves to illustrate the idea without creating a massive patchbomb.
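As a rough illustration of what the flag changes, the root selection can be modelled in plain userspace C. This is a sketch under stated assumptions, not kernel code: struct fs_model, struct task_model, pick_fs(), lookup_flags_for() and the MODEL_LOOKUP_IN_INIT value are hypothetical stand-ins, and only the branch structure follows the set_root()/path_init() and btrfs hunks of this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value; the real LOOKUP_IN_INIT is defined earlier in the series. */
#define MODEL_LOOKUP_IN_INIT 0x1u

/* Stand-in for struct fs_struct: only the root matters for this model. */
struct fs_model {
	const char *root;
};

/* Stand-in for a task and the filesystem state it would use by default. */
struct task_model {
	bool is_kthread;
	struct fs_model *fs;
};

static struct fs_model model_init_fs = { .root = "/" };      /* init's filesystem state */
static struct fs_model model_nullfs  = { .root = "nullfs" }; /* isolated, empty root */

/* Same shape as the check in set_root()/path_init(): the flag overrides current->fs. */
static const struct fs_model *pick_fs(const struct task_model *tsk, unsigned int flags)
{
	if (flags & MODEL_LOOKUP_IN_INIT)
		return &model_init_fs;
	return tsk->fs;
}

/* Same shape as the btrfs hunk: only kthreads opt in, other tasks keep their own view. */
static unsigned int lookup_flags_for(const struct task_model *tsk, unsigned int flags)
{
	if (tsk->is_kthread)
		flags |= MODEL_LOOKUP_IN_INIT;
	return flags;
}
```

In this model a kthread parked in model_nullfs only reaches model_init_fs when the flag is set; without it, the lookup stays in whatever state the task itself carries.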
Signed-off-by: Christian Brauner --- drivers/block/rnbd/rnbd-srv.c | 2 +- drivers/char/misc_minor_kunit.c | 2 +- drivers/crypto/ccp/sev-dev.c | 4 +--- drivers/target/target_core_alua.c | 2 +- drivers/target/target_core_pr.c | 2 +- fs/btrfs/volumes.c | 6 +++++- fs/coredump.c | 6 ++---- fs/init.c | 23 ++++++++++++----------- fs/kernel_read_file.c | 4 +--- fs/namei.c | 2 +- fs/nfs/blocklayout/dev.c | 4 ++-- fs/smb/server/mgmt/share_config.c | 3 ++- fs/smb/server/smb2pdu.c | 2 +- fs/smb/server/vfs.c | 6 ++++-- init/initramfs.c | 4 ++-- init/initramfs_test.c | 4 ++-- net/unix/af_unix.c | 4 +--- 17 files changed, 40 insertions(+), 40 deletions(-) diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c index 10e8c438bb43..6796aee9a2f0 100644 --- a/drivers/block/rnbd/rnbd-srv.c +++ b/drivers/block/rnbd/rnbd-srv.c @@ -734,7 +734,7 @@ static int process_msg_open(struct rnbd_srv_session *sr= v_sess, goto reject; } =20 - bdev_file =3D bdev_file_open_by_path(full_path, open_flags, NULL, NULL); + bdev_file =3D bdev_file_open_init(full_path, open_flags, NULL, NULL); if (IS_ERR(bdev_file)) { ret =3D PTR_ERR(bdev_file); pr_err("Opening device '%s' on session %s failed, failed to open the blo= ck device, err: %pe\n", diff --git a/drivers/char/misc_minor_kunit.c b/drivers/char/misc_minor_kuni= t.c index e930c78e1ef9..8af1377c42f9 100644 --- a/drivers/char/misc_minor_kunit.c +++ b/drivers/char/misc_minor_kunit.c @@ -165,7 +165,7 @@ static void __init miscdev_test_can_open(struct kunit *= test, struct miscdevice * if (ret !=3D 0) KUNIT_FAIL(test, "failed to create node\n"); =20 - filp =3D filp_open(devname, O_RDONLY, 0); + filp =3D filp_open_init(devname, O_RDONLY, 0); if (IS_ERR(filp)) KUNIT_FAIL(test, "failed to open misc device: %ld\n", PTR_ERR(filp)); else diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c index 096f993974d1..92971671fa9d 100644 --- a/drivers/crypto/ccp/sev-dev.c +++ b/drivers/crypto/ccp/sev-dev.c @@ -262,9 +262,7 @@ static 
struct file *open_file_as_root(const char *filen= ame, int flags, umode_t m { struct path root __free(path_put) =3D {}; =20 - task_lock(&init_task); - get_fs_root(init_task.fs, &root); - task_unlock(&init_task); + init_root(&root); =20 CLASS(prepare_creds, cred)(); if (!cred) diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core= _alua.c index 10250aca5a81..d23390d1b6ab 100644 --- a/drivers/target/target_core_alua.c +++ b/drivers/target/target_core_alua.c @@ -856,7 +856,7 @@ static int core_alua_write_tpg_metadata( unsigned char *md_buf, u32 md_buf_len) { - struct file *file =3D filp_open(path, O_RDWR | O_CREAT | O_TRUNC, 0600); + struct file *file =3D filp_open_init(path, O_RDWR | O_CREAT | O_TRUNC, 06= 00); loff_t pos =3D 0; int ret; =20 diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_p= r.c index f88e63aefcd8..7ad6b534ccc6 100644 --- a/drivers/target/target_core_pr.c +++ b/drivers/target/target_core_pr.c @@ -1969,7 +1969,7 @@ static int __core_scsi3_write_aptpl_to_file( if (!path) return -ENOMEM; =20 - file =3D filp_open(path, flags, 0600); + file =3D filp_open_init(path, flags, 0600); if (IS_ERR(file)) { pr_err("filp_open(%s) for APTPL metadata" " failed\n", path); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 6fb0c4cd50ff..8baeacca01da 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2119,8 +2119,12 @@ static int btrfs_add_dev_item(struct btrfs_trans_han= dle *trans, static void update_dev_time(const char *device_path) { struct path path; + unsigned int flags =3D LOOKUP_FOLLOW; =20 - if (!kern_path(device_path, LOOKUP_FOLLOW, &path)) { + if (tsk_is_kthread(current)) + flags |=3D LOOKUP_IN_INIT; + + if (!kern_path(device_path, flags, &path)) { vfs_utimes(&path, NULL); path_put(&path); } diff --git a/fs/coredump.c b/fs/coredump.c index 550a1553f6cb..1e631c5d2076 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -919,13 +919,11 @@ static bool coredump_file(struct core_name *cn, struc= t 
coredump_params *cprm, * with a fully qualified path" rule is to control where * coredumps may be placed using root privileges, * current->fs->root must not be used. Instead, use the - * root directory of init_task. + * root directory of PID 1. */ struct path root; =20 - task_lock(&init_task); - get_fs_root(init_task.fs, &root); - task_unlock(&init_task); + init_root(&root); file =3D file_open_root(&root, cn->corename, open_flags, 0600); path_put(&root); } else { diff --git a/fs/init.c b/fs/init.c index a79872d5af3b..eb224e945328 100644 --- a/fs/init.c +++ b/fs/init.c @@ -12,6 +12,7 @@ #include #include #include "internal.h" +#include "mount.h" =20 int __init init_pivot_root(const char *new_root, const char *put_old) { @@ -102,7 +103,7 @@ int __init init_chown(const char *filename, uid_t user,= gid_t group, int flags) struct path path; int error; =20 - error =3D kern_path(filename, lookup_flags, &path); + error =3D kern_path(filename, lookup_flags | LOOKUP_IN_INIT, &path); if (error) return error; error =3D mnt_want_write(path.mnt); @@ -119,7 +120,7 @@ int __init init_chmod(const char *filename, umode_t mod= e) struct path path; int error; =20 - error =3D kern_path(filename, LOOKUP_FOLLOW, &path); + error =3D kern_path(filename, LOOKUP_FOLLOW | LOOKUP_IN_INIT, &path); if (error) return error; error =3D chmod_common(&path, mode); @@ -132,7 +133,7 @@ int __init init_eaccess(const char *filename) struct path path; int error; =20 - error =3D kern_path(filename, LOOKUP_FOLLOW, &path); + error =3D kern_path(filename, LOOKUP_FOLLOW | LOOKUP_IN_INIT, &path); if (error) return error; error =3D path_permission(&path, MAY_ACCESS); @@ -146,7 +147,7 @@ int __init init_stat(const char *filename, struct kstat= *stat, int flags) struct path path; int error; =20 - error =3D kern_path(filename, lookup_flags, &path); + error =3D kern_path(filename, lookup_flags | LOOKUP_IN_INIT, &path); if (error) return error; error =3D vfs_getattr(&path, stat, STATX_BASIC_STATS, @@ -158,39 +159,39 
@@ int __init init_stat(const char *filename, struct kst= at *stat, int flags) int __init init_mknod(const char *filename, umode_t mode, unsigned int dev) { CLASS(filename_kernel, name)(filename); - return filename_mknodat(AT_FDCWD, name, mode, dev, 0); + return filename_mknodat(AT_FDCWD, name, mode, dev, LOOKUP_IN_INIT); } =20 int __init init_link(const char *oldname, const char *newname) { CLASS(filename_kernel, old)(oldname); CLASS(filename_kernel, new)(newname); - return filename_linkat(AT_FDCWD, old, AT_FDCWD, new, 0, 0); + return filename_linkat(AT_FDCWD, old, AT_FDCWD, new, 0, LOOKUP_IN_INIT); } =20 int __init init_symlink(const char *oldname, const char *newname) { CLASS(filename_kernel, old)(oldname); CLASS(filename_kernel, new)(newname); - return filename_symlinkat(old, AT_FDCWD, new, 0); + return filename_symlinkat(old, AT_FDCWD, new, LOOKUP_IN_INIT); } =20 int __init init_unlink(const char *pathname) { CLASS(filename_kernel, name)(pathname); - return filename_unlinkat(AT_FDCWD, name, 0); + return filename_unlinkat(AT_FDCWD, name, LOOKUP_IN_INIT); } =20 int __init init_mkdir(const char *pathname, umode_t mode) { CLASS(filename_kernel, name)(pathname); - return filename_mkdirat(AT_FDCWD, name, mode, 0); + return filename_mkdirat(AT_FDCWD, name, mode, LOOKUP_IN_INIT); } =20 int __init init_rmdir(const char *pathname) { CLASS(filename_kernel, name)(pathname); - return filename_rmdir(AT_FDCWD, name, 0); + return filename_rmdir(AT_FDCWD, name, LOOKUP_IN_INIT); } =20 int __init init_utimes(char *filename, struct timespec64 *ts) @@ -198,7 +199,7 @@ int __init init_utimes(char *filename, struct timespec6= 4 *ts) struct path path; int error; =20 - error =3D kern_path(filename, 0, &path); + error =3D kern_path(filename, LOOKUP_IN_INIT, &path); if (error) return error; error =3D vfs_utimes(&path, ts); diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c index de32c95d823d..00bbe0757ad3 100644 --- a/fs/kernel_read_file.c +++ b/fs/kernel_read_file.c @@ -156,9 
+156,7 @@ ssize_t kernel_read_file_from_path_initns(const char *p= ath, loff_t offset, if (!path || !*path) return -EINVAL; =20 - task_lock(&init_task); - get_fs_root(init_task.fs, &root); - task_unlock(&init_task); + init_root(&root); =20 file =3D file_open_root(&root, path, O_RDONLY, 0); path_put(&root); diff --git a/fs/namei.c b/fs/namei.c index 5cf407aad5b3..976b1e9f7032 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4906,7 +4906,7 @@ static struct dentry *filename_create(int dfd, struct= filename *name, struct dentry *dentry =3D ERR_PTR(-EEXIST); struct qstr last; bool want_dir =3D lookup_flags & LOOKUP_DIRECTORY; - unsigned int reval_flag =3D lookup_flags & LOOKUP_REVAL; + unsigned int reval_flag =3D lookup_flags & (LOOKUP_REVAL | LOOKUP_IN_INIT= ); unsigned int create_flags =3D LOOKUP_CREATE | LOOKUP_EXCL; int type; int error; diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index cc6327d97a91..32dee716237a 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -370,8 +370,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *p= refix) if (!devname) return ERR_PTR(-ENOMEM); =20 - bdev_file =3D bdev_file_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WR= ITE, - NULL, NULL); + bdev_file =3D bdev_file_open_init(devname, BLK_OPEN_READ | BLK_OPEN_WRITE, + NULL, NULL); if (IS_ERR(bdev_file)) { dprintk("failed to open device %s (%ld)\n", devname, PTR_ERR(bdev_file)); diff --git a/fs/smb/server/mgmt/share_config.c b/fs/smb/server/mgmt/share_c= onfig.c index 53f44ff4d376..2deefdc242a8 100644 --- a/fs/smb/server/mgmt/share_config.c +++ b/fs/smb/server/mgmt/share_config.c @@ -189,7 +189,8 @@ static struct ksmbd_share_config *share_config_request(= struct ksmbd_work *work, goto out; } =20 - ret =3D kern_path(share->path, 0, &share->vfs_path); + ret =3D kern_path(share->path, LOOKUP_IN_INIT, + &share->vfs_path); ksmbd_revert_fsids(work); if (ret) { ksmbd_debug(SMB, "failed to access '%s'\n", diff --git a/fs/smb/server/smb2pdu.c 
b/fs/smb/server/smb2pdu.c index 95901a78951c..8e89fb9a8c35 100644 --- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -5462,7 +5462,7 @@ static int smb2_get_info_filesystem(struct ksmbd_work= *work, if (!share->path) return -EIO; =20 - rc =3D kern_path(share->path, LOOKUP_NO_SYMLINKS, &path); + rc =3D kern_path(share->path, LOOKUP_NO_SYMLINKS | LOOKUP_IN_INIT, &path); if (rc) { pr_err("cannot create vfs path\n"); return -EIO; diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c index d08973b288e5..2e64ed65dcca 100644 --- a/fs/smb/server/vfs.c +++ b/fs/smb/server/vfs.c @@ -62,6 +62,7 @@ static int ksmbd_vfs_path_lookup(struct ksmbd_share_confi= g *share_conf, if (pathname[0] =3D=3D '\0') { pathname =3D share_conf->path; root_share_path =3D NULL; + flags |=3D LOOKUP_IN_INIT; } else { flags |=3D LOOKUP_BENEATH; } @@ -622,7 +623,7 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char = *oldname, if (ksmbd_override_fsids(work)) return -ENOMEM; =20 - err =3D kern_path(oldname, LOOKUP_NO_SYMLINKS, &oldpath); + err =3D kern_path(oldname, LOOKUP_NO_SYMLINKS | LOOKUP_IN_INIT, &oldpath); if (err) { pr_err("cannot get linux path for %s, err =3D %d\n", oldname, err); @@ -1258,7 +1259,8 @@ struct dentry *ksmbd_vfs_kern_path_create(struct ksmb= d_work *work, if (!abs_name) return ERR_PTR(-ENOMEM); =20 - dent =3D start_creating_path(AT_FDCWD, abs_name, path, flags); + dent =3D start_creating_path(AT_FDCWD, abs_name, path, + flags | LOOKUP_IN_INIT); kfree(abs_name); return dent; } diff --git a/init/initramfs.c b/init/initramfs.c index 139baed06589..f44d772f960b 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -382,7 +382,7 @@ static int __init do_name(void) int openflags =3D O_WRONLY|O_CREAT|O_LARGEFILE; if (ml !=3D 1) openflags |=3D O_TRUNC; - wfile =3D filp_open(collected, openflags, mode); + wfile =3D filp_open_init(collected, openflags, mode); if (IS_ERR(wfile)) return 0; wfile_pos =3D 0; @@ -702,7 +702,7 @@ static void __init populate_initrd_image(char 
*err) =20 printk(KERN_INFO "rootfs image is not initramfs (%s); looks like an initr= d\n", err); - file =3D filp_open("/initrd.image", O_WRONLY|O_CREAT|O_LARGEFILE, 0700); + file =3D filp_open_init("/initrd.image", O_WRONLY|O_CREAT|O_LARGEFILE, 07= 00); if (IS_ERR(file)) return; =20 diff --git a/init/initramfs_test.c b/init/initramfs_test.c index 2ce38d9a8fd0..9415b9cfb9d3 100644 --- a/init/initramfs_test.c +++ b/init/initramfs_test.c @@ -224,7 +224,7 @@ static void __init initramfs_test_data(struct kunit *te= st) err =3D unpack_to_rootfs(cpio_srcbuf, len); KUNIT_EXPECT_NULL(test, err); =20 - file =3D filp_open(c[0].fname, O_RDONLY, 0); + file =3D filp_open_init(c[0].fname, O_RDONLY, 0); if (IS_ERR(file)) { KUNIT_FAIL(test, "open failed"); goto out; @@ -430,7 +430,7 @@ static void __init initramfs_test_fname_pad(struct kuni= t *test) err =3D unpack_to_rootfs(tbufs->cpio_srcbuf, len); KUNIT_EXPECT_NULL(test, err); =20 - file =3D filp_open(c[0].fname, O_RDONLY, 0); + file =3D filp_open_init(c[0].fname, O_RDONLY, 0); if (IS_ERR(file)) { KUNIT_FAIL(test, "open failed"); goto out; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3756a93dc63a..6f370cb44afe 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1200,9 +1200,7 @@ static struct sock *unix_find_bsd(struct sockaddr_un = *sunaddr, int addr_len, if (flags & SOCK_COREDUMP) { struct path root; =20 - task_lock(&init_task); - get_fs_root(init_task.fs, &root); - task_unlock(&init_task); + init_root(&root); =20 scoped_with_kernel_creds() err =3D vfs_path_lookup(root.dentry, root.mnt, sunaddr->sun_path, --=20 2.47.3 From nobody Thu Apr 9 17:57:47 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 78FC6368976; Tue, 3 Mar 2026 13:49:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; 
arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545788; cv=none; b=ZQDcXrCcaBUjeZ2LkfU8QXCVCRqsMVTJ1ZqHTT4r2WQKKSW578m2I2f+ARJFDBIYtAAZVWr2t9Ew7qgMIRhqlReVXXKPrQuBfIdD+ZmcgCAHXBH/sxYujpQSqPir+qLEi3o30kg9HIaKmZJWERiwAM8KNPZVBnIkDw24kDlqRQg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772545788; c=relaxed/simple; bh=xWc1SYhagsFIDcFD1Ftyff0sKkkiatJdtEEz/BrQlu0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=nERYc3aYMJwyk12gPZIYxWWBtSPL3Vl6pZSlR8XvQ21kh6zIWV1R0LoLo5i8JE64C676+KluLBLdQE0hVMLdMuos73BQsi1pbFew8jTRdrrTfVl4NrYdAya71hORf38Bi8eInehBkbpTW6cR+KLj3jjh/T3dP17jKw5uoOam5uU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=I6UJFgX3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="I6UJFgX3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A4117C2BCAF; Tue, 3 Mar 2026 13:49:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772545787; bh=xWc1SYhagsFIDcFD1Ftyff0sKkkiatJdtEEz/BrQlu0=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=I6UJFgX3ETgpJRUabnKnCKeS3y3BTYQTmNPCdjeYA1TUGdjIaWFp12INNy9U0iJ6b 3OwaymS1YWZWJc+Yey3dZads/5t7Xaj2ozzOz95rdt17n5TZOA6Bm1L3ZT5fYx45Ti E/nVWlnY/l/WkBGCl3NH6rUtMnZtAXoL3/blXOa9QkE4FSSOCh4ArUJbBiaEJvlN7L vQToiagtJH+XTpkGGGZrN7j8BpmAf4lWL/MhQw5SBa25sllxCLJeqyVZUmY2sF1vx5 VqdMeXHFVTOvxombxAbi9hR0fOfAnIdM6tCZPophF3NMDPMis693dtFv/NIZg1RIdD F/GsnxciPK2uQ== From: Christian Brauner Date: Tue, 03 Mar 2026 14:49:22 +0100 Subject: [PATCH RFC DRAFT POC 11/11] fs: isolate all kthreads in nullfs Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; 
charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260303-work-kthread-nullfs-v1-11-87e559b94375@kernel.org> References: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> In-Reply-To: <20260303-work-kthread-nullfs-v1-0-87e559b94375@kernel.org> To: linux-fsdevel@vger.kernel.org, Linus Torvalds Cc: linux-kernel@vger.kernel.org, Alexander Viro , Jens Axboe , Jan Kara , Tejun Heo , Jann Horn , Christian Brauner X-Mailer: b4 0.15-dev-47773 X-Developer-Signature: v=1; a=openpgp-sha256; l=9100; i=brauner@kernel.org; h=from:subject:message-id; bh=xWc1SYhagsFIDcFD1Ftyff0sKkkiatJdtEEz/BrQlu0=; b=owGbwMvMwCU28Zj0gdSKO4sYT6slMWQue3bP7UYr29pnt0xuprRP+CBQtHDqiyubtk7xFlPVN P70VNz2QEcpC4MYF4OsmCKLQ7tJuNxynorNRpkaMHNYmUCGMHBxCsBE3rEw/JU+ccosULbBSqdX 4a1Y7OXOcieXetebbQXCtatcp67KTGRkmLFyzoQDffON2K2Cs6TcPSUkXX0uaK577SV5knv7n3x JbgA= X-Developer-Key: i=brauner@kernel.org; a=openpgp; fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624

Leave all kthreads isolated in nullfs and move userspace init into its own separate fs_struct that any kthread can grab on demand to perform lookups. This isolates kthreads from userspace filesystem state and makes it hard for anyone to mess up when performing filesystem operations from kthreads. Without LOOKUP_IN_INIT they cannot do anything at all: no lookup and no creation.

Add a new struct kernel_clone_args extension that allows creating a task that shares init's filesystem state. It is only going to be used by user_mode_thread(), which executes code in init's filesystem state. That concept should go away.
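The restrictions placed on such tasks can be sketched in userspace C as follows. This is a minimal model and an assumption-laden one: pick_source_fs(), struct fs_model, struct task_model, the in_init_mnt_ns field and the MODEL_CLONE_* values are hypothetical, and only the order of the checks follows the copy_fs() hunk in this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's CLONE_FS and CLONE_NEWNS differ. */
#define MODEL_CLONE_FS    0x1u
#define MODEL_CLONE_NEWNS 0x2u

/* Stand-in for struct fs_struct. */
struct fs_model {
	int users;
};

/* Stand-in for the userspace_init_fs object added by this patch. */
static struct fs_model model_userspace_init_fs = { .users = 1 };

/* Stand-in for the parent task: its own fs state and its mount namespace. */
struct task_model {
	struct fs_model *fs;
	bool in_init_mnt_ns;
};

/*
 * Model of the umh branch in copy_fs(): on success return 0 and store the
 * fs state the child is derived from in *out; on rejection return -EINVAL
 * and leave *out untouched.
 */
static int pick_source_fs(const struct task_model *parent,
			  unsigned int clone_flags, bool umh,
			  struct fs_model **out)
{
	if (umh) {
		/* No new mount namespace and no sharing of the fs state. */
		if (clone_flags & (MODEL_CLONE_NEWNS | MODEL_CLONE_FS))
			return -EINVAL;
		/* Must be started from the initial mount namespace. */
		if (!parent->in_init_mnt_ns)
			return -EINVAL;
		*out = &model_userspace_init_fs;
		return 0;
	}

	/* Everyone else keeps deriving from the parent's own state. */
	*out = parent->fs;
	return 0;
}
```

The point of the model is that a usermodehelper never inherits the nullfs state of the kthread that spawns it, while ordinary children still follow their parent.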
Signed-off-by: Christian Brauner --- fs/fs_struct.c | 49 ++++++++++++++++++++++++++++++++++++++++++= +--- fs/namei.c | 4 ++-- fs/namespace.c | 4 ---- include/linux/fs_struct.h | 1 + include/linux/init_task.h | 1 + include/linux/sched/task.h | 1 + init/main.c | 10 +++++++++- kernel/fork.c | 26 +++++++++++++++++++++--- 8 files changed, 83 insertions(+), 13 deletions(-) diff --git a/fs/fs_struct.c b/fs/fs_struct.c index 64b5840131cb..164139c27380 100644 --- a/fs/fs_struct.c +++ b/fs/fs_struct.c @@ -8,6 +8,7 @@ #include #include #include "internal.h" +#include "mount.h" =20 /* * Replace the fs->{rootmnt,root} with {mnt,dentry}. Put the old values. @@ -160,13 +161,30 @@ EXPORT_SYMBOL_GPL(unshare_fs_struct); * fs_struct state. Breaking that contract sucks for both sides. * So just don't bother with extra work for this. No sane init * system should ever do this. + * + * On older kernels if PID 1 unshared its filesystem state with us the + * kernel simply used the stale fs_struct state implicitly pinning + * anything that PID 1 had last used. Even if PID 1 might've moved on to + * some completely different fs_struct state and might've even unmounted + * the old root. + * + * This has hilarious consequences: Think continuing to dump coredump + * state into an implicitly pinned directory somewhere. Calling random + * binaries in the old rootfs via usermodehelpers. + * + * Be aggressive about this: We simply reject operating on stale + * fs_struct state by reverting to nullfs. Every kworker that does + * lookups after this point will fail. Every usermodehelper call will + * fail. Tough luck but let's be kind and emit a warning to userspace. 
*/ static inline bool nullfs_userspace_init(void) { struct fs_struct *fs =3D current->fs; =20 - if (unlikely(current->pid =3D=3D 1) && fs !=3D &init_fs) { + if (unlikely(current->pid =3D=3D 1) && fs !=3D &userspace_init_fs) { pr_warn("VFS: Pid 1 stopped sharing filesystem state\n"); + set_fs_root(&userspace_init_fs, &init_fs.root); + set_fs_pwd(&userspace_init_fs, &init_fs.root); return true; } =20 @@ -186,7 +204,9 @@ struct fs_struct *switch_fs_struct(struct fs_struct *ne= w_fs) new_fs =3D fs; read_sequnlock_excl(&fs->seq); =20 - nullfs_userspace_init(); + /* one reference belongs to us */ + if (nullfs_userspace_init()) + return NULL; return new_fs; } =20 @@ -197,8 +217,31 @@ struct fs_struct init_fs =3D { .umask =3D 0022, }; =20 +struct fs_struct userspace_init_fs =3D { + .users =3D 1, + .seq =3D __SEQLOCK_UNLOCKED(userspace_init_fs.seq), + .umask =3D 0022, +}; + void init_root(struct path *root) { - get_fs_root(&init_fs, root); + get_fs_root(&userspace_init_fs, root); } EXPORT_SYMBOL_GPL(init_root); + +void __init init_userspace_fs(void) +{ + struct mount *m; + struct path root; + + /* Move PID 1 from nullfs into the initramfs. 
*/ + m =3D topmost_overmount(current->nsproxy->mnt_ns->root); + root.mnt =3D &m->mnt; + root.dentry =3D root.mnt->mnt_root; + + VFS_WARN_ON_ONCE(current->fs !=3D &init_fs); + VFS_WARN_ON_ONCE(current->pid !=3D 1); + set_fs_root(&userspace_init_fs, &root); + set_fs_pwd(&userspace_init_fs, &root); + switch_fs_struct(&userspace_init_fs); +} diff --git a/fs/namei.c b/fs/namei.c index 976b1e9f7032..6cc53040e9eb 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1102,7 +1102,7 @@ static int set_root(struct nameidata *nd) struct fs_struct *fs; =20 if (nd->flags & LOOKUP_IN_INIT) - fs =3D &init_fs; + fs =3D &userspace_init_fs; else fs =3D current->fs; =20 @@ -2724,7 +2724,7 @@ static const char *path_init(struct nameidata *nd, un= signed flags) struct fs_struct *fs; =20 if (nd->flags & LOOKUP_IN_INIT) - fs =3D &init_fs; + fs =3D &userspace_init_fs; else fs =3D current->fs; =20 diff --git a/fs/namespace.c b/fs/namespace.c index 854f4fc66469..10056ac1dcd2 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -6190,10 +6190,6 @@ static void __init init_mount_tree(void) =20 init_task.nsproxy->mnt_ns =3D &init_mnt_ns; get_mnt_ns(&init_mnt_ns); - - /* The root and pwd always point to the mutable rootfs. 
*/ - root.mnt =3D mnt; - root.dentry =3D mnt->mnt_root; set_fs_pwd(current->fs, &root); set_fs_root(current->fs, &root); =20 diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h index 8ff1acd8389d..5c40fdc39550 100644 --- a/include/linux/fs_struct.h +++ b/include/linux/fs_struct.h @@ -50,5 +50,6 @@ static inline int current_umask(void) } =20 void init_root(struct path *root); +void __init init_userspace_fs(void); =20 #endif /* _LINUX_FS_STRUCT_H */ diff --git a/include/linux/init_task.h b/include/linux/init_task.h index a6cb241ea00c..f27f88598394 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -24,6 +24,7 @@ =20 extern struct files_struct init_files; extern struct fs_struct init_fs; +extern struct fs_struct userspace_init_fs; extern struct nsproxy init_nsproxy; =20 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 41ed884cffc9..e0c1ca8c6a18 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -31,6 +31,7 @@ struct kernel_clone_args { u32 io_thread:1; u32 user_worker:1; u32 no_files:1; + u32 umh:1; unsigned long stack; unsigned long stack_size; unsigned long tls; diff --git a/init/main.c b/init/main.c index 1cb395dd94e4..ca0d0914c63e 100644 --- a/init/main.c +++ b/init/main.c @@ -102,6 +102,7 @@ #include #include #include +#include #include #include #include @@ -713,6 +714,11 @@ static __initdata DECLARE_COMPLETION(kthreadd_done); =20 static noinline void __ref __noreturn rest_init(void) { + struct kernel_clone_args init_args =3D { + .flags =3D (CLONE_FS | CLONE_VM | CLONE_UNTRACED), + .fn =3D kernel_init, + .fn_arg =3D NULL, + }; struct task_struct *tsk; int pid; =20 @@ -722,7 +728,7 @@ static noinline void __ref __noreturn rest_init(void) * the init task will end up wanting to create kthreads, which, if * we schedule it before we create kthreadd, will OOPS. 
*/ - pid =3D user_mode_thread(kernel_init, NULL, CLONE_FS); + pid =3D kernel_clone(&init_args); /* * Pin init on the boot CPU. Task migration is not properly working * until sched_init_smp() has been run. It will set the allowed @@ -1574,6 +1580,8 @@ static int __ref kernel_init(void *unused) { int ret; =20 + init_userspace_fs(); + /* * Wait until kthreadd is all set-up. */ diff --git a/kernel/fork.c b/kernel/fork.c index 583078c69bbd..121538f58272 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1590,9 +1590,28 @@ static int copy_mm(u64 clone_flags, struct task_stru= ct *tsk) return 0; } =20 -static int copy_fs(u64 clone_flags, struct task_struct *tsk) +static int copy_fs(u64 clone_flags, struct task_struct *tsk, bool umh) { - struct fs_struct *fs =3D current->fs; + struct fs_struct *fs; + + /* + * Usermodehelper may use userspace_init_fs filesystem state but + * they don't get to create mount namespaces, share the + * filesystem state, or be started from a non-initial mount + * namespace. + */ + if (umh) { + if (clone_flags & (CLONE_NEWNS | CLONE_FS)) + return -EINVAL; + if (current->nsproxy->mnt_ns !=3D &init_mnt_ns) + return -EINVAL; + } + + if (umh) + fs =3D &userspace_init_fs; + else + fs =3D current->fs; + if (clone_flags & CLONE_FS) { /* tsk->fs is already what we want */ read_seqlock_excl(&fs->seq); @@ -2211,7 +2230,7 @@ __latent_entropy struct task_struct *copy_process( retval =3D copy_files(clone_flags, p, args->no_files); if (retval) goto bad_fork_cleanup_semundo; - retval =3D copy_fs(clone_flags, p); + retval =3D copy_fs(clone_flags, p, args->umh); if (retval) goto bad_fork_cleanup_files; retval =3D copy_sighand(clone_flags, p); @@ -2725,6 +2744,7 @@ pid_t user_mode_thread(int (*fn)(void *), void *arg, = unsigned long flags) .exit_signal =3D (flags & CSIGNAL), .fn =3D fn, .fn_arg =3D arg, + .umh =3D 1, }; =20 return kernel_clone(&args); --=20 2.47.3