From: Christian Brauner
Date: Fri, 06 Mar 2026 00:30:04 +0100
Subject: [PATCH RFC v2 01/23] fs: notice when init abandons fs sharing
Message-Id: <20260306-work-kthread-nullfs-v2-1-ad1b4bed7d3e@kernel.org>
In-Reply-To: <20260306-work-kthread-nullfs-v2-0-ad1b4bed7d3e@kernel.org>
To: linux-fsdevel@vger.kernel.org, Linus Torvalds
Cc: linux-kernel@vger.kernel.org, Alexander Viro, Jens Axboe, Jan Kara, Tejun Heo, Jann Horn, Christian Brauner

PID 1 may choose to stop sharing fs_struct state with us, either via
unshare(CLONE_FS) or unshare(CLONE_NEWNS). Of course, PID 1 could have
chosen to create arbitrary process trees that all share fs_struct state
via CLONE_FS. This is a strong statement: we only care about PID 1, aka
the thread-group leader, so the fs_struct state of its subthreads
doesn't matter.

PID 1 unsharing fs_struct state is a bug. PID 1 relies on various
kthreads to be able to perform work based on its fs_struct state.
Breaking that contract sucks for both sides. So just don't bother with
extra work for this.
No sane init system should ever do this.

Signed-off-by: Christian Brauner
---
 fs/fs_struct.c            | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/fs_struct.h |  2 ++
 kernel/fork.c             | 14 +++-----------
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index 394875d06fd6..3ff79fb894c1 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -147,6 +147,47 @@ int unshare_fs_struct(void)
 }
 EXPORT_SYMBOL_GPL(unshare_fs_struct);
 
+/*
+ * PID 1 may choose to stop sharing fs_struct state with us.
+ * Either via unshare(CLONE_FS) or unshare(CLONE_NEWNS). Of
+ * course, PID 1 could have chosen to create arbitrary process
+ * trees that all share fs_struct state via CLONE_FS. This is a
+ * strong statement: We only care about PID 1 aka the thread-group
+ * leader so subthread's fs_struct state doesn't matter.
+ *
+ * PID 1 unsharing fs_struct state is a bug. PID 1 relies on
+ * various kthreads to be able to perform work based on its
+ * fs_struct state. Breaking that contract sucks for both sides.
+ * So just don't bother with extra work for this. No sane init
+ * system should ever do this.
+ */
+static inline void nullfs_userspace_init(struct fs_struct *old_fs)
+{
+	if (likely(current->pid != 1))
+		return;
+	/* @old_fs may be dangling but for comparison it's fine */
+	if (old_fs != &init_fs)
+		return;
+	pr_warn("VFS: Pid 1 stopped sharing filesystem state\n");
+}
+
+struct fs_struct *switch_fs_struct(struct fs_struct *new_fs)
+{
+	struct fs_struct *fs;
+
+	fs = current->fs;
+	read_seqlock_excl(&fs->seq);
+	current->fs = new_fs;
+	if (--fs->users)
+		new_fs = NULL;
+	else
+		new_fs = fs;
+	read_sequnlock_excl(&fs->seq);
+
+	nullfs_userspace_init(fs);
+	return new_fs;
+}
+
 /* to be mentioned only in INIT_TASK */
 struct fs_struct init_fs = {
 	.users = 1,
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index 0070764b790a..ade459383f92 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -40,6 +40,8 @@ static inline void get_fs_pwd(struct fs_struct *fs, struct path *pwd)
 	read_sequnlock_excl(&fs->seq);
 }
 
+struct fs_struct *switch_fs_struct(struct fs_struct *new_fs);
+
 extern bool current_chrooted(void);
 
 static inline int current_umask(void)
diff --git a/kernel/fork.c b/kernel/fork.c
index 65113a304518..583078c69bbd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3123,7 +3123,7 @@ static int unshare_fd(unsigned long unshare_flags, struct files_struct **new_fdp
  */
 int ksys_unshare(unsigned long unshare_flags)
 {
-	struct fs_struct *fs, *new_fs = NULL;
+	struct fs_struct *new_fs = NULL;
 	struct files_struct *new_fd = NULL;
 	struct cred *new_cred = NULL;
 	struct nsproxy *new_nsproxy = NULL;
@@ -3200,16 +3200,8 @@ int ksys_unshare(unsigned long unshare_flags)
 
 	task_lock(current);
 
-	if (new_fs) {
-		fs = current->fs;
-		read_seqlock_excl(&fs->seq);
-		current->fs = new_fs;
-		if (--fs->users)
-			new_fs = NULL;
-		else
-			new_fs = fs;
-		read_sequnlock_excl(&fs->seq);
-	}
+	if (new_fs)
+		new_fs = switch_fs_struct(new_fs);
 
 	if (new_fd)
 		swap(current->files, new_fd);
-- 
2.47.3