From nobody Mon Jun 8 10:56:49 2026
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, adobriyan@gmail.com
Subject: [PATCH v4 1/3] proc: allow to mark /proc files permanent outside of fs/proc/
Date: Fri, 29 May 2026 19:18:38 +0200
Message-ID: <20260529171840.2576445-2-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260529171840.2576445-1-mjguzik@gmail.com>
References: <20260529171840.2576445-1-mjguzik@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alexey Dobriyan

Add a proc_make_permanent() function that marks a PDE as permanent,
which speeds up open/read/close (one alloc/free and one lock/unlock
less).

Enable it for built-in code and for compiled-in modules.
The function automatically becomes a nop in modular code.

Note, note, note!

If built-in code creates and deletes PDEs dynamically (not in the init
hook), then proc_make_permanent() must not be used.

It is intended for simple code:

	static int __init xxx_module_init(void)
	{
		g_pde = proc_create_single();
		proc_make_permanent(g_pde);
		return 0;
	}
	static void __exit xxx_module_exit(void)
	{
		remove_proc_entry(g_pde);
	}

If the module is built-in, the exit hook is never executed and the PDE
is permanent, so it is OK to mark it as such.

If the module is built as a module, rmmod will yank the PDE, but
proc_make_permanent() is a nop and the core /proc code will do
everything right.

Signed-off-by: Alexey Dobriyan
---
 fs/proc/generic.c       | 12 ++++++++++++
 fs/proc/internal.h      |  3 +++
 include/linux/proc_fs.h | 10 ++++++++++
 3 files changed, 25 insertions(+)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 8bb81e58c9d8..5f82c0b4a8bb 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -841,3 +841,15 @@ ssize_t proc_simple_write(struct file *f, const char __user *ubuf, size_t size,
 	kfree(buf);
 	return ret == 0 ? size : ret;
 }
+
+/*
+ * Not exported to modules:
+ * modules' /proc files aren't permanent because modules aren't permanent.
+ */
+void impl_proc_make_permanent(struct proc_dir_entry *pde);
+void impl_proc_make_permanent(struct proc_dir_entry *pde)
+{
+	if (pde) {
+		pde_make_permanent(pde);
+	}
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index d31984c3c797..b232e1098117 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -79,8 +79,11 @@ static inline bool pde_is_permanent(const struct proc_dir_entry *pde)
 	return pde->flags & PROC_ENTRY_PERMANENT;
 }
 
+/* This is for builtin code, not even for modules which are compiled in. */
 static inline void pde_make_permanent(struct proc_dir_entry *pde)
 {
+	/* Ensure magic flag does something. */
+	static_assert(PROC_ENTRY_PERMANENT != 0);
 	pde->flags |= PROC_ENTRY_PERMANENT;
 }
 
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index ec123c277d49..96a22109a0bf 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -249,4 +249,14 @@ static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
 
 bool proc_ns_file(const struct file *file);
 
+static inline void proc_make_permanent(struct proc_dir_entry *pde)
+{
+	/* Don't give matches to modules. */
+#if defined CONFIG_PROC_FS && !defined MODULE
+	/* This mess is created by defining "struct proc_dir_entry" elsewhere. */
+	void impl_proc_make_permanent(struct proc_dir_entry *pde);
+	impl_proc_make_permanent(pde);
+#endif
+}
+
 #endif /* _LINUX_PROC_FS_H */
-- 
2.48.1

From nobody Mon Jun 8 10:56:49 2026
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, adobriyan@gmail.com
Subject: [PATCH v4 2/3] fs: RCU-ify filesystems list
Date: Fri, 29 May 2026 19:18:39 +0200
Message-ID: <20260529171840.2576445-3-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260529171840.2576445-1-mjguzik@gmail.com>
References: <20260529171840.2576445-1-mjguzik@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Christian Brauner

The drivers list was protected by an rwlock; every mount, every open of
/proc/filesystems and the legacy sysfs(2) syscall walked a hand-rolled
singly-linked list under it. /proc/filesystems is especially hot because
libselinux causes programs as mundane as mkdir, ls and sed to open and
read it on every invocation.

Convert the list to an RCU-protected hlist and switch the writer side to
a plain spinlock. Writers keep their existing non-sleeping section while
readers walk under rcu_read_lock() with no lock traffic:

  - register_filesystem()/unregister_filesystem() take file_systems_lock
    and publish via hlist_{add_tail,del_init}_rcu().
    unregister_filesystem() keeps its synchronize_rcu() after dropping
    the lock so in-flight readers are drained before the module (and its
    embedded file_system_type) can go away.

  - __get_fs_type(), list_bdev_fs_names() and the
    fs_index()/fs_name()/fs_maxindex() helpers walk the list under
    rcu_read_lock(). fs_name() continues to drop the read-side lock
    after try_module_get() and accesses ->name outside the RCU section;
    the module reference pins the embedded file_system_type across the
    boundary.

struct file_system_type::next becomes struct hlist_node list; the only
in-tree reference to the old ->next field outside fs/filesystems.c is a
redundant NULL initializer in ocfs2, which is removed.

Signed-off-by: Christian Brauner
---
 fs/filesystems.c   | 179 +++++++++++++++++++--------------------------
 fs/ocfs2/super.c   |   1 -
 include/linux/fs.h |   2 +-
 3 files changed, 75 insertions(+), 107 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 0c7d2b7ac26c..7976366d4197 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -17,22 +17,19 @@
 #include
 #include
 #include
+#include
 
 /*
- * Handling of filesystem drivers list.
- * Rules:
- *	Inclusion to/removals from/scanning of list are protected by spinlock.
- *	During the unload module must call unregister_filesystem().
- *	We can access the fields of list element if:
- *		1) spinlock is held or
- *		2) we hold the reference to the module.
- *	The latter can be guaranteed by call of try_module_get(); if it
- *	returned 0 we must skip the element, otherwise we got the reference.
- *	Once the reference is obtained we can drop the spinlock.
+ * Read-mostly filesystem drivers list.
+ *
+ * Readers walk under rcu_read_lock(); writers take file_systems_lock
+ * and publish via _rcu hlist primitives. unregister_filesystem()
+ * synchronize_rcu()s after unlock so the embedded file_system_type
+ * can't go away under a reader. To keep using a filesystem after
+ * the RCU section ends, take a module reference via try_module_get().
  */
-
-static struct file_system_type *file_systems;
-static DEFINE_RWLOCK(file_systems_lock);
+static HLIST_HEAD(file_systems);
+static DEFINE_SPINLOCK(file_systems_lock);
 
 /* WARNING: This can be used only if we _already_ own a reference */
 struct file_system_type *get_filesystem(struct file_system_type *fs)
@@ -46,14 +43,15 @@ void put_filesystem(struct file_system_type *fs)
 	module_put(fs->owner);
 }
 
-static struct file_system_type **find_filesystem(const char *name, unsigned len)
+static struct file_system_type *find_filesystem(const char *name, unsigned len)
 {
-	struct file_system_type **p;
-	for (p = &file_systems; *p; p = &(*p)->next)
-		if (strncmp((*p)->name, name, len) == 0 &&
-		    !(*p)->name[len])
-			break;
-	return p;
+	struct file_system_type *fs;
+
+	hlist_for_each_entry_rcu(fs, &file_systems, list,
+				 lockdep_is_held(&file_systems_lock))
+		if (strncmp(fs->name, name, len) == 0 && !fs->name[len])
+			return fs;
+	return NULL;
 }
 
 /**
@@ -64,33 +62,26 @@ static struct file_system_type **find_filesystem(const char *name, unsigned len)
  *	is aware of for mount and other syscalls. Returns 0 on success,
  *	or a negative errno code on an error.
  *
- *	The &struct file_system_type that is passed is linked into the kernel 
+ *	The &struct file_system_type that is passed is linked into the kernel
  *	structures and must not be freed until the file system has been
  *	unregistered.
  */
- 
-int register_filesystem(struct file_system_type * fs)
+int register_filesystem(struct file_system_type *fs)
 {
-	int res = 0;
-	struct file_system_type ** p;
-
 	if (fs->parameters &&
 	    !fs_validate_description(fs->name, fs->parameters))
 		return -EINVAL;
 
 	BUG_ON(strchr(fs->name, '.'));
-	if (fs->next)
+	if (!hlist_unhashed_lockless(&fs->list))
 		return -EBUSY;
-	write_lock(&file_systems_lock);
-	p = find_filesystem(fs->name, strlen(fs->name));
-	if (*p)
-		res = -EBUSY;
-	else
-		*p = fs;
-	write_unlock(&file_systems_lock);
-	return res;
-}
 
+	guard(spinlock)(&file_systems_lock);
+	if (find_filesystem(fs->name, strlen(fs->name)))
+		return -EBUSY;
+	hlist_add_tail_rcu(&fs->list, &file_systems);
+	return 0;
+}
 EXPORT_SYMBOL(register_filesystem);
 
 /**
@@ -100,94 +91,78 @@ EXPORT_SYMBOL(register_filesystem);
  *	Remove a file system that was previously successfully registered
  *	with the kernel. An error is returned if the file system is not found.
  *	Zero is returned on a success.
- *	
+ *
  *	Once this function has returned the &struct file_system_type structure
  *	may be freed or reused.
  */
- 
-int unregister_filesystem(struct file_system_type * fs)
+int unregister_filesystem(struct file_system_type *fs)
 {
-	struct file_system_type ** tmp;
-
-	write_lock(&file_systems_lock);
-	tmp = &file_systems;
-	while (*tmp) {
-		if (fs == *tmp) {
-			*tmp = fs->next;
-			fs->next = NULL;
-			write_unlock(&file_systems_lock);
-			synchronize_rcu();
-			return 0;
-		}
-		tmp = &(*tmp)->next;
+	scoped_guard(spinlock, &file_systems_lock) {
+		if (hlist_unhashed(&fs->list))
+			return -EINVAL;
+		hlist_del_init_rcu(&fs->list);
 	}
-	write_unlock(&file_systems_lock);
-
-	return -EINVAL;
+	synchronize_rcu();
+	return 0;
 }
-
 EXPORT_SYMBOL(unregister_filesystem);
 
 #ifdef CONFIG_SYSFS_SYSCALL
-static int fs_index(const char __user * __name)
+static int fs_index(const char __user *__name)
 {
-	struct file_system_type * tmp;
+	struct file_system_type *p;
 	char *name __free(kfree) = strndup_user(__name, PATH_MAX);
-	int err, index;
+	int index = 0;
 
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 
-	err = -EINVAL;
-	read_lock(&file_systems_lock);
-	for (tmp=file_systems, index=0 ; tmp ; tmp=tmp->next, index++) {
-		if (strcmp(tmp->name, name) == 0) {
-			err = index;
-			break;
-		}
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
+		if (strcmp(p->name, name) == 0)
+			return index;
+		index++;
 	}
-	read_unlock(&file_systems_lock);
-	return err;
+	return -EINVAL;
 }
 
-static int fs_name(unsigned int index, char __user * buf)
+static int fs_name(unsigned int index, char __user *buf)
 {
-	struct file_system_type * tmp;
-	int len, res = -EINVAL;
-
-	read_lock(&file_systems_lock);
-	for (tmp = file_systems; tmp; tmp = tmp->next, index--) {
-		if (index == 0) {
-			if (try_module_get(tmp->owner))
-				res = 0;
+	struct file_system_type *p, *found = NULL;
+	int len, res;
+
+	scoped_guard(rcu) {
+		hlist_for_each_entry_rcu(p, &file_systems, list) {
+			if (index--)
+				continue;
+			if (try_module_get(p->owner))
+				found = p;
 			break;
 		}
 	}
-	read_unlock(&file_systems_lock);
-	if (res)
-		return res;
+	if (!found)
+		return -EINVAL;
 
 	/* OK, we got the reference, so we can safely block */
-	len = strlen(tmp->name) + 1;
-	res = copy_to_user(buf, tmp->name, len) ? -EFAULT : 0;
-	put_filesystem(tmp);
+	len = strlen(found->name) + 1;
+	res = copy_to_user(buf, found->name, len) ? -EFAULT : 0;
+	put_filesystem(found);
 	return res;
 }
 
 static int fs_maxindex(void)
 {
-	struct file_system_type * tmp;
-	int index;
+	struct file_system_type *p;
+	int index = 0;
 
-	read_lock(&file_systems_lock);
-	for (tmp = file_systems, index = 0 ; tmp ; tmp = tmp->next, index++)
-		;
-	read_unlock(&file_systems_lock);
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list)
+		index++;
 	return index;
 }
 
 /*
- * Whee.. Weird sysv syscall. 
+ * Whee.. Weird sysv syscall.
  */
 SYSCALL_DEFINE3(sysfs, int, option, unsigned long, arg1, unsigned long, arg2)
 {
@@ -216,8 +191,8 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 	size_t len;
 	int count = 0;
 
-	read_lock(&file_systems_lock);
-	for (p = file_systems; p; p = p->next) {
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
 		if (!(p->fs_flags & FS_REQUIRES_DEV))
 			continue;
 		len = strlen(p->name) + 1;
@@ -230,24 +205,20 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 		size -= len;
 		count++;
 	}
-	read_unlock(&file_systems_lock);
 	return count;
 }
 
 #ifdef CONFIG_PROC_FS
 static int filesystems_proc_show(struct seq_file *m, void *v)
 {
-	struct file_system_type * tmp;
+	struct file_system_type *p;
 
-	read_lock(&file_systems_lock);
-	tmp = file_systems;
-	while (tmp) {
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
 		seq_printf(m, "%s\t%s\n",
-			(tmp->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
-			tmp->name);
-		tmp = tmp->next;
+			   (p->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
+			   p->name);
 	}
-	read_unlock(&file_systems_lock);
 	return 0;
 }
 
@@ -263,11 +234,10 @@ static struct file_system_type *__get_fs_type(const char *name, int len)
 {
 	struct file_system_type *fs;
 
-	read_lock(&file_systems_lock);
-	fs = *(find_filesystem(name, len));
+	guard(rcu)();
+	fs = find_filesystem(name, len);
 	if (fs && !try_module_get(fs->owner))
 		fs = NULL;
-	read_unlock(&file_systems_lock);
 	return fs;
 }
 
@@ -291,5 +261,4 @@ struct file_system_type *get_fs_type(const char *name)
 	}
 	return fs;
 }
-
 EXPORT_SYMBOL(get_fs_type);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index b875f01c9756..4870e680c4e5 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1224,7 +1224,6 @@ static struct file_system_type ocfs2_fs_type = {
 	.name = "ocfs2",
 	.kill_sb = kill_block_super,
 	.fs_flags = FS_REQUIRES_DEV|FS_RENAME_DOES_D_MOVE,
-	.next = NULL,
 	.init_fs_context = ocfs2_init_fs_context,
 	.parameters = ocfs2_param_spec,
 };
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 748c29fa679c..2c4ffc0bb8c7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2300,7 +2300,7 @@ struct file_system_type {
 	const struct fs_parameter_spec *parameters;
 	void (*kill_sb) (struct super_block *);
 	struct module *owner;
-	struct file_system_type * next;
+	struct hlist_node list;
 	struct hlist_head fs_supers;
 
 	struct lock_class_key s_lock_key;
-- 
2.48.1

From nobody Mon Jun 8 10:56:49 2026
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, adobriyan@gmail.com, Mateusz Guzik
Subject: [PATCH v4 3/3] fs: cache the string generated by reading /proc/filesystems
Date: Fri, 29 May 2026 19:18:40 +0200
Message-ID: <20260529171840.2576445-4-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260529171840.2576445-1-mjguzik@gmail.com>
References: <20260529171840.2576445-1-mjguzik@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

/proc/filesystems is read surprisingly often (e.g., by mkdir, ls and
even sed!).

Each read is lock-protected pointer chasing over a linked list, paying
for a sprintf for every fs (32 on my boxen).

Cache the result instead.

While here, mark the file as permanent to avoid spurious ref trips in
procfs.

Signed-off-by: Mateusz Guzik
---
 fs/filesystems.c | 153 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 151 insertions(+), 2 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 7976366d4197..673a03b5f32b 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -31,6 +31,36 @@
 static HLIST_HEAD(file_systems);
 static DEFINE_SPINLOCK(file_systems_lock);
 
+#ifdef CONFIG_PROC_FS
+/*
+ * Cache a stringified version of the filesystem list.
+ *
+ * The fs list gets queried a lot by userspace because of libselinux, including
+ * rather surprising programs (would you guess *sed* is on the list?). In order
+ * to reduce the overhead we cache the resulting string, which normally hangs
+ * around below 512 bytes in size.
+ *
+ * As the list almost never changes, its creation is not particularly optimized
+ * to keep things simple.
+ *
+ * We sort it out on read in order to not introduce a failure point for fs
+ * registration (in principle we may be unable to alloc memory for the list).
+ */
+struct file_systems_string {
+	struct rcu_head rcu;
+	unsigned long gen;
+	size_t len;
+	char string[];
+};
+
+static unsigned long file_systems_gen;
+static struct file_systems_string __read_mostly __rcu *file_systems_string;
+
+static void invalidate_filesystems_string(void);
+#else
+static inline void invalidate_filesystems_string(void) { }
+#endif
+
 /* WARNING: This can be used only if we _already_ own a reference */
 struct file_system_type *get_filesystem(struct file_system_type *fs)
 {
@@ -80,6 +110,7 @@ int register_filesystem(struct file_system_type *fs)
 	if (find_filesystem(fs->name, strlen(fs->name)))
 		return -EBUSY;
 	hlist_add_tail_rcu(&fs->list, &file_systems);
+	invalidate_filesystems_string();
 	return 0;
 }
 EXPORT_SYMBOL(register_filesystem);
@@ -101,6 +132,7 @@ int unregister_filesystem(struct file_system_type *fs)
 		if (hlist_unhashed(&fs->list))
 			return -EINVAL;
 		hlist_del_init_rcu(&fs->list);
+		invalidate_filesystems_string();
 	}
 	synchronize_rcu();
 	return 0;
@@ -209,7 +241,100 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 }
 
 #ifdef CONFIG_PROC_FS
-static int filesystems_proc_show(struct seq_file *m, void *v)
+static void invalidate_filesystems_string(void)
+{
+	struct file_systems_string *old;
+
+	lockdep_assert_held_write(&file_systems_lock);
+	file_systems_gen++;
+	old = rcu_replace_pointer(file_systems_string, NULL,
+				  lockdep_is_held(&file_systems_lock));
+	if (old)
+		kfree_rcu(old, rcu);
+}
+
+static __cold noinline int regen_filesystems_string(void)
+{
+	struct file_system_type *p;
+	struct file_systems_string *old, *new;
+	size_t newlen, usedlen;
+	unsigned long gen;
+
+retry:
+	newlen = 0;
+
+	/* pre-calc space for each fs */
+	spin_lock(&file_systems_lock);
+	gen = file_systems_gen;
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
+		if (!(p->fs_flags & FS_REQUIRES_DEV))
+			newlen += strlen("nodev");
+		newlen += strlen("\t") + strlen(p->name) + strlen("\n");
+	}
+	spin_unlock(&file_systems_lock);
+
+	new = kmalloc(offsetof(struct file_systems_string, string) + newlen + 1,
+		      GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	new->gen = gen;
+	new->len = newlen;
+	new->string[newlen] = '\0';
+
+	spin_lock(&file_systems_lock);
+	old = file_systems_string;
+
+	/*
+	 * Did someone beat us to it?
+	 */
+	if (old && old->gen == file_systems_gen) {
+		spin_unlock(&file_systems_lock);
+		kfree(new);
+		return 0;
+	}
+
+	/*
+	 * Did the list change in the meantime?
+	 */
+	if (gen != file_systems_gen) {
+		spin_unlock(&file_systems_lock);
+		kfree(new);
+		goto retry;
+	}
+
+	/*
+	 * Populate the string.
+	 *
+	 * We know we have just enough space because we calculated the right
+	 * size the previous time we had the lock and confirmed the list has
+	 * not changed after reacquiring it.
+	 */
+	usedlen = 0;
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
+		usedlen += sprintf(&new->string[usedlen], "%s\t%s\n",
+				   (p->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
+				   p->name);
+	}
+
+	if (WARN_ON_ONCE(new->len != strlen(new->string))) {
+		/*
+		 * Should never happen of course, keep this in case someone changes string
+		 * generation above and messes it up.
+		 */
+		spin_unlock(&file_systems_lock);
+		kfree(new);
+		return -EINVAL;
+	}
+
+	rcu_assign_pointer(file_systems_string, new);
+	spin_unlock(&file_systems_lock);
+	if (old)
+		kfree_rcu(old, rcu);
+	return 0;
+}
+
+static __cold noinline int filesystems_proc_show_fallback(struct seq_file *m, void *v)
 {
 	struct file_system_type *p;
 
@@ -222,9 +347,33 @@ static int filesystems_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int filesystems_proc_show(struct seq_file *m, void *v)
+{
+	struct file_systems_string *fss;
+
+	for (;;) {
+		scoped_guard(rcu) {
+			fss = rcu_dereference(file_systems_string);
+			if (likely(fss)) {
+				seq_write(m, fss->string, fss->len);
+				return 0;
+			}
+		}
+
+		int err = regen_filesystems_string();
+		if (unlikely(err))
+			return filesystems_proc_show_fallback(m, v);
+	}
+}
+
 static int __init proc_filesystems_init(void)
 {
-	proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+	struct proc_dir_entry *pde;
+
+	pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+	if (!pde)
+		return -ENOMEM;
+	proc_make_permanent(pde);
 	return 0;
 }
 module_init(proc_filesystems_init);
-- 
2.48.1