From nobody Fri Jun 19 09:05:20 2026
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, adobriyan@gmail.com
Subject: [PATCH v3 1/3] proc: allow to mark /proc files permanent outside of fs/proc/
Date: Sun, 26 Apr 2026 00:08:42 +0200
Message-ID: <20260425220844.1763933-2-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260425220844.1763933-1-mjguzik@gmail.com>
References: <20260425220844.1763933-1-mjguzik@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alexey Dobriyan

Add proc_make_permanent() to mark a PDE as permanent, which speeds up
open/read/close (one less alloc/free and one less lock/unlock).

Enable it for built-in code and for compiled-in modules.
The function magically becomes a nop in modular code.

Note, note, note!

If built-in code creates and deletes PDEs dynamically (not in an init
hook), then proc_make_permanent() must not be used.

It is intended for simple code:

	static int __init xxx_module_init(void)
	{
		g_pde = proc_create_single();
		proc_make_permanent(g_pde);
		return 0;
	}

	static void __exit xxx_module_exit(void)
	{
		remove_proc_entry(g_pde);
	}

If the module is built-in, the exit hook is never executed and the PDE
is permanent, so it is OK to mark it as such.

If the module is built as a module, rmmod will yank the PDE, but
proc_make_permanent() is a nop and the core /proc code will do
everything right.

Signed-off-by: Alexey Dobriyan
---
 fs/proc/generic.c       | 12 ++++++++++++
 fs/proc/internal.h      |  3 +++
 include/linux/proc_fs.h | 10 ++++++++++
 3 files changed, 25 insertions(+)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 3063080f3bb2..497561ee3848 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -845,3 +845,15 @@ ssize_t proc_simple_write(struct file *f, const char __user *ubuf, size_t size,
 	kfree(buf);
 	return ret == 0 ? size : ret;
 }
+
+/*
+ * Not exported to modules:
+ * modules' /proc files aren't permanent because modules aren't permanent.
+ */
+void impl_proc_make_permanent(struct proc_dir_entry *pde);
+void impl_proc_make_permanent(struct proc_dir_entry *pde)
+{
+	if (pde) {
+		pde_make_permanent(pde);
+	}
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 64dc44832808..1edbabbdbc5d 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -79,8 +79,11 @@ static inline bool pde_is_permanent(const struct proc_dir_entry *pde)
 	return pde->flags & PROC_ENTRY_PERMANENT;
 }
 
+/* This is for builtin code, not even for modules which are compiled in. */
 static inline void pde_make_permanent(struct proc_dir_entry *pde)
 {
+	/* Ensure magic flag does something. */
+	static_assert(PROC_ENTRY_PERMANENT != 0);
 	pde->flags |= PROC_ENTRY_PERMANENT;
 }
 
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 19d1c5e5f335..dceccd27a234 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -248,4 +248,14 @@ static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
 
 bool proc_ns_file(const struct file *file);
 
+static inline void proc_make_permanent(struct proc_dir_entry *pde)
+{
+	/* Don't give matches to modules. */
+#if defined CONFIG_PROC_FS && !defined MODULE
+	/* This mess is created by defining "struct proc_dir_entry" elsewhere. */
+	void impl_proc_make_permanent(struct proc_dir_entry *pde);
+	impl_proc_make_permanent(pde);
+#endif
+}
+
 #endif /* _LINUX_PROC_FS_H */
-- 
2.48.1

From nobody Fri Jun 19 09:05:20 2026
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, adobriyan@gmail.com
Subject: [PATCH v3 2/3] fs: RCU-ify filesystems list
Date: Sun, 26 Apr 2026 00:08:43 +0200
Message-ID: <20260425220844.1763933-3-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260425220844.1763933-1-mjguzik@gmail.com>
References: <20260425220844.1763933-1-mjguzik@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Christian Brauner

The drivers list was protected by an rwlock; every mount, every open of
/proc/filesystems and the legacy sysfs(2) syscall walked a hand-rolled
singly-linked list under it. /proc/filesystems is especially hot because
libselinux causes programs as mundane as mkdir, ls and sed to open and
read it on every invocation.

Convert the list to an RCU-protected hlist and switch the writer side to
a plain spinlock. Writers keep their existing non-sleeping section while
readers walk under rcu_read_lock() with no lock traffic:

- register_filesystem()/unregister_filesystem() take file_systems_lock
  and publish via hlist_{add_tail,del_init}_rcu().
  unregister_filesystem() keeps its synchronize_rcu() after dropping
  the lock so in-flight readers are drained before the module (and its
  embedded file_system_type) can go away.

- __get_fs_type(), list_bdev_fs_names() and the
  fs_index()/fs_name()/fs_maxindex() helpers walk the list under
  rcu_read_lock(). fs_name() continues to drop the read-side lock
  after try_module_get() and accesses ->name outside the RCU section;
  the module reference pins the embedded file_system_type across the
  boundary.

struct file_system_type::next becomes struct hlist_node list; the only
in-tree reference to the old ->next field outside fs/filesystems.c is a
redundant initializer in ocfs2, which is dropped.

Signed-off-by: Christian Brauner
---
 fs/filesystems.c   | 179 +++++++++++++++++++--------------------------
 fs/ocfs2/super.c   |   1 -
 include/linux/fs.h |   2 +-
 3 files changed, 75 insertions(+), 107 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 0c7d2b7ac26c..7976366d4197 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -17,22 +17,19 @@
 #include
 #include
 #include
+#include
 
 /*
- * Handling of filesystem drivers list.
- * Rules:
- *	Inclusion to/removals from/scanning of list are protected by spinlock.
- *	During the unload module must call unregister_filesystem().
- *	We can access the fields of list element if:
- *		1) spinlock is held or
- *		2) we hold the reference to the module.
- *	The latter can be guaranteed by call of try_module_get(); if it
- *	returned 0 we must skip the element, otherwise we got the reference.
- *	Once the reference is obtained we can drop the spinlock.
+ * Read-mostly filesystem drivers list.
+ *
+ * Readers walk under rcu_read_lock(); writers take file_systems_lock
+ * and publish via _rcu hlist primitives.  unregister_filesystem()
+ * synchronize_rcu()s after unlock so the embedded file_system_type
+ * can't go away under a reader.  To keep using a filesystem after
+ * the RCU section ends, take a module reference via try_module_get().
  */
-
-static struct file_system_type *file_systems;
-static DEFINE_RWLOCK(file_systems_lock);
+static HLIST_HEAD(file_systems);
+static DEFINE_SPINLOCK(file_systems_lock);
 
 /* WARNING: This can be used only if we _already_ own a reference */
 struct file_system_type *get_filesystem(struct file_system_type *fs)
@@ -46,14 +43,15 @@ void put_filesystem(struct file_system_type *fs)
 	module_put(fs->owner);
 }
 
-static struct file_system_type **find_filesystem(const char *name, unsigned len)
+static struct file_system_type *find_filesystem(const char *name, unsigned len)
 {
-	struct file_system_type **p;
-	for (p = &file_systems; *p; p = &(*p)->next)
-		if (strncmp((*p)->name, name, len) == 0 &&
-		    !(*p)->name[len])
-			break;
-	return p;
+	struct file_system_type *fs;
+
+	hlist_for_each_entry_rcu(fs, &file_systems, list,
+				 lockdep_is_held(&file_systems_lock))
+		if (strncmp(fs->name, name, len) == 0 && !fs->name[len])
+			return fs;
+	return NULL;
 }
 
 /**
@@ -64,33 +62,26 @@ static struct file_system_type **find_filesystem(const char *name, unsigned len)
  *	is aware of for mount and other syscalls. Returns 0 on success,
  *	or a negative errno code on an error.
  *
- *	The &struct file_system_type that is passed is linked into the kernel 
+ *	The &struct file_system_type that is passed is linked into the kernel
  *	structures and must not be freed until the file system has been
  *	unregistered.
  */
- 
-int register_filesystem(struct file_system_type * fs)
+int register_filesystem(struct file_system_type *fs)
 {
-	int res = 0;
-	struct file_system_type ** p;
-
 	if (fs->parameters &&
 	    !fs_validate_description(fs->name, fs->parameters))
 		return -EINVAL;
 
 	BUG_ON(strchr(fs->name, '.'));
-	if (fs->next)
+	if (!hlist_unhashed_lockless(&fs->list))
 		return -EBUSY;
-	write_lock(&file_systems_lock);
-	p = find_filesystem(fs->name, strlen(fs->name));
-	if (*p)
-		res = -EBUSY;
-	else
-		*p = fs;
-	write_unlock(&file_systems_lock);
-	return res;
-}
 
+	guard(spinlock)(&file_systems_lock);
+	if (find_filesystem(fs->name, strlen(fs->name)))
+		return -EBUSY;
+	hlist_add_tail_rcu(&fs->list, &file_systems);
+	return 0;
+}
 EXPORT_SYMBOL(register_filesystem);
 
 /**
@@ -100,94 +91,78 @@ EXPORT_SYMBOL(register_filesystem);
  *	Remove a file system that was previously successfully registered
  *	with the kernel. An error is returned if the file system is not found.
  *	Zero is returned on a success.
- *	
+ *
  *	Once this function has returned the &struct file_system_type structure
  *	may be freed or reused.
  */
- 
-int unregister_filesystem(struct file_system_type * fs)
+int unregister_filesystem(struct file_system_type *fs)
 {
-	struct file_system_type ** tmp;
-
-	write_lock(&file_systems_lock);
-	tmp = &file_systems;
-	while (*tmp) {
-		if (fs == *tmp) {
-			*tmp = fs->next;
-			fs->next = NULL;
-			write_unlock(&file_systems_lock);
-			synchronize_rcu();
-			return 0;
-		}
-		tmp = &(*tmp)->next;
+	scoped_guard(spinlock, &file_systems_lock) {
+		if (hlist_unhashed(&fs->list))
+			return -EINVAL;
+		hlist_del_init_rcu(&fs->list);
 	}
-	write_unlock(&file_systems_lock);
-
-	return -EINVAL;
+	synchronize_rcu();
+	return 0;
 }
-
 EXPORT_SYMBOL(unregister_filesystem);
 
 #ifdef CONFIG_SYSFS_SYSCALL
-static int fs_index(const char __user * __name)
+static int fs_index(const char __user *__name)
 {
-	struct file_system_type * tmp;
+	struct file_system_type *p;
 	char *name __free(kfree) = strndup_user(__name, PATH_MAX);
-	int err, index;
+	int index = 0;
 
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 
-	err = -EINVAL;
-	read_lock(&file_systems_lock);
-	for (tmp=file_systems, index=0 ; tmp ; tmp=tmp->next, index++) {
-		if (strcmp(tmp->name, name) == 0) {
-			err = index;
-			break;
-		}
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
+		if (strcmp(p->name, name) == 0)
+			return index;
+		index++;
 	}
-	read_unlock(&file_systems_lock);
-	return err;
+	return -EINVAL;
 }
 
-static int fs_name(unsigned int index, char __user * buf)
+static int fs_name(unsigned int index, char __user *buf)
 {
-	struct file_system_type * tmp;
-	int len, res = -EINVAL;
-
-	read_lock(&file_systems_lock);
-	for (tmp = file_systems; tmp; tmp = tmp->next, index--) {
-		if (index == 0) {
-			if (try_module_get(tmp->owner))
-				res = 0;
+	struct file_system_type *p, *found = NULL;
+	int len, res;
+
+	scoped_guard(rcu) {
+		hlist_for_each_entry_rcu(p, &file_systems, list) {
+			if (index--)
+				continue;
+			if (try_module_get(p->owner))
+				found = p;
 			break;
 		}
 	}
-	read_unlock(&file_systems_lock);
-	if (res)
-		return res;
+	if (!found)
+		return -EINVAL;
 
 	/* OK, we got the reference, so we can safely block */
-	len = strlen(tmp->name) + 1;
-	res = copy_to_user(buf, tmp->name, len) ? -EFAULT : 0;
-	put_filesystem(tmp);
+	len = strlen(found->name) + 1;
+	res = copy_to_user(buf, found->name, len) ? -EFAULT : 0;
+	put_filesystem(found);
 	return res;
 }
 
 static int fs_maxindex(void)
 {
-	struct file_system_type * tmp;
-	int index;
+	struct file_system_type *p;
+	int index = 0;
 
-	read_lock(&file_systems_lock);
-	for (tmp = file_systems, index = 0 ; tmp ; tmp = tmp->next, index++)
-		;
-	read_unlock(&file_systems_lock);
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list)
+		index++;
 	return index;
 }
 
 /*
- * Whee.. Weird sysv syscall. 
+ * Whee.. Weird sysv syscall.
  */
 SYSCALL_DEFINE3(sysfs, int, option, unsigned long, arg1, unsigned long, arg2)
 {
@@ -216,8 +191,8 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 	size_t len;
 	int count = 0;
 
-	read_lock(&file_systems_lock);
-	for (p = file_systems; p; p = p->next) {
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
 		if (!(p->fs_flags & FS_REQUIRES_DEV))
 			continue;
 		len = strlen(p->name) + 1;
@@ -230,24 +205,20 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 		size -= len;
 		count++;
 	}
-	read_unlock(&file_systems_lock);
 	return count;
 }
 
 #ifdef CONFIG_PROC_FS
 static int filesystems_proc_show(struct seq_file *m, void *v)
 {
-	struct file_system_type * tmp;
+	struct file_system_type *p;
 
-	read_lock(&file_systems_lock);
-	tmp = file_systems;
-	while (tmp) {
+	guard(rcu)();
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
 		seq_printf(m, "%s\t%s\n",
-			(tmp->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
-			tmp->name);
-		tmp = tmp->next;
+			(p->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
+			p->name);
 	}
-	read_unlock(&file_systems_lock);
 	return 0;
 }
 
@@ -263,11 +234,10 @@ static struct file_system_type *__get_fs_type(const char *name, int len)
 {
 	struct file_system_type *fs;
 
-	read_lock(&file_systems_lock);
-	fs = *(find_filesystem(name, len));
+	guard(rcu)();
+	fs = find_filesystem(name, len);
 	if (fs && !try_module_get(fs->owner))
 		fs = NULL;
-	read_unlock(&file_systems_lock);
 	return fs;
 }
 
@@ -291,5 +261,4 @@ struct file_system_type *get_fs_type(const char *name)
 	}
 	return fs;
 }
-
 EXPORT_SYMBOL(get_fs_type);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index b875f01c9756..4870e680c4e5 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1224,7 +1224,6 @@ static struct file_system_type ocfs2_fs_type = {
 	.name = "ocfs2",
 	.kill_sb = kill_block_super,
 	.fs_flags = FS_REQUIRES_DEV|FS_RENAME_DOES_D_MOVE,
-	.next = NULL,
 	.init_fs_context = ocfs2_init_fs_context,
 	.parameters = ocfs2_param_spec,
 };
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..c37bb3c7de8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2286,7 +2286,7 @@ struct file_system_type {
 	const struct fs_parameter_spec *parameters;
 	void (*kill_sb) (struct super_block *);
 	struct module *owner;
-	struct file_system_type * next;
+	struct hlist_node list;
 	struct hlist_head fs_supers;
 
 	struct lock_class_key s_lock_key;
-- 
2.48.1

From nobody Fri Jun 19 09:05:20 2026
From: Mateusz Guzik
To: brauner@kernel.org
Cc: viro@zeniv.linux.org.uk, jack@suse.cz, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, adobriyan@gmail.com, Mateusz Guzik
Subject: [PATCH v3 3/3] fs: cache the string generated by reading /proc/filesystems
Date: Sun, 26 Apr 2026 00:08:44 +0200
Message-ID: <20260425220844.1763933-4-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260425220844.1763933-1-mjguzik@gmail.com>
References: <20260425220844.1763933-1-mjguzik@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

/proc/filesystems is read surprisingly often (e.g., by mkdir, ls and
even sed!).

Each read is pointer chasing over a linked list, paying for a sprintf
for every fs (32 on my boxen).

Cache the result instead.

While here, mark the file as permanent to avoid spurious ref trips in
procfs.

Signed-off-by: Mateusz Guzik
---
 fs/filesystems.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 154 insertions(+), 2 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 7976366d4197..771fc31a69b8 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -31,6 +31,36 @@
 static HLIST_HEAD(file_systems);
 static DEFINE_SPINLOCK(file_systems_lock);
 
+#ifdef CONFIG_PROC_FS
+/*
+ * Cache a stringified version of the filesystem list.
+ *
+ * The fs list gets queried a lot by userspace because of libselinux, including
+ * rather surprising programs (would you guess *sed* is on the list?). In order
+ * to reduce the overhead we cache the resulting string, which normally hangs
+ * around below 512 bytes in size.
+ *
+ * As the list almost never changes, its creation is not particularly optimized
+ * to keep things simple.
+ *
+ * We sort it out on read in order to not introduce a failure point for fs
+ * registration (in principle we may be unable to alloc memory for the list).
+ */
+struct file_systems_string {
+	struct rcu_head rcu;
+	unsigned long gen;
+	size_t len;
+	char string[];
+};
+
+static unsigned long file_systems_gen;
+static struct file_systems_string __rcu *file_systems_string;
+
+static void invalidate_filesystems_string(void);
+#else
+static inline void invalidate_filesystems_string(void) { }
+#endif
+
 /* WARNING: This can be used only if we _already_ own a reference */
 struct file_system_type *get_filesystem(struct file_system_type *fs)
 {
@@ -80,6 +110,7 @@ int register_filesystem(struct file_system_type *fs)
 	if (find_filesystem(fs->name, strlen(fs->name)))
 		return -EBUSY;
 	hlist_add_tail_rcu(&fs->list, &file_systems);
+	invalidate_filesystems_string();
 	return 0;
 }
 EXPORT_SYMBOL(register_filesystem);
@@ -101,6 +132,7 @@ int unregister_filesystem(struct file_system_type *fs)
 		if (hlist_unhashed(&fs->list))
 			return -EINVAL;
 		hlist_del_init_rcu(&fs->list);
+		invalidate_filesystems_string();
 	}
 	synchronize_rcu();
 	return 0;
@@ -209,7 +241,103 @@ int __init list_bdev_fs_names(char *buf, size_t size)
 }
 
 #ifdef CONFIG_PROC_FS
-static int filesystems_proc_show(struct seq_file *m, void *v)
+static void invalidate_filesystems_string(void)
+{
+	struct file_systems_string *old;
+
+	lockdep_assert_held_write(&file_systems_lock);
+	file_systems_gen++;
+	old = rcu_replace_pointer(file_systems_string, NULL,
+				  lockdep_is_held(&file_systems_lock));
+	if (old)
+		kfree_rcu(old, rcu);
+}
+
+static __cold noinline int regen_filesystems_string(void)
+{
+	struct file_system_type *p;
+	struct file_systems_string *old, *new;
+	size_t newlen, usedlen;
+	unsigned long gen;
+
+retry:
+	newlen = 0;
+
+	/* pre-calc space for each fs */
+	spin_lock(&file_systems_lock);
+	gen = file_systems_gen;
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
+		if (!(p->fs_flags & FS_REQUIRES_DEV))
+			newlen += strlen("nodev");
+		newlen += strlen("\t") + strlen(p->name) + strlen("\n");
+	}
+	spin_unlock(&file_systems_lock);
+
+	new = kmalloc(offsetof(struct file_systems_string, string) + newlen + 1,
+		      GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	new->gen = gen;
+	new->len = newlen;
+	new->string[newlen] = '\0';
+
+	spin_lock(&file_systems_lock);
+	old = file_systems_string;
+
+	/*
+	 * Did someone beat us to it?
+	 */
+	if (old && old->gen == file_systems_gen) {
+		spin_unlock(&file_systems_lock);
+		kfree(new);
+		return 0;
+	}
+
+	/*
+	 * Did the list change in the meantime?
+	 */
+	if (gen != file_systems_gen) {
+		spin_unlock(&file_systems_lock);
+		kfree(new);
+		goto retry;
+	}
+
+	/*
+	 * Populate the string.
+	 *
+	 * We know we have just enough space because we calculated the right
+	 * size the previous time we had the lock and confirmed the list has
+	 * not changed after reacquiring it.
+	 */
+	usedlen = 0;
+	hlist_for_each_entry_rcu(p, &file_systems, list) {
+		usedlen += sprintf(&new->string[usedlen], "%s\t%s\n",
+				   (p->fs_flags & FS_REQUIRES_DEV) ? "" : "nodev",
+				   p->name);
+	}
+
+	if (WARN_ON_ONCE(new->len != strlen(new->string))) {
+		/*
+		 * Should never happen of course, keep this in case someone changes string
+		 * generation above and messes it up.
+		 */
+		spin_unlock(&file_systems_lock);
+		kfree(new);
+		return -EINVAL;
+	}
+
+	/*
+	 * Paired with consume fence in READ_ONCE() in filesystems_proc_show()
+	 */
+	smp_store_release(&file_systems_string, new);
+	spin_unlock(&file_systems_lock);
+	if (old)
+		kfree_rcu(old, rcu);
+	return 0;
+}
+
+static __cold noinline int filesystems_proc_show_fallback(struct seq_file *m, void *v)
 {
 	struct file_system_type *p;
 
@@ -222,9 +350,33 @@ static int filesystems_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int filesystems_proc_show(struct seq_file *m, void *v)
+{
+	struct file_systems_string *fss;
+
+	for (;;) {
+		scoped_guard(rcu) {
+			fss = rcu_dereference(file_systems_string);
+			if (likely(fss)) {
+				seq_write(m, fss->string, fss->len);
+				return 0;
+			}
+		}
+
+		int err = regen_filesystems_string();
+		if (unlikely(err))
+			return filesystems_proc_show_fallback(m, v);
+	}
+}
+
 static int __init proc_filesystems_init(void)
 {
-	proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+	struct proc_dir_entry *pde;
+
+	pde = proc_create_single("filesystems", 0, NULL, filesystems_proc_show);
+	if (!pde)
+		return -ENOMEM;
+	proc_make_permanent(pde);
 	return 0;
 }
 module_init(proc_filesystems_init);
-- 
2.48.1
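
The caching scheme of patch 3/3 (size the string under the lock, allocate with the lock dropped, recheck a generation counter, then populate) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel code: fs_register(), fs_read(), fs_list and MAX_FS are hypothetical names, a pthread mutex stands in for file_systems_lock, and since this sketch has no RCU the reader copies the cached string while holding the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FS 16	/* hypothetical fixed capacity, for illustration only */

struct fs_entry {
	const char *name;
	int nodev;
};

static struct fs_entry fs_list[MAX_FS];
static int fs_count;
static unsigned long fs_gen;	/* bumped on every list change */
static char *fs_cache;		/* NULL means the cache is stale */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: change the list, bump the generation, drop the cached string. */
int fs_register(const char *name, int nodev)
{
	int ret = -1;

	pthread_mutex_lock(&fs_lock);
	if (fs_count < MAX_FS) {
		fs_list[fs_count].name = name;
		fs_list[fs_count].nodev = nodev;
		fs_count++;
		fs_gen++;
		free(fs_cache);
		fs_cache = NULL;
		ret = 0;
	}
	pthread_mutex_unlock(&fs_lock);
	return ret;
}

/* Reader: return a malloc()ed copy of the listing, or NULL if out of memory. */
char *fs_read(void)
{
	for (;;) {
		size_t len = 0, used = 0;
		unsigned long gen;
		char *buf;
		int i;

		pthread_mutex_lock(&fs_lock);
		if (fs_cache) {
			/* Fast path: the cache is valid, hand out a copy. */
			char *copy = strdup(fs_cache);

			pthread_mutex_unlock(&fs_lock);
			return copy;
		}
		/* Pass 1: size the string and remember the generation. */
		gen = fs_gen;
		for (i = 0; i < fs_count; i++)
			len += (fs_list[i].nodev ? strlen("nodev") : 0) +
			       strlen("\t") + strlen(fs_list[i].name) + strlen("\n");
		pthread_mutex_unlock(&fs_lock);

		/* Allocate with the lock dropped. */
		buf = malloc(len + 1);
		if (!buf)
			return NULL;
		buf[0] = '\0';

		pthread_mutex_lock(&fs_lock);
		if (fs_cache || gen != fs_gen) {
			/* Lost a race: unlock on this path too, then start over. */
			pthread_mutex_unlock(&fs_lock);
			free(buf);
			continue;
		}
		/* Pass 2: the list is unchanged, so the size from pass 1 is exact. */
		for (i = 0; i < fs_count; i++)
			used += (size_t)sprintf(buf + used, "%s\t%s\n",
						fs_list[i].nodev ? "nodev" : "",
						fs_list[i].name);
		assert(used == len);
		fs_cache = buf;	/* publish; the next iteration takes the fast path */
		pthread_mutex_unlock(&fs_lock);
	}
}
```

A caller would invoke fs_register() for each entry and then fs_read(), freeing the returned copy. The generation check is what makes it safe to drop the lock across the allocation: if a writer slips in between the two passes, the buffer is discarded and the sizing pass runs again.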