[RFC PATCH v3 00/59] struct filename work

Al Viro posted 59 patches 1 month, 3 weeks ago
arch/alpha/kernel/osf_sys.c       |  34 +--
fs/dcache.c                       |   8 +-
fs/exec.c                         |  99 +++-----
fs/fhandle.c                      |   5 +-
fs/file_attr.c                    |  12 +-
fs/filesystems.c                  |   9 +-
fs/fsopen.c                       |   6 +-
fs/internal.h                     |   4 +-
fs/namei.c                        | 359 ++++++++++++++++--------------
fs/namespace.c                    |  22 +-
fs/ntfs3/inode.c                  |   6 +-
fs/ntfs3/namei.c                  |   8 +-
fs/open.c                         | 119 ++++------
fs/quota/quota.c                  |   3 +-
fs/smb/server/vfs.c               |  15 +-
fs/stat.c                         |  28 +--
fs/statfs.c                       |   3 +-
fs/utimes.c                       |   8 +-
fs/xattr.c                        |  33 +--
include/asm-generic/vmlinux.lds.h |   3 +-
include/linux/audit.h             |  11 -
include/linux/fs.h                |  42 ++--
io_uring/fs.c                     | 101 +++++----
io_uring/openclose.c              |  26 +--
io_uring/statx.c                  |  17 +-
io_uring/xattr.c                  |  30 +--
ipc/mqueue.c                      |  11 +-
kernel/acct.c                     |   4 +-
kernel/auditsc.c                  |  23 +-
mm/huge_memory.c                  |  15 +-
mm/swapfile.c                     |  21 +-
31 files changed, 459 insertions(+), 626 deletions(-)
[RFC PATCH v3 00/59] struct filename work
Posted by Al Viro 1 month, 3 weeks ago
[Considerably grown since the last time it had been posted - see
https://lore.kernel.org/all/20251129170142.150639-1-viro@zeniv.linux.org.uk/
for previous variant]

Changes compared to v2:
	* rebased to v6.19-rc1, dropped mguzik's patch merged in mainline
	* size of embedded name bumped up, so that sizeof(struct filename)
ends up being 192 bytes (64-byte aligned and large enough to cover everything
realistic).
	* in #17 ("allow incomplete imports of filenames") don't open-code
getname_kernel() in the "copy" case.
	* taken out ntfs (ab)uses of names_cachep added this cycle
	* add CLASS() machinery for struct filename; it fits most of the
users.  Plenty of conversions later in the series...
	* most of the pathwalk-related API does the right thing on ERR_PTR()
passed as filename; the only exception is do_filp_open(), that needs to have
that checked in its callers.  Moved the check into it, simplified the callers
(and in some cases callers of callers, and...)
	* renamed 'do_filp_open' to 'do_file_open' - should've been called
that way from the very beginning.  My fault, back in 2.1.128, when cretinous
filp_open() had been added ;-/
	* alpha osf_mount() device name imports are done the same way as
the regular mount() does - strndup_user()
	* sysfs(2) (no relation to sysfs) used to abuse getname() for
importing filesystem type name; no more.

The branch lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.filename
now; individual patches in followups.

Review and testing would be very welcome; it seems to survive the local
beating, but...

Shortlog:

Al Viro (58):
  do_faccessat(): import pathname only once
  do_fchmodat(): import pathname only once
  do_fchownat(): import pathname only once
  do_utimes_path(): import pathname only once
  chdir(2): import pathname only once
  chroot(2): import pathname only once
  user_statfs(): import pathname only once
  do_sys_truncate(): import pathname only once
  do_readlinkat(): import pathname only once
  get rid of audit_reusename()
  ntfs: ->d_compare() must not block
  getname_flags() massage, part 1
  getname_flags() massage, part 2
  struct filename: use names_cachep only for getname() and friends
  struct filename: saner handling of long names
  allow incomplete imports of filenames
  struct filename ->refcnt doesn't need to be atomic
  allow to use CLASS() for struct filename *
  file_getattr(): filename_lookup() accepts ERR_PTR() as filename
  file_setattr(): filename_lookup() accepts ERR_PTR() as filename
  move_mount(): filename_lookup() accepts ERR_PTR() as filename
  ksmbd_vfs_path_lookup(): vfs_path_parent_lookup() accepts ERR_PTR() as name
  ksmbd_vfs_rename(): vfs_path_parent_lookup() accepts ERR_PTR() as name
  do_filp_open(): DTRT when getting ERR_PTR() as pathname
  rename do_filp_open() to do_file_open()
  do_sys_openat2(): get rid of useless check, switch to CLASS(filename)
  simplify the callers of file_open_name()
  simplify the callers of do_open_execat()
  simplify the callers of alloc_bprm()
  switch {alloc,free}_bprm() to CLASS()
  file_[gs]etattr(2): switch to CLASS(filename_maybe_null)
  mount_setattr(2): don't mess with LOOKUP_EMPTY
  do_open_execat(): don't care about LOOKUP_EMPTY
  vfs_open_tree(): use CLASS(filename_uflags)
  name_to_handle_at(): use CLASS(filename_uflags)
  fspick(2): use CLASS(filename_flags)
  do_fchownat(): unspaghettify a bit...
  chdir(2): unspaghettify a bit...
  do_utimes_path(): switch to CLASS(filename_uflags)
  do_sys_truncate(): switch to CLASS(filename)
  do_readlinkat(): switch to CLASS(filename_flags)
  do_f{chmod,chown,access}at(): use CLASS(filename_uflags)
  io_openat2(): use CLASS(filename_complete_delayed)
  io_statx(): use CLASS(filename_complete_delayed)
  do_{renameat2,linkat,symlinkat}(): use CLASS(filename_consume)
  do_{mknodat,mkdirat,unlinkat,rmdir}(): use CLASS(filename_consume)
  namei.c: convert getname_kernel() callers to CLASS(filename_kernel)
  namei.c: switch user pathname imports to CLASS(filename{,_flags})
  filename_...xattr(): don't consume filename reference
  move_mount(2): switch to CLASS(filename_maybe_null)
  chroot(2): switch to CLASS(filename)
  quotactl_block(): switch to CLASS(filename)
  statx: switch to CLASS(filename_maybe_null)
  user_statfs(): switch to CLASS(filename)
  mqueue: switch to CLASS(filename)
  ksmbd: use CLASS(filename_kernel)
  alpha: switch osf_mount() to strndup_user()
  sysfs(2): fs_index() argument is _not_ a pathname

Mateusz Guzik (1):
  fs: hide names_cache behind runtime const machinery

Diffstat:

 arch/alpha/kernel/osf_sys.c       |  34 +--
 fs/dcache.c                       |   8 +-
 fs/exec.c                         |  99 +++-----
 fs/fhandle.c                      |   5 +-
 fs/file_attr.c                    |  12 +-
 fs/filesystems.c                  |   9 +-
 fs/fsopen.c                       |   6 +-
 fs/internal.h                     |   4 +-
 fs/namei.c                        | 359 ++++++++++++++++--------------
 fs/namespace.c                    |  22 +-
 fs/ntfs3/inode.c                  |   6 +-
 fs/ntfs3/namei.c                  |   8 +-
 fs/open.c                         | 119 ++++------
 fs/quota/quota.c                  |   3 +-
 fs/smb/server/vfs.c               |  15 +-
 fs/stat.c                         |  28 +--
 fs/statfs.c                       |   3 +-
 fs/utimes.c                       |   8 +-
 fs/xattr.c                        |  33 +--
 include/asm-generic/vmlinux.lds.h |   3 +-
 include/linux/audit.h             |  11 -
 include/linux/fs.h                |  42 ++--
 io_uring/fs.c                     | 101 +++++----
 io_uring/openclose.c              |  26 +--
 io_uring/statx.c                  |  17 +-
 io_uring/xattr.c                  |  30 +--
 ipc/mqueue.c                      |  11 +-
 kernel/acct.c                     |   4 +-
 kernel/auditsc.c                  |  23 +-
 mm/huge_memory.c                  |  15 +-
 mm/swapfile.c                     |  21 +-
 31 files changed, 459 insertions(+), 626 deletions(-)

Rough overview:
1--9:
	moving pathname import out of retry loops
10:
	now we can get rid of "reuse the struct filename if
we'd just imported it from the same address _and_ audit is
enabled" logics.
11:
	get rid of names_cachep abuse in ntfs
12--15:
	embed reasonably short pathnames into struct filename,
*always* get struct filename out names_cachep, take the long
names into explicitly kmalloc'ed objects.
NB: EMBEDDED_NAME_MAX is 128 at the moment; might make sense
to increase it - the nearest kmalloc cache size is 192 bytes,
so we are leaving gaps anyway.  40 bytes increase, at least?
More on 32bit...
16:
	runtime_const machinery for names_cachep; there's
a potentially better variant (statically allocated kmem_cache),
but that's a separate series.
17:
	delayed_filename machinery, solves the audit vs. io_uring
problems.
18:
	now we don't need filename->refcnt to be atomic.
19:
	infrastructure for CLASS(filename...)
20--24:
	simplify checks in callers of pathwalk primitives -
	they (with exception of do_filp_open()) will do
	the right thing if given ERR_PTR() for name.
25--31:	... get rid of that one exception and simplify
	more callers.
32--57:
	conversions to CLASS(filename...), cleanups
58, 59:
	... and these should not have been using getname().

-- 
2.47.3
Re: [RFC PATCH v3 00/59] struct filename work
Posted by Linus Torvalds 1 month, 3 weeks ago
So I like the whole series, but..

On Tue, 16 Dec 2025 at 15:56, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>   struct filename ->refcnt doesn't need to be atomic

Does ->refcnt need to exist _at_all_ if audit isn't enabled?

Are there any other users of it? Maybe I missed some?

Because I'm wondering if we could just encapsulate the thing entirely
in some #ifdef CONFIG_AUDIT check.

Now, I think absolutely everybody does enable audit, so it's not
because I'd try to save one word of memory and a few tests, it's more
of a "could we make it very explicit that all that code is purely
about the audit case"?

                    Linus
Re: [RFC PATCH v3 00/59] struct filename work
Posted by Al Viro 1 month, 3 weeks ago
On Tue, Dec 16, 2025 at 04:32:03PM +1200, Linus Torvalds wrote:
> So I like the whole series, but..
> 
> On Tue, 16 Dec 2025 at 15:56, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >   struct filename ->refcnt doesn't need to be atomic
> 
> Does ->refcnt need to exist _at_all_ if audit isn't enabled?
> 
> Are there any other users of it? Maybe I missed some?
> 
> Because I'm wondering if we could just encapsulate the thing entirely
> in some #ifdef CONFIG_AUDIT check.
> 
> Now, I think absolutely everybody does enable audit, so it's not
> because I'd try to save one word of memory and a few tests, it's more
> of a "could we make it very explicit that all that code is purely
> about the audit case"?

Umm...  Not exactly.  I mean, yes, at the moment we never increment the
refcount outside of kernel/auditsc.c, so it'll always be 1 if that thing
is disabled.

But if you mean to store it on caller's stack, that's another kettle of
fish - anything async with io_uring won't be able to do that, even we
ignore the stack footprint issues.  In configs without audit we end up
	1) allocating it and copying the pathname from userland on
request submission; pointer is stashed into request.
	2) picking it in processing thread and doing the operation
there By that point submitted might have not just left the kernel,
but overwritten the pathname contents in userland.
	3) either stashing it back into request or freeing it.

With audit (2) might become "... and have an extra ref stashed in audit
context" with (3) becoming "either stashing it back into request, if no
extra ref has appeared, or making a copy, stashing it back into request
and dropping the reference on the original"

So refcount may be audit-only thing, at least at the moment, but the
need to outlive the syscall where getname() had been called is very much
not audit-only.

And stack footprint is not trivial either, unless you limit embedded
case to something very short - even an extra hundred bytes (or two,
e.g. in case of rename()) is not something I'd be entirely comfortable
grabbing for pathname-related syscalls.