[PATCH v4 00/12] hide ->i_state behind accessors

Mateusz Guzik posted 12 patches 2 weeks, 1 day ago
There is a newer version of this series
block/bdev.c                     |   4 +-
drivers/dax/super.c              |   2 +-
fs/9p/vfs_inode.c                |   2 +-
fs/9p/vfs_inode_dotl.c           |   2 +-
fs/affs/inode.c                  |   2 +-
fs/afs/dynroot.c                 |   6 +-
fs/afs/inode.c                   |   8 +-
fs/bcachefs/fs.c                 |   7 +-
fs/befs/linuxvfs.c               |   2 +-
fs/bfs/inode.c                   |   2 +-
fs/btrfs/inode.c                 |  10 +--
fs/buffer.c                      |   4 +-
fs/ceph/cache.c                  |   2 +-
fs/ceph/crypto.c                 |   4 +-
fs/ceph/file.c                   |   4 +-
fs/ceph/inode.c                  |  28 +++----
fs/coda/cnode.c                  |   4 +-
fs/cramfs/inode.c                |   2 +-
fs/crypto/keyring.c              |   2 +-
fs/crypto/keysetup.c             |   2 +-
fs/dcache.c                      |   8 +-
fs/drop_caches.c                 |   2 +-
fs/ecryptfs/inode.c              |   6 +-
fs/efs/inode.c                   |   2 +-
fs/erofs/inode.c                 |   2 +-
fs/ext2/inode.c                  |   2 +-
fs/ext4/inode.c                  |  10 +--
fs/ext4/orphan.c                 |   4 +-
fs/f2fs/data.c                   |   2 +-
fs/f2fs/inode.c                  |   2 +-
fs/f2fs/namei.c                  |   4 +-
fs/f2fs/super.c                  |   2 +-
fs/freevxfs/vxfs_inode.c         |   2 +-
fs/fs-writeback.c                | 123 ++++++++++++++++---------------
fs/fuse/inode.c                  |   4 +-
fs/gfs2/file.c                   |   2 +-
fs/gfs2/glops.c                  |   2 +-
fs/gfs2/inode.c                  |   4 +-
fs/gfs2/ops_fstype.c             |   2 +-
fs/hfs/btree.c                   |   2 +-
fs/hfs/inode.c                   |   2 +-
fs/hfsplus/super.c               |   2 +-
fs/hostfs/hostfs_kern.c          |   2 +-
fs/hpfs/dir.c                    |   2 +-
fs/hpfs/inode.c                  |   2 +-
fs/inode.c                       | 104 +++++++++++++-------------
fs/isofs/inode.c                 |   2 +-
fs/jffs2/fs.c                    |   4 +-
fs/jfs/file.c                    |   4 +-
fs/jfs/inode.c                   |   2 +-
fs/jfs/jfs_txnmgr.c              |   2 +-
fs/kernfs/inode.c                |   2 +-
fs/libfs.c                       |   6 +-
fs/minix/inode.c                 |   2 +-
fs/namei.c                       |   8 +-
fs/netfs/misc.c                  |   8 +-
fs/netfs/read_single.c           |   6 +-
fs/nfs/inode.c                   |   2 +-
fs/nfs/pnfs.c                    |   2 +-
fs/nfsd/vfs.c                    |   2 +-
fs/nilfs2/cpfile.c               |   2 +-
fs/nilfs2/dat.c                  |   2 +-
fs/nilfs2/ifile.c                |   2 +-
fs/nilfs2/inode.c                |  10 +--
fs/nilfs2/sufile.c               |   2 +-
fs/notify/fsnotify.c             |   2 +-
fs/ntfs3/inode.c                 |   2 +-
fs/ocfs2/dlmglue.c               |   2 +-
fs/ocfs2/inode.c                 |  10 +--
fs/omfs/inode.c                  |   2 +-
fs/openpromfs/inode.c            |   2 +-
fs/orangefs/inode.c              |   2 +-
fs/orangefs/orangefs-utils.c     |   6 +-
fs/overlayfs/dir.c               |   2 +-
fs/overlayfs/inode.c             |   6 +-
fs/overlayfs/util.c              |  10 +--
fs/pipe.c                        |   2 +-
fs/qnx4/inode.c                  |   2 +-
fs/qnx6/inode.c                  |   2 +-
fs/quota/dquot.c                 |   2 +-
fs/romfs/super.c                 |   2 +-
fs/smb/client/cifsfs.c           |   2 +-
fs/smb/client/inode.c            |  14 ++--
fs/squashfs/inode.c              |   2 +-
fs/sync.c                        |   2 +-
fs/ubifs/file.c                  |   2 +-
fs/ubifs/super.c                 |   2 +-
fs/udf/inode.c                   |   2 +-
fs/ufs/inode.c                   |   2 +-
fs/xfs/scrub/common.c            |   2 +-
fs/xfs/scrub/inode_repair.c      |   2 +-
fs/xfs/scrub/parent.c            |   2 +-
fs/xfs/xfs_bmap_util.c           |   2 +-
fs/xfs/xfs_health.c              |   4 +-
fs/xfs/xfs_icache.c              |   6 +-
fs/xfs/xfs_inode.c               |   6 +-
fs/xfs/xfs_inode_item.c          |   4 +-
fs/xfs/xfs_iops.c                |   2 +-
fs/xfs/xfs_reflink.h             |   2 +-
fs/zonefs/super.c                |   4 +-
include/linux/backing-dev.h      |   5 +-
include/linux/fs.h               |  70 +++++++++++++++++-
include/linux/writeback.h        |   4 +-
include/trace/events/writeback.h |   8 +-
mm/backing-dev.c                 |   2 +-
security/landlock/fs.c           |   2 +-
106 files changed, 371 insertions(+), 310 deletions(-)
[PATCH v4 00/12] hide ->i_state behind accessors
Posted by Mateusz Guzik 2 weeks, 1 day ago
This is generated against:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries

First commit message quoted verbatim with rationale + API:

[quote]
Open-coded accesses prevent asserting they are done correctly. One
obvious aspect is locking, but significantly more can be checked. For
example it can be detected when the code is clearing flags which are
already missing, or is setting flags when it is illegal (e.g., I_FREEING
when ->i_count > 0).

In order to keep things manageable this patchset merely gets the thing
off the ground with only lockdep checks baked in.

Current consumers can be trivially converted.

Suppose flags I_A and I_B are to be handled, then if ->i_lock is held:

state = inode->i_state  	=> state = inode_state_read(inode)
inode->i_state |= (I_A | I_B) 	=> inode_state_add(inode, I_A | I_B)
inode->i_state &= ~(I_A | I_B) 	=> inode_state_del(inode, I_A | I_B)
inode->i_state = I_A | I_B	=> inode_state_set(inode, I_A | I_B)

If ->i_lock is not held or only held conditionally, add "_once"
suffix for the read routine or "_raw" for the rest:

state = inode->i_state  	=> state = inode_state_read_once(inode)
inode->i_state |= (I_A | I_B) 	=> inode_state_add_raw(inode, I_A | I_B)
inode->i_state &= ~(I_A | I_B) 	=> inode_state_del_raw(inode, I_A | I_B)
inode->i_state = I_A | I_B	=> inode_state_set_raw(inode, I_A | I_B)

The "_once" vs "_raw" discrepancy stems from the read variant differing
by READ_ONCE as opposed to just lockdep checks.
[/quote]
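
To make the mapping above concrete, here is a minimal userspace sketch of
how the two families of accessors could behave. It is not the code from
the patch: struct inode is a stand-in, a bool models "->i_lock is held",
a counter stands in for a lockdep splat, the flag values are made up, and
demo_locked_update() is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real I_* flags live in include/linux/fs.h. */
#define I_A (1u << 0)
#define I_B (1u << 1)

/* Userspace stand-in for struct inode; the bool models "->i_lock is held". */
struct inode {
	bool i_lock_held;
	unsigned int i_state;
};

/* Counts events that lockdep would report as a splat in the kernel. */
static int lock_violations;

static inline void assert_i_lock_held(const struct inode *inode)
{
	if (!inode->i_lock_held)
		lock_violations++;
}

/* Unlocked variants: no lock check at all. */
static inline unsigned int inode_state_read_once(const struct inode *inode)
{
	/* A volatile load as a stand-in for READ_ONCE(). */
	return *(const volatile unsigned int *)&inode->i_state;
}

static inline void inode_state_add_raw(struct inode *inode, unsigned int flags)
{
	inode->i_state |= flags;
}

static inline void inode_state_del_raw(struct inode *inode, unsigned int flags)
{
	inode->i_state &= ~flags;
}

static inline void inode_state_set_raw(struct inode *inode, unsigned int flags)
{
	inode->i_state = flags;
}

/* Locked variants: same operation, preceded by the lock check. */
static inline unsigned int inode_state_read(const struct inode *inode)
{
	assert_i_lock_held(inode);
	return inode->i_state;
}

static inline void inode_state_add(struct inode *inode, unsigned int flags)
{
	assert_i_lock_held(inode);
	inode_state_add_raw(inode, flags);
}

static inline void inode_state_del(struct inode *inode, unsigned int flags)
{
	assert_i_lock_held(inode);
	inode_state_del_raw(inode, flags);
}

static inline void inode_state_set(struct inode *inode, unsigned int flags)
{
	assert_i_lock_held(inode);
	inode_state_set_raw(inode, flags);
}

/* Hypothetical caller, converted from open-coded ->i_state access. */
static inline unsigned int demo_locked_update(struct inode *inode)
{
	unsigned int state;

	inode->i_lock_held = true;		/* models spin_lock(&inode->i_lock) */
	inode_state_set(inode, I_A);		/* was: inode->i_state = I_A */
	inode_state_add(inode, I_B);		/* was: inode->i_state |= I_B */
	inode_state_del(inode, I_A);		/* was: inode->i_state &= ~I_A */
	state = inode_state_read(inode);	/* was: state = inode->i_state */
	inode->i_lock_held = false;		/* models spin_unlock(&inode->i_lock) */
	return state;
}
```

In this sketch, calling inode_state_read() without the lock bumps
lock_violations, while inode_state_read_once() and the _raw routines
never do, which mirrors the locked vs. unlocked split described above.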

A series with one patch per subsystem/filesystem is quite big (over 50
largely trivial mails) and that's probably not warranted. Instead, core
kernel was handled in one commit and only file systems with changes
which should be looked at got split (all the rest is one combined commit).
This is all mostly mechanical churn and that alone should not be
objectionable. If someone does not like the API, they should raise it
here.

Per-fs postings are there so that maintainers can check whether
anything is incorrectly marked as unlocked access.

Note that in the worst case should a mistake be made here, it either
will fail to spot ->i_lock not being held (which is equivalent to the
stock state) OR it will generate a lockdep splat. While not pretty, it
is loud and readily fixable. Otherwise this patchset is a NOP.

Testing was limited:
kernel with CONFIG_DEBUG_VFS + lockdep was booted with ext4, survived
kernel builds and whatnot. xfs and btrfs filesystems were mounted and
had files linked and unlinked on them.

I very much will need someone with more resources to give this a
beating. I tried to err on the side of expecting the caller does *need*
->i_lock and it is possible something is using inode_state_read()
instead of inode_state_read_once() as a result. If so, there will be a
lockdep splat though.

Coccinelle was used to do the conversion, with all changes audited +
some manual fixups (more eyes welcome):

@@
expression inode, flags;
@@

- inode->i_state & flags
+ inode_state_read(inode) & flags

@@
expression inode, flags;
@@

- inode->i_state &= ~flags
+ inode_state_del(inode, flags)

@@
expression inode, flags;
@@

- inode->i_state |= flags
+ inode_state_add(inode, flags)

@@
expression inode, flags;
@@

- inode->i_state = flags
+ inode_state_set_raw(inode, flags)

Patch breakdown:
  fs: provide accessors for ->i_state

This only adds the routines, nothing is using them and overall it's a
NOP.

  fs: use ->i_state accessors in core kernel

Converts the entirety of the kernel modulo specific file systems.

  fs: mechanically convert most filesystems to use ->i_state accessors

This includes all trivial changes (mostly when the filesystem just
checks for I_NEW after getting the inode from the hash).

  btrfs: use the new ->i_state accessors
  netfs: use the new ->i_state accessors
  nilfs2: use the new ->i_state accessors
  xfs: use the new ->i_state accessors
  ext4: use the new ->i_state accessors
  f2fs: use the new ->i_state accessors
  ceph: use the new ->i_state accessors
  overlayfs: use the new ->i_state accessors

These are split per filesystem where there was more work in the area,
so that interested parties can sanity-check them.

  fs: make plain ->i_state access fail to compile

This hides ->i_state behind a struct, so things nicely fail to compile
if someone open-codes plain access.
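
As a rough illustration of why wrapping the field breaks open-coded
users, consider the userspace sketch below. The wrapper type name, its
member name, the flag value and inode_is_new() are all assumptions made
for the example and are not taken from the patch.

```c
#include <assert.h>

#define I_NEW (1u << 0)	/* illustrative value, not the kernel's */

/* Hypothetical wrapper: the flags are no longer a plain integer field. */
struct inode_state_flags {
	unsigned int __state;
};

/* Userspace stand-in for struct inode. */
struct inode {
	struct inode_state_flags i_state;
};

static inline unsigned int inode_state_read_once(const struct inode *inode)
{
	/* A volatile load as a stand-in for READ_ONCE(). */
	return *(const volatile unsigned int *)&inode->i_state.__state;
}

static inline void inode_state_set_raw(struct inode *inode, unsigned int flags)
{
	inode->i_state.__state = flags;
}

/* Hypothetical helper showing a converted I_NEW check. */
static inline int inode_is_new(const struct inode *inode)
{
	/*
	 * The open-coded form "inode->i_state & I_NEW" is rejected by the
	 * compiler here, because a struct is not a valid operand to '&'.
	 * Likewise "inode->i_state |= I_NEW" and "inode->i_state = I_NEW"
	 * no longer type-check.
	 */
	return (inode_state_read_once(inode) & I_NEW) != 0;
}
```

Only code that reaches into the wrapper's member on purpose can still
bypass the accessors, and that is easy to spot in review.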

v3:
- rename accessors (s/unchecked/raw; s/unstable/once/)
- rebase
- provide actual commit messages
- per fs patches as I deemed applicable

Mateusz Guzik (12):
  fs: provide accessors for ->i_state
  fs: use ->i_state accessors in core kernel
  fs: mechanically convert most filesystems to use ->i_state accessors
  btrfs: use the new ->i_state accessors
  netfs: use the new ->i_state accessors
  nilfs2: use the new ->i_state accessors
  xfs: use the new ->i_state accessors
  ext4: use the new ->i_state accessors
  f2fs: use the new ->i_state accessors
  ceph: use the new ->i_state accessors
  overlayfs: use the new ->i_state accessors
  fs: make plain ->i_state access fail to compile

-- 
2.43.0
Re: [PATCH v4 00/12] hide ->i_state behind accessors
Posted by Christian Brauner 1 week, 5 days ago
On Tue, Sep 16, 2025 at 03:58:48PM +0200, Mateusz Guzik wrote:
> This is generated against:
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries

Given how late in the cycle it is I'm going to push this into the v6.19
merge window. You don't need to resend. We might get by with applying
and rebasing given that it's fairly mechanical overall. Objections,
Mateusz?
Re: [PATCH v4 00/12] hide ->i_state behind accessors
Posted by Mateusz Guzik 1 week, 5 days ago
On Fri, Sep 19, 2025 at 2:19 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Sep 16, 2025 at 03:58:48PM +0200, Mateusz Guzik wrote:
> > This is generated against:
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries
>
> Given how late in the cycle it is I'm going to push this into the v6.19
> merge window. You don't need to resend. We might get by with applying
> and rebasing given that it's fairly mechanical overall. Objections,
> Mateusz?

First a nit: if the prelim branch is going in, you may want to adjust
the dump_inode commit to use icount_read instead of
atomic_read(&inode->i_count).

Getting this in *now* is indeed not worth it, so I support the idea.
Re: [PATCH v4 00/12] hide ->i_state behind accessors
Posted by Mateusz Guzik 1 week, 5 days ago
On Fri, Sep 19, 2025 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Fri, Sep 19, 2025 at 2:19 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Sep 16, 2025 at 03:58:48PM +0200, Mateusz Guzik wrote:
> > > This is generated against:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries
> >
> > Given how late in the cycle it is I'm going to push this into the v6.19
> > merge window. You don't need to resend. We might get by with applying
> > and rebasing given that it's fairly mechanical overall. Objections,
> > Mateusz?
>
> First a nit: if the prelim branch is going in, you may want to adjust
> the dump_inode commit to use icount_read instead of
> atomic_read(&inode->i_count).
>
> Getting this in *now* is indeed not worth it, so I support the idea.

Now that I wrote this I gave it a little bit of thought.

Note that almost all of the churn was generated by coccinelle. A few
spots got adjusted by hand.

Regressions are possible in 3 ways:
- wrong routine usage (_raw/_once vs plain) leading to lockdep splats
- incorrect manual adjustment between _raw/_once and plain variants,
again leading to lockdep splats
- incorrect manually added usage (e.g., some of the _set stuff and the
xfs changes were done that way)

The first two become instant non-problems if lockdep gets elided for
the merge right now.

The last one may be a real concern, to which I have a
counter-proposal: extend the coccinelle script to also cover that,
leading to *no* manual intervention.

Something like that should be perfectly safe to merge, hopefully
avoiding some churn headache in the next cycle. Worst case the
_raw/_once usage would be "wrong" and only come out after lockdep is
restored.

Another option is to make the patchset into a nop by only providing
the helpers without _raw/_once variants, again fully generated with
coccinelle. Again should make it easier to shuffle changes in the next
cycle.

I can prep this today if it sounds like a plan, but I'm not going to
strongly argue one way or the other.
Re: [PATCH v4 00/12] hide ->i_state behind accessors
Posted by Mateusz Guzik 1 week, 5 days ago
On Fri, Sep 19, 2025 at 3:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Fri, Sep 19, 2025 at 3:09 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >
> > On Fri, Sep 19, 2025 at 2:19 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 03:58:48PM +0200, Mateusz Guzik wrote:
> > > > This is generated against:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries
> > >
> > > Given how late in the cycle it is I'm going to push this into the v6.19
> > > merge window. You don't need to resend. We might get by with applying
> > > and rebasing given that it's fairly mechanical overall. Objections,
> > > Mateusz?
> >
> > First a nit: if the prelim branch is going in, you may want to adjust
> > the dump_inode commit to use icount_read instead of
> > atomic_read(&inode->i_count).
> >
> > Getting this in *now* is indeed not worth it, so I support the idea.
>
> Now that I wrote this I gave it a little bit of thought.
>
> Note that almost all of the churn was generated by coccinelle. A few
> spots got adjusted by hand.
>
> Regressions are possible in 3 ways:
> - wrong routine usage (_raw/_once vs plain) leading to lockdep splats
> - incorrect manual adjustment between _raw/_once and plain variants,
> again leading to lockdep splats
> - incorrect manually added usage (e.g., some of the _set stuff and the
> xfs changes were done that way)
>
> The first two become instant non-problems if lockdep gets elided for
> the merge right now.
>
> The last one may be a real concern, to which I have a
> counter-proposal: extend the coccinelle script to also cover that,
> leading to *no* manual intervention.
>
> Something like that should be perfectly safe to merge, hopefully
> avoiding some churn headache in the next cycle. Worst case the
> _raw/_once usage would be "wrong" and only come out after lockdep is
> restored.
>
> Another option is to make the patchset into a nop by only providing
> the helpers without _raw/_once variants, again fully generated with
> coccinelle. Again should make it easier to shuffle changes in the next
> cycle.
>
> I can prep this today if it sounds like a plan, but I'm not going to
> strongly argue one way or the other.

So I posted v5 with the no _raw/_once variants approach.

It is more manual conversion than I thought, but it is all pretty
straightforward and contained to a dedicated diff.

If you still want to postpone this work that's fine with me.