fs/inode.c | 46 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 35 insertions(+), 11 deletions(-)
The material change is I_DIRTY_TIME handling without a spurious ref
acquire/release cycle.
While here a bunch of smaller changes:
1. predict there is an inode -- bpftrace suggests one is passed vast
majority of the time
2. convert BUG_ON into VFS_BUG_ON_INODE
3. assert on ->i_count
4. assert ->i_lock is not held
5. flip the order of I_DIRTY_TIME and nlink count checks as the former
is less likely to be true
I verified atomic_read(&inode->i_count) does not show up in asm if
debug is disabled.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
The routine kept annoying me, so here is a further revised variant.
I verified this compiles, but I still cannot runtime test. I'm sorry for
that. My signed-off is conditional on a good samaritan making sure it
works :)
diff compared to the thing I sent "informally":
- if (unlikely(!inode))
- asserts
- slightly reworded iput_final commentary
- unlikely() on the second I_DIRTY_TIME check
Given the revamp I think it makes sense to attribute the change to me,
hence a "proper" mail.
The thing surviving from the submission by Josef is:
+ if (atomic_add_unless(&inode->i_count, -1, 1))
+ return;
And of course he is the one who brought up the spurious refcount trip in
the first place.
I'm happy with Reported-by, Co-developed-by or whatever other credit
as you guys see fit.
That aside I think it would be nice if NULL inodes passed to iput
became illegal, but that's a different story for another day.
fs/inode.c | 46 +++++++++++++++++++++++++++++++++++-----------
1 file changed, 35 insertions(+), 11 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 01ebdc40021e..01a554e11279 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode)
*/
void iput(struct inode *inode)
{
- if (!inode)
+ if (unlikely(!inode))
return;
- BUG_ON(inode->i_state & I_CLEAR);
+
retry:
- if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
- if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
- atomic_inc(&inode->i_count);
- spin_unlock(&inode->i_lock);
- trace_writeback_lazytime_iput(inode);
- mark_inode_dirty_sync(inode);
- goto retry;
- }
- iput_final(inode);
+ lockdep_assert_not_held(&inode->i_lock);
+ VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode);
+ /*
+ * Note this assert is technically racy as if the count is bogusly
+ * equal to one, then two CPUs racing to further drop it can both
+ * conclude it's fine.
+ */
+ VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode);
+
+ if (atomic_add_unless(&inode->i_count, -1, 1))
+ return;
+
+ if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) {
+ trace_writeback_lazytime_iput(inode);
+ mark_inode_dirty_sync(inode);
+ goto retry;
}
+
+ spin_lock(&inode->i_lock);
+ if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) {
+ spin_unlock(&inode->i_lock);
+ goto retry;
+ }
+
+ if (!atomic_dec_and_test(&inode->i_count)) {
+ spin_unlock(&inode->i_lock);
+ return;
+ }
+
+ /*
+ * iput_final() drops ->i_lock, we can't assert on it as the inode may
+ * be deallocated by the time the call returns.
+ */
+ iput_final(inode);
}
EXPORT_SYMBOL(iput);
--
2.43.0
I'm writing a long response to this series, in the meantime I noticed this bit landed in https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74 but with some whitespace issues in comments -- they are indented with spaces instead of tabs after the opening line. I verified the mail I sent does not have it, so I'm guessing this was copy-pasted? Tabing them by hand does the trick, below is my copy-paste as proof, please indent by hand in your editor ;) diff --git a/fs/inode.c b/fs/inode.c index 2db680a37235..fe4868e2a954 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1915,10 +1915,10 @@ void iput(struct inode *inode) lockdep_assert_not_held(&inode->i_lock); VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode); /* - * Note this assert is technically racy as if the count is bogusly - * equal to one, then two CPUs racing to further drop it can both - * conclude it's fine. - */ + * Note this assert is technically racy as if the count is bogusly + * equal to one, then two CPUs racing to further drop it can both + * conclude it's fine. + */ VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode); if (atomic_add_unless(&inode->i_count, -1, 1)) @@ -1942,9 +1942,9 @@ void iput(struct inode *inode) } /* - * iput_final() drops ->i_lock, we can't assert on it as the inode may - * be deallocated by the time the call returns. - */ + * iput_final() drops ->i_lock, we can't assert on it as the inode may + * be deallocated by the time the call returns. + */ iput_final(inode); } EXPORT_SYMBOL(iput); While here, vim told me about spaces instead of tabs in 2 more spots in the file. Again to show the lines: diff --git a/fs/inode.c b/fs/inode.c index 2db680a37235..833de5457a06 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -550,11 +550,11 @@ static void __inode_add_lru(struct inode *inode, bool rotate) struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe, struct inode *inode, u32 bit) { - void *bit_address; + void *bit_address; - bit_address = inode_state_wait_address(inode, bit); - init_wait_var_entry(wqe, bit_address, 0); - return __var_waitqueue(bit_address); + bit_address = inode_state_wait_address(inode, bit); + init_wait_var_entry(wqe, bit_address, 0); + return __var_waitqueue(bit_address); } EXPORT_SYMBOL(inode_bit_waitqueue); @@ -2938,7 +2938,7 @@ EXPORT_SYMBOL(mode_strip_sgid); */ void dump_inode(struct inode *inode, const char *reason) { - pr_warn("%s encountered for inode %px", reason, inode); + pr_warn("%s encountered for inode %px", reason, inode); } EXPORT_SYMBOL(dump_inode); Christian, I think it would be the most expedient if you just made changes on your own with whatever commit message you see fit. No need to mention I brought this up. If you insist I can send a patch. On Wed, Aug 27, 2025 at 6:24 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > > The material change is I_DIRTY_TIME handling without a spurious ref > acquire/release cycle. > > While here a bunch of smaller changes: > 1. predict there is an inode -- bpftrace suggests one is passed vast > majority of the time > 2. convert BUG_ON into VFS_BUG_ON_INODE > 3. assert on ->i_count > 4. assert ->i_lock is not held > 5. flip the order of I_DIRTY_TIME and nlink count checks as the former > is less likely to be true > > I verified atomic_read(&inode->i_count) does not show up in asm if > debug is disabled. > > Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> > --- > > The routine kept annoying me, so here is a further revised variant. > > I verified this compiles, but I still cannot runtime test. I'm sorry for > that. My signed-off is conditional on a good samaritan making sure it > works :) > > diff compared to the thing I sent "informally": > - if (unlikely(!inode)) > - asserts > - slightly reworded iput_final commentary > - unlikely() on the second I_DIRTY_TIME check > > Given the revamp I think it makes sense to attribute the change to me, > hence a "proper" mail. > > The thing surviving from the submission by Josef is: > + if (atomic_add_unless(&inode->i_count, -1, 1)) > + return; > > And of course he is the one who brought up the spurious refcount trip in > the first place. > > I'm happy with Reported-by, Co-developed-by or whatever other credit > as you guys see fit. > > That aside I think it would be nice if NULL inodes passed to iput > became illegal, but that's a different story for another day. > > fs/inode.c | 46 +++++++++++++++++++++++++++++++++++----------- > 1 file changed, 35 insertions(+), 11 deletions(-) > > diff --git a/fs/inode.c b/fs/inode.c > index 01ebdc40021e..01a554e11279 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -1908,20 +1908,44 @@ static void iput_final(struct inode *inode) > */ > void iput(struct inode *inode) > { > - if (!inode) > + if (unlikely(!inode)) > return; > - BUG_ON(inode->i_state & I_CLEAR); > + > retry: > - if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) { > - if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) { > - atomic_inc(&inode->i_count); > - spin_unlock(&inode->i_lock); > - trace_writeback_lazytime_iput(inode); > - mark_inode_dirty_sync(inode); > - goto retry; > - } > - iput_final(inode); > + lockdep_assert_not_held(&inode->i_lock); > + VFS_BUG_ON_INODE(inode->i_state & I_CLEAR, inode); > + /* > + * Note this assert is technically racy as if the count is bogusly > + * equal to one, then two CPUs racing to further drop it can both > + * conclude it's fine. > + */ > + VFS_BUG_ON_INODE(atomic_read(&inode->i_count) < 1, inode); > + > + if (atomic_add_unless(&inode->i_count, -1, 1)) > + return; > + > + if ((inode->i_state & I_DIRTY_TIME) && inode->i_nlink) { > + trace_writeback_lazytime_iput(inode); > + mark_inode_dirty_sync(inode); > + goto retry; > } > + > + spin_lock(&inode->i_lock); > + if (unlikely((inode->i_state & I_DIRTY_TIME) && inode->i_nlink)) { > + spin_unlock(&inode->i_lock); > + goto retry; > + } > + > + if (!atomic_dec_and_test(&inode->i_count)) { > + spin_unlock(&inode->i_lock); > + return; > + } > + > + /* > + * iput_final() drops ->i_lock, we can't assert on it as the inode may > + * be deallocated by the time the call returns. > + */ > + iput_final(inode); > } > EXPORT_SYMBOL(iput); > > -- > 2.43.0 > -- Mateusz Guzik <mjguzik gmail.com>
> Christian, I think it would be the most expedient if you just made Ok, thanks.
On Sat 30-08-25 17:54:35, Mateusz Guzik wrote: > I'm writing a long response to this series, in the meantime I noticed > this bit landed in > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74 > but with some whitespace issues in comments -- they are indented with > spaces instead of tabs after the opening line. Interesting. I didn't see an email about inclusion. Anyway, the change looks good to me so Christian, feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
On Mon, Sep 01, 2025 at 10:50:59AM +0200, Jan Kara wrote: > On Sat 30-08-25 17:54:35, Mateusz Guzik wrote: > > I'm writing a long response to this series, in the meantime I noticed > > this bit landed in > > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.18.inode.refcount.preliminaries&id=3cba19f6a00675fbc2af0987dfc90e216e6cfb74 > > but with some whitespace issues in comments -- they are indented with > > spaces instead of tabs after the opening line. > > Interesting. I didn't see an email about inclusion. Anyway, the change Sorry, that waas my bad. I talked with Josef off-list and told him that I would apply Mateusz suggestions with his CdB and SoB added. I forgot to repeat that on the list. > looks good to me so Christian, feel free to add: > > Reviewed-by: Jan Kara <jack@suse.cz> Thanks!
© 2016 - 2025 Red Hat, Inc.