[v1] fs/9p: fix inode nlink accounting

[PATCH] fs/9p: fix inode nlink accounting

Posted by Eric Van Hensbergen 2 years, 1 month ago

I was running some regressions and noticed a (race-y) kernel warning that
happens when nlink becomes less than zero.  Looking through the code
it looks like we aren't good about protecting the inode lock when
manipulating nlink and some code that was added several years ago to
protect against bugs in underlying file systems nlink handling didn't
look quite right either.  I took a look at what NFS was doing and tried to
follow similar approaches in the 9p code.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
---
 fs/9p/vfs_inode.c      | 32 ++++++++++++++++++++++++--------
 fs/9p/vfs_inode_dotl.c |  2 ++
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index b845ee18a80b..9723c3cbae38 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -508,9 +508,12 @@ static int v9fs_at_to_dotl_flags(int flags)
 /**
  * v9fs_dec_count - helper functon to drop i_nlink.
  *
- * If a directory had nlink <= 2 (including . and ..), then we should not drop
- * the link count, which indicates the underlying exported fs doesn't maintain
- * nlink accurately. e.g.
+ * Put a guards around this so we are only dropping nlink
+ * if it would be valid.  This prevents bugs from an underlying
+ * filesystem implementations from triggering kernel WARNs.  We'll
+ * still print 9p debug messages if the underlying filesystem is wrong.
+ * 
+ * known underlying filesystems which might exhibit this issue:
  * - overlayfs sets nlink to 1 for merged dir
  * - ext4 (with dir_nlink feature enabled) sets nlink to 1 if a dir has more
  *   than EXT4_LINK_MAX (65000) links.
@@ -519,8 +522,13 @@ static int v9fs_at_to_dotl_flags(int flags)
  */
 static void v9fs_dec_count(struct inode *inode)
 {
-	if (!S_ISDIR(inode->i_mode) || inode->i_nlink > 2)
+	spin_lock(&inode->i_lock);
+	if (inode->i_nlink > 0)
 		drop_nlink(inode);
+	else
+		p9_debug(P9_DEBUG_ERROR, "WARNING: nlink is already 0 inode %p\n", 
+			inode);
+	spin_unlock(&inode->i_lock);
 }
 
 /**
@@ -566,8 +574,9 @@ static int v9fs_remove(struct inode *dir, struct dentry *dentry, int flags)
 		 * link count
 		 */
 		if (flags & AT_REMOVEDIR) {
+			spin_lock(&inode->i_lock);
 			clear_nlink(inode);
-			v9fs_dec_count(dir);
+			spin_unlock(&inode->i_lock);
 		} else
 			v9fs_dec_count(inode);
 
@@ -713,7 +722,9 @@ static int v9fs_vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 		err = PTR_ERR(fid);
 		fid = NULL;
 	} else {
+		spin_lock(&dir->i_lock);
 		inc_nlink(dir);
+		spin_unlock(&dir->i_lock);
 		v9fs_invalidate_inode_attr(dir);
 	}
 
@@ -962,14 +973,19 @@ v9fs_vfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 error_locked:
 	if (!retval) {
 		if (new_inode) {
-			if (S_ISDIR(new_inode->i_mode))
+			if (S_ISDIR(new_inode->i_mode)) {
+				spin_lock(&new_inode->i_lock);
 				clear_nlink(new_inode);
-			else
+				spin_unlock(&new_inode->i_lock);
+			} else
 				v9fs_dec_count(new_inode);
 		}
 		if (S_ISDIR(old_inode->i_mode)) {
-			if (!new_inode)
+			if (!new_inode) {
+				spin_lock(&new_dir->i_lock);
 				inc_nlink(new_dir);
+				spin_unlock(&new_dir->i_lock);
+			}
 			v9fs_dec_count(old_dir);
 		}
 		v9fs_invalidate_inode_attr(old_inode);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index c7319af2f471..6cc037f726e7 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -427,7 +427,9 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
 		v9fs_set_create_acl(inode, fid, dacl, pacl);
 		d_instantiate(dentry, inode);
 	}
+	spin_lock(&dir->i_lock);
 	inc_nlink(dir);
+	spin_unlock(&dir->i_lock);
 	v9fs_invalidate_inode_attr(dir);
 error:
 	p9_fid_put(fid);

---
base-commit: 5254c0cbc92d2a08e75443bdb914f1c4839cdf5a
change-id: 20240107-fix-nlink-handling-3c0646f5d927

Best regards,
-- 
Eric Van Hensbergen <ericvh@kernel.org>

Re: [PATCH] fs/9p: fix inode nlink accounting

Posted by asmadeus@codewreck.org 2 years, 1 month ago

Eric Van Hensbergen wrote on Sun, Jan 07, 2024 at 07:07:52PM +0000:
> I was running some regressions and noticed a (race-y) kernel warning that
> happens when nlink becomes less than zero.  Looking through the code
> it looks like we aren't good about protecting the inode lock when
> manipulating nlink and some code that was added several years ago to
> protect against bugs in underlying file systems nlink handling didn't
> look quite right either.  I took a look at what NFS was doing and tried to
> follow similar approaches in the 9p code.

I was about to say the set/inc/etc_nlink helpers could probably just be
using atomic (there's an atomic_dec_if_postive that we could have used
for the v9fs_dec_count warning), but this isn't our code so not much to
do about that -- I agree it needs a lock.

I didn't take the time to check if you missed any, but it won't be worse
than what we have right now:
Acked-by: Dominique Martinet <asmadeus@codewreck.org>

-- 
Dominique Martinet | Asmadeus

Re: [PATCH] fs/9p: fix inode nlink accounting

Posted by Christian Schoenebeck 2 years, 1 month ago

On Monday, January 8, 2024 12:19:34 PM CET asmadeus@codewreck.org wrote:
> Eric Van Hensbergen wrote on Sun, Jan 07, 2024 at 07:07:52PM +0000:
> > I was running some regressions and noticed a (race-y) kernel warning that
> > happens when nlink becomes less than zero.  Looking through the code
> > it looks like we aren't good about protecting the inode lock when
> > manipulating nlink and some code that was added several years ago to
> > protect against bugs in underlying file systems nlink handling didn't
> > look quite right either.  I took a look at what NFS was doing and tried to
> > follow similar approaches in the 9p code.
> 
> I was about to say the set/inc/etc_nlink helpers could probably just be
> using atomic (there's an atomic_dec_if_postive that we could have used
> for the v9fs_dec_count warning), but this isn't our code so not much to
> do about that -- I agree it needs a lock.
> 
> I didn't take the time to check if you missed any, but it won't be worse
> than what we have right now:
> Acked-by: Dominique Martinet <asmadeus@codewreck.org>

That's actually a good point. For these tasks atomic inc/sub/etc are usually
used instead of locks.

I would at least add local wrapper functions that would do these spinlocks for
us.

However would it be too bold to change those inode functions to use atomic
operations directly on their end?

/Christian

Re: [PATCH] fs/9p: fix inode nlink accounting

Posted by Eric Van Hensbergen 2 years, 1 month ago

On Mon, Jan 8, 2024 at 6:08 AM Christian Schoenebeck
<linux_oss@crudebyte.com> wrote:
>
> On Monday, January 8, 2024 12:19:34 PM CET asmadeus@codewreck.org wrote:
> > Eric Van Hensbergen wrote on Sun, Jan 07, 2024 at 07:07:52PM +0000:
> > > I was running some regressions and noticed a (race-y) kernel warning that
> > > happens when nlink becomes less than zero.  Looking through the code
> > > it looks like we aren't good about protecting the inode lock when
> > > manipulating nlink and some code that was added several years ago to
> > > protect against bugs in underlying file systems nlink handling didn't
> > > look quite right either.  I took a look at what NFS was doing and tried to
> > > follow similar approaches in the 9p code.
> >
> > I was about to say the set/inc/etc_nlink helpers could probably just be
> > using atomic (there's an atomic_dec_if_postive that we could have used
> > for the v9fs_dec_count warning), but this isn't our code so not much to
> > do about that -- I agree it needs a lock.
> >
> > I didn't take the time to check if you missed any, but it won't be worse
> > than what we have right now:
> > Acked-by: Dominique Martinet <asmadeus@codewreck.org>
>
> That's actually a good point. For these tasks atomic inc/sub/etc are usually
> used instead of locks.
>
> I would at least add local wrapper functions that would do these spinlocks for
> us.
>

I'm good with adding local wrapper functions,  I imagine these aren't
used in the kernel because for regular file-systems maybe you want the
warning that your inode link accounting is wrong.
I suppose we could be naughty and not use the kernel functions (which
themselves are basically wrappers).

      -eric