[v1] nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

[PATCH] nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

Posted by Deepanshu Kartikey 1 month, 2 weeks ago

Syzbot reported a hung task in nilfs_transaction_begin() where multiple
tasks performing chmod() on a nilfs2 mount blocked for over 143 seconds
waiting to acquire ns_segctor_sem for read:

  INFO: task syz.0.17:5918 blocked for more than 143 seconds.
  Call Trace:
   schedule+0x164/0x360
   rwsem_down_read_slowpath+0x6d9/0x940
   down_read+0x99/0x2e0
   nilfs_transaction_begin+0x364/0x710 fs/nilfs2/segment.c:221
   nilfs_setattr+0x124/0x2c0 fs/nilfs2/inode.c:921
   notify_change+0xc1a/0xf40
   chmod_common+0x273/0x4a0
   do_fchmodat+0x12d/0x230

The writer holding ns_segctor_sem was a concurrent NILFS_IOCTL_CLEAN_SEGMENTS
caller, stuck inside printk while emitting per-element warnings from
nilfs_sufile_updatev():

   __nilfs_msg+0x373/0x450 fs/nilfs2/super.c:78
   nilfs_sufile_updatev+0x21c/0x6d0 fs/nilfs2/sufile.c:186
   nilfs_sufile_freev fs/nilfs2/sufile.h:93 [inline]
   nilfs_free_segments fs/nilfs2/segment.c:1140 [inline]
   nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1261 [inline]
   nilfs_segctor_do_construct+0x1f55/0x76c0
   nilfs_clean_segments+0x3bd/0xa50
   nilfs_ioctl_clean_segments fs/nilfs2/ioctl.c:922 [inline]
   nilfs_ioctl+0x261f/0x2780

The root cause is that nilfs_ioctl_clean_segments() does not validate
the user-supplied segment numbers in kbufs[4] before calling
nilfs_clean_segments(), which acquires ns_segctor_sem for write.  The
range check on each segnum is performed deep inside the call chain by
nilfs_sufile_updatev(), which emits a nilfs_warn() per invalid entry
while still under the segctor lock and the sufile mi_sem.  Under load
(repeated invocations across multiple mounts saturating the global
printk path), the cumulative printk latency keeps ns_segctor_sem held
long enough to trip the hung_task watchdog, blocking concurrent
operations such as chmod() that need ns_segctor_sem for read.

Fix by validating the contents of kbufs[4] in the ioctl entry path,
before any FS-wide lock is acquired.  Out-of-range segment numbers are
rejected with -EINVAL synchronously, with no work performed under
ns_segctor_sem.

Reported-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=62f0f99d2f2bb8e3bbd7
Fixes: 4f6b828837b4 ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl")
Tested-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 fs/nilfs2/ioctl.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index e0a606643e87..38822dce1839 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -846,6 +846,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
 	struct the_nilfs *nilfs;
 	size_t len, nsegs;
 	int n, ret;
+	size_t i;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -876,6 +877,21 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
 	}
 	nilfs = inode->i_sb->s_fs_info;
 
+	/*
+	 * Validate segment numbers against the filesystem's segment count
+	 * before entering nilfs_clean_segments(), which acquires
+	 * ns_segctor_sem for write.  Catching invalid segnums here avoids
+	 * holding that lock while emitting per-element diagnostics under
+	 * the segment constructor.
+	 */
+	for (i = 0; i < nsegs; i++) {
+		if (((__u64 *)kbufs[4])[i] >= nilfs->ns_nsegments) {
+			ret = -EINVAL;
+			kfree(kbufs[4]);
+			goto out;
+		}
+	}
+
 	for (n = 0; n < 4; n++) {
 		ret = -EINVAL;
 		if (argv[n].v_size != argsz[n])
-- 
2.43.0

Re: [PATCH] nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

Posted by Viacheslav Dubeyko 1 month, 2 weeks ago

On Tue, 2026-04-28 at 09:32 +0530, Deepanshu Kartikey wrote:
> Syzbot reported a hung task in nilfs_transaction_begin() where multiple
> tasks performing chmod() on a nilfs2 mount blocked for over 143 seconds
> waiting to acquire ns_segctor_sem for read:
> 
>   INFO: task syz.0.17:5918 blocked for more than 143 seconds.
>   Call Trace:
>    schedule+0x164/0x360
>    rwsem_down_read_slowpath+0x6d9/0x940
>    down_read+0x99/0x2e0
>    nilfs_transaction_begin+0x364/0x710 fs/nilfs2/segment.c:221
>    nilfs_setattr+0x124/0x2c0 fs/nilfs2/inode.c:921
>    notify_change+0xc1a/0xf40
>    chmod_common+0x273/0x4a0
>    do_fchmodat+0x12d/0x230
> 
> The writer holding ns_segctor_sem was a concurrent NILFS_IOCTL_CLEAN_SEGMENTS
> caller, stuck inside printk while emitting per-element warnings from
> nilfs_sufile_updatev():
> 
>    __nilfs_msg+0x373/0x450 fs/nilfs2/super.c:78
>    nilfs_sufile_updatev+0x21c/0x6d0 fs/nilfs2/sufile.c:186
>    nilfs_sufile_freev fs/nilfs2/sufile.h:93 [inline]
>    nilfs_free_segments fs/nilfs2/segment.c:1140 [inline]
>    nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1261 [inline]
>    nilfs_segctor_do_construct+0x1f55/0x76c0
>    nilfs_clean_segments+0x3bd/0xa50
>    nilfs_ioctl_clean_segments fs/nilfs2/ioctl.c:922 [inline]
>    nilfs_ioctl+0x261f/0x2780
> 
> The root cause is that nilfs_ioctl_clean_segments() does not validate
> the user-supplied segment numbers in kbufs[4] before calling
> nilfs_clean_segments(), which acquires ns_segctor_sem for write.  The
> range check on each segnum is performed deep inside the call chain by
> nilfs_sufile_updatev(), which emits a nilfs_warn() per invalid entry
> while still under the segctor lock and the sufile mi_sem.  Under load
> (repeated invocations across multiple mounts saturating the global
> printk path), the cumulative printk latency keeps ns_segctor_sem held
> long enough to trip the hung_task watchdog, blocking concurrent
> operations such as chmod() that need ns_segctor_sem for read.
> 
> Fix by validating the contents of kbufs[4] in the ioctl entry path,
> before any FS-wide lock is acquired.  Out-of-range segment numbers are
> rejected with -EINVAL synchronously, with no work performed under
> ns_segctor_sem.
> 
> Reported-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=62f0f99d2f2bb8e3bbd7
> Fixes: 4f6b828837b4 ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl")
> Tested-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
>  fs/nilfs2/ioctl.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index e0a606643e87..38822dce1839 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -846,6 +846,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
>  	struct the_nilfs *nilfs;
>  	size_t len, nsegs;
>  	int n, ret;
> +	size_t i;

What about re-using the n variable? Does it make sense to introduce new one?

>  
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
> @@ -876,6 +877,21 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
>  	}
>  	nilfs = inode->i_sb->s_fs_info;
>  
> +	/*
> +	 * Validate segment numbers against the filesystem's segment count
> +	 * before entering nilfs_clean_segments(), which acquires
> +	 * ns_segctor_sem for write.  Catching invalid segnums here avoids
> +	 * holding that lock while emitting per-element diagnostics under
> +	 * the segment constructor.
> +	 */
> +	for (i = 0; i < nsegs; i++) {
> +		if (((__u64 *)kbufs[4])[i] >= nilfs->ns_nsegments) {
> +			ret = -EINVAL;
> +			kfree(kbufs[4]);
> +			goto out;

Are you sure that you need to free buffer here and go to out? Maybe, we can
introduce another label and to jump to kfree(kbufs[4]) at the end of method?

Thanks,
Slava.

> +		}
> +	}
> +
>  	for (n = 0; n < 4; n++) {
>  		ret = -EINVAL;
>  		if (argv[n].v_size != argsz[n])

Re: [PATCH] nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

Posted by Deepanshu Kartikey 1 month, 2 weeks ago

On Wed, Apr 29, 2026 at 12:29 AM Viacheslav Dubeyko <vdubeyko@redhat.com> wrote:
>
> On Tue, 2026-04-28 at 09:32 +0530, Deepanshu Kartikey wrote:
> > Syzbot reported a hung task in nilfs_transaction_begin() where multiple
> > tasks performing chmod() on a nilfs2 mount blocked for over 143 seconds
> > waiting to acquire ns_segctor_sem for read:
> >
> >   INFO: task syz.0.17:5918 blocked for more than 143 seconds.
> >   Call Trace:
> >    schedule+0x164/0x360
> >    rwsem_down_read_slowpath+0x6d9/0x940
> >    down_read+0x99/0x2e0
> >    nilfs_transaction_begin+0x364/0x710 fs/nilfs2/segment.c:221
> >    nilfs_setattr+0x124/0x2c0 fs/nilfs2/inode.c:921
> >    notify_change+0xc1a/0xf40
> >    chmod_common+0x273/0x4a0
> >    do_fchmodat+0x12d/0x230
> >
> > The writer holding ns_segctor_sem was a concurrent NILFS_IOCTL_CLEAN_SEGMENTS
> > caller, stuck inside printk while emitting per-element warnings from
> > nilfs_sufile_updatev():
> >
> >    __nilfs_msg+0x373/0x450 fs/nilfs2/super.c:78
> >    nilfs_sufile_updatev+0x21c/0x6d0 fs/nilfs2/sufile.c:186
> >    nilfs_sufile_freev fs/nilfs2/sufile.h:93 [inline]
> >    nilfs_free_segments fs/nilfs2/segment.c:1140 [inline]
> >    nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1261 [inline]
> >    nilfs_segctor_do_construct+0x1f55/0x76c0
> >    nilfs_clean_segments+0x3bd/0xa50
> >    nilfs_ioctl_clean_segments fs/nilfs2/ioctl.c:922 [inline]
> >    nilfs_ioctl+0x261f/0x2780
> >
> > The root cause is that nilfs_ioctl_clean_segments() does not validate
> > the user-supplied segment numbers in kbufs[4] before calling
> > nilfs_clean_segments(), which acquires ns_segctor_sem for write.  The
> > range check on each segnum is performed deep inside the call chain by
> > nilfs_sufile_updatev(), which emits a nilfs_warn() per invalid entry
> > while still under the segctor lock and the sufile mi_sem.  Under load
> > (repeated invocations across multiple mounts saturating the global
> > printk path), the cumulative printk latency keeps ns_segctor_sem held
> > long enough to trip the hung_task watchdog, blocking concurrent
> > operations such as chmod() that need ns_segctor_sem for read.
> >
> > Fix by validating the contents of kbufs[4] in the ioctl entry path,
> > before any FS-wide lock is acquired.  Out-of-range segment numbers are
> > rejected with -EINVAL synchronously, with no work performed under
> > ns_segctor_sem.
> >
> > Reported-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=62f0f99d2f2bb8e3bbd7
> > Fixes: 4f6b828837b4 ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl")
> > Tested-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> > Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> > ---
> >  fs/nilfs2/ioctl.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> > index e0a606643e87..38822dce1839 100644
> > --- a/fs/nilfs2/ioctl.c
> > +++ b/fs/nilfs2/ioctl.c
> > @@ -846,6 +846,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
> >       struct the_nilfs *nilfs;
> >       size_t len, nsegs;
> >       int n, ret;
> > +     size_t i;
>
> What about re-using the n variable? Does it make sense to introduce new one?
>
> >
> >       if (!capable(CAP_SYS_ADMIN))
> >               return -EPERM;
> > @@ -876,6 +877,21 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
> >       }
> >       nilfs = inode->i_sb->s_fs_info;
> >
> > +     /*
> > +      * Validate segment numbers against the filesystem's segment count
> > +      * before entering nilfs_clean_segments(), which acquires
> > +      * ns_segctor_sem for write.  Catching invalid segnums here avoids
> > +      * holding that lock while emitting per-element diagnostics under
> > +      * the segment constructor.
> > +      */
> > +     for (i = 0; i < nsegs; i++) {
> > +             if (((__u64 *)kbufs[4])[i] >= nilfs->ns_nsegments) {
> > +                     ret = -EINVAL;
> > +                     kfree(kbufs[4]);
> > +                     goto out;
>
> Are you sure that you need to free buffer here and go to out? Maybe, we can
> introduce another label and to jump to kfree(kbufs[4]) at the end of method?
>
> Thanks,
> Slava.
>
> > +             }
> > +     }
> > +
> >       for (n = 0; n < 4; n++) {
> >               ret = -EINVAL;
> >               if (argv[n].v_size != argsz[n])
>

Thanks for the feedback. I have sent patch v2.

Thanks

Deepanshu Kartikey

Re: [PATCH] nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

Posted by Ryusuke Konishi 1 month, 2 weeks ago

On Wed, Apr 29, 2026 at 10:50 AM Deepanshu Kartikey wrote:
>
> On Wed, Apr 29, 2026 at 12:29 AM Viacheslav Dubeyko wrote:
> >
> > On Tue, 2026-04-28 at 09:32 +0530, Deepanshu Kartikey wrote:
> > > Syzbot reported a hung task in nilfs_transaction_begin() where multiple
> > > tasks performing chmod() on a nilfs2 mount blocked for over 143 seconds
> > > waiting to acquire ns_segctor_sem for read:
> > >
> > >   INFO: task syz.0.17:5918 blocked for more than 143 seconds.
> > >   Call Trace:
> > >    schedule+0x164/0x360
> > >    rwsem_down_read_slowpath+0x6d9/0x940
> > >    down_read+0x99/0x2e0
> > >    nilfs_transaction_begin+0x364/0x710 fs/nilfs2/segment.c:221
> > >    nilfs_setattr+0x124/0x2c0 fs/nilfs2/inode.c:921
> > >    notify_change+0xc1a/0xf40
> > >    chmod_common+0x273/0x4a0
> > >    do_fchmodat+0x12d/0x230
> > >
> > > The writer holding ns_segctor_sem was a concurrent NILFS_IOCTL_CLEAN_SEGMENTS
> > > caller, stuck inside printk while emitting per-element warnings from
> > > nilfs_sufile_updatev():
> > >
> > >    __nilfs_msg+0x373/0x450 fs/nilfs2/super.c:78
> > >    nilfs_sufile_updatev+0x21c/0x6d0 fs/nilfs2/sufile.c:186
> > >    nilfs_sufile_freev fs/nilfs2/sufile.h:93 [inline]
> > >    nilfs_free_segments fs/nilfs2/segment.c:1140 [inline]
> > >    nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1261 [inline]
> > >    nilfs_segctor_do_construct+0x1f55/0x76c0
> > >    nilfs_clean_segments+0x3bd/0xa50
> > >    nilfs_ioctl_clean_segments fs/nilfs2/ioctl.c:922 [inline]
> > >    nilfs_ioctl+0x261f/0x2780
> > >
> > > The root cause is that nilfs_ioctl_clean_segments() does not validate
> > > the user-supplied segment numbers in kbufs[4] before calling
> > > nilfs_clean_segments(), which acquires ns_segctor_sem for write.  The
> > > range check on each segnum is performed deep inside the call chain by
> > > nilfs_sufile_updatev(), which emits a nilfs_warn() per invalid entry
> > > while still under the segctor lock and the sufile mi_sem.  Under load
> > > (repeated invocations across multiple mounts saturating the global
> > > printk path), the cumulative printk latency keeps ns_segctor_sem held
> > > long enough to trip the hung_task watchdog, blocking concurrent
> > > operations such as chmod() that need ns_segctor_sem for read.
> > >
> > > Fix by validating the contents of kbufs[4] in the ioctl entry path,
> > > before any FS-wide lock is acquired.  Out-of-range segment numbers are
> > > rejected with -EINVAL synchronously, with no work performed under
> > > ns_segctor_sem.
> > >
> > > Reported-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=62f0f99d2f2bb8e3bbd7
> > > Fixes: 4f6b828837b4 ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl")
> > > Tested-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> > > Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> > > ---
> > >  fs/nilfs2/ioctl.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > >
> > > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> > > index e0a606643e87..38822dce1839 100644
> > > --- a/fs/nilfs2/ioctl.c
> > > +++ b/fs/nilfs2/ioctl.c
> > > @@ -846,6 +846,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
> > >       struct the_nilfs *nilfs;
> > >       size_t len, nsegs;
> > >       int n, ret;
> > > +     size_t i;
> >
> > What about re-using the n variable? Does it make sense to introduce new one?
> >
> > >
> > >       if (!capable(CAP_SYS_ADMIN))
> > >               return -EPERM;
> > > @@ -876,6 +877,21 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
> > >       }
> > >       nilfs = inode->i_sb->s_fs_info;
> > >
> > > +     /*
> > > +      * Validate segment numbers against the filesystem's segment count
> > > +      * before entering nilfs_clean_segments(), which acquires
> > > +      * ns_segctor_sem for write.  Catching invalid segnums here avoids
> > > +      * holding that lock while emitting per-element diagnostics under
> > > +      * the segment constructor.
> > > +      */
> > > +     for (i = 0; i < nsegs; i++) {
> > > +             if (((__u64 *)kbufs[4])[i] >= nilfs->ns_nsegments) {
> > > +                     ret = -EINVAL;
> > > +                     kfree(kbufs[4]);
> > > +                     goto out;
> >
> > Are you sure that you need to free buffer here and go to out? Maybe, we can
> > introduce another label and to jump to kfree(kbufs[4]) at the end of method?
> >
> > Thanks,
> > Slava.
> >
> > > +             }
> > > +     }
> > > +
> > >       for (n = 0; n < 4; n++) {
> > >               ret = -EINVAL;
> > >               if (argv[n].v_size != argsz[n])
> >
>
> Thanks for the feedback. I have sent patch v2.
>
> Thanks
>
> Deepanshu Kartikey

Thank you, Deepanshu, for the patch proposal.

Because nilfs->ns_nsegments can be modified by nilfs_ioctl_resize(),
we must avoid race conditions regarding this proposed fix.

Therefore, it's appropriate to insert this check within the write lock
section of nilfs->ns_segctor_sem, that is, immediately after calling
nilfs_transaction_lock() within nilfs_clean_segments().

As a coding comment, directly referencing kbufs[4] as an array is not
very readable, so it's better to declare a variable in the local
variable declaration section of nilfs_clean_segments() like this:

        size_t i, nsegs = argv[4].v_nmembs;
        __u64 *segnumv = kbufs[4];

and then compare by referencing segnumv[i].
Here, the original variable name "nsegs" is confusingly similar to
"ns_nsegments", therefore, it would be better to simply change it to
"n" or rename it to something that more concisely represents the
number of segments in the array.

Also, to help identify the error's cause, I recommend adding an error
message like the following within the pre-check loop:

        nilfs_err(sb,
                "Segment number %llu to be freed is out of range",
                (unsigned long long)segnumv[i]);

Finally, a minor comment: to avoid scattering the function's exit path
when making corrections according to the above policy, it's better to
add a label like the following to nilfs_clean_segments() and jump to
it, rather than returning separately.

out_unlock:
        ...

bail_unlock:
        nilfs_transaction_unlock(sb);
        return err;

Thanks,
Ryusuke Konishi

Re: [PATCH] nilfs2: reject CLEAN_SEGMENTS ioctl with out-of-range segment numbers

Posted by Deepanshu Kartikey 1 month, 2 weeks ago

On Wed, Apr 29, 2026 at 6:02 PM Ryusuke Konishi
<konishi.ryusuke@gmail.com> wrote:
>
> On Wed, Apr 29, 2026 at 10:50 AM Deepanshu Kartikey wrote:
> >
> > On Wed, Apr 29, 2026 at 12:29 AM Viacheslav Dubeyko wrote:
> > >
> > > On Tue, 2026-04-28 at 09:32 +0530, Deepanshu Kartikey wrote:
> > > > Syzbot reported a hung task in nilfs_transaction_begin() where multiple
> > > > tasks performing chmod() on a nilfs2 mount blocked for over 143 seconds
> > > > waiting to acquire ns_segctor_sem for read:
> > > >
> > > >   INFO: task syz.0.17:5918 blocked for more than 143 seconds.
> > > >   Call Trace:
> > > >    schedule+0x164/0x360
> > > >    rwsem_down_read_slowpath+0x6d9/0x940
> > > >    down_read+0x99/0x2e0
> > > >    nilfs_transaction_begin+0x364/0x710 fs/nilfs2/segment.c:221
> > > >    nilfs_setattr+0x124/0x2c0 fs/nilfs2/inode.c:921
> > > >    notify_change+0xc1a/0xf40
> > > >    chmod_common+0x273/0x4a0
> > > >    do_fchmodat+0x12d/0x230
> > > >
> > > > The writer holding ns_segctor_sem was a concurrent NILFS_IOCTL_CLEAN_SEGMENTS
> > > > caller, stuck inside printk while emitting per-element warnings from
> > > > nilfs_sufile_updatev():
> > > >
> > > >    __nilfs_msg+0x373/0x450 fs/nilfs2/super.c:78
> > > >    nilfs_sufile_updatev+0x21c/0x6d0 fs/nilfs2/sufile.c:186
> > > >    nilfs_sufile_freev fs/nilfs2/sufile.h:93 [inline]
> > > >    nilfs_free_segments fs/nilfs2/segment.c:1140 [inline]
> > > >    nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1261 [inline]
> > > >    nilfs_segctor_do_construct+0x1f55/0x76c0
> > > >    nilfs_clean_segments+0x3bd/0xa50
> > > >    nilfs_ioctl_clean_segments fs/nilfs2/ioctl.c:922 [inline]
> > > >    nilfs_ioctl+0x261f/0x2780
> > > >
> > > > The root cause is that nilfs_ioctl_clean_segments() does not validate
> > > > the user-supplied segment numbers in kbufs[4] before calling
> > > > nilfs_clean_segments(), which acquires ns_segctor_sem for write.  The
> > > > range check on each segnum is performed deep inside the call chain by
> > > > nilfs_sufile_updatev(), which emits a nilfs_warn() per invalid entry
> > > > while still under the segctor lock and the sufile mi_sem.  Under load
> > > > (repeated invocations across multiple mounts saturating the global
> > > > printk path), the cumulative printk latency keeps ns_segctor_sem held
> > > > long enough to trip the hung_task watchdog, blocking concurrent
> > > > operations such as chmod() that need ns_segctor_sem for read.
> > > >
> > > > Fix by validating the contents of kbufs[4] in the ioctl entry path,
> > > > before any FS-wide lock is acquired.  Out-of-range segment numbers are
> > > > rejected with -EINVAL synchronously, with no work performed under
> > > > ns_segctor_sem.
> > > >
> > > > Reported-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> > > > Closes: https://syzkaller.appspot.com/bug?extid=62f0f99d2f2bb8e3bbd7
> > > > Fixes: 4f6b828837b4 ("nilfs2: fix lock order reversal in nilfs_clean_segments ioctl")
> > > > Tested-by: syzbot+62f0f99d2f2bb8e3bbd7@syzkaller.appspotmail.com
> > > > Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> > > > ---
> > > >  fs/nilfs2/ioctl.c | 16 ++++++++++++++++
> > > >  1 file changed, 16 insertions(+)
> > > >
> > > > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> > > > index e0a606643e87..38822dce1839 100644
> > > > --- a/fs/nilfs2/ioctl.c
> > > > +++ b/fs/nilfs2/ioctl.c
> > > > @@ -846,6 +846,7 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
> > > >       struct the_nilfs *nilfs;
> > > >       size_t len, nsegs;
> > > >       int n, ret;
> > > > +     size_t i;
> > >
> > > What about re-using the n variable? Does it make sense to introduce new one?
> > >
> > > >
> > > >       if (!capable(CAP_SYS_ADMIN))
> > > >               return -EPERM;
> > > > @@ -876,6 +877,21 @@ static int nilfs_ioctl_clean_segments(struct inode *inode, struct file *filp,
> > > >       }
> > > >       nilfs = inode->i_sb->s_fs_info;
> > > >
> > > > +     /*
> > > > +      * Validate segment numbers against the filesystem's segment count
> > > > +      * before entering nilfs_clean_segments(), which acquires
> > > > +      * ns_segctor_sem for write.  Catching invalid segnums here avoids
> > > > +      * holding that lock while emitting per-element diagnostics under
> > > > +      * the segment constructor.
> > > > +      */
> > > > +     for (i = 0; i < nsegs; i++) {
> > > > +             if (((__u64 *)kbufs[4])[i] >= nilfs->ns_nsegments) {
> > > > +                     ret = -EINVAL;
> > > > +                     kfree(kbufs[4]);
> > > > +                     goto out;
> > >
> > > Are you sure that you need to free buffer here and go to out? Maybe, we can
> > > introduce another label and to jump to kfree(kbufs[4]) at the end of method?
> > >
> > > Thanks,
> > > Slava.
> > >
> > > > +             }
> > > > +     }
> > > > +
> > > >       for (n = 0; n < 4; n++) {
> > > >               ret = -EINVAL;
> > > >               if (argv[n].v_size != argsz[n])
> > >
> >
> > Thanks for the feedback. I have sent patch v2.
> >
> > Thanks
> >
> > Deepanshu Kartikey
>
> Thank you, Deepanshu, for the patch proposal.
>
> Because nilfs->ns_nsegments can be modified by nilfs_ioctl_resize(),
> we must avoid race conditions regarding this proposed fix.
>
> Therefore, it's appropriate to insert this check within the write lock
> section of nilfs->ns_segctor_sem, that is, immediately after calling
> nilfs_transaction_lock() within nilfs_clean_segments().
>
> As a coding comment, directly referencing kbufs[4] as an array is not
> very readable, so it's better to declare a variable in the local
> variable declaration section of nilfs_clean_segments() like this:
>
>         size_t i, nsegs = argv[4].v_nmembs;
>         __u64 *segnumv = kbufs[4];
>
> and then compare by referencing segnumv[i].
> Here, the original variable name "nsegs" is confusingly similar to
> "ns_nsegments", therefore, it would be better to simply change it to
> "n" or rename it to something that more concisely represents the
> number of segments in the array.
>
> Also, to help identify the error's cause, I recommend adding an error
> message like the following within the pre-check loop:
>
>         nilfs_err(sb,
>                 "Segment number %llu to be freed is out of range",
>                 (unsigned long long)segnumv[i]);
>
> Finally, a minor comment: to avoid scattering the function's exit path
> when making corrections according to the above policy, it's better to
> add a label like the following to nilfs_clean_segments() and jump to
> it, rather than returning separately.
>
> out_unlock:
>         ...
>
> bail_unlock:
>         nilfs_transaction_unlock(sb);
>         return err;
>
> Thanks,
> Ryusuke Konishi

Thanks for the detailed feedback, Ryusuke. I have sent the patch v3

Thanks


Deepanshu Kartikey