[PATCH] md: fix array_state=clear sysfs deadlock

Posted by Yu Kuai 2 days, 22 hours ago
From: Yu Kuai <yukuai3@huawei.com>

When "clear" is written to array_state, md_attr_store() breaks sysfs
active protection so the array can delete itself from its own sysfs
store method.

However, md_attr_store() currently drops the mddev reference before
calling sysfs_unbreak_active_protection(). Once do_md_stop(..., 0)
has made the mddev eligible for delayed deletion, the temporary
kobject reference taken by sysfs_break_active_protection() can become
the last kobject reference protecting the md kobject.

That allows sysfs_unbreak_active_protection() to drop the last
kobject reference from the current sysfs writer context. kobject
teardown then recurses into kernfs removal while the current sysfs
node is still being unwound, and lockdep reports recursive locking on
kn->active with kernfs_drain() in the call chain.

Reproducer on an existing level:
1. Create an md0 linear array and activate it:
   mknod /dev/md0 b 9 0
   echo none > /sys/block/md0/md/metadata_version
   echo linear > /sys/block/md0/md/level
   echo 1 > /sys/block/md0/md/raid_disks
   echo "$(cat /sys/class/block/sdb/dev)" > /sys/block/md0/md/new_dev
   echo "$(($(cat /sys/class/block/sdb/size) / 2))" > \
	/sys/block/md0/md/dev-sdb/size
   echo 0 > /sys/block/md0/md/dev-sdb/slot
   echo active > /sys/block/md0/md/array_state
2. Wait briefly for the array to settle, then clear it:
   sleep 2
   echo clear > /sys/block/md0/md/array_state

The warning looks like:

  WARNING: possible recursive locking detected
  bash/588 is trying to acquire lock:
  (kn->active#65) at __kernfs_remove+0x157/0x1d0
  but task is already holding lock:
  (kn->active#65) at sysfs_unbreak_active_protection+0x1f/0x40
  ...
  Call Trace:
   kernfs_drain
   __kernfs_remove
   kernfs_remove_by_name_ns
   sysfs_remove_group
   sysfs_remove_groups
   __kobject_del
   kobject_put
   md_attr_store
   kernfs_fop_write_iter
   vfs_write
   ksys_write

Restore active protection before mddev_put() so the extra sysfs
kobject reference is dropped while the mddev is still held alive. The
actual md kobject deletion is then deferred until after the sysfs
write path has fully returned.

Fixes: 9e59d609763f ("md: call del_gendisk in control path")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 521d9b34cd9e..02efe9700256 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6130,10 +6130,16 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	}
 	spin_unlock(&all_mddevs_lock);
 	rv = entry->store(mddev, page, length);
-	mddev_put(mddev);
 
+	/*
+	 * For "array_state=clear", dropping the extra kobject reference from
+	 * sysfs_break_active_protection() can trigger md kobject deletion.
+	 * Restore active protection before mddev_put() so deletion happens
+	 * after the sysfs write path fully unwinds.
+	 */
 	if (kn)
 		sysfs_unbreak_active_protection(kn);
+	mddev_put(mddev);
 
 	return rv;
 }
-- 
2.51.0
Re: [PATCH] md: fix array_state=clear sysfs deadlock
Posted by Li Nan 2 hours ago

On 2026/3/30 13:52, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> When "clear" is written to array_state, md_attr_store() breaks sysfs
> active protection so the array can delete itself from its own sysfs
> store method.
> 
> However, md_attr_store() currently drops the mddev reference before
> calling sysfs_unbreak_active_protection(). Once do_md_stop(..., 0)
> has made the mddev eligible for delayed deletion, the temporary
> kobject reference taken by sysfs_break_active_protection() can become
> the last kobject reference protecting the md kobject.
> 
> That allows sysfs_unbreak_active_protection() to drop the last
> kobject reference from the current sysfs writer context. kobject
> teardown then recurses into kernfs removal while the current sysfs
> node is still being unwound, and lockdep reports recursive locking on
> kn->active with kernfs_drain() in the call chain.
> 
> Reproducer on an existing level:
> 1. Create an md0 linear array and activate it:
>     mknod /dev/md0 b 9 0
>     echo none > /sys/block/md0/md/metadata_version
>     echo linear > /sys/block/md0/md/level
>     echo 1 > /sys/block/md0/md/raid_disks
>     echo "$(cat /sys/class/block/sdb/dev)" > /sys/block/md0/md/new_dev
>     echo "$(($(cat /sys/class/block/sdb/size) / 2))" > \
> 	/sys/block/md0/md/dev-sdb/size
>     echo 0 > /sys/block/md0/md/dev-sdb/slot
>     echo active > /sys/block/md0/md/array_state
> 2. Wait briefly for the array to settle, then clear it:
>     sleep 2
>     echo clear > /sys/block/md0/md/array_state
> 
> The warning looks like:
> 
>    WARNING: possible recursive locking detected
>    bash/588 is trying to acquire lock:
>    (kn->active#65) at __kernfs_remove+0x157/0x1d0
>    but task is already holding lock:
>    (kn->active#65) at sysfs_unbreak_active_protection+0x1f/0x40
>    ...
>    Call Trace:
>     kernfs_drain
>     __kernfs_remove
>     kernfs_remove_by_name_ns
>     sysfs_remove_group
>     sysfs_remove_groups
>     __kobject_del
>     kobject_put
>     md_attr_store
>     kernfs_fop_write_iter
>     vfs_write
>     ksys_write
> 
> Restore active protection before mddev_put() so the extra sysfs
> kobject reference is dropped while the mddev is still held alive. The
> actual md kobject deletion is then deferred until after the sysfs
> write path has fully returned.
> 
> Fixes: 9e59d609763f ("md: call del_gendisk in control path")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/md.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 521d9b34cd9e..02efe9700256 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6130,10 +6130,16 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
>   	}
>   	spin_unlock(&all_mddevs_lock);
>   	rv = entry->store(mddev, page, length);
> -	mddev_put(mddev);
>   
> +	/*
> +	 * For "array_state=clear", dropping the extra kobject reference from
> +	 * sysfs_break_active_protection() can trigger md kobject deletion.
> +	 * Restore active protection before mddev_put() so deletion happens
> +	 * after the sysfs write path fully unwinds.
> +	 */
>   	if (kn)
>   		sysfs_unbreak_active_protection(kn);
> +	mddev_put(mddev);
>   
>   	return rv;
>   }

LGTM

-- 
Thanks,
Nan

Re: [PATCH] md: fix array_state=clear sysfs deadlock
Posted by Xiao Ni 2 days, 12 hours ago
On Mon, Mar 30, 2026 at 1:55 PM Yu Kuai <yukuai@fnnas.com> wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> When "clear" is written to array_state, md_attr_store() breaks sysfs
> active protection so the array can delete itself from its own sysfs
> store method.
>
> However, md_attr_store() currently drops the mddev reference before
> calling sysfs_unbreak_active_protection(). Once do_md_stop(..., 0)
> has made the mddev eligible for delayed deletion, the temporary
> kobject reference taken by sysfs_break_active_protection() can become
> the last kobject reference protecting the md kobject.
>
> That allows sysfs_unbreak_active_protection() to drop the last
> kobject reference from the current sysfs writer context. kobject
> teardown then recurses into kernfs removal while the current sysfs
> node is still being unwound, and lockdep reports recursive locking on
> kn->active with kernfs_drain() in the call chain.
>
> Reproducer on an existing level:
> 1. Create an md0 linear array and activate it:
>    mknod /dev/md0 b 9 0
>    echo none > /sys/block/md0/md/metadata_version
>    echo linear > /sys/block/md0/md/level
>    echo 1 > /sys/block/md0/md/raid_disks
>    echo "$(cat /sys/class/block/sdb/dev)" > /sys/block/md0/md/new_dev
>    echo "$(($(cat /sys/class/block/sdb/size) / 2))" > \
>         /sys/block/md0/md/dev-sdb/size
>    echo 0 > /sys/block/md0/md/dev-sdb/slot
>    echo active > /sys/block/md0/md/array_state
> 2. Wait briefly for the array to settle, then clear it:
>    sleep 2
>    echo clear > /sys/block/md0/md/array_state
>
> The warning looks like:
>
>   WARNING: possible recursive locking detected
>   bash/588 is trying to acquire lock:
>   (kn->active#65) at __kernfs_remove+0x157/0x1d0
>   but task is already holding lock:
>   (kn->active#65) at sysfs_unbreak_active_protection+0x1f/0x40
>   ...
>   Call Trace:
>    kernfs_drain
>    __kernfs_remove
>    kernfs_remove_by_name_ns
>    sysfs_remove_group
>    sysfs_remove_groups
>    __kobject_del
>    kobject_put
>    md_attr_store
>    kernfs_fop_write_iter
>    vfs_write
>    ksys_write
>
> Restore active protection before mddev_put() so the extra sysfs
> kobject reference is dropped while the mddev is still held alive. The
> actual md kobject deletion is then deferred until after the sysfs
> write path has fully returned.
>
> Fixes: 9e59d609763f ("md: call del_gendisk in control path")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/md/md.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 521d9b34cd9e..02efe9700256 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6130,10 +6130,16 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
>         }
>         spin_unlock(&all_mddevs_lock);
>         rv = entry->store(mddev, page, length);
> -       mddev_put(mddev);
>
> +       /*
> +        * For "array_state=clear", dropping the extra kobject reference from
> +        * sysfs_break_active_protection() can trigger md kobject deletion.
> +        * Restore active protection before mddev_put() so deletion happens
> +        * after the sysfs write path fully unwinds.
> +        */
>         if (kn)
>                 sysfs_unbreak_active_protection(kn);
> +       mddev_put(mddev);
>
>         return rv;
>  }
> --
> 2.51.0
>
>

This patch looks good to me.
Reviewed-by: Xiao Ni <xni@redhat.com>