[PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()

Yu Kuai posted 2 patches 2 years, 7 months ago
There is a newer version of this series
drivers/md/raid5-cache.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
Posted by Yu Kuai 2 years, 7 months ago
From: Yu Kuai <yukuai3@huawei.com>

Changes in v2:
 - remove a now unused local variable in patch 2;

Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
work") introduced a new problem:

// caller holds reconfig_mutex
r5l_exit_log
 flush_work(&log->disable_writeback_work)
			r5c_disable_writeback_async
			 wait_event
			  /*
			   * conf->log is not NULL, and mddev_trylock()
			   * will fail, wait_event() can never pass.
			   */
 conf->log = NULL
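
The trace above can be modelled in user space. This is only a minimal
sketch: model_conf, model_trylock() and model_wait_condition() are
hypothetical stand-ins, not kernel code, and the predicate is reduced to
the two terms named in the comment (conf->log being NULL, or
mddev_trylock() succeeding).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; not the kernel's struct r5conf. */
struct model_conf {
	void *log;            /* models conf->log */
	bool reconfig_locked; /* models reconfig_mutex being held */
};

/* Models mddev_trylock(): succeeds only if nobody holds the lock. */
static bool model_trylock(struct model_conf *conf)
{
	if (conf->reconfig_locked)
		return false;
	conf->reconfig_locked = true;
	return true;
}

/*
 * Models the wait_event() predicate evaluated by the work item
 * (simplified): the waiter may proceed once the log pointer is gone
 * or once it manages to take the lock itself.
 */
static bool model_wait_condition(struct model_conf *conf)
{
	return conf->log == NULL || model_trylock(conf);
}
```

While the caller holds the lock and has not yet cleared the log pointer,
model_wait_condition() stays false. In the trace, the caller is blocked in
flush_work() at exactly that point, so neither term can ever become true.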

Patch 1 reverts that commit, and patch 2 fixes the original problem in a
different way.

Note that this problem was found only by code review, and I think it is
probably the reason that some mdadm tests are broken.

Yu Kuai (2):
  md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
    finishing work"
  md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()

 drivers/md/raid5-cache.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

-- 
2.39.2
Re: [PATCH -next v2 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()
Posted by Yu Kuai 2 years, 7 months ago
Hi,

On 2023/06/28 9:07, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v2:
>   - remove a now unused local variable in patch 2;
> 
> Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
> work") introduced a new problem:
> 
> // caller holds reconfig_mutex
> r5l_exit_log
>   flush_work(&log->disable_writeback_work)
> 			r5c_disable_writeback_async
> 			 wait_event
> 			  /*
> 			   * conf->log is not NULL, and mddev_trylock()
> 			   * will fail, wait_event() can never pass.
> 			   */
>   conf->log = NULL
> 
> Patch 1 reverts that commit, and patch 2 fixes the original problem in a
> different way.
> 
> Note that this problem was found only by code review, and I think it is
> probably the reason that some mdadm tests are broken.

Any suggestions?

By the way, while taking another look at this problem, I think reads and
writes of 'conf->log' should probably use READ_ONCE() and WRITE_ONCE().
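
For illustration, a user-space sketch of that access pattern could look
like the following. The MODEL_* macros are simplified approximations built
on volatile casts (they rely on the GCC/Clang __typeof__ extension) and are
not the kernel definitions; once_log, once_conf and the helper functions
are hypothetical names.

```c
#include <stddef.h>

/* Simplified approximations of the kernel macros, for illustration only. */
#define MODEL_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-ins for the log and the conf that points to it. */
struct once_log {
	int journal_mode;
};

struct once_conf {
	struct once_log *log;
};

/* Reader: load the pointer exactly once, then use only the local copy. */
static int once_read_mode(struct once_conf *conf, int fallback)
{
	struct once_log *log = MODEL_READ_ONCE(conf->log);

	if (!log)
		return fallback;
	return log->journal_mode;
}

/* Writer: publish the cleared pointer with a single store. */
static void once_clear_log(struct once_conf *conf)
{
	MODEL_WRITE_ONCE(conf->log, NULL);
}
```

This only keeps the compiler from reloading or tearing the pointer access.
It does not by itself guarantee that the object stays valid after the
pointer has been read.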

Thanks,
Kuai
> 
> Yu Kuai (2):
>    md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
>      finishing work"
>    md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
> 
>   drivers/md/raid5-cache.c | 25 ++++++++++---------------
>   1 file changed, 10 insertions(+), 15 deletions(-)
>