[PATCH 00/10] blk-mq: fix blk_mq_tags double free while nr_requests grown

Yu Kuai posted 10 patches 1 month, 2 weeks ago
[PATCH 00/10] blk-mq: fix blk_mq_tags double free while nr_requests grown
Posted by Yu Kuai 1 month, 2 weeks ago
From: Yu Kuai <yukuai3@huawei.com>

If the user triggers a tag grow through the queue sysfs attribute nr_requests,
hctx->sched_tags is freed directly and replaced with newly allocated
tags; see blk_mq_tag_update_depth().

The problem is that hctx->sched_tags comes from elevator->et->tags, while
et->tags still points to the freed tags. A later elevator exit will
therefore try to free the tags again, causing a kernel panic.
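
The aliasing described above can be sketched in user space. This is only a
minimal model, not kernel code: tags_model, et_model, hctx_model and every
function below are hypothetical names made up for illustration, and
model_free() counts frees instead of releasing memory so that the double
free becomes observable rather than crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct tags_model {
	unsigned int depth;
	int free_count;		/* how many times this object was "freed" */
};

struct et_model   { struct tags_model *tags; };		/* models elevator->et->tags */
struct hctx_model { struct tags_model *sched_tags; };	/* models hctx->sched_tags */

/* Record a free instead of releasing memory, so a double free is observable. */
static void model_free(struct tags_model *t)
{
	assert(t != NULL);
	t->free_count++;
}

/* Buggy grow: frees the old tags but updates only the hctx pointer. */
static void grow_buggy(struct hctx_model *h, struct tags_model *new_tags)
{
	model_free(h->sched_tags);
	h->sched_tags = new_tags;
}

/* Grow that keeps the owner in sync: both pointers move to the new tags. */
static void grow_synced(struct et_model *et, struct hctx_model *h,
			struct tags_model *new_tags)
{
	model_free(et->tags);
	et->tags = new_tags;
	h->sched_tags = new_tags;
}

/* Elevator exit frees whatever the owner still points to. */
static void elevator_exit_model(struct et_model *et)
{
	model_free(et->tags);
}
```

With grow_buggy() followed by elevator_exit_model(), the old tags object is
freed twice and the new one is never freed. With grow_synced(), each object
is freed exactly once.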

Patches 1-6 are prep cleanup and refactoring patches for updating nr_requests.
Patches 7 and 8 are the fixes for the regression.
Patch 9 is a cleanup after patch 8.
Patch 10 fixes the stale nr_requests documentation.

Yu Kuai (10):
  blk-mq: remove useless checking from queue_requests_store()
  blk-mq: remove useless checkings from blk_mq_update_nr_requests()
  blk-mq: check invalid nr_requests in queue_requests_store()
  blk-mq: serialize updating nr_requests with update_nr_hwq_lock
  blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()
  blk-mq: split bitmap grow and resize case in
    blk_mq_update_nr_requests()
  blk-mq-sched: add new parameter nr_requests in
    blk_mq_alloc_sched_tags()
  blk-mq: fix blk_mq_tags double free while nr_requests grown
  blk-mq: remove blk_mq_tag_update_depth()
  blk-mq: fix stale nr_requests documentation

 Documentation/ABI/stable/sysfs-block | 14 ++-----
 block/blk-mq-sched.c                 | 14 +++----
 block/blk-mq-sched.h                 |  2 +-
 block/blk-mq-tag.c                   | 52 -----------------------
 block/blk-mq.c                       | 62 +++++++++++-----------------
 block/blk-mq.h                       | 17 ++++++--
 block/blk-sysfs.c                    | 44 +++++++++++++++-----
 block/elevator.c                     |  3 +-
 8 files changed, 84 insertions(+), 124 deletions(-)

-- 
2.39.2
Re: [PATCH 00/10] blk-mq: fix blk_mq_tags double free while nr_requests grown
Posted by Ming Lei 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 04:02:06PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> If the user triggers a tag grow through the queue sysfs attribute nr_requests,
> hctx->sched_tags is freed directly and replaced with newly allocated
> tags; see blk_mq_tag_update_depth().
> 
> The problem is that hctx->sched_tags comes from elevator->et->tags, while
> et->tags still points to the freed tags. A later elevator exit will
> therefore try to free the tags again, causing a kernel panic.
> 
> Patches 1-6 are prep cleanup and refactoring patches for updating nr_requests.
> Patches 7 and 8 are the fixes for the regression.
> Patch 9 is a cleanup after patch 8.
> Patch 10 fixes the stale nr_requests documentation.

Please do not mix bug (regression) fixes with cleanups.

The bug fix for updating nr_requests should be simple enough to fit in one
or two patches; why do you need 10 patches to deal with the regression?

Not to mention that this approach is really unfriendly to stable tree backports.


Thanks,
Ming
Re: [PATCH 00/10] blk-mq: fix blk_mq_tags double free while nr_requests grown
Posted by Yu Kuai 1 month, 2 weeks ago
Hi,

On 2025/08/15 16:30, Ming Lei wrote:
> On Fri, Aug 15, 2025 at 04:02:06PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> If the user triggers a tag grow through the queue sysfs attribute nr_requests,
>> hctx->sched_tags is freed directly and replaced with newly allocated
>> tags; see blk_mq_tag_update_depth().
>>
>> The problem is that hctx->sched_tags comes from elevator->et->tags, while
>> et->tags still points to the freed tags. A later elevator exit will
>> therefore try to free the tags again, causing a kernel panic.
>>
>> Patches 1-6 are prep cleanup and refactoring patches for updating nr_requests.
>> Patches 7 and 8 are the fixes for the regression.
>> Patch 9 is a cleanup after patch 8.
>> Patch 10 fixes the stale nr_requests documentation.
> 
> Please do not mix bug (regression) fixes with cleanups.
> 
> The bug fix for updating nr_requests should be simple enough to fit in one
> or two patches; why do you need 10 patches to deal with the regression?

OK, in short, my solution is:

- serialize elevator switching with nr_requests updates
- detect the case where nr_requests will grow, and allocate elevator_tags
  before freezing the queue
- for the grow case, switch to the new elevator_tags
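
The ordering of the three steps above can be sketched as a small user-space
model. This is an illustration under stated assumptions, not the patch
itself: queue_model, tags_model and update_nr_requests_model() are
hypothetical names, the lock and the queue freeze are modelled as plain
flags, and any increase of nr_requests is treated as a grow, which is a
simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct tags_model { unsigned int depth; };

struct queue_model {
	unsigned int nr_requests;
	struct tags_model *et_tags;	/* models elevator->et->tags */
	struct tags_model *sched_tags;	/* models hctx->sched_tags */
	bool update_locked;		/* models the lock shared with elevator switching */
	bool frozen;			/* models a frozen queue */
	bool alloc_while_frozen;	/* set if an allocation happens while frozen */
};

static struct tags_model *tags_model_alloc(struct queue_model *q,
					   unsigned int depth)
{
	struct tags_model *t = malloc(sizeof(*t));

	if (q->frozen)
		q->alloc_while_frozen = true;
	if (t)
		t->depth = depth;
	return t;
}

static int update_nr_requests_model(struct queue_model *q, unsigned int nr)
{
	struct tags_model *new_tags = NULL;

	/* Step 1: serialize against elevator switching (modelled as a flag). */
	if (q->update_locked)
		return -EBUSY;
	q->update_locked = true;

	/* Step 2: for the grow case, allocate before freezing the queue. */
	if (nr > q->nr_requests) {
		new_tags = tags_model_alloc(q, nr);
		if (!new_tags) {
			q->update_locked = false;
			return -ENOMEM;
		}
	}

	q->frozen = true;
	if (new_tags) {
		/* Step 3: switch the owner and the hctx to the new tags together. */
		free(q->et_tags);
		q->et_tags = new_tags;
		q->sched_tags = new_tags;
	} else {
		/* No grow: resize the existing tags in place. */
		q->sched_tags->depth = nr;
	}
	q->nr_requests = nr;
	assert(q->et_tags == q->sched_tags);
	q->frozen = false;
	q->update_locked = false;
	return 0;
}
```

Because the allocation happens before the freeze, a failed allocation returns
-ENOMEM without ever freezing the model queue, and the old tags are freed
exactly once, by the same code that updates both pointers.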

I did try, and I can't find an easy way to fix this without making the
related code awkward. Perhaps that is because I did the cleanups and
refactoring first and now can't think outside the box...

> 
> Not to mention that this approach is really unfriendly to stable tree backports.

I checked: the last time the code related to queue_requests_store() was
changed was commit 3efe7571c3ae ("block: protect nr_requests update using
q->elevator_lock"), and I believe this is what the fixed patch relied
on, so I think the backport will not have many conflicts.

Whichever stable branch f5a6604f7a44 ("block: fix lockdep warning
caused by lock dependency in elv_iosched_store") is backported to, I can
make sure a proper fix is backported as well.

Thanks,
Kuai

> 
> 
> Thanks,
> Ming
> 
> 
> .
> 

Re: [PATCH 00/10] blk-mq: fix blk_mq_tags double free while nr_requests grown
Posted by Ming Lei 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 05:05:34PM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2025/08/15 16:30, Ming Lei wrote:
> > On Fri, Aug 15, 2025 at 04:02:06PM +0800, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@huawei.com>
> > > 
> > > If the user triggers a tag grow through the queue sysfs attribute nr_requests,
> > > hctx->sched_tags is freed directly and replaced with newly allocated
> > > tags; see blk_mq_tag_update_depth().
> > > 
> > > The problem is that hctx->sched_tags comes from elevator->et->tags, while
> > > et->tags still points to the freed tags. A later elevator exit will
> > > therefore try to free the tags again, causing a kernel panic.
> > > 
> > > Patches 1-6 are prep cleanup and refactoring patches for updating nr_requests.
> > > Patches 7 and 8 are the fixes for the regression.
> > > Patch 9 is a cleanup after patch 8.
> > > Patch 10 fixes the stale nr_requests documentation.
> > 
> > Please do not mix bug (regression) fixes with cleanups.
> > 
> > The bug fix for updating nr_requests should be simple enough to fit in one
> > or two patches; why do you need 10 patches to deal with the regression?
> 
> OK, in short, my solution is:
> 
> - serialize elevator switching with nr_requests updates
> - detect the case where nr_requests will grow, and allocate elevator_tags
>   before freezing the queue
> - for the grow case, switch to the new elevator_tags

I'd suggest making one or two commits first to fix the recent regression
f5a6604f7a44 ("block: fix lockdep warning caused by lock dependency in elv_iosched_store"),
because a double free is a serious issue, and the fix should
go into v6.17.

For other long-term or less serious issues, it may be fine to delay to v6.18
if the patchset is too big or complicated, which might introduce new regressions.


Thanks,
Ming