From: Yu Kuai <yukuai3@huawei.com>

iocost is initialized when it is configured for the first time, and
iocost initialization can race with del_gendisk(), which will cause a
null pointer dereference:

t1                              t2
ioc_qos_write
 blk_iocost_init
  rq_qos_add
                                del_gendisk
                                 rq_qos_exit
                                 // iocost is removed from q->rq_qos
  blkcg_activate_policy
   pd_init_fn
    ioc_pd_init
     ioc = q_to_ioc(blkg->q)
     // can't find iocost and return null

And iolatency is about to switch to the same lazy initialization.

This patchset fixes the problem by synchronizing rq_qos_add() and
blkcg_activate_policy() with rq_qos_exit().

Yu Kuai (4):
  block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
  block/rq_qos: fail rq_qos_add() after del_gendisk()
  blk-cgroup: add a new interface blkcg_conf_close_bdev()
  blk-cgroup: synchronize del_gendisk() with configuring cgroup policy

 block/blk-cgroup.c     | 12 ++++++++++--
 block/blk-cgroup.h     |  1 +
 block/blk-iocost.c     |  8 ++++----
 block/blk-rq-qos.c     | 25 ++++++++++++++++++++-----
 block/blk-rq-qos.h     | 17 +++++++++++++----
 include/linux/blkdev.h |  1 +
 6 files changed, 49 insertions(+), 15 deletions(-)

-- 
2.31.1
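For illustration, a minimal sketch of the direction the patch titles
describe, not the patches themselves: rq_qos_exit() detaches the list
under queue_lock (the exit callbacks may sleep, so they run outside it),
and rq_qos_add() fails once del_gendisk() has marked the queue. The
'rq_qos_dead' field is hypothetical; the diffstat adds one line to
include/linux/blkdev.h, presumably some such per-queue marker.

void rq_qos_exit(struct request_queue *q)
{
	struct rq_qos *rqos;

	spin_lock_irq(&q->queue_lock);
	rqos = q->rq_qos;
	q->rq_qos = NULL;
	q->rq_qos_dead = true;			/* hypothetical field */
	spin_unlock_irq(&q->queue_lock);

	while (rqos) {
		/* grab ->next first: ->exit() may free rqos */
		struct rq_qos *next = rqos->next;

		rqos->ops->exit(rqos);
		rqos = next;
	}
}

int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
	spin_lock_irq(&q->queue_lock);
	if (q->rq_qos_dead) {		/* del_gendisk() already ran */
		spin_unlock_irq(&q->queue_lock);
		return -ENODEV;
	}
	rqos->next = q->rq_qos;
	q->rq_qos = rqos;
	spin_unlock_irq(&q->queue_lock);
	return 0;
}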
Hello,

On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> iocost is initialized when it is configured for the first time, and
> iocost initialization can race with del_gendisk(), which will cause a
> null pointer dereference:
>
> t1                              t2
> ioc_qos_write
>  blk_iocost_init
>   rq_qos_add
>                                 del_gendisk
>                                  rq_qos_exit
>                                  // iocost is removed from q->rq_qos
>   blkcg_activate_policy
>    pd_init_fn
>     ioc_pd_init
>      ioc = q_to_ioc(blkg->q)
>      // can't find iocost and return null
>
> And iolatency is about to switch to the same lazy initialization.
>
> This patchset fixes the problem by synchronizing rq_qos_add() and
> blkcg_activate_policy() with rq_qos_exit().

So, the patchset seems a bit overly complicated to me. What do you think
about the following?

* These init/exit paths are super cold paths; just protecting them with a
  global mutex is probably enough. If we encounter a scalability problem,
  it's easy to fix down the line.

* If we're synchronizing this with a mutex anyway, there is no need to
  grab queue_lock, right? rq_qos_add/del/exit() can all just hold the
  mutex.

* And we can keep the state tracking within rq_qos. When rq_qos_exit() is
  called, mark it so that future adds will fail - be that a special
  ->next value, a queue flag, or whatever.

Thanks.

-- 
tejun
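A minimal sketch of the scheme Tejun outlines, with hypothetical names
(the global mutex and the queue flag are both made up here): the cold
add/exit paths serialize on one mutex, and rq_qos_exit() sets a flag so
later adds fail. A flag is used rather than a sentinel pointer because
hot-path readers walk q->rq_qos locklessly, and a poisoned pointer would
break that walk.

static DEFINE_MUTEX(rq_qos_mutex);	/* hypothetical global lock */

int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
	int ret = 0;

	mutex_lock(&rq_qos_mutex);
	/* QUEUE_FLAG_QOS_EXITED is a hypothetical new queue flag */
	if (test_bit(QUEUE_FLAG_QOS_EXITED, &q->queue_flags)) {
		ret = -ENODEV;
	} else {
		rqos->next = q->rq_qos;
		q->rq_qos = rqos;
	}
	mutex_unlock(&rq_qos_mutex);
	return ret;
}

void rq_qos_exit(struct request_queue *q)
{
	mutex_lock(&rq_qos_mutex);
	blk_queue_flag_set(QUEUE_FLAG_QOS_EXITED, q);
	while (q->rq_qos) {
		struct rq_qos *rqos = q->rq_qos;

		q->rq_qos = rqos->next;
		/* sleeping is fine under a mutex, unlike queue_lock */
		rqos->ops->exit(rqos);
	}
	mutex_unlock(&rq_qos_mutex);
}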
Hi,

On 2022/12/20 4:55, Tejun Heo wrote:
> Hello,
>
> On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> iocost is initialized when it is configured for the first time, and
>> iocost initialization can race with del_gendisk(), which will cause a
>> null pointer dereference:
>>
>> t1                              t2
>> ioc_qos_write
>>  blk_iocost_init
>>   rq_qos_add
>>                                 del_gendisk
>>                                  rq_qos_exit
>>                                  // iocost is removed from q->rq_qos
>>   blkcg_activate_policy
>>    pd_init_fn
>>     ioc_pd_init
>>      ioc = q_to_ioc(blkg->q)
>>      // can't find iocost and return null
>>
>> And iolatency is about to switch to the same lazy initialization.
>>
>> This patchset fixes the problem by synchronizing rq_qos_add() and
>> blkcg_activate_policy() with rq_qos_exit().
>
> So, the patchset seems a bit overly complicated to me. What do you think
> about the following?
>
> * These init/exit paths are super cold paths; just protecting them with a
>   global mutex is probably enough. If we encounter a scalability problem,
>   it's easy to fix down the line.
>
> * If we're synchronizing this with a mutex anyway, there is no need to
>   grab queue_lock, right? rq_qos_add/del/exit() can all just hold the
>   mutex.
>
> * And we can keep the state tracking within rq_qos. When rq_qos_exit() is
>   called, mark it so that future adds will fail - be that a special
>   ->next value, a queue flag, or whatever.

Yes, that sounds good. BTW, queue_lock is also used to protect
pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
problematic:

blkcg_activate_policy
 spin_lock_irq(&q->queue_lock);
 list_for_each_entry_reverse(blkg, &q->blkg_list)
  pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed

 spin_unlock_irq(&q->queue_lock);
 // Releasing queue_lock here is problematic: this can lead to
 // pd_offline_fn() being called without pd_init_fn().

 pd_alloc_fn(__GFP_NOWARN, ...)

If we are using a mutex to protect rq_qos ops, it seems the right thing
to do is to also use the mutex to protect blkcg_policy ops, and this
problem can be fixed because the mutex can be held to allocate memory
with GFP_KERNEL. What do you think?

Thanks,
Kuai

>
> Thanks.
>
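For concreteness, a sketch of what holding a mutex would buy here,
assuming a hypothetical per-queue q->blkcg_mutex that the blkg destroy
path also takes: the allocation can sleep with GFP_KERNEL, so the walk
never drops the lock between pd_alloc_fn() and pd_init_fn(), and
pd_offline_fn() can no longer see an uninitialized pd. Error unwinding
is omitted.

int blkcg_activate_policy(struct request_queue *q,
			  const struct blkcg_policy *pol)
{
	struct blkcg_gq *blkg;
	int ret = 0;

	if (blkcg_policy_enabled(q, pol))
		return 0;

	mutex_lock(&q->blkcg_mutex);		/* hypothetical lock */
	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
		struct blkg_policy_data *pd;

		/* can sleep now: no spinlock is held across the walk */
		pd = pol->pd_alloc_fn(GFP_KERNEL, q, blkg->blkcg);
		if (!pd) {
			ret = -ENOMEM;	/* real code must free prior pds */
			goto out;
		}
		blkg->pd[pol->plid] = pd;
		pd->blkg = blkg;
		pd->plid = pol->plid;
		if (pol->pd_init_fn)
			pol->pd_init_fn(pd);
	}
	__set_bit(pol->plid, q->blkcg_pols);
out:
	mutex_unlock(&q->blkcg_mutex);
	return ret;
}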
On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> If we are using a mutex to protect rq_qos ops, it seems the right thing
> to do is to also use the mutex to protect blkcg_policy ops, and this
> problem can be fixed because the mutex can be held to allocate memory
> with GFP_KERNEL. What do you think?

Getting rid of the atomic allocations would be awesome. FYI, I'm also in
favor of everything that moves things out of queue_lock into more
dedicated locks. queue_lock is such an undocumented mess of untargeted
things that aren't related to each other that splitting and removing it
is becoming more and more important.
Hello,

On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> Yes, that sounds good. BTW, queue_lock is also used to protect
> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
> problematic:
>
> blkcg_activate_policy
>  spin_lock_irq(&q->queue_lock);
>  list_for_each_entry_reverse(blkg, &q->blkg_list)
>   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed
>
>  spin_unlock_irq(&q->queue_lock);
>  // Releasing queue_lock here is problematic: this can lead to
>  // pd_offline_fn() being called without pd_init_fn().
>
>  pd_alloc_fn(__GFP_NOWARN, ...)

So, if a blkg is destroyed while a policy is being activated, right?

> If we are using a mutex to protect rq_qos ops, it seems the right thing
> to do is to also use the mutex to protect blkcg_policy ops, and this
> problem can be fixed because the mutex can be held to allocate memory
> with GFP_KERNEL. What do you think?

One worry is that switching to a mutex can be more of a headache due to
destroy path synchronization. Another approach would be using a per-blkg
flag to track whether a blkg has been initialized.

Thanks.

-- 
tejun
Hi,

On 2022/12/21 0:01, Tejun Heo wrote:
> Hello,
>
> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>> Yes, that sounds good. BTW, queue_lock is also used to protect
>> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
>> problematic:
>>
>> blkcg_activate_policy
>>  spin_lock_irq(&q->queue_lock);
>>  list_for_each_entry_reverse(blkg, &q->blkg_list)
>>   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed
>>
>>  spin_unlock_irq(&q->queue_lock);
>>  // Releasing queue_lock here is problematic: this can lead to
>>  // pd_offline_fn() being called without pd_init_fn().
>>
>>  pd_alloc_fn(__GFP_NOWARN, ...)
>
> So, if a blkg is destroyed while a policy is being activated, right?

Yes, removing a cgroup can race with this; for bfq, a null pointer
dereference will be triggered in bfq_pd_offline().

>
>> If we are using a mutex to protect rq_qos ops, it seems the right thing
>> to do is to also use the mutex to protect blkcg_policy ops, and this
>> problem can be fixed because the mutex can be held to allocate memory
>> with GFP_KERNEL. What do you think?
>
> One worry is that switching to a mutex can be more of a headache due to
> destroy path synchronization. Another approach would be using a per-blkg
> flag to track whether a blkg has been initialized.

I think perhaps you mean a per-blkg_policy_data flag? A per-blkg flag
would not work in this case.

Thanks,
Kuai

>
> Thanks.
>
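A sketch of the tracking idea at the level Yu Kuai suggests, with a
hypothetical 'initialized' field in blkg_policy_data and a hypothetical
helper name: the offline callback is simply skipped for pds whose
pd_init_fn() never ran.

struct blkg_policy_data {
	struct blkcg_gq		*blkg;
	int			plid;
	bool			initialized;	/* hypothetical field */
};

/* set pd->initialized right after pd_init_fn() runs ... */

static void blkg_pd_offline_one(struct blkcg_gq *blkg,
				const struct blkcg_policy *pol)
{
	struct blkg_policy_data *pd = blkg->pd[pol->plid];

	/* ... and only offline pds that were actually initialized */
	if (pd && pd->initialized && pol->pd_offline_fn)
		pol->pd_offline_fn(pd);
}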
Hi,

On 2022/12/21 9:10, Yu Kuai wrote:
> Hi,
>
> On 2022/12/21 0:01, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>>> Yes, that sounds good. BTW, queue_lock is also used to protect
>>> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
>>> problematic:
>>>
>>> blkcg_activate_policy
>>>  spin_lock_irq(&q->queue_lock);
>>>  list_for_each_entry_reverse(blkg, &q->blkg_list)
>>>   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed
>>>
>>>  spin_unlock_irq(&q->queue_lock);
>>>  // Releasing queue_lock here is problematic: this can lead to
>>>  // pd_offline_fn() being called without pd_init_fn().
>>>
>>>  pd_alloc_fn(__GFP_NOWARN, ...)
>>
>> So, if a blkg is destroyed while a policy is being activated, right?
> Yes, removing a cgroup can race with this; for bfq, a null pointer
> dereference will be triggered in bfq_pd_offline().

BTW, we just found that pd_online_fn() is missing in
blkcg_activate_policy()... Currently this is not a problem because only
blk-throttle implements it, and blk-throttle is activated while the blkg
is created.

Thanks,
Kuai

>
>>> If we are using a mutex to protect rq_qos ops, it seems the right thing
>>> to do is to also use the mutex to protect blkcg_policy ops, and this
>>> problem can be fixed because the mutex can be held to allocate memory
>>> with GFP_KERNEL. What do you think?
>>
>> One worry is that switching to a mutex can be more of a headache due to
>> destroy path synchronization. Another approach would be using a per-blkg
>> flag to track whether a blkg has been initialized.
> I think perhaps you mean a per-blkg_policy_data flag? A per-blkg flag
> would not work in this case.
>
> Thanks,
> Kuai
>>
>> Thanks.
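A sketch of where the missing call would presumably go, mirroring what
blkg creation does when a policy is already enabled (the helper name is
hypothetical):

static void blkcg_pd_init_and_online(const struct blkcg_policy *pol,
				     struct blkg_policy_data *pd)
{
	if (pol->pd_init_fn)
		pol->pd_init_fn(pd);
	/* blkcg_activate_policy() currently never makes this call */
	if (pol->pd_online_fn)
		pol->pd_online_fn(pd);
}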