This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
but woken_list_node still being hashed. This would happen when
bfq_init_rq() expects a brand new allocated queue to be returned from
bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
without resetting woken_list_node. Since we can always return oom_bfqq
when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
We must either reset woken_list_node, or avoid setting woken_list at all
for oom_bfqq - opt to do the former.
Crashes would have a stacktrace like:
[160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142] bfq_add_request+0x6bc/0x980
[160595.666602] bfq_insert_request+0x8ec/0x1240
[160595.671762] bfq_insert_requests+0x58/0x9c
[160595.676420] blk_mq_sched_insert_request+0x11c/0x198
[160595.682107] blk_mq_submit_bio+0x270/0x62c
[160595.686759] __submit_bio_noacct_mq+0xec/0x178
[160595.691926] submit_bio+0x120/0x184
[160595.695990] ext4_mpage_readpages+0x77c/0x7c8
[160595.701026] ext4_readpage+0x60/0xb0
[160595.705158] filemap_read_page+0x54/0x114
[160595.711961] filemap_fault+0x228/0x5f4
[160595.716272] do_read_fault+0xe0/0x1f0
[160595.720487] do_fault+0x40/0x1c8
Tested by injecting random failures into bfq_get_queue, crashes go away
completely.
Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
RFC mainly because it's not clear to me what the best policy is here - but
the patch is tested and fixes a real crash we started seeing in 5.15.
This is following up my ramble over at
https://lore.kernel.org/lkml/CACGdZYLMnfcqwbAXDx+x9vUOMn2cz55oc+8WySBS3J2Xd_q7Lg@mail.gmail.com/
block/bfq-iosched.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7ea427817f7f..5d2861119d20 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6793,7 +6793,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
* reset. So insert new_bfqq into the
* woken_list of the waker. See
* bfq_check_waker for details.
+ *
+ * Also, if we got oom_bfqq, we must check if
+ * it's already in a woken_list
*/
+ if (unlikely(!hlist_unhashed(&bfqq->woken_list_node)))
+ hlist_del_init(&bfqq->woken_list_node);
if (bfqq->waker_bfqq)
hlist_add_head(&bfqq->woken_list_node,
&bfqq->waker_bfqq->woken_list);
--
2.38.1.273.g43a17bfeac-goog
Hi,
On 2022/11/03 9:39, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
here...
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> We must either reset woken_list_node, or avoid setting woken_list at all
> for oom_bfqq - opt to do the former.
Once oom_bfqq is used, I think the io is treated as issued from root
group. Hence I don't think it's necessary to set woken_list or
waker_bfqq for oom_bfqq.
Thanks,
Kuai
>
> Crashes would have a stacktrace like:
> [160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
> [160595.661142] bfq_add_request+0x6bc/0x980
> [160595.666602] bfq_insert_request+0x8ec/0x1240
> [160595.671762] bfq_insert_requests+0x58/0x9c
> [160595.676420] blk_mq_sched_insert_request+0x11c/0x198
> [160595.682107] blk_mq_submit_bio+0x270/0x62c
> [160595.686759] __submit_bio_noacct_mq+0xec/0x178
> [160595.691926] submit_bio+0x120/0x184
> [160595.695990] ext4_mpage_readpages+0x77c/0x7c8
> [160595.701026] ext4_readpage+0x60/0xb0
> [160595.705158] filemap_read_page+0x54/0x114
> [160595.711961] filemap_fault+0x228/0x5f4
> [160595.716272] do_read_fault+0xe0/0x1f0
> [160595.720487] do_fault+0x40/0x1c8
>
> Tested by injecting random failures into bfq_get_queue, crashes go away
> completely.
>
> Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> ---
> RFC mainly because it's not clear to me the best policy here - but the
> patch is tested and fixes a real crash we started seeing in 5.15
>
> This is following up my ramble over at
> https://lore.kernel.org/lkml/CACGdZYLMnfcqwbAXDx+x9vUOMn2cz55oc+8WySBS3J2Xd_q7Lg@mail.gmail.com/
>
> block/bfq-iosched.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 7ea427817f7f..5d2861119d20 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -6793,7 +6793,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> * reset. So insert new_bfqq into the
> * woken_list of the waker. See
> * bfq_check_waker for details.
> + *
> + * Also, if we got oom_bfqq, we must check if
> + * it's already in a woken_list
> */
> + if (unlikely(!hlist_unhashed(&bfqq->woken_list_node)))
> + hlist_del_init(&bfqq->woken_list_node);
> if (bfqq->waker_bfqq)
> hlist_add_head(&bfqq->woken_list_node,
> &bfqq->waker_bfqq->woken_list);
>
On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2022/11/03 9:39, Khazhismel Kumykov wrote:
> > This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> > but woken_list_node still being hashed. This would happen when
> > bfq_init_rq() expects a brand new allocated queue to be returned from
>
> From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
> 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
> from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
> here...

There's two calls for bfq_get_bfqq_handle_split in this function - the
second one is after the check you mentioned, and is the problematic
one.

>
> > bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> > without resetting woken_list_node. Since we can always return oom_bfqq
> > when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> > We must either reset woken_list_node, or avoid setting woken_list at all
> > for oom_bfqq - opt to do the former.
>
> Once oom_bfqq is used, I think the io is treated as issued from root
> group. Hence I don't think it's necessary to set woken_list or
> waker_bfqq for oom_bfqq.

Ack, I was wondering what's right here since, evidently, *someone* had
already set oom_bfqq->waker_bfqq to *something* (although... maybe it
was an earlier init_rq). But maybe it's better to do nothing if we
*know* it's oom_bfqq.

Is it a correct interpretation here that setting waker_bfqq won't
accomplish anything anyways on oom_bfqq, so better off not?

>
> Thanks,
> Kuai
Hi,

On 2022/11/03 11:05, Khazhy Kumykov wrote:
> On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2022/11/03 9:39, Khazhismel Kumykov wrote:
>>> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
>>> but woken_list_node still being hashed. This would happen when
>>> bfq_init_rq() expects a brand new allocated queue to be returned from
>>
>> From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
>> 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
>> from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
>> here...
> There's two calls for bfq_get_bfqq_handle_split in this function - the
> second one is after the check you mentioned, and is the problematic
> one.

Yes, thanks for the explanation. Now I understand how the problem
triggers.

>>
>>> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
>>> without resetting woken_list_node. Since we can always return oom_bfqq
>>> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
>>> We must either reset woken_list_node, or avoid setting woken_list at all
>>> for oom_bfqq - opt to do the former.
>>
>> Once oom_bfqq is used, I think the io is treated as issued from root
>> group. Hence I don't think it's necessary to set woken_list or
>> waker_bfqq for oom_bfqq.
> Ack, I was wondering what's right here since, evidently, *someone* had
> already set oom_bfqq->waker_bfqq to *something* (although... maybe it
> was an earlier init_rq). But maybe it's better to do nothing if we
> *know* it's oom_bfqq.

I need to have a check how oom_bfqq get involved with waker_bfqq, and
then see if it's reasonable.

Probably Jan and Paolo will have better view on this.

Thanks,
Kuai

>
> Is it a correct interpretation here that setting waker_bfqq won't
> accomplish anything anyways on oom_bfqq, so better off not?
On Thu 03-11-22 11:51:15, Yu Kuai wrote:
> Hi,
>
> On 2022/11/03 11:05, Khazhy Kumykov wrote:
> > On Wed, Nov 2, 2022 at 7:56 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > >
> > > Hi,
> > >
> > > On 2022/11/03 9:39, Khazhismel Kumykov wrote:
> > > > This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> > > > but woken_list_node still being hashed. This would happen when
> > > > bfq_init_rq() expects a brand new allocated queue to be returned from
> > >
> > > From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if
> > > 'new_queue' is false, but if 'new_queue' is false, the returned 'bfqq'
> > > from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused
> > > here...
> > There's two calls for bfq_get_bfqq_handle_split in this function - the
> > second one is after the check you mentioned, and is the problematic
> > one.
> Yes, thanks for the explanation. Now I understand how the problem
> triggers.
>
> > >
> > > > bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> > > > without resetting woken_list_node. Since we can always return oom_bfqq
> > > > when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> > > > We must either reset woken_list_node, or avoid setting woken_list at all
> > > > for oom_bfqq - opt to do the former.
> > >
> > > Once oom_bfqq is used, I think the io is treated as issued from root
> > > group. Hence I don't think it's necessary to set woken_list or
> > > waker_bfqq for oom_bfqq.
> > Ack, I was wondering what's right here since, evidently, *someone* had
> > already set oom_bfqq->waker_bfqq to *something* (although... maybe it
> > was an earlier init_rq). But maybe it's better to do nothing if we
> > *know* it's oom_bfqq.
>
> I need to have a check how oom_bfqq get involved with waker_bfqq, and
> then see if it's reasonable.
>
> Probably Jan and Paolo will have better view on this.

Thanks for the CC Kuai and thanks to Khazhy for spotting the bug. The
oom_bfqq is just a fallback bfqq and as such it should be exempted from
all special handling like waker detection etc. All this stuff is just for
optimizing performance and when we are OOM, we have far larger troubles
than to optimize performance.

So how I think we should really fix this is that we exempt oom_bfqq from
waker detection in bfq_check_waker() by adding:

	bfqq == bfqd->oom_bfqq ||
	bfqd->last_completed_rq_bfqq == bfqd->oom_bfqq)

to the initial check and then also if bfq_get_bfqq_handle_split() returns
oom_bfqq we should just skip carrying over the waker information.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
On Thu, Nov 3, 2022 at 1:47 AM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 03-11-22 11:51:15, Yu Kuai wrote:
> > [...]
> > Probably Jan and Paolo will have better view on this.
>
> Thanks for the CC Kuai and thanks to Khazhy for spotting the bug. The
> oom_bfqq is just a fallback bfqq and as such it should be exempted from
> all special handling like waker detection etc. All this stuff is just for
> optimizing performance and when we are OOM, we have far larger troubles
> than to optimize performance.
>
> So how I think we should really fix this is that we exempt oom_bfqq from
> waker detection in bfq_check_waker() by adding:
>
> 	bfqq == bfqd->oom_bfqq ||
> 	bfqd->last_completed_rq_bfqq == bfqd->oom_bfqq)
>
> to the initial check and then also if bfq_get_bfqq_handle_split() returns
> oom_bfqq we should just skip carrying over the waker information.

Thanks for the tip! I'll send a followup, including your suggestions.

I do have some other questions in this area, if you could help me
understand. Looking again at bfq_init_rq, inside of the !new_queue
section - we call bfq_split_bfqq() to "split" our bfqq, then in the
next line bfq_get_bfqq_handle_split inspects bic_to_bfqq(bic, is_sync),
and if it's NULL, allocates a new queue. However, if we have an async
rq, this call will return the pre-existing async bfqq, as the call to
bfq_split_bfqq() only clears the sync bfqq, ever. The best
understanding I have now is: "bic->bfqq[async] is never NULL (and we
don't merge async queues) so we'll never reach this !new_queue section
anyways if it's async". Is that accurate?

Thanks,
Khazhy

>
> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
On Fri 04-11-22 14:25:32, Khazhy Kumykov wrote:
> On Thu, Nov 3, 2022 at 1:47 AM Jan Kara <jack@suse.cz> wrote:
> > [...]
> > So how I think we should really fix this is that we exempt oom_bfqq from
> > waker detection in bfq_check_waker() by adding:
> >
> > 	bfqq == bfqd->oom_bfqq ||
> > 	bfqd->last_completed_rq_bfqq == bfqd->oom_bfqq)
> >
> > to the initial check and then also if bfq_get_bfqq_handle_split() returns
> > oom_bfqq we should just skip carrying over the waker information.
> Thanks for the tip! I'll send a followup, including your suggestions.
>
> I do have some other questions in this area, if you could help me
> understand. Looking again at bfq_init_rq, inside of the !new_queue
> section - we call bfq_split_bfqq() to "split" our bfqq, then in the
> next line bfq_get_bfqq_handle_split inspects bic_to_bfqq(bic, is_sync),
> and if it's NULL, allocates a new queue. However, if we have an async
> rq, this call will return the pre-existing async bfqq, as the call to
> bfq_split_bfqq() only clears the sync bfqq, ever. The best
> understanding I have now is: "bic->bfqq[async] is never NULL (and we
> don't merge async queues) so we'll never reach this !new_queue section
> anyways if it's async". Is that accurate?

So you are right that async queues are never merged or split. In fact, if
you have a look at bfq_get_queue(), you'll notice that async queue is
common for all processes with the same ioprio & blkcg. So all these games
with splitting, merging, waker detection etc. impact only sync queues.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
but woken_list_node still being hashed. This would happen when
bfq_init_rq() expects a brand new allocated queue to be returned from
bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
without resetting woken_list_node. Since we can always return oom_bfqq
when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
Avoid setting waker_bfqq for oom_bfqq entirely, as it's not useful.
Crashes would have a stacktrace like:
[160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142] bfq_add_request+0x6bc/0x980
[160595.666602] bfq_insert_request+0x8ec/0x1240
[160595.671762] bfq_insert_requests+0x58/0x9c
[160595.676420] blk_mq_sched_insert_request+0x11c/0x198
[160595.682107] blk_mq_submit_bio+0x270/0x62c
[160595.686759] __submit_bio_noacct_mq+0xec/0x178
[160595.691926] submit_bio+0x120/0x184
[160595.695990] ext4_mpage_readpages+0x77c/0x7c8
[160595.701026] ext4_readpage+0x60/0xb0
[160595.705158] filemap_read_page+0x54/0x114
[160595.711961] filemap_fault+0x228/0x5f4
[160595.716272] do_read_fault+0xe0/0x1f0
[160595.720487] do_fault+0x40/0x1c8
Tested by injecting random failures into bfq_get_queue, crashes go away
completely.
Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
block/bfq-iosched.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7ea427817f7f..ca04ec868c40 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6784,6 +6784,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
true, is_sync,
NULL);
+ if (unlikely(bfqq == &bfqd->oom_bfqq))
+ bfqq_already_existing = true;
+ } else
+ bfqq_already_existing = true;
+
+ if (!bfqq_already_existing) {
bfqq->waker_bfqq = old_bfqq->waker_bfqq;
bfqq->tentative_waker_bfqq = NULL;
@@ -6797,8 +6803,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
if (bfqq->waker_bfqq)
hlist_add_head(&bfqq->woken_list_node,
&bfqq->waker_bfqq->woken_list);
- } else
- bfqq_already_existing = true;
+ }
}
}
--
2.38.1.431.g37b22c650d-goog
On Tue, 8 Nov 2022 10:10:29 -0800, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
>
> [...]
Applied, thanks!
[1/2] bfq: fix waker_bfqq inconsistency crash
commit: a1795c2ccb1e4c49220d2a0d381540024d71647c
[2/2] bfq: ignore oom_bfqq in bfq_check_waker
commit: 99771d73ff4539f2337b84917f4792abf0d8931b
Best regards,
--
Jens Axboe
On Tue 08-11-22 10:10:29, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
>
> Avoid setting waker_bfqq for oom_bfqq entirely, as it's not useful.
>
> Crashes would have a stacktrace like:
> [160595.656560] bfq_add_bfqq_busy+0x110/0x1ec
> [160595.661142] bfq_add_request+0x6bc/0x980
> [160595.666602] bfq_insert_request+0x8ec/0x1240
> [160595.671762] bfq_insert_requests+0x58/0x9c
> [160595.676420] blk_mq_sched_insert_request+0x11c/0x198
> [160595.682107] blk_mq_submit_bio+0x270/0x62c
> [160595.686759] __submit_bio_noacct_mq+0xec/0x178
> [160595.691926] submit_bio+0x120/0x184
> [160595.695990] ext4_mpage_readpages+0x77c/0x7c8
> [160595.701026] ext4_readpage+0x60/0xb0
> [160595.705158] filemap_read_page+0x54/0x114
> [160595.711961] filemap_fault+0x228/0x5f4
> [160595.716272] do_read_fault+0xe0/0x1f0
> [160595.720487] do_fault+0x40/0x1c8
>
> Tested by injecting random failures into bfq_get_queue, crashes go away
> completely.
>
> Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Looks good. Thanks! Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/bfq-iosched.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 7ea427817f7f..ca04ec868c40 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -6784,6 +6784,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
> true, is_sync,
> NULL);
> + if (unlikely(bfqq == &bfqd->oom_bfqq))
> + bfqq_already_existing = true;
> + } else
> + bfqq_already_existing = true;
> +
> + if (!bfqq_already_existing) {
> bfqq->waker_bfqq = old_bfqq->waker_bfqq;
> bfqq->tentative_waker_bfqq = NULL;
>
> @@ -6797,8 +6803,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> if (bfqq->waker_bfqq)
> hlist_add_head(&bfqq->woken_list_node,
> &bfqq->waker_bfqq->woken_list);
> - } else
> - bfqq_already_existing = true;
> + }
> }
> }
>
> --
> 2.38.1.431.g37b22c650d-goog
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
oom_bfqq is just a fallback bfqq, so shouldn't be used with waker
detection.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
block/bfq-iosched.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index ca04ec868c40..267baf84870f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2135,7 +2135,9 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (!bfqd->last_completed_rq_bfqq ||
bfqd->last_completed_rq_bfqq == bfqq ||
bfq_bfqq_has_short_ttime(bfqq) ||
- now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC)
+ now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC ||
+ bfqd->last_completed_rq_bfqq == &bfqd->oom_bfqq ||
+ bfqq == &bfqd->oom_bfqq)
return;
/*
--
2.38.1.431.g37b22c650d-goog
On Tue 08-11-22 10:10:30, Khazhismel Kumykov wrote:
> oom_bfqq is just a fallback bfqq, so shouldn't be used with waker
> detection.
>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> block/bfq-iosched.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index ca04ec868c40..267baf84870f 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2135,7 +2135,9 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> 	if (!bfqd->last_completed_rq_bfqq ||
> 	    bfqd->last_completed_rq_bfqq == bfqq ||
> 	    bfq_bfqq_has_short_ttime(bfqq) ||
> -	    now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC)
> +	    now_ns - bfqd->last_completion >= 4 * NSEC_PER_MSEC ||
> +	    bfqd->last_completed_rq_bfqq == &bfqd->oom_bfqq ||
> +	    bfqq == &bfqd->oom_bfqq)
> 		return;
>
> /*
> --
> 2.38.1.431.g37b22c650d-goog
--
Jan Kara <jack@suse.com>
SUSE Labs, CR