A recent commit made cpumask_next*() trigger a warning when passed
n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU
numbers back into cpumask_next*().
The warning occurs nearly every boot on QEMU:
[ 5.398453] WARNING: CPU: 3 PID: 162 at include/linux/cpumask.h:110 __blk_mq_delay_run_hw_queue+0x16b/0x180
[ 5.399317] Modules linked in:
[ 5.399646] CPU: 3 PID: 162 Comm: ssh-keygen Tainted: G N 6.0.0-rc4-00004-g93003cb24006 #55
[ 5.400135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 5.405430] Call Trace:
[ 5.406152] <TASK>
[ 5.406452] blk_mq_sched_insert_requests+0x67/0x150
[ 5.406759] blk_mq_flush_plug_list+0xd0/0x280
[ 5.406987] ? bit_wait+0x60/0x60
[ 5.407317] __blk_flush_plug+0xdb/0x120
[ 5.407561] ? bit_wait+0x60/0x60
[ 5.407765] io_schedule_prepare+0x38/0x40
[ 5.407974] io_schedule+0x6/0x40
[ 5.408283] bit_wait_io+0x8/0x60
[ 5.408485] __wait_on_bit+0x72/0x90
[ 5.408668] out_of_line_wait_on_bit+0x8c/0xb0
[ 5.408879] ? swake_up_one+0x30/0x30
[ 5.409158] ext4_read_bh+0x7a/0x80
[ 5.409381] ext4_get_branch+0xc0/0x130
[ 5.409583] ext4_ind_map_blocks+0x1ac/0xb30
[ 5.409844] ? __es_remove_extent+0x61/0x6d0
[ 5.410128] ? blk_account_io_merge_bio+0x67/0xd0
[ 5.410416] ? percpu_counter_add_batch+0x59/0xb0
[ 5.410720] ? percpu_counter_add_batch+0x59/0xb0
[ 5.410949] ? _raw_read_unlock+0x13/0x30
[ 5.411323] ext4_map_blocks+0xc2/0x590
[ 5.411609] ? xa_load+0x7c/0xa0
[ 5.411859] ext4_mpage_readpages+0x4a2/0xaa0
[ 5.412192] read_pages+0x69/0x2b0
[ 5.412477] ? folio_add_lru+0x4f/0x70
[ 5.412696] page_cache_ra_unbounded+0x11d/0x170
[ 5.412960] filemap_get_pages+0x109/0x5d0
[ 5.413340] ? __vma_adjust+0x348/0x930
[ 5.413576] filemap_read+0xb7/0x380
[ 5.413805] ? vma_merge+0x2e9/0x330
[ 5.414067] ? vma_set_page_prot+0x43/0x80
[ 5.414309] ? __inode_security_revalidate+0x5e/0x80
[ 5.414581] ? selinux_file_permission+0x107/0x130
[ 5.414889] vfs_read+0x205/0x2e0
[ 5.415162] ksys_read+0x54/0xd0
[ 5.415348] do_syscall_64+0x3a/0x90
[ 5.415584] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range")
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
block/blk-mq.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c96c8c4f751b..30ae51eda95e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
if (--hctx->next_cpu_batch <= 0) {
select_cpu:
- next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
- cpu_online_mask);
+ if (next_cpu == nr_cpu_ids - 1)
+ next_cpu = nr_cpu_ids;
+ else
+ next_cpu = cpumask_next_and(next_cpu,
+ hctx->cpumask,
+ cpu_online_mask);
+
if (next_cpu >= nr_cpu_ids)
next_cpu = blk_mq_first_mapped_cpu(hctx);
hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
--
2.31.1
On Mon, Oct 03, 2022 at 04:34:16PM +0100, Valentin Schneider wrote:
> A recent commit made cpumask_next*() trigger a warning when passed
> n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU
> numbers back into cpumask_next*().
>
> The warning occurs nearly every boot on QEMU:
[...]
> Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range")
No! It fixes blk-mq bug, which has been revealed after 78e5a3399421.
> Suggested-by: Yury Norov <yury.norov@gmail.com>
OK, maybe I suggested something like this. But after looking into the code
of blk_mq_hctx_next_cpu() code for more, I have a feeling that this should
be overridden deeper.
Can you check - did this warning raise because hctx->next_cpu, or
because cpumask_next_and() was called twice after jumping on
select_cpu label?
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
> block/blk-mq.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c96c8c4f751b..30ae51eda95e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>
> if (--hctx->next_cpu_batch <= 0) {
> select_cpu:
Because we have backward looking goto, I have a strong feeling that the
code should be reorganized.
> - next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
> - cpu_online_mask);
> + if (next_cpu == nr_cpu_ids - 1)
> + next_cpu = nr_cpu_ids;
> + else
> + next_cpu = cpumask_next_and(next_cpu,
> + hctx->cpumask,
> + cpu_online_mask);
> +
> if (next_cpu >= nr_cpu_ids)
> next_cpu = blk_mq_first_mapped_cpu(hctx);
This simply means 'let's start from the beginning', and should be
replaced with cpumask_next_and_wrap().
> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
Maybe something like this would work?
if (--hctx->next_cpu_batch > 0 && cpu_online(next_cpu)) {
hctx->next_cpu = next_cpu;
return next_cpu;
}
next_cpu = cpumask_next_and_wrap(next_cpu, hctx->cpumask, cpu_online_mask)
if (next_cpu < nr_cpu_ids) {
hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
hctx->next_cpu = next_cpu;
return next_cpu;
}
/*
* Make sure to re-select CPU next time once after CPUs
* in hctx->cpumask become online again.
*/
hctx->next_cpu = next_cpu;
hctx->next_cpu_batch = 1;
return WORK_CPU_UNBOUND;
I didn't test it and likely screwed some corner case. I'm just
trying to say that picking next cpu should be an easier thing.
Thanks,
Yury
On 03/10/22 10:54, Yury Norov wrote:
> On Mon, Oct 03, 2022 at 04:34:16PM +0100, Valentin Schneider wrote:
>> A recent commit made cpumask_next*() trigger a warning when passed
>> n = nr_cpu_ids - 1. This means extra care must be taken when feeding CPU
>> numbers back into cpumask_next*().
>>
>> The warning occurs nearly every boot on QEMU:
>
> [...]
>
>> Fixes: 78e5a3399421 ("cpumask: fix checking valid cpu range")
>
> No! It fixes blk-mq bug, which has been revealed after 78e5a3399421.
>
>> Suggested-by: Yury Norov <yury.norov@gmail.com>
>
> OK, maybe I suggested something like this. But after looking into the code
> of blk_mq_hctx_next_cpu() code for more, I have a feeling that this should
> be overridden deeper.
>
> Can you check - did this warning raise because hctx->next_cpu, or
> because cpumask_next_and() was called twice after jumping on
> select_cpu label?
>
It seems to always happen when hctx->next_cpu == nr_cpu_ids-1 at the start
of the function - no jumping involved.
>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>> ---
>> block/blk-mq.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index c96c8c4f751b..30ae51eda95e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2046,8 +2046,13 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>>
>> if (--hctx->next_cpu_batch <= 0) {
>> select_cpu:
>
> Because we have backward looking goto, I have a strong feeling that the
> code should be reorganized.
>
>> - next_cpu = cpumask_next_and(next_cpu, hctx->cpumask,
>> - cpu_online_mask);
>> + if (next_cpu == nr_cpu_ids - 1)
>> + next_cpu = nr_cpu_ids;
>> + else
>> + next_cpu = cpumask_next_and(next_cpu,
>> + hctx->cpumask,
>> + cpu_online_mask);
>> +
>> if (next_cpu >= nr_cpu_ids)
>> next_cpu = blk_mq_first_mapped_cpu(hctx);
>
> This simply means 'let's start from the beginning', and should be
> replaced with cpumask_next_and_wrap().
I hadn't looked in depth there, but that's a strange behaviour.
If we get to the end of the cpumask, blk_mq_first_mapped_cpu() does:
int cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
if (cpu >= nr_cpu_ids)
cpu = cpumask_first(hctx->cpumask);
return cpu;
That if branch means the returned CPU is offline, which then triggers:
if (!cpu_online(next_cpu)) {
if (!tried) {
tried = true;
goto select_cpu;
}
but going back to select_cpu doesn't make much sense, we've already checked
that hctx->cpumask and cpu_online_mask were disjoint.
>
>> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>
>
> Maybe something like this would work?
>
> if (--hctx->next_cpu_batch > 0 && cpu_online(next_cpu)) {
> hctx->next_cpu = next_cpu;
> return next_cpu;
> }
>
> next_cpu = cpumask_next_and_wrap(next_cpu, hctx->cpumask, cpu_online_mask)
> if (next_cpu < nr_cpu_ids) {
> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
> hctx->next_cpu = next_cpu;
> return next_cpu;
> }
>
> /*
> * Make sure to re-select CPU next time once after CPUs
> * in hctx->cpumask become online again.
> */
> hctx->next_cpu = next_cpu;
> hctx->next_cpu_batch = 1;
> return WORK_CPU_UNBOUND;
>
> I didn't test it and likely screwed some corner case. I'm just
> trying to say that picking next cpu should be an easier thing.
>
Agreed, your suggestion looks sane, let me play with that a bit.
> Thanks,
> Yury
© 2016 - 2026 Red Hat, Inc.