We are hit with a not easily reproducible divide-by-0 panic in padata.c
at bootup time.
[   10.017908] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
[ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
[ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
[ 10.017908] Workqueue: events_unbound padata_mt_helper
[ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
:
[ 10.017963] Call Trace:
[ 10.017968] <TASK>
[ 10.018004] ? padata_mt_helper+0x39/0xb0
[ 10.018084] process_one_work+0x174/0x330
[ 10.018093] worker_thread+0x266/0x3a0
[ 10.018111] kthread+0xcf/0x100
[ 10.018124] ret_from_fork+0x31/0x50
[ 10.018138] ret_from_fork_asm+0x1a/0x30
[ 10.018147] </TASK>
Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.
Fix this divide-by-0 panic by making sure that chunk_size will be at
least 1 no matter what the input parameters are.
Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/padata.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/padata.c b/kernel/padata.c
index 53f4bc912712..0fa6c2895460 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
ps.chunk_size = max(ps.chunk_size, job->min_chunk);
ps.chunk_size = roundup(ps.chunk_size, job->align);
+ /*
+ * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
+	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
+ */
+ if (!ps.chunk_size)
+ ps.chunk_size = 1U;
+
list_for_each_entry(pw, &works, pw_list)
if (job->numa_aware) {
int old_node = atomic_read(&last_used_nid);
--
2.43.5
Waiman Long <longman@redhat.com> writes:
> We are hit with a not easily reproducible divide-by-0 panic in padata.c
> at bootup time.
>
> [ 10.017908] Oops: divide error: 0000 1 PREEMPT SMP NOPTI
> [ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
> [ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
> [ 10.017908] Workqueue: events_unbound padata_mt_helper
> [ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
> :
> [ 10.017963] Call Trace:
> [ 10.017968] <TASK>
> [ 10.018004] ? padata_mt_helper+0x39/0xb0
> [ 10.018084] process_one_work+0x174/0x330
> [ 10.018093] worker_thread+0x266/0x3a0
> [ 10.018111] kthread+0xcf/0x100
> [ 10.018124] ret_from_fork+0x31/0x50
> [ 10.018138] ret_from_fork_asm+0x1a/0x30
> [ 10.018147] </TASK>
>
> Looking at the padata_mt_helper() function, the only way a divide-by-0
> panic can happen is when ps->chunk_size is 0. The way that chunk_size is
> initialized in padata_do_multithreaded(), chunk_size can be 0 when the
> min_chunk in the passed-in padata_mt_job structure is 0.
>
> Fix this divide-by-0 panic by making sure that chunk_size will be at
> least 1 no matter what the input parameters are.
>
> Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/padata.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 53f4bc912712..0fa6c2895460 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
> ps.chunk_size = max(ps.chunk_size, job->min_chunk);
> ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> + /*
> + * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> + * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
> + */
Thanks for the patch and detailed comment.
> + if (!ps.chunk_size)
> + ps.chunk_size = 1U;
> +
Could it be
ps.chunk_size = max(ps.chunk_size, 1U);
or it could be merged with the earlier max():
ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
ps.chunk_size = roundup(ps.chunk_size, job->align);
This sits well with how the entire file is written, and the compiler
optimizes both forms to the same level.
Kamlesh
> list_for_each_entry(pw, &works, pw_list)
> if (job->numa_aware) {
> int old_node = atomic_read(&last_used_nid);
> --
> 2.43.5
On 8/10/24 13:44, Kamlesh Gurudasani wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> We are hit with a not easily reproducible divide-by-0 panic in padata.c
>> at bootup time.
>>
>> [ 10.017908] Oops: divide error: 0000 1 PREEMPT SMP NOPTI
>> [ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
>> [ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
>> [ 10.017908] Workqueue: events_unbound padata_mt_helper
>> [ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
>> :
>> [ 10.017963] Call Trace:
>> [ 10.017968] <TASK>
>> [ 10.018004] ? padata_mt_helper+0x39/0xb0
>> [ 10.018084] process_one_work+0x174/0x330
>> [ 10.018093] worker_thread+0x266/0x3a0
>> [ 10.018111] kthread+0xcf/0x100
>> [ 10.018124] ret_from_fork+0x31/0x50
>> [ 10.018138] ret_from_fork_asm+0x1a/0x30
>> [ 10.018147] </TASK>
>>
>> Looking at the padata_mt_helper() function, the only way a divide-by-0
>> panic can happen is when ps->chunk_size is 0. The way that chunk_size is
>> initialized in padata_do_multithreaded(), chunk_size can be 0 when the
>> min_chunk in the passed-in padata_mt_job structure is 0.
>>
>> Fix this divide-by-0 panic by making sure that chunk_size will be at
>> least 1 no matter what the input parameters are.
>>
>> Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> kernel/padata.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 53f4bc912712..0fa6c2895460 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>> ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>> ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> + /*
>> + * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>> + * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
>> + */
> Thanks for the patch and detailed comment.
>> + if (!ps.chunk_size)
>> + ps.chunk_size = 1U;
>> +
> could it be
> ps.chunk_size = max(ps.chunk_size, 1U);
>
> or can be merged with earlier max()
> ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
> ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> sits well with how entire file is written and compiler is optimizing
> them to same level.
I had actually thought about doing that as an alternative. I used the
current patch to avoid putting too many max() calls there. I can go this
route if you guys prefer this.
Cheers,
Longman
Waiman Long <longman@redhat.com> writes:
> On 8/10/24 13:44, Kamlesh Gurudasani wrote:
>> Waiman Long <longman@redhat.com> writes:
>>
...
>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>> index 53f4bc912712..0fa6c2895460 100644
>>> --- a/kernel/padata.c
>>> +++ b/kernel/padata.c
>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>> ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>> ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>
>>> + /*
>>> + * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>> + * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
>>> + */
>> Thanks for the patch and detailed comment.
>>> + if (!ps.chunk_size)
>>> + ps.chunk_size = 1U;
>>> +
>> could it be
>> ps.chunk_size = max(ps.chunk_size, 1U);
>>
>> or can be merged with earlier max()
>> ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>> ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> sits well with how entire file is written and compiler is optimizing
>> them to same level.
>
> I had actually thought about doing that as an alternative. I used the
> current patch to avoid putting too many max() calls there. I can go this
> route if you guys prefer this.
Just curious, what is your reason for avoiding too many max() calls? Both

	if (!ps.chunk_size)
		ps.chunk_size = 1U;

and

	ps.chunk_size = max(ps.chunk_size, 1U);

compile to the same number of instructions [1].
[1] https://godbolt.org/z/ajrK59c67
We can avoid the nested max(), though the following would make it easier to understand:
ps.chunk_size = max(ps.chunk_size, 1U);
Cheers,
Kamlesh
>
> Cheers,
> Longman
On 8/11/24 01:44, Kamlesh Gurudasani wrote:
> Waiman Long <longman@redhat.com> writes:
>
>> On 8/10/24 13:44, Kamlesh Gurudasani wrote:
>>> Waiman Long <longman@redhat.com> writes:
>>>
> ...
>>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>>> index 53f4bc912712..0fa6c2895460 100644
>>>> --- a/kernel/padata.c
>>>> +++ b/kernel/padata.c
>>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>>
>>>> +	/*
>>>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>>> +	 */
>>> Thanks for the patch and detailed comment.
>>>> +	if (!ps.chunk_size)
>>>> +		ps.chunk_size = 1U;
>>>> +
>>> could it be
>>> ps.chunk_size = max(ps.chunk_size, 1U);
>>>
>>> or can be merged with earlier max()
>>> ps.chunk_size = max(ps.chunk_size, max(job->min_chunk, 1U));
>>> ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>
>>> sits well with how entire file is written and compiler is optimizing
>>> them to same level.
>> I had actually thought about doing that as an alternative. I used the
>> current patch to avoid putting too many max() calls there. I can go this
>> route if you guys prefer this.
> Just curious, what is your reason for avoiding too many max() calls? Both
> 	if (!ps.chunk_size)
> 		ps.chunk_size = 1U;
> and
> 	ps.chunk_size = max(ps.chunk_size, 1U);
>
> are having same number of instructions [1].
>
> [1] https://godbolt.org/z/ajrK59c67
>
> We can avoid nested max(), though following would make it easier to understand.
>
> ps.chunk_size = max(ps.chunk_size, 1U);

That will certainly work. My current patch has been merged into the Linus
tree. You are welcome to post another patch to clean it up if you want.

Cheers,
Longman

> Cheers,
> Kamlesh
>
>> Cheers,
>> Longman
Waiman Long <longman@redhat.com> wrote:
>
> diff --git a/kernel/padata.c b/kernel/padata.c
> index 53f4bc912712..0fa6c2895460 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>
> +	/*
> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> +	 */
> +	if (!ps.chunk_size)
> +		ps.chunk_size = 1U;

Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
instead?

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 8/10/24 00:05, Herbert Xu wrote:
> Waiman Long <longman@redhat.com> wrote:
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 53f4bc912712..0fa6c2895460 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>
>> +	/*
>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>> +	 */
>> +	if (!ps.chunk_size)
>> +		ps.chunk_size = 1U;
> Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
> instead?

I think DIV_ROUND_UP() will have exactly the same problem: if chunk_size
is 0, you still get a 0 result. round_up() works only if the 2nd argument
is a power of 2, while with DIV_ROUND_UP() the second argument can be any
number except 0.

Cheers,
Longman
On Sat, Aug 10, 2024 at 09:30:25PM -0400, Waiman Long wrote:
>
> On 8/10/24 00:05, Herbert Xu wrote:
> > Waiman Long <longman@redhat.com> wrote:
> > > diff --git a/kernel/padata.c b/kernel/padata.c
> > > index 53f4bc912712..0fa6c2895460 100644
> > > --- a/kernel/padata.c
> > > +++ b/kernel/padata.c
> > > @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
> > >  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
> > >  	ps.chunk_size = roundup(ps.chunk_size, job->align);
> > >
> > > +	/*
> > > +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
> > > +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
> > > +	 */
> > > +	if (!ps.chunk_size)
> > > +		ps.chunk_size = 1U;
> > Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
> > instead?
>
> I think DIV_ROUND_UP() will exactly the same problem that if chunk_size is
> 0, you still got a 0 result. round_up() only if the 2nd argument is a power
> of 2 while with DIV_ROUND_UP(), the second argument can be any number except
> 0.

Unless I'm missing something, chunk_size cannot be zero before the
division because that's the first thing we check upon entry into
this function.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 8/10/24 21:45, Herbert Xu wrote:
> On Sat, Aug 10, 2024 at 09:30:25PM -0400, Waiman Long wrote:
>> On 8/10/24 00:05, Herbert Xu wrote:
>>> Waiman Long <longman@redhat.com> wrote:
>>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>>> index 53f4bc912712..0fa6c2895460 100644
>>>> --- a/kernel/padata.c
>>>> +++ b/kernel/padata.c
>>>> @@ -517,6 +517,13 @@ void __init padata_do_multithreaded(struct padata_mt_job *job)
>>>>  	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
>>>>  	ps.chunk_size = roundup(ps.chunk_size, job->align);
>>>>
>>>> +	/*
>>>> +	 * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
>>>> +	 * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().
>>>> +	 */
>>>> +	if (!ps.chunk_size)
>>>> +		ps.chunk_size = 1U;
>>> Perhaps change the first ps.chunk_size assignment to use DIV_ROUND_UP
>>> instead?
>> I think DIV_ROUND_UP() will exactly the same problem that if chunk_size is
>> 0, you still got a 0 result. round_up() only if the 2nd argument is a power
>> of 2 while with DIV_ROUND_UP(), the second argument can be any number except
>> 0.
> Unless I'm missing something chunk_size cannot be zero before the
> division because that's the first thing we check upon entry into
> this function.

chunk_size is initialized as

	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);

chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). If
min_chunk is 0, chunk_size will remain 0.

After looking at the dump file from the crash at padata_mt_helper(), I
determined that ps->chunk_size was indeed 0, which caused the divide-by-0
panic. I actually got 2 different bug reports of this div-by-0 panic, one
with a debug and another one with a non-debug kernel.

Cheers,
Longman
On Sat, Aug 10, 2024 at 11:11:07PM -0400, Waiman Long wrote:
>
> > Unless I'm missing something chunk_size cannot be zero before the
> > division because that's the first thing we check upon entry into
> > this function.
>
> chunk_size is initialized as
>
> 	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
>
> chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). If
> min_chunk is 0, chunk_size will remain 0.

That's why I was suggesting that you replace the division by
DIV_ROUND_UP. That should ensure that ps.chunk_size is not zero.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On 8/10/24 23:13, Herbert Xu wrote:
> On Sat, Aug 10, 2024 at 11:11:07PM -0400, Waiman Long wrote:
>>> Unless I'm missing something chunk_size cannot be zero before the
>>> division because that's the first thing we check upon entry into
>>> this function.
>> chunk_size is initialized as
>>
>> 	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
>>
>> chunk_size will be 0 if job->size < (ps.nworks * load_balance_factor). If
>> min_chunk is 0, chunk_size will remain 0.
> That's why I was suggesting that you replace the division by
> DIV_ROUND_UP. That should ensure that ps.chunk_size is not zero.

Now I see what you mean. Yes, we can probably change that to a
DIV_ROUND_UP operation to make sure that chunk_size is at least one
unless job->size is 0. I still think the current patch is a bit more
fail-safe.

Cheers,
Longman
On Sat, Aug 10, 2024 at 11:27:55PM -0400, Waiman Long wrote:
>
> Now I see what you mean. Yes, we can probably change that to a DIV_ROUND_UP
> operation to make sure that chunk_size is at least one unless job->size is
> 0. I still think the current patch is a bit more fail-safe.

The very first thing the function does is check that job->size is not
zero. So this should be all that is necessary.

Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt