The per-CPU data section is handled differently than the other sections.
The memory allocation requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.
Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.
The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. This results in a crash because it is not
skipped like a NULL pointer would be and it can't be dereferenced.
Such an assignment happens during compile time per-CPU lock
initialisation with lockdep enabled.
Add the SHF_ALLOC flag back for the per-CPU section after move_module().
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check. No, really!")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/module/main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5c6ab20240a6d..35abb5f13d7dc 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2816,6 +2816,9 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
 	if (err)
 		return ERR_PTR(err);

+	/* Add SHF_ALLOC back so that relocations are applied. */
+	info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
+
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
--
2.49.0
The per-CPU data section is handled differently than the other sections.
The memory allocation requires a special __percpu pointer and then the
section is copied into the view of each CPU. Therefore the SHF_ALLOC
flag is removed to ensure move_module() skips it.
Later, relocations are applied and apply_relocations() skips sections
without SHF_ALLOC because they have not been copied. This also skips the
per-CPU data section.
The missing relocations result in a NULL pointer on x86-64 and very
small values on x86-32. This results in a crash because it is not
skipped like a NULL pointer would be and can't be dereferenced.
Such an assignment happens during static per-CPU lock initialisation
with lockdep enabled.
Add the SHF_ALLOC flag back for the per-CPU section (if found) after
move_module().
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check. No, really!")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: https://lore.kernel.org/all/20250604152707.CieD9tN0@linutronix.de/
- Add the flag back only on SMP if the per-CPU section was found.
kernel/module/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 5c6ab20240a6d..4f6554dedf8ea 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
 	if (err)
 		return ERR_PTR(err);

+	/* Add SHF_ALLOC back so that relocations are applied. */
+	if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
+		info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
+
 	/* Module has been copied to its final place now: return it. */
 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
 	kmemleak_load_module(mod, info);
--
2.49.0
On 6/5/25 8:07 AM, Sebastian Andrzej Siewior wrote:
> The per-CPU data section is handled differently than the other sections.
> The memory allocations requires a special __percpu pointer and then the
> section is copied into the view of each CPU. Therefore the SHF_ALLOC
> flag is removed to ensure move_module() skips it.
>
> Later, relocations are applied and apply_relocations() skips sections
> without SHF_ALLOC because they have not been copied. This also skips the
> per-CPU data section.
> The missing relocations result in a NULL pointer on x86-64 and very
> small values on x86-32. This results in a crash because it is not
> skipped like NULL pointer would and can't be dereferenced.
>
> Such an assignment happens during static per-CPU lock initialisation
> with lockdep enabled.
>
> Add the SHF_ALLOC flag back for the per-CPU section (if found) after
> move_module().
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506041623.e45e4f7d-lkp@intel.com
> Fixes: 8d8022e8aba85 ("module: do percpu allocation after uniqueness check. No, really!")
Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
(pre-Git, [1])?
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> v1…v2: https://lore.kernel.org/all/20250604152707.CieD9tN0@linutronix.de/
> - Add the flag back only on SMP if the per-CPU section was found.
>
> kernel/module/main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 5c6ab20240a6d..4f6554dedf8ea 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> if (err)
> return ERR_PTR(err);
>
> + /* Add SHF_ALLOC back so that relocations are applied. */
> + if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> + info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> +
> /* Module has been copied to its final place now: return it. */
> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> kmemleak_load_module(mod, info);
This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
is set by rewrite_section_headers() to point to the percpu data in the
userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
doesn't move and the sh_addr isn't adjusted by move_module(). The
function apply_relocations() then applies the relocations in the initial
ELF copy. Finally, post_relocation() copies the relocated percpu data to
their final per-CPU destinations.
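The ordering described above (relocate the template first, then replicate it per CPU) can be modelled in userspace. This is only a minimal sketch under stated assumptions, not the loader code: every `toy_` name and `TOY_NR_CPUS` are made up for illustration, and the single pointer-sized slot stands in for one relocated field in the per-CPU section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Toy per-CPU section image: one slot that needs a relocation. */
struct toy_pcpu_image {
	uintptr_t lock_key;	/* stands in for a statically initialised pointer */
};

/* Stand-in for the relocation step: patch the slot in the template. */
static void toy_relocate(struct toy_pcpu_image *tmpl, uintptr_t sym_addr)
{
	tmpl->lock_key = sym_addr;
}

/* Stand-in for the later copy step: replicate the template for each CPU. */
static void toy_copy_to_cpus(struct toy_pcpu_image *cpus, size_t nr,
			     const struct toy_pcpu_image *tmpl)
{
	assert(nr <= TOY_NR_CPUS);
	for (size_t i = 0; i < nr; i++)
		memcpy(&cpus[i], tmpl, sizeof(*tmpl));
}

/*
 * Returns 1 if every CPU copy ends up holding sym_addr, 0 otherwise.
 * relocate_first == 0 models the case where the section is skipped
 * during relocation and the unrelocated template is copied.
 */
static int toy_load(int relocate_first, uintptr_t sym_addr)
{
	struct toy_pcpu_image tmpl = { 0 };
	struct toy_pcpu_image cpus[TOY_NR_CPUS];

	if (relocate_first)
		toy_relocate(&tmpl, sym_addr);
	toy_copy_to_cpus(cpus, TOY_NR_CPUS, &tmpl);

	for (size_t i = 0; i < TOY_NR_CPUS; i++)
		if (cpus[i].lock_key != sym_addr)
			return 0;
	return 1;
}
```

In this model `toy_load(1, addr)` leaves every CPU copy with the relocated value, while `toy_load(0, addr)` leaves each copy with the zero from the template, which mirrors the NULL pointer reported on x86-64.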
However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
this way. It is ok to reset it once, but if we need to set it back again
then I would reconsider this.
An alternative approach could be to teach apply_relocations() that the
percpu section is special and should be relocated even though it doesn't
have SHF_ALLOC set. This would also allow adding a comment explaining
that we're relocating the data in the original ELF copy, which I find
useful to mention as it is different to other relocation processing.
For instance:
	/*
	 * Don't bother with non-allocated sections.
	 *
	 * An exception is the percpu section, which has separate allocations
	 * for individual CPUs. We relocate the percpu section in the initial
	 * ELF template and subsequently copy it to the per-CPU destinations.
	 */
	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
	    infosec != info->index.pcpu)
		continue;
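To make the difference between the two skip conditions concrete, here is a small standalone sketch. It is not the kernel code: `struct toy_shdr`, `toy_skip_current()` and `toy_skip_proposed()` are hypothetical names, and only the flag value 0x2 for SHF_ALLOC is taken from the ELF specification.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SHF_ALLOC 0x2UL	/* same value as SHF_ALLOC in the ELF spec */

/* Reduced section header: only the flags matter for this model. */
struct toy_shdr {
	unsigned long sh_flags;
};

/* Existing condition: skip every section that lacks SHF_ALLOC. */
static bool toy_skip_current(const struct toy_shdr *sechdrs, size_t infosec)
{
	return !(sechdrs[infosec].sh_flags & TOY_SHF_ALLOC);
}

/*
 * Condition with the exception: a section without SHF_ALLOC is still
 * relocated if it is the per-CPU section (index pcpu_idx).
 */
static bool toy_skip_proposed(const struct toy_shdr *sechdrs, size_t infosec,
			      size_t pcpu_idx)
{
	return !(sechdrs[infosec].sh_flags & TOY_SHF_ALLOC) &&
	       infosec != pcpu_idx;
}
```

With a per-CPU section at index 2 whose SHF_ALLOC bit has been cleared, `toy_skip_current()` reports it as skipped while `toy_skip_proposed()` does not. Other sections without the flag are skipped by both.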
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=b3b91325f3c77ace041f769ada7039ebc7aab8de
--
Thanks,
Petr
On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
> (pre-Git, [1])?
Looking further back into the history, we have
21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
which does
+	if (pcpuindex) {
+		/* We have a special allocation for this section. */
+		mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
+					      sechdrs[pcpuindex].sh_addralign);
+		if (!mod->percpu) {
+			err = -ENOMEM;
+			goto free_mod;
+		}
+		sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
+	}
so this looks like the origin.
…
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> > if (err)
> > return ERR_PTR(err);
> >
> > + /* Add SHF_ALLOC back so that relocations are applied. */
> > + if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
> > + info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
> > +
> > /* Module has been copied to its final place now: return it. */
> > mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> > kmemleak_load_module(mod, info);
>
> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
> is set by rewrite_section_headers() to point to the percpu data in the
> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
> doesn't move and the sh_addr isn't adjusted by move_module(). The
> function apply_relocations() then applies the relocations in the initial
> ELF copy. Finally, post_relocation() copies the relocated percpu data to
> their final per-CPU destinations.
>
> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
> this way. It is ok to reset it once, but if we need to set it back again
> then I would reconsider this.
I had it the other way around, but this flag is not considered anywhere
other than in the functions called here. So I decided to add back what
was taken once.
> An alternative approach could be to teach apply_relocations() that the
> percpu section is special and should be relocated even though it doesn't
> have SHF_ALLOC set. This would also allow adding a comment explaining
> that we're relocating the data in the original ELF copy, which I find
> useful to mention as it is different to other relocation processing.
Not sure if this makes it better. It looks like it continues a
workaround…
The only reason why it has been removed in the first place is to skip
the copy process.
We could also keep the flag and skip the section during the copy
process based on its id. This was the original intention.
> For instance:
>
> /*
> * Don't bother with non-allocated sections.
> *
> * An exception is the percpu section, which has separate allocations
> * for individual CPUs. We relocate the percpu section in the initial
> * ELF template and subsequently copy it to the per-CPU destinations.
> */
> if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
> infosec != info->index.pcpu)
> continue;
>
If you insist but…
Sebastian
On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
>> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
>> (pre-Git, [1])?
>
> Looking further back into the history, we have
> 21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
>
> which does
>
> + if (pcpuindex) {
> + /* We have a special allocation for this section. */
> + mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> + sechdrs[pcpuindex].sh_addralign);
> + if (!mod->percpu) {
> + err = -ENOMEM;
> + goto free_mod;
> + }
> + sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> + }
>
> so this looks like the origin.
This patch added the initial per-cpu support for modules. The relocation
handling at that point appears correct to me. I think it's the mentioned patch
"Don't relocate non-allocated regions in modules" that broke it.
>
> …
>>> --- a/kernel/module/main.c
>>> +++ b/kernel/module/main.c
>>> @@ -2816,6 +2816,10 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
>>> if (err)
>>> return ERR_PTR(err);
>>>
>>> + /* Add SHF_ALLOC back so that relocations are applied. */
>>> + if (IS_ENABLED(CONFIG_SMP) && info->index.pcpu)
>>> + info->sechdrs[info->index.pcpu].sh_flags |= SHF_ALLOC;
>>> +
>>> /* Module has been copied to its final place now: return it. */
>>> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
>>> kmemleak_load_module(mod, info);
>>
>> This looks like a valid fix. The info->sechdrs[info->index.pcpu].sh_addr
>> is set by rewrite_section_headers() to point to the percpu data in the
>> userspace-passed ELF copy. The section has SHF_ALLOC reset, so it
>> doesn't move and the sh_addr isn't adjusted by move_module(). The
>> function apply_relocations() then applies the relocations in the initial
>> ELF copy. Finally, post_relocation() copies the relocated percpu data to
>> their final per-CPU destinations.
>>
>> However, I'm not sure if it is best to manipulate the SHF_ALLOC flag in
>> this way. It is ok to reset it once, but if we need to set it back again
>> then I would reconsider this.
>
> I had the other way around but this flag is not considered anywhere
> else other than the functions called here. So I decided to add back what
> was taken once.
>
>> An alternative approach could be to teach apply_relocations() that the
>> percpu section is special and should be relocated even though it doesn't
>> have SHF_ALLOC set. This would also allow adding a comment explaining
>> that we're relocating the data in the original ELF copy, which I find
>> useful to mention as it is different to other relocation processing.
>
> Not sure if this makes it better. It looks like it continues a
> workaround…
> The only reason why it has been removed in the first place is to skip
> the copy process.
The SHF_ALLOC flag is also removed to prevent the section from being allocated
by layout_sections().
> We could also keep the flag and skip the section during the copy
> process based on its id. This was the original intention.
>
>> For instance:
>>
>> /*
>> * Don't bother with non-allocated sections.
>> *
>> * An exception is the percpu section, which has separate allocations
>> * for individual CPUs. We relocate the percpu section in the initial
>> * ELF template and subsequently copy it to the per-CPU destinations.
>> */
>> if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
>> infosec != info->index.pcpu)
>> continue;
>>
>
> If you insist but…
It seems logical to me that the SHF_ALLOC flag is removed for the percpu section
since it isn't directly allocated by the regular process. This is consistent
with what the module loader does in other similar cases. I could also understand
keeping the flag and explicitly skipping the layout and allocate process for the
section. However, adjusting the flag back and forth to trigger the right code
paths in between seems fragile to me and harder to maintain if we need to
shuffle things around in the future.
--
Cheers,
Petr
On 2025-06-05 18:50:27 [+0200], Petr Pavlu wrote:
> On 6/5/25 5:54 PM, Sebastian Andrzej Siewior wrote:
> > On 2025-06-05 15:44:23 [+0200], Petr Pavlu wrote:
> >> Isn't this broken earlier by "Don't relocate non-allocated regions in modules."
> >> (pre-Git, [1])?
> >
> > Looking further back into the history, we have
> > 21af2f0289dea ("[PATCH] per-cpu support inside modules (minimal)")
> >
> > which does
> >
> > + if (pcpuindex) {
> > + /* We have a special allocation for this section. */
> > + mod->percpu = percpu_modalloc(sechdrs[pcpuindex].sh_size,
> > + sechdrs[pcpuindex].sh_addralign);
> > + if (!mod->percpu) {
> > + err = -ENOMEM;
> > + goto free_mod;
> > + }
> > + sechdrs[pcpuindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
> > + }
> >
> > so this looks like the origin.
>
> This patch added the initial per-cpu support for modules. The relocation
> handling at that point appears correct to me. I think it's the mentioned patch
> "Don't relocate non-allocated regions in modules" that broke it.
Ach, it ignores that bit. Okay then.
> It seems logical to me that the SHF_ALLOC flag is removed for the percpu section
> since it isn't directly allocated by the regular process. This is consistent
> with what the module loader does in other similar cases. I could also understand
> keeping the flag and explicitly skipping the layout and allocate process for the
> section. However, adjusting the flag back and forth to trigger the right code
> paths in between seems fragile to me and harder to maintain if we need to
> shuffle things around in the future.
Okay. Let me add this exception later on instead of adding the bit back.
Sebastian
On Thu, Jun 05, 2025 at 03:44:23PM +0200, Petr Pavlu wrote:
> For instance:
>
> 	/*
> 	 * Don't bother with non-allocated sections.
> 	 *
> 	 * An exception is the percpu section, which has separate allocations
> 	 * for individual CPUs. We relocate the percpu section in the initial
> 	 * ELF template and subsequently copy it to the per-CPU destinations.
> 	 */
> 	if (!(info->sechdrs[infosec].sh_flags & SHF_ALLOC) &&
> 	    infosec != info->index.pcpu)
> 		continue;
Right, and pcpu is a data section and should not have relative relocations,
only absolute. So copying things should not be a problem.
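The point about absolute versus relative relocations can be illustrated with a small userspace sketch. It assumes the usual definitions from the x86-64 psABI, where an absolute relocation in the style of R_X86_64_64 stores S + A and a PC-relative one in the style of R_X86_64_PC32 stores S + A - P, with P being the address of the patched location. All `toy_` names are hypothetical, and the slot is pointer-sized for simplicity rather than 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* One relocated location in a section image. */
struct toy_slot {
	intptr_t value;
};

/* Absolute relocation: store S + A, independent of where the slot lives. */
static void toy_reloc_abs(struct toy_slot *slot, uintptr_t sym, intptr_t addend)
{
	slot->value = (intptr_t)(sym + (uintptr_t)addend);
}

/* PC-relative relocation: store S + A - P, where P is the slot's address. */
static void toy_reloc_rel(struct toy_slot *slot, uintptr_t sym, intptr_t addend)
{
	slot->value = (intptr_t)(sym + (uintptr_t)addend - (uintptr_t)slot);
}

/* Recover the target address from an absolutely relocated slot. */
static uintptr_t toy_resolve_abs(const struct toy_slot *slot)
{
	return (uintptr_t)slot->value;
}

/* Recover the target address from a PC-relative slot at its current place. */
static uintptr_t toy_resolve_rel(const struct toy_slot *slot)
{
	return (uintptr_t)slot->value + (uintptr_t)slot;
}
```

If both kinds of slot are relocated in a template and then copied with memcpy() to another address, the absolute slot still resolves to the same symbol, whereas the relative slot resolves to an address shifted by the distance between template and copy. That is why relocating the template and copying it afterwards is safe only as long as the section carries absolute relocations.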