arch/alpha/include/uapi/asm/ptrace.h | 2 ++ arch/alpha/kernel/asm-offsets.c | 4 ++++ arch/alpha/kernel/entry.S | 24 ++++++++++-------------- arch/alpha/kernel/traps.c | 2 +- arch/alpha/mm/fault.c | 4 ++-- 5 files changed, 19 insertions(+), 17 deletions(-)
This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
Thanks to Magnus Lindholm for identifying that remarkably longstanding
bug.
The problem is that GCC expects 16-byte alignment of the incoming stack
since early 2004, as Maciej found out [2]:
Having actually dug speculatively I can see that the psABI was changed in
GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
improperly aligned)") back in Mar 2004, when the stack pointer alignment
was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
various suspicious stack pointer adjustments, starting with SP_OFF which
is not a whole multiple of 16.
Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
D.3.1 Stack Alignment
This standard requires that stacks be octaword aligned at the time a
new procedure is invoked.
However:
- the "normal" kernel stack is always misaligned by 8 bytes, thanks to
the odd number of 64-bit words in 'struct pt_regs', which is the very
first thing pushed onto the kernel thread stack;
- syscall, fault, interrupt etc. handlers may, or may not, receive aligned
stack depending on numerous factors.
Somehow we got away with it until recently, when we ended up with
a stack corruption in kernel/smp.c:smp_call_function_single() due to
its use of 32-byte aligned local data and the compiler doing clever
things allocating it on the stack.
Patche 1 is preparatory; 2 - the main fix; 3 - fixes remaining
special cases.
Ivan.
[1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com/#r
[2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam.me.uk/
[3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427.pdf
---
Changes in v2:
- patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
reported by John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
update comment and commit message accordingly;
- cc'ed <stable@vger.kernel.org> as older kernels ought to be fixed as well.
Changes in v3:
- patch #1 dropped for the time being;
- updated commit messages as Maciej suggested.
---
Ivan Kokshaysky (3):
alpha: replace hardcoded stack offsets with autogenerated ones
alpha: make stack 16-byte aligned (most cases)
alpha: align stack for page fault and user unaligned trap handlers
arch/alpha/include/uapi/asm/ptrace.h | 2 ++
arch/alpha/kernel/asm-offsets.c | 4 ++++
arch/alpha/kernel/entry.S | 24 ++++++++++--------------
arch/alpha/kernel/traps.c | 2 +-
arch/alpha/mm/fault.c | 4 ++--
5 files changed, 19 insertions(+), 17 deletions(-)
--
2.47.2
Hi,
On Tue, 2025-02-04 at 23:35 +0100, Ivan Kokshaysky wrote:
> This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
> Thanks to Magnus Lindholm for identifying that remarkably longstanding
> bug.
>
> The problem is that GCC expects 16-byte alignment of the incoming stack
> since early 2004, as Maciej found out [2]:
> Having actually dug speculatively I can see that the psABI was changed in
> GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
> improperly aligned)") back in Mar 2004, when the stack pointer alignment
> was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
> various suspicious stack pointer adjustments, starting with SP_OFF which
> is not a whole multiple of 16.
>
> Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
> D.3.1 Stack Alignment
> This standard requires that stacks be octaword aligned at the time a
> new procedure is invoked.
>
> However:
> - the "normal" kernel stack is always misaligned by 8 bytes, thanks to
> the odd number of 64-bit words in 'struct pt_regs', which is the very
> first thing pushed onto the kernel thread stack;
> - syscall, fault, interrupt etc. handlers may, or may not, receive aligned
> stack depending on numerous factors.
>
> Somehow we got away with it until recently, when we ended up with
> a stack corruption in kernel/smp.c:smp_call_function_single() due to
> its use of 32-byte aligned local data and the compiler doing clever
> things allocating it on the stack.
>
> Patche 1 is preparatory; 2 - the main fix; 3 - fixes remaining
> special cases.
>
> Ivan.
>
> [1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com/#r
> [2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam.me.uk/
> [3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427.pdf
> ---
> Changes in v2:
> - patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
> reported by John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> update comment and commit message accordingly;
> - cc'ed <stable@vger.kernel.org> as older kernels ought to be fixed as well.
>
> Changes in v3:
> - patch #1 dropped for the time being;
> - updated commit messages as Maciej suggested.
> ---
> Ivan Kokshaysky (3):
> alpha: replace hardcoded stack offsets with autogenerated ones
> alpha: make stack 16-byte aligned (most cases)
> alpha: align stack for page fault and user unaligned trap handlers
>
> arch/alpha/include/uapi/asm/ptrace.h | 2 ++
> arch/alpha/kernel/asm-offsets.c | 4 ++++
> arch/alpha/kernel/entry.S | 24 ++++++++++--------------
> arch/alpha/kernel/traps.c | 2 +-
> arch/alpha/mm/fault.c | 4 ++--
> 5 files changed, 19 insertions(+), 17 deletions(-)
Can we get this landed this week, maybe for v6.14-rc3? This way it will quickly
backported to various stable kernels which means it will reach Debian unstable
within a few days.
Thanks,
Adrian
--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
On Tue, Feb 11, 2025 at 2:20 AM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> Hi,
>
> On Tue, 2025-02-04 at 23:35 +0100, Ivan Kokshaysky wrote:
> > This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
> > Thanks to Magnus Lindholm for identifying that remarkably longstanding
> > bug.
> >
> > The problem is that GCC expects 16-byte alignment of the incoming stack
> > since early 2004, as Maciej found out [2]:
> > Having actually dug speculatively I can see that the psABI was changed in
> > GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
> > improperly aligned)") back in Mar 2004, when the stack pointer alignment
> > was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
> > various suspicious stack pointer adjustments, starting with SP_OFF which
> > is not a whole multiple of 16.
> >
> > Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
> > D.3.1 Stack Alignment
> > This standard requires that stacks be octaword aligned at the time a
> > new procedure is invoked.
> >
> > However:
> > - the "normal" kernel stack is always misaligned by 8 bytes, thanks to
> > the odd number of 64-bit words in 'struct pt_regs', which is the very
> > first thing pushed onto the kernel thread stack;
> > - syscall, fault, interrupt etc. handlers may, or may not, receive aligned
> > stack depending on numerous factors.
> >
> > Somehow we got away with it until recently, when we ended up with
> > a stack corruption in kernel/smp.c:smp_call_function_single() due to
> > its use of 32-byte aligned local data and the compiler doing clever
> > things allocating it on the stack.
> >
> > Patche 1 is preparatory; 2 - the main fix; 3 - fixes remaining
> > special cases.
> >
> > Ivan.
> >
> > [1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com/#r
> > [2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam.me.uk/
> > [3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427.pdf
> > ---
> > Changes in v2:
> > - patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
> > reported by John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> > update comment and commit message accordingly;
> > - cc'ed <stable@vger.kernel.org> as older kernels ought to be fixed as well.
> >
> > Changes in v3:
> > - patch #1 dropped for the time being;
> > - updated commit messages as Maciej suggested.
> > ---
> > Ivan Kokshaysky (3):
> > alpha: replace hardcoded stack offsets with autogenerated ones
> > alpha: make stack 16-byte aligned (most cases)
> > alpha: align stack for page fault and user unaligned trap handlers
> >
> > arch/alpha/include/uapi/asm/ptrace.h | 2 ++
> > arch/alpha/kernel/asm-offsets.c | 4 ++++
> > arch/alpha/kernel/entry.S | 24 ++++++++++--------------
> > arch/alpha/kernel/traps.c | 2 +-
> > arch/alpha/mm/fault.c | 4 ++--
> > 5 files changed, 19 insertions(+), 17 deletions(-)
>
> Can we get this landed this week, maybe for v6.14-rc3? This way it will quickly
> backported to various stable kernels which means it will reach Debian unstable
> within a few days.
>
> Thanks,
> Adrian
Yeah, I'll vacuum the patches up and send them to Linus.
I've been running them on my ES47 and while my ES47 is still unstable,
these patches definitely solve this bug [1] and fix failures I saw
with the rcu_torture module!
Thanks a bunch to all of you!
[1] https://bugzilla.kernel.org/show_bug.cgi?id=213143
On Tue, 4 Feb 2025, Ivan Kokshaysky wrote:
> This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
> Thanks to Magnus Lindholm for identifying that remarkably longstanding
> bug.
Thank you. TL;DR: I've completed regression testing now and the results
are looking good, so I have posted my Tested-by: tags now.
Sadly original verification triggered a glitch which sometimes happens in
the operation of my server's intranet network card that puts the device
offline until it's hot-removed and the re-discovered. I suspect a driver
bug triggering under higher traffic, but that yet has to be investigated
and it's a bit of a challenge with a machine that is at a remote location
and in constant use.
I have only discovered it once 1023 tests got affected and consequently
failed due to the inability to reach the target Alpha system. I concluded
it was infeasible to run the failed tests by hand and then try to merge
the results.
Instead I disabled the extra prolonged tests discussed in the previous
update and rerun the remaining whole testsuite. That completed in ~23h
and produced a few regressions and progressions, the former for the "math"
and "nptl" test subsets, as well as one "string" test ("test-strnlen"). I
have therefore verified them by hand and triggered the failures with your
patchset removed, so I concluded their failed results are not the outcome
of your changes.
So this is good to go in as far as I'm concerned, as noted at the top.
Thank you for taking time to work on this problem.
Maciej
© 2016 - 2025 Red Hat, Inc.