[PATCH v3 0/3] alpha: stack fixes

Ivan Kokshaysky posted 3 patches 10 months, 2 weeks ago
arch/alpha/include/uapi/asm/ptrace.h |  2 ++
arch/alpha/kernel/asm-offsets.c      |  4 ++++
arch/alpha/kernel/entry.S            | 24 ++++++++++--------------
arch/alpha/kernel/traps.c            |  2 +-
arch/alpha/mm/fault.c                |  4 ++--
5 files changed, 19 insertions(+), 17 deletions(-)
[PATCH v3 0/3] alpha: stack fixes
Posted by Ivan Kokshaysky 10 months, 2 weeks ago
This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
Thanks to Magnus Lindholm for identifying that remarkably longstanding
bug.

The problem is that GCC expects 16-byte alignment of the incoming stack
since early 2004, as Maciej found out [2]:
  Having actually dug speculatively I can see that the psABI was changed in
 GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
 improperly aligned)") back in Mar 2004, when the stack pointer alignment
 was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
 various suspicious stack pointer adjustments, starting with SP_OFF which
 is not a whole multiple of 16.

Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
 D.3.1 Stack Alignment
  This standard requires that stacks be octaword aligned at the time a
  new procedure is invoked.

However:
- the "normal" kernel stack is always misaligned by 8 bytes, thanks to
  the odd number of 64-bit words in 'struct pt_regs', which is the very
  first thing pushed onto the kernel thread stack;
- syscall, fault, interrupt etc. handlers may, or may not, receive aligned
  stack depending on numerous factors.

Somehow we got away with it until recently, when we ended up with
a stack corruption in kernel/smp.c:smp_call_function_single() due to
its use of 32-byte aligned local data and the compiler doing clever
things allocating it on the stack.

Patche 1 is preparatory; 2 - the main fix; 3 - fixes remaining
special cases.

Ivan.

[1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com/#r
[2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam.me.uk/
[3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427.pdf
---
Changes in v2:
- patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
  reported by John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
  update comment and commit message accordingly;
- cc'ed <stable@vger.kernel.org> as older kernels ought to be fixed as well.

Changes in v3:
- patch #1 dropped for the time being;
- updated commit messages as Maciej suggested.
---
Ivan Kokshaysky (3):
  alpha: replace hardcoded stack offsets with autogenerated ones
  alpha: make stack 16-byte aligned (most cases)
  alpha: align stack for page fault and user unaligned trap handlers

 arch/alpha/include/uapi/asm/ptrace.h |  2 ++
 arch/alpha/kernel/asm-offsets.c      |  4 ++++
 arch/alpha/kernel/entry.S            | 24 ++++++++++--------------
 arch/alpha/kernel/traps.c            |  2 +-
 arch/alpha/mm/fault.c                |  4 ++--
 5 files changed, 19 insertions(+), 17 deletions(-)

-- 
2.47.2
Re: [PATCH v3 0/3] alpha: stack fixes
Posted by John Paul Adrian Glaubitz 10 months, 1 week ago
Hi,

On Tue, 2025-02-04 at 23:35 +0100, Ivan Kokshaysky wrote:
> This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
> Thanks to Magnus Lindholm for identifying that remarkably longstanding
> bug.
> 
> The problem is that GCC expects 16-byte alignment of the incoming stack
> since early 2004, as Maciej found out [2]:
>   Having actually dug speculatively I can see that the psABI was changed in
>  GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
>  improperly aligned)") back in Mar 2004, when the stack pointer alignment
>  was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
>  various suspicious stack pointer adjustments, starting with SP_OFF which
>  is not a whole multiple of 16.
> 
> Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
>  D.3.1 Stack Alignment
>   This standard requires that stacks be octaword aligned at the time a
>   new procedure is invoked.
> 
> However:
> - the "normal" kernel stack is always misaligned by 8 bytes, thanks to
>   the odd number of 64-bit words in 'struct pt_regs', which is the very
>   first thing pushed onto the kernel thread stack;
> - syscall, fault, interrupt etc. handlers may, or may not, receive aligned
>   stack depending on numerous factors.
> 
> Somehow we got away with it until recently, when we ended up with
> a stack corruption in kernel/smp.c:smp_call_function_single() due to
> its use of 32-byte aligned local data and the compiler doing clever
> things allocating it on the stack.
> 
> Patche 1 is preparatory; 2 - the main fix; 3 - fixes remaining
> special cases.
> 
> Ivan.
> 
> [1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com/#r
> [2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam.me.uk/
> [3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427.pdf
> ---
> Changes in v2:
> - patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
>   reported by John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
>   update comment and commit message accordingly;
> - cc'ed <stable@vger.kernel.org> as older kernels ought to be fixed as well.
> 
> Changes in v3:
> - patch #1 dropped for the time being;
> - updated commit messages as Maciej suggested.
> ---
> Ivan Kokshaysky (3):
>   alpha: replace hardcoded stack offsets with autogenerated ones
>   alpha: make stack 16-byte aligned (most cases)
>   alpha: align stack for page fault and user unaligned trap handlers
> 
>  arch/alpha/include/uapi/asm/ptrace.h |  2 ++
>  arch/alpha/kernel/asm-offsets.c      |  4 ++++
>  arch/alpha/kernel/entry.S            | 24 ++++++++++--------------
>  arch/alpha/kernel/traps.c            |  2 +-
>  arch/alpha/mm/fault.c                |  4 ++--
>  5 files changed, 19 insertions(+), 17 deletions(-)

Can we get this landed this week, maybe for v6.14-rc3? This way it will quickly
backported to various stable kernels which means it will reach Debian unstable
within a few days.

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
Re: [PATCH v3 0/3] alpha: stack fixes
Posted by Matt Turner 10 months, 1 week ago
On Tue, Feb 11, 2025 at 2:20 AM John Paul Adrian Glaubitz
<glaubitz@physik.fu-berlin.de> wrote:
>
> Hi,
>
> On Tue, 2025-02-04 at 23:35 +0100, Ivan Kokshaysky wrote:
> > This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
> > Thanks to Magnus Lindholm for identifying that remarkably longstanding
> > bug.
> >
> > The problem is that GCC expects 16-byte alignment of the incoming stack
> > since early 2004, as Maciej found out [2]:
> >   Having actually dug speculatively I can see that the psABI was changed in
> >  GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double
> >  improperly aligned)") back in Mar 2004, when the stack pointer alignment
> >  was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has
> >  various suspicious stack pointer adjustments, starting with SP_OFF which
> >  is not a whole multiple of 16.
> >
> > Also, as Magnus noted, "ALPHA Calling Standard" [3] required the same:
> >  D.3.1 Stack Alignment
> >   This standard requires that stacks be octaword aligned at the time a
> >   new procedure is invoked.
> >
> > However:
> > - the "normal" kernel stack is always misaligned by 8 bytes, thanks to
> >   the odd number of 64-bit words in 'struct pt_regs', which is the very
> >   first thing pushed onto the kernel thread stack;
> > - syscall, fault, interrupt etc. handlers may, or may not, receive aligned
> >   stack depending on numerous factors.
> >
> > Somehow we got away with it until recently, when we ended up with
> > a stack corruption in kernel/smp.c:smp_call_function_single() due to
> > its use of 32-byte aligned local data and the compiler doing clever
> > things allocating it on the stack.
> >
> > Patche 1 is preparatory; 2 - the main fix; 3 - fixes remaining
> > special cases.
> >
> > Ivan.
> >
> > [1] https://lore.kernel.org/rcu/CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com/#r
> > [2] https://lore.kernel.org/rcu/alpine.DEB.2.21.2501130248010.18889@angie.orcam.me.uk/
> > [3] https://bitsavers.org/pdf/dec/alpha/Alpha_Calling_Standard_Rev_2.0_19900427.pdf
> > ---
> > Changes in v2:
> > - patch #1: provide empty 'struct pt_regs' to fix compile failure in libbpf,
> >   reported by John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>;
> >   update comment and commit message accordingly;
> > - cc'ed <stable@vger.kernel.org> as older kernels ought to be fixed as well.
> >
> > Changes in v3:
> > - patch #1 dropped for the time being;
> > - updated commit messages as Maciej suggested.
> > ---
> > Ivan Kokshaysky (3):
> >   alpha: replace hardcoded stack offsets with autogenerated ones
> >   alpha: make stack 16-byte aligned (most cases)
> >   alpha: align stack for page fault and user unaligned trap handlers
> >
> >  arch/alpha/include/uapi/asm/ptrace.h |  2 ++
> >  arch/alpha/kernel/asm-offsets.c      |  4 ++++
> >  arch/alpha/kernel/entry.S            | 24 ++++++++++--------------
> >  arch/alpha/kernel/traps.c            |  2 +-
> >  arch/alpha/mm/fault.c                |  4 ++--
> >  5 files changed, 19 insertions(+), 17 deletions(-)
>
> Can we get this landed this week, maybe for v6.14-rc3? This way it will quickly
> backported to various stable kernels which means it will reach Debian unstable
> within a few days.
>
> Thanks,
> Adrian

Yeah, I'll vacuum the patches up and send them to Linus.

I've been running them on my ES47 and while my ES47 is still unstable,
these patches definitely solve this bug [1] and fix failures I saw
with the rcu_torture module!

Thanks a bunch to all of you!

[1] https://bugzilla.kernel.org/show_bug.cgi?id=213143
Re: [PATCH v3 0/3] alpha: stack fixes
Posted by Maciej W. Rozycki 10 months, 2 weeks ago
On Tue, 4 Feb 2025, Ivan Kokshaysky wrote:

> This series fixes oopses on Alpha/SMP observed since kernel v6.9. [1]
> Thanks to Magnus Lindholm for identifying that remarkably longstanding
> bug.

 Thank you.  TL;DR: I've completed regression testing now and the results 
are looking good, so I have posted my Tested-by: tags now.

 Sadly original verification triggered a glitch which sometimes happens in 
the operation of my server's intranet network card that puts the device 
offline until it's hot-removed and the re-discovered.  I suspect a driver 
bug triggering under higher traffic, but that yet has to be investigated 
and it's a bit of a challenge with a machine that is at a remote location 
and in constant use.

 I have only discovered it once 1023 tests got affected and consequently 
failed due to the inability to reach the target Alpha system.  I concluded 
it was infeasible to run the failed tests by hand and then try to merge 
the results.

 Instead I disabled the extra prolonged tests discussed in the previous  
update and rerun the remaining whole testsuite.  That completed in ~23h 
and produced a few regressions and progressions, the former for the "math" 
and "nptl" test subsets, as well as one "string" test ("test-strnlen").  I 
have therefore verified them by hand and triggered the failures with your 
patchset removed, so I concluded their failed results are not the outcome 
of your changes.

 So this is good to go in as far as I'm concerned, as noted at the top.  
Thank you for taking time to work on this problem.

  Maciej