[PATCH 00/14] alpha: cleanups for 6.10

Arnd Bergmann posted 14 patches 1 year, 7 months ago
Documentation/driver-api/eisa.rst      |   4 +-
arch/alpha/Kconfig                     | 175 +-------
arch/alpha/Makefile                    |   8 +-
arch/alpha/include/asm/core_apecs.h    | 534 -------------------------
arch/alpha/include/asm/core_lca.h      | 378 -----------------
arch/alpha/include/asm/core_t2.h       |   8 -
arch/alpha/include/asm/dma-mapping.h   |   4 -
arch/alpha/include/asm/dma.h           |   9 +-
arch/alpha/include/asm/elf.h           |   4 +-
arch/alpha/include/asm/io.h            |  26 +-
arch/alpha/include/asm/irq.h           |  10 +-
arch/alpha/include/asm/jensen.h        | 363 -----------------
arch/alpha/include/asm/machvec.h       |   9 -
arch/alpha/include/asm/mmu_context.h   |  45 +--
arch/alpha/include/asm/special_insns.h |   5 +-
arch/alpha/include/asm/tlbflush.h      |  41 +-
arch/alpha/include/asm/uaccess.h       |  80 ----
arch/alpha/include/asm/vga.h           |   2 +
arch/alpha/include/uapi/asm/compiler.h |  18 -
arch/alpha/kernel/Makefile             |  25 +-
arch/alpha/kernel/asm-offsets.c        |  21 +-
arch/alpha/kernel/bugs.c               |   1 +
arch/alpha/kernel/console.c            |   1 +
arch/alpha/kernel/core_apecs.c         | 420 -------------------
arch/alpha/kernel/core_cia.c           |   6 +-
arch/alpha/kernel/core_irongate.c      |   1 -
arch/alpha/kernel/core_lca.c           | 517 ------------------------
arch/alpha/kernel/core_marvel.c        |   2 +-
arch/alpha/kernel/core_t2.c            |   2 +-
arch/alpha/kernel/core_wildfire.c      |   8 +-
arch/alpha/kernel/entry.S              |   1 +
arch/alpha/kernel/io.c                 |  19 +
arch/alpha/kernel/irq.c                |   1 +
arch/alpha/kernel/irq_i8259.c          |   4 -
arch/alpha/kernel/machvec_impl.h       |  25 +-
arch/alpha/kernel/pci-noop.c           | 113 ------
arch/alpha/kernel/pci_impl.h           |   4 +-
arch/alpha/kernel/perf_event.c         |   2 +-
arch/alpha/kernel/proto.h              |  44 +-
arch/alpha/kernel/setup.c              | 109 +----
arch/alpha/kernel/smc37c669.c          |   6 +-
arch/alpha/kernel/smc37c93x.c          |   2 +
arch/alpha/kernel/smp.c                |   1 +
arch/alpha/kernel/srmcons.c            |   2 +
arch/alpha/kernel/sys_cabriolet.c      |  87 +---
arch/alpha/kernel/sys_eb64p.c          | 238 -----------
arch/alpha/kernel/sys_jensen.c         | 237 -----------
arch/alpha/kernel/sys_mikasa.c         |  57 ---
arch/alpha/kernel/sys_nautilus.c       |   8 +-
arch/alpha/kernel/sys_noritake.c       |  60 ---
arch/alpha/kernel/sys_sable.c          | 294 +-------------
arch/alpha/kernel/sys_sio.c            | 486 ----------------------
arch/alpha/kernel/syscalls/syscall.tbl |   2 +-
arch/alpha/kernel/traps.c              |  64 ---
arch/alpha/lib/Makefile                |  14 -
arch/alpha/lib/checksum.c              |   1 +
arch/alpha/lib/fpreg.c                 |   1 +
arch/alpha/lib/memcpy.c                |   3 +
arch/alpha/lib/stycpy.S                |  11 +
arch/alpha/lib/styncpy.S               |  11 +
arch/alpha/math-emu/math.c             |   7 +-
arch/alpha/mm/init.c                   |   2 +-
drivers/char/agp/alpha-agp.c           |   2 +-
drivers/eisa/Kconfig                   |   9 +-
drivers/eisa/virtual_root.c            |   2 +-
drivers/input/serio/i8042-io.h         |   5 +-
drivers/tty/serial/8250/8250.h         |   3 -
drivers/tty/serial/8250/8250_alpha.c   |  21 -
drivers/tty/serial/8250/8250_core.c    |   4 -
drivers/tty/serial/8250/Makefile       |   2 -
include/linux/blk_types.h              |   6 -
include/linux/tty.h                    |  14 +-
72 files changed, 166 insertions(+), 4545 deletions(-)
delete mode 100644 arch/alpha/include/asm/core_apecs.h
delete mode 100644 arch/alpha/include/asm/core_lca.h
delete mode 100644 arch/alpha/include/asm/jensen.h
delete mode 100644 arch/alpha/kernel/core_apecs.c
delete mode 100644 arch/alpha/kernel/core_lca.c
delete mode 100644 arch/alpha/kernel/pci-noop.c
delete mode 100644 arch/alpha/kernel/sys_eb64p.c
delete mode 100644 arch/alpha/kernel/sys_jensen.c
delete mode 100644 arch/alpha/kernel/sys_sio.c
create mode 100644 arch/alpha/lib/stycpy.S
create mode 100644 arch/alpha/lib/styncpy.S
delete mode 100644 drivers/tty/serial/8250/8250_alpha.c
[PATCH 00/14] alpha: cleanups for 6.10
Posted by Arnd Bergmann 1 year, 7 months ago
From: Arnd Bergmann <arnd@arndb.de>

I had investigated dropping support for alpha EV5 and earlier a while
ago after noticing that this is the only supported CPU family
in the kernel without native byte access and that Debian has already
dropped support for this generation last year [1] after it turned
out to be broken.

This topic came up again when Paul E. McKenney noticed that
parts of the RCU code already rely on byte access and do not
work on alpha EV5 reliably [2], so I refreshed my series now for
inclusion into the next merge window.

Al Viro did another series for alpha to address all the known build
issues. I rebased his patches without any further changes and included
it as a baseline for my work here to avoid conflicts.

      Arnd

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1036158
[2] https://lore.kernel.org/lkml/b67e79d4-06cb-4a45-a906-b9e0fbae22c5@paulmck-laptop/

Al Viro (9):
  alpha: sort scr_mem{cpy,move}w() out
  alpha: fix modversions for strcpy() et.al.
  alpha: add clone3() support
  alpha: don't make functions public without a reason
  alpha: sys_sio: fix misspelled ifdefs
  alpha: missing includes
  alpha: core_lca: take the unused functions out
  alpha: jensen, t2 - make __EXTERN_INLINE same as for the rest
  alpha: trim the unused stuff from asm-offsets.c

Arnd Bergmann (5):
  alpha: remove DECpc AXP150 (Jensen) support
  alpha: sable: remove early machine support
  alpha: remove LCA and APECS based machines
  alpha: cabriolet: remove EV5 CPU support
  alpha: drop pre-EV56 support

-- 
2.39.2

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Matt Turner 1 year, 7 months ago
On Fri, May 3, 2024 at 4:12 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> I had investigated dropping support for alpha EV5 and earlier a while
> ago after noticing that this is the only supported CPU family
> in the kernel without native byte access and that Debian has already
> dropped support for this generation last year [1] after it turned
> out to be broken.
>
> This topic came up again when Paul E. McKenney noticed that
> parts of the RCU code already rely on byte access and do not
> work on alpha EV5 reliably, so I refreshed my series now for
> inclusion into the next merge window.
>
> Al Viro did another series for alpha to address all the known build
> issues. I rebased his patches without any further changes and included
> it as a baseline for my work here to avoid conflicts.

Thanks for all this. Removing support for non-BWX alphas makes a lot
of sense to me.

The whole series is

Acked-by: Matt Turner <mattst88@gmail.com>
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by John Paul Adrian Glaubitz 1 year, 7 months ago
Hello Arnd,

On Fri, 2024-05-03 at 10:11 +0200, Arnd Bergmann wrote:
> I had investigated dropping support for alpha EV5 and earlier a while
> ago after noticing that this is the only supported CPU family
> in the kernel without native byte access and that Debian has already
> dropped support for this generation last year [1] after it turned
> out to be broken.

That's not quite correct. Support for older Alphas is not broken and
always worked when I tested it. It's just that some people wanted to
raise the baseline in order to improve code performance on newer machines
with the hope to fix some minor issues we saw on Alpha here and there.

> This topic came up again when Paul E. McKenney noticed that
> parts of the RCU code already rely on byte access and do not
> work on alpha EV5 reliably, so I refreshed my series now for
> inclusion into the next merge window.

Hrrrm? That sounds like Paul ran tests on EV5, did he?

> Al Viro did another series for alpha to address all the known build
> issues. I rebased his patches without any further changes and included
> it as a baseline for my work here to avoid conflicts.

It's somewhat strange that Al improves code on the older machines only
for it to be axed by your series. I would prefer such removals to be aimed
at an LTS release, if possible.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Fri, 3 May 2024, John Paul Adrian Glaubitz wrote:

> > I had investigated dropping support for alpha EV5 and earlier a while
> > ago after noticing that this is the only supported CPU family
> > in the kernel without native byte access and that Debian has already
> > dropped support for this generation last year [1] after it turned
> > out to be broken.
> 
> That's not quite correct. Support for older Alphas is not broken and
> always worked when I tested it. It's just that some people wanted to
> raise the baseline in order to improve code performance on newer machines
> with the hope to fix some minor issues we saw on Alpha here and there.

 I'm not quite happy to see pre-EV5 support go as EV45 is all the Alpha 
hardware I have and it's only owing to issues with the firmware of my 
console manager hardware that I haven't deployed it at my lab yet for 
Linux and GNU toolchain verification.  I'd rather I wasn't stuck with an 
obsolete version of Linux.

> > This topic came up again when Paul E. McKenney noticed that
> > parts of the RCU code already rely on byte access and do not
> > work on alpha EV5 reliably, so I refreshed my series now for
> > inclusion into the next merge window.
> 
> Hrrrm? That sounds like Paul ran tests on EV5, did he?

 What exactly is required to make it work?

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Paul E. McKenney 1 year, 6 months ago
On Tue, May 28, 2024 at 12:49:16AM +0100, Maciej W. Rozycki wrote:
> On Fri, 3 May 2024, John Paul Adrian Glaubitz wrote:
> 
> > > I had investigated dropping support for alpha EV5 and earlier a while
> > > ago after noticing that this is the only supported CPU family
> > > in the kernel without native byte access and that Debian has already
> > > dropped support for this generation last year [1] after it turned
> > > out to be broken.
> > 
> > That's not quite correct. Support for older Alphas is not broken and
> > always worked when I tested it. It's just that some people wanted to
> > raise the baseline in order to improve code performance on newer machines
> > with the hope to fix some minor issues we saw on Alpha here and there.
> 
>  I'm not quite happy to see pre-EV5 support go as EV45 is all the Alpha 
> hardware I have and it's only owing to issues with the firmware of my 
> console manager hardware that I haven't deployed it at my lab yet for 
> Linux and GNU toolchain verification.  I'd rather I wasn't stuck with an 
> obsolete version of Linux.
> 
> > > This topic came up again when Paul E. McKenney noticed that
> > > parts of the RCU code already rely on byte access and do not
> > > work on alpha EV5 reliably, so I refreshed my series now for
> > > inclusion into the next merge window.
> > 
> > Hrrrm? That sounds like Paul ran tests on EV5, did he?
> 
>  What exactly is required to make it work?

Whatever changes are needed to prevent the data corruption that can
currently result from the code generated for single-byte stores.  For but
one example, consider a pair of tasks (or one task and an interrupt handler
in the CONFIG_SMP=n case) that each do a single-byte store to one of a pair
of bytes in the same machine word.  As I understand it, in code generated
for older Alphas, both "stores" will load the word containing that byte,
update their own byte, and store the updated word.

If two such single-byte stores run concurrently, one or the other of those
two stores will be lost, as in overwritten by the other.  This is a bug,
even in kernels built for single-CPU systems.  And a rare bug at that, one
that tends to disappear as you add debug code in an attempt to find it.
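
The lost store described above can be replayed deterministically in plain
C.  This is only a sketch under stated assumptions: the helper names are
hypothetical, the byte lanes are numbered little-endian, and the two
"tasks" are interleaved by hand rather than run concurrently.

```c
#include <stdint.h>

/* Step 1 of an emulated byte store: load the containing 32-bit word. */
static uint32_t rmw_load(const uint32_t *word)
{
	return *word;
}

/* Step 2: replace byte lane 'idx' (0..3) in the previously loaded copy. */
static uint32_t rmw_merge(uint32_t old, unsigned int idx, uint8_t val)
{
	unsigned int shift = idx * 8;

	return (old & ~((uint32_t)0xff << shift)) | ((uint32_t)val << shift);
}

/*
 * Hand-interleaved replay (hypothetical helper): both tasks load the
 * word before either of them stores it back.
 */
static uint32_t interleaved_byte_stores(void)
{
	uint32_t word = 0;
	uint32_t a = rmw_load(&word);	/* task A loads the word */
	uint32_t b = rmw_load(&word);	/* task B (or an interrupt) loads it too */

	word = rmw_merge(a, 0, 0x11);	/* A stores byte 0 */
	word = rmw_merge(b, 1, 0x22);	/* B stores byte 1 from its stale copy */

	return word;			/* A's byte has been overwritten */
}
```

Here interleaved_byte_stores() returns 0x00002200 rather than 0x00002211:
the second word store writes back the stale zero for byte 0.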

So if you want to run current kernels on old Alphas, you will need to
do something to fix this.

There might well be other things in need of fixing, for but one example,
it might be that the same issue will soon need to be addressed for
two-byte stores.  You will therefore need to carefully investigate this
issue to determine the full extent of work required to solve it.

							Thanx, Paul
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Tue, 28 May 2024, Paul E. McKenney wrote:

> > > > This topic came up again when Paul E. McKenney noticed that
> > > > parts of the RCU code already rely on byte access and do not
> > > > work on alpha EV5 reliably, so I refreshed my series now for
> > > > inclusion into the next merge window.
> > > 
> > > Hrrrm? That sounds like Paul ran tests on EV5, did he?
> > 
> >  What exactly is required to make it work?
> 
> Whatever changes are needed to prevent the data corruption that can
> currently result from the code generated for single-byte stores.  For but
> one example, consider a pair of tasks (or one task and an interrupt handler
> in the CONFIG_SMP=n case) that each do a single-byte store to one of a pair
> of bytes in the same machine word.  As I understand it, in code generated
> for older Alphas, both "stores" will load the word containing that byte,
> update their own byte, and store the updated word.
> 
> If two such single-byte stores run concurrently, one or the other of those
> two stores will be lost, as in overwritten by the other.  This is a bug,
> even in kernels built for single-CPU systems.  And a rare bug at that, one
> that tends to disappear as you add debug code in an attempt to find it.

 Thank you for the detailed description of the problematic scenario.

 I hope someone will find it useful, however for the record I have been 
familiar with the intricacies of the Alpha architecture as well as their 
implications for software for decades now.  The Alpha port of Linux was 
the first non-x86 Linux platform I have used and actually (and I've chased 
that as a matter of interest) my first ever contribution to Linux was for 
Alpha platform code:

On Mon, 30 Mar 1998, Jay.Estabrook@digital.com wrote:

> Hi, sorry about the delay in answering, but you'll be happy to know, I took
> your patches and merged them into my latest SMP patches, and submitted them
> to Linus just last night. He promises them to (mostly) be in 2.1.92, so we
> can look forward to that... :-)

so I find the scenario you have described more than obvious.

 Mind that the read-modify-write sequence that software does for sub-word 
write accesses with original Alpha hardware is precisely what hardware 
would have to do anyway and support for that was deliberately omitted by 
the architecture designers from the ISA to give it performance advantages 
quoted in the architecture manual.  The only difference here is that with 
hardware read-modify-write operations atomicity for sub-word accesses is 
guaranteed by the ISA, however for software read-modify-write it has to be 
explicitly coded using the usual load-locked/store-conditional sequence in 
a loop.  I don't think it's a big deal really, it should be trivial to do 
in the relevant accessors, along with the memory barriers that are needed 
anyway for EV56+ and possibly other ports such as the MIPS one.
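
 A retry loop of the kind described above might look roughly like the
sketch below.  It is an illustration only: it uses a C11 compare-and-swap
loop as a portable stand-in for the Alpha load-locked/store-conditional
pair, the function name store_byte_atomic() is hypothetical, byte lanes are
numbered little-endian, and the memory barriers mentioned above are left
out (relaxed ordering).

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Store one byte into lane 'idx' (0..3) of a 32-bit word without losing
 * concurrent updates to the other three lanes.
 */
static void store_byte_atomic(_Atomic uint32_t *word, unsigned int idx,
			      uint8_t val)
{
	unsigned int shift = (idx & 3u) * 8;
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t new;

	do {
		/* Rebuild the word from the most recently observed value. */
		new = (old & ~mask) | ((uint32_t)val << shift);
		/* On failure 'old' is refreshed and the merge is retried. */
	} while (!atomic_compare_exchange_weak_explicit(word, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));
}
```

 Two such stores to neighbouring lanes leave both bytes in place whatever
the interleaving, because a store that raced with another update fails and
is redone against the new word value.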

 What I have been after actually is: can you point me at a piece of code 
in our tree that will cause an issue with a non-BWX Alpha as described in 
your scenario, so that I have a starting point?  Mind that I'm completely 
new to RCU as I didn't have a need to research it before (though from a 
skim over Documentation/RCU/rcu.rst I understand what its objective is).

 FWIW even if it was only me I think that depriving the already thin Alpha 
port developer base of any quantity of the remaining manpower, by dropping 
support for a subset of the hardware available, and then a subset that is 
not just as exotic as the original i386 became to the x86 platform at the 
time support for it was dropped, is only going to lead to further demise 
and eventual drop of the entire port.

 And I think it would be good if we kept the port, just as we keep other 
ports of historical significance only, for educational reasons if nothing 
else, such as to let people understand based on an actual example, once 
mainstream, the implications of weakly ordered memory systems.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Arnd Bergmann 1 year, 6 months ago
On Wed, May 29, 2024, at 20:50, Maciej W. Rozycki wrote:
> On Tue, 28 May 2024, Paul E. McKenney wrote:

>  What I have been after actually is: can you point me at a piece of code 
> in our tree that will cause an issue with a non-BWX Alpha as described in 
> your scenario, so that I have a starting point?  Mind that I'm completely 
> new to RCU as I didn't have a need to research it before (though from a 
> skim over Documentation/RCU/rcu.rst I understand what its objective is).

I tried to look for examples and started with users of WRITE_ONCE()
on small variables by only allowing it to be used on types that
can be written natively:

--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -451,8 +451,7 @@ struct ftrace_likely_data {
 
 /* Is this type a native word size -- useful for atomic operations */
 #define __native_word(t) \
-       (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
-        sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+       (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
 
 #ifdef __OPTIMIZE__
 # define __compiletime_assert(condition, msg, prefix, suffix)          \

The WRITE_ONCE() calls tend to be there in order to avoid
expensive atomics or locking when something can be expressed
with a store that is known to be visible atomically (on all other
architectures).
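
To illustrate how narrowing the size check turns every small-type user into
a build error, here is a minimal user-space sketch.  It is not the kernel's
implementation: native_word() and write_once_checked() are hypothetical
stand-ins, and __typeof__ is a GNU extension assumed to be available.

```c
#include <assert.h>

/* Stand-in for the narrowed check: only int- and long-sized objects pass. */
#define native_word(t) \
	(sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

/* Hypothetical checked store; narrower types are rejected at build time. */
#define write_once_checked(x, val)					\
	do {								\
		static_assert(native_word(x),				\
			      "store would need a read-modify-write");	\
		*(volatile __typeof__(x) *)&(x) = (val);		\
	} while (0)

/* An int-sized flag passes the check and is stored normally. */
static int store_flag(void)
{
	int flag = 0;

	write_once_checked(flag, 1);
	return flag;
}
```

Using write_once_checked() on a char, bool or short variable fails to
compile, which is how a change like this lets the compiler enumerate the
affected call sites.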

I then tried changing the underlying variables to 32-bit ones
to see how many changes are needed, but I gave up after around
150 of them, as I was only scratching the surface. To do this
right, you'd need to go through each one of them and come up
with a solution that is the best trade-off in terms of memory
usage and performance for that one. There are of course
others that should be using WRITE_ONCE() and are missing
this, so the list is not complete either. See below for
the ones I could find quickly.
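
As one illustration of why each case needs its own decision, consider the
union rcu_special hunk in the partial patch at the end of this message.
The sketch below mirrors the two layouts in user space; the type and
function names are hypothetical and uint8_t/uint32_t stand in for u8/u32.

```c
#include <stdint.h>

/* Layout before the change: four byte-sized flags overlaid by one word. */
union special_u8 {
	struct {
		uint8_t blocked, need_qs, exp_hint, need_mb;
	} b;
	uint32_t s;
};

/* Layout after naive widening: 's' now only overlays 'blocked'. */
union special_u32 {
	struct {
		uint32_t blocked, need_qs, exp_hint, need_mb;
	} b;
	uint32_t s;
};

/* Set need_qs and report what the combined word 's' sees. */
static uint32_t need_qs_seen_u8(void)
{
	union special_u8 v = { .s = 0 };

	v.b.need_qs = 1;
	return v.s;		/* non-zero: 's' covers all four flags */
}

static uint32_t need_qs_seen_u32(void)
{
	union special_u32 v = { .b = { 0, 0, 0, 0 } };

	v.b.need_qs = 1;
	return v.s;		/* zero: the flag lies outside 's' */
}
```

On common ABIs the widened union grows from 4 to 16 bytes, and any code that
tests all flags at once through 's' silently stops seeing three of them, so
widening the fields is not a mechanical substitution.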

>  FWIW even if it was only me I think that depriving the already thin Alpha 
> port developer base of any quantity of the remaining manpower, by dropping 
> support for a subset of the hardware available, and then a subset that is 
> not just as exotic as the original i386 became to the x86 platform at the 
> time support for it was dropped, is only going to lead to further demise 
> and eventual drop of the entire port.

I know you like your museum pieces to be older than everyone
else's, and I'm sorry that my patch series is causing you
problems, but I don't think the more general criticism is
valid here. My hope was mainly to help out with both keeping
Alpha viable for a few more years while also allowing Paul
to continue with his RCU changes.

As far as I can tell, nobody else is actually using EV4
machines or has been for years now, but the presence of that
code did affect both the performance and correctness of the
kernel code for all EV56+ users since distros have no way
of picking the ISA level on alpha for a generic kernel.

As I wrote in my patch notes, Debian already changed their
userspace to be built for EV56 or higher, as they had
determined that this was a significant improvement for
their users, so there is no binary distro left to ship
with a modern kernel.

Matt still maintains the Gentoo port (in addition to alpha
kernel), which seems to still support EV4, but all eight
of his machines on https://mattst88.com/computers/ are
EV56 or higher.

The strongest argument I see for assuming non-BWX alphas
are long dead is that gcc-4.4 added support for C11 style
_Atomic variables for alpha, but got the stores wrong
without anyone ever noticing the problem. Even if one makes
the argument that normal byte stores and volatile ones
should not need atomic ll/sc sequences, the atomics
clearly do. Building BWX-enabled kernels and userland
completely avoids this problem, which makes debugging
easier for the remaining users when stuff breaks.

   Arnd

----
below: partial patch to change types for WRITE_ONCE() variables,

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 5d1779759c21..11f1368808fe 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -451,8 +451,7 @@ struct ftrace_likely_data {
 
 /* Is this type a native word size -- useful for atomic operations */
 #define __native_word(t) \
-	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
-	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+	(sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
 
 #ifdef __OPTIMIZE__
 # define __compiletime_assert(condition, msg, prefix, suffix)		\
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index b1e7415f8439..86402997df21 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -71,7 +71,7 @@ static mempool_t *bio_fallback_crypt_ctx_pool;
  * be used at a time - the rest of the unused tfms have their keys cleared.
  */
 static DEFINE_MUTEX(tfms_init_lock);
-static bool tfms_inited[BLK_ENCRYPTION_MODE_MAX];
+static int tfms_inited[BLK_ENCRYPTION_MODE_MAX];
 
 static struct blk_crypto_fallback_keyslot {
 	enum blk_crypto_mode_num crypto_mode;
diff --git a/include/linux/hrtimer_types.h b/include/linux/hrtimer_types.h
index ad66a3081735..4e675dc1ea29 100644
--- a/include/linux/hrtimer_types.h
+++ b/include/linux/hrtimer_types.h
@@ -41,10 +41,10 @@ struct hrtimer {
 	ktime_t				_softexpires;
 	enum hrtimer_restart		(*function)(struct hrtimer *);
 	struct hrtimer_clock_base	*base;
-	u8				state;
-	u8				is_rel;
-	u8				is_soft;
-	u8				is_hard;
+	u32				state;
+	u32				is_rel;
+	u32				is_soft;
+	u32				is_hard;
 };
 
 #endif /* _LINUX_HRTIMER_TYPES_H */
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 91224bbcfa73..11d2b5da3982 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -411,8 +411,8 @@ struct io_ring_ctx {
 
 	/* napi busy poll default timeout */
 	unsigned int		napi_busy_poll_to;
-	bool			napi_prefer_busy_poll;
-	bool			napi_enabled;
+	int			napi_prefer_busy_poll;
+	int			napi_enabled;
 
 	DECLARE_HASHTABLE(napi_ht, 4);
 #endif
@@ -605,7 +605,7 @@ struct io_kiocb {
 
 	u8				opcode;
 	/* polled IO has completed */
-	u8				iopoll_completed;
+	u32				iopoll_completed;
 	/*
 	 * Can be either a fixed buffer index, or used with provided buffers.
 	 * For the latter, before issue it points to the buffer group ID,
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 383a0ea2ab91..0f72f411b520 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -254,7 +254,7 @@ struct ipv6_pinfo {
 						 * 010: prefer public address
 						 * 100: prefer care-of address
 						 */
-	__u8			pmtudisc;
+	__u32			pmtudisc;
 	__u8			min_hopcount;
 	__u8			tclass;
 	__be32			rcv_flowinfo;
diff --git a/include/linux/key.h b/include/linux/key.h
index 943a432da3ae..8809e797417e 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -218,7 +218,7 @@ struct key {
 						 * - may not match RCU dereferenced payload
 						 * - payload should contain own length
 						 */
-	short			state;		/* Key state (+) or rejection error (-) */
+	int			state;		/* Key state (+) or rejection error (-) */
 
 #ifdef KEY_DEBUGGING
 	unsigned		magic;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 546de9cf46df..a7ef1e3aa9d0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2047,7 +2047,7 @@ struct net_device {
 	 * and to use WRITE_ONCE() to annotate the writes.
 	 */
 	unsigned int		mtu;
-	unsigned short		needed_headroom;
+	unsigned int		needed_headroom;
 	struct netdev_tc_txq	tc_to_txq[TC_MAX_QUEUE];
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX];
@@ -2298,7 +2298,7 @@ struct net_device {
 
 	struct list_head	link_watch_list;
 
-	u8 reg_state;
+	u32 reg_state;
 
 	bool dismantle;
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..5eadcdfcf089 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -719,10 +719,10 @@ struct uclamp_se {
 
 union rcu_special {
 	struct {
-		u8			blocked;
-		u8			need_qs;
-		u8			exp_hint; /* Hint for performance. */
-		u8			need_mb; /* Readers need smp_mb(). */
+		u32			blocked;
+		u32			need_qs;
+		u32			exp_hint; /* Hint for performance. */
+		u32			need_mb; /* Readers need smp_mb(). */
 	} b; /* Bits. */
 	u32 s; /* Set of bits. */
 };
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 6a5e08b937b3..8dafb3e49546 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -125,7 +125,7 @@ struct tcp_options_received {
 	u8	saw_unknown:1,	/* Received unknown option		*/
 		unused:7;
 	u8	num_sacks;	/* Number of SACK blocks		*/
-	u16	user_mss;	/* mss requested by user in ioctl	*/
+	u32	user_mss;	/* mss requested by user in ioctl	*/
 	u16	mss_clamp;	/* Maximal mss, negotiated at connection setup */
 };
 
@@ -237,8 +237,8 @@ struct tcp_sock {
 	u32	tlp_high_seq;	/* snd_nxt at the time of TLP */
 	u32	rttvar_us;	/* smoothed mdev_max			*/
 	u32	retrans_out;	/* Retransmitted packets out		*/
-	u16	advmss;		/* Advertised MSS			*/
-	u16	urg_data;	/* Saved octet of OOB data and control flags */
+	u32	advmss;		/* Advertised MSS			*/
+	u32	urg_data;	/* Saved octet of OOB data and control flags */
 	u32	lost;		/* Total data packets lost incl. rexmits */
 	struct  minmax rtt_min;
 	/* OOO segments go in this rbtree. Socket lock must be held. */
@@ -383,7 +383,7 @@ struct tcp_sock {
 		syn_fastopen_ch:1, /* Active TFO re-enabling probe */
 		syn_data_acked:1;/* data in SYN is acked by SYN-ACK */
 
-	u8	keepalive_probes; /* num of allowed keep alive probes	*/
+	u32	keepalive_probes; /* num of allowed keep alive probes	*/
 	u32	tcp_tx_delay;	/* delay (in usec) added to TX packets */
 
 /* RTT measurement */
diff --git a/include/linux/udp.h b/include/linux/udp.h
index 3eb3f2b9a2a0..2bba12ee545e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -54,19 +54,19 @@ struct udp_sock {
 	unsigned long	 udp_flags;
 
 	int		 pending;	/* Any pending frames ? */
-	__u8		 encap_type;	/* Is this an Encapsulation socket? */
+	__u32		 encap_type;	/* Is this an Encapsulation socket? */
 
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
 	 */
 	__u16		 len;		/* total length of pending frames */
-	__u16		 gso_size;
+	__u32		 gso_size;
 	/*
 	 * Fields specific to UDP-Lite.
 	 */
-	__u16		 pcslen;
-	__u16		 pcrlen;
+	__u32		 pcslen;
+	__u32		 pcrlen;
 	/*
 	 * For encapsulation sockets.
 	 */
@@ -94,7 +94,7 @@ struct udp_sock {
 	int		forward_threshold;
 
 	/* Cache friendly copy of sk->sk_peek_off >= 0 */
-	bool		peeking_with_offset;
+	int		peeking_with_offset;
 };
 
 #define udp_test_bit(nr, sk)			\
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 7d6b1254c92d..b549a39360ec 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -105,11 +105,11 @@ struct inet_connection_sock {
 	__u8			  icsk_retransmits;
 	__u8			  icsk_pending;
 	__u8			  icsk_backoff;
-	__u8			  icsk_syn_retries;
+	__u32			  icsk_syn_retries;
 	__u8			  icsk_probes_out;
 	__u16			  icsk_ext_hdr_len;
 	struct {
-		__u8		  pending;	 /* ACK is pending			   */
+		__u32		  pending;	 /* ACK is pending			   */
 		__u8		  quick;	 /* Scheduled number of quick acks	   */
 		__u8		  pingpong;	 /* The session is interactive		   */
 		__u8		  retry;	 /* Number of attempts			   */
@@ -120,7 +120,7 @@ struct inet_connection_sock {
 		unsigned long	  timeout;	 /* Currently scheduled timeout		   */
 		__u32		  lrcvtime;	 /* timestamp of last received data packet */
 		__u16		  last_seg_size; /* Size of last incoming segment	   */
-		__u16		  rcv_mss;	 /* MSS used for delayed ACK decisions	   */
+		__u32		  rcv_mss;	 /* MSS used for delayed ACK decisions	   */
 	} icsk_ack;
 	struct {
 		/* Range of MTUs to search */
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 5af6eb14c5db..5d368156e437 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -18,7 +18,7 @@ struct fqdir {
 	int			max_dist;
 	struct inet_frags	*f;
 	struct net		*net;
-	bool			dead;
+	int			dead;
 
 	struct rhashtable       rhashtable ____cacheline_aligned_in_smp;
 
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index f9ddd47dc4f8..98f2c745a34e 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -220,17 +220,17 @@ struct inet_sock {
 
 	unsigned long		inet_flags;
 	__be32			inet_saddr;
-	__s16			uc_ttl;
+	__s32			uc_ttl;
 	__be16			inet_sport;
 	struct ip_options_rcu __rcu	*inet_opt;
 	atomic_t		inet_id;
 
-	__u8			tos;
-	__u8			min_ttl;
-	__u8			mc_ttl;
-	__u8			pmtudisc;
-	__u8			rcv_tos;
-	__u8			convert_csum;
+	__u32			tos;
+	__u32			min_ttl;
+	__u32			mc_ttl;
+	__u32			pmtudisc;
+	__u32			rcv_tos;
+	__u32			convert_csum;
 	int			uc_index;
 	int			mc_index;
 	__be32			mc_addr;
diff --git a/include/net/ip.h b/include/net/ip.h
index 6d735e00d3f3..e0cb7e0bfec9 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -81,7 +81,7 @@ struct ipcm_cookie {
 	__u8			ttl;
 	__s16			tos;
 	char			priority;
-	__u16			gso_size;
+	__u32			gso_size;
 };
 
 static inline void ipcm_init(struct ipcm_cookie *ipcm)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 9b2f69ba5e49..182b7eade5c0 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -139,7 +139,7 @@ struct fib_info {
 	refcount_t		fib_treeref;
 	refcount_t		fib_clntref;
 	unsigned int		fib_flags;
-	unsigned char		fib_dead;
+	u32			fib_dead;
 	unsigned char		fib_protocol;
 	unsigned char		fib_scope;
 	unsigned char		fib_type;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 88a8e554f7a1..4732c3084e10 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -358,7 +358,7 @@ struct ipcm6_cookie {
 	struct sockcm_cookie sockc;
 	__s16 hlimit;
 	__s16 tclass;
-	__u16 gso_size;
+	__u32 gso_size;
 	__s8  dontfrag;
 	struct ipv6_txoptions *opt;
 };
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 0d28172193fa..e2f580880977 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -147,10 +147,10 @@ struct neighbour {
 	struct timer_list	timer;
 	unsigned long		used;
 	atomic_t		probes;
-	u8			nud_state;
-	u8			type;
-	u8			dead;
-	u8			protocol;
+	u32			nud_state;
+	u32			type;
+	u32			dead;
+	u32			protocol;
 	u32			flags;
 	seqlock_t		ha_lock;
 	unsigned char		ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))] __aligned(8);
diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 78214f1b43a2..46d9b3966c5b 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -14,7 +14,7 @@ struct netns_core {
 
 	int	sysctl_somaxconn;
 	int	sysctl_optmem_max;
-	u8	sysctl_txrehash;
+	u32	sysctl_txrehash;
 
 #ifdef CONFIG_PROC_FS
 	struct prot_inuse __percpu *prot_inuse;
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index c356c458b340..3df67aa03c3c 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -48,27 +48,27 @@ struct netns_ipv4 {
 
 	/* TX readonly hotpath cache lines */
 	__cacheline_group_begin(netns_ipv4_read_tx);
-	u8 sysctl_tcp_early_retrans;
-	u8 sysctl_tcp_tso_win_divisor;
-	u8 sysctl_tcp_tso_rtt_log;
-	u8 sysctl_tcp_autocorking;
+	u32 sysctl_tcp_early_retrans;
+	u32 sysctl_tcp_tso_win_divisor;
+	u32 sysctl_tcp_tso_rtt_log;
+	u32 sysctl_tcp_autocorking;
 	int sysctl_tcp_min_snd_mss;
 	unsigned int sysctl_tcp_notsent_lowat;
 	int sysctl_tcp_limit_output_bytes;
 	int sysctl_tcp_min_rtt_wlen;
 	int sysctl_tcp_wmem[3];
-	u8 sysctl_ip_fwd_use_pmtu;
+	u32 sysctl_ip_fwd_use_pmtu;
 	__cacheline_group_end(netns_ipv4_read_tx);
 
 	/* TXRX readonly hotpath cache lines */
 	__cacheline_group_begin(netns_ipv4_read_txrx);
-	u8 sysctl_tcp_moderate_rcvbuf;
+	u32 sysctl_tcp_moderate_rcvbuf;
 	__cacheline_group_end(netns_ipv4_read_txrx);
 
 	/* RX readonly hotpath cache line */
 	__cacheline_group_begin(netns_ipv4_read_rx);
-	u8 sysctl_ip_early_demux;
-	u8 sysctl_tcp_early_demux;
+	u32 sysctl_ip_early_demux;
+	u32 sysctl_tcp_early_demux;
 	int sysctl_tcp_reordering;
 	int sysctl_tcp_rmem[3];
 	__cacheline_group_end(netns_ipv4_read_rx);
@@ -96,7 +96,7 @@ struct netns_ipv4 {
 #endif
 	bool			fib_has_custom_local_routes;
 	bool			fib_offload_disabled;
-	u8			sysctl_tcp_shrink_window;
+	u32			sysctl_tcp_shrink_window;
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	atomic_t		fib_num_tclassid_users;
 #endif
@@ -108,11 +108,11 @@ struct netns_ipv4 {
 	struct inet_peer_base	*peers;
 	struct fqdir		*fqdir;
 
-	u8 sysctl_icmp_echo_ignore_all;
-	u8 sysctl_icmp_echo_enable_probe;
-	u8 sysctl_icmp_echo_ignore_broadcasts;
-	u8 sysctl_icmp_ignore_bogus_error_responses;
-	u8 sysctl_icmp_errors_use_inbound_ifaddr;
+	u32 sysctl_icmp_echo_ignore_all;
+	u32 sysctl_icmp_echo_enable_probe;
+	u32 sysctl_icmp_echo_ignore_broadcasts;
+	u32 sysctl_icmp_ignore_bogus_error_responses;
+	u32 sysctl_icmp_errors_use_inbound_ifaddr;
 	int sysctl_icmp_ratelimit;
 	int sysctl_icmp_ratemask;
 
@@ -122,29 +122,29 @@ struct netns_ipv4 {
 
 	struct local_ports ip_local_ports;
 
-	u8 sysctl_tcp_ecn;
-	u8 sysctl_tcp_ecn_fallback;
+	u32 sysctl_tcp_ecn;
+	u32 sysctl_tcp_ecn_fallback;
 
-	u8 sysctl_ip_default_ttl;
-	u8 sysctl_ip_no_pmtu_disc;
-	u8 sysctl_ip_fwd_update_priority;
-	u8 sysctl_ip_nonlocal_bind;
-	u8 sysctl_ip_autobind_reuse;
+	u32 sysctl_ip_default_ttl;
+	u32 sysctl_ip_no_pmtu_disc;
+	u32 sysctl_ip_fwd_update_priority;
+	u32 sysctl_ip_nonlocal_bind;
+	u32 sysctl_ip_autobind_reuse;
 	/* Shall we try to damage output packets if routing dev changes? */
-	u8 sysctl_ip_dynaddr;
+	u32 sysctl_ip_dynaddr;
 #ifdef CONFIG_NET_L3_MASTER_DEV
-	u8 sysctl_raw_l3mdev_accept;
+	u32 sysctl_raw_l3mdev_accept;
 #endif
-	u8 sysctl_udp_early_demux;
+	u32 sysctl_udp_early_demux;
 
-	u8 sysctl_nexthop_compat_mode;
+	u32 sysctl_nexthop_compat_mode;
 
-	u8 sysctl_fwmark_reflect;
-	u8 sysctl_tcp_fwmark_accept;
+	u32 sysctl_fwmark_reflect;
+	u32 sysctl_tcp_fwmark_accept;
 #ifdef CONFIG_NET_L3_MASTER_DEV
-	u8 sysctl_tcp_l3mdev_accept;
+	u32 sysctl_tcp_l3mdev_accept;
 #endif
-	u8 sysctl_tcp_mtu_probing;
+	u32 sysctl_tcp_mtu_probing;
 	int sysctl_tcp_mtu_probe_floor;
 	int sysctl_tcp_base_mss;
 	int sysctl_tcp_probe_threshold;
@@ -152,43 +152,43 @@ struct netns_ipv4 {
 
 	int sysctl_tcp_keepalive_time;
 	int sysctl_tcp_keepalive_intvl;
-	u8 sysctl_tcp_keepalive_probes;
-
-	u8 sysctl_tcp_syn_retries;
-	u8 sysctl_tcp_synack_retries;
-	u8 sysctl_tcp_syncookies;
-	u8 sysctl_tcp_migrate_req;
-	u8 sysctl_tcp_comp_sack_nr;
-	u8 sysctl_tcp_backlog_ack_defer;
-	u8 sysctl_tcp_pingpong_thresh;
-
-	u8 sysctl_tcp_retries1;
-	u8 sysctl_tcp_retries2;
-	u8 sysctl_tcp_orphan_retries;
-	u8 sysctl_tcp_tw_reuse;
+	u32 sysctl_tcp_keepalive_probes;
+
+	u32 sysctl_tcp_syn_retries;
+	u32 sysctl_tcp_synack_retries;
+	u32 sysctl_tcp_syncookies;
+	u32 sysctl_tcp_migrate_req;
+	u32 sysctl_tcp_comp_sack_nr;
+	u32 sysctl_tcp_backlog_ack_defer;
+	u32 sysctl_tcp_pingpong_thresh;
+
+	u32 sysctl_tcp_retries1;
+	u32 sysctl_tcp_retries2;
+	u32 sysctl_tcp_orphan_retries;
+	u32 sysctl_tcp_tw_reuse;
 	int sysctl_tcp_fin_timeout;
-	u8 sysctl_tcp_sack;
-	u8 sysctl_tcp_window_scaling;
-	u8 sysctl_tcp_timestamps;
-	u8 sysctl_tcp_recovery;
-	u8 sysctl_tcp_thin_linear_timeouts;
-	u8 sysctl_tcp_slow_start_after_idle;
-	u8 sysctl_tcp_retrans_collapse;
-	u8 sysctl_tcp_stdurg;
-	u8 sysctl_tcp_rfc1337;
-	u8 sysctl_tcp_abort_on_overflow;
-	u8 sysctl_tcp_fack; /* obsolete */
+	u32 sysctl_tcp_sack;
+	u32 sysctl_tcp_window_scaling;
+	u32 sysctl_tcp_timestamps;
+	u32 sysctl_tcp_recovery;
+	u32 sysctl_tcp_thin_linear_timeouts;
+	u32 sysctl_tcp_slow_start_after_idle;
+	u32 sysctl_tcp_retrans_collapse;
+	u32 sysctl_tcp_stdurg;
+	u32 sysctl_tcp_rfc1337;
+	u32 sysctl_tcp_abort_on_overflow;
+	u32 sysctl_tcp_fack; /* obsolete */
 	int sysctl_tcp_max_reordering;
 	int sysctl_tcp_adv_win_scale; /* obsolete */
-	u8 sysctl_tcp_dsack;
-	u8 sysctl_tcp_app_win;
-	u8 sysctl_tcp_frto;
-	u8 sysctl_tcp_nometrics_save;
-	u8 sysctl_tcp_no_ssthresh_metrics_save;
-	u8 sysctl_tcp_workaround_signed_windows;
+	u32 sysctl_tcp_dsack;
+	u32 sysctl_tcp_app_win;
+	u32 sysctl_tcp_frto;
+	u32 sysctl_tcp_nometrics_save;
+	u32 sysctl_tcp_no_ssthresh_metrics_save;
+	u32 sysctl_tcp_workaround_signed_windows;
 	int sysctl_tcp_challenge_ack_limit;
-	u8 sysctl_tcp_min_tso_segs;
-	u8 sysctl_tcp_reflect_tos;
+	u32 sysctl_tcp_min_tso_segs;
+	u32 sysctl_tcp_reflect_tos;
 	int sysctl_tcp_invalid_ratelimit;
 	int sysctl_tcp_pacing_ss_ratio;
 	int sysctl_tcp_pacing_ca_ratio;
@@ -204,23 +204,23 @@ struct netns_ipv4 {
 	unsigned long tfo_active_disable_stamp;
 	u32 tcp_challenge_timestamp;
 	u32 tcp_challenge_count;
-	u8 sysctl_tcp_plb_enabled;
-	u8 sysctl_tcp_plb_idle_rehash_rounds;
-	u8 sysctl_tcp_plb_rehash_rounds;
-	u8 sysctl_tcp_plb_suspend_rto_sec;
+	u32 sysctl_tcp_plb_enabled;
+	u32 sysctl_tcp_plb_idle_rehash_rounds;
+	u32 sysctl_tcp_plb_rehash_rounds;
+	u32 sysctl_tcp_plb_suspend_rto_sec;
 	int sysctl_tcp_plb_cong_thresh;
 
 	int sysctl_udp_wmem_min;
 	int sysctl_udp_rmem_min;
 
-	u8 sysctl_fib_notify_on_flag_change;
-	u8 sysctl_tcp_syn_linear_timeouts;
+	u32 sysctl_fib_notify_on_flag_change;
+	u32 sysctl_tcp_syn_linear_timeouts;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
-	u8 sysctl_udp_l3mdev_accept;
+	u32 sysctl_udp_l3mdev_accept;
 #endif
 
-	u8 sysctl_igmp_llm_reports;
+	u32 sysctl_igmp_llm_reports;
 	int sysctl_igmp_max_memberships;
 	int sysctl_igmp_max_msf;
 	int sysctl_igmp_qrv;
@@ -246,8 +246,8 @@ struct netns_ipv4 {
 #endif
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	u32 sysctl_fib_multipath_hash_fields;
-	u8 sysctl_fib_multipath_use_neigh;
-	u8 sysctl_fib_multipath_hash_policy;
+	u32 sysctl_fib_multipath_use_neigh;
+	u32 sysctl_fib_multipath_hash_policy;
 #endif
 
 	struct fib_notifier_ops	*notifier_ops;
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index ebcb8896bffc..662804623036 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -221,7 +221,7 @@ struct fastopen_queue {
  */
 struct request_sock_queue {
 	spinlock_t		rskq_lock;
-	u8			rskq_defer_accept;
+	u32			rskq_defer_accept;
 
 	u32			synflood_warned;
 	atomic_t		qlen;
diff --git a/include/net/sock.h b/include/net/sock.h
index 5f4d0629348f..729fa16dd29f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -168,8 +168,8 @@ struct sock_common {
 		};
 	};
 
-	unsigned short		skc_family;
-	volatile unsigned char	skc_state;
+	u32		skc_family;
+	volatile unsigned int	skc_state;
 	unsigned char		skc_reuse:4;
 	unsigned char		skc_reuseport:1;
 	unsigned char		skc_ipv6only:1;
@@ -210,9 +210,9 @@ struct sock_common {
 		struct hlist_node	skc_node;
 		struct hlist_nulls_node skc_nulls_node;
 	};
-	unsigned short		skc_tx_queue_mapping;
+	unsigned int		skc_tx_queue_mapping;
 #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING
-	unsigned short		skc_rx_queue_mapping;
+	unsigned int		skc_rx_queue_mapping;
 #endif
 	union {
 		int		skc_incoming_cpu;
@@ -411,8 +411,8 @@ struct sock {
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	unsigned int		sk_ll_usec;
 	unsigned int		sk_napi_id;
-	u16			sk_busy_poll_budget;
-	u8			sk_prefer_busy_poll;
+	u32			sk_busy_poll_budget;
+	u32			sk_prefer_busy_poll;
 #endif
 	u8			sk_userlocks;
 	int			sk_rcvbuf;
@@ -486,7 +486,7 @@ struct sock {
 	unsigned int		sk_gso_max_size;
 	gfp_t			sk_allocation;
 	u32			sk_txhash;
-	u8			sk_pacing_shift;
+	u32			sk_pacing_shift;
 	bool			sk_use_task_frag;
 	__cacheline_group_end(sock_read_tx);
 
@@ -498,7 +498,7 @@ struct sock {
 				sk_kern_sock : 1,
 				sk_no_check_tx : 1,
 				sk_no_check_rx : 1;
-	u8			sk_shutdown;
+	u32			sk_shutdown;
 	u16			sk_type;
 	u16			sk_protocol;
 	unsigned long	        sk_lingertime;
@@ -519,7 +519,7 @@ struct sock {
 #endif
 	int			sk_disconnects;
 
-	u8			sk_txrehash;
+	u32			sk_txrehash;
 	u8			sk_clockid;
 	u8			sk_txtime_deadline_mode : 1,
 				sk_txtime_report_errors : 1,
diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index 6ec140b0a61b..1c79170538f3 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -15,8 +15,8 @@ struct sock_reuseport {
 
 	u16			max_socks;		/* length of socks */
 	u16			num_socks;		/* elements in socks */
-	u16			num_closed_socks;	/* closed elements in socks */
-	u16			incoming_cpu;
+	u32			num_closed_socks;	/* closed elements in socks */
+	u32			incoming_cpu;
 	/* The last synq overflow event timestamp of this
 	 * reuse->socks[] group.
 	 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 060e95b331a2..4c068da5d085 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1744,7 +1744,7 @@ static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
 /* - key database */
 struct tcp_md5sig_key {
 	struct hlist_node	node;
-	u8			keylen;
+	u32			keylen;
 	u8			family; /* AF_INET or AF_INET6 */
 	u8			prefixlen;
 	u8			flags;
diff --git a/include/net/udp.h b/include/net/udp.h
index c4e05b14b648..2794e6b75f86 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -279,7 +279,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
 void udp_splice_eof(struct socket *sock);
 int udp_push_pending_frames(struct sock *sk);
 void udp_flush_pending_frames(struct sock *sk);
-int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
+int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u32 *gso_size);
 void udp4_hwcsum(struct sk_buff *skb, __be32 src, __be32 dst);
 int udp_rcv(struct sk_buff *skb);
 int udp_ioctl(struct sock *sk, int cmd, int *karg);
diff --git a/include/sound/ump.h b/include/sound/ump.h
index 91238dabe307..c6ce23d2db00 100644
--- a/include/sound/ump.h
+++ b/include/sound/ump.h
@@ -26,7 +26,7 @@ struct snd_ump_endpoint {
 
 	/* UMP Stream message processing */
 	u32 stream_wait_for;	/* expected stream message status */
-	bool stream_finished;	/* set when message has been processed */
+	u32 stream_finished;	/* set when message has been processed */
 	bool parsed;		/* UMP / FB parse finished? */
 	bool no_process_stream;	/* suppress UMP stream messages handling */
 	wait_queue_head_t stream_wait;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 994bf7af0efe..647503be0999 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -30,7 +30,7 @@ extern "C" {
 struct io_uring_sqe {
 	__u8	opcode;		/* type of operation for this sqe */
 	__u8	flags;		/* IOSQE_ flags */
-	__u16	ioprio;		/* ioprio for the request */
+	__u32	ioprio;		/* ioprio for the request */
 	__s32	fd;		/* file descriptor to do IO on */
 	union {
 		__u64	off;	/* offset into file */
@@ -78,9 +78,9 @@ struct io_uring_sqe {
 	/* pack this to avoid bogus arm OABI complaints */
 	union {
 		/* index into fixed buffers, if used */
-		__u16	buf_index;
+		__u32	buf_index;
 		/* for grouped buffer selection */
-		__u16	buf_group;
+		__u32	buf_group;
 	} __attribute__((packed));
 	/* personality to use, if used */
 	__u16	personality;
@@ -89,8 +89,8 @@ struct io_uring_sqe {
 		__u32	file_index;
 		__u32	optlen;
 		struct {
-			__u16	addr_len;
-			__u16	__pad3[1];
+			__u32	addr_len;
+			__u32	__pad3[1];
 		};
 	};
 	union {
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 624ca9076a50..b14acb4de822 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -44,7 +44,7 @@ struct io_wait_queue {
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	unsigned int napi_busy_poll_to;
-	bool napi_prefer_busy_poll;
+	int napi_prefer_busy_poll;
 #endif
 };
 
diff --git a/io_uring/notif.h b/io_uring/notif.h
index f3589cfef4a9..e3f4f2462c4a 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -18,9 +18,9 @@ struct io_notif_data {
 	struct io_notif_data	*head;
 
 	unsigned		account_pages;
-	bool			zc_report;
-	bool			zc_used;
-	bool			zc_copied;
+	int			zc_report;
+	int			zc_used;
+	int			zc_copied;
 };
 
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx);
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index de95ec07e477..1b485f4041a2 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -123,7 +123,7 @@ struct lock_stress_stats {
 
 struct call_rcu_chain {
 	struct rcu_head crc_rh;
-	bool crc_stop;
+	int crc_stop;
 };
 struct call_rcu_chain *call_rcu_chain_list;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cefa27f92bb6..1f690da4a6e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -872,10 +872,10 @@ struct root_domain {
 	 * - More than one runnable task
 	 * - Running task is misfit
 	 */
-	bool			overloaded;
+	int			overloaded;
 
 	/* Indicate one or more CPUs over-utilized (tipping point) */
-	bool			overutilized;
+	int			overutilized;
 
 	/*
 	 * The bit corresponding to a CPU gets set here if such CPU has more
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index cdd4e2314bfc..a79464cba03d 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -753,7 +753,7 @@ static int __init debug_boot_weak_hash_enable(char *str)
 }
 early_param("debug_boot_weak_hash", debug_boot_weak_hash_enable);
 
-static bool filled_random_ptr_key __read_mostly;
+static int filled_random_ptr_key __read_mostly;
 static siphash_key_t ptr_key __read_mostly;
 
 static int fill_ptr_key(struct notifier_block *nb, unsigned long action, void *data)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index a5dae40523ab..cbf325c8afc4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -40,7 +40,7 @@ static const struct address_space_operations swap_aops = {
 
 struct address_space *swapper_spaces[MAX_SWAPFILES] __read_mostly;
 static unsigned int nr_swapper_spaces[MAX_SWAPFILES] __read_mostly;
-static bool enable_vma_readahead __read_mostly = true;
+static int enable_vma_readahead __read_mostly = true;
 
 #define SWAP_RA_WIN_SHIFT	(PAGE_SHIFT / 2)
 #define SWAP_RA_HITS_MASK	((1UL << SWAP_RA_WIN_SHIFT) - 1)
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index f9b9e26c32c1..5c5b69e5fb8b 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -17,9 +17,9 @@ struct fib_alias {
 	u8			fa_slen;
 	u32			tb_id;
 	s16			fa_default;
-	u8			offload;
-	u8			trap;
-	u8			offload_failed;
+	u32			offload;
+	u32			trap;
+	u32			offload_failed;
 	struct rcu_head		rcu;
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 681b54e1f3a6..24ebb76dc7b5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4729,8 +4633,6 @@ void __init tcp_init(void)
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) >
 		     sizeof_field(struct sk_buff, cb));
 
-	tcp_struct_check();
-
 	percpu_counter_init(&tcp_sockets_allocated, 0, GFP_KERNEL);
 
 	timer_setup(&tcp_orphan_timer, tcp_orphan_update, TIMER_DEFERRABLE);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9c04a9c8be9d..7531e52c5e5d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -568,7 +568,7 @@ static void tcp_init_buffer_space(struct sock *sk)
 
 		if (tcp_app_win && maxwin > 4 * tp->advmss)
 			WRITE_ONCE(tp->window_clamp,
-				   max(maxwin - (maxwin >> tcp_app_win),
+				   max_t(u32, maxwin - (maxwin >> tcp_app_win),
 				       4 * tp->advmss));
 	}
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 189c9113fe9a..47bf7dc201ab 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1015,7 +1015,7 @@ int udp_push_pending_frames(struct sock *sk)
 }
 EXPORT_SYMBOL(udp_push_pending_frames);
 
-static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
+static int __udp_cmsg_send(struct cmsghdr *cmsg, u32 *gso_size)
 {
 	switch (cmsg->cmsg_type) {
 	case UDP_SEGMENT:
@@ -1028,7 +1028,7 @@ static int __udp_cmsg_send(struct cmsghdr *cmsg, u16 *gso_size)
 	}
 }
 
-int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size)
+int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u32 *gso_size)
 {
 	struct cmsghdr *cmsg;
 	bool need_ip = false;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7aa47e2dd52b..803ca22eadef 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -219,17 +219,17 @@ struct mptcp_pm_data {
 
 	spinlock_t	lock;		/*protects the whole PM data */
 
-	u8		addr_signal;
-	bool		server_side;
-	bool		work_pending;
-	bool		accept_addr;
-	bool		accept_subflow;
-	bool		remote_deny_join_id0;
+	u32		addr_signal;
+	u32		server_side;
+	u32		work_pending;
+	u32		accept_addr;
+	u32		accept_subflow;
+	u32		remote_deny_join_id0;
 	u8		add_addr_signaled;
 	u8		add_addr_accepted;
 	u8		local_addr_used;
-	u8		pm_type;
-	u8		subflows;
+	u32		pm_type;
+	u32		subflows;
 	u8		status;
 	DECLARE_BITMAP(id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
 	struct mptcp_rm_list rm_list_tx;
@@ -290,14 +290,14 @@ struct mptcp_sock {
 	unsigned long	flags;
 	unsigned long	cb_flags;
 	bool		recovery;		/* closing subflow write queue reinjected */
-	bool		can_ack;
-	bool		fully_established;
-	bool		rcv_data_fin;
-	bool		snd_data_fin_enable;
-	bool		rcv_fastclose;
-	bool		use_64bit_ack; /* Set when we received a 64-bit DSN */
-	bool		csum_enabled;
-	bool		allow_infinite_fallback;
+	int		can_ack;
+	int		fully_established;
+	u32		rcv_data_fin;
+	u32		snd_data_fin_enable;
+	u32		rcv_fastclose;
+	u32		use_64bit_ack; /* Set when we received a 64-bit DSN */
+	u32		csum_enabled;
+	u32		allow_infinite_fallback;
 	u8		pending_state; /* A subflow asked to set this sk_state,
 					* protected by the msk data lock
 					*/
@@ -445,8 +445,8 @@ struct mptcp_subflow_request_sock {
 		backup : 1,
 		csum_reqd : 1,
 		allow_join_id0 : 1;
-	u8	local_id;
-	u8	remote_id;
+	u16	local_id;
+	u32	remote_id;
 	u64	local_key;
 	u64	idsn;
 	u32	token;
@@ -519,8 +519,8 @@ struct mptcp_subflow_context {
 		valid_csum_seen : 1,        /* at least one csum validated */
 		is_mptfo : 1,	    /* subflow is doing TFO */
 		__unused : 10;
-	bool	data_avail;
-	bool	scheduled;
+	u32	data_avail;
+	u32	scheduled;
 	u32	remote_nonce;
 	u64	thmac;
 	u32	local_nonce;
@@ -529,8 +529,8 @@ struct mptcp_subflow_context {
 		u8	hmintac[MPTCPOPT_HMAC_LEN]; /* MPJ subflow only */
 		u64	iasn;	    /* initial ack sequence number, MPC subflows only */
 	};
-	s16	local_id;	    /* if negative not initialized yet */
-	u8	remote_id;
+	u32	local_id;	    /* if negative not initialized yet */
+	u32	remote_id;
 	u8	reset_seen:1;
 	u8	reset_transient:1;
 	u8	reset_reason:4;
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 55f6e6917033..02c54a8b2f2f 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -25,7 +25,7 @@ endif
 
 KBUILD_CPPFLAGS-$(CONFIG_WERROR) += -Werror
 KBUILD_CPPFLAGS += $(KBUILD_CPPFLAGS-y)
-KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
+#KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
 
 ifdef CONFIG_CC_IS_CLANG
 # The kernel builds with '-std=gnu11' so use of GNU extensions is acceptable.
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 289bf9233f71..92dff1fd4ac3 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -91,7 +91,7 @@ struct selinux_policy;
 
 struct selinux_state {
 #ifdef CONFIG_SECURITY_SELINUX_DEVELOP
-	bool enforcing;
+	int enforcing;
 #endif
 	bool initialized;
 	bool policycap[__POLICYDB_CAP_MAX];
diff --git a/security/tomoyo/common.h b/security/tomoyo/common.h
index 0e8e2e959aef..ebca489c729f 100644
--- a/security/tomoyo/common.h
+++ b/security/tomoyo/common.h
@@ -772,7 +772,7 @@ struct tomoyo_inet_acl {
 struct tomoyo_unix_acl {
 	struct tomoyo_acl_info head; /* type = TOMOYO_TYPE_UNIX_ACL */
 	u8 protocol;
-	u8 perm; /* Bitmask of values in "enum tomoyo_network_acl_index" */
+	u32 perm; /* Bitmask of values in "enum tomoyo_network_acl_index" */
 	struct tomoyo_name_union name;
 };
 
diff --git a/security/tomoyo/network.c b/security/tomoyo/network.c
index 8dc61335f65e..dccd873b5bf8 100644
--- a/security/tomoyo/network.c
+++ b/security/tomoyo/network.c
@@ -257,7 +257,7 @@ static bool tomoyo_merge_unix_acl(struct tomoyo_acl_info *a,
 				  struct tomoyo_acl_info *b,
 				  const bool is_delete)
 {
-	u8 * const a_perm =
+	u32 * const a_perm =
 		&container_of(a, struct tomoyo_unix_acl, head)->perm;
 	u8 perm = READ_ONCE(*a_perm);
 	const u8 b_perm = container_of(b, struct tomoyo_unix_acl, head)->perm;
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Fri, 31 May 2024, Arnd Bergmann wrote:

> I then tried changing the underlying variables to 32-bit ones
> to see how many changes are needed, but I gave up after around
> 150 of them, as I was only scratching the surface. To do this
> right, you'd need to go through each one of them and come up
> with a solution that is the best trade-off in terms of memory
> usage and performance for that one. There are of course
> others that should be using WRITE_ONCE() and are missing
> this, so the list is not complete either. See below for
> the ones I could find quickly.

 Thank you for your attempt, and I agree this is excessive and beyond what 
we can reasonably handle.

> >  FWIW even if it was only me I think that depriving the already thin Alpha 
> > port developer base of any quantity of the remaining manpower, by dropping 
> > support for a subset of the hardware available, and then a subset that is 
> > not just as exotic as the original i386 became to the x86 platform at the 
> > time support for it was dropped, is only going to lead to further demise 
> > and eventual drop of the entire port.
> 
> I know you like your museum pieces to be older than everyone
> else's, and I'm sorry that my patch series is causing you
> problems, but I don't think the more general criticism is
> valid here. My hope was mainly to help out with both keeping
> Alpha viable for a few more years while also allowing Paul
> to continue with his RCU changes.

 Appreciated and thank you for your appreciation as well.

> As far as I can tell, nobody else is actually using EV4
> machines or has been for years now, but the presence of that
> code did affect both the performance and correctness of the
> kernel code for all EV56+ users since distros have no way
> of picking the ISA level on alpha for a generic kernel.

 Well, at least John Paul Adrian complained as well, and who knows who 
else is there downstream.  I'd expect most people (i.e. all except for 
core Linux developers) not to track upstream development in a continuous 
manner.

> The strongest argument I see for assuming non-BWX alphas
> are long dead is that gcc-4.4 added support for C11 style
> _Atomic variables for alpha, but got the stores wrong
> without anyone ever noticing the problem. Even if one makes
> the argument that normal byte stores and volatile ones
> should not need atomic ll/sc sequences, the atomics
> clearly do. Building BWX-enabled kernels and userland
> completely avoids this problem, which makes debugging
> easier for the remaining users when stuff breaks.

 This only shows the lack of proper verification here rather than just 
use.  I'm not even sure if the nature of this problem is going to make it 
trigger in GCC regression testing.  Which BTW I have wired my EV45 system 
for in my lab last year and which would be going by now if not for issues 
with support network automation equipment (FAOD, state of the art and 
supported by the manufacturer).  We shall see once I'm done.

 As John Paul Adrian has pointed out the removal was expedited with no 
attempt made to find a proper solution that would not affect other users.  
As you can see it took me one e-mail exchange with Linus to understand 
what the underlying issue has been and then just a little bit of thinking, 
maybe half an hour, likely even less, to identify a feasible solution.

 Yes, I could have come up with it maybe a month ago if I wasn't so much 
behind on mailing list traffic.  But it's not my day job and since we had 
this issue for years now, it wasn't something that had to be handled as a 
matter of urgency.  We all are people and have our limitations.  We could 
have waited with the RFC out for another development cycle.  This has been 
the point of my complaint.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Linus Torvalds 1 year, 6 months ago
On Fri, 31 May 2024 at 08:48, Arnd Bergmann <arnd@arndb.de> wrote:
>
>  /* Is this type a native word size -- useful for atomic operations */
>  #define __native_word(t) \
> -       (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
> -        sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
> +       (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
>
>  #ifdef __OPTIMIZE__
>  # define __compiletime_assert(condition, msg, prefix, suffix)          \
>
> The WRITE_ONCE() calls tend to be there in order to avoid
> expensive atomic or locking when something can be expressed
> with a store that is known to be visible atomically (on all other
> architectures).

No, if you go down this road, then you would want to do the same thing
we do for READ_ONCE() - but for a different reason - hook into it for
alpha, and add a memory barrier to get rid of the crazy alpha memory
ordering:

  /*
   * Alpha is apparently daft enough to reorder address-dependent loads
   * on some CPU implementations. Knock some common sense into it with
   * a memory barrier in READ_ONCE().
   *
   * For the curious, more information about this unusual reordering is
   * available in chapter 15 of the "perfbook":
   *
   *  https://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
   *
   */
  #define __READ_ONCE(x)                                                  \
  ({                                                                      \
        __unqual_scalar_typeof(x) __x =                                 \
                (*(volatile typeof(__x) *)(&(x)));                      \
        mb();                                                           \
        (typeof(x))__x;                                                 \
  })

and the solution would be to make a __WRITE_ONCE() that then uses
"sizeof()" to decide at compile-time whether it can just do it as a
regular write, or whether it needs to do it as a LL/SC loop.
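
As a rough illustration of that size dispatch, here is a user-space sketch,
not kernel code. Every name in it (SKETCH_WRITE_ONCE, sketch_store_narrow,
struct sketch_dev) is hypothetical, and a compare-and-swap loop from the
GCC/Clang __atomic builtins stands in for the LDL_L/STL_C sequence that a
non-BWX Alpha would actually use:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* may_alias lets us touch the containing word without aliasing trouble. */
typedef uint32_t __attribute__((__may_alias__)) sketch_word_t;

/*
 * Hypothetical helper: store a 1- or 2-byte value by retrying a
 * compare-and-swap on the aligned 32-bit word that contains it.
 * The CAS loop is a stand-in for an LL/SC loop.
 */
static inline void sketch_store_narrow(void *p, const void *val, size_t size)
{
	uintptr_t addr = (uintptr_t)p;
	sketch_word_t *word = (sketch_word_t *)(addr & ~(uintptr_t)3);
	size_t off = addr & 3;
	uint32_t old, new;

	assert(off + size <= sizeof(uint32_t));	/* must not straddle words */

	old = __atomic_load_n(word, __ATOMIC_RELAXED);
	do {
		new = old;
		/* Patch only our bytes; neighbours keep the value just read. */
		memcpy((unsigned char *)&new + off, val, size);
		/* On failure 'old' is refreshed with the current word. */
	} while (!__atomic_compare_exchange_n(word, &old, new, 1,
					      __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
}

/*
 * sizeof() is a compile-time constant, so the untaken branch is dropped:
 * objects of 4 bytes or more get a plain volatile store, narrower ones
 * go through the retry loop.
 */
#define SKETCH_WRITE_ONCE(x, v)						\
do {									\
	__typeof__(x) v_ = (v);						\
	if (sizeof(x) >= sizeof(uint32_t))				\
		*(volatile __typeof__(x) *)&(x) = v_;			\
	else								\
		sketch_store_narrow(&(x), &v_, sizeof(x));		\
} while (0)

/*
 * Three narrow fields packed into one 32-bit word; the uint32_t member
 * forces 4-byte alignment of the struct, so they share offset 0..3.
 */
struct sketch_dev {
	uint8_t reg_state;
	uint8_t dismantle;
	uint16_t link_state;
	uint32_t flags;
};
```

With this, SKETCH_WRITE_ONCE(dev.reg_state, 5) takes the retry loop and
leaves dev.dismantle and dev.link_state as they were, while
SKETCH_WRITE_ONCE(dev.flags, 42) compiles to an ordinary store. It only
protects against other writers that go through the same loop.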

Because we're definitely not changing hundreds - probably thousands -
of random generic data structures.

That said, the above fixes WRITE_ONCE() without changing the
definition of what a native word size is, but doesn't actually *fix*
the problem.

Let's take a really simple example:

    struct net_device {
        ...
        u8 reg_state;

        bool dismantle;

        enum {
                RTNL_LINK_INITIALIZED,
                RTNL_LINK_INITIALIZING,
        } rtnl_link_state:16;
        ...

are all in the same 32-bit word, and we intentionally have code
without locking like this:

        WRITE_ONCE(dev->reg_state, NETREG_RELEASED);
...
        return READ_ONCE(dev->reg_state) <= NETREG_REGISTERED;

because the code knows the state machine ordering requirements (ie
once it has moved past NETREG_REGISTERED, it won't move back).

So now - assuming we fix WRITE_ONCE() to use LL/SC, these READ_ONCE()
and WRITE_ONCE() games work fine on alpha.

BUT.

Code that then does something like this:

        dev->dismantle = true;

which is all nice and good (accesses are done under the RTNL lock) now
will possibly race with the unlocked reg_state accesses.

So it's still fundamentally buggy.

And before you say "that's why I wanted to fix the __native_word()
definition", please realize that the above happens EVEN WITH the
READ_ONCE/WRITE_ONCE being done on an "int".

Yes, really. The READ_ONCE and WRITE_ONCE will be individual
instructions. But lookie here, if we have

        u32 reg_state;
        bool dismantle;

and they happen to share the same 8-byte word, and somebody passes
'&dismantle' off to something that does byte writes to it, guess what
the canonical byte write sequence is?

That's right, it looks something like this (excuse any bugs, this is
from memory and looking up the ops in the architecture manual):

        LDQ_U tmp,(addr)
        INSBL byte,addr,tmp2
        MSKBL tmp,addr,tmp
        OR tmp,tmp2,tmp
        STQ_U tmp,(addr)

and notice how in the process it read and then wrote that supposedly
atomic 'reg_state' that was otherwise accessed purely with 32-bit
atomic instructions?
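
To make the lost update concrete, here is a small single-threaded model in
C. It is only a sketch under stated assumptions: struct dev_sketch and the
helper names are hypothetical, the "concurrent" 32-bit store is simulated
by placing it between the load and the store of the byte writer, and the
MSKBL/INSBL/OR steps are modelled on a byte image rather than with the real
instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: both members share one aligned 8-byte quadword */
struct dev_sketch {
	uint32_t reg_state;
	uint8_t  dismantle;
	uint8_t  pad[3];
};

/* LDQ_U: fetch the whole quadword that contains the byte */
static uint64_t load_quad(const void *base)
{
	uint64_t q;

	memcpy(&q, base, sizeof(q));
	return q;
}

/* MSKBL + INSBL + OR: replace one byte inside the register copy */
static uint64_t insert_byte(uint64_t q, size_t off, uint8_t val)
{
	unsigned char img[sizeof(q)];

	memcpy(img, &q, sizeof(q));
	img[off] = val;
	memcpy(&q, img, sizeof(q));
	return q;
}

/* STQ_U: write the whole quadword back */
static void store_quad(void *base, uint64_t q)
{
	memcpy(base, &q, sizeof(q));
}

/* Returns the value of reg_state left behind by the interleaving */
static uint32_t lost_update_demo(void)
{
	struct dev_sketch d = { .reg_state = 1, .dismantle = 0 };
	uint64_t q;

	q = load_quad(&d);	/* byte writer reads reg_state == 1 too */
	d.reg_state = 2;	/* simulated interrupt/other CPU: 32-bit store */
	q = insert_byte(q, offsetof(struct dev_sketch, dismantle), 1);
	store_quad(&d, q);	/* stale reg_state == 1 is written back */

	return d.reg_state;
}
```

In this model lost_update_demo() ends with reg_state back at its old value
even though nobody ever wrote that value deliberately.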

There are no LDL_U/STL_U instructions. The unaligned memory ops are
always 8 bytes wide (you can obviously always do address masking
manually and "emulate" a LDL_U/STL_U model, but then you make already
bad code generation even *worse*).

So no. Even 32-bit values aren't "atomic" in alpha, because of the
complete sh*t-show that is lack of byte ops.

NOTE NOTE NOTE! Note how I said "pass off the address of
'dev->dismantle' to something that does byte ops"? If you *know* the
alignment of the byte in a structure, so you don't just get a random
pointer to a byte, you can - and should - generate better code on
alpha, which may in fact involve just doing a 32-bit load, masking off
the low bits, and doing the 32-bit store.
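
A minimal sketch of that known-alignment sub-case, written in C rather than
alpha assembly. The function name and the byte-lane parameter are
hypothetical; the lane is counted from the least significant byte, which
matches memory order on little-endian alpha. It is still a plain
read-modify-write, so it only narrows the clobbered region from eight bytes
to four; it does not make the byte store atomic.

```c
#include <stdint.h>

/* Replace byte lane `lane` (0..3) of an aligned 32-bit word */
static void store_byte_known_align(uint32_t *word, unsigned int lane,
				   uint8_t val)
{
	unsigned int shift = lane * 8;
	uint32_t tmp = *word;			/* 32-bit load (LDL) */

	tmp &= ~((uint32_t)0xff << shift);	/* mask off the old byte */
	tmp |= (uint32_t)val << shift;		/* insert the new byte */
	*word = tmp;				/* 32-bit store (STL) */
}
```

An adjacent 32-bit word in the same quadword is never read or written by
this variant.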

So that LDQ_U/STQ_U sequence is for the generic case, with various
simpler sub-cases that don't necessarily require it.

The fact is, the original alpha is the worst architecture ever made.
The lack of byte instructions and the absolutely horrendous memory
ordering are fatal flaws. And while the memory ordering arguably had
excuses for it ("they didn't know better"), the lack of byte ops was
wilful misdesign that the designers were proud of, and made a central
tenet of their mess.

And I say that as somebody who *loved* it originally. Yes, the lack of
byte operations always was a pain, because it really caused the IO
subsystem to be a nightmare, but I was young, I was stupid, it was
interesting, and I had bought into the kool aid.

But alpha without BWX really is shit. People think x86 is bad. Those
people have NO CLUE.

               Linus
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 5 months ago
On Fri, 31 May 2024, Linus Torvalds wrote:

> The fact is, the original alpha is the worst architecture ever made.
> The lack of byte instructions and the absolutely horrendous memory
> ordering are fatal flaws. And while the memory ordering arguably had
> excuses for it ("they didn't know better"), the lack of byte ops was
> wilful misdesign that the designers were proud of, and made a central
> tenet of their mess.
> 
> And I say that as somebody who *loved* it originally. Yes, the lack of
> byte operations always was a pain, because it really caused the IO
> subsystem to be a nightmare, but I was young, I was stupid, it was
> interesting, and I had bought into the kool aid.

 Looking from today's perspective it was clearly a bad choice.  However it 
was 30+ years ago, it wasn't so certain as it is now that x86 was there to 
stay -- indeed as I recall it DEC had the ambition to phase x86 out with 
their Alpha (whether they approached it the right way business-wise is 
another matter) -- so the notion of having a fully byte-addressed machine 
perhaps wasn't yet so obvious to DEC engineers as it is now, when most if 
not all the current CPU architectures have these fundamentals the same.  

 As I say it may have been the final attempt to do something differently 
before x86 domination forced everyone to be at least remotely compatible.

 And there used to be weirder architectures before people moved away from 
them and settled on the current paradigm, just as nobody wants to build a 
general-purpose atmospheric railway anymore and yet a while ago not only 
it didn't appear ridiculous, but such stuff was actually built and run, 
such as the South Devon Railway.

 And then it wasn't the only failed attempt: remember the i860 or the 
iAPX432?  At least the Alphas weren't a total disaster, they made their 
impact, the worst mistakes have been fixed as the architecture evolved, 
and the engineering legacy remains, often in unexpected places.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Linus Torvalds 1 year, 5 months ago
On Mon, 1 Jul 2024 at 16:48, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
>  Looking from today's perspective it was clearly a bad choice.  However it
> was 30+ years ago, it wasn't so certain as it is now that x86 was there to
> stay

No.

The thing is, it was objectively the wrong thing to do even 30 years
ago, and has nothing to do with x86.

The lack of byte operations literally means that even _word_
operations aren't reliable.

Because when you emulate byte operations with quad-word operations -
which is the way the alpha instruction set was literally designed -
you mess with the adjacent word too.

So even word accesses aren't safe. And I'm pretty sure that
'sig_atomic_t' was just 32-bit on alpha (that's what glibc had, and
I'm pretty sure OSF/1 did too). So...

And that's an issue even just UP, and just completely bog-standard
POSIX 1003.1 and C.

You really can't get much more basic than that.

So supposedly portable programs would have subtle bugs because the
architecture was bad, and the workarounds for that badness were
incomplete.

SMP and IO - which are a thing, and which were things that the
architecture was allegedly designed for - are then only much worse.

The architecture was wrong 30 years ago. It's not that it "became"
wrong in hindsight. It was wrong originally, and it's just that people
hadn't thought things through enough to realize how wrong it was.

The only way it's not wrong is if you say "byte accesses do not
matter". That's a very Cray way of looking at things - Cray 1 had a
64-bit "char" in C, because there were no byte accesses.

That's fine if your only goal in life is to do HPC.

So if you simply don't care about bytes, and you *only* work with
words and quad-words, then alpha looks ok.

But honestly, that's basically saying "in a different universe, alpha
is not a mis-design".

That's not the universe we live in, and it's entirely unrelated to
x86. Bytes were very much a thing 30 years ago, and they will be a
thing 30 years from now even if x86 is dead and buried.

Basically, the fundamental mistake of thinking that you can do byte
operations by just masking quad-words screwed up POSIX compatibility,
screwed up SMP, and majorly screwed up the alpha IO layer too.

And by the time it was fixed, it was too late.

Don't make excuses for it. It's not ok today, but it really wasn't ok
30 years ago either.

It's ok to have rose-colored glasses and have a weak spot in your
heart for an architecture. But let's not make that weak spot in your
heart be a weak spot in your mind.

            Linus
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 5 months ago
On Mon, 1 Jul 2024, Linus Torvalds wrote:

> The architecture was wrong 30 years ago. It's not that it "became"
> wrong in hindsight. It was wrong originally, and it's just that people
> hadn't thought things through enough to realize how wrong it was.
> 
> The only way it's not wrong is if you say "byte accesses do not
> matter". That's a very Cray way of looking at things - Cray 1 had a
> 64-bit "char" in C, because there were no byte accesses.
> 
> That's fine if your only goal in life is to do HPC.
> 
> So if you simply don't care about bytes, and you *only* work with
> words and quad-words, then alpha looks ok.
> 
> But honestly, that's basically saying "in a different universe, alpha
> is not a mis-design".

 Precisely my point!  We got so used to think in multiples of 8 bits that 
other approaches seem ridiculous.

 The PDP-10 operated on 36-bit quantities and strings were essentially 
clusters of 6-bit characters packed into 6-packs (which is also allegedly 
where the C language's original limitation of using at most six characters 
for identifiers came from -- so that the PDP-10 could compare a pair with 
a single machine instruction).

 So there was already legacy of doing things this way at DEC back in ~1990 
and I can envisage engineers there actually thought that to have a machine 
that in C terms has 32-bit shorts and ints, 64-bit longs and pointers, and 
strings as clusters of 8-bit characters packed into 4-packs or 8-packs was 
not at all unreasonable.  Or maybe just plain 32-bit characters.  After 
all you don't absolutely *have* to use data types of 8 or 16 bits exactly 
in width for anything, do you?  NB for strings nowadays we have Unicode 
and we could just use UTF-32 if not to waste memory.

 And even now ISO C is very flexible on data type widths and only requires 
the character data type to be at least 8 bits wide, and 16-bit and 24-bit 
examples are actually given in the standard itself.  Yes, POSIX requires 
the character data type to be 8 bits wide exactly now, but POSIX.1-1988 
deferred to ANSI C AFAICT.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Linus Torvalds 1 year, 5 months ago
On Tue, 2 Jul 2024 at 17:12, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Mon, 1 Jul 2024, Linus Torvalds wrote:
> >
> > But honestly, that's basically saying "in a different universe, alpha
> > is not a mis-design".
>
>  Precisely my point!  We got so used to think in multiples of 8 bits that
> other approaches seem ridiculous.

But Maciej - alpha *was* designed for bytes. It wasn't a Cray 1. It
wasn't a PDP-10. It was designed by the time people knew that bytes
were the dominant thing, and that bytes were important and the main
use case.

But it was designed BADLY. The architecture sucked.

Give it up. If alpha had been designed in the 60s or 70s when the
whole issue of bytes was debatable, it would have been
incredible.

But no. It was designed for byte accesses, and it FAILED AT THEM.

              Linus
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 5 months ago
On Tue, 2 Jul 2024, Linus Torvalds wrote:

> >  Precisely my point!  We got so used to think in multiples of 8 bits that
> > other approaches seem ridiculous.
> 
> But Maciej - alpha *was* designed for bytes. It wasn't a Cray 1. It
> wasn't a PDP-10. It was designed by the time people knew that bytes
> were the dominant thing, and that bytes were important and the main
> use case.
> 
> But it was designed BADLY. The architecture sucked.

 OK, perhaps it was those people who decided to make it that way that 
lived in a parallel universe.

> Give it up. If alpha had been designed in the 60s or 70s when the
> whole issue of bytes was debatable, it would have been
> incredible.
> 
> But no. It was designed for byte accesses, and it FAILED AT THEM.

 I guess they decided that trading byte and word accesses for simpler bus 
logic that does not have all the bits required to issue an RMW operation 
to recalculate the ECC syndrome on such accesses was a good deal, and I 
guess they did not realise data race implications or thought they could be 
sorted in a reasonable way.  The avoidance of RMWs is explicitly mentioned 
in the preface to the Alpha ARM.

 And I guess you are aware that getting an asynchronous multi-bit ECC 
error interrupt for a partial write the origin of which has long gone and 
all you have is the physical address is also a horror to handle.

 Bad choice I guess anyway.  Too many guesses I guess too.

  Maciej
RE: [PATCH 00/14] alpha: cleanups for 6.10
Posted by David Laight 1 year, 6 months ago
...
> The fact is, the original alpha is the worst architecture ever made.
> The lack of byte instructions and the absolutely horrendous memory
> ordering are fatal flaws. And while the memory ordering arguably had
> excuses for it ("they didn't know better"), the lack of byte ops was
> wilful misdesign that the designers were proud of, and made a central
> tenet of their mess.

If it wasn't from DEC (where the pdp-11 and vax were fine) I'd think
it was someone harking back to the old mainframe days where it was
perfectly normal to only have 'word addressing' and, for example,
to put three 6-bit characters into an 18-bit word (hi Univac!).
(Don't even think how 18-bit words got written to mag tape!)

It is almost as if someone assumed that the only use for byte accesses
was within character arrays - and they can jolly well align the arrays.

Mind you, all the byte shifting needed to get the data onto the
right data bus lines is a PITA and will affect the max cpu frequency [1].
So perhaps they decided it was a 'software problem' so some benchmarks
could run faster.

	David

[1] I've been busy re-implementing the Nios-II cpu.

Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Linus Torvalds 1 year, 6 months ago
On Wed, 29 May 2024 at 11:50, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
>              The only difference here is that with
> hardware read-modify-write operations atomicity for sub-word accesses is
> guaranteed by the ISA, however for software read-modify-write it has to be
> explicitly coded using the usual load-locked/store-conditional sequence in
> a loop.

I have some bad news for you: the old alpha CPU's not only screwed up
the byte/word design, they _also_ screwed up the
load-locked/store-conditional.

You'd think that LL/SC would be done at a cacheline level, like any
sane person would do.

But no.

The 21064 actually did atomicity with an external pin on the bus, the
same way people used to do before caches even existed.

Yes, it has an internal L1 D$, but it is a write-through cache, and
clearly things like cache coherency weren't designed for. In fact,
LL/SC is even documented to not work in the external L2 cache
("Bcache" - don't ask me why the odd naming).

So LL/SC on the 21064 literally works on external memory.

Quoting the reference manual:

  "A.6 Load Locked and Store Conditional
  The 21064 provides the ability to perform locked memory accesses through
  the LDxL (Load_Locked) and STxC (Store_Conditional) cycle command pair.
  The LDxL command forces the 21064 to bypass the Bcache and request data
  directly from the external memory interface. The memory interface logic must
  set a special interlock flag as it returns the data, and may optionally
  keep the locked address"

End result: a LL/SC pair is very very slow. It was incredibly slow
even for the time. I had benchmarks, I can't recall them, but I'd like
to say "hundreds of cycles". Maybe thousands.

So actual reliable byte operations are not realistically possible on
the early alpha CPU's. You can do them with LL/SC, sure, but
performance would be so horrendously bad that it would be just sad.

The 21064A had some "fast lock" mode which allows the data from the
LDQ_L to come from the Bcache. So it still isn't exactly fast, and it
still didn't work at CPU core speeds, but at least it worked with the
external cache.

Compilers will generate the sequence that DEC specified, which isn't
thread-safe.

In fact, it's worse than "not thread safe". It's not even safe on UP
with interrupts, or even signals in user space.

It's one of those "technically valid POSIX", since there's
"sig_atomic_t" and if you do any concurrent signal stuff you're
supposed to only use that type. But it's another of those "Yeah, you'd
better make sure your structure members are either 'int' or bigger, or
never accessed from signals or interrupts, or they might clobber
nearby values".
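
One defensive layout, sketched in C11 under my own assumptions (the struct
and member names are hypothetical, and the idea of forcing the members into
separate aligned quadwords is an inference from the 8-byte LDQ_U/STQ_U
sequence rather than anything DEC documented): keep the flag written from
the signal handler in a different 8-byte unit from anything that gets byte
writes.

```c
#include <signal.h>
#include <stddef.h>

struct state_sketch {
	/* written only from the signal handler */
	_Alignas(8) volatile sig_atomic_t got_signal;
	/* byte-written by ordinary code; starts a new quadword */
	_Alignas(8) char name[8];
};

/* Fail the build if the two members could share an aligned quadword */
_Static_assert(offsetof(struct state_sketch, got_signal) / 8 !=
	       offsetof(struct state_sketch, name) / 8,
	       "flag and byte array must not share a quadword");

static struct state_sketch st;

static void on_signal(int sig)
{
	(void)sig;
	st.got_signal = 1;
}
```

A caller would install the handler with signal(SIGINT, on_signal) and can
then write st.name[] freely without a quadword read-modify-write touching
got_signal.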

           Linus
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Wed, 29 May 2024, Linus Torvalds wrote:

> >              The only difference here is that with
> > hardware read-modify-write operations atomicity for sub-word accesses is
> > guaranteed by the ISA, however for software read-modify-write it has to be
> > explicitly coded using the usual load-locked/store-conditional sequence in
> > a loop.
> 
> I have some bad news for you: the old alpha CPU's not only screwed up
> the byte/word design, they _also_ screwed up the
> load-locked/store-conditional.
> 
> You'd think that LL/SC would be done at a cacheline level, like any
> sane person would do.
> 
> But no.
> 
> The 21064 actually did atomicity with an external pin on the bus, the
> same way people used to do before caches even existed.

 Umm, 8086's LOCK#, anyone?

> Yes, it has an internal L1 D$, but it is a write-through cache, and
> clearly things like cache coherency weren't designed for. In fact,
> LL/SC is even documented to not work in the external L2 cache
> ("Bcache" - don't ask me why the odd naming).

 Board cache, I suppose.

> So LL/SC on the 21064 literally works on external memory.
> 
> Quoting the reference manual:
> 
>   "A.6 Load Locked and Store Conditional
>   The 21064 provides the ability to perform locked memory accesses through
>   the LDxL (Load_Locked) and STxC (Store_Conditional) cycle command pair.
>   The LDxL command forces the 21064 to bypass the Bcache and request data
>   directly from the external memory interface. The memory interface logic must
>   set a special interlock flag as it returns the data, and may optionally
>   keep the locked address"
> 
> End result: a LL/SC pair is very very slow. It was incredibly slow
> even for the time. I had benchmarks, I can't recall them, but I'd like
> to say "hundreds of cycles". Maybe thousands.

 Interesting and disappointing, given how many years the Alpha designers 
had to learn from the MIPS R4000.  Which they borrowed from already after 
all and which they had first-hand experience with present onboard, from 
the R4000 DECstation systems built at their WSE facility.  Hmm, I wonder 
if there was patent avoidance involved.

> So actual reliable byte operations are not realistically possible on
> the early alpha CPU's. You can do them with LL/SC, sure, but
> performance would be so horrendously bad that it would be just sad.

 Hmm, performance with a 30 years old system?  Who cares!  It mattered 30 
years ago, maybe 25.  And the performance of a system that runs slowly is 
still infinitely better than one of a system that doesn't boot anymore, 
isn't it?

> The 21064A had some "fast lock" mode which allows the data from the
> LDQ_L to come from the Bcache. So it still isn't exactly fast, and it
> still didn't work at CPU core speeds, but at least it worked with the
> external cache.
> 
> Compilers will generate the sequence that DEC specified, which isn't
> thread-safe.
> 
> In fact, it's worse than "not thread safe". It's not even safe on UP
> with interrupts, or even signals in user space.

 Ouch, I find it a surprising oversight.  Come to think of it indeed the 
plain unlocked read-modify-write sequences are unsafe.  I don't suppose 
any old DECies are still around, but any idea how this was sorted in DEC's 
own commercial operating systems (DU and OVMS)?

 So this seems like something that needs to be sorted in the compiler, by 
always using a locked sequence for 8-bit and 16-bit writes with non-BWX 
targets.  I can surely do it myself, not a big deal, and I reckon such a 
change to GCC should be pretty compact and self-contained, as all the bits 
are already within `alpha_expand_mov_nobwx' anyway.

 I'm not sure if Richard will be happy to accept it, but it seems to me 
the right thing to do at this point and with that in place there should be 
no safety concern for RCU or anything with the old Alphas, with no effort 
at all on the Linux side as all the burden will be on the compiler.  We 
may want to probe for the associated compiler option though and bail out 
if unsupported.

 Will it be enough to keep Linux support at least until the next obstacle?

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Linus Torvalds 1 year, 6 months ago
On Thu, 30 May 2024 at 15:57, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Wed, 29 May 2024, Linus Torvalds wrote:
> >
> > The 21064 actually did atomicity with an external pin on the bus, the
> > same way people used to do before caches even existed.
>
>  Umm, 8086's LOCK#, anyone?

Well, yes and no.

So yes, exactly like 8086 did before having caches.

But no, not like the alpha contemporary PPro that did have caches. The
PPro already did locked cycles in the caches.

Yes, the PPro still did have an external lock pin (and in fact current
much more modern x86 CPUs do too), but it's only used for locked IO
accesses or possibly cacheline crossing accesses.

So x86 has supported atomic accesses on IO - and it is very very slow,
to this day. So slow, and problematic, in fact, that Intel is only now
trying to remove it (look up "split lock").

But the 21064 explicitly did not support locking on IO - and unaligned
LL/SC accesses obviously also did not work.

So I really feel the 21064 was broken.

It's probably related to the whole cache coherency being designed to
be external to the built-in caches - or even the Bcache. The caches
basically are write-through, and the weak memory ordering was designed
for allowing this horrible model.

> > In fact, it's worse than "not thread safe". It's not even safe on UP
> > with interrupts, or even signals in user space.
>
>  Ouch, I find it a surprising oversight.

The sad part is that it doesn't seem to have been an oversight. It
really was broken-as-designed.

Basically, the CPU was designed for single-threaded Spec benchmarks
and absolutely nothing else. Classic RISC where you recompile to fix
problems like the atomicity thing - "just use a 32-bit sig_atomic_t
and you're fine".

The original alpha architecture handbook makes a big deal of how
clever the lack of byte and word operations is. I also remember
reading an article by Dick Sites - one of the main designers - talking
a lot about how the lack of byte operations is great, and encourages
vectorizing byte accesses and doing string operations in whole words.

          Linus
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Thu, 30 May 2024, Linus Torvalds wrote:

> > > The 21064 actually did atomicity with an external pin on the bus, the
> > > same way people used to do before caches even existed.
> >
> >  Umm, 8086's LOCK#, anyone?
> 
> Well, yes and no.
> 
> So yes, exactly like 8086 did before having caches.

 Well I wrote 8086 specifically, not x86.

> But no, not like the alpha contemporary PPro that did have caches. The
> PPro already did locked cycles in the caches.

 But the 21064 does predate the PPro by a couple of years: Feb 1992 vs Nov 
1995, so surely Intel folks had extra time to resolve this stuff properly.  

 Conversely the R4000 came about in Oct 1991, so before the 21064.  But 
only so slightly and not as much as I remembered (I thought the 21064 was 
more like 1993), so it seems like DEC couldn't have had enough time after 
all to figure out what SGI did (patents notwithstanding).  Surely the 
R4000MC cache coherency protocol was complex for the silicon technology of 
the time, but it's just MOESI in modern terms AFAICT, and LL/SC is handled 
there (and is in fact undefined for uncached accesses).

 I'm not sure what else was out there at the time, but going back to x86 
the i486 was contemporary, the original write-through cache version, which 
if memory serves, was not any better in this respect (and the "write-back 
enhanced" DX2/DX4 models with proper MESI cache protocol came out much 
later, after Pentium only, which they borrowed from).

> So I really feel the 21064 was broken.
> 
> It's probably related to the whole cache coherency being designed to
> be external to the built-in caches - or even the Bcache. The caches
> basically are write-through, and the weak memory ordering was designed
> for allowing this horrible model.

 In retrospect perhaps it wasn't the best design, but they have learnt 
from their mistakes.

> > > In fact, it's worse than "not thread safe". It's not even safe on UP
> > > with interrupts, or even signals in user space.
> >
> >  Ouch, I find it a surprising oversight.
> 
> The sad part is that it doesn't seem to have been an oversight. It
> really was broken-as-designed.
> 
> Basically, the CPU was designed for single-threaded Spec benchmarks
> and absolutely nothing else. Classic RISC where you recompile to fix
> problems like the atomicity thing - "just use a 32-bit sig_atomic_t
> and you're fine".

 Not OK however, as you correctly point out, for plain ordinary non-atomic 
stuff.  Point me at any document that claims that a pair of threads poking 
at even and odd byte vector elements each is not allowed.  Caches may not 
enjoy it, but there's nothing AFAIK saying this is UB or whatever.

> The original alpha architecture handbook makes a big deal of how
> clever the lack of byte and word operations is. I also remember

 I've seen that; dropped in v3 with the addition of the BWX extension.

> reading an article by Dick Sites - one of the main designers - talking
> a lot about how the lack of byte operations is great, and encourages
> vectorizing byte accesses and doing string operations in whole words.

 Yeah, the software folks at DEC must have been delighted porting all the 
VAX VMS software.  But perhaps this was the last attempt to try something 
different from the CPU architecture standards established back in 1970s 
(by the VAX among others) that make current designs so similar to one 
another.

 Anyway, back to my point.  A feasible solution non-intrusive for Linux 
and low-overhead for GCC has been found.  I can expedite implementation 
and I'll see if I can regression-test it too, but I may have to rely on 
other people to complete it after all, as I haven't been prepared for this 
effort in the light of certain issues I have recently suffered from in my 
lab.

 Is that going to be enough to bring the platform bits back?

 FAOD, with all the hacks so eagerly being removed now happily left in the 
dust bin where they belong, and which I wholeheartedly agree with: we 
shouldn't be suffering from design mistakes of systems that are no longer 
relevant, but I fail to see the reason why we should disallow their use 
where the burden is confined or plain elsewhere.

 For example we continue supporting old UP MIPS platforms that predate 
LL/SC, by just trapping and emulating these instructions.  Surely it sucks 
performance-wise and it's possibly hundreds of cycles too, but it works 
and the burden is confined to the exception handler, so not a big deal.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Linus Torvalds 1 year, 6 months ago
On Mon, 3 Jun 2024 at 04:09, Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
>  Anyway, back to my point.  A feasible solution non-intrusive for Linux
> and low-overhead for GCC has been found.  I can expedite implementation
> and I'll see if I can regression-test it too, but I may have to rely on
> other people to complete it after all, as I haven't been prepared for this
> effort in the light of certain issues I have recently suffered from in my
> lab.

Yeah, if compiler support makes us not have to care, then I don't
think the difference between pre-BWX and BWX is going to matter much
for the kernel.

The real pain with alpha has been that it's special enough that it
affects non-alpha code, and BWX was one big piece of that.

That said, some of the EV4 misfeatures end up being a huge pain inside
the alpha code either because of the horrible hoops that the IO
accessors have to jump through, or because of the broken ASID's.

So even with new compiler support, maybe it's worth trying to
re-introduce any support for older cpu's incrementally.

For example, the ASID hw issue is _claimed_ to have been fixed in
PALcode, and maybe the games we played for ev4-era cpus aren't
actually needed any more?

And the various odd IO platforms should only be re-introduced when
there are people who actually have access to the relevant hardware and
will test.

           Linus
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 5 months ago
On Mon, 3 Jun 2024, Linus Torvalds wrote:

> >  Anyway, back to my point.  A feasible solution non-intrusive for Linux
> > and low-overhead for GCC has been found.  I can expedite implementation
> > and I'll see if I can regression-test it too, but I may have to rely on
> > other people to complete it after all, as I haven't been prepared for this
> > effort in the light of certain issues I have recently suffered from in my
> > lab.
> 
> Yeah, if compiler support makes us not have to care, then I don't
> think the difference between pre-BWX and BWX is going to matter much
> for the kernel.
> 
> The real pain with alpha has been that it's special enough that it
> affects non-alpha code, and BWX was one big piece of that.

 Understood, that's burden beyond justification for an obsolete legacy 
platform.

> That said, some of the EV4 misfeatures end up being a huge pain inside
> the alpha code either because of the horrible hoops that the IO
> accessors have to jump through, or because of the broken ASID's.
> 
> So even with new compiler support, maybe it's worth trying to
> re-introduce any support for older cpu's incrementally.

 Ack.

> For example, the ASID hw issue is _claimed_ to have been fixed in
> PALcode, and maybe the games we played for ev4-era cpus aren't
> actually needed any more?

 Actually my system seems to be an odd relic that has very old PALcode:

[...]
X3.7-10895, built on Sep 15 1994 at 10:19:05
>>>sh conf

SRM Console X3.7-10895  VMS PALcode X5.48-60, OSF PALcode X1.35-42
[...]

-- which is dated well before the system's release date.  It has been 
heavily patched with extra components retrofitted on the PCB, as if it were 
an early hardware revision, and going by the part number labelled on the PCB 
it's an AlphaStation 250, and yet it came packaged as an AlphaServer 300 (the only 
documented difference between the PCBs of the two systems is the maximum 
amount of DRAM supported), with a vast mismatch between the dates given on 
the PCB and the case.  I don't know what's the story behind it, maybe it 
once was a DEC engineering machine.

 And for instance its SRM cannot netboot over BOOTP/TFTP, it can only 
use MOP.  Not an issue for me, and I feel a bit uneasy about upgrading the 
firmware, I'd rather I didn't brick the machine.  I guess we shall see 
whether it matters and if so, then what can be done about it.

 I used an AlphaServer 300 before that was purchased brand new and I can't 
recall any such patching on the PCB, and I reckon SRM was more modern too.
Indeed having checked old logs I found:

[...]
version                 X6.2-165 Nov  4 1996 10:06:10
>>>sh conf

Firmware
SRM Console:    X6.2-165
ARC Console:    4.49
PALcode:        VMS PALcode V5.56-2, OSF PALcode X1.46-2
[...]

> And the various odd IO platforms should only be re-introduced when
> there are people who actually have access to the relevant hardware and
> will test.

 Absolutely, what's the point of keeping something we have no way to
verify?  I'll begin with what I'm interested in myself and will gather 
input from people willing to verify stuff with other hardware they may 
have.

 Anyway, it's been a hectic month for me and I have my Alpha machine in 
the remote lab fully ready for this effort now, with a number of issues 
fixed, most importantly rather tricky GCC PR rtl-optimization/115565, 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115565> which prevented the 
userland from being used as installed.

 With that in place the system was able to complete GCC 15 verification, 
so now it should be able to do pretty much anything.  I ran some glibc 
upstream master testing too.

 With that ticked off I do hope to work on the GCC part throughout July, 
and then the kernel bits will follow.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year ago
On Tue, 2 Jul 2024, Maciej W. Rozycki wrote:

> > >  Anyway, back to my point.  A feasible solution non-intrusive for Linux
> > > and low-overhead for GCC has been found.  I can expedite implementation
> > > and I'll see if I can regression-test it too, but I may have to rely on
> > > other people to complete it after all, as I haven't been prepared for this
> > > effort in the light of certain issues I have recently suffered from in my
> > > lab.
> > 
> > Yeah, if compiler support makes us not have to care, then I don't
> > think the difference between pre-BWX and BWX is going to matter much
> > for the kernel.
> > 
> > The real pain with alpha has been that it's special enough that it
> > affects non-alpha code, and BWX was one big piece of that.
> 
>  Understood, that's burden beyond justification for an obsolete legacy 
> platform.

 FTR a proposed implementation of the solution is now discussed here:
<https://inbox.sourceware.org/gcc-patches/alpine.DEB.2.21.2411141652300.9262@angie.orcam.me.uk/>.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by John Paul Adrian Glaubitz 1 year, 6 months ago
Hi Maciej,

On Mon, 2024-06-03 at 12:09 +0100, Maciej W. Rozycki wrote:
>  Anyway, back to my point.  A feasible solution non-intrusive for Linux 
> and low-overhead for GCC has been found.  I can expedite implementation 
> and I'll see if I can regression-test it too, but I may have to rely on 
> other people to complete it after all, as I haven't been prepared for this 
> effort in the light of certain issues I have recently suffered from in my 
> lab.

That's really great to hear! Please let me know if you have something to test,
I would love to help with this effort.

>  Is that going to be enough to bring the platform bits back?

That would be awesome. Would love to be able to keep running a current kernel
on my AlphaStation 233 which is pre-EV56.

>  FAOD, with all the hacks so eagerly being removed now happily left in the 
> dust bin where they belong, and which I wholeheartedly agree with: we 
> shouldn't be suffering from design mistakes of systems that are no longer 
> relevant, but I fail to see the reason why we should disallow their use 
> where the burden is confined or plain elsewhere.

Agreed.

>  For example we continue supporting old UP MIPS platforms that predate 
> LL/SC, by just trapping and emulating these instructions.  Surely it sucks 
> performance-wise and it's possibly hundreds of cycles too, but it works 
> and the burden is confined to the exception handler, so not a big deal.

Fully agreed.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Paul E. McKenney 1 year, 6 months ago
On Wed, May 29, 2024 at 07:50:28PM +0100, Maciej W. Rozycki wrote:
> On Tue, 28 May 2024, Paul E. McKenney wrote:
> 
> > > > > This topic came up again when Paul E. McKenney noticed that
> > > > > parts of the RCU code already rely on byte access and do not
> > > > > work on alpha EV5 reliably, so I refreshed my series now for
> > > > > inclusion into the next merge window.
> > > > 
> > > > Hrrrm? That sounds like Paul ran tests on EV5, did he?
> > > 
> > >  What exactly is required to make it work?
> > 
> > Whatever changes are needed to prevent the data corruption that can
> > currently result from code generated for single-byte stores.  For but one
> > example, consider a pair of tasks (or one task and an interrupt handler
> > in the CONFIG_SMP=n case) each doing a single-byte store to a pair of bytes
> > in the same machine word.  As I understand it, in code generated for
> > older Alphas, both "stores" will load the word containing that byte,
> > update their own byte, and store the updated word.
> > 
> > If two such single-byte stores run concurrently, one or the other of those
> > two stores will be lost, as in overwritten by the other.  This is a bug,
> > even in kernels built for single-CPU systems.  And a rare bug at that, one
> > that tends to disappear as you add debug code in an attempt to find it.
> 
>  Thank you for the detailed description of the problematic scenario.
> 
>  I hope someone will find it useful, however for the record I have been 
> familiar with the intricacies of the Alpha architecture as well as their 
> implications for software for decades now.  The Alpha port of Linux was 
> the first non-x86 Linux platform I have used and actually (and I've chased 
> that as a matter of interest) my first ever contribution to Linux was for 
> Alpha platform code:
> 
> On Mon, 30 Mar 1998, Jay.Estabrook@digital.com wrote:
> 
> > Hi, sorry about the delay in answering, but you'll be happy to know, I took
> > your patches and merged them into my latest SMP patches, and submitted them
> > to Linus just last night. He promises them to (mostly) be in 2.1.92, so we
> > can look forward to that... :-)
> 
> so I find the scenario you have described more than obvious.

Glad that it helped.

>  Mind that the read-modify-write sequence that software does for sub-word 
> write accesses with original Alpha hardware is precisely what hardware 
> would have to do anyway and support for that was deliberately omitted by 
> the architecture designers from the ISA to give it performance advantages 
> quoted in the architecture manual.  The only difference here is that with 
> hardware read-modify-write operations atomicity for sub-word accesses is 
> guaranteed by the ISA, however for software read-modify-write it has to be 
> explicitly coded using the usual load-locked/store-conditional sequence in 
> a loop.  I don't think it's a big deal really, it should be trivial to do 
> in the relevant accessors, along with the memory barriers that are needed 
> anyway for EV56+ and possibly other ports such as the MIPS one.

There shouldn't be any memory barriers required, and don't EV56+ have
single-byte loads and stores?
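
For reference, the load-locked/store-conditional loop mentioned in the
quoted paragraph can be approximated in portable C11 with a
compare-and-exchange loop on the containing word.  This is only a sketch
under stated assumptions: store_byte_atomic() and demo_store_byte() are
hypothetical names, a 32-bit word stands in for the Alpha longword, and
relaxed ordering is used, so no memory barriers are implied either way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically replace one byte lane of a 32-bit word.
 * The compare-exchange loop plays the role of an LL/SC retry loop: if
 * another writer changed the word in between, the exchange fails, "old"
 * is refreshed with the current value, and the merge is redone.
 */
static void store_byte_atomic(_Atomic uint32_t *word, unsigned lane,
			      uint8_t val)
{
	assert(lane < 4);

	uint32_t shift = lane * 8;
	uint32_t mask = UINT32_C(0xff) << shift;
	uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
	uint32_t updated;

	do {
		/* Keep the three neighbouring bytes, replace only ours. */
		updated = (old & ~mask) | ((uint32_t)val << shift);
	} while (!atomic_compare_exchange_weak_explicit(word, &old, updated,
							memory_order_relaxed,
							memory_order_relaxed));
}

/* Hypothetical single-threaded driver: store one byte, return the word. */
static uint32_t demo_store_byte(uint32_t initial, unsigned lane, uint8_t val)
{
	_Atomic uint32_t word;

	atomic_init(&word, initial);
	store_byte_atomic(&word, lane, val);
	return atomic_load_explicit(&word, memory_order_relaxed);
}
```

The point of the loop is that a concurrent update to a neighbouring byte
makes the exchange fail and retry, rather than being silently overwritten.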

>  What I have been after actually is: can you point me at a piece of code 
> in our tree that will cause an issue with a non-BWX Alpha as described in 
> your scenario, so that I have a starting point?  Mind that I'm completely 
> new to RCU as I didn't have a need to research it before (though from a 
> skim over Documentation/RCU/rcu.rst I understand what its objective is).

See the uses of the fields of the current->rcu_read_unlock_special.b
anonymous structure for the example that led us here.  And who knows how
many other pieces of the Linux kernel assume that it is possible
to atomically store a single byte.

Many of which use a normal C-language store, in which case there are
no accessors.  This can be a problem even in the case where there are no
data races to either byte, because the need for the read-modify-write
sequence on older Alpha systems results in implicit data races at the
machine-word level.
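
To illustrate, here is a minimal single-threaded model of that implicit
word-level race.  It is a sketch, not generated Alpha code: the helper
names are hypothetical, a 32-bit word is assumed for brevity, and the
"concurrency" is simulated by interleaving the load and store steps of
two byte stores by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Merge one byte into a previously loaded word value (lane 0 = low byte). */
static uint32_t rmw_merge(uint32_t old, unsigned lane, uint8_t val)
{
	assert(lane < 4);

	uint32_t shift = lane * 8;

	return (old & ~(UINT32_C(0xff) << shift)) | ((uint32_t)val << shift);
}

/* Two byte stores done one after the other: both bytes survive. */
static uint32_t sequential_byte_stores(uint32_t initial, uint8_t a, uint8_t b)
{
	uint32_t word = initial;

	word = rmw_merge(word, 0, a);	/* task A: load, merge, store */
	word = rmw_merge(word, 1, b);	/* task B: load, merge, store */
	return word;
}

/*
 * The same two byte stores, but task B (or an interrupt handler) loads
 * the word before task A has written it back.  B then stores a word that
 * still holds the stale value of A's byte, so A's store is lost.
 */
static uint32_t interleaved_byte_stores(uint32_t initial, uint8_t a, uint8_t b)
{
	uint32_t word = initial;
	uint32_t seen_by_a = word;		/* A loads the word */
	uint32_t seen_by_b = word;		/* B loads the same old word */

	word = rmw_merge(seen_by_a, 0, a);	/* A stores its merged word */
	word = rmw_merge(seen_by_b, 1, b);	/* B overwrites it */
	return word;
}
```

Neither task ever touches the other's byte at the C level, yet in the
interleaved case the byte written by task A reverts to its old value.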

>  FWIW even if it was only me I think that depriving the already thin Alpha 
> port developer base of any quantity of the remaining manpower, by dropping 
> support for a subset of the hardware available, and then a subset that is 
> not just as exotic as the original i386 became to the x86 platform at the 
> time support for it was dropped, is only going to lead to further demise 
> and eventual drop of the entire port.

Yes, support has been dropped for some of the older x86 CPUs as well,
for example, Linux-kernel support for multiprocessor 80386 systems was
dropped a great many years ago, in part because those CPUs do not have
a cmpxchg instruction.  So it is not like we are picking on Alpha.

>  And I think it would be good if we kept the port, just as we keep other 
> ports of historical significance only, for educational reasons if nothing 
> else, such as to let people understand based on an actual example, once 
> mainstream, the implications of weakly ordered memory systems.

I don't know of any remaining issues with the newer Alpha systems that do
support single-byte and double-byte load and store instructions, and so
I am not aware of any reason for dropping Linux-kernel support for them.

							Thanx, Paul
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Wed, 29 May 2024, Paul E. McKenney wrote:

> >  Mind that the read-modify-write sequence that software does for sub-word 
> > write accesses with original Alpha hardware is precisely what hardware 
> > would have to do anyway and support for that was deliberately omitted by 
> > the architecture designers from the ISA to give it performance advantages 
> > quoted in the architecture manual.  The only difference here is that with 
> > hardware read-modify-write operations atomicity for sub-word accesses is 
> > guaranteed by the ISA, however for software read-modify-write it has to be 
> > explicitly coded using the usual load-locked/store-conditional sequence in 
> > a loop.  I don't think it's a big deal really, it should be trivial to do 
> > in the relevant accessors, along with the memory barriers that are needed 
> > anyway for EV56+ and possibly other ports such as the MIPS one.
> 
> There shouldn't be any memory barriers required, and don't EV56+ have
> single-byte loads and stores?

 I should have commented on this in my original reply.

 You're the RCU expert so you know the answer.  I don't.  If it's OK for
successive writes to get reordered, or readers to see a stale value, then 
you don't need memory barriers.  Otherwise you do.  Whether byte accesses 
are available or not does not matter, the CPU *will* do reordering if it's 
allowed to (or more specifically, it won't do anything to prevent it from 
happening, especially in SMP configurations; I can't remember offhand if 
there are cases with UP).  Also adjacent byte writes may be merged, but I 
suppose it does not matter, or does it?

 NB MIPS has similar architectural arrangements (and a bunch of barriers 
defined in the ISA), it's just most implementations are actually strongly 
ordered, so most people can't see the effects of this.  With MIPS I know 
for sure there are cases of UP reordering, but they only really matter for 
MMIO use cases and not regular memory.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Paul E. McKenney 1 year, 6 months ago
On Fri, May 31, 2024 at 04:56:28AM +0100, Maciej W. Rozycki wrote:
> On Wed, 29 May 2024, Paul E. McKenney wrote:
> 
> > >  Mind that the read-modify-write sequence that software does for sub-word 
> > > write accesses with original Alpha hardware is precisely what hardware 
> > > would have to do anyway and support for that was deliberately omitted by 
> > > the architecture designers from the ISA to give it performance advantages 
> > > quoted in the architecture manual.  The only difference here is that with 
> > > hardware read-modify-write operations atomicity for sub-word accesses is 
> > > guaranteed by the ISA, however for software read-modify-write it has to be 
> > > explicitly coded using the usual load-locked/store-conditional sequence in 
> > > a loop.  I don't think it's a big deal really, it should be trivial to do 
> > > in the relevant accessors, along with the memory barriers that are needed 
> > > anyway for EV56+ and possibly other ports such as the MIPS one.
> > 
> > There shouldn't be any memory barriers required, and don't EV56+ have
> > single-byte loads and stores?
> 
>  I should have commented on this in my original reply.
> 
>  You're the RCU expert so you know the answer.  I don't.  If it's OK for
> successive writes to get reordered, or readers to see a stale value, then 
> you don't need memory barriers.  Otherwise you do.  Whether byte accesses 
> are available or not does not matter, the CPU *will* do reordering if it's 
> allowed to (or more specifically, it won't do anything to prevent it from 
> happening, especially in SMP configurations; I can't remember offhand if 
> there are cases with UP).  Also adjacent byte writes may be merged, but I 
> suppose it does not matter, or does it?

RCU uses whichever wrapper is required.  For example, if ordering is
required, it might use smp_store_release() and smp_load_acquire().
If ordering does not matter, it might use WRITE_ONCE() and READ_ONCE().
If tearing/fusing/merging does not matter, as in there are not concurrent
accesses, it uses plain C-language loads and stores.
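
As a rough userspace analogue of those three levels, the sketch below
uses C11 atomics.  The mapping is an assumption made for illustration:
release/acquire stands in for smp_store_release()/smp_load_acquire(),
and the plain payload access stands in for a plain C-language store.
The Linux-kernel memory model is not identical to the C11 one, and the
struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct mailbox {
	uint32_t payload;	/* plain data, published via "ready" */
	atomic_bool ready;	/* flag that orders access to payload */
};

/* Writer: fill in the payload, then publish it with release ordering. */
static void publish(struct mailbox *m, uint32_t value)
{
	m->payload = value;	/* plain store: no reader looks yet */
	atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Reader: only touch the payload after an acquire load saw the flag. */
static bool consume(struct mailbox *m, uint32_t *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;	/* ordered after the flag load */
	return true;
}

/* Hypothetical single-threaded driver exercising both paths. */
static uint32_t roundtrip(uint32_t value)
{
	struct mailbox m = { .payload = 0 };
	uint32_t out = 0;

	atomic_init(&m.ready, false);
	assert(!consume(&m, &out));	/* nothing published yet */
	publish(&m, value);
	return consume(&m, &out) ? out : 0;
}
```

If the flag were written with a relaxed store instead, a weakly ordered
CPU would be free to make it visible before the payload.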

>  NB MIPS has similar architectural arrangements (and a bunch of barriers 
> defined in the ISA), it's just most implementations are actually strongly 
> ordered, so most people can't see the effects of this.  With MIPS I know 
> for sure there are cases of UP reordering, but they only really matter for 
> MMIO use cases and not regular memory.

Any given architecture is required to provide architecture-specific
implementations of the various functions that meet the requirements of
Linux-kernel memory model.  See tools/memory-model for more information.

							Thanx, Paul
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Fri, 31 May 2024, Paul E. McKenney wrote:

> >  You're the RCU expert so you know the answer.  I don't.  If it's OK for
> > successive writes to get reordered, or readers to see a stale value, then 
> > you don't need memory barriers.  Otherwise you do.  Whether byte accesses 
> > are available or not does not matter, the CPU *will* do reordering if it's 
> > allowed to (or more specifically, it won't do anything to prevent it from 
> > happening, especially in SMP configurations; I can't remember offhand if 
> > there are cases with UP).  Also adjacent byte writes may be merged, but I 
> > suppose it does not matter, or does it?
> 
> RCU uses whichever wrapper is required.  For example, if ordering is
> required, it might use smp_store_release() and smp_load_acquire().
> If ordering does not matter, it might use WRITE_ONCE() and READ_ONCE().
> If tearing/fusing/merging does not matter, as in there are not concurrent
> accesses, it uses plain C-language loads and stores.

 Fair enough.

> >  NB MIPS has similar architectural arrangements (and a bunch of barriers 
> > defined in the ISA), it's just most implementations are actually strongly 
> > ordered, so most people can't see the effects of this.  With MIPS I know 
> > for sure there are cases of UP reordering, but they only really matter for 
> > MMIO use cases and not regular memory.
> 
> Any given architecture is required to provide architecture-specific
> implementations of the various functions that meet the requirements of
> Linux-kernel memory model.  See tools/memory-model for more information.

 This is a fairly recent addition, thank you for putting it all together.  
I used to rely solely on Documentation/memory-barriers.txt.  Thanks for 
the reference.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Paul E. McKenney 1 year, 6 months ago
On Mon, Jun 03, 2024 at 05:22:22PM +0100, Maciej W. Rozycki wrote:
> On Fri, 31 May 2024, Paul E. McKenney wrote:
> 
> > >  You're the RCU expert so you know the answer.  I don't.  If it's OK for
> > > successive writes to get reordered, or readers to see a stale value, then 
> > > you don't need memory barriers.  Otherwise you do.  Whether byte accesses 
> > > are available or not does not matter, the CPU *will* do reordering if it's 
> > > allowed to (or more specifically, it won't do anything to prevent it from 
> > > happening, especially in SMP configurations; I can't remember offhand if 
> > > there are cases with UP).  Also adjacent byte writes may be merged, but I 
> > > suppose it does not matter, or does it?
> > 
> > RCU uses whichever wrapper is required.  For example, if ordering is
> > required, it might use smp_store_release() and smp_load_acquire().
> > If ordering does not matter, it might use WRITE_ONCE() and READ_ONCE().
> > If tearing/fusing/merging does not matter, as in there are not concurrent
> > accesses, it uses plain C-language loads and stores.
> 
>  Fair enough.
> 
> > >  NB MIPS has similar architectural arrangements (and a bunch of barriers 
> > > defined in the ISA), it's just most implementations are actually strongly 
> > > ordered, so most people can't see the effects of this.  With MIPS I know 
> > > for sure there are cases of UP reordering, but they only really matter for 
> > > MMIO use cases and not regular memory.
> > 
> > Any given architecture is required to provide architecture-specific
> > implementations of the various functions that meet the requirements of
> > Linux-kernel memory model.  See tools/memory-model for more information.
> 
>  This is a fairly recent addition, thank you for putting it all together.  
> I used to rely solely on Documentation/memory-barriers.txt.  Thanks for 
> the reference.

It has been in the kernel since April 2018, but OK.  And a big "thank you"
to all the people who made this possible and who continue contributing
to it.  And Documentation/memory-barriers.txt still matters, though the
long-term goal is for it to be subsumed into tools/memory-model.  Things
like compiler optimizations make this difficult, but not impossible.

Another precaution is to ensure that any constraints of a non-common-case
architecture be tested for.  For example, if I add a 64-bit divide, I
get yelled at promptly.  In contrast, that long list of byte accesses
that Arnd posted were suffered in silence.  So they accumulated well
past the point where they can reasonably be backed out.

							Thanx, Paul
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 5 months ago
On Mon, 3 Jun 2024, Paul E. McKenney wrote:

> >  This is a fairly recent addition, thank you for putting it all together.  
> > I used to rely solely on Documentation/memory-barriers.txt.  Thanks for 
> > the reference.
> 
> It has been in the kernel since April 2018, but OK.  And a big "thank you"

 When you've been around for 25+ years, 5 years back seems like yesterday.

> to all the people who made this possible and who continue contributing
> to it.  And Documentation/memory-barriers.txt still matters, though the
> long-term goal is for it to be subsumed into tools/memory-model.  Things
> like compiler optimizations make this difficult, but not impossible.

 I realise these are tough matters and I second your gratitude.

> Another precaution is to ensure that any constraints of a non-common-case
> architecture be tested for.  For example, if I add a 64-bit divide, I
> get yelled at promptly.  In contrast, that long list of byte accesses
> that Arnd posted were suffered in silence.  So they accumulated well
> past the point where they can reasonably be backed out.

 Well, it's easy to notice and yell when you get an unresolved link-time 
reference to __divdi3 or suchlike, while such heisenbugs as those caused 
by the race condition from concurrent unprotected rmw accesses may all 
too easily be blamed on cosmic rays or any other random instability.

 Take for example the GCC bug I mentioned in my reply to Linus in this 
thread, GCC PR rtl-optimization/115565.  It took 20 years to spot, even 
though it's in heavily used code and it does not depend on timing: with 
the right conditions it will trigger every time.

 If I were aware of these issues, I would definitely have got at them 
sooner.  Anyway, as mentioned in the other reply, I've overcome system 
setup issues now and will be working on the problem discussed here.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Maciej W. Rozycki 1 year, 6 months ago
On Wed, 29 May 2024, Paul E. McKenney wrote:

> >  What I have been after actually is: can you point me at a piece of code 
> > in our tree that will cause an issue with a non-BWX Alpha as described in 
> > your scenario, so that I have a starting point?  Mind that I'm completely 
> > new to RCU as I didn't have a need to research it before (though from a 
> > skim over Documentation/RCU/rcu.rst I understand what its objective is).
> 
> See the uses of the fields of the current->rcu_read_unlock_special.b
> anonymous structure for the example that led us here.  And who knows how
> many other pieces of the Linux kernel assume that it is possible
> to atomically store a single byte.

 Thanks, that helps.

> Many of which use a normal C-language store, in which case there are
> no accessors.  This can be a problem even in the case where there are no
> data races to either byte, because the need for the read-modify-write
> sequence on older Alpha systems results in implicit data races at the
> machine-word level.

 Ack.

> >  FWIW even if it was only me I think that depriving the already thin Alpha 
> > port developer base of any quantity of the remaining manpower, by dropping 
> > support for a subset of the hardware available, and then a subset that is 
> > not just as exotic as the original i386 became to the x86 platform at the 
> > time support for it was dropped, is only going to lead to further demise 
> > and eventual drop of the entire port.
> 
> Yes, support has been dropped for some of the older x86 CPUs as well,
> for example, Linux-kernel support for multiprocessor 80386 systems was
> dropped a great many years ago, in part because those CPUs do not have
> a cmpxchg instruction.  So it is not like we are picking on Alpha.

 That's what I mentioned (and for the record i386 wasn't dropped for the 
lack of CMPXCHG, as we never supported i386 SMP, exceedingly rare, anyway, 
but for the lack of page-level write-protection in the kernel mode, which 
implied painful manual checks).  At the time our support for the i386 was 
dropped its population outside embedded use was minuscule, and certainly 
so compared to the non-i386 x86 Linux user base.  And the supply of modern x86 
systems was not an issue either.

 Conversely no new Alpha systems are made and I suspect the ratio between 
BWX and non-BWX Alpha Linux users is not as high as between post-i386 x86 
and original i386 Linux users at the time of the drop.

> >  And I think it would be good if we kept the port, just as we keep other 
> > ports of historical significance only, for educational reasons if nothing 
> > else, such as to let people understand based on an actual example, once 
> > mainstream, the implications of weakly ordered memory systems.
> 
> I don't know of any remaining issues with the newer Alpha systems that do
> support single-byte and double-byte load and store instructions, and so
> I am not aware of any reason for dropping Linux-kernel support for them.

 Well, the lack of developers to maintain the port would be the reason I 
refer to.  If you let developers drop by preventing them from using their 
hardware to work on the port, then eventually we'll have none.

 Anyway it seems like an issue to be sorted in the compiler, transparently 
to RCU, so it shouldn't be a reason to drop support for non-BWX Alpha CPUs 
anymore.  See my reply to Linus in this thread.

 Thank you for your input, always appreciated.

  Maciej
Re: [PATCH 00/14] alpha: cleanups for 6.10
Posted by Paul E. McKenney 1 year, 7 months ago
On Fri, May 03, 2024 at 06:53:45PM +0200, John Paul Adrian Glaubitz wrote:
> Hello Arnd,
> 
> On Fri, 2024-05-03 at 10:11 +0200, Arnd Bergmann wrote:
> > I had investigated dropping support for alpha EV5 and earlier a while
> > ago after noticing that this is the only supported CPU family
> > in the kernel without native byte access and that Debian has already
> > dropped support for this generation last year [1] after it turned
> > out to be broken.
> 
> That's not quite correct. Support for older Alphas is not broken and
> always worked when I tested it. It's just that some people wanted to
> raise the baseline in order to improve code performance on newer machines
> with the hope to fix some minor issues we saw on Alpha here and there.
> 
> > This topic came up again when Paul E. McKenney noticed that
> > parts of the RCU code already rely on byte access and do not
> > work on alpha EV5 reliably, so I refreshed my series now for
> > inclusion into the next merge window.
> 
> Hrrrm? That sounds like Paul ran tests on EV5, did he?

Arnd does say "noticed", not "tested".  No Alpha CPUs here, and I don't
run Alpha emulators.  There is only so much time in each day and only
so much budget for electricity.  ;-)

For the series: Acked-by: Paul E. McKenney <paulmck@kernel.org>

> > Al Viro did another series for alpha to address all the known build
> > issues. I rebased his patches without any further changes and included
> > it as a baseline for my work here to avoid conflicts.
> 
> It's somewhat strange that Al improves code on the older machines only
> to be axed by your series. I would prefer such removals to be aimed at an
> LTS release, if possible.

Once they are in mainline, you are within your rights to send Al's
code-improvement patches to -stable, which should get them to the LTS
releases.  It might well be that Arnd was planning to do just that.

							Thanx, Paul

> Adrian
> 
> -- 
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913