[PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional

Ingo Molnar posted 15 patches 9 months, 2 weeks ago
[PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Ingo Molnar 9 months, 2 weeks ago
I.e. it's a boot time requirement for the CPU to support it.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Kconfig             | 2 +-
 arch/x86/Kconfig.cpu         | 3 +--
 arch/x86/Kconfig.cpufeatures | 1 -
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 465e5abd2750..a9d717558972 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -136,7 +136,7 @@ config X86
 	select ARCH_SUPPORTS_AUTOFDO_CLANG
 	select ARCH_SUPPORTS_PROPELLER_CLANG    if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
-	select ARCH_USE_CMPXCHG_LOCKREF		if X86_CX8
+	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_USE_MEMTEST
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 30466a258db8..6f1e8cc8fe58 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -245,7 +245,6 @@ config X86_HAVE_PAE
 
 config X86_CX8
 	def_bool y
-	depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
@@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
 	int
 	default "64" if X86_64
 	default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
-	default "5" if X86_32 && X86_CX8
+	default "5" if X86_32
 	default "4"
 
 config X86_DEBUGCTLMSR
diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index cd551818f451..f04ae53435bc 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -42,7 +42,6 @@ config X86_REQUIRED_FEATURE_NOPL
 
 config X86_REQUIRED_FEATURE_CX8
 	def_bool y
-	depends on X86_CX8
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
-- 
2.45.2
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Arnd Bergmann 9 months, 2 weeks ago
On Fri, Apr 25, 2025, at 10:42, Ingo Molnar wrote:
> @@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
>  	int
>  	default "64" if X86_64
>  	default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || 
> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
> -	default "5" if X86_32 && X86_CX8
> +	default "5" if X86_32
>  	default "4"
> 

I just noticed this one: the final 'default "4"' is no longer possible
here and can be removed. All the remaining CPUs report family "5" or
higher.

There is an old issue for some rare CPUs (Geode LX and Crusoe) that
support CMOV but report family=5. These fail to boot a kernel with
X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.

     Arnd
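
To illustrate why the final default can no longer be reached, here is a
minimal Python sketch of Kconfig's first-match rule for "default" lines.
The helper name minimum_cpu_family and its set-of-symbols input are
assumptions made for this example, not kernel code; the symbol list is
copied from the hunk quoted above.

```python
# Symbols that select family "6" in the quoted hunk.
FAMILY6_CPUS = {
    "MPENTIUM4", "MPENTIUMM", "MPENTIUMIII", "MPENTIUMII", "M686",
    "MVIAC3_2", "MVIAC7", "MEFFICEON", "MATOM", "MK7",
}


def minimum_cpu_family(enabled):
    """Return the first default whose condition holds, as Kconfig does.

    `enabled` is a set of enabled symbol names (hypothetical input format).
    """
    x86_64 = "X86_64" in enabled
    x86_32 = not x86_64  # assumption: an x86 build is exactly one of the two
    defaults = [
        ("64", x86_64),
        ("6", x86_32 and bool(FAMILY6_CPUS & enabled)),
        ("5", x86_32),   # after the patch; previously X86_32 && X86_CX8
        ("4", True),     # unconditional fallback
    ]
    for value, condition in defaults:
        if condition:
            return value


print(minimum_cpu_family({"M586TSC"}))
```

Under these assumptions either X86_64 or X86_32 is always set, so one of
the first three lines always matches and the "4" entry is dead.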
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Ingo Molnar 9 months, 2 weeks ago
* Arnd Bergmann <arnd@kernel.org> wrote:

> On Fri, Apr 25, 2025, at 10:42, Ingo Molnar wrote:
> > @@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
> >  	int
> >  	default "64" if X86_64
> >  	default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || 
> > MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
> > -	default "5" if X86_32 && X86_CX8
> > +	default "5" if X86_32
> >  	default "4"
> > 
> 
> I just noticed this one: the final 'default "4"' is no longer 
> possible here and can be removed. All the remaining CPUs report 
> family "5" or higher.

Right, I've applied the fix below and backmerged it into the series.

Thanks,

	Ingo

==========================>
 arch/x86/Kconfig.cpu | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 6f1e8cc8fe58..b3772d384fa0 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -257,7 +257,6 @@ config X86_MINIMUM_CPU_FAMILY
 	default "64" if X86_64
 	default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
 	default "5" if X86_32
-	default "4"
 
 config X86_DEBUGCTLMSR
 	def_bool y
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by H. Peter Anvin 9 months, 2 weeks ago
On April 25, 2025 5:10:27 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
>On Fri, Apr 25, 2025, at 10:42, Ingo Molnar wrote:
>> @@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
>>  	int
>>  	default "64" if X86_64
>>  	default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || 
>> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
>> -	default "5" if X86_32 && X86_CX8
>> +	default "5" if X86_32
>>  	default "4"
>> 
>
>I just noticed this one: the final 'default "4"' is no longer possible
>here and can be removed. All the remaining CPUs report family "5" or
>higher.
>
>There is an old issue for some rare CPUs (Geode LX and Crusoe) that
>support CMOV but report family=5. These fail to boot a kernel with
>X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
>
>     Arnd

They report family=5 because family=6 implies fcomi and nopl support (in the case of Crusoe, they have fcomi but didn't support nopl.)
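
For illustration, the family a CPU reports can be decoded from CPUID
leaf 1 EAX, and a minimum-family check then reduces to an integer
comparison. This is a Python sketch, not the kernel's actual boot code;
the function names are made up for this example and 0x5A2 is used only
as a sample family 5 signature.

```python
def cpuid_family(eax):
    """Decode the display family from CPUID leaf 1 EAX."""
    base = (eax >> 8) & 0xF
    extended = (eax >> 20) & 0xFF
    # The extended family field only counts when the base family is 0xF.
    return base + extended if base == 0xF else base


def passes_family_check(eax, minimum_family):
    """Hypothetical stand-in for a boot-time minimum family check."""
    return cpuid_family(eax) >= minimum_family


sample_signature = 0x5A2  # example value: family 5, model 10, stepping 2
print(cpuid_family(sample_signature))
print(passes_family_check(sample_signature, 6))
```

A CPU with such a signature would be rejected by a family 6 minimum
regardless of which individual features (such as CMOV) it implements.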
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Arnd Bergmann 9 months, 2 weeks ago
On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
> On April 25, 2025 5:10:27 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
>>
>>I just noticed this one: the final 'default "4"' is no longer possible
>>here and can be removed. All the remaining CPUs report family "5" or
>>higher.
>>
>>There is an old issue for some rare CPUs (Geode LX and Crusoe) that
>>support CMOV but report family=5. These fail to boot a kernel with
>>X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
>>
>
> They report family=5 because family=6 implies fcomi and nopl support 
> (in the case of Crusoe, they have fcomi but didn't support nopl.)

Ah right, I see now. I had only checked that the kernel itself
no longer uses nopl after your ba0593bf553c ("x86: completely
disable NOPL on 32 bits"), and I had seen that Debian intentionally
builds 32-bit i686 kernels with CONFIG_MGEODEGX1.

I now found that both Debian 12 and gcc 11 changed their definition
of 686 to actually require nopl for indirect branch tracking
(-fcf-protection) in user space, as discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713

So even if it makes sense for a GeodeLX-specific kernel to use CMOV,
any general-purpose i686 distro would still want to enable IBT
in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.

     Arnd
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Ingo Molnar 9 months, 2 weeks ago
* Arnd Bergmann <arnd@kernel.org> wrote:

> On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
> > On April 25, 2025 5:10:27 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
> >>
> >>I just noticed this one: the final 'default "4"' is no longer possible
> >>here and can be removed. All the remaining CPUs report family "5" or
> >>higher.
> >>
> >>There is an old issue for some rare CPUs (Geode LX and Crusoe) that
> >>support CMOV but report family=5. These fail to boot a kernel with
> >>X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
> >>
> >
> > They report family=5 because family=6 implies fcomi and nopl support 
> > (in the case of Crusoe, they have fcomi but didn't support nopl.)
> 
> Ah right, I see now. I had only checked that the kernel itself
> no longer uses nopl after your ba0593bf553c ("x86: completely
> disable NOPL on 32 bits"), and I had seen that Debian intentionally
> builds 32-bit i686 kernels with CONFIG_MGEODEGX1.
> 
> I now found that both Debian 12 and gcc 11 changed their definition
> of 686 to actually require nopl for Indirect branch tracking 
> (-fcf-protection) in user space, as discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713
> 
> So even if it makes sense for GeodeLX specific kernel to use CMOV,
> any general-purpose i686 distro would still want to enable IBT
> in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.

And the kernel Debian 12 uses is a "686" one:

  ./pool/main/l/linux-signed-i386/linux-image-6.1.0-32-686_6.1.129-1_i386.deb
  ./pool/main/l/linux-signed-i386/linux-image-686_6.1.129-1_i386.deb

and the kernel is set to CONFIG_MGEODE_LX=y:

  $ grep CONFIG_MGEODE_LX ./boot/config-6.1.0-32-686
  CONFIG_MGEODE_LX=y

... which CPU has CMOV support:

  config X86_CMOV
        def_bool y
        depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || MATOM || MGEODE_LX || X86_64)                                                                                                                                                                                                                                           ^^^^^^^^^
So I'd argue that the kernel's x86-32 CPU support cutoff should match 
the i386 CPU support cutoff of the Debian i386 installer.

Survey of other distros:

 - Fedora dropped x86-32 with Fedora 31, almost 5 years ago.

 - Ubuntu dropped x86-32 after 18.04 LTS, more than 5 years ago. The LTS 
   kernel is v4.15 based.

 - Arch Linux dropped i686 support even earlier than that, the 
   spin-off-community project of archlinux32.org has 486 and 686 
   variants. 686 variant includes CMOV.

 - Gentoo has an 'x86' variant with 486 and 686 stages. 686 stage 
   includes CMOV.

I.e. I think we can also make CMOV a hard requirement, and keep support 
for all family 5 CPUs that have CMOV and have a chance to boot current 
32-bit distros. Even distros that had 486 builds have 686 variants that 
should still work.

I.e. remove support for M586MMX, M586TSC, MCYRIXIII, MGEODEGX1 and MK6 
as well, these don't have CMOV support and won't even boot i386 Debian 
12.

Summary, the plan would be to remove support for the following pre-CMOV 
CPUs (the ones not yet in this series are marked 'NEW'):

  M486
  M486SX
  M586
  M586MMX         # NEW
  M586TSC         # NEW
  MCYRIXIII       # NEW
  MELAN
  MGEODEGX1       # NEW
  MK6             # NEW
  MWINCHIP3D
  MWINCHIPC6

And to keep these:

  M686
  MATOM
  MCRUSOE
  MEFFICEON
  MGEODE_LX
  MK7
  MPENTIUM4
  MPENTIUMII
  MPENTIUMIII
  MPENTIUMM
  MVIAC3_2
  MVIAC7
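
As a cross-check, the "keep" list can be compared against the X86_CMOV
dependency list quoted earlier in this message. The following Python
sketch does that with plain set arithmetic; the variable names are
illustrative, the symbol names are copied from the lists above.

```python
# Symbols from the "depends on" line of config X86_CMOV.
X86_CMOV_DEPENDS = {
    "MK7", "MPENTIUM4", "MPENTIUMM", "MPENTIUMIII", "MPENTIUMII", "M686",
    "MVIAC3_2", "MVIAC7", "MCRUSOE", "MEFFICEON", "MATOM", "MGEODE_LX",
    "X86_64",
}

# CPU options proposed for removal and for keeping.
REMOVE = {
    "M486", "M486SX", "M586", "M586MMX", "M586TSC", "MCYRIXIII", "MELAN",
    "MGEODEGX1", "MK6", "MWINCHIP3D", "MWINCHIPC6",
}
KEEP = {
    "M686", "MATOM", "MCRUSOE", "MEFFICEON", "MGEODE_LX", "MK7",
    "MPENTIUM4", "MPENTIUMII", "MPENTIUMIII", "MPENTIUMM", "MVIAC3_2",
    "MVIAC7",
}

# 32-bit CMOV-capable options that are not in the keep list (expected: none).
missing = (X86_CMOV_DEPENDS - {"X86_64"}) - KEEP
# Kept options that X86_CMOV does not list (expected: none).
extra = KEEP - X86_CMOV_DEPENDS
# Options slated for removal that X86_CMOV lists (expected: none).
overlap = REMOVE & X86_CMOV_DEPENDS

print(missing, extra, overlap)
```

All three sets come out empty, i.e. the keep list is exactly the 32-bit
part of the X86_CMOV dependency list.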

Thanks,

	Ingo
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Arnd Bergmann 9 months, 2 weeks ago
On Sun, Apr 27, 2025, at 11:25, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@kernel.org> wrote:
>> On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
>> 
>> I now found that both Debian 12 and gcc 11 changed their definition
>> of 686 to actually require nopl for Indirect branch tracking 
>> (-fcf-protection) in user space, as discussed in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713
>> 
>> So even if it makes sense for GeodeLX specific kernel to use CMOV,
>> any general-purpose i686 distro would still want to enable IBT
>> in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.
>
> And the kernel Debian 12 uses is a "686" one:
>
>   ./pool/main/l/linux-signed-i386/linux-image-6.1.0-32-686_6.1.129-1_i386.deb
>   ./pool/main/l/linux-signed-i386/linux-image-686_6.1.129-1_i386.deb
>
> and the kernel is set to CONFIG_MGEODE_LX=y:
>
>   $ grep CONFIG_MGEODE_LX ./boot/config-6.1.0-32-686
>   CONFIG_MGEODE_LX=y
>
> ... which CPU has CMOV support:
>
>   config X86_CMOV
>         def_bool y
>         depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON ||
> MATOM || MGEODE_LX || X86_64)
>          ^^^^^^^^^
>
> So I'd argue that the kernel's x86-32 CPU support cutoff should match 
> the i386 CPU support cutoff of the Debian i386 installer.

I think this misses a few other bits of information, some of which
we already mentioned in this thread:

- Debian 13 no longer has any 32-bit kernel, so debian-i686 is
  primarily targeted at running on 64-bit kernels for memory
  constrained environments.

- Debian 12 started requiring NOPL in userspace, which is not
  supported on GeodeLX (or Crusoe), the kernel option should have
  been changed to M686 instead but was accidentally left at
  MGEODE_LX, so the kernel still works, but userspace doesn't.

- Anyone running Linux on an i586 machine likely already wants
  a custom kernel, regardless of what the distros provide. This
  is especially true for the embedded targets.

> Survey of other distros:
>
>  - Fedora dropped x86-32 with Fedora 31, almost 5 years ago.
>
>  - Ubuntu dropped x86-32 after 18.04 LTS, more than 5 years ago. The LTS 
>    kernel is v4.15 based.
>
>  - Arch Linux dropped i686 support even earlier than that, the 
>    spin-off-community project of archlinux32.org has 486 and 686 
>    variants. 686 variant includes CMOV.
>
>  - Gentoo has an 'x86' variant with 486 and 686 stages. 686 stage 
>    includes CMOV.
>
> I.e. I think we can also make CMOV a hard requirement, and keep support 
> for all family 5 CPUs that have CMOV and have a chance to boot current 
> 32-bit distros. Even distros that had 486 builds have 686 variants that 
> should still work.
>
> I.e. remove support for M586MMX, M586TSC, MCYRIXIII, MGEODEGX1 and MK6 
> as well, these don't have CMOV support and won't even boot i386 Debian 
> 12.
>
> Summary, the plan would be to remove support for the following pre-CMOV 
> CPUs (the ones not yet in this series are marked 'NEW'):
>
>   M486
>   M486SX
>   M586
>   M586MMX         # NEW
>   M586TSC         # NEW
>   MCYRIXIII       # NEW
>   MELAN
>   MGEODEGX1       # NEW
>   MK6             # NEW
>   MWINCHIP3D
>   MWINCHIPC6

This would also mean dropping support for the pre-2015 Intel Quark
and DM&P Vortex86DX/DX2/MX/EX that never had a custom CONFIG_Mxxxx
option but are still relevant to some degree.
I think that would be a mistake. 

> And to keep these:
>
>   M686
>   MATOM
>   MCRUSOE
>   MEFFICEON
>   MGEODE_LX
>   MK7
>   MPENTIUM4
>   MPENTIUMII
>   MPENTIUMIII
>   MPENTIUMM
>   MVIAC3_2
>   MVIAC7

As Linus said, overall they are barely different from the
first group, and they are just as obsolete, only Atom and
Vortex86DX3/EmKore are less than 20 years old.

Here are some alternatives I like better than dropping i586:

a) keep my patch with a new bool option to pick between
   i586 and i686 targets, by any name.

b) always build with -march=i586 and leave only the -mtune
   flags; see if anyone cares enough to even benchmark
   and pick one of the other options if they can show
   a meaningful regression over -march=i686 -mtune=

c) keep the outcome of your v1 series, dropping only
   pre-i586 support, and leave my patch out. No change here,
   so at least no regression potential.

d) use -march=i686 (plus -mtune=) for normal builds, but
   keep support for the older cores guarded by
   X86_EXTENDED_PLATFORM or CONFIG_EXPERT, use -march=i586
   if at least one of those platforms is selected.

      Arnd
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Ingo Molnar 9 months, 2 weeks ago
* Arnd Bergmann <arnd@kernel.org> wrote:

> >   M486
> >   M486SX
> >   M586
> >   M586MMX         # NEW
> >   M586TSC         # NEW
> >   MCYRIXIII       # NEW
> >   MELAN
> >   MGEODEGX1       # NEW
> >   MK6             # NEW
> >   MWINCHIP3D
> >   MWINCHIPC6
> 
> This would also mean dropping support for the pre-2015 Intel Quark
> and DM&P Vortex86DX/DX2/MX/EX that never had a custom CONFIG_Mxxxx
> option but are still relevant to some degree.
> I think that would be a mistake. 

Yeah, agreed, and especially with the <asm/bitops.h> CMOV complication 
removed per Linus's patch, we could actually remove CONFIG_X86_CMOV, as 
nothing uses it anymore:

  starship:~/mingo.tip.git> git grep X86_CMOV
  arch/x86/Kconfig.cpu:config X86_CMOV
  arch/x86/Kconfig.cpufeatures:   depends on X86_CMOV

The CMOV dependency comes in through compiler options only: 

  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M586TSC)	+= -march=i586
  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M586MMX)	+= -march=pentium-mmx
  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-option,-march=c3,-march=i486) $(align)
  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_MGEODEGX1)	+= -march=pentium-mmx
  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)

These build options will, indirectly, not include CMOV in the kernel 
image, while i686 and higher march options will.

(BTW., we should probably remove the -march=i486 fallback for 
MCYRIXIII, our minimum CC version is beyond that already I believe.)
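
The cc-option calls in the lines above pick the first flag if the
compiler accepts it and the second one otherwise. A rough Python model
of that selection, assuming a hypothetical set of flags the compiler
supports (cc_option and march_flag are names invented for this sketch,
and the $(align) part of the MCYRIXIII line is left out):

```python
def cc_option(preferred, fallback, supported):
    """Model of $(call cc-option,preferred,fallback)."""
    return preferred if preferred in supported else fallback


def march_flag(cpu, supported):
    """Return the -march flag for the CPU options listed above."""
    table = {
        "M586TSC": "-march=i586",
        "M586MMX": "-march=pentium-mmx",
        "MCYRIXIII": cc_option("-march=c3", "-march=i486", supported),
        "MGEODEGX1": "-march=pentium-mmx",
        "MGEODE_LX": cc_option("-march=geode", "-march=pentium-mmx",
                               supported),
    }
    return table[cpu]


# A compiler that knows both newer flags never takes a fallback.
modern = {"-march=c3", "-march=geode"}
print(march_flag("MCYRIXIII", modern))
print(march_flag("MGEODE_LX", set()))
```

With a compiler that accepts -march=c3 the i486 fallback is never
selected, which is the case the BTW above is about.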

Anyway, the current plan is to not drop common-i586, only the removal 
of what's in the -v1 series:

   M486
   M486SX
   M586
   MELAN
   MWINCHIP3D
   MWINCHIPC6

> Here are some alternatives I like better than dropping i586:
> 
> a) keep my patch with a new bool option to pick between
>    i586 and i686 targets, by any name.
> 
> b) always build with -march=i586 and leave only the -mtune
>    flags; see if anyone cares enough to even benchmark
>    and pick one of the other options if they can show
>    a meaningful regression over -march=i686 -mtune=

That's actually a good idea IMO. I looked at the code generation with 
current compilers and it turns out that M686 is *substantially* worse 
in code generation than M586, as apparently the extra CMOV instructions 
bloat up the generated code:

      text	   data	    bss	    dec	    hex	filename
  15427023	7601010	1744896	24772929	17a0141	vmlinux.M586
  16578295	7598826	1744896	25922017	18b89e1	vmlinux.M686

 - +7.5% increase in text size (5.6% according to bloatometer),
 - +2% increase in instruction count,
 - while number of branches increases by +1.3%.
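
The text-size percentage follows directly from the size output above.
A small Python sketch of the arithmetic (variable and function names
are illustrative):

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# text, data, bss, dec columns copied from the size output above.
m586 = {"text": 15427023, "data": 7601010, "bss": 1744896, "dec": 24772929}
m686 = {"text": 16578295, "data": 7598826, "bss": 1744896, "dec": 25922017}

# Sanity check: dec is the sum of the three section sizes.
for build in (m586, m686):
    assert build["text"] + build["data"] + build["bss"] == build["dec"]

text_growth = pct_change(m586["text"], m686["text"])
print(f"{text_growth:+.1f}%")  # prints +7.5%
```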

But it's not about CMOV: I checked about a dozen functions that end up 
using CMOV, and the 'conditional' part of CMOV does seem to reduce 
branches for those functions by a minor degree and ends up reducing 
their size as well. So CMOV helps, a bit.

The substantial code bloat comes from some other aspect of GCC's 
march=i686 flag ... I bet it's primarily inlining: there's a 0.7% 
reduction in number of calls done.

I have a hard time believing that this kind of bloat and complexity 
helps performance to *any* degree.

I really didn't remember how bad it was, until I re-measured it.

CMOV is likely a drop in the ocean compared to this kind of text bloat. 
And yeah, it doesn't really matter that i686 class CPUs have larger 
caches, the kernel is dominantly cache-cold code execution, inlining 
driven bloat almost never helps performance.

> c) keep the outcome of your v1 series, dropping only
>    pre-i586 support, and leave my patch out. No change here,
>    so at least no regression potential.

Yeah, so this is roughly the current plan, with perhaps light touchups 
on top to make it easier to configure, and to remove residual legacies.

Thanks,

	Ingo
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Arnd Bergmann 9 months, 2 weeks ago
On Mon, Apr 28, 2025, at 11:16, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@kernel.org> wrote:
>> 
>> b) always build with -march=i586 and leave only the -mtune
>>    flags; see if anyone cares enough to even benchmark
>>    and pick one of the other options if they can show
>>    a meaningful regression over -march=i686 -mtune=
>
> That's actually a good idea IMO. I looked at the code generation with 
> current compilers and it turns out that M686 is *substantially* worse 
> in code generation than M586, as apparently the extra CMOV instructions 
> bloat up the generated code:
>
>       text	   data	    bss	    dec	    hex	filename
>   15427023	7601010	1744896	24772929	17a0141	vmlinux.M586
>   16578295	7598826	1744896	25922017	18b89e1	vmlinux.M686
>
>  - +7.5% increase in text size (5.6% according to bloatometer),
>  - +2% increase in instruction count,
>  - while number of branches increases by +1.3%.
>
> But it's not about CMOV: I checked about a dozen functions that end up 
> using CMOV, and the 'conditional' part of CMOV does seem to reduce 
> branches for those functions by a minor degree and ends up reducing 
> their size as well. So CMOV helps, a bit.
>
> The substantial code bloat comes from some other aspect of GCC's 
> march=i686 flag ... I bet it's primarily inlining: there's a 0.7% 
> reduction in number of calls done.

I had tried the same thing already, but saw a different result.
For me, the i686 output is 0.2% smaller than the i586 one (both
-mtune=generic) using gcc-14.2, or just 0.1% smaller with clang-21,
which is roughly what I expected:

   text	   data	    bss	    dec	    hex	filename
7454055	4158218	1695744	13308017	 cb1071	build/tmp/vmlinux-i586
7433427	4154146	1695744	13283317	 caaff5	build/tmp/vmlinux-i686
7318514	4052573	1687552	13058639	 c7424f	build/tmp/vmlinux-i586-clang
7309938	4052573	1687552	13050063	 c720cf	build/tmp/vmlinux-i686-clang

I do see a larger difference compared to other -mtune= options, here is
the same config with "clang-21 -march=i586 -mtune=i686" instead of
"-march=i586 -mtune=generic":

7254510	4056669	1687552	12998731	 c6584b	build/tmp/vmlinux

There is a good chance that the -mtune= optimizations totally
dwarf cmov not just in code size difference but also actual
performance, the bit I'm unsure about is whether we still need
to worry about any core where this is not the case (I'm guessing
not but have no way to prove that).

      Arnd
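
For reference, the relative differences can be recomputed from the size
output above. The Python sketch below parses the four-build table and
compares the totals; taking the dec column (an assumption about which
column the quoted percentages refer to) gives roughly the 0.2% and 0.1%
figures. The helper names are illustrative.

```python
SIZE_OUTPUT = """
   text    data     bss     dec     hex filename
7454055 4158218 1695744 13308017 cb1071 build/tmp/vmlinux-i586
7433427 4154146 1695744 13283317 caaff5 build/tmp/vmlinux-i686
7318514 4052573 1687552 13058639 c7424f build/tmp/vmlinux-i586-clang
7309938 4052573 1687552 13050063 c720cf build/tmp/vmlinux-i686-clang
"""


def parse_size(output):
    """Parse `size`-style output into {filename: {column: int}}."""
    rows = {}
    for line in output.strip().splitlines()[1:]:  # skip the header line
        text, data, bss, dec, hexval, name = line.split()
        rows[name] = {
            "text": int(text), "data": int(data), "bss": int(bss),
            "dec": int(dec), "hex": int(hexval, 16),
        }
    return rows


def pct_change(old, new):
    return (new - old) / old * 100.0


rows = parse_size(SIZE_OUTPUT)
# Consistency check: dec and hex are both the sum of the sections.
for row in rows.values():
    assert row["text"] + row["data"] + row["bss"] == row["dec"] == row["hex"]

gcc_delta = pct_change(rows["build/tmp/vmlinux-i586"]["dec"],
                       rows["build/tmp/vmlinux-i686"]["dec"])
clang_delta = pct_change(rows["build/tmp/vmlinux-i586-clang"]["dec"],
                         rows["build/tmp/vmlinux-i686-clang"]["dec"])
print(f"gcc: {gcc_delta:+.1f}%  clang: {clang_delta:+.1f}%")
```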
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Ingo Molnar 9 months, 2 weeks ago
* Arnd Bergmann <arnd@kernel.org> wrote:

> On Mon, Apr 28, 2025, at 11:16, Ingo Molnar wrote:
> > * Arnd Bergmann <arnd@kernel.org> wrote:
> >> 
> >> b) always build with -march=i586 and leave only the -mtune
> >>    flags; see if anyone cares enough to even benchmark
> >>    and pick one of the other options if they can show
> >>    a meaningful regression over -march=i686 -mtune=
> >
> > That's actually a good idea IMO. I looked at the code generation with 
> > current compilers and it turns out that M686 is *substantially* worse 
> > in code generation than M586, as apparently the extra CMOV instructions 
> > bloat up the generated code:
> >
> >       text	   data	    bss	    dec	    hex	filename
> >   15427023	7601010	1744896	24772929	17a0141	vmlinux.M586
> >   16578295	7598826	1744896	25922017	18b89e1	vmlinux.M686
> >
> >  - +7.5% increase in text size (5.6% according to bloatometer),
> >  - +2% increase in instruction count,
> >  - while number of branches increases by +1.3%.
> >
> > But it's not about CMOV: I checked about a dozen functions that end up 
> > using CMOV, and the 'conditional' part of CMOV does seem to reduce 
> > branches for those functions by a minor degree and ends up reducing 
> > their size as well. So CMOV helps, a bit.
> >
> > The substantial code bloat comes from some other aspect of GCC's 
> > march=i686 flag ... I bet it's primarily inlining: there's a 0.7% 
> > reduction in number of calls done.
> 
> I had tried the same thing already, but saw a different result,

Just to clarify, my measurements only compare -march=i586 to 
-march=i686, not -mtune. Your results are primarily -mtune figures.

So unless you see something different from my figures with -march only, 
it's an apples to oranges comparison.

> There is a good chance that the -mtune= optimizations totally dwarf 
> cmov not just in code size difference but also actual performance, 
> the bit I'm unsure about is whether we still need to worry about any 
> core where this is not the case (I'm guessing not but have no way to 
> prove that).

I didn't use -mtune - I only tested two Kconfig variants:

  CONFIG_M686=y vs. CONFIG_M586TSC=y

... which map to two -march flags, not different -mtune flags:

  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M586TSC)       += -march=i586
  ...
  arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M686)          += -march=i686

This is the current upstream status quo of x86-32 compiler flags, which 
results in significant .text bloat:

      text         data     bss     dec     hex filename
  15427023      7601010 1744896 24772929        17a0141 vmlinux.M586
  16578295      7598826 1744896 25922017        18b89e1 vmlinux.M686

 - +7.5% increase in text size (+5.6% according to bloatometer),
 - +2% increase in instruction count,
 - the number of branches increases by +1.3%,
 - while there's a -0.7% reduction in number of CALLs done.

I believe this is mostly the result of increased amount of inlining GCC 
14.2.0 does on march=i686 vs. march=i586.

The extra CMOV use on -march=i686 helps a bit but is overwhelmed by the 
effects of inlining.

Obviously these metrics cannot be automatically transformed into 
performance figures, but such inlining driven bloat almost always 
reduces the kernel's performance even on CPUs with large caches, for 
all but a few select 'hot' functions.

An interesting 'modern' twist: the reduced number of CALLs due to 
increased inlining is almost certainly reflected in a reduced number of 
CALLs in real workloads as well, which would be a disproportionately 
positive factor on x86 kernels and CPUs with retbleed-style mitigations 
activated (which is almost all of them).

Thanks,

	Ingo
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by Arnd Bergmann 9 months, 2 weeks ago
On Tue, Apr 29, 2025, at 12:22, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@kernel.org> wrote:
>
> This is the current upstream status quo of x86-32 compiler flags, which 
> results in significant .text bloat:
>
>       text         data     bss     dec     hex filename
>   15427023      7601010 1744896 24772929        17a0141 vmlinux.M586
>   16578295      7598826 1744896 25922017        18b89e1 vmlinux.M686

>  - +7.5% increase in text size (+5.6% according to bloatometer),
>  - +2% increase in instruction count,
>  - the number of branches increases by +1.3%,
>  - while there's a -0.7% reduction in number of CALLs done.
>
> I believe this is mostly the result of increased amount of inlining GCC 
> 14.2.0 does on march=i686 vs. march=i586.

I can reproduce +7% numbers like the ones you have shown when
CONFIG_X86_GENERIC is disabled, but not if I turn that on,
or with my "[RFC] x86/cpu: rework instruction set selection"
patch applied.

What makes this confusing is that the -march=i686 option does
two things: it changes the allowed instructions to include cmov,
and it changes the implicit -mtune= argument to the same value,
unless you pass an explicit -mtune= as well.

Selecting the i686 instruction set by itself does not change
the amount of inlining at all; you can see that by comparing the
i586 and i686 output when CONFIG_X86_GENERIC=y is set, or if you
change the flags in the Makefile.

What really kills it is the implied -mtune=i686, these are the
results of manually changing the flags:

   text	   data	    bss	    dec	    hex	filename
7235028	4240706	1691648	13167382	 c8eb16	vmlinux # i586
7218356	4240718	1691648	13150722	 c8aa02	vmlinux # i686, tune=i586
7299828	4240706	1691648	13232182	 c9e836	vmlinux # i586, tune=generic
7278948	4244826	1691648	13215422	 c9a6be	vmlinux # i686, tune=generic
7784708	4239410	1691648	13715766	 d14936	vmlinux # i586, tune=i686
7768340	4239446	1691648	13699434	 d1096a	vmlinux # i686
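
To separate the two effects, the text sizes from the table above can be
grouped by -march and by -mtune. The Python sketch below does that,
treating a bare -march=X as implying -mtune=X as described earlier; the
dictionary layout and helper name are choices made for this example.

```python
# text size keyed by (march, tune); a bare -march=X row is stored as tune=X.
TEXT = {
    ("i586", "i586"): 7235028,
    ("i686", "i586"): 7218356,
    ("i586", "generic"): 7299828,
    ("i686", "generic"): 7278948,
    ("i586", "i686"): 7784708,
    ("i686", "i686"): 7768340,
}


def pct_change(old, new):
    return (new - old) / old * 100.0


# Effect of switching -march from i586 to i686 with -mtune held fixed.
march_effect = {
    tune: pct_change(TEXT[("i586", tune)], TEXT[("i686", tune)])
    for tune in ("i586", "generic", "i686")
}
# Effect of switching -mtune from i586 to i686 with -march held fixed.
tune_effect = {
    march: pct_change(TEXT[(march, "i586")], TEXT[(march, "i686")])
    for march in ("i586", "i686")
}

for tune, delta in march_effect.items():
    print(f"march i586->i686 at tune={tune}: {delta:+.2f}%")
for march, delta in tune_effect.items():
    print(f"tune i586->i686 at march={march}: {delta:+.2f}%")
```

With these numbers the -march switch alone shrinks text by well under
1% in every row, while the -mtune switch alone grows it by more than 7%.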

If you set the CONFIG_M586/M686 options, you get an additional
effect from a couple of changed Kconfig options, which lead to
the i686 build shrinking a little further, mainly from less
padding:

-CONFIG_X86_F00F_BUG=y
-CONFIG_X86_ALIGNMENT_16=y
+CONFIG_X86_USE_PPRO_CHECKSUM=y
-CONFIG_X86_MINIMUM_CPU_FAMILY=5
+CONFIG_X86_CMOV=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=6
+CONFIG_X86_DEBUGCTLMSR=y
-CONFIG_CPU_SUP_CYRIX_32=y
-CONFIG_FUNCTION_PADDING_CFI=11
-CONFIG_FUNCTION_PADDING_BYTES=16
+CONFIG_FUNCTION_PADDING_CFI=0
+CONFIG_FUNCTION_PADDING_BYTES=4
+CONFIG_X86_REQUIRED_FEATURE_CMOV=y
-CONFIG_FUNCTION_ALIGNMENT_16B=y
-CONFIG_FUNCTION_ALIGNMENT=16
+CONFIG_FUNCTION_ALIGNMENT=4

      Arnd
Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional
Posted by H. Peter Anvin 9 months, 2 weeks ago
On April 27, 2025 10:32:14 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
>On Sun, Apr 27, 2025, at 11:25, Ingo Molnar wrote:
>> * Arnd Bergmann <arnd@kernel.org> wrote:
>>> On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
>>> 
>>> I now found that both Debian 12 and gcc 11 changed their definition
>>> of 686 to actually require nopl for Indirect branch tracking 
>>> (-fcf-protection) in user space, as discussed in
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713
>>> 
>>> So even if it makes sense for GeodeLX specific kernel to use CMOV,
>>> any general-purpose i686 distro would still want to enable IBT
>>> in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.
>>
>> And the kernel Debian 12 uses is a "686" one:
>>
>>   ./pool/main/l/linux-signed-i386/linux-image-6.1.0-32-686_6.1.129-1_i386.deb
>>   ./pool/main/l/linux-signed-i386/linux-image-686_6.1.129-1_i386.deb
>>
>> and the kernel is set to CONFIG_MGEODE_LX=y:
>>
>>   $ grep CONFIG_MGEODE_LX ./boot/config-6.1.0-32-686
>>   CONFIG_MGEODE_LX=y
>>
>> ... which CPU has CMOV support:
>>
>>   config X86_CMOV
>>         def_bool y
>>         depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
>> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON ||
>> MATOM || MGEODE_LX || X86_64)
>>          ^^^^^^^^^
>>
>> So I'd argue that the kernel's x86-32 CPU support cutoff should match 
>> the i386 CPU support cutoff of the Debian i386 installer.
>
>I think this misses a few other bits of information, some of which
>we already mentioned in this thread:
>
>- Debian 13 no longer has any 32-bit kernel, so debian-i686 is
>  primarily targeted at running on 64-bit kernels for memory
>  constrained environments.
>
>- Debian 12 started requiring NOPL in userspace, which is not
>  supported on GeodeLX (or Crusoe), the kernel option should have
>  been changed to M686 instead but was accidentally left at
>  MGEODE_LX, so the kernel still works, but userspace doesn't.
>
>- Anyone running Linux on an i586 machine likely already wants
>  a custom kernel, regardless of what the distros provide. This
>  is especially true for the embedded targets.
>
>> Survey of other distros:
>>
>>  - Fedora dropped x86-32 with Fedora 31, almost 5 years ago.
>>
>>  - Ubuntu dropped x86-32 after 18.04 LTS, more than 5 years ago. The LTS 
>>    kernel is v4.15 based.
>>
>>  - Arch Linux dropped i686 support even earlier than that, the 
>>    spin-off-community project of archlinux32.org has 486 and 686 
>>    variants. 686 variant includes CMOV.
>>
>>  - Gentoo has an 'x86' variant with 486 and 686 stages. 686 stage 
>>    includes CMOV.
>>
>> I.e. I think we can also make CMOV a hard requirement, and keep support 
>> for all family 5 CPUs that have CMOV and have a chance to boot current 
>> 32-bit distros. Even distros that had 486 builds have 686 variants that 
>> should still work.
>>
>> I.e. remove support for M586MMX, M586TSC, MCYRIXIII, MGEODEGX1 and MK6 
>> as well, these don't have CMOV support and won't even boot i386 Debian 
>> 12.
>>
>> Summary, the plan would be to remove support for the following pre-CMOV 
>> CPUs (the ones not yet in this series are marked 'NEW'):
>>
>>   M486
>>   M486SX
>>   M586
>>   M586MMX         # NEW
>>   M586TSC         # NEW
>>   MCYRIXIII       # NEW
>>   MELAN
>>   MGEODEGX1       # NEW
>>   MK6             # NEW
>>   MWINCHIP3D
>>   MWINCHIPC6
>
>This would also mean dropping support for the pre-2015 Intel Quark
>and DM&P Vortex86DX/DX2/MX/EX that never had a custom CONFIG_Mxxxx
>option but are still relevant to some degree.
>I think that would be a mistake. 
>
>> And to keep these:
>>
>>   M686
>>   MATOM
>>   MCRUSOE
>>   MEFFICEON
>>   MGEODE_LX
>>   MK7
>>   MPENTIUM4
>>   MPENTIUMII
>>   MPENTIUMIII
>>   MPENTIUMM
>>   MVIAC3_2
>>   MVIAC7
>
>As Linus said, overall they are barely different from the
>first group, and they are just as obsolete, only Atom and
>Vortex86DX3/EmKore are less than 20 years old.
>
>Here are some alternatives I like better than dropping i586:
>
>a) keep my patch with a new bool option to pick between
>   i586 and i686 targets, by any name.
>
>b) always build with -march=i586 and leave only the -mtune
>   flags; see if anyone cares enough to even benchmark
>   and pick one of the other options if they can show
>   a meaningful regression over -march=i686 -mtune=
>
>c) keep the outcome of your v1 series, dropping only
>   pre-i586 support, and leave my patch out. No change here,
>   so at least no regression potential.
>
>d) use -march=i686 (plus -mtune=) for normal builds, but
>   keep support for the older cores guarded by
>   X86_EXTENDED_PLATFORM or CONFIG_EXPERT, use -march=i586
>   if at least one of those platforms is selected.
>
>      Arnd

Interesting. They really should be using x32 for that application...