I.e. it's a boot time requirement for the CPU to support it.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/Kconfig | 2 +-
arch/x86/Kconfig.cpu | 3 +--
arch/x86/Kconfig.cpufeatures | 1 -
3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 465e5abd2750..a9d717558972 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -136,7 +136,7 @@ config X86
select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
select ARCH_USE_BUILTIN_BSWAP
- select ARCH_USE_CMPXCHG_LOCKREF if X86_CX8
+ select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 30466a258db8..6f1e8cc8fe58 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -245,7 +245,6 @@ config X86_HAVE_PAE
config X86_CX8
def_bool y
- depends on X86_HAVE_PAE || M586TSC || M586MMX || MK6 || MK7 || MGEODEGX1 || MGEODE_LX
# this should be set for all -march=.. options where the compiler
# generates cmov.
@@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
int
default "64" if X86_64
default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
- default "5" if X86_32 && X86_CX8
+ default "5" if X86_32
default "4"
config X86_DEBUGCTLMSR
diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index cd551818f451..f04ae53435bc 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -42,7 +42,6 @@ config X86_REQUIRED_FEATURE_NOPL
config X86_REQUIRED_FEATURE_CX8
def_bool y
- depends on X86_CX8
# this should be set for all -march=.. options where the compiler
# generates cmov.
--
2.45.2
On Fri, Apr 25, 2025, at 10:42, Ingo Molnar wrote:
> @@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
> int
> default "64" if X86_64
> default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
> - default "5" if X86_32 && X86_CX8
> + default "5" if X86_32
> default "4"
>
I just noticed this one: the final 'default "4"' is no longer possible
here and can be removed. All the remaining CPUs report family "5" or
higher.
There is an old issue for some rare CPUs (Geode LX and Crusoe) that
support CMOV but report family=5. These fail to boot a kernel with X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
Arnd
* Arnd Bergmann <arnd@kernel.org> wrote:
> On Fri, Apr 25, 2025, at 10:42, Ingo Molnar wrote:
> > @@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
> > int
> > default "64" if X86_64
> > default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
> > MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
> > - default "5" if X86_32 && X86_CX8
> > + default "5" if X86_32
> > default "4"
> >
>
> I just noticed this one: the final 'default "4"' is no longer
> possible here and can be removed. All the remaining CPUs report
> family "5" or higher.
Right, I've applied the fix below and backmerged it into the series.
Thanks,
Ingo
==========================>
arch/x86/Kconfig.cpu | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 6f1e8cc8fe58..b3772d384fa0 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -257,7 +257,6 @@ config X86_MINIMUM_CPU_FAMILY
default "64" if X86_64
default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
default "5" if X86_32
- default "4"
config X86_DEBUGCTLMSR
def_bool y
On April 25, 2025 5:10:27 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
>On Fri, Apr 25, 2025, at 10:42, Ingo Molnar wrote:
>> @@ -257,7 +256,7 @@ config X86_MINIMUM_CPU_FAMILY
>> int
>> default "64" if X86_64
>> default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
>> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MK7)
>> - default "5" if X86_32 && X86_CX8
>> + default "5" if X86_32
>> default "4"
>>
>
>I just noticed this one: the final 'default "4"' is no longer possible
>here and can be removed. All the remaining CPUs report family "5" or
>higher.
>
>There is an old issue for some rare CPUs (Geode LX and Crusoe) that
>support CMOV but report family=5. These fail to boot a kernel with X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
>
> Arnd
They report family=5 because family=6 implies fcomi and nopl support (in the case of Crusoe, they have fcomi but didn't support nopl.)
On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
> On April 25, 2025 5:10:27 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
>>
>>I just noticed this one: the final 'default "4"' is no longer possible
>>here and can be removed. All the remaining CPUs report family "5" or
>>higher.
>>
>>There is an old issue for some rare CPUs (Geode LX and Crusoe) that
>>support CMOV but report family=5. These fail to boot a kernel with X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
>>
>
> They report family=5 because family=6 implies fcomi and nopl support
> (in the case of Crusoe, they have fcomi but didn't support nopl.)
Ah right, I see now. I had only checked that the kernel itself
no longer uses nopl after your ba0593bf553c ("x86: completely
disable NOPL on 32 bits"), and I had seen that Debian intentionally
builds 32-bit i686 kernels with CONFIG_MGEODEGX1.
I now found that both Debian 12 and gcc 11 changed their definition
of 686 to actually require nopl for indirect branch tracking
(-fcf-protection) in user space, as discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713
So even if it makes sense for a GeodeLX-specific kernel to use CMOV,
any general-purpose i686 distro would still want to enable IBT
in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.
Arnd
* Arnd Bergmann <arnd@kernel.org> wrote:
> On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
> > On April 25, 2025 5:10:27 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
> >>
> >>I just noticed this one: the final 'default "4"' is no longer possible
> >>here and can be removed. All the remaining CPUs report family "5" or
> >>higher.
> >>
> >>There is an old issue for some rare CPUs (Geode LX and Crusoe) that
> >>support CMOV but report family=5. These fail to boot a kernel with X86_MINIMUM_CPU_FAMILY=6 because it triggers the boot time check.
> >>
> >
> > They report family=5 because family=6 implies fcomi and nopl support
> > (in the case of Crusoe, they have fcomi but didn't support nopl.)
>
> Ah right, I see now. I had only checked that the kernel itself
> no longer uses nopl after your ba0593bf553c ("x86: completely
> disable NOPL on 32 bits"), and I had seen that Debian intentionally
> builds 32-bit i686 kernels with CONFIG_MGEODEGX1.
>
> I now found that both Debian 12 and gcc 11 changed their definition
> of 686 to actually require nopl for indirect branch tracking
> (-fcf-protection) in user space, as discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713
>
> So even if it makes sense for a GeodeLX-specific kernel to use CMOV,
> any general-purpose i686 distro would still want to enable IBT
> in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.
And the kernel Debian 12 uses is a "686" one:
./pool/main/l/linux-signed-i386/linux-image-6.1.0-32-686_6.1.129-1_i386.deb
./pool/main/l/linux-signed-i386/linux-image-686_6.1.129-1_i386.deb
and the kernel is set to CONFIG_MGEODE_LX=y:
$ grep CONFIG_MGEODE_LX ./boot/config-6.1.0-32-686
CONFIG_MGEODE_LX=y
... which CPU has CMOV support:
config X86_CMOV
def_bool y
depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON ||
MATOM || MGEODE_LX || X86_64)
         ^^^^^^^^^
So I'd argue that the kernel's x86-32 CPU support cutoff should match
the i386 CPU support cutoff of the Debian i386 installer.
Survey of other distros:
- Fedora dropped x86-32 with Fedora 31, almost 5 years ago.
- Ubuntu dropped x86-32 after 18 LTS, more than 5 years ago. The LTS
kernel is v5.6 based.
- Arch Linux dropped i686 support even earlier than that, the
spin-off-community project of archlinux32.org has 486 and 686
variants. 686 variant includes CMOV.
- Gentoo has an 'x86' variant with 486 and 686 stages. 686 stage
includes CMOV.
I.e. I think we can also make CMOV a hard requirement, and keep support
for all family 5 CPUs that have CMOV and have a chance to boot current
32-bit distros. Even distros that had 486 builds have 686 variants that
should still work.
I.e. remove support for M586MMX, M586TSC, MCYRIXIII, MGEODEGX1 and MK6
as well, these don't have CMOV support and won't even boot i386 Debian
12.
Summary, the plan would be to remove support for the following pre-CMOV
CPUs (the ones not yet in this series are marked 'NEW'):
M486
M486SX
M586
M586MMX # NEW
M586TSC # NEW
MCYRIXIII # NEW
MELAN
MGEODEGX1 # NEW
MK6 # NEW
MWINCHIP3D
MWINCHIPC6
And to keep these:
M686
MATOM
MCRUSOE
MEFFICEON
MGEODE_LX
MK7
MPENTIUM4
MPENTIUMII
MPENTIUMIII
MPENTIUMM
MVIAC3_2
MVIAC7
Thanks,
Ingo
On Sun, Apr 27, 2025, at 11:25, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@kernel.org> wrote:
>> On Fri, Apr 25, 2025, at 17:15, H. Peter Anvin wrote:
>>
>> I now found that both Debian 12 and gcc 11 changed their definition
>> of 686 to actually require nopl for indirect branch tracking
>> (-fcf-protection) in user space, as discussed in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104713
>>
>> So even if it makes sense for a GeodeLX-specific kernel to use CMOV,
>> any general-purpose i686 distro would still want to enable IBT
>> in userspace to gain IBT on Tiger Lake and newer 64-bit CPUs.
>
> And the kernel Debian 12 uses is a "686" one:
>
> ./pool/main/l/linux-signed-i386/linux-image-6.1.0-32-686_6.1.129-1_i386.deb
> ./pool/main/l/linux-signed-i386/linux-image-686_6.1.129-1_i386.deb
>
> and the kernel is set to CONFIG_MGEODE_LX=y:
>
> $ grep CONFIG_MGEODE_LX ./boot/config-6.1.0-32-686
> CONFIG_MGEODE_LX=y
>
> ... which CPU has CMOV support:
>
> config X86_CMOV
> def_bool y
> depends on (MK7 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII ||
> MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON ||
> MATOM || MGEODE_LX || X86_64)
>
>
> ^^^^^^^^^
> So I'd argue that the kernel's x86-32 CPU support cutoff should match
> the i386 CPU support cutoff of the Debian i386 installer.
I think this misses a few other bits of information, some of which
we already mentioned in this thread:
- Debian 13 no longer has any 32-bit kernel, so debian-i686 is
primarily targeted at running on 64-bit kernels for memory
constrained environments.
- Debian 12 started requiring NOPL in userspace, which is not
supported on GeodeLX (or Crusoe), the kernel option should have
been changed to M686 instead but was accidentally left at
MGEODE_LX, so the kernel still works, but userspace doesn't.
- Anyone running Linux on an i586 machine likely already wants
a custom kernel, regardless of what the distros provide. This
is especially true for the embedded targets.
> Survey of other distros:
>
> - Fedora dropped x86-32 with Fedora 31, almost 5 years ago.
>
> - Ubuntu dropped x86-32 after 18 LTS, more than 5 years ago. The LTS
> kernel is v5.6 based.
>
> - Arch Linux dropped i686 support even earlier than that, the
> spin-off-community project of archlinux32.org has 486 and 686
> variants. 686 variant includes CMOV.
>
> - Gentoo has an 'x86' variant with 486 and 686 stages. 686 stage
> includes CMOV.
>
> I.e. I think we can also make CMOV a hard requirement, and keep support
> for all family 5 CPUs that have CMOV and have a chance to boot current
> 32-bit distros. Even distros that had 486 builds have 686 variants that
> should still work.
>
> I.e. remove support for M586MMX, M586TSC, MCYRIXIII, MGEODEGX1 and MK6
> as well, these don't have CMOV support and won't even boot i386 Debian
> 12.
>
> Summary, the plan would be to remove support for the following pre-CMOV
> CPUs (the ones not yet in this series are marked 'NEW'):
>
> M486
> M486SX
> M586
> M586MMX # NEW
> M586TSC # NEW
> MCYRIXIII # NEW
> MELAN
> MGEODEGX1 # NEW
> MK6 # NEW
> MWINCHIP3D
> MWINCHIPC6
This would also mean dropping support for the pre-2015 Intel Quark
and DM&P Vortex86DX/DX2/MX/EX that never had a custom CONFIG_Mxxxx
option but are still relevant to some degree.
I think that would be a mistake.
> And to keep these:
>
> M686
> MATOM
> MCRUSOE
> MEFFICEON
> MGEODE_LX
> MK7
> MPENTIUM4
> MPENTIUMII
> MPENTIUMIII
> MPENTIUMM
> MVIAC3_2
> MVIAC7
As Linus said, overall they are barely different from the
first group, and they are just as obsolete: only Atom and
Vortex86DX3/EmKore are less than 20 years old.
Here are some alternatives I like better than dropping i586:
a) keep my patch with a new bool option to pick between
i586 and i686 targets, by any name.
b) always build with -march=i586 and leave only the -mtune
flags; see if anyone cares enough to even benchmark
and pick one of the other options if they can show
a meaningful regression over -march=i686 -mtune=
c) keep the outcome of your v1 series, dropping only
pre-i586 support, and leave my patch out. No change here,
so at least no regression potential.
d) use -march=i686 (plus -mtune=) for normal builds, but
keep support for the older cores guarded by
X86_EXTENDED_PLATFORM or CONFIG_EXPERT, use -march=i586
if at least one of those platforms is selected.
Arnd
* Arnd Bergmann <arnd@kernel.org> wrote:
> > M486
> > M486SX
> > M586
> > M586MMX # NEW
> > M586TSC # NEW
> > MCYRIXIII # NEW
> > MELAN
> > MGEODEGX1 # NEW
> > MK6 # NEW
> > MWINCHIP3D
> > MWINCHIPC6
>
> This would also mean dropping support for the pre-2015 Intel Quark
> and DM&P Vortex86DX/DX2/MX/EX that never had a custom CONFIG_Mxxxx
> option but are still relevant to some degree.
> I think that would be a mistake.
Yeah, agreed, and especially with the <asm/bitops.h> CMOV complication
removed per Linus's patch, we could actually remove CONFIG_X86_CMOV, as
nothing uses it anymore:
starship:~/mingo.tip.git> git grep X86_CMOV
arch/x86/Kconfig.cpu:config X86_CMOV
arch/x86/Kconfig.cpufeatures: depends on X86_CMOV
The CMOV dependency comes in through compiler options only:
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M586TSC) += -march=i586
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M586MMX) += -march=pentium-mmx
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_MCYRIXIII) += $(call cc-option,-march=c3,-march=i486) $(align)
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_MGEODEGX1) += -march=pentium-mmx
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_MGEODE_LX) += $(call cc-option,-march=geode,-march=pentium-mmx)
These build options will, indirectly, not include CMOV in the kernel
image, while i686 and higher march options will.
(BTW., we should probably remove the -march=i486 fallback for
MCYRIXIII, our minimum CC version is beyond that already I believe.)
Anyway, the current plan is to not drop common-i586, only the removal
of what's in the -v1 series:
M486
M486SX
M586
MELAN
MWINCHIP3D
MWINCHIPC6
> Here are some alternatives I like better than dropping i586:
>
> a) keep my patch with a new bool option to pick between
> i586 and i686 targets, by any name.
>
> b) always build with -march=i586 and leave only the -mtune
> flags; see if anyone cares enough to even benchmark
> and pick one of the other options if they can show
> a meaningful regression over -march=i686 -mtune=
That's actually a good idea IMO. I looked at the code generation with
current compilers and it turns out that M686 is *substantially* worse
in code generation than M586, as apparently the extra CMOV instructions
bloat up the generated code:
text data bss dec hex filename
15427023 7601010 1744896 24772929 17a0141 vmlinux.M586
16578295 7598826 1744896 25922017 18b89e1 vmlinux.M686
- +7.5% increase in text size (5.6% according to bloatometer),
- +2% increase in instruction count,
- while number of branches increases by +1.3%.
But it's not about CMOV: I checked about a dozen functions that end up
using CMOV, and the 'conditional' part of CMOV does seem to reduce
branches for those functions by a minor degree and ends up reducing
their size as well. So CMOV helps, a bit.
The substantial code bloat comes from some other aspect of GCC's
march=i686 flag ... I bet it's primarily inlining: there's a 0.7%
reduction in number of calls done.
I have a hard time believing that this kind of bloat and complexity
helps performance to *any* degree.
I really didn't remember how bad it was, until I re-measured it.
CMOV is likely a drop in the ocean compared to this kind of text bloat.
And yeah, it doesn't really matter that i686 class CPUs have larger
caches, the kernel is dominantly cache-cold code execution, inlining
driven bloat almost never helps performance.
> c) keep the outcome of your v1 series, dropping only
> pre-i586 support, and leave my patch out. No change here,
> so at least no regression potential.
Yeah, so this is roughly the current plan, with perhaps light touchups
on top to make it easier to configure, and to remove residual legacies.
Thanks,
Ingo
On Mon, Apr 28, 2025, at 11:16, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@kernel.org> wrote:
>>
>> b) always build with -march=i586 and leave only the -mtune
>> flags; see if anyone cares enough to even benchmark
>> and pick one of the other options if they can show
>> a meaningful regression over -march=i686 -mtune=
>
> That's actually a good idea IMO. I looked at the code generation with
> current compilers and it turns out that M686 is *substantially* worse
> in code generation than M586, as apparently the extra CMOV instructions
> bloat up the generated code:
>
> text data bss dec hex filename
> 15427023 7601010 1744896 24772929 17a0141 vmlinux.M586
> 16578295 7598826 1744896 25922017 18b89e1 vmlinux.M686
>
> - +7.5% increase in text size (5.6% according to bloatometer),
> - +2% increase in instruction count,
> - while number of branches increases by +1.3%.
>
> But it's not about CMOV: I checked about a dozen functions that end up
> using CMOV, and the 'conditional' part of CMOV does seem to reduce
> branches for those functions by a minor degree and ends up reducing
> their size as well. So CMOV helps, a bit.
>
> The substantial code bloat comes from some other aspect of GCC's
> march=i686 flag ... I bet it's primarily inlining: there's a 0.7%
> reduction in number of calls done.
I had tried the same thing already, but saw a different result.
For me, the i686 output is 0.2% smaller than the i586 one (both
-mtune=generic) using gcc-14.2, or just 0.1% with clang-21,
which is roughly what I expected:
text data bss dec hex filename
7454055 4158218 1695744 13308017 cb1071 build/tmp/vmlinux-i586
7433427 4154146 1695744 13283317 caaff5 build/tmp/vmlinux-i686
7318514 4052573 1687552 13058639 c7424f build/tmp/vmlinux-i586-clang
7309938 4052573 1687552 13050063 c720cf build/tmp/vmlinux-i686-clang
I do see a larger difference compared to other -mtune= options, here is
the same config with "clang-21 -march=i586 -mtune=i686" instead of
"-march=i586 -mtune=generic":
7254510 4056669 1687552 12998731 c6584b build/tmp/vmlinux
There is a good chance that the -mtune= optimizations totally
dwarf cmov not just in code size difference but also actual
performance, the bit I'm unsure about is whether we still need
to worry about any core where this is not the case (I'm guessing
not but have no way to prove that).
Arnd
* Arnd Bergmann <arnd@kernel.org> wrote:
> On Mon, Apr 28, 2025, at 11:16, Ingo Molnar wrote:
> > * Arnd Bergmann <arnd@kernel.org> wrote:
> >>
> >> b) always build with -march=i586 and leave only the -mtune
> >> flags; see if anyone cares enough to even benchmark
> >> and pick one of the other options if they can show
> >> a meaningful regression over -march=i686 -mtune=
> >
> > That's actually a good idea IMO. I looked at the code generation with
> > current compilers and it turns out that M686 is *substantially* worse
> > in code generation than M586, as apparently the extra CMOV instructions
> > bloat up the generated code:
> >
> > text data bss dec hex filename
> > 15427023 7601010 1744896 24772929 17a0141 vmlinux.M586
> > 16578295 7598826 1744896 25922017 18b89e1 vmlinux.M686
> >
> > - +7.5% increase in text size (5.6% according to bloatometer),
> > - +2% increase in instruction count,
> > - while number of branches increases by +1.3%.
> >
> > But it's not about CMOV: I checked about a dozen functions that end up
> > using CMOV, and the 'conditional' part of CMOV does seem to reduce
> > branches for those functions by a minor degree and ends up reducing
> > their size as well. So CMOV helps, a bit.
> >
> > The substantial code bloat comes from some other aspect of GCC's
> > march=i686 flag ... I bet it's primarily inlining: there's a 0.7%
> > reduction in number of calls done.
>
> I had tried the same thing already, but saw a different result.
Just to clarify, my measurements only compare -march=i586 to
-march=i686, not -mtune. Your results are primarily -mtune figures.
So unless you see something different from my figures with -march only,
it's an apples to oranges comparison.
> There is a good chance that the -mtune= optimizations totally dwarf
> cmov not just in code size difference but also actual performance,
> the bit I'm unsure about is whether we still need to worry about any
> core where this is not the case (I'm guessing not but have no way to
> prove that).
I didn't use -mtune - I only tested two Kconfig variants:
CONFIG_M686=y vs. CONFIG_M586TSC=y
... which map to two -march flags, not different -mtune flags:
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M586TSC) += -march=i586
...
arch/x86/Makefile_32.cpu:cflags-$(CONFIG_M686) += -march=i686
This is the current upstream status quo of x86-32 compiler flags, which
results in significant .text bloat:
text data bss dec hex filename
15427023 7601010 1744896 24772929 17a0141 vmlinux.M586
16578295 7598826 1744896 25922017 18b89e1 vmlinux.M686
- +7.5% increase in text size (+5.6% according to bloatometer),
- +2% increase in instruction count,
- the number of branches increases by +1.3%,
- while there's a -0.7% reduction in number of CALLs done.
I believe this is mostly the result of the increased amount of inlining GCC
14.2.0 does on march=i686 vs. march=i586.
The extra CMOV use on -march=i686 helps a bit but is overwhelmed by the
effects of inlining.
Obviously these metrics cannot be automatically transformed into
performance figures, but such inlining driven bloat almost always
reduces the kernel's performance even on CPUs with large caches, for
all but a few select 'hot' functions.
An interesting 'modern' twist: the reduced number of CALLs due to
increased inlining is almost certainly reflected in a reduced number of
CALLs in real workloads as well, which would be a disproportionately
positive factor on x86 kernels and CPUs with retbleed-style mitigations
activated (which is almost all of them).
Thanks,
Ingo
On Tue, Apr 29, 2025, at 12:22, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@kernel.org> wrote:
>
> This is the current upstream status quo of x86-32 compiler flags, which
> results in significant .text bloat:
>
> text data bss dec hex filename
> 15427023 7601010 1744896 24772929 17a0141 vmlinux.M586
> 16578295 7598826 1744896 25922017 18b89e1 vmlinux.M686
> - +7.5% increase in text size (+5.6% according to bloatometer),
> - +2% increase in instruction count,
> - the number of branches increases by +1.3%,
> - while there's a -0.7% reduction in number of CALLs done.
>
> I believe this is mostly the result of the increased amount of inlining GCC
> 14.2.0 does on march=i686 vs. march=i586.
I can reproduce +7% numbers like the ones you have shown when
CONFIG_X86_GENERIC is disabled, but not if I turn that on,
or with my "[RFC] x86/cpu: rework instruction set selection"
patch applied.
What makes this confusing is that the -march=i686 option does
two things: it changes the allowed instructions to include cmov,
and it changes the implicit -mtune= argument to the same value,
unless you pass an explicit -mtune= as well.
Selecting the i686 instruction set by itself does not change
the amount of inlining at all; you can see that by comparing the
i586 and i686 output when CONFIG_X86_GENERIC=y is set, or if you
change the flags in the Makefile.
What really kills it is the implied -mtune=i686; these are the
results of manually changing the flags:
text data bss dec hex filename
7235028 4240706 1691648 13167382 c8eb16 vmlinux # i586
7218356 4240718 1691648 13150722 c8aa02 vmlinux # i686, tune=i586
7299828 4240706 1691648 13232182 c9e836 vmlinux # i586, tune=generic
7278948 4244826 1691648 13215422 c9a6be vmlinux # i686, tune=generic
7784708 4239410 1691648 13715766 d14936 vmlinux # i586, tune=i686
7768340 4239446 1691648 13699434 d1096a vmlinux # i686
If you set the CONFIG_M586/M686 options, you get an additional
effect from a couple of changed Kconfig options that lead to
the i686 build shrinking a little further, mainly from less
padding:
-CONFIG_X86_F00F_BUG=y
-CONFIG_X86_ALIGNMENT_16=y
+CONFIG_X86_USE_PPRO_CHECKSUM=y
-CONFIG_X86_MINIMUM_CPU_FAMILY=5
+CONFIG_X86_CMOV=y
+CONFIG_X86_MINIMUM_CPU_FAMILY=6
+CONFIG_X86_DEBUGCTLMSR=y
-CONFIG_CPU_SUP_CYRIX_32=y
-CONFIG_FUNCTION_PADDING_CFI=11
-CONFIG_FUNCTION_PADDING_BYTES=16
+CONFIG_FUNCTION_PADDING_CFI=0
+CONFIG_FUNCTION_PADDING_BYTES=4
+CONFIG_X86_REQUIRED_FEATURE_CMOV=y
-CONFIG_FUNCTION_ALIGNMENT_16B=y
-CONFIG_FUNCTION_ALIGNMENT=16
+CONFIG_FUNCTION_ALIGNMENT=4
Arnd
On April 27, 2025 10:32:14 AM PDT, Arnd Bergmann <arnd@kernel.org> wrote:
>- Debian 13 no longer has any 32-bit kernel, so debian-i686 is
> primarily targeted at running on 64-bit kernels for memory
> constrained environments.
Interesting. They really should be using x32 for that application...