From: Arnd Bergmann <arnd@arndb.de>
Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to
run on all CPUs, but the Makefile does not actually pass an -march=
argument, instead relying on the default that was used to configure
the toolchain.
In many cases, gcc will be configured to -march=x86-64 or -march=k8
for maximum compatibility, but in other cases a distribution default
may be either raised to a more recent ISA, or set to -march=native
to build for the CPU used for compilation. This still works in the
case of building a custom kernel for the local machine.
The point where it breaks down is building a kernel for another
machine that is older than the default target. Changing the default
to -march=x86-64 would make it work reliably, but possibly produce
worse code on distros that intentionally default to a newer ISA.
To allow reliably building a kernel for either the oldest x86-64
CPUs or a more recent level, add three separate options for
v1, v2 and v3 of the architecture as defined by gcc and clang
and make them all turn on CONFIG_GENERIC_CPU. Based on this it
should be possible to change runtime feature detection into
build-time detection for things like cmpxchg16b, or possibly
gate features that are only available on older architectures.
Link: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143289.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
arch/x86/Kconfig.cpu | 39 ++++++++++++++++++++++++++++++++++-----
arch/x86/Makefile | 6 ++++++
2 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 139db904e564..1461a739237b 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -260,7 +260,7 @@ endchoice
choice
prompt "x86-64 Processor family"
depends on X86_64
- default GENERIC_CPU
+ default X86_64_V2
help
This is the processor type of your CPU. This information is
used for optimizing purposes. In order to compile a kernel
@@ -314,15 +314,44 @@ config MSILVERMONT
early Atom CPUs based on the Bonnell microarchitecture,
such as Atom 230/330, D4xx/D5xx, D2xxx, N2xxx or Z2xxx.
-config GENERIC_CPU
- bool "Generic-x86-64"
+config X86_64_V1
+ bool "Generic x86-64"
depends on X86_64
help
- Generic x86-64 CPU.
- Run equally well on all x86-64 CPUs.
+ Generic x86-64-v1 CPU.
+ Run equally well on all x86-64 CPUs, including early Pentium-4
+ variants lacking the sahf and cmpxchg16b instructions as well
+ as the AMD K8 and Intel Core 2 lacking popcnt.
+
+config X86_64_V2
+ bool "Generic x86-64 v2"
+ depends on X86_64
+ help
+ Generic x86-64-v2 CPU.
+ Run equally well on all x86-64 CPUs that meet the x86-64-v2
+ definition as well as those that only miss the optional
+ SSE3/SSSE3/SSE4.1 portions.
+ Examples of this include Intel Nehalem and Silvermont,
+ AMD Bulldozer (K10) and Jaguar as well as VIA Nano that
+ include popcnt, cmpxchg16b and sahf.
+
+config X86_64_V3
+ bool "Generic x86-64 v3"
+ depends on X86_64
+ help
+ Generic x86-64-v3 CPU.
+ Run equally well on all x86-64 CPUs that meet the x86-64-v3
+ definition as well as those that only miss the optional
+ AVX/AVX2 portions.
+ Examples of this include the Intel Haswell and AMD Excavator
+ microarchitectures that include the bmi1/bmi2, lzcnt, movbe
+ and xsave instruction set extensions.
endchoice
+config GENERIC_CPU
+ def_bool X86_64_V1 || X86_64_V2 || X86_64_V3
+
config X86_GENERIC
bool "Generic x86 support"
depends on X86_32
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 05887ae282f5..1fdc3fc6a54e 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -183,6 +183,9 @@ else
cflags-$(CONFIG_MPSC) += -march=nocona
cflags-$(CONFIG_MCORE2) += -march=core2
cflags-$(CONFIG_MSILVERMONT) += -march=silvermont
+ cflags-$(CONFIG_MX86_64_V1) += -march=x86-64
+ cflags-$(CONFIG_MX86_64_V2) += $(call cc-option,-march=x86-64-v2,-march=x86-64)
+ cflags-$(CONFIG_MX86_64_V3) += $(call cc-option,-march=x86-64-v3,-march=x86-64)
cflags-$(CONFIG_GENERIC_CPU) += -mtune=generic
KBUILD_CFLAGS += $(cflags-y)
@@ -190,6 +193,9 @@ else
rustflags-$(CONFIG_MPSC) += -Ctarget-cpu=nocona
rustflags-$(CONFIG_MCORE2) += -Ctarget-cpu=core2
rustflags-$(CONFIG_MSILVERMONT) += -Ctarget-cpu=silvermont
+ rustflags-$(CONFIG_MX86_64_V1) += -Ctarget-cpu=x86-64
+ rustflags-$(CONFIG_MX86_64_V2) += -Ctarget-cpu=x86-64-v2
+ rustflags-$(CONFIG_MX86_64_V3) += -Ctarget-cpu=x86-64-v3
rustflags-$(CONFIG_GENERIC_CPU) += -Ztune-cpu=generic
KBUILD_RUSTFLAGS += $(rustflags-y)
--
2.39.5
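Editor's note: the commit message above hinges on the compiler's configured default -march. A minimal sketch, assuming a gcc or clang toolchain, of how one might see which extensions a given build silently assumes: both compilers predefine macros such as __POPCNT__ or __AVX2__ when the selected -march enables them. The enum and function names are hypothetical and are not part of the patch.

```c
#include <stdio.h>

/* Flags used only by this sketch (hypothetical names, not from the patch). */
enum {
    ISA_SAHF   = 1 << 0,
    ISA_CX16   = 1 << 1,
    ISA_POPCNT = 1 << 2,
    ISA_BMI    = 1 << 3,
    ISA_AVX2   = 1 << 4,
    ISA_ALL    = (1 << 5) - 1
};

/* Return the set of extensions the compiler assumes for this build.
 * Each macro is predefined by gcc/clang only when the effective
 * -march/-m flags enable the corresponding extension. */
int build_isa_mask(void)
{
    int mask = 0;
#ifdef __LAHF_SAHF__
    mask |= ISA_SAHF;
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    mask |= ISA_CX16;   /* defined when cmpxchg16b (-mcx16) is enabled */
#endif
#ifdef __POPCNT__
    mask |= ISA_POPCNT;
#endif
#ifdef __BMI__
    mask |= ISA_BMI;
#endif
#ifdef __AVX2__
    mask |= ISA_AVX2;
#endif
    return mask;
}

/* Print one line per extension so two toolchains can be compared. */
void print_build_isa(void)
{
    int mask = build_isa_mask();

    printf("sahf:       %s\n", (mask & ISA_SAHF)   ? "assumed" : "not assumed");
    printf("cmpxchg16b: %s\n", (mask & ISA_CX16)   ? "assumed" : "not assumed");
    printf("popcnt:     %s\n", (mask & ISA_POPCNT) ? "assumed" : "not assumed");
    printf("bmi1:       %s\n", (mask & ISA_BMI)    ? "assumed" : "not assumed");
    printf("avx2:       %s\n", (mask & ISA_AVX2)   ? "assumed" : "not assumed");
}
```

Calling print_build_isa() from a small main() and compiling once with no -march= flag and once with -march=x86-64 should show whether the toolchain default is newer than the baseline.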
From: Arnd Bergmann
> Sent: 04 December 2024 10:31
>
> Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to
> run on all CPUs, but the Makefile does not actually pass an -march=
> argument, instead relying on the default that was used to configure
> the toolchain.
...
> -config GENERIC_CPU
> - bool "Generic-x86-64"
> +config X86_64_V1
> + bool "Generic x86-64"
> depends on X86_64
> help
> - Generic x86-64 CPU.
> - Run equally well on all x86-64 CPUs.
> + Generic x86-64-v1 CPU.
> + Run equally well on all x86-64 CPUs, including early Pentium-4
> + variants lacking the sahf and cmpxchg16b instructions as well
> + as the AMD K8 and Intel Core 2 lacking popcnt.
The 'equally well' text was clearly always wrong (equally badly?)
but is now just 'plain wrong'.
Perhaps:
Runs on all x86-64 CPUs including early cpu that lack the sahf,
cmpxchg16b and popcnt instructions.
Then for V2 (or whatever it gets called)
Requires support for the sahf, cmpxchg16b and popcnt instructions.
This will not run on AMD K8 or Intel before Sandy bridge.
I think someone suggested that run-time detect of AVX/AVX2/AVX512 is fine?
David
...
"On second thought , let’s not go to x86-64 microarchitectural
levels. ‘Tis a silly place"
On Wed, 4 Dec 2024 at 02:31, Arnd Bergmann <arnd@kernel.org> wrote:
>
> To allow reliably building a kernel for either the oldest x86-64
> CPUs or a more recent level, add three separate options for
> v1, v2 and v3 of the architecture as defined by gcc and clang
> and make them all turn on CONFIG_GENERIC_CPU.
The whole "v2", "v3", "v4" etc naming seems to be some crazy glibc
artifact and is stupid and needs to die.
It has no relevance to anything. Please do *not* introduce that
mind-fart into the kernel sources.
I have no idea who came up with the "microarchitecture levels"
garbage, but as far as I can tell, it's entirely unofficial, and it's
a completely broken model.
There is a very real model for microarchitectural features, and it's
the CPUID bits. Trying to linearize those bits is technically wrong,
since these things simply aren't some kind of linear progression.
And worse, it's a "simplification" that literally adds complexity. Now
instead of asking "does this CPU support the cmpxchg16b instruction?",
the question instead becomes one of "what the hell does 'v3' mean
again?"
So no. We are *NOT* introducing that idiocy in the kernel.
Linus
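Editor's note: for readers unfamiliar with the CPUID model referred to above, here is a minimal sketch of asking about one specific feature rather than a "level". It assumes gcc or clang and their <cpuid.h> helper __get_cpuid(); the function names are hypothetical, and this is userspace illustration code, not how the kernel itself queries features.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* gcc/clang helper header providing __get_cpuid() */
#endif

/* CPUID leaf 1, ECX: bit 13 = CMPXCHG16B, bit 23 = POPCNT. */
bool ecx1_has_cmpxchg16b(uint32_t ecx) { return (ecx >> 13) & 1u; }
bool ecx1_has_popcnt(uint32_t ecx)     { return (ecx >> 23) & 1u; }

/* CPUID leaf 0x80000001, ECX: bit 0 = LAHF/SAHF usable in 64-bit mode. */
bool ext_ecx_has_lahf_sahf(uint32_t ecx) { return ecx & 1u; }

/* Ask the running CPU directly. Returns false when CPUID leaf 1 is
 * unavailable or when not built for x86. */
bool cpu_has_cmpxchg16b(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return ecx1_has_cmpxchg16b(ecx);
#else
    return false;
#endif
}
```

Each question maps to exactly one bit, which is the point being made: no ordering between features is implied.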
Hi Arnd,
On Wed, Dec 04, 2024 at 11:30:40AM +0100, Arnd Bergmann wrote:
...
> +++ b/arch/x86/Kconfig.cpu
> +config X86_64_V1
> +config X86_64_V2
> +config X86_64_V3
...
> +++ b/arch/x86/Makefile
> + cflags-$(CONFIG_MX86_64_V1) += -march=x86-64
> + cflags-$(CONFIG_MX86_64_V2) += $(call cc-option,-march=x86-64-v2,-march=x86-64)
> + cflags-$(CONFIG_MX86_64_V3) += $(call cc-option,-march=x86-64-v3,-march=x86-64)
...
> + rustflags-$(CONFIG_MX86_64_V1) += -Ctarget-cpu=x86-64
> + rustflags-$(CONFIG_MX86_64_V2) += -Ctarget-cpu=x86-64-v2
> + rustflags-$(CONFIG_MX86_64_V3) += -Ctarget-cpu=x86-64-v3
There appears to be an extra 'M' when using these CONFIGs in Makefile,
so I don't think this works as is?
Cheers,
Nathan
On Wed, Dec 4, 2024, at 18:09, Nathan Chancellor wrote:
> Hi Arnd,
>
> On Wed, Dec 04, 2024 at 11:30:40AM +0100, Arnd Bergmann wrote:
> ...
>> +++ b/arch/x86/Kconfig.cpu
>> +config X86_64_V1
>> +config X86_64_V2
>> +config X86_64_V3
> ...
>> +++ b/arch/x86/Makefile
>> + cflags-$(CONFIG_MX86_64_V1) += -march=x86-64
>> + cflags-$(CONFIG_MX86_64_V2) += $(call cc-option,-march=x86-64-v2,-march=x86-64)
>> + cflags-$(CONFIG_MX86_64_V3) += $(call cc-option,-march=x86-64-v3,-march=x86-64)
> ...
>> + rustflags-$(CONFIG_MX86_64_V1) += -Ctarget-cpu=x86-64
>> + rustflags-$(CONFIG_MX86_64_V2) += -Ctarget-cpu=x86-64-v2
>> + rustflags-$(CONFIG_MX86_64_V3) += -Ctarget-cpu=x86-64-v3
>
> There appears to be an extra 'M' when using these CONFIGs in Makefile,
> so I don't think this works as is?
Fixed now by adding the 'M' in the Kconfig file, thanks for
noticing it.
Arnd
On 12/4/24 11:30, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> Building an x86-64 kernel with CONFIG_GENERIC_CPU is documented to
> run on all CPUs, but the Makefile does not actually pass an -march=
> argument, instead relying on the default that was used to configure
> the toolchain.
...
> To allow reliably building a kernel for either the oldest x86-64
> CPUs or a more recent level, add three separate options for
> v1, v2 and v3 of the architecture as defined by gcc and clang
> and make them all turn on CONFIG_GENERIC_CPU. Based on this it
> should be possible to change runtime feature detection into
> build-time detection for things like cmpxchg16b, or possibly
> gate features that are only available on older architectures.
>
Hi Arnd,
Similar but not identical changes have been proposed in the past several
times like e.g. in [1], [2] and likely even more often.
Your solution seems to be much cleaner, I like it.
That said, on my Skylake platform, there is no difference between
-march=x86-64 and -march=x86-64-v3 in terms of kernel binary size or
performance.
I think Boris also said that these settings make no real difference on
code generation.
Other settings might make a small difference (numbers are from 2023):
-generic: 85.089.784 bytes
-core2: 85.139.932 bytes
-march=skylake: 85.017.808 bytes
----
[1] https://lore.kernel.org/all/4_u6ZNYPbaK36xkLt8ApRhiRTyWp_-NExHCH_tTFO_fanDglEmcbfowmiB505heI4md2AuR9hS-VSkf4s90sXb5--AnNTOwvPaTmcgzRYSY=@proton.me/
[2] https://lore.kernel.org/all/20230707105601.133221-1-dimitri.ledkov@canonical.com/
On Wed, Dec 4, 2024, at 16:36, Tor Vic wrote:
> On 12/4/24 11:30, Arnd Bergmann wrote:
> Similar but not identical changes have been proposed in the past several
> times like e.g. in [1], [2] and likely even more often.
>
> Your solution seems to be much cleaner, I like it.
Thanks. It looks like the other two did not actually
address the bug I'm fixing in my version.
> That said, on my Skylake platform, there is no difference between
> -march=x86-64 and -march=x86-64-v3 in terms of kernel binary size or
> performance.
> I think Boris also said that these settings make no real difference on
> code generation.
As Nathan pointed out, I had a typo in my patch, so the
options didn't actually do anything at all. I fixed it now
and did a 'defconfig' test build with all three:
> Other settings might make a small difference (numbers are from 2023):
> -generic: 85.089.784 bytes
> -core2: 85.139.932 bytes
> -march=skylake: 85.017.808 bytes
    text     data     bss      dec     hex filename
26664466 10806622 1490948 38962036 2528374 obj-x86/vmlinux-v1
26664466 10806622 1490948 38962036 2528374 obj-x86/vmlinux-v2
26662504 10806654 1490948 38960106 2527bea obj-x86/vmlinux-v3
which is a tiny 2KB saved between v2 and v3. I looked at
the object code and found that the v3 version takes advantage
of the BMI extension, which makes perfect sense. Not sure
if it has any real performance benefits.
Between v1 and v2, there is a chance to turn things like
system_has_cmpxchg128() into a constant on v2 and higher.
The v4 version is meaningless in practice since it only
adds AVX512 instructions that are only present in very
few CPUs and not that useful inside the kernel aside from
specialized crypto and raid helpers.
Arnd
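Editor's note: the remark above about turning system_has_cmpxchg128() into a constant can be sketched as follows. This is only an illustration under stated assumptions: it relies on gcc and clang predefining __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 when cmpxchg16b is enabled for the build (as it would be with -march=x86-64-v2), and BUILD_HAS_CX16 and sketch_has_cmpxchg128() are hypothetical names. The kernel's real helper is not shown in this thread and may be implemented differently.

```c
#include <stdbool.h>

/* 1 when the compiler was told the target always has cmpxchg16b,
 * 0 when the baseline may lack it. */
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
#define BUILD_HAS_CX16 1
#else
#define BUILD_HAS_CX16 0
#endif

/* runtime_cx16 stands in for the result of a boot-time CPUID check
 * (hypothetical parameter). When the build already guarantees the
 * instruction, the condition is a compile-time constant, so the
 * compiler can fold the runtime test and any fallback path away. */
static inline bool sketch_has_cmpxchg128(bool runtime_cx16)
{
    return BUILD_HAS_CX16 ? true : runtime_cx16;
}
```

With a v1 baseline the function simply forwards the runtime answer, so behaviour on the oldest CPUs is unchanged.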