As CPU performance demands increase, some internal CPU registers need to
be configured dynamically by software, for example setting memory
controller strategies within specific time windows. Such configuration
places high demands on the efficiency of the configuration instructions
themselves, requiring them to retire and take effect as quickly as
possible.
However, the current kernel code forces the use of the IO Port method for
PCI accesses with domain=0 and offset less than 256. The IO Port method is
more like a legacy from historical reasons, and its performance is lower
than that of the MMCFG method. We ran comparative tests on AMD and
Hygon CPUs. Even leaving aside the overhead of indirect access (the IO
Port method goes through 0xCF8 and 0xCFC), we simply compared the
performance of the following two code sequences:
1) outl(0x400702, 0xCFC);
2) mmio_config_writel(data_addr, 0x400702);
Both sequences access the same register. The results show that the MMCFG
method (400+ cycles per access) outperforms the IO Port method (1000+
cycles per access) by more than a factor of two.
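(For reference, a measurement of this kind can be reproduced with a
sketch along the following lines. This is illustrative only, not our
exact harness: ITERATIONS, bus/devfn/reg and the data_addr mapping are
placeholders for the register under test.)

	u64 start, cycles_pio, cycles_mmcfg;
	unsigned int i;

	/* Select the register under test in the 0xCF8 address window. */
	outl(0x80000000 | (bus << 16) | (devfn << 8) | (reg & 0xfc), 0xCF8);
	start = rdtsc_ordered();
	for (i = 0; i < ITERATIONS; i++)
		outl(0x400702, 0xCFC);
	cycles_pio = (rdtsc_ordered() - start) / ITERATIONS;

	start = rdtsc_ordered();
	for (i = 0; i < ITERATIONS; i++)
		mmio_config_writel(data_addr, 0x400702);
	cycles_mmcfg = (rdtsc_ordered() - start) / ITERATIONS;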
Through PMC/PMU event statistics on the AMD/Hygon microarchitectures,
we found that IO Port accesses cause more stalls in the CPU's internal
dispatch stage, and that these stalls are mainly due to the front end's
inability to decode the corresponding uops in a timely manner.
The main reason for the performance difference between the two access
methods is therefore that the in/out instructions used for IO Port
access are microcoded, which makes their decoding less efficient than
the mov instructions used by mmcfg.
For CPUs that support both MMCFG and IO Port access methods, if a hardware
register only supports IO Port access, this configuration may lead to
illegal access. However, we think registers that support I/O Port access
have corresponding MMCFG addresses. Even though we tested several
AMD/Hygon CPUs with this patch and found no problems, we cannot rule
out issues on other CPUs, especially older ones. To
address this risk, we have created a new macro, PREFER MMCONFIG, allowing
users to choose whether or not to enable this feature.
Signed-off-by: Yang Zhang <zhangz@hygon.cn>
---
arch/x86/Kconfig | 15 +++++++++++++++
arch/x86/pci/common.c | 14 ++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f..037d56690 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2932,6 +2932,21 @@ config PCI_MMCONFIG
Say Y otherwise.
+config PREFER_MMCONFIG
+ bool "Perfer to use mmconfig over IO Port"
+ depends on PCI_MMCONFIG
+ help
+ This setting will prioritize the use of mmcfg, which is superior to
+ io port from a performance perspective, mainly for the following reasons:
+ 1) io port is an indirect access; 2) io port instructions are decoded
+ by microcode, which is more likely to leave the CPU front-end bound
+ than mmcfg, which uses mov instructions.
+
+ For CPUs that support both MMCFG and IO Port access methods, if a
+ hardware register only supports IO Port access, this configuration
+ may lead to illegal access. Therefore, users must ensure that the
+ configuration will not cause any exceptions before enabling it.
+
config PCI_OLPC
def_bool y
depends on PCI && OLPC && (PCI_GOOLPC || PCI_GOANY)
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index ddb798603..8bde5d1df 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -40,20 +40,34 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 *val)
{
+#ifdef CONFIG_PREFER_MMCONFIG
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+ if (domain == 0 && reg < 256 && raw_pci_ops)
+ return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
+#else
if (domain == 0 && reg < 256 && raw_pci_ops)
return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
+#endif
return -EINVAL;
}
int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
int reg, int len, u32 val)
{
+#ifdef CONFIG_PREFER_MMCONFIG
+ if (raw_pci_ext_ops)
+ return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+ if (domain == 0 && reg < 256 && raw_pci_ops)
+ return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
+#else
if (domain == 0 && reg < 256 && raw_pci_ops)
return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
if (raw_pci_ext_ops)
return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
+#endif
return -EINVAL;
}
--
2.34.1
On Tue, Dec 16 2025 at 18:03, Yang Zhang wrote:
> However, the current kernel code forces the use of the IO Port method for
> PCI accesses with domain=0 and offset less than 256. The IO Port method is
> more like a legacy from historical reasons, and its performance is lower
That code exists for a reason, and if you had taken the time to go back
through the git history and to read the related discussions in the LKML
archive, then you could have provided a proper explanation and not some
handwaving "like a legacy".
> than that of the MMCFG method. We ran comparative tests on AMD and
> Hygon CPUs. Even leaving aside the overhead of indirect access (the IO
> Port method goes through 0xCF8 and 0xCFC), we simply compared the
> performance of the following two code sequences:
>
> 1) outl(0x400702, 0xCFC);
>
> 2) mmio_config_writel(data_addr, 0x400702);
>
> Both sequences access the same register. The results show that the MMCFG
> method (400+ cycles per access) outperforms the IO Port method (1000+
> cycles per access) by more than a factor of two.
That's a known fact and has been discussed many times on LKML. See the
archive for details.
> Through PMC/PMU event statistics on the AMD/Hygon microarchitectures,
> we found that IO Port accesses cause more stalls in the CPU's internal
> dispatch stage, and that these stalls are mainly due to the front end's
> inability to decode the corresponding uops in a timely manner.
Interesting analysis.
> The main reason for the performance difference between the two access
> methods is therefore that the in/out instructions used for IO Port
> access are microcoded, which makes their decoding less efficient than
> the mov instructions used by mmcfg.
It has been known forever that in/out are significantly slower, not only
due to the microcode magic, but also because IO port instructions
serialize against each other. See the SDM/APM; it's documented.
> For CPUs that support both MMCFG and IO Port access methods, if a hardware
> register only supports IO Port access, this configuration may lead to
> illegal access. However, we think registers that support I/O Port access
> have corresponding MMCFG addresses.
We think? Either you know or not. By specification the MMIO config space
covers the complete config space from 0 to 4095.
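For reference, ECAM forms the address as a plain linear function of
bus/devfn/offset, so every register reachable through the 0xCF8/0xCFC
window (offsets 0-255) has a memory mapped equivalent. A sketch, with
mmcfg_base standing in for the per-segment base from the ACPI MCFG
table:

	/* ECAM layout: bus in bits 27:20, devfn in bits 19:12,
	 * register offset in bits 11:0. Offsets 0-255 are a strict
	 * subset of the 0-4095 range.
	 */
	static void __iomem *ecam_addr(void __iomem *mmcfg_base,
				       unsigned int bus,
				       unsigned int devfn, int reg)
	{
		return mmcfg_base + ((bus << 20) | (devfn << 12) | reg);
	}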
> Even though we tested several AMD/Hygon CPUs with this patch and found
> no problems, we cannot rule out issues on other CPUs, especially older
> ones.
If you had read the mailing list archives and the git history then
you would know for sure that there are systems out there which have
issues with accessing the lower config space via MMIO.
> To address this risk, we have created a new macro, PREFER MMCONFIG,
That's not a macro. That's a config switch, no?
> allowing users to choose whether or not to enable this feature.
Also please read and follow
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
> +config PREFER_MMCONFIG
> + bool "Perfer to use mmconfig over IO Port"
> + depends on PCI_MMCONFIG
> + help
> + This setting will prioritize the use of mmcfg, which is superior to
> + io port from a performance perspective, mainly for the following reasons:
> + 1) io port is an indirect access; 2) io port instructions are decoded
> + by microcode, which is more likely to leave the CPU front-end bound
> + than mmcfg, which uses mov instructions.
> +
> + For CPUs that support both MMCFG and IO Port access methods, if a
> + hardware register only supports IO Port access, this configuration
> + may lead to illegal access. Therefore, users must ensure that the
> + configuration will not cause any exceptions before enabling it.
Q: How is that supposed to work for distros?
A: Not at all.
The right thing to do here is:
1) Have a control variable, which determines the MMIO preference
2) Make this control default to false (backwards compatible)
3) Provide a command line option to enable/disable MMIO preference
4) Optionally allow the setup code to enable MMIO preference based
on e.g. CPU family/model cut-offs or some other reasonable method
which prevents a default on for the reportedly affected systems
(See LKML).
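IOW, something along these lines (an untested sketch; the variable name
pci_mmconf_preferred and the pci_mmconf_pref command line option are
made up for illustration):

	/* Default off: preserve the existing 0xCF8/0xCFC behaviour. */
	static bool pci_mmconf_preferred __ro_after_init;

	static int __init set_pci_mmconf_pref(char *str)
	{
		if (!str) {
			pci_mmconf_preferred = true;
			return 0;
		}
		return kstrtobool(str, &pci_mmconf_preferred);
	}
	early_param("pci_mmconf_pref", set_pci_mmconf_pref);

	int raw_pci_read(unsigned int domain, unsigned int bus,
			 unsigned int devfn, int reg, int len, u32 *val)
	{
		if (pci_mmconf_preferred && raw_pci_ext_ops)
			return raw_pci_ext_ops->read(domain, bus, devfn,
						     reg, len, val);
		if (domain == 0 && reg < 256 && raw_pci_ops)
			return raw_pci_ops->read(domain, bus, devfn,
						 reg, len, val);
		if (raw_pci_ext_ops)
			return raw_pci_ext_ops->read(domain, bus, devfn,
						     reg, len, val);
		return -EINVAL;
	}

	/* Point 4: setup code can additionally flip pci_mmconf_preferred
	 * based on e.g. a CPU family/model cut-off.
	 */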
Thanks,
tglx
On Tue, 16 Dec 2025, Yang Zhang wrote:
> +config PREFER_MMCONFIG
> + bool "Perfer to use mmconfig over IO Port"
Prefer
--
i.