With kconfig option NUMA_KEEP_MEMINFO disabled the SRAT lookup done
with numa_fill_memblks() fails returning NUMA_NO_MEMBLK (-1). An
existing SRAT memory range cannot be found for a CFMWS address range.
This causes the addition of a duplicate numa_memblk with a different
node id and a subsequent page fault and kernel crash during boot.
Note that the issue was initially introduced with [1]. But since
phys_to_target_node() was originally used that returned the valid node
0, an additional numa_memblk was not added. Though, the node id was
wrong too.
Fix this by enabling NUMA_KEEP_MEMINFO for x86 with ACPI and NUMA
enabled.
[1] fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
Cc: Derick Marks <derick.w.marks@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/acpi/numa/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/acpi/numa/Kconfig b/drivers/acpi/numa/Kconfig
index 849c2bd820b9..2f4ac6ac6768 100644
--- a/drivers/acpi/numa/Kconfig
+++ b/drivers/acpi/numa/Kconfig
@@ -3,6 +3,7 @@ config ACPI_NUMA
bool "NUMA support"
depends on NUMA
depends on (X86 || ARM64 || LOONGARCH)
+ select NUMA_KEEP_MEMINFO if X86
default y if ARM64
config ACPI_HMAT
--
2.39.2
Robert Richter wrote:
> With kconfig option NUMA_KEEP_MEMINFO disabled the SRAT lookup done
> with numa_fill_memblks() fails returning NUMA_NO_MEMBLK (-1). An
> existing SRAT memory range cannot be found for a CFMWS address range.
> This causes the addition of a duplicate numa_memblk with a different
> node id and a subsequent page fault and kernel crash during boot.
>
> Note that the issue was initially introduced with [1]. But since
> phys_to_target_node() was originally used that returned the valid node
> 0, an additional numa_memblk was not added. Though, the node id was
> wrong too.
>
> Fix this by enabling NUMA_KEEP_MEMINFO for x86 with ACPI and NUMA
> enabled.
>
> [1] fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
>
> Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> Cc: Derick Marks <derick.w.marks@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/acpi/numa/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/acpi/numa/Kconfig b/drivers/acpi/numa/Kconfig
> index 849c2bd820b9..2f4ac6ac6768 100644
> --- a/drivers/acpi/numa/Kconfig
> +++ b/drivers/acpi/numa/Kconfig
> @@ -3,6 +3,7 @@ config ACPI_NUMA
> bool "NUMA support"
> depends on NUMA
> depends on (X86 || ARM64 || LOONGARCH)
> + select NUMA_KEEP_MEMINFO if X86
> default y if ARM64
A fix is needed, yes, but this is the wrong one. NUMA_KEEP_MEMINFO is
only about marking numa_meminfo data as not "__init". Since
numa_fill_memblks() *is* an __init function, it should have no
dependency on NUMA_KEEP_MEMINFO.
The fix here involves moving the definition of numa_fill_memblks() out
of the "#ifdef CONFIG_NUMA_KEEP_MEMINFO" in
arch/x86/include/asm/sparsemem.h so that it does not fallback to the
default definition in include/linux/numa.h.
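To illustrate the fallback described above, here is a minimal, self-contained sketch of the header pattern. It is a simplified model, not the kernel's actual code: DEMO_KEEP_MEMINFO, DEMO_NO_MEMBLK and demo_fill_memblks() are hypothetical stand-ins for CONFIG_NUMA_KEEP_MEMINFO, NUMA_NO_MEMBLK and numa_fill_memblks(), and the "known" address range is invented for the example.

```c
#include <stdint.h>

#define DEMO_NO_MEMBLK (-1)	/* hypothetical stand-in for NUMA_NO_MEMBLK */

/* Uncomment to model a build with the option enabled (hypothetical name). */
/* #define DEMO_KEEP_MEMINFO 1 */

/* Models the arch header: the real lookup is only visible under the option. */
#ifdef DEMO_KEEP_MEMINFO
static inline int demo_fill_memblks(uint64_t start, uint64_t end)
{
	/* Assumption for the demo: one known range [4 GiB, 8 GiB). */
	const uint64_t lo = 4ULL << 30, hi = 8ULL << 30;

	return (start < hi && end > lo) ? 0 : DEMO_NO_MEMBLK;
}
#define demo_fill_memblks demo_fill_memblks
#endif

/* Models the generic header: a stub is used when no arch version is defined. */
#ifndef demo_fill_memblks
static inline int demo_fill_memblks(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	return DEMO_NO_MEMBLK;	/* every lookup "fails" */
}
#endif
```

With the option left undefined, every caller silently gets the stub, so a lookup for a range that does exist still reports "no memblk". Moving the arch declaration out of the #ifdef, as suggested, would make the real lookup visible regardless of the option.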
It should also be the case that cxl_acpi needs this:
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 67998dbd1d46..1bf25185c35b 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -6,6 +6,7 @@ menuconfig CXL_BUS
select FW_UPLOAD
select PCI_DOE
select FIRMWARE_TABLE
+ select NUMA_KEEP_MEMINFO if NUMA
help
CXL is a bus that is electrically compatible with PCI Express, but
layers three protocols on that signalling (CXL.io, CXL.cache, and
Hi Dan,
thanks for the quick review.
Yes, this is the 'old' patch. But only the subject was corrected. I
will send a v2 anyway. See below.
On 18.03.24 14:26:41, Dan Williams wrote:
> Robert Richter wrote:
> > With kconfig option NUMA_KEEP_MEMINFO disabled the SRAT lookup done
> > with numa_fill_memblks() fails returning NUMA_NO_MEMBLK (-1). An
> > existing SRAT memory range cannot be found for a CFMWS address range.
> > This causes the addition of a duplicate numa_memblk with a different
> > node id and a subsequent page fault and kernel crash during boot.
> >
> > Note that the issue was initially introduced with [1]. But since
> > phys_to_target_node() was originally used that returned the valid node
> > 0, an additional numa_memblk was not added. Though, the node id was
> > wrong too.
> >
> > Fix this by enabling NUMA_KEEP_MEMINFO for x86 with ACPI and NUMA
> > enabled.
> >
> > [1] fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
> >
> > Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> > Cc: Derick Marks <derick.w.marks@intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/acpi/numa/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/acpi/numa/Kconfig b/drivers/acpi/numa/Kconfig
> > index 849c2bd820b9..2f4ac6ac6768 100644
> > --- a/drivers/acpi/numa/Kconfig
> > +++ b/drivers/acpi/numa/Kconfig
> > @@ -3,6 +3,7 @@ config ACPI_NUMA
> > bool "NUMA support"
> > depends on NUMA
> > depends on (X86 || ARM64 || LOONGARCH)
> > + select NUMA_KEEP_MEMINFO if X86
> > default y if ARM64
>
> A fix is needed, yes, but this is the wrong one. NUMA_KEEP_MEMINFO is
> only about marking numa_meminfo data as not "__init". Since
> numa_fill_memblks() *is* an __init function, it should have no
> dependency on NUMA_KEEP_MEMINFO.
Right, the option is just about keeping it in non-init mem, but the
parsing is during __init. Will take a look.
>
> The fix here involves moving the definition of numa_fill_memblks() out
> of the "#ifdef CONFIG_NUMA_KEEP_MEMINFO" in
> arch/x86/include/asm/sparsemem.h so that it does not fallback to the
> default definition in include/linux/numa.h.
>
> It should also be the case that cxl_acpi needs this:
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 67998dbd1d46..1bf25185c35b 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -6,6 +6,7 @@ menuconfig CXL_BUS
> select FW_UPLOAD
> select PCI_DOE
> select FIRMWARE_TABLE
> + select NUMA_KEEP_MEMINFO if NUMA
Ok, will take a look here too.
Thanks,
-Robert
> help
> CXL is a bus that is electrically compatible with PCI Express, but
> layers three protocols on that signalling (CXL.io, CXL.cache, and
Hi Dan,
patch below. I have not included it into v2 of the SRAT/CEDT changes
as it is cxl specific and can be applied separately.
Thanks,
-Robert
On 18.03.24 14:26:41, Dan Williams wrote:
> It should also be the case that cxl_acpi needs this:
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 67998dbd1d46..1bf25185c35b 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -6,6 +6,7 @@ menuconfig CXL_BUS
> select FW_UPLOAD
> select PCI_DOE
> select FIRMWARE_TABLE
> + select NUMA_KEEP_MEMINFO if NUMA
> help
> CXL is a bus that is electrically compatible with PCI Express, but
> layers three protocols on that signalling (CXL.io, CXL.cache, and
From be5b495980bae41d879909212db02dac0fba978e Mon Sep 17 00:00:00 2001
From: Robert Richter <rrichter@amd.com>
Date: Tue, 19 Mar 2024 09:28:33 +0100
Subject: [PATCH] cxl: Fix use of phys_to_target_node() outside of init section
The CXL driver uses both functions phys_to_target_node() and
memory_add_physaddr_to_nid(). The x86 architecture relies on the
NUMA_KEEP_MEMINFO kernel option to be set. Enable the option for the
driver accordingly.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
---
drivers/cxl/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 67998dbd1d46..6140b3529a29 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -6,6 +6,7 @@ menuconfig CXL_BUS
select FW_UPLOAD
select PCI_DOE
select FIRMWARE_TABLE
+ select NUMA_KEEP_MEMINFO if (NUMA && X86)
help
CXL is a bus that is electrically compatible with PCI Express, but
layers three protocols on that signalling (CXL.io, CXL.cache, and
--
2.39.2
Robert Richter wrote:
> Hi Dan,
>
> patch below. I have not included it into v2 of the SRAT/CEDT changes
> as it is cxl specific and can be applied separately.
>
> Thanks,
>
> -Robert
>
>
> On 18.03.24 14:26:41, Dan Williams wrote:
> > It should also be the case that cxl_acpi needs this:
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 67998dbd1d46..1bf25185c35b 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -6,6 +6,7 @@ menuconfig CXL_BUS
> > select FW_UPLOAD
> > select PCI_DOE
> > select FIRMWARE_TABLE
> > + select NUMA_KEEP_MEMINFO if NUMA
> > help
> > CXL is a bus that is electrically compatible with PCI Express, but
> > layers three protocols on that signalling (CXL.io, CXL.cache, and
>
> From be5b495980bae41d879909212db02dac0fba978e Mon Sep 17 00:00:00 2001
Hi Robert,
When you send inline patches like this can you remember to include a
scissors line? That way tools like "b4 am" automatically know where to
trim things. So add a line like the following:
-- >8 --
...see "git mailinfo --help" for details.
Also note that if you reply with an updated patch in a series include
the "vX NN/MM" suffix, like "Subject: [PATCH v3 2/3] ..." so that b4 am
knows to perform a "partial reroll".
> From: Robert Richter <rrichter@amd.com>
> Date: Tue, 19 Mar 2024 09:28:33 +0100
> Subject: [PATCH] cxl: Fix use of phys_to_target_node() outside of init section
>
> The CXL driver uses both functions phys_to_target_node() and
> memory_add_physaddr_to_nid(). The x86 architecture relies on the
> NUMA_KEEP_MEMINFO kernel option to be set. Enable the option for the
> driver accordingly.
>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
> drivers/cxl/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 67998dbd1d46..6140b3529a29 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -6,6 +6,7 @@ menuconfig CXL_BUS
> select FW_UPLOAD
> select PCI_DOE
> select FIRMWARE_TABLE
> + select NUMA_KEEP_MEMINFO if (NUMA && X86)
> help
> CXL is a bus that is electrically compatible with PCI Express, but
> layers three protocols on that signalling (CXL.io, CXL.cache, and
> --
> 2.39.2
On 19.03.24 17:21:53, Dan Williams wrote:
> Robert Richter wrote:
> > Hi Dan,
> >
> > patch below. I have not included it into v2 of the SRAT/CEDT changes
> > as it is cxl specific and can be applied separately.
> >
> > Thanks,
> >
> > -Robert
> >
> >
> > On 18.03.24 14:26:41, Dan Williams wrote:
> > > It should also be the case that cxl_acpi needs this:
> > >
> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > index 67998dbd1d46..1bf25185c35b 100644
> > > --- a/drivers/cxl/Kconfig
> > > +++ b/drivers/cxl/Kconfig
> > > @@ -6,6 +6,7 @@ menuconfig CXL_BUS
> > > select FW_UPLOAD
> > > select PCI_DOE
> > > select FIRMWARE_TABLE
> > > + select NUMA_KEEP_MEMINFO if NUMA
> > > help
> > > CXL is a bus that is electrically compatible with PCI Express, but
> > > layers three protocols on that signalling (CXL.io, CXL.cache, and
> >
> > From be5b495980bae41d879909212db02dac0fba978e Mon Sep 17 00:00:00 2001
>
> Hi Robert,
>
> When you send inline patches like this can you remember to include a
> scissors line? That way tools like "b4 am" automatically know where to
> trim things. So add a line like the following:
>
> -- >8 --
>
> ...see "git mailinfo --help" for details.
Thanks for the insight into your patch processing. Will use that in the
future.
>
> Also note that if you reply with an updated patch in a series include
> the "vX NN/MM" suffix, like "Subject: [PATCH v3 2/3] ..." so that b4 am
> knows to perform a "partial reroll".
This patch is in addition to the other SRAT patches and can be applied
directly to the cxl tree. That is why there is no version update
here. But I replied to this series for reference. I saw the b4 shazam
--no-parent option, would that help here?
Thanks,
-Robert
>
> > From: Robert Richter <rrichter@amd.com>
> > Date: Tue, 19 Mar 2024 09:28:33 +0100
> > Subject: [PATCH] cxl: Fix use of phys_to_target_node() outside of init section
> >
> > The CXL driver uses both functions phys_to_target_node() and
> > memory_add_physaddr_to_nid(). The x86 architecture relies on the
> > NUMA_KEEP_MEMINFO kernel option to be set. Enable the option for the
> > driver accordingly.
> >
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> > drivers/cxl/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 67998dbd1d46..6140b3529a29 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -6,6 +6,7 @@ menuconfig CXL_BUS
> > select FW_UPLOAD
> > select PCI_DOE
> > select FIRMWARE_TABLE
> > + select NUMA_KEEP_MEMINFO if (NUMA && X86)
> > help
> > CXL is a bus that is electrically compatible with PCI Express, but
> > layers three protocols on that signalling (CXL.io, CXL.cache, and
> > --
> > 2.39.2
On 18.03.24 22:09:00, Robert Richter wrote:
> With kconfig option NUMA_KEEP_MEMINFO disabled the SRAT lookup done
> with numa_fill_memblks() fails returning NUMA_NO_MEMBLK (-1). An
> existing SRAT memory range cannot be found for a CFMWS address range.
> This causes the addition of a duplicate numa_memblk with a different
> node id and a subsequent page fault and kernel crash during boot.
>
> Note that the issue was initially introduced with [1]. But since
> phys_to_target_node() was originally used that returned the valid node
> 0, an additional numa_memblk was not added. Though, the node id was
> wrong too.
>
> Fix this by enabling NUMA_KEEP_MEMINFO for x86 with ACPI and NUMA
> enabled.
>
> [1] fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
>
> Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> Cc: Derick Marks <derick.w.marks@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Robert Richter <rrichter@amd.com>
This patch should be dropped in favor of the other 1/3 patch, it is a
leftover.
Thanks,
-Robert
Robert Richter wrote:
> On 18.03.24 22:09:00, Robert Richter wrote:
> > With kconfig option NUMA_KEEP_MEMINFO disabled the SRAT lookup done
> > with numa_fill_memblks() fails returning NUMA_NO_MEMBLK (-1). An
> > existing SRAT memory range cannot be found for a CFMWS address range.
> > This causes the addition of a duplicate numa_memblk with a different
> > node id and a subsequent page fault and kernel crash during boot.
> >
> > Note that the issue was initially introduced with [1]. But since
> > phys_to_target_node() was originally used that returned the valid node
> > 0, an additional numa_memblk was not added. Though, the node id was
> > wrong too.
> >
> > Fix this by enabling NUMA_KEEP_MEMINFO for x86 with ACPI and NUMA
> > enabled.
> >
> > [1] fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
> >
> > Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
> > Cc: Derick Marks <derick.w.marks@intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: Robert Richter <rrichter@amd.com>
>
> This patch should be dropped in favor of the other 1/3 patch, it is a
> leftover.
What "other" patch? Did I respond to the wrong one?