Previous, there is only one edac can be registering in the EDAC core. After
ghes_edac is modularized, the registering of ghes devices may be deferred
when ghes_edac is loaded. Prevent the chipset-specific edac drivers from
loading after ghes_edac is registered, which allow the edac drivers to
get selected in e.g. HPE platform.
Signed-off-by: Jia He <justin.he@arm.com>
---
drivers/acpi/apei/ghes.c | 14 ++++++++++++++
drivers/edac/amd64_edac.c | 2 +-
drivers/edac/ghes_edac.c | 2 ++
drivers/edac/pnd2_edac.c | 2 +-
drivers/edac/sb_edac.c | 2 +-
drivers/edac/skx_base.c | 2 +-
include/acpi/ghes.h | 4 ++++
7 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9c52183e3ad9..9272d963b57d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -95,6 +95,7 @@
#endif
static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+static bool devs_registered;
static inline bool is_hest_type_generic_v2(struct ghes *ghes)
{
@@ -1266,6 +1267,18 @@ static int apei_sdei_unregister_ghes(struct ghes *ghes)
return sdei_unregister_ghes(ghes);
}
+void set_ghes_devs_registered(bool flag)
+{
+ devs_registered = flag;
+}
+EXPORT_SYMBOL_GPL(set_ghes_devs_registered);
+
+bool ghes_devs_registered(void)
+{
+ return devs_registered;
+}
+EXPORT_SYMBOL_GPL(ghes_devs_registered);
+
static int ghes_probe(struct platform_device *ghes_dev)
{
struct acpi_hest_generic *generic;
@@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device *ghes_dev)
platform_set_drvdata(ghes_dev, ghes);
ghes->dev = &ghes_dev->dev;
+ set_ghes_devs_registered(false);
mutex_lock(&ghes_devs_mutex);
list_add_tail(&ghes->elist, &ghes_devs);
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index e4eaf6668feb..f48507fa7b94 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4329,7 +4329,7 @@ static int __init amd64_edac_init(void)
int err = -ENODEV;
int i;
- if (ghes_get_devices(0))
+ if (ghes_get_devices(0) && ghes_devs_registered())
return -EBUSY;
owner = edac_get_owner();
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 3bdf8469882d..d26644b3bc47 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -564,6 +564,7 @@ static int __init ghes_edac_init(void)
list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
ghes_edac_register(g->dev);
}
+ set_ghes_devs_registered(true);
return 0;
}
@@ -576,6 +577,7 @@ static void __exit ghes_edac_exit(void)
list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) {
ghes_edac_unregister(g);
}
+ set_ghes_devs_registered(false);
}
module_exit(ghes_edac_exit);
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 73f2ba0e64e3..66d89d00c4b3 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1528,7 +1528,7 @@ static int __init pnd2_init(void)
edac_dbg(2, "\n");
- if (ghes_get_devices(0))
+ if (ghes_get_devices(0) && ghes_devs_registered())
return -EBUSY;
owner = edac_get_owner();
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 1d0520a16840..a3a89e366a74 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3506,7 +3506,7 @@ static int __init sbridge_init(void)
edac_dbg(2, "\n");
- if (ghes_get_devices(0))
+ if (ghes_get_devices(0) && ghes_devs_registered())
return -EBUSY;
owner = edac_get_owner();
diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c
index fe267f8543f5..efa1ae79c35a 100644
--- a/drivers/edac/skx_base.c
+++ b/drivers/edac/skx_base.c
@@ -653,7 +653,7 @@ static int __init skx_init(void)
edac_dbg(2, "\n");
- if (ghes_get_devices(0))
+ if (ghes_get_devices(0) && ghes_devs_registered())
return -EBUSY;
owner = edac_get_owner();
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 150c0b9500d6..21b9d4d4c3e9 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -72,8 +72,12 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
*/
void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
struct list_head *ghes_get_devices(bool force);
+bool ghes_devs_registered(void);
+void set_ghes_devs_registered(bool flag);
#else
static inline struct list_head *ghes_get_devices(bool force) { return NULL; }
+static inline bool ghes_devs_registered(void) { return false; }
+static inline void set_ghes_devs_registered(bool flag) { return; }
#endif
int ghes_estatus_pool_init(int num_ghes);
--
2.25.1
On Wednesday, August 17, 2022 8:35 AM, Jia He wrote: > Previous, there is only one edac can be registering in the EDAC core. After > ghes_edac is modularized, the registering of ghes devices may be deferred > when ghes_edac is loaded. Prevent the chipset-specific edac drivers from > loading after ghes_edac is registered, which allow the edac drivers to > get selected in e.g. HPE platform. ... > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device > *ghes_dev) > platform_set_drvdata(ghes_dev, ghes); > > ghes->dev = &ghes_dev->dev; > + set_ghes_devs_registered(false); This does not look right to me. The condition of using ghes_edac is (ghes-present) && (ghes-preferred), where: - ghes-present is latched on ghes_probe() - ghes-preferred is true if the platform-check passes. ghes_get_device() introduced in the previous patch works as the ghes-preferred check. We cannot use ghes_edac registration as the ghes-present check in this scheme since it is deferred to module_init(). I'd suggest the following changes: - Update ghes_get_device() to check a flag latched on ghes_probe(). - Remove 'force' argument from ghes_get_device(). ghes_edac_init() is too late to set this flag. Also, other edac drivers should not be able to control this flag. I think this force_load kernel option needs to belong to ghes with this scheme. - ghes_get_device() exposes 'ghes_devs' to chipset-specific edac drivers, which should be internal to ghes. ghes_get_device() may be renamed to something like ghes_edac_preferred() which returns true / false. Toshi
Hi Toshi, > -----Original Message----- > From: Kani, Toshi <toshi.kani@hpe.com> > Sent: Friday, August 19, 2022 9:30 AM > To: Justin He <Justin.He@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len > Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck > <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho > Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert > Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; > Yazen Ghannam <yazen.ghannam@amd.com> > Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki > <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko > Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com> > Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from > loading after ghes_edac is registered > > On Wednesday, August 17, 2022 8:35 AM, Jia He wrote: > > Previous, there is only one edac can be registering in the EDAC core. > > After ghes_edac is modularized, the registering of ghes devices may be > > deferred when ghes_edac is loaded. Prevent the chipset-specific edac > > drivers from loading after ghes_edac is registered, which allow the > > edac drivers to get selected in e.g. HPE platform. > ... > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device > > *ghes_dev) > > platform_set_drvdata(ghes_dev, ghes); > > > > ghes->dev = &ghes_dev->dev; > > + set_ghes_devs_registered(false); > > This does not look right to me. > > The condition of using ghes_edac is (ghes-present) && (ghes-preferred), > where: > - ghes-present is latched on ghes_probe() > - ghes-preferred is true if the platform-check passes. > > ghes_get_device() introduced in the previous patch works as the > ghes-preferred check. > > We cannot use ghes_edac registration as the ghes-present check in this > scheme since it is deferred to module_init(). What is the logic for ghes-present check? In this patch, I assumed it is equal to "ghes_edac devices have been registered". Seems it is not correct. But should we consider the case as follows: What if sbridge_init () is invoked before ghes_edac_init()? i.e. Should we get sb_edac driver selected when ghes_edac is not loaded yet (e.g. on HPE)? > > I'd suggest the following changes: > - Update ghes_get_device() to check a flag latched on ghes_probe(). Still need your elaborating about the details of the flag. i.e. When is this flag latched? When is it unlatched? > - Remove 'force' argument from ghes_get_device(). ghes_edac_init() is too > late to set this flag. Also, other edac drivers should not be able to control this > flag. I think this force_load kernel option needs to belong to ghes with this > scheme. Agree, I will remove force_load in ghes_get_device(), and put all check/update of force_load in ghes > - ghes_get_device() exposes 'ghes_devs' to chipset-specific edac drivers, which > should be internal to ghes. ghes_get_device() may be renamed to something > like ghes_edac_preferred() which returns true / false. > Okay, agree -- Cheers, Justin (Jia He)
On Friday, August 19, 2022 4:35 AM, Justin He wrote: > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device > > > *ghes_dev) > > > platform_set_drvdata(ghes_dev, ghes); > > > > > > ghes->dev = &ghes_dev->dev; > > > + set_ghes_devs_registered(false); > > > > This does not look right to me. > > > > The condition of using ghes_edac is (ghes-present) && (ghes-preferred), > > where: > > - ghes-present is latched on ghes_probe() > > - ghes-preferred is true if the platform-check passes. > > > > ghes_get_device() introduced in the previous patch works as the > > ghes-preferred check. > > > > We cannot use ghes_edac registration as the ghes-present check in this > > scheme since it is deferred to module_init(). > > What is the logic for ghes-present check? In this patch, I assumed it is equal to > "ghes_edac devices have been registered". Seems it is not correct. Using (ghes_edac-registered) is a wrong check in this scheme since ghes_edac registration is deferred. This check is fine in the current scheme since ghes_edac gets registered before any other chipset-specific edac drivers. > But should we consider the case as follows: > What if sbridge_init () is invoked before ghes_edac_init()? i.e. Should we get > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on HPE)? No. The point is that ghes_edac driver needs to be selected, "regardless of the module ordering", on a system with GHES present & preferred. Note that this new scheme leads to the following condition change: - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered) - New: (ghes-present) && (ghes-preferred) The option I suggested previously keeps the current condition, but this new scheme does not for better modularity. What this means is that if ghes_edac is not enabled (but ghes is enabled) on a system with GHES present & preferred, no edac driver gets registered. This change is fine from my (HPE) perspective and should be fine for other GHES systems. GHES systems have chipset-specific edac driver in FW. OS-based chipset-specific edac driver is not necessary and may lead to a conflict of chipset register ownership. > > I'd suggest the following changes: > > - Update ghes_get_device() to check a flag latched on ghes_probe(). > > Still need your elaborating about the details of the flag. i.e. When is this flag > latched? When is it unlatched? This flag is a static variable, say ghes_present, which is set to false initially. ghes_probe() sets it to true. ghes_edac_preferred() (aka. ghes_get_device) checks this flag at beginning and returns false if this flag is false. It does not get unlatched since ACPI GHES table is static. Toshi
> -----Original Message----- > From: Kani, Toshi <toshi.kani@hpe.com> > Sent: Friday, August 19, 2022 12:49 PM > To: Justin He <Justin.He@arm.com> > Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux- > edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki > <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko > Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd > <nd@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len Brown > <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck > <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho > Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert > Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; > Yazen Ghannam <yazen.ghannam@amd.com> > Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac > from loading after ghes_edac is registered > > On Friday, August 19, 2022 4:35 AM, Justin He wrote: > > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct > platform_device > > > > *ghes_dev) > > > > platform_set_drvdata(ghes_dev, ghes); > > > > > > > > ghes->dev = &ghes_dev->dev; > > > > + set_ghes_devs_registered(false); > > > > > > This does not look right to me. > > > > > > The condition of using ghes_edac is (ghes-present) && (ghes- > preferred), > > > where: > > > - ghes-present is latched on ghes_probe() > > > - ghes-preferred is true if the platform-check passes. > > > > > > ghes_get_device() introduced in the previous patch works as the > > > ghes-preferred check. > > > > > > We cannot use ghes_edac registration as the ghes-present check in > this > > > scheme since it is deferred to module_init(). > > > > What is the logic for ghes-present check? In this patch, I assumed it > is equal to > > "ghes_edac devices have been registered". Seems it is not correct. > > Using (ghes_edac-registered) is a wrong check in this scheme > since ghes_edac registration is deferred. This check is fine in > the current scheme since ghes_edac gets registered before > any other chipset-specific edac drivers. > > > But should we consider the case as follows: > > What if sbridge_init () is invoked before ghes_edac_init()? i.e. > Should we get > > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on > HPE)? > > No. The point is that ghes_edac driver needs to be selected, > "regardless of the module ordering", on a system with GHES > present & preferred. > > Note that this new scheme leads to the following condition > change: > - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered) > - New: (ghes-present) && (ghes-preferred) > > The option I suggested previously keeps the current condition, > but this new scheme does not for better modularity. > > What this means is that if ghes_edac is not enabled (but ghes > is enabled) on a system with GHES present & preferred, no edac > driver gets registered. This change is fine from my (HPE) perspective > and should be fine for other GHES systems. GHES systems have > chipset-specific edac driver in FW. OS-based chipset-specific edac > driver is not necessary and may lead to a conflict of chipset register > ownership. Currently, running with this on the kernel command line ghes.disable causes the ACPI ghes driver to quit early in acpi_ghes_init(): /* * This driver isn't really modular, however for the time being, * continuing to use module_param is the easiest way to remain * compatible with existing boot arg use cases. */ bool ghes_disable; module_param_named(disable, ghes_disable, bool, 0); which results in the skx_edac module assuming it should run: [ 33.628140] calling skx_init+0x0/0xe5a [skx_edac] @ 1444 [ 33.628531] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:36:0a.0 (INTERRUPT) [ 33.641432] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:36:0c.0 (INTERRUPT) [ 33.653256] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) [ 33.665055] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) [ 33.676801] initcall skx_init+0x0/0xe5a [skx_edac] returned 0 after 36343 usecs We might need to differentiate between the system ROM really not offering GHES vs. the ghes module not running.
On Fri, Aug 19, 2022 at 06:57:11PM +0000, Elliott, Robert (Servers) wrote:
> We might need to differentiate between the system ROM really not
> offering GHES vs. the ghes module not running.
All detectable in ghes_get_devices().
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Aug 19, 2022 at 05:48:39PM +0000, Kani, Toshi wrote:
> This flag is a static variable, say ghes_present, which is set to
> false initially. ghes_probe() sets it to true. ghes_edac_preferred()
> (aka. ghes_get_device) checks this flag at beginning and returns false
> if this flag is false. It does not get unlatched since ACPI GHES table
> is static.
What is that flag needed for at all?
There are two possibilities:
1. ghes_probe() succeeds. ghes_edac loads properly and other drivers use
ghes_get_devices() to know when to load.
2. ghes_probe() fails and that is caught during platform testing of all
those platforms who wish to use ghes_edac. BIOS is fixed and goto 1.
No need for funky flags whatsoever.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Friday, August 19, 2022 12:30 PM, Borislav Petkov wrote: > > This flag is a static variable, say ghes_present, which is set to > > false initially. ghes_probe() sets it to true. ghes_edac_preferred() > > (aka. ghes_get_device) checks this flag at beginning and returns false > > if this flag is false. It does not get unlatched since ACPI GHES table > > is static. > > What is that flag needed for at all? Because ghes_get_device() always returns &ghes_devs on Arm, which is !NULL. It does not check if GHES is present. > There are two possibilities: > > 1. ghes_probe() succeeds. ghes_edac loads properly and other drivers use > ghes_get_devices() to know when to load. > > 2. ghes_probe() fails and that is caught during platform testing of all > those platforms who wish to use ghes_edac. BIOS is fixed and goto 1. > > No need for funky flags whatsoever. 3. ghes_probe() is not called, but ghes_get_device() is called from other edac drivers. Toshi
On Fri, Aug 19, 2022 at 06:46:34PM +0000, Kani, Toshi wrote:
> 3. ghes_probe() is not called,
When is ghes_probe() not called on ARM?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Friday, August 19, 2022 12:50 PM, Borislav Petkov wrote: > > 3. ghes_probe() is not called, > > When is ghes_probe() not called on ARM? When the system does not implement ACPI GHES table, which I suppose is most of the case on ARM. Toshi
On Fri, Aug 19, 2022 at 06:53:41PM +0000, Kani, Toshi wrote:
> When the system does not implement ACPI GHES table,
> which I suppose is most of the case on ARM.
That should be detected in ghes_get_devices() - just like on x86.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Friday, August 19, 2022 1:31 PM, Borislav Petkov wrote: > > When the system does not implement ACPI GHES table, > > which I suppose is most of the case on ARM. > > That should be detected in ghes_get_devices() - just like on x86. Agreed. And that is the check to ghes_present flag... Toshi
© 2016 - 2026 Red Hat, Inc.