[PATCH] acpi: Fix hed module initialization order when it is built-in

Xiaofei Tan posted 1 patch 1 year, 2 months ago
Posted by Xiaofei Tan 1 year, 2 months ago
When the hed module is built-in, its init order is determined by the
Makefile (link) order. The current order violates expectations: hed is
initialized after evged, so RAS records cannot be handled in the time
window during which evged has been initialized but hed has not.
If the number of such RAS records exceeds the number of APEI HEST error
sources, all HEST resources could be occupied, which could then
affect subsequent RAS error reporting.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
---
 drivers/acpi/Makefile | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 61ca4afe83dc..54f60b7922ad 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -15,6 +15,13 @@ endif
 
 obj-$(CONFIG_ACPI)		+= tables.o
 
+#
+# hed.o must be linked before evged.o. Otherwise, RAS errors cannot be
+# handled during the startup window in which evged has been initialized
+# but hed has not.
+#
+obj-$(CONFIG_ACPI_HED)		+= hed.o
+
 #
 # ACPI Core Subsystem (Interpreter)
 #
@@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
 obj-$(CONFIG_ACPI_BATTERY)	+= battery.o
 obj-$(CONFIG_ACPI_SBS)		+= sbshc.o
 obj-$(CONFIG_ACPI_SBS)		+= sbs.o
-obj-$(CONFIG_ACPI_HED)		+= hed.o
 obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
 obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
-- 
2.33.0
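
The ordering claim in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: it assumes that both drivers register at the same initcall level, and the names init_log, record, run_initcalls, order_before_patch and order_after_patch are hypothetical. The table order stands in for link order, which in turn follows the order of objects in the Makefile.

```c
/* Userspace model of same-level initcall ordering; not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS 8

typedef int (*initcall_t)(void);

/* Records the order in which the model "drivers" were initialized. */
static const char *init_log[MAX_CALLS];
static size_t init_count;

static void record(const char *name)
{
	assert(init_count < MAX_CALLS);
	init_log[init_count++] = name;
}

static int evged_init(void) { record("evged"); return 0; }
static int hed_init(void)   { record("hed");   return 0; }

/* Run every entry in table order, as happens within one initcall level. */
static void run_initcalls(const initcall_t *table, size_t n)
{
	init_count = 0;
	for (size_t i = 0; i < n; i++)
		table[i]();
}

/* Table order stands in for link order, which follows Makefile order. */
static const initcall_t order_before_patch[] = { evged_init, hed_init };
static const initcall_t order_after_patch[]  = { hed_init, evged_init };
```

Calling run_initcalls() on order_before_patch logs evged first, and on order_after_patch logs hed first. Nothing in the model covers the case where hed is built as a module, since a module is not part of the built-in table at all.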
Re: [PATCH] acpi: Fix hed module initialization order when it is built-in
Posted by Mauro Carvalho Chehab 1 year, 1 month ago
On Fri, 15 Nov 2024 11:50:14 +0800
Xiaofei Tan <tanxiaofei@huawei.com> wrote:

Please always copy my @kernel.org address for upstream work.

> When the module hed is built-in, the init order is determined by
> Makefile order. That order violates expectations. Because the module
> hed init is behind evged. RAS records can't be handled in the
> special time window that evged has initialized while hed not.
> If the number of such RAS records is more than the APEI HEST error
> source number, the HEST resources could be occupied all, and then
> could affect subsequent RAS error reporting.

IMO, it is a lot better to use a late init call. Please see:
	include/linux/init.h

This would be done by, for instance, using late_initcall().

Now, what we have is:

	acpi-y                          += evged.o
	obj-$(CONFIG_ACPI_HED)          += hed.o

where ACPI_HED is a tristate.

It sounds to me that, even with your patch, if you build
HED as a module, you'll still have a problem.

Shouldn't ACPI_HED be changed from tristate to bool?

Regards,
Mauro

> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
> ---
>  drivers/acpi/Makefile | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 61ca4afe83dc..54f60b7922ad 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -15,6 +15,13 @@ endif
>  
>  obj-$(CONFIG_ACPI)		+= tables.o
>  
> +#
> +# The hed.o needs to be in front of evged.o to avoid the problem that
> +# RAS errors cannot be handled in the special time window of startup
> +# phase that evged has initialized while hed not.
> +#
> +obj-$(CONFIG_ACPI_HED)		+= hed.o
> +
>  #
>  # ACPI Core Subsystem (Interpreter)
>  #
> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>  obj-$(CONFIG_ACPI_BATTERY)	+= battery.o
>  obj-$(CONFIG_ACPI_SBS)		+= sbshc.o
>  obj-$(CONFIG_ACPI_SBS)		+= sbs.o
> -obj-$(CONFIG_ACPI_HED)		+= hed.o
>  obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
>  obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
>  obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
Re: [PATCH] acpi: Fix hed module initialization order when it is built-in
Posted by Xiaofei Tan 1 year, 1 month ago
Hi Mauro,

On 2024/12/12 0:22, Mauro Carvalho Chehab wrote:
> On Fri, 15 Nov 2024 11:50:14 +0800
> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
> Please always copy my @kernel.org address for upstream work.

OK

>> When the module hed is built-in, the init order is determined by
>> Makefile order. That order violates expectations. Because the module
>> hed init is behind evged. RAS records can't be handled in the
>> special time window that evged has initialized while hed not.
>> If the number of such RAS records is more than the APEI HEST error
>> source number, the HEST resources could be occupied all, and then
>> could affect subsequent RAS error reporting.
> IMO, it is a lot better to use a late init call. Please see:
> 	include/linux/init.h
>
> This would be done by, for instance, using late_initcall().
>
> Now, what we have is:
>
> 	acpi-y                          += evged.o
> 	obj-$(CONFIG_ACPI_HED)          += hed.o
>
> Where ACPI_HED being a tri-state.
>
> It sounds to me, that even, with your patch, if you build
> HED as a module, you'll still have a problem.
Yes, and it is also affected by the loading sequence of HED and GHES. Either way, the risk remains.
>
> Shouldn't be ACPI_HED be changed from tristate to bool?

Agreed.

@Rafael

Hi Rafael, please help check whether we can make this change. Thanks.


>
> Regards,
> Mauro
>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
>> ---
>>   drivers/acpi/Makefile | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>> index 61ca4afe83dc..54f60b7922ad 100644
>> --- a/drivers/acpi/Makefile
>> +++ b/drivers/acpi/Makefile
>> @@ -15,6 +15,13 @@ endif
>>   
>>   obj-$(CONFIG_ACPI)		+= tables.o
>>   
>> +#
>> +# The hed.o needs to be in front of evged.o to avoid the problem that
>> +# RAS errors cannot be handled in the special time window of startup
>> +# phase that evged has initialized while hed not.
>> +#
>> +obj-$(CONFIG_ACPI_HED)		+= hed.o
>> +
>>   #
>>   # ACPI Core Subsystem (Interpreter)
>>   #
>> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>>   obj-$(CONFIG_ACPI_BATTERY)	+= battery.o
>>   obj-$(CONFIG_ACPI_SBS)		+= sbshc.o
>>   obj-$(CONFIG_ACPI_SBS)		+= sbs.o
>> -obj-$(CONFIG_ACPI_HED)		+= hed.o
>>   obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
>>   obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
>>   obj-$(CONFIG_ACPI_CPPC_LIB)	+= cppc_acpi.o
Re: [PATCH] acpi: Fix hed module initialization order when it is built-in
Posted by Rafael J. Wysocki 1 year, 1 month ago
On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
> When the module hed is built-in, the init order is determined by
> Makefile order.

Are you sure?

> That order violates expectations. Because the module
> hed init is behind evged. RAS records can't be handled in the
> special time window that evged has initialized while hed not.
> If the number of such RAS records is more than the APEI HEST error
> source number, the HEST resources could be occupied all, and then
> could affect subsequent RAS error reporting.

Well, the problem is real, but does the change really prevent it from
happening or does it just increase the likelihood of success?

In the latter case, and generally speaking too, it would be better to
add explicit synchronization between evged and hed.
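
One possible shape for such explicit synchronization is sketched below as a userspace model. It is only an illustration under stated assumptions, not a proposal for the actual evged or hed code: the names post_event, register_handler, pending, demo_handler and handled_total are all hypothetical, and the model is single-threaded, so the locking a real implementation would need is left out. The idea is that the producer holds events while no consumer is registered, and the consumer replays them when it registers.

```c
/* Userspace model of deferring events until a consumer registers. */
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

typedef void (*event_handler_t)(int event);

static event_handler_t handler;   /* NULL until the consumer registers */
static int pending[MAX_PENDING];  /* events seen before registration */
static size_t pending_count;
static int handled_total;         /* bookkeeping for the demo handler */

/* Producer side: deliver now if possible, otherwise hold the event. */
static int post_event(int event)
{
	if (handler) {
		handler(event);
		return 0;
	}
	if (pending_count == MAX_PENDING)
		return -1;        /* queue full: the caller must cope */
	pending[pending_count++] = event;
	return 0;
}

/* Consumer side: register, then replay everything that was held. */
static void register_handler(event_handler_t h)
{
	assert(h != NULL);
	handler = h;
	for (size_t i = 0; i < pending_count; i++)
		handler(pending[i]);
	pending_count = 0;
}

/* Demo consumer that only counts the events it receives. */
static void demo_handler(int event)
{
	(void)event;
	handled_total++;
}
```

With this shape, events posted before register_handler() are not lost as long as the queue does not fill up, regardless of which side initializes first. The bounded queue mirrors the concern in the commit message: once it is full, further events still have nowhere to go.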

> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
> ---
>  drivers/acpi/Makefile | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 61ca4afe83dc..54f60b7922ad 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -15,6 +15,13 @@ endif
>
>  obj-$(CONFIG_ACPI)             += tables.o
>
> +#
> +# The hed.o needs to be in front of evged.o to avoid the problem that
> +# RAS errors cannot be handled in the special time window of startup
> +# phase that evged has initialized while hed not.
> +#
> +obj-$(CONFIG_ACPI_HED)         += hed.o
> +
>  #
>  # ACPI Core Subsystem (Interpreter)
>  #
> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>  obj-$(CONFIG_ACPI_BATTERY)     += battery.o
>  obj-$(CONFIG_ACPI_SBS)         += sbshc.o
>  obj-$(CONFIG_ACPI_SBS)         += sbs.o
> -obj-$(CONFIG_ACPI_HED)         += hed.o
>  obj-$(CONFIG_ACPI_EC_DEBUGFS)  += ec_sys.o
>  obj-$(CONFIG_ACPI_BGRT)                += bgrt.o
>  obj-$(CONFIG_ACPI_CPPC_LIB)    += cppc_acpi.o
> --
> 2.33.0
>
Re: [PATCH] acpi: Fix hed module initialization order when it is built-in
Posted by Xiaofei Tan 1 year, 1 month ago
Hi Rafael,

On 2024/12/11 1:59, Rafael J. Wysocki wrote:
> On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>> When the module hed is built-in, the init order is determined by
>> Makefile order.
> Are you sure?

Yes.

>> That order violates expectations. Because the module
>> hed init is behind evged. RAS records can't be handled in the
>> special time window that evged has initialized while hed not.
>> If the number of such RAS records is more than the APEI HEST error
>> source number, the HEST resources could be occupied all, and then
>> could affect subsequent RAS error reporting.
> Well, the problem is real, but does the change really prevent it from
> happening or does it just increase the likelihood of success?

It is completely solved if the driver is built-in. If HED is built as a
module, it is not solved.

>
> In the latter case, and generally speaking too, it would be better to
> add explicit synchronization between evged and hed.
>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
>> ---
>>   drivers/acpi/Makefile | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>> index 61ca4afe83dc..54f60b7922ad 100644
>> --- a/drivers/acpi/Makefile
>> +++ b/drivers/acpi/Makefile
>> @@ -15,6 +15,13 @@ endif
>>
>>   obj-$(CONFIG_ACPI)             += tables.o
>>
>> +#
>> +# The hed.o needs to be in front of evged.o to avoid the problem that
>> +# RAS errors cannot be handled in the special time window of startup
>> +# phase that evged has initialized while hed not.
>> +#
>> +obj-$(CONFIG_ACPI_HED)         += hed.o
>> +
>>   #
>>   # ACPI Core Subsystem (Interpreter)
>>   #
>> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>>   obj-$(CONFIG_ACPI_BATTERY)     += battery.o
>>   obj-$(CONFIG_ACPI_SBS)         += sbshc.o
>>   obj-$(CONFIG_ACPI_SBS)         += sbs.o
>> -obj-$(CONFIG_ACPI_HED)         += hed.o
>>   obj-$(CONFIG_ACPI_EC_DEBUGFS)  += ec_sys.o
>>   obj-$(CONFIG_ACPI_BGRT)                += bgrt.o
>>   obj-$(CONFIG_ACPI_CPPC_LIB)    += cppc_acpi.o
>> --
>> 2.33.0
>>
Re: [PATCH] acpi: Fix hed module initialization order when it is built-in
Posted by Jonathan Cameron 1 year, 1 month ago
On Mon, 23 Dec 2024 17:31:08 +0800
Xiaofei Tan <tanxiaofei@huawei.com> wrote:

> Hi Rafael,
> 
> On 2024/12/11 1:59, Rafael J. Wysocki wrote:
> > On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:  
> >> When the module hed is built-in, the init order is determined by
> >> Makefile order.  
> > Are you sure?  
> 
> yes

We had a similar fix in CXL recently (which is why I suggested this approach
internally when tanxiaofei mentioned the problem).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/cxl?id=6575b268157f37929948a8d1f3bafb3d7c055bc1

The related discussion for the CXL patch was the first time I'd come across this
solution to load ordering for built-in cases.


> 
> >> That order violates expectations. Because the module
> >> hed init is behind evged. RAS records can't be handled in the
> >> special time window that evged has initialized while hed not.
> >> If the number of such RAS records is more than the APEI HEST error
> >> source number, the HEST resources could be occupied all, and then
> >> could affect subsequent RAS error reporting.  
> > Well, the problem is real, but does the change really prevent it from
> > happening or does it just increase the likelihood of success?  
> 
> It can be completely solved if the driver used as built-in way. If build HED as a
> module, it not solved.

Can we prevent that condition from happening with appropriate Kconfig?
It's annoying to restrict build options, but if that is needed to make it
work, then it is better than not working!

Jonathan


> 
> >
> > In the latter case, and generally speaking too, it would be better to
> > add explicit synchronization between evged and hed.
> >  
> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
> >> ---
> >>   drivers/acpi/Makefile | 8 +++++++-
> >>   1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> >> index 61ca4afe83dc..54f60b7922ad 100644
> >> --- a/drivers/acpi/Makefile
> >> +++ b/drivers/acpi/Makefile
> >> @@ -15,6 +15,13 @@ endif
> >>
> >>   obj-$(CONFIG_ACPI)             += tables.o
> >>
> >> +#
> >> +# The hed.o needs to be in front of evged.o to avoid the problem that
> >> +# RAS errors cannot be handled in the special time window of startup
> >> +# phase that evged has initialized while hed not.
> >> +#
> >> +obj-$(CONFIG_ACPI_HED)         += hed.o
> >> +
> >>   #
> >>   # ACPI Core Subsystem (Interpreter)
> >>   #
> >> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
> >>   obj-$(CONFIG_ACPI_BATTERY)     += battery.o
> >>   obj-$(CONFIG_ACPI_SBS)         += sbshc.o
> >>   obj-$(CONFIG_ACPI_SBS)         += sbs.o
> >> -obj-$(CONFIG_ACPI_HED)         += hed.o
> >>   obj-$(CONFIG_ACPI_EC_DEBUGFS)  += ec_sys.o
> >>   obj-$(CONFIG_ACPI_BGRT)                += bgrt.o
> >>   obj-$(CONFIG_ACPI_CPPC_LIB)    += cppc_acpi.o
> >> --
> >> 2.33.0
> >>  
Re: [PATCH] acpi: Fix hed module initialization order when it is built-in
Posted by Xiaofei Tan 1 year, 1 month ago
Hi Jonathan,

On 2024/12/24 3:33, Jonathan Cameron wrote:
> On Mon, 23 Dec 2024 17:31:08 +0800
> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
>> Hi Rafael,
>>
>> On 2024/12/11 1:59, Rafael J. Wysocki wrote:
>>> On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>>>> When the module hed is built-in, the init order is determined by
>>>> Makefile order.
>>> Are you sure?
>> yes
> We had a similar fix in CXL recently (which is why I suggested this approach
> internally when tanxiaofei mentioned the problem).
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/cxl?id=6575b268157f37929948a8d1f3bafb3d7c055bc1
>
> The related discussion for the CXL patch was the first time I'd come across solution
> to load order for built in cases.
>
Yes :)

>>>> That order violates expectations. Because the module
>>>> hed init is behind evged. RAS records can't be handled in the
>>>> special time window that evged has initialized while hed not.
>>>> If the number of such RAS records is more than the APEI HEST error
>>>> source number, the HEST resources could be occupied all, and then
>>>> could affect subsequent RAS error reporting.
>>> Well, the problem is real, but does the change really prevent it from
>>> happening or does it just increase the likelihood of success?
>> It can be completely solved if the driver used as built-in way. If build HED as a
>> module, it not solved.
> Can we enforce that condition not happening with appropriate Kconfig?
> It's annoying to restrict build options, but if needed to make it work
> then better than not working!

Agreed. I will change ACPI_HED from tristate to bool if there are no other comments. Thanks.

>
> Jonathan
>
>
>>> In the latter case, and generally speaking too, it would be better to
>>> add explicit synchronization between evged and hed.
>>>   
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
>>>> ---
>>>>    drivers/acpi/Makefile | 8 +++++++-
>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
>>>> index 61ca4afe83dc..54f60b7922ad 100644
>>>> --- a/drivers/acpi/Makefile
>>>> +++ b/drivers/acpi/Makefile
>>>> @@ -15,6 +15,13 @@ endif
>>>>
>>>>    obj-$(CONFIG_ACPI)             += tables.o
>>>>
>>>> +#
>>>> +# The hed.o needs to be in front of evged.o to avoid the problem that
>>>> +# RAS errors cannot be handled in the special time window of startup
>>>> +# phase that evged has initialized while hed not.
>>>> +#
>>>> +obj-$(CONFIG_ACPI_HED)         += hed.o
>>>> +
>>>>    #
>>>>    # ACPI Core Subsystem (Interpreter)
>>>>    #
>>>> @@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
>>>>    obj-$(CONFIG_ACPI_BATTERY)     += battery.o
>>>>    obj-$(CONFIG_ACPI_SBS)         += sbshc.o
>>>>    obj-$(CONFIG_ACPI_SBS)         += sbs.o
>>>> -obj-$(CONFIG_ACPI_HED)         += hed.o
>>>>    obj-$(CONFIG_ACPI_EC_DEBUGFS)  += ec_sys.o
>>>>    obj-$(CONFIG_ACPI_BGRT)                += bgrt.o
>>>>    obj-$(CONFIG_ACPI_CPPC_LIB)    += cppc_acpi.o
>>>> --
>>>> 2.33.0
>>>>   