[PATCH 3/3] ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure

Dan Williams posted 3 patches 6 months, 2 weeks ago
Posted by Dan Williams 6 months, 2 weeks ago
CXL has a symbol dependency on einj_core.ko, so if einj_init() fails then
cxl_core.ko fails to load. Prior to the faux_device_create() conversion,
einj_probe() failures were tracked by the einj_initialized flag without
failing einj_init().

Revert to that behavior and always let einj_init() succeed, given that
there is no way, and no pressing need, to distinguish faux device-create
failures from device-probe failures.

This situation arose because CXL knows the proper kernel-named objects to
trigger errors against, while acpi-einj knows how to perform the error
injection. The injection mechanism is shared with non-CXL use cases. As a
result, CXL now has a module dependency on einj-core.ko, and init/probe
failures are handled at runtime.
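A minimal userspace sketch of the resulting init/runtime split, under stated assumptions: `struct fake_dev`, `fake_device_create()`, `model_einj_init()` and `model_einj_inject()` are hypothetical stand-ins, not EINJ or faux-device APIs. `fake_device_create()` plays the role of faux_device_create(), returning NULL on any failure, so the caller cannot tell create from probe failures. Module load always succeeds, and availability is checked only when a caller tries to use the device.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct faux_device. */
struct fake_dev {
	int id;
};

static struct fake_dev the_dev;
static struct fake_dev *einj_dev;
static bool einj_initialized;

/*
 * Stand-in for faux_device_create(): returns NULL on failure, so the
 * caller cannot tell a create failure from a probe failure.
 */
static struct fake_dev *fake_device_create(bool fail)
{
	return fail ? NULL : &the_dev;
}

/* Mirrors the patched einj_init(): record the outcome, never fail. */
static int model_einj_init(bool create_fails)
{
	einj_initialized = false;
	einj_dev = fake_device_create(create_fails);
	if (einj_dev)
		einj_initialized = true;

	/* Invariant: the flag is set exactly when the device exists. */
	assert(einj_initialized == (einj_dev != NULL));
	return 0;
}

/* Hypothetical consumer entry point: failure surfaces at use time. */
static int model_einj_inject(void)
{
	if (!einj_initialized)
		return -ENODEV;
	return 0;
}
```

Returning 0 unconditionally keeps a dependent module loadable. The trade-off is that every consumer has to check the flag before using the injection path.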

Fixes: 6cb9441bfe8d ("ACPI: APEI: EINJ: Transition to the faux device interface")
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Ben Cheatham <Benjamin.Cheatham@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/apei/einj-core.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/einj-core.c b/drivers/acpi/apei/einj-core.c
index fea11a35eea3..9b041415a9d0 100644
--- a/drivers/acpi/apei/einj-core.c
+++ b/drivers/acpi/apei/einj-core.c
@@ -883,19 +883,16 @@ static int __init einj_init(void)
 	}
 
 	einj_dev = faux_device_create("acpi-einj", NULL, &einj_device_ops);
-	if (!einj_dev)
-		return -ENODEV;
 
-	einj_initialized = true;
+	if (einj_dev)
+		einj_initialized = true;
 
 	return 0;
 }
 
 static void __exit einj_exit(void)
 {
-	if (einj_initialized)
-		faux_device_destroy(einj_dev);
-
+	faux_device_destroy(einj_dev);
 }
 
 module_init(einj_init);
-- 
2.49.0
Re: [PATCH 3/3] ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
Posted by Cheatham, Benjamin 6 months, 1 week ago
On 6/6/2025 10:32 PM, Dan Williams wrote:
> [..]

Thanks for sending this out!

Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Re: [PATCH 3/3] ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
Posted by Jonathan Cameron 6 months, 1 week ago
On Fri, 6 Jun 2025 20:32:28 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> [..]
>  static void __exit einj_exit(void)
>  {
> -	if (einj_initialized)
> -		faux_device_destroy(einj_dev);
> -
> +	faux_device_destroy(einj_dev);

Hi Dan,

This bit is sort of fine, though not really related, because
faux_device_destroy() already checks for NULL:

void faux_device_destroy(struct faux_device *faux_dev)
{
	struct device *dev = &faux_dev->dev;

	if (!faux_dev)
		return;

Though that check comes after a dereference of faux_dev,
which doesn't look right to me.  It might be fine because
of how the kernel is built (I can't remember where we ended
up on the topic of compilers making undefined-behavior-based
optimizations).  Still not that nice from a logical point of view!
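For comparison, a minimal sketch of the reordering implied above, with the NULL check placed before the member address is formed. The struct layouts and `fake_device_teardown()` are hypothetical stand-ins, not the kernel's definitions or the real teardown calls. Only the ordering is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; not the kernel's real definitions. */
struct device {
	int refcount;
};

struct faux_device {
	struct device dev;
};

static int teardown_calls;

/* Hypothetical stand-in for the real teardown of a registered device. */
static void fake_device_teardown(struct device *dev)
{
	assert(dev != NULL);
	dev->refcount--;
	teardown_calls++;
}

/* Check for NULL first; only then form &faux_dev->dev. */
static void faux_device_destroy_reordered(struct faux_device *faux_dev)
{
	struct device *dev;

	if (!faux_dev)
		return;

	dev = &faux_dev->dev;
	fake_device_teardown(dev);
}
```

Behavior is the same for valid pointers. The difference is that the NULL path never computes an address from a NULL base, which sidesteps the UB question entirely.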

>  }
>  
>  module_init(einj_init);
Re: [PATCH 3/3] ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
Posted by Greg KH 6 months, 1 week ago
On Mon, Jun 09, 2025 at 11:17:58AM +0100, Jonathan Cameron wrote:
> On Fri, 6 Jun 2025 20:32:28 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > [..]
> > +	faux_device_destroy(einj_dev);
> 
> Hi Dan,
> 
> This bit is sort of fine, though not really related, because
> faux_device_destroy() already checks for NULL:
> 
> void faux_device_destroy(struct faux_device *faux_dev)
> {
> 	struct device *dev = &faux_dev->dev;
> 
> 	if (!faux_dev)
> 		return;
> 
> Though that check comes after a dereference of faux_dev,
> which doesn't look right to me.  It might be fine because
> of how the kernel is built (I can't remember where we ended
> up on the topic of compilers making undefined-behavior-based
> optimizations).  Still not that nice from a logical point of view!

I think this is fine, as we just put "0 + offset of dev" into dev and
didn't do anything with it (i.e. no actual read of that memory
location happened).  The compiler shouldn't be moving anything that
happens after the return to before we check for a valid pointer here, right?
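A small sketch of the point that `&p->dev` is pure address arithmetic, with no load from memory. The `struct wrapper` layout is hypothetical: a leading field gives `dev` a non-zero offset so the arithmetic is visible. The sketch only uses a valid object, since applying the same expression to a NULL base is exactly the contested UB case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts; a leading field gives dev a non-zero offset. */
struct device {
	int refcount;
};

struct wrapper {
	long tag;
	struct device dev;
};

/* Compute what &w->dev yields: base address plus the member offset. */
static struct device *member_addr_by_hand(struct wrapper *w)
{
	/* Only defined for a real object; a NULL base is the disputed case. */
	assert(w != NULL);
	return (struct device *)((char *)w + offsetof(struct wrapper, dev));
}
```

Neither form reads the contents of `dev`. Compilers typically emit a single add for it, or nothing at all when the member sits at offset 0.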

thanks,

greg k-h
Re: [PATCH 3/3] ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
Posted by Jonathan Cameron 6 months, 1 week ago
On Mon, 9 Jun 2025 12:42:53 +0200
Greg KH <gregkh@linuxfoundation.org> wrote:

> On Mon, Jun 09, 2025 at 11:17:58AM +0100, Jonathan Cameron wrote:
> > On Fri, 6 Jun 2025 20:32:28 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >   
> > > [..]
> > 
> > Hi Dan,
> > 
> > This bit is sort of fine, though not really related, because
> > faux_device_destroy() already checks for NULL:
> > 
> > void faux_device_destroy(struct faux_device *faux_dev)
> > {
> > 	struct device *dev = &faux_dev->dev;
> > 
> > 	if (!faux_dev)
> > 		return;
> > 
> > Though that check comes after a dereference of faux_dev,
> > which doesn't look right to me.  It might be fine because
> > of how the kernel is built (I can't remember where we ended
> > up on the topic of compilers making undefined-behavior-based
> > optimizations).  Still not that nice from a logical point of view!
> 
> I think this is fine, as we just put "0 + offset of dev" into dev and
> didn't do anything with it (i.e. no actual read of that memory
> location happened).  The compiler shouldn't be moving anything that
> happens after the return to before we check for a valid pointer here, right?

Hmm. I did some digging. It seems that was debated 10 years ago without
a huge amount of clarity on the answer, beyond all sane people telling
compiler folk not to use this in optimizations :)

It comes down to whether any dereference of NULL is UB, even when
the compiler only needs a simple offset calculation.

Anyhow, whilst fine, it's still a little ugly to my eyes :(

Jonathan



Re: [PATCH 3/3] ACPI: APEI: EINJ: Do not fail einj_init() on faux_device_create() failure
Posted by Dan Williams 6 months, 1 week ago
Jonathan Cameron wrote:
[..]
> Hmm. I did some digging. It seems that was debated 10 years ago without
> a huge amount of clarity on the answer, beyond all sane people telling
> compiler folk not to use this in optimizations :)
> 
> It comes down to whether any dereference of NULL is UB, even when
> the compiler only needs a simple offset calculation.
> 
> Anyhow, whilst fine, it's still a little ugly to my eyes :(

I recall we had this conversation with Dan Carpenter on a Smatch patch
and resolved that, while it looks "interesting", it does no harm.

For this patch I am not motivated to spin it, because even if the
compiler took advantage of the NULL check to drop UB work, that would
only mean dropping the assignment.

Otherwise, this conversion lines up with the intent of both
einj_initialized and faux_device_destroy() whereby faux_device_destroy()
is already prepared for the case where faux_device_create() fails.