firmware: google: fix orphaned devices on partial populate failure

[PATCH] firmware: google: fix orphaned devices on partial populate failure

Posted by Titouan Ameline de Cadeville 1 month, 3 weeks ago

coreboot_table_populate() registers devices one by one. If
device_register() fails mid-loop, the function returns an error to
coreboot_table_probe(), which returns it to the platform driver core.
The platform driver core does not call remove() on probe failure, so
devices registered before the failure are left orphaned on the coreboot
bus with no cleanup path.

Call bus_for_each_dev() to unregister all devices when populate fails.

Signed-off-by: Titouan Ameline de Cadeville <titouan.ameline@gmail.com>
---
 drivers/firmware/google/coreboot_table.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/google/coreboot_table.c b/drivers/firmware/google/coreboot_table.c
index 233939e548b4..2ebf1497f60c 100644
--- a/drivers/firmware/google/coreboot_table.c
+++ b/drivers/firmware/google/coreboot_table.c
@@ -167,6 +167,12 @@ static int coreboot_table_populate(struct device *dev, void *ptr, resource_size_
 	return 0;
 }
 
+static int __cb_dev_unregister(struct device *dev, void *dummy)
+{
+	device_unregister(dev);
+	return 0;
+}
+
 static int coreboot_table_probe(struct platform_device *pdev)
 {
 	resource_size_t len;
@@ -203,17 +209,14 @@ static int coreboot_table_probe(struct platform_device *pdev)
 
 	ret = coreboot_table_populate(dev, ptr, len);
 
+	if (ret)
+		bus_for_each_dev(&coreboot_bus_type, NULL, NULL,
+				 __cb_dev_unregister);
 	memunmap(ptr);
 
 	return ret;
 }
 
-static int __cb_dev_unregister(struct device *dev, void *dummy)
-{
-	device_unregister(dev);
-	return 0;
-}
-
 static void coreboot_table_remove(struct platform_device *pdev)
 {
 	bus_for_each_dev(&coreboot_bus_type, NULL, NULL, __cb_dev_unregister);
-- 
2.44.2

Re: [PATCH] firmware: google: fix orphaned devices on partial populate failure

Posted by Julius Werner 1 month, 2 weeks ago

Why does device_register() generally fail? Is it usually a problem
with the device specifically (e.g. the device driver probe failed) or
does it always indicate an issue with the core Linux device framework?

The coreboot table entries are generally independent of each other, so
if one of them has a problem that doesn't mean we need to kill all the
others. If there's a chance that later device_register() for other
entries would still succeed, I'd say we should actually just continue
the loop instead of returning immediately. If it is certain that later
entries can also not succeed (e.g. because this error can only happen
when the device framework ran out of some allocatable resource or
something), then I think the current code makes sense that just exits
with the devices we already managed to create.
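
A rough userspace sketch of the two options discussed above, for
illustration only. It is not the driver code: mock_register(),
populate_abort() and populate_continue() are hypothetical names, and
mock_register() merely stands in for device_register() by failing for
entries flagged as bad.

```c
#include <stddef.h>

/* Hypothetical stand-in for device_register(): fails for entries flagged bad. */
int mock_register(int entry_is_bad)
{
	return entry_is_bad ? -1 : 0;
}

/* Current behavior: stop at the first failure, keep what was registered. */
size_t populate_abort(const int *bad, size_t n, int *err)
{
	size_t registered = 0;

	*err = 0;
	for (size_t i = 0; i < n; i++) {
		int ret = mock_register(bad[i]);

		if (ret) {
			*err = ret;	/* report the first error and bail out */
			break;
		}
		registered++;
	}
	return registered;
}

/* Suggested alternative: skip the failing entry and try the next one. */
size_t populate_continue(const int *bad, size_t n)
{
	size_t registered = 0;

	for (size_t i = 0; i < n; i++) {
		if (mock_register(bad[i]))
			continue;	/* entries are independent, keep going */
		registered++;
	}
	return registered;
}
```

With five entries where only the third one fails, populate_abort()
registers two entries and reports an error, while populate_continue()
registers four.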

Re: [PATCH] firmware: google: fix orphaned devices on partial populate failure

Posted by Titouan Ameline 1 month, 2 weeks ago

Thanks for the feedback. You are right that the distinction matters.

Looking at what device_register() can realistically return in this context:
it calls device_add(), which can fail with -ENOMEM during
kobject/sysfs setup, or with -EEXIST if the device name already exists
in sysfs.

-ENOMEM is systemic and means subsequent entries will most likely
also fail, while -EEXIST would be entry-specific, so a duplicate name
for one entry shouldn't prevent others from registering.

Given that, would the right approach be to continue the loop on
entry-specific errors (logging a warning), while still aborting and
cleaning up on systemic ones like -ENOMEM? Or is the name collision
case considered impossible here since names are derived from the
tag/index and the table is only parsed once?

If continuing on all errors is preferred for simplicity, I can rework
the patch to skip failing entries rather than aborting, and drop the
cleanup entirely.


On Mon, Apr 27, 2026 at 21:03, Julius Werner <jwerner@chromium.org> wrote:
>
> Why does device_register() generally fail? Is it usually a problem
> with the device specifically (e.g. the device driver probe failed) or
> does it always indicate an issue with the core Linux device framework?
>
> The coreboot table entries are generally independent of each other, so
> if one of them has a problem that doesn't mean we need to kill all the
> others. If there's a chance that later device_register() for other
> entries would still succeed, I'd say we should actually just continue
> the loop instead of returning immediately. If it is certain that later
> entries can also not succeed (e.g. because this error can only happen
> when the device framework ran out of some allocatable resource or
> something), then I think the current code makes sense that just exits
> with the devices we already managed to create.

Re: [PATCH] firmware: google: fix orphaned devices on partial populate failure

Posted by Julius Werner 1 month, 2 weeks ago

> Given that, would the right approach be to continue the loop on
> entry-specific errors (logging a warning), while still aborting and
> cleaning up on systemic ones like -ENOMEM? Or is the name collision
> case considered impossible here since names are derived from the
> tag/index and the table is only parsed once?

I don't think you should hardcode behavior so specific to what the
called function does. Trying every entry doesn't really hurt even if
they all fail due to some systemic problem, so if there's any chance
that other entries might succeed, I think the best option is to just
always continue the loop and try the next one.

Re: [PATCH] firmware: google: fix orphaned devices on partial populate failure

Posted by Brian Norris 1 month, 2 weeks ago

On Tue, Apr 28, 2026 at 12:49:35PM -0700, Julius Werner wrote:
> > Given that, would the right approach be to continue the loop on
> > entry-specific errors (logging a warning), while still aborting and
> > cleaning up on systemic ones like -ENOMEM? Or is the name collision
> > case considered impossible here since names are derived from the
> > tag/index and the table is only parsed once?
> 
> I don't think you should hardcode behavior so specific to what the
> called function does. Trying every entry doesn't really hurt even if
> they all fail due to some systemic problem, so if there's any chance
> that other entries might succeed, I think the best option is to just
> always continue the loop and try the next one.

FWIW, of_platform_populate() might be a (highly-used) analog for
comparison. Aside from some top-level errors (such as, "can't even find
the root to start from"), it doesn't actually return errors at all [1].
It just skips individual device failures (including -ENOMEM).

Seems like an OK strategy to me.

Brian

[1] of_platform_bus_create() technically has some recursion-carried
return codes, giving a chance to propagate a failure, but all the return
codes are still 0.

Re: [PATCH] firmware: google: fix orphaned devices on partial populate failure

Posted by Titouan Ameline 1 month, 2 weeks ago

Thanks both for the reference to of_platform_populate(), that makes
things clear.

I'll rework the patch to simply continue the loop on any
device_register() failure, log a warning, and always return 0.
That drops the cleanup and aligns with the strategy used
in of_platform_populate().

Will send a v2 in the next 24 hours.
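
A minimal userspace model of that plan, as a sketch under assumptions
rather than the actual v2 patch: struct mock_entry,
mock_device_register() and populate_v2() are hypothetical names, and
fprintf() stands in for a kernel warning such as dev_warn().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical table entry; the real driver parses these from the coreboot table. */
struct mock_entry {
	const char *name;
	int register_err;	/* 0 for success, or the negative errno to simulate */
	int registered;		/* set once the mock "registers" the entry */
};

/* Hypothetical stand-in for device_register(). */
int mock_device_register(struct mock_entry *e)
{
	if (e->register_err)
		return e->register_err;
	e->registered = 1;
	return 0;
}

/* Planned shape: warn and skip on failure, never propagate the error. */
int populate_v2(struct mock_entry *entries, int n, int *skipped)
{
	*skipped = 0;
	for (int i = 0; i < n; i++) {
		int ret = mock_device_register(&entries[i]);

		if (ret) {
			/* Log the failure, then move on to the next entry. */
			fprintf(stderr, "skipping %s: error %d\n",
				entries[i].name, ret);
			(*skipped)++;
		}
	}
	return 0;	/* individual failures are not fatal */
}
```

In the real driver, the skipped device would presumably still need a
put_device() after a failed device_register(), as the kernel-doc for
device_register() requires; the model above leaves that out.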

On Tue, Apr 28, 2026 at 22:33, Brian Norris <briannorris@chromium.org> wrote:
>
> On Tue, Apr 28, 2026 at 12:49:35PM -0700, Julius Werner wrote:
> > > Given that, would the right approach be to continue the loop on
> > > entry-specific errors (logging a warning), while still aborting and
> > > cleaning up on systemic ones like -ENOMEM? Or is the name collision
> > > case considered impossible here since names are derived from the
> > > tag/index and the table is only parsed once?
> >
> > I don't think you should hardcode behavior so specific to what the
> > called function does. Trying every entry doesn't really hurt even if
> > they all fail due to some systemic problem, so if there's any chance
> > that other entries might succeed, I think the best option is to just
> > always continue the loop and try the next one.
>
> FWIW, of_platform_populate() might be a (highly-used) analog for
> comparison. Aside from some top-level errors (such as, "can't even find
> the root to start from"), it doesn't actually return errors at all [1].
> It just skips individual device failures (including -ENOMEM).
>
> Seems like an OK strategy to me.
>
> Brian
>
> [1] of_platform_bus_create() technically has some recursion-carried
> return codes, giving a chance to propagate a failure, but all the return
> codes are still 0.