[PATCH] sofware node: Only the managing device can unreference managed software node

mike.isely@cobaltdigital.com posted 1 patch 1 month, 1 week ago
drivers/base/swnode.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
[PATCH] sofware node: Only the managing device can unreference managed software node
Posted by mike.isely@cobaltdigital.com 1 month, 1 week ago
From: Mike Isely <mike.isely@cobaltdigital.com>

Explanation here is lengthy because the problem is subtle.

A scenario exists where device_create_managed_software_node() is used
to create an swnode instance that will be implicitly shared to a child
device despite best intentions not to permit such sharing (per the
comment in device_create_managed_software_node()).  I encountered this
with the sfp kernel module when it was instantiated with properties
via a call to platform_device_register_full() - it will create hwmon
child devices which get all property references.  Unfortunately with
just a "managed" boolean in struct swnode handling this, then
kobject_put() gets called for the managed aspect when the child device
goes away instead of the parent.  This leads to premature freeing of
the swnode structure, followed by use-after-free problems, heap
corruption, and generally chaos / crashes / misbehavior in the kernel.

This commit changes that boolean into a pointer to the actual managing
struct device, which is then checked against the struct device
instance that is actually going away (via the usual call back into
software_node_notify_remove()).  Thus the child device removal is
ignored as it should, and we only do the kobject_put() when the actual
managing struct device instance goes away.  We effectively carry a
little bit more information now so that we can be sure to clean up
only when the correct struct device instance is actually going away.

Note that while we are now keeping a pointer to a struct device here,
this is safe to do because the pointer itself only stays in use while
the pointed-to device remains valid.  (So no need to be concerned
about additional reference counting.)

Here's a succinct, generic sequence that spells out the steps which
lead to trouble:

  1. Somebody creates a struct device instance.

  2. Somebody calls device_create_managed_software_node() associating
  it with that struct device instance.  This causes two kobject
  references to be counted against the swnode instance (one for the
  struct device and one because the managed flag was set).  refcount=2

  3. Some time later, a child struct device is created and the
  properties of the parent are shared with the child.  This causes
  another kobject reference to be counted against the swnode instance.
  refcount=3

  4. Some time later, that child struct device instance goes
  away (perhaps due to a hotplug removal).  During the tear-down,
  software_node_notify_remove() notices that the managed field of the
  swnode instance is set, so TWO kobject references are removed
  instead of the single reference created by the previous step.
  refcount=1

  5. Repeat step 3.  refcount=2

  6. Repeat step 4.  refcount=0

  7. Now the kobject reference count inside the swnode instance has
  dropped to zero and so the swnode instance is released.  Notice that
  the parent device is still holding a physical reference, now
  pointing into freed memory.  Chaos likely follows.  If lucky, a NULL
  pointer kernel oops is the result.

The patch fixes this by causing step 4 to only do the second reference
decrement if the device being torn down is the same one that
established the managing relationship in the first place.  Then
everything stays balanced.

I confirmed this exact sequence of behavior in the context of the sfp
kernel module with appropriate printk debug trace added, when
instantiated with platform_device_register_full() with properties
provided.  While that's may be hard for to reproduce without sfp
hardware, anything else which performs those steps should produce
similar results.

Note that because platform_device_register_full() will implicitly
employ a managed swnode instance any time it is passed properties in
its pdevinfo argument, then by implication anyone calling
platform_device_register_full() in that manner - which happens in
multiple other places in the kernel - is at risk for the same issue.

Signed-off-by: Mike Isely <mike.isely@cobaltdigital.com>
Signed-off-by: Mike Isely <isely@pobox.com>
---
 drivers/base/swnode.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 53b3f0061ad12..18a903a35197c 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -36,8 +36,8 @@ struct swnode {
 	struct list_head children;
 	struct swnode *parent;
 
+	struct device *managing_dev;
 	unsigned int allocated:1;
-	unsigned int managed:1;
 };
 
 static DEFINE_IDA(swnode_root_ids);
@@ -1060,7 +1060,7 @@ int device_create_managed_software_node(struct device *dev,
 	if (IS_ERR(fwnode))
 		return PTR_ERR(fwnode);
 
-	to_swnode(fwnode)->managed = true;
+	to_swnode(fwnode)->managing_dev = dev;
 	set_secondary_fwnode(dev, fwnode);
 
 	if (device_is_registered(dev))
@@ -1104,7 +1104,7 @@ void software_node_notify_remove(struct device *dev)
 	sysfs_remove_link(&dev->kobj, "software_node");
 	kobject_put(&swnode->kobj);
 
-	if (swnode->managed) {
+	if (swnode->managing_dev == dev) {
 		set_secondary_fwnode(dev, NULL);
 		kobject_put(&swnode->kobj);
 	}
-- 
2.47.3

---
Re: [PATCH] sofware node: Only the managing device can unreference managed software node
Posted by Andy Shevchenko 1 month, 1 week ago
On Mon, Mar 02, 2026 at 11:21:41PM -0600, mike.isely@cobaltdigital.com wrote:

Can you send it with proper version given (I think v2) and changelog?
Also in the Subject a typo should be fixed: "software node" is the correct
spelling.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH] sofware node: Only the managing device can unreference managed software node
Posted by Mike Isely 1 month ago
OK, this time for sure...  Patch coming up.

  -Mike

On Tue, 3 Mar 2026, Andy Shevchenko wrote:

> On Mon, Mar 02, 2026 at 11:21:41PM -0600, mike.isely@cobaltdigital.com wrote:
> 
> Can you send it with proper version given (I think v2) and changelog?
> Also in the Subject a typo should be fixed: "software node" is the correct
> spelling.
> 
>
Re: [PATCH] sofware node: Only the managing device can unreference managed software node
Posted by Andy Shevchenko 1 month, 1 week ago
On Tue, Mar 03, 2026 at 10:28:24AM +0200, Andy Shevchenko wrote:
> On Mon, Mar 02, 2026 at 11:21:41PM -0600, mike.isely@cobaltdigital.com wrote:
> 
> Can you send it with proper version given (I think v2) and changelog?
> Also in the Subject a typo should be fixed: "software node" is the correct
> spelling.

Note, changelog goes to the place after '---' line in the given patch format.

-- 
With Best Regards,
Andy Shevchenko