.../driver-api/driver-model/index.rst | 1 - .../driver-api/driver-model/revocable.rst | 152 ----------- MAINTAINERS | 9 - drivers/base/Kconfig | 8 - drivers/base/Makefile | 5 +- drivers/base/revocable.c | 241 ------------------ drivers/base/revocable_test.c | 142 ----------- include/linux/revocable.h | 69 ----- tools/testing/selftests/Makefile | 1 - .../selftests/drivers/base/revocable/Makefile | 7 - .../drivers/base/revocable/revocable_test.c | 136 ---------- .../drivers/base/revocable/test-revocable.sh | 39 --- .../base/revocable/test_modules/Makefile | 10 - .../revocable/test_modules/revocable_test.c | 195 -------------- 14 files changed, 1 insertion(+), 1014 deletions(-) delete mode 100644 Documentation/driver-api/driver-model/revocable.rst delete mode 100644 drivers/base/revocable.c delete mode 100644 drivers/base/revocable_test.c delete mode 100644 include/linux/revocable.h delete mode 100644 tools/testing/selftests/drivers/base/revocable/Makefile delete mode 100644 tools/testing/selftests/drivers/base/revocable/revocable_test.c delete mode 100755 tools/testing/selftests/drivers/base/revocable/test-revocable.sh delete mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/Makefile delete mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/revocable_test.c
I was surprised to learn that the revocable functionality was merged last week given the community feedback on list and at LPC, but not least since there are no users of it, which we are supposed to require to be able to evaluate it properly. The chromeos ec driver issue which motivated this work turned out not to need it as was found during review. And the example gpiolib conversion was posted the very same morning that this was merged which hardly provides enough time for evaluation (even if Bartosz quickly reported a performance regression). Turns out there are correctness issues with both the gpiolib conversion and the revocable design itself that can lead to use-after-free and hung tasks (see [1] and patch 3/3). And as was pointed out repeatedly during review, and again at the day of the merge, this does not look like the right interface for the chardev unplug issue. Revert the revocable implementation until a redesign has been proposed and evaluated properly. Johan [1] https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ Johan Hovold (3): Revert "selftests: revocable: Add kselftest cases" Revert "revocable: Add Kunit test cases" Revert "revocable: Revocable resource management" .../driver-api/driver-model/index.rst | 1 - .../driver-api/driver-model/revocable.rst | 152 ----------- MAINTAINERS | 9 - drivers/base/Kconfig | 8 - drivers/base/Makefile | 5 +- drivers/base/revocable.c | 241 ------------------ drivers/base/revocable_test.c | 142 ----------- include/linux/revocable.h | 69 ----- tools/testing/selftests/Makefile | 1 - .../selftests/drivers/base/revocable/Makefile | 7 - .../drivers/base/revocable/revocable_test.c | 136 ---------- .../drivers/base/revocable/test-revocable.sh | 39 --- .../base/revocable/test_modules/Makefile | 10 - .../revocable/test_modules/revocable_test.c | 195 -------------- 14 files changed, 1 insertion(+), 1014 deletions(-) delete mode 100644 Documentation/driver-api/driver-model/revocable.rst delete mode 100644 drivers/base/revocable.c delete mode 100644 drivers/base/revocable_test.c delete mode 100644 include/linux/revocable.h delete mode 100644 tools/testing/selftests/drivers/base/revocable/Makefile delete mode 100644 tools/testing/selftests/drivers/base/revocable/revocable_test.c delete mode 100755 tools/testing/selftests/drivers/base/revocable/test-revocable.sh delete mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/Makefile delete mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/revocable_test.c -- 2.52.0
On Sat, Jan 24, 2026 at 06:05:32PM +0100, Johan Hovold wrote:
> I was surprised to learn that the revocable functionality was merged last week
> given the community feedback on list and at LPC, but not least since there are
> no users of it, which we are supposed to require to be able to evaluate it
> properly.
>
> The chromeos ec driver issue which motivated this work turned out not to need
> it as was found during review. And the example gpiolib conversion was posted
Thanks for sharing your thoughts on revocable.
Regarding cros_ec_chardev, my last attempt [2] to solve its hot-plug issue by
synchronizing misc_{de,}register() with file operations using a global lock
highlighted the difficulty of alternatives: that approach serialized all file
operations and could easily lead to hung tasks if any file operation slept.
Given the drawbacks of [2], I believe cros_ec_chardev remains a valid use
case for revocable.
> the very same morning that this was merged which hardly provides enough time
> for evaluation (even if Bartosz quickly reported a performance regression).
The gpiolib conversion was provided as the first concrete user to enable
this evaluation process. The performance regression Bartosz identified is
valuable feedback, and I believe it is addressed by [3]. I plan to send the
next version of the series after v7.0-rc1 and revisit the issue.
[3] https://lore.kernel.org/all/20260121040204.2699886-1-tzungbi@kernel.org/
> Turns out there are correctness issues with both the gpiolib conversion and
> the revocable design itself that can lead to use-after-free and hung tasks (see
> [1] and patch 3/3).
I appreciate you identifying the issues where multiple threads share a file
descriptor; this is a case I overlooked. I see these kinds of findings as a
positive outcome of having wider review and a concrete user, allowing us to
identify and fix issues in the design.
> And as was pointed out repeatedly during review, and again at the day of the
> merge, this does not look like the right interface for the chardev unplug
> issue.
My focus has been on miscdevice [2], but I suspect cdev solutions for device
hot-plug would face similar synchronization challenges between device removal
and in-flight file operations.
> Revert the revocable implementation until a redesign has been proposed and
> evaluated properly.
I'll work on addressing the discovered issues and send follow-up fixes. I
believe keeping the current series in linux-next would be beneficial, as it
allows for easier testing and wider evaluation by others, rather than
reverting at this stage.
>
> Johan
>
>
> [1] https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/
---
[2]:
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
...
+static struct miscdevice *find_miscdevice(int minor)
+{
+ struct miscdevice *c;
+
+ list_for_each_entry(c, &misc_list, list)
+ if (c->minor == minor)
+ return c;
+ return NULL;
+}
+
+static __poll_t misc_sync_poll(struct file *filp, poll_table *wait)
+{
+ struct miscdevice *c;
+
+ guard(mutex)(&misc_mtx);
+ c = find_miscdevice(iminor(filp->f_inode));
+ if (!c)
+ return -ENODEV;
+ if (!c->fops->poll)
+ return 0;
+
+ return c->fops->poll(filp, wait);
+}
...
+static const struct file_operations misc_sync_fops = {
+ .poll = misc_sync_poll,
+ .read = misc_sync_read,
+ .unlocked_ioctl = misc_sync_ioctl,
+ .release = misc_sync_release,
+};
+
static int misc_open(struct inode *inode, struct file *file)
{
int minor = iminor(inode);
@@ -161,6 +237,7 @@ static int misc_open(struct inode *inode, struct file *file)
replace_fops(file, new_fops);
if (file->f_op->open)
err = file->f_op->open(inode, file);
+ file->f_op = &misc_sync_fops;
On Tue, Jan 27, 2026 at 03:57:44PM +0000, Tzung-Bi Shih wrote: > On Sat, Jan 24, 2026 at 06:05:32PM +0100, Johan Hovold wrote: > > Turns out there are correctness issues with both the gpiolib conversion and > > the revocable design itself that can lead to use-after-free and hung tasks (see > > [1] and patch 3/3). > [...] > > Revert the revocable implementation until a redesign has been proposed and > > evaluated properly. > > I'll work on addressing the discovered issues and send follow-up fixes. I > believe keeping the current series in linux-next would be beneficial, as it > allows for easier testing and wider evaluation by others, rather than > reverting at this stage. FWIW: https://lore.kernel.org/all/20260129143733.45618-2-tzungbi@kernel.org/ and https://lore.kernel.org/all/20260129143733.45618-4-tzungbi@kernel.org/ are the proposed fixes.
On Thu, Jan 29, 2026 at 03:01:50PM +0000, Tzung-Bi Shih wrote: > On Tue, Jan 27, 2026 at 03:57:44PM +0000, Tzung-Bi Shih wrote: > > On Sat, Jan 24, 2026 at 06:05:32PM +0100, Johan Hovold wrote: > > > Turns out there are correctness issues with both the gpiolib conversion and > > > the revocable design itself that can lead to use-after-free and hung tasks (see > > > [1] and patch 3/3). > > [...] > > > Revert the revocable implementation until a redesign has been proposed and > > > evaluated properly. > > > > I'll work on addressing the discovered issues and send follow-up fixes. I > > believe keeping the current series in linux-next would be beneficial, as it > > allows for easier testing and wider evaluation by others, rather than > > reverting at this stage. > > FWIW: https://lore.kernel.org/all/20260129143733.45618-2-tzungbi@kernel.org/ > and https://lore.kernel.org/all/20260129143733.45618-4-tzungbi@kernel.org/ > are the proposed fixes. I won't review that, sorry. As multiple people said in this mail thread, the API needs to go back to the design board. -- Regards, Laurent Pinchart
On Fri Jan 30, 2026 at 10:12 AM CET, Laurent Pinchart wrote: > On Thu, Jan 29, 2026 at 03:01:50PM +0000, Tzung-Bi Shih wrote: >> FWIW: https://lore.kernel.org/all/20260129143733.45618-2-tzungbi@kernel.org/ >> and https://lore.kernel.org/all/20260129143733.45618-4-tzungbi@kernel.org/ >> are the proposed fixes. > > I won't review that, sorry. As multiple people said in this mail thread, > the API needs to go back to the design board. Of course, you are entirely free to choose what you want to review, but I think the justification you give is not entirely fair. The patch series was on the list since August, you were explicitly Cc'd from the get-go. In v3 you said: "To be clear, I'm not against the API being merged for the use cases that would benefit from it, but I don't want to see drivers using it to protect from the cdev/unregistration race." This was a reply to Bartosz saying: "Yeah, I'm not against this going upstream. If more development is needed for this to be usable in other parts of the kernel, that can be done gradually. Literally no subsystem ever was perfect on day 1." Just to be clear, I'm not saying there are no issues to be addressed. And I'm also not saying that you never raised concerns, you clearly did. But, given the above, I don't think it's fair to request a revert as a precondition to give constructive feedback for improvements and fixes. - Danilo Link: https://lore.kernel.org/chrome-platform/20250912145416.GL31682@pendragon.ideasonboard.com/
On Tue, Jan 27, 2026 at 03:57:44PM +0000, Tzung-Bi Shih wrote:
> On Sat, Jan 24, 2026 at 06:05:32PM +0100, Johan Hovold wrote:
> > I was surprised to learn that the revocable functionality was merged last week
> > given the community feedback on list and at LPC, but not least since there are
> > no users of it, which we are supposed to require to be able to evaluate it
> > properly.
> >
> > The chromeos ec driver issue which motivated this work turned out not to need
> > it as was found during review. And the example gpiolib conversion was posted
> Regarding cros_ec_chardev, my last attempt [2] to solve its hot-plug issue by
> synchronizing misc_{de,}register() with file operations using a global lock
> highlighted the difficulty of alternatives: that approach serialized all file
> operations and could easily lead to hung tasks if any file operation slept.
Yeah, fixing it at the chardev (or miscdevice) level would still require
some changes at the driver level (e.g. to wakeup sleeping tasks).
> Given the drawbacks of [2], I believe cros_ec_chardev remains a valid use
> case for revocable.
Yes, possibly.
Miscdevice is a bit of a special case however since you cannot tie the
lifetime of your driver data to that of the miscdev, but there are ways
to address that. And some drivers already handle this (i.e. without any
changes to miscdevice).
But notably the revocable design seems to derive from this quirk of
miscdevice.
[ Side note for completeness: Looking at the cros_ec driver it doesn't
seem to involve any hotpluggable buses so this looks like a mostly
theoretical issue of a developer with root access actively unbinding
an ec-driver while in use. ]
> > the very same morning that this was merged which hardly provides enough time
> > for evaluation (even if Bartosz quickly reported a performance regression).
>
> The gpiolib conversion was provided as the first concrete user to enable
> this evaluation process. The performance regression Bartosz identified is
> valuable feedback, and I believe it is addressed by [3]. I plan to send the
> next version of the series after v7.0-rc1 and revisit the issue.
>
> [3] https://lore.kernel.org/all/20260121040204.2699886-1-tzungbi@kernel.org/
>
> > Turns out there are correctness issues with both the gpiolib conversion and
> > the revocable design itself that can lead to use-after-free and hung tasks (see
> > [1] and patch 3/3).
>
> I appreciate you identifying the issues where multiple threads share a file
> descriptor; this is a case I overlooked. I see these kinds of findings as a
> positive outcome of having wider review and a concrete user, allowing us to
> identify and fix issues in the design.
Yes, that's exactly why we always require a user *before* merging
something like this. To be able to determine that the interface is sane.
Now the user showed up on the same day as the merge, which is obviously
not enough time for proper review and discussion.
[ And by short-circuiting the normal process you probably reduce the
motivation for people to spend time on the example conversion too. ]
> > And as was pointed out repeatedly during review, and again at the day of the
> > merge, this does not look like the right interface for the chardev unplug
> > issue.
>
> My focus has been on miscdevice [2], but I suspect cdev solutions for device
> hot-plug would face similar synchronization challenges between device removal
> and in-flight file operations.
But we need to look at that before throwing up our hands and saying that
it's not possible and that each driver should take care of this itself.
> > Revert the revocable implementation until a redesign has been proposed and
> > evaluated properly.
>
> I'll work on addressing the discovered issues and send follow-up fixes. I
> believe keeping the current series in linux-next would be beneficial, as it
> allows for easier testing and wider evaluation by others, rather than
> reverting at this stage.
To the contrary, now is right time to do the revert as there are
fundamental problems with the current interface that will require a
redesign. That's not the kind of work that should be done during an rc
stabilisation cycle.
Give people a chance to see for themselves what the gpiolib conversion
looks like, determine whether this abstraction makes the code more or
less readable, and think about possible further uses of a mechanism like
this.
Johan
> > [1] https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/
On Wed, Jan 28, 2026 at 03:23:10PM +0100, Johan Hovold wrote:
> On Tue, Jan 27, 2026 at 03:57:44PM +0000, Tzung-Bi Shih wrote:
> > On Sat, Jan 24, 2026 at 06:05:32PM +0100, Johan Hovold wrote:
> > > I was surprised to learn that the revocable functionality was merged last week
> > > given the community feedback on list and at LPC, but not least since there are
> > > no users of it, which we are supposed to require to be able to evaluate it
> > > properly.
> > >
> > > The chromeos ec driver issue which motivated this work turned out not to need
> > > it as was found during review. And the example gpiolib conversion was posted
>
> > Regarding cros_ec_chardev, my last attempt [2] to solve its hot-plug issue by
> > synchronizing misc_{de,}register() with file operations using a global lock
> > highlighted the difficulty of alternatives: that approach serialized all file
> > operations and could easily lead to hung tasks if any file operation slept.
>
> Yeah, fixing it at the chardev (or miscdevice) level would still require
> some changes at the driver level (e.g. to wakeup sleeping tasks).
Waking up sleeping tasks is a core part of the fix. Drivers will need to
do so explicitly in their .remove() operation, through a
subsystem-specific wrapper around a cdev helper function.
> > Given the drawbacks of [2], I believe cros_ec_chardev remains a valid use
> > case for revocable.
>
> Yes, possibly.
>
> Miscdevice is a bit of a special case however since you cannot tie the
> lifetime of your driver data to that of the miscdev, but there are ways
> to address that. And some drivers already handle this (i.e. without any
> changes to miscdevice).
>
> But notably the revocable design seems to derive from this quirk of
> miscdevice.
>
> [ Side note for completeness: Looking at the cros_ec driver it doesn't
> seem to involve any hotpluggable buses so this looks like a mostly
> theoretical issue of a developer with root access actively unbinding
> an ec-driver while in use. ]
>
> > > the very same morning that this was merged which hardly provides enough time
> > > for evaluation (even if Bartosz quickly reported a performance regression).
> >
> > The gpiolib conversion was provided as the first concrete user to enable
> > this evaluation process. The performance regression Bartosz identified is
> > valuable feedback, and I believe it is addressed by [3]. I plan to send the
> > next version of the series after v7.0-rc1 and revisit the issue.
> >
> > [3] https://lore.kernel.org/all/20260121040204.2699886-1-tzungbi@kernel.org/
> >
> > > Turns out there are correctness issues with both the gpiolib conversion and
> > > the revocable design itself that can lead to use-after-free and hung tasks (see
> > > [1] and patch 3/3).
> >
> > I appreciate you identifying the issues where multiple threads share a file
> > descriptor; this is a case I overlooked. I see these kinds of findings as a
> > positive outcome of having wider review and a concrete user, allowing us to
> > identify and fix issues in the design.
>
> Yes, that's exactly why we always require a user *before* merging
> something like this. To be able to determine that the interface is sane.
>
> Now the user showed up on the same day as the merge, which is obviously
> not enough time for proper review and discussion.
>
> [ And by short-circuiting the normal process you probably reduce the
> motivation for people to spend time on the example conversion too. ]
>
> > > And as was pointed out repeatedly during review, and again at the day of the
> > > merge, this does not look like the right interface for the chardev unplug
> > > issue.
> >
> > My focus has been on miscdevice [2], but I suspect cdev solutions for device
> > hot-plug would face similar synchronization challenges between device removal
> > and in-flight file operations.
>
> But we need to look at that before throwing up our hands and saying that
> it's not possible and that each driver should take care of this itself.
There has been previous attempts, including a functional patch series,
that didn't get merge due to the sole reason that it duplicated a small
amount of logic also present in debugfs. I'll reply to Danilo somewhere
else in this mail thread with more information.
> > > Revert the revocable implementation until a redesign has been proposed and
> > > evaluated properly.
> >
> > I'll work on addressing the discovered issues and send follow-up fixes. I
> > believe keeping the current series in linux-next would be beneficial, as it
> > allows for easier testing and wider evaluation by others, rather than
> > reverting at this stage.
>
> To the contrary, now is right time to do the revert as there are
> fundamental problems with the current interface that will require a
> redesign. That's not the kind of work that should be done during an rc
> stabilisation cycle.
>
> Give people a chance to see for themselves what the gpiolib conversion
> looks like, determine whether this abstraction makes the code more or
> less readable, and think about possible further uses of a mechanism like
> this.
fully agreed. Follow-up fixes is not the right way forward. As Johan
said, the quick merge despite the open issues, circumventing the normal
review process, damages trust, and reduces my motivation to review your
work and help shaping a good API to solve this problem.
> > > [1] https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/
--
Regards,
Laurent Pinchart
On Sat Jan 24, 2026 at 6:05 PM CET, Johan Hovold wrote: > this does not look like the right interface for the chardev unplug issue. I think it depends, we should do everything to prevent having the issue in the first place, e.g. ensure that we synchronize the unplug properly on device driver unbind. Sometimes, however, this isn't possible; this is where a revocable mechanism can come in handy to prevent UAF of device resources -- DRM is a good example for this. But to be fair, I also want to point out that there is a quite significant difference regarding the usefulness of the revocable concept in C compared to in Rust due to language capabilities.
On Sat, Jan 24, 2026 at 08:08:28PM +0100, Danilo Krummrich wrote: > On Sat Jan 24, 2026 at 6:05 PM CET, Johan Hovold wrote: > > this does not look like the right interface for the chardev unplug issue. > > I think it depends, we should do everything to prevent having the issue in the > first place, e.g. ensure that we synchronize the unplug properly on device > driver unbind. > > Sometimes, however, this isn't possible; this is where a revocable mechanism can > come in handy to prevent UAF of device resources -- DRM is a good example for > this. This is not "possible" for almost all real devices so we need something like this for almost all classes of devices, DRM just shows the extremes involved, v4l2 is also another good example. Note, other OSes also have this same problem, look at all the work the BSDs are going through at the moment just to get closer to the place where we are in Linux today with removable devices and they have hit our same problems. > But to be fair, I also want to point out that there is a quite significant > difference regarding the usefulness of the revocable concept in C compared to in > Rust due to language capabilities. True, but we do need something. I took these patches without a real user as a base for us to start working off of. The rust implementation has shown that the design-pattern is a good solution for the problem, and so I feel we should work with it and try to get this working properly. We've been sitting and talking about it for years now, and here is the first real code submission that is getting us closer to fix the problem properly. It might not be perfict, but let's evolve it from here for what is found not to work correctly. So I don't want to take these reverts, let's try this out, by putting this into the driver core now, we have the base to experiment with in a "safe" way in lots of different driver subsytems at the same time. If it doesn't work out, worst case we revert it in a release or two because it didn't get used. thanks, greg k-h
On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > On Sat, Jan 24, 2026 at 08:08:28PM +0100, Danilo Krummrich wrote: > > On Sat Jan 24, 2026 at 6:05 PM CET, Johan Hovold wrote: > > > this does not look like the right interface for the chardev unplug issue. > > > > I think it depends, we should do everything to prevent having the issue in the > > first place, e.g. ensure that we synchronize the unplug properly on device > > driver unbind. > > > > Sometimes, however, this isn't possible; this is where a revocable mechanism can > > come in handy to prevent UAF of device resources -- DRM is a good example for > > this. > > This is not "possible" for almost all real devices so we need something > like this for almost all classes of devices, DRM just shows the extremes > involved, v4l2 is also another good example. It's certainly possible to handle the chardev unplug issue without revocable as several subsystems already do. All you need is a refcount, a lock and a flag. It may be possible to provide a generic solutions at the chardev level or some kind of helper implementation (similar to revocable) for subsystems to use directly. But revocable appears to be too fine grained for this as when the device goes away all operations must cease. There's no need to track mmio regions individually as was suggested. This may be the mental model for someone working with rust, but it isn't necessarily a good fit for the rest of the kernel. > > But to be fair, I also want to point out that there is a quite significant > > difference regarding the usefulness of the revocable concept in C compared to in > > Rust due to language capabilities. > > True, but we do need something. I took these patches without a real > user as a base for us to start working off of. The rust implementation > has shown that the design-pattern is a good solution for the problem, > and so I feel we should work with it and try to get this working > properly. We've been sitting and talking about it for years now, and > here is the first real code submission that is getting us closer to fix > the problem properly. It might not be perfict, but let's evolve it from > here for what is found not to work correctly. It's a design pattern that's perhaps needed for rust, but not necessarily elsewhere. But either way there is no need to rush this. If it turns out to be usable, it can be merged along with a future user. Dropping the revocable_provider and revocable abstraction split should even make it more palatable. And with a new interface and a non-trivial user we can see what the end-result looks like and decide where to go from there. > So I don't want to take these reverts, let's try this out, by putting > this into the driver core now, we have the base to experiment with in a > "safe" way in lots of different driver subsytems at the same time. If > it doesn't work out, worst case we revert it in a release or two because > it didn't get used. Please reconsider. Perhaps I didn't stress the point enough that the current API needs to be reworked completely since there's no longer any need for the two revocable abstractions. Johan
Hi Greg, On Mon, Jan 26, 2026 at 02:50:05PM +0100, Johan Hovold wrote: > On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > > True, but we do need something. I took these patches without a real > > user as a base for us to start working off of. The rust implementation > > has shown that the design-pattern is a good solution for the problem, > > and so I feel we should work with it and try to get this working > > properly. We've been sitting and talking about it for years now, and > > here is the first real code submission that is getting us closer to fix > > the problem properly. It might not be perfict, but let's evolve it from > > here for what is found not to work correctly. > > It's a design pattern that's perhaps needed for rust, but not > necessarily elsewhere. But either way there is no need to rush this. If > it turns out to be usable, it can be merged along with a future user. > > Dropping the revocable_provider and revocable abstraction split should > even make it more palatable. > > And with a new interface and a non-trivial user we can see what the > end-result looks like and decide where to go from there. > > > So I don't want to take these reverts, let's try this out, by putting > > this into the driver core now, we have the base to experiment with in a > > "safe" way in lots of different driver subsytems at the same time. If > > it doesn't work out, worst case we revert it in a release or two because > > it didn't get used. > > Please reconsider. Perhaps I didn't stress the point enough that the > current API needs to be reworked completely since there's no longer any > need for the two revocable abstractions. I noticed that you picked up the proposed incremental fixes to the issues I reported without anyone even having reviewed them. The fixes being incremental makes it a lot harder to review, but based on a quick look it seems there needs to be further changes. And again, what is the rush? Anyone wanting to experiment with this functionality only needs to apply a single patch. And exposing the API before it is stable is just going to be a mess as subsystems may start using it from day one. So please, just drop it for 6.20. You can still merge this for the next cycle when the basic functionality has been fixed. Johan
On Tue, Feb 03, 2026 at 01:15:50PM +0100, Johan Hovold wrote: > Hi Greg, > > On Mon, Jan 26, 2026 at 02:50:05PM +0100, Johan Hovold wrote: > > On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > > > > True, but we do need something. I took these patches without a real > > > user as a base for us to start working off of. The rust implementation > > > has shown that the design-pattern is a good solution for the problem, > > > and so I feel we should work with it and try to get this working > > > properly. We've been sitting and talking about it for years now, and > > > here is the first real code submission that is getting us closer to fix > > > the problem properly. It might not be perfict, but let's evolve it from > > > here for what is found not to work correctly. > > > > It's a design pattern that's perhaps needed for rust, but not > > necessarily elsewhere. But either way there is no need to rush this. If > > it turns out to be usable, it can be merged along with a future user. > > > > Dropping the revocable_provider and revocable abstraction split should > > even make it more palatable. > > > > And with a new interface and a non-trivial user we can see what the > > end-result looks like and decide where to go from there. > > > > > So I don't want to take these reverts, let's try this out, by putting > > > this into the driver core now, we have the base to experiment with in a > > > "safe" way in lots of different driver subsytems at the same time. If > > > it doesn't work out, worst case we revert it in a release or two because > > > it didn't get used. > > > > Please reconsider. Perhaps I didn't stress the point enough that the > > current API needs to be reworked completely since there's no longer any > > need for the two revocable abstractions. > > I noticed that you picked up the proposed incremental fixes to the > issues I reported without anyone even having reviewed them. The fixes > being incremental makes it a lot harder to review, but based on a quick > look it seems there needs to be further changes. > > And again, what is the rush? Anyone wanting to experiment with this > functionality only needs to apply a single patch. And exposing the API > before it is stable is just going to be a mess as subsystems may start > using it from day one. > > So please, just drop it for 6.20. You can still merge this for the next > cycle when the basic functionality has been fixed. The fixes seemed correct on my review, what was wrong with them? And having the code fixed for known issues is a good thing here, it gives the gpio people a base to test their work on. As no one is currently using this, I will disable this from the build, but keeping the code in the tree right now is a good thing as I do feel this is the right way forward, and others can work off of it easier this way. thanks, greg k-h
On Tue, Feb 03, 2026 at 01:26:58PM +0100, Greg Kroah-Hartman wrote: > On Tue, Feb 03, 2026 at 01:15:50PM +0100, Johan Hovold wrote: > > On Mon, Jan 26, 2026 at 02:50:05PM +0100, Johan Hovold wrote: > > > On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > > I noticed that you picked up the proposed incremental fixes to the > > issues I reported without anyone even having reviewed them. The fixes > > being incremental makes it a lot harder to review, but based on a quick > > look it seems there needs to be further changes. > > > > And again, what is the rush? Anyone wanting to experiment with this > > functionality only needs to apply a single patch. And exposing the API > > before it is stable is just going to be a mess as subsystems may start > > using it from day one. > > > > So please, just drop it for 6.20. You can still merge this for the next > > cycle when the basic functionality has been fixed. > > The fixes seemed correct on my review, what was wrong with them? And > having the code fixed for known issues is a good thing here, it gives > the gpio people a base to test their work on. Turns out the new revocable design is also fundamentally broken. I've already spent too much time on this when I should be doing other things, but here is an updated revert which explains things: https://lore.kernel.org/r/20260204142849.22055-1-johan@kernel.org > As no one is currently using this, I will disable this from the build, > but keeping the code in the tree right now is a good thing as I do feel > this is the right way forward, and others can work off of it easier this > way. API design should not be done incrementally in-tree. It just makes things harder for reviewers, adds noise, and without any benefit for anyone when the interface keeps changing every other week. Please just merge the reverts and we can work out a way forward. Johan
On Tue, Feb 03, 2026 at 01:26:58PM +0100, Greg KH wrote: > On Tue, Feb 03, 2026 at 01:15:50PM +0100, Johan Hovold wrote: > > On Mon, Jan 26, 2026 at 02:50:05PM +0100, Johan Hovold wrote: > > > On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > > > > > > True, but we do need something. I took these patches without a real > > > > user as a base for us to start working off of. The rust implementation > > > > has shown that the design-pattern is a good solution for the problem, > > > > and so I feel we should work with it and try to get this working > > > > properly. We've been sitting and talking about it for years now, and > > > > here is the first real code submission that is getting us closer to fix > > > > the problem properly. It might not be perfict, but let's evolve it from > > > > here for what is found not to work correctly. > > > > > > It's a design pattern that's perhaps needed for rust, but not > > > necessarily elsewhere. But either way there is no need to rush this. If > > > it turns out to be usable, it can be merged along with a future user. > > > > > > Dropping the revocable_provider and revocable abstraction split should > > > even make it more palatable. > > > > > > And with a new interface and a non-trivial user we can see what the > > > end-result looks like and decide where to go from there. > > > > > > > So I don't want to take these reverts, let's try this out, by putting > > > > this into the driver core now, we have the base to experiment with in a > > > > "safe" way in lots of different driver subsytems at the same time. If > > > > it doesn't work out, worst case we revert it in a release or two because > > > > it didn't get used. > > > > > > Please reconsider. Perhaps I didn't stress the point enough that the > > > current API needs to be reworked completely since there's no longer any > > > need for the two revocable abstractions. > > > > I noticed that you picked up the proposed incremental fixes to the > > issues I reported without anyone even having reviewed them. The fixes > > being incremental makes it a lot harder to review, but based on a quick > > look it seems there needs to be further changes. > > > > And again, what is the rush? Anyone wanting to experiment with this > > functionality only needs to apply a single patch. And exposing the API > > before it is stable is just going to be a mess as subsystems may start > > using it from day one. > > > > So please, just drop it for 6.20. You can still merge this for the next > > cycle when the basic functionality has been fixed. > > The fixes seemed correct on my review, what was wrong with them? And > having the code fixed for known issues is a good thing here, it gives > the gpio people a base to test their work on. > > As no one is currently using this, I will disable this from the build, > but keeping the code in the tree right now is a good thing as I do feel > this is the right way forward, and others can work off of it easier this > way. The reason why I (and others) would like to see this series reverted is because be believe it is *not* the right way forward. There's no consensus on that topic. While having code in mainline can improve collaboration and accelerate development, we don't traditionally do that when a large portion of the parties involved believe the approach is wrong, and no user should be added until a consensus on the API is found. In the worst case, if no consensus can be found, someone has to make a decision, but doing so while discussions are ongoing, so close after LPC where different approaches were discussed and proposed, is very worrying. Do we want to prioritize your feeling that this is the right way forward over trying to find an argument, while discussions are progressing ? -- Regards, Laurent Pinchart
On Tue, Feb 03, 2026 at 03:57:11PM +0200, Laurent Pinchart wrote: > On Tue, Feb 03, 2026 at 01:26:58PM +0100, Greg KH wrote: > > On Tue, Feb 03, 2026 at 01:15:50PM +0100, Johan Hovold wrote: > > > On Mon, Jan 26, 2026 at 02:50:05PM +0100, Johan Hovold wrote: > > > > On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > > > > > > > > True, but we do need something. I took these patches without a real > > > > > user as a base for us to start working off of. The rust implementation > > > > > has shown that the design-pattern is a good solution for the problem, > > > > > and so I feel we should work with it and try to get this working > > > > > properly. We've been sitting and talking about it for years now, and > > > > > here is the first real code submission that is getting us closer to fix > > > > > the problem properly. It might not be perfict, but let's evolve it from > > > > > here for what is found not to work correctly. > > > > > > > > It's a design pattern that's perhaps needed for rust, but not > > > > necessarily elsewhere. But either way there is no need to rush this. If > > > > it turns out to be usable, it can be merged along with a future user. > > > > > > > > Dropping the revocable_provider and revocable abstraction split should > > > > even make it more palatable. > > > > > > > > And with a new interface and a non-trivial user we can see what the > > > > end-result looks like and decide where to go from there. > > > > > > > > > So I don't want to take these reverts, let's try this out, by putting > > > > > this into the driver core now, we have the base to experiment with in a > > > > > "safe" way in lots of different driver subsytems at the same time. If > > > > > it doesn't work out, worst case we revert it in a release or two because > > > > > it didn't get used. > > > > > > > > Please reconsider. Perhaps I didn't stress the point enough that the > > > > current API needs to be reworked completely since there's no longer any > > > > need for the two revocable abstractions. > > > > > > I noticed that you picked up the proposed incremental fixes to the > > > issues I reported without anyone even having reviewed them. The fixes > > > being incremental makes it a lot harder to review, but based on a quick > > > look it seems there needs to be further changes. > > > > > > And again, what is the rush? Anyone wanting to experiment with this > > > functionality only needs to apply a single patch. And exposing the API > > > before it is stable is just going to be a mess as subsystems may start > > > using it from day one. > > > > > > So please, just drop it for 6.20. You can still merge this for the next > > > cycle when the basic functionality has been fixed. > > > > The fixes seemed correct on my review, what was wrong with them? And > > having the code fixed for known issues is a good thing here, it gives > > the gpio people a base to test their work on. > > > > As no one is currently using this, I will disable this from the build, > > but keeping the code in the tree right now is a good thing as I do feel > > this is the right way forward, and others can work off of it easier this > > way. > > The reason why I (and others) would like to see this series reverted is > because be believe it is *not* the right way forward. There's no > consensus on that topic. While having code in mainline can improve > collaboration and accelerate development, we don't traditionally do that > when a large portion of the parties involved believe the approach is > wrong, and no user should be added until a consensus on the API is > found. Ah, but I do think this is the way forward, given that the pattern/idea works in the rust side of the kernel, and it's exactly what I've been asking for for years now :) But yes, without a real user, it's hard for me to justify it. But, I want it present in the tree now so that lots of others can play with it easily. If it turns out it is not correct, and does not work properly, then great, we will delete the files entirely. But I'm not so sure that we are there yet. thanks, greg "we all want more hours in a day" k-h
The revocable code is still under active discussion, and there is no
in-kernel users of it. So disable it from the build for now so that no
one suffers from it being present in the tree, yet leave it in the
source tree so that others can easily test it by reverting this commit
and building off of it for future releases.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/base/Kconfig | 2 +-
drivers/base/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 8f7d7b9d81ac..9f318b98144d 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -254,7 +254,7 @@ endmenu
# Kunit test cases
config REVOCABLE_KUNIT_TEST
tristate "Kunit tests for revocable" if !KUNIT_ALL_TESTS
- depends on KUNIT
+ depends on KUNIT && BROKEN
default KUNIT_ALL_TESTS
help
Kunit tests for the revocable API.
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 4185aaa9bbb9..4c6607616a73 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -6,7 +6,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \
cpu.o firmware.o init.o map.o devres.o \
attribute_container.o transport_class.o \
topology.o container.o property.o cacheinfo.o \
- swnode.o faux.o revocable.o
+ swnode.o faux.o
obj-$(CONFIG_AUXILIARY_BUS) += auxiliary.o
obj-$(CONFIG_DEVTMPFS) += devtmpfs.o
obj-y += power/
--
2.53.0
On Tue, Feb 03, 2026 at 01:30:37PM +0100, Greg Kroah-Hartman wrote: > The revocable code is still under active discussion, and there is no > in-kernel users of it. So disable it from the build for now so that no > one suffers from it being present in the tree, yet leave it in the > source tree so that others can easily test it by reverting this commit > and building off of it for future releases. > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> And also: diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 11b6515ce3d0..56e44a98d6a5 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -17,7 +17,6 @@ TARGETS += damon TARGETS += devices/error_logs TARGETS += devices/probe TARGETS += dmabuf-heaps -TARGETS += drivers/base/revocable TARGETS += drivers/dma-buf TARGETS += drivers/ntsync TARGETS += drivers/s390x/uvdevice With that, Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
The revocable code is still under active discussion, and there is no
in-kernel users of it. So disable it from the build for now so that no
one suffers from it being present in the tree, yet leave it in the
source tree so that others can easily test it by reverting this commit
and building off of it for future releases.
Fixes: dd7762c73b1c ("driver core: disable revocable code from build")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
---
Greg: I realized "driver core: disable revocable code from build" is
already in driver-core-testing branch. Sent this independent patch
in case it'd need to.
tools/testing/selftests/Makefile | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 11b6515ce3d0..56e44a98d6a5 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -17,7 +17,6 @@ TARGETS += damon
TARGETS += devices/error_logs
TARGETS += devices/probe
TARGETS += dmabuf-heaps
-TARGETS += drivers/base/revocable
TARGETS += drivers/dma-buf
TARGETS += drivers/ntsync
TARGETS += drivers/s390x/uvdevice
--
2.53.0.rc2.204.g2597b5adb4-goog
On Wed, Feb 04, 2026 at 05:28:18AM +0000, Tzung-Bi Shih wrote:
> The revocable code is still under active discussion, and there is no
> in-kernel users of it. So disable it from the build for now so that no
> one suffers from it being present in the tree, yet leave it in the
> source tree so that others can easily test it by reverting this commit
> and building off of it for future releases.
>
> Fixes: dd7762c73b1c ("driver core: disable revocable code from build")
> Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
> ---
> Greg: I realized "driver core: disable revocable code from build" is
> already in driver-core-testing branch. Sent this independent patch
> in case it'd need to.
>
> tools/testing/selftests/Makefile | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index 11b6515ce3d0..56e44a98d6a5 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -17,7 +17,6 @@ TARGETS += damon
> TARGETS += devices/error_logs
> TARGETS += devices/probe
> TARGETS += dmabuf-heaps
> -TARGETS += drivers/base/revocable
> TARGETS += drivers/dma-buf
> TARGETS += drivers/ntsync
> TARGETS += drivers/s390x/uvdevice
> --
> 2.53.0.rc2.204.g2597b5adb4-goog
Thanks, I'll merge this into the other commit so that it all happens at
once, and can be reverted easier.
greg k-h
On Tue Feb 3, 2026 at 1:30 PM CET, Greg Kroah-Hartman wrote: > The revocable code is still under active discussion, and there is no > in-kernel users of it. So disable it from the build for now so that no > one suffers from it being present in the tree, yet leave it in the > source tree so that others can easily test it by reverting this commit > and building off of it for future releases. > > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Danilo Krummrich <dakr@kernel.org>
On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <johan@kernel.org> wrote: > > On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg Kroah-Hartman wrote: > > On Sat, Jan 24, 2026 at 08:08:28PM +0100, Danilo Krummrich wrote: > > > On Sat Jan 24, 2026 at 6:05 PM CET, Johan Hovold wrote: > > > > this does not look like the right interface for the chardev unplug issue. > > > > > > I think it depends, we should do everything to prevent having the issue in the > > > first place, e.g. ensure that we synchronize the unplug properly on device > > > driver unbind. > > > > > > Sometimes, however, this isn't possible; this is where a revocable mechanism can > > > come in handy to prevent UAF of device resources -- DRM is a good example for > > > this. > > > > This is not "possible" for almost all real devices so we need something > > like this for almost all classes of devices, DRM just shows the extremes > > involved, v4l2 is also another good example. > > It's certainly possible to handle the chardev unplug issue without > revocable as several subsystems already do. All you need is a refcount, > a lock and a flag. > > It may be possible to provide a generic solutions at the chardev level > or some kind of helper implementation (similar to revocable) for > subsystems to use directly. > This echoes the heated exchange I recently had with Johan elsewhere so I would like to chime in and use the wider forum of driver core maintainers to settle an important question. It seems there are two camps in this discussion: one whose perception of the problem is limited to character devices being referenced from user-space at the time of the driver unbind (favoring fixing the issues at the vfs level) and another extending the problem to any driver unbinding where we cannot ensure a proper ordering of the teardown (for whatever reason: fw_devlink=off, helper auxiliary devices acting as intermediates, or even user-space unbinding a driver manually with bus-level sysfs attributes) leaving consumers of resources exposed by providers that are gone with dangling references (focusing the solutions on the subsystem level). The question is: should we work towards making the kernel gracefully handle any such situation or is it acceptable that if we do "non-standard" things, we can trigger invalid memory accesses from user-space. I'm asking this because I've been sending patches to several subsystems addressing life-time issues at the subsystem level with SRCU and I've faced resistance from Johan at least twice - not based on the implementation details but on the philosophy itself of synchronizing all accesses from consumers to providers (SRCU, revocable or otherwise). I myself am in the latter camp. My thinking is: if we expose an interface, it should work correctly. In particular: it should not allow the user (even root!) to crash the kernel. In addition: there seems to be an agreement that rust in linux is good because of its memory safety features. The issues we're discussing would have never happened, had the code been written in rust so we should not just accept them as normal in C or tell the user to "just not do it". Bartosz
On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: > On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <johan@kernel.org> wrote: > > It's certainly possible to handle the chardev unplug issue without > > revocable as several subsystems already do. All you need is a refcount, > > a lock and a flag. > > > > It may be possible to provide a generic solutions at the chardev level > > or some kind of helper implementation (similar to revocable) for > > subsystems to use directly. > > This echoes the heated exchange I recently had with Johan elsewhere so > I would like to chime in and use the wider forum of driver core > maintainers to settle an important question. It seems there are two > camps in this discussion: one whose perception of the problem is > limited to character devices being referenced from user-space at the > time of the driver unbind (favoring fixing the issues at the vfs > level) and another extending the problem to any driver unbinding where > we cannot ensure a proper ordering of the teardown (for whatever > reason: fw_devlink=off, helper auxiliary devices acting as > intermediates, or even user-space unbinding a driver manually with > bus-level sysfs attributes) leaving consumers of resources exposed by > providers that are gone with dangling references (focusing the > solutions on the subsystem level). What I've been trying to get across is that the chardev hot-unplug issue is real and needs to be fixed where it still exists, while the manual unbinding of drivers by root is a corner case which does not need to be addressed at *any* cost. If addressing the latter by wrapping every resource access in code that adds enough runtime overhead and makes drivers harder to write and maintain it *may* not be worth it and we should instead explore alternatives. This may involve tracking consumers like fw_devlink already does today so that they are unbound before their dependencies are. Because in the end, how sound is a model where we allow critical resources to silently go away while a device is still in use (e.g. you won't discover that your emergency shutdown gpio is gone until you actually need it)? Johan
Doing a quick interlude here, On Wed, Jan 28, 2026 at 4:48 PM Johan Hovold <johan@kernel.org> wrote: > What I've been trying to get across is that the chardev hot-unplug issue > is real and needs to be fixed where it still exists, while the manual > unbinding of drivers by root is a corner case which does not need to be > addressed at *any* cost. I agree with Johans stance here. I might have a skewed view of history and be a bit self-delusional, but at the time I began the work with the GPIO character device, what we were seeing were these GPIO adapters on USB. Those were/are hot-pluggable, and provide GPIOs that are not for "system critical" things, i.e. not critical for the computer/system it is running on. Instead these are things like the Mikroe Clickboard GPIO expander and TTY, or these GPIOs that come on FTDI USB adapters. These can and are certainly being used for things that are critical to the thing they are used for, such as regulating water reserves for your local hydro power plant. Of course only a madman would unplug that, but from Linux' POV they are very hot-pluggable, by their very nature. [Side quest on what are some insanities industrial system people do here... but these adapters are *very* popular.] Since they are hot-pluggable and will never appear in DTS files or such, a character device or other userspace ABI is the right abstraction, and it needs to be able to come and go without crashing or wasting memory. Plug/unplug the Mikroe clickboard GPIO 10.000 times should be fine under any circumstances. And that is the problem that need to be solved *first*, before removing things on (local) I2C buses or unbinding random devices from sysfs making them unpluggable in theory. Yours, Linus Walleij
On Wed, Jan 28, 2026 at 4:48 PM Johan Hovold <johan@kernel.org> wrote: > > On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: > > On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <johan@kernel.org> wrote: > > > > It's certainly possible to handle the chardev unplug issue without > > > revocable as several subsystems already do. All you need is a refcount, > > > a lock and a flag. > > > > > > It may be possible to provide a generic solutions at the chardev level > > > or some kind of helper implementation (similar to revocable) for > > > subsystems to use directly. > > > > This echoes the heated exchange I recently had with Johan elsewhere so > > I would like to chime in and use the wider forum of driver core > > maintainers to settle an important question. It seems there are two > > camps in this discussion: one whose perception of the problem is > > limited to character devices being referenced from user-space at the > > time of the driver unbind (favoring fixing the issues at the vfs > > level) and another extending the problem to any driver unbinding where > > we cannot ensure a proper ordering of the teardown (for whatever > > reason: fw_devlink=off, helper auxiliary devices acting as > > intermediates, or even user-space unbinding a driver manually with > > bus-level sysfs attributes) leaving consumers of resources exposed by > > providers that are gone with dangling references (focusing the > > solutions on the subsystem level). > > What I've been trying to get across is that the chardev hot-unplug issue > is real and needs to be fixed where it still exists, while the manual > unbinding of drivers by root is a corner case which does not need to be > addressed at *any* cost. > > If addressing the latter by wrapping every resource access in code that > adds enough runtime overhead and makes drivers harder to write and > maintain it *may* not be worth it and we should instead explore > alternatives. > Alright, so we *do* agree at least on some parts. :) I agree that any such change should not affect drivers. If you look at the GPIO changes I did or the proposed nvmem rework - it never touched drivers, only the subsystem level code. The latter especially is really tiny, in fact: drivers/nvmem/core.c | 172 +++++++++++++++++++++++--------------- drivers/nvmem/internals.h | 17 +++- is all you need to make it not crash in the situations I described under that series. Runtime overhead in read-sections with SRCU or read-write semaphores is negligible and typically we only have to write on driver unbind. So that "wrapping every resource access" sounds scary but really is not. GPIO work was bigger but it addressed way more synchronization issues than just supplier unbinding. For I2C both the problem is different (subsystem waiting forever for consumers to release all references) and the culprit: memory used to hold the reference-counted struct device is released the supplier unbind unconditionally. Unfortunately there's no way around it other than to first move it into a separate chunk managed by i2c core. But that's not the synchronization part that leaks into the drivers, just the need to move struct device out of struct i2c_adapter. > This may involve tracking consumers like fw_devlink already does today > so that they are unbound before their dependencies are. > During Saravana's talk at LPC we did briefly speak about whether it would be possible to enforce devlinks for ALL devices linked in a consumer-supplier fashion. I did in fact look into it for a bit on my way back and it too would require at least subsystem-level changes across all subsystems because you need to add that entry point at the time of the resource being requested so it's not a no-cost operation. But it is an alternative, yes though it'll require a comparable amount of gap-plugging IMO. > Because in the end, how sound is a model where we allow critical > resources to silently go away while a device is still in use (e.g. you > won't discover that your emergency shutdown gpio is gone until you > actually need it)? > Well, we do allow it at the moment. It doesn't seem like devlink will be able to cover 100% of use-cases anytime soon. Bartosz
On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote: > On Wed, Jan 28, 2026 at 4:48 PM Johan Hovold <johan@kernel.org> wrote: > > On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: > > > On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <johan@kernel.org> wrote: > > > > > > It's certainly possible to handle the chardev unplug issue without > > > > revocable as several subsystems already do. All you need is a refcount, > > > > a lock and a flag. > > > > > > > > It may be possible to provide a generic solutions at the chardev level > > > > or some kind of helper implementation (similar to revocable) for > > > > subsystems to use directly. > > > > > > This echoes the heated exchange I recently had with Johan elsewhere so > > > I would like to chime in and use the wider forum of driver core > > > maintainers to settle an important question. It seems there are two > > > camps in this discussion: one whose perception of the problem is > > > limited to character devices being referenced from user-space at the > > > time of the driver unbind (favoring fixing the issues at the vfs > > > level) and another extending the problem to any driver unbinding where > > > we cannot ensure a proper ordering of the teardown (for whatever > > > reason: fw_devlink=off, helper auxiliary devices acting as > > > intermediates, or even user-space unbinding a driver manually with > > > bus-level sysfs attributes) leaving consumers of resources exposed by > > > providers that are gone with dangling references (focusing the > > > solutions on the subsystem level). > > > > What I've been trying to get across is that the chardev hot-unplug issue > > is real and needs to be fixed where it still exists, while the manual > > unbinding of drivers by root is a corner case which does not need to be > > addressed at *any* cost. > > > > If addressing the latter by wrapping every resource access in code that > > adds enough runtime overhead and makes drivers harder to write and > > maintain it *may* not be worth it and we should instead explore > > alternatives. > > Alright, so we *do* agree at least on some parts. :) > > I agree that any such change should not affect drivers. If you look at > the GPIO changes I did or the proposed nvmem rework - it never touched > drivers, only the subsystem level code. The latter especially is > really tiny, in fact: > > drivers/nvmem/core.c | 172 +++++++++++++++++++++++--------------- > drivers/nvmem/internals.h | 17 +++- > > is all you need to make it not crash in the situations I described > under that series. Runtime overhead in read-sections with SRCU or > read-write semaphores is negligible and typically we only have to > write on driver unbind. So that "wrapping every resource access" > sounds scary but really is not. > > GPIO work was bigger but it addressed way more synchronization issues > than just supplier unbinding. > > For I2C both the problem is different (subsystem waiting forever for > consumers to release all references) and the culprit: memory used to > hold the reference-counted struct device is released the supplier > unbind unconditionally. Unfortunately there's no way around it other > than to first move it into a separate chunk managed by i2c core. Isn't there ? Can't the driver-specific data structure be reference-counted instead of unconditionally freed at unbind time ? > But > that's not the synchronization part that leaks into the drivers, just > the need to move struct device out of struct i2c_adapter. > > > This may involve tracking consumers like fw_devlink already does today > > so that they are unbound before their dependencies are. > > During Saravana's talk at LPC we did briefly speak about whether it > would be possible to enforce devlinks for ALL devices linked in a > consumer-supplier fashion. I did in fact look into it for a bit on my > way back and it too would require at least subsystem-level changes > across all subsystems because you need to add that entry point at the > time of the resource being requested so it's not a no-cost operation. > But it is an alternative, yes though it'll require a comparable amount > of gap-plugging IMO. I recall at least one driver (omap3isp) having a circular resource issue. The ISP hardware block has the ability to produce a clock for the camera sensor, and the camera sensor is a resource acquired by the ISP driver. It's quite rare, but it happens. I would however not reject a solution that would solve the 99.99% of the problem without addressing this. > > Because in the end, how sound is a model where we allow critical > > resources to silently go away while a device is still in use (e.g. you > > won't discover that your emergency shutdown gpio is gone until you > > actually need it)? > > Well, we do allow it at the moment. It doesn't seem like devlink will > be able to cover 100% of use-cases anytime soon. We have this issue because designing resource management is hard. The decision we made not to pay that cost has now turned into a huge technical debt. There's no easy way around it, it won't be easier to solve it correctly today than it was years ago. I don't know when we will be able to fix the issue, but I know it will happen only when we decide to face the situation and stop with band-aids. What I think is the biggest issue at the moment is the lack of motivation/time/money to address this huge, but I'm hopeful because I trust the technical expertise of the community. -- Regards, Laurent Pinchart
On Thu, 29 Jan 2026 11:56:34 +0100, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> said:
> On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote:
>>
>> For I2C both the problem is different (subsystem waiting forever for
>> consumers to release all references) and the culprit: memory used to
>> hold the reference-counted struct device is released the supplier
>> unbind unconditionally. Unfortunately there's no way around it other
>> than to first move it into a separate chunk managed by i2c core.
>
> Isn't there ? Can't the driver-specific data structure be
> reference-counted instead of unconditionally freed at unbind time ?
>
Oh, for sure, if we did from the start. But we did not and there are now
hundreds of i2c drivers that do:
struct my_i2c_drv_data {
struct i2c_adapter adap;
int my_other_drv_data;
};
and in probe:
struct my_i2c_drv_data *data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
(or just kzalloc() with kfree() in remove, it doesn't matter)
and the ownership of that data belongs to the driver. There's no way we could
address it now so the next best thing is to work towards moving the ownership
of struct i2c_adapter to the i2c core and make it reference counted using the
internal kobject of the associated struct device.
Bartosz
On Thu, Jan 29, 2026 at 08:50:30AM -0500, Bartosz Golaszewski wrote:
> On Thu, 29 Jan 2026 11:56:34 +0100, Laurent Pinchart said:
> > On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote:
> >>
> >> For I2C both the problem is different (subsystem waiting forever for
> >> consumers to release all references) and the culprit: memory used to
> >> hold the reference-counted struct device is released the supplier
> >> unbind unconditionally. Unfortunately there's no way around it other
> >> than to first move it into a separate chunk managed by i2c core.
> >
> > Isn't there ? Can't the driver-specific data structure be
> > reference-counted instead of unconditionally freed at unbind time ?
>
> Oh, for sure, if we did from the start. But we did not and there are now
> hundreds of i2c drivers that do:
>
> struct my_i2c_drv_data {
> struct i2c_adapter adap;
> int my_other_drv_data;
> };
>
> and in probe:
>
> struct my_i2c_drv_data *data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>
> (or just kzalloc() with kfree() in remove, it doesn't matter)
>
> and the ownership of that data belongs to the driver. There's no way we could
> address it now so the next best thing is to work towards moving the ownership
> of struct i2c_adapter to the i2c core and make it reference counted using the
> internal kobject of the associated struct device.
What I'm reading here is essentially that we rolled out devm_kzalloc()
too quickly without understanding the consequences, and it has spread so
much that it can't be fixed properly now, so we need to find a
workaround. And now we're trying to work around the problem by rolling
out a revocable API that has barely seen any testing, and is known to
have design issues. Does any one else see the irony ? :-)
--
Regards,
Laurent Pinchart
On Thu, Jan 29, 2026 at 3:49 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Thu, Jan 29, 2026 at 08:50:30AM -0500, Bartosz Golaszewski wrote:
> > On Thu, 29 Jan 2026 11:56:34 +0100, Laurent Pinchart said:
> > > On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote:
> > >>
> > >> For I2C both the problem is different (subsystem waiting forever for
> > >> consumers to release all references) and the culprit: memory used to
> > >> hold the reference-counted struct device is released the supplier
> > >> unbind unconditionally. Unfortunately there's no way around it other
> > >> than to first move it into a separate chunk managed by i2c core.
> > >
> > > Isn't there ? Can't the driver-specific data structure be
> > > reference-counted instead of unconditionally freed at unbind time ?
> >
> > Oh, for sure, if we did from the start. But we did not and there are now
> > hundreds of i2c drivers that do:
> >
> > struct my_i2c_drv_data {
> > struct i2c_adapter adap;
> > int my_other_drv_data;
> > };
> >
> > and in probe:
> >
> > struct my_i2c_drv_data *data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> >
> > (or just kzalloc() with kfree() in remove, it doesn't matter)
> >
> > and the ownership of that data belongs to the driver. There's no way we could
> > address it now so the next best thing is to work towards moving the ownership
> > of struct i2c_adapter to the i2c core and make it reference counted using the
> > internal kobject of the associated struct device.
>
> What I'm reading here is essentially that we rolled out devm_kzalloc()
> too quickly without understanding the consequences, and it has spread so
> much that it can't be fixed properly now, so we need to find a
> workaround. And now we're trying to work around the problem by rolling
> out a revocable API that has barely seen any testing, and is known to
> have design issues. Does any one else see the irony ? :-)
>
No, this has nothing to do with devm_anything(). It's resource
ownership. What driver creates at probe(), the driver should destroy
at remove(). I really hate with a passion the pattern where a driver
creates something in probe() and then tosses it over to the subsystem
for management. If an entity registers the struct device, it should
also be the one who allocates and manages the memory for it. Devres
just made it a bit easier to commit this kind of errors but they would
exist nevertheless.
Bartosz
On Thu Jan 29, 2026 at 3:49 PM CET, Laurent Pinchart wrote:
> On Thu, Jan 29, 2026 at 08:50:30AM -0500, Bartosz Golaszewski wrote:
>> On Thu, 29 Jan 2026 11:56:34 +0100, Laurent Pinchart said:
>> > On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote:
>> >>
>> >> For I2C both the problem is different (subsystem waiting forever for
>> >> consumers to release all references) and the culprit: memory used to
>> >> hold the reference-counted struct device is released the supplier
>> >> unbind unconditionally. Unfortunately there's no way around it other
>> >> than to first move it into a separate chunk managed by i2c core.
>> >
>> > Isn't there ? Can't the driver-specific data structure be
>> > reference-counted instead of unconditionally freed at unbind time ?
>>
>> Oh, for sure, if we did from the start. But we did not and there are now
>> hundreds of i2c drivers that do:
>>
>> struct my_i2c_drv_data {
>> struct i2c_adapter adap;
>> int my_other_drv_data;
>> };
>>
>> and in probe:
>>
>> struct my_i2c_drv_data *data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>>
>> (or just kzalloc() with kfree() in remove, it doesn't matter)
>>
>> and the ownership of that data belongs to the driver. There's no way we could
>> address it now so the next best thing is to work towards moving the ownership
>> of struct i2c_adapter to the i2c core and make it reference counted using the
>> internal kobject of the associated struct device.
>
> What I'm reading here is essentially that we rolled out devm_kzalloc() too
> quickly without understanding the consequences, and it has spread so much that
> it can't be fixed properly now, so we need to find a workaround.
I'm pretty sure I don't have all the details about the problem with the
i2c_adapter, but from what I read here briefly the problem simply seems to be
that currently the driver providing the i2c_adapter frees the i2c_adapter on
driver unbind unconditionally.
I would argue that this is a violation of the driver core design anyways. I
mean, in the end an i2c_adapter is the same type of device as any other bus
device (platform, PCI, etc.).
For instance, when the physical device that is represented by a PCI device is
removed from the machine, the corresponding struct pci_dev is not forcefully
freed either, it stays around as long as there are references to the device.
So, I fail to see how devm_kmalloc() and frieds are the root cause of the
i2c_adapter lifetime problem?
> And now we're trying to work around the problem by rolling out a revocable API
> that has barely seen any testing, and is known to have design issues. Does any
> one else see the irony ? :-)
I don't think anyone is trying to work around problems with devm_kmalloc() and
friends. It's just system memory, i.e. nothing that needs to be revoked. We can
just not use devm_kmalloc() and friends if we need the memory to outlive driver
unbind for some reason. The problem is about real device resources, such as I/O
memory mappings that *must* be released when a driver is unbound from its
device. So, revocable is clearly not a fix for devm_kmalloc() et al.
On Thu, Jan 29, 2026 at 08:50:30AM -0500, Bartosz Golaszewski wrote:
> and the ownership of that data belongs to the driver. There's no way we could
> address it now so the next best thing is to work towards moving the ownership
Think positive!
If this is common:
struct my_i2c_drv_data *data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
Then change it into
struct my_i2c_drv_data *data = devm_i2c_adaptor_alloc(struct my_i2c_drv_data, adap);
With Coccinelle or sed.
Audit all the drivers to catch the stragglers.
Now you have a refcount. Look at how fwctl_alloc_device() works to
understand the pattern.
Kernel community has done far harder transformations than this :)
Sure it is 200 drivers, I would ask Coccinelle team for help.
Here is how I would approach it.
First, grep to find all the candidates:
$ git grep -E '^\s+struct i2c_adapter[^*]*;'
Get a kernel built with all those compiling and get a clangd database
going. Make a list of all possible candidate files with grep.
AI tells me (and AI is never right about Coccinelle sadly) that you
could use this:
// C1: Find any struct that has a member of type "struct i2c_adapter"
@ has_i2c_adapter_struct @
type S;
@@
struct S {
...
struct i2c_adapter;
...
};
// C2: Replace sizeof(...) with fixme_sizeof(...)
@ sizeof_i2c_adapter_struct depends on has_i2c_adapter_struct @
type has_i2c_adapter_struct.S;
@@
- sizeof(struct S)
+ fixme_sizeof(struct S)
The idea being the only reason to do sizeof(S) is for an allocation
and we want to find every allocation of a wrapper struct to fix it.
Now you have an index of all lines that need touching.
Look for common patterns, use Coccinelle or sed to make bulk
replacements. Group patches of all similar transformations. Sweep
through with grep to clean anything not caught. Probably there will be
a couple drivers doing something utterly insane, meditate on them and
clean them up by hand (this is what clangd is helpful for)
Betcha you can get through it in a few hours!
Jason
On Thu, Jan 29, 2026 at 10:28:36AM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 29, 2026 at 08:50:30AM -0500, Bartosz Golaszewski wrote:
>
> > and the ownership of that data belongs to the driver. There's no way we could
> > address it now so the next best thing is to work towards moving the ownership
>
> Think positive!
>
> If this is common:
>
> struct my_i2c_drv_data *data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
>
> Then change it into
>
> struct my_i2c_drv_data *data = devm_i2c_adaptor_alloc(struct my_i2c_drv_data, adap);
>
> With Coccinelle or sed.
>
> Audit all the drivers to catch the stragglers.
>
> Now you have a refcount. Look at how fwctl_alloc_device() works to
> understand the pattern.
>
> Kernel community has done far harder transformations than this :)
>
> Sure it is 200 drivers, I would ask Coccinelle team for help.
We rewrote the device model between v2.4 and v2.6 (and by "we" I mostly
mean kudos to Greg for that work, as well as to all the people who
worked with him who I don't know about). That impacted *all* the
drivers. We can do this if we want to.
> Here is how I would approach it.
>
> First, grep to find all the candidates:
>
> $ git grep -E '^\s+struct i2c_adapter[^*]*;'
>
> Get a kernel built with all those compiling and get a clangd database
> going. Make a list of all possible candidate files with grep.
>
> AI tells me (and AI is never right about Coccinelle sadly) that you
> could use this:
>
> // C1: Find any struct that has a member of type "struct i2c_adapter"
> @ has_i2c_adapter_struct @
> type S;
> @@
> struct S {
> ...
> struct i2c_adapter;
> ...
> };
>
> // C2: Replace sizeof(...) with fixme_sizeof(...)
> @ sizeof_i2c_adapter_struct depends on has_i2c_adapter_struct @
> type has_i2c_adapter_struct.S;
> @@
> - sizeof(struct S)
> + fixme_sizeof(struct S)
>
> The idea being the only reason to do sizeof(S) is for an allocation
> and we want to find every allocation of a wrapper struct to fix it.
>
> Now you have an index of all lines that need touching.
>
> Look for common patterns, use Coccinelle or sed to make bulk
> replacements. Group patches of all similar transformations. Sweep
> through with grep to clean anything not caught. Probably there will be
> a couple drivers doing something utterly insane, meditate on them and
> clean them up by hand (this is what clangd is helpful for)
>
> Betcha you can get through it in a few hours!
>
> Jason
--
Regards,
Laurent Pinchart
On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: > maintainers to settle an important question. It seems there are two > camps in this discussion: one whose perception of the problem is > limited to character devices being referenced from user-space at the > time of the driver unbind (favoring fixing the issues at the vfs > level) and another extending the problem to any driver unbinding where > we cannot ensure a proper ordering of the teardown (for whatever > reason: I don't think you can defend any position where the user can do echo XYZ > /sys/.../YY and the kernel has an oops. Meaing in this discussion if the user does echo ".." > /sys/bus/XX/drivers/YY/unbind The kernel shouldn't oops or warn. I've seen many kinds of bogus arguments over the years (especially misunderstanding what the module refcount does!!), but ultimately I think this is the generally agreed expectation. However, in practice this isn't a common work flow, it is quite alot of tricky work to understand how a subsystem works and put in the necessary protections, and frankly many subsystems have had these bugs for their entire existance. It isn't urgent. Several subsystems do get it right, it is very possible and the best practices are much more aligned with the Device<Bound> stuff in Rust. ie guarantee in most contexts that remove() can't run. I'm not surprised to hear pushback on trying to fix it, especially if the proposed fixes are not subsystem comphrenesive in nature. Sprinkling SRCU around as partial patches, especially in drivers, is not a good idea, IMHO. The reason cdev keeps coming up is because there are few common ways a typical driver can actually generate concurrent operations during and after remove that would be problematic. File descriptors, subsystem callbacks, work queues, timers, interrupts, and notifiers. The latter already have robust schemes to help the driver shutdown and end the concurrent operations. ie cancel_work_sync(), del_timer_sync(), free_irq(), and *notifier_unregister(). Many wrappered file descriptors are safe. For example the sysfs usage in a driver is sync stopped during device_del's calls to sysfs remove functions. IMHO the largest systemic issue in this space is people making their own fops without understanding the lifecycle model and without hand-rolling a special a "_sync" kind of shutdown around it. A managed fops with a sync destruction operation would go a long way to closing these issues. ie the gpiolib example was basically all fops, one work and a bunch of places where the protection was redundant. Yes there are other cases, and certainly I've commonly seen cases of drivers reaching into other drivers, and subsystem non-file ops, but these cases usualy have other more fundamental issues with remove races :( So I would probably also take a strong position that introducing something like DevRes where you try to wrapper MMIO or other device resources is adamently not something we want to do. Not because I don't care about these removal races, but because I want the drivers to run in a Device<Bound> context with very few exceptions. Jason
On Tue, Jan 27, 2026 at 07:52:32PM -0400, Jason Gunthorpe wrote: > On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: > > maintainers to settle an important question. It seems there are two > > camps in this discussion: one whose perception of the problem is > > limited to character devices being referenced from user-space at the > > time of the driver unbind (favoring fixing the issues at the vfs > > level) and another extending the problem to any driver unbinding where > > we cannot ensure a proper ordering of the teardown (for whatever > > reason: > > I don't think you can defend any position where the user can do > > echo XYZ > /sys/.../YY > > and the kernel has an oops. > > Meaing in this discussion if the user does > echo ".." > /sys/bus/XX/drivers/YY/unbind > > The kernel shouldn't oops or warn. > > I've seen many kinds of bogus arguments over the years (especially > misunderstanding what the module refcount does!!), but ultimately I > think this is the generally agreed expectation. > > However, in practice this isn't a common work flow, it is quite alot > of tricky work to understand how a subsystem works and put in the > necessary protections, and frankly many subsystems have had these bugs > for their entire existance. It isn't urgent. I completely agree with this approach. I don't think anyone really advocates for considering that causing an oops from userspace is a normal behaviour. While the various discussions on this topic that escalated into heated arguments over time may have created an impression that we have two camps, I think we are all fundamentally in the same camp: we want to kernel to run reliably without crashing, with performant technical solutions. And let's remember that we achieve this goal best when we work together. > Several subsystems do get it right, it is very possible and the best > practices are much more aligned with the Device<Bound> stuff in > Rust. ie guarantee in most contexts that remove() can't run. > > I'm not surprised to hear pushback on trying to fix it, especially if > the proposed fixes are not subsystem comphrenesive in > nature. Sprinkling SRCU around as partial patches, especially in > drivers, is not a good idea, IMHO. > > The reason cdev keeps coming up is because there are few common ways a > typical driver can actually generate concurrent operations during and > after remove that would be problematic. > > File descriptors, subsystem callbacks, work queues, timers, > interrupts, and notifiers. > > The latter already have robust schemes to help the driver shutdown and > end the concurrent operations. ie cancel_work_sync(), > del_timer_sync(), free_irq(), and *notifier_unregister(). One a side note, devm_request_irq() is another of the devm_* helpers that cause race conditions, as interrupt handlers can run right after .remove() returns, which drivers will most likely not handle correctly. > Many wrappered file descriptors are safe. For example the sysfs usage > in a driver is sync stopped during device_del's calls to sysfs remove > functions. > > IMHO the largest systemic issue in this space is people making their > own fops without understanding the lifecycle model and without > hand-rolling a special a "_sync" kind of shutdown around it. > > A managed fops with a sync destruction operation would go a long way > to closing these issues. That's what I've been advocating for. The best way to ensure that driver code will not accessed data freed at .remove() time is to prevent the code to run at all. We have different classes of problems. It's tempting to think that a single solution can handle them all, but that's often not the best option. Based on what I've seen so far, handling the fops vs. .remove() race is best done by ensuring that fops do not race with .remove(). Dan Williams showed that it can be done quite simply. Conceptually speaking (not using (S)RCU or other specific kernel primitives, to try and describe the concepts clearly), it works as follows. 1. At the beginning of .remove(), flag that the device is being removed. 2. At the entry point of all fops, check the flag and return with an error if set. This prevents *new* fops from being entered once .remove() has started. 3. At the entry point of all fops, flag that a fop is in progress (with a counter). 4. In .remove(), wake up all threads sleeping in fops. 5. At the exit point of all fops, decrease the fop-in-progress counter. 6. In .remove(), wait until the fop-in-progress counter reaches 0. At that point, it is guaranteed that all fops will have completed and that no new fop can run. 7. Complete .remove(), freeing resources. Most of this of course would need to be handled in helpers at the cdev level, and wrapped in subsystem helpers. Drivers will need to be modified to - Call a helper at the beginning of .remove() to handle (1). - Wake up threads sleeping in fops (3), as only drivers know about these (in the general case, in some subsystems this may be handled in helpers). - Call a helper to handle (6). The only fop that this doesn't handle is .release(), as we can't block .remove() until userspace closes all open file descriptors and unmaps all memory. This is also the race condition that is the easiest to trigger (along with the race conditions related to threads sleeping in fops). Solving the .release() race also requires driver involvement too. By definition drivers have tasks to perform in .release() that need to access resources (MMIO, data structures, ...) - otherwise there would be no .release() operation. Drivers need to synchronize between .remove() and .release() and skip release tasks that have been performed by .remove() already. Strategy depends on subsystems, in the V4L2 case things can be more complex as it's common for drivers to create multiple cdevs, but it's entirely maangeable with clear concepts that can be documented, implemented, and checked during review. > ie the gpiolib example was basically all fops, one work and a bunch of > places where the protection was redundant. > > Yes there are other cases, and certainly I've commonly seen cases of > drivers reaching into other drivers, and subsystem non-file ops, but > these cases usualy have other more fundamental issues with remove > races :( Drivers using resources provided by other drivers is a more complex problem. I'm thinking about a driver acquiring a GPIO in .probe(), and the GPIO provider disappearing at a random point of time. Or a clock, or a regulator. These issues are more rare if we ignore unbinding drivers forcefully through sysfs, but they can still occur, so we should try to find a solution here too (and the sysfs unbind issue is also worth fixing). There, unlike in the cdev case, some sort of revocable API could make sense. > So I would probably also take a strong position that introducing > something like DevRes where you try to wrapper MMIO or other device > resources is adamently not something we want to do. Not because I > don't care about these removal races, but because I want the drivers > to run in a Device<Bound> context with very few exceptions. -- Regards, Laurent Pinchart
(Cc: Maxime, Thomas, Maarten) On Thu Jan 29, 2026 at 2:08 AM CET, Laurent Pinchart wrote: > That's what I've been advocating for. The best way to ensure that driver > code will not accessed data freed at .remove() time is to prevent the > code to run at all. With this we are in full agreement, I think that'd be best too. But, I also think that sometimes this isn't possible. For instance, DRM has such a case with atomic mode setting.
On Thu, Jan 29, 2026 at 11:29:03PM +0100, Danilo Krummrich wrote: > (Cc: Maxime, Thomas, Maarten) > > On Thu Jan 29, 2026 at 2:08 AM CET, Laurent Pinchart wrote: > > That's what I've been advocating for. The best way to ensure that driver > > code will not accessed data freed at .remove() time is to prevent the > > code to run at all. > > With this we are in full agreement, I think that'd be best too. But, I also > think that sometimes this isn't possible. For instance, DRM has such a case with > atomic mode setting. I don't see why it would be impossible there. -- Regards, Laurent Pinchart
Hi, On Fri, Jan 30, 2026 at 11:10:49AM +0200, Laurent Pinchart wrote: > On Thu, Jan 29, 2026 at 11:29:03PM +0100, Danilo Krummrich wrote: > > (Cc: Maxime, Thomas, Maarten) > > > > On Thu Jan 29, 2026 at 2:08 AM CET, Laurent Pinchart wrote: > > > That's what I've been advocating for. The best way to ensure that driver > > > code will not accessed data freed at .remove() time is to prevent the > > > code to run at all. > > > > With this we are in full agreement, I think that'd be best too. But, I also > > think that sometimes this isn't possible. For instance, DRM has such a case with > > atomic mode setting. > > I don't see why it would be impossible there. I'm not quite sure what you have in mind there, but DRM always allowed the DRM driver to stick around longer than its device to accomodate the fact that userspace might still have an open fd to it. If userspace has an open fd, it can still call ioctl so preventing to run any code is going to be difficult. Maxime
On Tue, Feb 03, 2026 at 10:10:57AM +0100, Maxime Ripard wrote: > On Fri, Jan 30, 2026 at 11:10:49AM +0200, Laurent Pinchart wrote: > > On Thu, Jan 29, 2026 at 11:29:03PM +0100, Danilo Krummrich wrote: > > > (Cc: Maxime, Thomas, Maarten) > > > > > > On Thu Jan 29, 2026 at 2:08 AM CET, Laurent Pinchart wrote: > > > > That's what I've been advocating for. The best way to ensure that driver > > > > code will not accessed data freed at .remove() time is to prevent the > > > > code to run at all. > > > > > > With this we are in full agreement, I think that'd be best too. But, I also > > > think that sometimes this isn't possible. For instance, DRM has such a case with > > > atomic mode setting. > > > > I don't see why it would be impossible there. > > I'm not quite sure what you have in mind there, but DRM always allowed > the DRM driver to stick around longer than its device to accomodate the > fact that userspace might still have an open fd to it. > > If userspace has an open fd, it can still call ioctl so preventing to > run any code is going to be difficult. Preventing new ioctls (and other fops) from being called isn't difficult, they can be blocked at the entry point, outside of the driver. In-progress ioctls running in other threads can also be forced to complete before .remove() frees all memory. I think this is the right thing to do. The only fop that we can't completely prevent from running is .release(), as we can't wait until userspace closes all file handles and unmaps all memory before .remove() completes. There are solutions for that. -- Regards, Laurent Pinchart
On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > > The latter already have robust schemes to help the driver shutdown and > > end the concurrent operations. ie cancel_work_sync(), > > del_timer_sync(), free_irq(), and *notifier_unregister(). > > One a side note, devm_request_irq() is another of the devm_* helpers > that cause race conditions, as interrupt handlers can run right after > .remove() returns, which drivers will most likely not handle correctly. Yes! You *cannot* intermix devm and non-devm approaches without creating very subtle bugs exactly like this. If your subsystem does not provide a "devm register" helper its drivers shouldn't use devm. > 1. At the beginning of .remove(), flag that the device is being removed. > > 2. At the entry point of all fops, check the flag and return with an > error if set. This prevents *new* fops from being entered once > .remove() has started. > > 3. At the entry point of all fops, flag that a fop is in progress (with > a counter). > > 4. In .remove(), wake up all threads sleeping in fops. > > 5. At the exit point of all fops, decrease the fop-in-progress counter. > > 6. In .remove(), wait until the fop-in-progress counter reaches 0. At > that point, it is guaranteed that all fops will have completed and > that no new fop can run. > > 7. Complete .remove(), freeing resources. This is is basically just open coding a rwsem.. :) SRCU should be faster than this, IIRC it has less cache line bouncing. But sure, it is all easy once you figure out how to give the fops shim some place to store all this state since people would not agree to make this a universal cost to all fops. > > Yes there are other cases, and certainly I've commonly seen cases of > > drivers reaching into other drivers, and subsystem non-file ops, but > > these cases usualy have other more fundamental issues with remove > > races :( > > Drivers using resources provided by other drivers is a more complex > problem. I'm thinking about a driver acquiring a GPIO in .probe(), and > the GPIO provider disappearing at a random point of time. Or a clock, or > a regulator. These issues are more rare if we ignore unbinding drivers > forcefully through sysfs, but they can still occur, so we should try to > find a solution here too (and the sysfs unbind issue is also worth > fixing). There, unlike in the cdev case, some sort of revocable API > could make sense. IMHO the sanest answer is to force the depending driver to unplug before allowing the dependend driver to remove. Isn't that what the dev link stuff does? IDK it has been forever now since I've done embedded stuff.. Jason
On Wed, Jan 28, 2026 at 09:23:22PM -0400, Jason Gunthorpe wrote: > On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > > > The latter already have robust schemes to help the driver shutdown and > > > end the concurrent operations. ie cancel_work_sync(), > > > del_timer_sync(), free_irq(), and *notifier_unregister(). > > > > One a side note, devm_request_irq() is another of the devm_* helpers > > that cause race conditions, as interrupt handlers can run right after > > .remove() returns, which drivers will most likely not handle correctly. > > Yes! You *cannot* intermix devm and non-devm approaches without > creating very subtle bugs exactly like this. If your subsystem does > not provide a "devm register" helper its drivers shouldn't use devm. I'd relax that rule a bit. There are resources that drivers must never, ever access after .remove(), such as MMIO. Using devm_ioremap* should therefore be safe in all cases. > > 1. At the beginning of .remove(), flag that the device is being removed. > > > > 2. At the entry point of all fops, check the flag and return with an > > error if set. This prevents *new* fops from being entered once > > .remove() has started. > > > > 3. At the entry point of all fops, flag that a fop is in progress (with > > a counter). > > > > 4. In .remove(), wake up all threads sleeping in fops. > > > > 5. At the exit point of all fops, decrease the fop-in-progress counter. > > > > 6. In .remove(), wait until the fop-in-progress counter reaches 0. At > > that point, it is guaranteed that all fops will have completed and > > that no new fop can run. > > > > 7. Complete .remove(), freeing resources. > > This is is basically just open coding a rwsem.. :) Yes, and that's why I wrote that my explanation was conceptual :-) I think it's important for developers (at least the ones developing subsystem integration for this, if not all driver authors) to understand the concepts behind it. Then we can optimize performance with the right kernel primitives. If we start the API documentation by talking about SRCU, people will be lost. > SRCU should be faster than this, IIRC it has less cache line bouncing. > > But sure, it is all easy once you figure out how to give the fops shim > some place to store all this state since people would not agree to > make this a universal cost to all fops. I didn't see any push back against Dan's proposal to store that information in struct cdev, did I miss something ? > > > Yes there are other cases, and certainly I've commonly seen cases of > > > drivers reaching into other drivers, and subsystem non-file ops, but > > > these cases usualy have other more fundamental issues with remove > > > races :( > > > > Drivers using resources provided by other drivers is a more complex > > problem. I'm thinking about a driver acquiring a GPIO in .probe(), and > > the GPIO provider disappearing at a random point of time. Or a clock, or > > a regulator. These issues are more rare if we ignore unbinding drivers > > forcefully through sysfs, but they can still occur, so we should try to > > find a solution here too (and the sysfs unbind issue is also worth > > fixing). There, unlike in the cdev case, some sort of revocable API > > could make sense. > > IMHO the sanest answer is to force the depending driver to unplug > before allowing the dependend driver to remove. Isn't that what the > dev link stuff does? IDK it has been forever now since I've done > embedded stuff.. I think it's the simplest answer, yes. It's a bit like using a canon to kill a fly though, but all other solutions would I think be much more complex. -- Regards, Laurent Pinchart
On Thu, Jan 29, 2026 at 12:38:50PM +0200, Laurent Pinchart wrote: > On Wed, Jan 28, 2026 at 09:23:22PM -0400, Jason Gunthorpe wrote: > > On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > > > > The latter already have robust schemes to help the driver shutdown and > > > > end the concurrent operations. ie cancel_work_sync(), > > > > del_timer_sync(), free_irq(), and *notifier_unregister(). > > > > > > One a side note, devm_request_irq() is another of the devm_* helpers > > > that cause race conditions, as interrupt handlers can run right after > > > .remove() returns, which drivers will most likely not handle correctly. > > > > Yes! You *cannot* intermix devm and non-devm approaches without > > creating very subtle bugs exactly like this. If your subsystem does > > not provide a "devm register" helper its drivers shouldn't use devm. > > I'd relax that rule a bit. There are resources that drivers must never, > ever access after .remove(), such as MMIO. Using devm_ioremap* should > therefore be safe in all cases. Yeah, probably, but I've seen driver using devm before & after non-devm and it is just too hard to tell if things are going to even work right. To be fair the IRQ issue is always more involved. The subsystem should provide a state after unregistration where the memory is still around and the IRQ path into the subsystem becomes a NOP. The driver then frees the IRQ, fences work and releases the driver memory. It is hard to do this sequence with devm.. I think a lot of places manage without this because seeing interrupts after unregister is probably a rare race condition in their HW. > > But sure, it is all easy once you figure out how to give the fops shim > > some place to store all this state since people would not agree to > > make this a universal cost to all fops. > > I didn't see any push back against Dan's proposal to store that > information in struct cdev, did I miss something ? I also don't see an issue with that, especially if we can stack misc on top of cdev to share the same logic. I think if you take that idea and the other proposal to shim the fops with ones that use the cdev data then we can see some cdev_unregister_sync() primitive. Jason
On Thu, Jan 29, 2026 at 09:34:40AM -0400, Jason Gunthorpe wrote: > On Thu, Jan 29, 2026 at 12:38:50PM +0200, Laurent Pinchart wrote: > > On Wed, Jan 28, 2026 at 09:23:22PM -0400, Jason Gunthorpe wrote: > > > On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > > > > > The latter already have robust schemes to help the driver shutdown and > > > > > end the concurrent operations. ie cancel_work_sync(), > > > > > del_timer_sync(), free_irq(), and *notifier_unregister(). > > > > > > > > One a side note, devm_request_irq() is another of the devm_* helpers > > > > that cause race conditions, as interrupt handlers can run right after > > > > .remove() returns, which drivers will most likely not handle correctly. > > > > > > Yes! You *cannot* intermix devm and non-devm approaches without > > > creating very subtle bugs exactly like this. If your subsystem does > > > not provide a "devm register" helper its drivers shouldn't use devm. > > > > I'd relax that rule a bit. There are resources that drivers must never, > > ever access after .remove(), such as MMIO. Using devm_ioremap* should > > therefore be safe in all cases. > > Yeah, probably, but I've seen driver using devm before & after > non-devm and it is just too hard to tell if things are going to > even work right. > > To be fair the IRQ issue is always more involved. The subsystem should > provide a state after unregistration where the memory is still around > and the IRQ path into the subsystem becomes a NOP. The driver then > frees the IRQ, fences work and releases the driver memory. > > It is hard to do this sequence with devm.. > > I think a lot of places manage without this because seeing interrupts > after unregister is probably a rare race condition in their HW. > > > > But sure, it is all easy once you figure out how to give the fops shim > > > some place to store all this state since people would not agree to > > > make this a universal cost to all fops. > > > > I didn't see any push back against Dan's proposal to store that > > information in struct cdev, did I miss something ? > > I also don't see an issue with that, especially if we can stack misc > on top of cdev to share the same logic. > > I think if you take that idea and the other proposal to shim the fops > with ones that use the cdev data then we can see some > cdev_unregister_sync() primitive. I think we'll need to split that primitive in two (or add a second primitive), as drivers need to wake up thread sleeping in fops between flagging the cdev as being unregistered and completing the unregistration with cdev_unregister_sync(). How to wake those threads up is highly driver-specific (or at least subsystem-specific), so we need two functions. Other than that, I think we're on the same page. -- Regards, Laurent Pinchart
Jason Gunthorpe wrote: > On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > > > The latter already have robust schemes to help the driver shutdown and > > > end the concurrent operations. ie cancel_work_sync(), > > > del_timer_sync(), free_irq(), and *notifier_unregister(). > > > > One a side note, devm_request_irq() is another of the devm_* helpers > > that cause race conditions, as interrupt handlers can run right after > > .remove() returns, which drivers will most likely not handle correctly. > > Yes! You *cannot* intermix devm and non-devm approaches without > creating very subtle bugs exactly like this. If your subsystem does > not provide a "devm register" helper its drivers shouldn't use devm. I wonder if we should have a proactive debug mode that checks for idiomatic devres usage and flags: - registering devres actions while the driver is detached - registering devres actions for a device with a driver that has a .remove() method - passing a devres allocation to a kobject API - invoking devres release actions from a kobject release API
On Thu Jan 29, 2026 at 4:42 AM CET, dan.j.williams wrote: > Jason Gunthorpe wrote: >> On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: >> > > The latter already have robust schemes to help the driver shutdown and >> > > end the concurrent operations. ie cancel_work_sync(), >> > > del_timer_sync(), free_irq(), and *notifier_unregister(). >> > >> > One a side note, devm_request_irq() is another of the devm_* helpers >> > that cause race conditions, as interrupt handlers can run right after >> > .remove() returns, which drivers will most likely not handle correctly. >> >> Yes! You *cannot* intermix devm and non-devm approaches without >> creating very subtle bugs exactly like this. If your subsystem does >> not provide a "devm register" helper its drivers shouldn't use devm. > > I wonder if we should have a proactive debug mode that checks for > idiomatic devres usage and flags: > > - registering devres actions while the driver is detached In Rust we already enforce this through the type system, i.e. we even fail to compile the code when this is violated. (I.e. for all scopes that guarantee that a device is bound (and will remain throughout) we give out a &Device<Bound>, which serves as a cookie.) In C I don't really see how that would be possible without additional locking, as the only thing we could check is dev->driver, which obviously is racy. > - registering devres actions for a device with a driver that has a > .remove() method This is perfectly valid, drivers might still be performing teardown operations on the device (if it did not get hot unplugged). > - passing a devres allocation to a kobject API Well, this is an ownership violation. Technically, devres owns this allocation and devres releases the allocation when the device is unbound. Which makes this allocation only ever valid to access from a scope that is guaranteed that the device is (and remains) bound. > - invoking devres release actions from a kobject release API This is similar to "registering devres actions while the driver is detached" and falls into the boarder problem class of "messing with devres objects outside of a Device<Bound> scope". Again, I think in C we can't properly protect against this. While we could also give out a "Device<Bound>" token for scopes where we can guarantee that the device is actually bound to a driver in C, we can't control the lifetime of the token as we can in Rust, which makes it pointless. So, the best thing we can probably do is document that "This must only be called from a scope that guarantees that the device remains bound throughout." for all the devres accessors. There may be one thing that comes to my mind that we could do though. While we can't catch those at compile time, we might be able to catch those on runtime. For instance, we could "abuse" lockdep and register a fake lock for a Device<Bound> scope and put a lockdep assert into all the devres accessors. However, fixing up all the violations that come up when introducing this sounds like a huge pain. :)
Danilo Krummrich wrote: > On Thu Jan 29, 2026 at 4:42 AM CET, dan.j.williams wrote: > > Jason Gunthorpe wrote: > >> On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > >> > > The latter already have robust schemes to help the driver shutdown and > >> > > end the concurrent operations. ie cancel_work_sync(), > >> > > del_timer_sync(), free_irq(), and *notifier_unregister(). > >> > > >> > One a side note, devm_request_irq() is another of the devm_* helpers > >> > that cause race conditions, as interrupt handlers can run right after > >> > .remove() returns, which drivers will most likely not handle correctly. > >> > >> Yes! You *cannot* intermix devm and non-devm approaches without > >> creating very subtle bugs exactly like this. If your subsystem does > >> not provide a "devm register" helper its drivers shouldn't use devm. > > > > I wonder if we should have a proactive debug mode that checks for > > idiomatic devres usage and flags: > > > > - registering devres actions while the driver is detached > > In Rust we already enforce this through the type system, i.e. we even fail to > compile the code when this is violated. (I.e. for all scopes that guarantee that > a device is bound (and will remain throughout) we give out a &Device<Bound>, > which serves as a cookie.) > > In C I don't really see how that would be possible without additional locking, > as the only thing we could check is dev->driver, which obviously is racy. > > > - registering devres actions for a device with a driver that has a > > .remove() method > > This is perfectly valid, drivers might still be performing teardown operations > on the device (if it did not get hot unplugged). Yes, this one is a soft warning. It is perfectly valid, but the first thing I look for in a device that uses devm in ->probe() and also has a ->remove() method is dependencies of the devm object on the ->remove() managed object. That case is potentially mitigated if all resources are devm acquired and no ->remove() is needed. > > - passing a devres allocation to a kobject API > > Well, this is an ownership violation. Technically, devres owns this allocation > and devres releases the allocation when the device is unbound. Which makes this > allocation only ever valid to access from a scope that is guaranteed that the > device is (and remains) bound. To be clear I am talking about: dev2 = devm_kzalloc(dev1...) init(dev2) device_register(dev2) ...where it must be valid past unbind of dev1. > > - invoking devres release actions from a kobject release API > > This is similar to "registering devres actions while the driver is detached" and > falls into the boarder problem class of "messing with devres objects outside of > a Device<Bound> scope". > > Again, I think in C we can't properly protect against this. While we could also > give out a "Device<Bound>" token for scopes where we can guarantee that the > device is actually bound to a driver in C, we can't control the lifetime of the > token as we can in Rust, which makes it pointless. This is why Rust remains on my, learn when I have time, pile. The goal would not be to "properly protect", but "sufficiently warn" the unsuspecting. If anything is going to get me to convert a subsystem to Rust it is to get help from the compiler for the review burden of the above abuses. > So, the best thing we can probably do is document that "This must only be called > from a scope that guarantees that the device remains bound throughout." for all > the devres accessors. Agree. > There may be one thing that comes to my mind that we could do though. While we > can't catch those at compile time, we might be able to catch those on runtime. > > For instance, we could "abuse" lockdep and register a fake lock for a > Device<Bound> scope and put a lockdep assert into all the devres accessors. > > However, fixing up all the violations that come up when introducing this sounds > like a huge pain. :) Right, and as you said there are many valid uses that are not typically recommended that would make the warning not useful.
On Thu, Jan 29, 2026 at 10:56:22AM +0100, Danilo Krummrich wrote: > On Thu Jan 29, 2026 at 4:42 AM CET, dan.j.williams wrote: > > Jason Gunthorpe wrote: > >> On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote: > >> > > The latter already have robust schemes to help the driver shutdown and > >> > > end the concurrent operations. ie cancel_work_sync(), > >> > > del_timer_sync(), free_irq(), and *notifier_unregister(). > >> > > >> > One a side note, devm_request_irq() is another of the devm_* helpers > >> > that cause race conditions, as interrupt handlers can run right after > >> > .remove() returns, which drivers will most likely not handle correctly. > >> > >> Yes! You *cannot* intermix devm and non-devm approaches without > >> creating very subtle bugs exactly like this. If your subsystem does > >> not provide a "devm register" helper its drivers shouldn't use devm. > > > > I wonder if we should have a proactive debug mode that checks for > > idiomatic devres usage and flags: > > > > - registering devres actions while the driver is detached > > In Rust we already enforce this through the type system, i.e. we even fail to > compile the code when this is violated. (I.e. for all scopes that guarantee that > a device is bound (and will remain throughout) we give out a &Device<Bound>, > which serves as a cookie.) > > In C I don't really see how that would be possible without additional locking, > as the only thing we could check is dev->driver, which obviously is racy. > > > - registering devres actions for a device with a driver that has a > > .remove() method > > This is perfectly valid, drivers might still be performing teardown operations > on the device (if it did not get hot unplugged). > > > - passing a devres allocation to a kobject API > > Well, this is an ownership violation. Technically, devres owns this allocation > and devres releases the allocation when the device is unbound. Which makes this > allocation only ever valid to access from a scope that is guaranteed that the > device is (and remains) bound. > > > - invoking devres release actions from a kobject release API > > This is similar to "registering devres actions while the driver is detached" and > falls into the boarder problem class of "messing with devres objects outside of > a Device<Bound> scope". > > Again, I think in C we can't properly protect against this. While we could also > give out a "Device<Bound>" token for scopes where we can guarantee that the > device is actually bound to a driver in C, we can't control the lifetime of the > token as we can in Rust, which makes it pointless. > > So, the best thing we can probably do is document that "This must only be called > from a scope that guarantees that the device remains bound throughout." for all > the devres accessors. > > There may be one thing that comes to my mind that we could do though. While we > can't catch those at compile time, we might be able to catch those on runtime. C will not allow catching as many things as compile time as rust, that's for sure, but I don't think it's the main issue here. The core of the problem, in my opinion, is that we have seen a proliferation of devres and similar helpers that were quickly adopted by drivers as it made their life easier, *but* without any documentation of the caveats and correct usage patterns. We have APIs that don't tell how to use them correctly, so we can hardly blame driver developers for not doing it right. In many cases we haven't even thought about what the right (and wrong) usage patterns are, and in some cases the APIs have been developed with so little awareness of these issues that there's no correct usage pattern. Fixing this is the first step, then we can see how to use the features provided by the programming language to minimize the risk of incorrect usage. > For instance, we could "abuse" lockdep and register a fake lock for a > Device<Bound> scope and put a lockdep assert into all the devres accessors. > > However, fixing up all the violations that come up when introducing this sounds > like a huge pain. :) -- Regards, Laurent Pinchart
On Wed, 28 Jan 2026 00:52:32 +0100, Jason Gunthorpe <jgg@nvidia.com> said: > On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: >> maintainers to settle an important question. It seems there are two >> camps in this discussion: one whose perception of the problem is >> limited to character devices being referenced from user-space at the >> time of the driver unbind (favoring fixing the issues at the vfs >> level) and another extending the problem to any driver unbinding where >> we cannot ensure a proper ordering of the teardown (for whatever >> reason: > > I don't think you can defend any position where the user can do > > echo XYZ > /sys/.../YY > > and the kernel has an oops. > > Meaing in this discussion if the user does > echo ".." > /sys/bus/XX/drivers/YY/unbind > > The kernel shouldn't oops or warn. > > I've seen many kinds of bogus arguments over the years (especially > misunderstanding what the module refcount does!!), but ultimately I > think this is the generally agreed expectation. > > However, in practice this isn't a common work flow, it is quite alot > of tricky work to understand how a subsystem works and put in the > necessary protections, and frankly many subsystems have had these bugs > for their entire existance. It isn't urgent. > In general, that's true. However I have first bumped into this can of worms at work when a client's custom device using a USB-to-I2C adapter and creating devices dynamically from user-space would crash the kernel or freeze a thread when the device would be disconnected with processes still referencing file descriptors. It may not be common but I hope we do care about corner-cases too. So I'd say: it's fine as is until it isn't. :) > Several subsystems do get it right, it is very possible and the best > practices are much more aligned with the Device<Bound> stuff in > Rust. ie guarantee in most contexts that remove() can't run. > > I'm not surprised to hear pushback on trying to fix it, especially if > the proposed fixes are not subsystem comphrenesive in > nature. Sprinkling SRCU around as partial patches, especially in > drivers, is not a good idea, IMHO. > For sure. I've been trying to address it exclusively in subsystem code, where this kind of logic belongs. After all, subsystem code is where we do complex things to make drivers simpler. One exception is I2C where the logic is so broken we need to first rework a lot of drivers. Wolfram is on board with that though. > The reason cdev keeps coming up is because there are few common ways a > typical driver can actually generate concurrent operations during and > after remove that would be problematic. > > File descriptors, subsystem callbacks, work queues, timers, > interrupts, and notifiers. > > The latter already have robust schemes to help the driver shutdown and > end the concurrent operations. ie cancel_work_sync(), > del_timer_sync(), free_irq(), and *notifier_unregister(). > > Many wrappered file descriptors are safe. For example the sysfs usage > in a driver is sync stopped during device_del's calls to sysfs remove > functions. > > IMHO the largest systemic issue in this space is people making their > own fops without understanding the lifecycle model and without > hand-rolling a special a "_sync" kind of shutdown around it. > > A managed fops with a sync destruction operation would go a long way > to closing these issues. > > ie the gpiolib example was basically all fops, one work and a bunch of > places where the protection was redundant. > > Yes there are other cases, and certainly I've commonly seen cases of > drivers reaching into other drivers, and subsystem non-file ops, but > these cases usualy have other more fundamental issues with remove > races :( > > So I would probably also take a strong position that introducing > something like DevRes where you try to wrapper MMIO or other device > resources is adamently not something we want to do. Not because I > don't care about these removal races, but because I want the drivers > to run in a Device<Bound> context with very few exceptions. > > Jason > Rust makes it very clear who owns what. That's something that we struggle a lot with in C, where we have drivers that create objects and then pass them on to the subsystem for management but the latter still references objects that are destroyed when the driver is unbound. Devres when used right is great and the lifetime is very clear: from binding to unbinding. The using right is the hard part though. :) Bartosz
> One exception is I2C where the logic is so broken we need to first > rework a lot of drivers. Let's say "bitrotten" instead of broken. People used what was available at that time and they prevented the kernel from crashing, at least. And up to now, nobody had the bandwidth to improve that part in I2C. > Wolfram is on board with that though. Ack. I want to emphasize here that for I2C the SRCU part goes into the subsystem, not into the drivers. (Disclaimer: I don't have the time to even read all the mails discussing 'revocable' despite it maybe being used in I2C. I am busy enough handling the preparations needed to improve the I2C core to handle the lifecycle issues. If 'revocable' is the final piece or not is a second step for me. Maybe even a third.) > > The reason cdev keeps coming up is because there are few common ways a > > typical driver can actually generate concurrent operations during and > > after remove that would be problematic. Let me point out again that Dan Williams already had a PoC-patch for handling the cdev issue generically [1]. Dunno if this fact is present in the current discussion. [1] https://lkml.org/lkml/2021/1/20/999
On Wed, Jan 28, 2026 at 11:01:03AM +0100, Wolfram Sang wrote: > > > One exception is I2C where the logic is so broken we need to first > > rework a lot of drivers. > > Let's say "bitrotten" instead of broken. People used what was available > at that time and they prevented the kernel from crashing, at least. And > up to now, nobody had the bandwidth to improve that part in I2C. > > > Wolfram is on board with that though. > > Ack. I want to emphasize here that for I2C the SRCU part goes into the > subsystem, not into the drivers. I would just gently advise again that SRCU is not a pancea and should only be used if the read side sections are super performance critical. I'm not sure that describes I2C. rwsem is often a simpler and better choice. > > > The reason cdev keeps coming up is because there are few common ways a > > > typical driver can actually generate concurrent operations during and > > > after remove that would be problematic. > > Let me point out again that Dan Williams already had a PoC-patch for > handling the cdev issue generically [1]. Dunno if this fact is present > in the current discussion. > > [1] https://lkml.org/lkml/2021/1/20/999 Yeah, this was brought up a couple drafts of possible options were exchanged already but nothing was really focused on and polished. It is a tricky problem to find a storage location for the lock and revoke so that the fops shim can access it while not disturbing the actual driver. Jason
> I would just gently advise again that SRCU is not a pancea and should > only be used if the read side sections are super performance > critical. I'm not sure that describes I2C. rwsem is often a simpler > and better choice. You might be totally right. But in any case, we need to prepare the subsystem first. That is where I am now. Next steps come after that. > > [1] https://lkml.org/lkml/2021/1/20/999 > > Yeah, this was brought up a couple drafts of possible options were > exchanged already but nothing was really focused on and polished. > > It is a tricky problem to find a storage location for the lock and > revoke so that the fops shim can access it while not disturbing the > actual driver. I can imagine. Thanks for the heads up, nice to see it was not forgotten!
On Wed, Jan 28, 2026 at 4:05 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Jan 28, 2026 at 11:01:03AM +0100, Wolfram Sang wrote:
> >
> > > One exception is I2C where the logic is so broken we need to first
> > > rework a lot of drivers.
> >
> > Let's say "bitrotten" instead of broken. People used what was available
> > at that time and they prevented the kernel from crashing, at least. And
> > up to now, nobody had the bandwidth to improve that part in I2C.
> >
> > > Wolfram is on board with that though.
> >
> > Ack. I want to emphasize here that for I2C the SRCU part goes into the
> > subsystem, not into the drivers.
>
> I would just gently advise again that SRCU is not a pancea and should
> only be used if the read side sections are super performance
> critical. I'm not sure that describes I2C. rwsem is often a simpler
> and better choice.
>
Sure. We're not there yet so that's TBD. The reason for using SRCU in
GPIO was the fact that we have consumers that may call into GPIOLIB
from atomic context - so we must not sleep - but we also have sleeping
controllers - so we must not take a spinlock before entering driver
code. SRCU was about the only synchronization mechanism that worked in
all cases as the locking at the subsystem level needs to be
centralized.
Also SRCU protects pointers by design, which makes the split into
subsystem- and provider-owned data logical in this case:
struct subsys_data {
struct driver_data __rcu *priv;
struct device dev;
}
Bartosz
On Wed, Jan 28, 2026 at 04:20:02PM +0100, Bartosz Golaszewski wrote:
> On Wed, Jan 28, 2026 at 4:05 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Wed, Jan 28, 2026 at 11:01:03AM +0100, Wolfram Sang wrote:
> > >
> > > > One exception is I2C where the logic is so broken we need to first
> > > > rework a lot of drivers.
> > >
> > > Let's say "bitrotten" instead of broken. People used what was available
> > > at that time and they prevented the kernel from crashing, at least. And
> > > up to now, nobody had the bandwidth to improve that part in I2C.
> > >
> > > > Wolfram is on board with that though.
> > >
> > > Ack. I want to emphasize here that for I2C the SRCU part goes into the
> > > subsystem, not into the drivers.
> >
> > I would just gently advise again that SRCU is not a pancea and should
> > only be used if the read side sections are super performance
> > critical. I'm not sure that describes I2C. rwsem is often a simpler
> > and better choice.
> >
>
> Sure. We're not there yet so that's TBD. The reason for using SRCU in
> GPIO was the fact that we have consumers that may call into GPIOLIB
> from atomic context - so we must not sleep - but we also have sleeping
> controllers - so we must not take a spinlock before entering driver
> code. SRCU was about the only synchronization mechanism that worked in
> all cases as the locking at the subsystem level needs to be
> centralized.
Oh, I didn't know you could safely use SRCU within atomic
sections. That's a pretty good reason to use it in some places.
> struct subsys_data {
> struct driver_data __rcu *priv;
> struct device dev;
> }
IMHO this is not a good pattern.
The thing to RCU protect is the *driver ops* pointer, not the driver
data. It is what I was saying before, the goal is to not call ops if
the driver is revoked so that the ops can assume they are in a
Device<Bound> context.
It should not be normal for the core code to be touching priv at all,
it has no natural reason to ever load it. It does have to fetch ops!
Jason
On Wed, Jan 28, 2026 at 5:01 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Jan 28, 2026 at 04:20:02PM +0100, Bartosz Golaszewski wrote:
> > On Wed, Jan 28, 2026 at 4:05 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > On Wed, Jan 28, 2026 at 11:01:03AM +0100, Wolfram Sang wrote:
> > > >
> > > > > One exception is I2C where the logic is so broken we need to first
> > > > > rework a lot of drivers.
> > > >
> > > > Let's say "bitrotten" instead of broken. People used what was available
> > > > at that time and they prevented the kernel from crashing, at least. And
> > > > up to now, nobody had the bandwidth to improve that part in I2C.
> > > >
> > > > > Wolfram is on board with that though.
> > > >
> > > > Ack. I want to emphasize here that for I2C the SRCU part goes into the
> > > > subsystem, not into the drivers.
> > >
> > > I would just gently advise again that SRCU is not a pancea and should
> > > only be used if the read side sections are super performance
> > > critical. I'm not sure that describes I2C. rwsem is often a simpler
> > > and better choice.
> > >
> >
> > Sure. We're not there yet so that's TBD. The reason for using SRCU in
> > GPIO was the fact that we have consumers that may call into GPIOLIB
> > from atomic context - so we must not sleep - but we also have sleeping
> > controllers - so we must not take a spinlock before entering driver
> > code. SRCU was about the only synchronization mechanism that worked in
> > all cases as the locking at the subsystem level needs to be
> > centralized.
>
> Oh, I didn't know you could safely use SRCU within atomic
> sections. That's a pretty good reason to use it in some places.
>
> > struct subsys_data {
> > struct driver_data __rcu *priv;
> > struct device dev;
> > }
>
> IMHO this is not a good pattern.
>
> The thing to RCU protect is the *driver ops* pointer, not the driver
> data. It is what I was saying before, the goal is to not call ops if
> the driver is revoked so that the ops can assume they are in a
> Device<Bound> context.
>
> It should not be normal for the core code to be touching priv at all,
> it has no natural reason to ever load it. It does have to fetch ops!
>
This is just an example and priv can hold whatever. Ideally this would
be ops but it's not always the case with existing code where ops and
driver data are often mixed into a single structure.
Bart
On Sun Jan 25, 2026 at 1:47 PM CET, Greg Kroah-Hartman wrote:
> On Sat, Jan 24, 2026 at 08:08:28PM +0100, Danilo Krummrich wrote:
>> But to be fair, I also want to point out that there is a quite significant
>> difference regarding the usefulness of the revocable concept in C compared to in
>> Rust due to language capabilities.
>
> True, but we do need something. I took these patches without a real
> user as a base for us to start working off of. The rust implementation
> has shown that the design-pattern is a good solution for the problem,
> and so I feel we should work with it and try to get this working
> properly.
I agree, it's a matter of figuring out the best way to adopt this pattern. (For
those reading along, [1] details the Rust implementation to illustrate why it
may not make sense to adopt it in the same way).
> So I don't want to take these reverts, let's try this out, by putting this
> into the driver core now, we have the base to experiment with in a "safe" way
> in lots of different driver subsytems at the same time.
I also don't think this should be reverted -- I think it makes sense to start
experimenting to figure out what's the best way to adopt this pattern.
I think DRM has already shown interest in adopting this, and I think I have also
seen patch series doing preparation work to be able to adopt this pattern as
well.
Perhaps, to address some of the concerns, a good middle ground could be to mark
the feature as experimental with a separate Kconfig in the meantime?
--
[1] Revocable in Rust
In Rust (most) device resources are only ever handed out to drivers within an
opaque container type specific for device resources, hence it is named
Devres<T>.
The Devres container type uses the Revocable type internally to protect the
actual device resource; the resource is released automatically once the
corresponding device is unbound from the driver (for which it uses the "normal"
devres infrastructure).
There are mainly two ways to access a device resource with a Devres container:
Devres::access() and Devres::try_access().
Devres::access() provides a zero-cost access to the inner device resource, but
requires a proof that it is called from a scope where it is guaranteed that the
device remains bound. This is achieved by Devres::access() taking a
&Device<Bound> argument, i.e. a reference (not reference count) of a device that
is guaranteed to be bound for the entire lifetime of the reference.
Let's have a look at an example with an IRQ handler:
struct MyIrqData {
bar: Devres<pci::Bar>,
}
impl irq::Handler for MyIrqData {
fn handle(&self, dev: &Device<Bound>) -> IrqReturn {
// Directly access the inner `pci::Bar`; fails if the resource
// did not originate from `dev`.
//
// (Internally, this is just a pointer comparison between `dev`
// and the device the `Devres` container has been created with.)
if let Ok(bar) = self.bar.access(dev) else {
return IrqReturn::None;
}
// `bar` is a `&pci::Bar`; no (S)RCU read side critical section
// involved.
bar.write32(...);
IrqReturn::Handled
}
}
Since the Rust IRQ handler always guarantees that it won't race with device
unbind, we can provide a &Device<Bound> cookie and hence directly access the
device resource with no cost. Due to the type system representation of the
device context state, this is checked at compile time.
We can do this for anything that guarantees a scope where the device must be
bound. For instance, if we guarantee that a workqueue is torn down on device
unbind, we can provide a &Device<Bound> cookie for all work items scheduled on
this workqueue. The same applies to subsystem and filesystem callbacks etc.
But, if we are in a scope where we don't have a &Device<Bound>, it means that
this may be executed after device unbind. Consequently, drivers can't call
Devres::access() for direct access of the device resource (because it may be a
UAF) and, instead, have to fall back to Devres::try_access() and friends.
Let's take some DRM IOCTL for instance:
struct MyDriver {
bar: Devres<pci::Bar>,
}
fn ioctl_vm_create(
drm: &drm::Device<MyDriver>,
req: &mut uapi::drm_mydriver_vm_create,
file: &drm::File<MyDriverFile>,
) -> Result<u32> {
// Runs the closure in an (S)RCU read side critical section if the
// resource is available, returns ENXIO otherwise.
drm.bar.try_access_with(|bar| {
// (S)RCU read side critical section starts here.
bar.write32(...);
// (S)RCU read side critical section ends here.
}).ok_or(ENXIO)?;
Ok(0)
}
I think those examples make it obvious why a revocable implementation on the C
side can't provide the same value and ergonomics due to language limitations,
yet I think it makes sense to start experimenting how subsystems can adopt this
design-pattern in C.
One additional note, while this overall can come across a bit paranoid, it
achieves a clear goal: It becomes impossible for drivers to mess this up and
create memory safety bugs, while at the same time causing zero overhead in hot
paths, that can be proven to not have a potential for such bugs in the first
place.
On Sun, Jan 25, 2026 at 06:53:15PM +0100, Danilo Krummrich wrote:
>
> Let's take some DRM IOCTL for instance:
>
> struct MyDriver {
> bar: Devres<pci::Bar>,
> }
>
> fn ioctl_vm_create(
> drm: &drm::Device<MyDriver>,
> req: &mut uapi::drm_mydriver_vm_create,
> file: &drm::File<MyDriverFile>,
> ) -> Result<u32> {
> // Runs the closure in an (S)RCU read side critical section if the
> // resource is available, returns ENXIO otherwise.
> drm.bar.try_access_with(|bar| {
> // (S)RCU read side critical section starts here.
>
> bar.write32(...);
>
> // (S)RCU read side critical section ends here.
> }).ok_or(ENXIO)?;
>
> Ok(0)
> }
That's the whole issue with DRM right there - allowing driver code to
run after the driver has unregistered from the subsystem is
*dangerous* and creates all these bugs.
From a rust perspective I would argue you should be looking at every
one of those try_access_with() sites in drivers as a code smell that
says something is questionable in the driver or subsystem.
In many other subsystems a driver should *never* use
"try_access_with". Unfortunately the rust solution forces
synchronize_srcu()'s anyway (which Bartoz rightly pointed out as an
unacceptable performance issue). IMHO since rust has the Device<Bound>
stuff the revocable should have used rwsem, because the expectation
should be that the majority uses access, not try_access.
> I think those examples make it obvious why a revocable implementation on the C
> side can't provide the same value and ergonomics due to language limitations,
> yet I think it makes sense to start experimenting how subsystems can adopt this
> design-pattern in C.
The most important part of this pattern, IMHO, is documenting when you
are in a safe Device<Bound> scope or not.
What the C version of revocable does is just enforce it *everwhere*
without any thought as to if it is papering over a bigger problem.
In C protecting some limited "device resources" is not nearly good
enough for alot of drivers since the root issue here is often the
author doesn't understand the undocumented contexts when it is
"try_access_with" vs "access" and then makes a lot more errors than
just "device resources". :(
Frankly I don't think "iterating" is going to salvage this idea. The
real value from rust was not in creating a thin wrapper around
SRCU. It is in all the other stuff Danilo has explained.
Jason
On Mon Jan 26, 2026 at 1:07 AM CET, Jason Gunthorpe wrote:
> That's the whole issue with DRM right there - allowing driver code to
> run after the driver has unregistered from the subsystem is
> *dangerous* and creates all these bugs.
Unfortunately, it is necessary (at least to a certain extend) in DRM. I think
there is space for improvements, but I don't think we can get rid of this
entirely, especially on the KMS side AFAIK.
(KMS is not exactly my core competence, so it would be better for someone else
to explain the details.)
> From a rust perspective I would argue you should be looking at every
> one of those try_access_with() sites in drivers as a code smell that
> says something is questionable in the driver or subsystem.
Agreed, if that is necessary it requires special attention and justification.
> In many other subsystems a driver should *never* use "try_access_with".
> Unfortunately the rust solution forces synchronize_srcu()'s anyway (which
> Bartoz rightly pointed out as an unacceptable performance issue).
I'm already working on patches that get this down to a single
synchronize_{s}rcu() call on driver unbind, which is not a hot path at all. So,
this should be fine. This should work for C code as well.
> IMHO since rust has the Device<Bound> stuff the revocable should have used
> rwsem, because the expectation should be that the majority uses access, not
> try_access.
Yes, the majority of uses is access(), not try_access(); not sure if rwsem is
the better solution though.
On Mon, Jan 26, 2026 at 05:08:20PM +0100, Danilo Krummrich wrote: > On Mon Jan 26, 2026 at 1:07 AM CET, Jason Gunthorpe wrote: > > That's the whole issue with DRM right there - allowing driver code to > > run after the driver has unregistered from the subsystem is > > *dangerous* and creates all these bugs. > > Unfortunately, it is necessary (at least to a certain extend) in DRM. I think > there is space for improvements, but I don't think we can get rid of this > entirely, especially on the KMS side AFAIK. From what I saw alot of the issue with DRM was how it works the fops, instead of the core subsytem managing the fops and calling the driver, the driver managed its own fops and called back to the core. Sure, the issues may be very hard to fix in existing code, but I find it hard to swallow the idea that a subsystem couldn't own all the fops/etc and guard every driver callback with a lock to generate the right kind of fence.. > > IMHO since rust has the Device<Bound> stuff the revocable should have used > > rwsem, because the expectation should be that the majority uses access, not > > try_access. > > Yes, the majority of uses is access(), not try_access(); not sure if rwsem is > the better solution though. rwsem is much faster on destroy and somewhat slower on read. Which sounds to match the use case here. Ie you wouldn't need to do special effort to bundle the synchronize_srcu() Jason
On Mon, Jan 26, 2026 at 01:07:20PM -0400, Jason Gunthorpe wrote: > On Mon, Jan 26, 2026 at 05:08:20PM +0100, Danilo Krummrich wrote: > > On Mon Jan 26, 2026 at 1:07 AM CET, Jason Gunthorpe wrote: > > > That's the whole issue with DRM right there - allowing driver code to > > > run after the driver has unregistered from the subsystem is > > > *dangerous* and creates all these bugs. > > > > Unfortunately, it is necessary (at least to a certain extend) in DRM. I think > > there is space for improvements, but I don't think we can get rid of this > > entirely, especially on the KMS side AFAIK. > > From what I saw alot of the issue with DRM was how it works the fops, > instead of the core subsytem managing the fops and calling the driver, > the driver managed its own fops and called back to the core. > > Sure, the issues may be very hard to fix in existing code, but I find > it hard to swallow the idea that a subsystem couldn't own all the > fops/etc and guard every driver callback with a lock to generate the > right kind of fence.. I also don't see a real technical reason. Retrofitting the right solution in all existing drivers wouldn't be for the faint-hearted though, so I understand the appeal for subsystems of a quick and easy suboptimal implementation. > > > IMHO since rust has the Device<Bound> stuff the revocable should have used > > > rwsem, because the expectation should be that the majority uses access, not > > > try_access. > > > > Yes, the majority of uses is access(), not try_access(); not sure if rwsem is > > the better solution though. > > rwsem is much faster on destroy and somewhat slower on read. Which > sounds to match the use case here. Ie you wouldn't need to do special > effort to bundle the synchronize_srcu() -- Regards, Laurent Pinchart
On Mon Jan 26, 2026 at 6:07 PM CET, Jason Gunthorpe wrote: > On Mon, Jan 26, 2026 at 05:08:20PM +0100, Danilo Krummrich wrote: >> Yes, the majority of uses is access(), not try_access(); not sure if rwsem is >> the better solution though. > > rwsem is much faster on destroy and somewhat slower on read. Which > sounds to match the use case here. Ie you wouldn't need to do special > effort to bundle the synchronize_srcu() While not that many, the try_access() cases may still be in hot paths, whereas the destroy case is always in a cold path, i.e. device driver unbind. Also note that if the resource is released "manually" before driver unbind, we use revoke_nosync() as the destructor already guarantees that there are no more users.
On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg KH wrote: > On Sat, Jan 24, 2026 at 08:08:28PM +0100, Danilo Krummrich wrote: > > On Sat Jan 24, 2026 at 6:05 PM CET, Johan Hovold wrote: > > > this does not look like the right interface for the chardev unplug issue. > > > > I think it depends, we should do everything to prevent having the issue in the > > first place, e.g. ensure that we synchronize the unplug properly on device > > driver unbind. > > > > Sometimes, however, this isn't possible; this is where a revocable mechanism can > > come in handy to prevent UAF of device resources -- DRM is a good example for > > this. > > This is not "possible" for almost all real devices so we need something > like this for almost all classes of devices, DRM just shows the extremes > involved, v4l2 is also another good example. Revocable is not needed in V4L2. > Note, other OSes also have this same problem, look at all the work the > BSDs are going through at the moment just to get closer to the place > where we are in Linux today with removable devices and they have hit our > same problems. > > > But to be fair, I also want to point out that there is a quite significant > > difference regarding the usefulness of the revocable concept in C compared to in > > Rust due to language capabilities. > > True, but we do need something. I took these patches without a real > user as a base for us to start working off of. The rust implementation > has shown that the design-pattern is a good solution for the problem, > and so I feel we should work with it and try to get this working > properly. We've been sitting and talking about it for years now, and > here is the first real code submission that is getting us closer to fix > the problem properly. It might not be perfict, but let's evolve it from > here for what is found not to work correctly. > > So I don't want to take these reverts, let's try this out, by putting > this into the driver core now, we have the base to experiment with in a > "safe" way in lots of different driver subsytems at the same time. If > it doesn't work out, worst case we revert it in a release or two because > it didn't get used. -- Regards, Laurent Pinchart
On Sun, Jan 25, 2026 at 01:47:14PM +0100, Greg KH wrote: > On Sat, Jan 24, 2026 at 08:08:28PM +0100, Danilo Krummrich wrote: > > On Sat Jan 24, 2026 at 6:05 PM CET, Johan Hovold wrote: > > > this does not look like the right interface for the chardev unplug issue. > > > > I think it depends, we should do everything to prevent having the issue in the > > first place, e.g. ensure that we synchronize the unplug properly on device > > driver unbind. > > > > Sometimes, however, this isn't possible; this is where a revocable mechanism can > > come in handy to prevent UAF of device resources -- DRM is a good example for > > this. > > This is not "possible" for almost all real devices so we need something > like this for almost all classes of devices, DRM just shows the extremes > involved, v4l2 is also another good example. > > Note, other OSes also have this same problem, look at all the work the > BSDs are going through at the moment just to get closer to the place > where we are in Linux today with removable devices and they have hit our > same problems. > > > But to be fair, I also want to point out that there is a quite significant > > difference regarding the usefulness of the revocable concept in C compared to in > > Rust due to language capabilities. > > True, but we do need something. I took these patches without a real > user as a base for us to start working off of. The rust implementation > has shown that the design-pattern is a good solution for the problem, > and so I feel we should work with it and try to get this working > properly. We've been sitting and talking about it for years now, and > here is the first real code submission that is getting us closer to fix > the problem properly. It might not be perfict, but let's evolve it from > here for what is found not to work correctly. > > So I don't want to take these reverts, let's try this out, by putting > this into the driver core now, we have the base to experiment with in a > "safe" way in lots of different driver subsytems at the same time. If > it doesn't work out, worst case we revert it in a release or two because > it didn't get used. It's the wrong solution for most cases, if not all. It will spread in drivers and then become another big piece of technical debt we'll wish we had never merged. We know what the right solution to the cdev race is, moving forward with the revocable API is shooting ourselves in the foot. Or both feet. Or much worse. I can't stop everybody from harming themselves, but I object vigourously when that impacts me. I want this to be reverted now. -- Regards, Laurent Pinchart
On Sun Jan 25, 2026 at 2:22 PM CET, Laurent Pinchart wrote: > It's the wrong solution for most cases, if not all. It will spread in drivers > and then become another big piece of technical debt we'll wish we had never > merged. It is a matter of how the revocable pattern is adopted. I.e. I don't think drivers should create instances of revocable (device) resources by themselves. Instead, I think it should be up to the corresponding subsystems to adopt the pattern in the way necessary and make it accessible to drivers instead. > We know what the right solution to the cdev race is So, what is it? Assuming that this is what you are referring to, how do you prevent accesses to (potentially freed) device resources after the bus device has been unbound from the driver for subsystems that may still call back into the driver after device unbind?
On Sun, Jan 25, 2026 at 03:07:41PM +0100, Danilo Krummrich wrote: > On Sun Jan 25, 2026 at 2:22 PM CET, Laurent Pinchart wrote: > > It's the wrong solution for most cases, if not all. It will spread in drivers > > and then become another big piece of technical debt we'll wish we had never > > merged. > > It is a matter of how the revocable pattern is adopted. I.e. I don't think > drivers should create instances of revocable (device) resources by themselves. > Instead, I think it should be up to the corresponding subsystems to adopt the > pattern in the way necessary and make it accessible to drivers instead. > > > We know what the right solution to the cdev race is > > So, what is it? Assuming that this is what you are referring to, how do you > prevent accesses to (potentially freed) device resources after the bus device > has been unbound from the driver for subsystems that may still call back into > the driver after device unbind? I've answered this question in another e-mail in this thread, see https://lore.kernel.org/all/20260129010822.GA3310904@killaraus/ -- Regards, Laurent Pinchart
On Sat, Jan 24, 2026 at 06:05:32PM +0100, Johan Hovold wrote: > I was surprised to learn that the revocable functionality was merged last week > given the community feedback on list and at LPC, but not least since there are > no users of it, which we are supposed to require to be able to evaluate it > properly. > > The chromeos ec driver issue which motivated this work turned out not to need > it as was found during review. And the example gpiolib conversion was posted > the very same morning that this was merged which hardly provides enough time > for evaluation (even if Bartosz quickly reported a performance regression). > > Turns out there are correctness issues with both the gpiolib conversion and > the revocable design itself that can lead to use-after-free and hung tasks (see > [1] and patch 3/3). > > And as was pointed out repeatedly during review, and again at the day of the > merge, this does not look like the right interface for the chardev unplug > issue. > > Revert the revocable implementation until a redesign has been proposed and > evaluated properly. I have voiced some of the concerns listed above. This was merge way too quickly, without proper review and evaluation of the API as a solution for the problem at hand. I don't want to see this API spreading through drivers the same way devm_kzalloc() did without developers understanding the limitations, it's just another recipe for disaster. I trust that we have enough knowledge and wisdom in the community to implement correct solutions to the producer-consumer races. Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > [1] https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ > > > Johan Hovold (3): > Revert "selftests: revocable: Add kselftest cases" > Revert "revocable: Add Kunit test cases" > Revert "revocable: Revocable resource management" > > .../driver-api/driver-model/index.rst | 1 - > .../driver-api/driver-model/revocable.rst | 152 ----------- > MAINTAINERS | 9 - > drivers/base/Kconfig | 8 - > drivers/base/Makefile | 5 +- > drivers/base/revocable.c | 241 ------------------ > drivers/base/revocable_test.c | 142 ----------- > include/linux/revocable.h | 69 ----- > tools/testing/selftests/Makefile | 1 - > .../selftests/drivers/base/revocable/Makefile | 7 - > .../drivers/base/revocable/revocable_test.c | 136 ---------- > .../drivers/base/revocable/test-revocable.sh | 39 --- > .../base/revocable/test_modules/Makefile | 10 - > .../revocable/test_modules/revocable_test.c | 195 -------------- > 14 files changed, 1 insertion(+), 1014 deletions(-) > delete mode 100644 Documentation/driver-api/driver-model/revocable.rst > delete mode 100644 drivers/base/revocable.c > delete mode 100644 drivers/base/revocable_test.c > delete mode 100644 include/linux/revocable.h > delete mode 100644 tools/testing/selftests/drivers/base/revocable/Makefile > delete mode 100644 tools/testing/selftests/drivers/base/revocable/revocable_test.c > delete mode 100755 tools/testing/selftests/drivers/base/revocable/test-revocable.sh > delete mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/Makefile > delete mode 100644 tools/testing/selftests/drivers/base/revocable/test_modules/revocable_test.c -- Regards, Laurent Pinchart
© 2016 - 2026 Red Hat, Inc.