Commit 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
also removed the call to free_irq() in pci_remove(), so the IRQ requested
with devm_request_irq() is still in place when pci_disable_msi() is called
in pci_remove() while unbinding the driver from the device:
remove_proc_entry: removing non-empty directory 'irq/136', leaking at
least 'firewire_ohci'
Call Trace:
? remove_proc_entry+0x19c/0x1c0
? __warn+0x81/0x130
? remove_proc_entry+0x19c/0x1c0
? report_bug+0x171/0x1a0
? console_unlock+0x78/0x120
? handle_bug+0x3c/0x80
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? remove_proc_entry+0x19c/0x1c0
unregister_irq_proc+0xf4/0x120
free_desc+0x3d/0xe0
? kfree+0x29f/0x2f0
irq_free_descs+0x47/0x70
msi_domain_free_locked.part.0+0x19d/0x1d0
msi_domain_free_irqs_all_locked+0x81/0xc0
pci_free_msi_irqs+0x12/0x40
pci_disable_msi+0x4c/0x60
pci_remove+0x9d/0xc0 [firewire_ohci
01b483699bebf9cb07a3d69df0aa2bee71db1b26]
pci_device_remove+0x37/0xa0
device_release_driver_internal+0x19f/0x200
unbind_store+0xa1/0xb0
Remove the IRQ with devm_free_irq() before pci_disable_msi().
Also free it in the fail_msi: path of pci_probe(), as that path would
lead to an identical leak.
Fixes: 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
Signed-off-by: Edmund Raile <edmund.raile@proton.me>
---
Using the FW643 with vfio-pci required unbinding it from firewire_ohci;
doing so currently produces a memory leak due to a leftover IRQ,
which this patch removes.
The IRQ can be observed while the driver is loaded and bound:
find /proc/irq -type d -name "firewire_ohci"
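The same check can be sketched as a small C program. This is only an illustration, assuming the procfs layout /proc/irq/<N>/<handler-name> implied by the warning above ('irq/136' containing 'firewire_ohci'); the function name count_irq_handler_dirs() is hypothetical, and unlike find it looks only one level below the root instead of recursing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count entries under `root` (normally "/proc/irq")
 * that contain a subdirectory called `name`, e.g. "firewire_ohci".
 * Assumes the layout /proc/irq/<N>/<handler-name>; does not recurse.
 * Returns -1 if `root` cannot be opened.
 */
static int count_irq_handler_dirs(const char *root, const char *name)
{
	char path[4096];
	struct dirent *ent;
	struct stat st;
	int count = 0;
	DIR *dir;

	assert(root != NULL && name != NULL);
	dir = opendir(root);
	if (dir == NULL)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* skip ".", ".." and hidden entries */
		if (ent->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s/%s", root, ent->d_name, name);
		if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
			count++;
	}
	closedir(dir);
	return count;
}
```

While the driver is bound, count_irq_handler_dirs("/proc/irq", "firewire_ohci") would be expected to report the handler directory of each bound controller.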
Would it be a good idea to submit a patch adding the following comment
to devm_request_irq() in include/linux/interrupt.h
/*
* counterpart: devm_free_irq()
*/
so that language servers (LSPs) show that hint?
v2 change: corrected patch title
drivers/firewire/ohci.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 9db9290c3269..7bc71f4be64a 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -3773,6 +3773,7 @@ static int pci_probe(struct pci_dev *dev,
return 0;

fail_msi:
+ devm_free_irq(&dev->dev, dev->irq, ohci);
pci_disable_msi(dev);

return err;
@@ -3800,6 +3801,7 @@ static void pci_remove(struct pci_dev *dev)

software_reset(ohci);

+ devm_free_irq(&dev->dev, dev->irq, ohci);
pci_disable_msi(dev);

dev_notice(&dev->dev, "removing fw-ohci device\n");
--
2.43.0
Hi,
On Thu, Feb 29, 2024 at 02:47:59PM +0000, Edmund Raile wrote:
>
> Commit 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
> also removed the call to free_irq() in pci_remove(), so the IRQ requested
> with devm_request_irq() is still in place when pci_disable_msi() is called
> in pci_remove() while unbinding the driver from the device:
>
> remove_proc_entry: removing non-empty directory 'irq/136', leaking at
> least 'firewire_ohci'
> Call Trace:
> ? remove_proc_entry+0x19c/0x1c0
> ? __warn+0x81/0x130
> ? remove_proc_entry+0x19c/0x1c0
> ? report_bug+0x171/0x1a0
> ? console_unlock+0x78/0x120
> ? handle_bug+0x3c/0x80
> ? exc_invalid_op+0x17/0x70
> ? asm_exc_invalid_op+0x1a/0x20
> ? remove_proc_entry+0x19c/0x1c0
> unregister_irq_proc+0xf4/0x120
> free_desc+0x3d/0xe0
> ? kfree+0x29f/0x2f0
> irq_free_descs+0x47/0x70
> msi_domain_free_locked.part.0+0x19d/0x1d0
> msi_domain_free_irqs_all_locked+0x81/0xc0
> pci_free_msi_irqs+0x12/0x40
> pci_disable_msi+0x4c/0x60
> pci_remove+0x9d/0xc0 [firewire_ohci
> 01b483699bebf9cb07a3d69df0aa2bee71db1b26]
> pci_device_remove+0x37/0xa0
> device_release_driver_internal+0x19f/0x200
> unbind_store+0xa1/0xb0
>
> Remove the IRQ with devm_free_irq() before pci_disable_msi().
> Also free it in the fail_msi: path of pci_probe(), as that path would
> lead to an identical leak.
>
> Fixes: 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
>
> Signed-off-by: Edmund Raile <edmund.raile@proton.me>
Applied to the for-linus branch. I'll send it for v6.8-final.
I think the pairs 'pci_alloc_irq_vectors()'/'request_irq()' and
'free_irq()'/'pci_free_irq_vectors()' would be fine here, but replacing
legacy APIs is not welcome in the last week of a kernel development
cycle, so I will postpone that work.
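The pairing suggested above can be sketched as a userspace model. The mock_* and sketch_* functions are hypothetical stand-ins that only track state; they do not share the signatures of the real pci_alloc_irq_vectors(), request_irq(), free_irq() and pci_free_irq_vectors(). The sketch shows the usual pattern of acquiring resources in order, unwinding in reverse on a probe failure (as the fail_msi: label does), and releasing in reverse order in remove.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not the kernel's struct pci_dev. */
struct mock_pci_dev {
	int nvecs;          /* vectors "allocated" by the mock */
	bool handler;       /* handler "installed" by the mock */
	int bad_teardowns;  /* vectors freed while a handler was installed */
	bool fail_request;  /* test knob: make the IRQ request fail */
};

/* Stands in for pci_alloc_irq_vectors(): returns the number of vectors. */
static int mock_alloc_irq_vectors(struct mock_pci_dev *d)
{
	d->nvecs = 1;
	return d->nvecs;
}

/* Stands in for request_irq() on the first vector. */
static int mock_request_irq(struct mock_pci_dev *d)
{
	if (d->fail_request)
		return -EBUSY;
	d->handler = true;
	return 0;
}

/* Stands in for free_irq(). */
static void mock_free_irq(struct mock_pci_dev *d)
{
	d->handler = false;
}

/* Stands in for pci_free_irq_vectors(); counts a wrong teardown order. */
static void mock_free_irq_vectors(struct mock_pci_dev *d)
{
	if (d->handler)
		d->bad_teardowns++;
	d->nvecs = 0;
}

/* Acquire in order; on failure unwind in reverse, like fail_msi:. */
static int sketch_probe(struct mock_pci_dev *d)
{
	int err = mock_alloc_irq_vectors(d);

	if (err < 0)
		return err;

	err = mock_request_irq(d);
	if (err < 0)
		goto fail_vectors;

	return 0;

fail_vectors:
	mock_free_irq_vectors(d);
	return err;
}

/* Release in the reverse order of probe: IRQ first, then vectors. */
static void sketch_remove(struct mock_pci_dev *d)
{
	mock_free_irq(d);
	mock_free_irq_vectors(d);
}
```

Swapping the two calls in sketch_remove() would make bad_teardowns non-zero, which is the model's analogue of the leak in this thread.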
Thanks
Takashi Sakamoto
Hi,
Thanks for taking on the issue and sending the patch. But I have a
concern about the change, since the allocated IRQ should be released as
part of the managed device resources[1]:
(include/linux/interrupt.h)
devm_request_irq()
(kernel/irq/devres.c)
->devm_request_threaded_irq()
->devres_alloc(devm_irq_release)
(kernel/irq/devres.c)
devm_irq_release()
->free_irq()
In my opinion, the devres mechanism releases the allocated memory when
releasing the data of the associated device structure. In our case, it is
the data of the pci_dev structure (precisely, the data of the device
structure embedded in it). In the call trace in your commit message, the
release should be done in:
(drivers/base/dd.c)
device_release_driver_internal()
->__device_release_driver()
->device_unbind_cleanup()
(drivers/base/devres.c)
->devres_release_all(dev);
->release_nodes()
(kernel/irq/devres.c)
->free_irq()
However, you actually encountered the leak. I think there is another cause
for it, but I have not figured it out yet. Anyway, I'll investigate the
issue further.
[1] https://docs.kernel.org/driver-api/driver-model/devres.html
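The devres release path traced above can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: model_devres_add() and model_devres_release_all() only mimic the idea that devres keeps a per-device list of release callbacks and runs them in reverse order of registration once devres_release_all() is reached at unbind. In this model, the callback registered for the IRQ would play the role of devm_irq_release() calling free_irq().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one device's devres list (not the kernel API). */
#define MODEL_MAX_RES 8

typedef void (*model_release_fn)(void *data);

struct model_devres {
	model_release_fn release[MODEL_MAX_RES];
	void *data[MODEL_MAX_RES];
	size_t count;
};

/* Register a release callback, roughly what devres_alloc()/devres_add() do. */
static int model_devres_add(struct model_devres *dr, model_release_fn fn,
			    void *data)
{
	if (dr->count == MODEL_MAX_RES)
		return -1;
	dr->release[dr->count] = fn;
	dr->data[dr->count] = data;
	dr->count++;
	return 0;
}

/* Run every callback, newest first, like devres_release_all() at unbind. */
static void model_devres_release_all(struct model_devres *dr)
{
	while (dr->count > 0) {
		dr->count--;
		dr->release[dr->count](dr->data[dr->count]);
	}
}

/* Example callback that records which resource was released. */
static char release_log[MODEL_MAX_RES + 1];
static size_t release_log_len;

static void record_release(void *data)
{
	if (release_log_len < MODEL_MAX_RES) {
		release_log[release_log_len++] = *(const char *)data;
		release_log[release_log_len] = '\0';
	}
}
```

Registering resources 'A', 'B' and 'C' in that order and then calling model_devres_release_all() releases them as C, B, A. The key point for this thread is when that call happens, not the order inside it.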
On Thu, Feb 29, 2024 at 02:47:59PM +0000, Edmund Raile wrote:
>
> Commit 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
> also removed the call to free_irq() in pci_remove(), so the IRQ requested
> with devm_request_irq() is still in place when pci_disable_msi() is called
> in pci_remove() while unbinding the driver from the device:
>
> remove_proc_entry: removing non-empty directory 'irq/136', leaking at
> least 'firewire_ohci'
> Call Trace:
> ? remove_proc_entry+0x19c/0x1c0
> ? __warn+0x81/0x130
> ? remove_proc_entry+0x19c/0x1c0
> ? report_bug+0x171/0x1a0
> ? console_unlock+0x78/0x120
> ? handle_bug+0x3c/0x80
> ? exc_invalid_op+0x17/0x70
> ? asm_exc_invalid_op+0x1a/0x20
> ? remove_proc_entry+0x19c/0x1c0
> unregister_irq_proc+0xf4/0x120
> free_desc+0x3d/0xe0
> ? kfree+0x29f/0x2f0
> irq_free_descs+0x47/0x70
> msi_domain_free_locked.part.0+0x19d/0x1d0
> msi_domain_free_irqs_all_locked+0x81/0xc0
> pci_free_msi_irqs+0x12/0x40
> pci_disable_msi+0x4c/0x60
> pci_remove+0x9d/0xc0 [firewire_ohci
> 01b483699bebf9cb07a3d69df0aa2bee71db1b26]
> pci_device_remove+0x37/0xa0
> device_release_driver_internal+0x19f/0x200
> unbind_store+0xa1/0xb0
>
> Remove the IRQ with devm_free_irq() before pci_disable_msi().
> Also free it in the fail_msi: path of pci_probe(), as that path would
> lead to an identical leak.
>
> Fixes: 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
>
> Signed-off-by: Edmund Raile <edmund.raile@proton.me>
>
> ---
>
> Using the FW643 with vfio-pci required unbinding it from firewire_ohci;
> doing so currently produces a memory leak due to a leftover IRQ,
> which this patch removes.
>
> The IRQ can be observed while the driver is loaded and bound:
> find /proc/irq -type d -name "firewire_ohci"
>
> Would it be a good idea to submit a patch adding the following comment
> to devm_request_irq() in include/linux/interrupt.h
> /*
> * counterpart: devm_free_irq()
> */
> so that language servers (LSPs) show that hint?
>
> v2 change: corrected patch title
>
> drivers/firewire/ohci.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
> index 9db9290c3269..7bc71f4be64a 100644
> --- a/drivers/firewire/ohci.c
> +++ b/drivers/firewire/ohci.c
> @@ -3773,6 +3773,7 @@ static int pci_probe(struct pci_dev *dev,
> return 0;
>
> fail_msi:
> + devm_free_irq(&dev->dev, dev->irq, ohci);
> pci_disable_msi(dev);
>
> return err;
> @@ -3800,6 +3801,7 @@ static void pci_remove(struct pci_dev *dev)
>
> software_reset(ohci);
>
> + devm_free_irq(&dev->dev, dev->irq, ohci);
> pci_disable_msi(dev);
>
> dev_notice(&dev->dev, "removing fw-ohci device\n");
> --
> 2.43.0
Thanks
Takashi Sakamoto
> In my opinion, the devres mechanism releases the allocated memory when
> releasing the data of the associated device structure.
> device_release_driver_internal()
> ->__device_release_driver()
> ->device_unbind_cleanup()
> (drivers/base/devres.c)
> ->devres_release_all(dev);
> ->release_nodes()
> (kernel/irq/devres.c)
> ->free_irq()

Looking at __device_release_driver() in drivers/base/dd.c,
device_remove() gets called, leading to dev->bus->remove(dev),
which likely calls our good old friend from the call trace:
pci_device_remove().

> > Call Trace:
> > ? remove_proc_entry+0x19c/0x1c0
> > ? __warn+0x81/0x130
> > ? remove_proc_entry+0x19c/0x1c0
> > ? report_bug+0x171/0x1a0
> > ? console_unlock+0x78/0x120
> > ? handle_bug+0x3c/0x80
> > ? exc_invalid_op+0x17/0x70
> > ? asm_exc_invalid_op+0x1a/0x20
> > ? remove_proc_entry+0x19c/0x1c0
> > unregister_irq_proc+0xf4/0x120
> > free_desc+0x3d/0xe0
> > ? kfree+0x29f/0x2f0
> > irq_free_descs+0x47/0x70
> > msi_domain_free_locked.part.0+0x19d/0x1d0
> > msi_domain_free_irqs_all_locked+0x81/0xc0
> > pci_free_msi_irqs+0x12/0x40
> > pci_disable_msi+0x4c/0x60
> > pci_remove+0x9d/0xc0 [firewire_ohci
> > 01b483699bebf9cb07a3d69df0aa2bee71db1b26]
> > pci_device_remove+0x37/0xa0
> > device_release_driver_internal+0x19f/0x200
> > unbind_store+0xa1/0xb0

Then in ohci.c's pci_remove(), we kill the MSIs, which leads to
the removal of the IRQ, etc.
Back in __device_release_driver(), after device_remove(),
device_unbind_cleanup() is called, leading to free_irq(), but too late.

I think the order of these calls may be our issue, but I doubt it
was done like this without good reason.
That code is 8 years old; someone would have noticed if it were wrong.

I could be entirely wrong, but the function description of
devm_free_irq() in kernel/irq/devres.c tells me that it is meant to be used:

> Except for the extra @dev argument, this function takes the
> same arguments and performs the same function as free_irq().
> This function instead of free_irq() should be used to manually
> free IRQs allocated with devm_request_irq().

And while devm_request_irq() has no function description of its own, its
sister devm_request_threaded_irq() mentions this:

> IRQs requested with this function will be
> automatically freed on driver detach.
>
> If an IRQ allocated with this function needs to be freed
> separately, devm_free_irq() must be used.

Should we pull in the maintainers of dd.c for their opinion?

Thank you very much for all the very hard work you do, Sakamoto-Sensei!
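The ordering problem described above can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code: the model_* functions only track whether a handler is still registered when the MSI vector is torn down, standing in for free_irq(), devm_free_irq() and pci_disable_msi(). model_release_driver() mirrors the order described in the mail, where the bus remove callback (reached via device_remove()) runs before device_unbind_cleanup() releases the devres resources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of one device; none of this is kernel API. */
struct model_dev {
	bool handler_registered; /* set by the request, cleared by free_irq() */
	bool devres_owns_irq;    /* devres will call free_irq() at unbind */
	int leaks;               /* vector torn down with a handler attached */
};

static void model_devm_request_irq(struct model_dev *d)
{
	d->handler_registered = true;
	d->devres_owns_irq = true;
}

static void model_free_irq(struct model_dev *d)
{
	d->handler_registered = false;
}

/* Like devm_free_irq(): free now and drop the devres entry. */
static void model_devm_free_irq(struct model_dev *d)
{
	model_free_irq(d);
	d->devres_owns_irq = false;
}

/* Stand-in for pci_disable_msi() tearing down the vector. */
static void model_disable_msi(struct model_dev *d)
{
	if (d->handler_registered)
		d->leaks++;
}

/* The driver's .remove() before the patch and with the patch applied. */
static void remove_before_fix(struct model_dev *d)
{
	model_disable_msi(d);
}

static void remove_with_fix(struct model_dev *d)
{
	model_devm_free_irq(d);
	model_disable_msi(d);
}

/* Driver core order: remove callback first, devres cleanup afterwards. */
static void model_release_driver(struct model_dev *d,
				 void (*remove)(struct model_dev *))
{
	remove(d);
	if (d->devres_owns_irq) {
		/* device_unbind_cleanup() -> free_irq(): too late here */
		model_free_irq(d);
		d->devres_owns_irq = false;
	}
}
```

In the model, remove_before_fix() records one leak because the vector is torn down while devres still holds the handler; remove_with_fix() records none, matching the ordering that the patch restores.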
Hi,

(Cc'ed to the PCI SUBSYSTEM list.)

On Sat, Mar 02, 2024 at 04:52:06PM +0000, Edmund Raile wrote:
> > In my opinion, the devres mechanism releases the allocated memory when
> > releasing the data of the associated device structure.
> > device_release_driver_internal()
> > ->__device_release_driver()
> > ->device_unbind_cleanup()
> > (drivers/base/devres.c)
> > ->devres_release_all(dev);
> > ->release_nodes()
> > (kernel/irq/devres.c)
> > ->free_irq()
>
> Looking at __device_release_driver() in drivers/base/dd.c,
> device_remove() gets called, leading to dev->bus->remove(dev),
> which likely calls our good old friend from the call trace:
> pci_device_remove().
>
> > > Call Trace:
> > > ? remove_proc_entry+0x19c/0x1c0
> > > ? __warn+0x81/0x130
> > > ? remove_proc_entry+0x19c/0x1c0
> > > ? report_bug+0x171/0x1a0
> > > ? console_unlock+0x78/0x120
> > > ? handle_bug+0x3c/0x80
> > > ? exc_invalid_op+0x17/0x70
> > > ? asm_exc_invalid_op+0x1a/0x20
> > > ? remove_proc_entry+0x19c/0x1c0
> > > unregister_irq_proc+0xf4/0x120
> > > free_desc+0x3d/0xe0
> > > ? kfree+0x29f/0x2f0
> > > irq_free_descs+0x47/0x70
> > > msi_domain_free_locked.part.0+0x19d/0x1d0
> > > msi_domain_free_irqs_all_locked+0x81/0xc0
> > > pci_free_msi_irqs+0x12/0x40
> > > pci_disable_msi+0x4c/0x60
> > > pci_remove+0x9d/0xc0 [firewire_ohci
> > > 01b483699bebf9cb07a3d69df0aa2bee71db1b26]
> > > pci_device_remove+0x37/0xa0
> > > device_release_driver_internal+0x19f/0x200
> > > unbind_store+0xa1/0xb0
>
> Then in ohci.c's pci_remove(), we kill the MSIs, which leads to
> the removal of the IRQ, etc.
> Back in __device_release_driver(), after device_remove(),
> device_unbind_cleanup() is called, leading to free_irq(), but too late.
>
> I think the order of these calls may be our issue, but I doubt it
> was done like this without good reason.
> That code is 8 years old; someone would have noticed if it were wrong.

Now I see the point.

Before the conversion to managed device resources, the 1394 OHCI driver
called `free_irq()` and then `pci_disable_msi()` in the .remove() callback,
so the issue did not occur. At present, the order is reversed, as you found.

To be honest, I have little knowledge of the current implementation of
PCIe MSI handling and of the current best practice in the Linux PCI
subsystem. I just replaced the old implementation in the driver with the
relevant APIs, so I need to consult someone about the issue.

> I could be entirely wrong, but the function description of
> devm_free_irq() in kernel/irq/devres.c tells me that it is meant to be used:
>
> > Except for the extra @dev argument, this function takes the
> > same arguments and performs the same function as free_irq().
> > This function instead of free_irq() should be used to manually
> > free IRQs allocated with devm_request_irq().
>
> And while devm_request_irq() has no function description of its own, its
> sister devm_request_threaded_irq() mentions this:
>
> > IRQs requested with this function will be
> > automatically freed on driver detach.
> >
> > If an IRQ allocated with this function needs to be freed
> > separately, devm_free_irq() must be used.
>
> Should we pull in the maintainers of dd.c for their opinion?
>
> Thank you very much for all the very hard work you do, Sakamoto-Sensei!

Indeed. If the current PCIe MSI implementation requires calling
`free_irq()` (or something similar) before `pci_disable_msi()`, that
should be documented. But we can also see that `pci_disable_msi()` is a
legacy API in the PCIe MSI implementation[1]. I guess that some later
enhancement may have made this extra care about the order of the two
calls unnecessary nowadays.

[1] https://docs.kernel.org/PCI/msi-howto.html#legacy-apis

Thanks

Takashi Sakamoto