drivers/pci/msi/msi.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
While running Android15-6.6 and hacking on latest Qemu, I observed
the following with KFENCE:
[ 10.761992][ T81] pcieport 0000:00:03.0: Adding to iommu group 0
[ 10.782536][ T81] pcieport 0000:00:03.0: enabling device (0000 -> 0003)
[ 10.814259][ T81] ==================================================================
[ 10.816534][ T81] BUG: KFENCE: use-after-free read in __pci_enable_msi_range+0x2c0/0x488
[ 10.816534][ T81]
[ 10.820189][ T81] Use-after-free read at 0x0000000024629571 (in kfence-#12):
[ 10.822652][ T81] __pci_enable_msi_range+0x2c0/0x488
[ 10.824304][ T81] pci_alloc_irq_vectors_affinity+0xec/0x14c
[ 10.826129][ T81] pci_alloc_irq_vectors+0x18/0x28
[ 10.827606][ T81] pcie_portdrv_probe+0x1e0/0x608
[ 10.828979][ T81] pci_device_probe+0xa8/0x174
[ 10.830316][ T81] really_probe+0x150/0x2b8
[ 10.831717][ T81] __driver_probe_device+0x7c/0x130
[ 10.833368][ T81] driver_probe_device+0x40/0x118
[ 10.834899][ T81] __device_attach_driver+0xc4/0x108
[ 10.836464][ T81] bus_for_each_drv+0x8c/0xf0
[ 10.837915][ T81] __device_attach+0xa4/0x198
[ 10.839260][ T81] device_initial_probe+0x18/0x28
[ 10.840684][ T81] bus_probe_device+0xb0/0xb4
[ 10.842174][ T81] deferred_probe_work_func+0xac/0xf8
[ 10.843760][ T81] process_one_work+0x18c/0x414
[ 10.845129][ T81] worker_thread+0x41c/0x53c
[ 10.846611][ T81] kthread+0x114/0x118
[ 10.847841][ T81] ret_from_fork+0x10/0x20
[ 10.849298][ T81]
[ 10.850435][ T81] kfence-#12: 0x0000000008614900-0x00000000e06c228d, size=104, cache=kmalloc-128
[ 10.850435][ T81]
[ 10.853455][ T81] allocated by task 81 on cpu 7 at 10.808142s:
[ 10.856329][ T81] __kmem_cache_alloc_node+0x1f0/0x2bc
[ 10.857988][ T81] kmalloc_trace+0x44/0x138
[ 10.859299][ T81] msi_alloc_desc+0x3c/0x9c
[ 10.860582][ T81] msi_domain_insert_msi_desc+0x30/0x78
[ 10.862058][ T81] msi_setup_msi_desc+0x13c/0x184
[ 10.862942][ T81] __pci_enable_msi_range+0x258/0x488
[ 10.863847][ T81] pci_alloc_irq_vectors_affinity+0xec/0x14c
[ 10.864821][ T81] pci_alloc_irq_vectors+0x18/0x28
[ 10.866011][ T81] pcie_portdrv_probe+0x1e0/0x608
[ 10.867047][ T81] pci_device_probe+0xa8/0x174
[ 10.867937][ T81] really_probe+0x150/0x2b8
[ 10.868774][ T81] __driver_probe_device+0x7c/0x130
[ 10.869755][ T81] driver_probe_device+0x40/0x118
[ 10.870665][ T81] __device_attach_driver+0xc4/0x108
[ 10.871654][ T81] bus_for_each_drv+0x8c/0xf0
[ 10.872585][ T81] __device_attach+0xa4/0x198
[ 10.873581][ T81] device_initial_probe+0x18/0x28
[ 10.874645][ T81] bus_probe_device+0xb0/0xb4
[ 10.875610][ T81] deferred_probe_work_func+0xac/0xf8
[ 10.876696][ T81] process_one_work+0x18c/0x414
[ 10.877573][ T81] worker_thread+0x41c/0x53c
[ 10.878468][ T81] kthread+0x114/0x118
[ 10.879331][ T81] ret_from_fork+0x10/0x20
[ 10.880238][ T81]
[ 10.880767][ T81] freed by task 81 on cpu 7 at 10.811436s:
[ 10.882311][ T81] msi_domain_free_descs+0xd4/0x10c
[ 10.883318][ T81] msi_domain_free_locked.part.0+0xc0/0x1d8
[ 10.884401][ T81] msi_domain_alloc_irqs_all_locked+0xb4/0xbc
[ 10.885519][ T81] pci_msi_setup_msi_irqs+0x30/0x4c
[ 10.886549][ T81] __pci_enable_msi_range+0x2a8/0x488
[ 10.887537][ T81] pci_alloc_irq_vectors_affinity+0xec/0x14c
[ 10.888576][ T81] pci_alloc_irq_vectors+0x18/0x28
[ 10.889693][ T81] pcie_portdrv_probe+0x1e0/0x608
[ 10.890719][ T81] pci_device_probe+0xa8/0x174
[ 10.891623][ T81] really_probe+0x150/0x2b8
[ 10.892534][ T81] __driver_probe_device+0x7c/0x130
[ 10.893541][ T81] driver_probe_device+0x40/0x118
[ 10.894491][ T81] __device_attach_driver+0xc4/0x108
[ 10.895433][ T81] bus_for_each_drv+0x8c/0xf0
[ 10.896381][ T81] __device_attach+0xa4/0x198
[ 10.897359][ T81] device_initial_probe+0x18/0x28
[ 10.898401][ T81] bus_probe_device+0xb0/0xb4
[ 10.899268][ T81] deferred_probe_work_func+0xac/0xf8
[ 10.900314][ T81] process_one_work+0x18c/0x414
[ 10.901279][ T81] worker_thread+0x41c/0x53c
[ 10.902138][ T81] kthread+0x114/0x118
[ 10.902916][ T81] ret_from_fork+0x10/0x20
[ 10.903816][ T81]
[ 10.904429][ T81] CPU: 7 PID: 81 Comm: kworker/u16:2 Not tainted 6.6.23mostafa+ #224
[ 10.906362][ T81] Hardware name: linux,dummy-virt (DT)
[ 10.907934][ T81] Workqueue: events_unbound deferred_probe_work_func
[ 10.909959][ T81] ==================================================================
Looking at the upstream code, it seems to have the same issue where:
Descriptor allocation done in:
__pci_enable_msi_range
msi_capability_init
msi_setup_msi_desc
msi_insert_msi_desc
msi_domain_insert_msi_desc
msi_alloc_desc
...
Freed in case of failure in __msi_domain_alloc_locked()
__pci_enable_msi_range
msi_capability_init
pci_msi_setup_msi_irqs
msi_domain_alloc_irqs_all_locked
msi_domain_alloc_locked
__msi_domain_alloc_locked => fails
msi_domain_free_locked
...
That failure would propagate back till pci_msi_setup_msi_irqs() call
in msi_capability_init() which have "goto err" which would access the
descriptor to clear the mask.
However, we can’t make assumptions if the descriptor is freed or not
as it depends on the failure location and on MSI_FLAG_FREE_MSI_DESCS,
so we copy it and use the copy to mask it in case of error.
Also, calling pci_free_msi_irqs() after will trigger
msi_domain_free_locked() again, however it re-iterates through the xa
array, which should be safe.
I hit this only once, but I can confirm the bug(and verify the fix)
for the upstream kernel through stubbing return of
__msi_domain_alloc_locked() to fail and with KASAN enabled.
bf6e054e0e3f ("genirq/msi: Provide msi_device_populate/destroy_sysfs()")
is the first commit that introduced the free logic from the context
of pci_msi_setup_msi_irqs()
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
v2:
- Copy descriptor as suggested by tglx to unmaks freed msis
---
drivers/pci/msi/msi.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index c5625dd9bf49..3a45879d85db 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -352,7 +352,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
struct irq_affinity *affd)
{
struct irq_affinity_desc *masks = NULL;
- struct msi_desc *entry;
+ struct msi_desc *entry, desc;
int ret;
/* Reject multi-MSI early on irq domain enabled architectures */
@@ -377,6 +377,12 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
/* All MSIs are unmasked by default; mask them all */
entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
pci_msi_mask(entry, msi_multi_mask(entry));
+ /*
+ * Copy the MSI descriptor for the error path because
+ * pci_msi_setup_msi_irqs() will free it for the hierarchical
+ * interrupt domain case.
+ */
+ memcpy(&desc, entry, sizeof(desc));
/* Configure MSI capability structure */
ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
@@ -396,7 +402,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
goto unlock;
err:
- pci_msi_unmask(entry, msi_multi_mask(entry));
+ pci_msi_unmask(&desc, msi_multi_mask(&desc));
pci_free_msi_irqs(dev);
fail:
dev->msi_enabled = 0;
--
2.45.2.741.gdbec12cfda-goog
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 9eee5330656bf92f51cb1f09b2dc9f8cf975b3d1
Gitweb: https://git.kernel.org/tip/9eee5330656bf92f51cb1f09b2dc9f8cf975b3d1
Author: Mostafa Saleh <smostafa@google.com>
AuthorDate: Mon, 24 Jun 2024 20:37:28
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 24 Jun 2024 23:33:38 +02:00
PCI/MSI: Fix UAF in msi_capability_init
KFENCE reports the following UAF:
BUG: KFENCE: use-after-free read in __pci_enable_msi_range+0x2c0/0x488
Use-after-free read at 0x0000000024629571 (in kfence-#12):
__pci_enable_msi_range+0x2c0/0x488
pci_alloc_irq_vectors_affinity+0xec/0x14c
pci_alloc_irq_vectors+0x18/0x28
kfence-#12: 0x0000000008614900-0x00000000e06c228d, size=104, cache=kmalloc-128
allocated by task 81 on cpu 7 at 10.808142s:
__kmem_cache_alloc_node+0x1f0/0x2bc
kmalloc_trace+0x44/0x138
msi_alloc_desc+0x3c/0x9c
msi_domain_insert_msi_desc+0x30/0x78
msi_setup_msi_desc+0x13c/0x184
__pci_enable_msi_range+0x258/0x488
pci_alloc_irq_vectors_affinity+0xec/0x14c
pci_alloc_irq_vectors+0x18/0x28
freed by task 81 on cpu 7 at 10.811436s:
msi_domain_free_descs+0xd4/0x10c
msi_domain_free_locked.part.0+0xc0/0x1d8
msi_domain_alloc_irqs_all_locked+0xb4/0xbc
pci_msi_setup_msi_irqs+0x30/0x4c
__pci_enable_msi_range+0x2a8/0x488
pci_alloc_irq_vectors_affinity+0xec/0x14c
pci_alloc_irq_vectors+0x18/0x28
Descriptor allocation done in:
__pci_enable_msi_range
msi_capability_init
msi_setup_msi_desc
msi_insert_msi_desc
msi_domain_insert_msi_desc
msi_alloc_desc
...
Freed in case of failure in __msi_domain_alloc_locked()
__pci_enable_msi_range
msi_capability_init
pci_msi_setup_msi_irqs
msi_domain_alloc_irqs_all_locked
msi_domain_alloc_locked
__msi_domain_alloc_locked => fails
msi_domain_free_locked
...
That failure propagates back to pci_msi_setup_msi_irqs() in
msi_capability_init() which accesses the descriptor for unmasking in the
error exit path.
Cure it by copying the descriptor and using the copy for the error exit path
unmask operation.
[ tglx: Massaged change log ]
Fixes: bf6e054e0e3f ("genirq/msi: Provide msi_device_populate/destroy_sysfs()")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Heelgas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240624203729.1094506-1-smostafa@google.com
---
drivers/pci/msi/msi.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index c5625dd..3a45879 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -352,7 +352,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
struct irq_affinity *affd)
{
struct irq_affinity_desc *masks = NULL;
- struct msi_desc *entry;
+ struct msi_desc *entry, desc;
int ret;
/* Reject multi-MSI early on irq domain enabled architectures */
@@ -377,6 +377,12 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
/* All MSIs are unmasked by default; mask them all */
entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
pci_msi_mask(entry, msi_multi_mask(entry));
+ /*
+ * Copy the MSI descriptor for the error path because
+ * pci_msi_setup_msi_irqs() will free it for the hierarchical
+ * interrupt domain case.
+ */
+ memcpy(&desc, entry, sizeof(desc));
/* Configure MSI capability structure */
ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
@@ -396,7 +402,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
goto unlock;
err:
- pci_msi_unmask(entry, msi_multi_mask(entry));
+ pci_msi_unmask(&desc, msi_multi_mask(&desc));
pci_free_msi_irqs(dev);
fail:
dev->msi_enabled = 0;
© 2016 - 2025 Red Hat, Inc.