VFIO allows error recovery and notification for devices that
are PCIe (and thus AER) capable. But for PCI devices on IBM
s390 error recovery involves platform firmware and
notification to operating system is done by architecture
specific way. The Internal Shared Memory(ISM) device is a legacy
PCI device (so not PCIe capable), but can still be recovered
when notified of an error.
Relax the PCIe only requirement for ISM devices, so passthrough
ISM devices can be notified and recovered on error.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/vfio/pci/vfio_pci_core.c | 18 ++++++++++++++++--
drivers/vfio/pci/vfio_pci_intrs.c | 2 +-
drivers/vfio/pci/vfio_pci_priv.h | 3 +++
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7220a22135a9..1faab80139c6 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -723,6 +723,20 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev)
}
EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable);
+bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev)
+{
+ struct pci_dev *pdev = vdev->pdev;
+
+ if (pci_is_pcie(pdev))
+ return true;
+
+ if (pdev->vendor == PCI_VENDOR_ID_IBM &&
+ pdev->device == PCI_DEVICE_ID_IBM_ISM)
+ return true;
+
+ return false;
+}
+
static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type)
{
if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
@@ -749,7 +763,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
}
} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
- if (pci_is_pcie(vdev->pdev))
+ if (vfio_pci_device_can_recover(vdev))
return 1;
} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
return 1;
@@ -1150,7 +1164,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
case VFIO_PCI_REQ_IRQ_INDEX:
break;
case VFIO_PCI_ERR_IRQ_INDEX:
- if (pci_is_pcie(vdev->pdev))
+ if (vfio_pci_device_can_recover(vdev))
break;
fallthrough;
default:
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 123298a4dc8f..f5384086ac45 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -838,7 +838,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
case VFIO_PCI_ERR_IRQ_INDEX:
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_TRIGGER:
- if (pci_is_pcie(vdev->pdev))
+ if (vfio_pci_device_can_recover(vdev))
func = vfio_pci_set_err_trigger;
break;
}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5288577b3170..93c1e29fbbbb 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -36,6 +36,9 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf,
ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
+bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev);
+
+
#ifdef CONFIG_VFIO_PCI_VGA
ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
--
2.43.0
On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote: > VFIO allows error recovery and notification for devices that > are PCIe (and thus AER) capable. But for PCI devices on IBM > s390 error recovery involves platform firmware and > notification to operating system is done by architecture > specific way. The Internal Shared Memory(ISM) device is a legacy > PCI device (so not PCIe capable), but can still be recovered > when notified of an error. "PCIe (and thus AER) capable" reads as though AER is required for all PCIe devices, but AER is optional. I don't know the details of VFIO and why it tests for PCIe instead of AER. Maybe AER is not relevant here and you don't need to mention AER above at all? > Relax the PCIe only requirement for ISM devices, so passthrough > ISM devices can be notified and recovered on error. Nit: it looks like all your commit logs could be rewrapped to fill about 75 columns (to leave space for "git log" to indent them and still fit in 80 columns). IMHO not much value in using a smaller width than that. > Signed-off-by: Farhan Ali <alifm@linux.ibm.com> > --- > drivers/vfio/pci/vfio_pci_core.c | 18 ++++++++++++++++-- > drivers/vfio/pci/vfio_pci_intrs.c | 2 +- > drivers/vfio/pci/vfio_pci_priv.h | 3 +++ > 3 files changed, 20 insertions(+), 3 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > index 7220a22135a9..1faab80139c6 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -723,6 +723,20 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev) > } > EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable); > > +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev) > +{ > + struct pci_dev *pdev = vdev->pdev; > + > + if (pci_is_pcie(pdev)) > + return true; > + > + if (pdev->vendor == PCI_VENDOR_ID_IBM && > + pdev->device == PCI_DEVICE_ID_IBM_ISM) > + return true; > + > + return false; > +} > + > static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type) > { > if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { > @@ -749,7 +763,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ > return (flags & PCI_MSIX_FLAGS_QSIZE) + 1; > } > } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) { > - if (pci_is_pcie(vdev->pdev)) > + if (vfio_pci_device_can_recover(vdev)) > return 1; > } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) { > return 1; > @@ -1150,7 +1164,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev, > case VFIO_PCI_REQ_IRQ_INDEX: > break; > case VFIO_PCI_ERR_IRQ_INDEX: > - if (pci_is_pcie(vdev->pdev)) > + if (vfio_pci_device_can_recover(vdev)) > break; > fallthrough; > default: > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > index 123298a4dc8f..f5384086ac45 100644 > --- a/drivers/vfio/pci/vfio_pci_intrs.c > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > @@ -838,7 +838,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags, > case VFIO_PCI_ERR_IRQ_INDEX: > switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { > case VFIO_IRQ_SET_ACTION_TRIGGER: > - if (pci_is_pcie(vdev->pdev)) > + if (vfio_pci_device_can_recover(vdev)) > func = vfio_pci_set_err_trigger; > break; > } > diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h > index 5288577b3170..93c1e29fbbbb 100644 > --- a/drivers/vfio/pci/vfio_pci_priv.h > +++ b/drivers/vfio/pci/vfio_pci_priv.h > @@ -36,6 +36,9 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf, > ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, > size_t count, loff_t *ppos, bool iswrite); > > +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev); > + > + > #ifdef CONFIG_VFIO_PCI_VGA > ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, > size_t count, loff_t *ppos, bool iswrite); > -- > 2.43.0 >
On 8/14/2025 1:48 PM, Bjorn Helgaas wrote: > On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote: >> VFIO allows error recovery and notification for devices that >> are PCIe (and thus AER) capable. But for PCI devices on IBM >> s390 error recovery involves platform firmware and >> notification to operating system is done by architecture >> specific way. The Internal Shared Memory(ISM) device is a legacy >> PCI device (so not PCIe capable), but can still be recovered >> when notified of an error. > "PCIe (and thus AER) capable" reads as though AER is required for all > PCIe devices, but AER is optional. > > I don't know the details of VFIO and why it tests for PCIe instead of > AER. Maybe AER is not relevant here and you don't need to mention > AER above at all? The original change that introduced this commit dad9f89 "VFIO-AER: Vfio-pci driver changes for supporting AER" was adding the support for AER for vfio. My assumption is the author thought if the device is AER capable the pcie check should be sufficient? I can remove the AER references in commit message. Thanks Farhan > >> Relax the PCIe only requirement for ISM devices, so passthrough >> ISM devices can be notified and recovered on error. > Nit: it looks like all your commit logs could be rewrapped to fill > about 75 columns (to leave space for "git log" to indent them and > still fit in 80 columns). IMHO not much value in using a smaller > width than that. > Sure, will fix this. >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> >> --- >> drivers/vfio/pci/vfio_pci_core.c | 18 ++++++++++++++++-- >> drivers/vfio/pci/vfio_pci_intrs.c | 2 +- >> drivers/vfio/pci/vfio_pci_priv.h | 3 +++ >> 3 files changed, 20 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c >> index 7220a22135a9..1faab80139c6 100644 >> --- a/drivers/vfio/pci/vfio_pci_core.c >> +++ b/drivers/vfio/pci/vfio_pci_core.c >> @@ -723,6 +723,20 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev) >> } >> EXPORT_SYMBOL_GPL(vfio_pci_core_finish_enable); >> >> +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev) >> +{ >> + struct pci_dev *pdev = vdev->pdev; >> + >> + if (pci_is_pcie(pdev)) >> + return true; >> + >> + if (pdev->vendor == PCI_VENDOR_ID_IBM && >> + pdev->device == PCI_DEVICE_ID_IBM_ISM) >> + return true; >> + >> + return false; >> +} >> + >> static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_type) >> { >> if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { >> @@ -749,7 +763,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ >> return (flags & PCI_MSIX_FLAGS_QSIZE) + 1; >> } >> } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) { >> - if (pci_is_pcie(vdev->pdev)) >> + if (vfio_pci_device_can_recover(vdev)) >> return 1; >> } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) { >> return 1; >> @@ -1150,7 +1164,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev, >> case VFIO_PCI_REQ_IRQ_INDEX: >> break; >> case VFIO_PCI_ERR_IRQ_INDEX: >> - if (pci_is_pcie(vdev->pdev)) >> + if (vfio_pci_device_can_recover(vdev)) >> break; >> fallthrough; >> default: >> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >> index 123298a4dc8f..f5384086ac45 100644 >> --- a/drivers/vfio/pci/vfio_pci_intrs.c >> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >> @@ -838,7 +838,7 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags, >> case VFIO_PCI_ERR_IRQ_INDEX: >> switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >> case VFIO_IRQ_SET_ACTION_TRIGGER: >> - if (pci_is_pcie(vdev->pdev)) >> + if (vfio_pci_device_can_recover(vdev)) >> func = vfio_pci_set_err_trigger; >> break; >> } >> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h >> index 5288577b3170..93c1e29fbbbb 100644 >> --- a/drivers/vfio/pci/vfio_pci_priv.h >> +++ b/drivers/vfio/pci/vfio_pci_priv.h >> @@ -36,6 +36,9 @@ ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, char __user *buf, >> ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, >> size_t count, loff_t *ppos, bool iswrite); >> >> +bool vfio_pci_device_can_recover(struct vfio_pci_core_device *vdev); >> + >> + >> #ifdef CONFIG_VFIO_PCI_VGA >> ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, >> size_t count, loff_t *ppos, bool iswrite); >> -- >> 2.43.0 >>
On Thu, 14 Aug 2025 14:02:05 -0700 Farhan Ali <alifm@linux.ibm.com> wrote: > On 8/14/2025 1:48 PM, Bjorn Helgaas wrote: > > On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote: > >> VFIO allows error recovery and notification for devices that > >> are PCIe (and thus AER) capable. But for PCI devices on IBM > >> s390 error recovery involves platform firmware and > >> notification to operating system is done by architecture > >> specific way. The Internal Shared Memory(ISM) device is a legacy > >> PCI device (so not PCIe capable), but can still be recovered > >> when notified of an error. > > "PCIe (and thus AER) capable" reads as though AER is required for all > > PCIe devices, but AER is optional. > > > > I don't know the details of VFIO and why it tests for PCIe instead of > > AER. Maybe AER is not relevant here and you don't need to mention > > AER above at all? > > The original change that introduced this commit dad9f89 "VFIO-AER: > Vfio-pci driver changes for supporting AER" was adding the support for > AER for vfio. My assumption is the author thought if the device is AER > capable the pcie check should be sufficient? I can remove the AER > references in commit message. Thanks Farhan I've looked back through discussions when this went in and can't find any specific reasoning about why we chose pci_is_pcie() here. Maybe we were trying to avoid setting up an error signal on devices that cannot have AER, but then why didn't we check specifically for AER. Maybe some version used PCIe specific calls in the handler that we didn't want to check runtime, but I don't spot such a dependency now. Possibly we should just remove the check. We're configuring the error signaling on the vast majority of devices, it's extremely rare that it fires anyway, reporting it on a device where it cannot trigger seems relatively negligible and avoids extra ugly code. Thanks, Alex
On 8/15/2025 1:48 PM, Alex Williamson wrote: > On Thu, 14 Aug 2025 14:02:05 -0700 > Farhan Ali <alifm@linux.ibm.com> wrote: > >> On 8/14/2025 1:48 PM, Bjorn Helgaas wrote: >>> On Wed, Aug 13, 2025 at 10:08:20AM -0700, Farhan Ali wrote: >>>> VFIO allows error recovery and notification for devices that >>>> are PCIe (and thus AER) capable. But for PCI devices on IBM >>>> s390 error recovery involves platform firmware and >>>> notification to operating system is done by architecture >>>> specific way. The Internal Shared Memory(ISM) device is a legacy >>>> PCI device (so not PCIe capable), but can still be recovered >>>> when notified of an error. >>> "PCIe (and thus AER) capable" reads as though AER is required for all >>> PCIe devices, but AER is optional. >>> >>> I don't know the details of VFIO and why it tests for PCIe instead of >>> AER. Maybe AER is not relevant here and you don't need to mention >>> AER above at all? >> The original change that introduced this commit dad9f89 "VFIO-AER: >> Vfio-pci driver changes for supporting AER" was adding the support for >> AER for vfio. My assumption is the author thought if the device is AER >> capable the pcie check should be sufficient? I can remove the AER >> references in commit message. Thanks Farhan > I've looked back through discussions when this went in and can't find > any specific reasoning about why we chose pci_is_pcie() here. Maybe > we were trying to avoid setting up an error signal on devices that > cannot have AER, but then why didn't we check specifically for AER. > Maybe some version used PCIe specific calls in the handler that we > didn't want to check runtime, but I don't spot such a dependency now. > > Possibly we should just remove the check. We're configuring the error > signaling on the vast majority of devices, it's extremely rare that it > fires anyway, reporting it on a device where it cannot trigger seems > relatively negligible and avoids extra ugly code. Thanks, > > Alex Okay will remove the check and re-word the commit message. Thanks Farhan
© 2016 - 2025 Red Hat, Inc.