[RFC PATCH] vfio-ccw: Permit missing IRQs

Eric Farman posted 1 patch 3 years ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210419184906.2847283-1-farman@linux.ibm.com
Maintainers: Matthew Rosato <mjrosato@linux.ibm.com>, Alex Williamson <alex.williamson@redhat.com>, Eric Farman <farman@linux.ibm.com>, Cornelia Huck <cohuck@redhat.com>, Thomas Huth <thuth@redhat.com>
There is a newer version of this series
hw/vfio/ccw.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
[RFC PATCH] vfio-ccw: Permit missing IRQs
Posted by Eric Farman 3 years ago
Commit 690e29b91102 ("vfio-ccw: Refactor ccw irq handler") changed
one of the checks for the IRQ notifier registration from saying
"the host needs to recognize the only IRQ that exists" to saying
"the host needs to recognize ANY IRQ that exists."

And this worked fine, because the subsequent change to support the
CRW IRQ notifier doesn't get into this code when running on an older
kernel, thanks to a guard by a capability region. The later addition
of the REQ(uest) IRQ by commit b2f96f9e4f5f ("vfio-ccw: Connect the
device request notifier") broke this assumption because there is no
matching capability region. Thus, running new QEMU on an older
kernel fails with:

  vfio: unexpected number of irqs 2

Let's simply remove the check (and the less-than-helpful message),
and make the VFIO_DEVICE_GET_IRQ_INFO ioctl request for the IRQ
being processed. If it returns with EINVAL, we can treat it as
an unfortunate mismatch but not a fatal error for the guest.

Fixes: 690e29b91102 ("vfio-ccw: Refactor ccw irq handler")
Fixes: b2f96f9e4f5f ("vfio-ccw: Connect the device request notifier")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
---
 hw/vfio/ccw.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index b2df708e4b..cfbfc3d1a2 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -411,20 +411,19 @@ static void vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
         return;
     }
 
-    if (vdev->num_irqs < irq + 1) {
-        error_setg(errp, "vfio: unexpected number of irqs %u",
-                   vdev->num_irqs);
-        return;
-    }
-
     argsz = sizeof(*irq_info);
     irq_info = g_malloc0(argsz);
     irq_info->index = irq;
     irq_info->argsz = argsz;
     if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
               irq_info) < 0 || irq_info->count < 1) {
-        error_setg_errno(errp, errno, "vfio: Error getting irq info");
-        goto out_free_info;
+        if (errno == EINVAL) {
+            warn_report("Unable to get information about IRQ %u", irq);
+            goto out_free_info;
+        } else {
+            error_setg_errno(errp, errno, "vfio: Error getting irq info");
+            goto out_free_info;
+        }
     }
 
     if (event_notifier_init(notifier, 0)) {
-- 
2.25.1


Re: [RFC PATCH] vfio-ccw: Permit missing IRQs
Posted by Cornelia Huck 3 years ago
On Mon, 19 Apr 2021 20:49:06 +0200
Eric Farman <farman@linux.ibm.com> wrote:

> Commit 690e29b91102 ("vfio-ccw: Refactor ccw irq handler") changed
> one of the checks for the IRQ notifier registration from saying
> "the host needs to recognize the only IRQ that exists" to saying
> "the host needs to recognize ANY IRQ that exists."
> 
> And this worked fine, because the subsequent change to support the
> CRW IRQ notifier doesn't get into this code when running on an older
> kernel, thanks to a guard by a capability region. The later addition
> of the REQ(uest) IRQ by commit b2f96f9e4f5f ("vfio-ccw: Connect the
> device request notifier") broke this assumption because there is no
> matching capability region. Thus, running new QEMU on an older
> kernel fails with:
> 
>   vfio: unexpected number of irqs 2
> 
> Let's simply remove the check (and the less-than-helpful message),
> and make the VFIO_DEVICE_GET_IRQ_INFO ioctl request for the IRQ
> being processed. If it returns with EINVAL, we can treat it as
> an unfortunate mismatch but not a fatal error for the guest.
> 
> Fixes: 690e29b91102 ("vfio-ccw: Refactor ccw irq handler")
> Fixes: b2f96f9e4f5f ("vfio-ccw: Connect the device request notifier")
> Signed-off-by: Eric Farman <farman@linux.ibm.com>
> ---
>  hw/vfio/ccw.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index b2df708e4b..cfbfc3d1a2 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -411,20 +411,19 @@ static void vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
>          return;
>      }
>  
> -    if (vdev->num_irqs < irq + 1) {
> -        error_setg(errp, "vfio: unexpected number of irqs %u",
> -                   vdev->num_irqs);

Alternative proposal: Change this message to

"vfio: IRQ %u not available (number of irqs %u)"

and still fail this function, while treating a failure of
vfio_ccw_register_irq_notifier(vcdev, VFIO_CCW_REQ_IRQ_INDEX, &err); in
vfio_ccw_realize() as a non-fatal error (maybe log a message).

This allows us to skip doing an ioctl call, of which we already know
that it would fail. Still, we can catch cases where a broken kernel e.g.
provides the crw region, but not the matching irq (I believe something
like that should indeed be a fatal error.)

> -        return;
> -    }
> -
>      argsz = sizeof(*irq_info);
>      irq_info = g_malloc0(argsz);
>      irq_info->index = irq;
>      irq_info->argsz = argsz;
>      if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
>                irq_info) < 0 || irq_info->count < 1) {
> -        error_setg_errno(errp, errno, "vfio: Error getting irq info");
> -        goto out_free_info;
> +        if (errno == EINVAL) {
> +            warn_report("Unable to get information about IRQ %u", irq);
> +            goto out_free_info;
> +        } else {
> +            error_setg_errno(errp, errno, "vfio: Error getting irq info");
> +            goto out_free_info;
> +        }
>      }
>  
>      if (event_notifier_init(notifier, 0)) {


Re: [RFC PATCH] vfio-ccw: Permit missing IRQs
Posted by Eric Farman 3 years ago
On Wed, 2021-04-21 at 12:01 +0200, Cornelia Huck wrote:
> On Mon, 19 Apr 2021 20:49:06 +0200
> Eric Farman <farman@linux.ibm.com> wrote:
> 
> > Commit 690e29b91102 ("vfio-ccw: Refactor ccw irq handler") changed
> > one of the checks for the IRQ notifier registration from saying
> > "the host needs to recognize the only IRQ that exists" to saying
> > "the host needs to recognize ANY IRQ that exists."
> > 
> > And this worked fine, because the subsequent change to support the
> > CRW IRQ notifier doesn't get into this code when running on an
> > older
> > kernel, thanks to a guard by a capability region. The later
> > addition
> > of the REQ(uest) IRQ by commit b2f96f9e4f5f ("vfio-ccw: Connect the
> > device request notifier") broke this assumption because there is no
> > matching capability region. Thus, running new QEMU on an older
> > kernel fails with:
> > 
> >   vfio: unexpected number of irqs 2
> > 
> > Let's simply remove the check (and the less-than-helpful message),
> > and make the VFIO_DEVICE_GET_IRQ_INFO ioctl request for the IRQ
> > being processed. If it returns with EINVAL, we can treat it as
> > an unfortunate mismatch but not a fatal error for the guest.
> > 
> > Fixes: 690e29b91102 ("vfio-ccw: Refactor ccw irq handler")
> > Fixes: b2f96f9e4f5f ("vfio-ccw: Connect the device request
> > notifier")
> > Signed-off-by: Eric Farman <farman@linux.ibm.com>
> > ---
> >  hw/vfio/ccw.c | 15 +++++++--------
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> > index b2df708e4b..cfbfc3d1a2 100644
> > --- a/hw/vfio/ccw.c
> > +++ b/hw/vfio/ccw.c
> > @@ -411,20 +411,19 @@ static void
> > vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
> >          return;
> >      }
> >  
> > -    if (vdev->num_irqs < irq + 1) {
> > -        error_setg(errp, "vfio: unexpected number of irqs %u",
> > -                   vdev->num_irqs);
> 
> Alternative proposal: Change this message to
> 
> "vfio: IRQ %u not available (number of irqs %u)"

> and still fail this function, while treating a failure of
> vfio_ccw_register_irq_notifier(vcdev, VFIO_CCW_REQ_IRQ_INDEX, &err);
> in
> vfio_ccw_realize() as a non-fatal error (maybe log a message).

This all sounds fine to me. I'll send a v2 as such.

> 
> This allows us to skip doing an ioctl call, of which we already know
> that it would fail. 

True, though as this is at the configuration time it's not as critical.

> Still, we can catch cases where a broken kernel e.g.
> provides the crw region, but not the matching irq (I believe
> something
> like that should indeed be a fatal error.)

Well they shouldn't do THAT. :)

> 
> > -        return;
> > -    }
> > -
> >      argsz = sizeof(*irq_info);
> >      irq_info = g_malloc0(argsz);
> >      irq_info->index = irq;
> >      irq_info->argsz = argsz;
> >      if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
> >                irq_info) < 0 || irq_info->count < 1) {
> > -        error_setg_errno(errp, errno, "vfio: Error getting irq
> > info");
> > -        goto out_free_info;
> > +        if (errno == EINVAL) {
> > +            warn_report("Unable to get information about IRQ %u",
> > irq);
> > +            goto out_free_info;
> > +        } else {
> > +            error_setg_errno(errp, errno, "vfio: Error getting irq
> > info");
> > +            goto out_free_info;
> > +        }
> >      }
> >  
> >      if (event_notifier_init(notifier, 0)) {