[PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices

Thomas Huth posted 1 patch 1 day, 16 hours ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20251118174047.73103-1-thuth@redhat.com
Maintainers: Richard Henderson <richard.henderson@linaro.org>, David Hildenbrand <david@kernel.org>, Ilya Leoshkevich <iii@linux.ibm.com>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Eric Farman <farman@linux.ibm.com>, Matthew Rosato <mjrosato@linux.ibm.com>, Thomas Huth <thuth@redhat.com>
hw/s390x/s390-hypercall.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
[PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Thomas Huth 1 day, 16 hours ago
From: Thomas Huth <thuth@redhat.com>

Consider the following nested setup: An L1 host uses some virtio device
(e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
device through to the L3 guest. Since the L3 guest sees a virtio device,
it might send virtio notifications to the QEMU in L2 for that device.
But since the QEMU in L2 defined this device as vfio-ccw, the function
handle_virtio_ccw_notify() cannot handle this and crashes: It calls
virtio_ccw_get_vdev(), which casts sch->driver_data to a VirtioCcwDevice,
but since "sch" belongs to a vfio-ccw device, driver_data actually
points to a CcwDevice. So as soon as QEMU tries to use some
VirtioCcwDevice-specific data from that device, it crashes.

We must not take virtio notifications for such devices. Thus fix the
issue by adding a check to the handle_virtio_ccw_notify() handler to
refuse all devices that are not our own virtio devices. Like in the
other branches that detect wrong settings, we return -EINVAL from the
function, which will later be placed in GPR2 to inform the guest about
the error.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 v3: Print the subchannel number to ease debugging

 hw/s390x/s390-hypercall.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/hw/s390x/s390-hypercall.c b/hw/s390x/s390-hypercall.c
index ac1b08b2cd5..508dd97ca0d 100644
--- a/hw/s390x/s390-hypercall.c
+++ b/hw/s390x/s390-hypercall.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "cpu.h"
 #include "hw/s390x/s390-virtio-ccw.h"
 #include "hw/s390x/s390-hypercall.h"
@@ -42,6 +43,19 @@ static int handle_virtio_ccw_notify(uint64_t subch_id, uint64_t data)
     if (!sch || !css_subch_visible(sch)) {
         return -EINVAL;
     }
+    if (sch->id.cu_type != VIRTIO_CCW_CU_TYPE) {
+        /*
+         * This might happen in nested setups: If the L1 host defined the
+         * L2 guest with a virtio device (e.g. virtio-keyboard), and the
+         * L2 guest passes this device through to the L3 guest, the L3 guest
+         * might send virtio notifications to the QEMU in L2 for that device.
+         * But since the QEMU in L2 defined this device as vfio-ccw, it's not
+         * a VirtIODevice that we can handle here!
+         */
+        warn_report_once("Got virtio notification for unsupported device "
+                         "on subchannel %02x.%1x.%04x!", cssid, ssid, schid);
+        return -EINVAL;
+    }
 
     vdev = virtio_ccw_get_vdev(sch);
     if (vq_idx >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, vq_idx)) {
-- 
2.51.1
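
The type confusion described in the commit message, and the effect of the added cu_type check, can be sketched outside of QEMU as follows. This is a minimal illustration under stated assumptions, not the QEMU code: every Sketch*/sketch_* name and both struct layouts are hypothetical stand-ins, the subchannel is flattened (the patch reads sch->id.cu_type), and the control unit type value is assumed for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the patch only uses VIRTIO_CCW_CU_TYPE by name. */
#define SKETCH_VIRTIO_CU_TYPE 0x3832

/* Hypothetical stand-in for a generic CCW device (e.g. behind vfio-ccw). */
typedef struct SketchCcwDevice {
    int devno;
} SketchCcwDevice;

/* Hypothetical stand-in for a virtio-ccw device with extra virtio state. */
typedef struct SketchVirtioCcwDevice {
    SketchCcwDevice parent;
    unsigned num_queues;
} SketchVirtioCcwDevice;

/* Hypothetical subchannel: driver_data is untyped, so cu_type says what it is. */
typedef struct SketchSubchDev {
    uint16_t cu_type;
    void *driver_data;
} SketchSubchDev;

/* Same shape as the patched handler: refuse non-virtio subchannels before casting. */
int sketch_notify(const SketchSubchDev *sch, unsigned vq_idx)
{
    const SketchVirtioCcwDevice *dev;

    if (!sch) {
        return -EINVAL;
    }
    if (sch->cu_type != SKETCH_VIRTIO_CU_TYPE) {
        /* driver_data is not a SketchVirtioCcwDevice here, so do not cast it. */
        return -EINVAL;
    }
    dev = sch->driver_data;
    if (vq_idx >= dev->num_queues) {
        return -EINVAL;
    }
    return 0;
}

/* A passed-through subchannel: driver_data points to the smaller generic device. */
int sketch_demo_passthrough(void)
{
    SketchCcwDevice plain = { .devno = 2 };
    SketchSubchDev sch = { .cu_type = 0, .driver_data = &plain };

    return sketch_notify(&sch, 0);
}

/* A genuine virtio subchannel with two queues. */
int sketch_demo_virtio(unsigned vq_idx)
{
    SketchVirtioCcwDevice vdev = { .parent = { .devno = 1 }, .num_queues = 2 };
    SketchSubchDev sch = { .cu_type = SKETCH_VIRTIO_CU_TYPE, .driver_data = &vdev };

    return sketch_notify(&sch, vq_idx);
}
```

Without the cu_type comparison, sketch_demo_passthrough() would read num_queues from memory beyond the end of the smaller struct, which is the kind of invalid access the check is meant to rule out.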
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Cédric Le Goater 1 day ago
On 11/18/25 18:40, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
> 
> Consider the following nested setup: An L1 host uses some virtio device
> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
> device through to the L3 guest. Since the L3 guest sees a virtio device,
> it might send virtio notifications to the QEMU in L2 for that device.
> But since the QEMU in L2 defined this device as vfio-ccw, the function
> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
> but since "sch" belongs to a vfio-ccw device, that driver_data rather
> points to a CcwDevice instead. So as soon as QEMU tries to use some
> VirtioCcwDevice specific data from that device, we've lost.
> 
> We must not take virtio notifications for such devices. Thus fix the
> issue by adding a check to the handle_virtio_ccw_notify() handler to
> refuse all devices that are not our own virtio devices. Like in the
> other branches that detect wrong settings, we return -EINVAL from the
> function, which will later be placed in GPR2 to inform the guest about
> the error.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>   v3: Print the subchannel number to ease debugging
> 
>   hw/s390x/s390-hypercall.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/hw/s390x/s390-hypercall.c b/hw/s390x/s390-hypercall.c
> index ac1b08b2cd5..508dd97ca0d 100644
> --- a/hw/s390x/s390-hypercall.c
> +++ b/hw/s390x/s390-hypercall.c
> @@ -10,6 +10,7 @@
>    */
>   
>   #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>   #include "cpu.h"
>   #include "hw/s390x/s390-virtio-ccw.h"
>   #include "hw/s390x/s390-hypercall.h"
> @@ -42,6 +43,19 @@ static int handle_virtio_ccw_notify(uint64_t subch_id, uint64_t data)
>       if (!sch || !css_subch_visible(sch)) {
>           return -EINVAL;
>       }
> +    if (sch->id.cu_type != VIRTIO_CCW_CU_TYPE) {
> +        /*
> +         * This might happen in nested setups: If the L1 host defined the
> +         * L2 guest with a virtio device (e.g. virtio-keyboard), and the
> +         * L2 guest passes this device through to the L3 guest, the L3 guest
> +         * might send virtio notifications to the QEMU in L2 for that device.
> +         * But since the QEMU in L2 defined this device as vfio-ccw, it's not
> +         * a VirtIODevice that we can handle here!
> +         */
> +        warn_report_once("Got virtio notification for unsupported device "
> +                         "on subchannel %02x.%1x.%04x!", cssid, ssid, schid);
> +        return -EINVAL;
> +    }
>   
>       vdev = virtio_ccw_get_vdev(sch);

While at it, it would be good to test 'vdev' and return -EINVAL as in
virtio_ccw_set_vqs().

In virtio-ccw.c, I think this needs some care:

    static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
    {
        ...
        VirtioCcwDevice *dev = sch->driver_data;
        VirtIODevice *vdev = virtio_ccw_get_vdev(sch);
        ...
        if (!dev) {                <-- vdev ?
            return -EINVAL;
        }


Thanks,

C.

  

>       if (vq_idx >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, vq_idx)) {
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Thomas Huth 1 day ago
On 19/11/2025 11.02, Cédric Le Goater wrote:
> On 11/18/25 18:40, Thomas Huth wrote:
>> From: Thomas Huth <thuth@redhat.com>
>>
>> Consider the following nested setup: An L1 host uses some virtio device
>> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
>> device through to the L3 guest. Since the L3 guest sees a virtio device,
>> it might send virtio notifications to the QEMU in L2 for that device.
>> But since the QEMU in L2 defined this device as vfio-ccw, the function
>> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
>> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
>> but since "sch" belongs to a vfio-ccw device, that driver_data rather
>> points to a CcwDevice instead. So as soon as QEMU tries to use some
>> VirtioCcwDevice specific data from that device, we've lost.
>>
>> We must not take virtio notifications for such devices. Thus fix the
>> issue by adding a check to the handle_virtio_ccw_notify() handler to
>> refuse all devices that are not our own virtio devices. Like in the
>> other branches that detect wrong settings, we return -EINVAL from the
>> function, which will later be placed in GPR2 to inform the guest about
>> the error.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   v3: Print the subchannel number to ease debugging
>>
>>   hw/s390x/s390-hypercall.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/hw/s390x/s390-hypercall.c b/hw/s390x/s390-hypercall.c
>> index ac1b08b2cd5..508dd97ca0d 100644
>> --- a/hw/s390x/s390-hypercall.c
>> +++ b/hw/s390x/s390-hypercall.c
>> @@ -10,6 +10,7 @@
>>    */
>>   #include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>>   #include "cpu.h"
>>   #include "hw/s390x/s390-virtio-ccw.h"
>>   #include "hw/s390x/s390-hypercall.h"
>> @@ -42,6 +43,19 @@ static int handle_virtio_ccw_notify(uint64_t subch_id, uint64_t data)
>>       if (!sch || !css_subch_visible(sch)) {
>>           return -EINVAL;
>>       }
>> +    if (sch->id.cu_type != VIRTIO_CCW_CU_TYPE) {
>> +        /*
>> +         * This might happen in nested setups: If the L1 host defined the
>> +         * L2 guest with a virtio device (e.g. virtio-keyboard), and the
>> +         * L2 guest passes this device through to the L3 guest, the L3 guest
>> +         * might send virtio notifications to the QEMU in L2 for that device.
>> +         * But since the QEMU in L2 defined this device as vfio-ccw, it's not
>> +         * a VirtIODevice that we can handle here!
>> +         */
>> +        warn_report_once("Got virtio notification for unsupported device "
>> +                         "on subchannel %02x.%1x.%04x!", cssid, ssid, schid);
>> +        return -EINVAL;
>> +    }
>>       vdev = virtio_ccw_get_vdev(sch);
> 
> While at it, it would be good to test 'vdev' and return -EINVAL as in
> virtio_ccw_set_vqs().
> 
> In virtio-ccw.c, this needs some care I think :
> 
>     static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
>     {
>         ...
>         VirtioCcwDevice *dev = sch->driver_data;
>         VirtIODevice *vdev = virtio_ccw_get_vdev(sch);
>         ...
>         if (!dev) {                <-- vdev ?
>            return -EINVAL;
>         }

I wonder whether this can happen at all? We check for a valid virtio-ccw device
now, and all virtio-ccw devices should have their driver_data set up in their
realize function, so I think we should always get a valid pointer here, and if
it's NULL, there must be a bug somewhere. So maybe an assert() would be the
better idea? Or just let it crash by dereferencing the NULL pointer...?

  Thomas
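
The two options weighed in this exchange (returning -EINVAL when the device pointer is NULL, as suggested in the review, versus asserting that it can never be NULL) can be contrasted in a small standalone sketch. The SketchVdev type and both function names are hypothetical; this is not code from QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtio device with a fixed number of queues. */
typedef struct SketchVdev {
    unsigned num_queues;
} SketchVdev;

/* Option 1: treat a missing device as an error that is reported to the caller. */
int sketch_notify_checked(const SketchVdev *vdev, unsigned vq_idx)
{
    if (!vdev) {
        return -EINVAL;
    }
    return vq_idx < vdev->num_queues ? 0 : -EINVAL;
}

/* Option 2: treat a missing device as an internal bug and abort loudly. */
int sketch_notify_asserted(const SketchVdev *vdev, unsigned vq_idx)
{
    assert(vdev != NULL);
    return vq_idx < vdev->num_queues ? 0 : -EINVAL;
}
```

Option 1 keeps the process alive and leaves the decision to the caller; option 2 documents the invariant and stops at the point where it is violated. Note that in standard C, assert() compiles to nothing when NDEBUG is defined, in which case option 2 degrades to the plain NULL dereference.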


Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Cornelia Huck 1 day ago
On Tue, Nov 18 2025, Thomas Huth <thuth@redhat.com> wrote:

> From: Thomas Huth <thuth@redhat.com>
>
> Consider the following nested setup: An L1 host uses some virtio device
> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
> device through to the L3 guest. Since the L3 guest sees a virtio device,
> it might send virtio notifications to the QEMU in L2 for that device.
> But since the QEMU in L2 defined this device as vfio-ccw, the function
> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
> but since "sch" belongs to a vfio-ccw device, that driver_data rather
> points to a CcwDevice instead. So as soon as QEMU tries to use some
> VirtioCcwDevice specific data from that device, we've lost.
>
> We must not take virtio notifications for such devices. Thus fix the
> issue by adding a check to the handle_virtio_ccw_notify() handler to
> refuse all devices that are not our own virtio devices. Like in the
> other branches that detect wrong settings, we return -EINVAL from the
> function, which will later be placed in GPR2 to inform the guest about
> the error.
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  v3: Print the subchannel number to ease debugging
>
>  hw/s390x/s390-hypercall.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Christian Borntraeger 1 day, 1 hour ago
Am 18.11.25 um 18:40 schrieb Thomas Huth:
> From: Thomas Huth <thuth@redhat.com>
> 
> Consider the following nested setup: An L1 host uses some virtio device
> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
> device through to the L3 guest. Since the L3 guest sees a virtio device,
> it might send virtio notifications to the QEMU in L2 for that device.
> But since the QEMU in L2 defined this device as vfio-ccw, the function
> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
> but since "sch" belongs to a vfio-ccw device, that driver_data rather
> points to a CcwDevice instead. So as soon as QEMU tries to use some
> VirtioCcwDevice specific data from that device, we've lost.
> 
> We must not take virtio notifications for such devices. Thus fix the
> issue by adding a check to the handle_virtio_ccw_notify() handler to
> refuse all devices that are not our own virtio devices. Like in the
> other branches that detect wrong settings, we return -EINVAL from the
> function, which will later be placed in GPR2 to inform the guest about
> the error.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>

Certainly a good first step. If needed we can improve further in the future.
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Halil Pasic 1 day, 11 hours ago
On Tue, 18 Nov 2025 18:40:47 +0100
Thomas Huth <thuth@redhat.com> wrote:

> Consider the following nested setup: An L1 host uses some virtio device
> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
> device through to the L3 guest. Since the L3 guest sees a virtio device,
> it might send virtio notifications to the QEMU in L2 for that device.
> But since the QEMU in L2 defined this device as vfio-ccw, the function
> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
> but since "sch" belongs to a vfio-ccw device, that driver_data rather
> points to a CcwDevice instead. So as soon as QEMU tries to use some
> VirtioCcwDevice specific data from that device, we've lost.
> 
> We must not take virtio notifications for such devices. Thus fix the
> issue by adding a check to the handle_virtio_ccw_notify() handler to
> refuse all devices that are not our own virtio devices. Like in the
> other branches that detect wrong settings, we return -EINVAL from the
> function, which will later be placed in GPR2 to inform the guest about
> the error.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Eric Farman 1 day, 12 hours ago
On Tue, 2025-11-18 at 18:40 +0100, Thomas Huth wrote:
> From: Thomas Huth <thuth@redhat.com>
> 
> Consider the following nested setup: An L1 host uses some virtio device
> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
> device through to the L3 guest. Since the L3 guest sees a virtio device,
> it might send virtio notifications to the QEMU in L2 for that device.
> But since the QEMU in L2 defined this device as vfio-ccw, the function
> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
> but since "sch" belongs to a vfio-ccw device, that driver_data rather
> points to a CcwDevice instead. So as soon as QEMU tries to use some
> VirtioCcwDevice specific data from that device, we've lost.
> 
> We must not take virtio notifications for such devices. Thus fix the
> issue by adding a check to the handle_virtio_ccw_notify() handler to
> refuse all devices that are not our own virtio devices. Like in the
> other branches that detect wrong settings, we return -EINVAL from the
> function, which will later be placed in GPR2 to inform the guest about
> the error.

I still think this is a good idea, but of course "let's try it" got me into the weeds. I
reconstructed a configuration (dasd->virtio-blk-ccw->vfio-ccw->virtio-blk-ccw) that crashes the
nested guest upon startup with today's master. Applying this patch generates that message to point
out where it's broken (yay!), but the nested guest hangs during boot. Need to ponder this more
tomorrow.

...
2025-11-18T21:22:36.645657Z qemu-system-s390x: warning: Got virtio notification for unsupported device on subchannel 00.0.0002!
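
The subchannel identifier in this warning comes from the "%02x.%1x.%04x" format string added by the patch. A small standalone sketch shows how three values map to the printed ID; the helper names are hypothetical, and the values 0, 0 and 2 are inferred from the log line above rather than taken from a debugger.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "cssid.ssid.schid" in hex into buf; returns snprintf()'s result. */
int sketch_format_subch(char *buf, size_t len,
                        unsigned cssid, unsigned ssid, unsigned schid)
{
    return snprintf(buf, len, "%02x.%1x.%04x", cssid, ssid, schid);
}

/* Convenience wrapper returning a pointer to a static buffer (not thread-safe). */
const char *sketch_subch_str(unsigned cssid, unsigned ssid, unsigned schid)
{
    static char buf[16];

    sketch_format_subch(buf, sizeof(buf), cssid, ssid, schid);
    return buf;
}
```

For example, sketch_subch_str(0, 0, 2) yields the string "00.0.0002", matching the ID in the warning.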

> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  v3: Print the subchannel number to ease debugging
> 
>  hw/s390x/s390-hypercall.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/hw/s390x/s390-hypercall.c b/hw/s390x/s390-hypercall.c
> index ac1b08b2cd5..508dd97ca0d 100644
> --- a/hw/s390x/s390-hypercall.c
> +++ b/hw/s390x/s390-hypercall.c
> @@ -10,6 +10,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "cpu.h"
>  #include "hw/s390x/s390-virtio-ccw.h"
>  #include "hw/s390x/s390-hypercall.h"
> @@ -42,6 +43,19 @@ static int handle_virtio_ccw_notify(uint64_t subch_id, uint64_t data)
>      if (!sch || !css_subch_visible(sch)) {
>          return -EINVAL;
>      }
> +    if (sch->id.cu_type != VIRTIO_CCW_CU_TYPE) {
> +        /*
> +         * This might happen in nested setups: If the L1 host defined the
> +         * L2 guest with a virtio device (e.g. virtio-keyboard), and the
> +         * L2 guest passes this device through to the L3 guest, the L3 guest
> +         * might send virtio notifications to the QEMU in L2 for that device.
> +         * But since the QEMU in L2 defined this device as vfio-ccw, it's not
> +         * a VirtIODevice that we can handle here!
> +         */
> +        warn_report_once("Got virtio notification for unsupported device "
> +                         "on subchannel %02x.%1x.%04x!", cssid, ssid, schid);
> +        return -EINVAL;
> +    }
>  
>      vdev = virtio_ccw_get_vdev(sch);
>      if (vq_idx >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, vq_idx)) {
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Thomas Huth 1 day, 3 hours ago
On 18/11/2025 22.45, Eric Farman wrote:
> On Tue, 2025-11-18 at 18:40 +0100, Thomas Huth wrote:
>> From: Thomas Huth <thuth@redhat.com>
>>
>> Consider the following nested setup: An L1 host uses some virtio device
>> (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
>> device through to the L3 guest. Since the L3 guest sees a virtio device,
>> it might send virtio notifications to the QEMU in L2 for that device.
>> But since the QEMU in L2 defined this device as vfio-ccw, the function
>> handle_virtio_ccw_notify() cannot handle this and crashes: It calls
>> virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
>> but since "sch" belongs to a vfio-ccw device, that driver_data rather
>> points to a CcwDevice instead. So as soon as QEMU tries to use some
>> VirtioCcwDevice specific data from that device, we've lost.
>>
>> We must not take virtio notifications for such devices. Thus fix the
>> issue by adding a check to the handle_virtio_ccw_notify() handler to
>> refuse all devices that are not our own virtio devices. Like in the
>> other branches that detect wrong settings, we return -EINVAL from the
>> function, which will later be placed in GPR2 to inform the guest about
>> the error.
> 
> I still think this is a good idea, but of course "let's try it" got me into the weeds. I
> reconstructed a configuration (dasd->virtio-blk-ccw->vfio-ccw->virtio-blk-ccw) that crashes the
> nested guest upon startup with today's master. Applying this patch generates that message to point
> out where it's broken (yay!), but the nested guest hangs during boot. Need to ponder this more
> tomorrow.

FWIW, we only tried to pass a virtio-input device through to the L3 guest; we
did not try a virtio-blk device here ... so that might be the reason why I
did not see any further hangs after applying my fix.

  Thomas
Re: [PATCH v3] hw/s390x: Fix a possible crash with passed-through virtio devices
Posted by Eric Farman 20 hours ago
On Wed, 2025-11-19 at 08:33 +0100, Thomas Huth wrote:
> On 18/11/2025 22.45, Eric Farman wrote:
> > On Tue, 2025-11-18 at 18:40 +0100, Thomas Huth wrote:
> > > From: Thomas Huth <thuth@redhat.com>
> > > 
> > > Consider the following nested setup: An L1 host uses some virtio device
> > > (e.g. virtio-keyboard) for the L2 guest, and this L2 guest passes this
> > > device through to the L3 guest. Since the L3 guest sees a virtio device,
> > > it might send virtio notifications to the QEMU in L2 for that device.
> > > But since the QEMU in L2 defined this device as vfio-ccw, the function
> > > handle_virtio_ccw_notify() cannot handle this and crashes: It calls
> > > virtio_ccw_get_vdev() that casts sch->driver_data into a VirtioCcwDevice,
> > > but since "sch" belongs to a vfio-ccw device, that driver_data rather
> > > points to a CcwDevice instead. So as soon as QEMU tries to use some
> > > VirtioCcwDevice specific data from that device, we've lost.
> > > 
> > > We must not take virtio notifications for such devices. Thus fix the
> > > issue by adding a check to the handle_virtio_ccw_notify() handler to
> > > refuse all devices that are not our own virtio devices. Like in the
> > > other branches that detect wrong settings, we return -EINVAL from the
> > > function, which will later be placed in GPR2 to inform the guest about
> > > the error.
> > 
> > I still think this is a good idea, but of course "let's try it" got me into the weeds. I
> > reconstructed a configuration (dasd->virtio-blk-ccw->vfio-ccw->virtio-blk-ccw) that crashes the
> > nested guest upon startup with today's master. Applying this patch generates that message to point
> > out where it's broken (yay!), but the nested guest hangs during boot. Need to ponder this more
> > tomorrow.
> 
> FWIW, we only tried to passthrough a virtio-input device to the L3 guest, we 
> did not try a virtio-blk device here ... so that might be the reason why I 
> did not see any further hangs after applying my fix.

Ah, that could be. Well, as you and Halil discussed in v2, there's more that can be done on top but
since this solves the guest crashes:

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>

> 
>   Thomas