The virtio device/driver (e.g., vhost-scsi and indeed any device including
e1000e) may hang due to the loss of an IRQ or the loss of a doorbell
register kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

The virtio-net was in trouble in the above link because the 'kick' was not
taking effect (missed).

This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
narrow down whether the issue is due to a lost irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
IRQ).

The 'call' is to inject an irq on purpose by the admin for a specific
device (e.g., vhost-scsi) from QEMU/host to the VM, while the 'kick' is to
kick the doorbell on purpose by the admin at the QEMU/host side for a
specific device.

This interface can also be used as a workaround if a call/kick is lost due
to an issue in virtualization software (e.g., kernel or QEMU).

Below is from a live crash analysis. Initially, queue=3 has count=30 for
the 'kick' eventfd_ctx. Suppose there is data in the vring avail ring while
nothing is available in the used ring. We suspect this is because
vhost-scsi was not notified by the VM. In order to narrow down and analyze
the issue, we use live crash to dump the current eventfd counter for
queue=3.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 30, -----> eventfd is 30 !!!
  flags = 526336,
  id = 26
}

Now we kick the doorbell for vhost-scsi queue=3 on purpose, for diagnosis,
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
                 "event": "kick", "queue": 3 } }

The counter is increased to 31. If the hang is then resolved, it indicates
that something is wrong in software and the 'kick' was lost.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 31, -----> eventfd incremented to 31 !!!
  flags = 526336,
  id = 26
}

Only the interface for vhost-scsi is implemented since this is an RFC. I
will implement it for other types (e.g., eventfd or MSI-X) if the RFC is
reasonable.

Thank you very much!

Dongli Zhang
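
A minimal Python sketch of how the QMP command shown in the cover letter
might be issued from a script. It assumes QEMU was started with a QMP UNIX
socket (the path in the usage comment is hypothetical) and that the RFC
patch providing x-debug-device-event is applied. The helper names
build_device_event and send_device_event are illustrative and are not part
of QEMU.

```python
import json
import socket


def build_device_event(dev, event, queue):
    """Build the QMP message for the RFC's x-debug-device-event command."""
    if event not in ("call", "kick"):
        raise ValueError("the RFC defines only the 'call' and 'kick' events")
    return {
        "execute": "x-debug-device-event",
        "arguments": {"dev": dev, "event": event, "queue": queue},
    }


def send_device_event(sock, dev, event, queue):
    """Run the QMP handshake on a connected socket, then send the command.

    Returns the decoded reply, a dict with either a "return" or an
    "error" key.
    """
    reader = sock.makefile("r", encoding="utf-8")

    def reply_to(message):
        sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
        while True:
            line = reader.readline()
            if not line:
                raise ConnectionError("QMP peer closed the connection")
            reply = json.loads(line)
            # Asynchronous events carry an "event" key; skip them.
            if "return" in reply or "error" in reply:
                return reply

    greeting = json.loads(reader.readline())
    if "QMP" not in greeting:
        raise ConnectionError("peer did not send a QMP greeting")
    # QMP requires capability negotiation before any other command.
    reply_to({"execute": "qmp_capabilities"})
    return reply_to(build_device_event(dev, event, queue))


# Usage (hypothetical socket path, needs a QEMU built with the RFC patch):
#   with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
#       s.connect("/tmp/qmp.sock")
#       print(send_device_event(s, "/machine/peripheral/vscsi0", "kick", 3))
```

Dumping the eventfd_ctx with crash before and after such a call is what
shows whether the counter moved, as in the 30 -> 31 example above.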

On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi and indeed any device including
> e1000e) may hang due to the loss of an IRQ or the loss of a doorbell
> register kick, e.g.,
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>
> The virtio-net was in trouble in the above link because the 'kick' was not
> taking effect (missed).
>
> This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
> narrow down whether the issue is due to a lost irq/kick. So far the new
> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
> IRQ).
>
> The 'call' is to inject an irq on purpose by the admin for a specific
> device (e.g., vhost-scsi) from QEMU/host to the VM, while the 'kick' is to
> kick the doorbell on purpose by the admin at the QEMU/host side for a
> specific device.

I'm really not convinced that we want to give admins the direct ability to
poke at internals of devices in a running QEMU. It feels like there is way
too much potential for the admin to make a situation far worse by doing
the wrong thing here, and people dealing with support tickets will have
no idea that the admin has been poking internals of the device and broken
it by doing something wrong.

You pointed to a bug where this could conceivably be useful, but
that's a one-time issue and should not be a common occurrence that
justifies making an official public API to poke at devices forever more
IMHO.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
> > The virtio device/driver (e.g., vhost-scsi and indeed any device including
> > e1000e) may hang due to the loss of an IRQ or the loss of a doorbell
> > register kick, e.g.,
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> >
> > The virtio-net was in trouble in the above link because the 'kick' was not
> > taking effect (missed).
> >
> > This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
> > narrow down whether the issue is due to a lost irq/kick. So far the new
> > interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> > e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
> > IRQ).
> >
> > The 'call' is to inject an irq on purpose by the admin for a specific
> > device (e.g., vhost-scsi) from QEMU/host to the VM, while the 'kick' is to
> > kick the doorbell on purpose by the admin at the QEMU/host side for a
> > specific device.
>
> I'm really not convinced that we want to give admins the direct ability to
> poke at internals of devices in a running QEMU. It feels like there is way
> too much potential for the admin to make a situation far worse by doing
> the wrong thing here,

We already do have commands to write to an ioport, and to inject MCEs for
example; is this that much different?

> and people dealing with support tickets will have
> no idea that the admin has been poking internals of the device and broken
> it by doing something wrong.

You could add a one-time log entry to say that this mischievous command
had been used.

> You pointed to a bug where this could conceivably be useful, but
> that's a one-time issue and should not be a common occurrence that
> justifies making an official public API to poke at devices forever more
> IMHO.

I think where it might be practically useful is if you were debugging a
hung customer's VM and needed to find a way to get it to move again.
That's something I'm not familiar with on the virtio side;
mst - is this useful from a virtio side?

Dave

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On 1/18/21 8:59 AM, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
>> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
>>> The virtio device/driver (e.g., vhost-scsi and indeed any device including
>>> e1000e) may hang due to the loss of an IRQ or the loss of a doorbell
>>> register kick, e.g.,
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>>
>>> The virtio-net was in trouble in the above link because the 'kick' was not
>>> taking effect (missed).
>>>
>>> This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
>>> narrow down whether the issue is due to a lost irq/kick. So far the new
>>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>> e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
>>> IRQ).
>>>
>>> The 'call' is to inject an irq on purpose by the admin for a specific
>>> device (e.g., vhost-scsi) from QEMU/host to the VM, while the 'kick' is to
>>> kick the doorbell on purpose by the admin at the QEMU/host side for a
>>> specific device.
>>
>> I'm really not convinced that we want to give admins the direct ability to
>> poke at internals of devices in a running QEMU. It feels like there is way
>> too much potential for the admin to make a situation far worse by doing
>> the wrong thing here,
>
> We already do have commands to write to an ioport, and to inject MCEs for
> example; is this that much different?
>
>> and people dealing with support tickets will have
>> no idea that the admin has been poking internals of the device and broken
>> it by doing something wrong.
>
> You could add a one-time log entry to say that this mischievous command
> had been used.
>
>> You pointed to a bug where this could conceivably be useful, but
>> that's a one-time issue and should not be a common occurrence that
>> justifies making an official public API to poke at devices forever more
>> IMHO.
>
> I think where it might be practically useful is if you were debugging a
> hung customer's VM and needed to find a way to get it to move again.
> That's something I'm not familiar with on the virtio side;
> mst - is this useful from a virtio side?

BTW, the Linux kernel blk-mq has a similar idea/interface.

Running the command below will 'run' the block IO queue on purpose.

echo "kick" > /sys/kernel/debug/block/sda/state

It is helpful for diagnosis if we assume the IO stall is due to an unknown
race in which a 'run' of the queue is missed.

Dongli Zhang

>
> Dave
>
>> Regards,
>> Daniel
>> --
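
The blk-mq debugfs write mentioned above can also be scripted. Below is a
small Python sketch, assuming debugfs is mounted at /sys/kernel/debug and
the caller has permission to write there (normally root). The function
names blk_mq_state_path and kick_block_queue are hypothetical helpers, not
kernel or library APIs.

```python
from pathlib import Path


def blk_mq_state_path(disk, debugfs_root="/sys/kernel/debug"):
    """Return the debugfs 'state' file for a block device such as 'sda'."""
    if not disk or "/" in disk or disk in (".", ".."):
        raise ValueError("disk must be a bare device name such as 'sda'")
    return Path(debugfs_root) / "block" / disk / "state"


def kick_block_queue(disk, debugfs_root="/sys/kernel/debug"):
    """Write 'kick' to the state file, like the echo command in the mail."""
    path = blk_mq_state_path(disk, debugfs_root)
    with open(path, "w") as state:
        state.write("kick\n")  # echo appends a newline as well
    return path


# Usage (needs root and a mounted debugfs):
#   kick_block_queue("sda")
```

Keeping debugfs_root as a parameter lets the helper be exercised against a
scratch directory before it is pointed at a live system.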