drivers/s390/virtio/virtio_ccw.c | 27 +++++++++++++++++++++ drivers/virtio/virtio.c | 24 ++++++++++++------ drivers/virtio/virtio_mmio.c | 9 +++++++ drivers/virtio/virtio_pci_legacy.c | 1 + drivers/virtio/virtio_pci_modern.c | 2 ++ drivers/virtio/virtio_ring.c | 32 +++++++++++++++++++++--- include/linux/virtio.h | 1 + include/linux/virtio_config.h | 39 +++++++++++++++++++++++++++++- 8 files changed, 123 insertions(+), 12 deletions(-)
Hi All:
This is a rework on the IRQ hardening for virtio which is done
previously by the following commits are reverted:
9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
with the assumption of the affinity managed IRQ that is used by some
virtio drivers. And what's more, it is only done for virtio-pci but
not other transports.
In this rework, I try to implement a general virtio solution which
borrows the idea of the INTX hardening by re-using per virtqueue
boolean vq->broken and toggle it in virtio_device_ready() and
virtio_reset_device(). Then we can simply reuse the existing checks in
the vring_interrupt() and return early if the driver is not ready.
Note that, I only did compile test on ccw and MMIO transport.
Please review.
Changes since v1:
- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditonally
- Tweak the barrier/ordering facilities used in the code
- Reanme irq_soft_enabled to driver_ready
- Avoid unnecssary IRQ synchornization (e.g during boot)
Changes since V2:
- add ccw and MMIO support
- rename synchronize_vqs() to synchronize_cbs()
- switch to re-use vq->broken instead of introducing new device
attributes for the future virtqueue reset support
- remove unnecssary READ_ONCE()/WRITE_ONCE()
- a new patch to remove device triggerable BUG_ON()
- more tweaks on the comments
Changes since V3:
- Rename synchornize_vqs() to synchronize_cbs()
- tweak the comment for synchronize_cbs()
- switch to use a dedicated helper __virtio_unbreak_device() and
document it should be only used for probing
- switch to use rwlick to synchornize the non airq for ccw
Jason Wang (8):
virtio: use virtio_reset_device() when possible
virtio: introduce config op to synchronize vring callbacks
virtio-pci: implement synchronize_cbs()
virtio-mmio: implement synchronize_cbs()
virtio-ccw: implement synchronize_cbs()
virtio: allow to unbreak virtqueue
virtio: harden vring IRQ
virtio: use WARN_ON() to warning illegal status value
Stefano Garzarella (1):
virtio: use virtio_device_ready() in virtio_device_restore()
drivers/s390/virtio/virtio_ccw.c | 27 +++++++++++++++++++++
drivers/virtio/virtio.c | 24 ++++++++++++------
drivers/virtio/virtio_mmio.c | 9 +++++++
drivers/virtio/virtio_pci_legacy.c | 1 +
drivers/virtio/virtio_pci_modern.c | 2 ++
drivers/virtio/virtio_ring.c | 32 +++++++++++++++++++++---
include/linux/virtio.h | 1 +
include/linux/virtio_config.h | 39 +++++++++++++++++++++++++++++-
8 files changed, 123 insertions(+), 12 deletions(-)
--
2.25.1
On Sat, May 07 2022, Jason Wang <jasowang@redhat.com> wrote:
> Hi All:
>
> This is a rework on the IRQ hardening for virtio which is done
> previously by the following commits are reverted:
>
> 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
>
> The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> with the assumption of the affinity managed IRQ that is used by some
> virtio drivers. And what's more, it is only done for virtio-pci but
> not other transports.
>
> In this rework, I try to implement a general virtio solution which
> borrows the idea of the INTX hardening by re-using per virtqueue
> boolean vq->broken and toggle it in virtio_device_ready() and
> virtio_reset_device(). Then we can simply reuse the existing checks in
> the vring_interrupt() and return early if the driver is not ready.
>
> Note that, I only did compile test on ccw and MMIO transport.
Lockdep is unhappy with the ccw parts:
================================
WARNING: inconsistent lock state
5.18.0-rc6+ #191 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage.
kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
00000000058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60
{IN-HARDIRQ-R} state was registered at:
__lock_acquire+0x442/0xc20
lock_acquire.part.0+0xdc/0x228
lock_acquire+0xa6/0x1b0
_raw_read_lock_irqsave+0x72/0x100
virtio_ccw_int_handler+0x84/0x238
ccw_device_call_handler+0x72/0xd0
ccw_device_irq+0x7a/0x198
do_cio_interrupt+0x11c/0x1d0
__handle_irq_event_percpu+0xc2/0x318
handle_irq_event_percpu+0x26/0x68
handle_percpu_irq+0x64/0x88
generic_handle_irq+0x40/0x58
do_irq_async+0x56/0xb0
do_io_irq+0x82/0x160
io_int_handler+0xe6/0x120
rcu_read_lock_sched_held+0x3e/0xb0
lock_acquired+0x12e/0x208
new_inode+0x3e/0xd0
debugfs_get_inode+0x22/0x68
__debugfs_create_file+0x78/0x1c0
debugfs_create_file_unsafe+0x36/0x58
debugfs_create_u32+0x38/0x68
sched_init_debug+0xb0/0x1c0
do_one_initcall+0x108/0x280
do_initcalls+0x124/0x148
kernel_init_freeable+0x242/0x280
kernel_init+0x2e/0x158
__ret_from_fork+0x3c/0x50
ret_from_fork+0xa/0x40
irq event stamp: 539789
hardirqs last enabled at (539789): [<0000000000d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88
hardirqs last disabled at (539788): [<0000000000d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0
softirqs last enabled at (539568): [<0000000000d9e0d4>] __do_softirq+0x434/0x588
softirqs last disabled at (539503): [<000000000018cd66>] __irq_exit_rcu+0x146/0x170
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&vcdev->irq_lock);
<Interrupt>
lock(&vcdev->irq_lock);
*** DEADLOCK ***
2 locks held by kworker/u4:0/9:
#0: 000000000288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
#1: 000003800004bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
stack backtrace:
CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191
Hardware name: QEMU 8561 QEMU (KVM/Linux)
Workqueue: events_unbound async_run_entry_fn
Call Trace:
[<0000000000d8af22>] dump_stack_lvl+0x92/0xd0
[<00000000002032ac>] mark_lock_irq+0x864/0x968
[<0000000000203670>] mark_lock.part.0+0x2c0/0x790
[<0000000000203cea>] mark_usage+0x10a/0x178
[<000000000020692a>] __lock_acquire+0x442/0xc20
[<0000000000207cc4>] lock_acquire.part.0+0xdc/0x228
[<0000000000207eb6>] lock_acquire+0xa6/0x1b0
[<0000000000d9c774>] _raw_write_lock+0x54/0xa8
[<0000000000d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60
[<00000000008eec04>] register_virtio_device+0xdc/0x1b0
[<0000000000d5aabe>] virtio_ccw_online+0x246/0x2e8
[<0000000000c9fecc>] ccw_device_set_online+0x1c4/0x540
[<0000000000d5a05e>] virtio_ccw_auto_online+0x26/0x50
[<00000000001ba2b0>] async_run_entry_fn+0x40/0x108
[<00000000001ab9b4>] process_one_work+0x2a4/0x658
[<00000000001abdd0>] worker_thread+0x68/0x440
[<00000000001b4668>] kthread+0x128/0x130
[<0000000000102fac>] __ret_from_fork+0x3c/0x50
[<0000000000d9d3aa>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
On Tue, May 10, 2022 at 5:29 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Sat, May 07 2022, Jason Wang <jasowang@redhat.com> wrote:
>
> > Hi All:
> >
> > This is a rework on the IRQ hardening for virtio which is done
> > previously by the following commits are reverted:
> >
> > 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> > 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> >
> > The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> > with the assumption of the affinity managed IRQ that is used by some
> > virtio drivers. And what's more, it is only done for virtio-pci but
> > not other transports.
> >
> > In this rework, I try to implement a general virtio solution which
> > borrows the idea of the INTX hardening by re-using per virtqueue
> > boolean vq->broken and toggle it in virtio_device_ready() and
> > virtio_reset_device(). Then we can simply reuse the existing checks in
> > the vring_interrupt() and return early if the driver is not ready.
> >
> > Note that, I only did compile test on ccw and MMIO transport.
>
> Lockdep is unhappy with the ccw parts:
>
> ================================
> WARNING: inconsistent lock state
> 5.18.0-rc6+ #191 Not tainted
> --------------------------------
> inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage.
> kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
> 00000000058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60
> {IN-HARDIRQ-R} state was registered at:
> __lock_acquire+0x442/0xc20
> lock_acquire.part.0+0xdc/0x228
> lock_acquire+0xa6/0x1b0
> _raw_read_lock_irqsave+0x72/0x100
> virtio_ccw_int_handler+0x84/0x238
> ccw_device_call_handler+0x72/0xd0
> ccw_device_irq+0x7a/0x198
> do_cio_interrupt+0x11c/0x1d0
> __handle_irq_event_percpu+0xc2/0x318
> handle_irq_event_percpu+0x26/0x68
> handle_percpu_irq+0x64/0x88
> generic_handle_irq+0x40/0x58
> do_irq_async+0x56/0xb0
> do_io_irq+0x82/0x160
> io_int_handler+0xe6/0x120
> rcu_read_lock_sched_held+0x3e/0xb0
> lock_acquired+0x12e/0x208
> new_inode+0x3e/0xd0
> debugfs_get_inode+0x22/0x68
> __debugfs_create_file+0x78/0x1c0
> debugfs_create_file_unsafe+0x36/0x58
> debugfs_create_u32+0x38/0x68
> sched_init_debug+0xb0/0x1c0
> do_one_initcall+0x108/0x280
> do_initcalls+0x124/0x148
> kernel_init_freeable+0x242/0x280
> kernel_init+0x2e/0x158
> __ret_from_fork+0x3c/0x50
> ret_from_fork+0xa/0x40
> irq event stamp: 539789
> hardirqs last enabled at (539789): [<0000000000d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88
> hardirqs last disabled at (539788): [<0000000000d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0
> softirqs last enabled at (539568): [<0000000000d9e0d4>] __do_softirq+0x434/0x588
> softirqs last disabled at (539503): [<000000000018cd66>] __irq_exit_rcu+0x146/0x170
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&vcdev->irq_lock);
> <Interrupt>
> lock(&vcdev->irq_lock);
>
> *** DEADLOCK ***
It looks to me we need to use write_lock_irq()/write_unlock_irq() to
do the synchronization.
And we probably need to keep the
read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
virtio_ccw_int_handler() to be called from process context (e.g from
the io_subchannel_quiesce()).
Thanks
>
> 2 locks held by kworker/u4:0/9:
> #0: 000000000288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
> #1: 000003800004bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
>
> stack backtrace:
> CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191
> Hardware name: QEMU 8561 QEMU (KVM/Linux)
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
> [<0000000000d8af22>] dump_stack_lvl+0x92/0xd0
> [<00000000002032ac>] mark_lock_irq+0x864/0x968
> [<0000000000203670>] mark_lock.part.0+0x2c0/0x790
> [<0000000000203cea>] mark_usage+0x10a/0x178
> [<000000000020692a>] __lock_acquire+0x442/0xc20
> [<0000000000207cc4>] lock_acquire.part.0+0xdc/0x228
> [<0000000000207eb6>] lock_acquire+0xa6/0x1b0
> [<0000000000d9c774>] _raw_write_lock+0x54/0xa8
> [<0000000000d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60
> [<00000000008eec04>] register_virtio_device+0xdc/0x1b0
> [<0000000000d5aabe>] virtio_ccw_online+0x246/0x2e8
> [<0000000000c9fecc>] ccw_device_set_online+0x1c4/0x540
> [<0000000000d5a05e>] virtio_ccw_auto_online+0x26/0x50
> [<00000000001ba2b0>] async_run_entry_fn+0x40/0x108
> [<00000000001ab9b4>] process_one_work+0x2a4/0x658
> [<00000000001abdd0>] worker_thread+0x68/0x440
> [<00000000001b4668>] kthread+0x128/0x130
> [<0000000000102fac>] __ret_from_fork+0x3c/0x50
> [<0000000000d9d3aa>] ret_from_fork+0xa/0x40
> INFO: lockdep is turned off.
>
On Wed, 11 May 2022 10:22:59 +0800 Jason Wang <jasowang@redhat.com> wrote: > > CPU0 > > ---- > > lock(&vcdev->irq_lock); > > <Interrupt> > > lock(&vcdev->irq_lock); > > > > *** DEADLOCK *** > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > do the synchronization. > > And we probably need to keep the > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > virtio_ccw_int_handler() to be called from process context (e.g from > the io_subchannel_quiesce()). > Sounds correct. Regards, Halil
On Wed, May 11, 2022 at 10:02 PM Halil Pasic <pasic@linux.ibm.com> wrote: > > On Wed, 11 May 2022 10:22:59 +0800 > Jason Wang <jasowang@redhat.com> wrote: > > > > CPU0 > > > ---- > > > lock(&vcdev->irq_lock); > > > <Interrupt> > > > lock(&vcdev->irq_lock); > > > > > > *** DEADLOCK *** > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > do the synchronization. > > > > And we probably need to keep the > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > virtio_ccw_int_handler() to be called from process context (e.g from > > the io_subchannel_quiesce()). > > > > Sounds correct. As Cornelia and Vineeth pointed out, all the paths the vring_interrupt is called with irq disabled. So I will use spin_lock()/spin_unlock() in the next version. Thanks > > Regards, > Halil >
On Thu, 12 May 2022 11:31:08 +0800 Jason Wang <jasowang@redhat.com> wrote: > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > > do the synchronization. > > > > > > And we probably need to keep the > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > > virtio_ccw_int_handler() to be called from process context (e.g from > > > the io_subchannel_quiesce()). > > > > > > > Sounds correct. > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt > is called with irq disabled. > > So I will use spin_lock()/spin_unlock() in the next version. Can we do some sort of an assertion that if the kernel is built with the corresponding debug features will make sure this assumption holds (and warn if it does not)? That assertion would also document the fact. If an assertion is not possible, I think we should at least place a strategic comment that documents our assumption. Regards, Halil > > Thanks
On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote: > On Thu, 12 May 2022 11:31:08 +0800 > Jason Wang <jasowang@redhat.com> wrote: > > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > > > do the synchronization. > > > > > > > > And we probably need to keep the > > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > > > virtio_ccw_int_handler() to be called from process context (e.g from > > > > the io_subchannel_quiesce()). > > > > > > > > > > Sounds correct. > > > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt > > is called with irq disabled. > > > > So I will use spin_lock()/spin_unlock() in the next version. > > Can we do some sort of an assertion that if the kernel is built with > the corresponding debug features will make sure this assumption holds > (and warn if it does not)? That assertion would also document the fact. Lockdep will do this automatically if you get it wrong, just like it did here. > If an assertion is not possible, I think we should at least place a > strategic comment that documents our assumption. That can't hurt. > Regards, > Halil > > > > > Thanks
On Mon, May 16, 2022 at 10:25 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote: > > On Thu, 12 May 2022 11:31:08 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to > > > > > do the synchronization. > > > > > > > > > > And we probably need to keep the > > > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the > > > > > virtio_ccw_int_handler() to be called from process context (e.g from > > > > > the io_subchannel_quiesce()). > > > > > > > > > > > > > Sounds correct. > > > > > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt > > > is called with irq disabled. > > > > > > So I will use spin_lock()/spin_unlock() in the next version. > > > > Can we do some sort of an assertion that if the kernel is built with > > the corresponding debug features will make sure this assumption holds > > (and warn if it does not)? That assertion would also document the fact. > > Lockdep will do this automatically if you get it wrong, just like it > did here. > > > If an assertion is not possible, I think we should at least place a > > strategic comment that documents our assumption. > > That can't hurt. I will add some comments here. Thanks > > > Regards, > > Halil > > > > > > > > Thanks >
© 2016 - 2026 Red Hat, Inc.