[v4] rework on the IRQ hardening of virtio

[PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Jason Wang 4 years ago

Hi All:

This is a rework on the IRQ hardening for virtio which is done
previously by the following commits are reverted:

9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
080cd7c3ac87 ("virtio-pci: harden INTX interrupts")

The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
with the assumption of the affinity managed IRQ that is used by some
virtio drivers. And what's more, it is only done for virtio-pci but
not other transports.

In this rework, I try to implement a general virtio solution which
borrows the idea of the INTX hardening by re-using per virtqueue
boolean vq->broken and toggle it in virtio_device_ready() and
virtio_reset_device(). Then we can simply reuse the existing checks in
the vring_interrupt() and return early if the driver is not ready.

Note that, I only did compile test on ccw and MMIO transport.

Please review.

Changes since v1:

- Use transport specific irq synchronization method when possible
- Drop the module parameter and enable the hardening unconditonally
- Tweak the barrier/ordering facilities used in the code
- Reanme irq_soft_enabled to driver_ready
- Avoid unnecssary IRQ synchornization (e.g during boot)

Changes since V2:

- add ccw and MMIO support
- rename synchronize_vqs() to synchronize_cbs()
- switch to re-use vq->broken instead of introducing new device
  attributes for the future virtqueue reset support
- remove unnecssary READ_ONCE()/WRITE_ONCE()
- a new patch to remove device triggerable BUG_ON()
- more tweaks on the comments

Changes since V3:

- Rename synchornize_vqs() to synchronize_cbs()
- tweak the comment for synchronize_cbs()
- switch to use a dedicated helper __virtio_unbreak_device() and
  document it should be only used for probing
- switch to use rwlick to synchornize the non airq for ccw

Jason Wang (8):
  virtio: use virtio_reset_device() when possible
  virtio: introduce config op to synchronize vring callbacks
  virtio-pci: implement synchronize_cbs()
  virtio-mmio: implement synchronize_cbs()
  virtio-ccw: implement synchronize_cbs()
  virtio: allow to unbreak virtqueue
  virtio: harden vring IRQ
  virtio: use WARN_ON() to warning illegal status value

Stefano Garzarella (1):
  virtio: use virtio_device_ready() in virtio_device_restore()

 drivers/s390/virtio/virtio_ccw.c   | 27 +++++++++++++++++++++
 drivers/virtio/virtio.c            | 24 ++++++++++++------
 drivers/virtio/virtio_mmio.c       |  9 +++++++
 drivers/virtio/virtio_pci_legacy.c |  1 +
 drivers/virtio/virtio_pci_modern.c |  2 ++
 drivers/virtio/virtio_ring.c       | 32 +++++++++++++++++++++---
 include/linux/virtio.h             |  1 +
 include/linux/virtio_config.h      | 39 +++++++++++++++++++++++++++++-
 8 files changed, 123 insertions(+), 12 deletions(-)

-- 
2.25.1

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Cornelia Huck 3 years, 12 months ago

On Sat, May 07 2022, Jason Wang <jasowang@redhat.com> wrote:

> Hi All:
>
> This is a rework on the IRQ hardening for virtio which is done
> previously by the following commits are reverted:
>
> 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
>
> The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> with the assumption of the affinity managed IRQ that is used by some
> virtio drivers. And what's more, it is only done for virtio-pci but
> not other transports.
>
> In this rework, I try to implement a general virtio solution which
> borrows the idea of the INTX hardening by re-using per virtqueue
> boolean vq->broken and toggle it in virtio_device_ready() and
> virtio_reset_device(). Then we can simply reuse the existing checks in
> the vring_interrupt() and return early if the driver is not ready.
>
> Note that, I only did compile test on ccw and MMIO transport.

Lockdep is unhappy with the ccw parts:

================================
WARNING: inconsistent lock state
5.18.0-rc6+ #191 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage.
kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
00000000058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60
{IN-HARDIRQ-R} state was registered at:
  __lock_acquire+0x442/0xc20
  lock_acquire.part.0+0xdc/0x228
  lock_acquire+0xa6/0x1b0
  _raw_read_lock_irqsave+0x72/0x100
  virtio_ccw_int_handler+0x84/0x238
  ccw_device_call_handler+0x72/0xd0
  ccw_device_irq+0x7a/0x198
  do_cio_interrupt+0x11c/0x1d0
  __handle_irq_event_percpu+0xc2/0x318
  handle_irq_event_percpu+0x26/0x68
  handle_percpu_irq+0x64/0x88
  generic_handle_irq+0x40/0x58
  do_irq_async+0x56/0xb0
  do_io_irq+0x82/0x160
  io_int_handler+0xe6/0x120
  rcu_read_lock_sched_held+0x3e/0xb0
  lock_acquired+0x12e/0x208
  new_inode+0x3e/0xd0
  debugfs_get_inode+0x22/0x68
  __debugfs_create_file+0x78/0x1c0
  debugfs_create_file_unsafe+0x36/0x58
  debugfs_create_u32+0x38/0x68
  sched_init_debug+0xb0/0x1c0
  do_one_initcall+0x108/0x280
  do_initcalls+0x124/0x148
  kernel_init_freeable+0x242/0x280
  kernel_init+0x2e/0x158
  __ret_from_fork+0x3c/0x50
  ret_from_fork+0xa/0x40
irq event stamp: 539789
hardirqs last  enabled at (539789): [<0000000000d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88
hardirqs last disabled at (539788): [<0000000000d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0
softirqs last  enabled at (539568): [<0000000000d9e0d4>] __do_softirq+0x434/0x588
softirqs last disabled at (539503): [<000000000018cd66>] __irq_exit_rcu+0x146/0x170

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&vcdev->irq_lock);
  <Interrupt>
    lock(&vcdev->irq_lock);

 *** DEADLOCK ***

2 locks held by kworker/u4:0/9:
 #0: 000000000288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
 #1: 000003800004bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658

stack backtrace:
CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191
Hardware name: QEMU 8561 QEMU (KVM/Linux)
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 [<0000000000d8af22>] dump_stack_lvl+0x92/0xd0 
 [<00000000002032ac>] mark_lock_irq+0x864/0x968 
 [<0000000000203670>] mark_lock.part.0+0x2c0/0x790 
 [<0000000000203cea>] mark_usage+0x10a/0x178 
 [<000000000020692a>] __lock_acquire+0x442/0xc20 
 [<0000000000207cc4>] lock_acquire.part.0+0xdc/0x228 
 [<0000000000207eb6>] lock_acquire+0xa6/0x1b0 
 [<0000000000d9c774>] _raw_write_lock+0x54/0xa8 
 [<0000000000d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60 
 [<00000000008eec04>] register_virtio_device+0xdc/0x1b0 
 [<0000000000d5aabe>] virtio_ccw_online+0x246/0x2e8 
 [<0000000000c9fecc>] ccw_device_set_online+0x1c4/0x540 
 [<0000000000d5a05e>] virtio_ccw_auto_online+0x26/0x50 
 [<00000000001ba2b0>] async_run_entry_fn+0x40/0x108 
 [<00000000001ab9b4>] process_one_work+0x2a4/0x658 
 [<00000000001abdd0>] worker_thread+0x68/0x440 
 [<00000000001b4668>] kthread+0x128/0x130 
 [<0000000000102fac>] __ret_from_fork+0x3c/0x50 
 [<0000000000d9d3aa>] ret_from_fork+0xa/0x40 
INFO: lockdep is turned off.

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Jason Wang 3 years, 12 months ago

On Tue, May 10, 2022 at 5:29 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Sat, May 07 2022, Jason Wang <jasowang@redhat.com> wrote:
>
> > Hi All:
> >
> > This is a rework on the IRQ hardening for virtio which is done
> > previously by the following commits are reverted:
> >
> > 9e35276a5344 ("virtio_pci: harden MSI-X interrupts")
> > 080cd7c3ac87 ("virtio-pci: harden INTX interrupts")
> >
> > The reason is that it depends on the IRQF_NO_AUTOEN which may conflict
> > with the assumption of the affinity managed IRQ that is used by some
> > virtio drivers. And what's more, it is only done for virtio-pci but
> > not other transports.
> >
> > In this rework, I try to implement a general virtio solution which
> > borrows the idea of the INTX hardening by re-using per virtqueue
> > boolean vq->broken and toggle it in virtio_device_ready() and
> > virtio_reset_device(). Then we can simply reuse the existing checks in
> > the vring_interrupt() and return early if the driver is not ready.
> >
> > Note that, I only did compile test on ccw and MMIO transport.
>
> Lockdep is unhappy with the ccw parts:
>
> ================================
> WARNING: inconsistent lock state
> 5.18.0-rc6+ #191 Not tainted
> --------------------------------
> inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage.
> kworker/u4:0/9 [HC0[0]:SC0[0]:HE1:SE1] takes:
> 00000000058e9618 (&vcdev->irq_lock){+-..}-{2:2}, at: virtio_ccw_synchronize_cbs+0x4e/0x60
> {IN-HARDIRQ-R} state was registered at:
>   __lock_acquire+0x442/0xc20
>   lock_acquire.part.0+0xdc/0x228
>   lock_acquire+0xa6/0x1b0
>   _raw_read_lock_irqsave+0x72/0x100
>   virtio_ccw_int_handler+0x84/0x238
>   ccw_device_call_handler+0x72/0xd0
>   ccw_device_irq+0x7a/0x198
>   do_cio_interrupt+0x11c/0x1d0
>   __handle_irq_event_percpu+0xc2/0x318
>   handle_irq_event_percpu+0x26/0x68
>   handle_percpu_irq+0x64/0x88
>   generic_handle_irq+0x40/0x58
>   do_irq_async+0x56/0xb0
>   do_io_irq+0x82/0x160
>   io_int_handler+0xe6/0x120
>   rcu_read_lock_sched_held+0x3e/0xb0
>   lock_acquired+0x12e/0x208
>   new_inode+0x3e/0xd0
>   debugfs_get_inode+0x22/0x68
>   __debugfs_create_file+0x78/0x1c0
>   debugfs_create_file_unsafe+0x36/0x58
>   debugfs_create_u32+0x38/0x68
>   sched_init_debug+0xb0/0x1c0
>   do_one_initcall+0x108/0x280
>   do_initcalls+0x124/0x148
>   kernel_init_freeable+0x242/0x280
>   kernel_init+0x2e/0x158
>   __ret_from_fork+0x3c/0x50
>   ret_from_fork+0xa/0x40
> irq event stamp: 539789
> hardirqs last  enabled at (539789): [<0000000000d9c632>] _raw_spin_unlock_irqrestore+0x72/0x88
> hardirqs last disabled at (539788): [<0000000000d9c2b6>] _raw_spin_lock_irqsave+0x96/0xd0
> softirqs last  enabled at (539568): [<0000000000d9e0d4>] __do_softirq+0x434/0x588
> softirqs last disabled at (539503): [<000000000018cd66>] __irq_exit_rcu+0x146/0x170
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&vcdev->irq_lock);
>   <Interrupt>
>     lock(&vcdev->irq_lock);
>
>  *** DEADLOCK ***

It looks to me we need to use write_lock_irq()/write_unlock_irq() to
do the synchronization.

And we probably need to keep the
read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
virtio_ccw_int_handler() to be called from process context (e.g from
the io_subchannel_quiesce()).

Thanks

>
> 2 locks held by kworker/u4:0/9:
>  #0: 000000000288d948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
>  #1: 000003800004bdc8 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1ea/0x658
>
> stack backtrace:
> CPU: 1 PID: 9 Comm: kworker/u4:0 Not tainted 5.18.0-rc6+ #191
> Hardware name: QEMU 8561 QEMU (KVM/Linux)
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
>  [<0000000000d8af22>] dump_stack_lvl+0x92/0xd0
>  [<00000000002032ac>] mark_lock_irq+0x864/0x968
>  [<0000000000203670>] mark_lock.part.0+0x2c0/0x790
>  [<0000000000203cea>] mark_usage+0x10a/0x178
>  [<000000000020692a>] __lock_acquire+0x442/0xc20
>  [<0000000000207cc4>] lock_acquire.part.0+0xdc/0x228
>  [<0000000000207eb6>] lock_acquire+0xa6/0x1b0
>  [<0000000000d9c774>] _raw_write_lock+0x54/0xa8
>  [<0000000000d5a1f6>] virtio_ccw_synchronize_cbs+0x4e/0x60
>  [<00000000008eec04>] register_virtio_device+0xdc/0x1b0
>  [<0000000000d5aabe>] virtio_ccw_online+0x246/0x2e8
>  [<0000000000c9fecc>] ccw_device_set_online+0x1c4/0x540
>  [<0000000000d5a05e>] virtio_ccw_auto_online+0x26/0x50
>  [<00000000001ba2b0>] async_run_entry_fn+0x40/0x108
>  [<00000000001ab9b4>] process_one_work+0x2a4/0x658
>  [<00000000001abdd0>] worker_thread+0x68/0x440
>  [<00000000001b4668>] kthread+0x128/0x130
>  [<0000000000102fac>] __ret_from_fork+0x3c/0x50
>  [<0000000000d9d3aa>] ret_from_fork+0xa/0x40
> INFO: lockdep is turned off.
>

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Halil Pasic 3 years, 12 months ago

On Wed, 11 May 2022 10:22:59 +0800
Jason Wang <jasowang@redhat.com> wrote:

> >        CPU0
> >        ----
> >   lock(&vcdev->irq_lock);
> >   <Interrupt>
> >     lock(&vcdev->irq_lock);
> >
> >  *** DEADLOCK ***  
> 
> It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> do the synchronization.
> 
> And we probably need to keep the
> read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> virtio_ccw_int_handler() to be called from process context (e.g from
> the io_subchannel_quiesce()).
> 

Sounds correct.

Regards,
Halil

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Jason Wang 3 years, 12 months ago

On Wed, May 11, 2022 at 10:02 PM Halil Pasic <pasic@linux.ibm.com> wrote:
>
> On Wed, 11 May 2022 10:22:59 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > >        CPU0
> > >        ----
> > >   lock(&vcdev->irq_lock);
> > >   <Interrupt>
> > >     lock(&vcdev->irq_lock);
> > >
> > >  *** DEADLOCK ***
> >
> > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > do the synchronization.
> >
> > And we probably need to keep the
> > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > virtio_ccw_int_handler() to be called from process context (e.g from
> > the io_subchannel_quiesce()).
> >
>
> Sounds correct.

As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
is called with irq disabled.

So I will use spin_lock()/spin_unlock() in the next version.

Thanks

>
> Regards,
> Halil
>

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Halil Pasic 3 years, 11 months ago

On Thu, 12 May 2022 11:31:08 +0800
Jason Wang <jasowang@redhat.com> wrote:

> > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > do the synchronization.
> > >
> > > And we probably need to keep the
> > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > the io_subchannel_quiesce()).
> > >  
> >
> > Sounds correct.  
> 
> As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
> is called with irq disabled.
> 
> So I will use spin_lock()/spin_unlock() in the next version.

Can we do some sort of an assertion that if the kernel is built with
the corresponding debug features will make sure this assumption holds
(and warn if it does not)? That assertion would also document the fact.

If an assertion is not possible, I think we should at least place a
strategic comment that documents our assumption.

Regards,
Halil

> 
> Thanks

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Michael S. Tsirkin 3 years, 11 months ago

On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote:
> On Thu, 12 May 2022 11:31:08 +0800
> Jason Wang <jasowang@redhat.com> wrote:
> 
> > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > > do the synchronization.
> > > >
> > > > And we probably need to keep the
> > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > > the io_subchannel_quiesce()).
> > > >  
> > >
> > > Sounds correct.  
> > 
> > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
> > is called with irq disabled.
> > 
> > So I will use spin_lock()/spin_unlock() in the next version.
> 
> Can we do some sort of an assertion that if the kernel is built with
> the corresponding debug features will make sure this assumption holds
> (and warn if it does not)? That assertion would also document the fact.

Lockdep will do this automatically if you get it wrong, just like it
did here.

> If an assertion is not possible, I think we should at least place a
> strategic comment that documents our assumption.

That can't hurt.

> Regards,
> Halil
> 
> > 
> > Thanks

Re: [PATCH V4 0/9] rework on the IRQ hardening of virtio

Posted by Jason Wang 3 years, 11 months ago

On Mon, May 16, 2022 at 10:25 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, May 16, 2022 at 01:20:06PM +0200, Halil Pasic wrote:
> > On Thu, 12 May 2022 11:31:08 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
> > > > > It looks to me we need to use write_lock_irq()/write_unlock_irq() to
> > > > > do the synchronization.
> > > > >
> > > > > And we probably need to keep the
> > > > > read_lock_irqsave()/read_lock_irqrestore() logic since I can see the
> > > > > virtio_ccw_int_handler() to be called from process context (e.g from
> > > > > the io_subchannel_quiesce()).
> > > > >
> > > >
> > > > Sounds correct.
> > >
> > > As Cornelia and Vineeth pointed out, all the paths the vring_interrupt
> > > is called with irq disabled.
> > >
> > > So I will use spin_lock()/spin_unlock() in the next version.
> >
> > Can we do some sort of an assertion that if the kernel is built with
> > the corresponding debug features will make sure this assumption holds
> > (and warn if it does not)? That assertion would also document the fact.
>
> Lockdep will do this automatically if you get it wrong, just like it
> did here.
>
> > If an assertion is not possible, I think we should at least place a
> > strategic comment that documents our assumption.
>
> That can't hurt.

I will add some comments here.

Thanks

>
> > Regards,
> > Halil
> >
> > >
> > > Thanks
>