[PATCH 0/3] s390x/pci: fix ISM reset

Matthew Rosato posted 3 patches 10 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240116223157.73752-1-mjrosato@linux.ibm.com
Maintainers: Matthew Rosato <mjrosato@linux.ibm.com>, Eric Farman <farman@linux.ibm.com>, Halil Pasic <pasic@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Richard Henderson <richard.henderson@linaro.org>, David Hildenbrand <david@redhat.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Thomas Huth <thuth@redhat.com>
There is a newer version of this series
hw/s390x/s390-pci-bus.c         | 26 ++++++++++++++++---------
hw/s390x/s390-pci-kvm.c         | 34 +++++++++++++++++++++++++++++++--
hw/s390x/s390-virtio-ccw.c      |  2 ++
include/hw/s390x/s390-pci-bus.h |  2 ++
4 files changed, 53 insertions(+), 11 deletions(-)
[PATCH 0/3] s390x/pci: fix ISM reset
Posted by Matthew Rosato 10 months, 2 weeks ago
Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
on s390x would enter an error state after reboot.  This was previously fixed
by 03451953c79e, using device reset callbacks, however the change in
ef1535901a0 effectively triggers a cold reset of the pci bus before the
device reset callbacks are triggered.

To resolve this, this series proposes to remove the use of the reset callback
for ISM cleanup and instead trigger ISM reset from subsystem_reset before 
triggering bus resets.  This has to happen before the bus resets because the
reset of s390-pcihost will trigger reset of the PCI bus followed by the
s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
unmap that ISM gets upset about.
 
  /s390-pcihost (s390-pcihost)
    /pci.0 (PCI)
    /s390-pcibus.0 (s390-pcibus)
    
While fixing this, it was also noted that kernel warnings could be seen that
indicate a guest ISC reference count error.  That's because in some reset
cases we were not bothering to disable AIF, but would again re-enable it after
the reset (causing the reference count to grow erroneously).  This was a base
issue that went unnoticed because the kernel previously did not detect and
issue a warning for this scenario. 

Matthew Rosato (3):
  s390x/pci: avoid double enable/disable of aif
  s390x/pci: refresh fh before disabling aif
  s390x/pci: drive ISM reset from subsystem reset

 hw/s390x/s390-pci-bus.c         | 26 ++++++++++++++++---------
 hw/s390x/s390-pci-kvm.c         | 34 +++++++++++++++++++++++++++++++--
 hw/s390x/s390-virtio-ccw.c      |  2 ++
 include/hw/s390x/s390-pci-bus.h |  2 ++
 4 files changed, 53 insertions(+), 11 deletions(-)

-- 
2.43.0
Re: [PATCH 0/3] s390x/pci: fix ISM reset
Posted by Michael Tokarev 10 months, 1 week ago
17.01.2024 01:31, Matthew Rosato:
> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
> on s390x would enter an error state after reboot.  This was previously fixed
> by 03451953c79e, using device reset callbacks, however the change in
> ef1535901a0 effectively triggers a cold reset of the pci bus before the
> device reset callbacks are triggered.
> 
> To resolve this, this series proposes to remove the use of the reset callback
> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
> triggering bus resets.  This has to happen before the bus resets because the
> reset of s390-pcihost will trigger reset of the PCI bus followed by the
> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
> unmap that ISM gets upset about.
>   
>    /s390-pcihost (s390-pcihost)
>      /pci.0 (PCI)
>      /s390-pcibus.0 (s390-pcibus)
>      
> While fixing this, it was also noted that kernel warnings could be seen that
> indicate a guest ISC reference count error.  That's because in some reset
> cases we were not bothering to disable AIF, but would again re-enable it after
> the reset (causing the reference count to grow erroneously).  This was a base
> issue that went unnoticed because the kernel previously did not detect and
> issue a warning for this scenario.

Is it a -stable material, or not worth picking up for stable?

Thanks,

/mjt
Re: [PATCH 0/3] s390x/pci: fix ISM reset
Posted by Thomas Huth 10 months, 1 week ago
On 18/01/2024 07.03, Michael Tokarev wrote:
> 17.01.2024 01:31, Matthew Rosato:
>> Commit ef1535901a0 (re-)introduced an issue where passthrough ISM devices
>> on s390x would enter an error state after reboot.  This was previously fixed
>> by 03451953c79e, using device reset callbacks, however the change in
>> ef1535901a0 effectively triggers a cold reset of the pci bus before the
>> device reset callbacks are triggered.
>>
>> To resolve this, this series proposes to remove the use of the reset callback
>> for ISM cleanup and instead trigger ISM reset from subsystem_reset before
>> triggering bus resets.  This has to happen before the bus resets because the
>> reset of s390-pcihost will trigger reset of the PCI bus followed by the
>> s390-pci bus, and the former will trigger vfio-pci reset / the aperture-wide
>> unmap that ISM gets upset about.
>>    /s390-pcihost (s390-pcihost)
>>      /pci.0 (PCI)
>>      /s390-pcibus.0 (s390-pcibus)
>> While fixing this, it was also noted that kernel warnings could be seen that
>> indicate a guest ISC reference count error.  That's because in some reset
>> cases we were not bothering to disable AIF, but would again re-enable it 
>> after
>> the reset (causing the reference count to grow erroneously).  This was a base
>> issue that went unnoticed because the kernel previously did not detect and
>> issue a warning for this scenario.
> 
> Is it a -stable material, or not worth picking up for stable?

It's definitely stable material, but IIUC there will be a v2 with some minor 
fixes.

  Thomas


Re: [PATCH 0/3] s390x/pci: fix ISM reset
Posted by Michael Tokarev 10 months, 1 week ago
18.01.2024 10:19, Thomas Huth:
>> Is it a -stable material, or not worth picking up for stable?
> 
> It's definitely stable material, but IIUC there will be a v2 with some minor fixes.

Yeah, I figured there will be v2.

Just to remind, - please add Cc: qemu-stable@ when appropriate (or mark it any other way,
or just forward it qemu-stable@, whatever, - just so it wont get lost).  There's no need
to do that for this patchset, as I already noticed this one :)

Thank you for the comments!

/mjt