[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices

peterx@redhat.com posted 4 patches 2 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240117091559.144730-1-peterx@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Peter Xu <peterx@redhat.com>, Jason Wang <jasowang@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>
include/sysemu/reset.h |  5 ++++
hw/core/reset.c        | 67 ++++++++++++++++++++++++++++++------------
hw/i386/intel_iommu.c  | 56 +++++++++++++++++++++++++++++++++--
3 files changed, 107 insertions(+), 21 deletions(-)
[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by peterx@redhat.com 2 years ago
From: Peter Xu <peterx@redhat.com>

There're issue reported that when syetem_reset the VM with an intel iommu
device and MT2892 PF(mlx5_core driver), the host kernel throws DMAR error.

  https://issues.redhat.com/browse/RHEL-7188

Alex quickly spot a possible issue on ordering of device resets.

It's verified by our QE team then that it is indeed the root cause of the
problem.  Consider when vIOMMU is reset before a VFIO device in a system
reset: the device can be doing DMAs even if the vIOMMU is gone; in this
specific context it means the shadow mapping can already be completely
destroyed.  Host will see these DMAs as malicious and report.

To fix it, we'll need to make sure all devices under the vIOMMU device
hierachy will be reset before the vIOMMU itself.  There's plenty of trick
inside, one can get those by reading the last patch.

I didn't check other vIOMMUs, but this series should fix the issue for VT-d
as of now.  The solution can be slightly ugly, but a beautiful one can be
very non-trivial.

Review comments welcomed, thanks.

Peter Xu (4):
  reset: qemu_register_reset_one()
  reset: Allow multiple stages of system resets
  intel_iommu: Tear down address spaces before IOMMU reset
  intel_iommu: Reset vIOMMU at the last stage of system reset

 include/sysemu/reset.h |  5 ++++
 hw/core/reset.c        | 67 ++++++++++++++++++++++++++++++------------
 hw/i386/intel_iommu.c  | 56 +++++++++++++++++++++++++++++++++--
 3 files changed, 107 insertions(+), 21 deletions(-)

-- 
2.43.0
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by Eric Auger 1 year ago
Hi,

On 1/17/24 10:15 AM, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
>
> There're issue reported that when syetem_reset the VM with an intel iommu
> device and MT2892 PF(mlx5_core driver), the host kernel throws DMAR error.
>
>   https://issues.redhat.com/browse/RHEL-7188
>
> Alex quickly spot a possible issue on ordering of device resets.
>
> It's verified by our QE team then that it is indeed the root cause of the
> problem.  Consider when vIOMMU is reset before a VFIO device in a system
> reset: the device can be doing DMAs even if the vIOMMU is gone; in this
> specific context it means the shadow mapping can already be completely
> destroyed.  Host will see these DMAs as malicious and report.
>
> To fix it, we'll need to make sure all devices under the vIOMMU device
> hierachy will be reset before the vIOMMU itself.  There's plenty of trick
> inside, one can get those by reading the last patch.

I have a case with intel iommu and vhost-net where I see that the vIOMMU
gets disabled by the guest before vhost_dev_stop() causing some spurious
lookup failures. This happens when rebooting the guest (see [PATCH]
hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
<https://lore.kernel.org/all/20250120173339.865681-1-eric.auger@redhat.com/>
https://lore.kernel.org/all/20250120173339.865681-1-eric.auger@redhat.com/
for more context).

I observe that
1) the guest disables the IOMMU through the vtd_handle_gcmd_write
2) vtd_reset (initiated from qemu_system_reset)
3) vhost_dev_stop (initiated from qemu_system_reset)

So I can effectively see the viommu is reset before vhost-net stop. 2)
is already an issue that looks the same as the one addressed in this
series. Now I am also in trouble wrt 1). I don't know yet which chain of
events causes the VTD GCMD TE bit to be written but this would also
cause spurious vhost IOLTB misses.

I haven't seen any follow-up on this series. Is anyone still looking at
this issue? Peter gave some guidance about the way to rework the reset
chain. Is it still up to date?

Thanks

Eric
>
> I didn't check other vIOMMUs, but this series should fix the issue for VT-d
> as of now.  The solution can be slightly ugly, but a beautiful one can be
> very non-trivial.
>
> Review comments welcomed, thanks.
>
> Peter Xu (4):
>   reset: qemu_register_reset_one()
>   reset: Allow multiple stages of system resets
>   intel_iommu: Tear down address spaces before IOMMU reset
>   intel_iommu: Reset vIOMMU at the last stage of system reset
>
>  include/sysemu/reset.h |  5 ++++
>  hw/core/reset.c        | 67 ++++++++++++++++++++++++++++++------------
>  hw/i386/intel_iommu.c  | 56 +++++++++++++++++++++++++++++++++--
>  3 files changed, 107 insertions(+), 21 deletions(-)
>
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by Peter Xu 1 year ago
On Thu, Jan 23, 2025 at 10:16:23AM +0100, Eric Auger wrote:
> I haven't seen any follow-up on this series. Is anyone still looking at
> this issue? Peter gave some guidance about the way to rework the reset
> chain. Is it still up to date?

I didn't continue looking at this issue since that time (and also stopped
working on vIOMMU stuff).  No plan to continue from my side..  I suppose
nobody else has either, or I should have got some email like this. :)

It may not be uptodate indeed, so may worth rechecking its validity.

Thanks,

-- 
Peter Xu
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by Eric Auger 1 year ago
Hi Peter,

On 1/23/25 6:57 PM, Peter Xu wrote:
> On Thu, Jan 23, 2025 at 10:16:23AM +0100, Eric Auger wrote:
>> I haven't seen any follow-up on this series. Is anyone still looking at
>> this issue? Peter gave some guidance about the way to rework the reset
>> chain. Is it still up to date?
> I didn't continue looking at this issue since that time (and also stopped
> working on vIOMMU stuff).  No plan to continue from my side..  I suppose
> nobody else has either, or I should have got some email like this. :)
>
> It may not be uptodate indeed, so may worth rechecking its validity.

thanks for the update. I will try to pursue the efforts then

Eric
>
> Thanks,
>
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by Eric Auger 1 year ago
Hi Peter,


On 1/23/25 6:57 PM, Peter Xu wrote:
> On Thu, Jan 23, 2025 at 10:16:23AM +0100, Eric Auger wrote:
>> I haven't seen any follow-up on this series. Is anyone still looking at
>> this issue? Peter gave some guidance about the way to rework the reset
>> chain. Is it still up to date?
> I didn't continue looking at this issue since that time (and also stopped
> working on vIOMMU stuff).  No plan to continue from my side..  I suppose
> nobody else has either, or I should have got some email like this. :)
>
> It may not be uptodate indeed, so may worth rechecking its validity.
>
> Thanks,
>
Than you for the confirmation

Eric
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by Eric Auger 2 years ago
Hi Peter,

On 1/17/24 10:15, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
>
> There're issue reported that when syetem_reset the VM with an intel iommu
system_reset
> device and MT2892 PF(mlx5_core driver), the host kernel throws DMAR error.
>
>   https://issues.redhat.com/browse/RHEL-7188
>
> Alex quickly spot a possible issue on ordering of device resets.
>
> It's verified by our QE team then that it is indeed the root cause of the
> problem.  Consider when vIOMMU is reset before a VFIO device in a system
> reset: the device can be doing DMAs even if the vIOMMU is gone; in this
> specific context it means the shadow mapping can already be completely
> destroyed.  Host will see these DMAs as malicious and report.
That's curious we did not get this earlier?
>
> To fix it, we'll need to make sure all devices under the vIOMMU device
> hierachy will be reset before the vIOMMU itself.  There's plenty of trick
> inside, one can get those by reading the last patch.
Not sure what you meant here ;-)
>
> I didn't check other vIOMMUs, but this series should fix the issue for VT-d
> as of now.  The solution can be slightly ugly, but a beautiful one can be
> very non-trivial.
>
> Review comments welcomed, thanks.
Thanks

Eric
>
> Peter Xu (4):
>   reset: qemu_register_reset_one()
>   reset: Allow multiple stages of system resets
>   intel_iommu: Tear down address spaces before IOMMU reset
>   intel_iommu: Reset vIOMMU at the last stage of system reset
>
>  include/sysemu/reset.h |  5 ++++
>  hw/core/reset.c        | 67 ++++++++++++++++++++++++++++++------------
>  hw/i386/intel_iommu.c  | 56 +++++++++++++++++++++++++++++++++--
>  3 files changed, 107 insertions(+), 21 deletions(-)
>
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
Posted by Peter Xu 2 years ago
On Wed, Jan 17, 2024 at 11:29:08AM +0100, Eric Auger wrote:
> Hi Peter,

Hi, Eric,

Thanks for the reviews!

> 
> On 1/17/24 10:15, peterx@redhat.com wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > There're issue reported that when syetem_reset the VM with an intel iommu
> system_reset
> > device and MT2892 PF(mlx5_core driver), the host kernel throws DMAR error.
> >
> >   https://issues.redhat.com/browse/RHEL-7188
> >
> > Alex quickly spot a possible issue on ordering of device resets.
> >
> > It's verified by our QE team then that it is indeed the root cause of the
> > problem.  Consider when vIOMMU is reset before a VFIO device in a system
> > reset: the device can be doing DMAs even if the vIOMMU is gone; in this
> > specific context it means the shadow mapping can already be completely
> > destroyed.  Host will see these DMAs as malicious and report.
> That's curious we did not get this earlier?

I sincerely don't know.  It could be that we just didn't test much on
system resets. Or, we could have overlooked the host dmesgs; after all the
error messages can be benign from functional pov.

> >
> > To fix it, we'll need to make sure all devices under the vIOMMU device
> > hierachy will be reset before the vIOMMU itself.  There's plenty of trick
> > inside, one can get those by reading the last patch.
> Not sure what you meant here ;-)

I meant "how to make sure all the vIOMMU managed devices will be reset
before the vIOMMU" is tricky on the implementation.  I didn't reference any
of those in the cover letter, because I think I stated mostly in patch 4, I
want to reference that patch for the details.  Since I think it's very
tricky, I left that major comment in the code to persist.

Thanks,

-- 
Peter Xu