On Wed, Jan 17, 2024 at 11:29:08AM +0100, Eric Auger wrote:
> Hi Peter,
Hi, Eric,
Thanks for the reviews!
>
> On 1/17/24 10:15, peterx@redhat.com wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > There're issue reported that when syetem_reset the VM with an intel iommu
> system_reset
> > device and MT2892 PF(mlx5_core driver), the host kernel throws DMAR error.
> >
> > https://issues.redhat.com/browse/RHEL-7188
> >
> > Alex quickly spot a possible issue on ordering of device resets.
> >
> > It's verified by our QE team then that it is indeed the root cause of the
> > problem. Consider when vIOMMU is reset before a VFIO device in a system
> > reset: the device can be doing DMAs even if the vIOMMU is gone; in this
> > specific context it means the shadow mapping can already be completely
> > destroyed. Host will see these DMAs as malicious and report.
> That's curious we did not get this earlier?
I honestly don't know. It could be that we just didn't test system resets
much. Or we could have overlooked the host dmesg output; after all, the
error messages can be benign from a functional point of view.
> >
> > To fix it, we'll need to make sure all devices under the vIOMMU device
> > hierachy will be reset before the vIOMMU itself. There's plenty of trick
> > inside, one can get those by reading the last patch.
> Not sure what you meant here ;-)
I meant that making sure all the vIOMMU-managed devices are reset before
the vIOMMU itself is tricky to implement. I didn't go into those details
in the cover letter because I think most of them are already covered in
patch 4, so I'd rather point to that patch. Since I think it's very
tricky, I put the explanation in a large code comment there so that it
persists.
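
To illustrate only the ordering requirement, here is a toy Python model. It
is not QEMU code and not how patch 4 implements it: the Device class, the
reset_order() helper and the device names are all made up for the example,
and it simply assumes the managed devices hang under the vIOMMU in one
tree. Walking that tree in post-order resets every managed device before
the vIOMMU itself:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    """Hypothetical device node; not a QEMU type."""
    name: str
    children: List["Device"] = field(default_factory=list)


def reset_order(root: Device) -> List[str]:
    """Return device names in post-order: children first, then the parent."""
    order: List[str] = []
    for child in root.children:
        order.extend(reset_order(child))
    # The parent is appended last, so it is reset only after everything
    # underneath it has stopped issuing DMA.
    order.append(root.name)
    return order


# Assumed topology: two VFIO devices managed by one vIOMMU.
viommu = Device("viommu", [Device("vfio-pci-0"), Device("vfio-pci-1")])
order = reset_order(viommu)
print(order)
```

With this ordering no device can still be doing DMA against a shadow
mapping that has already been torn down.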
Thanks,
--
Peter Xu