When a guest that uses a vhost device and is protected by an
Intel IOMMU gets rebooted, we sometimes observe a spurious warning:

Fail to lookup the translated address ffffe000

We observe that the IOMMU gets disabled through a write to the global
command register (CMAR_GCMD.TE) before the vhost device gets stopped.
When this warning happens, it can be observed that an inflight IOTLB
miss occurs after the IOMMU disable and before the vhost stop. In
that case a flat translation occurs and the check in
vhost_memory_region_lookup() fails.
Let's disable the IOTLB callbacks when all IOMMU MRs have been
unregistered.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
hw/virtio/vhost.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6aa72fd434..128c2ab094 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
break;
}
}
+ if (QLIST_EMPTY(&dev->iommu_list) &&
+ dev->vhost_ops->vhost_set_iotlb_callback) {
+ dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
+ }
}
void vhost_toggle_device_iotlb(VirtIODevice *vdev)
--
2.47.1
Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>When a guest exposed with a vhost device and protected by an
>intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>
>Fail to lookup the translated address ffffe000

Do you see this print once during one time reboot?

>
>We observe that the IOMMU gets disabled through a write to the global
>command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>When this warning happens it can be observed an inflight IOTLB
>miss occurs after the IOMMU disable and before the vhost stop. In
>that case a flat translation occurs and the check in
>vhost_memory_region_lookup() fails.
>
>Let's disable the IOTLB callbacks when all IOMMU MRs have been
>unregistered.

Try to understand the sequence, is it like below?

vhost                                   vcpu

call into vtd_iommu_translate();
                                        set s->dmar_enabled = false;
                                        switch off iommu address space;
                                        disable vhost IOTLB callbacks;
check if !s->dmar_enabled;
return flat translation and trigger warning

Thanks
Zhenzhong

>
>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>---
> hw/virtio/vhost.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>index 6aa72fd434..128c2ab094 100644
>--- a/hw/virtio/vhost.c
>+++ b/hw/virtio/vhost.c
>@@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener
>*listener,
>             break;
>         }
>     }
>+    if (QLIST_EMPTY(&dev->iommu_list) &&
>+        dev->vhost_ops->vhost_set_iotlb_callback) {
>+        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>+    }
> }
>
> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>--
>2.47.1
Hi Zhenzhong,

On 1/21/25 10:18 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>> disabled
>>
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
> Do you see this print once during one time reboot?

Actually this happens rarely on reboot. The reproducibility is of the
order of magnitude of 1/10 for me. I use a vm with vhost net device +
virtual intel iommu featuring a crontab job:
@reboot /usr/sbin/reboot

>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
> Try to understand the sequence, is it like below?
>
> vhost                                   vcpu
>
> call into vtd_iommu_translate();

No, that's a kernel vhost translate request that normally tries to find
out the translated address on kernel side in the IOTLB but since the
data is not there it ends up asking for the translation to user space ...

>
>                                         set s->dmar_enabled = false;
>                                         switch off iommu address space;
>                                         disable vhost IOTLB callbacks;

vtd_handle_gcmd_write/vtd_handle_gcmd_te which eventually calls
vhost_iommu_region_del

>
>                                         check if !s->dmar_enabled;
>                                         return flat translation and trigger warning

vhost inflight translation reaches user space through
vhost_device_iotlb_miss()

>
> Thanks
> Zhenzhong
>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/virtio/vhost.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener
>> *listener,
>>              break;
>>          }
>>      }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
>> }
>>
>> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>> --
>> 2.47.1
>-----Original Message----- >From: Eric Auger <eric.auger@redhat.com> >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets >disabled > >Hi Zhenzhong, > > >On 1/21/25 10:18 AM, Duan, Zhenzhong wrote: >> Hi Eric, >> >>> -----Original Message----- >>> From: Eric Auger <eric.auger@redhat.com> >>> Subject: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets >>> disabled >>> >>> When a guest exposed with a vhost device and protected by an >>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>> >>> Fail to lookup the translated address ffffe000 >> Do you see this print once during one time reboot? >Actually this happens rarely on reboot. The reproducibility is of the >order of magnitude of 1/10 for me. I use a vm with vhost net device + >virtual intel iommu featuring a crontab job. >@reboot /usr/sbin/reboot >> >>> We observe that the IOMMU gets disabled through a write to the global >>> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >>> When this warning happens it can be observed an inflight IOTLB >>> miss occurs after the IOMMU disable and before the vhost stop. In >>> that case a flat translation occurs and the check in >>> vhost_memory_region_lookup() fails. >>> >>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>> unregistered. >> Try to understand the sequence, is it like below? >> >> vhost vcpu >> >> call into vtd_iommu_translate(); >No that's a kernel vhost translate request that normally tries to find >out the translated address on kernel side in the IOTLB but since the >data is not there it ends up asking for the translation to user space ... 
>> >> set s->dmar_enabled = false; >> switch off iommu address space; >> disable vhost IOTLB callbacks; >vtd_handle_gcmd_write/vtd_handle_gcmd_te/vtd_handle_gcmd_te which >eventually calls vhost_iommu_region_del >> >> check if !s->dmar_enabled; >> return flat translation and trigger warning >vhost inflight translation reaches user space through >vhost_device_iotlb_miss() Understood, thanks Eric! BRs. Zhenzhong
On 20/01/2025 18:33, Eric Auger wrote:
> When a guest exposed with a vhost device and protected by an
> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>
> Fail to lookup the translated address ffffe000
>
> We observe that the IOMMU gets disabled through a write to the global
> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> When this warning happens it can be observed an inflight IOTLB
> miss occurs after the IOMMU disable and before the vhost stop. In
> that case a flat translation occurs and the check in
> vhost_memory_region_lookup() fails.
>
> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> unregistered.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/virtio/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..128c2ab094 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>              break;
>          }
>      }
> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> +    }
>  }
>
>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)

I think you need the counterpart in vhost_iommu_region_del() (for
instance if we have an add after a del that results in an empty list).
But you cannot unconditionally enable it (for instance if vhost is not
started).

Perhaps you should move the vhost_set_iotlb_callback() call from
vhost_start()/vhost_stop() to
vhost_iommu_region_add()/vhost_iommu_region_del()?

Thanks,
Laurent
Hi, On 1/21/25 9:31 AM, Laurent Vivier wrote: > On 20/01/2025 18:33, Eric Auger wrote: >> When a guest exposed with a vhost device and protected by an >> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >> >> Fail to lookup the translated address ffffe000 >> >> We observe that the IOMMU gets disabled through a write to the global >> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >> When this warning happens it can be observed an inflight IOTLB >> miss occurs after the IOMMU disable and before the vhost stop. In >> that case a flat translation occurs and the check in >> vhost_memory_region_lookup() fails. >> >> Let's disable the IOTLB callbacks when all IOMMU MRs have been >> unregistered. >> >> Signed-off-by: Eric Auger <eric.auger@redhat.com> >> --- >> hw/virtio/vhost.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >> index 6aa72fd434..128c2ab094 100644 >> --- a/hw/virtio/vhost.c >> +++ b/hw/virtio/vhost.c >> @@ -931,6 +931,10 @@ static void >> vhost_iommu_region_del(MemoryListener *listener, >> break; >> } >> } >> + if (QLIST_EMPTY(&dev->iommu_list) && >> + dev->vhost_ops->vhost_set_iotlb_callback) { >> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >> + } >> } >> void vhost_toggle_device_iotlb(VirtIODevice *vdev) > > I think you need the counterpart in vhost_iommu_region_del() (for > instance if we have an add after a del that results in an empty list). > But you cannot unconditionally enable it (for instance if vhost is not > started) if we enter vhost_iommu_region_add(), this means that the iommu_listener has been registered in vhost_vdev_start() and thus vdev->vhost_started is set. > > Perhaps you should move the vhost_set_iotlb_callback() call from > vhost_start()/vhost_stop() to > vhost_iommu_region_add()/vhost_iommu_region_del()? I currently fail to understand whether we shouldn't keep listening to iotlb callbacks when the IOMMU gets disabled. 
In that case shouldn't we flush the kernel IOTLB and update the hdev->mem->regions to reflect the IOMMR MR disablement? Thanks Eric > > Thanks, > Laurent >
On 1/21/25 9:31 AM, Laurent Vivier wrote: > On 20/01/2025 18:33, Eric Auger wrote: >> When a guest exposed with a vhost device and protected by an >> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >> >> Fail to lookup the translated address ffffe000 >> >> We observe that the IOMMU gets disabled through a write to the global >> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >> When this warning happens it can be observed an inflight IOTLB >> miss occurs after the IOMMU disable and before the vhost stop. In >> that case a flat translation occurs and the check in >> vhost_memory_region_lookup() fails. >> >> Let's disable the IOTLB callbacks when all IOMMU MRs have been >> unregistered. >> >> Signed-off-by: Eric Auger <eric.auger@redhat.com> >> --- >> hw/virtio/vhost.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >> index 6aa72fd434..128c2ab094 100644 >> --- a/hw/virtio/vhost.c >> +++ b/hw/virtio/vhost.c >> @@ -931,6 +931,10 @@ static void >> vhost_iommu_region_del(MemoryListener *listener, >> break; >> } >> } >> + if (QLIST_EMPTY(&dev->iommu_list) && >> + dev->vhost_ops->vhost_set_iotlb_callback) { >> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >> + } >> } >> void vhost_toggle_device_iotlb(VirtIODevice *vdev) > > I think you need the counterpart in vhost_iommu_region_del() (for > instance if we have an add after a del that results in an empty list). I guess you meant vhost_iommu_region_add() > But you cannot unconditionally enable it (for instance if vhost is not > started) agreed. I will further look at the control path. > > Perhaps you should move the vhost_set_iotlb_callback() call from > vhost_start()/vhost_stop() to > vhost_iommu_region_add()/vhost_iommu_region_del()? Interesting. I will study that. Thanks! Eric > > Thanks, > Laurent >
On Tue, Jan 21, 2025 at 09:31:53AM +0100, Laurent Vivier wrote: >On 20/01/2025 18:33, Eric Auger wrote: >>When a guest exposed with a vhost device and protected by an >>intel IOMMU gets rebooted, we sometimes observe a spurious warning: >> >>Fail to lookup the translated address ffffe000 >> >>We observe that the IOMMU gets disabled through a write to the global >>command register (CMAR_GCMD.TE) before the vhost device gets stopped. >>When this warning happens it can be observed an inflight IOTLB >>miss occurs after the IOMMU disable and before the vhost stop. In >>that case a flat translation occurs and the check in >>vhost_memory_region_lookup() fails. >> >>Let's disable the IOTLB callbacks when all IOMMU MRs have been >>unregistered. >> >>Signed-off-by: Eric Auger <eric.auger@redhat.com> >>--- >> hw/virtio/vhost.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>index 6aa72fd434..128c2ab094 100644 >>--- a/hw/virtio/vhost.c >>+++ b/hw/virtio/vhost.c >>@@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, >> break; >> } >> } >>+ if (QLIST_EMPTY(&dev->iommu_list) && >>+ dev->vhost_ops->vhost_set_iotlb_callback) { >>+ dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >>+ } >> } >> void vhost_toggle_device_iotlb(VirtIODevice *vdev) > >I think you need the counterpart in vhost_iommu_region_del() (for I guess you meant vhost_iommu_region_add(). I was going to comment exactly on that, I agree here. >instance if we have an add after a del that results in an empty list). >But you cannot unconditionally enable it (for instance if vhost is not >started) Good point. > >Perhaps you should move the vhost_set_iotlb_callback() call from >vhost_start()/vhost_stop() to >vhost_iommu_region_add()/vhost_iommu_region_del()? I also like this idea. Stefano
Hi Stefano, On 1/21/25 9:45 AM, Stefano Garzarella wrote: > On Tue, Jan 21, 2025 at 09:31:53AM +0100, Laurent Vivier wrote: >> On 20/01/2025 18:33, Eric Auger wrote: >>> When a guest exposed with a vhost device and protected by an >>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>> >>> Fail to lookup the translated address ffffe000 >>> >>> We observe that the IOMMU gets disabled through a write to the global >>> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >>> When this warning happens it can be observed an inflight IOTLB >>> miss occurs after the IOMMU disable and before the vhost stop. In >>> that case a flat translation occurs and the check in >>> vhost_memory_region_lookup() fails. >>> >>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>> unregistered. >>> >>> Signed-off-by: Eric Auger <eric.auger@redhat.com> >>> --- >>> hw/virtio/vhost.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>> index 6aa72fd434..128c2ab094 100644 >>> --- a/hw/virtio/vhost.c >>> +++ b/hw/virtio/vhost.c >>> @@ -931,6 +931,10 @@ static void >>> vhost_iommu_region_del(MemoryListener *listener, >>> break; >>> } >>> } >>> + if (QLIST_EMPTY(&dev->iommu_list) && >>> + dev->vhost_ops->vhost_set_iotlb_callback) { >>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >>> + } >>> } >>> void vhost_toggle_device_iotlb(VirtIODevice *vdev) >> >> I think you need the counterpart in vhost_iommu_region_del() (for > > I guess you meant vhost_iommu_region_add(). I was going to comment > exactly on that, I agree here. > >> instance if we have an add after a del that results in an empty list). >> But you cannot unconditionally enable it (for instance if vhost is >> not started) > > Good point. > >> >> Perhaps you should move the vhost_set_iotlb_callback() call from >> vhost_start()/vhost_stop() to >> vhost_iommu_region_add()/vhost_iommu_region_del()? > > I also like this idea. 
OK makes sense. I will go in this direction. Eric > > Stefano >
On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>
> When a guest exposed with a vhost device and protected by an
> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>
> Fail to lookup the translated address ffffe000
>
> We observe that the IOMMU gets disabled through a write to the global
> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> When this warning happens it can be observed an inflight IOTLB
> miss occurs after the IOMMU disable and before the vhost stop. In
> that case a flat translation occurs and the check in
> vhost_memory_region_lookup() fails.
>
> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> unregistered.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/virtio/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..128c2ab094 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>              break;
>          }
>      }
> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> +    }

So the current code assumes:

1) IOMMU is enabled before vhost starts
2) IOMMU is disabled after vhost stops

This patch seems to fix 2) but not 1). Do we need to deal with the
IOMMU enabled after vhost starts?

Thanks

> }
>
> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> --
> 2.47.1
>
Hi Jason, On 1/21/25 4:27 AM, Jason Wang wrote: > On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote: >> When a guest exposed with a vhost device and protected by an >> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >> >> Fail to lookup the translated address ffffe000 >> >> We observe that the IOMMU gets disabled through a write to the global >> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >> When this warning happens it can be observed an inflight IOTLB >> miss occurs after the IOMMU disable and before the vhost stop. In >> that case a flat translation occurs and the check in >> vhost_memory_region_lookup() fails. >> >> Let's disable the IOTLB callbacks when all IOMMU MRs have been >> unregistered. >> >> Signed-off-by: Eric Auger <eric.auger@redhat.com> >> --- >> hw/virtio/vhost.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >> index 6aa72fd434..128c2ab094 100644 >> --- a/hw/virtio/vhost.c >> +++ b/hw/virtio/vhost.c >> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, >> break; >> } >> } >> + if (QLIST_EMPTY(&dev->iommu_list) && >> + dev->vhost_ops->vhost_set_iotlb_callback) { >> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >> + } > So the current code assumes: > > 1) IOMMU is enabled before vhost starts > 2) IOMMU is disabled after vhost stops > > This patch seems to fix 2) but not 1). Do we need to deal with the > IOMMU enabled after vhost starts? sorry I initially misunderstood the above comment. Indeed in the reboot case assumption 2) happens to be wrong. However what I currently do is: stop listening to iotlb miss requests from the kernel because my understanding is those requests are just spurious ones, generate warnings and we do not care since we are rebooting the system. However I do not claim this could handle the case where the IOMMU MR would be turned off and then turned on. 
I think in that case we should also flush the kernel IOTLB and this is not taken care of in this patch. Is it a relevant use case? wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is that a valid use case as the virtio driver is using the dma api? Eric > > Thanks > >> } >> >> void vhost_toggle_device_iotlb(VirtIODevice *vdev) >> -- >> 2.47.1 >>
On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote: > > > Hi Jason, > > On 1/21/25 4:27 AM, Jason Wang wrote: > > On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote: > >> When a guest exposed with a vhost device and protected by an > >> intel IOMMU gets rebooted, we sometimes observe a spurious warning: > >> > >> Fail to lookup the translated address ffffe000 > >> > >> We observe that the IOMMU gets disabled through a write to the global > >> command register (CMAR_GCMD.TE) before the vhost device gets stopped. > >> When this warning happens it can be observed an inflight IOTLB > >> miss occurs after the IOMMU disable and before the vhost stop. In > >> that case a flat translation occurs and the check in > >> vhost_memory_region_lookup() fails. > >> > >> Let's disable the IOTLB callbacks when all IOMMU MRs have been > >> unregistered. > >> > >> Signed-off-by: Eric Auger <eric.auger@redhat.com> > >> --- > >> hw/virtio/vhost.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >> index 6aa72fd434..128c2ab094 100644 > >> --- a/hw/virtio/vhost.c > >> +++ b/hw/virtio/vhost.c > >> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, > >> break; > >> } > >> } > >> + if (QLIST_EMPTY(&dev->iommu_list) && > >> + dev->vhost_ops->vhost_set_iotlb_callback) { > >> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); > >> + } > > So the current code assumes: > > > > 1) IOMMU is enabled before vhost starts > > 2) IOMMU is disabled after vhost stops > > > > This patch seems to fix 2) but not 1). Do we need to deal with the > > IOMMU enabled after vhost starts? > > sorry I initially misunderstood the above comment. Indeed in the reboot > case assumption 2) happens to be wrong. 
> However what I currently do is:
> stop listening to iotlb miss requests from the kernel because my
> understanding is those requests are just spurious ones, generate
> warnings and we do not care since we are rebooting the system.
>
> However I do not claim this could handle the case where the IOMMU MR
> would be turned off and then turned on. I think in that case we should
> also flush the kernel IOTLB and this is not taken care of in this patch.
> Is it a relevant use case?

Not sure.

>
> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> that a valid use case as the virtio driver is using the dma api?

It should not be but we can't assume the behaviour of the guest. It
could be buggy or even malicious.

Btw, we had the following codes while handling te:

/* Handle Translation Enable/Disable */
static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
{
    if (s->dmar_enabled == en) {
        return;
    }

    trace_vtd_dmar_enable(en);

    ...

    vtd_reset_caches(s);
    vtd_address_space_refresh_all(s);
}

vtd_address_space_refresh_all() will basically disable the iommu
memory region. It looks not sufficient to trigger the region_del
callback, maybe we should delete the region or introduce listener
callback?

Thanks

>
> Eric
>
> >
> > Thanks
> >
> >> }
> >>
> >> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >> --
> >> 2.47.1
> >>
>
Hi Jason, On 1/22/25 8:17 AM, Jason Wang wrote: > On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote: >> >> Hi Jason, >> >> On 1/21/25 4:27 AM, Jason Wang wrote: >>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote: >>>> When a guest exposed with a vhost device and protected by an >>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>>> >>>> Fail to lookup the translated address ffffe000 >>>> >>>> We observe that the IOMMU gets disabled through a write to the global >>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >>>> When this warning happens it can be observed an inflight IOTLB >>>> miss occurs after the IOMMU disable and before the vhost stop. In >>>> that case a flat translation occurs and the check in >>>> vhost_memory_region_lookup() fails. >>>> >>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>>> unregistered. >>>> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> >>>> --- >>>> hw/virtio/vhost.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>>> index 6aa72fd434..128c2ab094 100644 >>>> --- a/hw/virtio/vhost.c >>>> +++ b/hw/virtio/vhost.c >>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, >>>> break; >>>> } >>>> } >>>> + if (QLIST_EMPTY(&dev->iommu_list) && >>>> + dev->vhost_ops->vhost_set_iotlb_callback) { >>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >>>> + } >>> So the current code assumes: >>> >>> 1) IOMMU is enabled before vhost starts >>> 2) IOMMU is disabled after vhost stops >>> >>> This patch seems to fix 2) but not 1). Do we need to deal with the >>> IOMMU enabled after vhost starts? >> sorry I initially misunderstood the above comment. Indeed in the reboot >> case assumption 2) happens to be wrong. 
However what I currently do is: >> stop listening to iotlb miss requests from the kernel because my >> understanding is those requests are just spurious ones, generate >> warnings and we do not care since we are rebooting the system. >> >> However I do not claim this could handle the case where the IOMMU MR >> would be turned off and then turned on. I think in that case we should >> also flush the kernel IOTLB and this is not taken care of in this patch. >> Is it a relevant use case? > Not sure. > >> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is >> that a valid use case as the virtio driver is using the dma api? > It should not be but we can't assume the behaviour of the guest. It > could be buggy or even malicious. agreed > > Btw, we had the following codes while handling te: > > /* Handle Translation Enable/Disable */ > static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) > { > if (s->dmar_enabled == en) { > return; > } > > trace_vtd_dmar_enable(en); > > ... > > vtd_reset_caches(s); > vtd_address_space_refresh_all(s); > } > > vtd_address_space_refresh_all() will basically disable the iommu > memory region. It looks not sufficient to trigger the region_del > callback, maybe we should delete the region or introduce listener > callback? This is exactly the code path which is entered in my use case. vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to? Thanks Eric > > Thanks > >> Eric >> >> >>> Thanks >>> >>>> } >>>> >>>> void vhost_toggle_device_iotlb(VirtIODevice *vdev) >>>> -- >>>> 2.47.1 >>>>
On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: > > Hi Jason, > > > On 1/22/25 8:17 AM, Jason Wang wrote: > > On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote: > >> > >> Hi Jason, > >> > >> On 1/21/25 4:27 AM, Jason Wang wrote: > >>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote: > >>>> When a guest exposed with a vhost device and protected by an > >>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: > >>>> > >>>> Fail to lookup the translated address ffffe000 > >>>> > >>>> We observe that the IOMMU gets disabled through a write to the global > >>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped. > >>>> When this warning happens it can be observed an inflight IOTLB > >>>> miss occurs after the IOMMU disable and before the vhost stop. In > >>>> that case a flat translation occurs and the check in > >>>> vhost_memory_region_lookup() fails. > >>>> > >>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been > >>>> unregistered. > >>>> > >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> > >>>> --- > >>>> hw/virtio/vhost.c | 4 ++++ > >>>> 1 file changed, 4 insertions(+) > >>>> > >>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >>>> index 6aa72fd434..128c2ab094 100644 > >>>> --- a/hw/virtio/vhost.c > >>>> +++ b/hw/virtio/vhost.c > >>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, > >>>> break; > >>>> } > >>>> } > >>>> + if (QLIST_EMPTY(&dev->iommu_list) && > >>>> + dev->vhost_ops->vhost_set_iotlb_callback) { > >>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); > >>>> + } > >>> So the current code assumes: > >>> > >>> 1) IOMMU is enabled before vhost starts > >>> 2) IOMMU is disabled after vhost stops > >>> > >>> This patch seems to fix 2) but not 1). Do we need to deal with the > >>> IOMMU enabled after vhost starts? > >> sorry I initially misunderstood the above comment. 
Indeed in the reboot > >> case assumption 2) happens to be wrong. However what I currently do is: > >> stop listening to iotlb miss requests from the kernel because my > >> understanding is those requests are just spurious ones, generate > >> warnings and we do not care since we are rebooting the system. > >> > >> However I do not claim this could handle the case where the IOMMU MR > >> would be turned off and then turned on. I think in that case we should > >> also flush the kernel IOTLB and this is not taken care of in this patch. > >> Is it a relevant use case? > > Not sure. > > > >> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is > >> that a valid use case as the virtio driver is using the dma api? > > It should not be but we can't assume the behaviour of the guest. It > > could be buggy or even malicious. > > agreed > > > > Btw, we had the following codes while handling te: > > > > /* Handle Translation Enable/Disable */ > > static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) > > { > > if (s->dmar_enabled == en) { > > return; > > } > > > > trace_vtd_dmar_enable(en); > > > > ... > > > > vtd_reset_caches(s); > > vtd_address_space_refresh_all(s); > > } > > > > vtd_address_space_refresh_all() will basically disable the iommu > > memory region. It looks not sufficient to trigger the region_del > > callback, maybe we should delete the region or introduce listener > > callback? > > This is exactly the code path which is entered in my use case. > > vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to? Adding Peter for more ideas. 
I think it's better to find a way to trigger the listener here, probably:

1) add/delete the memory regions instead of enable/disable

or

2) introduce new listener ops that can be triggered when a region is
enabled or disabled

Thanks

>
> Thanks
>
> Eric
>
> >
> > Thanks
> >
> >> Eric
> >>
> >>
> >>> Thanks
> >>>
> >>>> }
> >>>>
> >>>> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >>>> --
> >>>> 2.47.1
> >>>>
>
Hi Jason, On 1/23/25 2:34 AM, Jason Wang wrote: > On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: >> Hi Jason, >> >> >> On 1/22/25 8:17 AM, Jason Wang wrote: >>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote: >>>> Hi Jason, >>>> >>>> On 1/21/25 4:27 AM, Jason Wang wrote: >>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote: >>>>>> When a guest exposed with a vhost device and protected by an >>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>>>>> >>>>>> Fail to lookup the translated address ffffe000 >>>>>> >>>>>> We observe that the IOMMU gets disabled through a write to the global >>>>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped. >>>>>> When this warning happens it can be observed an inflight IOTLB >>>>>> miss occurs after the IOMMU disable and before the vhost stop. In >>>>>> that case a flat translation occurs and the check in >>>>>> vhost_memory_region_lookup() fails. >>>>>> >>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>>>>> unregistered. >>>>>> >>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> >>>>>> --- >>>>>> hw/virtio/vhost.c | 4 ++++ >>>>>> 1 file changed, 4 insertions(+) >>>>>> >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>>>>> index 6aa72fd434..128c2ab094 100644 >>>>>> --- a/hw/virtio/vhost.c >>>>>> +++ b/hw/virtio/vhost.c >>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, >>>>>> break; >>>>>> } >>>>>> } >>>>>> + if (QLIST_EMPTY(&dev->iommu_list) && >>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) { >>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >>>>>> + } >>>>> So the current code assumes: >>>>> >>>>> 1) IOMMU is enabled before vhost starts >>>>> 2) IOMMU is disabled after vhost stops >>>>> >>>>> This patch seems to fix 2) but not 1). Do we need to deal with the >>>>> IOMMU enabled after vhost starts? 
>>>> sorry I initially misunderstood the above comment. Indeed in the reboot >>>> case assumption 2) happens to be wrong. However what I currently do is: >>>> stop listening to iotlb miss requests from the kernel because my >>>> understanding is those requests are just spurious ones, generate >>>> warnings and we do not care since we are rebooting the system. >>>> >>>> However I do not claim this could handle the case where the IOMMU MR >>>> would be turned off and then turned on. I think in that case we should >>>> also flush the kernel IOTLB and this is not taken care of in this patch. >>>> Is it a relevant use case? >>> Not sure. >>> >>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is >>>> that a valid use case as the virtio driver is using the dma api? >>> It should not be but we can't assume the behaviour of the guest. It >>> could be buggy or even malicious. >> agreed >>> Btw, we had the following codes while handling te: >>> >>> /* Handle Translation Enable/Disable */ >>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) >>> { >>> if (s->dmar_enabled == en) { >>> return; >>> } >>> >>> trace_vtd_dmar_enable(en); >>> >>> ... >>> >>> vtd_reset_caches(s); >>> vtd_address_space_refresh_all(s); >>> } >>> >>> vtd_address_space_refresh_all() will basically disable the iommu >>> memory region. It looks not sufficient to trigger the region_del >>> callback, maybe we should delete the region or introduce listener >>> callback? >> This is exactly the code path which is entered in my use case. >> >> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to? > Adding Peter for more ideas. 
> I think it's better to find a way to trigger the listener here, probably:
>
> 1) add/delete the memory regions instead of enable/disable

sorry I don't understand what you mean. The vhost_iommu_region_del call
stack is provided below [1]. Write to the intel iommu GCMD TE bit
induces a call to vhost_iommu_region_del. This happens before the
vhost_dev_stop whose call stack is provided below [2] and originates
from a bus reset.

We may have inflight IOTLB miss requests happening between both.

If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
translated anymore by the IOMMU and the iotlb.translated_addr returned
by address_space_get_iotlb_entry() is not within the registered
vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
the "Fail to lookup the translated address" message.

It sounds wrong that vhost keeps on using IOVAs that are not translated
anymore. It looks like we have a reset ordering issue and this patch is
just removing the symptom and not the cause.

At the moment I don't really get what is initiating the intel iommu TE
bit write. This is a guest action but is it initiated from a misordered
qemu event?
It could also relate to
[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices

Thanks

Eric

[1]
#0  vhost_iommu_region_del (listener=0x55555708a388, section=0x7ffff62bf030) at ../hw/virtio/vhost.c:927
#1  0x0000555555c851b4 in address_space_update_topology_pass (as=0x55555822e4f0, old_view=0x7fffe8e81850, new_view=0x55555723dfa0, adding=false) at ../system/memory.c:977
#2  0x0000555555c857a0 in address_space_set_flatview (as=0x55555822e4f0) at ../system/memory.c:1079
#3  0x0000555555c85986 in memory_region_transaction_commit () at ../system/memory.c:1132
#4  0x0000555555c89f25 in memory_region_set_enabled (mr=0x555557dc0d30, enabled=false) at ../system/memory.c:2686
#5  0x0000555555b5edb1 in vtd_switch_address_space (as=0x555557dc0c60) at ../hw/i386/intel_iommu.c:1735
#6  0x0000555555b5ee6f in vtd_switch_address_space_all (s=0x555557db1500) at ../hw/i386/intel_iommu.c:1779
#7  0x0000555555b64533 in vtd_address_space_refresh_all (s=0x555557db1500) at ../hw/i386/intel_iommu.c:4006
#8  0x0000555555b60770 in vtd_handle_gcmd_te (s=0x555557db1500, en=false) at ../hw/i386/intel_iommu.c:2419
#9  0x0000555555b60885 in vtd_handle_gcmd_write (s=0x555557db1500) at ../hw/i386/intel_iommu.c:2449
#10 0x0000555555b61db2 in vtd_mem_write (opaque=0x555557db1500, addr=24, val=100663296, size=4) at ../hw/i386/intel_iommu.c:2991
#11 0x0000555555c833e0 in memory_region_write_accessor (mr=0x555557db1840, addr=24, value=0x7ffff62bf3d8, size=4, shift=0, mask=4294967295, attrs=...) at ../system/memory.c:497
#12 0x0000555555c83711 in access_with_adjusted_size (addr=24, value=0x7ffff62bf3d8, size=4, access_size_min=4, access_size_max=8, access_fn=0x555555c832ea <memory_region_write_accessor>, mr=0x555557db1840, attrs=...) at ../system/memory.c:573
#13 0x0000555555c8698b in memory_region_dispatch_write (mr=0x555557db1840, addr=24, data=100663296, op=MO_32, attrs=...) at ../system/memory.c:1521
#14 0x0000555555c95619 in flatview_write_continue_step (attrs=..., buf=0x7ffff7fbb028 "", len=4, mr_addr=24, l=0x7ffff62bf4c0, mr=0x555557db1840) at ../system/physmem.c:2803
#15 0x0000555555c956eb in flatview_write_continue (fv=0x7fffe852d370, addr=4275634200, attrs=..., ptr=0x7ffff7fbb028, len=4, mr_addr=24, l=4, mr=0x555557db1840) at ../system/physmem.c:2833
#16 0x0000555555c957f9 in flatview_write (fv=0x7fffe852d370, addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at ../system/physmem.c:2864
#17 0x0000555555c95c39 in address_space_write (as=0x555556ff1720 <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at ../system/physmem.c:2984
#18 0x0000555555c95cb1 in address_space_rw (as=0x555556ff1720 <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4, is_write=true) at ../system/physmem.c:2994
#19 0x0000555555ceeb56 in kvm_cpu_exec (cpu=0x55555727e950) at ../accel/kvm/kvm-all.c:3188
#20 0x0000555555cf1ea6 in kvm_vcpu_thread_fn (arg=0x55555727e950) at ../accel/kvm/kvm-accel-ops.c:50
#21 0x0000555555f6de20 in qemu_thread_start (args=0x555557288960) at ../util/qemu-thread-posix.c:541
#22 0x00007ffff7289e92 in start_thread () at /lib64/libc.so.6

[2]
#0  vhost_dev_stop (hdev=0x55555708a2c0, vdev=0x555558234cb0, vrings=false) at ../hw/virtio/vhost.c:2222
#1  0x0000555555990266 in vhost_net_stop_one (net=0x55555708a2c0, dev=0x555558234cb0) at ../hw/net/vhost_net.c:350
#2  0x00005555559906ea in vhost_net_stop (dev=0x555558234cb0, ncs=0x55555826f968, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:462
#3  0x0000555555c1ad0a in virtio_net_vhost_status (n=0x555558234cb0, status=0 '\000') at ../hw/net/virtio-net.c:318
#4  0x0000555555c1afaf in virtio_net_set_status (vdev=0x555558234cb0, status=0 '\000') at ../hw/net/virtio-net.c:393
#5  0x0000555555c591bd in virtio_set_status (vdev=0x555558234cb0, val=0 '\000') at ../hw/virtio/virtio.c:2242
#6  0x0000555555c595eb in virtio_reset (opaque=0x555558234cb0) at ../hw/virtio/virtio.c:2325
#7  0x0000555555a0d1e1 in virtio_bus_reset (bus=0x555558234c30) at ../hw/virtio/virtio-bus.c:109
#8  0x0000555555a13d51 in virtio_pci_reset (qdev=0x55555822c830) at ../hw/virtio/virtio-pci.c:2282
#9  0x0000555555a14016 in virtio_pci_bus_reset_hold (obj=0x55555822c830, type=RESET_TYPE_COLD) at ../hw/virtio/virtio-pci.c:2322
#10 0x0000555555d08831 in resettable_phase_hold (obj=0x55555822c830, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:180
#11 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#12 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20, obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#13 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ffa3c0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#14 0x0000555555d0543b in device_reset_child_foreach (obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
#15 0x0000555555d084d8 in resettable_child_foreach (rc=0x55555715a090, obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#16 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ff98e0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#17 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#18 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20, obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#19 0x0000555555d08792 in resettable_phase_hold (obj=0x555557445120, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#20 0x0000555555d0543b in device_reset_child_foreach (obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
#21 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555570cf410, obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#22 0x0000555555d08792 in resettable_phase_hold (obj=0x5555572d0a00, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#23 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#24 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555571bfde0, obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#25 0x0000555555d08792 in resettable_phase_hold (obj=0x55555723d270, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#26 0x0000555555d06d6d in resettable_container_child_foreach (obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resetcontainer.c:54
#27 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555572180f0, obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#28 0x0000555555d08792 in resettable_phase_hold (obj=0x55555724a8a0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#29 0x0000555555d0838d in resettable_assert_reset (obj=0x55555724a8a0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:58
#30 0x0000555555d082e5 in resettable_reset (obj=0x55555724a8a0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
#31 0x00005555558db5c8 in qemu_devices_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/core/reset.c:179
#32 0x0000555555b6f5b2 in pc_machine_reset (machine=0x555557234490, reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/i386/pc.c:1877
#33 0x0000555555a5a0a2 in qemu_system_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:516
#34 0x0000555555a5a891 in main_loop_should_exit (status=0x7fffffffd574) at ../system/runstate.c:792
#35 0x0000555555a5a992 in qemu_main_loop () at ../system/runstate.c:825
#36 0x0000555555e9cced in qemu_default_main () at ../system/main.c:37
#37 0x0000555555e9cd2a in main (argc=58, argv=0x7fffffffd6d8) at ../system/main.c:48

> or
>
> 2) introduce new listener ops that can be triggered when a region is
> enabled or disabled
>
> Thanks
>
>> Thanks
>>
>> Eric
>>
>>> Thanks
>>>
>>>> Eric
>>>>
>>>>> Thanks
>>>>>
>>>>>>     }
>>>>>>
>>>>>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>>>>>> --
>>>>>> 2.47.1
>-----Original Message----- >From: Eric Auger <eric.auger@redhat.com> >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets >disabled > >Hi Jason, > > >On 1/23/25 2:34 AM, Jason Wang wrote: >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: >>> Hi Jason, >>> >>> >>> On 1/22/25 8:17 AM, Jason Wang wrote: >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> >wrote: >>>>> Hi Jason, >>>>> >>>>> On 1/21/25 4:27 AM, Jason Wang wrote: >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> >wrote: >>>>>>> When a guest exposed with a vhost device and protected by an >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>>>>>> >>>>>>> Fail to lookup the translated address ffffe000 >>>>>>> >>>>>>> We observe that the IOMMU gets disabled through a write to the global >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets >stopped. >>>>>>> When this warning happens it can be observed an inflight IOTLB >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In >>>>>>> that case a flat translation occurs and the check in >>>>>>> vhost_memory_region_lookup() fails. >>>>>>> >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>>>>>> unregistered. 
>>>>>>> >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> >>>>>>> --- >>>>>>> hw/virtio/vhost.c | 4 ++++ >>>>>>> 1 file changed, 4 insertions(+) >>>>>>> >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>>>>>> index 6aa72fd434..128c2ab094 100644 >>>>>>> --- a/hw/virtio/vhost.c >>>>>>> +++ b/hw/virtio/vhost.c >>>>>>> @@ -931,6 +931,10 @@ static void >vhost_iommu_region_del(MemoryListener *listener, >>>>>>> break; >>>>>>> } >>>>>>> } >>>>>>> + if (QLIST_EMPTY(&dev->iommu_list) && >>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) { >>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >>>>>>> + } >>>>>> So the current code assumes: >>>>>> >>>>>> 1) IOMMU is enabled before vhost starts >>>>>> 2) IOMMU is disabled after vhost stops >>>>>> >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the >>>>>> IOMMU enabled after vhost starts? >>>>> sorry I initially misunderstood the above comment. Indeed in the reboot >>>>> case assumption 2) happens to be wrong. However what I currently do is: >>>>> stop listening to iotlb miss requests from the kernel because my >>>>> understanding is those requests are just spurious ones, generate >>>>> warnings and we do not care since we are rebooting the system. >>>>> >>>>> However I do not claim this could handle the case where the IOMMU MR >>>>> would be turned off and then turned on. I think in that case we should >>>>> also flush the kernel IOTLB and this is not taken care of in this patch. >>>>> Is it a relevant use case? >>>> Not sure. >>>> >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is >>>>> that a valid use case as the virtio driver is using the dma api? >>>> It should not be but we can't assume the behaviour of the guest. It >>>> could be buggy or even malicious. 
>>> agreed >>>> Btw, we had the following codes while handling te: >>>> >>>> /* Handle Translation Enable/Disable */ >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) >>>> { >>>> if (s->dmar_enabled == en) { >>>> return; >>>> } >>>> >>>> trace_vtd_dmar_enable(en); >>>> >>>> ... >>>> >>>> vtd_reset_caches(s); >>>> vtd_address_space_refresh_all(s); >>>> } >>>> >>>> vtd_address_space_refresh_all() will basically disable the iommu >>>> memory region. It looks not sufficient to trigger the region_del >>>> callback, maybe we should delete the region or introduce listener >>>> callback? >>> This is exactly the code path which is entered in my use case. >>> >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But >given the current implement of this latter the IOTLB callback is not unset and the >kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are >not updated? shouldn't they. Can you explain what they correspond to? >> Adding Peter for more ideas. >> >> I think it's better to find a way to trigger the listener here, probably: >> >> 1) add/delete the memory regions instead of enable/disable >sorry I don't understand what you mean. The vhost_iommu_region_del call >stack is provided below [1]. Write to the intel iommu GCMD TE bit >induces a call to vhost_iommu_region_del. This happens before the >vhost_dev_stop whose call stack is provided below [2] and originates >from a bus reset. > >We may have inflight IOTLB miss requests happening between both. > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not >translated anymore by the IOMMU and the iotlb.translated_addr returned >by address_space_get_iotlb_entry() is not within the registered >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence >the "Fail to lookup the translated address" message. > >It sounds wrong that vhost keeps on using IOVAs that are not translated >anymore. 
>It looks like we have a reset ordering issue and this patch is just
>removing the symptom and not the cause.
>
>At the moment I don't really get what is initiating the intel iommu TE
>bit write. This is a guest action but is it initiated from a misordered
>qemu event?

During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
to disable the iommu before reset. So Peter's patch will not address your issue.

Before iommu shutdown, device_shutdown() is called to shut down all devices.
Not clear why vhost is still active.

Thanks
Zhenzhong

>
>It could also relate to
>[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
>the+rest+of+devices
Hi On 1/24/25 3:44 AM, Duan, Zhenzhong wrote: > >> -----Original Message----- >> From: Eric Auger <eric.auger@redhat.com> >> Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets >> disabled >> >> Hi Jason, >> >> >> On 1/23/25 2:34 AM, Jason Wang wrote: >>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: >>>> Hi Jason, >>>> >>>> >>>> On 1/22/25 8:17 AM, Jason Wang wrote: >>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> >> wrote: >>>>>> Hi Jason, >>>>>> >>>>>> On 1/21/25 4:27 AM, Jason Wang wrote: >>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> >> wrote: >>>>>>>> When a guest exposed with a vhost device and protected by an >>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>>>>>>> >>>>>>>> Fail to lookup the translated address ffffe000 >>>>>>>> >>>>>>>> We observe that the IOMMU gets disabled through a write to the global >>>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets >> stopped. >>>>>>>> When this warning happens it can be observed an inflight IOTLB >>>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In >>>>>>>> that case a flat translation occurs and the check in >>>>>>>> vhost_memory_region_lookup() fails. >>>>>>>> >>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>>>>>>> unregistered. 
>>>>>>>> >>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> >>>>>>>> --- >>>>>>>> hw/virtio/vhost.c | 4 ++++ >>>>>>>> 1 file changed, 4 insertions(+) >>>>>>>> >>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>>>>>>> index 6aa72fd434..128c2ab094 100644 >>>>>>>> --- a/hw/virtio/vhost.c >>>>>>>> +++ b/hw/virtio/vhost.c >>>>>>>> @@ -931,6 +931,10 @@ static void >> vhost_iommu_region_del(MemoryListener *listener, >>>>>>>> break; >>>>>>>> } >>>>>>>> } >>>>>>>> + if (QLIST_EMPTY(&dev->iommu_list) && >>>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) { >>>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); >>>>>>>> + } >>>>>>> So the current code assumes: >>>>>>> >>>>>>> 1) IOMMU is enabled before vhost starts >>>>>>> 2) IOMMU is disabled after vhost stops >>>>>>> >>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the >>>>>>> IOMMU enabled after vhost starts? >>>>>> sorry I initially misunderstood the above comment. Indeed in the reboot >>>>>> case assumption 2) happens to be wrong. However what I currently do is: >>>>>> stop listening to iotlb miss requests from the kernel because my >>>>>> understanding is those requests are just spurious ones, generate >>>>>> warnings and we do not care since we are rebooting the system. >>>>>> >>>>>> However I do not claim this could handle the case where the IOMMU MR >>>>>> would be turned off and then turned on. I think in that case we should >>>>>> also flush the kernel IOTLB and this is not taken care of in this patch. >>>>>> Is it a relevant use case? >>>>> Not sure. >>>>> >>>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is >>>>>> that a valid use case as the virtio driver is using the dma api? >>>>> It should not be but we can't assume the behaviour of the guest. It >>>>> could be buggy or even malicious. 
>>>> agreed >>>>> Btw, we had the following codes while handling te: >>>>> >>>>> /* Handle Translation Enable/Disable */ >>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) >>>>> { >>>>> if (s->dmar_enabled == en) { >>>>> return; >>>>> } >>>>> >>>>> trace_vtd_dmar_enable(en); >>>>> >>>>> ... >>>>> >>>>> vtd_reset_caches(s); >>>>> vtd_address_space_refresh_all(s); >>>>> } >>>>> >>>>> vtd_address_space_refresh_all() will basically disable the iommu >>>>> memory region. It looks not sufficient to trigger the region_del >>>>> callback, maybe we should delete the region or introduce listener >>>>> callback? >>>> This is exactly the code path which is entered in my use case. >>>> >>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But >> given the current implement of this latter the IOTLB callback is not unset and the >> kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are >> not updated? shouldn't they. Can you explain what they correspond to? >>> Adding Peter for more ideas. >>> >>> I think it's better to find a way to trigger the listener here, probably: >>> >>> 1) add/delete the memory regions instead of enable/disable >> sorry I don't understand what you mean. The vhost_iommu_region_del call >> stack is provided below [1]. Write to the intel iommu GCMD TE bit >> induces a call to vhost_iommu_region_del. This happens before the >> vhost_dev_stop whose call stack is provided below [2] and originates > >from a bus reset. >> We may have inflight IOTLB miss requests happening between both. >> >> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not >> translated anymore by the IOMMU and the iotlb.translated_addr returned >> by address_space_get_iotlb_entry() is not within the registered >> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence >> the "Fail to lookup the translated address" message. 
>> It sounds wrong that vhost keeps on using IOVAs that are not translated
>> anymore. It looks like we have a reset ordering issue and this patch is
>> just removing the symptom and not the cause.
>>
>> At the moment I don't really get what is initiating the intel iommu TE
>> bit write. This is a guest action but is it initiated from a misordered
>> qemu event?
> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> to disable the iommu before reset. So Peter's patch will not address your issue.
>
> Before iommu shutdown, device_shutdown() is called to shut down all devices.
> Not clear why vhost is still active.

I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
and it fixes my issue.

Eric

> Thanks
> Zhenzhong
>
>> It could also relate to
>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
>> the+rest+of+devices
On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
>
> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
>
> and it fixes my issue
>
> Eric

To make sure, we are dropping this in favor of the guest fix (which
I intend to merge shortly), correct?
Hi Michael,

On 2/20/25 4:25 PM, Michael S. Tsirkin wrote:
> On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
>> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
>> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
>>
>> and it fixes my issue
>>
>> Eric
>
> To make sure, we are dropping this in favor of the guest fix (which
> I intend to merge shortly), correct?

Yes, this should be dropped in favour of the implementation of the
shutdown callback on the guest side. Did you send your patch? I did not
see it.

However, on the qemu side, there is "[PATCH v3 0/5] Fix vIOMMU reset order"
that needs to be merged to fix qmp system_reset.

Thanks

Eric
On Thu, Feb 20, 2025 at 04:57:51PM +0100, Eric Auger wrote:
> Hi Michael,
>
> On 2/20/25 4:25 PM, Michael S. Tsirkin wrote:
>> On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
>>> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
>>> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
>>>
>>> and it fixes my issue
>>>
>>> Eric
>>
>> To make sure, we are dropping this in favor of the guest fix (which
>> I intend to merge shortly), correct?
>
> Yes this should be dropped in favour of the implementation of the
> shutdown callback on the guest side. Did you send your patch? I did not
> see it.

Will send tomorrow.

> however on the qemu side, there is "[PATCH v3 0/5] Fix vIOMMU reset order"
> that needs to be merged to fix qmp system_reset.

Yes that one I'm testing.

> Thanks
>
> Eric
Hi Zhenzhong, On 1/24/25 3:44 AM, Duan, Zhenzhong wrote: > >> -----Original Message----- >> From: Eric Auger <eric.auger@redhat.com> >> Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets >> disabled >> >> Hi Jason, >> >> >> On 1/23/25 2:34 AM, Jason Wang wrote: >>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: >>>> Hi Jason, >>>> >>>> >>>> On 1/22/25 8:17 AM, Jason Wang wrote: >>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> >> wrote: >>>>>> Hi Jason, >>>>>> >>>>>> On 1/21/25 4:27 AM, Jason Wang wrote: >>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> >> wrote: >>>>>>>> When a guest exposed with a vhost device and protected by an >>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: >>>>>>>> >>>>>>>> Fail to lookup the translated address ffffe000 >>>>>>>> >>>>>>>> We observe that the IOMMU gets disabled through a write to the global >>>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets >> stopped. >>>>>>>> When this warning happens it can be observed an inflight IOTLB >>>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In >>>>>>>> that case a flat translation occurs and the check in >>>>>>>> vhost_memory_region_lookup() fails. >>>>>>>> >>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been >>>>>>>> unregistered. 
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>> ---
>>>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>>>>>>>              break;
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>>>> +    }
>>>>>>> So the current code assumes:
>>>>>>>
>>>>>>> 1) IOMMU is enabled before vhost starts
>>>>>>> 2) IOMMU is disabled after vhost stops
>>>>>>>
>>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>>>> IOMMU enabled after vhost starts?
>>>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
>>>>>> case assumption 2) happens to be wrong. However what I currently do is:
>>>>>> stop listening to iotlb miss requests from the kernel, because my
>>>>>> understanding is that those requests are just spurious ones that generate
>>>>>> warnings, and we do not care since we are rebooting the system.
>>>>>>
>>>>>> However I do not claim this could handle the case where the IOMMU MR
>>>>>> would be turned off and then turned on. I think in that case we should
>>>>>> also flush the kernel IOTLB, and this is not taken care of in this patch.
>>>>>> Is it a relevant use case?
>>>>> Not sure.
>>>>>
>>>>>> wrt removing assumption 1) and allowing the IOMMU to be enabled after
>>>>>> vhost starts: is that a valid use case, given the virtio driver is using
>>>>>> the DMA API?
>>>>> It should not be, but we can't assume the behaviour of the guest. It
>>>>> could be buggy or even malicious.
>>>> agreed
>>>>> Btw, we had the following code while handling TE:
>>>>>
>>>>> /* Handle Translation Enable/Disable */
>>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>>>> {
>>>>>     if (s->dmar_enabled == en) {
>>>>>         return;
>>>>>     }
>>>>>
>>>>>     trace_vtd_dmar_enable(en);
>>>>>
>>>>>     ...
>>>>>
>>>>>     vtd_reset_caches(s);
>>>>>     vtd_address_space_refresh_all(s);
>>>>> }
>>>>>
>>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>>> memory region. It looks not sufficient to trigger the region_del
>>>>> callback, maybe we should delete the region or introduce a listener
>>>>> callback?
>>>> This is exactly the code path which is entered in my use case.
>>>>
>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>>>> given the current implementation of the latter, the IOTLB callback is not
>>>> unset and the kernel IOTLB is not refreshed. Also, as I pointed out, the
>>>> hdev->mem->regions are not updated? Shouldn't they be? Can you explain
>>>> what they correspond to?
>>> Adding Peter for more ideas.
>>>
>>> I think it's better to find a way to trigger the listener here, probably:
>>>
>>> 1) add/delete the memory regions instead of enable/disable
>> sorry I don't understand what you mean. The vhost_iommu_region_del call
>> stack is provided below [1]. A write to the intel iommu GCMD TE bit
>> induces a call to vhost_iommu_region_del. This happens before
>> vhost_dev_stop, whose call stack is provided below [2] and originates
>> from a bus reset.
>>
>> We may have inflight IOTLB miss requests happening between both.
>>
>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>> translated anymore by the IOMMU and the iotlb.translated_addr returned
>> by address_space_get_iotlb_entry() is not within the registered
>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>> the "Fail to lookup the translated address" message.
>>
>> It sounds wrong that vhost keeps on using IOVAs that are not translated
>> anymore. It looks like we have a reset ordering issue, and this patch is
>> just removing the symptom and not the cause.
>>
>> At the moment I don't really get what is initiating the intel iommu TE
>> bit write. This is a guest action, but is it initiated from a misordered
>> qemu event?
> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> to disable the iommu before reset. So Peter's patch will not address your issue.

Exactly. I see the initial IOMMU disable comes from this guest code path
(native_machine_shutdown). Indeed this is unrelated to Peter's series.
Then I see that the vIOMMU is reset and then vhost_dev_stop is called (this
is related to Peter's series).

> Before iommu shutdown, device_shutdown() is called to shut down all devices.
> Not clear why vhost is still active.

That's the piece I was missing next.

Thanks!

Eric

> Thanks
> Zhenzhong

>> It could also relate to
>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
Hi Eric, >-----Original Message----- >From: Eric Auger <eric.auger@redhat.com> >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets >disabled ... >>>>>> vtd_address_space_refresh_all() will basically disable the iommu >>>>>> memory region. It looks not sufficient to trigger the region_del >>>>>> callback, maybe we should delete the region or introduce listener >>>>>> callback? >>>>> This is exactly the code path which is entered in my use case. >>>>> >>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But >>> given the current implement of this latter the IOTLB callback is not unset and >the >>> kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions >are >>> not updated? shouldn't they. Can you explain what they correspond to? >>>> Adding Peter for more ideas. >>>> >>>> I think it's better to find a way to trigger the listener here, probably: >>>> >>>> 1) add/delete the memory regions instead of enable/disable >>> sorry I don't understand what you mean. The vhost_iommu_region_del call >>> stack is provided below [1]. Write to the intel iommu GCMD TE bit >>> induces a call to vhost_iommu_region_del. This happens before the >>> vhost_dev_stop whose call stack is provided below [2] and originates >> >from a bus reset. >>> We may have inflight IOTLB miss requests happening between both. >>> >>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not >>> translated anymore by the IOMMU and the iotlb.translated_addr returned >>> by address_space_get_iotlb_entry() is not within the registered >>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence >>> the "Fail to lookup the translated address" message. >>> >>> It sounds wrong that vhost keeps on using IOVAs that are not translated >>> anymore. It looks we have a reset ordering issue and this patch is just >>> removing the sympton and not the cause. 
>>>
>>> At the moment I don't really get what is initiating the intel iommu TE
>>> bit write. This is a guest action but is it initiated from a misordered
>>> qemu event?
>> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
>> to disable iommu before reset. So Peter's patch will not address your issue.
>
>Exactly I see the initial IOMMU disable comes from this guest code path
>(native_machine_shutdown). Indeed this is unrelated to Peter's series.
>Then I see that vIOMMU is reset and then vhost_dev_stop is called (this
>is related to Peter's series).

Agree.

Thanks
Zhenzhong
On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong <zhenzhong.duan@intel.com> wrote: > > > > >-----Original Message----- > >From: Eric Auger <eric.auger@redhat.com> > >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets > >disabled > > > >Hi Jason, > > > > > >On 1/23/25 2:34 AM, Jason Wang wrote: > >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: > >>> Hi Jason, > >>> > >>> > >>> On 1/22/25 8:17 AM, Jason Wang wrote: > >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> > >wrote: > >>>>> Hi Jason, > >>>>> > >>>>> On 1/21/25 4:27 AM, Jason Wang wrote: > >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> > >wrote: > >>>>>>> When a guest exposed with a vhost device and protected by an > >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: > >>>>>>> > >>>>>>> Fail to lookup the translated address ffffe000 > >>>>>>> > >>>>>>> We observe that the IOMMU gets disabled through a write to the global > >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets > >stopped. > >>>>>>> When this warning happens it can be observed an inflight IOTLB > >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In > >>>>>>> that case a flat translation occurs and the check in > >>>>>>> vhost_memory_region_lookup() fails. > >>>>>>> > >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been > >>>>>>> unregistered. 
> >>>>>>> > >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> > >>>>>>> --- > >>>>>>> hw/virtio/vhost.c | 4 ++++ > >>>>>>> 1 file changed, 4 insertions(+) > >>>>>>> > >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >>>>>>> index 6aa72fd434..128c2ab094 100644 > >>>>>>> --- a/hw/virtio/vhost.c > >>>>>>> +++ b/hw/virtio/vhost.c > >>>>>>> @@ -931,6 +931,10 @@ static void > >vhost_iommu_region_del(MemoryListener *listener, > >>>>>>> break; > >>>>>>> } > >>>>>>> } > >>>>>>> + if (QLIST_EMPTY(&dev->iommu_list) && > >>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) { > >>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); > >>>>>>> + } > >>>>>> So the current code assumes: > >>>>>> > >>>>>> 1) IOMMU is enabled before vhost starts > >>>>>> 2) IOMMU is disabled after vhost stops > >>>>>> > >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the > >>>>>> IOMMU enabled after vhost starts? > >>>>> sorry I initially misunderstood the above comment. Indeed in the reboot > >>>>> case assumption 2) happens to be wrong. However what I currently do is: > >>>>> stop listening to iotlb miss requests from the kernel because my > >>>>> understanding is those requests are just spurious ones, generate > >>>>> warnings and we do not care since we are rebooting the system. > >>>>> > >>>>> However I do not claim this could handle the case where the IOMMU MR > >>>>> would be turned off and then turned on. I think in that case we should > >>>>> also flush the kernel IOTLB and this is not taken care of in this patch. > >>>>> Is it a relevant use case? > >>>> Not sure. > >>>> > >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is > >>>>> that a valid use case as the virtio driver is using the dma api? > >>>> It should not be but we can't assume the behaviour of the guest. It > >>>> could be buggy or even malicious. 
> >>> agreed > >>>> Btw, we had the following codes while handling te: > >>>> > >>>> /* Handle Translation Enable/Disable */ > >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) > >>>> { > >>>> if (s->dmar_enabled == en) { > >>>> return; > >>>> } > >>>> > >>>> trace_vtd_dmar_enable(en); > >>>> > >>>> ... > >>>> > >>>> vtd_reset_caches(s); > >>>> vtd_address_space_refresh_all(s); > >>>> } > >>>> > >>>> vtd_address_space_refresh_all() will basically disable the iommu > >>>> memory region. It looks not sufficient to trigger the region_del > >>>> callback, maybe we should delete the region or introduce listener > >>>> callback? > >>> This is exactly the code path which is entered in my use case. > >>> > >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But > >given the current implement of this latter the IOTLB callback is not unset and the > >kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are > >not updated? shouldn't they. Can you explain what they correspond to? > >> Adding Peter for more ideas. > >> > >> I think it's better to find a way to trigger the listener here, probably: > >> > >> 1) add/delete the memory regions instead of enable/disable > >sorry I don't understand what you mean. The vhost_iommu_region_del call > >stack is provided below [1]. Write to the intel iommu GCMD TE bit > >induces a call to vhost_iommu_region_del. This happens before the > >vhost_dev_stop whose call stack is provided below [2] and originates > >from a bus reset. > > > >We may have inflight IOTLB miss requests happening between both. > > > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not > >translated anymore by the IOMMU and the iotlb.translated_addr returned > >by address_space_get_iotlb_entry() is not within the registered > >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence > >the "Fail to lookup the translated address" message. 
> >
> >It sounds wrong that vhost keeps on using IOVAs that are not translated
> >anymore. It looks like we have a reset ordering issue and this patch is just
> >removing the symptom and not the cause.
> >
> >At the moment I don't really get what is initiating the intel iommu TE
> >bit write. This is a guest action but is it initiated from a misordered
> >qemu event?
>
> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> to disable iommu before reset. So Peter's patch will not address your issue.
>
> Before iommu shutdown, device_shutdown() is called to shut down all devices.
> Not clear why vhost is still active.

It might be because neither the virtio bus nor virtio-net provides a
shutdown method.

There used to be requests to provide those to unbreak kexec.

A quick try might be to provide a .driver.shutdown to the
virtio_net_driver structure and reset the device there as a start.

Thanks

> Thanks
> Zhenzhong
>
> >It could also relate to
> >[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> >https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
>
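For reference, the ".driver.shutdown" suggestion above would look roughly like the following on the guest kernel side. This is an untested sketch, not a real patch: it assumes the generic `struct device_driver` `.shutdown` hook and `virtio_reset_device()` are usable here, and it elides the existing fields of `virtio_net_driver`.

```c
/* Hypothetical sketch of a virtio-net shutdown hook (guest kernel side).
 * Resetting the device here would quiesce its DMA before
 * native_machine_shutdown() tears down the IOMMU. */
static void virtnet_shutdown(struct device *dev)
{
    struct virtio_device *vdev = dev_to_virtio(dev);

    virtio_reset_device(vdev);  /* stop the device; no more IOVAs in flight */
}

static struct virtio_driver virtio_net_driver = {
    .driver.name     = KBUILD_MODNAME,
    .driver.shutdown = virtnet_shutdown,  /* the ".driver.shutdown" idea above */
    /* ... existing probe/remove/feature tables unchanged ... */
};
```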
Hello, Jason, Eric,

On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote:
> It might be because neither virtio bus nor virtio-net provides a
> shutdown method.
>
> There used to be requests to provide those to unbreak the kexec.
>
> A quick try might be to provide a .driver.shutdown to
> virtio_net_driver structure and reset the device there as a start.

I didn't check the virtio driver path, but if that's missing it's reasonable
to support it indeed.

OTOH, even with that, vhost can still hit such a DMA issue if it's a
hard reset, am I right? IOW, when using the QMP command "system-reset". If my
memory is correct, that's the problem I was working on in the VFIO series,
rather than a clean reboot. And that won't give the guest driver a chance to
run anything, IIUC.

I am wildly suspecting a VT-d write to GCMD to disable it can also appear
if there's a hard reset: when booting the VM, the BIOS (or whatever
firmware runs at an early stage) may want to make sure the VT-d device is
completely off by writing to GCMD. But that's a pure guess, and it may
or may not matter much for how we fix this problem.

IOW, I suspect we need to fix both of them:

  (a) for soft reset, by making sure drivers properly quiesce DMAs
      proactively when the VM gracefully shuts down;

  (b) for hard reset, by making sure QEMU resets in the proper order.

One thing to mention is that for problem (b) VFIO used to have an extra
challenge on !FLR devices; I discussed it in patch 4's comment there.
Quoting from patch 4 of the series:

https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com

 * (1) Device depth-first reset hierarchy doesn't yet work for vIOMMUs
 *     (reference: resettable_cold_reset_fn())
 *
 *     Currently, vIOMMU devices are created as normal '-device'
 *     cmdlines. It means in many ways it has the same attributes as
 *     most of the rest of the devices, even if the rest of the devices
 *     should logically be under control of the vIOMMU unit.
 *
 *     One side effect of it is that vIOMMU devices will currently be put
 *     randomly under the qdev tree hierarchy, which is the source of
 *     device reset ordering in current QEMU (depth-first traversal).
 *     It means the vIOMMU can now be reset before some devices. For fully
 *     emulated devices that's not a problem, because the traversal
 *     holds the BQL for the whole process. However it is a problem if DMA
 *     can happen without the BQL, like VFIO, vDPA or a remote device process.
 *
 *     TODO: one ideal solution can be that we make the vIOMMU the parent
 *     of the whole pci host bridge. Hence the vIOMMU can be reset after
 *     all the devices are reset and quiesced.
 *
 * (2) Some devices register their own reset functions
 *
 *     Even if the above issue is solved, if devices register their own reset
 *     functions for some reason via QEMU reset hooks, the vIOMMU can still
 *     be reset before the device. One example is vfio_reset_handler()
 *     where FLR is not supported on the device.
 *
 *     TODO: merge relevant reset functions into the device tree reset
 *     framework.

So maybe vhost doesn't have problem (2) listed above, and maybe it means
it's still worthwhile thinking more about problem (1), which is to change
the QOM tree to provide a correct topology representation when a vIOMMU is
present: so far it is still a pretty much orphaned object there. If
QEMU relies on QOM tree topology for reset order, we may need to move it to
the right place sooner or later.

Thanks,

--
Peter Xu
On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote: > > Hello, Jason, Eric, > > On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote: > > It might be because neither virtio bus nor virtio-net provides a > > shutdown method. > > > > There used to be requests to provide those to unbreak the kexec. > > > > A quick try might be to provide a .driver.shutdown to > > virtio_net_driver structure and reset the device there as a start. > > I didn't check virtio driver path, but if that's missing it's reasonable to > support it indeed. > > OTOH, even with that, vhost can still hit such DMA issue if it's a > hard-reset, am I right? IOW, when using QMP command "system-reset". If my > memory is correct, that's the problem I was working on the VFIO series, > rather than a clean reboot. And that won't give guest driver chance to run > anything, IIUC. Yes. > > I am wildly suspecting a VT-d write to GCMD to disable it can also appear > if there's a hard reset, then when bootloading the VM the bios (or whatever > firmware at early stage) may want to make sure the VT-d device is > completely off by writting to GCMD. But that's a pure guess.. and that may > or may not matter much on how we fix this problem. > > IOW, I suspect we need to fix both of them, > > (a) for soft-reset, by making sure drivers properly quiesce DMAs > proactively when VM gracefully shuts down. > > (b) for hard-reset, by making sure QEMU reset in proper order. > > One thing to mention is for problem (b) VFIO used to have an extra > challenge on !FLR devices, I discussed it in patch 4's comment there. > Quotting from patch 4 of series: > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com > > * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs > * (reference: resettable_cold_reset_fn()) > * > * Currently, vIOMMU devices are created as normal '-device' > * cmdlines. 
It means in many ways it has the same attributes with > * most of the rest devices, even if the rest devices should > * logically be under control of the vIOMMU unit. > * > * One side effect of it is vIOMMU devices will be currently put > * randomly under qdev tree hierarchy, which is the source of > * device reset ordering in current QEMU (depth-first traversal). > * It means vIOMMU now can be reset before some devices. For fully > * emulated devices that's not a problem, because the traversal > * holds BQL for the whole process. However it is a problem if DMA > * can happen without BQL, like VFIO, vDPA or remote device process. > * > * TODO: one ideal solution can be that we make vIOMMU the parent > * of the whole pci host bridge. Hence vIOMMU can be reset after > * all the devices are reset and quiesced. > * > * (2) Some devices register its own reset functions > * > * Even if above issue solved, if devices register its own reset > * functions for some reason via QEMU reset hooks, vIOMMU can still > * be reset before the device. One example is vfio_reset_handler() > * where FLR is not supported on the device. > * > * TODO: merge relevant reset functions into the device tree reset > * framework. > > So maybe vhost doesn't have problem (2) listed above, and maybe it means > it's still worthwhile thinking more about problem (1), which is to change > the QOM tree to provide a correct topology representation when vIOMMU is > present: so far it should be still a pretty much orphaned object there.. if > QEMU relies on QOM tree topology for reset order, we may need to move it to > the right place sooner or later. Sounds like a non-trivial task, so for a hard reset, maybe we can proceed with Eric's proposal to deal with the reset before the device stops. Thanks > > Thanks, > > -- > Peter Xu >
On Sun, Jan 26, 2025 at 3:56 PM Jason Wang <jasowang@redhat.com> wrote: > > On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote: > > > > Hello, Jason, Eric, > > > > On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote: > > > It might be because neither virtio bus nor virtio-net provides a > > > shutdown method. > > > > > > There used to be requests to provide those to unbreak the kexec. > > > > > > A quick try might be to provide a .driver.shutdown to > > > virtio_net_driver structure and reset the device there as a start. > > > > I didn't check virtio driver path, but if that's missing it's reasonable to > > support it indeed. > > > > OTOH, even with that, vhost can still hit such DMA issue if it's a > > hard-reset, am I right? IOW, when using QMP command "system-reset". If my > > memory is correct, that's the problem I was working on the VFIO series, > > rather than a clean reboot. And that won't give guest driver chance to run > > anything, IIUC. > > Yes. > > > > > I am wildly suspecting a VT-d write to GCMD to disable it can also appear > > if there's a hard reset, then when bootloading the VM the bios (or whatever > > firmware at early stage) may want to make sure the VT-d device is > > completely off by writting to GCMD. But that's a pure guess.. and that may > > or may not matter much on how we fix this problem. > > > > IOW, I suspect we need to fix both of them, > > > > (a) for soft-reset, by making sure drivers properly quiesce DMAs > > proactively when VM gracefully shuts down. > > > > (b) for hard-reset, by making sure QEMU reset in proper order. > > > > One thing to mention is for problem (b) VFIO used to have an extra > > challenge on !FLR devices, I discussed it in patch 4's comment there. 
> > Quotting from patch 4 of series: > > > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com > > > > * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs > > * (reference: resettable_cold_reset_fn()) > > * > > * Currently, vIOMMU devices are created as normal '-device' > > * cmdlines. It means in many ways it has the same attributes with > > * most of the rest devices, even if the rest devices should > > * logically be under control of the vIOMMU unit. > > * > > * One side effect of it is vIOMMU devices will be currently put > > * randomly under qdev tree hierarchy, which is the source of > > * device reset ordering in current QEMU (depth-first traversal). > > * It means vIOMMU now can be reset before some devices. For fully > > * emulated devices that's not a problem, because the traversal > > * holds BQL for the whole process. However it is a problem if DMA > > * can happen without BQL, like VFIO, vDPA or remote device process. > > * > > * TODO: one ideal solution can be that we make vIOMMU the parent > > * of the whole pci host bridge. Hence vIOMMU can be reset after > > * all the devices are reset and quiesced. > > * > > * (2) Some devices register its own reset functions > > * > > * Even if above issue solved, if devices register its own reset > > * functions for some reason via QEMU reset hooks, vIOMMU can still > > * be reset before the device. One example is vfio_reset_handler() > > * where FLR is not supported on the device. > > * > > * TODO: merge relevant reset functions into the device tree reset > > * framework. > > > > So maybe vhost doesn't have problem (2) listed above, and maybe it means > > it's still worthwhile thinking more about problem (1), which is to change > > the QOM tree to provide a correct topology representation when vIOMMU is > > present: so far it should be still a pretty much orphaned object there.. 
> > if QEMU relies on QOM tree topology for reset order, we may need to
> > move it to the right place sooner or later.
>
> Sounds like a non-trivial task, so for a hard reset, maybe we can
> proceed with Eric's proposal to deal with the reset before the device
> stops.

Btw, I actually meant to break the assumption that vhost needs to be
enabled/disabled after/before the vIOMMU. This only works for virtio-net /
vhost. From the view of vhost, it would work similarly to _F_LOG_ALL
(where there's no assumption on the order of enabling/disabling dirty
page tracking and device start/stop).

Thanks

>
> Thanks
>
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
On Mon, Jan 27, 2025 at 08:44:50AM +0800, Jason Wang wrote: > On Sun, Jan 26, 2025 at 3:56 PM Jason Wang <jasowang@redhat.com> wrote: > > > > On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote: > > > > > > Hello, Jason, Eric, > > > > > > On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote: > > > > It might be because neither virtio bus nor virtio-net provides a > > > > shutdown method. > > > > > > > > There used to be requests to provide those to unbreak the kexec. > > > > > > > > A quick try might be to provide a .driver.shutdown to > > > > virtio_net_driver structure and reset the device there as a start. > > > > > > I didn't check virtio driver path, but if that's missing it's reasonable to > > > support it indeed. > > > > > > OTOH, even with that, vhost can still hit such DMA issue if it's a > > > hard-reset, am I right? IOW, when using QMP command "system-reset". If my > > > memory is correct, that's the problem I was working on the VFIO series, > > > rather than a clean reboot. And that won't give guest driver chance to run > > > anything, IIUC. > > > > Yes. > > > > > > > > I am wildly suspecting a VT-d write to GCMD to disable it can also appear > > > if there's a hard reset, then when bootloading the VM the bios (or whatever > > > firmware at early stage) may want to make sure the VT-d device is > > > completely off by writting to GCMD. But that's a pure guess.. and that may > > > or may not matter much on how we fix this problem. > > > > > > IOW, I suspect we need to fix both of them, > > > > > > (a) for soft-reset, by making sure drivers properly quiesce DMAs > > > proactively when VM gracefully shuts down. > > > > > > (b) for hard-reset, by making sure QEMU reset in proper order. > > > > > > One thing to mention is for problem (b) VFIO used to have an extra > > > challenge on !FLR devices, I discussed it in patch 4's comment there. 
> > > Quotting from patch 4 of series: > > > > > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com > > > > > > * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs > > > * (reference: resettable_cold_reset_fn()) > > > * > > > * Currently, vIOMMU devices are created as normal '-device' > > > * cmdlines. It means in many ways it has the same attributes with > > > * most of the rest devices, even if the rest devices should > > > * logically be under control of the vIOMMU unit. > > > * > > > * One side effect of it is vIOMMU devices will be currently put > > > * randomly under qdev tree hierarchy, which is the source of > > > * device reset ordering in current QEMU (depth-first traversal). > > > * It means vIOMMU now can be reset before some devices. For fully > > > * emulated devices that's not a problem, because the traversal > > > * holds BQL for the whole process. However it is a problem if DMA > > > * can happen without BQL, like VFIO, vDPA or remote device process. > > > * > > > * TODO: one ideal solution can be that we make vIOMMU the parent > > > * of the whole pci host bridge. Hence vIOMMU can be reset after > > > * all the devices are reset and quiesced. > > > * > > > * (2) Some devices register its own reset functions > > > * > > > * Even if above issue solved, if devices register its own reset > > > * functions for some reason via QEMU reset hooks, vIOMMU can still > > > * be reset before the device. One example is vfio_reset_handler() > > > * where FLR is not supported on the device. > > > * > > > * TODO: merge relevant reset functions into the device tree reset > > > * framework. > > > > > > So maybe vhost doesn't have problem (2) listed above, and maybe it means > > > it's still worthwhile thinking more about problem (1), which is to change > > > the QOM tree to provide a correct topology representation when vIOMMU is > > > present: so far it should be still a pretty much orphaned object there.. 
> > > if QEMU relies on QOM tree topology for reset order, we may need to
> > > move it to the right place sooner or later.
> >
> > Sounds like a non-trivial task, so for a hard reset, maybe we can
> > proceed with Eric's proposal to deal with the reset before the device
> > stops.

The major challenge when I was working on that (as far as I can still
remember): some devices are created at an early stage of QEMU startup,
which can happen before the vIOMMU is created and realized. Then it can
be challenging to re-parent those devices to be children of the vIOMMU,
or we may need a way to always create the vIOMMU earlier than those.

> Btw, I actually meant to break the assumption that vhost needs to be
> enabled/disabled after/before vIOMMU. This only works for virtio-net /
> vhost. From the view of vhost, it would work similar to _F_LOG_ALL
> (where there's no assumption on the order of enabling/disabling dirty
> page tracking and device start/stop).

Yes, we can go for a lightweight solution.

Thanks,

--
Peter Xu
On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote: > > On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong > <zhenzhong.duan@intel.com> wrote: > > > > > > > > >-----Original Message----- > > >From: Eric Auger <eric.auger@redhat.com> > > >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets > > >disabled > > > > > >Hi Jason, > > > > > > > > >On 1/23/25 2:34 AM, Jason Wang wrote: > > >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: > > >>> Hi Jason, > > >>> > > >>> > > >>> On 1/22/25 8:17 AM, Jason Wang wrote: > > >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> > > >wrote: > > >>>>> Hi Jason, > > >>>>> > > >>>>> On 1/21/25 4:27 AM, Jason Wang wrote: > > >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> > > >wrote: > > >>>>>>> When a guest exposed with a vhost device and protected by an > > >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: > > >>>>>>> > > >>>>>>> Fail to lookup the translated address ffffe000 > > >>>>>>> > > >>>>>>> We observe that the IOMMU gets disabled through a write to the global > > >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets > > >stopped. > > >>>>>>> When this warning happens it can be observed an inflight IOTLB > > >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In > > >>>>>>> that case a flat translation occurs and the check in > > >>>>>>> vhost_memory_region_lookup() fails. > > >>>>>>> > > >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been > > >>>>>>> unregistered. 
> > >>>>>>> > > >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> > > >>>>>>> --- > > >>>>>>> hw/virtio/vhost.c | 4 ++++ > > >>>>>>> 1 file changed, 4 insertions(+) > > >>>>>>> > > >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > >>>>>>> index 6aa72fd434..128c2ab094 100644 > > >>>>>>> --- a/hw/virtio/vhost.c > > >>>>>>> +++ b/hw/virtio/vhost.c > > >>>>>>> @@ -931,6 +931,10 @@ static void > > >vhost_iommu_region_del(MemoryListener *listener, > > >>>>>>> break; > > >>>>>>> } > > >>>>>>> } > > >>>>>>> + if (QLIST_EMPTY(&dev->iommu_list) && > > >>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) { > > >>>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); > > >>>>>>> + } > > >>>>>> So the current code assumes: > > >>>>>> > > >>>>>> 1) IOMMU is enabled before vhost starts > > >>>>>> 2) IOMMU is disabled after vhost stops > > >>>>>> > > >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the > > >>>>>> IOMMU enabled after vhost starts? > > >>>>> sorry I initially misunderstood the above comment. Indeed in the reboot > > >>>>> case assumption 2) happens to be wrong. However what I currently do is: > > >>>>> stop listening to iotlb miss requests from the kernel because my > > >>>>> understanding is those requests are just spurious ones, generate > > >>>>> warnings and we do not care since we are rebooting the system. > > >>>>> > > >>>>> However I do not claim this could handle the case where the IOMMU MR > > >>>>> would be turned off and then turned on. I think in that case we should > > >>>>> also flush the kernel IOTLB and this is not taken care of in this patch. > > >>>>> Is it a relevant use case? > > >>>> Not sure. > > >>>> > > >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is > > >>>>> that a valid use case as the virtio driver is using the dma api? > > >>>> It should not be but we can't assume the behaviour of the guest. It > > >>>> could be buggy or even malicious. 
> > >>> agreed > > >>>> Btw, we had the following codes while handling te: > > >>>> > > >>>> /* Handle Translation Enable/Disable */ > > >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) > > >>>> { > > >>>> if (s->dmar_enabled == en) { > > >>>> return; > > >>>> } > > >>>> > > >>>> trace_vtd_dmar_enable(en); > > >>>> > > >>>> ... > > >>>> > > >>>> vtd_reset_caches(s); > > >>>> vtd_address_space_refresh_all(s); > > >>>> } > > >>>> > > >>>> vtd_address_space_refresh_all() will basically disable the iommu > > >>>> memory region. It looks not sufficient to trigger the region_del > > >>>> callback, maybe we should delete the region or introduce listener > > >>>> callback? > > >>> This is exactly the code path which is entered in my use case. > > >>> > > >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But > > >given the current implement of this latter the IOTLB callback is not unset and the > > >kernel IOTLB is not refreshed. Also as I pointed out the hdev->mem->regions are > > >not updated? shouldn't they. Can you explain what they correspond to? > > >> Adding Peter for more ideas. > > >> > > >> I think it's better to find a way to trigger the listener here, probably: > > >> > > >> 1) add/delete the memory regions instead of enable/disable > > >sorry I don't understand what you mean. The vhost_iommu_region_del call > > >stack is provided below [1]. Write to the intel iommu GCMD TE bit > > >induces a call to vhost_iommu_region_del. This happens before the > > >vhost_dev_stop whose call stack is provided below [2] and originates > > >from a bus reset. > > > > > >We may have inflight IOTLB miss requests happening between both. 
> > > > > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not > > >translated anymore by the IOMMU and the iotlb.translated_addr returned > > >by address_space_get_iotlb_entry() is not within the registered > > >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence > > >the "Fail to lookup the translated address" message. > > > > > >It sounds wrong that vhost keeps on using IOVAs that are not translated > > >anymore. It looks we have a reset ordering issue and this patch is just > > >removing the sympton and not the cause. > > > > > >At the moment I don't really get what is initiating the intel iommu TE > > >bit write. This is a guest action but is it initiated from a misordered > > >qemu event? > > > > During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown() > > to disable iommu before reset. So Peter's patch will not address your issue. > > > > Before iommu shutdown, device_shutdown() is called to shutdown all devices. > > Not clear why vhost is still active. > > It might be because neither virtio bus nor virtio-net provides a > shutdown method. > > There used to be requests to provide those to unbreak the kexec. More could be seen at https://issues.redhat.com/browse/RHEL-331 This looks exactly the same issue. Thanks > > A quick try might be to provide a .driver.shutdown to > virtio_net_driver structure and reset the device there as a start. > > Thanks > > > > > Thanks > > Zhenzhong > > > > > > > >It could also relate to > > >[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices > > >https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+ > > >the+rest+of+devices > >
Hi,

On 1/24/25 4:41 AM, Jason Wang wrote:
> On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
>> <zhenzhong.duan@intel.com> wrote:
[...]
>>> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
>>> to disable iommu before reset. So Peter's patch will not address your issue.
>>>
>>> Before iommu shutdown, device_shutdown() is called to shutdown all devices.
>>> Not clear why vhost is still active.
>> It might be because neither virtio bus nor virtio-net provides a
>> shutdown method.
>>
>> There used to be requests to provide those to unbreak the kexec.
> More could be seen at https://issues.redhat.com/browse/RHEL-331

Cool, we have a culprit now :-)

I see the ticket is closed, I will reopen it. Are there known
implementation challenges that caused the fix to be postponed, or do we
"just" need someone with free cycles to carry out the task?

By the way, FYI we have other tickets opened related to vSMMU and virtio
devices, also happening during reboot.

Thanks

Eric

> This looks exactly the same issue.
>
> Thanks
>
>> A quick try might be to provide a .driver.shutdown to
>> virtio_net_driver structure and reset the device there as a start.

>>>> It could also relate to
>>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
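For what it's worth, the guard the patch adds (unset the IOTLB callback only once the last IOMMU MR has been removed, since one vhost device may span several IOMMU MRs) can be modeled with a plain singly linked list. This is a simplified sketch, not the actual QLIST-based QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the QLIST of IOMMU memory regions on a vhost device. */
struct toy_iommu_mr {
    struct toy_iommu_mr *next;
};

struct toy_vhost_dev {
    struct toy_iommu_mr *iommu_list;   /* NULL means "list empty" */
    bool iotlb_callback_enabled;
};

static void toy_region_del(struct toy_vhost_dev *dev, struct toy_iommu_mr *mr)
{
    struct toy_iommu_mr **pp = &dev->iommu_list;

    /* Unlink mr, as the QLIST_REMOVE in vhost_iommu_region_del() would. */
    while (*pp && *pp != mr) {
        pp = &(*pp)->next;
    }
    if (*pp) {
        *pp = mr->next;
    }
    /* Mirrors the patch: only tear the callback down on the last MR. */
    if (!dev->iommu_list) {
        dev->iotlb_callback_enabled = false;
    }
}
```

Removing the first MR leaves the callback armed; removing the last one disarms it, mirroring the QLIST_EMPTY() check in the hunk above.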
>-----Original Message-----
>From: Jason Wang <jasowang@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
>> <zhenzhong.duan@intel.com> wrote:
[...]
>> It might be because neither virtio bus nor virtio-net provides a
>> shutdown method.

Oh, I see.

>>
>> There used to be requests to provide those to unbreak the kexec.
>
>More could be seen at https://issues.redhat.com/browse/RHEL-331
>
>This looks exactly the same issue.

Have not access😊

>
>Thanks
>
>>
>> A quick try might be to provide a .driver.shutdown to
>> virtio_net_driver structure and reset the device there as a start.

Make sense.

Thanks
Zhenzhong
On Fri, Jan 24, 2025 at 12:01 PM Duan, Zhenzhong <zhenzhong.duan@intel.com> wrote:
>
[...]
> >> It might be because neither virtio bus nor virtio-net provides a
> >> shutdown method.
>
> Oh, I see.
>
> >> There used to be requests to provide those to unbreak the kexec.
> >
> >More could be seen at https://issues.redhat.com/browse/RHEL-331
> >
> >This looks exactly the same issue.
>
> Have not access😊

It should work now.

Thanks

> >> A quick try might be to provide a .driver.shutdown to
> >> virtio_net_driver structure and reset the device there as a start.
>
> Make sense.
>
> Thanks
> Zhenzhong
>-----Original Message-----
>From: Jason Wang <jasowang@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>On Fri, Jan 24, 2025 at 12:01 PM Duan, Zhenzhong
><zhenzhong.duan@intel.com> wrote:
[...]
>> >More could be seen at https://issues.redhat.com/browse/RHEL-331
>> >
>> >This looks exactly the same issue.
>>
>> Have not access😊
>
>It should work now.

Yes, exactly same issue.

Thanks
Zhenzhong
On Thu, Jan 23, 2025 at 4:31 PM Eric Auger <eric.auger@redhat.com> wrote: > > Hi Jason, > > > On 1/23/25 2:34 AM, Jason Wang wrote: > > On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote: > >> Hi Jason, > >> > >> > >> On 1/22/25 8:17 AM, Jason Wang wrote: > >>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote: > >>>> Hi Jason, > >>>> > >>>> On 1/21/25 4:27 AM, Jason Wang wrote: > >>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote: > >>>>>> When a guest exposed with a vhost device and protected by an > >>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning: > >>>>>> > >>>>>> Fail to lookup the translated address ffffe000 > >>>>>> > >>>>>> We observe that the IOMMU gets disabled through a write to the global > >>>>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped. > >>>>>> When this warning happens it can be observed an inflight IOTLB > >>>>>> miss occurs after the IOMMU disable and before the vhost stop. In > >>>>>> that case a flat translation occurs and the check in > >>>>>> vhost_memory_region_lookup() fails. > >>>>>> > >>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been > >>>>>> unregistered. 
> >>>>>> > >>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> > >>>>>> --- > >>>>>> hw/virtio/vhost.c | 4 ++++ > >>>>>> 1 file changed, 4 insertions(+) > >>>>>> > >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > >>>>>> index 6aa72fd434..128c2ab094 100644 > >>>>>> --- a/hw/virtio/vhost.c > >>>>>> +++ b/hw/virtio/vhost.c > >>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener, > >>>>>> break; > >>>>>> } > >>>>>> } > >>>>>> + if (QLIST_EMPTY(&dev->iommu_list) && > >>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) { > >>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false); > >>>>>> + } > >>>>> So the current code assumes: > >>>>> > >>>>> 1) IOMMU is enabled before vhost starts > >>>>> 2) IOMMU is disabled after vhost stops > >>>>> > >>>>> This patch seems to fix 2) but not 1). Do we need to deal with the > >>>>> IOMMU enabled after vhost starts? > >>>> sorry I initially misunderstood the above comment. Indeed in the reboot > >>>> case assumption 2) happens to be wrong. However what I currently do is: > >>>> stop listening to iotlb miss requests from the kernel because my > >>>> understanding is those requests are just spurious ones, generate > >>>> warnings and we do not care since we are rebooting the system. > >>>> > >>>> However I do not claim this could handle the case where the IOMMU MR > >>>> would be turned off and then turned on. I think in that case we should > >>>> also flush the kernel IOTLB and this is not taken care of in this patch. > >>>> Is it a relevant use case? > >>> Not sure. > >>> > >>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is > >>>> that a valid use case as the virtio driver is using the dma api? > >>> It should not be but we can't assume the behaviour of the guest. It > >>> could be buggy or even malicious. 
> >> agreed
> >>> Btw, we had the following codes while handling te:
> >>>
> >>> /* Handle Translation Enable/Disable */
> >>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> >>> {
> >>>     if (s->dmar_enabled == en) {
> >>>         return;
> >>>     }
> >>>
> >>>     trace_vtd_dmar_enable(en);
> >>>
> >>> ...
> >>>
> >>>     vtd_reset_caches(s);
> >>>     vtd_address_space_refresh_all(s);
> >>> }
> >>>
> >>> vtd_address_space_refresh_all() will basically disable the iommu
> >>> memory region. It looks not sufficient to trigger the region_del
> >>> callback, maybe we should delete the region or introduce listener
> >>> callback?
> >> This is exactly the code path which is entered in my use case.
> >>
> >> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
> >> given the current implementation of the latter, the IOTLB callback is not
> >> unset and the kernel IOTLB is not refreshed. Also, as I pointed out, the
> >> hdev->mem->regions are not updated; shouldn't they be? Can you explain
> >> what they correspond to?
> > Adding Peter for more ideas.
> >
> > I think it's better to find a way to trigger the listener here, probably:
> >
> > 1) add/delete the memory regions instead of enable/disable
> Sorry, I don't understand what you mean. The vhost_iommu_region_del call
> stack is provided below [1]. Write to the intel iommu GCMD TE bit
> induces a call to vhost_iommu_region_del. This happens before the
> vhost_dev_stop whose call stack is provided below [2] and originates
> from a bus reset.

Right, I see.

>
> We may have inflight IOTLB miss requests happening between both.
>
> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
> translated anymore by the IOMMU and the iotlb.translated_addr returned
> by address_space_get_iotlb_entry() is not within the registered
> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
> the "Fail to lookup the translated address" message.
>
> It sounds wrong that vhost keeps on using IOVAs that are not translated
> anymore. It looks we have a reset ordering issue and this patch is just
> removing the symptom and not the cause.

Exactly.

>
> At the moment I don't really get what is initiating the intel iommu TE
> bit write. This is a guest action but is it initiated from a misordered
> qemu event?

It could also be a guest bug which reset the vIOMMU before the device.
Do we have a calltrace in the guest?

Adding more vtd maintainers for more thoughts.

Thanks

>
> It could also relate to
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
>
> Thanks
>
> Eric
>
>
> [1]
> #0  vhost_iommu_region_del (listener=0x55555708a388, section=0x7ffff62bf030) at ../hw/virtio/vhost.c:927
> #1  0x0000555555c851b4 in address_space_update_topology_pass (as=0x55555822e4f0, old_view=0x7fffe8e81850, new_view=0x55555723dfa0, adding=false) at ../system/memory.c:977
> #2  0x0000555555c857a0 in address_space_set_flatview (as=0x55555822e4f0) at ../system/memory.c:1079
> #3  0x0000555555c85986 in memory_region_transaction_commit () at ../system/memory.c:1132
> #4  0x0000555555c89f25 in memory_region_set_enabled (mr=0x555557dc0d30, enabled=false) at ../system/memory.c:2686
> #5  0x0000555555b5edb1 in vtd_switch_address_space (as=0x555557dc0c60) at ../hw/i386/intel_iommu.c:1735
> #6  0x0000555555b5ee6f in vtd_switch_address_space_all (s=0x555557db1500) at ../hw/i386/intel_iommu.c:1779
> #7  0x0000555555b64533 in vtd_address_space_refresh_all (s=0x555557db1500) at ../hw/i386/intel_iommu.c:4006
> #8  0x0000555555b60770 in vtd_handle_gcmd_te (s=0x555557db1500, en=false) at ../hw/i386/intel_iommu.c:2419
> #9  0x0000555555b60885 in vtd_handle_gcmd_write (s=0x555557db1500) at ../hw/i386/intel_iommu.c:2449
> #10 0x0000555555b61db2 in vtd_mem_write (opaque=0x555557db1500, addr=24, val=100663296, size=4) at ../hw/i386/intel_iommu.c:2991
> #11 0x0000555555c833e0 in memory_region_write_accessor (mr=0x555557db1840, addr=24, value=0x7ffff62bf3d8, size=4, shift=0, mask=4294967295, attrs=...) at ../system/memory.c:497
> #12 0x0000555555c83711 in access_with_adjusted_size (addr=24, value=0x7ffff62bf3d8, size=4, access_size_min=4, access_size_max=8, access_fn=0x555555c832ea <memory_region_write_accessor>, mr=0x555557db1840, attrs=...) at ../system/memory.c:573
> #13 0x0000555555c8698b in memory_region_dispatch_write (mr=0x555557db1840, addr=24, data=100663296, op=MO_32, attrs=...) at ../system/memory.c:1521
> #14 0x0000555555c95619 in flatview_write_continue_step (attrs=..., buf=0x7ffff7fbb028 "", len=4, mr_addr=24, l=0x7ffff62bf4c0, mr=0x555557db1840) at ../system/physmem.c:2803
> #15 0x0000555555c956eb in flatview_write_continue (fv=0x7fffe852d370, addr=4275634200, attrs=..., ptr=0x7ffff7fbb028, len=4, mr_addr=24, l=4, mr=0x555557db1840) at ../system/physmem.c:2833
> #16 0x0000555555c957f9 in flatview_write (fv=0x7fffe852d370, addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at ../system/physmem.c:2864
> #17 0x0000555555c95c39 in address_space_write (as=0x555556ff1720 <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at ../system/physmem.c:2984
> #18 0x0000555555c95cb1 in address_space_rw (as=0x555556ff1720 <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4, is_write=true) at ../system/physmem.c:2994
> #19 0x0000555555ceeb56 in kvm_cpu_exec (cpu=0x55555727e950) at ../accel/kvm/kvm-all.c:3188
> #20 0x0000555555cf1ea6 in kvm_vcpu_thread_fn (arg=0x55555727e950) at ../accel/kvm/kvm-accel-ops.c:50
> #21 0x0000555555f6de20 in qemu_thread_start (args=0x555557288960) at ../util/qemu-thread-posix.c:541
> #22 0x00007ffff7289e92 in start_thread () at /lib64/libc.so.6
>
> [2]
> #0  vhost_dev_stop (hdev=0x55555708a2c0, vdev=0x555558234cb0, vrings=false) at ../hw/virtio/vhost.c:2222
> #1  0x0000555555990266 in vhost_net_stop_one (net=0x55555708a2c0, dev=0x555558234cb0) at ../hw/net/vhost_net.c:350
> #2  0x00005555559906ea in vhost_net_stop (dev=0x555558234cb0, ncs=0x55555826f968, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:462
> #3  0x0000555555c1ad0a in virtio_net_vhost_status (n=0x555558234cb0, status=0 '\000') at ../hw/net/virtio-net.c:318
> #4  0x0000555555c1afaf in virtio_net_set_status (vdev=0x555558234cb0, status=0 '\000') at ../hw/net/virtio-net.c:393
> #5  0x0000555555c591bd in virtio_set_status (vdev=0x555558234cb0, val=0 '\000') at ../hw/virtio/virtio.c:2242
> #6  0x0000555555c595eb in virtio_reset (opaque=0x555558234cb0) at ../hw/virtio/virtio.c:2325
> #7  0x0000555555a0d1e1 in virtio_bus_reset (bus=0x555558234c30) at ../hw/virtio/virtio-bus.c:109
> #8  0x0000555555a13d51 in virtio_pci_reset (qdev=0x55555822c830) at ../hw/virtio/virtio-pci.c:2282
> #9  0x0000555555a14016 in virtio_pci_bus_reset_hold (obj=0x55555822c830, type=RESET_TYPE_COLD) at ../hw/virtio/virtio-pci.c:2322
> #10 0x0000555555d08831 in resettable_phase_hold (obj=0x55555822c830, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:180
> #11 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #12 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20, obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #13 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ffa3c0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #14 0x0000555555d0543b in device_reset_child_foreach (obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
> #15 0x0000555555d084d8 in resettable_child_foreach (rc=0x55555715a090, obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #16 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ff98e0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #17 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #18 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20, obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #19 0x0000555555d08792 in resettable_phase_hold (obj=0x555557445120, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #20 0x0000555555d0543b in device_reset_child_foreach (obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
> #21 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555570cf410, obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #22 0x0000555555d08792 in resettable_phase_hold (obj=0x5555572d0a00, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #23 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #24 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555571bfde0, obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #25 0x0000555555d08792 in resettable_phase_hold (obj=0x55555723d270, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #26 0x0000555555d06d6d in resettable_container_child_foreach (obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resetcontainer.c:54
> #27 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555572180f0, obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #28 0x0000555555d08792 in resettable_phase_hold (obj=0x55555724a8a0, opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #29 0x0000555555d0838d in resettable_assert_reset (obj=0x55555724a8a0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:58
> #30 0x0000555555d082e5 in resettable_reset (obj=0x55555724a8a0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
> #31 0x00005555558db5c8 in qemu_devices_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/core/reset.c:179
> #32 0x0000555555b6f5b2 in pc_machine_reset (machine=0x555557234490, reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/i386/pc.c:1877
> #33 0x0000555555a5a0a2 in qemu_system_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:516
> #34 0x0000555555a5a891 in main_loop_should_exit (status=0x7fffffffd574) at ../system/runstate.c:792
> #35 0x0000555555a5a992 in qemu_main_loop () at ../system/runstate.c:825
> #36 0x0000555555e9cced in qemu_default_main () at ../system/main.c:37
> #37 0x0000555555e9cd2a in main (argc=58, argv=0x7fffffffd6d8) at ../system/main.c:48
>
> > or
> >
> > 2) introduce new listener ops that can be triggered when a region is
> > enabled or disabled
> >
> > Thanks
> >
> >> Thanks
> >>
> >> Eric
> >>
> >>> Thanks
> >>>
> >>>> Eric
> >>>>
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>> }
> >>>>>>
> >>>>>> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >>>>>> --
> >>>>>> 2.47.1
> >>>>>>
Hi Jason,

On 1/21/25 4:27 AM, Jason Wang wrote:
> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
>>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/virtio/vhost.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>              break;
>>          }
>>      }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
> So the current code assumes:
>
> 1) IOMMU is enabled before vhost starts
> 2) IOMMU is disabled after vhost stops
>
> This patch seems to fix 2) but not 1). Do we need to deal with the
> IOMMU enabled after vhost starts?
This patch handles the case where the IOMMU is disabled *before* vhost
stops (not 2). This is what I concretely observe on guest reboot. But
maybe I misunderstood your comments/questions?

Thanks

Eric
>
> Thanks
>
>> }
>>
>> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>> --
>> 2.47.1
>>