[Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set

Peter Xu posted 1 patch 5 years, 11 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20180418045121.14233-1-peterx@redhat.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test s390x passed
[Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Peter Xu 5 years, 11 months ago
During IOVA page table walk, there is a special case when:

- notify_unmap is set, meanwhile
- entry is invalid

In the past, we skip the entry always.  This is not correct.  We should
send UNMAP notification to registered notifiers in this case.  Otherwise
some stall pages will still be mapped in the host even if L1 guest
unmapped them already.

Without this patch, nested device assignment to L2 guests might dump
some errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
                    0x7f89a920d000) = -17 (File exists)

To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
affected by this problem).

Signed-off-by: Peter Xu <peterx@redhat.com>
---

To test nested assignment, one also needs to apply below patchset:
https://lkml.org/lkml/2018/4/18/5
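
As a side note for reviewers, here is a minimal, self-contained sketch of the
behavioural change for an invalid entry when notify_unmap is set.  It uses
simplified stand-in types instead of QEMU's IOMMUTLBEntry and the
vtd_page_walk_hook typedef, so it is an illustration only, not code from this
patch:

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for IOMMUTLBEntry and vtd_page_walk_hook. */
typedef struct {
    unsigned long iova;
    unsigned long addr_mask;
    bool valid;                        /* read_cur | write_cur in the walk */
} Entry;

typedef int (*walk_hook)(const Entry *entry, void *private);

/* Old behaviour: invalid entries are always skipped, so the hook never
 * sees an UNMAP for them and stale host mappings survive. */
static int walk_entry_old(const Entry *e, bool notify_unmap,
                          walk_hook hook, void *priv)
{
    (void)notify_unmap;                /* ignored for invalid entries */
    if (!e->valid) {
        return 0;
    }
    return hook(e, priv);
}

/* New behaviour: with notify_unmap set, an invalid entry is still reported
 * once, covering its whole range, so the notifier can tear it down. */
static int walk_entry_new(const Entry *e, bool notify_unmap,
                          walk_hook hook, void *priv)
{
    if (!e->valid && !notify_unmap) {
        return 0;
    }
    return hook(e, priv);
}

static int print_hook(const Entry *e, void *priv)
{
    (void)priv;
    printf("%s iova=0x%lx size=0x%lx\n",
           e->valid ? "MAP" : "UNMAP", e->iova, e->addr_mask + 1);
    return 0;
}

int main(void)
{
    Entry invalid = { .iova = 0xad000, .addr_mask = 0xfff, .valid = false };

    walk_entry_old(&invalid, true, print_hook, NULL);  /* prints nothing */
    walk_entry_new(&invalid, true, print_hook, NULL);  /* UNMAP iova=0xad000 */
    return 0;
}
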
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..b359efd6f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+                             vtd_page_walk_hook hook_fn, void *private)
+{
+    assert(hook_fn);
+    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+                            entry->addr_mask, entry->perm);
+    return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
+        entry.target_as = &address_space_memory;
+        entry.iova = iova & subpage_mask;
+        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+        entry.addr_mask = ~subpage_mask;
+
         if (vtd_is_last_slpte(slpte, level)) {
-            entry.target_as = &address_space_memory;
-            entry.iova = iova & subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
             entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-            entry.addr_mask = ~subpage_mask;
-            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
             if (!entry_valid && !notify_unmap) {
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-                                    entry.addr_mask, entry.perm);
-            if (hook_fn) {
-                ret = hook_fn(&entry, private);
-                if (ret < 0) {
-                    return ret;
-                }
+            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+            if (ret < 0) {
+                return ret;
             }
         } else {
             if (!entry_valid) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                if (notify_unmap) {
+                    /*
+                     * The whole entry is invalid; unmap it all.
+                     * Translated address is meaningless, zero it.
+                     */
+                    entry.translated_addr = 0x0;
+                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+                    if (ret < 0) {
+                        return ret;
+                    }
+                } else {
+                    trace_vtd_page_walk_skip_perm(iova, iova_next);
+                }
                 goto next;
             }
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
2.14.3


Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Peter Xu 5 years, 11 months ago
On Wed, Apr 18, 2018 at 12:51:21PM +0800, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
> 
> - notify_unmap is set, meanwhile
> - entry is invalid
> 
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stall pages will still be mapped in the host even if L1 guest
> unmapped them already.
> 
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
> 
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                     0x7f89a920d000) = -17 (File exists)
> 
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

This should really be 2.12 material since it fixes a real bug, but I'm not
sure whether it's already too late.  Michael, what do you think?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Liu, Yi L 5 years, 11 months ago
> Sent: Wednesday, April 18, 2018 12:51 PM
> Subject: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap
> set
> 
> During IOVA page table walk, there is a special case when:
> 
> - notify_unmap is set, meanwhile
> - entry is invalid

This is a very brief description; would you mind talking about it a little bit more?
 
> In the past, we skip the entry always.  This is not correct.  We should send UNMAP
> notification to registered notifiers in this case.  Otherwise some stall pages will still
> be mapped in the host even if L1 guest unmapped them already.
>
> Without this patch, nested device assignment to L2 guests might dump some errors
> like:

Should it be a physical device assigned from the L0 host, or could emulated
devices also trigger this problem?

> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                     0x7f89a920d000) = -17 (File exists)
> 
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not affected by this
> problem).

Does this fix also apply to L0 QEMU?
 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> 
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5
> ---
>  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>  1 file changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t
> iova, bool is_write,
> 
>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> 
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>  /**
>   * vtd_page_walk_level - walk over specific level for IOVA range
>   *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t
> start,
>           */
>          entry_valid = read_cur | write_cur;
> 
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>          if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>              /* NOTE: this is only meaningful if entry_valid == true */
>              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>              if (!entry_valid && !notify_unmap) {
>                  trace_vtd_page_walk_skip_perm(iova, iova_next);
>                  goto next;
>              }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>              }
>          } else {
>              if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                  goto next;
>              }
>              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> --
> 2.14.3
> 

Thanks,
Yi Liu

Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Peter Xu 5 years, 11 months ago
On Wed, Apr 18, 2018 at 05:29:56AM +0000, Liu, Yi L wrote:
> > Sent: Wednesday, April 18, 2018 12:51 PM
> > Subject: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap
> > set
> > 
> > During IOVA page table walk, there is a special case when:
> > 
> > - notify_unmap is set, meanwhile
> > - entry is invalid
> 
> This is very brief description, would you mind talk a little bit more.

It means the case when the program reaches [1] below.

>  
> > In the past, we skip the entry always.  This is not correct.  We should send UNMAP
> > notification to registered notifiers in this case.  Otherwise some stall pages will still
> > be mapped in the host even if L1 guest unmapped them already.
> >
> > Without this patch, nested device assignment to L2 guests might dump some errors
> > like:
> 
> Should it be physical device assigned from L0 host? Or emulated devices could also
> trigger this problem?

If using emulated devices, we possibly need three levels, so I think you
can also hit this: assign an emulated device from the L1 guest to L2 and
then to L3, and you should be able to see this warning dumped from the
QEMU that runs L2.

> 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> >                     0x7f89a920d000) = -17 (File exists)
> > 
> > To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not affected by this
> > problem).
> 
> Does this fix also apply to L0 QEMU?

Sorry, I wasn't clear.  By "L1 QEMU" I meant the QEMU that runs the L1
guest, which I believe corresponds to your "L0 QEMU" here.

And yes, this fix should also be valid even without nesting; however, we
can hardly trigger it that way (that's why I only found it recently when
people reported the nested breakage, since it is rarely seen without
nesting), but AFAIU it's still possible.

>  
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> > To test nested assignment, one also needs to apply below patchset:
> > https://lkml.org/lkml/2018/4/18/5
> > ---
> >  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
> >  1 file changed, 30 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > fb31de9416..b359efd6f9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t
> > iova, bool is_write,
> > 
> >  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> > 
> > +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > +                             vtd_page_walk_hook hook_fn, void *private)
> > +{
> > +    assert(hook_fn);
> > +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> > +                            entry->addr_mask, entry->perm);
> > +    return hook_fn(entry, private);
> > +}
> > +
> >  /**
> >   * vtd_page_walk_level - walk over specific level for IOVA range
> >   *
> > @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t
> > start,
> >           */
> >          entry_valid = read_cur | write_cur;
> > 
> > +        entry.target_as = &address_space_memory;
> > +        entry.iova = iova & subpage_mask;
> > +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> > +        entry.addr_mask = ~subpage_mask;
> > +
> >          if (vtd_is_last_slpte(slpte, level)) {
> > -            entry.target_as = &address_space_memory;
> > -            entry.iova = iova & subpage_mask;
> >              /* NOTE: this is only meaningful if entry_valid == true */
> >              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> > -            entry.addr_mask = ~subpage_mask;
> > -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> >              if (!entry_valid && !notify_unmap) {
> >                  trace_vtd_page_walk_skip_perm(iova, iova_next);
> >                  goto next;
> >              }
> > -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> > -                                    entry.addr_mask, entry.perm);
> > -            if (hook_fn) {
> > -                ret = hook_fn(&entry, private);
> > -                if (ret < 0) {
> > -                    return ret;
> > -                }
> > +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +            if (ret < 0) {
> > +                return ret;
> >              }
> >          } else {
> >              if (!entry_valid) {

[1]

> > -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                if (notify_unmap) {
> > +                    /*
> > +                     * The whole entry is invalid; unmap it all.
> > +                     * Translated address is meaningless, zero it.
> > +                     */
> > +                    entry.translated_addr = 0x0;
> > +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +                    if (ret < 0) {
> > +                        return ret;
> > +                    }
> > +                } else {
> > +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                }
> >                  goto next;
> >              }
> >              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> > --
> > 2.14.3
> > 
> 
> Thanks,
> Yi Liu

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Jason Wang 5 years, 11 months ago

On 2018年04月18日 12:51, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
>
> - notify_unmap is set, meanwhile
> - entry is invalid
>
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stall pages will still be mapped in the host even if L1 guest

You mean 'stale' here?

> unmapped them already.

It looks like some IOTLB invalidations were missed? If not, the page
should be unmapped during invalidation.

>
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
>
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                      0x7f89a920d000) = -17 (File exists)
>
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5

Had a quick glance at it; it looks like there's no need for this patch if
we had that kernel patch applied.

Thanks

> ---
>   hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>   1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
>   
>   typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>   
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>   /**
>    * vtd_page_walk_level - walk over specific level for IOVA range
>    *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>            */
>           entry_valid = read_cur | write_cur;
>   
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>           if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>               /* NOTE: this is only meaningful if entry_valid == true */
>               entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>               if (!entry_valid && !notify_unmap) {
>                   trace_vtd_page_walk_skip_perm(iova, iova_next);
>                   goto next;
>               }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>               }
>           } else {
>               if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                   goto next;
>               }
>               ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,


Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Peter Xu 5 years, 11 months ago
On Fri, Apr 20, 2018 at 12:57:13PM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月18日 12:51, Peter Xu wrote:
> > During IOVA page table walk, there is a special case when:
> > 
> > - notify_unmap is set, meanwhile
> > - entry is invalid
> > 
> > In the past, we skip the entry always.  This is not correct.  We should
> > send UNMAP notification to registered notifiers in this case.  Otherwise
> > some stall pages will still be mapped in the host even if L1 guest
> 
> You mean 'stale' here?

Ah, yes.

> 
> > unmapped them already.
> 
> It looks like some IOTLB invalidations were missed? If not, the page should
> be unmapped during invalidation.

I'm afraid it's not the same problem, and that's why I think we need
two fixes.

Even if the guest sends the correct invalidations, it's still possible
that QEMU skips the PSI with current code.
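
To make the failure mode concrete, here is a toy, self-contained sketch (not
QEMU or VFIO code) of what a skipped UNMAP notification does to the host-side
state; the -17 in the log above is -EEXIST from the host when the same IOVA
ends up mapped twice:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the host-side IOVA mappings kept on behalf of the guest.
 * Real VFIO_MAP_DMA fails with -EEXIST (-17) when an IOVA is mapped twice. */
#define SLOTS 16
static bool mapped[SLOTS];

static int host_map(int slot)
{
    if (mapped[slot]) {
        return -EEXIST;                    /* the "-17 (File exists)" */
    }
    mapped[slot] = true;
    return 0;
}

static void host_unmap(int slot)
{
    mapped[slot] = false;
}

int main(void)
{
    int slot = 0;

    host_map(slot);                        /* guest maps an IOVA page */

    /* Guest unmaps the page and sends a PSI.  With the old walk code the
     * entry is already invalid, the UNMAP notification is skipped, and
     * host_unmap() is never called: the mapping goes stale. */
    bool notification_sent = false;        /* old behaviour */
    if (notification_sent) {
        host_unmap(slot);
    }

    /* The guest later reuses the same IOVA; the host-side map now fails. */
    int ret = host_map(slot);
    printf("second map: %d%s\n", ret, ret == -EEXIST ? " (File exists)" : "");
    return 0;
}
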

> 
> > 
> > Without this patch, nested device assignment to L2 guests might dump
> > some errors like:
> > 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> >                      0x7f89a920d000) = -17 (File exists)
> > 
> > To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> > affected by this problem).
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> > To test nested assignment, one also needs to apply below patchset:
> > https://lkml.org/lkml/2018/4/18/5
> 
> Had a quick glance at it; it looks like there's no need for this patch if we
> had that kernel patch applied.

Even if the guest kernel applies the above fix, QEMU might still drop some
of the PSIs.  The above error messages were captured with the kernel patches
already applied (otherwise we would see no error messages; we would simply
miss them from the guest).

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Jason Wang 5 years, 11 months ago

On 2018年04月20日 13:11, Peter Xu wrote:
> On Fri, Apr 20, 2018 at 12:57:13PM +0800, Jason Wang wrote:
>> On 2018年04月18日 12:51, Peter Xu wrote:
>>> During IOVA page table walk, there is a special case when:
>>>
>>> - notify_unmap is set, meanwhile
>>> - entry is invalid
>>>
>>> In the past, we skip the entry always.  This is not correct.  We should
>>> send UNMAP notification to registered notifiers in this case.  Otherwise
>>> some stall pages will still be mapped in the host even if L1 guest
>> You mean 'stale' here?
> Ah, yes.
>
>>> unmapped them already.
>> It looks like some IOTLB invalidations were missed? If not, the page should
>> be unmapped during invalidation.
> I'm afraid it's not the same problem, and that's why I think we need
> two fixes.
>
> Even if the guest sends the correct invalidations, it's still possible
> that QEMU skips the PSI with current code.
>
>>> Without this patch, nested device assignment to L2 guests might dump
>>> some errors like:
>>>
>>> qemu-system-x86_64: VFIO_MAP_DMA: -17
>>> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>>>                       0x7f89a920d000) = -17 (File exists)
>>>
>>> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
>>> affected by this problem).
>>>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>
>>> To test nested assignment, one also needs to apply below patchset:
>>> https://lkml.org/lkml/2018/4/18/5
>> Had a quick glance at it; it looks like there's no need for this patch if we
>> had that kernel patch applied.
> Even if guest kernel applies above fix, QEMU might still drop some of
> the PSIs.  Above error messages were captured with kernel patches
> applied already (otherwise we see no error messages, we directly miss
> them from guest).
>
> Thanks,

I see, so this actually tries to trigger the unmap notifier for a
non-last-level slpte.
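
For illustration, the IOVA range that such a single notification covers grows
quickly with the level, since addr_mask is derived from the subpage mask at
that level.  A small self-contained sketch of the VT-d second-level geometry
(4 KiB pages, 9 IOVA bits per table level; the constant names below are for
illustration, not taken from intel_iommu.c):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT     12      /* 4 KiB base pages */
#define SL_LEVEL_BITS   9      /* IOVA bits resolved per second-level level */

/* IOVA range covered by one slpte at the given level:
 * level 1 = 4 KiB, level 2 = 2 MiB, level 3 = 1 GiB, level 4 = 512 GiB. */
static uint64_t slpte_range_size(int level)
{
    return UINT64_C(1) << (PAGE_SHIFT + SL_LEVEL_BITS * (level - 1));
}

int main(void)
{
    for (int level = 1; level <= 4; level++) {
        uint64_t size = slpte_range_size(level);
        /* addr_mask in the notification would be size - 1, so one UNMAP
         * event can cover a whole invalid subtree. */
        printf("level %d: range 0x%llx addr_mask 0x%llx\n", level,
               (unsigned long long)size, (unsigned long long)(size - 1));
    }
    return 0;
}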

Thanks


Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
Posted by Jason Wang 5 years, 11 months ago

On 2018年04月18日 12:51, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
>
> - notify_unmap is set, meanwhile
> - entry is invalid
>
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stall pages will still be mapped in the host even if L1 guest
> unmapped them already.
>
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
>
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                      0x7f89a920d000) = -17 (File exists)
>
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5
> ---
>   hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>   1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
>   
>   typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>   
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>   /**
>    * vtd_page_walk_level - walk over specific level for IOVA range
>    *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>            */
>           entry_valid = read_cur | write_cur;
>   
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>           if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>               /* NOTE: this is only meaningful if entry_valid == true */
>               entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>               if (!entry_valid && !notify_unmap) {
>                   trace_vtd_page_walk_skip_perm(iova, iova_next);
>                   goto next;
>               }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>               }
>           } else {
>               if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                   goto next;
>               }
>               ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,

Acked-by: Jason Wang <jasowang@redhat.com>