[Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Jan Beulich posted 3 patches 2 weeks ago

Posted by Jan Beulich 2 weeks ago
update_paging_mode() in the AMD IOMMU code expects to be invoked with
the PCI devices lock held. Since the check occurs only when the mode
actually needs updating, the violation of this rule by the majority
of callers went unnoticed until per-domain IOMMU setup was changed
to do away with on-demand creation of IOMMU page tables.

Unfortunately the only halfway reasonable fix to this that I could
come up with requires more re-work than would seem desirable at this
time of the release process, but addressing the issue seems
unavoidable to me as its manifestation is a regression from the
IOMMU page table setup re-work. The change also isn't without risk
of further regressions - if in patch 2 I've missed a code path that
would also need to invoke the new hook, then this might mean non-
working guests (with passed-through devices on AMD hardware).

1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
2: introduce GFN notification for translated domains
3: AMD/IOMMU: use notify_dfn() hook to update paging mode

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Posted by Sander Eikelenboom 2 weeks ago
On 06/11/2019 16:16, Jan Beulich wrote:
> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> the PCI devices lock held. The check occurring only when the mode
> actually needs updating, the violation of this rule by the majority
> of callers did go unnoticed until per-domain IOMMU setup was changed
> to do away with on-demand creation of IOMMU page tables.
> 
> Unfortunately the only half way reasonable fix to this that I could
> come up with requires more re-work than would seem desirable at this
> time of the release process, but addressing the issue seems
> unavoidable to me as its manifestation is a regression from the
> IOMMU page table setup re-work. The change also isn't without risk
> of further regressions - if in patch 2 I've missed a code path that
> would also need to invoke the new hook, then this might mean non-
> working guests (with passed-through devices on AMD hardware).
> 
> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
> 2: introduce GFN notification for translated domains
> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
> 
> Jan
> 

Hi Jan,

I just tested and I don't get the "pcidevs" message any more.

I assume this was only a fix for that issue, so it's probably expected
that the other issue:
   AMD-Vi: INVALID_DEV_REQUEST 00000800 8a000000 f8000840 000000fd
   and malfunctioning device in one of the guests.
is still around.

--
Sander

Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Posted by Jan Beulich 2 weeks ago
On 06.11.2019 19:29, Sander Eikelenboom wrote:
> On 06/11/2019 16:16, Jan Beulich wrote:
>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>> the PCI devices lock held. The check occurring only when the mode
>> actually needs updating, the violation of this rule by the majority
>> of callers did go unnoticed until per-domain IOMMU setup was changed
>> to do away with on-demand creation of IOMMU page tables.
>>
>> Unfortunately the only half way reasonable fix to this that I could
>> come up with requires more re-work than would seem desirable at this
>> time of the release process, but addressing the issue seems
>> unavoidable to me as its manifestation is a regression from the
>> IOMMU page table setup re-work. The change also isn't without risk
>> of further regressions - if in patch 2 I've missed a code path that
>> would also need to invoke the new hook, then this might mean non-
>> working guests (with passed-through devices on AMD hardware).
>>
>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>> 2: introduce GFN notification for translated domains
>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
> 
> I just tested and I don't get the  "pcidevs" message any more.

Thanks for testing the series.

> I assume this only was a fix for that issue, so it's probably expected
> that the other issue:
>    AMD-Vi: INVALID_DEV_REQUEST 00000800 8a000000 f8000840 000000fd
>    and malfunctioning device in one of the guests.
> is still around.

Indeed. Someone (possibly me) still needs to look into the other one.

Jan

Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Posted by Andrew Cooper 2 weeks ago
On 06/11/2019 15:16, Jan Beulich wrote:
> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> the PCI devices lock held. The check occurring only when the mode
> actually needs updating, the violation of this rule by the majority
> of callers did go unnoticed until per-domain IOMMU setup was changed
> to do away with on-demand creation of IOMMU page tables.
>
> Unfortunately the only half way reasonable fix to this that I could
> come up with requires more re-work than would seem desirable at this
> time of the release process, but addressing the issue seems
> unavoidable to me as its manifestation is a regression from the
> IOMMU page table setup re-work. The change also isn't without risk
> of further regressions - if in patch 2 I've missed a code path that
> would also need to invoke the new hook, then this might mean non-
> working guests (with passed-through devices on AMD hardware).
>
> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
> 2: introduce GFN notification for translated domains
> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode

Having now looked at all three, why don't we just delete the dynamic
height of AMD IOMMU pagetables?

This series looks suspiciously like it is adding new common
infrastructure to work around the fact we're doing something fairly dumb
to begin with.

Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
and a substantial reduction in the complexity of the IOMMU code.

~Andrew

Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Posted by Jan Beulich 2 weeks ago
On 06.11.2019 18:31, Andrew Cooper wrote:
> On 06/11/2019 15:16, Jan Beulich wrote:
>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>> the PCI devices lock held. The check occurring only when the mode
>> actually needs updating, the violation of this rule by the majority
>> of callers did go unnoticed until per-domain IOMMU setup was changed
>> to do away with on-demand creation of IOMMU page tables.
>>
>> Unfortunately the only half way reasonable fix to this that I could
>> come up with requires more re-work than would seem desirable at this
>> time of the release process, but addressing the issue seems
>> unavoidable to me as its manifestation is a regression from the
>> IOMMU page table setup re-work. The change also isn't without risk
>> of further regressions - if in patch 2 I've missed a code path that
>> would also need to invoke the new hook, then this might mean non-
>> working guests (with passed-through devices on AMD hardware).
>>
>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>> 2: introduce GFN notification for translated domains
>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
> 
> Having now looked at all three, why don't we just delete the dynamic
> height of AMD IOMMU pagetables?
> 
> This series looks suspiciously like it is adding new common
> infrastructure to work around the fact we're doing something fairly dumb
> to being with.
> 
> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
> and a substantial reduction in the complexity of the IOMMU code.

Yet an additional level of page walks hardware has to perform. Also
4 levels won't cover all possible 52 address bits. And finally, the
more applicable your "substantial reduction", the less suitable such
a change may be at this point of the release (but I didn't look at
this side of things in any detail, so it may well not be an issue).

Jan

Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Posted by Andrew Cooper 2 weeks ago
On 07/11/2019 07:36, Jan Beulich wrote:
> On 06.11.2019 18:31, Andrew Cooper wrote:
>> On 06/11/2019 15:16, Jan Beulich wrote:
>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>> the PCI devices lock held. The check occurring only when the mode
>>> actually needs updating, the violation of this rule by the majority
>>> of callers did go unnoticed until per-domain IOMMU setup was changed
>>> to do away with on-demand creation of IOMMU page tables.
>>>
>>> Unfortunately the only half way reasonable fix to this that I could
>>> come up with requires more re-work than would seem desirable at this
>>> time of the release process, but addressing the issue seems
>>> unavoidable to me as its manifestation is a regression from the
>>> IOMMU page table setup re-work. The change also isn't without risk
>>> of further regressions - if in patch 2 I've missed a code path that
>>> would also need to invoke the new hook, then this might mean non-
>>> working guests (with passed-through devices on AMD hardware).
>>>
>>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>>> 2: introduce GFN notification for translated domains
>>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>> Having now looked at all three, why don't we just delete the dynamic
>> height of AMD IOMMU pagetables?
>>
>> This series looks suspiciously like it is adding new common
>> infrastructure to work around the fact we're doing something fairly dumb
>> to being with.
>>
>> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
>> and a substantial reduction in the complexity of the IOMMU code.
> Yet an additional level of page walks hardware has to perform. Also
> 4 levels won't cover all possible 52 address bits. And finally, the
> more applicable your "substantial reduction", the less suitable such
> a change may be at this point of the release (but I didn't look at
> this side of things in any detail, so it may well not be an issue).

There is, in practice, no such thing as an HVM guest using 2 levels. 
The VRAM just below the 4G boundary will force a resize to 3 levels
during domain construction, and as a 1-line fix for 4.13, this probably
isn't the worst idea going.

There are no AMD systems which support >48 bit PA space, so 4 levels is
sufficient for now, but fundamentally details such as the size of GPA
space should be specified in domain_create() and remain static for the
lifetime of the domain.

As far as I can tell, it is only AMD systems with IOMMUs which permit
the PA space to be variable, and I still can't help but feel that
this series is attempting to work around a problem we shouldn't have in
the first place.

~Andrew

Re: [Xen-devel] [PATCH 0/3] AMD/IOMMU: re-work mode updating

Posted by Jan Beulich 2 weeks ago
On 07.11.2019 13:49, Andrew Cooper wrote:
> On 07/11/2019 07:36, Jan Beulich wrote:
>> On 06.11.2019 18:31, Andrew Cooper wrote:
>>> On 06/11/2019 15:16, Jan Beulich wrote:
>>>> update_paging_mode() in the AMD IOMMU code expects to be invoked with
>>>> the PCI devices lock held. The check occurring only when the mode
>>>> actually needs updating, the violation of this rule by the majority
>>>> of callers did go unnoticed until per-domain IOMMU setup was changed
>>>> to do away with on-demand creation of IOMMU page tables.
>>>>
>>>> Unfortunately the only half way reasonable fix to this that I could
>>>> come up with requires more re-work than would seem desirable at this
>>>> time of the release process, but addressing the issue seems
>>>> unavoidable to me as its manifestation is a regression from the
>>>> IOMMU page table setup re-work. The change also isn't without risk
>>>> of further regressions - if in patch 2 I've missed a code path that
>>>> would also need to invoke the new hook, then this might mean non-
>>>> working guests (with passed-through devices on AMD hardware).
>>>>
>>>> 1: AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page
>>>> 2: introduce GFN notification for translated domains
>>>> 3: AMD/IOMMU: use notify_dfn() hook to update paging mode
>>> Having now looked at all three, why don't we just delete the dynamic
>>> height of AMD IOMMU pagetables?
>>>
>>> This series looks suspiciously like it is adding new common
>>> infrastructure to work around the fact we're doing something fairly dumb
>>> to being with.
>>>
>>> Hardcoding at 4 levels is, at the very worst, 2 extra pages per domain,
>>> and a substantial reduction in the complexity of the IOMMU code.
>> Yet an additional level of page walks hardware has to perform. Also
>> 4 levels won't cover all possible 52 address bits. And finally, the
>> more applicable your "substantial reduction", the less suitable such
>> a change may be at this point of the release (but I didn't look at
>> this side of things in any detail, so it may well not be an issue).
> 
> There is, in practice, no such thing as an HVM guest using 2 levels. 
> The VRAM just below the 4G boundary will force a resize to 3 levels
> during domain construction, and as a 1-line fix for 4.13, this probably
> isn't the worst idea going.

So here (with the 1-line fix remark) you talk about 3 levels. Yet
switching the 2 that we start from to 3 won't fix anything, as we
still may need to go to 4 for huge guests. Such a change would
merely eliminate the indeed pretty pointless move from 2 to 3 which
now happens for all domains as their memory gets populated.

> There are no AMD systems which support >48 bit PA space, so 4 levels is
> sufficient for now, but fundamentally details such as the size of GPA
> space should be specified in domain_create() and remain static for the
> lifetime of the domain.

I agree GPA dimensions ought to be static. But the number-of-levels
adjustment the code does has nothing to do with variable GPA
boundaries. Even for a domain with a, say, 36-bit GFN space it
may be beneficial to run with only 3 levels, as long as it has
"little" enough memory assigned. In fact the number of levels the
hardware has to walk is the one aspect you don't even touch in your
reply.

Jan

[Xen-devel] [PATCH 1/3] AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page

Posted by Jan Beulich 2 weeks ago
Unmapping a page which has never been mapped should be a no-op (note how
it already is in case there was no root page table allocated). There's
in particular no need to grow the number of page table levels in use,
and there's also no need to allocate intermediate page tables except
when needing to split a large page.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -176,7 +176,7 @@ void iommu_dte_set_guest_cr3(struct amd_
  * page tables.
  */
 static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
-                              unsigned long pt_mfn[])
+                              unsigned long pt_mfn[], bool map)
 {
     struct amd_iommu_pte *pde, *next_table_vaddr;
     unsigned long  next_table_mfn;
@@ -189,6 +189,13 @@ static int iommu_pde_from_dfn(struct dom
 
     BUG_ON( table == NULL || level < 1 || level > 6 );
 
+    /*
+     * A frame number past what the current page tables can represent can't
+     * possibly have a mapping.
+     */
+    if ( dfn >> (PTE_PER_TABLE_SHIFT * level) )
+        return 0;
+
     next_table_mfn = mfn_x(page_to_mfn(table));
 
     if ( level == 1 )
@@ -246,6 +253,9 @@ static int iommu_pde_from_dfn(struct dom
         /* Install lower level page table for non-present entries */
         else if ( !pde->pr )
         {
+            if ( !map )
+                return 0;
+
             if ( next_table_mfn == 0 )
             {
                 table = alloc_amd_iommu_pgtable();
@@ -404,7 +414,7 @@ int amd_iommu_map_page(struct domain *d,
         }
     }
 
-    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) || (pt_mfn[1] == 0) )
     {
         spin_unlock(&hd->arch.mapping_lock);
         AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
@@ -439,24 +449,7 @@ int amd_iommu_unmap_page(struct domain *
         return 0;
     }
 
-    /* Since HVM domain is initialized with 2 level IO page table,
-     * we might need a deeper page table for lager dfn now */
-    if ( is_hvm_domain(d) )
-    {
-        int rc = update_paging_mode(d, dfn_x(dfn));
-
-        if ( rc )
-        {
-            spin_unlock(&hd->arch.mapping_lock);
-            AMD_IOMMU_DEBUG("Update page mode failed dfn = %"PRI_dfn"\n",
-                            dfn_x(dfn));
-            if ( rc != -EADDRNOTAVAIL )
-                domain_crash(d);
-            return rc;
-        }
-    }
-
-    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, false) )
     {
         spin_unlock(&hd->arch.mapping_lock);
         AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
@@ -465,8 +458,11 @@ int amd_iommu_unmap_page(struct domain *
         return -EFAULT;
     }
 
-    /* mark PTE as 'page not present' */
-    *flush_flags |= clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
+    if ( pt_mfn[1] )
+    {
+        /* Mark PTE as 'page not present'. */
+        *flush_flags |= clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
+    }
 
     spin_unlock(&hd->arch.mapping_lock);
 


Re: [Xen-devel] [PATCH 1/3] AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page

Posted by Jürgen Groß 1 week ago
On 06.11.19 16:18, Jan Beulich wrote:
> Unmapping a page which has never been mapped should be a no-op (note how
> it already is in case there was no root page table allocated). There's
> in particular no need to grow the number of page table levels in use,
> and there's also no need to allocate intermediate page tables except
> when needing to split a large page.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen

Re: [Xen-devel] [PATCH 1/3] AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page

Posted by Paul Durrant 2 weeks ago
On Wed, 6 Nov 2019 at 15:20, Jan Beulich <jbeulich@suse.com> wrote:
>
> Unmapping a page which has never been mapped should be a no-op (note how
> it already is in case there was no root page table allocated). There's
> in particular no need to grow the number of page table levels in use,
> and there's also no need to allocate intermediate page tables except
> when needing to split a large page.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul@xen.org>

>
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -176,7 +176,7 @@ void iommu_dte_set_guest_cr3(struct amd_
>   * page tables.
>   */
>  static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
> -                              unsigned long pt_mfn[])
> +                              unsigned long pt_mfn[], bool map)
>  {
>      struct amd_iommu_pte *pde, *next_table_vaddr;
>      unsigned long  next_table_mfn;
> @@ -189,6 +189,13 @@ static int iommu_pde_from_dfn(struct dom
>
>      BUG_ON( table == NULL || level < 1 || level > 6 );
>
> +    /*
> +     * A frame number past what the current page tables can represent can't
> +     * possibly have a mapping.
> +     */
> +    if ( dfn >> (PTE_PER_TABLE_SHIFT * level) )
> +        return 0;
> +
>      next_table_mfn = mfn_x(page_to_mfn(table));
>
>      if ( level == 1 )
> @@ -246,6 +253,9 @@ static int iommu_pde_from_dfn(struct dom
>          /* Install lower level page table for non-present entries */
>          else if ( !pde->pr )
>          {
> +            if ( !map )
> +                return 0;
> +
>              if ( next_table_mfn == 0 )
>              {
>                  table = alloc_amd_iommu_pgtable();
> @@ -404,7 +414,7 @@ int amd_iommu_map_page(struct domain *d,
>          }
>      }
>
> -    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn) || (pt_mfn[1] == 0) )
> +    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) || (pt_mfn[1] == 0) )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
>          AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
> @@ -439,24 +449,7 @@ int amd_iommu_unmap_page(struct domain *
>          return 0;
>      }
>
> -    /* Since HVM domain is initialized with 2 level IO page table,
> -     * we might need a deeper page table for lager dfn now */
> -    if ( is_hvm_domain(d) )
> -    {
> -        int rc = update_paging_mode(d, dfn_x(dfn));
> -
> -        if ( rc )
> -        {
> -            spin_unlock(&hd->arch.mapping_lock);
> -            AMD_IOMMU_DEBUG("Update page mode failed dfn = %"PRI_dfn"\n",
> -                            dfn_x(dfn));
> -            if ( rc != -EADDRNOTAVAIL )
> -                domain_crash(d);
> -            return rc;
> -        }
> -    }
> -
> -    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn) || (pt_mfn[1] == 0) )
> +    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, false) )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
>          AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
> @@ -465,8 +458,11 @@ int amd_iommu_unmap_page(struct domain *
>          return -EFAULT;
>      }
>
> -    /* mark PTE as 'page not present' */
> -    *flush_flags |= clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
> +    if ( pt_mfn[1] )
> +    {
> +        /* Mark PTE as 'page not present'. */
> +        *flush_flags |= clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
> +    }
>
>      spin_unlock(&hd->arch.mapping_lock);
>
>
>

Re: [Xen-devel] [PATCH 1/3] AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page

Posted by Andrew Cooper 2 weeks ago
On 06/11/2019 15:18, Jan Beulich wrote:
> Unmapping a page which has never been mapped should be a no-op (note how
> it already is in case there was no root page table allocated).

Which function are you talking about here?  iommu_pde_from_dfn() will
BUG() if no root was set up.

> There's
> in particular no need to grow the number of page table levels in use,
> and there's also no need to allocate intermediate page tables except
> when needing to split a large page.

To be honest, I've never been convinced that dynamically changing the
number of levels in the AMD IOMMU tables is clever.  It should be fixed
at 4 (like everything else) and suddenly a lot of runtime complexity
disappears.  (I'm fairly confident that we'll need a domain create
parameter to support 5 level paging in a rational way, so we won't
need walk-length gymnastics then either.)

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Re: [Xen-devel] [PATCH 1/3] AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page

Posted by Jan Beulich 2 weeks ago
On 06.11.2019 18:12, Andrew Cooper wrote:
> On 06/11/2019 15:18, Jan Beulich wrote:
>> Unmapping a page which has never been mapped should be a no-op (note how
>> it already is in case there was no root page table allocated).
> 
> Which function are you talking about here?  iommu_pde_from_dfn() will
> BUG() if no root was set up.

amd_iommu_unmap_page() has such a check first thing.

>> There's
>> in particular no need to grow the number of page table levels in use,
>> and there's also no need to allocate intermediate page tables except
>> when needing to split a large page.
> 
> To be honest, I've never been convinced that dynamically changing the
> number of levels in the AMD IOMMU tables is clever.  It should be fixed
> at 4 (like everything else) and suddenly a lot of runtime complexity
> disappears.  (I'm fairly confident that we'll need a domain create
> parameter to support 5 level paging in a rational way, so we won't even
> include walk-length gymnastics then either.)

5-level paging for the CPU 1st-stage-translation is imo pretty orthogonal
to needing 5 levels of paging for 2nd-stage-translation (which also is
what the IOMMU code here is about).

>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks, Jan

[Xen-devel] [PATCH 2/3] introduce GFN notification for translated domains

Posted by Jan Beulich 2 weeks ago
In order for individual IOMMU drivers (and from an abstract pov also
architectures) to be able to adjust their data structures ahead of time
when they might cover only a sub-range of all possible GFNs, introduce
a notification call used by various code paths potentially installing a
fresh mapping of a never used GFN (for a particular domain).

Note that in gnttab_transfer() the notification and lock re-acquire
handling is best effort only (the guest may not be able to make use of
the new page in case of failure, but that's in line with the lack of a
return value check of guest_physmap_add_page() itself).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -173,7 +173,8 @@ static int __init pvh_populate_memory_ra
             continue;
         }
 
-        rc = guest_physmap_add_page(d, _gfn(start), page_to_mfn(page),
+        rc = notify_gfn(d, _gfn(start + (1UL << order) - 1)) ?:
+             guest_physmap_add_page(d, _gfn(start), page_to_mfn(page),
                                     order);
         if ( rc != 0 )
         {
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4286,9 +4286,17 @@ static int hvmop_set_param(
         if ( a.value > SHUTDOWN_MAX )
             rc = -EINVAL;
         break;
+
     case HVM_PARAM_IOREQ_SERVER_PFN:
-        d->arch.hvm.ioreq_gfn.base = a.value;
+        if ( d->arch.hvm.params[HVM_PARAM_NR_IOREQ_SERVER_PAGES] )
+            rc = notify_gfn(
+                     d,
+                     _gfn(a.value + d->arch.hvm.params
+                                    [HVM_PARAM_NR_IOREQ_SERVER_PAGES] - 1));
+        if ( !rc )
+             d->arch.hvm.ioreq_gfn.base = a.value;
         break;
+
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     {
         unsigned int i;
@@ -4299,6 +4307,9 @@ static int hvmop_set_param(
             rc = -EINVAL;
             break;
         }
+        rc = notify_gfn(d, _gfn(d->arch.hvm.ioreq_gfn.base + a.value - 1));
+        if ( rc )
+            break;
         for ( i = 0; i < a.value; i++ )
             set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
 
@@ -4312,7 +4323,11 @@ static int hvmop_set_param(
         BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN >
                      sizeof(d->arch.hvm.ioreq_gfn.legacy_mask) * 8);
         if ( a.value )
-            set_bit(a.index, &d->arch.hvm.ioreq_gfn.legacy_mask);
+        {
+            rc = notify_gfn(d, _gfn(a.value));
+            if ( !rc )
+                set_bit(a.index, &d->arch.hvm.ioreq_gfn.legacy_mask);
+        }
         break;
 
     case HVM_PARAM_X87_FIP_WIDTH:
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -946,6 +946,16 @@ map_grant_ref(
         return;
     }
 
+    if ( paging_mode_translate(ld) /* && (op->flags & GNTMAP_host_map) */ &&
+         (rc = notify_gfn(ld, gaddr_to_gfn(op->host_addr))) )
+    {
+        gdprintk(XENLOG_INFO, "notify(%"PRI_gfn") -> %d\n",
+                 gfn_x(gaddr_to_gfn(op->host_addr)), rc);
+        op->status = GNTST_general_error;
+        return;
+        BUILD_BUG_ON(GNTST_okay);
+    }
+
     if ( unlikely((rd = rcu_lock_domain_by_id(op->dom)) == NULL) )
     {
         gdprintk(XENLOG_INFO, "Could not find domain %d\n", op->dom);
@@ -2123,6 +2133,7 @@ gnttab_transfer(
     {
         bool_t okay;
         int rc;
+        gfn_t gfn;
 
         if ( i && hypercall_preempt_check() )
             return i;
@@ -2300,21 +2311,52 @@ gnttab_transfer(
         act = active_entry_acquire(e->grant_table, gop.ref);
 
         if ( evaluate_nospec(e->grant_table->gt_version == 1) )
+            gfn = _gfn(shared_entry_v1(e->grant_table, gop.ref).frame);
+        else
+            gfn = _gfn(shared_entry_v2(e->grant_table, gop.ref).full_page.frame);
+
+        if ( paging_mode_translate(e) )
         {
-            grant_entry_v1_t *sha = &shared_entry_v1(e->grant_table, gop.ref);
+            gfn_t gfn2;
+
+            active_entry_release(act);
+            grant_read_unlock(e->grant_table);
+
+            rc = notify_gfn(e, gfn);
+            if ( rc )
+                printk(XENLOG_G_WARNING
+                       "%pd: gref %u: xfer GFN %"PRI_gfn" may be inaccessible (%d)\n",
+                       e, gop.ref, gfn_x(gfn), rc);
+
+            grant_read_lock(e->grant_table);
+            act = active_entry_acquire(e->grant_table, gop.ref);
 
-            guest_physmap_add_page(e, _gfn(sha->frame), mfn, 0);
-            if ( !paging_mode_translate(e) )
-                sha->frame = mfn_x(mfn);
+            if ( evaluate_nospec(e->grant_table->gt_version == 1) )
+                gfn2 = _gfn(shared_entry_v1(e->grant_table, gop.ref).frame);
+            else
+                gfn2 = _gfn(shared_entry_v2(e->grant_table, gop.ref).
+                    full_page.frame);
+
+            if ( !gfn_eq(gfn, gfn2) )
+            {
+                printk(XENLOG_G_WARNING
+                       "%pd: gref %u: xfer GFN went %"PRI_gfn" -> %"PRI_gfn"\n",
+                       e, gop.ref, gfn_x(gfn), gfn_x(gfn2));
+                gfn = gfn2;
+            }
         }
-        else
-        {
-            grant_entry_v2_t *sha = &shared_entry_v2(e->grant_table, gop.ref);
 
-            guest_physmap_add_page(e, _gfn(sha->full_page.frame), mfn, 0);
-            if ( !paging_mode_translate(e) )
-                sha->full_page.frame = mfn_x(mfn);
+        guest_physmap_add_page(e, gfn, mfn, 0);
+
+        if ( !paging_mode_translate(e) )
+        {
+            if ( evaluate_nospec(e->grant_table->gt_version == 1) )
+                shared_entry_v1(e->grant_table, gop.ref).frame = mfn_x(mfn);
+            else
+                shared_entry_v2(e->grant_table, gop.ref).full_page.frame =
+                    mfn_x(mfn);
         }
+
         smp_wmb();
         shared_entry_header(e->grant_table, gop.ref)->flags |=
             GTF_transfer_completed;
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -203,6 +203,10 @@ static void populate_physmap(struct memo
         if ( unlikely(__copy_from_guest_offset(&gpfn, a->extent_list, i, 1)) )
             goto out;
 
+        if ( paging_mode_translate(d) &&
+             notify_gfn(d, _gfn(gpfn + (1U << a->extent_order) - 1)) )
+            goto out;
+
         if ( a->memflags & MEMF_populate_on_demand )
         {
             /* Disallow populating PoD pages on oneself. */
@@ -745,6 +749,10 @@ static long memory_exchange(XEN_GUEST_HA
                 continue;
             }
 
+            if ( paging_mode_translate(d) )
+                rc = notify_gfn(d,
+                                _gfn(gpfn + (1U << exch.out.extent_order) - 1));
+
             mfn = page_to_mfn(page);
             guest_physmap_add_page(d, _gfn(gpfn), mfn,
                                    exch.out.extent_order);
@@ -813,12 +821,20 @@ int xenmem_add_to_physmap(struct domain
         extra.foreign_domid = DOMID_INVALID;
 
     if ( xatp->space != XENMAPSPACE_gmfn_range )
-        return xenmem_add_to_physmap_one(d, xatp->space, extra,
+        return notify_gfn(d, _gfn(xatp->gpfn)) ?:
+               xenmem_add_to_physmap_one(d, xatp->space, extra,
                                          xatp->idx, _gfn(xatp->gpfn));
 
     if ( xatp->size < start )
         return -EILSEQ;
 
+    if ( !start && xatp->size )
+    {
+        rc = notify_gfn(d, _gfn(xatp->gpfn + xatp->size - 1));
+        if ( rc )
+            return rc;
+    }
+
     xatp->idx += start;
     xatp->gpfn += start;
     xatp->size -= start;
@@ -891,7 +907,8 @@ static int xenmem_add_to_physmap_batch(s
                                                extent, 1)) )
             return -EFAULT;
 
-        rc = xenmem_add_to_physmap_one(d, xatpb->space,
+        rc = notify_gfn(d, _gfn(gpfn)) ?:
+             xenmem_add_to_physmap_one(d, xatpb->space,
                                        xatpb->u,
                                        idx, _gfn(gpfn));
 
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -530,6 +530,14 @@ void iommu_share_p2m_table(struct domain
         iommu_get_ops()->share_p2m(d);
 }
 
+int iommu_notify_gfn(struct domain *d, gfn_t gfn)
+{
+    const struct iommu_ops *ops = dom_iommu(d)->platform_ops;
+
+    return need_iommu_pt_sync(d) && ops->notify_dfn
+           ? iommu_call(ops, notify_dfn, d, _dfn(gfn_x(gfn))) : 0;
+}
+
 void iommu_crash_shutdown(void)
 {
     if ( !iommu_crash_disable )
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -237,6 +237,8 @@ struct iommu_ops {
     int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
                                     unsigned int *flags);
 
+    int __must_check (*notify_dfn)(struct domain *d, dfn_t dfn);
+
     void (*free_page_table)(struct page_info *);
 
 #ifdef CONFIG_X86
@@ -331,6 +333,7 @@ void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
+int __must_check iommu_notify_gfn(struct domain *d, gfn_t gfn);
 
 #ifdef CONFIG_HAS_PCI
 int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1039,6 +1039,11 @@ static always_inline bool is_iommu_enabl
     return evaluate_nospec(d->options & XEN_DOMCTL_CDF_iommu);
 }
 
+static inline int __must_check notify_gfn(struct domain *d, gfn_t gfn)
+{
+    return /* arch_notify_gfn(d, gfn) ?: */ iommu_notify_gfn(d, gfn);
+}
+
 extern bool sched_smt_power_savings;
 extern bool sched_disable_smt_switching;
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] introduce GFN notification for translated domains

Posted by George Dunlap 2 weeks ago
On 11/6/19 3:19 PM, Jan Beulich wrote:
> In order for individual IOMMU drivers (and from an abstract pov also
> architectures) to be able to adjust their data structures ahead of time
> when they might cover only a sub-range of all possible GFNs, introduce
> a notification call used by various code paths potentially installing a
> fresh mapping of a never used GFN (for a particular domain).

So trying to reverse engineer what's going on here, you mean to say
something like this:

---
Individual IOMMU drivers contain adjuct data structures for gfn ranges
contained in the main p2m.  For efficiency, these adjuct data structures
often cover only a subset of the gfn range.  Installing a fresh mapping
of a never-used gfn may require these ranges to be expanded.  Doing this
when the p2m entry is first updated may be problematic because <reasons>.

To fix this, implement notify_gfn(), to be called when Xen first becomes
aware that a potentially new gfn may be about to be used.  This will
notify the IOMMU driver about the new gfn, allowing it to expand the
data structures.  It may return -ERESTART (?) for long-running
operations, in which case the operation should be restarted or a
different error if the expansion of the data structure is not possible.
 In the latter case, the entire operation should fail.
---

Is that about right?  Note I've had to make a lot of guesses here about
the functionality and intent.

> Note that in gnttab_transfer() the notification and lock re-acquire
> handling is best effort only (the guest may not be able to make use of
> the new page in case of failure, but that's in line with the lack of a
> return value check of guest_physmap_add_page() itself).

Is there a reason we can't just return an error to the caller?

 -George



Re: [Xen-devel] [PATCH 2/3] introduce GFN notification for translated domains

Posted by Jan Beulich 2 weeks ago
On 07.11.2019 12:35, George Dunlap wrote:
> On 11/6/19 3:19 PM, Jan Beulich wrote:
>> In order for individual IOMMU drivers (and from an abstract pov also
>> architectures) to be able to adjust their data structures ahead of time
>> when they might cover only a sub-range of all possible GFNs, introduce
>> a notification call used by various code paths potentially installing a
>> fresh mapping of a never used GFN (for a particular domain).
> 
> So trying to reverse engineer what's going on here, you mean to say
> something like this:
> 
> ---
> Individual IOMMU drivers contain adjuct data structures for gfn ranges
> contained in the main p2m.  For efficiency, these adjuct data structures
> often cover only a subset of the gfn range.  Installing a fresh mapping
> of a never-used gfn may require these ranges to be expanded.  Doing this
> when the p2m entry is first updated may be problematic because <reasons>.
> 
> To fix this, implement notify_gfn(), to be called when Xen first becomes
> aware that a potentially new gfn may be about to be used.  This will
> notify the IOMMU driver about the new gfn, allowing it to expand the
> data structures.  It may return -ERESTART (?) for long-running
> operations, in which case the operation should be restarted or a
> different error if the expansion of the data structure is not possible.
>  In the latter case, the entire operation should fail.
> ---
> 
> Is that about right?

With the exception of the -ERESTART / long running operations aspect,
yes. Plus assuming you mean "adjunct" (not "adjuct", which my
dictionary doesn't know about).

>  Note I've had to make a lot of guesses here about
> the functionality and intent.

Well, even after seeing your longer description, I don't see what mine
doesn't say.

>> Note that in gnttab_transfer() the notification and lock re-acquire
>> handling is best effort only (the guest may not be able to make use of
>> the new page in case of failure, but that's in line with the lack of a
>> return value check of guest_physmap_add_page() itself).
> 
> Is there a reason we can't just return an error to the caller?

Rolling back what has been done by that point would seem rather
difficult, which I guess is the reason why the code was written the
way it is (prior to my change).

Jan


Re: [Xen-devel] [PATCH 2/3] introduce GFN notification for translated domains

Posted by George Dunlap 2 weeks ago
On 11/7/19 11:47 AM, Jan Beulich wrote:
> On 07.11.2019 12:35, George Dunlap wrote:
>> On 11/6/19 3:19 PM, Jan Beulich wrote:
>>> In order for individual IOMMU drivers (and from an abstract pov also
>>> architectures) to be able to adjust their data structures ahead of time
>>> when they might cover only a sub-range of all possible GFNs, introduce
>>> a notification call used by various code paths potentially installing a
>>> fresh mapping of a never used GFN (for a particular domain).
>>
>> So trying to reverse engineer what's going on here, you mean to say
>> something like this:
>>
>> ---
>> Individual IOMMU drivers contain adjuct data structures for gfn ranges
>> contained in the main p2m.  For efficiency, these adjuct data structures
>> often cover only a subset of the gfn range.  Installing a fresh mapping
>> of a never-used gfn may require these ranges to be expanded.  Doing this
>> when the p2m entry is first updated may be problematic because <reasons>.
>>
>> To fix this, implement notify_gfn(), to be called when Xen first becomes
>> aware that a potentially new gfn may be about to be used.  This will
>> notify the IOMMU driver about the new gfn, allowing it to expand the
>> data structures.  It may return -ERESTART (?) for long-running
>> operations, in which case the operation should be restarted or a
>> different error if the expansion of the data structure is not possible.
>>  In the latter case, the entire operation should fail.
>> ---
>>
>> Is that about right?
> 
> With the exception of the -ERESTART / long running operations aspect,
> yes. Plus assuming you mean "adjunct" (not "adjuct", which my
> dictionary doesn't know about).
> 
>>  Note I've had to make a lot of guesses here about
>> the functionality and intent.
> 
> Well, even after seeing your longer description, I don't see what mine
> doesn't say.

* "Ahead of time" -- ahead of what?

* Why do things need to be done ahead of time, rather than at the time
(for whatever it is)?  (I couldn't even really guess at this, which is
why I put "<reasons>".)

* To me "notify" doesn't in any way imply that the operation can fail.
Most modern notifications are FYI only, with no opportunity to prevent
the thing from happening.  (That's not to say that notify is an
inappropriate name -- just that by itself it doesn't imply the ability
to cancel, which seems like a major factor to understanding the intent
of the patch.)

>>> Note that in gnttab_transfer() the notification and lock re-acquire
>>> handling is best effort only (the guest may not be able to make use of
>>> the new page in case of failure, but that's in line with the lack of a
>>> return value check of guest_physmap_add_page() itself).
>>
>> Is there a reason we can't just return an error to the caller?
> 
> Rolling back what has been done by that point would seem rather
> difficult, which I guess is the reason why the code was written the
> way it is (prior to my change).

The phrasing made me think that you were changing it to be best-effort,
rather than following suit with existing functionality.

Maybe:

"Note that before this patch, in gnttab_transfer(), once <condition>
happens, further errors modifying the physmap are ignored (presumably
because it would be too complicated to try to roll back at that point).
 This patch follows suit by ignoring failed notify_gfn()s, simply
printing out a warning that the gfn may not be accessible due to the
failure."

 -George


Re: [Xen-devel] [PATCH 2/3] introduce GFN notification for translated domains

Posted by Jan Beulich 2 weeks ago
On 07.11.2019 13:10, George Dunlap wrote:
> On 11/7/19 11:47 AM, Jan Beulich wrote:
>> On 07.11.2019 12:35, George Dunlap wrote:
>>> On 11/6/19 3:19 PM, Jan Beulich wrote:
>>>> In order for individual IOMMU drivers (and from an abstract pov also
>>>> architectures) to be able to adjust their data structures ahead of time
>>>> when they might cover only a sub-range of all possible GFNs, introduce
>>>> a notification call used by various code paths potentially installing a
>>>> fresh mapping of a never used GFN (for a particular domain).
>>>
>>> So trying to reverse engineer what's going on here, you mean to say
>>> something like this:
>>>
>>> ---
>>> Individual IOMMU drivers contain adjuct data structures for gfn ranges
>>> contained in the main p2m.  For efficiency, these adjuct data structures
>>> often cover only a subset of the gfn range.  Installing a fresh mapping
>>> of a never-used gfn may require these ranges to be expanded.  Doing this
>>> when the p2m entry is first updated may be problematic because <reasons>.
>>>
>>> To fix this, implement notify_gfn(), to be called when Xen first becomes
>>> aware that a potentially new gfn may be about to be used.  This will
>>> notify the IOMMU driver about the new gfn, allowing it to expand the
>>> data structures.  It may return -ERESTART (?) for long-running
>>> operations, in which case the operation should be restarted or a
>>> different error if the expansion of the data structure is not possible.
>>>  In the latter case, the entire operation should fail.
>>> ---
>>>
>>> Is that about right?
>>
>> With the exception of the -ERESTART / long running operations aspect,
>> yes. Plus assuming you mean "adjunct" (not "adjuct", which my
>> dictionary doesn't know about).
>>
>>>  Note I've had to make a lot of guesses here about
>>> the functionality and intent.
>>
>> Well, even after seeing your longer description, I don't see what mine
>> doesn't say.
> 
> * "Ahead of time" -- ahead of what?

I replaced "time" with "actual mapping requests", realizing that I was
implying too much here about what is really the subject of the next patch.

> * Why do things need to be done ahead of time, rather than at the time
> (for whatever it is)?  (I couldn't even really guess at this, which is
> why I put "<reasons>".)

This "why" imo really is the subject of the next patch, and hence
gets explained there.

> * To me "notify" doesn't in any way imply that the operation can fail.
> Most modern notifications are FYI only, with no opportunity to prevent
> the thing from happening.  (That's not to say that notify is an
> inappropriate name -- just that by itself it doesn't imply the ability
> to cancel, which seems like a major factor to understanding the intent
> of the patch.)

I'm up for different names; "notify" is what I could think of. It
being able to fail is in line with our more abstract notifier
infrastructure (inherited from Linux) also allowing for NOTIFY_BAD.

>>>> Note that in gnttab_transfer() the notification and lock re-acquire
>>>> handling is best effort only (the guest may not be able to make use of
>>>> the new page in case of failure, but that's in line with the lack of a
>>>> return value check of guest_physmap_add_page() itself).
>>>
>>> Is there a reason we can't just return an error to the caller?
>>
>> Rolling back what has been done by that point would seem rather
>> difficult, which I guess is the reason why the code was written the
>> way it is (prior to my change).
> 
> The phrasing made me think that you were changing it to be best-effort,
> rather than following suit with existing functionality.
> 
> Maybe:
> 
> "Note that before this patch, in gnttab_transfer(), once <condition>
> happens, further errors modifying the physmap are ignored (presumably
> because it would be too complicated to try to roll back at that point).
>  This patch follows suit by ignoring failed notify_gfn()s, simply
> printing out a warning that the gfn may not be accessible due to the
> failure."

Thanks, I'll use this in a slightly extended form.

Jan


[Xen-devel] [PATCH 3/3] AMD/IOMMU: use notify_dfn() hook to update paging mode

Posted by Jan Beulich 2 weeks ago
update_paging_mode() expects to be invoked with the PCI devices lock
held. The check occurring only when the mode actually needs updating,
the violation of this rule by the majority of callers did go unnoticed
until per-domain IOMMU setup was changed to do away with on-demand
creation of IOMMU page tables.

Acquiring the necessary lock in amd_iommu_map_page() or intermediate
layers in generic IOMMU code is not possible - we'd risk all sorts of
lock order violations. Hence the call to update_paging_mode() gets
pulled out of the function, to be invoked instead from the new
notify_dfn() hook, where no potentially conflicting locks are being
held by the callers.

Similarly the call to amd_iommu_alloc_root() gets pulled out - now
that we receive notification of all DFN range increases, there's no
need anymore to do this check when actually mapping a page.

Note that this ought to result in a small performance improvement as
well: The hook often gets invoked just once for larger blocks of pages,
so rather than going through amd_iommu_alloc_root() and
update_paging_mode() once per page, we may now invoke it just once per
batch.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -383,35 +383,16 @@ int amd_iommu_map_page(struct domain *d,
                        unsigned int flags, unsigned int *flush_flags)
 {
     struct domain_iommu *hd = dom_iommu(d);
-    int rc;
     unsigned long pt_mfn[7];
 
     memset(pt_mfn, 0, sizeof(pt_mfn));
 
     spin_lock(&hd->arch.mapping_lock);
 
-    rc = amd_iommu_alloc_root(hd);
-    if ( rc )
+    if ( !hd->arch.root_table )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Root table alloc failed, dfn = %"PRI_dfn"\n",
-                        dfn_x(dfn));
-        domain_crash(d);
-        return rc;
-    }
-
-    /* Since HVM domain is initialized with 2 level IO page table,
-     * we might need a deeper page table for wider dfn now */
-    if ( is_hvm_domain(d) )
-    {
-        if ( update_paging_mode(d, dfn_x(dfn)) )
-        {
-            spin_unlock(&hd->arch.mapping_lock);
-            AMD_IOMMU_DEBUG("Update page mode failed dfn = %"PRI_dfn"\n",
-                            dfn_x(dfn));
-            domain_crash(d);
-            return -EFAULT;
-        }
+        return -ENODATA;
     }
 
     if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) || (pt_mfn[1] == 0) )
@@ -468,6 +449,48 @@ int amd_iommu_unmap_page(struct domain *
 
     return 0;
 }
+
+int amd_iommu_notify_dfn(struct domain *d, dfn_t dfn)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int rc;
+
+    ASSERT(is_hvm_domain(d));
+
+    /*
+     * Since HVM domain is initialized with 2 level IO page table,
+     * we might need a deeper page table for wider dfn now.
+     */
+    pcidevs_lock();
+    spin_lock(&hd->arch.mapping_lock);
+
+    rc = amd_iommu_alloc_root(hd);
+    if ( rc )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        pcidevs_unlock();
+        AMD_IOMMU_DEBUG("Root table alloc failed, dfn = %"PRI_dfn" (rc %d)\n",
+                        dfn_x(dfn), rc);
+        domain_crash(d);
+        return rc;
+    }
+
+    rc = update_paging_mode(d, dfn_x(dfn));
+    if ( rc )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        pcidevs_unlock();
+        AMD_IOMMU_DEBUG("Update paging mode failed dfn %"PRI_dfn" (rc %d)\n",
+                        dfn_x(dfn), rc);
+        domain_crash(d);
+        return rc;
+    }
+
+    spin_unlock(&hd->arch.mapping_lock);
+    pcidevs_unlock();
+
+    return 0;
+}
 
 static unsigned long flush_count(unsigned long dfn, unsigned int page_count,
                                  unsigned int order)
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -628,6 +628,7 @@ static const struct iommu_ops __initcons
     .teardown = amd_iommu_domain_destroy,
     .map_page = amd_iommu_map_page,
     .unmap_page = amd_iommu_unmap_page,
+    .notify_dfn = amd_iommu_notify_dfn,
     .iotlb_flush = amd_iommu_flush_iotlb_pages,
     .iotlb_flush_all = amd_iommu_flush_iotlb_all,
     .free_page_table = deallocate_page_table,
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -61,6 +61,7 @@ int __must_check amd_iommu_map_page(stru
 int __must_check amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
                                       unsigned int *flush_flags);
 int __must_check amd_iommu_alloc_root(struct domain_iommu *hd);
+int __must_check amd_iommu_notify_dfn(struct domain *d, dfn_t dfn);
 int amd_iommu_reserve_domain_unity_map(struct domain *domain,
                                        paddr_t phys_addr, unsigned long size,
                                        int iw, int ir);

