[Xen-devel] [PATCH 0/3] AMD IOMMU: misc small adjustments

Jan Beulich posted 3 patches 1 week ago
Only 0 patches received!

[Xen-devel] [PATCH 0/3] AMD IOMMU: misc small adjustments

Posted by Jan Beulich 1 week ago
First a small fix to halfway recent security related changes, and
then some cleanup noticed on the way as likely desirable.

1: fix off-by-one in amd_iommu_get_paging_mode() callers
2: drop redundant code
3: replace a few literal numbers

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/3] AMD/IOMMU: fix off-by-one in amd_iommu_get_paging_mode() callers

Posted by Jan Beulich 1 week ago
amd_iommu_get_paging_mode() expects a count, not a "maximum possible"
value. Prior to b4f042236ae0 dropping the reference, the use of our mis-
named "max_page" in amd_iommu_domain_init() may have lead to such a
misunderstanding.

Also replace a literal 4 by an expression tying it to a wider use
constant, just like amd_iommu_quarantine_init() does.

Fixes: ea38867831da ("x86 / iommu: set up a scratch page in the quarantine domain")
Fixes: b4f042236ae0 ("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Note: I'm not at the same time adding error checking here, despite
      amd_iommu_get_paging_mode() possibly returning one, as I think
      that's a sufficiently orthogonal aspect.

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -455,9 +455,9 @@ int amd_iommu_reserve_domain_unity_map(s
 int __init amd_iommu_quarantine_init(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
-    unsigned long max_gfn =
-        PFN_DOWN((1ul << DEFAULT_DOMAIN_ADDRESS_WIDTH) - 1);
-    unsigned int level = amd_iommu_get_paging_mode(max_gfn);
+    unsigned long end_gfn =
+        1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT);
+    unsigned int level = amd_iommu_get_paging_mode(end_gfn);
     struct amd_iommu_pte *table;
 
     if ( hd->arch.root_table )
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -259,8 +259,10 @@ static int amd_iommu_domain_init(struct
      *   physical address space we give it, but this isn't known yet so use 4
      *   unilaterally.
      */
-    hd->arch.paging_mode = is_hvm_domain(d)
-        ? 4 : amd_iommu_get_paging_mode(get_upper_mfn_bound());
+    hd->arch.paging_mode = amd_iommu_get_paging_mode(
+        is_hvm_domain(d)
+        ? 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT)
+        : get_upper_mfn_bound() + 1);
 
     return 0;
 }


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] AMD/IOMMU: fix off-by-one in amd_iommu_get_paging_mode() callers

Posted by Andrew Cooper 1 week ago
On 05/02/2020 09:42, Jan Beulich wrote:
> amd_iommu_get_paging_mode() expects a count, not a "maximum possible"
> value. Prior to b4f042236ae0 dropping the reference, the use of our mis-
> named "max_page" in amd_iommu_domain_init() may have lead to such a
> misunderstanding.
>
> Also replace a literal 4 by an expression tying it to a wider use
> constant, just like amd_iommu_quarantine_init() does.
>
> Fixes: ea38867831da ("x86 / iommu: set up a scratch page in the quarantine domain")
> Fixes: b4f042236ae0 ("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Note: I'm not at the same time adding error checking here, despite
>       amd_iommu_get_paging_mode() possibly returning one, as I think
>       that's a sufficiently orthogonal aspect.

It is entirely non-obvious what amd_iommu_get_paging_mode() takes, which
is presumably what has led to this confusion.

It also seems silly to force a call into another translation unit when
2/3 of the callers can be evaluated at compile time.

How about re-implementing amd_iommu_get_paging_mode() as a static inline
(seeing as it is just basic arithmetic), and naming its parameter in a
more useful, e.g. max_frames ?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/3] AMD/IOMMU: drop redundant code

Posted by Jan Beulich 1 week ago
The level 1 special exit path is unnecessary in iommu_pde_from_dfn() -
the subsequent code takes care of this case quite fine.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -198,12 +198,6 @@ static int iommu_pde_from_dfn(struct dom
 
     next_table_mfn = mfn_x(page_to_mfn(table));
 
-    if ( level == 1 )
-    {
-        pt_mfn[level] = next_table_mfn;
-        return 0;
-    }
-
     while ( level > 1 )
     {
         unsigned int next_level = level - 1;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] AMD/IOMMU: drop redundant code

Posted by Andrew Cooper 1 week ago
On 05/02/2020 09:42, Jan Beulich wrote:
> The level 1 special exit path is unnecessary in iommu_pde_from_dfn() -
> the subsequent code takes care of this case quite fine.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 3/3] AMD/IOMMU: replace a few literal numbers

Posted by Jan Beulich 1 week ago
Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
instances. While doing so replace two uses of memset() by initializers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: We should really honor the hats field of union
     amd_iommu_ext_features, but the specification (or at least the
     parts I did look at in the course of putting together this patch)
     is unclear about the maximum valid value in case EFRSup is clear.

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -187,7 +187,8 @@ static int iommu_pde_from_dfn(struct dom
     table = hd->arch.root_table;
     level = hd->arch.paging_mode;
 
-    BUG_ON( table == NULL || level < 1 || level > 6 );
+    BUG_ON(!table || level < IOMMU_PDE_NEXT_LEVEL_MIN ||
+           level > IOMMU_PDE_NEXT_LEVEL_MAX);
 
     /*
      * A frame number past what the current page tables can represent can't
@@ -198,7 +199,7 @@ static int iommu_pde_from_dfn(struct dom
 
     next_table_mfn = mfn_x(page_to_mfn(table));
 
-    while ( level > 1 )
+    while ( level > IOMMU_PDE_NEXT_LEVEL_MIN )
     {
         unsigned int next_level = level - 1;
         pt_mfn[level] = next_table_mfn;
@@ -274,7 +275,7 @@ static int iommu_pde_from_dfn(struct dom
         level--;
     }
 
-    /* mfn of level 1 page table */
+    /* mfn of IOMMU_PDE_NEXT_LEVEL_MIN page table */
     pt_mfn[level] = next_table_mfn;
     return 0;
 }
@@ -284,9 +285,7 @@ int amd_iommu_map_page(struct domain *d,
 {
     struct domain_iommu *hd = dom_iommu(d);
     int rc;
-    unsigned long pt_mfn[7];
-
-    memset(pt_mfn, 0, sizeof(pt_mfn));
+    unsigned long pt_mfn[IOMMU_PDE_NEXT_LEVEL_MAX + 1] = {};
 
     spin_lock(&hd->arch.mapping_lock);
 
@@ -300,7 +299,8 @@ int amd_iommu_map_page(struct domain *d,
         return rc;
     }
 
-    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) ||
+         !pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN] )
     {
         spin_unlock(&hd->arch.mapping_lock);
         AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
@@ -310,9 +310,11 @@ int amd_iommu_map_page(struct domain *d,
     }
 
     /* Install 4k mapping */
-    *flush_flags |= set_iommu_pte_present(pt_mfn[1], dfn_x(dfn), mfn_x(mfn),
-                                          1, (flags & IOMMUF_writable),
-                                          (flags & IOMMUF_readable));
+    *flush_flags |= set_iommu_pte_present(pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN],
+                                          dfn_x(dfn), mfn_x(mfn),
+                                          IOMMU_PDE_NEXT_LEVEL_MIN,
+                                          flags & IOMMUF_writable,
+                                          flags & IOMMUF_readable);
 
     spin_unlock(&hd->arch.mapping_lock);
 
@@ -322,11 +324,9 @@ int amd_iommu_map_page(struct domain *d,
 int amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
                          unsigned int *flush_flags)
 {
-    unsigned long pt_mfn[7];
+    unsigned long pt_mfn[IOMMU_PDE_NEXT_LEVEL_MAX + 1] = {};
     struct domain_iommu *hd = dom_iommu(d);
 
-    memset(pt_mfn, 0, sizeof(pt_mfn));
-
     spin_lock(&hd->arch.mapping_lock);
 
     if ( !hd->arch.root_table )
@@ -344,10 +344,12 @@ int amd_iommu_unmap_page(struct domain *
         return -EFAULT;
     }
 
-    if ( pt_mfn[1] )
+    if ( pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN] )
     {
         /* Mark PTE as 'page not present'. */
-        *flush_flags |= clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
+        *flush_flags |=
+            clear_iommu_pte_present(pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN],
+                                    dfn_x(dfn));
     }
 
     spin_unlock(&hd->arch.mapping_lock);
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -233,14 +233,14 @@ static int __must_check allocate_domain_
 
 int amd_iommu_get_paging_mode(unsigned long entries)
 {
-    int level = 1;
+    int level = IOMMU_PDE_NEXT_LEVEL_MIN;
 
     BUG_ON( !entries );
 
     while ( entries > PTE_PER_TABLE_SIZE )
     {
         entries = PTE_PER_TABLE_ALIGN(entries) >> PTE_PER_TABLE_SHIFT;
-        if ( ++level > 6 )
+        if ( ++level > IOMMU_PDE_NEXT_LEVEL_MAX )
             return -ENOMEM;
     }
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -465,6 +465,9 @@ union amd_iommu_x2apic_control {
 #define IOMMU_PAGE_TABLE_U32_PER_ENTRY	(IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
 #define IOMMU_PAGE_TABLE_ALIGNMENT	4096
 
+#define IOMMU_PDE_NEXT_LEVEL_MIN	1
+#define IOMMU_PDE_NEXT_LEVEL_MAX	6
+
 struct amd_iommu_pte {
     uint64_t pr:1;
     uint64_t ignored0:4;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] AMD/IOMMU: replace a few literal numbers

Posted by Andrew Cooper 1 week ago
On 05/02/2020 09:43, Jan Beulich wrote:
> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
> instances. While doing so replace two uses of memset() by initializers.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
definitely bogus, and in all cases, a literal 1 is better, because that
is how we describe pagetable levels.

Something to replace literal 6/7 probably is ok, but doesn't want to be
done like this.

The majority of the problems here as caused by iommu_pde_from_dfn()'s
silly ABI.  The pt_mfn[] array is problematic (because it is used as a
1-based array, not 0-based) and useless because both callers only want
the 4k-equivelent mfn.  Fixing the ABI gets rid of quite a lot of wasted
stack space, every use of '1', and every upper bound other than the bug
on and amd_iommu_get_paging_mode().

> ---
> TBD: We should really honor the hats field of union
>      amd_iommu_ext_features, but the specification (or at least the
>      parts I did look at in the course of putting together this patch)
>      is unclear about the maximum valid value in case EFRSup is clear.

It is available from PCI config space (Misc0 register, cap+0x10) even on
first gen IOMMUs, and the IVRS table in Type 10.

I'm honestly not sure why the information was duplicated into EFR, other
than perhaps for providing the information in a more useful format.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] AMD/IOMMU: replace a few literal numbers

Posted by Jan Beulich 7 hours ago
On 10.02.2020 15:28, Andrew Cooper wrote:
> On 05/02/2020 09:43, Jan Beulich wrote:
>> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
>> instances. While doing so replace two uses of memset() by initializers.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
> definitely bogus, and in all cases, a literal 1 is better, because that
> is how we describe pagetable levels.

I disagree. The device table entry's mode field is bounded by 1
(min) and 6 (max) for the legitimate values to put there.

> Something to replace literal 6/7 probably is ok, but doesn't want to be
> done like this.
> 
> The majority of the problems here as caused by iommu_pde_from_dfn()'s
> silly ABI.  The pt_mfn[] array is problematic (because it is used as a
> 1-based array, not 0-based) and useless because both callers only want
> the 4k-equivelent mfn.  Fixing the ABI gets rid of quite a lot of wasted
> stack space, every use of '1', and every upper bound other than the bug
> on and amd_iommu_get_paging_mode().

I didn't mean to alter that function's behavior, at the very least
not until being certain there wasn't a reason it was coded with this
array approach. IOW the alternative to going with this patch
(subject to corrections of course) is for me to drop it altogether,
keeping the hard-coded numbers in place. Just let me know.

>> ---
>> TBD: We should really honor the hats field of union
>>      amd_iommu_ext_features, but the specification (or at least the
>>      parts I did look at in the course of putting together this patch)
>>      is unclear about the maximum valid value in case EFRSup is clear.
> 
> It is available from PCI config space (Misc0 register, cap+0x10) even on
> first gen IOMMUs,

I don't think any of the address size fields there matches what
HATS is about (limiting of the values valid to put in a DTE's
mode field). In fact I'm having some difficulty bringing the
two in (sensible) sync.

> and the IVRS table in Type 10.

Which may in turn be absent, i.e. the question of what to use as
a default merely gets shifted.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] AMD/IOMMU: replace a few literal numbers

Posted by Andrew Cooper 1 hour ago
On 17/02/2020 13:09, Jan Beulich wrote:
> On 10.02.2020 15:28, Andrew Cooper wrote:
>> On 05/02/2020 09:43, Jan Beulich wrote:
>>> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
>>> instances. While doing so replace two uses of memset() by initializers.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
>> definitely bogus, and in all cases, a literal 1 is better, because that
>> is how we describe pagetable levels.
> I disagree.

A pagetable walking function which does:

while ( level > 1 )
{
    ...
    level--;
}

is far clearer and easier to follow than hiding 1 behind a constant
which isn't obviously 1.    Something like LEVEL_4K would at least be
something that makes sense in context, but a literal one less verbose.

>  The device table entry's mode field is bounded by 1
> (min) and 6 (max) for the legitimate values to put there.

If by 1, you mean 0, then yes.  Coping properly with a mode of 0 looks
to be easier than putting in an arbitrary restriction.

OTOH, you intended to restrict to just values we expect to find in a Xen
setup, then the answers are 3 and 4 only.  (The "correctness" of this
function depends on only running on Xen-written tables.  It doesn't
actually read the next-level field out of the PTE, and assumes that it
is a standard pagetable hierarchy.  Things will go wrong if it
encounters a superpage, or a next-level-7 entry.)

>
>> Something to replace literal 6/7 probably is ok, but doesn't want to be
>> done like this.
>>
>> The majority of the problems here as caused by iommu_pde_from_dfn()'s
>> silly ABI.  The pt_mfn[] array is problematic (because it is used as a
>> 1-based array, not 0-based) and useless because both callers only want
>> the 4k-equivelent mfn.  Fixing the ABI gets rid of quite a lot of wasted
>> stack space, every use of '1', and every upper bound other than the bug
>> on and amd_iommu_get_paging_mode().
> I didn't mean to alter that function's behavior, at the very least
> not until being certain there wasn't a reason it was coded with this
> array approach. IOW the alternative to going with this patch
> (subject to corrections of course) is for me to drop it altogether,
> keeping the hard-coded numbers in place. Just let me know.

If you don't want to change the API, then I'll put it on my todo list.

As previously expressed, this patch on its own is not an improvement IMO.

>>> ---
>>> TBD: We should really honor the hats field of union
>>>      amd_iommu_ext_features, but the specification (or at least the
>>>      parts I did look at in the course of putting together this patch)
>>>      is unclear about the maximum valid value in case EFRSup is clear.
>> It is available from PCI config space (Misc0 register, cap+0x10) even on
>> first gen IOMMUs,
> I don't think any of the address size fields there matches what
> HATS is about (limiting of the values valid to put in a DTE's
> mode field). In fact I'm having some difficulty bringing the
> two in (sensible) sync.

It will confirm whether 4-levels is available or not, but TBH, we know
that anyway by virtue of being 64bit.

Higher levels really don't matter because we don't support using them. 
We're we to support using them (and I do have one usecase in mind), it
would be entirely reasonable to restrict usage to systems which had EFR.

>
>> and the IVRS table in Type 10.
> Which may in turn be absent, i.e. the question of what to use as
> a default merely gets shifted.

One of Type 10 or 11 is mandatory for each IOMMU in the system.  One way
or another, the information is present.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel