[v1] x86/PVH: Dom0 building adjustments

[PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by unintended (for this purpose) code paths. At least in
the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
this is too strict. Generally short-circuit requests establishing the
same kind of mapping that's already in place.

Further permit "access" to differ in the "executable" attribute. While
ideally only ROM regions would get mapped with X set, getting there is
quite a bit of work. Therefore, as a temporary measure, permit X to
vary. For Dom0 the more permissive of the types will be used, while for
DomU it'll be the more restrictive one.

While there, also add a log message to the other domain_crash()
invocation that did prevent PVH Dom0 from coming up after the XSA-378
changes.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
         if ( p2m_is_special(ot) )
         {
             /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
-            domain_crash(d);
             p2m_unlock(p2m);
-            
+            printk(XENLOG_G_ERR
+                   "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n",
+                   d, gfn_x(gfn) + i,
+                   mfn_x(omfn), ot, a,
+                   mfn_x(mfn) + i, t, p2m->default_access);
+            domain_crash(d);
             return -EPERM;
         }
         else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) )
@@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do
     }
     if ( p2m_is_special(ot) )
     {
-        gfn_unlock(p2m, gfn, order);
-        domain_crash(d);
-        return -EPERM;
+        bool done = false, bad = true;
+
+        /* Special-case (almost) identical mappings. */
+        if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot )
+        {
+            /*
+             * For MMIO allow X to differ in the requests (to cover for
+             * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in
+             * the way they specify "access"). For the hardware domain put (or
+             * leave) in place the more permissive of the two possibilities,
+             * while for DomU-s go with the more restrictive variant.
+             */
+            if ( gfn_p2mt == p2m_mmio_direct &&
+                 access <= p2m_access_rwx &&
+                 (access ^ a) == p2m_access_x )
+            {
+                if ( is_hardware_domain(d) )
+                    access |= p2m_access_x;
+                else
+                    access &= ~p2m_access_x;
+                bad = access == p2m_access_n;
+            }
+
+            if ( access == a )
+                done = true;
+        }
+
+        if ( done )
+        {
+            gfn_unlock(p2m, gfn, order);
+            return 0;
+        }
+
+        if ( bad )
+        {
+            gfn_unlock(p2m, gfn, order);
+            printk(XENLOG_G_ERR
+                   "%pd: GFN %lx (%lx:%u:%u:%u) -> (%lx:%u:%u:%u) not permitted\n",
+                   d, gfn_l,
+                   mfn_x(omfn), cur_order, ot, a,
+                   mfn_x(mfn), order, gfn_p2mt, access);
+            domain_crash(d);
+            return -EPERM;
+        }
     }
     else if ( p2m_is_ram(ot) )
     {

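For readers following the access handling in set_typed_p2m_entry(), the new p2m_is_special() branch can be modelled on its own. This is a simplified, standalone approximation, not the patch itself. access_t, check_special() and the MAP_* outcomes are hypothetical names. The model assumes the n/r/w/x values of p2m_access_t form a bitmask (R=1, W=2, X=4), which is what the patch's "<= p2m_access_rwx" and XOR tests rely on. The `same` argument stands in for "mfn_eq(mfn, omfn) && gfn_p2mt == ot".

```c
#include <stdbool.h>

/* Hypothetical stand-in for p2m_access_t, assuming a plain R=1/W=2/X=4 bitmask. */
typedef enum {
    acc_n = 0, acc_r = 1, acc_w = 2, acc_rw = 3,
    acc_x = 4, acc_rx = 5, acc_wx = 6, acc_rwx = 7,
} access_t;

enum outcome { MAP_DONE, MAP_REJECT, MAP_CONTINUE };

/*
 * Simplified model of the new p2m_is_special() branch:
 *   same  - existing MFN and type equal the requested ones
 *   mmio  - requested type is p2m_mmio_direct
 *   hwdom - the domain is the hardware domain
 *   old   - access of the existing entry
 * *access is the requested access and may be adjusted.
 */
static enum outcome check_special(bool same, bool mmio, bool hwdom,
                                  access_t old, access_t *access)
{
    bool done = false, bad = true;

    if ( same )
    {
        /* Tolerate requests differing from the existing entry only in X. */
        if ( mmio && *access <= acc_rwx && (*access ^ old) == acc_x )
        {
            if ( hwdom )
                *access |= acc_x;   /* keep the more permissive variant */
            else
                *access &= ~acc_x;  /* keep the more restrictive variant */
            bad = *access == acc_n;
        }

        if ( *access == old )
            done = true;            /* nothing to change */
    }

    if ( done )
        return MAP_DONE;            /* short-circuit, return 0 */
    if ( bad )
        return MAP_REJECT;          /* log, domain_crash(), -EPERM */
    return MAP_CONTINUE;            /* go on and rewrite the entry */
}
```

With an existing rwx MMIO entry, an rw request from the hardware domain leaves the entry alone (MAP_DONE), while the same request from a DomU falls through and rewrites the entry as rw (MAP_CONTINUE). Any other access difference, or a differing MFN or type, is rejected.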
Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 30.08.2021 15:02, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by unintended (for this purpose) code paths. At least in
> the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
> this is too strict. Generally short-circuit requests establishing the
> same kind of mapping that's already in place.
> 
> Further permit "access" to differ in the "executable" attribute. While
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work. Therefore, as a temporary measure, permit X to
> vary. For Dom0 the more permissive of the types will be used, while for
> DomU it'll be the more restrictive one.
> 
> While there, also add a log message to the other domain_crash()
> invocation that did prevent PVH Dom0 from coming up after the XSA-378
> changes.
> 
> Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Btw, I had meant to have this post-commit-message remark here:

TBD: This could be generalized to all of R, W, and X. Dealing with just X
     is merely the minimum I found to be immediately necessary.
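If the TBD were taken up, merging differences in R, W and X alike might look roughly as follows. This is only a hypothetical sketch of that idea, not code from the series. merge_access() and the ACC_* constants are made-up names, and the bitmask layout of p2m_access_t is again an assumption.

```c
#include <stdbool.h>

/* Assumed bitmask layout of the plain permission values of p2m_access_t. */
enum { ACC_N = 0, ACC_R = 1, ACC_W = 2, ACC_X = 4, ACC_RWX = 7 };

/*
 * Hypothetical generalisation of the X-only special case: the hardware
 * domain accumulates permissions (union), other domains keep only the
 * common subset (intersection). Values above RWX (e.g. rx2rw) are not
 * plain permission sets and are refused. Returns false if no merge applies
 * or the result would grant no access at all.
 */
static bool merge_access(unsigned int old, unsigned int req, bool hwdom,
                         unsigned int *out)
{
    if ( old > ACC_RWX || req > ACC_RWX )
        return false;

    *out = hwdom ? (old | req) : (old & req);
    return *out != ACC_N;
}
```

Treating an empty result as a refusal mirrors the "bad = access == p2m_access_n" check in the patch.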

Jan

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Andrew Cooper 4 years, 5 months ago

On 30/08/2021 14:02, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by unintended (for this purpose) code paths.

Drop the brackets.

> At least in
> the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
> this is too strict. Generally short-circuit requests establishing the
> same kind of mapping that's already in place.
>
> Further permit "access" to differ in the "executable" attribute. While
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work. Therefore, as a temporary measure, permit X to
> vary. For Dom0 the more permissive of the types will be used, while for
> DomU it'll be the more restrictive one.

Split behaviour between dom0 and domU based on types alone cannot
possibly be correct.

DomUs need to execute ROMs too, and this looks like it will malfunction if
a ROM ends up in the region that HVMLoader relocated RAM from.

As this is a temporary bodge emergency bugfix, don't try to be clever -
just take the latest access.

> While there, also add a log message to the other domain_crash()
> invocation that did prevent PVH Dom0 from coming up after the XSA-378
> changes.
>
> Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
>          if ( p2m_is_special(ot) )
>          {
>              /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
> -            domain_crash(d);
>              p2m_unlock(p2m);
> -            
> +            printk(XENLOG_G_ERR
> +                   "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n",

type and access need to be rendered in hex, or you need to use 0x
prefixes to distinguish the two bases.

Also, use commas rather than colons.  Visually, this is ambiguous with
PCI BDFs, and commas match tuple notation in most programming languages
which is the construct you're trying to represent.

Same below.

> +                   d, gfn_x(gfn) + i,
> +                   mfn_x(omfn), ot, a,
> +                   mfn_x(mfn) + i, t, p2m->default_access);
> +            domain_crash(d);
>              return -EPERM;
>          }
>          else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) )
> @@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do
>      }
>      if ( p2m_is_special(ot) )
>      {
> -        gfn_unlock(p2m, gfn, order);
> -        domain_crash(d);
> -        return -EPERM;
> +        bool done = false, bad = true;
> +
> +        /* Special-case (almost) identical mappings. */
> +        if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot )
> +        {
> +            /*
> +             * For MMIO allow X to differ in the requests (to cover for
> +             * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in
> +             * the way they specify "access"). For the hardware domain put (or
> +             * leave) in place the more permissive of the two possibilities,
> +             * while for DomU-s go with the more restrictive variant.

This comment needs to identify clearly that it is a temporary bodge
intended to be removed.

~Andrew

> +             */
> +            if ( gfn_p2mt == p2m_mmio_direct &&
> +                 access <= p2m_access_rwx &&
> +                 (access ^ a) == p2m_access_x )
> +            {
> +                if ( is_hardware_domain(d) )
> +                    access |= p2m_access_x;
> +                else
> +                    access &= ~p2m_access_x;
> +                bad = access == p2m_access_n;
> +            }
> +
> +            if ( access == a )
> +                done = true;
> +        }
> +
> +        if ( done )
> +        {
> +            gfn_unlock(p2m, gfn, order);
> +            return 0;
> +        }
> +
> +        if ( bad )
> +        {
> +            gfn_unlock(p2m, gfn, order);
> +            printk(XENLOG_G_ERR
> +                   "%pd: GFN %lx (%lx:%u:%u:%u) -> (%lx:%u:%u:%u) not permitted\n",
> +                   d, gfn_l,
> +                   mfn_x(omfn), cur_order, ot, a,
> +                   mfn_x(mfn), order, gfn_p2mt, access);
> +            domain_crash(d);
> +            return -EPERM;
> +        }
>      }
>      else if ( p2m_is_ram(ot) )
>      {
>

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 31.08.2021 15:16, Andrew Cooper wrote:
> On 30/08/2021 14:02, Jan Beulich wrote:
>> Further permit "access" to differ in the "executable" attribute. While
>> ideally only ROM regions would get mapped with X set, getting there is
>> quite a bit of work. Therefore, as a temporary measure, permit X to
>> vary. For Dom0 the more permissive of the types will be used, while for
>> DomU it'll be the more restrictive one.
> 
> Split behaviour between dom0 and domU based on types alone cannot
> possibly be correct.

True, but what do you do.

> DomUs need to execute ROMs too, and this looks like it will malfunction if
> a ROM ends up in the region that HVMLoader relocated RAM from.
> 
> As this is a temporary bodge emergency bugfix, don't try to be clever -
> just take the latest access.

And how do we know that that's what is going to work? We should
strictly accumulate for Dom0. And what we do for DomU is moot for
the moment, until PCI passthrough becomes a thing for PVH. Hence
I've opted to be restrictive there - I'd rather see things break
(and get adjusted) when this future work actually gets carried
out, than leave things permissive for no-one to notice that it's
too permissive, leading to an XSA.

>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
>>          if ( p2m_is_special(ot) )
>>          {
>>              /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
>> -            domain_crash(d);
>>              p2m_unlock(p2m);
>> -            
>> +            printk(XENLOG_G_ERR
>> +                   "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n",
> 
> type and access need to be rendered in hex, or you need to use 0x
> prefixes to distinguish the two bases.

Will use %#lx then.

> Also, use commas rather than colons.  Visually, this is ambiguous with
> PCI BDFs, and commas match tuple notation in most programming languages
> which is the construct you're trying to represent.
> 
> Same below.

Sure, will do.
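Here is a minimal userspace sketch of the message layout agreed on here. Frame numbers are printed with %#lx so the 0x prefix sets them apart from the decimal type and access values, and tuple members are separated by commas. snprintf() stands in for Xen's printk(), so the Xen-specific "%pd" domain prefix is left out. format_reject() is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the revised message: frame numbers in hex
 * with a 0x prefix, type and access in decimal, commas between tuple members.
 */
static int format_reject(char *buf, size_t len, unsigned long gfn,
                         unsigned long omfn, unsigned int ot, unsigned int oa,
                         unsigned long mfn, unsigned int t, unsigned int a)
{
    return snprintf(buf, len,
                    "GFN %#lx (%#lx,%u,%u) -> (%#lx,%u,%u) not permitted",
                    gfn, omfn, ot, oa, mfn, t, a);
}
```

One caveat with the standard C library used here: "%#lx" prints a zero value as plain "0", without the prefix.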

>> @@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do
>>      }
>>      if ( p2m_is_special(ot) )
>>      {
>> -        gfn_unlock(p2m, gfn, order);
>> -        domain_crash(d);
>> -        return -EPERM;
>> +        bool done = false, bad = true;
>> +
>> +        /* Special-case (almost) identical mappings. */
>> +        if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot )
>> +        {
>> +            /*
>> +             * For MMIO allow X to differ in the requests (to cover for
>> +             * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in
>> +             * the way they specify "access"). For the hardware domain put (or
>> +             * leave) in place the more permissive of the two possibilities,
>> +             * while for DomU-s go with the more restrictive variant.
> 
> This comment needs to identify clearly that it is a temporary bodge
> intended to be removed.

Okay.

Jan

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Andrew Cooper 4 years, 5 months ago

On 31/08/2021 14:26, Jan Beulich wrote:
> On 31.08.2021 15:16, Andrew Cooper wrote:
>> On 30/08/2021 14:02, Jan Beulich wrote:
>>> Further permit "access" to differ in the "executable" attribute. While
>>> ideally only ROM regions would get mapped with X set, getting there is
>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>> vary. For Dom0 the more permissive of the types will be used, while for
>>> DomU it'll be the more restrictive one.
>> Split behaviour between dom0 and domU based on types alone cannot
>> possibly be correct.
> True, but what do you do.
>
>> DomUs need to execute ROMs too, and this looks like it will malfunction if
>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>
>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>> just take the latest access.
> And how do we know that that's what is going to work?

Because it's the pre-existing behaviour.

>  We should
> strictly accumulate for Dom0. And what we do for DomU is moot for
> the moment, until PCI passthrough becomes a thing for PVH. Hence
> I've opted to be restrictive there - I'd rather see things break
> (and get adjusted) when this future work actually gets carried
> out, than leave things permissive for no-one to notice that it's
> too permissive, leading to an XSA.

Restricting execute permissions is something unique to virt.  It doesn't
exist in a non-virtualised system, as I and D side reads are
indistinguishable outside of the core.

Furthermore, it is inexpressible on some systems/configurations.

Introspection is the only technology which should be restricting execute
permissions in the p2m, and only when it takes responsibility for
dealing with the fallout.

~Andrew

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 31.08.2021 17:25, Andrew Cooper wrote:
> On 31/08/2021 14:26, Jan Beulich wrote:
>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>> Further permit "access" to differ in the "executable" attribute. While
>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>> DomU it'll be the more restrictive one.
>>> Split behaviour between dom0 and domU based on types alone cannot
>>> possibly be correct.
>> True, but what do you do.
>>
>>> DomUs need to execute ROMs too, and this looks like it will malfunction if
>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>
>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>> just take the latest access.
>> And how do we know that that's what is going to work?
> 
> Because it's the pre-existing behaviour.

Valid point. But for the DomU case there simply has not been any
pre-existing behavior. Hence my desire to be restrictive initially
there.

>>  We should
>> strictly accumulate for Dom0. And what we do for DomU is moot for
>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>> I've opted to be restrictive there - I'd rather see things break
>> (and get adjusted) when this future work actually gets carried
>> out, than leave things permissive for no-one to notice that it's
>> too permissive, leading to an XSA.
> 
> Restricting execute permissions is something unique to virt.  It doesn't
> exist in a non-virtualised system, as I and D side reads are
> indistinguishable outside of the core.
> 
> Furthermore, it is inexpressible on some systems/configurations.
> 
> Introspection is the only technology which should be restricting execute
> permissions in the p2m, and only when it takes responsibility for
> dealing with the fallout.

IOW are you saying that the calls to set_identity_p2m_entry()
(pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's
what's getting in the way here.

Plus, as a side note, then we don't even have e.g. IOMMUF_executable.

Jan

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Andrew Cooper 4 years, 5 months ago

On 31/08/2021 16:38, Jan Beulich wrote:
> On 31.08.2021 17:25, Andrew Cooper wrote:
>> On 31/08/2021 14:26, Jan Beulich wrote:
>>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>>> Further permit "access" to differ in the "executable" attribute. While
>>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>>> DomU it'll be the more restrictive one.
>>>> Split behaviour between dom0 and domU based on types alone cannot
>>>> possibly be correct.
>>> True, but what do you do.
>>>
>>>> DomUs need to execute ROMs too, and this looks like it will malfunction if
>>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>>
>>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>>> just take the latest access.
>>> And how do we know that that's what is going to work?
>> Because it's the pre-existing behaviour.
> Valid point. But for the DomU case there simply has not been any
> pre-existing behavior. Hence my desire to be restrictive initially
> there.

But you're conflating a feature (under question anyway, because I gave
you an example where I expect this will collide in a regular domU
already), with an emergency bugfix to unbreak staging caused by an
unexpected interaction in a security hotfix.

At an absolute minimum, this patch needs splitting in two to separate
the bugfix from the proposed feature.

>>>  We should
>>> strictly accumulate for Dom0. And what we do for DomU is moot for
>>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>>> I've opted to be restrictive there - I'd rather see things break
>>> (and get adjusted) when this future work actually gets carried
>>> out, than leave things permissive for no-one to notice that it's
>>> too permissive, leading to an XSA.
>> Restricting execute permissions is something unique to virt.  It doesn't
>> exist in a non-virtualised system, as I and D side reads are
>> indistinguishable outside of the core.
>>
>> Furthermore, it is inexpressible on some systems/configurations.
>>
>> Introspection is the only technology which should be restricting execute
>> permissions in the p2m, and only when it takes responsibility for
>> dealing with the fallout.
> IOW are you saying that the calls to set_identity_p2m_entry()
> (pre-dating XSA-378) were wrong to use p2m_access_rw?

Yes.

>  Because that's
> what's getting in the way here.

On a real machine, you really can write some executable code into an
E820 reserved region and jump to it.  You can also execute code from
real BARs if you happen to know that they are prefetchable (or you're a
glutton for UC reads...)

And there is the WPBT ACPI table which exists specifically to let
firmware inject drivers/applications into a Windows environment, and may
come out of the SPI ROM in the first place.

Is it sensible to execute an E820 reserved region, or unmarked BAR? 
Probably not.

Should it work, because that's how real hardware behaves?  Absolutely.

Any restrictions beyond that want handling by some kind of introspection
agent which has a policy of what to do with legal-but-dodgy-looking actions.

> Plus, as a side note, then we don't even have e.g. IOMMUF_executable.

Just as I vs D side reads are indistinguishable outside of the CPU core,
the same is in principle true for PCI devices which execute code (e.g.
GPU shaders).  Reads on the bus are just reads.

That said, the latest SIOV spec does appear to include the ER (Execution
Request) bit for use with PASID/Shared-Virtual-Memory, which interacts
with the EPT-X/ia32-NX bits, and goes as far as having SMEP/SMAP bits in
the IOMMU configuration.  I'm not sure if this is in released hardware
yet, but it's clearly on the horizon.  I can't spot any execute related
controls in the AMD IOMMU spec, although it does have user/supervisor
for PASID/SVM.

~Andrew

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 01.09.2021 14:47, Andrew Cooper wrote:
> On 31/08/2021 16:38, Jan Beulich wrote:
>> On 31.08.2021 17:25, Andrew Cooper wrote:
>>> On 31/08/2021 14:26, Jan Beulich wrote:
>>>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>>>> Further permit "access" to differ in the "executable" attribute. While
>>>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>>>> DomU it'll be the more restrictive one.
>>>>> Split behaviour between dom0 and domU based on types alone cannot
>>>>> possibly be correct.
>>>> True, but what do you do.
>>>>
>>>>> DomUs need to execute ROMs too, and this looks like it will malfunction if
>>>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>>>
>>>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>>>> just take the latest access.
>>>> And how do we know that that's what is going to work?
>>> Because it's the pre-existing behaviour.
>> Valid point. But for the DomU case there simply has not been any
>> pre-existing behavior. Hence my desire to be restrictive initially
>> there.
> 
> But you're conflating a feature (under question anyway, because I gave
> you an example where I expect this will collide in a regular domU
> already),

I don't think your example fits: hvmloader moving RAM will first
convert the p2m slot to non-present. Then a ROM page can get mapped
there quite fine. A direct transition (without going through n/p)
would not work independent of the change here: The MFNs would
differ, as would the p2m types.
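The argument about hvmloader can be illustrated with a heavily simplified model of a single p2m slot. It is a hypothetical sketch, not Xen code. struct slot, enum ptype and guard_allows() are invented names. Only direct MMIO is treated as "special" here, whereas p2m_is_special() also covers grant and foreign mappings.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified p2m slot: its type and backing frame. */
enum ptype { PT_INVALID, PT_RAM, PT_MMIO_DIRECT };

struct slot {
    enum ptype type;
    unsigned long mfn;
};

/* In this model only direct MMIO counts as a guarded ("special") type. */
static bool is_special(enum ptype t)
{
    return t == PT_MMIO_DIRECT;
}

/*
 * Does a mapping request for this slot get past the special-entry guard?
 * Non-special slots (e.g. ones made non-present first) are not subject to
 * this particular check; special ones only accept identical MFN and type.
 */
static bool guard_allows(const struct slot *cur, enum ptype t,
                         unsigned long mfn)
{
    if ( !is_special(cur->type) )
        return true;
    return cur->type == t && cur->mfn == mfn;
}
```

A slot that hvmloader first made non-present never reaches the guard. A direct switch of a special slot to a different MFN or type is refused with or without the X relaxation.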

> with an emergency bugfix to unbreak staging caused by an
> unexpected interaction in a security hotfix.
> 
> At an absolute minimum, this patch needs splitting in two to separate
> the bugfix from the proposed feature.

Well, okay, I will split the patch, despite not being convinced this
will do us any good - we'd backport just the part you consider a bug
fix, but not the part you deem a feature (and which I consider part
of the bug fix).

>>>>  We should
>>>> strictly accumulate for Dom0. And what we do for DomU is moot for
>>>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>>>> I've opted to be restrictive there - I'd rather see things break
>>>> (and get adjusted) when this future work actually gets carried
>>>> out, than leave things permissive for no-one to notice that it's
>>>> too permissive, leading to an XSA.

Actually I think I was missing an important aspect here: The code in
question gets used not only for PVH, but also for HVM, where pass-
through is a thing. Hence I'll restrict the "feature" part to Dom0
for now.
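Restricting the X tolerance to the hardware domain might boil down to a condition along these lines. This is a hypothetical sketch with made-up names (resolve_access(), ACC_X, ACC_RWX) and the same assumed bitmask layout of p2m_access_t. A return value of -1 stands for the log/domain_crash()/-EPERM path.

```c
#include <stdbool.h>

/* Assumed bitmask values of p2m_access_x and p2m_access_rwx. */
enum { ACC_X = 4, ACC_RWX = 7 };

/*
 * Hypothetical narrowed check for an existing special entry with the same
 * MFN and type: identical requests are accepted for everyone, an X-only
 * difference on direct MMIO is tolerated for the hardware domain alone
 * (keeping the more permissive access), anything else is refused.
 * Returns the access to use, or -1 if the request must be rejected.
 */
static int resolve_access(bool mmio, bool hwdom, int old, int req)
{
    if ( req == old )
        return old;

    if ( mmio && hwdom && req <= ACC_RWX && (req ^ old) == ACC_X )
        return req | ACC_X;

    return -1;
}
```

HVM guests with passthrough then see exactly the post-XSA-378 behaviour for differing access values, so any needed relaxation would have to be argued for separately.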

>>> Restricting execute permissions is something unique to virt.  It doesn't
>>> exist in a non-virtualised system, as I and D side reads are
>>> indistinguishable outside of the core.
>>>
>>> Furthermore, it is inexpressible on some systems/configurations.
>>>
>>> Introspection is the only technology which should be restricting execute
>>> permissions in the p2m, and only when it takes responsibility for
>>> dealing with the fallout.
>> IOW are you saying that the calls to set_identity_p2m_entry()
>> (pre-dating XSA-378) were wrong to use p2m_access_rw?
> 
> Yes.
> 
>>  Because that's
>> what's getting in the way here.
> 
> On a real machine, you really can write some executable code into an
> E820 reserved region and jump to it.  You can also execute code from
> real BARs if you happen to know that they are prefetchable (or you're a
> glutton for UC reads...)
> 
> And there is the WPBT ACPI table which exists specifically to let
> firmware inject drivers/applications into a Windows environment, and may
> come out of the SPI ROM in the first place.
> 
> 
> Is it sensible to execute an E820 reserved region, or unmarked BAR? 
> Probably not.
> 
> Should it work, because that's how real hardware behaves?  Absolutely.
> 
> Any restrictions beyond that want handling by some kind of introspection
> agent which has a policy of what to do with legal-but-dodgy-looking actions.

IOW you suggest we remove p2m_access_t parameters from various functions,
going with just default access? Of course, as pointed out in another
reply, we'll need to split p2m_mmio_direct into multiple types then, at
the very least to honor the potential r/o restriction of AMD IOMMU unity
mapped regions. (FAOD all of this isn't a short term plan anyway, at least
afaic.)

Jan

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Andrew Cooper 4 years, 5 months ago

On 01/09/2021 14:08, Jan Beulich wrote:
>>>> Restricting execute permissions is something unique to virt.  It doesn't
>>>> exist in a non-virtualised system, as I and D side reads are
>>>> indistinguishable outside of the core.
>>>>
>>>> Furthermore, it is inexpressible on some systems/configurations.
>>>>
>>>> Introspection is the only technology which should be restricting execute
>>>> permissions in the p2m, and only when it takes responsibility for
>>>> dealing with the fallout.
>>> IOW are you saying that the calls to set_identity_p2m_entry()
>>> (pre-dating XSA-378) were wrong to use p2m_access_rw?
>> Yes.
>>
>>>  Because that's
>>> what's getting in the way here.
>> On a real machine, you really can write some executable code into an
>> E820 reserved region and jump to it.  You can also execute code from
>> real BARs if you happen to know that they are prefetchable (or you're a
>> glutton for UC reads...)
>>
>> And there is the WPBT ACPI table which exists specifically to let
>> firmware inject drivers/applications into a Windows environment, and may
>> come out of the SPI ROM in the first place.
>>
>>
>> Is it sensible to execute an E820 reserved region, or unmarked BAR? 
>> Probably not.
>>
>> Should it work, because that's how real hardware behaves?  Absolutely.
>>
>> Any restrictions beyond that want handling by some kind of introspection
>> agent which has a policy of what to do with legal-but-dodgy-looking actions.
> IOW you suggest we remove p2m_access_t parameters from various functions,
> going with just default access?

p2m_access_t was very obviously a bodge when introduced, and I doubt it
would pass today's review standards.

It is definitely a mis-design, given its ill-specified and overlapping
semantics with respect to the p2m type.

>  Of course, as pointed out in another
> reply, we'll need to split p2m_mmio_direct into multiple types then, at
> the very least to honor the potential r/o restriction of AMD IOMMU unity
> mapped regions. (FAOD all of this isn't a short term plan anyway, at least
> afaic.)

I don't think that will be necessary.

IVMDs are almost non-existent, and given how many other areas of the AMD
IOMMU spec are totally unused, I wouldn't be surprised if r/o unity
mappings were in that category too.  There's no obvious use case for r/o,
as anything critical enough in the platform to have an IVMD in the first
place will also be non-trivial enough to require bidirectional
communication somehow.

The unity mapping only says "this device requires read-only access".  It
doesn't say "this must be mapped read-only", and it is legitimate to map
a r/o unity mapping as r/w.
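The two readings of the IVMD permission bits discussed here can be contrasted in a small helper. It assumes the IVMD flag layout from the AMD IOMMU specification (unity = bit 0, IR = bit 1, IW = bit 2). unity_access(), the ACC_* values and the strict/lax switch are hypothetical and do not correspond to Xen's actual IVMD parsing.

```c
#include <stdbool.h>

/* Assumed IVMD flag bits (AMD IOMMU specification, IVMD block). */
#define IVMD_UNITY (1u << 0)
#define IVMD_IR    (1u << 1)
#define IVMD_IW    (1u << 2)

/* Assumed bitmask values for p2m-style read/write permissions. */
enum { ACC_N = 0, ACC_R = 1, ACC_W = 2, ACC_RW = 3 };

/*
 * strict: treat IR/IW as a hard limit, so an IR-only range maps read-only.
 * lax:    treat IR/IW as what the device needs at minimum and map any unity
 *         range read/write.
 * Non-unity entries get no mapping at all in this model.
 */
static unsigned int unity_access(unsigned int flags, bool strict)
{
    if ( !(flags & IVMD_UNITY) )
        return ACC_N;

    if ( !strict )
        return ACC_RW;

    return ((flags & IVMD_IR) ? ACC_R : 0) | ((flags & IVMD_IW) ? ACC_W : 0);
}
```

Under the lax reading, an IR-only range ends up mapped the same way as an IR+IW one, so no separate r/o MMIO type would be needed for it.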

If such a case actually exists, there's clearly one agent in the system
with r/w access into the r/o range, and mapping it r/w is equivalent to
the "IOMMU not enabled in the first place" case which is the default
case for most software for the past decade-and-a-bit.

In other words, I don't think the r/o unity maps on their own are a good
enough reason to split the type.  If we gain other reasons then fine,
but this seems like a chunk of complexity with no real users.

~Andrew

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 06.09.2021 21:53, Andrew Cooper wrote:
> On 01/09/2021 14:08, Jan Beulich wrote:
>>>>> Restricting execute permissions is something unique to virt.  It doesn't
>>>>> exist in a non-virtualised system, as I and D side reads are
>>>>> indistinguishable outside of the core.
>>>>>
>>>>> Furthermore, it is inexpressible on some systems/configurations.
>>>>>
>>>>> Introspection is the only technology which should be restricting execute
>>>>> permissions in the p2m, and only when it takes responsibility for
>>>>> dealing with the fallout.
>>>> IOW are you saying that the calls to set_identity_p2m_entry()
>>>> (pre-dating XSA-378) were wrong to use p2m_access_rw?
>>> Yes.
>>>
>>>>  Because that's
>>>> what's getting in the way here.
>>> On a real machine, you really can write some executable code into an
>>> E820 reserved region and jump to it.  You can also execute code from
>>> real BARs if you happen to know that they are prefetchable (or you're a
>>> glutton for UC reads...)
>>>
>>> And there is the WPBT ACPI table which exists specifically to let
>>> firmware inject drivers/applications into a Windows environment, and may
>>> come out of the SPI ROM in the first place.
>>>
>>>
>>> Is it sensible to execute an E820 reserved region, or unmarked BAR? 
>>> Probably not.
>>>
>>> Should it work, because that's how real hardware behaves?  Absolutely.
>>>
>>> Any restrictions beyond that want handling by some kind of introspection
>>> agent which has a policy of what to do with legal-but-dodgy-looking actions.
>> IOW you suggest we remove p2m_access_t parameters from various functions,
>> going with just default access?
> 
> p2m_access_t was very obviously a bodge when introduced, and I doubt it
> would pass today's review standards.
> 
> It is definitely a mis-design, given its ill-specified and overlapping
> semantics with respect to the p2m type.
> 
>>  Of course, as pointed out in another
>> reply, we'll need to split p2m_mmio_direct into multiple types then, at
>> the very least to honor the potential r/o restriction of AMD IOMMU unity
>> mapped regions. (FAOD all of this isn't a short term plan anyway, at least
>> afaic.)
> 
> I don't think that will be necessary.
> 
> IVMDs are almost non-existent, and given how many other areas of the AMD
> IOMMU spec are totally unused, I wouldn't be surprised if r/o unity
> mappings were in that category too.  There's no obvious use case for r/o,
> as anything critical enough in the platform to have an IVMD in the first
> place will also be non-trivial enough to require bidirectional
> communication somehow.
> 
> The unity mapping only says "this device requires read-only access".  It
> doesn't say "this must be mapped read-only", and it is legitimate to map
> a r/o unity mapping as r/w.

Well, imo that's extending what "Write permission. 1b=writeable, 0b=not
writeable" and "Read permission. 1b=readable, 0b=not readable" in the
spec say. "Permission" to me doesn't mean what you say.

Nevertheless I would perhaps not insist (as I've already made clear I
don't see a strong need to support w/o mappings), if ...

> If such a case actually exists, there's clearly one agent in the system
> with r/w access into the r/o range, and mapping it r/w is equivalent to
> the "IOMMU not enabled in the first place" case which is the default
> case for most software for the past decade-and-a-bit.
> 
> In other words, I don't think the r/o unity maps on their own are a good
> enough reason to split the type.  If we gain other reasons then fine,
> but this seems like a chunk of complexity with no real users.

... there wasn't already a 2nd use for this: The IO-APIC mappings (see
my respective pending patch).

Jan

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Roger Pau Monné 4 years, 5 months ago

On Tue, Aug 31, 2021 at 05:38:49PM +0200, Jan Beulich wrote:
> On 31.08.2021 17:25, Andrew Cooper wrote:
> > On 31/08/2021 14:26, Jan Beulich wrote:
> >> On 31.08.2021 15:16, Andrew Cooper wrote:
> >>> On 30/08/2021 14:02, Jan Beulich wrote:
> >>>> Further permit "access" to differ in the "executable" attribute. While
> >>>> ideally only ROM regions would get mapped with X set, getting there is
> >>>> quite a bit of work. Therefore, as a temporary measure, permit X to
> >>>> vary. For Dom0 the more permissive of the types will be used, while for
> >>>> DomU it'll be the more restrictive one.
> >>> Split behaviour between dom0 and domU based on types alone cannot
> >>> possibly be correct.
> >> True, but what do you do.
> >>
> >>> DomUs need to execute ROMs too, and this looks like it will malfunction if
> >>> a ROM ends up in the region that HVMLoader relocated RAM from.
> >>>
> >>> As this is a temporary bodge emergency bugfix, don't try to be clever -
> >>> just take the latest access.
> >> And how do we know that that's what is going to work?
> > 
> > Because it's the pre-existing behaviour.
> 
> Valid point. But for the DomU case there simply has not been any
> pre-existing behavior. Hence my desire to be restrictive initially
> there.
> 
> >>  We should
> >> strictly accumulate for Dom0. And what we do for DomU is moot for
> >> the moment, until PCI passthrough becomes a thing for PVH. Hence
> >> I've opted to be restrictive there - I'd rather see things break
> >> (and get adjusted) when this future work actually gets carried
> >> out, than leave things permissive for no-one to notice that it's
> >> too permissive, leading to an XSA.
> > 
> > Restricting execute permissions is something unique to virt.  It doesn't
> > exist in a non-virtualised system, as I and D side reads are
> > indistinguishable outside of the core.
> > 
> > Furthermore, it is inexpressible on some systems/configurations.
> > 
> > Introspection is the only technology which should be restricting execute
> > permissions in the p2m, and only when it takes responsibility for
> > dealing with the fallout.
> 
> IOW are you saying that the calls to set_identity_p2m_entry()
> (pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's
> what's getting in the way here.

I did wonder about this before, because I saw some messages about
mapping overrides on a couple of systems, and I'm not sure why we need
to use p2m_access_rw. My first thought was to suggest switching to the
default access type for the domain, like set_mmio_p2m_entry does.

I have to admit I'm not sure I see the point of preventing execution,
but it's possible I'm missing something.

Thanks, Roger.

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 01.09.2021 10:50, Roger Pau Monné wrote:
> On Tue, Aug 31, 2021 at 05:38:49PM +0200, Jan Beulich wrote:
>> On 31.08.2021 17:25, Andrew Cooper wrote:
>>> On 31/08/2021 14:26, Jan Beulich wrote:
>>>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>>>> Further permit "access" to differ in the "executable" attribute. While
>>>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>>>> DomU it'll be the more restrictive one.
>>>>> Split behaviour between dom0 and domU based on types alone cannot
>>>>> possibly be correct.
>>>> True, but what do you do.
>>>>
>>>>> DomUs need to execute ROMs too, and this looks like it will malfunction if
>>>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>>>
>>>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>>>> just take the latest access.
>>>> And how do we know that that's what is going to work?
>>>
>>> Because it's the pre-existing behaviour.
>>
>> Valid point. But for the DomU case there simply has not been any
>> pre-existing behavior. Hence my desire to be restrictive initially
>> there.
>>
>>>>  We should
>>>> strictly accumulate for Dom0. And what we do for DomU is moot for
>>>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>>>> I've opted to be restrictive there - I'd rather see things break
>>>> (and get adjusted) when this future work actually gets carried
>>>> out, than leave things permissive for no-one to notice that it's
>>>> too permissive, leading to an XSA.
>>>
>>> Restricting execute permissions is something unique to virt.  It doesn't
>>> exist in a non-virtualised system, as I and D side reads are
>>> indistinguishable outside of the core.
>>>
>>> Furthermore, it is inexpressible on some systems/configurations.
>>>
>>> Introspection is the only technology which should be restricting execute
>>> permissions in the p2m, and only when it takes responsibility for
>>> dealing with the fallout.
>>
>> IOW are you saying that the calls to set_identity_p2m_entry()
>> (pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's
>> what's getting in the way here.
> 
> I did wonder about this before, because I saw some messages about
> mapping overrides on a couple of systems, and I'm not sure why we need
> to use p2m_access_rw. My first thought was to suggest switching to the
> default access type for the domain, like set_mmio_p2m_entry does.
> 
> I have to admit I'm not sure I see the point of preventing execution,
> but it's possible I'm missing something.

Well, what good can come from allowing execution from, say, the
IO-APIC or LAPIC pages? Or other MMIO-mapped register space? Insn
fetches might even trip bad hardware behavior in such a case by
being the wrong granularity. It's imo really only ROM space which
ought to have execution permitted.

The issue isn't just with execution, though, and as a result I may
need to change the logic here to also let at least W (besides X) vary.
Since one of the XSA-378 changes, we may pass just p2m_access_r to
iommu_identity_mapping() if the ACPI tables on an AMD system say so.
(We may also pass p2m_access_w, but I sincerely hope no firmware
would specify write but no read access.)

Similarly in "IOMMU/x86: restrict IO-APIC mappings for PV Dom0" I
now pass p2m_access_r to set_identity_p2m_entry().

I suppose an underlying issue is the mixed purpose p2m_access_* is
being used for, which possibly went against its original intention.
We cannot, for example, express r/o access to an MMIO
page without using p2m_access_r (or p2m_access_rx), as there's no
suitable p2m type to express this via type alone. We may need to
split p2m_mmio_direct into multiple types (up to 7), I guess, if
we wanted to remove this (ab)use of p2m_access_*.

Jan

Re: [PATCH 2/4] x86/P2M: relax guarding of MMIO entries

Posted by Jan Beulich 4 years, 5 months ago

On 31.08.2021 17:38, Jan Beulich wrote:
> On 31.08.2021 17:25, Andrew Cooper wrote:
>> On 31/08/2021 14:26, Jan Beulich wrote:
>>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>>> Further permit "access" to differ in the "executable" attribute. While
>>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>>> DomU it'll be the more restrictive one.
>>>> Split behaviour between dom0 and domU based on types alone cannot
>>>> possibly be correct.
>>> True, but what do you do.
>>>
>>>> DomU's need to execute ROMs too, and this looks like will malfunction if

>>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>>
>>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>>> just take the latest access.
>>> And how do we know that that's what is going to work?
>>
>> Because it's the pre-existing behaviour.
> 
> Valid point. But for the DomU case there simply has not been any
> pre-existing behavior. Hence my desire to be restrictive initially
> there.

Further to this: Using the last-value-set approach also puts us at
risk of running into a similar issue again when the ordering of
some operations changes elsewhere.

Jan