One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by unintended (for this purpose) code paths. At least in
the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
this is too strict. Generally short-circuit requests establishing the
same kind of mapping that's already in place.
Further permit "access" to differ in the "executable" attribute. While
ideally only ROM regions would get mapped with X set, getting there is
quite a bit of work. Therefore, as a temporary measure, permit X to
vary. For Dom0 the more permissive of the types will be used, while for
DomU it'll be the more restrictive one.
While there, also add a log message to the other domain_crash()
invocation that did prevent PVH Dom0 from coming up after the XSA-378
changes.
Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d
         if ( p2m_is_special(ot) )
         {
             /* Don't permit unmapping grant/foreign/direct-MMIO this way. */
-            domain_crash(d);
             p2m_unlock(p2m);
-
+            printk(XENLOG_G_ERR
+                   "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n",
+                   d, gfn_x(gfn) + i,
+                   mfn_x(omfn), ot, a,
+                   mfn_x(mfn) + i, t, p2m->default_access);
+            domain_crash(d);
             return -EPERM;
         }
         else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) )
@@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do
     }
     if ( p2m_is_special(ot) )
     {
-        gfn_unlock(p2m, gfn, order);
-        domain_crash(d);
-        return -EPERM;
+        bool done = false, bad = true;
+
+        /* Special-case (almost) identical mappings. */
+        if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot )
+        {
+            /*
+             * For MMIO allow X to differ in the requests (to cover for
+             * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in
+             * the way they specify "access"). For the hardware domain put (or
+             * leave) in place the more permissive of the two possibilities,
+             * while for DomU-s go with the more restrictive variant.
+             */
+            if ( gfn_p2mt == p2m_mmio_direct &&
+                 access <= p2m_access_rwx &&
+                 (access ^ a) == p2m_access_x )
+            {
+                if ( is_hardware_domain(d) )
+                    access |= p2m_access_x;
+                else
+                    access &= ~p2m_access_x;
+                bad = access == p2m_access_n;
+            }
+
+            if ( access == a )
+                done = true;
+        }
+
+        if ( done )
+        {
+            gfn_unlock(p2m, gfn, order);
+            return 0;
+        }
+
+        if ( bad )
+        {
+            gfn_unlock(p2m, gfn, order);
+            printk(XENLOG_G_ERR
+                   "%pd: GFN %lx (%lx:%u:%u:%u) -> (%lx:%u:%u:%u) not permitted\n",
+                   d, gfn_l,
+                   mfn_x(omfn), cur_order, ot, a,
+                   mfn_x(mfn), order, gfn_p2mt, access);
+            domain_crash(d);
+            return -EPERM;
+        }
     }
     else if ( p2m_is_ram(ot) )
     {
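
The access reconciliation in the second hunk can be modelled in isolation. The following is only a minimal sketch, assuming R, W and X occupy the low three bits of the access value (which the patch's "(access ^ a) == p2m_access_x" test implies). The names acc, outcome and reconcile() are hypothetical and not part of Xen, and the sketch covers only the case where MFN and type already match and the type is direct MMIO. It models the patch as posted here, not necessarily what ends up being committed.

```c
#include <stdbool.h>

/* Assumption: R, W and X are bit flags in the low three bits. */
enum acc {
    ACC_N   = 0,
    ACC_R   = 1 << 0,
    ACC_W   = 1 << 1,
    ACC_X   = 1 << 2,
    ACC_RW  = ACC_R | ACC_W,
    ACC_RWX = ACC_R | ACC_W | ACC_X,
};

/* Hypothetical result codes standing in for the three exits of the hunk. */
enum outcome {
    OUTCOME_DONE,   /* mapping already in place: return 0 */
    OUTCOME_UPDATE, /* fall through and write the entry */
    OUTCOME_CRASH,  /* log, domain_crash(), return -EPERM */
};

/*
 * 'cur' is the access currently in place ("a" in the patch), '*req' the
 * requested access ("access"). On return '*req' holds the reconciled value.
 */
enum outcome reconcile(bool hwdom, unsigned int cur, unsigned int *req)
{
    bool bad = true;

    /* Only a difference in exactly the X bit is tolerated. */
    if ( *req <= ACC_RWX && (*req ^ cur) == ACC_X )
    {
        if ( hwdom )
            *req |= ACC_X;                /* more permissive for Dom0 */
        else
            *req &= ~(unsigned int)ACC_X; /* more restrictive for DomU */
        bad = (*req == ACC_N);
    }

    if ( *req == cur )
        return OUTCOME_DONE;

    return bad ? OUTCOME_CRASH : OUTCOME_UPDATE;
}
```

Under this model, a Dom0 request for rwx over an existing rw entry upgrades the entry, the reverse request is a no-op that leaves rwx in place, and any request differing in R or W is still rejected.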
On 30.08.2021 15:02, Jan Beulich wrote:
> One of the changes comprising the fixes for XSA-378 disallows replacing
> MMIO mappings by unintended (for this purpose) code paths. At least in
> the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
> this is too strict. Generally short-circuit requests establishing the
> same kind of mapping that's already in place.
>
> Further permit "access" to differ in the "executable" attribute. While
> ideally only ROM regions would get mapped with X set, getting there is
> quite a bit of work. Therefore, as a temporary measure, permit X to
> vary. For Dom0 the more permissive of the types will be used, while for
> DomU it'll be the more restrictive one.
>
> While there, also add a log message to the other domain_crash()
> invocation that did prevent PVH Dom0 from coming up after the XSA-378
> changes.
>
> Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Btw, I had meant to have this post-commit-message remark here:

TBD: This could be generalized to all of R, W, and X. Dealing with just X
     is merely the minimum I found is immediately necessary.

Jan
On 30/08/2021 14:02, Jan Beulich wrote: > One of the changes comprising the fixes for XSA-378 disallows replacing > MMIO mappings by unintended (for this purpose) code paths. Drop the brackets. > At least in > the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region, > this is too strict. Generally short-circuit requests establishing the > same kind of mapping that's already in place. > > Further permit "access" to differ in the "executable" attribute. While > ideally only ROM regions would get mapped with X set, getting there is > quite a bit of work. Therefore, as a temporary measure, permit X to > vary. For Dom0 the more permissive of the types will be used, while for > DomU it'll be the more restrictive one. Split behaviour between dom0 and domU based on types alone cannot possibly be correct. DomU's need to execute ROMs too, and this looks like will malfunction if a ROM ends up in the region that HVMLoader relocated RAM from. As this is a temporary bodge emergency bugfix, don't try to be clever - just take the latest access. > While there, also add a log message to the other domain_crash() > invocation that did prevent PVH Dom0 from coming up after the XSA-378 > changes. > > Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries") > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d > if ( p2m_is_special(ot) ) > { > /* Don't permit unmapping grant/foreign/direct-MMIO this way. */ > - domain_crash(d); > p2m_unlock(p2m); > - > + printk(XENLOG_G_ERR > + "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n", type and access need to be rendered in hex, or you need to use 0x prefixes to distinguish the two bases. Also, use commas rather than colons. Visually, this is ambiguous with PCI BDFs, and commas match tuple notation in most programming languages which is the construct you're trying to represent. 
Same below. > + d, gfn_x(gfn) + i, > + mfn_x(omfn), ot, a, > + mfn_x(mfn) + i, t, p2m->default_access); > + domain_crash(d); > return -EPERM; > } > else if ( p2m_is_ram(ot) && !p2m_is_paged(ot) ) > @@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do > } > if ( p2m_is_special(ot) ) > { > - gfn_unlock(p2m, gfn, order); > - domain_crash(d); > - return -EPERM; > + bool done = false, bad = true; > + > + /* Special-case (almost) identical mappings. */ > + if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot ) > + { > + /* > + * For MMIO allow X to differ in the requests (to cover for > + * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in > + * the way they specify "access"). For the hardware domain put (or > + * leave) in place the more permissive of the two possibilities, > + * while for DomU-s go with the more restrictive variant. This comment needs to identify clearly that it is a temporary bodge intended to be removed. ~Andrew > + */ > + if ( gfn_p2mt == p2m_mmio_direct && > + access <= p2m_access_rwx && > + (access ^ a) == p2m_access_x ) > + { > + if ( is_hardware_domain(d) ) > + access |= p2m_access_x; > + else > + access &= ~p2m_access_x; > + bad = access == p2m_access_n; > + } > + > + if ( access == a ) > + done = true; > + } > + > + if ( done ) > + { > + gfn_unlock(p2m, gfn, order); > + return 0; > + } > + > + if ( bad ) > + { > + gfn_unlock(p2m, gfn, order); > + printk(XENLOG_G_ERR > + "%pd: GFN %lx (%lx:%u:%u:%u) -> (%lx:%u:%u:%u) not permitted\n", > + d, gfn_l, > + mfn_x(omfn), cur_order, ot, a, > + mfn_x(mfn), order, gfn_p2mt, access); > + domain_crash(d); > + return -EPERM; > + } > } > else if ( p2m_is_ram(ot) ) > { >
On 31.08.2021 15:16, Andrew Cooper wrote: > On 30/08/2021 14:02, Jan Beulich wrote: >> Further permit "access" to differ in the "executable" attribute. While >> ideally only ROM regions would get mapped with X set, getting there is >> quite a bit of work. Therefore, as a temporary measure, permit X to >> vary. For Dom0 the more permissive of the types will be used, while for >> DomU it'll be the more restrictive one. > > Split behaviour between dom0 and domU based on types alone cannot > possibly be correct. True, but what do you do. > DomU's need to execute ROMs too, and this looks like will malfunction if > a ROM ends up in the region that HVMLoader relocated RAM from. > > As this is a temporary bodge emergency bugfix, don't try to be clever - > just take the latest access. And how do we know that that's what is going to work? We should strictly accumulate for Dom0. And what we do for DomU is moot for the moment, until PCI passthrough becomes a thing for PVH. Hence I've opted to be restrictive there - I'd rather see things break (and getting adjusted) when this future work actually gets carried out, than leave things permissive for no-one to notice that it's too permissive, leading to an XSA. >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -958,9 +958,13 @@ guest_physmap_add_entry(struct domain *d >> if ( p2m_is_special(ot) ) >> { >> /* Don't permit unmapping grant/foreign/direct-MMIO this way. */ >> - domain_crash(d); >> p2m_unlock(p2m); >> - >> + printk(XENLOG_G_ERR >> + "%pd: GFN %lx (%lx:%u:%u) -> (%lx:%u:%u) not permitted\n", > > type and access need to be rendered in hex, or you need to use 0x > prefixes to distinguish the two bases. Will use %#lx then. > Also, use commas rather than colons. Visually, this is ambiguous with > PCI BDFs, and commas match tuple notation in most programming languages > which is the construct you're trying to represent. > > Same below. Sure, will do. 
>> @@ -1302,9 +1306,50 @@ static int set_typed_p2m_entry(struct do >> } >> if ( p2m_is_special(ot) ) >> { >> - gfn_unlock(p2m, gfn, order); >> - domain_crash(d); >> - return -EPERM; >> + bool done = false, bad = true; >> + >> + /* Special-case (almost) identical mappings. */ >> + if ( mfn_eq(mfn, omfn) && gfn_p2mt == ot ) >> + { >> + /* >> + * For MMIO allow X to differ in the requests (to cover for >> + * set_identity_p2m_entry() and set_mmio_p2m_entry() differing in >> + * the way they specify "access"). For the hardware domain put (or >> + * leave) in place the more permissive of the two possibilities, >> + * while for DomU-s go with the more restrictive variant. > > This comment needs to identify clearly that it is a temporary bodge > intended to be removed. Okay. Jan
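
The log format agreed on above (a 0x prefix on the hex values, commas instead of colons) might look roughly like the following. This is only a sketch using standard C snprintf() rather than Xen's printk(), so the Xen-specific %pd specifier is left out. The helper name format_p2m_tuple() is hypothetical, and keeping type and access in decimal is one reading of "Will use %#lx then", not something the thread settles.

```c
#include <stdio.h>

/*
 * Hypothetical helper: render an (mfn, type, access) tuple the way the
 * review asks for. The MFN is printed in hex with a 0x prefix via %#lx,
 * so the unprefixed type and access fields read unambiguously as decimal.
 * Commas replace the colons to avoid confusion with PCI BDF notation.
 */
int format_p2m_tuple(char *buf, size_t len, unsigned long mfn,
                     unsigned int type, unsigned int access)
{
    return snprintf(buf, len, "(%#lx,%u,%u)", mfn, type, access);
}
```

Note that in standard C the # flag adds no prefix to a value of zero, so an MFN of 0 would print as a bare "0".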
On 31/08/2021 14:26, Jan Beulich wrote: > On 31.08.2021 15:16, Andrew Cooper wrote: >> On 30/08/2021 14:02, Jan Beulich wrote: >>> Further permit "access" to differ in the "executable" attribute. While >>> ideally only ROM regions would get mapped with X set, getting there is >>> quite a bit of work. Therefore, as a temporary measure, permit X to >>> vary. For Dom0 the more permissive of the types will be used, while for >>> DomU it'll be the more restrictive one. >> Split behaviour between dom0 and domU based on types alone cannot >> possibly be correct. > True, but what do you do. > >> DomU's need to execute ROMs too, and this looks like will malfunction if >> a ROM ends up in the region that HVMLoader relocated RAM from. >> >> As this is a temporary bodge emergency bugfix, don't try to be clever - >> just take the latest access. > And how do we know that that's what is going to work? Because it's the pre-existing behaviour. > We should > strictly accumulate for Dom0. And what we do for DomU is moot for > the moment, until PCI passthrough becomes a thing for PVH. Hence > I've opted to be restrictive there - I'd rather see things break > (and getting adjusted) when this future work actually gets carried > out, than leave things permissive for no-one to notice that it's > too permissive, leading to an XSA. Restricting execute permissions is something unique to virt. It doesn't exist in a non-virtualised system, as I and D side reads are indistinguishable outside of the core. Furthermore, it is inexpressible on some systems/configurations. Introspection is the only technology which should be restricting execute permissions in the p2m, and only when it takes responsibility for dealing with the fallout. ~Andrew
On 31.08.2021 17:25, Andrew Cooper wrote: > On 31/08/2021 14:26, Jan Beulich wrote: >> On 31.08.2021 15:16, Andrew Cooper wrote: >>> On 30/08/2021 14:02, Jan Beulich wrote: >>>> Further permit "access" to differ in the "executable" attribute. While >>>> ideally only ROM regions would get mapped with X set, getting there is >>>> quite a bit of work. Therefore, as a temporary measure, permit X to >>>> vary. For Dom0 the more permissive of the types will be used, while for >>>> DomU it'll be the more restrictive one. >>> Split behaviour between dom0 and domU based on types alone cannot >>> possibly be correct. >> True, but what do you do. >> >>> DomU's need to execute ROMs too, and this looks like will malfunction if >>> a ROM ends up in the region that HVMLoader relocated RAM from. >>> >>> As this is a temporary bodge emergency bugfix, don't try to be clever - >>> just take the latest access. >> And how do we know that that's what is going to work? > > Because it's the pre-existing behaviour. Valid point. But for the DomU case there simply has not been any pre-existing behavior. Hence my desire to be restrictive initially there. >> We should >> strictly accumulate for Dom0. And what we do for DomU is moot for >> the moment, until PCI passthrough becomes a thing for PVH. Hence >> I've opted to be restrictive there - I'd rather see things break >> (and getting adjusted) when this future work actually gets carried >> out, than leave things permissive for no-one to notice that it's >> too permissive, leading to an XSA. > > Restricting execute permissions is something unique to virt. It doesn't > exist in a non-virtualised system, as I and D side reads are > indistinguishable outside of the core. > > Furthermore, it is inexpressible on some systems/configurations. > > Introspection is the only technology which should be restricting execute > permissions in the p2m, and only when it takes responsibility for > dealing with the fallout. 
IOW are you saying that the calls to set_identity_p2m_entry()
(pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's
what's getting in the way here.

Plus, as a side note, then we don't even have e.g. IOMMUF_executable.

Jan
On 31/08/2021 16:38, Jan Beulich wrote: > On 31.08.2021 17:25, Andrew Cooper wrote: >> On 31/08/2021 14:26, Jan Beulich wrote: >>> On 31.08.2021 15:16, Andrew Cooper wrote: >>>> On 30/08/2021 14:02, Jan Beulich wrote: >>>>> Further permit "access" to differ in the "executable" attribute. While >>>>> ideally only ROM regions would get mapped with X set, getting there is >>>>> quite a bit of work. Therefore, as a temporary measure, permit X to >>>>> vary. For Dom0 the more permissive of the types will be used, while for >>>>> DomU it'll be the more restrictive one. >>>> Split behaviour between dom0 and domU based on types alone cannot >>>> possibly be correct. >>> True, but what do you do. >>> >>>> DomU's need to execute ROMs too, and this looks like will malfunction if >>>> a ROM ends up in the region that HVMLoader relocated RAM from. >>>> >>>> As this is a temporary bodge emergency bugfix, don't try to be clever - >>>> just take the latest access. >>> And how do we know that that's what is going to work? >> Because it's the pre-existing behaviour. > Valid point. But for the DomU case there simply has not been any > pre-existing behavior. Hence my desire to be restrictive initially > there. But you're conflating a feature (under question anyway, because I gave you an example where I expect this will collide in a regular domU already), with an emergency bugfix to unbreak staging caused by an unexpected interaction in a security hotfix. At an absolute minimum, this patch needs splitting in two to separate the bugfix from the proposed feature. >>> We should >>> strictly accumulate for Dom0. And what we do for DomU is moot for >>> the moment, until PCI passthrough becomes a thing for PVH. Hence >>> I've opted to be restrictive there - I'd rather see things break >>> (and getting adjusted) when this future work actually gets carried >>> out, than leave things permissive for no-one to notice that it's >>> too permissive, leading to an XSA. 
>> Restricting execute permissions is something unique to virt. It doesn't
>> exist in a non-virtualised system, as I and D side reads are
>> indistinguishable outside of the core.
>>
>> Furthermore, it is inexpressible on some systems/configurations.
>>
>> Introspection is the only technology which should be restricting execute
>> permissions in the p2m, and only when it takes responsibility for
>> dealing with the fallout.
> IOW are you saying that the calls to set_identity_p2m_entry()
> (pre-dating XSA-378) were wrong to use p2m_access_rw?

Yes.

> Because that's
> what's getting the way here.

On a real machine, you really can write some executable code into an
E820 reserved region and jump to it. You can also execute code from
real BARs if you happen to know that they are prefetchable (or you're a
glutton for UC reads...)

And there is the WPBT ACPI table which exists specifically to let
firmware inject drivers/applications into a Windows environment, and may
come out of the SPI ROM in the first place.

Is it sensible to execute an E820 reserved region, or unmarked BAR?
Probably not.

Should it work, because that's how real hardware behaves? Absolutely.

Any restrictions beyond that want handling by some kind of introspection
agent which has a policy of what to do with legal-but-dodgy-looking actions.

> Plus, as a side note, then we don't even have e.g. IOMMUF_executable.

Just as I vs D side reads are indistinguishable outside of the CPU core,
the same is in principle true for PCI devices which execute code (e.g.
GPU shaders). Reads on the bus are just reads.

That said, the latest SIOV spec does appear to include the ER (Execution
Request) bit for use with PASID/Shared-Virtual-Memory, which interacts
with the EPT-X/ia32-NX bits, and goes as far as having SMEP/SMAP bits in
the IOMMU configuration. I'm not sure if this is in released hardware
yet, but it's clearly on the horizon.
I can't spot any execute-related controls in the AMD IOMMU spec,
although it does have user/supervisor for PASID/SVM.

~Andrew
On 01.09.2021 14:47, Andrew Cooper wrote: > On 31/08/2021 16:38, Jan Beulich wrote: >> On 31.08.2021 17:25, Andrew Cooper wrote: >>> On 31/08/2021 14:26, Jan Beulich wrote: >>>> On 31.08.2021 15:16, Andrew Cooper wrote: >>>>> On 30/08/2021 14:02, Jan Beulich wrote: >>>>>> Further permit "access" to differ in the "executable" attribute. While >>>>>> ideally only ROM regions would get mapped with X set, getting there is >>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to >>>>>> vary. For Dom0 the more permissive of the types will be used, while for >>>>>> DomU it'll be the more restrictive one. >>>>> Split behaviour between dom0 and domU based on types alone cannot >>>>> possibly be correct. >>>> True, but what do you do. >>>> >>>>> DomU's need to execute ROMs too, and this looks like will malfunction if >>>>> a ROM ends up in the region that HVMLoader relocated RAM from. >>>>> >>>>> As this is a temporary bodge emergency bugfix, don't try to be clever - >>>>> just take the latest access. >>>> And how do we know that that's what is going to work? >>> Because it's the pre-existing behaviour. >> Valid point. But for the DomU case there simply has not been any >> pre-existing behavior. Hence my desire to be restrictive initially >> there. > > But you're conflating a feature (under question anyway, because I gave > you an example where I expect this will collide in a regular domU > already), I don't think your example fits: hvmloader moving RAM will first convert the p2m slot to non-present. Then a ROM page can get mapped there quite fine. A direct transition (without going through n/p) would not work independent of the change here: The MFNs would differ, as would the p2m types. > with an emergency bugfix to unbreak staging caused by an > unexpected interaction in a security hotfix. > > At an absolute minimum, this patch needs splitting in two to separate > the bugfix from the proposed feature. 
Well, okay, I will split the patch, despite not being convinced this
will do us any good - we'd backport just the part you consider a bug
fix, but not the part you deem a feature (and which I consider part of
the bug fix).

>>>> We should
>>>> strictly accumulate for Dom0. And what we do for DomU is moot for
>>>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>>>> I've opted to be restrictive there - I'd rather see things break
>>>> (and getting adjusted) when this future work actually gets carried
>>>> out, than leave things permissive for no-one to notice that it's
>>>> too permissive, leading to an XSA.

Actually I think I was missing an important aspect here: The code in
question gets used not only for PVH, but also for HVM, where
pass-through is a thing. Hence I'll restrict the "feature" part to Dom0
for now.

>>> Restricting execute permissions is something unique to virt. It doesn't
>>> exist in a non-virtualised system, as I and D side reads are
>>> indistinguishable outside of the core.
>>>
>>> Furthermore, it is inexpressible on some systems/configurations.
>>>
>>> Introspection is the only technology which should be restricting execute
>>> permissions in the p2m, and only when it takes responsibility for
>>> dealing with the fallout.
>> IOW are you saying that the calls to set_identity_p2m_entry()
>> (pre-dating XSA-378) were wrong to use p2m_access_rw?
>
> Yes.
>
>> Because that's
>> what's getting the way here.
>
> On a real machine, you really can write some executable code into an
> E820 reserved region and jump to it. You can also execute code from
> real BARs is you happen to know that they are prefetchable (or you're a
> glutton for UC reads...)
>
> And there is the WPBT ACPI table which exists specifically to let
> firmware inject drivers/applications into a windows environment, and may
> come out of the SPI ROM in the first place.
>
>
> Is it sensible to execute an E820 reserved region, or unmarked BAR?
> Probably not.
> > Should it work, because that's how real hardware behaves? Absolutely. > > Any restrictions beyond that want handling by some kind of introspection > agent which has a policy of what to do with legal-but-dodgy-looking actions. IOW you suggest we remove p2m_access_t parameters from various functions, going with just default access? Of course, as pointed out in another reply, we'll need to split p2m_mmio_direct into multiple types then, at the very least to honor the potential r/o restriction of AMD IOMMU unity mapped regions. (FAOD all of this isn't a short term plan anyway, at least afaic.) Jan
On 01/09/2021 14:08, Jan Beulich wrote: >>>> Restricting execute permissions is something unique to virt. It doesn't >>>> exist in a non-virtualised system, as I and D side reads are >>>> indistinguishable outside of the core. >>>> >>>> Furthermore, it is inexpressible on some systems/configurations. >>>> >>>> Introspection is the only technology which should be restricting execute >>>> permissions in the p2m, and only when it takes responsibility for >>>> dealing with the fallout. >>> IOW are you saying that the calls to set_identity_p2m_entry() >>> (pre-dating XSA-378) were wrong to use p2m_access_rw? >> Yes. >> >>> Because that's >>> what's getting the way here. >> On a real machine, you really can write some executable code into an >> E820 reserved region and jump to it. You can also execute code from >> real BARs is you happen to know that they are prefetchable (or you're a >> glutton for UC reads...) >> >> And there is the WPBT ACPI table which exists specifically to let >> firmware inject drivers/applications into a windows environment, and may >> come out of the SPI ROM in the first place. >> >> >> Is it sensible to execute an E820 reserved region, or unmarked BAR? >> Probably not. >> >> Should it work, because that's how real hardware behaves? Absolutely. >> >> Any restrictions beyond that want handling by some kind of introspection >> agent which has a policy of what to do with legal-but-dodgy-looking actions. > IOW you suggest we remove p2m_access_t parameters from various functions, > going with just default access? p2m_access_t was very obviously a bodge when introduced, and I doubt it would pass today's review standards. It is definitely a mis-design, given its ill-specified and overlapping semantics with respect to the p2m type. > Of course, as pointed out in another > reply, we'll need to split p2m_mmio_direct into multiple types then, at > the very least to honor the potential r/o restriction of AMD IOMMU unity > mapped regions. 
> (FAOD all of this isn't a short term plan anyway, at least
> afaic.)

I don't think that will be necessary.

IVMDs are almost non-existent, and given how many other areas of the AMD
IOMMU spec are totally unused, I wouldn't be surprised if r/o unity
mappings were in that category too. There's no obvious usecase for r/o,
as anything critical enough in the platform to have an IVMD in the first
place will also be non-trivial enough to require bidirectional
communication somehow.

The unity mapping only says "this device requires read-only access". It
doesn't say "this must be mapped read-only", and it is legitimate to map
a r/o unity mapping as r/w.

If such a case actually exists, there's clearly one agent in the system
with r/w access into the r/o range, and mapping it r/w is equivalent to
the "IOMMU not enabled in the first place" case which is the default
case for most software for the past decade-and-a-bit.

In other words, I don't think the r/o unity maps on their own are a good
enough reason to split the type. If we gain other reasons then fine,
but this seems like a chunk of complexity with no real users.

~Andrew
On 06.09.2021 21:53, Andrew Cooper wrote: > On 01/09/2021 14:08, Jan Beulich wrote: >>>>> Restricting execute permissions is something unique to virt. It doesn't >>>>> exist in a non-virtualised system, as I and D side reads are >>>>> indistinguishable outside of the core. >>>>> >>>>> Furthermore, it is inexpressible on some systems/configurations. >>>>> >>>>> Introspection is the only technology which should be restricting execute >>>>> permissions in the p2m, and only when it takes responsibility for >>>>> dealing with the fallout. >>>> IOW are you saying that the calls to set_identity_p2m_entry() >>>> (pre-dating XSA-378) were wrong to use p2m_access_rw? >>> Yes. >>> >>>> Because that's >>>> what's getting the way here. >>> On a real machine, you really can write some executable code into an >>> E820 reserved region and jump to it. You can also execute code from >>> real BARs is you happen to know that they are prefetchable (or you're a >>> glutton for UC reads...) >>> >>> And there is the WPBT ACPI table which exists specifically to let >>> firmware inject drivers/applications into a windows environment, and may >>> come out of the SPI ROM in the first place. >>> >>> >>> Is it sensible to execute an E820 reserved region, or unmarked BAR? >>> Probably not. >>> >>> Should it work, because that's how real hardware behaves? Absolutely. >>> >>> Any restrictions beyond that want handling by some kind of introspection >>> agent which has a policy of what to do with legal-but-dodgy-looking actions. >> IOW you suggest we remove p2m_access_t parameters from various functions, >> going with just default access? > > p2m_access_t was very obviously a bodge when introduced, and I doubt it > would pass today's review standards. > > It is definitely a mis-design, given its ill-specified and overlapping > semantics with respect to the p2m type. 
> >> Of course, as pointed out in another >> reply, we'll need to split p2m_mmio_direct into multiple types then, at >> the very least to honor the potential r/o restriction of AMD IOMMU unity >> mapped regions. (FAOD all of this isn't a short term plan anyway, at least >> afaic.) > > I don't think that will be necessary. > > IVMDs are almost non-existent, and given how many other areas of the AMD > IOMMU spec are totally unused, I wouldn't be surprised if r/o unity > mappings were in that category too. There's no obvious usecase for r/o, > as anything critical enough in the platform to have an IVMD in the first > place will also be non-trivial enough to require bidirectional > communication somehow. > > The unity mapping only says "this device requires read-only access". It > doesn't say "this must be mapped read-only", and it is legitimate to map > a r/o unity mapping as r/w. Well, imo that's extending what "Write permission. 1b=writeable, 0b=not writeable" and "Read permission. 1b=readable, 0b=not readable" in the spec say. "Permission" to me doesn't mean what you say. Nevertheless I would perhaps not insist (as I've already made clear I don't see a strong need to support w/o mappings), if ... > If such a case actually exists, there's clearly one agent in the system > with r/w access into the r/o range, and mapping it r/w is equivalent to > the "IOMMU not enabled in the first place" case which is the default > case for most software for the past decade-and-a-bit. > > In other words, I don't think the r/o unit maps on their own are a good > enough reasons to split the type. If we gain other reasons then fine, > but this seems like chunk of complexity with no real users. ... there wasn't already a 2nd use for this: The IO-APIC mappings (see my respective pending patch). Jan
On Tue, Aug 31, 2021 at 05:38:49PM +0200, Jan Beulich wrote: > On 31.08.2021 17:25, Andrew Cooper wrote: > > On 31/08/2021 14:26, Jan Beulich wrote: > >> On 31.08.2021 15:16, Andrew Cooper wrote: > >>> On 30/08/2021 14:02, Jan Beulich wrote: > >>>> Further permit "access" to differ in the "executable" attribute. While > >>>> ideally only ROM regions would get mapped with X set, getting there is > >>>> quite a bit of work. Therefore, as a temporary measure, permit X to > >>>> vary. For Dom0 the more permissive of the types will be used, while for > >>>> DomU it'll be the more restrictive one. > >>> Split behaviour between dom0 and domU based on types alone cannot > >>> possibly be correct. > >> True, but what do you do. > >> > >>> DomU's need to execute ROMs too, and this looks like will malfunction if > >>> a ROM ends up in the region that HVMLoader relocated RAM from. > >>> > >>> As this is a temporary bodge emergency bugfix, don't try to be clever - > >>> just take the latest access. > >> And how do we know that that's what is going to work? > > > > Because it's the pre-existing behaviour. > > Valid point. But for the DomU case there simply has not been any > pre-existing behavior. Hence my desire to be restrictive initially > there. > > >> We should > >> strictly accumulate for Dom0. And what we do for DomU is moot for > >> the moment, until PCI passthrough becomes a thing for PVH. Hence > >> I've opted to be restrictive there - I'd rather see things break > >> (and getting adjusted) when this future work actually gets carried > >> out, than leave things permissive for no-one to notice that it's > >> too permissive, leading to an XSA. > > > > Restricting execute permissions is something unique to virt. It doesn't > > exist in a non-virtualised system, as I and D side reads are > > indistinguishable outside of the core. > > > > Furthermore, it is inexpressible on some systems/configurations. 
> > > > Introspection is the only technology which should be restricting execute > > permissions in the p2m, and only when it takes responsibility for > > dealing with the fallout. > > IOW are you saying that the calls to set_identity_p2m_entry() > (pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's > what's getting the way here. I did wonder this before, because I saw some messages on a couple of systems about mappings override, and I'm not sure why we need to use p2m_access_rw. My first thought was to suggest to switch to use the default access type for the domain, like set_mmio_p2m_entry does. I have to admit I'm not sure I see the point of preventing execution, but it's possible I'm missing something. Thanks, Roger.
On 01.09.2021 10:50, Roger Pau Monné wrote:
> On Tue, Aug 31, 2021 at 05:38:49PM +0200, Jan Beulich wrote:
>> On 31.08.2021 17:25, Andrew Cooper wrote:
>>> On 31/08/2021 14:26, Jan Beulich wrote:
>>>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>>>> Further permit "access" to differ in the "executable" attribute. While
>>>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>>>> DomU it'll be the more restrictive one.
>>>>> Split behaviour between dom0 and domU based on types alone cannot
>>>>> possibly be correct.
>>>> True, but what do you do.
>>>>
>>>>> DomU's need to execute ROMs too, and this looks like will malfunction if
>>>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>>>
>>>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>>>> just take the latest access.
>>>> And how do we know that that's what is going to work?
>>>
>>> Because it's the pre-existing behaviour.
>>
>> Valid point. But for the DomU case there simply has not been any
>> pre-existing behavior. Hence my desire to be restrictive initially
>> there.
>>
>>>> We should
>>>> strictly accumulate for Dom0. And what we do for DomU is moot for
>>>> the moment, until PCI passthrough becomes a thing for PVH. Hence
>>>> I've opted to be restrictive there - I'd rather see things break
>>>> (and getting adjusted) when this future work actually gets carried
>>>> out, than leave things permissive for no-one to notice that it's
>>>> too permissive, leading to an XSA.
>>>
>>> Restricting execute permissions is something unique to virt. It doesn't
>>> exist in a non-virtualised system, as I and D side reads are
>>> indistinguishable outside of the core.
>>>
>>> Furthermore, it is inexpressible on some systems/configurations.
>>>
>>> Introspection is the only technology which should be restricting execute
>>> permissions in the p2m, and only when it takes responsibility for
>>> dealing with the fallout.
>>
>> IOW are you saying that the calls to set_identity_p2m_entry()
>> (pre-dating XSA-378) were wrong to use p2m_access_rw? Because that's
>> what's getting the way here.
>
> I did wonder this before, because I saw some messages on a couple of
> systems about mappings override, and I'm not sure why we need to use
> p2m_access_rw. My first thought was to suggest to switch to use the
> default access type for the domain, like set_mmio_p2m_entry does.
>
> I have to admit I'm not sure I see the point of preventing execution,
> but it's possible I'm missing something.

Well, what good can come from allowing execution from, say, the
IO-APIC or LAPIC pages? Or other MMIO-mapped register space? Insn
fetches might even trip bad hardware behavior in such a case by
being the wrong granularity. It's imo really only ROM space which
ought to have execution permitted.

The issue isn't just with execution, though, and as a result I may
need to change the logic here to also include at least W. As of
one of the XSA-378 changes we may now pass just p2m_access_r to
iommu_identity_mapping(), if the ACPI tables on an AMD system were
saying so. (We may also pass p2m_access_w, but I sincerely hope no
firmware would specify write but no read access.) Similarly in
"IOMMU/x86: restrict IO-APIC mappings for PV Dom0" I now pass
p2m_access_r to set_identity_p2m_entry().

I suppose an underlying issue is the mixed purpose of using
p2m_access_*, which possibly has been against the intentions in the
first place. We cannot, for example, express r/o access to an MMIO
page without using p2m_access_r (or p2m_access_rx), as there's no
suitable p2m type to express this via type alone. We may need to
split p2m_mmio_direct into multiple types (up to 7), I guess, if
we wanted to remove this (ab)use of p2m_access_*.

Jan

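For readers following the thread, the merging rule under discussion can be
sketched in isolation. This is only an illustration and not code from the
patch: the ACC_* flags and merge_access() are hypothetical names, and access
rights are modelled as plain R/W/X bits rather than Xen's p2m_access_t. The
"may_differ" mask stands for the set of bits a new request may disagree on
(only X in the posted patch; X and W if the logic were widened as suggested
above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit-flag model of access rights; NOT Xen's p2m_access_t. */
#define ACC_R   0x1u
#define ACC_W   0x2u
#define ACC_X   0x4u
#define ACC_ALL (ACC_R | ACC_W | ACC_X)

/*
 * Combine the access already in place (cur) with the requested one (req).
 * Bits outside may_differ must match, otherwise the request is refused.
 * For the hardware domain the more permissive result (union) is used,
 * for any other domain the more restrictive one (intersection).
 */
static bool merge_access(unsigned int cur, unsigned int req,
                         unsigned int may_differ, bool hwdom,
                         unsigned int *out)
{
    assert(cur <= ACC_ALL && req <= ACC_ALL);

    if ( (cur ^ req) & ~may_differ )
        return false;

    *out = hwdom ? (cur | req) : (cur & req);
    return true;
}
```

With may_differ set to ACC_X, an existing RW mapping and an RWX request merge
to RWX for the hardware domain and to RW for a DomU, while an R versus RW
mismatch is refused until W is added to the mask.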
On 31.08.2021 17:38, Jan Beulich wrote:
> On 31.08.2021 17:25, Andrew Cooper wrote:
>> On 31/08/2021 14:26, Jan Beulich wrote:
>>> On 31.08.2021 15:16, Andrew Cooper wrote:
>>>> On 30/08/2021 14:02, Jan Beulich wrote:
>>>>> Further permit "access" to differ in the "executable" attribute. While
>>>>> ideally only ROM regions would get mapped with X set, getting there is
>>>>> quite a bit of work. Therefore, as a temporary measure, permit X to
>>>>> vary. For Dom0 the more permissive of the types will be used, while for
>>>>> DomU it'll be the more restrictive one.
>>>> Split behaviour between dom0 and domU based on types alone cannot
>>>> possibly be correct.
>>> True, but what do you do.
>>>
>>>> DomU's need to execute ROMs too, and this looks like will malfunction if
>>>> a ROM ends up in the region that HVMLoader relocated RAM from.
>>>>
>>>> As this is a temporary bodge emergency bugfix, don't try to be clever -
>>>> just take the latest access.
>>> And how do we know that that's what is going to work?
>>
>> Because it's the pre-existing behaviour.
>
> Valid point. But for the DomU case there simply has not been any
> pre-existing behavior. Hence my desire to be restrictive initially
> there.

Further to this: Using the last-value-set approach also puts us at
risk of running into a similar issue again when the ordering of some
operations changes elsewhere.

Jan
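
The ordering concern can be shown with a small stand-alone sketch. It is an
illustration only, under the assumption that access rights are modelled as
R/W/X bits; apply_last() and apply_accumulate() are hypothetical helpers, not
Xen functions. The first mimics "take the latest access", the second mimics
"strictly accumulate".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit-flag model of access rights; NOT Xen's p2m_access_t. */
#define ACC_R 0x1u
#define ACC_W 0x2u
#define ACC_X 0x4u

/* "Take the latest access": the result is whatever was requested last. */
static unsigned int apply_last(const unsigned int *reqs, size_t n)
{
    unsigned int acc = 0;

    assert(reqs != NULL || n == 0);
    for ( size_t i = 0; i < n; i++ )
        acc = reqs[i];

    return acc;
}

/* "Strictly accumulate": the result is the union of all requests. */
static unsigned int apply_accumulate(const unsigned int *reqs, size_t n)
{
    unsigned int acc = 0;

    assert(reqs != NULL || n == 0);
    for ( size_t i = 0; i < n; i++ )
        acc |= reqs[i];

    return acc;
}
```

Feeding the same two requests (RW and RWX) in either order gives the same
result with apply_accumulate(), whereas apply_last() ends up with RW or RWX
depending on which request happens to come second.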