While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
spec update, X58's also mentions B2, and searching the internet suggests
systems with this stepping are actually in use. Even worse, for X58
erratum #69 is marked applicable even to C2. Split the check to cover
all applicable steppings and to also report applicable errata numbers in
the log message. The splitting requires using the DMI port instead of
the System Management Registers device, but that's then in line (also
revision checking wise) with the spec updates.

Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
As to disabling just interrupt remapping (as the initial version of the
original patch did) vs disabling the IOMMU as a whole: Using a less
heavy workaround would of course be desirable, but then we need to
ensure not to misguide the tool stack about the state of the system. It
uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
through can be made use of, yet that flag is driven by "iommu_enabled"
alone, without regard to the setting of "iommu_intremap".

--- a/xen/drivers/passthrough/vtd/quirks.c
+++ b/xen/drivers/passthrough/vtd/quirks.c
@@ -268,26 +268,42 @@ static int __init parse_snb_timeout(cons
 }
 custom_param("snb_igd_quirk", parse_snb_timeout);

-/* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
- * Fixed in stepping C-2. */
+/*
+ * 5500/5520/X58 chipset interrupt remapping errata, for steppings B2 and B3.
+ * Fixed in stepping C2 except on X58.
+ */
 static void __init tylersburg_intremap_quirk(void)
 {
-    uint32_t bus, device;
+    unsigned int bus;
     uint8_t rev;

     for ( bus = 0; bus < 0x100; bus++ )
     {
-        /* Match on System Management Registers on Device 20 Function 0 */
-        device = pci_conf_read32(PCI_SBDF(0, bus, 20, 0), PCI_VENDOR_ID);
-        rev = pci_conf_read8(PCI_SBDF(0, bus, 20, 0), PCI_REVISION_ID);
+        /* Match on DMI port (Device 0 Function 0) */
+        rev = pci_conf_read8(PCI_SBDF(0, bus, 0, 0), PCI_REVISION_ID);

-        if ( rev == 0x13 && device == 0x342e8086 )
+        switch ( pci_conf_read32(PCI_SBDF(0, bus, 0, 0), PCI_VENDOR_ID) )
         {
+        default:
+            continue;
+
+        case 0x34038086: case 0x34068086:
+            if ( rev >= 0x22 )
+                continue;
             printk(XENLOG_WARNING VTDPREFIX
-                   "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
-            iommu_enable = 0;
+                   "Disabling IOMMU due to Intel 5500/5520 chipset errata #47 and #53\n");
+            iommu_enable = false;
+            break;
+
+        case 0x34058086:
+            printk(XENLOG_WARNING VTDPREFIX
+                   "Disabling IOMMU due to Intel X58 chipset %s\n",
+                   rev < 0x22 ? "errata #62 and #69" : "erratum #69");
+            iommu_enable = false;
             break;
         }
+
+        break;
     }
 }
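
For readers following the thread, the decision made by the patched switch can be restated as a pure function, which makes the stepping boundaries easier to check. This is only a sketch and not code from the patch: tylersburg_classify(), the quirk_result enumerators and the examples function are hypothetical names, and treating revision 0x12 as stepping B2 is an assumption extrapolated from the patch handling 0x13 as B3 and 0x22 as C2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; the patch itself only logs and clears iommu_enable. */
enum quirk_result {
    QUIRK_NONE,        /* no matching DMI port, or stepping is fixed */
    QUIRK_5500_5520,   /* errata #47 and #53 */
    QUIRK_X58_PRE_C2,  /* errata #62 and #69 */
    QUIRK_X58_C2,      /* erratum #69 only */
};

/*
 * id:  dword read at PCI_VENDOR_ID (device ID in the upper 16 bits,
 *      vendor ID 0x8086 in the lower 16 bits).
 * rev: byte read at PCI_REVISION_ID.
 */
static enum quirk_result tylersburg_classify(uint32_t id, uint8_t rev)
{
    switch ( id )
    {
    case 0x34038086: case 0x34068086:   /* logged by the patch as 5500/5520 */
        return rev < 0x22 ? QUIRK_5500_5520 : QUIRK_NONE;

    case 0x34058086:                    /* logged by the patch as X58 */
        return rev < 0x22 ? QUIRK_X58_PRE_C2 : QUIRK_X58_C2;

    default:
        return QUIRK_NONE;
    }
}

/* A few sample inputs; 0x12 standing for B2 is an assumption (see above). */
void tylersburg_classify_examples(void)
{
    assert(tylersburg_classify(0x34038086, 0x13) == QUIRK_5500_5520);
    assert(tylersburg_classify(0x34068086, 0x22) == QUIRK_NONE);
    assert(tylersburg_classify(0x34058086, 0x12) == QUIRK_X58_PRE_C2);
    assert(tylersburg_classify(0x34058086, 0x22) == QUIRK_X58_C2);
}
```

Note that the old device ID 0x342e8086 (System Management Registers) falls into the default case here, mirroring the patch's move to the DMI port.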
On Tue, Aug 03, 2021 at 01:13:40PM +0200, Jan Beulich wrote:
> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
> spec update, X58's also mentions B2, and searching the internet suggests
> systems with this stepping are actually in use. Even worse, for X58
> erratum #69 is marked applicable even to C2. Split the check to cover
> all applicable steppings and to also report applicable errata numbers in
> the log message. The splitting requires using the DMI port instead of
> the System Management Registers device, but that's then in line (also
> revision checking wise) with the spec updates.
>
> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> As to disabling just interrupt remapping (as the initial version of the
> original patch did) vs disabling the IOMMU as a whole: Using a less
> heavy workaround would of course be desirable, but then we need to
> ensure not to misguide the tool stack about the state of the system. It
> uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
> through can be made use of, yet that flag is driven by "iommu_enabled"
> alone, without regard to the setting of "iommu_intremap".

How does it differ from the situation where interrupt remapping actually
isn't supported at all? Toolstack will use IOMMU then, in a way that is
supported on a given platform. Sure, missing interrupt remapping makes
it less robust[1]. But really, broken and missing interrupt remapping
should be treated the same way. If we would have an option (in
toolstack, or Xen) to force interrupt remapping, then indeed when it's
broken, PCI passthrough should be refused (or maybe even system should
refuse to boot if we'd have something like iommu=intremap=require). But
none of those actually exists. And disabling the whole IOMMU in some
cases of unusable intremap, but not the others, is not exactly useful
thing to do (it breaks some cases, but still doesn't allow to reason
about intremap in toolstack).

So, I propose to disable just iommu_intremap if it's broken as part of
this bug fix. But, independently (and _not_ as a pre-requisite) do
either:
 - let the toolstack know if intremap is used, or
 - add iommu=intremap=require to refuse boot if intremap is missing/broken

[1] https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
On 03.08.2021 14:21, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 03, 2021 at 01:13:40PM +0200, Jan Beulich wrote:
>> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
>> spec update, X58's also mentions B2, and searching the internet suggests
>> systems with this stepping are actually in use. Even worse, for X58
>> erratum #69 is marked applicable even to C2. Split the check to cover
>> all applicable steppings and to also report applicable errata numbers in
>> the log message. The splitting requires using the DMI port instead of
>> the System Management Registers device, but that's then in line (also
>> revision checking wise) with the spec updates.
>>
>> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> As to disabling just interrupt remapping (as the initial version of the
>> original patch did) vs disabling the IOMMU as a whole: Using a less
>> heavy workaround would of course be desirable, but then we need to
>> ensure not to misguide the tool stack about the state of the system. It
>> uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
>> through can be made use of, yet that flag is driven by "iommu_enabled"
>> alone, without regard to the setting of "iommu_intremap".
>
> How does it differ from the situation where interrupt remapping actually
> isn't supported at all? Toolstack will use IOMMU then, in a way that is
> supported on a given platform. Sure, missing interrupt remapping makes
> it less robust[1]. But really, broken and missing interrupt remapping
> should be treated the same way.

I agree; in fact I meant to mention this aspect but then forgot.

> If we would have an option (in
> toolstack, or Xen) to force interrupt remapping, then indeed when it's
> broken, PCI passthrough should be refused (or maybe even system should
> refuse to boot if we'd have something like iommu=intremap=require). But
> none of those actually exists.

"iommu=force" actually does prevent boot from completing when
interrupt remapping is available, but then gets turned off for
some reason. See iommu_setup()'s

    bool_t force_intremap = force_iommu && iommu_intremap;

> And disabling the whole IOMMU in some
> cases of unusable intremap, but not the others, is not exactly useful
> thing to do (it breaks some cases, but still doesn't allow to reason
> about intremap in toolstack).
>
> So, I propose to disable just iommu_intremap if it's broken as part of
> this bug fix. But, independently (and _not_ as a pre-requisite) do
> either:
> - let the toolstack know if intremap is used, or

I don't follow why you even emphasize the "not" on this being a prereq.
I consider it a plain bug (with possibly a security angle) that PCI
pass-through may be permitted by the tool stack in the absence of
interrupt remapping, without an explicit admin request to enable this
(even) less secure mode of operation. Not making this a prereq would
mean to widen the scope of the bug.

Jan
On Tue, Aug 03, 2021 at 02:29:01PM +0200, Jan Beulich wrote:
> On 03.08.2021 14:21, Marek Marczykowski-Górecki wrote:
> > If we would have an option (in
> > toolstack, or Xen) to force interrupt remapping, then indeed when it's
> > broken, PCI passthrough should be refused (or maybe even system should
> > refuse to boot if we'd have something like iommu=intremap=require). But
> > none of those actually exists.
>
> "iommu=force" actually does prevent boot from completing when
> interrupt remapping is available, but then gets turned off for
> some reason. See iommu_setup()'s
>
>     bool_t force_intremap = force_iommu && iommu_intremap;

Ok, then, just setting iommu_intremap=false should do the right thing,
if platform_quirks_init() is called somewhere between the above line,
and actual enforcement of iommu=force few lines later. I couldn't
quickly find if that is the case - is it?

Anyway, this still doesn't give the toolstack, or the admin sufficient
control, because there is no way to express "use PCI passthrough only if
IOMMU _and_ interrupt remapping is in use". Even with iommu=force,
because intremap could simply be missing on the platform. So, to be
sure, the admin still need to inspect the boot log to fish that
information out - could do that in the "intremap broken" case as well.

So, iommu=force should either always require intremap too (IMO less
preferable), or there should be separate intremap=force, that prevents
the boot if intremap cannot be used for any reason. Even better, if the
toolstack could figure it out, and apply the admin policy on per-domain
basis, but that's a broader change (that IMO should not be a part of a
bugfix).

> > And disabling the whole IOMMU in some
> > cases of unusable intremap, but not the others, is not exactly useful
> > thing to do (it breaks some cases, but still doesn't allow to reason
> > about intremap in toolstack).
> >
> > So, I propose to disable just iommu_intremap if it's broken as part of
> > this bug fix. But, independently (and _not_ as a pre-requisite) do
> > either:
> > - let the toolstack know if intremap is used, or
>
> I don't follow why you even emphasize the "not" on this being a prereq.
> I consider it a plain bug (with possibly a security angle) that PCI
> pass-through may be permitted by the tool stack in the absence of
> interrupt remapping, without an explicit admin request to enable this
> (even) less secure mode of operation. Not making this a prereq would
> mean to widen the scope of the bug.

As explained above - the scope here doesn't really matter. Admin
currently (with or without this commit) cannot rely on intremap being
used, even with iommu=force. For that, admin needs to inspect the boot
log. And when done, inspecting the boot log will catch both cases -
intremap missing and intremap broken.

But, disabling the whole IOMMU if intremap is broken, doesn't even allow
to make a conscious choice to choose to use it. This breaks the (very
much valid) configuration of running a _trusted_ HVM guest with PCI
passthrough, on some platforms.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
On 03.08.2021 15:01, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 03, 2021 at 02:29:01PM +0200, Jan Beulich wrote:
>> On 03.08.2021 14:21, Marek Marczykowski-Górecki wrote:
>>> If we would have an option (in
>>> toolstack, or Xen) to force interrupt remapping, then indeed when it's
>>> broken, PCI passthrough should be refused (or maybe even system should
>>> refuse to boot if we'd have something like iommu=intremap=require). But
>>> none of those actually exists.
>>
>> "iommu=force" actually does prevent boot from completing when
>> interrupt remapping is available, but then gets turned off for
>> some reason. See iommu_setup()'s
>>
>>     bool_t force_intremap = force_iommu && iommu_intremap;
>
> Ok, then, just setting iommu_intremap=false should do the right thing,

... if "iommu=force" is in use (but not otherwise), ...

> if platform_quirks_init() is called somewhere between the above line,
> and actual enforcement of iommu=force few lines later. I couldn't
> quickly find if that is the case - is it?

iommu_setup() -> iommu_hardware_setup() ->
iommu_init_ops->setup() (i.e. vtd_setup()) -> platform_quirks_init()

Jan
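
The ordering discussed above (sample force_intremap, run the quirks, then enforce) can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Xen's code: struct iommu_state and sketch_iommu_setup() are hypothetical, the quirk is modelled as clearing only interrupt remapping (the variant proposed in the thread, not what the patch does), and the shape of the final enforcement check is assumed, since the thread only quotes the force_intremap line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Xen's command-line and runtime state. */
struct iommu_state {
    bool force_iommu;     /* "iommu=force" was given */
    bool iommu_enabled;   /* IOMMU usable after hardware setup */
    bool iommu_intremap;  /* interrupt remapping in use */
};

/* Returns true if boot may continue, false where boot would be refused. */
static bool sketch_iommu_setup(struct iommu_state *s, bool quirk_hits)
{
    bool force_intremap;

    assert(s != NULL);

    /* Sampled before hardware setup, as in the line quoted from iommu_setup(). */
    force_intremap = s->force_iommu && s->iommu_intremap;

    /*
     * Stand-in for iommu_hardware_setup() -> vtd_setup() ->
     * platform_quirks_init(), with the quirk clearing only intremap.
     */
    if ( quirk_hits )
        s->iommu_intremap = false;

    /* Assumed shape of the later enforcement; not quoted in the thread. */
    if ( (s->force_iommu && !s->iommu_enabled) ||
         (force_intremap && !s->iommu_intremap) )
        return false;

    return true;
}
```

Under these assumptions, a quirk that turns off interrupt remapping stops the boot only when "iommu=force" is in use, and a platform that never had interrupt remapping boots either way, which is the gap Marek points out.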
On Tue, Aug 03, 2021 at 03:06:50PM +0200, Jan Beulich wrote:
> On 03.08.2021 15:01, Marek Marczykowski-Górecki wrote:
> > Ok, then, just setting iommu_intremap=false should do the right thing,
>
> ... if "iommu=force" is in use (but not otherwise), ...

But that's the purpose of iommu=force, no?
With "iommu=force": strictly require IOMMU
Without "iommu=force": use IOMMU on best-effort basis

It makes sense to refuse the boot if intremap is broken in the first
case. But also, it makes sense to allow using IOMMU without intremap in
the second case.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
On 03.08.2021 15:12, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 03, 2021 at 03:06:50PM +0200, Jan Beulich wrote:
>> On 03.08.2021 15:01, Marek Marczykowski-Górecki wrote:
>>> Ok, then, just setting iommu_intremap=false should do the right thing,
>>
>> ... if "iommu=force" is in use (but not otherwise), ...
>
> But that's the purpose of iommu=force, no?
> With "iommu=force": strictly require IOMMU
> Without "iommu=force": use IOMMU on best-effort basis
>
> It makes sense to refuse the boot if intremap is broken in the first
> case. But also, it makes sense to allow using IOMMU without intremap in
> the second case.

I agree with both statements. What I disagree with is that the latter
happens by default (instead of only upon admin override), including the
case of intremap being unavailable in the first place.

Jan
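
The policy argued for here, pass-through without interrupt remapping only upon explicit admin override, could look roughly like the following on the tool stack side. Everything in this sketch is hypothetical: per the thread, the tool stack only sees PHYSCAP_directio, which is driven by "iommu_enabled" alone, so the intremap field and the allow_insecure override do not correspond to any existing interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what the hypervisor could report to the tool stack. */
struct physcaps {
    bool directio;  /* what PHYSCAP_directio conveys: IOMMU enabled */
    bool intremap;  /* hypothetical extra flag: interrupt remapping in use */
};

/*
 * Permit PCI pass-through only with an enabled IOMMU and, unless the admin
 * explicitly opts in via the hypothetical allow_insecure setting, only with
 * interrupt remapping in use.
 */
static bool may_passthrough(const struct physcaps *caps, bool allow_insecure)
{
    assert(caps != NULL);

    if ( !caps->directio )
        return false;

    return caps->intremap || allow_insecure;
}
```

With such a check, missing and broken interrupt remapping would be treated alike, which both sides of the discussion agree is desirable.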
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, August 3, 2021 7:14 PM
>
> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
> spec update, X58's also mentions B2, and searching the internet suggests
> systems with this stepping are actually in use. Even worse, for X58
> erratum #69 is marked applicable even to C2. Split the check to cover
> all applicable steppings and to also report applicable errata numbers in
> the log message. The splitting requires using the DMI port instead of
> the System Management Registers device, but that's then in line (also
> revision checking wise) with the spec updates.
>
> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> As to disabling just interrupt remapping (as the initial version of the
> original patch did) vs disabling the IOMMU as a whole: Using a less
> heavy workaround would of course be desirable, but then we need to
> ensure not to misguide the tool stack about the state of the system. It
> uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
> through can be made use of, yet that flag is driven by "iommu_enabled"
> alone, without regard to the setting of "iommu_intremap".
>
> --- a/xen/drivers/passthrough/vtd/quirks.c
> +++ b/xen/drivers/passthrough/vtd/quirks.c
> @@ -268,26 +268,42 @@ static int __init parse_snb_timeout(cons
>  }
>  custom_param("snb_igd_quirk", parse_snb_timeout);
>
> -/* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
> - * Fixed in stepping C-2. */
> +/*
> + * 5500/5520/X58 chipset interrupt remapping errata, for steppings B2 and B3.
> + * Fixed in stepping C2 except on X58.
> + */
>  static void __init tylersburg_intremap_quirk(void)
>  {
> -    uint32_t bus, device;
> +    unsigned int bus;
>      uint8_t rev;
>
>      for ( bus = 0; bus < 0x100; bus++ )
>      {
> -        /* Match on System Management Registers on Device 20 Function 0 */
> -        device = pci_conf_read32(PCI_SBDF(0, bus, 20, 0), PCI_VENDOR_ID);
> -        rev = pci_conf_read8(PCI_SBDF(0, bus, 20, 0), PCI_REVISION_ID);
> +        /* Match on DMI port (Device 0 Function 0) */
> +        rev = pci_conf_read8(PCI_SBDF(0, bus, 0, 0), PCI_REVISION_ID);
>
> -        if ( rev == 0x13 && device == 0x342e8086 )
> +        switch ( pci_conf_read32(PCI_SBDF(0, bus, 0, 0), PCI_VENDOR_ID) )
>          {
> +        default:
> +            continue;
> +
> +        case 0x34038086: case 0x34068086:
> +            if ( rev >= 0x22 )
> +                continue;
>              printk(XENLOG_WARNING VTDPREFIX
> -                   "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
> -            iommu_enable = 0;
> +                   "Disabling IOMMU due to Intel 5500/5520 chipset errata #47 and #53\n");
> +            iommu_enable = false;
> +            break;
> +
> +        case 0x34058086:
> +            printk(XENLOG_WARNING VTDPREFIX
> +                   "Disabling IOMMU due to Intel X58 chipset %s\n",
> +                   rev < 0x22 ? "errata #62 and #69" : "erratum #69");
> +            iommu_enable = false;
>              break;
>          }
> +
> +        break;
>      }
>  }
>
On 03/08/2021 12:13, Jan Beulich wrote:
> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
> spec update, X58's also mentions B2, and searching the internet suggests
> systems with this stepping are actually in use. Even worse, for X58
> erratum #69 is marked applicable even to C2. Split the check to cover
> all applicable steppings and to also report applicable errata numbers in
> the log message. The splitting requires using the DMI port instead of
> the System Management Registers device, but that's then in line (also
> revision checking wise) with the spec updates.
>
> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> As to disabling just interrupt remapping (as the initial version of the
> original patch did) vs disabling the IOMMU as a whole: Using a less
> heavy workaround would of course be desirable, but then we need to
> ensure not to misguide the tool stack about the state of the system.

This reasoning is buggy.

This errata is very specifically to do with interrupt remapping only.
Disabling the whole IOMMU in response is inappropriate.

> It uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
> through can be made use of, yet that flag is driven by "iommu_enabled"
> alone, without regard to the setting of "iommu_intremap".

The fact that range of hardware, including Tylersburg, don't have
interrupt remapping, and no one plumbed this nicely to the toolstack is
suboptimal.

But it is wholly inappropriate to punish users with Tylersburg hardware
because you don't like the fact that the toolstack can't see when
interrupt remapping is off. The two issues are entirely orthogonal.

Tylersburg (taking this erratum into account) works just as well as and
securely as several previous generations of hardware, and should behave
the same.

~Andrew
On 18.08.2021 13:32, Andrew Cooper wrote:
> On 03/08/2021 12:13, Jan Beulich wrote:
>> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
>> spec update, X58's also mentions B2, and searching the internet suggests
>> systems with this stepping are actually in use. Even worse, for X58
>> erratum #69 is marked applicable even to C2. Split the check to cover
>> all applicable steppings and to also report applicable errata numbers in
>> the log message. The splitting requires using the DMI port instead of
>> the System Management Registers device, but that's then in line (also
>> revision checking wise) with the spec updates.
>>
>> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> As to disabling just interrupt remapping (as the initial version of the
>> original patch did) vs disabling the IOMMU as a whole: Using a less
>> heavy workaround would of course be desirable, but then we need to
>> ensure not to misguide the tool stack about the state of the system.
>
> This reasoning is buggy.
>
> This errata is very specifically to do with interrupt remapping only.
> Disabling the whole IOMMU in response is inappropriate.

That's your view, and I accept it as a reasonable one. I don't accept it
as being the only reasonable one though, and hence I object to you
tagging other views (here just like in various cases elsewhere) as
"buggy" (or sometimes worse).

>> It uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
>> through can be made use of, yet that flag is driven by "iommu_enabled"
>> alone, without regard to the setting of "iommu_intremap".
>
> The fact that range of hardware, including Tylersburg, don't have
> interrupt remapping, and no one plumbed this nicely to the toolstack is
> suboptimal.
>
> But it is wholly inappropriate to punish users with Tylersburg hardware
> because you don't like the fact that the toolstack can't see when
> interrupt remapping is off. The two issues are entirely orthogonal.
>
> Tylersburg (taking this erratum into account) works just as well as and
> securely as several previous generations of hardware, and should behave
> the same.

Should behave the same - yes. Previous generations without interrupt
remapping also shouldn't allow pass-through by default, i.e. require
admin consent to run guests in this less secure mode (except, perhaps,
for devices without interrupts, albeit I'm unaware of ways to tell).

Jan