[PATCH] VT-d: Tylersburg errata apply to further steppings

Jan Beulich posted 1 patch 3 years, 3 months ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/07ded368-5c12-c06e-fd94-d7ae52d18836@suse.com
[PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Jan Beulich 3 years, 3 months ago
While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
spec update, X58's also mentions B2, and searching the internet suggests
systems with this stepping are actually in use. Even worse, for X58
erratum #69 is marked applicable even to C2. Split the check to cover
all applicable steppings and to also report applicable errata numbers in
the log message. The splitting requires using the DMI port instead of
the System Management Registers device, but that's then in line (also
revision checking wise) with the spec updates.

Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
As to disabling just interrupt remapping (as the initial version of the
original patch did) vs disabling the IOMMU as a whole: Using a less
heavy workaround would of course be desirable, but then we need to
ensure not to misguide the tool stack about the state of the system. It
uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
through can be made use of, yet that flag is driven by "iommu_enabled"
alone, without regard to the setting of "iommu_intremap".

--- a/xen/drivers/passthrough/vtd/quirks.c
+++ b/xen/drivers/passthrough/vtd/quirks.c
@@ -268,26 +268,42 @@ static int __init parse_snb_timeout(cons
 }
 custom_param("snb_igd_quirk", parse_snb_timeout);
 
-/* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
- * Fixed in stepping C-2. */
+/*
+ * 5500/5520/X58 chipset interrupt remapping errata, for steppings B2 and B3.
+ * Fixed in stepping C2 except on X58.
+ */
 static void __init tylersburg_intremap_quirk(void)
 {
-    uint32_t bus, device;
+    unsigned int bus;
     uint8_t rev;
 
     for ( bus = 0; bus < 0x100; bus++ )
     {
-        /* Match on System Management Registers on Device 20 Function 0 */
-        device = pci_conf_read32(PCI_SBDF(0, bus, 20, 0), PCI_VENDOR_ID);
-        rev = pci_conf_read8(PCI_SBDF(0, bus, 20, 0), PCI_REVISION_ID);
+        /* Match on DMI port (Device 0 Function 0) */
+        rev = pci_conf_read8(PCI_SBDF(0, bus, 0, 0), PCI_REVISION_ID);
 
-        if ( rev == 0x13 && device == 0x342e8086 )
+        switch ( pci_conf_read32(PCI_SBDF(0, bus, 0, 0), PCI_VENDOR_ID) )
         {
+        default:
+            continue;
+
+        case 0x34038086: case 0x34068086:
+            if ( rev >= 0x22 )
+                continue;
             printk(XENLOG_WARNING VTDPREFIX
-                   "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
-            iommu_enable = 0;
+                   "Disabling IOMMU due to Intel 5500/5520 chipset errata #47 and #53\n");
+            iommu_enable = false;
+            break;
+
+        case 0x34058086:
+            printk(XENLOG_WARNING VTDPREFIX
+                   "Disabling IOMMU due to Intel X58 chipset %s\n",
+                   rev < 0x22 ? "errata #62 and #69" : "erratum #69");
+            iommu_enable = false;
             break;
         }
+
+        break;
     }
 }
 

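As an illustration of the selection logic in the patch above, here is a minimal standalone sketch that maps the DMI port's ID and revision to the set of applicable errata. It is not Xen code: the enum and the function name `tylersburg_errata()` are hypothetical, and the reading of revision 0x22 as stepping C2 is taken from the threshold the patch itself uses.

```c
#include <stdint.h>

/* Hypothetical bitmask of errata groups; names are illustrative only. */
enum tylersburg_errata_mask {
    ERRATA_NONE   = 0,
    ERRATA_47_53  = 1 << 0, /* 5500/5520: errata #47 and #53 */
    ERRATA_X58_62 = 1 << 1, /* X58: erratum #62 */
    ERRATA_X58_69 = 1 << 2, /* X58: erratum #69 */
};

/*
 * id:  32-bit value read at PCI_VENDOR_ID of the DMI port (device 0,
 *      function 0): device ID in the upper half, vendor ID in the lower.
 * rev: PCI_REVISION_ID of the same function. The patch treats revisions
 *      below 0x22 as the pre-C2 steppings.
 */
static unsigned int tylersburg_errata(uint32_t id, uint8_t rev)
{
    switch ( id )
    {
    case 0x34038086: case 0x34068086: /* 5500/5520 */
        return rev < 0x22 ? ERRATA_47_53 : ERRATA_NONE;

    case 0x34058086: /* X58: erratum #69 applies even to C2 */
        return rev < 0x22 ? (ERRATA_X58_62 | ERRATA_X58_69) : ERRATA_X58_69;

    default: /* not a DMI port this quirk knows about */
        return ERRATA_NONE;
    }
}
```

A non-zero result corresponds to the cases in which the patch logs a warning and clears iommu_enable.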

Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Marek Marczykowski-Górecki 3 years, 3 months ago
On Tue, Aug 03, 2021 at 01:13:40PM +0200, Jan Beulich wrote:
> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
> spec update, X58's also mentions B2, and searching the internet suggests
> systems with this stepping are actually in use. Even worse, for X58
> erratum #69 is marked applicable even to C2. Split the check to cover
> all applicable steppings and to also report applicable errata numbers in
> the log message. The splitting requires using the DMI port instead of
> the System Management Registers device, but that's then in line (also
> revision checking wise) with the spec updates.
> 
> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> As to disabling just interrupt remapping (as the initial version of the
> original patch did) vs disabling the IOMMU as a whole: Using a less
> heavy workaround would of course be desirable, but then we need to
> ensure not to misguide the tool stack about the state of the system. It
> uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
> through can be made use of, yet that flag is driven by "iommu_enabled"
> alone, without regard to the setting of "iommu_intremap".

How does it differ from the situation where interrupt remapping actually
isn't supported at all? Toolstack will use IOMMU then, in a way that is
supported on a given platform. Sure, missing interrupt remapping makes
it less robust[1]. But really, broken and missing interrupt remapping
should be treated the same way. If we had an option (in the
toolstack, or Xen) to force interrupt remapping, then indeed when it's
broken, PCI passthrough should be refused (or maybe the system should
even refuse to boot if we had something like iommu=intremap=require). But
none of those actually exists. And disabling the whole IOMMU in some
cases of unusable intremap, but not in others, is not exactly a useful
thing to do (it breaks some cases, but still doesn't allow reasoning
about intremap in the toolstack).

So, I propose to disable just iommu_intremap if it's broken as part of
this bug fix. But, independently (and _not_ as a pre-requisite) do
either:
 - let the toolstack know if intremap is used, or
 - add iommu=intremap=require to refuse boot if intremap is
   missing/broken

[1] https://invisiblethingslab.com/resources/2011/Software%20Attacks%20on%20Intel%20VT-d.pdf

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Jan Beulich 3 years, 3 months ago
On 03.08.2021 14:21, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 03, 2021 at 01:13:40PM +0200, Jan Beulich wrote:
>> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
>> spec update, X58's also mentions B2, and searching the internet suggests
>> systems with this stepping are actually in use. Even worse, for X58
>> erratum #69 is marked applicable even to C2. Split the check to cover
>> all applicable steppings and to also report applicable errata numbers in
>> the log message. The splitting requires using the DMI port instead of
>> the System Management Registers device, but that's then in line (also
>> revision checking wise) with the spec updates.
>>
>> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> As to disabling just interrupt remapping (as the initial version of the
>> original patch did) vs disabling the IOMMU as a whole: Using a less
>> heavy workaround would of course be desirable, but then we need to
>> ensure not to misguide the tool stack about the state of the system. It
>> uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
>> through can be made use of, yet that flag is driven by "iommu_enabled"
>> alone, without regard to the setting of "iommu_intremap".
> 
> How does it differ from the situation where interrupt remapping actually
> isn't supported at all? Toolstack will use IOMMU then, in a way that is
> supported on a given platform. Sure, missing interrupt remapping makes
> it less robust[1]. But really, broken and missing interrupt remapping
> should be treated the same way.

I agree; in fact I meant to mention this aspect but then forgot.

> If we would have an option (in
> toolstack, or Xen) to force interrupt remapping, then indeed when it's
> broken, PCI passthrough should be refused (or maybe even system should
> refuse to boot if we'd have something like iommu=intremap=require). But
> none of those actually exists.

"iommu=force" actually does prevent boot from completing when
interrupt remapping is available, but then gets turned off for
some reason. See iommu_setup()'s

    bool_t force_intremap = force_iommu && iommu_intremap;

> And disabling the whole IOMMU in some
> cases of unusable intremap, but not the others, is not exactly useful
> thing to do (it breaks some cases, but still doesn't allow to reason
> about intremap in toolstack).
> 
> So, I propose to disable just iommu_intremap if it's broken as part of
> this bug fix. But, independently (and _not_ as a pre-requisite) do
> either:
>  - let the toolstack know if intremap is used, or

I don't follow why you even emphasize the "not" on this being a prereq.
I consider it a plain bug (with possibly a security angle) that PCI
pass-through may be permitted by the tool stack in the absence of
interrupt remapping, without an explicit admin request to enable this
(even) less secure mode of operation. Not making this a prereq would
mean widening the scope of the bug.

Jan


Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Marek Marczykowski-Górecki 3 years, 3 months ago
On Tue, Aug 03, 2021 at 02:29:01PM +0200, Jan Beulich wrote:
> On 03.08.2021 14:21, Marek Marczykowski-Górecki wrote:
> > If we would have an option (in
> > toolstack, or Xen) to force interrupt remapping, then indeed when it's
> > broken, PCI passthrough should be refused (or maybe even system should
> > refuse to boot if we'd have something like iommu=intremap=require). But
> > none of those actually exists.
> 
> "iommu=force" actually does prevent boot from completing when
> interrupt remapping is available, but then gets turned off for
> some reason. See iommu_setup()'s
> 
>     bool_t force_intremap = force_iommu && iommu_intremap;

Ok, then, just setting iommu_intremap=false should do the right thing,
if platform_quirks_init() is called somewhere between the above line
and the actual enforcement of iommu=force a few lines later. I couldn't
quickly find out whether that is the case - is it?

Anyway, this still doesn't give the toolstack, or the admin, sufficient
control, because there is no way to express "use PCI passthrough only if
IOMMU _and_ interrupt remapping are in use". Even with iommu=force,
because intremap could simply be missing on the platform. So, to be
sure, the admin still needs to inspect the boot log to fish that
information out - and could do that in the "intremap broken" case as well.

So, iommu=force should either always require intremap too (IMO less
preferable), or there should be a separate intremap=force that prevents
the boot if intremap cannot be used for any reason. Even better would be
if the toolstack could figure it out and apply the admin policy on a
per-domain basis, but that's a broader change (that IMO should not be
part of a bugfix).

> > And disabling the whole IOMMU in some
> > cases of unusable intremap, but not the others, is not exactly useful
> > thing to do (it breaks some cases, but still doesn't allow to reason
> > about intremap in toolstack).
> > 
> > So, I propose to disable just iommu_intremap if it's broken as part of
> > this bug fix. But, independently (and _not_ as a pre-requisite) do
> > either:
> >  - let the toolstack know if intremap is used, or
> 
> I don't follow why you even emphasize the "not" on this being a prereq.
> I consider it a plain bug (with possibly a security angle) that PCI
> pass-through may be permitted by the tool stack in the absence of
> interrupt remapping, without an explicit admin request to enable this
> (even) less secure mode of operation. Not making this a prereq would
> mean to widen the scope of the bug.

As explained above - the scope here doesn't really matter. Admin
currently (with or without this commit) cannot rely on intremap being
used, even with iommu=force. For that, admin needs to inspect the boot
log. And when done, inspecting the boot log will catch both cases -
intremap missing and intremap broken. But disabling the whole IOMMU if
intremap is broken doesn't even allow making a conscious choice to
use it. This breaks the (very much valid) configuration of
running a _trusted_ HVM guest with PCI passthrough, on some platforms.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Jan Beulich 3 years, 3 months ago
On 03.08.2021 15:01, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 03, 2021 at 02:29:01PM +0200, Jan Beulich wrote:
>> On 03.08.2021 14:21, Marek Marczykowski-Górecki wrote:
>>> If we would have an option (in
>>> toolstack, or Xen) to force interrupt remapping, then indeed when it's
>>> broken, PCI passthrough should be refused (or maybe even system should
>>> refuse to boot if we'd have something like iommu=intremap=require). But
>>> none of those actually exists.
>>
>> "iommu=force" actually does prevent boot from completing when
>> interrupt remapping is available, but then gets turned off for
>> some reason. See iommu_setup()'s
>>
>>     bool_t force_intremap = force_iommu && iommu_intremap;
> 
> Ok, then, just setting iommu_intremap=false should do the right thing,

... if "iommu=force" is in use (but not otherwise), ...

> if platform_quirks_init() is called somewhere between the above line,
> and actual enforcement of iommu=force few lines later. I couldn't
> quickly find if that is the case - is it?

iommu_setup()
-> iommu_hardware_setup()
-> iommu_init_ops->setup() (i.e. vtd_setup())
-> platform_quirks_init()

Jan

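The ordering discussed above can be modelled in a few lines. This is a simplified sketch, not the actual iommu_setup(): the struct, `model_iommu_setup()` and `quirk_disable_intremap()` are hypothetical stand-ins, and the final enforcement condition is an assumption based on the description in this thread (boot fails when interrupt remapping was available with "iommu=force" and then got turned off).

```c
#include <stdbool.h>

/* Simplified stand-ins for Xen's globals; a model, not Xen code. */
struct iommu_state {
    bool force_iommu;    /* "iommu=force" was given */
    bool iommu_enabled;  /* result of hardware setup */
    bool iommu_intremap; /* interrupt remapping wanted/usable */
};

/* Hypothetical lighter-weight quirk: turn off interrupt remapping only. */
static void quirk_disable_intremap(struct iommu_state *s)
{
    s->iommu_intremap = false;
}

/*
 * Returns true if boot may continue, false where the model would refuse
 * to boot. quirk_hits says whether the erratum applies to this system.
 */
static bool model_iommu_setup(struct iommu_state *s, bool quirk_hits)
{
    /* Snapshot taken before hardware setup, as in the quoted line. */
    bool force_intremap = s->force_iommu && s->iommu_intremap;

    /* Stands in for iommu_hardware_setup() -> ... -> platform_quirks_init(). */
    s->iommu_enabled = true;
    if ( quirk_hits )
        quirk_disable_intremap(s);

    /* Assumed enforcement step: fail if a forced feature went away. */
    if ( (s->force_iommu && !s->iommu_enabled) ||
         (force_intremap && !s->iommu_intremap) )
        return false;

    return true;
}
```

Because the snapshot is taken before the quirk runs, clearing only iommu_intremap in the quirk would stop the boot under "iommu=force", while without "iommu=force" the system would come up with the IOMMU enabled and interrupt remapping off.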

Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Marek Marczykowski-Górecki 3 years, 3 months ago
On Tue, Aug 03, 2021 at 03:06:50PM +0200, Jan Beulich wrote:
> On 03.08.2021 15:01, Marek Marczykowski-Górecki wrote:
> > Ok, then, just setting iommu_intremap=false should do the right thing,
> 
> ... if "iommu=force" is in use (but not otherwise), ...

But that's the purpose of iommu=force, no?
With "iommu=force": strictly require IOMMU
Without "iommu=force": use IOMMU on best-effort basis

It makes sense to refuse the boot if intremap is broken in the first
case. But also, it makes sense to allow using IOMMU without intremap in
the second case.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Jan Beulich 3 years, 3 months ago
On 03.08.2021 15:12, Marek Marczykowski-Górecki wrote:
> On Tue, Aug 03, 2021 at 03:06:50PM +0200, Jan Beulich wrote:
>> On 03.08.2021 15:01, Marek Marczykowski-Górecki wrote:
>>> Ok, then, just setting iommu_intremap=false should do the right thing,
>>
>> ... if "iommu=force" is in use (but not otherwise), ...
> 
> But that's the purpose of iommu=force, no?
> With "iommu=force": strictly require IOMMU
> Without "iommu=force": use IOMMU on best-effort basis
> 
> It makes sense to refuse the boot if intremap is broken in the first
> case. But also, it makes sense to allow using IOMMU without intremap in
> the second case.

I agree with both statements. What I disagree with is that the latter
happens by default (instead of only upon admin override), including
the case of intremap being unavailable in the first place.

Jan

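The policy being argued for here (pass-through without interrupt remapping only upon explicit admin override) could look roughly like the sketch below. Everything in it is hypothetical: per this thread the toolstack today only sees a single "directio" capability driven by iommu_enabled, so the `intremap` capability bit and the `allow_without_intremap` setting are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Host capabilities as the toolstack might see them. */
struct host_caps {
    bool directio; /* today's flag: driven by iommu_enabled alone */
    bool intremap; /* hypothetical: interrupt remapping is in use */
};

/* Hypothetical per-domain setting expressing admin consent. */
struct passthrough_policy {
    bool allow_without_intremap;
};

/* Decide whether PCI pass-through may be set up for a domain. */
static bool passthrough_permitted(const struct host_caps *caps,
                                  const struct passthrough_policy *pol)
{
    if ( !caps->directio )
        return false; /* no IOMMU at all: never permitted */

    /* Without interrupt remapping, require an explicit override. */
    return caps->intremap || pol->allow_without_intremap;
}
```

With such a check, a system whose interrupt remapping is missing or broken would refuse pass-through by default but still permit it for a trusted guest once the admin opts in.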

RE: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Tian, Kevin 3 years, 3 months ago
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, August 3, 2021 7:14 PM
> 
> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
> spec update, X58's also mentions B2, and searching the internet suggests
> systems with this stepping are actually in use. Even worse, for X58
> erratum #69 is marked applicable even to C2. Split the check to cover
> all applicable steppings and to also report applicable errata numbers in
> the log message. The splitting requires using the DMI port instead of
> the System Management Registers device, but that's then in line (also
> revision checking wise) with the spec updates.
> 
> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> As to disabling just interrupt remapping (as the initial version of the
> original patch did) vs disabling the IOMMU as a whole: Using a less
> heavy workaround would of course be desirable, but then we need to
> ensure not to misguide the tool stack about the state of the system. It
> uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
> through can be made use of, yet that flag is driven by "iommu_enabled"
> alone, without regard to the setting of "iommu_intremap".
> 
> --- a/xen/drivers/passthrough/vtd/quirks.c
> +++ b/xen/drivers/passthrough/vtd/quirks.c
> @@ -268,26 +268,42 @@ static int __init parse_snb_timeout(cons
>  }
>  custom_param("snb_igd_quirk", parse_snb_timeout);
> 
> -/* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
> - * Fixed in stepping C-2. */
> +/*
> + * 5500/5520/X58 chipset interrupt remapping errata, for steppings B2 and B3.
> + * Fixed in stepping C2 except on X58.
> + */
>  static void __init tylersburg_intremap_quirk(void)
>  {
> -    uint32_t bus, device;
> +    unsigned int bus;
>      uint8_t rev;
> 
>      for ( bus = 0; bus < 0x100; bus++ )
>      {
> -        /* Match on System Management Registers on Device 20 Function 0 */
> -        device = pci_conf_read32(PCI_SBDF(0, bus, 20, 0), PCI_VENDOR_ID);
> -        rev = pci_conf_read8(PCI_SBDF(0, bus, 20, 0), PCI_REVISION_ID);
> +        /* Match on DMI port (Device 0 Function 0) */
> +        rev = pci_conf_read8(PCI_SBDF(0, bus, 0, 0), PCI_REVISION_ID);
> 
> -        if ( rev == 0x13 && device == 0x342e8086 )
> +        switch ( pci_conf_read32(PCI_SBDF(0, bus, 0, 0), PCI_VENDOR_ID) )
>          {
> +        default:
> +            continue;
> +
> +        case 0x34038086: case 0x34068086:
> +            if ( rev >= 0x22 )
> +                continue;
>              printk(XENLOG_WARNING VTDPREFIX
> -                   "Disabling IOMMU due to Intel 5500/5520/X58 Chipset errata #47, #53\n");
> -            iommu_enable = 0;
> +                   "Disabling IOMMU due to Intel 5500/5520 chipset errata #47 and #53\n");
> +            iommu_enable = false;
> +            break;
> +
> +        case 0x34058086:
> +            printk(XENLOG_WARNING VTDPREFIX
> +                   "Disabling IOMMU due to Intel X58 chipset %s\n",
> +                   rev < 0x22 ? "errata #62 and #69" : "erratum #69");
> +            iommu_enable = false;
>              break;
>          }
> +
> +        break;
>      }
>  }
> 

Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Andrew Cooper 3 years, 3 months ago
On 03/08/2021 12:13, Jan Beulich wrote:
> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
> spec update, X58's also mentions B2, and searching the internet suggests
> systems with this stepping are actually in use. Even worse, for X58
> erratum #69 is marked applicable even to C2. Split the check to cover
> all applicable steppings and to also report applicable errata numbers in
> the log message. The splitting requires using the DMI port instead of
> the System Management Registers device, but that's then in line (also
> revision checking wise) with the spec updates.
>
> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> As to disabling just interrupt remapping (as the initial version of the
> original patch did) vs disabling the IOMMU as a whole: Using a less
> heavy workaround would of course be desirable, but then we need to
> ensure not to misguide the tool stack about the state of the system.

This reasoning is buggy.

This erratum is very specifically to do with interrupt remapping only.
Disabling the whole IOMMU in response is inappropriate.

> It uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
> through can be made use of, yet that flag is driven by "iommu_enabled"
> alone, without regard to the setting of "iommu_intremap".

The fact that a range of hardware, including Tylersburg, doesn't have
interrupt remapping, and that no one plumbed this nicely to the toolstack,
is suboptimal.

But it is wholly inappropriate to punish users with Tylersburg hardware
because you don't like the fact that the toolstack can't see when
interrupt remapping is off.  The two issues are entirely orthogonal.

Tylersburg (taking this erratum into account) works just as well and as
securely as several previous generations of hardware, and should behave
the same.

~Andrew


Re: [PATCH] VT-d: Tylersburg errata apply to further steppings
Posted by Jan Beulich 3 years, 3 months ago
On 18.08.2021 13:32, Andrew Cooper wrote:
> On 03/08/2021 12:13, Jan Beulich wrote:
>> While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
>> spec update, X58's also mentions B2, and searching the internet suggests
>> systems with this stepping are actually in use. Even worse, for X58
>> erratum #69 is marked applicable even to C2. Split the check to cover
>> all applicable steppings and to also report applicable errata numbers in
>> the log message. The splitting requires using the DMI port instead of
>> the System Management Registers device, but that's then in line (also
>> revision checking wise) with the spec updates.
>>
>> Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> As to disabling just interrupt remapping (as the initial version of the
>> original patch did) vs disabling the IOMMU as a whole: Using a less
>> heavy workaround would of course be desirable, but then we need to
>> ensure not to misguide the tool stack about the state of the system.
> 
> This reasoning is buggy.
> 
> This erratum is very specifically to do with interrupt remapping only.
> Disabling the whole IOMMU in response is inappropriate.

That's your view, and I accept it as a reasonable one. I don't accept
it as being the only reasonable one though, and hence I object to you
tagging other views (here just like in various cases elsewhere) as
"buggy" (or sometimes worse).

>> It uses the PHYSCAP_directio sysctl output to determine whether PCI pass-
>> through can be made use of, yet that flag is driven by "iommu_enabled"
>> alone, without regard to the setting of "iommu_intremap".
> 
> The fact that a range of hardware, including Tylersburg, doesn't have
> interrupt remapping, and that no one plumbed this nicely to the toolstack,
> is suboptimal.
> 
> But it is wholly inappropriate to punish users with Tylersburg hardware
> because you don't like the fact that the toolstack can't see when
> interrupt remapping is off.  The two issues are entirely orthogonal.
> 
> Tylersburg (taking this erratum into account) works just as well and as
> securely as several previous generations of hardware, and should behave
> the same.

Should behave the same - yes. Previous generations without interrupt
remapping also shouldn't allow pass-through by default, i.e. require
admin consent to run guests in this less secure mode (except, perhaps,
for devices without interrupts, albeit I'm unaware of ways to tell).

Jan