AMD/IOMMU: further work split from XSA-378

[PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

Posted by Jan Beulich 4 years, 4 months ago

Doing this in amd_iommu_prepare() is too late for it, in particular, to
be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
(luckily) pretty simple, (pretty importantly) without breaking
amd_iommu_prepare()'s logic to prevent multiple processing.

This involves moving table checksumming, as
amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will
now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In
the course of doing so stop open-coding acpi_tb_checksum(), seeing that
we have other uses of this originally ACPI-private function elsewhere in
the tree.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v7: Move table checksumming.
v5: New.
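
For readers unfamiliar with the rule being enforced ("sum of entire table
== 0"), here is a minimal standalone sketch. table_checksum() and
checksum_fixup() are hypothetical names used only for illustration; they are
not the Xen or ACPICA functions, and the real acpi_tb_checksum() may differ in
detail.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum all bytes of a table modulo 256. Illustrative stand-in only; the
 * name table_checksum() is an assumption, not an existing Xen function.
 */
static uint8_t table_checksum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    for ( size_t i = 0; i < len; i++ )
        sum += buf[i];          /* wraps naturally at 8 bits */

    return sum;
}

/* Compute the byte value which, once added, makes the table sum to zero. */
static uint8_t checksum_fixup(const uint8_t *buf, size_t len)
{
    return (uint8_t)(0u - table_checksum(buf, len));
}
```

A table is considered valid when table_checksum() over its full length
returns 0; any other value is what the debug message in the hunks below
would print.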

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -22,6 +22,8 @@
 
 #include <asm/io_apic.h>
 
+#include <acpi/actables.h>
+
 #include "iommu.h"
 
 /* Some helper structures, particularly to deal with ranges. */
@@ -1167,20 +1169,7 @@ static int __init parse_ivrs_table(struc
 static int __init detect_iommu_acpi(struct acpi_table_header *table)
 {
     const struct acpi_ivrs_header *ivrs_block;
-    unsigned long i;
     unsigned long length = sizeof(struct acpi_table_ivrs);
-    u8 checksum, *raw_table;
-
-    /* validate checksum: sum of entire table == 0 */
-    checksum = 0;
-    raw_table = (u8 *)table;
-    for ( i = 0; i < table->length; i++ )
-        checksum += raw_table[i];
-    if ( checksum )
-    {
-        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
-        return -ENODEV;
-    }
 
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
@@ -1317,6 +1306,15 @@ get_supported_ivhd_type(struct acpi_tabl
 {
     size_t length = sizeof(struct acpi_table_ivrs);
     const struct acpi_ivrs_header *ivrs_block, *blk = NULL;
+    uint8_t checksum;
+
+    /* Validate checksum: Sum of entire table == 0. */
+    checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
+    if ( checksum )
+    {
+        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
+        return -ENODEV;
+    }
 
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1398,15 +1398,9 @@ int __init amd_iommu_prepare(bool xt)
         goto error_out;
 
     /* Have we been here before? */
-    if ( ivhd_type )
+    if ( ivrs_bdf_entries )
         return 0;
 
-    rc = amd_iommu_get_supported_ivhd_type();
-    if ( rc < 0 )
-        goto error_out;
-    BUG_ON(!rc);
-    ivhd_type = rc;
-
     rc = amd_iommu_get_ivrs_dev_entries();
     if ( !rc )
         rc = -ENODEV;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
 
 int __init acpi_ivrs_init(void)
 {
+    int rc;
+
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
+    rc = amd_iommu_get_supported_ivhd_type();
+    if ( rc < 0 )
+        return rc;
+    BUG_ON(!rc);
+    ivhd_type = rc;
+
     if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
     {
         iommu_intremap = iommu_intremap_off;

Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

Posted by Durrant, Paul 4 years, 4 months ago

On 22/09/2021 15:36, Jan Beulich wrote:
> Doing this in amd_iommu_prepare() is too late for it, in particular, to
> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
> (luckily) pretty simple, (pretty importantly) without breaking
> amd_iommu_prepare()'s logic to prevent multiple processing.
> 
> This involves moving table checksumming, as
> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
> we have other uses of this originally ACPI-private function elsewhere in
> the tree.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul@xen.org>

Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

Posted by Andrew Cooper 4 years, 3 months ago

On 22/09/2021 15:36, Jan Beulich wrote:
> Doing this in amd_iommu_prepare() is too late for it, in particular, to
> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
> (luckily) pretty simple, (pretty importantly) without breaking
> amd_iommu_prepare()'s logic to prevent multiple processing.
>
> This involves moving table checksumming, as
> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
> we have other uses of this originally ACPI-private function elsewhere in
> the tree.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I'm afraid this breaks booting on Skylake Server.  Yes, really - I
didn't believe the bisection at first either.

From a bit of debugging, I've found:

(XEN) *** acpi_dmar_init() => -19
(XEN) *** amd_iommu_get_supported_ivhd_type() => -19

So VT-d is disabled in firmware.  Oops, but something we should cope with.

Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
exit with -ENODEV too.

It turns out ...

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>  
>  int __init acpi_ivrs_init(void)
>  {
> +    int rc;
> +
>      if ( !iommu_enable && !iommu_intremap )
>          return 0;
>  
> +    rc = amd_iommu_get_supported_ivhd_type();
> +    if ( rc < 0 )
> +        return rc;
> +    BUG_ON(!rc);
> +    ivhd_type = rc;
> +
>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>      {
>          iommu_intremap = iommu_intremap_off;
>

... we're relying on this path (now skipped) to set iommu_intremap away
from iommu_intremap_full in the "no IOMMU anywhere to be found" case.

This explains why I occasionally during failure get spew about:

(XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
[   17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
[   17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 64.
[   17.125198] usb usb1-port2: couldn't allocate usb_device
[  248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.

and eventually (gone 400s) get dumped in a dracut shell.

Booting with an explicit iommu=no-intremap, which clobbers
iommu_intremap during cmdline parsing, recovers the system.

This variable controls a whole lot of magic with interrupt handling.  It
should default to 0, not 2, and only become nonzero when an IOMMU is
properly established.  It also shouldn't be serving double duty as "what
the user wants" ahead of determining the system capabilities.

And not to open another can of worms, but our entire way of working
explodes if there are devices on the system not covered by an IOMMU.

~Andrew
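
The control-flow problem described above can be modelled in isolation. This
is a simplified sketch, not Xen code: struct boot_state, the model_*()
functions and the example IVHD type value are all assumptions made for
illustration, and the enum ordering (off = 0, full = 2) is assumed from the
"default to 0, not 2" remark.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed ordering: "off" is 0 and "full" is 2. */
enum intremap_mode { intremap_off, intremap_restricted, intremap_full };

/* Hypothetical stand-in for the boot-time state discussed in the thread. */
struct boot_state {
    enum intremap_mode intremap;  /* starts out as "what the user wants" */
    bool ivrs_present;            /* does firmware expose an IVRS table? */
};

/* Stand-in for the IVHD type lookup: negative on error, else a type value. */
static int model_get_ivhd_type(const struct boot_state *s)
{
    return s->ivrs_present ? 0x11 /* example value */ : -ENODEV;
}

/* Pre-patch shape: a failed detection reaches the fallback assignment. */
static int model_ivrs_init_old(struct boot_state *s)
{
    if ( !s->ivrs_present )
    {
        s->intremap = intremap_off;
        return -ENODEV;
    }

    return 0;
}

/* Post-patch shape: the new early return skips the fallback assignment. */
static int model_ivrs_init_new(struct boot_state *s)
{
    int rc = model_get_ivhd_type(s);

    if ( rc < 0 )
        return rc;                /* s->intremap is left untouched */

    return 0;
}
```

With ivrs_present false, both variants return -ENODEV, but only the old one
leaves intremap at intremap_off; the new one leaves it at whatever it was
on entry.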

Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

Posted by Jan Beulich 4 years, 3 months ago

On 20.10.2021 01:34, Andrew Cooper wrote:
> On 22/09/2021 15:36, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
>> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
>> we have other uses of this originally ACPI-private function elsewhere in
>> the tree.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm afraid this breaks booting on Skylake Server.  Yes, really - I
> didn't believe the bisection at first either.
> 
> From a bit of debugging, I've found:
> 
> (XEN) *** acpi_dmar_init() => -19
> (XEN) *** amd_iommu_get_supported_ivhd_type() => -19
> 
> So VT-d is disabled in firmware.  Oops, but something we should cope with.

I wanted to say that I definitely did test this (for a long, long
time) on Intel systems, but clearly not on one like this. I'm sure
though that I did test on IOMMU-less Intel systems, so I'm still a
bit puzzled.

> Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
> exit with -ENODEV too.
> 
> It turns out ...
> 
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>>  
>>  int __init acpi_ivrs_init(void)
>>  {
>> +    int rc;
>> +
>>      if ( !iommu_enable && !iommu_intremap )
>>          return 0;
>>  
>> +    rc = amd_iommu_get_supported_ivhd_type();
>> +    if ( rc < 0 )
>> +        return rc;
>> +    BUG_ON(!rc);
>> +    ivhd_type = rc;
>> +
>>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>>      {
>>          iommu_intremap = iommu_intremap_off;
>>
> 
> ... we're relying on this path (now skipped) to set iommu_intremap away
> from iommu_intremap_full in the "no IOMMU anywhere to be found" case.
> 
> This explains why I occasionally during failure get spew about:
> 
> (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
> [   17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
> [   17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 64.
> [   17.125198] usb usb1-port2: couldn't allocate usb_device
> [  248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.
> 
> and eventually (gone 400s) get dumped in a dracut shell.
> 
> Booting with an explicit iommu=no-intremap, which clobbers
> iommu_intremap during cmdline parsing, recovers the system.
> 
> This variable controls a whole lot of magic with interrupt handling.  It
> should default to 0, not 2, and only become nonzero when an IOMMU is
> properly established.  It also shouldn't be serving double duty as "what
> the user wants" ahead of determining the system capabilities.

This would probably be too large a change at this point in time;
I'll see whether I can find something less intrusive. Unless of
course there's a patch already on xen-devel, which I didn't get
to read yet.

> And not to open another can of worms, but our entire way of working
> explodes if there are devices on the system not covered by an IOMMU.

I wouldn't be surprised, but is this something we have to expect
on non-broken systems? (I do know of broken systems giving the
appearance of uncovered devices by lacking suitable include-all
DRHD entries.)

Jan

Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

Posted by Jan Beulich 4 years, 3 months ago

On 20.10.2021 01:34, Andrew Cooper wrote:
> On 22/09/2021 15:36, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
>> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
>> we have other uses of this originally ACPI-private function elsewhere in
>> the tree.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm afraid this breaks booting on Skylake Server.  Yes, really - I
> didn't believe the bisection at first either.

I'll be able to debug this, as by disabling VT-d on my Skylake I can
repro. But ...

> From a bit of debugging, I've found:
> 
> (XEN) *** acpi_dmar_init() => -19
> (XEN) *** amd_iommu_get_supported_ivhd_type() => -19
> 
> So VT-d is disabled in firmware.  Oops, but something we should cope with.
> 
> Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
> exit with -ENODEV too.
> 
> It turns out ...
> 
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>>  
>>  int __init acpi_ivrs_init(void)
>>  {
>> +    int rc;
>> +
>>      if ( !iommu_enable && !iommu_intremap )
>>          return 0;
>>  
>> +    rc = amd_iommu_get_supported_ivhd_type();
>> +    if ( rc < 0 )
>> +        return rc;
>> +    BUG_ON(!rc);
>> +    ivhd_type = rc;
>> +
>>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>>      {
>>          iommu_intremap = iommu_intremap_off;
>>
> 
> ... we're relying on this path (now skipped) to set iommu_intremap away
> from iommu_intremap_full in the "no IOMMU anywhere to be found" case.

... this picture here looks incomplete, since in iommu_hardware_setup()
we have

    if ( !iommu_enabled )
        iommu_intremap = iommu_intremap_off;

which I don't see how it could be bypassed. Booting here fails because
of the AHCI driver not being able to obtain control of the disk, but
checking in a working setup I see it use MSI, which can't possibly be
affected by an early-boot-only wrong setting of iommu_intremap. (I can
easily believe that we have early IO-APIC setup logic going wrong when
this remains mistakenly set.)

What I'd like to avoid though is to add yet another custom writing of
iommu_intremap_off in acpi_ivrs_init(). I'd prefer to find a better
place for it, so I will want to do some debugging first. If all else
fails, the setting should at least be moved into the caller of the
function - after all switching around the order of the
acpi_{dmar,ivrs}_init() calls in acpi_iommu_init() shouldn't have any
negative effect.

Jan
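
One possible reading of "move the setting into the caller" is sketched here
as a standalone model. Everything in it is hypothetical: the model_*()
functions, the dmar_result / ivrs_result knobs and the exact fallback
condition are assumptions for illustration, not the change that was
eventually made.

```c
#include <errno.h>

/* Assumed ordering: "off" is 0 and "full" is 2. */
enum intremap_mode { intremap_off, intremap_restricted, intremap_full };

static enum intremap_mode intremap = intremap_full;

/* Hypothetical probe outcomes; in reality these depend on firmware tables. */
static int dmar_result = -ENODEV;
static int ivrs_result = -ENODEV;

static int model_dmar_init(void) { return dmar_result; }
static int model_ivrs_init(void) { return ivrs_result; }

/*
 * Stand-in for the common caller: the fallback lives in one place, so it
 * does not matter in which order the two probes are attempted or which
 * early-exit path a probe takes.
 */
static int model_acpi_iommu_init(void)
{
    int rc = model_dmar_init();

    if ( rc == -ENODEV )
        rc = model_ivrs_init();

    if ( rc )
        intremap = intremap_off;

    return rc;
}
```

The point of this shape is that neither probe needs its own custom write of
the "off" value.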

[PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
[PATCH v8 2/6] AMD/IOMMU: improve (extended) feature detection
[PATCH v8 3/6] AMD/IOMMU: check IVMD ranges against host implementation limits
[PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag
[PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier
[PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally