Doing this in amd_iommu_prepare() is too late for it, in particular, to
be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
(luckily) pretty simple, (pretty importantly) without breaking
amd_iommu_prepare()'s logic to prevent multiple processing.
This involves moving table checksumming, as
amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will
now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In
the course of doing so stop open-coding acpi_tb_checksum(), seeing that
we have other uses of this originally ACPI-private function elsewhere in
the tree.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v7: Move table checksumming.
v5: New.
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -22,6 +22,8 @@
#include <asm/io_apic.h>
+#include <acpi/actables.h>
+
#include "iommu.h"
/* Some helper structures, particularly to deal with ranges. */
@@ -1167,20 +1169,7 @@ static int __init parse_ivrs_table(struc
static int __init detect_iommu_acpi(struct acpi_table_header *table)
{
const struct acpi_ivrs_header *ivrs_block;
- unsigned long i;
unsigned long length = sizeof(struct acpi_table_ivrs);
- u8 checksum, *raw_table;
-
- /* validate checksum: sum of entire table == 0 */
- checksum = 0;
- raw_table = (u8 *)table;
- for ( i = 0; i < table->length; i++ )
- checksum += raw_table[i];
- if ( checksum )
- {
- AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
- return -ENODEV;
- }
while ( table->length > (length + sizeof(*ivrs_block)) )
{
@@ -1317,6 +1306,15 @@ get_supported_ivhd_type(struct acpi_tabl
{
size_t length = sizeof(struct acpi_table_ivrs);
const struct acpi_ivrs_header *ivrs_block, *blk = NULL;
+ uint8_t checksum;
+
+ /* Validate checksum: Sum of entire table == 0. */
+ checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
+ if ( checksum )
+ {
+ AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
+ return -ENODEV;
+ }
while ( table->length > (length + sizeof(*ivrs_block)) )
{
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1398,15 +1398,9 @@ int __init amd_iommu_prepare(bool xt)
goto error_out;
/* Have we been here before? */
- if ( ivhd_type )
+ if ( ivrs_bdf_entries )
return 0;
- rc = amd_iommu_get_supported_ivhd_type();
- if ( rc < 0 )
- goto error_out;
- BUG_ON(!rc);
- ivhd_type = rc;
-
rc = amd_iommu_get_ivrs_dev_entries();
if ( !rc )
rc = -ENODEV;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
int __init acpi_ivrs_init(void)
{
+ int rc;
+
if ( !iommu_enable && !iommu_intremap )
return 0;
+ rc = amd_iommu_get_supported_ivhd_type();
+ if ( rc < 0 )
+ return rc;
+ BUG_ON(!rc);
+ ivhd_type = rc;
+
if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
{
iommu_intremap = iommu_intremap_off;
On 22/09/2021 15:36, Jan Beulich wrote: > Doing this in amd_iommu_prepare() is too late for it, in particular, to > be used in amd_iommu_detect_one_acpi(), as a subsequent change will want > to do. Moving it immediately ahead of amd_iommu_detect_acpi() is > (luckily) pretty simple, (pretty importantly) without breaking > amd_iommu_prepare()'s logic to prevent multiple processing. > > This involves moving table checksumming, as > amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will > now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In > the course of doing so stop open-coding acpi_tb_checksum(), seeing that > we have other uses of this originally ACPI-private function elsewhere in > the tree. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
On 22/09/2021 15:36, Jan Beulich wrote: > Doing this in amd_iommu_prepare() is too late for it, in particular, to > be used in amd_iommu_detect_one_acpi(), as a subsequent change will want > to do. Moving it immediately ahead of amd_iommu_detect_acpi() is > (luckily) pretty simple, (pretty importantly) without breaking > amd_iommu_prepare()'s logic to prevent multiple processing. > > This involves moving table checksumming, as > amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will > now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In > the course of doing so stop open-coding acpi_tb_checksum(), seeing that > we have other uses of this originally ACPI-private function elsewhere in > the tree. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> I'm afraid this breaks booting on Skylake Server. Yes, really - I didn't believe the bisection at first either. From a bit of debugging, I've found: (XEN) *** acpi_dmar_init() => -19 (XEN) *** amd_iommu_get_supported_ivhd_type() => -19 So VT-d is disabled in firmware. Oops, but something we should cope with. Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early exit with -ENOENT too. It turns out ... > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c > @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_ > > int __init acpi_ivrs_init(void) > { > + int rc; > + > if ( !iommu_enable && !iommu_intremap ) > return 0; > > + rc = amd_iommu_get_supported_ivhd_type(); > + if ( rc < 0 ) > + return rc; > + BUG_ON(!rc); > + ivhd_type = rc; > + > if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) ) > { > iommu_intremap = iommu_intremap_off; > ... we're relying on this path (now skipped) to set iommu_intremap away from iommu_intremap_full in the "no IOMMU anywhere to be found" case. This explains why I occasionally during failure get spew about: (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC) [ 17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID [ 17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 64. [ 17.125198] usb usb1-port2: couldn't allocate usb_device [ 248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds. and eventually (gone 400s) get dumped in a dracut shell. Booting with an explicit iommu=no-intremap, which clobbers iommu_intremap during cmdline parsing, recovers the system. This variable controls a whole lot of magic with interrupt handling. It should default to 0, not 2, and only become nonzero when an IOMMU is properly established. It also shouldn't be serving double duty as "what the user wants" ahead of determining the system capabilities. And not to open another can of worms, but our entire way of working explodes if there are devices on the system not covered by an IOMMU. ~Andrew
On 20.10.2021 01:34, Andrew Cooper wrote: > On 22/09/2021 15:36, Jan Beulich wrote: >> Doing this in amd_iommu_prepare() is too late for it, in particular, to >> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want >> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is >> (luckily) pretty simple, (pretty importantly) without breaking >> amd_iommu_prepare()'s logic to prevent multiple processing. >> >> This involves moving table checksumming, as >> amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will >> now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In >> the course of doing so stop open-coding acpi_tb_checksum(), seeing that >> we have other uses of this originally ACPI-private function elsewhere in >> the tree. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I'm afraid this breaks booting on Skylake Server. Yes, really - I > didn't believe the bisection at first either. > > From a bit of debugging, I've found: > > (XEN) *** acpi_dmar_init() => -19 > (XEN) *** amd_iommu_get_supported_ivhd_type() => -19 > > So VT-d is disabled in firmware. Oops, but something we should cope with. I wanted to say that I definitely did test this (for a long, long time) on Intel systems, but clearly not on one like this. I'm sure though that I did test on IOMMU-less Intel systems, so I'm still a bit puzzled. > Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early > exit with -ENOENT too. > > It turns out ... > >> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c >> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c >> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_ >> >> int __init acpi_ivrs_init(void) >> { >> + int rc; >> + >> if ( !iommu_enable && !iommu_intremap ) >> return 0; >> >> + rc = amd_iommu_get_supported_ivhd_type(); >> + if ( rc < 0 ) >> + return rc; >> + BUG_ON(!rc); >> + ivhd_type = rc; >> + >> if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) ) >> { >> iommu_intremap = iommu_intremap_off; >> > > ... we're relying on this path (now skipped) to set iommu_intremap away > from iommu_intremap_full in the "no IOMMU anywhere to be found" case. > > This explains why I occasionally during failure get spew about: > > (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC) > [ 17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID > [ 17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI > host supports is 64. > [ 17.125198] usb usb1-port2: couldn't allocate usb_device > [ 248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds. > > and eventually (gone 400s) get dumped in a dracut shell. > > Booting with an explicit iommu=no-intremap, which clobbers > iommu_intremap during cmdline parsing, recovers the system. > > This variable controls a whole lot of magic with interrupt handling. It > should default to 0, not 2, and only become nonzero when an IOMMU is > properly established. It also shouldn't be serving double duty as "what > the user wants" ahead of determining the system capabilities. This would probably be too large a change at this point in time; I'll see whether I can find something less intrusive. Unless of course there's a patch already on xen-devel, which I didn't get to read yet. > And not to open another can of worms, but our entire way of working > explodes if there are devices on the system not covered by an IOMMU. I wouldn't be surprised, but is this something we have to expect on non-broken systems? (I do know of broken systems giving the appearance of uncovered devices by lacking suitable include-all DRHD entries.) Jan
On 20.10.2021 01:34, Andrew Cooper wrote: > On 22/09/2021 15:36, Jan Beulich wrote: >> Doing this in amd_iommu_prepare() is too late for it, in particular, to >> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want >> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is >> (luckily) pretty simple, (pretty importantly) without breaking >> amd_iommu_prepare()'s logic to prevent multiple processing. >> >> This involves moving table checksumming, as >> amd_iommu_get_supported_ivhd_type() -> get_supported_ivhd_type() will >> now be invoked before amd_iommu_detect_acpi() -> detect_iommu_acpi(). In >> the course of doing so stop open-coding acpi_tb_checksum(), seeing that >> we have other uses of this originally ACPI-private function elsewhere in >> the tree. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I'm afraid this breaks booting on Skylake Server. Yes, really - I > didn't believe the bisection at first either. I'll be able to debug this, as by disabling VT-d on my Skylake I can repro. But ... > From a bit of debugging, I've found: > > (XEN) *** acpi_dmar_init() => -19 > (XEN) *** amd_iommu_get_supported_ivhd_type() => -19 > > So VT-d is disabled in firmware. Oops, but something we should cope with. > > Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early > exit with -ENOENT too. > > It turns out ... > >> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c >> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c >> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_ >> >> int __init acpi_ivrs_init(void) >> { >> + int rc; >> + >> if ( !iommu_enable && !iommu_intremap ) >> return 0; >> >> + rc = amd_iommu_get_supported_ivhd_type(); >> + if ( rc < 0 ) >> + return rc; >> + BUG_ON(!rc); >> + ivhd_type = rc; >> + >> if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) ) >> { >> iommu_intremap = iommu_intremap_off; >> > > ... we're relying on this path (now skipped) to set iommu_intremap away > from iommu_intremap_full in the "no IOMMU anywhere to be found" case. ... this picture here looks incomplete, since in iommu_hardware_setup() we have if ( !iommu_enabled ) iommu_intremap = iommu_intremap_off; which I don't see how it could be bypassed. Booting here fails because of the AHCI driver not being able to obtain control of the disk, but checking in a working setup I see it use MSI, which can't possibly be affected by an early-boot-only wrong setting of iommu_intremap. (I can easily believe that we have early IO-APIC setup logic going wrong when this remains mistakenly set.) What I'd like to avoid though is to add yet another custom writing of iommu_intremap_off in acpi_ivrs_init(). I'd prefer to find a better place for it, so I will want to do some debugging first. If all else fails, the setting should at least be moved into the caller of the function - after all switching around the order of the acpi_{dmar,ivrs}_init() calls in acpi_iommu_init() shouldn't have any negative effect. Jan > This explains why I occasionally during failure get spew about: > > (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC) > [ 17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID > [ 17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI > host supports is 64. > [ 17.125198] usb usb1-port2: couldn't allocate usb_device > [ 248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds. > > and eventually (gone 400s) get dumped in a dracut shell. > > Booting with an explicit iommu=no-intremap, which clobbers > iommu_intremap during cmdline parsing, recovers the system. > > This variable controls a whole lot of magic with interrupt handling. It > should default to 0, not 2, and only become nonzero when an IOMMU is > properly established. It also shouldn't be serving double duty as "what > the user wants" ahead of determining the system capabilities. > > And not to open another can of worms, but our entire way of working > explodes if there are devices on the system not covered by an IOMMU. > > ~Andrew >
© 2016 - 2024 Red Hat, Inc.