[PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing

Jan Beulich posted 1 patch 2 years, 6 months ago
Failed in applying to current master (apply log)
[PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
Posted by Jan Beulich 2 years, 6 months ago
x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
mode (physical vs clustered) depends on iommu_intremap, that variable
needs to be set to off as soon as we know we can't / won't enable the
IOMMU, i.e. in particular when
- parsing of the respective ACPI tables failed,
- "iommu=off" is in effect, but not "iommu=no-intremap".
Move the turning off of iommu_intremap from AMD specific code into
acpi_iommu_init(), accompanying it by clearing of iommu_enable.

Take the opportunity and also skip ACPI table parsing altogether when
"iommu=off" is in effect anyway.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I've deliberately not added a Fixes: tag here, as I'm of the opinion
that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
uncovered a pre-existing anomaly. This particularly applies to the
"iommu=off" aspect. (This now allows me to remove an item from my TODO
list: I was meaning to figure out why one of my systems wouldn't come
up properly with "iommu=off", and I had never thought of combining this
with "no-intremap".)

Arguably "iommu=off" should turn off subordinate features in common
IOMMU code, but doing so in parse_iommu_param() would be wrong (as
there might be multiple "iommu=" to parse). This could be placed in
iommu_supports_x2apic(), but see the next item.

While the change here deals with apic_x2apic_probe() as called from
x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
similarly affected. That call occurs before acpi_boot_init(), which is
what calls acpi_iommu_init(). The ordering in setup.c is in part
relatively fragile, which is why for the moment I'm still hesitant to
move the generic_apic_probe() call down. Plus I don't have easy access
to a suitable system to test this case. Thoughts?

--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -193,10 +193,7 @@ int __init acpi_ivrs_init(void)
     ivhd_type = rc;
 
     if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
-    {
-        iommu_intremap = iommu_intremap_off;
         return -ENODEV;
-    }
 
     iommu_init_ops = &_iommu_init_ops;
 
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
 bool __read_mostly iommu_intpost;
 #endif
 
+void __init acpi_iommu_init(void)
+{
+    if ( iommu_enable )
+    {
+        int ret = acpi_dmar_init();
+
+        if ( ret == -ENODEV )
+            ret = acpi_ivrs_init();
+
+        if ( ret )
+            iommu_enable = false;
+    }
+
+    if ( !iommu_enable )
+        iommu_intremap = iommu_intremap_off;
+}
+
 int __init iommu_hardware_setup(void)
 {
     struct IO_APIC_route_entry **ioapic_entries = NULL;
--- a/xen/include/asm-x86/acpi.h
+++ b/xen/include/asm-x86/acpi.h
@@ -141,16 +141,10 @@ extern u32 x86_acpiid_to_apicid[];
 extern u32 pmtmr_ioport;
 extern unsigned int pmtmr_width;
 
+void acpi_iommu_init(void);
 int acpi_dmar_init(void);
 int acpi_ivrs_init(void);
 
-static inline int acpi_iommu_init(void)
-{
-    int ret = acpi_dmar_init();
-
-    return ret == -ENODEV ? acpi_ivrs_init() : ret;
-}
-
 void acpi_mmcfg_init(void);
 
 /* Incremented whenever we transition through S3. Value is 1 during boot. */


Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
Posted by Andrew Cooper 2 years, 6 months ago
On 20/10/2021 11:36, Jan Beulich wrote:
> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
> mode (physical vs clustered) depends on iommu_intremap, that variable
> needs to be set to off as soon as we know we can't / won't enable the
> IOMMU, i.e. in particular when
> - parsing of the respective ACPI tables failed,
> - "iommu=off" is in effect, but not "iommu=no-intremap".
> Move the turning off of iommu_intremap from AMD specific code into
> acpi_iommu_init(), accompanying it by clearing of iommu_enable.
>
> Take the opportunity and also skip ACPI table parsing altogether when
> "iommu=off" is in effect anyway.
>
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I've deliberately not added a Fixes: tag here, as I'm of the opinion
> that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
> uncovered a pre-existing anomaly.

I agree it uncovered a pre-existing issue, but that doesn't mean a Fixes
tag should be omitted.  That change very concretely regressed booting on
some systems in their pre-existing configuration.

The commit message needs to spell out a link to d8bd82327b0f, but it's
fine to say "that commit broke it by violating an unexpected ordering
dependency, but isn't really the source of the bug".

> This particularly applies to the "iommu=off" aspect.

There should be at least two Fixes tags, but I suspect trying to trace
the history of this mess is not a productive use of time.

>  (This now allows me to remove an item from my TODO
> list: I was meaning to figure out why one of my systems wouldn't come
> up properly with "iommu=off", and I had never thought of combining this
> with "no-intremap".)
>
> Arguably "iommu=off" should turn off subordinate features in common
> IOMMU code, but doing so in parse_iommu_param() would be wrong (as
> there might be multiple "iommu=" to parse). This could be placed in
> iommu_supports_x2apic(), but see the next item.

I don't think we make any claim or implication that passing the same
option twice works.  The problem here is the nested structure of
options, and the variable doing double duty.

>
> While the change here deals with apic_x2apic_probe() as called from
> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
> similarly affected. That call occurs before acpi_boot_init(), which is
> what calls acpi_iommu_init(). The ordering in setup.c is in part
> relatively fragile, which is why for the moment I'm still hesitant to
> move the generic_apic_probe() call down. Plus I don't have easy access
> to a suitable system to test this case. Thoughts?

I've written these thoughts before, but IOMMU handling it a catastrophic
mess.  It needs burning to the ground and redoing from scratch.

We currently have two ways of turning on the IOMMU, depending on what
firmware does, and plenty ways of crashing Xen with cmdline options
which should work, and the legacy xAPIC startup routine is after
interrupts have been enabled, leading to an attempt to rewrite
interrupts in place to remap them.  This in particular has lead to XSAs
due to trusting registers which can't be trusted, and the rewrite is
impossible to do safely.

The correct order is:
1) Parse DMAR/IVRS (but do not configure anything), MADT, current APIC
setting and cmdline arguments
2) Figure out whether we want to be in xAPIC or x2APIC mode, and whether
we need intremap.  Change the LAPIC setting
3) Configure the IOMMUs.  In particular, their interrupt needs to be
after step 2
4) Start up Xen's general IRQ infrastructure.

It's a fair chunk of work, but it will vastly simplify the boot logic
and let us delete the infinite flushing loops out of the IOMMU logic,
and we don't need any logic which has to second guess itself based on
what happened earlier on boot.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
>  bool __read_mostly iommu_intpost;
>  #endif
>  
> +void __init acpi_iommu_init(void)
> +{
> +    if ( iommu_enable )
> +    {
> +        int ret = acpi_dmar_init();
> +
> +        if ( ret == -ENODEV )
> +            ret = acpi_ivrs_init();
> +
> +        if ( ret )
> +            iommu_enable = false;
> +    }
> +
> +    if ( !iommu_enable )
> +        iommu_intremap = iommu_intremap_off;
> +}

This does fix my issue, so preferably with the Fixes tag reinstated,

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

However, I don't think skipping parsing is a sensible move.  Intremap is
utterly mandatory if during boot, we discover that our APIC ID is >254,
and iommu=no-intremap must be ignored in this case, or if the MADT says
we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
or equivalent.

This applies to a future world with a sane boot order, but Xen needs to
know hardware_support_{dma,int}remapping (-> must parse the ACPI tables)
ahead of choosing whether to turn the facilities on or not.  Fixing this
removes the double duty from variables.

~Andrew


Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
Posted by Jan Beulich 2 years, 6 months ago
On 20.10.2021 22:01, Andrew Cooper wrote:
> On 20/10/2021 11:36, Jan Beulich wrote:
>> x2apic_bsp_setup() gets called ahead of iommu_setup(), and since x2APIC
>> mode (physical vs clustered) depends on iommu_intremap, that variable
>> needs to be set to off as soon as we know we can't / won't enable the
>> IOMMU, i.e. in particular when
>> - parsing of the respective ACPI tables failed,
>> - "iommu=off" is in effect, but not "iommu=no-intremap".
>> Move the turning off of iommu_intremap from AMD specific code into
>> acpi_iommu_init(), accompanying it by clearing of iommu_enable.
>>
>> Take the opportunity and also skip ACPI table parsing altogether when
>> "iommu=off" is in effect anyway.
>>
>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> I've deliberately not added a Fixes: tag here, as I'm of the opinion
>> that d8bd82327b0f ("AMD/IOMMU: obtain IVHD type to use earlier") only
>> uncovered a pre-existing anomaly.
> 
> I agree it uncovered a pre-existing issue, but that doesn't mean a Fixes
> tag should be omitted.  That change very concretely regressed booting on
> some systems in their pre-existing configuration.
> 
> The commit message needs to spell out a link to d8bd82327b0f, but it's
> fine to say "that commit broke it by violating an unexpected ordering
> dependency, but isn't really the source of the bug".
> 
>> This particularly applies to the "iommu=off" aspect.
> 
> There should be at least two Fixes tags, but I suspect trying to trace
> the history of this mess is not a productive use of time.
> 
>>  (This now allows me to remove an item from my TODO
>> list: I was meaning to figure out why one of my systems wouldn't come
>> up properly with "iommu=off", and I had never thought of combining this
>> with "no-intremap".)
>>
>> Arguably "iommu=off" should turn off subordinate features in common
>> IOMMU code, but doing so in parse_iommu_param() would be wrong (as
>> there might be multiple "iommu=" to parse). This could be placed in
>> iommu_supports_x2apic(), but see the next item.
> 
> I don't think we make any claim or implication that passing the same
> option twice works.  The problem here is the nested structure of
> options, and the variable doing double duty.
> 
>>
>> While the change here deals with apic_x2apic_probe() as called from
>> x2apic_bsp_setup(), the check_x2apic_preenabled() path looks to be
>> similarly affected. That call occurs before acpi_boot_init(), which is
>> what calls acpi_iommu_init(). The ordering in setup.c is in part
>> relatively fragile, which is why for the moment I'm still hesitant to
>> move the generic_apic_probe() call down. Plus I don't have easy access
>> to a suitable system to test this case. Thoughts?
> 
> I've written these thoughts before, but IOMMU handling it a catastrophic
> mess.  It needs burning to the ground and redoing from scratch.
> 
> We currently have two ways of turning on the IOMMU, depending on what
> firmware does, and plenty ways of crashing Xen with cmdline options
> which should work, and the legacy xAPIC startup routine is after
> interrupts have been enabled, leading to an attempt to rewrite
> interrupts in place to remap them.  This in particular has lead to XSAs
> due to trusting registers which can't be trusted, and the rewrite is
> impossible to do safely.
> 
> The correct order is:
> 1) Parse DMAR/IVRS (but do not configure anything), MADT, current APIC
> setting and cmdline arguments
> 2) Figure out whether we want to be in xAPIC or x2APIC mode, and whether
> we need intremap.  Change the LAPIC setting
> 3) Configure the IOMMUs.  In particular, their interrupt needs to be
> after step 2

Leaving aside check_x2apic_preenabled(), all of this is already the
case afaict, almost at least: We do acpi_boot_init(), later
x2apic_bsp_setup(), and yet later iommu_setup(). The only issue might
be inside x2apic_bsp_setup(), where we call iommu_enable_x2apic()
before switching to x2APIC mode. Yet we avoid setting up IOMMU
interrupts during this early stage.

Hence I think, as expressed, that the question really is whether we
can safely defer check_x2apic_preenabled() by a little bit.

> 4) Start up Xen's general IRQ infrastructure.
> 
> It's a fair chunk of work, but it will vastly simplify the boot logic
> and let us delete the infinite flushing loops out of the IOMMU logic,
> and we don't need any logic which has to second guess itself based on
> what happened earlier on boot.
> 
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -41,6 +41,23 @@ enum iommu_intremap __read_mostly iommu_
>>  bool __read_mostly iommu_intpost;
>>  #endif
>>  
>> +void __init acpi_iommu_init(void)
>> +{
>> +    if ( iommu_enable )
>> +    {
>> +        int ret = acpi_dmar_init();
>> +
>> +        if ( ret == -ENODEV )
>> +            ret = acpi_ivrs_init();
>> +
>> +        if ( ret )
>> +            iommu_enable = false;
>> +    }
>> +
>> +    if ( !iommu_enable )
>> +        iommu_intremap = iommu_intremap_off;
>> +}
> 
> This does fix my issue, so preferably with the Fixes tag reinstated,
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks, but I think there will need to be a v2, as per below (plus
possibly the dealing with check_x2apic_preenabled()).

> However, I don't think skipping parsing is a sensible move.  Intremap is
> utterly mandatory if during boot, we discover that our APIC ID is >254,
> and iommu=no-intremap must be ignored in this case, or if the MADT says
> we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
> or equivalent.

Reading this made me realize that the change breaks other behavior.
The conditional really needs to be iommu_enable || iommu_intremap -
at least AMD code added in support for x2APIC already treats the
latter to not be a sub-option of the former (iov_supports_xt(),
acpi_ivrs_init()), and e.g. intel_iommu_supports_eim() also checks
the latter alone.

Overriding "iommu=no-intremap" in case it's unavoidable could then
be a later change, not further affecting the function here.

Jan


Re: [PATCH] x86/IOMMU: mark IOMMU / intremap not in use when ACPI tables are missing
Posted by Jan Beulich 2 years, 6 months ago
On 21.10.2021 09:21, Jan Beulich wrote:
> On 20.10.2021 22:01, Andrew Cooper wrote:
>> However, I don't think skipping parsing is a sensible move.  Intremap is
>> utterly mandatory if during boot, we discover that our APIC ID is >254,
>> and iommu=no-intremap must be ignored in this case, or if the MADT says
>> we have CPUs beyond that limit and the user hasn't specified nr_cpus=1
>> or equivalent.
> 
> Reading this made me realize that the change breaks other behavior.
> The conditional really needs to be iommu_enable || iommu_intremap -
> at least AMD code added in support for x2APIC already treats the
> latter to not be a sub-option of the former (iov_supports_xt(),
> acpi_ivrs_init()), and e.g. intel_iommu_supports_eim() also checks
> the latter alone.

Actually the check in iov_supports_xt() is wrong - it needs to use
&&, not ||. I'll make a 2nd patch ...

Jan