[PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled

Jan Beulich posted 2 patches 2 years, 6 months ago
Failed in applying to current master (apply log)
[PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled
Posted by Jan Beulich 2 years, 6 months ago
This gets us closer to what Linux does, which hopefully improves
the experience for people running Xen on affected hardware.

Ian - I'm also Cc-ing you since this feels like being on the edge
between a new feature and a bug fix.

1: generalize and correct "iommu=no-igfx" handling
2: Tylersburg isoch DMAR unit with no TLB space

Jan


Re: [PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled [and 1 more messages]
Posted by Ian Jackson 2 years, 6 months ago
Jan Beulich writes ("[PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled"):
> Ian - I'm also Cc-ing you since this feels like being on the edge
> between a new feature and a bug fix.

Thanks.

I think 2/ is a new quirk (or, new behaviour for an existing quirk).
I think I am happy to treat that as a bugfix, assuming we are
reasonably confident that most systems (including in particular all
systems without the quirk) will take unchanged codepaths.  Is that
right ?

I don't understand 1/.  It looks bugfixish to me but I am really not
qualified.  I am inclined to defer to your judgement, but it would
help me if you explicitly addressed the overall risks/benefits.

But when reading the patch I did notice one thing that struck me as
undesriable:

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -750,27 +750,43 @@ static void iommu_enable_translation(str
>              if ( force_iommu )
> -                panic("BIOS did not enable IGD for VT properly, crash Xen for security purpose\n");
> +                panic(crash_fmt, msg);
...
> +        if ( force_iommu )
> +            panic(crash_fmt, msg);

Does this really mean that every exit path from
iommu_enable_translation that doesn't enable the iommu has to have a
check for force_iommu ?

That seems like a recipe for missing one.  And I think a missed one
would be an XSA.  Could we not structure the code some way to avoid
this foreseeable human error ?

Ian.

Re: [PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled [and 1 more messages]
Posted by Jan Beulich 2 years, 6 months ago
On 11.10.2021 12:56, Ian Jackson wrote:
> Jan Beulich writes ("[PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled"):
>> Ian - I'm also Cc-ing you since this feels like being on the edge
>> between a new feature and a bug fix.
> 
> Thanks.
> 
> I think 2/ is a new quirk (or, new behaviour for an existing quirk).
> I think I am happy to treat that as a bugfix, assuming we are
> reasonably confident that most systems (including in particular all
> systems without the quirk) will take unchanged codepaths.  Is that
> right ?

Yes. According to Linux there's exactly one BIOS flavor known to
exhibit the issue.

> I don't understand 1/.  It looks bugfixish to me but I am really not
> qualified.  I am inclined to defer to your judgement, but it would
> help me if you explicitly addressed the overall risks/benefits.

Right now our documentation claims similarity to a Linux workaround
without the similarity actually existing in the general case. A
common case (a single integrated graphics device) is handled, but the
perhaps yet more common case of a single add-in graphics devices is
not. Plus the criteria by which a device is determined to be a
graphics one was completely flawed. Hence people in need of the
workaround may find it non-functional. However, since our doc tells
people to report if they have a need to use the option engaging the
workaround, and since we didn't have any such reports in a number
of years, I guess both benefits and possible risks here are of
purely theoretical nature. Note that I've specifically said "possible"
because I can't really see any beyond me not having properly matched
Linux'es equivalent workaround - that workaround has been in place
unchanged for very many years.

> But when reading the patch I did notice one thing that struck me as
> undesriable:
> 
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -750,27 +750,43 @@ static void iommu_enable_translation(str
>>              if ( force_iommu )
>> -                panic("BIOS did not enable IGD for VT properly, crash Xen for security purpose\n");
>> +                panic(crash_fmt, msg);
> ...
>> +        if ( force_iommu )
>> +            panic(crash_fmt, msg);
> 
> Does this really mean that every exit path from
> iommu_enable_translation that doesn't enable the iommu has to have a
> check for force_iommu ?
> 
> That seems like a recipe for missing one.  And I think a missed one
> would be an XSA.  Could we not structure the code some way to avoid
> this foreseeable human error ?

I'm afraid I don't see a good way to do so, as imo it's desirable to
have separate log messages. IOW something like

    if ( ... )
    {
        msg = "...";
        goto dead;
    }

doesn't look any better to me. Also leaving individual IOMMUs disabled
should really be the exception anyway.

Jan


Re: [PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled [and 1 more messages]
Posted by Ian Jackson 2 years, 6 months ago
Jan Beulich writes ("Re: [PATCH 0/2] VT-d: correct / extend workaround(s) leaving an IOMMU disabled [and 1 more messages]"):
> On 11.10.2021 12:56, Ian Jackson wrote:
> > I think 2/ is a new quirk (or, new behaviour for an existing quirk).
> > I think I am happy to treat that as a bugfix, assuming we are
> > reasonably confident that most systems (including in particular all
> > systems without the quirk) will take unchanged codepaths.  Is that
> > right ?
> 
> Yes. According to Linux there's exactly one BIOS flavor known to
> exhibit the issue.
> 
> > I don't understand 1/.  It looks bugfixish to me but I am really not
> > qualified.  I am inclined to defer to your judgement, but it would
> > help me if you explicitly addressed the overall risks/benefits.
> 
> Right now our documentation claims similarity to a Linux workaround
> without the similarity actually existing in the general case. A
> common case (a single integrated graphics device) is handled, but the
> perhaps yet more common case of a single add-in graphics devices is
> not. Plus the criteria by which a device is determined to be a
> graphics one was completely flawed. Hence people in need of the
> workaround may find it non-functional. However, since our doc tells
> people to report if they have a need to use the option engaging the
> workaround, and since we didn't have any such reports in a number
> of years, I guess both benefits and possible risks here are of
> purely theoretical nature. Note that I've specifically said "possible"
> because I can't really see any beyond me not having properly matched
> Linux'es equivalent workaround - that workaround has been in place
> unchanged for very many years.

OK, great.  Thanks for the explanation.  For the record,

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

> > But when reading the patch I did notice one thing that struck me as
> > undesriable:
...
> > That seems like a recipe for missing one.  And I think a missed one
> > would be an XSA.  Could we not structure the code some way to avoid
> > this foreseeable human error ?
> 
> I'm afraid I don't see a good way to do so, as imo it's desirable to
> have separate log messages. IOW something like
> 
>     if ( ... )
>     {
>         msg = "...";
>         goto dead;
>     }
> 
> doesn't look any better to me. Also leaving individual IOMMUs disabled
> should really be the exception anyway.

C does not make this kind of thing easy.  I might be tempted to make
an inner function which returned a const char*, with NULL meaning "it
went OK".  Oh for a proper sum type...

Ian.