[PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

Roger Pau Monne posted 1 patch 1 year, 7 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220930141737.67574-1-roger.pau@citrix.com
xen/arch/x86/efi/efi-boot.h | 8 ++++++++
1 file changed, 8 insertions(+)
[PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Roger Pau Monne 1 year, 7 months ago
The EFI memory map contains two memory types (EfiMemoryMappedIO and
EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
devices used by EFI.

The current parsing of the EFI memory map was translating
EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
x86.  This is an issue because device MMIO regions (BARs) should not
be positioned on reserved regions.  Any BARs positioned on non-hole
areas of the memory map will cause is_memory_hole() to return false,
which would then cause memory decoding to be disabled for such device.
This leads to EFI firmware malfunctions when using runtime services.

The system under which this was observed has:

EFI memory map:
[...]
 00000fd000000-00000fe7fffff type=11 attr=800000000000100d
[...]
0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map

The device behind this BAR is:

00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
	Subsystem: Super Micro Computer Inc Device 091c
	Flags: fast devsel
	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well

For the record, the symptom observed in that machine was a hard freeze
when attempting to set an EFI variable (XEN_EFI_set_variable).

Fix by not adding regions with type EfiMemoryMappedIO or
EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
be positioned there.

Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I don't understand the definition of EfiMemoryMappedIOPortSpace:

"System memory-mapped IO region that is used to translate memory
cycles to IO cycles by the processor."

But given its name I would assume it's also likely used to mark ranges
in use by PCI device BARs.

It would also be interesting to forward this information to dom0, so
it doesn't attempt to move the BARs of this device(s) around, or else
issues will arise.

And of course the device must not be passed through to domUs, as
disabling memory decoding on it can lead to a host DoS.  Not that it
makes much sense to pass devices like the one above to guests.

It also makes me wonder whether this playing with the decoding bit
that we do in Xen is safe.  It might be more resilient to only disable
memory decoding when the BARs overlap a RAM region, as that would
indeed cause issues.

We should also consider moving away from the e820 and instead using
the EFI memory map for things like is_memory_hole(), but that would
involve translating e820 memory maps into EFI format, which might be
easier and more reliable than the other way around which we currently
do.
---
 xen/arch/x86/efi/efi-boot.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 836e8c2ba1..12ad94cd71 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -169,6 +169,14 @@ static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
 
         switch ( desc->Type )
         {
+        case EfiMemoryMappedIO:
+        case EfiMemoryMappedIOPortSpace:
+            /*
+             * There no suitable e820 memory type to represent a MMIO area
+             * except a hole.
+             */
+            continue;
+
         case EfiBootServicesCode:
         case EfiBootServicesData:
             if ( map_bs )
-- 
2.37.3


Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Jan Beulich 1 year, 7 months ago
On 30.09.2022 16:17, Roger Pau Monne wrote:
> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> devices used by EFI.
> 
> The current parsing of the EFI memory map was translating
> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> x86.  This is an issue because device MMIO regions (BARs) should not
> be positioned on reserved regions.  Any BARs positioned on non-hole
> areas of the memory map will cause is_memory_hole() to return false,
> which would then cause memory decoding to be disabled for such device.
> This leads to EFI firmware malfunctions when using runtime services.
> 
> The system under which this was observed has:
> 
> EFI memory map:
> [...]
>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
> [...]
> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> 
> The device behind this BAR is:
> 
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
> 	Subsystem: Super Micro Computer Inc Device 091c
> 	Flags: fast devsel
> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
> 
> For the record, the symptom observed in that machine was a hard freeze
> when attempting to set an EFI variable (XEN_EFI_set_variable).
> 
> Fix by not adding regions with type EfiMemoryMappedIO or
> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> be positioned there.
> 
> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

In the best case this is moving us from one way of being wrong to another:
So far we wrongly include BARs in E820_RESERVED (_if_ they can be
legitimately covered by a EfiMemoryMappedIO region in the first place,
which I'm not sure is actually permitted - iirc just like E820_RESERVED
may not be used for BARs, this memory type also may not be), whereas with
your change we would no longer report non-BAR MMIO space (chipset specific
ranges for example) as reserved. In fact I think the example you provide
is at least partly due to bogus firmware behavior: The BAR is put in space
normally used for firmware specific memory (MMIO) ranges. I think firmware
should either assign the BAR differently or exclude the range from the
memory map.

I guess instead we want to handle the example you give by a firmware quirk
workaround.

> ---
> I don't understand the definition of EfiMemoryMappedIOPortSpace:
> 
> "System memory-mapped IO region that is used to translate memory
> cycles to IO cycles by the processor."

That's something (only?) IA-64 used, where kind of as a "replacement" for
x86 I/O port accesses equivalents thereof were provided (iirc 4 ports
per page) via MMIO accesses. It is this compatibility MMIO space which is
marked this way. Such ranges should never be seen on (current) x86.

> But given its name I would assume it's also likely used to mark ranges
> in use by PCI device BARs.
> 
> It would also be interesting to forward this information to dom0, so
> it doesn't attempt to move the BARs of this device(s) around, or else
> issues will arise.

None of this is device specific. There's simply (typically) one MMIO
range covering the entire 64k or I/O port space.

Jan

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Roger Pau Monné 1 year, 7 months ago
On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> On 30.09.2022 16:17, Roger Pau Monne wrote:
> > The EFI memory map contains two memory types (EfiMemoryMappedIO and
> > EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> > devices used by EFI.
> > 
> > The current parsing of the EFI memory map was translating
> > EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> > x86.  This is an issue because device MMIO regions (BARs) should not
> > be positioned on reserved regions.  Any BARs positioned on non-hole
> > areas of the memory map will cause is_memory_hole() to return false,
> > which would then cause memory decoding to be disabled for such device.
> > This leads to EFI firmware malfunctions when using runtime services.
> > 
> > The system under which this was observed has:
> > 
> > EFI memory map:
> > [...]
> >  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
> > [...]
> > 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> > 
> > The device behind this BAR is:
> > 
> > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
> > 	Subsystem: Super Micro Computer Inc Device 091c
> > 	Flags: fast devsel
> > 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
> > 
> > For the record, the symptom observed in that machine was a hard freeze
> > when attempting to set an EFI variable (XEN_EFI_set_variable).
> > 
> > Fix by not adding regions with type EfiMemoryMappedIO or
> > EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> > be positioned there.
> > 
> > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> In the best case this is moving us from one way of being wrong to another:
> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> legitimately covered by a EfiMemoryMappedIO region in the first place,
> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> may not be used for BARs, this memory type also may not be), whereas with
> your change we would no longer report non-BAR MMIO space (chipset specific
> ranges for example) as reserved. In fact I think the example you provide
> is at least partly due to bogus firmware behavior: The BAR is put in space
> normally used for firmware specific memory (MMIO) ranges. I think firmware
> should either assign the BAR differently or exclude the range from the
> memory map.

Hm, I'm not sure the example is bogus, how would firmware request a BAR
to be mapped for run time services to access it otherwise if it's not
using EfiMemoryMappedIO?

Not adding the BAR to the memory map in any way would mean the OS is
free to not map it for runtime services to access.

> I guess instead we want to handle the example you give by a firmware quirk
> workaround.

I'm unconvinced we need a quirk for this. AFAICT such usage of
EfiMemoryMappedIO doesn't go against the UEFI spec, and hence we need
to handle it without requiring specific firmware quirks.

> > ---
> > I don't understand the definition of EfiMemoryMappedIOPortSpace:
> > 
> > "System memory-mapped IO region that is used to translate memory
> > cycles to IO cycles by the processor."
> 
> That's something (only?) IA-64 used, where kind of as a "replacement" for
> x86 I/O port accesses equivalents thereof were provided (iirc 4 ports
> per page) via MMIO accesses. It is this compatibility MMIO space which is
> marked this way. Such ranges should never be seen on (current) x86.

I've heard the Arm guys speak about something similar.

There's a clarification note in newer versions of the UEFI spec:

"Note: There is only one region of type EfiMemoryMappedIoPortSpace
defined in the architecture for Itanium-based platforms. As a result,
there should be one and only one region of type
EfiMemoryMappedIoPortSpace in the EFI memory map of an Itanium-based
platform."

> > But given its name I would assume it's also likely used to mark ranges
> > in use by PCI device BARs.
> > 
> > It would also be interesting to forward this information to dom0, so
> > it doesn't attempt to move the BARs of this device(s) around, or else
> > issues will arise.
> 
> None of this is device specific. There's simply (typically) one MMIO
> range covering the entire 64k or I/O port space.

So this translation region won't be in a BAR of a host bridge for
example?

Thanks, Roger.

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Jan Beulich 1 year, 7 months ago
On 04.10.2022 11:27, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>> devices used by EFI.
>>>
>>> The current parsing of the EFI memory map was translating
>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>> areas of the memory map will cause is_memory_hole() to return false,
>>> which would then cause memory decoding to be disabled for such device.
>>> This leads to EFI firmware malfunctions when using runtime services.
>>>
>>> The system under which this was observed has:
>>>
>>> EFI memory map:
>>> [...]
>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
>>> [...]
>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>
>>> The device behind this BAR is:
>>>
>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
>>> 	Subsystem: Super Micro Computer Inc Device 091c
>>> 	Flags: fast devsel
>>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
>>>
>>> For the record, the symptom observed in that machine was a hard freeze
>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>
>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>> be positioned there.
>>>
>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> In the best case this is moving us from one way of being wrong to another:
>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>> may not be used for BARs, this memory type also may not be), whereas with
>> your change we would no longer report non-BAR MMIO space (chipset specific
>> ranges for example) as reserved. In fact I think the example you provide
>> is at least partly due to bogus firmware behavior: The BAR is put in space
>> normally used for firmware specific memory (MMIO) ranges. I think firmware
>> should either assign the BAR differently or exclude the range from the
>> memory map.
> 
> Hm, I'm not sure the example is bogus, how would firmware request a BAR
> to be mapped for run time services to access it otherwise if it's not
> using EfiMemoryMappedIO?
> 
> Not adding the BAR to the memory map in any way would mean the OS is
> free to not map it for runtime services to access.

My view is that BARs should not be marked for runtime services use. Doing
so requires awareness of the driver inside the OS, which I don't think
one can expect. If firmware needs to make use of a device in a system, it
ought to properly hide it from the OS. Note how the potential sharing of
an RTC requires special provisions in the spec, mandating driver awareness.

Having a BAR expressed in the memory map also contradicts the ability of
an OS to relocate all BARs of all devices, if necessary.

>> I guess instead we want to handle the example you give by a firmware quirk
>> workaround.
> 
> I'm unconvinced we need a quirk for this. AFAICT such usage of
> EfiMemoryMappedIO doesn't go against the UEFI spec, and hence we need
> to handle it without requiring specific firmware quirks.
> 
>>> ---
>>> I don't understand the definition of EfiMemoryMappedIOPortSpace:
>>>
>>> "System memory-mapped IO region that is used to translate memory
>>> cycles to IO cycles by the processor."
>>
>> That's something (only?) IA-64 used, where kind of as a "replacement" for
>> x86 I/O port accesses equivalents thereof were provided (iirc 4 ports
>> per page) via MMIO accesses. It is this compatibility MMIO space which is
>> marked this way. Such ranges should never be seen on (current) x86.
> 
> I've heard the Arm guys speak about something similar.
> 
> There's a clarification note in newer versions of the UEFI spec:
> 
> "Note: There is only one region of type EfiMemoryMappedIoPortSpace
> defined in the architecture for Itanium-based platforms. As a result,
> there should be one and only one region of type
> EfiMemoryMappedIoPortSpace in the EFI memory map of an Itanium-based
> platform."
> 
>>> But given its name I would assume it's also likely used to mark ranges
>>> in use by PCI device BARs.
>>>
>>> It would also be interesting to forward this information to dom0, so
>>> it doesn't attempt to move the BARs of this device(s) around, or else
>>> issues will arise.
>>
>> None of this is device specific. There's simply (typically) one MMIO
>> range covering the entire 64k or I/O port space.
> 
> So this translation region won't be in a BAR of a host bridge for
> example?

I have to admit that I don't recall at which layer the conversion happens.
I also didn't think (host) bridges would typically have BARs. Bridges (but
iirc not host bridges) have bridge windows, but that's different.

Jan

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Roger Pau Monné 1 year, 7 months ago
On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
> On 04.10.2022 11:27, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> >> On 30.09.2022 16:17, Roger Pau Monne wrote:
> >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> >>> devices used by EFI.
> >>>
> >>> The current parsing of the EFI memory map was translating
> >>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> >>> x86.  This is an issue because device MMIO regions (BARs) should not
> >>> be positioned on reserved regions.  Any BARs positioned on non-hole
> >>> areas of the memory map will cause is_memory_hole() to return false,
> >>> which would then cause memory decoding to be disabled for such device.
> >>> This leads to EFI firmware malfunctions when using runtime services.
> >>>
> >>> The system under which this was observed has:
> >>>
> >>> EFI memory map:
> >>> [...]
> >>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
> >>> [...]
> >>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >>>
> >>> The device behind this BAR is:
> >>>
> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
> >>> 	Subsystem: Super Micro Computer Inc Device 091c
> >>> 	Flags: fast devsel
> >>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
> >>>
> >>> For the record, the symptom observed in that machine was a hard freeze
> >>> when attempting to set an EFI variable (XEN_EFI_set_variable).
> >>>
> >>> Fix by not adding regions with type EfiMemoryMappedIO or
> >>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> >>> be positioned there.
> >>>
> >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>
> >> In the best case this is moving us from one way of being wrong to another:
> >> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> >> legitimately covered by a EfiMemoryMappedIO region in the first place,
> >> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> >> may not be used for BARs, this memory type also may not be), whereas with
> >> your change we would no longer report non-BAR MMIO space (chipset specific
> >> ranges for example) as reserved. In fact I think the example you provide
> >> is at least partly due to bogus firmware behavior: The BAR is put in space
> >> normally used for firmware specific memory (MMIO) ranges. I think firmware
> >> should either assign the BAR differently or exclude the range from the
> >> memory map.
> > 
> > Hm, I'm not sure the example is bogus, how would firmware request a BAR
> > to be mapped for run time services to access it otherwise if it's not
> > using EfiMemoryMappedIO?
> > 
> > Not adding the BAR to the memory map in any way would mean the OS is
> > free to not map it for runtime services to access.
> 
> My view is that BARs should not be marked for runtime services use. Doing
> so requires awareness of the driver inside the OS, which I don't think
> one can expect. If firmware needs to make use of a device in a system, it
> ought to properly hide it from the OS. Note how the potential sharing of
> an RTC requires special provisions in the spec, mandating driver awareness.
> 
> Having a BAR expressed in the memory map also contradicts the ability of
> an OS to relocate all BARs of all devices, if necessary.

I've failed to figure out if there's a way in UEFI to report a device
is in use by the firmware.  I've already looked before sending the
patch (see also the post commit notes about for example not passing
through the device to any guest for obvious reason).

I've got no idea if Linux has any checks to avoid trying to move BARs
residing in EfiMemoryMappedIO ranges, we have now observed this
behavior in two systems already.

Maybe we could do a special check for PCI devices and allow them
having BARs in EfiMemoryMappedIO, together with printing a warning
message.

Thanks, Roger.

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Jan Beulich 1 year, 7 months ago
On 04.10.2022 14:17, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>>>> devices used by EFI.
>>>>>
>>>>> The current parsing of the EFI memory map was translating
>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>>>> areas of the memory map will cause is_memory_hole() to return false,
>>>>> which would then cause memory decoding to be disabled for such device.
>>>>> This leads to EFI firmware malfunctions when using runtime services.
>>>>>
>>>>> The system under which this was observed has:
>>>>>
>>>>> EFI memory map:
>>>>> [...]
>>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
>>>>> [...]
>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>>>
>>>>> The device behind this BAR is:
>>>>>
>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
>>>>> 	Subsystem: Super Micro Computer Inc Device 091c
>>>>> 	Flags: fast devsel
>>>>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
>>>>>
>>>>> For the record, the symptom observed in that machine was a hard freeze
>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>>>
>>>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>>>> be positioned there.
>>>>>
>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>
>>>> In the best case this is moving us from one way of being wrong to another:
>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>>>> may not be used for BARs, this memory type also may not be), whereas with
>>>> your change we would no longer report non-BAR MMIO space (chipset specific
>>>> ranges for example) as reserved. In fact I think the example you provide
>>>> is at least partly due to bogus firmware behavior: The BAR is put in space
>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware
>>>> should either assign the BAR differently or exclude the range from the
>>>> memory map.
>>>
>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>> to be mapped for run time services to access it otherwise if it's not
>>> using EfiMemoryMappedIO?
>>>
>>> Not adding the BAR to the memory map in any way would mean the OS is
>>> free to not map it for runtime services to access.
>>
>> My view is that BARs should not be marked for runtime services use. Doing
>> so requires awareness of the driver inside the OS, which I don't think
>> one can expect. If firmware needs to make use of a device in a system, it
>> ought to properly hide it from the OS. Note how the potential sharing of
>> an RTC requires special provisions in the spec, mandating driver awareness.
>>
>> Having a BAR expressed in the memory map also contradicts the ability of
>> an OS to relocate all BARs of all devices, if necessary.
> 
> I've failed to figure out if there's a way in UEFI to report a device
> is in use by the firmware.  I've already looked before sending the
> patch (see also the post commit notes about for example not passing
> through the device to any guest for obvious reason).
> 
> I've got no idea if Linux has any checks to avoid trying to move BARs
> residing in EfiMemoryMappedIO ranges, we have now observed this
> behavior in two systems already.
> 
> Maybe we could do a special check for PCI devices and allow them
> having BARs in EfiMemoryMappedIO, together with printing a warning
> message.

Right, that's one of the possible quirk workarounds I was thinking of.
At the risk of stating the obvious - the same would presumably apply to
E820_RESERVED on non-EFI systems then.

Jan

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Roger Pau Monné 1 year, 7 months ago
On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
> On 04.10.2022 14:17, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 11:27, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> >>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
> >>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> >>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> >>>>> devices used by EFI.
> >>>>>
> >>>>> The current parsing of the EFI memory map was translating
> >>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> >>>>> x86.  This is an issue because device MMIO regions (BARs) should not
> >>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
> >>>>> areas of the memory map will cause is_memory_hole() to return false,
> >>>>> which would then cause memory decoding to be disabled for such device.
> >>>>> This leads to EFI firmware malfunctions when using runtime services.
> >>>>>
> >>>>> The system under which this was observed has:
> >>>>>
> >>>>> EFI memory map:
> >>>>> [...]
> >>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
> >>>>> [...]
> >>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >>>>>
> >>>>> The device behind this BAR is:
> >>>>>
> >>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
> >>>>> 	Subsystem: Super Micro Computer Inc Device 091c
> >>>>> 	Flags: fast devsel
> >>>>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
> >>>>>
> >>>>> For the record, the symptom observed in that machine was a hard freeze
> >>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
> >>>>>
> >>>>> Fix by not adding regions with type EfiMemoryMappedIO or
> >>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> >>>>> be positioned there.
> >>>>>
> >>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>>
> >>>> In the best case this is moving us from one way of being wrong to another:
> >>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> >>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
> >>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> >>>> may not be used for BARs, this memory type also may not be), whereas with
> >>>> your change we would no longer report non-BAR MMIO space (chipset specific
> >>>> ranges for example) as reserved. In fact I think the example you provide
> >>>> is at least partly due to bogus firmware behavior: The BAR is put in space
> >>>> normally used for firmware specific memory (MMIO) ranges. I think firmware
> >>>> should either assign the BAR differently or exclude the range from the
> >>>> memory map.
> >>>
> >>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
> >>> to be mapped for run time services to access it otherwise if it's not
> >>> using EfiMemoryMappedIO?
> >>>
> >>> Not adding the BAR to the memory map in any way would mean the OS is
> >>> free to not map it for runtime services to access.
> >>
> >> My view is that BARs should not be marked for runtime services use. Doing
> >> so requires awareness of the driver inside the OS, which I don't think
> >> one can expect. If firmware needs to make use of a device in a system, it
> >> ought to properly hide it from the OS. Note how the potential sharing of
> >> an RTC requires special provisions in the spec, mandating driver awareness.
> >>
> >> Having a BAR expressed in the memory map also contradicts the ability of
> >> an OS to relocate all BARs of all devices, if necessary.
> > 
> > I've failed to figure out if there's a way in UEFI to report a device
> > is in use by the firmware.  I've already looked before sending the
> > patch (see also the post commit notes about for example not passing
> > through the device to any guest for obvious reason).
> > 
> > I've got no idea if Linux has any checks to avoid trying to move BARs
> > residing in EfiMemoryMappedIO ranges, we have now observed this
> > behavior in two systems already.
> > 
> > Maybe we could do a special check for PCI devices and allow them
> > having BARs in EfiMemoryMappedIO, together with printing a warning
> > message.
> 
> Right, that's one of the possible quirk workarounds I was thinking of.
> At the risk of stating the obvious - the same would presumably apply to
> E820_RESERVED on non-EFI systems then.

One option would be to strictly limit to EfiMemoryMappedIO, by taking
the EFI memory map into account also if present.

Another maybe simpler option is to allow BARs to be placed in
E820_RESERVED regions, and translate EfiMemoryMappedIO into
E820_RESERVED like we have been doing.

I will attempt the later if you are OK with the approach.

Thanks, Roger.

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Jan Beulich 1 year, 7 months ago
On 04.10.2022 14:59, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:17, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>>>>>> devices used by EFI.
>>>>>>>
>>>>>>> The current parsing of the EFI memory map was translating
>>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>>>>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>>>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>>>>>> areas of the memory map will cause is_memory_hole() to return false,
>>>>>>> which would then cause memory decoding to be disabled for such device.
>>>>>>> This leads to EFI firmware malfunctions when using runtime services.
>>>>>>>
>>>>>>> The system under which this was observed has:
>>>>>>>
>>>>>>> EFI memory map:
>>>>>>> [...]
>>>>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
>>>>>>> [...]
>>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>>>>>
>>>>>>> The device behind this BAR is:
>>>>>>>
>>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
>>>>>>> 	Subsystem: Super Micro Computer Inc Device 091c
>>>>>>> 	Flags: fast devsel
>>>>>>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
>>>>>>>
>>>>>>> For the record, the symptom observed in that machine was a hard freeze
>>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>>>>>
>>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>>>>>> be positioned there.
>>>>>>>
>>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>>>
>>>>>> In the best case this is moving us from one way of being wrong to another:
>>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>>>>>> may not be used for BARs, this memory type also may not be), whereas with
>>>>>> your change we would no longer report non-BAR MMIO space (chipset specific
>>>>>> ranges for example) as reserved. In fact I think the example you provide
>>>>>> is at least partly due to bogus firmware behavior: The BAR is put in space
>>>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware
>>>>>> should either assign the BAR differently or exclude the range from the
>>>>>> memory map.
>>>>>
>>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>>>> to be mapped for run time services to access it otherwise if it's not
>>>>> using EfiMemoryMappedIO?
>>>>>
>>>>> Not adding the BAR to the memory map in any way would mean the OS is
>>>>> free to not map it for runtime services to access.
>>>>
>>>> My view is that BARs should not be marked for runtime services use. Doing
>>>> so requires awareness of the driver inside the OS, which I don't think
>>>> one can expect. If firmware needs to make use of a device in a system, it
>>>> ought to properly hide it from the OS. Note how the potential sharing of
>>>> an RTC requires special provisions in the spec, mandating driver awareness.
>>>>
>>>> Having a BAR expressed in the memory map also contradicts the ability of
>>>> an OS to relocate all BARs of all devices, if necessary.
>>>
>>> I've failed to figure out if there's a way in UEFI to report a device
>>> is in use by the firmware.  I've already looked before sending the
>>> patch (see also the post commit notes about for example not passing
>>> through the device to any guest for obvious reason).
>>>
>>> I've got no idea if Linux has any checks to avoid trying to move BARs
>>> residing in EfiMemoryMappedIO ranges, we have now observed this
>>> behavior in two systems already.
>>>
>>> Maybe we could do a special check for PCI devices and allow them
>>> having BARs in EfiMemoryMappedIO, together with printing a warning
>>> message.
>>
>> Right, that's one of the possible quirk workarounds I was thinking of.
>> At the risk of stating the obvious - the same would presumably apply to
>> E820_RESERVED on non-EFI systems then.
> 
> One option would be to strictly limit to EfiMemoryMappedIO, by taking
> the EFI memory map into account also if present.
> 
> Another maybe simpler option is to allow BARs to be placed in
> E820_RESERVED regions, and translate EfiMemoryMappedIO into
> E820_RESERVED like we have been doing.
> 
> I will attempt the later if you are OK with the approach.

I might be okay with the approach, but first of all I continue to be
worried of us violating specifications if we make this the default
behavior.

Jan

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Roger Pau Monné 1 year, 7 months ago
On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote:
> On 04.10.2022 14:59, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 14:17, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
> >>>> On 04.10.2022 11:27, Roger Pau Monné wrote:
> >>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> >>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
> >>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> >>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> >>>>>>> devices used by EFI.
> >>>>>>>
> >>>>>>> The current parsing of the EFI memory map was translating
> >>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> >>>>>>> x86.  This is an issue because device MMIO regions (BARs) should not
> >>>>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
> >>>>>>> areas of the memory map will cause is_memory_hole() to return false,
> >>>>>>> which would then cause memory decoding to be disabled for such device.
> >>>>>>> This leads to EFI firmware malfunctions when using runtime services.
> >>>>>>>
> >>>>>>> The system under which this was observed has:
> >>>>>>>
> >>>>>>> EFI memory map:
> >>>>>>> [...]
> >>>>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
> >>>>>>> [...]
> >>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >>>>>>>
> >>>>>>> The device behind this BAR is:
> >>>>>>>
> >>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
> >>>>>>> 	Subsystem: Super Micro Computer Inc Device 091c
> >>>>>>> 	Flags: fast devsel
> >>>>>>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
> >>>>>>>
> >>>>>>> For the record, the symptom observed in that machine was a hard freeze
> >>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
> >>>>>>>
> >>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or
> >>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> >>>>>>> be positioned there.
> >>>>>>>
> >>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> >>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>>>>
> >>>>>> In the best case this is moving us from one way of being wrong to another:
> >>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> >>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
> >>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> >>>>>> may not be used for BARs, this memory type also may not be), whereas with
> >>>>>> your change we would no longer report non-BAR MMIO space (chipset specific
> >>>>>> ranges for example) as reserved. In fact I think the example you provide
> >>>>>> is at least partly due to bogus firmware behavior: The BAR is put in space
> >>>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware
> >>>>>> should either assign the BAR differently or exclude the range from the
> >>>>>> memory map.
> >>>>>
> >>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
> >>>>> to be mapped for run time services to access it otherwise if it's not
> >>>>> using EfiMemoryMappedIO?
> >>>>>
> >>>>> Not adding the BAR to the memory map in any way would mean the OS is
> >>>>> free to not map it for runtime services to access.
> >>>>
> >>>> My view is that BARs should not be marked for runtime services use. Doing
> >>>> so requires awareness of the driver inside the OS, which I don't think
> >>>> one can expect. If firmware needs to make use of a device in a system, it
> >>>> ought to properly hide it from the OS. Note how the potential sharing of
> >>>> an RTC requires special provisions in the spec, mandating driver awareness.
> >>>>
> >>>> Having a BAR expressed in the memory map also contradicts the ability of
> >>>> an OS to relocate all BARs of all devices, if necessary.
> >>>
> >>> I've failed to figure out if there's a way in UEFI to report a device
> >>> is in use by the firmware.  I've already looked before sending the
> >>> patch (see also the post commit notes about for example not passing
> >>> through the device to any guest for obvious reason).
> >>>
> >>> I've got no idea if Linux has any checks to avoid trying to move BARs
> >>> residing in EfiMemoryMappedIO ranges, we have now observed this
> >>> behavior in two systems already.
> >>>
> >>> Maybe we could do a special check for PCI devices and allow them
> >>> having BARs in EfiMemoryMappedIO, together with printing a warning
> >>> message.
> >>
> >> Right, that's one of the possible quirk workarounds I was thinking of.
> >> At the risk of stating the obvious - the same would presumably apply to
> >> E820_RESERVED on non-EFI systems then.
> > 
> > One option would be to strictly limit to EfiMemoryMappedIO, by taking
> > the EFI memory map into account also if present.
> > 
> > Another maybe simpler option is to allow BARs to be placed in
> > E820_RESERVED regions, and translate EfiMemoryMappedIO into
> > E820_RESERVED like we have been doing.
> > 
> > I will attempt the later if you are OK with the approach.
> 
> I might be okay with the approach, but first of all I continue to be
> worried of us violating specifications if we make this the default
> behavior.

In any case it would be the firmware violating the specification by
not hiding those PCI devices, Xen is just trying to cope.

Xen went from not checking the position of the BARs at all to
enforcing them to not be placed over any regions on the memory map. I
think we need to relax the checks a bit to match reality.

Thanks, Roger.

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED
Posted by Jan Beulich 1 year, 7 months ago
On 04.10.2022 15:55, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:59, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 14:17, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>>>>>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>>>>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>>>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>>>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>>>>>>>> devices used by EFI.
>>>>>>>>>
>>>>>>>>> The current parsing of the EFI memory map was translating
>>>>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>>>>>>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>>>>>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>>>>>>>> areas of the memory map will cause is_memory_hole() to return false,
>>>>>>>>> which would then cause memory decoding to be disabled for such device.
>>>>>>>>> This leads to EFI firmware malfunctions when using runtime services.
>>>>>>>>>
>>>>>>>>> The system under which this was observed has:
>>>>>>>>>
>>>>>>>>> EFI memory map:
>>>>>>>>> [...]
>>>>>>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
>>>>>>>>> [...]
>>>>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>>>>>>>
>>>>>>>>> The device behind this BAR is:
>>>>>>>>>
>>>>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09)
>>>>>>>>> 	Subsystem: Super Micro Computer Inc Device 091c
>>>>>>>>> 	Flags: fast devsel
>>>>>>>>> 	Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
>>>>>>>>>
>>>>>>>>> For the record, the symptom observed in that machine was a hard freeze
>>>>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>>>>>>>
>>>>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>>>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>>>>>>>> be positioned there.
>>>>>>>>>
>>>>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>>>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>>>>>>
>>>>>>>> In the best case this is moving us from one way of being wrong to another:
>>>>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>>>>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>>>>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>>>>>>>> may not be used for BARs, this memory type also may not be), whereas with
>>>>>>>> your change we would no longer report non-BAR MMIO space (chipset specific
>>>>>>>> ranges for example) as reserved. In fact I think the example you provide
>>>>>>>> is at least partly due to bogus firmware behavior: The BAR is put in space
>>>>>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware
>>>>>>>> should either assign the BAR differently or exclude the range from the
>>>>>>>> memory map.
>>>>>>>
>>>>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>>>>>> to be mapped for run time services to access it otherwise if it's not
>>>>>>> using EfiMemoryMappedIO?
>>>>>>>
>>>>>>> Not adding the BAR to the memory map in any way would mean the OS is
>>>>>>> free to not map it for runtime services to access.
>>>>>>
>>>>>> My view is that BARs should not be marked for runtime services use. Doing
>>>>>> so requires awareness of the driver inside the OS, which I don't think
>>>>>> one can expect. If firmware needs to make use of a device in a system, it
>>>>>> ought to properly hide it from the OS. Note how the potential sharing of
>>>>>> an RTC requires special provisions in the spec, mandating driver awareness.
>>>>>>
>>>>>> Having a BAR expressed in the memory map also contradicts the ability of
>>>>>> an OS to relocate all BARs of all devices, if necessary.
>>>>>
>>>>> I've failed to figure out if there's a way in UEFI to report a device
>>>>> is in use by the firmware.  I've already looked before sending the
>>>>> patch (see also the post commit notes about for example not passing
>>>>> through the device to any guest for obvious reason).
>>>>>
>>>>> I've got no idea if Linux has any checks to avoid trying to move BARs
>>>>> residing in EfiMemoryMappedIO ranges, we have now observed this
>>>>> behavior in two systems already.
>>>>>
>>>>> Maybe we could do a special check for PCI devices and allow them
>>>>> having BARs in EfiMemoryMappedIO, together with printing a warning
>>>>> message.
>>>>
>>>> Right, that's one of the possible quirk workarounds I was thinking of.
>>>> At the risk of stating the obvious - the same would presumably apply to
>>>> E820_RESERVED on non-EFI systems then.
>>>
>>> One option would be to strictly limit to EfiMemoryMappedIO, by taking
>>> the EFI memory map into account also if present.
>>>
>>> Another maybe simpler option is to allow BARs to be placed in
>>> E820_RESERVED regions, and translate EfiMemoryMappedIO into
>>> E820_RESERVED like we have been doing.
>>>
>>> I will attempt the later if you are OK with the approach.
>>
>> I might be okay with the approach, but first of all I continue to be
>> worried of us violating specifications if we make this the default
>> behavior.
> 
> In any case it would be the firmware violating the specification by
> not hiding those PCI devices, Xen is just trying to cope.
> 
> Xen went from not checking the position of the BARs at all to
> enforcing them to not be placed over any regions on the memory map. I
> think we need to relax the checks a bit to match reality.

Sure, as a workaround. Yet we don't want to relax too much, or else a
primary purpose of the check will be lost.

Jan