[Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses

Eric Auger posted 2 patches 7 years, 5 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/1527091418-11874-1-git-send-email-eric.auger@redhat.com
Test checkpatch passed
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test s390x passed
[Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Eric Auger 7 years, 5 months ago
The current machvirt PCI host controller's ECAM region is 16MB in size.
This limits the number of PCIe buses to 16.

PC/Q35 machines have a 256MB region allowing up to 256 buses.
This series tries to bridge the gap.
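For reference, the bus count follows from the standard ECAM layout: each function gets 4KB of config space, each device has 8 functions and each bus 32 devices, i.e. 1MB per bus. A minimal C sketch of that arithmetic (the helper names are illustrative only, not QEMU's):

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCIe ECAM layout: 4 KiB of config space per function,
 * 8 functions per device, 32 devices per bus, i.e. 1 MiB per bus. */
#define ECAM_FUNC_SIZE (UINT64_C(1) << 12)
#define ECAM_BUS_SIZE  (ECAM_FUNC_SIZE * 8 * 32)

/* How many buses an ECAM window of 'window_size' bytes can cover. */
static uint64_t ecam_max_buses(uint64_t window_size)
{
    return window_size / ECAM_BUS_SIZE;
}

/* Byte offset of config register 'reg' of bus:dev.fn inside the window. */
static uint64_t ecam_offset(unsigned bus, unsigned dev, unsigned fn,
                            unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) | reg;
}
```

With these helpers a 16MB window yields 16 buses and a 256MB window yields 256, matching the numbers above; the last register of bus 255 sits at offset 0xFFFFFFF, the final byte of a 256MB window.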

It declares a new 256MB ECAM region located beyond 256GB
(just after the hypothetical new GICv3 RDIST region). The new
ECAM region is used whenever the highmem option is set (the default),
and it is disabled for machine types older than 3.0.
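A rough sketch of where the new window sits. It assumes the base 0x4010000000 (256GB + 256MB) and the high MMIO aperture 0x8000000000..0xffffffffff that appear in the guest dmesg excerpt below, the 256MB size stated here, and a legacy low ECAM at 0x3f000000/16MB, which is my reading of QEMU's virt memory map rather than something stated in this series. The point is that the new window lies entirely above 4GB, so only LPAE-capable or 64-bit guests can reach it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

struct region {
    uint64_t base;
    uint64_t size;
};

/* High ECAM base and high MMIO aperture as seen in the guest dmesg;
 * the legacy ECAM location is an assumption (QEMU virt memory map). */
static const struct region high_ecam   = { UINT64_C(0x4010000000), 256 * MiB };
static const struct region high_mmio   = { UINT64_C(0x8000000000), 512 * GiB };
static const struct region legacy_ecam = { UINT64_C(0x3f000000), 16 * MiB };

/* Exclusive end address; rejects empty or wrapping regions. */
static uint64_t region_end(struct region r)
{
    assert(r.size != 0 && r.base + r.size > r.base);
    return r.base + r.size;
}

/* Reachable by a 32-bit guest without LPAE (or by 32-bit firmware)? */
static bool below_4g(struct region r)
{
    return region_end(r) <= 4 * GiB;
}

static bool overlaps(struct region a, struct region b)
{
    return a.base < region_end(b) && b.base < region_end(a);
}
```

Here below_4g() is false for the new ECAM and true for the legacy one, and the new ECAM does not collide with the high MMIO aperture.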

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/qemu/tree/v2.12.0-256MB-ECAM-RFCv1

- Tested with guests running in aarch64 and aarch32 modes (aarch64=off).
- In aarch32 mode I hit an issue where the vmalloc region is reported
  too small (dmesg excerpt below). I had to enlarge it by passing the
  "vmalloc=512M" option in the kernel bootargs, after which the guest
  booted fine.

[    1.399581] pl061_gpio 9030000.pl061: PL061 GPIO chip @0x0000000009030000 registered
[    1.402636] OF: PCI: host bridge /pcie@10000000 ranges:
[    1.404506] OF: PCI:    IO 0x3eff0000..0x3effffff -> 0x00000000
[    1.406606] OF: PCI:   MEM 0x10000000..0x3efeffff -> 0x10000000
[    1.408690] OF: PCI:   MEM 0x8000000000..0xffffffffff -> 0x8000000000
[    1.411992] vmap allocation for size 1052672 failed: use vmalloc=<size> to increase size
[    1.414895] pci-host-generic 4010000000.pcie: ECAM ioremap failed
[    1.427472] pci-host-generic: probe of 4010000000.pcie failed with error -12

- Maybe this issue warrants introducing a new highmem_ecam option?
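One plausible reading of the vmap failure (an assumption, not established in this thread): the failed request, 1052672 bytes, is exactly 1MB + 4KB, which fits a 32-bit kernel ioremapping the ECAM one bus at a time, with each mapping carrying a 4KB vmalloc guard page. Under that assumption, mapping all 256 buses needs 257MB of vmalloc space, versus roughly 16MB for the legacy 16-bus window:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)

/* Assumption: one ioremap per bus (1 MiB of ECAM) plus a 4 KiB vmalloc
 * guard page, which matches the 1052672-byte request in the log. */
#define PER_BUS_VMAP (1 * MiB + 4 * KiB)

/* Virtual address space needed to map 'nr_buses' buses of ECAM. */
static uint64_t ecam_vmalloc_demand(unsigned nr_buses)
{
    assert(nr_buses <= 256);
    return (uint64_t)nr_buses * PER_BUS_VMAP;
}

/* Optimistic: ignores every other vmalloc user in the guest. */
static bool ecam_fits_vmalloc(unsigned nr_buses, uint64_t vmalloc_size)
{
    return ecam_vmalloc_demand(nr_buses) <= vmalloc_size;
}
```

As far as I know the default 32-bit ARM vmalloc area is on the order of 240MB, which would fall short of 257MB on its own; vmalloc=512M leaves headroom even after other users are accounted for.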

Eric Auger (2):
  hw/arm/virt: Add a new 256MB ECAM region
  hw/arm/virt: Add virt-3.0 machine type

 hw/arm/virt-acpi-build.c | 22 ++++++++++++++--------
 hw/arm/virt.c            | 39 ++++++++++++++++++++++++++++++++-------
 include/hw/arm/virt.h    |  3 +++
 3 files changed, 49 insertions(+), 15 deletions(-)

-- 
2.5.5


Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Laszlo Ersek 7 years, 5 months ago
Hi Eric,

On 05/23/18 18:03, Eric Auger wrote:
> Current Machvirt PCI host controller's ECAM region is 16MB large.
> This limits the number of PCIe buses to 16.
>
> PC/Q35 machines have a 256MB region allowing up to 256 buses.
> This series tries to bridge the gap.
>
> It declares a new ECAM region located beyond 256GB, of size 256MB
> (just after the hypothetical new GICv3 RDIST region). The new
> ECAM region is used as soon as the highmem option is set (default)
> and disabled for machines older than 3.0.
>
> Best Regards
>
> Eric
>
> Git: complete series available at
> https://github.com/eauger/qemu/tree/v2.12.0-256MB-ECAM-RFCv1
>
> - Tested with guest running in aarch64 and aarch32 modes (aarch64=off)
> - In aarch32 mode I encountered the issue the vmalloc region may be
>   reported too small for the needs (dmesg excerpt below). So I had to
>   extend the vmalloc size by passing the "vmalloc=512M" option to the
>   bootargs and this eventually booted fine.
>
> [    1.399581] pl061_gpio 9030000.pl061: PL061 GPIO chip @0x0000000009030000 registered
> [    1.402636] OF: PCI: host bridge /pcie@10000000 ranges:
> [    1.404506] OF: PCI:    IO 0x3eff0000..0x3effffff -> 0x00000000
> [    1.406606] OF: PCI:   MEM 0x10000000..0x3efeffff -> 0x10000000
> [    1.408690] OF: PCI:   MEM 0x8000000000..0xffffffffff -> 0x8000000000
> [    1.411992] vmap allocation for size 1052672 failed: use vmalloc=<size> to increase size
> [    1.414895] pci-host-generic 4010000000.pcie: ECAM ioremap failed
> [    1.427472] pci-host-generic: probe of 4010000000.pcie failed with error -12
>
> - Maybe this issue deserves introducing a new highmem_ecam option?

I refer to my earlier email here:

  http://mid.mail-archive.com/13d95529-d61e-fc30-ffd4-f1ef93edad40@redhat.com

This series flips the sole ECAM range that is exposed to the guest to a
large one that is located above 4GB. That's a problem because -- to my
understanding -- it breaks 32-bit ARM UEFI builds, unless you change the
QEMU command line.

(1) Please enable the "firmware repo" from Gerd's site:

https://www.kraxel.org/repos/

(2) Please install the "edk2.git-arm" package.

(3) Please run the 32-bit ARM UEFI firmware, with qemu-system-aarch64,
in a separate directory, as follows (note: TCG only, KVM not needed):

  cp /usr/share/edk2.git/arm/vars-template-pflash.raw vars
  FWBIN=/usr/share/edk2.git/arm/QEMU_EFI-pflash.raw

  qemu-system-aarch64 \
    -nodefaults \
    -no-user-config \
    \
    -M virt \
    -cpu cortex-a15 \
    -m 1024 \
    \
    -drive if=pflash,format=raw,file=$FWBIN,readonly \
    -drive if=pflash,format=raw,file=vars \
    \
    -device virtio-gpu-pci \
    -device qemu-xhci \
    -device usb-kbd \
    \
    -chardev stdio,signal=off,mux=on,id=char0 \
    -mon chardev=char0,mode=readline \
    -serial chardev:char0

This will boot the UEFI shell for you in a graphical window and take
input from the keyboard in that window. A virtio-gpu-pci device is used
as GPU (a PCI Express virtio device) and a USB3.0 keyboard is used as
human input device (the USB3.0 controller is also PCI Express).


I didn't test it, but I expect that this series, when applied as-is,
will break the above use case, unless highmem is explicitly disabled.

I think the first patch is OK (modulo the runaway empty line at the end
of acpi_dsdt_add_pci()), while realizing my review cannot be complete.
:)

Regarding the second patch, I do believe we need "more sophistication"
there. For example, I guess it could be possible to distinguish "-cpu
cortex-a15" from "-cpu cortex-a57" somehow, and stick with the low/small
ECAM in the former case. (The 32-bit firmware already runs on cortex-a15
only, and not on cortex-a57, according to my testing.)

Thanks,
Laszlo

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Auger Eric 7 years, 5 months ago
Hi Laszlo,

On 05/23/2018 07:45 PM, Laszlo Ersek wrote:
> Hi Eric,
> 
> On 05/23/18 18:03, Eric Auger wrote:
>> Current Machvirt PCI host controller's ECAM region is 16MB large.
>> This limits the number of PCIe buses to 16.
>>
>> PC/Q35 machines have a 256MB region allowing up to 256 buses.
>> This series tries to bridge the gap.
>>
>> It declares a new ECAM region located beyond 256GB, of size 256MB
>> (just after the hypothetical new GICv3 RDIST region). The new
>> ECAM region is used as soon as the highmem option is set (default)
>> and disabled for machines older than 3.0.
>>
>> Best Regards
>>
>> Eric
>>
>> Git: complete series available at
>> https://github.com/eauger/qemu/tree/v2.12.0-256MB-ECAM-RFCv1
>>
>> - Tested with guest running in aarch64 and aarch32 modes (aarch64=off)
>> - In aarch32 mode I encountered the issue the vmalloc region may be
>>   reported too small for the needs (dmesg excerpt below). So I had to
>>   extend the vmalloc size by passing the "vmalloc=512M" option to the
>>   bootargs and this eventually booted fine.
>>
>> [    1.399581] pl061_gpio 9030000.pl061: PL061 GPIO chip @0x0000000009030000 registered
>> [    1.402636] OF: PCI: host bridge /pcie@10000000 ranges:
>> [    1.404506] OF: PCI:    IO 0x3eff0000..0x3effffff -> 0x00000000
>> [    1.406606] OF: PCI:   MEM 0x10000000..0x3efeffff -> 0x10000000
>> [    1.408690] OF: PCI:   MEM 0x8000000000..0xffffffffff -> 0x8000000000
>> [    1.411992] vmap allocation for size 1052672 failed: use vmalloc=<size> to increase size
>> [    1.414895] pci-host-generic 4010000000.pcie: ECAM ioremap failed
>> [    1.427472] pci-host-generic: probe of 4010000000.pcie failed with error -12
>>
>> - Maybe this issue deserves introducing a new highmem_ecam option?
> 
> I refer to my earlier email here:
> 
>   http://mid.mail-archive.com/13d95529-d61e-fc30-ffd4-f1ef93edad40@redhat.com
> 
> This series flips the sole ECAM range that is exposed to the guest to a
> large one that is located above 4GB. That's a problem because -- to my
> understanding -- it breaks 32-bit ARM UEFI builds, unless you change the
> QEMU command line.

Thank you for your quick feedback. Indeed, I ran the aarch32 guest
with DT boot. My bad.
> 
> (1) Please enable the "firmware repo" from Gerd's site:
> 
> https://www.kraxel.org/repos/
> 
> (2) Please install the "edk2.git-arm" package.
> 
> (3) Please run the 32-bit ARM UEFI firmware, with qemu-system-aarch64,
> in a separate directory, as follows (note: TCG only, KVM not needed):
> 
>   cp /usr/share/edk2.git/arm/vars-template-pflash.raw vars
>   FWBIN=/usr/share/edk2.git/arm/QEMU_EFI-pflash.raw
> 
>   qemu-system-aarch64 \
>     -nodefaults \
>     -no-user-config \
>     \
>     -M virt \
>     -cpu cortex-a15 \
>     -m 1024 \
>     \
>     -drive if=pflash,format=raw,file=$FWBIN,readonly \
>     -drive if=pflash,format=raw,file=vars \
>     \
>     -device virtio-gpu-pci \
>     -device qemu-xhci \
>     -device usb-kbd \
>     \
>     -chardev stdio,signal=off,mux=on,id=char0 \
>     -mon chardev=char0,mode=readline \
>     -serial chardev:char0
> 
> This will boot the UEFI shell for you in a graphical window and take
> input from the keyboard in that window. A virtio-gpu-pci device is used
> as GPU (a PCI Express virtio device) and a USB3.0 keyboard is used as
> human input device (the USB3.0 controller is also PCI Express).
> 
> 
> I didn't test it, but I expect that this series, when applied as-is,
> will break the above use case, unless highmem is explicitly disabled.

Yes, as you expected, it leads to an exception, whereas it works properly
without the series. I had misunderstood your last email and was
thinking/hoping that with aarch32 LPAE, things would work.


PCI Bus First Scanning

Data Abort Exception PC at 0x7F7E1A06  CPSR 0x40000033 nZcveaifT_svc
/home/jenkins/workspace/edk2/build/edk2-g7cd8a57599/Build/ArmVirtQemu-ARM/DEBUG_GCC5/ARM/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe/DEBUG/PciHostBridgeDxe.dll
loaded at 0x7F7E0000 (PE/COFF offset) 0x1A06 (ELF or Mach-O offset) 0xA06
0x6820       LDR    r0, [r4 #0x0]
  R0 0x00000001   R1 0xFFFFFFFF   R2 0x00000000   R3 0x00000000
  R4 0x10000000   R5 0x7FAC6C28   R6 0x00000001   R7 0x00000004
  R8 0x7FAC6C28   R9 0x00000000  R10 0x00000000  R11 0x00000000
 R12 0x00000000   SP 0x7FAC6B68   LR 0x7F7E19F5   PC 0x7F7E1A06
DFSR 0x00000008 DFAR 0x10000000 IFSR 0x00000000 IFAR 0x00000000
 Precise External Abort: read from 0x10000000

ASSERT [ArmCpuDxe]
/home/jenkins/workspace/edk2/build/edk2-g7cd8a57599/ArmPkg/Library/DefaultExceptionHandlerLib/Arm/DefaultExceptionHandler.c(268):
((BOOLEAN)(0==1))
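A hypothesis worth checking (not confirmed in this thread): the faulting address 0x10000000 (DFAR, R4) is exactly the low 32 bits of the new ECAM base 0x4010000000 seen in the cover letter's dmesg, and it also falls at the start of the low 32-bit MMIO window (0x10000000..0x3efeffff), where nothing would answer a config read. A minimal C illustration of that truncation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What remains of a physical address once it is forced through a
 * 32-bit register or pointer: only bits [31:0] survive. */
static uint32_t truncate_to_32(uint64_t paddr)
{
    return (uint32_t)paddr;
}

/* Inclusive range check, e.g. against the low MMIO window. */
static bool in_window(uint64_t addr, uint64_t first, uint64_t last)
{
    assert(first <= last);
    return addr >= first && addr <= last;
}
```

truncate_to_32(0x4010000000) gives 0x10000000, which in_window() places inside the low MMIO window.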

> 
> I think the first patch is OK (modulo the runaway empty line at the end
> of acpi_dsdt_add_pci()), while realizing my review cannot be complete.
> :)
> 
> Regarding the second patch, I do believe we need "more sophistication"
> there. For example, I guess it could be possible to distinguish "-cpu
> cortex-a15" from "-cpu cortex-a57" somehow, and stick with the low/small
> ECAM in the former case. (The 32-bit firmware already runs on cortex-a15
> only, and not on cortex-a57, according to my testing.)

So we should detect ACPI boot + aarch32 mode and force the legacy
ECAM region in that case, right?

Thanks

Eric
> 
> Thanks,
> Laszlo
> 

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Laszlo Ersek 7 years, 5 months ago
On 05/23/18 22:40, Auger Eric wrote:
> On 05/23/2018 07:45 PM, Laszlo Ersek wrote:

>> Regarding the second patch, I do believe we need "more sophistication"
>> there. For example, I guess it could be possible to distinguish "-cpu
>> cortex-a15" from "-cpu cortex-a57" somehow, and stick with the low/small
>> ECAM in the former case. (The 32-bit firmware already runs on cortex-a15
>> only, and not on cortex-a57, according to my testing.)
> 
> So we should detect we are in ACPI boot  + aarch32 mode to force legacy
> ECAM region, right?

Agree about the aarch32 subcondition.

However, "ACPI vs. DT" is not the right "other" subcondition here;
instead we should (minimally) check "firmware vs. no firmware". See the
"firmware_loaded" boolean field.

I also suggest waiting for feedback from others! :)

Thanks,
Laszlo

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Auger Eric 7 years, 5 months ago
Hi,

On 05/23/2018 10:52 PM, Laszlo Ersek wrote:
> On 05/23/18 22:40, Auger Eric wrote:
>> On 05/23/2018 07:45 PM, Laszlo Ersek wrote:
> 
>>> Regarding the second patch, I do believe we need "more sophistication"
>>> there. For example, I guess it could be possible to distinguish "-cpu
>>> cortex-a15" from "-cpu cortex-a57" somehow, and stick with the low/small
>>> ECAM in the former case. (The 32-bit firmware already runs on cortex-a15
>>> only, and not on cortex-a57, according to my testing.)
>>
>> So we should detect we are in ACPI boot  + aarch32 mode to force legacy
>> ECAM region, right?
> 
> Agree about the aarch32 subcondition.
> 
> However, "ACPI vs. DT" is not the right "other" subcondition here;
> instead we should (minimally) check "firmware vs. no firmware". See the
> "firmware_loaded" boolean field.

OK
> 
> I also suggest waiting for feedback from others! :)

sure ;-)

Eric
> 
> Thanks,
> Laszlo
> 

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Peter Maydell 7 years, 5 months ago
On 23 May 2018 at 21:52, Laszlo Ersek <lersek@redhat.com> wrote:
> On 05/23/18 22:40, Auger Eric wrote:
>> On 05/23/2018 07:45 PM, Laszlo Ersek wrote:
>
>>> Regarding the second patch, I do believe we need "more sophistication"
>>> there. For example, I guess it could be possible to distinguish "-cpu
>>> cortex-a15" from "-cpu cortex-a57" somehow, and stick with the low/small
>>> ECAM in the former case. (The 32-bit firmware already runs on cortex-a15
>>> only, and not on cortex-a57, according to my testing.)
>>
>> So we should detect we are in ACPI boot  + aarch32 mode to force legacy
>> ECAM region, right?
>
> Agree about the aarch32 subcondition.
>
> However, "ACPI vs. DT" is not the right "other" subcondition here;
> instead we should (minimally) check "firmware vs. no firmware". See the
> "firmware_loaded" boolean field.

Won't it also break a guest that is just an aarch32 Linux kernel
without LPAE support, loaded directly rather than via firmware?

thanks
-- PMM

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Laszlo Ersek 7 years, 5 months ago
On 05/24/18 11:11, Peter Maydell wrote:
> On 23 May 2018 at 21:52, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 05/23/18 22:40, Auger Eric wrote:
>>> On 05/23/2018 07:45 PM, Laszlo Ersek wrote:
>>
>>>> Regarding the second patch, I do believe we need "more sophistication"
>>>> there. For example, I guess it could be possible to distinguish "-cpu
>>>> cortex-a15" from "-cpu cortex-a57" somehow, and stick with the low/small
>>>> ECAM in the former case. (The 32-bit firmware already runs on cortex-a15
>>>> only, and not on cortex-a57, according to my testing.)
>>>
>>> So we should detect we are in ACPI boot  + aarch32 mode to force legacy
>>> ECAM region, right?
>>
>> Agree about the aarch32 subcondition.
>>
>> However, "ACPI vs. DT" is not the right "other" subcondition here;
>> instead we should (minimally) check "firmware vs. no firmware". See the
>> "firmware_loaded" boolean field.
> 
> Won't it also break a guest which is just Linux loaded not via
> firmware which is an aarch32 kernel without LPAE support?

Does such a thing exist? (I honestly have no clue.)

If it does, then there are two options:
- don't enable the new ECAM range by default (always take an explicit
option),
- offer both ECAM ranges and let the guest pick one (I should add that I
have no idea whether exposing such *alternatives* is possible via DT and
ACPI; i.e., "pick one but not both").

Thanks
Laszlo

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Peter Maydell 7 years, 5 months ago
On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
> On 05/24/18 11:11, Peter Maydell wrote:
>> Won't it also break a guest which is just Linux loaded not via
>> firmware which is an aarch32 kernel without LPAE support?
>
> Does such a thing exist? (I honestly have no clue.)

Yes, it does; LPAE isn't a mandatory kernel config option.
This is why we have the machine 'highmem' option, so that
we can run on those kernels by not putting anything above
the 4G boundary. Looking back at the history on that, we
opted at the time for "default to highmem on, and if you're
running a non-LPAE kernel you need to turn it off manually".
So we can handle those kernels by just not putting ECAM
above 4G if highmem is false.

thanks
-- PMM

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Auger Eric 7 years, 5 months ago
Hi Peter, Laszlo,

On 05/24/2018 03:07 PM, Peter Maydell wrote:
> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 05/24/18 11:11, Peter Maydell wrote:
>>> Won't it also break a guest which is just Linux loaded not via
>>> firmware which is an aarch32 kernel without LPAE support?
>>
>> Does such a thing exist? (I honestly have no clue.)
> 
> Yes, it does; LPAE isn't a mandatory kernel config option.
> This is why we have the machine 'highmem' option, so that
> we can run on those kernels by not putting anything above
> the 4G boundary. Looking back at the history on that, we
> opted at the time for "default to highmem on, and if you're
> running an non-lpae kernel you need to turn it off manually".
> So we can handle those kernels by just not putting ECAM
> above 4G if highmem is false.

Actually that's what my series does. If highmem=off then we use the
legacy ECAM.

Thanks

Eric
> 
> thanks
> -- PMM
> 

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Laszlo Ersek 7 years, 5 months ago
On 05/24/18 15:07, Peter Maydell wrote:
> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 05/24/18 11:11, Peter Maydell wrote:
>>> Won't it also break a guest which is just Linux loaded not via
>>> firmware which is an aarch32 kernel without LPAE support?
>>
>> Does such a thing exist? (I honestly have no clue.)
> 
> Yes, it does; LPAE isn't a mandatory kernel config option.
> This is why we have the machine 'highmem' option, so that
> we can run on those kernels by not putting anything above
> the 4G boundary. Looking back at the history on that, we
> opted at the time for "default to highmem on, and if you're
> running an non-lpae kernel you need to turn it off manually".

Ah, OK, I didn't know that.

> So we can handle those kernels by just not putting ECAM
> above 4G if highmem is false.

The problem is we can have a combination of 32-bit UEFI firmware (which
certainly lacks LPAE) and a 32-bit kernel which supports LPAE.
Previously, you wouldn't specify highmem=off, and things would just work
-- the firmware would simply ignore the >=4GB MMIO aperture, and use the
32-bit MMIO aperture only (and use the sole 32-bit ECAM). The kernel
could then use both low and high MMIO apertures, however (I gather?).

The difference with "high ECAM" is that it is *moved* (not *added*), so
the 32-bit firmware is left with nothing for config space access. For
booting the same combination as above, you are suddenly forced to add
highmem=off, just to keep the ECAM low -- and that, while it keeps the
firmware happy, prevents the LPAE-capable kernel from using the high
MMIO aperture.

So I think "highmem_ecam" should be computed like this:

  highmem_ecam = highmem_ecam_machtype_default &&
                 highmem &&
                 (!firmware_loaded || aarch64);
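A sketch of this rule as a C predicate, to make the cases discussed in the thread explicit. The struct and its field names are hypothetical stand-ins for the machine-type default, the highmem property, firmware_loaded and the guest's AArch64 state; they are not QEMU's actual data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the inputs named in the proposal. */
struct ecam_inputs {
    bool machtype_default; /* true for virt-3.0 and later          */
    bool highmem;          /* -M virt,highmem=on|off               */
    bool firmware_loaded;  /* UEFI firmware (pflash/-bios) present */
    bool aarch64;          /* guest CPU runs in AArch64 state      */
};

static bool highmem_ecam(const struct ecam_inputs *in)
{
    assert(in != NULL);
    return in->machtype_default &&
           in->highmem &&
           (!in->firmware_loaded || in->aarch64);
}
```

Under this rule 64-bit UEFI guests get the high ECAM, 32-bit UEFI guests keep the low one without needing highmem=off, 32-bit LPAE kernels booted without firmware get the high one, and highmem=off or a pre-3.0 machine type always keeps the legacy region.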

Thanks,
Laszlo

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Auger Eric 7 years, 5 months ago
Hi Laszlo,

On 05/24/2018 03:59 PM, Laszlo Ersek wrote:
> On 05/24/18 15:07, Peter Maydell wrote:
>> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 05/24/18 11:11, Peter Maydell wrote:
>>>> Won't it also break a guest which is just Linux loaded not via
>>>> firmware which is an aarch32 kernel without LPAE support?
>>>
>>> Does such a thing exist? (I honestly have no clue.)
>>
>> Yes, it does; LPAE isn't a mandatory kernel config option.
>> This is why we have the machine 'highmem' option, so that
>> we can run on those kernels by not putting anything above
>> the 4G boundary. Looking back at the history on that, we
>> opted at the time for "default to highmem on, and if you're
>> running an non-lpae kernel you need to turn it off manually".
> 
> Ah, OK, I didn't know that.
> 
>> So we can handle those kernels by just not putting ECAM
>> above 4G if highmem is false.
> 
> The problem is we can have a combination of 32-bit UEFI firmware (which
> certainly lacks LPAE) and a 32-bit kernel which supports LPAE.

Is that the case with the FW you pointed me to? It has no LPAE support?

> Previously, you wouldn't specify highmem=off, and things would just work
> -- the firmware would simply ignore the >=4GB MMIO aperture, and use the
> 32-bit MMIO aperture only (and use the sole 32-bit ECAM). The kernel
> could then use both low and high MMIO apertures, however (I gather?).
>
> The difference with "high ECAM" is that it is *moved* (not *added*), so
> the 32-bit firmware is left with nothing for config space access.
Yes. I don't think it is possible to declare several disjoint ECAM spaces
for a single segment, hence the move.

> For
> booting the same combination as above, you are suddenly forced to add
> highmem=off, just to keep the ECAM low -- and that, while it keeps the
> firmware happy, prevents the LPAE-capable kernel from using the high
> MMIO aperture.
> 
> So I think "highmem_ecam" should be computed like this:
> 
>   highmem_ecam = highmem_ecam_machtype_default &&
>                  highmem &&
>                  (!firmware_loaded || aarch64);

Looks sensible to me

Thanks

Eric
> 
> Thanks,
> Laszlo
> 

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Laszlo Ersek 7 years, 5 months ago
On 05/24/18 16:09, Auger Eric wrote:
> Hi Laszlo,
> 
> On 05/24/2018 03:59 PM, Laszlo Ersek wrote:
>> On 05/24/18 15:07, Peter Maydell wrote:
>>> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> On 05/24/18 11:11, Peter Maydell wrote:
>>>>> Won't it also break a guest which is just Linux loaded not via
>>>>> firmware which is an aarch32 kernel without LPAE support?
>>>>
>>>> Does such a thing exist? (I honestly have no clue.)
>>>
>>> Yes, it does; LPAE isn't a mandatory kernel config option.
>>> This is why we have the machine 'highmem' option, so that
>>> we can run on those kernels by not putting anything above
>>> the 4G boundary. Looking back at the history on that, we
>>> opted at the time for "default to highmem on, and if you're
>>> running an non-lpae kernel you need to turn it off manually".
>>
>> Ah, OK, I didn't know that.
>>
>>> So we can handle those kernels by just not putting ECAM
>>> above 4G if highmem is false.
>>
>> The problem is we can have a combination of 32-bit UEFI firmware (which
>> certainly lacks LPAE) and a 32-bit kernel which supports LPAE.
> 
> Is it what happens with the FW you provided to me? There is no LPAE in it?

That's the case, to my knowledge.

Thanks
Laszlo

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Ard Biesheuvel 7 years, 5 months ago
On 24 May 2018 at 15:59, Laszlo Ersek <lersek@redhat.com> wrote:
> On 05/24/18 15:07, Peter Maydell wrote:
>> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 05/24/18 11:11, Peter Maydell wrote:
>>>> Won't it also break a guest which is just Linux loaded not via
>>>> firmware which is an aarch32 kernel without LPAE support?
>>>
>>> Does such a thing exist? (I honestly have no clue.)
>>
>> Yes, it does; LPAE isn't a mandatory kernel config option.
>> This is why we have the machine 'highmem' option, so that
>> we can run on those kernels by not putting anything above
>> the 4G boundary. Looking back at the history on that, we
>> opted at the time for "default to highmem on, and if you're
>> running an non-lpae kernel you need to turn it off manually".
>
> Ah, OK, I didn't know that.
>
>> So we can handle those kernels by just not putting ECAM
>> above 4G if highmem is false.
>
> The problem is we can have a combination of 32-bit UEFI firmware (which
> certainly lacks LPAE) and a 32-bit kernel which supports LPAE.
> Previously, you wouldn't specify highmem=off, and things would just work
> -- the firmware would simply ignore the >=4GB MMIO aperture, and use the
> 32-bit MMIO aperture only (and use the sole 32-bit ECAM). The kernel
> could then use both low and high MMIO apertures, however (I gather?).
>
> The difference with "high ECAM" is that it is *moved* (not *added*), so
> the 32-bit firmware is left with nothing for config space access. For
> booting the same combination as above, you are suddenly forced to add
> highmem=off, just to keep the ECAM low -- and that, while it keeps the
> firmware happy, prevents the LPAE-capable kernel from using the high
> MMIO aperture.
>
> So I think "highmem_ecam" should be computed like this:
>
>   highmem_ecam = highmem_ecam_machtype_default &&
>                  highmem &&
>                  (!firmware_loaded || aarch64);
>

Given that the firmware is tightly coupled to the platform, we may
decide not to care about ECAM for UEFI itself, and invent a secondary
config space access mechanism that does not consume such a huge amount
of address space. For instance, legacy PCI uses a pair of I/O ports
for this, one to set the address and one to perform the actual read or
write, and we could easily implement something similar (such an
interface is problematic in an SMP context, but we don't care about that
anyway).
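For comparison, this is roughly how the legacy x86 mechanism (PCI Configuration Mechanism #1) encodes the value written to port 0xCF8 before the data access through 0xCFC. Any ARM virt equivalent would be a new, hypothetical interface; this sketch only shows the standard x86 encoding:

```c
#include <assert.h>
#include <stdint.h>

/* Address value for Configuration Mechanism #1: enable bit, then
 * bus/device/function, then a dword-aligned register offset. */
static uint32_t cf8_address(unsigned bus, unsigned dev, unsigned fn,
                            unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return UINT32_C(0x80000000)      /* enable bit             */
         | ((uint32_t)bus << 16)
         | ((uint32_t)dev << 11)
         | ((uint32_t)fn << 8)
         | (reg & 0xFCu);             /* drop the low two bits  */
}
```

The register field is only 8 bits wide, so the standard encoding reaches the first 256 bytes of each function's config space, not the 4KB extended space that ECAM exposes; a virt-specific variant would need a wider address register. Because the address write and the data access are separate steps, two CPUs interleaving them can clobber each other's accesses, which is the SMP caveat mentioned above.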

Just a thought - perhaps we don't care enough about 32-bit to go
through the trouble, but it would be nice if LPAE capable 32-bit
guests could make use of the expanded PCIe config space as well.

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Laszlo Ersek 7 years, 5 months ago
On 05/24/18 16:14, Ard Biesheuvel wrote:
> On 24 May 2018 at 15:59, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 05/24/18 15:07, Peter Maydell wrote:
>>> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> On 05/24/18 11:11, Peter Maydell wrote:
>>>>> Won't it also break a guest which is just Linux loaded not via
>>>>> firmware which is an aarch32 kernel without LPAE support?
>>>>
>>>> Does such a thing exist? (I honestly have no clue.)
>>>
>>> Yes, it does; LPAE isn't a mandatory kernel config option.
>>> This is why we have the machine 'highmem' option, so that
>>> we can run on those kernels by not putting anything above
>>> the 4G boundary. Looking back at the history on that, we
>>> opted at the time for "default to highmem on, and if you're
>>> running an non-lpae kernel you need to turn it off manually".
>>
>> Ah, OK, I didn't know that.
>>
>>> So we can handle those kernels by just not putting ECAM
>>> above 4G if highmem is false.
>>
>> The problem is we can have a combination of 32-bit UEFI firmware (which
>> certainly lacks LPAE) and a 32-bit kernel which supports LPAE.
>> Previously, you wouldn't specify highmem=off, and things would just work
>> -- the firmware would simply ignore the >=4GB MMIO aperture, and use the
>> 32-bit MMIO aperture only (and use the sole 32-bit ECAM). The kernel
>> could then use both low and high MMIO apertures, however (I gather?).
>>
>> The difference with "high ECAM" is that it is *moved* (not *added*), so
>> the 32-bit firmware is left with nothing for config space access. For
>> booting the same combination as above, you are suddenly forced to add
>> highmem=off, just to keep the ECAM low -- and that, while it keeps the
>> firmware happy, prevents the LPAE-capable kernel from using the high
>> MMIO aperture.
>>
>> So I think "highmem_ecam" should be computed like this:
>>
>>   highmem_ecam = highmem_ecam_machtype_default &&
>>                  highmem &&
>>                  (!firmware_loaded || aarch64);
>>
> 
> Given that the firmware is tightly coupled to the platform, we may
> decide not to care about ECAM for UEFI itself, and invent a secondary
> config space access mechanism that does not consume such a huge amount
> of address space. For instance, legacy PCI uses a pair of I/O ports
> for this, one to set the address and one to perform the actual read or
> write, and we could easily implement something similar (such an
> interface is problematic in SMP context but we don't care about that
> anyway)
> 
> Just a thought - perhaps we don't care enough about 32-bit to go
> through the trouble, but it would be nice if LPAE capable 32-bit
> guests could make use of the expanded PCIe config space as well.

Under the above proposal, they could, they'd just have to be launched
without firmware:

  highmem_ecam_machtype_default = true;
  highmem = true;
  firmware_loaded = false;
  aarch64 = false;

  highmem_ecam = true &&
                 true &&
                 (!false || false);

I consider a return to the 0xCF8/0xCFC pattern regressive; I'd rather
restrict the large/high ECAM feature to 64-bit guests (with or without
firmware), and to 32-bit LPAE kernels that are launched without firmware
(which, I think, has been the case for most of their history).

Personally I don't have a stake in 32-bit ARM, so do take my opinion
with a grain of salt. Wearing my upstream ArmVirtQemu co-maintainer hat,
my sole 32-bit interest is in keeping command lines working, *if* they
once worked. Not extending new QEMU features to 32-bit firmware is fine
with me -- in fact I would value that over seeing more quirky firmware
code just for 32-bit's sake.

Side topic: the last subcondition basically says, "IF we use firmware
THEN the VM had better be 64-bit". This is a "logical implication":
A-->B. The C language doesn't have an "implication operator", so I
rewrote it equivalently with the logical negation and logical OR
operators: A-->B is equivalent to (!A || B). (If A is true, then B must
hold; if A is false, then B doesn't matter.)
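Since there are only four input combinations, the equivalence can be checked exhaustively. A small C sketch comparing the case-by-case reading of the implication with the (!A || B) form:

```c
#include <assert.h>
#include <stdbool.h>

/* "IF a THEN b", spelled out case by case. */
static bool implies_by_cases(bool a, bool b)
{
    if (a)
        return b;   /* a holds: b must hold too    */
    return true;    /* a is false: b is irrelevant */
}

/* The form used in the highmem_ecam expression. */
static bool implies(bool a, bool b)
{
    return !a || b;
}

/* Checks both forms agree over all four combinations of a and b. */
static void check_implication_forms(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            assert(implies_by_cases(a, b) == implies(a, b));
}
```

The only combination that yields false is a = true, b = false, i.e. "firmware loaded, but the guest is not 64-bit".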

Thanks,
Laszlo

Re: [Qemu-devel] [RFC 0/2] ARM virt: Support up to 256 PCIe buses
Posted by Auger Eric 7 years, 5 months ago

On 05/24/2018 07:20 PM, Laszlo Ersek wrote:
> On 05/24/18 16:14, Ard Biesheuvel wrote:
>> On 24 May 2018 at 15:59, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 05/24/18 15:07, Peter Maydell wrote:
>>>> On 24 May 2018 at 13:59, Laszlo Ersek <lersek@redhat.com> wrote:
>>>>> On 05/24/18 11:11, Peter Maydell wrote:
>>>>>> Won't it also break a guest which is just Linux loaded not via
>>>>>> firmware which is an aarch32 kernel without LPAE support?
>>>>>
>>>>> Does such a thing exist? (I honestly have no clue.)
>>>>
>>>> Yes, it does; LPAE isn't a mandatory kernel config option.
>>>> This is why we have the machine 'highmem' option, so that
>>>> we can run on those kernels by not putting anything above
>>>> the 4G boundary. Looking back at the history on that, we
>>>> opted at the time for "default to highmem on, and if you're
>>>> running an non-lpae kernel you need to turn it off manually".
>>>
>>> Ah, OK, I didn't know that.
>>>
>>>> So we can handle those kernels by just not putting ECAM
>>>> above 4G if highmem is false.
>>>
>>> The problem is we can have a combination of 32-bit UEFI firmware (which
>>> certainly lacks LPAE) and a 32-bit kernel which supports LPAE.
>>> Previously, you wouldn't specify highmem=off, and things would just work
>>> -- the firmware would simply ignore the >=4GB MMIO aperture, and use the
>>> 32-bit MMIO aperture only (and use the sole 32-bit ECAM). The kernel
>>> could then use both low and high MMIO apertures, however (I gather?).
>>>
>>> The difference with "high ECAM" is that it is *moved* (not *added*), so
>>> the 32-bit firmware is left with nothing for config space access. For
>>> booting the same combination as above, you are suddenly forced to add
>>> highmem=off, just to keep the ECAM low -- and that, while it keeps the
>>> firmware happy, prevents the LPAE-capable kernel from using the high
>>> MMIO aperture.
>>>
>>> So I think "highmem_ecam" should be computed like this:
>>>
>>>   highmem_ecam = highmem_ecam_machtype_default &&
>>>                  highmem &&
>>>                  (!firmware_loaded || aarch64);
>>>
>>
>> Given that the firmware is tightly coupled to the platform, we may
>> decide not to care about ECAM for UEFI itself, and invent a secondary
>> config space access mechanism that does not consume such a huge amount
>> of address space. For instance, legacy PCI uses a pair of I/O ports
>> for this, one to set the address and one to perform the actual read or
>> write, and we could easily implement something similar (such an
>> interface is problematic in SMP context but we don't care about that
>> anyway)
>>
>> Just a thought - perhaps we don't care enough about 32-bit to go
>> through the trouble, but it would be nice if LPAE capable 32-bit
>> guests could make use of the expanded PCIe config space as well.
> 
> Under the above proposal, they could, they'd just have to be launched
> without firmware:
> 
>   highmem_ecam_machtype_default = true;
>   highmem = true;
>   firmware_loaded = false;
>   aarch64 = false;
> 
>   highmem_ecam = true &&
>                  true &&
>                  (!false || false);

I think we mostly care about improving the 64-bit guest experience here,
so personally I am fine with your proposal.

There is also the vmalloc shortage issue, so far hit with aarch32 guests
only (which I reported at the end of the cover letter). It can prevent
some existing guest configurations (even without FW) from booting with
the new high ECAM region, whereas they booted before. I don't know
whether this is acceptable.

Thanks

Eric
> 
> I see a return to the 0xCF8/0xCFC pattern regressive; I'd rather
> restrict the large/high ECAM feature to 64-bit guests (with or without
> firmware), and to 32-bit LPAE kernels that are launched without firmware
> (which, I think, has been the case for most of their history).
> 
> Personally I don't have a stake in 32-bit ARM, so do take my opinion
> with a grain of salt. Wearing my upstream ArmVirtQemu co-maintainer hat,
> my sole 32-bit interest is in keeping command lines working, *if* they
> once worked. Not extending new QEMU features to 32-bit firmware is fine
> with me -- in fact I would value that over seeing more quirky firmware
> code just for 32-bit's sake.
> 
> Side topic: the last subcondition basically says, "IF we use firmware
> THEN the VM had better be 64-bit". This is a "logical implication":
> A-->B. The C language doesn't have an "implication operator", so I
> rewrote it equivalently with the logical negation and logical OR
> operators: A-->B is equivalent to (!A || B). (If A is true, then B must
> hold; if A is false, then B doesn't matter.)
> 
> Thanks,
> Laszlo
>