This series aims to bump the 255GB RAM limit in machvirt and to
support device memory in general, and especially PCDIMM/NVDIMM.

In machvirt versions < 4.0, the initial RAM starts at 1GB and can
grow up to 255GB. From 256GB onwards we find IO regions such as the
additional GICv3 RDIST region, the high PCIe ECAM region and the high
PCIe MMIO region. The address map was 1TB large. This corresponded to
the max IPA capacity KVM was able to manage.

Since Linux 4.20, the host kernel is able to support a larger and
dynamic IPA range, so the guest physical address can go beyond 1TB.
The max GPA size depends on the host kernel configuration and the
physical CPUs.

In this series we use this feature and allow the RAM to grow without
any limit other than the one imposed by the host kernel.

The RAM still starts at 1GB. First comes the initial RAM (-m) of size
ram_size and then comes the device memory (,maxmem) of size
maxram_size - ram_size. The device memory is potentially hotpluggable,
depending on the instantiated memory objects.

IO regions previously located between 256GB and 1TB are moved after
the RAM. Their offset is dynamically computed and depends on ram_size
and maxram_size. Size alignment is enforced.

In case the maxmem value is less than 255GB, the legacy memory map is
still used. The change of memory map becomes effective from 4.0
onwards.

As we keep the initial RAM at the 1GB base address, we do not need to
make invasive changes in the EDK2 FW. It seems nobody is eager to do
that job at the moment.

Device memory being put just after the initial RAM, it is possible to
get access to this feature while keeping a 1TB address map.

This series reuses/rebases patches initially submitted by Shameer
in [0] and Kwangwoo in [1] for the PC-DIMM and NV-DIMM parts.

Functionally, the series is split into 3 parts:
1) bump of the initial RAM limit [1 - 9] and change in
   the memory map
2) Support of PC-DIMM [10 - 13]
3) Support of NV-DIMM [14 - 17]

1) can be upstreamed before 2) and 2) can be upstreamed before 3).

Work is ongoing to transform the whole memory into device memory.
However this move is not trivial and, to me, is independent of the
improvements brought by this series:
- if we were to use DIMMs for the initial RAM, those DIMMs would use
  slots. Although they would not be part of the ones provided using
  the ",slots" option, they are limited ACPI resources.
- the DT and ACPI descriptions need to be reworked
- NUMA integration needs special care
- a special device memory object may be required to avoid consuming
  slots and to ease the FW description.

So I preferred to separate the concerns. This new implementation
based on device memory could be a candidate for another virt
version.

Best Regards

Eric

References:

[0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
    http://patchwork.ozlabs.org/cover/914694/

[1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
    https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html

This series can be found at:
https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7

History:

v6 -> v7:
- Addressed Peter's and Igor's comments (exceptions sent by email)
- Fixed the TCG case. Now device memory also works for TCG and the
  vcpu pamax is checked
- See individual logs for more details

v5 -> v6:
- mingw compilation issue fix
- kvm_arm_get_max_vm_phys_shift always returns the number of supported
  IPA bits
- new patch "hw/arm/virt: Rename highmem IO regions" that eases the
  review of "hw/arm/virt: Split the memory map description"
- "hw/arm/virt: Move memory map initialization into machvirt_init"
  squashed into the previous patch
- change the alignment of IO regions beyond the RAM so that it matches
  their size

v4 -> v5:
- change in the memory map
- see individual logs

v3 -> v4:
- rebase on David's "pc-dimm: next bunch of cleanups" and
  "pc-dimm: pre_plug "slot" and "addr" assignment"
- kvm-type option not used anymore. We directly use the maxram_size
  and ram_size machine fields to compute the max IPA range. Migration
  is naturally handled as CLI options are kept between source and
  destination. This was suggested by David.
- device_memory_start and device_memory_size not stored anymore in
  vms->bootinfo
- I did not take into account 2 of Igor's comments: the one related to
  the refactoring of arm_load_dtb and the one related to the
  generation of the dtb after system_reset, which would contain nodes
  of hotplugged devices (we do not support hotplug at this stage)
- check that the end-user does not attempt to hotplug a device
- addition of "vl: Set machine ram_size, maxram_size and ram_slots
  earlier"

v2 -> v3:
- fix pc_q35 and pc_piix compilation error
- Kwangwoo's email address being no longer valid, removed it

v1 -> v2:
- kvm_get_max_vm_phys_shift moved into an arch specific file
- addition of the NVDIMM part
- single series
- rebase on David's refactoring

v1:
- was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
- was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"

Best Regards

Eric


Eric Auger (12):
  hw/arm/virt: Rename highmem IO regions
  hw/arm/virt: Split the memory map description
  hw/boards: Add a MachineState parameter to kvm_type callback
  kvm: add kvm_arm_get_max_vm_ipa_size
  vl: Set machine ram_size, maxram_size and ram_slots earlier
  hw/arm/virt: Dynamic memory map depending on RAM requirements
  hw/arm/virt: Implement kvm_type function for 4.0 machine
  hw/arm/virt: Bump the 255GB initial RAM limit
  hw/arm/virt: Add memory hotplug framework
  hw/arm/virt: Allocate device_memory
  hw/arm/boot: Expose the pmem nodes in the DT
  hw/arm/virt: Add nvdimm and nvdimm-persistence options

Kwangwoo Lee (2):
  nvdimm: use configurable ACPI IO base and size
  hw/arm/virt: Add nvdimm hot-plug infrastructure

Shameer Kolothum (3):
  hw/arm/boot: introduce fdt_add_memory_node helper
  hw/arm/boot: Expose the PC-DIMM nodes in the DT
  hw/arm/virt-acpi-build: Add PC-DIMM in SRAT

 accel/kvm/kvm-all.c             |   2 +-
 default-configs/arm-softmmu.mak |   4 +
 hw/acpi/nvdimm.c                |  31 ++-
 hw/arm/boot.c                   | 136 ++++++++++--
 hw/arm/virt-acpi-build.c        |  23 +-
 hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
 hw/i386/pc_piix.c               |   6 +-
 hw/i386/pc_q35.c                |   6 +-
 hw/ppc/mac_newworld.c           |   3 +-
 hw/ppc/mac_oldworld.c           |   2 +-
 hw/ppc/spapr.c                  |   2 +-
 include/hw/arm/virt.h           |  24 ++-
 include/hw/boards.h             |   5 +-
 include/hw/mem/nvdimm.h         |   4 +
 target/arm/kvm.c                |  10 +
 target/arm/kvm_arm.h            |  13 ++
 vl.c                            |   6 +-
 17 files changed, 556 insertions(+), 85 deletions(-)

-- 
2.20.1
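[Editor's note] As an illustration of the usage this series enables, a guest with more than 255GB of initial RAM plus a cold-plugged PC-DIMM and NVDIMM could be started along the following lines. This is only a sketch: the virt-4.0 machine type and the nvdimm=on machine option are taken from this series and may differ once merged, and the backend sizes, paths and image names are placeholders.

  # 256G of initial RAM (beyond the old 255GB limit), room for device
  # memory up to 512G, one cold-plugged PC-DIMM and one NVDIMM
  qemu-system-aarch64 \
      -M virt-4.0,gic-version=3,nvdimm=on \
      -cpu host -enable-kvm -nographic \
      -m 256G,maxmem=512G,slots=4 \
      -object memory-backend-ram,id=dimm-mem0,size=16G \
      -device pc-dimm,id=dimm0,memdev=dimm-mem0 \
      -object memory-backend-file,id=nv-mem0,share=on,mem-path=/tmp/nvdimm0,size=16G \
      -device nvdimm,id=nvdimm0,memdev=nv-mem0 \
      -drive if=virtio,file=guest.img,format=qcow2 \
      -bios QEMU_EFI.fd

With this layout the initial RAM sits at 1GB, the two DIMMs land in the device memory region right above it, and the high IO regions float above maxram_size, as described in the cover letter.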
Hi Peter,

On 2/20/19 11:39 PM, Eric Auger wrote:
> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
[...]
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>    the memory map

I respun the whole series including the PCDIMM and NVDIMM parts, as
Igor did a first review pass on the latter. However the first objective
is to get [1 - 9] upstreamed, as we discussed earlier. So please
consider those patches independently of the others.

Thanks

Eric

> 2) Support of PC-DIMM [10 - 13]
> 3) Support of NV-DIMM [14 - 17]
[...]
On Wed, 20 Feb 2019 23:39:46 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
[...]
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>    the memory map

> 2) Support of PC-DIMM [10 - 13]
Is this part complete ACPI wise (for coldplug)? I haven't noticed any
DSDT AML nor E820 changes here, so ACPI wise pc-dimm shouldn't be
visible to the guest. It might be that DT is masking the problem but,
well, that won't work on ACPI-only guests.

Even though I've tried to make the memory hotplug ACPI parts not x86
specific, I'm afraid they might be tightly coupled with hotplug
support. So here are 2 options: make the DSDT part work without
hotplug, or implement hotplug here. I think the former is just a waste
of time and we should just add hotplug. It should take relatively minor
effort since you already implemented most of the boilerplate here.

As for how to implement the ACPI HW part, I suggest borrowing the GED
device that the NEMU guys are trying to use instead of the GPIO route,
like we do now for ACPI_POWER_BUTTON_DEVICE to deliver the event. That
way it would be easier to share this with their virt-x86 machine
eventually.

> 3) Support of NV-DIMM [14 - 17]
The same might be true for NUMA but I haven't dug that deep into that
part.
[...]
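[Editor's note] To exercise the ACPI-only case raised above, one option (a sketch; it assumes an arm64 guest kernel built with ACPI support) is to boot the guest with acpi=force on its kernel command line, so the DT description is ignored, and then check what the guest actually reports:

  # inside the guest, booted with "acpi=force" on the kernel command line
  grep MemTotal /proc/meminfo   # if the DIMM size is missing, the ACPI description is incomplete
  lshw -class memory            # lists the memory devices the guest discovered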
Hi Igor,

On 2/22/19 5:27 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:46 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
[...]
>> 2) Support of PC-DIMM [10 - 13]
> Is this part complete ACPI wise (for coldplug)? I haven't noticed any
> DSDT AML nor E820 changes here, so ACPI wise pc-dimm shouldn't be
> visible to the guest. It might be that DT is masking the problem but,
> well, that won't work on ACPI-only guests.

The guest's /proc/meminfo or "lshw -class memory" reflects the amount
of memory added with the DIMM slots, so it looks fine to me. Isn't E820
a pure x86 matter? What else would you expect in the DSDT? I understand
hotplug would require extra modifications, but I don't see anything
else missing for coldplug.

> Even though I've tried to make the memory hotplug ACPI parts not x86
> specific, I'm afraid they might be tightly coupled with hotplug
> support. So here are 2 options: make the DSDT part work without
> hotplug, or implement hotplug here. I think the former is just a waste
> of time and we should just add hotplug. It should take relatively
> minor effort since you already implemented most of the boilerplate
> here.

Shameer sent an RFC series for supporting hotplug:

[RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
https://patchwork.kernel.org/cover/10783589/

I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
OK, even after system_reset. Note that the hotplug kernel support on
ARM is very recent. I would prefer to dissociate both efforts if we
want to get a chance of making coldplug for 4.0. Also, we have an issue
with NVDIMM since on reboot the guest does not boot properly.

> As for how to implement the ACPI HW part, I suggest borrowing the GED
> device that the NEMU guys are trying to use instead of the GPIO route,
> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver the event. That
> way it would be easier to share this with their virt-x86 machine
> eventually.

Sounds like a different approach than the one initiated by Shameer?

Thanks

Eric
[...]
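[Editor's note] For reference, the PCDIMM hotplug test mentioned above is typically driven from the QEMU monitor along these lines (a sketch; the ids are made up, and the machine is assumed to have been started with spare maxmem and slots as in the cover letter):

  (qemu) object_add memory-backend-ram,id=hp-mem0,size=16G
  (qemu) device_add pc-dimm,id=hp-dimm0,memdev=hp-mem0
  (qemu) info memory-devices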
On Fri, 22 Feb 2019 18:35:26 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
>
> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:46 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
[...]
> >> 2) Support of PC-DIMM [10 - 13]
> > Is this part complete ACPI wise (for coldplug)? I haven't noticed
> > any DSDT AML nor E820 changes here, so ACPI wise pc-dimm shouldn't
> > be visible to the guest. It might be that DT is masking the problem
> > but, well, that won't work on ACPI-only guests.
>
> The guest's /proc/meminfo or "lshw -class memory" reflects the amount
> of memory added with the DIMM slots,

The question is how it gets there. Does it come from DT or from
firmware via UEFI interfaces?

> so it looks fine to me. Isn't E820 a pure x86 matter?

Sorry for misleading, I meant the UEFI GetMemoryMap(). On x86 I'm wary
of adding PC-DIMMs to E820, which then gets exposed via UEFI
GetMemoryMap(), as the guest kernel might start using it as normal
memory early at boot, later put that memory into zone normal, and hence
make it non-hot-un-pluggable. The same concerns apply to DT based means
of discovery.
(That's a guest issue, but it's easy to work around by not putting
hotpluggable memory into UEFI GetMemoryMap() or DT and letting the DSDT
describe it properly.) That way the memory doesn't get (ab)used by
firmware or early boot kernel stages and doesn't get locked up.

> What else would you expect in the DSDT?

Memory device descriptions; look for the code that adds PNP0C80 devices
with a _CRS describing the memory ranges.

> I understand hotplug would require extra modifications, but I don't
> see anything else missing for coldplug.
[...]
> Shameer sent an RFC series for supporting hotplug:
>
> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> https://patchwork.kernel.org/cover/10783589/
>
> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
> OK, even after system_reset. Note that the hotplug kernel support on
> ARM is very recent. I would prefer to dissociate both efforts if we
> want to get a chance of making coldplug for 4.0. Also, we have an
> issue with NVDIMM since on reboot the guest does not boot properly.

I guess we can merge an implementation that works on some kernel
configs [DT based, I'd guess], and add the ACPI part later. Though that
will be a bit of a mess as we do not version firmware parts (ACPI
tables).

> > As for how to implement the ACPI HW part, I suggest borrowing the
> > GED device that the NEMU guys are trying to use instead of the GPIO
> > route, like we do now for ACPI_POWER_BUTTON_DEVICE to deliver the
> > event. That way it would be easier to share this with their virt-x86
> > machine eventually.
> Sounds like a different approach than the one initiated by Shameer?

ARM boards were the first to use the ACPI hw-reduced profile, so they
picked up the GPIO based way of delivering the hotplug event that was
available back then. Later the spec introduced the Generic Event Device
for that purpose with the hw-reduced profile, which NEMU implemented
[1], so I'd use that rather than the ad-hoc GPIO mapping. I'd guess it
will be more compatible with various contemporary guests, and we could
reuse the same code for both the x86 and arm virt boards.

1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c
[...]
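[Editor's note] A quick way to check from inside a running guest what is being described here (a sketch; it assumes the acpica-tools package, which provides acpidump and iasl, is installed in the guest) is to dump and disassemble the DSDT and look for PNP0C80 devices and their _CRS memory ranges:

  # inside the guest, as root
  acpidump -b                       # dumps the ACPI tables as dsdt.dat, facp.dat, ...
  iasl -d dsdt.dat                  # disassembles the DSDT into dsdt.dsl
  grep -n -A8 'PNP0C80' dsdt.dsl    # memory device declarations and their _CRS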
Hi Igor,

> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: 25 February 2019 09:42
> To: Auger Eric <eric.auger@redhat.com>
> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM
> expansion and PCDIMM/NVDIMM support
>
> On Fri, 22 Feb 2019 18:35:26 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
[...]
> ARM boards were the first to use the ACPI hw-reduced profile, so they
> picked up the GPIO based way of delivering the hotplug event that was
> available back then. Later the spec introduced the Generic Event
> Device for that purpose with the hw-reduced profile, which NEMU
> implemented [1], so I'd use that rather than the ad-hoc GPIO mapping.
> I'd guess it will be more compatible with various contemporary guests,
> and we could reuse the same code for both the x86 and arm virt boards.
>
> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

Thanks for this suggestion. It looks like that is definitely a better
way of handling ACPI events. I will take a look and use GED for the
next revision of the hotplug series.

Thanks,
Shameer
[...]
Hi Igor,

On 2/25/19 10:42 AM, Igor Mammedov wrote:
> On Fri, 22 Feb 2019 18:35:26 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
[...]
>> What else would you expect in the DSDT?
> Memory device descriptions; look for the code that adds PNP0C80
> devices with a _CRS describing the memory ranges.

OK, thank you for the explanations. I will work on the PNP0C80 addition
then. Does it mean that in ACPI mode we must not output the DT hotplug
memory nodes, or, assuming PNP0C80 is properly described, will it
"override" the DT description?

[...]
> ARM boards were the first to use the ACPI hw-reduced profile, so they
> picked up the GPIO based way of delivering the hotplug event that was
> available back then. Later the spec introduced the Generic Event
> Device for that purpose with the hw-reduced profile, which NEMU
> implemented [1], so I'd use that rather than the ad-hoc GPIO mapping.
> I'd guess it will be more compatible with various contemporary guests,
> and we could reuse the same code for both the x86 and arm virt boards.
>
> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

That's really helpful for the ARM hotplug work. Thanks!

Eric
[...]
>>>> >>>> Best Regards >>>> >>>> Eric >>>> >>>> References: >>>> >>>> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions >>>> http://patchwork.ozlabs.org/cover/914694/ >>>> >>>> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform >>>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html >>>> >>>> This series can be found at: >>>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7 >>>> >>>> History: >>>> >>>> v6 -> v7: >>>> - Addressed Peter and Igor comments (exceptions sent my email) >>>> - Fixed TCG case. Now device memory works also for TCG and vcpu >>>> pamax is checked >>>> - See individual logs for more details >>>> >>>> v5 -> v6: >>>> - mingw compilation issue fix >>>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported >>>> IPA bits >>>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review >>>> of "hw/arm/virt: Split the memory map description" >>>> - "hw/arm/virt: Move memory map initialization into machvirt_init" >>>> squashed into the previous patch >>>> - change alignment of IO regions beyond the RAM so that it matches their >>>> size >>>> >>>> v4 -> v5: >>>> - change in the memory map >>>> - see individual logs >>>> >>>> v3 -> v4: >>>> - rebase on David's "pc-dimm: next bunch of cleanups" and >>>> "pc-dimm: pre_plug "slot" and "addr" assignment" >>>> - kvm-type option not used anymore. We directly use >>>> maxram_size and ram_size machine fields to compute the >>>> MAX IPA range. Migration is naturally handled as CLI >>>> option are kept between source and destination. This was >>>> suggested by David. >>>> - device_memory_start and device_memory_size not stored >>>> anymore in vms->bootinfo >>>> - I did not take into account 2 Igor's comments: the one >>>> related to the refactoring of arm_load_dtb and the one >>>> related to the generation of the dtb after system_reset >>>> which would contain nodes of hotplugged devices (we do >>>> not support hotplug at this stage) >>>> - check the end-user does not attempt to hotplug a device >>>> - addition of "vl: Set machine ram_size, maxram_size and >>>> ram_slots earlier" >>>> >>>> v2 -> v3: >>>> - fix pc_q35 and pc_piix compilation error >>>> - kwangwoo's email being not valid anymore, remove his address >>>> >>>> v1 -> v2: >>>> - kvm_get_max_vm_phys_shift moved in arch specific file >>>> - addition of NVDIMM part >>>> - single series >>>> - rebase on David's refactoring >>>> >>>> v1: >>>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size" >>>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB" >>>> >>>> Best Regards >>>> >>>> Eric >>>> >>>> >>>> Eric Auger (12): >>>> hw/arm/virt: Rename highmem IO regions >>>> hw/arm/virt: Split the memory map description >>>> hw/boards: Add a MachineState parameter to kvm_type callback >>>> kvm: add kvm_arm_get_max_vm_ipa_size >>>> vl: Set machine ram_size, maxram_size and ram_slots earlier >>>> hw/arm/virt: Dynamic memory map depending on RAM requirements >>>> hw/arm/virt: Implement kvm_type function for 4.0 machine >>>> hw/arm/virt: Bump the 255GB initial RAM limit >>>> hw/arm/virt: Add memory hotplug framework >>>> hw/arm/virt: Allocate device_memory >>>> hw/arm/boot: Expose the pmem nodes in the DT >>>> hw/arm/virt: Add nvdimm and nvdimm-persistence options >>>> >>>> Kwangwoo Lee (2): >>>> nvdimm: use configurable ACPI IO base and size >>>> hw/arm/virt: Add nvdimm hot-plug infrastructure >>>> >>>> Shameer Kolothum (3): >>>> hw/arm/boot: introduce fdt_add_memory_node helper >>>> hw/arm/boot: Expose the 
PC-DIMM nodes in the DT >>>> hw/arm/virt-acpi-build: Add PC-DIMM in SRAT >>>> >>>> accel/kvm/kvm-all.c | 2 +- >>>> default-configs/arm-softmmu.mak | 4 + >>>> hw/acpi/nvdimm.c | 31 ++- >>>> hw/arm/boot.c | 136 ++++++++++-- >>>> hw/arm/virt-acpi-build.c | 23 +- >>>> hw/arm/virt.c | 364 ++++++++++++++++++++++++++++---- >>>> hw/i386/pc_piix.c | 6 +- >>>> hw/i386/pc_q35.c | 6 +- >>>> hw/ppc/mac_newworld.c | 3 +- >>>> hw/ppc/mac_oldworld.c | 2 +- >>>> hw/ppc/spapr.c | 2 +- >>>> include/hw/arm/virt.h | 24 ++- >>>> include/hw/boards.h | 5 +- >>>> include/hw/mem/nvdimm.h | 4 + >>>> target/arm/kvm.c | 10 + >>>> target/arm/kvm_arm.h | 13 ++ >>>> vl.c | 6 +- >>>> 17 files changed, 556 insertions(+), 85 deletions(-) >>>> >>> >>> >
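The "PNP0C80 with _CRS" Igor refers to is the ACPI Memory Device object: one Device() per DIMM in the DSDT, identified by _HID PNP0C80, whose _CRS describes the address range the DIMM occupies. Below is a minimal sketch, written with QEMU's AML builder, of what a purely static (cold-plug only) description could look like; the device name and the base/size arguments are placeholders, and the support discussed later in the thread goes through the shared build_memory_hotplug_aml() code rather than open-coding one device per slot.

/*
 * Minimal sketch (not the series' code): a static ACPI Memory Device for
 * one cold-plugged DIMM, built with QEMU's AML helpers.  "MEM0" and the
 * base/size values are placeholders.
 */
#include "qemu/osdep.h"
#include "hw/acpi/aml-build.h"

static void add_static_memory_device(Aml *dsdt, uint64_t base, uint64_t size)
{
    Aml *dev = aml_device("MEM0");              /* placeholder name */
    Aml *crs = aml_resource_template();

    /* PNP0C80 is the ACPI ID for a Memory Device */
    aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C80")));

    /* _CRS tells the guest which address range this DIMM occupies */
    aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                                     AML_MAX_FIXED, AML_CACHEABLE,
                                     AML_READ_WRITE,
                                     0, base, base + size - 1, 0, size));
    aml_append(dev, aml_name_decl("_CRS", crs));

    aml_append(dsdt, dev);
}

A hotplug-capable description additionally needs _STS and the Notify() plumbing, which is what the shared memory-hotplug AML discussed further down in the thread provides.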
Hi Igor, On 2/26/19 9:40 AM, Auger Eric wrote: > Hi Igor, > > On 2/25/19 10:42 AM, Igor Mammedov wrote: >> On Fri, 22 Feb 2019 18:35:26 +0100 >> Auger Eric <eric.auger@redhat.com> wrote: >> >>> Hi Igor, >>> >>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>> Eric Auger <eric.auger@redhat.com> wrote: >>>> >>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>> support device memory in general, and especially PCDIMM/NVDIMM. >>>>> >>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the >>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe >>>>> MMIO region. The address map was 1TB large. This corresponded to >>>>> the max IPA capacity KVM was able to manage. >>>>> >>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>> IPA range. So the guest physical address can go beyond the 1TB. The >>>>> max GPA size depends on the host kernel configuration and physical CPUs. >>>>> >>>>> In this series we use this feature and allow the RAM to grow without >>>>> any other limit than the one put by the host kernel. >>>>> >>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>> ram_size and then comes the device memory (,maxmem) of size >>>>> maxram_size - ram_size. The device memory is potentially hotpluggable >>>>> depending on the instantiated memory objects. >>>>> >>>>> IO regions previously located between 256GB and 1TB are moved after >>>>> the RAM. Their offset is dynamically computed, depends on ram_size >>>>> and maxram_size. Size alignment is enforced. >>>>> >>>>> In case maxmem value is inferior to 255GB, the legacy memory map >>>>> still is used. The change of memory map becomes effective from 4.0 >>>>> onwards. >>>>> >>>>> As we keep the initial RAM at 1GB base address, we do not need to do >>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>> that job at the moment. >>>>> >>>>> Device memory being put just after the initial RAM, it is possible >>>>> to get access to this feature while keeping a 1TB address map. >>>>> >>>>> This series reuses/rebases patches initially submitted by Shameer >>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>> >>>>> Functionally, the series is split into 3 parts: >>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>> the memory map >>>> >>>>> 2) Support of PC-DIMM [10 - 13] >>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>> visible to the guest. It might be that DT is masking problem >>>> but well, that won't work on ACPI only guests. >>> >>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem >>> added with the DIMM slots. >> Question is how does it get there? Does it come from DT or from firmware >> via UEFI interfaces? >> >>> So it looks fine to me. Isn't E820 a pure x86 matter? >> sorry for misleading, I've meant is UEFI GetMemoryMap(). >> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >> via UEFI GetMemoryMap() as guest kernel might start using it as normal >> memory early at boot and later put that memory into zone normal and hence >> make it non-hot-un-pluggable. The same concerns apply to DT based means >> of discovery. 
>> (That's guest issue but it's easy to workaround it not putting hotpluggable >> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) >> That way memory doesn't get (ab)used by firmware or early boot kernel stages >> and doesn't get locked up. >> >>> What else would you expect in the dsdt? >> Memory device descriptions, look for code that adds PNP0C80 with _CRS >> describing memory ranges > > OK thank you for the explanations. I will work on PNP0C80 addition then. > Does it mean that in ACPI mode we must not output DT hotplug memory > nodes or assuming that PNP0C80 is properly described, it will "override" > DT description? After further investigations, I think the pieces you pointed out are added by Shameer's series, ie. through the build_memory_hotplug_aml() call. So I suggest we separate the concerns: this series brings support for DIMM coldplug. hotplug, including all the relevant ACPI structures will be added later on by Shameer. Thanks Eric > >> >>> I understand hotplug >>> would require extra modifications but I don't see anything else missing >>> for coldplug. >>>> Even though I've tried make mem hotplug ACPI parts not x86 specific, >>>> I'm afraid it might be tightly coupled with hotplug support. >>>> So here are 2 options make DSDT part work without hotplug or >>>> implement hotplug here. I think the former is just a waste of time >>>> and we should just add hotplug. It should take relatively minor effort >>>> since you already implemented most of boiler plate here. >>> >>> Shameer sent an RFC series for supporting hotplug. >>> >>> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support >>> https://patchwork.kernel.org/cover/10783589/ >>> >>> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be >>> OK, even after system_reset. >>> >>> Note the hotplug kernel support on ARM is very recent. I would prefer to >>> dissociate both efforts if we want to get a chance making coldplug for >>> 4.0. Also we have an issue for NVDIMM since on reboot the guest does not >>> boot properly. >> I guess we can merge implemetation that works on some kernel configs >> [DT based I'd guess], and add ACPI part later. Though that will be >> a bit of a mess as we do not version firmware parts (ACPI tables). >> >>>> As for how to implement ACPI HW part, I suggest to borrow GED >>>> device that NEMU guys trying to use instead of GPIO route, >>>> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event. >>>> So that it would be easier to share this with their virt-x86 >>>> machine eventually. >>> Sounds like a different approach than the one initiated by Shameer? >> ARM boards were first to use ACPI hw-reduced profile so they picked up >> available back then GPIO based way to deliver hotplug event, later spec >> introduced Generic Event Device for that means to use with hw-reduced >> profile, which NEMU implemented[1], so I'd use that rather than ad-hoc >> GPIO mapping. I'd guess it will more compatible with various contemporary >> guests and we could reuse the same code for both x86/arm virt boards) >> >> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c > > That's really helpful for the ARM hotplug works. Thanks! > > Eric >> >>> >>> Thanks >>> >>> Eric >>>> >>>> >>>>> 3) Support of NV-DIMM [14 - 17] >>>> The same might be true for NUMA but I haven't dug this deep in to >>>> that part. >>>> >>>>> >>>>> 1) can be upstreamed before 2 and 2 can be upstreamed before 3. >>>>> >>>>> Work is ongoing to transform the whole memory as device memory. 
>>>>> However this move is not trivial and to me, is independent on >>>>> the improvements brought by this series: >>>>> - if we were to use DIMM for initial RAM, those DIMMs would use >>>>> use slots. Although they would not be part of the ones provided >>>>> using the ",slots" options, they are ACPI limited resources. >>>>> - DT and ACPI description needs to be reworked >>>>> - NUMA integration needs special care >>>>> - a special device memory object may be required to avoid consuming >>>>> slots and easing the FW description. >>>>> >>>>> So I preferred to separate the concerns. This new implementation >>>>> based on device memory could be candidate for another virt >>>>> version. >>>>> >>>>> Best Regards >>>>> >>>>> Eric >>>>> >>>>> References: >>>>> >>>>> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions >>>>> http://patchwork.ozlabs.org/cover/914694/ >>>>> >>>>> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform >>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html >>>>> >>>>> This series can be found at: >>>>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7 >>>>> >>>>> History: >>>>> >>>>> v6 -> v7: >>>>> - Addressed Peter and Igor comments (exceptions sent my email) >>>>> - Fixed TCG case. Now device memory works also for TCG and vcpu >>>>> pamax is checked >>>>> - See individual logs for more details >>>>> >>>>> v5 -> v6: >>>>> - mingw compilation issue fix >>>>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported >>>>> IPA bits >>>>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review >>>>> of "hw/arm/virt: Split the memory map description" >>>>> - "hw/arm/virt: Move memory map initialization into machvirt_init" >>>>> squashed into the previous patch >>>>> - change alignment of IO regions beyond the RAM so that it matches their >>>>> size >>>>> >>>>> v4 -> v5: >>>>> - change in the memory map >>>>> - see individual logs >>>>> >>>>> v3 -> v4: >>>>> - rebase on David's "pc-dimm: next bunch of cleanups" and >>>>> "pc-dimm: pre_plug "slot" and "addr" assignment" >>>>> - kvm-type option not used anymore. We directly use >>>>> maxram_size and ram_size machine fields to compute the >>>>> MAX IPA range. Migration is naturally handled as CLI >>>>> option are kept between source and destination. This was >>>>> suggested by David. 
>>>>> - device_memory_start and device_memory_size not stored >>>>> anymore in vms->bootinfo >>>>> - I did not take into account 2 Igor's comments: the one >>>>> related to the refactoring of arm_load_dtb and the one >>>>> related to the generation of the dtb after system_reset >>>>> which would contain nodes of hotplugged devices (we do >>>>> not support hotplug at this stage) >>>>> - check the end-user does not attempt to hotplug a device >>>>> - addition of "vl: Set machine ram_size, maxram_size and >>>>> ram_slots earlier" >>>>> >>>>> v2 -> v3: >>>>> - fix pc_q35 and pc_piix compilation error >>>>> - kwangwoo's email being not valid anymore, remove his address >>>>> >>>>> v1 -> v2: >>>>> - kvm_get_max_vm_phys_shift moved in arch specific file >>>>> - addition of NVDIMM part >>>>> - single series >>>>> - rebase on David's refactoring >>>>> >>>>> v1: >>>>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size" >>>>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB" >>>>> >>>>> Best Regards >>>>> >>>>> Eric >>>>> >>>>> >>>>> Eric Auger (12): >>>>> hw/arm/virt: Rename highmem IO regions >>>>> hw/arm/virt: Split the memory map description >>>>> hw/boards: Add a MachineState parameter to kvm_type callback >>>>> kvm: add kvm_arm_get_max_vm_ipa_size >>>>> vl: Set machine ram_size, maxram_size and ram_slots earlier >>>>> hw/arm/virt: Dynamic memory map depending on RAM requirements >>>>> hw/arm/virt: Implement kvm_type function for 4.0 machine >>>>> hw/arm/virt: Bump the 255GB initial RAM limit >>>>> hw/arm/virt: Add memory hotplug framework >>>>> hw/arm/virt: Allocate device_memory >>>>> hw/arm/boot: Expose the pmem nodes in the DT >>>>> hw/arm/virt: Add nvdimm and nvdimm-persistence options >>>>> >>>>> Kwangwoo Lee (2): >>>>> nvdimm: use configurable ACPI IO base and size >>>>> hw/arm/virt: Add nvdimm hot-plug infrastructure >>>>> >>>>> Shameer Kolothum (3): >>>>> hw/arm/boot: introduce fdt_add_memory_node helper >>>>> hw/arm/boot: Expose the PC-DIMM nodes in the DT >>>>> hw/arm/virt-acpi-build: Add PC-DIMM in SRAT >>>>> >>>>> accel/kvm/kvm-all.c | 2 +- >>>>> default-configs/arm-softmmu.mak | 4 + >>>>> hw/acpi/nvdimm.c | 31 ++- >>>>> hw/arm/boot.c | 136 ++++++++++-- >>>>> hw/arm/virt-acpi-build.c | 23 +- >>>>> hw/arm/virt.c | 364 ++++++++++++++++++++++++++++---- >>>>> hw/i386/pc_piix.c | 6 +- >>>>> hw/i386/pc_q35.c | 6 +- >>>>> hw/ppc/mac_newworld.c | 3 +- >>>>> hw/ppc/mac_oldworld.c | 2 +- >>>>> hw/ppc/spapr.c | 2 +- >>>>> include/hw/arm/virt.h | 24 ++- >>>>> include/hw/boards.h | 5 +- >>>>> include/hw/mem/nvdimm.h | 4 + >>>>> target/arm/kvm.c | 10 + >>>>> target/arm/kvm_arm.h | 13 ++ >>>>> vl.c | 6 +- >>>>> 17 files changed, 556 insertions(+), 85 deletions(-) >>>>> >>>> >>>> >>
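For the DT route Eric relies on for cold-plug, the series adds memory@&lt;base&gt; nodes for the device memory through a helper in hw/arm/boot.c (Shameer's fdt_add_memory_node patch). A rough sketch of what such a helper boils down to follows; exact names and arguments may differ from the actual patch.

/*
 * Sketch of a DT memory node helper: each region of device memory gets a
 * memory@<base> node with a "reg" property.  acells/scells come from the
 * root node's #address-cells / #size-cells.
 */
#include "qemu/osdep.h"
#include "sysemu/device_tree.h"

static void add_memory_node(void *fdt, uint32_t acells, uint32_t scells,
                            hwaddr base, hwaddr size)
{
    char *nodename = g_strdup_printf("/memory@%" PRIx64, base);

    qemu_fdt_add_subnode(fdt, nodename);
    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
    qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
                                 acells, base, scells, size);
    g_free(nodename);
}

This is precisely the kind of description Igor is wary of for hotpluggable regions: a guest that learns about the range this way treats it as ordinary RAM from early boot and may never let it be unplugged.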
On Tue, 26 Feb 2019 14:11:58 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, > > On 2/26/19 9:40 AM, Auger Eric wrote: > > Hi Igor, > > > > On 2/25/19 10:42 AM, Igor Mammedov wrote: > >> On Fri, 22 Feb 2019 18:35:26 +0100 > >> Auger Eric <eric.auger@redhat.com> wrote: > >> > >>> Hi Igor, > >>> > >>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>> > >>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>> support device memory in general, and especially PCDIMM/NVDIMM. > >>>>> > >>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the > >>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe > >>>>> MMIO region. The address map was 1TB large. This corresponded to > >>>>> the max IPA capacity KVM was able to manage. > >>>>> > >>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>> IPA range. So the guest physical address can go beyond the 1TB. The > >>>>> max GPA size depends on the host kernel configuration and physical CPUs. > >>>>> > >>>>> In this series we use this feature and allow the RAM to grow without > >>>>> any other limit than the one put by the host kernel. > >>>>> > >>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>> maxram_size - ram_size. The device memory is potentially hotpluggable > >>>>> depending on the instantiated memory objects. > >>>>> > >>>>> IO regions previously located between 256GB and 1TB are moved after > >>>>> the RAM. Their offset is dynamically computed, depends on ram_size > >>>>> and maxram_size. Size alignment is enforced. > >>>>> > >>>>> In case maxmem value is inferior to 255GB, the legacy memory map > >>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>> onwards. > >>>>> > >>>>> As we keep the initial RAM at 1GB base address, we do not need to do > >>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>> that job at the moment. > >>>>> > >>>>> Device memory being put just after the initial RAM, it is possible > >>>>> to get access to this feature while keeping a 1TB address map. > >>>>> > >>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > >>>>> > >>>>> Functionally, the series is split into 3 parts: > >>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>> the memory map > >>>> > >>>>> 2) Support of PC-DIMM [10 - 13] > >>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>> visible to the guest. It might be that DT is masking problem > >>>> but well, that won't work on ACPI only guests. > >>> > >>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem > >>> added with the DIMM slots. > >> Question is how does it get there? Does it come from DT or from firmware > >> via UEFI interfaces? > >> > >>> So it looks fine to me. Isn't E820 a pure x86 matter? > >> sorry for misleading, I've meant is UEFI GetMemoryMap(). 
> >> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >> via UEFI GetMemoryMap() as guest kernel might start using it as normal > >> memory early at boot and later put that memory into zone normal and hence > >> make it non-hot-un-pluggable. The same concerns apply to DT based means > >> of discovery. > >> (That's guest issue but it's easy to workaround it not putting hotpluggable > >> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) > >> That way memory doesn't get (ab)used by firmware or early boot kernel stages > >> and doesn't get locked up. > >> > >>> What else would you expect in the dsdt? > >> Memory device descriptions, look for code that adds PNP0C80 with _CRS > >> describing memory ranges > > > > OK thank you for the explanations. I will work on PNP0C80 addition then. > > Does it mean that in ACPI mode we must not output DT hotplug memory > > nodes or assuming that PNP0C80 is properly described, it will "override" > > DT description? > > After further investigations, I think the pieces you pointed out are > added by Shameer's series, ie. through the build_memory_hotplug_aml() > call. So I suggest we separate the concerns: this series brings support > for DIMM coldplug. hotplug, including all the relevant ACPI structures > will be added later on by Shameer. Maybe we should not put pc-dimms in DT for this series until it gets clear if it doesn't conflict with ACPI in some way.
Hi Igor, On 2/26/19 5:56 PM, Igor Mammedov wrote: > On Tue, 26 Feb 2019 14:11:58 +0100 > Auger Eric <eric.auger@redhat.com> wrote: > >> Hi Igor, >> >> On 2/26/19 9:40 AM, Auger Eric wrote: >>> Hi Igor, >>> >>> On 2/25/19 10:42 AM, Igor Mammedov wrote: >>>> On Fri, 22 Feb 2019 18:35:26 +0100 >>>> Auger Eric <eric.auger@redhat.com> wrote: >>>> >>>>> Hi Igor, >>>>> >>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>>>> Eric Auger <eric.auger@redhat.com> wrote: >>>>>> >>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>>>> support device memory in general, and especially PCDIMM/NVDIMM. >>>>>>> >>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the >>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe >>>>>>> MMIO region. The address map was 1TB large. This corresponded to >>>>>>> the max IPA capacity KVM was able to manage. >>>>>>> >>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The >>>>>>> max GPA size depends on the host kernel configuration and physical CPUs. >>>>>>> >>>>>>> In this series we use this feature and allow the RAM to grow without >>>>>>> any other limit than the one put by the host kernel. >>>>>>> >>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>>>> ram_size and then comes the device memory (,maxmem) of size >>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable >>>>>>> depending on the instantiated memory objects. >>>>>>> >>>>>>> IO regions previously located between 256GB and 1TB are moved after >>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size >>>>>>> and maxram_size. Size alignment is enforced. >>>>>>> >>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map >>>>>>> still is used. The change of memory map becomes effective from 4.0 >>>>>>> onwards. >>>>>>> >>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do >>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>>>> that job at the moment. >>>>>>> >>>>>>> Device memory being put just after the initial RAM, it is possible >>>>>>> to get access to this feature while keeping a 1TB address map. >>>>>>> >>>>>>> This series reuses/rebases patches initially submitted by Shameer >>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>>>> >>>>>>> Functionally, the series is split into 3 parts: >>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>>>> the memory map >>>>>> >>>>>>> 2) Support of PC-DIMM [10 - 13] >>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>>>> visible to the guest. It might be that DT is masking problem >>>>>> but well, that won't work on ACPI only guests. >>>>> >>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem >>>>> added with the DIMM slots. >>>> Question is how does it get there? Does it come from DT or from firmware >>>> via UEFI interfaces? >>>> >>>>> So it looks fine to me. Isn't E820 a pure x86 matter? >>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). 
>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal >>>> memory early at boot and later put that memory into zone normal and hence >>>> make it non-hot-un-pluggable. The same concerns apply to DT based means >>>> of discovery. >>>> (That's guest issue but it's easy to workaround it not putting hotpluggable >>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) >>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages >>>> and doesn't get locked up. >>>> >>>>> What else would you expect in the dsdt? >>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS >>>> describing memory ranges >>> >>> OK thank you for the explanations. I will work on PNP0C80 addition then. >>> Does it mean that in ACPI mode we must not output DT hotplug memory >>> nodes or assuming that PNP0C80 is properly described, it will "override" >>> DT description? >> >> After further investigations, I think the pieces you pointed out are >> added by Shameer's series, ie. through the build_memory_hotplug_aml() >> call. So I suggest we separate the concerns: this series brings support >> for DIMM coldplug. hotplug, including all the relevant ACPI structures >> will be added later on by Shameer. > > Maybe we should not put pc-dimms in DT for this series until it gets clear > if it doesn't conflict with ACPI in some way. I guess you mean removing the DT hotpluggable memory nodes only in ACPI mode? Otherwise you simply remove the DIMM feature, right? I double checked and if you remove the hotpluggable memory DT nodes in ACPI mode: - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I guess you're right, if the DT nodes are available, that memory is considered as not unpluggable by the guest. - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX system. Hotplug/unplug is clearly not supported by this series and any attempt results in "memory hotplug is not supported". Is it really an issue if the guest does not consider DIMM slots as not hot-unpluggable memory? I am not even sure the guest kernel would support to unplug that memory. In case we want all ACPI tables to be ready for making this memory seen as hot-unpluggable we need some Shameer's patches on top of this series. Also don't DIMM slots already make sense in DT mode. Usually we accept to add one feature in DT and then in ACPI. For instance we can benefit from nvdimm in dt mode right? So, considering an incremental approach I would be in favour of keeping the DT nodes. Thanks Eric > > > >
On Tue, 26 Feb 2019 18:53:24 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, > > On 2/26/19 5:56 PM, Igor Mammedov wrote: > > On Tue, 26 Feb 2019 14:11:58 +0100 > > Auger Eric <eric.auger@redhat.com> wrote: > > > >> Hi Igor, > >> > >> On 2/26/19 9:40 AM, Auger Eric wrote: > >>> Hi Igor, > >>> > >>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > >>>> On Fri, 22 Feb 2019 18:35:26 +0100 > >>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>> > >>>>> Hi Igor, > >>>>> > >>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>>>> > >>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>>>> support device memory in general, and especially PCDIMM/NVDIMM. > >>>>>>> > >>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the > >>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe > >>>>>>> MMIO region. The address map was 1TB large. This corresponded to > >>>>>>> the max IPA capacity KVM was able to manage. > >>>>>>> > >>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The > >>>>>>> max GPA size depends on the host kernel configuration and physical CPUs. > >>>>>>> > >>>>>>> In this series we use this feature and allow the RAM to grow without > >>>>>>> any other limit than the one put by the host kernel. > >>>>>>> > >>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable > >>>>>>> depending on the instantiated memory objects. > >>>>>>> > >>>>>>> IO regions previously located between 256GB and 1TB are moved after > >>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size > >>>>>>> and maxram_size. Size alignment is enforced. > >>>>>>> > >>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map > >>>>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>>>> onwards. > >>>>>>> > >>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do > >>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>>>> that job at the moment. > >>>>>>> > >>>>>>> Device memory being put just after the initial RAM, it is possible > >>>>>>> to get access to this feature while keeping a 1TB address map. > >>>>>>> > >>>>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > >>>>>>> > >>>>>>> Functionally, the series is split into 3 parts: > >>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>>>> the memory map > >>>>>> > >>>>>>> 2) Support of PC-DIMM [10 - 13] > >>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>>>> visible to the guest. It might be that DT is masking problem > >>>>>> but well, that won't work on ACPI only guests. > >>>>> > >>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem > >>>>> added with the DIMM slots. > >>>> Question is how does it get there? Does it come from DT or from firmware > >>>> via UEFI interfaces? > >>>> > >>>>> So it looks fine to me. Isn't E820 a pure x86 matter? 
> >>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > >>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal > >>>> memory early at boot and later put that memory into zone normal and hence > >>>> make it non-hot-un-pluggable. The same concerns apply to DT based means > >>>> of discovery. > >>>> (That's guest issue but it's easy to workaround it not putting hotpluggable > >>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) > >>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages > >>>> and doesn't get locked up. > >>>> > >>>>> What else would you expect in the dsdt? > >>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS > >>>> describing memory ranges > >>> > >>> OK thank you for the explanations. I will work on PNP0C80 addition then. > >>> Does it mean that in ACPI mode we must not output DT hotplug memory > >>> nodes or assuming that PNP0C80 is properly described, it will "override" > >>> DT description? > >> > >> After further investigations, I think the pieces you pointed out are > >> added by Shameer's series, ie. through the build_memory_hotplug_aml() > >> call. So I suggest we separate the concerns: this series brings support > >> for DIMM coldplug. hotplug, including all the relevant ACPI structures > >> will be added later on by Shameer. > > > > Maybe we should not put pc-dimms in DT for this series until it gets clear > > if it doesn't conflict with ACPI in some way. > > I guess you mean removing the DT hotpluggable memory nodes only in ACPI > mode? Otherwise you simply remove the DIMM feature, right? Something like this so DT won't get in conflict with ACPI. Only we don't have a switch for it something like, -machine fdt=on (with default off) > I double checked and if you remove the hotpluggable memory DT nodes in > ACPI mode: > - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > guess you're right, if the DT nodes are available, that memory is > considered as not unpluggable by the guest. > - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > system. > > Hotplug/unplug is clearly not supported by this series and any attempt > results in "memory hotplug is not supported". Is it really an issue if > the guest does not consider DIMM slots as not hot-unpluggable memory? I > am not even sure the guest kernel would support to unplug that memory. > > In case we want all ACPI tables to be ready for making this memory seen > as hot-unpluggable we need some Shameer's patches on top of this series. May be we should push for this way (into 4.0), it's just a several patches after all or even merge them in your series (I'd guess it would need to be rebased on top of your latest work) > Also don't DIMM slots already make sense in DT mode. Usually we accept > to add one feature in DT and then in ACPI. For instance we can benefit usually it doesn't conflict with each other (at least I'm not aware of it) but I see a problem with in this case. > from nvdimm in dt mode right? So, considering an incremental approach I > would be in favour of keeping the DT nodes. I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much more versatile. I consider target application of arm/virt as a board that's used to run in production generic ACPI capable guest in most use cases and various DT only guests as secondary ones. 
It's hard to make both use cases happy with defaults (that's probably one of the reasons why the 'sbsa' board is being added). So I'd give priority to ACPI-based arm/virt versus DT when defaults are considered. > Thanks > > Eric > > > > > > >
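Igor's suggestion here amounts to making the DT description of DIMMs conditional: emit the hotpluggable-memory nodes only when the guest is not using ACPI, or behind an explicit opt-in such as the proposed -machine fdt=on. A sketch of that gating is below; virt_acpi_enabled() and the fdt_dimms_enabled field are hypothetical names used for illustration, not existing QEMU code.

/*
 * Sketch only: skip the DT DIMM nodes for ACPI guests unless explicitly
 * requested.  virt_acpi_enabled() and fdt_dimms_enabled are hypothetical.
 */
#include "qemu/osdep.h"
#include "hw/arm/virt.h"

static void fdt_add_pcdimm_nodes(VirtMachineState *vms)
{
    if (virt_acpi_enabled(vms) && !vms->fdt_dimms_enabled) {
        /*
         * ACPI guests get PNP0C80 devices in the DSDT instead; skipping
         * the DT nodes avoids the guest absorbing hotpluggable memory as
         * ordinary, non-removable RAM early at boot.
         */
        return;
    }

    /* ... walk the device memory area and emit memory@<base> nodes ... */
}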
Hi Igor, Shameer, On 2/27/19 11:10 AM, Igor Mammedov wrote: > On Tue, 26 Feb 2019 18:53:24 +0100 > Auger Eric <eric.auger@redhat.com> wrote: > >> Hi Igor, >> >> On 2/26/19 5:56 PM, Igor Mammedov wrote: >>> On Tue, 26 Feb 2019 14:11:58 +0100 >>> Auger Eric <eric.auger@redhat.com> wrote: >>> >>>> Hi Igor, >>>> >>>> On 2/26/19 9:40 AM, Auger Eric wrote: >>>>> Hi Igor, >>>>> >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 >>>>>> Auger Eric <eric.auger@redhat.com> wrote: >>>>>> >>>>>>> Hi Igor, >>>>>>> >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: >>>>>>>> >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>>>>>> support device memory in general, and especially PCDIMM/NVDIMM. >>>>>>>>> >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded to >>>>>>>>> the max IPA capacity KVM was able to manage. >>>>>>>>> >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The >>>>>>>>> max GPA size depends on the host kernel configuration and physical CPUs. >>>>>>>>> >>>>>>>>> In this series we use this feature and allow the RAM to grow without >>>>>>>>> any other limit than the one put by the host kernel. >>>>>>>>> >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size >>>>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable >>>>>>>>> depending on the instantiated memory objects. >>>>>>>>> >>>>>>>>> IO regions previously located between 256GB and 1TB are moved after >>>>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size >>>>>>>>> and maxram_size. Size alignment is enforced. >>>>>>>>> >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map >>>>>>>>> still is used. The change of memory map becomes effective from 4.0 >>>>>>>>> onwards. >>>>>>>>> >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>>>>>> that job at the moment. >>>>>>>>> >>>>>>>>> Device memory being put just after the initial RAM, it is possible >>>>>>>>> to get access to this feature while keeping a 1TB address map. >>>>>>>>> >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>>>>>> >>>>>>>>> Functionally, the series is split into 3 parts: >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>>>>>> the memory map >>>>>>>> >>>>>>>>> 2) Support of PC-DIMM [10 - 13] >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>>>>>> visible to the guest. It might be that DT is masking problem >>>>>>>> but well, that won't work on ACPI only guests. >>>>>>> >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem >>>>>>> added with the DIMM slots. >>>>>> Question is how does it get there? Does it come from DT or from firmware >>>>>> via UEFI interfaces? 
>>>>>> >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal >>>>>> memory early at boot and later put that memory into zone normal and hence >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based means >>>>>> of discovery. >>>>>> (That's guest issue but it's easy to workaround it not putting hotpluggable >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) >>>>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages >>>>>> and doesn't get locked up. >>>>>> >>>>>>> What else would you expect in the dsdt? >>>>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS >>>>>> describing memory ranges >>>>> >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory >>>>> nodes or assuming that PNP0C80 is properly described, it will "override" >>>>> DT description? >>>> >>>> After further investigations, I think the pieces you pointed out are >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() >>>> call. So I suggest we separate the concerns: this series brings support >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures >>>> will be added later on by Shameer. >>> >>> Maybe we should not put pc-dimms in DT for this series until it gets clear >>> if it doesn't conflict with ACPI in some way. >> >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI >> mode? Otherwise you simply remove the DIMM feature, right? > Something like this so DT won't get in conflict with ACPI. > Only we don't have a switch for it something like, -machine fdt=on (with default off) > >> I double checked and if you remove the hotpluggable memory DT nodes in >> ACPI mode: >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I >> guess you're right, if the DT nodes are available, that memory is >> considered as not unpluggable by the guest. >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX >> system. >> >> Hotplug/unplug is clearly not supported by this series and any attempt >> results in "memory hotplug is not supported". Is it really an issue if >> the guest does not consider DIMM slots as not hot-unpluggable memory? I >> am not even sure the guest kernel would support to unplug that memory. >> >> In case we want all ACPI tables to be ready for making this memory seen >> as hot-unpluggable we need some Shameer's patches on top of this series. > May be we should push for this way (into 4.0), it's just a several patches > after all or even merge them in your series (I'd guess it would need to be > rebased on top of your latest work) Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series (without the reduced hw_reduced_acpi flag) in this series and isolate in a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml called in virt code? Then would remain the GED/GPIO actual integration. Thanks Eric > >> Also don't DIMM slots already make sense in DT mode. Usually we accept >> to add one feature in DT and then in ACPI. For instance we can benefit > usually it doesn't conflict with each other (at least I'm not aware of it) > but I see a problem with in this case. 
> >> from nvdimm in dt mode right? So, considering an incremental approach I >> would be in favour of keeping the DT nodes. > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > more versatile. > > I consider target application of arm/virt as a board that's used to > run in production generic ACPI capable guest in most use cases and > various DT only guests as secondary ones. It's hard to make > both usecases be happy with defaults (that's probably one of the > reasons why 'sbsa' board is being added). > > So I'd give priority to ACPI based arm/virt versus DT when defaults are > considered. > >> Thanks >> >> Eric >>> >>> >>> >>> >
Hi Eric, > -----Original Message----- > From: Auger Eric [mailto:eric.auger@redhat.com] > Sent: 27 February 2019 10:27 > To: Igor Mammedov <imammedo@redhat.com> > Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; > dgilbert@redhat.com; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; > qemu-arm@nongnu.org; eric.auger.pro@gmail.com; > david@gibson.dropbear.id.au > Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion > and PCDIMM/NVDIMM support > > Hi Igor, Shameer, > > On 2/27/19 11:10 AM, Igor Mammedov wrote: > > On Tue, 26 Feb 2019 18:53:24 +0100 > > Auger Eric <eric.auger@redhat.com> wrote: > > > >> Hi Igor, > >> > >> On 2/26/19 5:56 PM, Igor Mammedov wrote: > >>> On Tue, 26 Feb 2019 14:11:58 +0100 > >>> Auger Eric <eric.auger@redhat.com> wrote: > >>> > >>>> Hi Igor, > >>>> > >>>> On 2/26/19 9:40 AM, Auger Eric wrote: > >>>>> Hi Igor, > >>>>> > >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 > >>>>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>>>> > >>>>>>> Hi Igor, > >>>>>>> > >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>>>>>> > >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>>>>>> support device memory in general, and especially > PCDIMM/NVDIMM. > >>>>>>>>> > >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as > the > >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high > PCIe > >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded > to > >>>>>>>>> the max IPA capacity KVM was able to manage. > >>>>>>>>> > >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. > The > >>>>>>>>> max GPA size depends on the host kernel configuration and physical > CPUs. > >>>>>>>>> > >>>>>>>>> In this series we use this feature and allow the RAM to grow > without > >>>>>>>>> any other limit than the one put by the host kernel. > >>>>>>>>> > >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>>>>>> maxram_size - ram_size. The device memory is potentially > hotpluggable > >>>>>>>>> depending on the instantiated memory objects. > >>>>>>>>> > >>>>>>>>> IO regions previously located between 256GB and 1TB are moved > after > >>>>>>>>> the RAM. Their offset is dynamically computed, depends on > ram_size > >>>>>>>>> and maxram_size. Size alignment is enforced. > >>>>>>>>> > >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory > map > >>>>>>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>>>>>> onwards. > >>>>>>>>> > >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to > do > >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>>>>>> that job at the moment. > >>>>>>>>> > >>>>>>>>> Device memory being put just after the initial RAM, it is possible > >>>>>>>>> to get access to this feature while keeping a 1TB address map. > >>>>>>>>> > >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. 
> >>>>>>>>> > >>>>>>>>> Functionally, the series is split into 3 parts: > >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>>>>>> the memory map > >>>>>>>> > >>>>>>>>> 2) Support of PC-DIMM [10 - 13] > >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>>>>>> visible to the guest. It might be that DT is masking problem > >>>>>>>> but well, that won't work on ACPI only guests. > >>>>>>> > >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of > mem > >>>>>>> added with the DIMM slots. > >>>>>> Question is how does it get there? Does it come from DT or from > firmware > >>>>>> via UEFI interfaces? > >>>>>> > >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? > >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as > normal > >>>>>> memory early at boot and later put that memory into zone normal and > hence > >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based > means > >>>>>> of discovery. > >>>>>> (That's guest issue but it's easy to workaround it not putting > hotpluggable > >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it > properly) > >>>>>> That way memory doesn't get (ab)used by firmware or early boot > kernel stages > >>>>>> and doesn't get locked up. > >>>>>> > >>>>>>> What else would you expect in the dsdt? > >>>>>> Memory device descriptions, look for code that adds PNP0C80 with > _CRS > >>>>>> describing memory ranges > >>>>> > >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. > >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory > >>>>> nodes or assuming that PNP0C80 is properly described, it will "override" > >>>>> DT description? > >>>> > >>>> After further investigations, I think the pieces you pointed out are > >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() > >>>> call. So I suggest we separate the concerns: this series brings support > >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures > >>>> will be added later on by Shameer. > >>> > >>> Maybe we should not put pc-dimms in DT for this series until it gets clear > >>> if it doesn't conflict with ACPI in some way. > >> > >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI > >> mode? Otherwise you simply remove the DIMM feature, right? > > Something like this so DT won't get in conflict with ACPI. > > Only we don't have a switch for it something like, -machine fdt=on (with > default off) > > > >> I double checked and if you remove the hotpluggable memory DT nodes in > >> ACPI mode: > >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > >> guess you're right, if the DT nodes are available, that memory is > >> considered as not unpluggable by the guest. > >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > >> system. > >> > >> Hotplug/unplug is clearly not supported by this series and any attempt > >> results in "memory hotplug is not supported". Is it really an issue if > >> the guest does not consider DIMM slots as not hot-unpluggable memory? I > >> am not even sure the guest kernel would support to unplug that memory. 
> >> > >> In case we want all ACPI tables to be ready for making this memory seen > >> as hot-unpluggable we need some Shameer's patches on top of this series. > > May be we should push for this way (into 4.0), it's just a several patches > > after all or even merge them in your series (I'd guess it would need to be > > rebased on top of your latest work) > > Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series > (without the reduced hw_reduced_acpi flag) in this series and isolate in > a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml > called in virt code? Sure, that’s fine with me. So what would you use for the event_handler_method in build_memory_hotplug_aml()? GPO0 device? Thanks, Shameer > Then would remain the GED/GPIO actual integration. > > Thanks > > Eric > > > >> Also don't DIMM slots already make sense in DT mode. Usually we accept > >> to add one feature in DT and then in ACPI. For instance we can benefit > > usually it doesn't conflict with each other (at least I'm not aware of it) > > but I see a problem with in this case. > > > >> from nvdimm in dt mode right? So, considering an incremental approach I > >> would be in favour of keeping the DT nodes. > > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > > more versatile. > > > > I consider target application of arm/virt as a board that's used to > > run in production generic ACPI capable guest in most use cases and > > various DT only guests as secondary ones. It's hard to make > > both usecases be happy with defaults (that's probably one of the > > reasons why 'sbsa' board is being added). > > > > So I'd give priority to ACPI based arm/virt versus DT when defaults are > > considered. > > > >> Thanks > >> > >> Eric > >>> > >>> > >>> > >>> > >
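For context on the event_handler_method question: at the time of this thread build_memory_hotplug_aml() took (table, nr_mem, res_root, event_handler_method), and on x86 it is called with roughly "\\_SB.PCI0" and "\\_GPE._E03", the last argument naming the AML method that Notify()s the memory devices on a hotplug event. For a cold-plug-only virt DSDT, a method name that nothing ever invokes would do, which is what Igor suggests in his reply below. A sketch with a made-up name:

/*
 * Sketch only, not Shameer's actual patch.  Assumes the signature of that
 * period: build_memory_hotplug_aml(table, nr_mem, res_root,
 * event_handler_method).  "\\_SB.MHPX.SCAN" is a placeholder method that
 * no event path ever triggers.
 */
#include "qemu/osdep.h"
#include "hw/acpi/aml-build.h"
#include "hw/acpi/memory_hotplug.h"
#include "hw/boards.h"

static void virt_dsdt_add_memory_hotplug(Aml *dsdt, MachineState *ms)
{
    /* one ACPI memory device per ",slots" slot */
    build_memory_hotplug_aml(dsdt, ms->ram_slots, "\\_SB", "\\_SB.MHPX.SCAN");
}

Note that the generated AML also reads the memory-hotplug register block, so the MMIO region backing it has to exist even for cold-plug, which is the point Igor makes next.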
On Wed, 27 Feb 2019 10:41:45 +0000 Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > Hi Eric, > > > -----Original Message----- > > From: Auger Eric [mailto:eric.auger@redhat.com] > > Sent: 27 February 2019 10:27 > > To: Igor Mammedov <imammedo@redhat.com> > > Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; > > dgilbert@redhat.com; Shameerali Kolothum Thodi > > <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; > > qemu-arm@nongnu.org; eric.auger.pro@gmail.com; > > david@gibson.dropbear.id.au > > Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion > > and PCDIMM/NVDIMM support > > > > Hi Igor, Shameer, > > > > On 2/27/19 11:10 AM, Igor Mammedov wrote: > > > On Tue, 26 Feb 2019 18:53:24 +0100 > > > Auger Eric <eric.auger@redhat.com> wrote: > > > > > >> Hi Igor, > > >> > > >> On 2/26/19 5:56 PM, Igor Mammedov wrote: > > >>> On Tue, 26 Feb 2019 14:11:58 +0100 > > >>> Auger Eric <eric.auger@redhat.com> wrote: > > >>> > > >>>> Hi Igor, > > >>>> > > >>>> On 2/26/19 9:40 AM, Auger Eric wrote: > > >>>>> Hi Igor, > > >>>>> > > >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > > >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 > > >>>>>> Auger Eric <eric.auger@redhat.com> wrote: > > >>>>>> > > >>>>>>> Hi Igor, > > >>>>>>> > > >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > > >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > > >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: > > >>>>>>>> > > >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > > >>>>>>>>> support device memory in general, and especially > > PCDIMM/NVDIMM. > > >>>>>>>>> > > >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > > >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as > > the > > >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high > > PCIe > > >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded > > to > > >>>>>>>>> the max IPA capacity KVM was able to manage. > > >>>>>>>>> > > >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > > >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. > > The > > >>>>>>>>> max GPA size depends on the host kernel configuration and physical > > CPUs. > > >>>>>>>>> > > >>>>>>>>> In this series we use this feature and allow the RAM to grow > > without > > >>>>>>>>> any other limit than the one put by the host kernel. > > >>>>>>>>> > > >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > > >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size > > >>>>>>>>> maxram_size - ram_size. The device memory is potentially > > hotpluggable > > >>>>>>>>> depending on the instantiated memory objects. > > >>>>>>>>> > > >>>>>>>>> IO regions previously located between 256GB and 1TB are moved > > after > > >>>>>>>>> the RAM. Their offset is dynamically computed, depends on > > ram_size > > >>>>>>>>> and maxram_size. Size alignment is enforced. > > >>>>>>>>> > > >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory > > map > > >>>>>>>>> still is used. The change of memory map becomes effective from 4.0 > > >>>>>>>>> onwards. > > >>>>>>>>> > > >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to > > do > > >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > > >>>>>>>>> that job at the moment. 
> > >>>>>>>>> > > >>>>>>>>> Device memory being put just after the initial RAM, it is possible > > >>>>>>>>> to get access to this feature while keeping a 1TB address map. > > >>>>>>>>> > > >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer > > >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > > >>>>>>>>> > > >>>>>>>>> Functionally, the series is split into 3 parts: > > >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > > >>>>>>>>> the memory map > > >>>>>>>> > > >>>>>>>>> 2) Support of PC-DIMM [10 - 13] > > >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > > >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > > >>>>>>>> visible to the guest. It might be that DT is masking problem > > >>>>>>>> but well, that won't work on ACPI only guests. > > >>>>>>> > > >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of > > mem > > >>>>>>> added with the DIMM slots. > > >>>>>> Question is how does it get there? Does it come from DT or from > > firmware > > >>>>>> via UEFI interfaces? > > >>>>>> > > >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? > > >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > > >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > > >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as > > normal > > >>>>>> memory early at boot and later put that memory into zone normal and > > hence > > >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based > > means > > >>>>>> of discovery. > > >>>>>> (That's guest issue but it's easy to workaround it not putting > > hotpluggable > > >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it > > properly) > > >>>>>> That way memory doesn't get (ab)used by firmware or early boot > > kernel stages > > >>>>>> and doesn't get locked up. > > >>>>>> > > >>>>>>> What else would you expect in the dsdt? > > >>>>>> Memory device descriptions, look for code that adds PNP0C80 with > > _CRS > > >>>>>> describing memory ranges > > >>>>> > > >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. > > >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory > > >>>>> nodes or assuming that PNP0C80 is properly described, it will "override" > > >>>>> DT description? > > >>>> > > >>>> After further investigations, I think the pieces you pointed out are > > >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() > > >>>> call. So I suggest we separate the concerns: this series brings support > > >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures > > >>>> will be added later on by Shameer. > > >>> > > >>> Maybe we should not put pc-dimms in DT for this series until it gets clear > > >>> if it doesn't conflict with ACPI in some way. > > >> > > >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI > > >> mode? Otherwise you simply remove the DIMM feature, right? > > > Something like this so DT won't get in conflict with ACPI. > > > Only we don't have a switch for it something like, -machine fdt=on (with > > default off) > > > > > >> I double checked and if you remove the hotpluggable memory DT nodes in > > >> ACPI mode: > > >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > > >> guess you're right, if the DT nodes are available, that memory is > > >> considered as not unpluggable by the guest. 
> > >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > > >> system. > > >> > > >> Hotplug/unplug is clearly not supported by this series and any attempt > > >> results in "memory hotplug is not supported". Is it really an issue if > > >> the guest does not consider DIMM slots as not hot-unpluggable memory? I > > >> am not even sure the guest kernel would support to unplug that memory. > > >> > > >> In case we want all ACPI tables to be ready for making this memory seen > > >> as hot-unpluggable we need some Shameer's patches on top of this series. > > > May be we should push for this way (into 4.0), it's just a several patches > > > after all or even merge them in your series (I'd guess it would need to be > > > rebased on top of your latest work) > > > > Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series > > (without the reduced hw_reduced_acpi flag) in this series and isolate in > > a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml > > called in virt code? probably we can do it as transitional step as we need working mmio interface in place for build_memory_hotplug_aml() to work, provided it won't create migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?). What about dummy initial GED (empty device), that manages mmio region only and then later it will be filled with remaining logic IRQ. In this case mmio region and vmstate won't change (maybe) so it won't cause ABI or migration issues. > Sure, that’s fine with me. So what would you use for the event_handler_method in > build_memory_hotplug_aml()? GPO0 device? a method name not defined in spec, so it won't be called might do. > > Thanks, > Shameer > > > Then would remain the GED/GPIO actual integration. > > > > Thanks > > > > Eric > > > > > >> Also don't DIMM slots already make sense in DT mode. Usually we accept > > >> to add one feature in DT and then in ACPI. For instance we can benefit > > > usually it doesn't conflict with each other (at least I'm not aware of it) > > > but I see a problem with in this case. > > > > > >> from nvdimm in dt mode right? So, considering an incremental approach I > > >> would be in favour of keeping the DT nodes. > > > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > > > more versatile. > > > > > > I consider target application of arm/virt as a board that's used to > > > run in production generic ACPI capable guest in most use cases and > > > various DT only guests as secondary ones. It's hard to make > > > both usecases be happy with defaults (that's probably one of the > > > reasons why 'sbsa' board is being added). > > > > > > So I'd give priority to ACPI based arm/virt versus DT when defaults are > > > considered. > > > > > >> Thanks > > >> > > >> Eric > > >>> > > >>> > > >>> > > >>> > > >
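The firmware-map point made above can be made concrete with a small standalone C model: present-but-hotpluggable ranges stay out of what UEFI GetMemoryMap() or the DT would report and are left to the ACPI (PNP0C80) description, so the early kernel cannot absorb them as ordinary RAM. Every name and size below is invented for illustration; this is not code from QEMU or from the series.

#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ram_region {
    const char *id;
    uint64_t base;
    uint64_t size;
    bool hotpluggable;          /* device memory: PC-DIMM/NVDIMM */
};

/* Illustrative layout: 16G of initial RAM at 1G, one 2G DIMM behind it. */
static const struct ram_region regions[] = {
    { "initial-ram", 1ULL << 30, 16ULL << 30, false },
    { "dimm0", (1ULL << 30) + (16ULL << 30), 2ULL << 30, true },
};

int main(void)
{
    for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
        const struct ram_region *r = &regions[i];
        if (!r->hotpluggable) {
            /* Cold-boot RAM is reported in the firmware map
             * (UEFI GetMemoryMap() / DT memory nodes). */
            printf("firmware map: %s 0x%" PRIx64 " + 0x%" PRIx64 "\n",
                   r->id, r->base, r->size);
        } else {
            /* Hotpluggable ranges are described only by ACPI, i.e. a
             * PNP0C80 memory device whose _CRS reports the range. */
            printf("DSDT only   : %s 0x%" PRIx64 " + 0x%" PRIx64 "\n",
                   r->id, r->base, r->size);
        }
    }
    return 0;
}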
Hi Igor, Shameer, On 2/27/19 6:51 PM, Igor Mammedov wrote: > On Wed, 27 Feb 2019 10:41:45 +0000 > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > >> Hi Eric, >> >>> -----Original Message----- >>> From: Auger Eric [mailto:eric.auger@redhat.com] >>> Sent: 27 February 2019 10:27 >>> To: Igor Mammedov <imammedo@redhat.com> >>> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; >>> dgilbert@redhat.com; Shameerali Kolothum Thodi >>> <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; >>> qemu-arm@nongnu.org; eric.auger.pro@gmail.com; >>> david@gibson.dropbear.id.au >>> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion >>> and PCDIMM/NVDIMM support >>> >>> Hi Igor, Shameer, >>> >>> On 2/27/19 11:10 AM, Igor Mammedov wrote: >>>> On Tue, 26 Feb 2019 18:53:24 +0100 >>>> Auger Eric <eric.auger@redhat.com> wrote: >>>> >>>>> Hi Igor, >>>>> >>>>> On 2/26/19 5:56 PM, Igor Mammedov wrote: >>>>>> On Tue, 26 Feb 2019 14:11:58 +0100 >>>>>> Auger Eric <eric.auger@redhat.com> wrote: >>>>>> >>>>>>> Hi Igor, >>>>>>> >>>>>>> On 2/26/19 9:40 AM, Auger Eric wrote: >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: >>>>>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 >>>>>>>>> Auger Eric <eric.auger@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Igor, >>>>>>>>>> >>>>>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>>>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>>>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>>>>>>>>> support device memory in general, and especially >>> PCDIMM/NVDIMM. >>>>>>>>>>>> >>>>>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as >>> the >>>>>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high >>> PCIe >>>>>>>>>>>> MMIO region. The address map was 1TB large. This corresponded >>> to >>>>>>>>>>>> the max IPA capacity KVM was able to manage. >>>>>>>>>>>> >>>>>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. >>> The >>>>>>>>>>>> max GPA size depends on the host kernel configuration and physical >>> CPUs. >>>>>>>>>>>> >>>>>>>>>>>> In this series we use this feature and allow the RAM to grow >>> without >>>>>>>>>>>> any other limit than the one put by the host kernel. >>>>>>>>>>>> >>>>>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>>>>>>>>> ram_size and then comes the device memory (,maxmem) of size >>>>>>>>>>>> maxram_size - ram_size. The device memory is potentially >>> hotpluggable >>>>>>>>>>>> depending on the instantiated memory objects. >>>>>>>>>>>> >>>>>>>>>>>> IO regions previously located between 256GB and 1TB are moved >>> after >>>>>>>>>>>> the RAM. Their offset is dynamically computed, depends on >>> ram_size >>>>>>>>>>>> and maxram_size. Size alignment is enforced. >>>>>>>>>>>> >>>>>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory >>> map >>>>>>>>>>>> still is used. The change of memory map becomes effective from 4.0 >>>>>>>>>>>> onwards. >>>>>>>>>>>> >>>>>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to >>> do >>>>>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>>>>>>>>> that job at the moment. 
>>>>>>>>>>>> >>>>>>>>>>>> Device memory being put just after the initial RAM, it is possible >>>>>>>>>>>> to get access to this feature while keeping a 1TB address map. >>>>>>>>>>>> >>>>>>>>>>>> This series reuses/rebases patches initially submitted by Shameer >>>>>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>>>>>>>>> >>>>>>>>>>>> Functionally, the series is split into 3 parts: >>>>>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>>>>>>>>> the memory map >>>>>>>>>>> >>>>>>>>>>>> 2) Support of PC-DIMM [10 - 13] >>>>>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>>>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>>>>>>>>> visible to the guest. It might be that DT is masking problem >>>>>>>>>>> but well, that won't work on ACPI only guests. >>>>>>>>>> >>>>>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of >>> mem >>>>>>>>>> added with the DIMM slots. >>>>>>>>> Question is how does it get there? Does it come from DT or from >>> firmware >>>>>>>>> via UEFI interfaces? >>>>>>>>> >>>>>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? >>>>>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). >>>>>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >>>>>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as >>> normal >>>>>>>>> memory early at boot and later put that memory into zone normal and >>> hence >>>>>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based >>> means >>>>>>>>> of discovery. >>>>>>>>> (That's guest issue but it's easy to workaround it not putting >>> hotpluggable >>>>>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it >>> properly) >>>>>>>>> That way memory doesn't get (ab)used by firmware or early boot >>> kernel stages >>>>>>>>> and doesn't get locked up. >>>>>>>>> >>>>>>>>>> What else would you expect in the dsdt? >>>>>>>>> Memory device descriptions, look for code that adds PNP0C80 with >>> _CRS >>>>>>>>> describing memory ranges >>>>>>>> >>>>>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. >>>>>>>> Does it mean that in ACPI mode we must not output DT hotplug memory >>>>>>>> nodes or assuming that PNP0C80 is properly described, it will "override" >>>>>>>> DT description? >>>>>>> >>>>>>> After further investigations, I think the pieces you pointed out are >>>>>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() >>>>>>> call. So I suggest we separate the concerns: this series brings support >>>>>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures >>>>>>> will be added later on by Shameer. >>>>>> >>>>>> Maybe we should not put pc-dimms in DT for this series until it gets clear >>>>>> if it doesn't conflict with ACPI in some way. >>>>> >>>>> I guess you mean removing the DT hotpluggable memory nodes only in ACPI >>>>> mode? Otherwise you simply remove the DIMM feature, right? >>>> Something like this so DT won't get in conflict with ACPI. >>>> Only we don't have a switch for it something like, -machine fdt=on (with >>> default off) >>>> >>>>> I double checked and if you remove the hotpluggable memory DT nodes in >>>>> ACPI mode: >>>>> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I >>>>> guess you're right, if the DT nodes are available, that memory is >>>>> considered as not unpluggable by the guest. >>>>> - You can see the NVDIMM slots using ndctl list -u. 
You can mount a DAX >>>>> system. >>>>> >>>>> Hotplug/unplug is clearly not supported by this series and any attempt >>>>> results in "memory hotplug is not supported". Is it really an issue if >>>>> the guest does not consider DIMM slots as not hot-unpluggable memory? I >>>>> am not even sure the guest kernel would support to unplug that memory. >>>>> >>>>> In case we want all ACPI tables to be ready for making this memory seen >>>>> as hot-unpluggable we need some Shameer's patches on top of this series. >>>> May be we should push for this way (into 4.0), it's just a several patches >>>> after all or even merge them in your series (I'd guess it would need to be >>>> rebased on top of your latest work) >>> >>> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series >>> (without the reduced hw_reduced_acpi flag) in this series and isolate in >>> a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml >>> called in virt code? > probably we can do it as transitional step as we need working mmio interface > in place for build_memory_hotplug_aml() to work, provided it won't create > migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?). > > What about dummy initial GED (empty device), that manages mmio region only > and then later it will be filled with remaining logic IRQ. In this case mmio region > and vmstate won't change (maybe) so it won't cause ABI or migration issues. > > >> Sure, that’s fine with me. So what would you use for the event_handler_method in >> build_memory_hotplug_aml()? GPO0 device? > > a method name not defined in spec, so it won't be called might do. At this point the event_handler_method, ie. \_SB.GPO0._E02, is not supposed to be called, right? So effectivily we should be able to use any other method name (unlinked to any GPIO/GED). I guess at this stage only the PNP0C80 definition blocks + methods are used. What still remains fuzzy for me is in case of cold plug the mmio hotplug control region part only is read (despite the slot selection of course) and returns 0 for addr/size and also flags meaning the slot is not enabled. So despite the slots are advertised as hotpluggable/enabled in the SRAT; I am not sure for the OS it actually makes any difference whether the DSDT definition blocks are described or not. To be honest I am afraid this is too late to add those additional features for 4.0 now. This is going to jeopardize the first preliminary part which is the introduction of the new memory map, allowing the expansion of the initial RAM and paving the way for device memory introduction. So I think I am going to resend the first 10 patches in a standalone series. And we can iterate on the PCDIMM/NVDIMM parts independently. Thanks Eric > > >> >> Thanks, >> Shameer >> >>> Then would remain the GED/GPIO actual integration. >>> >>> Thanks >>> >>> Eric >>>> >>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept >>>>> to add one feature in DT and then in ACPI. For instance we can benefit >>>> usually it doesn't conflict with each other (at least I'm not aware of it) >>>> but I see a problem with in this case. >>>> >>>>> from nvdimm in dt mode right? So, considering an incremental approach I >>>>> would be in favour of keeping the DT nodes. >>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much >>>> more versatile. 
>>>> >>>> I consider target application of arm/virt as a board that's used to >>>> run in production generic ACPI capable guest in most use cases and >>>> various DT only guests as secondary ones. It's hard to make >>>> both usecases be happy with defaults (that's probably one of the >>>> reasons why 'sbsa' board is being added). >>>> >>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are >>>> considered. >>>> >>>>> Thanks >>>>> >>>>> Eric >>>>>> >>>>>> >>>>>> >>>>>> >>>> > >
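Since the SRAT comes up above ("the slots are advertised as hotpluggable/enabled in the SRAT"), the sketch below shows, in standalone C, the ACPI SRAT Memory Affinity structure and the flag bits involved. The layout follows my reading of the spec, with the 32-bit low/high base and length fields folded into 64-bit members, and the example values are invented; such an entry pins down affinity for present and possible RAM, it does not tell the guest how much memory is actually plugged.

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the ACPI SRAT Memory Affinity structure (type 1, 40 bytes),
 * illustration only: the spec's 32-bit base/length low/high pairs are
 * folded into 64-bit fields here.
 */
struct srat_mem_affinity {
    uint8_t  type;              /* 1 = Memory Affinity */
    uint8_t  length;            /* 40 */
    uint32_t proximity;         /* NUMA node the range belongs to */
    uint16_t reserved1;
    uint64_t base;              /* base address low + high */
    uint64_t range_length;      /* length low + high */
    uint32_t reserved2;
    uint32_t flags;
    uint64_t reserved3;
} __attribute__((packed));

_Static_assert(sizeof(struct srat_mem_affinity) == 40, "SRAT memory entry is 40 bytes");

enum {
    SRAT_MEM_ENABLED       = 1u << 0,   /* entry is valid for the OS */
    SRAT_MEM_HOT_PLUGGABLE = 1u << 1,   /* RAM may (dis)appear in this range */
    SRAT_MEM_NON_VOLATILE  = 1u << 2,
};

int main(void)
{
    /* Invented example: a device-memory window advertised as enabled and
     * hot-pluggable; the guest learns the affinity of possible RAM here,
     * not how much of it is actually present. */
    struct srat_mem_affinity hotplug_area = {
        .type = 1, .length = 40,
        .proximity = 0,
        .base = 17ULL << 30,
        .range_length = 16ULL << 30,
        .flags = SRAT_MEM_ENABLED | SRAT_MEM_HOT_PLUGGABLE,
    };
    printf("flags: 0x%x\n", (unsigned)hotplug_area.flags);
    return 0;
}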
On Thu, 28 Feb 2019 08:48:18 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, Shameer, > > On 2/27/19 6:51 PM, Igor Mammedov wrote: > > On Wed, 27 Feb 2019 10:41:45 +0000 > > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > > > >> Hi Eric, > >> > >>> -----Original Message----- > >>> From: Auger Eric [mailto:eric.auger@redhat.com] > >>> Sent: 27 February 2019 10:27 > >>> To: Igor Mammedov <imammedo@redhat.com> > >>> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; > >>> dgilbert@redhat.com; Shameerali Kolothum Thodi > >>> <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; > >>> qemu-arm@nongnu.org; eric.auger.pro@gmail.com; > >>> david@gibson.dropbear.id.au > >>> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion > >>> and PCDIMM/NVDIMM support > >>> > >>> Hi Igor, Shameer, > >>> > >>> On 2/27/19 11:10 AM, Igor Mammedov wrote: > >>>> On Tue, 26 Feb 2019 18:53:24 +0100 > >>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>> > >>>>> Hi Igor, > >>>>> > >>>>> On 2/26/19 5:56 PM, Igor Mammedov wrote: > >>>>>> On Tue, 26 Feb 2019 14:11:58 +0100 > >>>>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>>>> > >>>>>>> Hi Igor, > >>>>>>> > >>>>>>> On 2/26/19 9:40 AM, Auger Eric wrote: > >>>>>>>> Hi Igor, > >>>>>>>> > >>>>>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > >>>>>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 > >>>>>>>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Igor, > >>>>>>>>>> > >>>>>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>>>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>>>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>>>>>>>>> support device memory in general, and especially > >>> PCDIMM/NVDIMM. > >>>>>>>>>>>> > >>>>>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as > >>> the > >>>>>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high > >>> PCIe > >>>>>>>>>>>> MMIO region. The address map was 1TB large. This corresponded > >>> to > >>>>>>>>>>>> the max IPA capacity KVM was able to manage. > >>>>>>>>>>>> > >>>>>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. > >>> The > >>>>>>>>>>>> max GPA size depends on the host kernel configuration and physical > >>> CPUs. > >>>>>>>>>>>> > >>>>>>>>>>>> In this series we use this feature and allow the RAM to grow > >>> without > >>>>>>>>>>>> any other limit than the one put by the host kernel. > >>>>>>>>>>>> > >>>>>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>>>>>>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>>>>>>>>> maxram_size - ram_size. The device memory is potentially > >>> hotpluggable > >>>>>>>>>>>> depending on the instantiated memory objects. > >>>>>>>>>>>> > >>>>>>>>>>>> IO regions previously located between 256GB and 1TB are moved > >>> after > >>>>>>>>>>>> the RAM. Their offset is dynamically computed, depends on > >>> ram_size > >>>>>>>>>>>> and maxram_size. Size alignment is enforced. > >>>>>>>>>>>> > >>>>>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory > >>> map > >>>>>>>>>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>>>>>>>>> onwards. 
> >>>>>>>>>>>> > >>>>>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to > >>> do > >>>>>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>>>>>>>>> that job at the moment. > >>>>>>>>>>>> > >>>>>>>>>>>> Device memory being put just after the initial RAM, it is possible > >>>>>>>>>>>> to get access to this feature while keeping a 1TB address map. > >>>>>>>>>>>> > >>>>>>>>>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > >>>>>>>>>>>> > >>>>>>>>>>>> Functionally, the series is split into 3 parts: > >>>>>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>>>>>>>>> the memory map > >>>>>>>>>>> > >>>>>>>>>>>> 2) Support of PC-DIMM [10 - 13] > >>>>>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>>>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>>>>>>>>> visible to the guest. It might be that DT is masking problem > >>>>>>>>>>> but well, that won't work on ACPI only guests. > >>>>>>>>>> > >>>>>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of > >>> mem > >>>>>>>>>> added with the DIMM slots. > >>>>>>>>> Question is how does it get there? Does it come from DT or from > >>> firmware > >>>>>>>>> via UEFI interfaces? > >>>>>>>>> > >>>>>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? > >>>>>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > >>>>>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >>>>>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as > >>> normal > >>>>>>>>> memory early at boot and later put that memory into zone normal and > >>> hence > >>>>>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based > >>> means > >>>>>>>>> of discovery. > >>>>>>>>> (That's guest issue but it's easy to workaround it not putting > >>> hotpluggable > >>>>>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it > >>> properly) > >>>>>>>>> That way memory doesn't get (ab)used by firmware or early boot > >>> kernel stages > >>>>>>>>> and doesn't get locked up. > >>>>>>>>> > >>>>>>>>>> What else would you expect in the dsdt? > >>>>>>>>> Memory device descriptions, look for code that adds PNP0C80 with > >>> _CRS > >>>>>>>>> describing memory ranges > >>>>>>>> > >>>>>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. > >>>>>>>> Does it mean that in ACPI mode we must not output DT hotplug memory > >>>>>>>> nodes or assuming that PNP0C80 is properly described, it will "override" > >>>>>>>> DT description? > >>>>>>> > >>>>>>> After further investigations, I think the pieces you pointed out are > >>>>>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() > >>>>>>> call. So I suggest we separate the concerns: this series brings support > >>>>>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures > >>>>>>> will be added later on by Shameer. > >>>>>> > >>>>>> Maybe we should not put pc-dimms in DT for this series until it gets clear > >>>>>> if it doesn't conflict with ACPI in some way. > >>>>> > >>>>> I guess you mean removing the DT hotpluggable memory nodes only in ACPI > >>>>> mode? Otherwise you simply remove the DIMM feature, right? > >>>> Something like this so DT won't get in conflict with ACPI. 
> >>>> Only we don't have a switch for it something like, -machine fdt=on (with > >>> default off) > >>>> > >>>>> I double checked and if you remove the hotpluggable memory DT nodes in > >>>>> ACPI mode: > >>>>> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > >>>>> guess you're right, if the DT nodes are available, that memory is > >>>>> considered as not unpluggable by the guest. > >>>>> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > >>>>> system. > >>>>> > >>>>> Hotplug/unplug is clearly not supported by this series and any attempt > >>>>> results in "memory hotplug is not supported". Is it really an issue if > >>>>> the guest does not consider DIMM slots as not hot-unpluggable memory? I > >>>>> am not even sure the guest kernel would support to unplug that memory. > >>>>> > >>>>> In case we want all ACPI tables to be ready for making this memory seen > >>>>> as hot-unpluggable we need some Shameer's patches on top of this series. > >>>> May be we should push for this way (into 4.0), it's just a several patches > >>>> after all or even merge them in your series (I'd guess it would need to be > >>>> rebased on top of your latest work) > >>> > >>> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series > >>> (without the reduced hw_reduced_acpi flag) in this series and isolate in > >>> a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml > >>> called in virt code? > > probably we can do it as transitional step as we need working mmio interface > > in place for build_memory_hotplug_aml() to work, provided it won't create > > migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?). > > > > What about dummy initial GED (empty device), that manages mmio region only > > and then later it will be filled with remaining logic IRQ. In this case mmio region > > and vmstate won't change (maybe) so it won't cause ABI or migration issues. > > > > > > > >> Sure, that’s fine with me. So what would you use for the event_handler_method in > >> build_memory_hotplug_aml()? GPO0 device? > > > > a method name not defined in spec, so it won't be called might do. > > At this point the event_handler_method, ie. \_SB.GPO0._E02, is not > supposed to be called, right? So effectivily we should be able to use > any other method name (unlinked to any GPIO/GED). I guess at this stage > only the PNP0C80 definition blocks + methods are used. pretty much yes. > What still remains fuzzy for me is in case of cold plug the mmio hotplug > control region part only is read (despite the slot selection of course) > and returns 0 for addr/size and also flags meaning the slot is not > enabled. If you mean guest reads 0s than it looks broken, could you show trace log with mhp_* tracepoints enabled during a dimm hotplug. > So despite the slots are advertised as hotpluggable/enabled in > the SRAT; I am not sure for the OS it actually makes any difference > whether the DSDT definition blocks are described or not. SRAT isn't used fro informing guests about amount of present RAM, it holds affinity information for present and possible RAM > To be honest I am afraid this is too late to add those additional > features for 4.0 now. This is going to jeopardize the first preliminary > part which is the introduction of the new memory map, allowing the > expansion of the initial RAM and paving the way for device memory > introduction. So I think I am going to resend the first 10 patches in a > standalone series. 
And we can iterate on the PCDIMM/NVDIMM parts > independently. sounds good to me, I'll try to review 1-10 today > Thanks > > Eric > > > > > >> > >> Thanks, > >> Shameer > >> > >>> Then would remain the GED/GPIO actual integration. > >>> > >>> Thanks > >>> > >>> Eric > >>>> > >>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept > >>>>> to add one feature in DT and then in ACPI. For instance we can benefit > >>>> usually it doesn't conflict with each other (at least I'm not aware of it) > >>>> but I see a problem with in this case. > >>>> > >>>>> from nvdimm in dt mode right? So, considering an incremental approach I > >>>>> would be in favour of keeping the DT nodes. > >>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > >>>> more versatile. > >>>> > >>>> I consider target application of arm/virt as a board that's used to > >>>> run in production generic ACPI capable guest in most use cases and > >>>> various DT only guests as secondary ones. It's hard to make > >>>> both usecases be happy with defaults (that's probably one of the > >>>> reasons why 'sbsa' board is being added). > >>>> > >>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are > >>>> considered. > >>>> > >>>>> Thanks > >>>>> > >>>>> Eric > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>> > > > >
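Before the trace Eric posts next, it may help to model what the mhp_acpi_* tracepoints instrument. The standalone C sketch below mimics the control region's behaviour: AML selects a slot, then reads the base address and size as 32-bit halves plus a flags word, and a slot with no PC-DIMM behind it reads back as zeros. The function names mirror the tracepoint names; the single "enabled" flag bit and everything else here is a simplification for illustration, not QEMU's actual device code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_SLOTS 4

/* Toy model of the per-slot state behind the hotplug control region. */
struct mem_slot {
    uint64_t addr;
    uint64_t size;
    bool     enabled;            /* a DIMM is plugged into this slot */
};

struct mhp_regs {
    uint32_t selector;           /* which slot the reads refer to */
    struct mem_slot slot[NR_SLOTS];
};

/* Register accesses, named after the tracepoints in the log below. */
static void mhp_write_slot(struct mhp_regs *r, uint32_t s) { r->selector = s; }
static uint32_t mhp_read_addr_lo(struct mhp_regs *r) { return (uint32_t)r->slot[r->selector].addr; }
static uint32_t mhp_read_addr_hi(struct mhp_regs *r) { return (uint32_t)(r->slot[r->selector].addr >> 32); }
static uint32_t mhp_read_size_lo(struct mhp_regs *r) { return (uint32_t)r->slot[r->selector].size; }
static uint32_t mhp_read_size_hi(struct mhp_regs *r) { return (uint32_t)(r->slot[r->selector].size >> 32); }
static uint32_t mhp_read_flags(struct mhp_regs *r) { return r->slot[r->selector].enabled ? 1u : 0u; }

int main(void)
{
    struct mhp_regs regs = { 0 };   /* all slots empty: cold boot, no PC-DIMM */

    /* What the PNP0C80 _CRS/_STA methods do for every slot at boot. */
    for (uint32_t s = 0; s < NR_SLOTS; s++) {
        mhp_write_slot(&regs, s);
        unsigned ahi = mhp_read_addr_hi(&regs), alo = mhp_read_addr_lo(&regs);
        unsigned shi = mhp_read_size_hi(&regs), slo = mhp_read_size_lo(&regs);
        unsigned flags = mhp_read_flags(&regs);
        printf("slot[%u] addr hi/lo 0x%x 0x%x size hi/lo 0x%x 0x%x flags 0x%x\n",
               (unsigned)s, ahi, alo, shi, slo, flags);
    }
    return 0;
}

Compiled and run, this prints the same all-zero pattern per slot as the trace that follows.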
Hi Igor, [..] > >> What still remains fuzzy for me is in case of cold plug the mmio hotplug >> control region part only is read (despite the slot selection of course) >> and returns 0 for addr/size and also flags meaning the slot is not >> enabled. > If you mean guest reads 0s than it looks broken, could you show > trace log with mhp_* tracepoints enabled during a dimm hotplug. Please find the traces + cmd line on x86 /qemu-system-x86_64 -M q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial tcp:localhost:4444,server -trace events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split -object memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop -drive file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0 -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2 -netdev tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on -net none -d guest_errors ****************************************************************** ioctl(TUNSETIFF): Device or resource busy qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU waiting for connection on: disconnected:tcp:::1:4444,server qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0 29556@1551449303.339512:mhp_acpi_read_size_hi slot[0x0] size hi: 0x0 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0 29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3 
29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0 That's the only traces I get until I get the login prompt. Thanks Eric > >> So despite the slots are advertised as hotpluggable/enabled in >> the SRAT; I am not sure for the OS it actually makes any difference >> whether the DSDT definition blocks are described or not. > SRAT isn't used fro informing guests about amount of present RAM, > it holds affinity information for present and possible RAM > >> To be honest I am afraid this is too late to add those additional >> features for 4.0 now. This is going to jeopardize the first preliminary >> part which is the introduction of the new memory map, allowing the >> expansion of the initial RAM and paving the way for device memory >> introduction. So I think I am going to resend the first 10 patches in a >> standalone series. And we can iterate on the PCDIMM/NVDIMM parts >> independently. > sounds good to me, I'll try to review 1-10 today > >> Thanks >> >> Eric >>> >>> >>>> >>>> Thanks, >>>> Shameer >>>> >>>>> Then would remain the GED/GPIO actual integration. >>>>> >>>>> Thanks >>>>> >>>>> Eric >>>>>> >>>>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept >>>>>>> to add one feature in DT and then in ACPI. For instance we can benefit >>>>>> usually it doesn't conflict with each other (at least I'm not aware of it) >>>>>> but I see a problem with in this case. >>>>>> >>>>>>> from nvdimm in dt mode right? So, considering an incremental approach I >>>>>>> would be in favour of keeping the DT nodes. >>>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much >>>>>> more versatile. >>>>>> >>>>>> I consider target application of arm/virt as a board that's used to >>>>>> run in production generic ACPI capable guest in most use cases and >>>>>> various DT only guests as secondary ones. It's hard to make >>>>>> both usecases be happy with defaults (that's probably one of the >>>>>> reasons why 'sbsa' board is being added). >>>>>> >>>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are >>>>>> considered. 
>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Eric >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>> >>> >
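To read the log above: each base and size comes back as two 32-bit halves that the _CRS method assembles into 64-bit values, as sketched below, and with every half and every flags read returning 0, no slot reports a plugged PC-DIMM. As Igor explains in the next message, that is expected here, since the command line has only NVDIMMs and those are not discovered through this interface. The snippet is plain arithmetic, not QEMU code.

#include <stdint.h>
#include <stdio.h>

/* Assemble a 64-bit value from the hi/lo register reads seen in the log. */
static uint64_t combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    /* Values taken from the slot 0 reads above. */
    uint32_t addr_hi = 0x0, addr_lo = 0x0, size_hi = 0x0, size_lo = 0x0, flags = 0x0;

    printf("slot 0: base 0x%llx size 0x%llx %s\n",
           (unsigned long long)combine(addr_hi, addr_lo),
           (unsigned long long)combine(size_hi, size_lo),
           (flags & 1) ? "populated" : "no PC-DIMM plugged");
    return 0;
}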
On Fri, 1 Mar 2019 15:18:14 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, > > [..] > > > > >> What still remains fuzzy for me is in case of cold plug the mmio hotplug > >> control region part only is read (despite the slot selection of course) > >> and returns 0 for addr/size and also flags meaning the slot is not > >> enabled. > > If you mean guest reads 0s than it looks broken, could you show > > trace log with mhp_* tracepoints enabled during a dimm hotplug. > > Please find the traces + cmd line on x86 I've thought that you where talking about pc-dimms, so here it goes: nvdimm is not part of memory hotplug interface, they have their own mmio region through which they let guest read completely rebuilt NFIT table and their own GPE._E04 event handler. you see 0's in trace because guest enumerates all PNP0C80 in DSDT to check if for any present pc-dimm for which mem hotplug interface reports 0s since there is none. PS: In ACPI spec there is an example of NVDIMMs where they also have associated memory device (PNP0C80) and that it's somehow related to nvdimm hotplug, but it's not described in sufficient detail so I'm honestly do not know what to do with it. Hence QEMU doesn't have PNP0C80 counterpart for nvdimm. To me it looks more like a mistake in the spec, but that's a topic for another discussion. > /qemu-system-x86_64 -M > q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu > Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m > 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial > tcp:localhost:4444,server -trace > events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp > unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc > base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime > mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global > PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split > -object > memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M > -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object > memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M > -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device > virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop > -drive > file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0 > -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2 > -netdev > tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on > -net none -d guest_errors > > ****************************************************************** > ioctl(TUNSETIFF): Device or resource busy > qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU > waiting for connection on: disconnected:tcp:::1:4444,server > qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used > qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used > 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0 > 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0 > 29556@1551449303.339512:mhp_acpi_read_size_hi slot[0x0] size hi: 0x0 > 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0 > 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0 > 
29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0 > 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0 > 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0 > 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0 > 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0 > 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0 > 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0 > 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0 > 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0 > 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0 > 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0 > 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0 > 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0 > 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0 > 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0 > 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0 > 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0 > 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0 > 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0 > 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0 > 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0 > 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0 > 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0 > > That's the only traces I get until I get the login prompt. > > Thanks > > Eric > > > > > >> So despite the slots are advertised as hotpluggable/enabled in > >> the SRAT; I am not sure for the OS it actually makes any difference > >> whether the DSDT definition blocks are described or not. > > SRAT isn't used fro informing guests about amount of present RAM, > > it holds affinity information for present and possible RAM > > > >> To be honest I am afraid this is too late to add those additional > >> features for 4.0 now. This is going to jeopardize the first preliminary > >> part which is the introduction of the new memory map, allowing the > >> expansion of the initial RAM and paving the way for device memory > >> introduction. So I think I am going to resend the first 10 patches in a > >> standalone series. And we can iterate on the PCDIMM/NVDIMM parts > >> independently. 
> > sounds good to me, I'll try to review 1-10 today > > > >> Thanks > >> > >> Eric > >>> > >>> > >>>> > >>>> Thanks, > >>>> Shameer > >>>> > >>>>> Then would remain the GED/GPIO actual integration. > >>>>> > >>>>> Thanks > >>>>> > >>>>> Eric > >>>>>> > >>>>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept > >>>>>>> to add one feature in DT and then in ACPI. For instance we can benefit > >>>>>> usually it doesn't conflict with each other (at least I'm not aware of it) > >>>>>> but I see a problem with in this case. > >>>>>> > >>>>>>> from nvdimm in dt mode right? So, considering an incremental approach I > >>>>>>> would be in favour of keeping the DT nodes. > >>>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > >>>>>> more versatile. > >>>>>> > >>>>>> I consider target application of arm/virt as a board that's used to > >>>>>> run in production generic ACPI capable guest in most use cases and > >>>>>> various DT only guests as secondary ones. It's hard to make > >>>>>> both usecases be happy with defaults (that's probably one of the > >>>>>> reasons why 'sbsa' board is being added). > >>>>>> > >>>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are > >>>>>> considered. > >>>>>> > >>>>>>> Thanks > >>>>>>> > >>>>>>> Eric > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>> > >>> > >
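To illustrate the separate NVDIMM path Igor describes above (the guest reads a rebuilt NFIT through the NVDIMM's own MMIO region, with GPE._E04 as the event handler, instead of going through the PNP0C80 slot registers), here is a standalone C sketch of the NFIT System Physical Address Range structure. The field layout is written from my reading of the ACPI 6.x spec and is for illustration only, not taken from QEMU.

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the NFIT System Physical Address (SPA) Range structure
 * (NFIT structure type 0, 56 bytes): the kind of entry through which
 * an NVDIMM range reaches the guest, in contrast to PC-DIMMs, which
 * are exposed via PNP0C80 devices and the slot registers traced above.
 */
struct nfit_spa_range {
    uint16_t type;                      /* 0 = SPA Range Structure */
    uint16_t length;                    /* 56 */
    uint16_t spa_index;
    uint16_t flags;
    uint32_t reserved;
    uint32_t proximity_domain;
    uint8_t  range_type_guid[16];       /* e.g. the persistent-memory GUID */
    uint64_t base;
    uint64_t range_length;
    uint64_t memory_mapping_attribute;
} __attribute__((packed));

_Static_assert(sizeof(struct nfit_spa_range) == 56, "SPA range entry is 56 bytes");

int main(void)
{
    printf("one NFIT SPA range entry: %zu bytes\n", sizeof(struct nfit_spa_range));
    return 0;
}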
Hi Igor, On 3/1/19 5:33 PM, Igor Mammedov wrote: > On Fri, 1 Mar 2019 15:18:14 +0100 > Auger Eric <eric.auger@redhat.com> wrote: > >> Hi Igor, >> >> [..] >> >>> >>>> What still remains fuzzy for me is in case of cold plug the mmio hotplug >>>> control region part only is read (despite the slot selection of course) >>>> and returns 0 for addr/size and also flags meaning the slot is not >>>> enabled. >>> If you mean guest reads 0s than it looks broken, could you show >>> trace log with mhp_* tracepoints enabled during a dimm hotplug. >> >> Please find the traces + cmd line on x86 > I've thought that you where talking about pc-dimms, so here it goes: > > nvdimm is not part of memory hotplug interface, they have their own > mmio region through which they let guest read completely rebuilt > NFIT table and their own GPE._E04 event handler. > > you see 0's in trace because guest enumerates all PNP0C80 in DSDT > to check if for any present pc-dimm for which mem hotplug interface > reports 0s since there is none. > > PS: > In ACPI spec there is an example of NVDIMMs where they also have > associated memory device (PNP0C80) and that it's somehow related > to nvdimm hotplug, but it's not described in sufficient detail > so I'm honestly do not know what to do with it. Hence QEMU doesn't > have PNP0C80 counterpart for nvdimm. To me it looks more like > a mistake in the spec, but that's a topic for another discussion. Oh my bad. Indeed with PCDIMM I can see the actual addresses being read. Thank you for the explanation! Have a nice WE. Eric > > >> /qemu-system-x86_64 -M >> q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu >> Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m >> 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial >> tcp:localhost:4444,server -trace >> events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp >> unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc >> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime >> mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global >> PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split >> -object >> memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M >> -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object >> memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M >> -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device >> virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop >> -drive >> file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0 >> -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2 >> -netdev >> tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on >> -net none -d guest_errors >> >> ****************************************************************** >> ioctl(TUNSETIFF): Device or resource busy >> qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU >> waiting for connection on: disconnected:tcp:::1:4444,server >> qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used >> qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used >> 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0 >> 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0 >> 29556@1551449303.339512:mhp_acpi_read_size_hi 
slot[0x0] size hi: 0x0 >> 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0 >> 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0 >> 29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0 >> 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0 >> 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0 >> 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0 >> 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0 >> 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0 >> 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0 >> 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0 >> 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0 >> 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0 >> 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0 >> 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0 >> 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0 >> 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0 >> 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0 >> 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0 >> 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0 >> 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0 >> 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0 >> 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0 >> 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0 >> 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0 >> 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0 >> >> That's the only traces I get until I get the login prompt. >> >> Thanks >> >> Eric >> >> >>> >>>> So despite the slots are advertised as hotpluggable/enabled in >>>> the SRAT; I am not sure for the OS it actually makes any difference >>>> whether the DSDT definition blocks are described or not. >>> SRAT isn't used fro informing guests about amount of present RAM, >>> it holds affinity information for present and possible RAM >>> >>>> To be honest I am afraid this is too late to add those additional >>>> features for 4.0 now. This is going to jeopardize the first preliminary >>>> part which is the introduction of the new memory map, allowing the >>>> expansion of the initial RAM and paving the way for device memory >>>> introduction. 
So I think I am going to resend the first 10 patches in a >>>> standalone series. And we can iterate on the PCDIMM/NVDIMM parts >>>> independently. >>> sounds good to me, I'll try to review 1-10 today >>> >>>> Thanks >>>> >>>> Eric >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Shameer >>>>>> >>>>>>> Then would remain the GED/GPIO actual integration. >>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Eric >>>>>>>> >>>>>>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept >>>>>>>>> to add one feature in DT and then in ACPI. For instance we can benefit >>>>>>>> usually it doesn't conflict with each other (at least I'm not aware of it) >>>>>>>> but I see a problem with in this case. >>>>>>>> >>>>>>>>> from nvdimm in dt mode right? So, considering an incremental approach I >>>>>>>>> would be in favour of keeping the DT nodes. >>>>>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much >>>>>>>> more versatile. >>>>>>>> >>>>>>>> I consider target application of arm/virt as a board that's used to >>>>>>>> run in production generic ACPI capable guest in most use cases and >>>>>>>> various DT only guests as secondary ones. It's hard to make >>>>>>>> both usecases be happy with defaults (that's probably one of the >>>>>>>> reasons why 'sbsa' board is being added). >>>>>>>> >>>>>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are >>>>>>>> considered. >>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> Eric >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>> >>>>> >>> > >