This series aims to bump the 255GB RAM limit in machvirt and to
support device memory in general, and especially PCDIMM/NVDIMM.

In machvirt versions < 4.0, the initial RAM starts at 1GB and can
grow up to 255GB. From 256GB onwards we find IO regions such as the
additional GICv3 RDIST region, the high PCIe ECAM region and the high
PCIe MMIO region. The address map was 1TB large. This corresponded to
the max IPA capacity KVM was able to manage.

Since Linux 4.20, the host kernel is able to support a larger and
dynamic IPA range, so the guest physical address can go beyond 1TB.
The max GPA size depends on the host kernel configuration and the
physical CPUs.

In this series we use this feature and allow the RAM to grow without
any limit other than the one imposed by the host kernel.

The RAM still starts at 1GB. First comes the initial RAM (-m) of size
ram_size and then comes the device memory (,maxmem) of size
maxram_size - ram_size. The device memory is potentially hotpluggable,
depending on the instantiated memory objects.

IO regions previously located between 256GB and 1TB are moved after
the RAM. Their offset is dynamically computed and depends on ram_size
and maxram_size. Size alignment is enforced.

In case the maxmem value is less than 255GB, the legacy memory map is
still used. The change of memory map becomes effective from 4.0
onwards.

As we keep the initial RAM at the 1GB base address, we do not need to
make invasive changes in the EDK2 FW. It seems nobody is eager to do
that job at the moment.

Device memory being put just after the initial RAM, it is possible to
get access to this feature while keeping a 1TB address map.

This series reuses/rebases patches initially submitted by Shameer
in [0] and Kwangwoo in [1] for the PC-DIMM and NV-DIMM parts.

Functionally, the series is split into 3 parts:
1) bump of the initial RAM limit [1 - 9] and change in
   the memory map
2) Support of PC-DIMM [10 - 13]
3) Support of NV-DIMM [14 - 17]

1) can be upstreamed before 2) and 2) can be upstreamed before 3).

Work is ongoing to transform the whole memory into device memory.
However this move is not trivial and, to me, is independent of the
improvements brought by this series:
- if we were to use DIMMs for the initial RAM, those DIMMs would use
  slots. Although they would not be part of the ones provided using
  the ",slots" option, they are limited ACPI resources.
- the DT and ACPI descriptions need to be reworked
- NUMA integration needs special care
- a special device memory object may be required to avoid consuming
  slots and to ease the FW description.

So I preferred to separate the concerns. This new implementation
based on device memory could be a candidate for another virt
version.

Best Regards

Eric

References:

[0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
    http://patchwork.ozlabs.org/cover/914694/

[1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
    https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html

This series can be found at:
https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7

History:

v6 -> v7:
- Addressed Peter's and Igor's comments (exceptions sent by email)
- Fixed the TCG case. Now device memory also works for TCG and the
  vcpu pamax is checked
- See individual logs for more details

v5 -> v6:
- mingw compilation issue fix
- kvm_arm_get_max_vm_phys_shift always returns the number of supported
  IPA bits
- new patch "hw/arm/virt: Rename highmem IO regions" that eases the
  review of "hw/arm/virt: Split the memory map description"
- "hw/arm/virt: Move memory map initialization into machvirt_init"
  squashed into the previous patch
- change the alignment of IO regions beyond the RAM so that it matches
  their size

v4 -> v5:
- change in the memory map
- see individual logs

v3 -> v4:
- rebase on David's "pc-dimm: next bunch of cleanups" and
  "pc-dimm: pre_plug "slot" and "addr" assignment"
- kvm-type option not used anymore. We directly use the maxram_size
  and ram_size machine fields to compute the max IPA range. Migration
  is naturally handled as CLI options are kept between source and
  destination. This was suggested by David.
- device_memory_start and device_memory_size not stored anymore in
  vms->bootinfo
- I did not take into account 2 of Igor's comments: the one related to
  the refactoring of arm_load_dtb and the one related to the
  generation of the dtb after system_reset, which would contain nodes
  of hotplugged devices (we do not support hotplug at this stage)
- check that the end-user does not attempt to hotplug a device
- addition of "vl: Set machine ram_size, maxram_size and ram_slots
  earlier"

v2 -> v3:
- fix pc_q35 and pc_piix compilation error
- Kwangwoo's email address being no longer valid, removed it

v1 -> v2:
- kvm_get_max_vm_phys_shift moved into an arch specific file
- addition of the NVDIMM part
- single series
- rebase on David's refactoring

v1:
- was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
- was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"

Best Regards

Eric


Eric Auger (12):
  hw/arm/virt: Rename highmem IO regions
  hw/arm/virt: Split the memory map description
  hw/boards: Add a MachineState parameter to kvm_type callback
  kvm: add kvm_arm_get_max_vm_ipa_size
  vl: Set machine ram_size, maxram_size and ram_slots earlier
  hw/arm/virt: Dynamic memory map depending on RAM requirements
  hw/arm/virt: Implement kvm_type function for 4.0 machine
  hw/arm/virt: Bump the 255GB initial RAM limit
  hw/arm/virt: Add memory hotplug framework
  hw/arm/virt: Allocate device_memory
  hw/arm/boot: Expose the pmem nodes in the DT
  hw/arm/virt: Add nvdimm and nvdimm-persistence options

Kwangwoo Lee (2):
  nvdimm: use configurable ACPI IO base and size
  hw/arm/virt: Add nvdimm hot-plug infrastructure

Shameer Kolothum (3):
  hw/arm/boot: introduce fdt_add_memory_node helper
  hw/arm/boot: Expose the PC-DIMM nodes in the DT
  hw/arm/virt-acpi-build: Add PC-DIMM in SRAT

 accel/kvm/kvm-all.c             |   2 +-
 default-configs/arm-softmmu.mak |   4 +
 hw/acpi/nvdimm.c                |  31 ++-
 hw/arm/boot.c                   | 136 ++++++++++--
 hw/arm/virt-acpi-build.c        |  23 +-
 hw/arm/virt.c                   | 364 ++++++++++++++++++++++++++++----
 hw/i386/pc_piix.c               |   6 +-
 hw/i386/pc_q35.c                |   6 +-
 hw/ppc/mac_newworld.c           |   3 +-
 hw/ppc/mac_oldworld.c           |   2 +-
 hw/ppc/spapr.c                  |   2 +-
 include/hw/arm/virt.h           |  24 ++-
 include/hw/boards.h             |   5 +-
 include/hw/mem/nvdimm.h         |   4 +
 target/arm/kvm.c                |  10 +
 target/arm/kvm_arm.h            |  13 ++
 vl.c                            |   6 +-
 17 files changed, 556 insertions(+), 85 deletions(-)

-- 
2.20.1
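[Editor's note] As an illustration of the usage this series enables, a guest with more than 255GB of initial RAM plus a cold-plugged PC-DIMM and NVDIMM could be started along the following lines. This is only a sketch: the virt-4.0 machine type and the nvdimm=on machine option are taken from this series and may differ once merged, and the backend sizes, paths and image names are placeholders.

  # 256G of initial RAM (beyond the old 255GB limit), room for device
  # memory up to 512G, one cold-plugged PC-DIMM and one NVDIMM
  qemu-system-aarch64 \
      -M virt-4.0,gic-version=3,nvdimm=on \
      -cpu host -enable-kvm -nographic \
      -m 256G,maxmem=512G,slots=4 \
      -object memory-backend-ram,id=dimm-mem0,size=16G \
      -device pc-dimm,id=dimm0,memdev=dimm-mem0 \
      -object memory-backend-file,id=nv-mem0,share=on,mem-path=/tmp/nvdimm0,size=16G \
      -device nvdimm,id=nvdimm0,memdev=nv-mem0 \
      -drive if=virtio,file=guest.img,format=qcow2 \
      -bios QEMU_EFI.fd

With this layout the initial RAM sits at 1GB, the two DIMMs land in the device memory region right above it, and the high IO regions float above maxram_size, as described in the cover letter.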
Hi Peter,

On 2/20/19 11:39 PM, Eric Auger wrote:
> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
[...]
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>    the memory map

I respun the whole series including the PCDIMM and NVDIMM parts, as
Igor did a first review pass on the latter. However the first objective
is to get [1 - 9] upstreamed, as we discussed earlier. So please
consider those patches independently of the others.

Thanks

Eric

> 2) Support of PC-DIMM [10 - 13]
> 3) Support of NV-DIMM [14 - 17]
[...]
On Wed, 20 Feb 2019 23:39:46 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
[...]
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>    the memory map

> 2) Support of PC-DIMM [10 - 13]
Is this part complete ACPI wise (for coldplug)? I haven't noticed any
DSDT AML nor E820 changes here, so ACPI wise pc-dimm shouldn't be
visible to the guest. It might be that DT is masking the problem but,
well, that won't work on ACPI-only guests.

Even though I've tried to make the memory hotplug ACPI parts not x86
specific, I'm afraid they might be tightly coupled with hotplug
support. So here are 2 options: make the DSDT part work without
hotplug, or implement hotplug here. I think the former is just a waste
of time and we should just add hotplug. It should take relatively minor
effort since you already implemented most of the boilerplate here.

As for how to implement the ACPI HW part, I suggest borrowing the GED
device that the NEMU guys are trying to use instead of the GPIO route,
like we do now for ACPI_POWER_BUTTON_DEVICE to deliver the event. That
way it would be easier to share this with their virt-x86 machine
eventually.

> 3) Support of NV-DIMM [14 - 17]
The same might be true for NUMA but I haven't dug that deep into that
part.
[...]
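[Editor's note] To exercise the ACPI-only case raised above, one option (a sketch; it assumes an arm64 guest kernel built with ACPI support) is to boot the guest with acpi=force on its kernel command line, so the DT description is ignored, and then check what the guest actually reports:

  # inside the guest, booted with "acpi=force" on the kernel command line
  grep MemTotal /proc/meminfo   # if the DIMM size is missing, the ACPI description is incomplete
  lshw -class memory            # lists the memory devices the guest discovered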
Hi Igor,

On 2/22/19 5:27 PM, Igor Mammedov wrote:
> On Wed, 20 Feb 2019 23:39:46 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
[...]
>> 2) Support of PC-DIMM [10 - 13]
> Is this part complete ACPI wise (for coldplug)? I haven't noticed any
> DSDT AML nor E820 changes here, so ACPI wise pc-dimm shouldn't be
> visible to the guest. It might be that DT is masking the problem but,
> well, that won't work on ACPI-only guests.

The guest's /proc/meminfo or "lshw -class memory" reflects the amount
of memory added with the DIMM slots, so it looks fine to me. Isn't E820
a pure x86 matter? What else would you expect in the DSDT? I understand
hotplug would require extra modifications, but I don't see anything
else missing for coldplug.

> Even though I've tried to make the memory hotplug ACPI parts not x86
> specific, I'm afraid they might be tightly coupled with hotplug
> support. So here are 2 options: make the DSDT part work without
> hotplug, or implement hotplug here. I think the former is just a waste
> of time and we should just add hotplug. It should take relatively
> minor effort since you already implemented most of the boilerplate
> here.

Shameer sent an RFC series for supporting hotplug:

[RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
https://patchwork.kernel.org/cover/10783589/

I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
OK, even after system_reset. Note that the hotplug kernel support on
ARM is very recent. I would prefer to dissociate both efforts if we
want to get a chance of making coldplug for 4.0. Also, we have an issue
with NVDIMM since on reboot the guest does not boot properly.

> As for how to implement the ACPI HW part, I suggest borrowing the GED
> device that the NEMU guys are trying to use instead of the GPIO route,
> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver the event. That
> way it would be easier to share this with their virt-x86 machine
> eventually.

Sounds like a different approach than the one initiated by Shameer?

Thanks

Eric
[...]
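[Editor's note] For reference, the PCDIMM hotplug test mentioned above is typically driven from the QEMU monitor along these lines (a sketch; the ids are made up, and the machine is assumed to have been started with spare maxmem and slots as in the cover letter):

  (qemu) object_add memory-backend-ram,id=hp-mem0,size=16G
  (qemu) device_add pc-dimm,id=hp-dimm0,memdev=hp-mem0
  (qemu) info memory-devices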
On Fri, 22 Feb 2019 18:35:26 +0100
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Igor,
>
> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> > On Wed, 20 Feb 2019 23:39:46 +0100
> > Eric Auger <eric.auger@redhat.com> wrote:
[...]
> >> 2) Support of PC-DIMM [10 - 13]
> > Is this part complete ACPI wise (for coldplug)? I haven't noticed
> > any DSDT AML nor E820 changes here, so ACPI wise pc-dimm shouldn't
> > be visible to the guest. It might be that DT is masking the problem
> > but, well, that won't work on ACPI-only guests.
>
> The guest's /proc/meminfo or "lshw -class memory" reflects the amount
> of memory added with the DIMM slots,

The question is how it gets there. Does it come from DT or from
firmware via UEFI interfaces?

> so it looks fine to me. Isn't E820 a pure x86 matter?

Sorry for misleading, I meant the UEFI GetMemoryMap(). On x86 I'm wary
of adding PC-DIMMs to E820, which then gets exposed via UEFI
GetMemoryMap(), as the guest kernel might start using it as normal
memory early at boot, later put that memory into zone normal, and hence
make it non-hot-un-pluggable. The same concerns apply to DT based means
of discovery.
(That's a guest issue, but it's easy to work around by not putting
hotpluggable memory into UEFI GetMemoryMap() or DT and letting the DSDT
describe it properly.) That way the memory doesn't get (ab)used by
firmware or early boot kernel stages and doesn't get locked up.

> What else would you expect in the DSDT?

Memory device descriptions; look for the code that adds PNP0C80 devices
with a _CRS describing the memory ranges.

> I understand hotplug would require extra modifications, but I don't
> see anything else missing for coldplug.
[...]
> Shameer sent an RFC series for supporting hotplug:
>
> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
> https://patchwork.kernel.org/cover/10783589/
>
> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be
> OK, even after system_reset. Note that the hotplug kernel support on
> ARM is very recent. I would prefer to dissociate both efforts if we
> want to get a chance of making coldplug for 4.0. Also, we have an
> issue with NVDIMM since on reboot the guest does not boot properly.

I guess we can merge an implementation that works on some kernel
configs [DT based, I'd guess], and add the ACPI part later. Though that
will be a bit of a mess as we do not version firmware parts (ACPI
tables).

> > As for how to implement the ACPI HW part, I suggest borrowing the
> > GED device that the NEMU guys are trying to use instead of the GPIO
> > route, like we do now for ACPI_POWER_BUTTON_DEVICE to deliver the
> > event. That way it would be easier to share this with their virt-x86
> > machine eventually.
> Sounds like a different approach than the one initiated by Shameer?

ARM boards were the first to use the ACPI hw-reduced profile, so they
picked up the GPIO based way of delivering the hotplug event that was
available back then. Later the spec introduced the Generic Event Device
for that purpose with the hw-reduced profile, which NEMU implemented
[1], so I'd use that rather than the ad-hoc GPIO mapping. I'd guess it
will be more compatible with various contemporary guests, and we could
reuse the same code for both the x86 and arm virt boards.

1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c
[...]
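[Editor's note] A quick way to check from inside a running guest what is being described here (a sketch; it assumes the acpica-tools package, which provides acpidump and iasl, is installed in the guest) is to dump and disassemble the DSDT and look for PNP0C80 devices and their _CRS memory ranges:

  # inside the guest, as root
  acpidump -b                       # dumps the ACPI tables as dsdt.dat, facp.dat, ...
  iasl -d dsdt.dat                  # disassembles the DSDT into dsdt.dsl
  grep -n -A8 'PNP0C80' dsdt.dsl    # memory device declarations and their _CRS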
Hi Igor,

> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@redhat.com]
> Sent: 25 February 2019 09:42
> To: Auger Eric <eric.auger@redhat.com>
> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM
> expansion and PCDIMM/NVDIMM support
>
> On Fri, 22 Feb 2019 18:35:26 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
[...]
> ARM boards were the first to use the ACPI hw-reduced profile, so they
> picked up the GPIO based way of delivering the hotplug event that was
> available back then. Later the spec introduced the Generic Event
> Device for that purpose with the hw-reduced profile, which NEMU
> implemented [1], so I'd use that rather than the ad-hoc GPIO mapping.
> I'd guess it will be more compatible with various contemporary guests,
> and we could reuse the same code for both the x86 and arm virt boards.
>
> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

Thanks for this suggestion. It looks like that is definitely a better
way of handling ACPI events. I will take a look and use GED for the
next revision of the hotplug series.

Thanks,
Shameer
[...]
Hi Igor,

On 2/25/19 10:42 AM, Igor Mammedov wrote:
> On Fri, 22 Feb 2019 18:35:26 +0100
> Auger Eric <eric.auger@redhat.com> wrote:
[...]
>> What else would you expect in the DSDT?
> Memory device descriptions; look for the code that adds PNP0C80
> devices with a _CRS describing the memory ranges.

OK, thank you for the explanations. I will work on the PNP0C80 addition
then. Does it mean that in ACPI mode we must not output the DT hotplug
memory nodes, or, assuming PNP0C80 is properly described, will it
"override" the DT description?

[...]
> ARM boards were the first to use the ACPI hw-reduced profile, so they
> picked up the GPIO based way of delivering the hotplug event that was
> available back then. Later the spec introduced the Generic Event
> Device for that purpose with the hw-reduced profile, which NEMU
> implemented [1], so I'd use that rather than the ad-hoc GPIO mapping.
> I'd guess it will be more compatible with various contemporary guests,
> and we could reuse the same code for both the x86 and arm virt boards.
>
> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c

That's really helpful for the ARM hotplug work. Thanks!

Eric
[...]
>>>> >>>> Best Regards >>>> >>>> Eric >>>> >>>> References: >>>> >>>> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions >>>> http://patchwork.ozlabs.org/cover/914694/ >>>> >>>> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform >>>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html >>>> >>>> This series can be found at: >>>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7 >>>> >>>> History: >>>> >>>> v6 -> v7: >>>> - Addressed Peter and Igor comments (exceptions sent my email) >>>> - Fixed TCG case. Now device memory works also for TCG and vcpu >>>> pamax is checked >>>> - See individual logs for more details >>>> >>>> v5 -> v6: >>>> - mingw compilation issue fix >>>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported >>>> IPA bits >>>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review >>>> of "hw/arm/virt: Split the memory map description" >>>> - "hw/arm/virt: Move memory map initialization into machvirt_init" >>>> squashed into the previous patch >>>> - change alignment of IO regions beyond the RAM so that it matches their >>>> size >>>> >>>> v4 -> v5: >>>> - change in the memory map >>>> - see individual logs >>>> >>>> v3 -> v4: >>>> - rebase on David's "pc-dimm: next bunch of cleanups" and >>>> "pc-dimm: pre_plug "slot" and "addr" assignment" >>>> - kvm-type option not used anymore. We directly use >>>> maxram_size and ram_size machine fields to compute the >>>> MAX IPA range. Migration is naturally handled as CLI >>>> option are kept between source and destination. This was >>>> suggested by David. >>>> - device_memory_start and device_memory_size not stored >>>> anymore in vms->bootinfo >>>> - I did not take into account 2 Igor's comments: the one >>>> related to the refactoring of arm_load_dtb and the one >>>> related to the generation of the dtb after system_reset >>>> which would contain nodes of hotplugged devices (we do >>>> not support hotplug at this stage) >>>> - check the end-user does not attempt to hotplug a device >>>> - addition of "vl: Set machine ram_size, maxram_size and >>>> ram_slots earlier" >>>> >>>> v2 -> v3: >>>> - fix pc_q35 and pc_piix compilation error >>>> - kwangwoo's email being not valid anymore, remove his address >>>> >>>> v1 -> v2: >>>> - kvm_get_max_vm_phys_shift moved in arch specific file >>>> - addition of NVDIMM part >>>> - single series >>>> - rebase on David's refactoring >>>> >>>> v1: >>>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size" >>>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB" >>>> >>>> Best Regards >>>> >>>> Eric >>>> >>>> >>>> Eric Auger (12): >>>> hw/arm/virt: Rename highmem IO regions >>>> hw/arm/virt: Split the memory map description >>>> hw/boards: Add a MachineState parameter to kvm_type callback >>>> kvm: add kvm_arm_get_max_vm_ipa_size >>>> vl: Set machine ram_size, maxram_size and ram_slots earlier >>>> hw/arm/virt: Dynamic memory map depending on RAM requirements >>>> hw/arm/virt: Implement kvm_type function for 4.0 machine >>>> hw/arm/virt: Bump the 255GB initial RAM limit >>>> hw/arm/virt: Add memory hotplug framework >>>> hw/arm/virt: Allocate device_memory >>>> hw/arm/boot: Expose the pmem nodes in the DT >>>> hw/arm/virt: Add nvdimm and nvdimm-persistence options >>>> >>>> Kwangwoo Lee (2): >>>> nvdimm: use configurable ACPI IO base and size >>>> hw/arm/virt: Add nvdimm hot-plug infrastructure >>>> >>>> Shameer Kolothum (3): >>>> hw/arm/boot: introduce fdt_add_memory_node helper >>>> hw/arm/boot: Expose the 
PC-DIMM nodes in the DT >>>> hw/arm/virt-acpi-build: Add PC-DIMM in SRAT >>>> >>>> accel/kvm/kvm-all.c | 2 +- >>>> default-configs/arm-softmmu.mak | 4 + >>>> hw/acpi/nvdimm.c | 31 ++- >>>> hw/arm/boot.c | 136 ++++++++++-- >>>> hw/arm/virt-acpi-build.c | 23 +- >>>> hw/arm/virt.c | 364 ++++++++++++++++++++++++++++---- >>>> hw/i386/pc_piix.c | 6 +- >>>> hw/i386/pc_q35.c | 6 +- >>>> hw/ppc/mac_newworld.c | 3 +- >>>> hw/ppc/mac_oldworld.c | 2 +- >>>> hw/ppc/spapr.c | 2 +- >>>> include/hw/arm/virt.h | 24 ++- >>>> include/hw/boards.h | 5 +- >>>> include/hw/mem/nvdimm.h | 4 + >>>> target/arm/kvm.c | 10 + >>>> target/arm/kvm_arm.h | 13 ++ >>>> vl.c | 6 +- >>>> 17 files changed, 556 insertions(+), 85 deletions(-) >>>> >>> >>> >
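The "PNP0C80 with _CRS" Igor refers to is the ACPI Memory Device object: one Device() per DIMM in the DSDT, identified by _HID PNP0C80, whose _CRS describes the address range the DIMM occupies. Below is a minimal sketch, written with QEMU's AML builder, of what a purely static (cold-plug only) description could look like; the device name and the base/size arguments are placeholders, and the support discussed later in the thread goes through the shared build_memory_hotplug_aml() code rather than open-coding one device per slot.

/*
 * Minimal sketch (not the series' code): a static ACPI Memory Device for
 * one cold-plugged DIMM, built with QEMU's AML helpers.  "MEM0" and the
 * base/size values are placeholders.
 */
#include "qemu/osdep.h"
#include "hw/acpi/aml-build.h"

static void add_static_memory_device(Aml *dsdt, uint64_t base, uint64_t size)
{
    Aml *dev = aml_device("MEM0");              /* placeholder name */
    Aml *crs = aml_resource_template();

    /* PNP0C80 is the ACPI ID for a Memory Device */
    aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C80")));

    /* _CRS tells the guest which address range this DIMM occupies */
    aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                                     AML_MAX_FIXED, AML_CACHEABLE,
                                     AML_READ_WRITE,
                                     0, base, base + size - 1, 0, size));
    aml_append(dev, aml_name_decl("_CRS", crs));

    aml_append(dsdt, dev);
}

A hotplug-capable description additionally needs _STS and the Notify() plumbing, which is what the shared memory-hotplug AML discussed further down in the thread provides.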
Hi Igor, On 2/26/19 9:40 AM, Auger Eric wrote: > Hi Igor, > > On 2/25/19 10:42 AM, Igor Mammedov wrote: >> On Fri, 22 Feb 2019 18:35:26 +0100 >> Auger Eric <eric.auger@redhat.com> wrote: >> >>> Hi Igor, >>> >>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>> Eric Auger <eric.auger@redhat.com> wrote: >>>> >>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>> support device memory in general, and especially PCDIMM/NVDIMM. >>>>> >>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the >>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe >>>>> MMIO region. The address map was 1TB large. This corresponded to >>>>> the max IPA capacity KVM was able to manage. >>>>> >>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>> IPA range. So the guest physical address can go beyond the 1TB. The >>>>> max GPA size depends on the host kernel configuration and physical CPUs. >>>>> >>>>> In this series we use this feature and allow the RAM to grow without >>>>> any other limit than the one put by the host kernel. >>>>> >>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>> ram_size and then comes the device memory (,maxmem) of size >>>>> maxram_size - ram_size. The device memory is potentially hotpluggable >>>>> depending on the instantiated memory objects. >>>>> >>>>> IO regions previously located between 256GB and 1TB are moved after >>>>> the RAM. Their offset is dynamically computed, depends on ram_size >>>>> and maxram_size. Size alignment is enforced. >>>>> >>>>> In case maxmem value is inferior to 255GB, the legacy memory map >>>>> still is used. The change of memory map becomes effective from 4.0 >>>>> onwards. >>>>> >>>>> As we keep the initial RAM at 1GB base address, we do not need to do >>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>> that job at the moment. >>>>> >>>>> Device memory being put just after the initial RAM, it is possible >>>>> to get access to this feature while keeping a 1TB address map. >>>>> >>>>> This series reuses/rebases patches initially submitted by Shameer >>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>> >>>>> Functionally, the series is split into 3 parts: >>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>> the memory map >>>> >>>>> 2) Support of PC-DIMM [10 - 13] >>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>> visible to the guest. It might be that DT is masking problem >>>> but well, that won't work on ACPI only guests. >>> >>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem >>> added with the DIMM slots. >> Question is how does it get there? Does it come from DT or from firmware >> via UEFI interfaces? >> >>> So it looks fine to me. Isn't E820 a pure x86 matter? >> sorry for misleading, I've meant is UEFI GetMemoryMap(). >> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >> via UEFI GetMemoryMap() as guest kernel might start using it as normal >> memory early at boot and later put that memory into zone normal and hence >> make it non-hot-un-pluggable. The same concerns apply to DT based means >> of discovery. 
>> (That's guest issue but it's easy to workaround it not putting hotpluggable >> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) >> That way memory doesn't get (ab)used by firmware or early boot kernel stages >> and doesn't get locked up. >> >>> What else would you expect in the dsdt? >> Memory device descriptions, look for code that adds PNP0C80 with _CRS >> describing memory ranges > > OK thank you for the explanations. I will work on PNP0C80 addition then. > Does it mean that in ACPI mode we must not output DT hotplug memory > nodes or assuming that PNP0C80 is properly described, it will "override" > DT description? After further investigations, I think the pieces you pointed out are added by Shameer's series, ie. through the build_memory_hotplug_aml() call. So I suggest we separate the concerns: this series brings support for DIMM coldplug. hotplug, including all the relevant ACPI structures will be added later on by Shameer. Thanks Eric > >> >>> I understand hotplug >>> would require extra modifications but I don't see anything else missing >>> for coldplug. >>>> Even though I've tried make mem hotplug ACPI parts not x86 specific, >>>> I'm afraid it might be tightly coupled with hotplug support. >>>> So here are 2 options make DSDT part work without hotplug or >>>> implement hotplug here. I think the former is just a waste of time >>>> and we should just add hotplug. It should take relatively minor effort >>>> since you already implemented most of boiler plate here. >>> >>> Shameer sent an RFC series for supporting hotplug. >>> >>> [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support >>> https://patchwork.kernel.org/cover/10783589/ >>> >>> I tested PCDIMM hotplug (with ACPI) this afternoon and it seemed to be >>> OK, even after system_reset. >>> >>> Note the hotplug kernel support on ARM is very recent. I would prefer to >>> dissociate both efforts if we want to get a chance making coldplug for >>> 4.0. Also we have an issue for NVDIMM since on reboot the guest does not >>> boot properly. >> I guess we can merge implemetation that works on some kernel configs >> [DT based I'd guess], and add ACPI part later. Though that will be >> a bit of a mess as we do not version firmware parts (ACPI tables). >> >>>> As for how to implement ACPI HW part, I suggest to borrow GED >>>> device that NEMU guys trying to use instead of GPIO route, >>>> like we do now for ACPI_POWER_BUTTON_DEVICE to deliver event. >>>> So that it would be easier to share this with their virt-x86 >>>> machine eventually. >>> Sounds like a different approach than the one initiated by Shameer? >> ARM boards were first to use ACPI hw-reduced profile so they picked up >> available back then GPIO based way to deliver hotplug event, later spec >> introduced Generic Event Device for that means to use with hw-reduced >> profile, which NEMU implemented[1], so I'd use that rather than ad-hoc >> GPIO mapping. I'd guess it will more compatible with various contemporary >> guests and we could reuse the same code for both x86/arm virt boards) >> >> 1) https://github.com/intel/nemu/blob/topic/virt-x86/hw/acpi/ged.c > > That's really helpful for the ARM hotplug works. Thanks! > > Eric >> >>> >>> Thanks >>> >>> Eric >>>> >>>> >>>>> 3) Support of NV-DIMM [14 - 17] >>>> The same might be true for NUMA but I haven't dug this deep in to >>>> that part. >>>> >>>>> >>>>> 1) can be upstreamed before 2 and 2 can be upstreamed before 3. >>>>> >>>>> Work is ongoing to transform the whole memory as device memory. 
>>>>> However this move is not trivial and to me, is independent on >>>>> the improvements brought by this series: >>>>> - if we were to use DIMM for initial RAM, those DIMMs would use >>>>> use slots. Although they would not be part of the ones provided >>>>> using the ",slots" options, they are ACPI limited resources. >>>>> - DT and ACPI description needs to be reworked >>>>> - NUMA integration needs special care >>>>> - a special device memory object may be required to avoid consuming >>>>> slots and easing the FW description. >>>>> >>>>> So I preferred to separate the concerns. This new implementation >>>>> based on device memory could be candidate for another virt >>>>> version. >>>>> >>>>> Best Regards >>>>> >>>>> Eric >>>>> >>>>> References: >>>>> >>>>> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions >>>>> http://patchwork.ozlabs.org/cover/914694/ >>>>> >>>>> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform >>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html >>>>> >>>>> This series can be found at: >>>>> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7 >>>>> >>>>> History: >>>>> >>>>> v6 -> v7: >>>>> - Addressed Peter and Igor comments (exceptions sent my email) >>>>> - Fixed TCG case. Now device memory works also for TCG and vcpu >>>>> pamax is checked >>>>> - See individual logs for more details >>>>> >>>>> v5 -> v6: >>>>> - mingw compilation issue fix >>>>> - kvm_arm_get_max_vm_phys_shift always returns the number of supported >>>>> IPA bits >>>>> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review >>>>> of "hw/arm/virt: Split the memory map description" >>>>> - "hw/arm/virt: Move memory map initialization into machvirt_init" >>>>> squashed into the previous patch >>>>> - change alignment of IO regions beyond the RAM so that it matches their >>>>> size >>>>> >>>>> v4 -> v5: >>>>> - change in the memory map >>>>> - see individual logs >>>>> >>>>> v3 -> v4: >>>>> - rebase on David's "pc-dimm: next bunch of cleanups" and >>>>> "pc-dimm: pre_plug "slot" and "addr" assignment" >>>>> - kvm-type option not used anymore. We directly use >>>>> maxram_size and ram_size machine fields to compute the >>>>> MAX IPA range. Migration is naturally handled as CLI >>>>> option are kept between source and destination. This was >>>>> suggested by David. 
>>>>> - device_memory_start and device_memory_size not stored >>>>> anymore in vms->bootinfo >>>>> - I did not take into account 2 Igor's comments: the one >>>>> related to the refactoring of arm_load_dtb and the one >>>>> related to the generation of the dtb after system_reset >>>>> which would contain nodes of hotplugged devices (we do >>>>> not support hotplug at this stage) >>>>> - check the end-user does not attempt to hotplug a device >>>>> - addition of "vl: Set machine ram_size, maxram_size and >>>>> ram_slots earlier" >>>>> >>>>> v2 -> v3: >>>>> - fix pc_q35 and pc_piix compilation error >>>>> - kwangwoo's email being not valid anymore, remove his address >>>>> >>>>> v1 -> v2: >>>>> - kvm_get_max_vm_phys_shift moved in arch specific file >>>>> - addition of NVDIMM part >>>>> - single series >>>>> - rebase on David's refactoring >>>>> >>>>> v1: >>>>> - was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size" >>>>> - was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB" >>>>> >>>>> Best Regards >>>>> >>>>> Eric >>>>> >>>>> >>>>> Eric Auger (12): >>>>> hw/arm/virt: Rename highmem IO regions >>>>> hw/arm/virt: Split the memory map description >>>>> hw/boards: Add a MachineState parameter to kvm_type callback >>>>> kvm: add kvm_arm_get_max_vm_ipa_size >>>>> vl: Set machine ram_size, maxram_size and ram_slots earlier >>>>> hw/arm/virt: Dynamic memory map depending on RAM requirements >>>>> hw/arm/virt: Implement kvm_type function for 4.0 machine >>>>> hw/arm/virt: Bump the 255GB initial RAM limit >>>>> hw/arm/virt: Add memory hotplug framework >>>>> hw/arm/virt: Allocate device_memory >>>>> hw/arm/boot: Expose the pmem nodes in the DT >>>>> hw/arm/virt: Add nvdimm and nvdimm-persistence options >>>>> >>>>> Kwangwoo Lee (2): >>>>> nvdimm: use configurable ACPI IO base and size >>>>> hw/arm/virt: Add nvdimm hot-plug infrastructure >>>>> >>>>> Shameer Kolothum (3): >>>>> hw/arm/boot: introduce fdt_add_memory_node helper >>>>> hw/arm/boot: Expose the PC-DIMM nodes in the DT >>>>> hw/arm/virt-acpi-build: Add PC-DIMM in SRAT >>>>> >>>>> accel/kvm/kvm-all.c | 2 +- >>>>> default-configs/arm-softmmu.mak | 4 + >>>>> hw/acpi/nvdimm.c | 31 ++- >>>>> hw/arm/boot.c | 136 ++++++++++-- >>>>> hw/arm/virt-acpi-build.c | 23 +- >>>>> hw/arm/virt.c | 364 ++++++++++++++++++++++++++++---- >>>>> hw/i386/pc_piix.c | 6 +- >>>>> hw/i386/pc_q35.c | 6 +- >>>>> hw/ppc/mac_newworld.c | 3 +- >>>>> hw/ppc/mac_oldworld.c | 2 +- >>>>> hw/ppc/spapr.c | 2 +- >>>>> include/hw/arm/virt.h | 24 ++- >>>>> include/hw/boards.h | 5 +- >>>>> include/hw/mem/nvdimm.h | 4 + >>>>> target/arm/kvm.c | 10 + >>>>> target/arm/kvm_arm.h | 13 ++ >>>>> vl.c | 6 +- >>>>> 17 files changed, 556 insertions(+), 85 deletions(-) >>>>> >>>> >>>> >>
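For the DT route Eric relies on for cold-plug, the series adds memory@&lt;base&gt; nodes for the device memory through a helper in hw/arm/boot.c (Shameer's fdt_add_memory_node patch). A rough sketch of what such a helper boils down to follows; exact names and arguments may differ from the actual patch.

/*
 * Sketch of a DT memory node helper: each region of device memory gets a
 * memory@<base> node with a "reg" property.  acells/scells come from the
 * root node's #address-cells / #size-cells.
 */
#include "qemu/osdep.h"
#include "sysemu/device_tree.h"

static void add_memory_node(void *fdt, uint32_t acells, uint32_t scells,
                            hwaddr base, hwaddr size)
{
    char *nodename = g_strdup_printf("/memory@%" PRIx64, base);

    qemu_fdt_add_subnode(fdt, nodename);
    qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
    qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
                                 acells, base, scells, size);
    g_free(nodename);
}

This is precisely the kind of description Igor is wary of for hotpluggable regions: a guest that learns about the range this way treats it as ordinary RAM from early boot and may never let it be unplugged.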
On Tue, 26 Feb 2019 14:11:58 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, > > On 2/26/19 9:40 AM, Auger Eric wrote: > > Hi Igor, > > > > On 2/25/19 10:42 AM, Igor Mammedov wrote: > >> On Fri, 22 Feb 2019 18:35:26 +0100 > >> Auger Eric <eric.auger@redhat.com> wrote: > >> > >>> Hi Igor, > >>> > >>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>> > >>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>> support device memory in general, and especially PCDIMM/NVDIMM. > >>>>> > >>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the > >>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe > >>>>> MMIO region. The address map was 1TB large. This corresponded to > >>>>> the max IPA capacity KVM was able to manage. > >>>>> > >>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>> IPA range. So the guest physical address can go beyond the 1TB. The > >>>>> max GPA size depends on the host kernel configuration and physical CPUs. > >>>>> > >>>>> In this series we use this feature and allow the RAM to grow without > >>>>> any other limit than the one put by the host kernel. > >>>>> > >>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>> maxram_size - ram_size. The device memory is potentially hotpluggable > >>>>> depending on the instantiated memory objects. > >>>>> > >>>>> IO regions previously located between 256GB and 1TB are moved after > >>>>> the RAM. Their offset is dynamically computed, depends on ram_size > >>>>> and maxram_size. Size alignment is enforced. > >>>>> > >>>>> In case maxmem value is inferior to 255GB, the legacy memory map > >>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>> onwards. > >>>>> > >>>>> As we keep the initial RAM at 1GB base address, we do not need to do > >>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>> that job at the moment. > >>>>> > >>>>> Device memory being put just after the initial RAM, it is possible > >>>>> to get access to this feature while keeping a 1TB address map. > >>>>> > >>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > >>>>> > >>>>> Functionally, the series is split into 3 parts: > >>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>> the memory map > >>>> > >>>>> 2) Support of PC-DIMM [10 - 13] > >>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>> visible to the guest. It might be that DT is masking problem > >>>> but well, that won't work on ACPI only guests. > >>> > >>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem > >>> added with the DIMM slots. > >> Question is how does it get there? Does it come from DT or from firmware > >> via UEFI interfaces? > >> > >>> So it looks fine to me. Isn't E820 a pure x86 matter? > >> sorry for misleading, I've meant is UEFI GetMemoryMap(). 
> >> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >> via UEFI GetMemoryMap() as guest kernel might start using it as normal > >> memory early at boot and later put that memory into zone normal and hence > >> make it non-hot-un-pluggable. The same concerns apply to DT based means > >> of discovery. > >> (That's guest issue but it's easy to workaround it not putting hotpluggable > >> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) > >> That way memory doesn't get (ab)used by firmware or early boot kernel stages > >> and doesn't get locked up. > >> > >>> What else would you expect in the dsdt? > >> Memory device descriptions, look for code that adds PNP0C80 with _CRS > >> describing memory ranges > > > > OK thank you for the explanations. I will work on PNP0C80 addition then. > > Does it mean that in ACPI mode we must not output DT hotplug memory > > nodes or assuming that PNP0C80 is properly described, it will "override" > > DT description? > > After further investigations, I think the pieces you pointed out are > added by Shameer's series, ie. through the build_memory_hotplug_aml() > call. So I suggest we separate the concerns: this series brings support > for DIMM coldplug. hotplug, including all the relevant ACPI structures > will be added later on by Shameer. Maybe we should not put pc-dimms in DT for this series until it gets clear if it doesn't conflict with ACPI in some way.
Hi Igor, On 2/26/19 5:56 PM, Igor Mammedov wrote: > On Tue, 26 Feb 2019 14:11:58 +0100 > Auger Eric <eric.auger@redhat.com> wrote: > >> Hi Igor, >> >> On 2/26/19 9:40 AM, Auger Eric wrote: >>> Hi Igor, >>> >>> On 2/25/19 10:42 AM, Igor Mammedov wrote: >>>> On Fri, 22 Feb 2019 18:35:26 +0100 >>>> Auger Eric <eric.auger@redhat.com> wrote: >>>> >>>>> Hi Igor, >>>>> >>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>>>> Eric Auger <eric.auger@redhat.com> wrote: >>>>>> >>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>>>> support device memory in general, and especially PCDIMM/NVDIMM. >>>>>>> >>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the >>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe >>>>>>> MMIO region. The address map was 1TB large. This corresponded to >>>>>>> the max IPA capacity KVM was able to manage. >>>>>>> >>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The >>>>>>> max GPA size depends on the host kernel configuration and physical CPUs. >>>>>>> >>>>>>> In this series we use this feature and allow the RAM to grow without >>>>>>> any other limit than the one put by the host kernel. >>>>>>> >>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>>>> ram_size and then comes the device memory (,maxmem) of size >>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable >>>>>>> depending on the instantiated memory objects. >>>>>>> >>>>>>> IO regions previously located between 256GB and 1TB are moved after >>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size >>>>>>> and maxram_size. Size alignment is enforced. >>>>>>> >>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map >>>>>>> still is used. The change of memory map becomes effective from 4.0 >>>>>>> onwards. >>>>>>> >>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do >>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>>>> that job at the moment. >>>>>>> >>>>>>> Device memory being put just after the initial RAM, it is possible >>>>>>> to get access to this feature while keeping a 1TB address map. >>>>>>> >>>>>>> This series reuses/rebases patches initially submitted by Shameer >>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>>>> >>>>>>> Functionally, the series is split into 3 parts: >>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>>>> the memory map >>>>>> >>>>>>> 2) Support of PC-DIMM [10 - 13] >>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>>>> visible to the guest. It might be that DT is masking problem >>>>>> but well, that won't work on ACPI only guests. >>>>> >>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem >>>>> added with the DIMM slots. >>>> Question is how does it get there? Does it come from DT or from firmware >>>> via UEFI interfaces? >>>> >>>>> So it looks fine to me. Isn't E820 a pure x86 matter? >>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). 
>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal >>>> memory early at boot and later put that memory into zone normal and hence >>>> make it non-hot-un-pluggable. The same concerns apply to DT based means >>>> of discovery. >>>> (That's guest issue but it's easy to workaround it not putting hotpluggable >>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) >>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages >>>> and doesn't get locked up. >>>> >>>>> What else would you expect in the dsdt? >>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS >>>> describing memory ranges >>> >>> OK thank you for the explanations. I will work on PNP0C80 addition then. >>> Does it mean that in ACPI mode we must not output DT hotplug memory >>> nodes or assuming that PNP0C80 is properly described, it will "override" >>> DT description? >> >> After further investigations, I think the pieces you pointed out are >> added by Shameer's series, ie. through the build_memory_hotplug_aml() >> call. So I suggest we separate the concerns: this series brings support >> for DIMM coldplug. hotplug, including all the relevant ACPI structures >> will be added later on by Shameer. > > Maybe we should not put pc-dimms in DT for this series until it gets clear > if it doesn't conflict with ACPI in some way. I guess you mean removing the DT hotpluggable memory nodes only in ACPI mode? Otherwise you simply remove the DIMM feature, right? I double checked and if you remove the hotpluggable memory DT nodes in ACPI mode: - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I guess you're right, if the DT nodes are available, that memory is considered as not unpluggable by the guest. - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX system. Hotplug/unplug is clearly not supported by this series and any attempt results in "memory hotplug is not supported". Is it really an issue if the guest does not consider DIMM slots as not hot-unpluggable memory? I am not even sure the guest kernel would support to unplug that memory. In case we want all ACPI tables to be ready for making this memory seen as hot-unpluggable we need some Shameer's patches on top of this series. Also don't DIMM slots already make sense in DT mode. Usually we accept to add one feature in DT and then in ACPI. For instance we can benefit from nvdimm in dt mode right? So, considering an incremental approach I would be in favour of keeping the DT nodes. Thanks Eric > > > >
On Tue, 26 Feb 2019 18:53:24 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, > > On 2/26/19 5:56 PM, Igor Mammedov wrote: > > On Tue, 26 Feb 2019 14:11:58 +0100 > > Auger Eric <eric.auger@redhat.com> wrote: > > > >> Hi Igor, > >> > >> On 2/26/19 9:40 AM, Auger Eric wrote: > >>> Hi Igor, > >>> > >>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > >>>> On Fri, 22 Feb 2019 18:35:26 +0100 > >>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>> > >>>>> Hi Igor, > >>>>> > >>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>>>> > >>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>>>> support device memory in general, and especially PCDIMM/NVDIMM. > >>>>>>> > >>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the > >>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe > >>>>>>> MMIO region. The address map was 1TB large. This corresponded to > >>>>>>> the max IPA capacity KVM was able to manage. > >>>>>>> > >>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The > >>>>>>> max GPA size depends on the host kernel configuration and physical CPUs. > >>>>>>> > >>>>>>> In this series we use this feature and allow the RAM to grow without > >>>>>>> any other limit than the one put by the host kernel. > >>>>>>> > >>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable > >>>>>>> depending on the instantiated memory objects. > >>>>>>> > >>>>>>> IO regions previously located between 256GB and 1TB are moved after > >>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size > >>>>>>> and maxram_size. Size alignment is enforced. > >>>>>>> > >>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map > >>>>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>>>> onwards. > >>>>>>> > >>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do > >>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>>>> that job at the moment. > >>>>>>> > >>>>>>> Device memory being put just after the initial RAM, it is possible > >>>>>>> to get access to this feature while keeping a 1TB address map. > >>>>>>> > >>>>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > >>>>>>> > >>>>>>> Functionally, the series is split into 3 parts: > >>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>>>> the memory map > >>>>>> > >>>>>>> 2) Support of PC-DIMM [10 - 13] > >>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>>>> visible to the guest. It might be that DT is masking problem > >>>>>> but well, that won't work on ACPI only guests. > >>>>> > >>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem > >>>>> added with the DIMM slots. > >>>> Question is how does it get there? Does it come from DT or from firmware > >>>> via UEFI interfaces? > >>>> > >>>>> So it looks fine to me. Isn't E820 a pure x86 matter? 
> >>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > >>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal > >>>> memory early at boot and later put that memory into zone normal and hence > >>>> make it non-hot-un-pluggable. The same concerns apply to DT based means > >>>> of discovery. > >>>> (That's guest issue but it's easy to workaround it not putting hotpluggable > >>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) > >>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages > >>>> and doesn't get locked up. > >>>> > >>>>> What else would you expect in the dsdt? > >>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS > >>>> describing memory ranges > >>> > >>> OK thank you for the explanations. I will work on PNP0C80 addition then. > >>> Does it mean that in ACPI mode we must not output DT hotplug memory > >>> nodes or assuming that PNP0C80 is properly described, it will "override" > >>> DT description? > >> > >> After further investigations, I think the pieces you pointed out are > >> added by Shameer's series, ie. through the build_memory_hotplug_aml() > >> call. So I suggest we separate the concerns: this series brings support > >> for DIMM coldplug. hotplug, including all the relevant ACPI structures > >> will be added later on by Shameer. > > > > Maybe we should not put pc-dimms in DT for this series until it gets clear > > if it doesn't conflict with ACPI in some way. > > I guess you mean removing the DT hotpluggable memory nodes only in ACPI > mode? Otherwise you simply remove the DIMM feature, right? Something like this so DT won't get in conflict with ACPI. Only we don't have a switch for it something like, -machine fdt=on (with default off) > I double checked and if you remove the hotpluggable memory DT nodes in > ACPI mode: > - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > guess you're right, if the DT nodes are available, that memory is > considered as not unpluggable by the guest. > - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > system. > > Hotplug/unplug is clearly not supported by this series and any attempt > results in "memory hotplug is not supported". Is it really an issue if > the guest does not consider DIMM slots as not hot-unpluggable memory? I > am not even sure the guest kernel would support to unplug that memory. > > In case we want all ACPI tables to be ready for making this memory seen > as hot-unpluggable we need some Shameer's patches on top of this series. May be we should push for this way (into 4.0), it's just a several patches after all or even merge them in your series (I'd guess it would need to be rebased on top of your latest work) > Also don't DIMM slots already make sense in DT mode. Usually we accept > to add one feature in DT and then in ACPI. For instance we can benefit usually it doesn't conflict with each other (at least I'm not aware of it) but I see a problem with in this case. > from nvdimm in dt mode right? So, considering an incremental approach I > would be in favour of keeping the DT nodes. I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much more versatile. I consider target application of arm/virt as a board that's used to run in production generic ACPI capable guest in most use cases and various DT only guests as secondary ones. 
It's hard to make both use cases happy with defaults (that's probably one of the reasons why the 'sbsa' board is being added). So I'd give priority to ACPI-based arm/virt versus DT when defaults are considered. > Thanks > > Eric > > > > > > >
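Igor's suggestion here amounts to making the DT description of DIMMs conditional: emit the hotpluggable-memory nodes only when the guest is not using ACPI, or behind an explicit opt-in such as the proposed -machine fdt=on. A sketch of that gating is below; virt_acpi_enabled() and the fdt_dimms_enabled field are hypothetical names used for illustration, not existing QEMU code.

/*
 * Sketch only: skip the DT DIMM nodes for ACPI guests unless explicitly
 * requested.  virt_acpi_enabled() and fdt_dimms_enabled are hypothetical.
 */
#include "qemu/osdep.h"
#include "hw/arm/virt.h"

static void fdt_add_pcdimm_nodes(VirtMachineState *vms)
{
    if (virt_acpi_enabled(vms) && !vms->fdt_dimms_enabled) {
        /*
         * ACPI guests get PNP0C80 devices in the DSDT instead; skipping
         * the DT nodes avoids the guest absorbing hotpluggable memory as
         * ordinary, non-removable RAM early at boot.
         */
        return;
    }

    /* ... walk the device memory area and emit memory@<base> nodes ... */
}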
Hi Igor, Shameer, On 2/27/19 11:10 AM, Igor Mammedov wrote: > On Tue, 26 Feb 2019 18:53:24 +0100 > Auger Eric <eric.auger@redhat.com> wrote: > >> Hi Igor, >> >> On 2/26/19 5:56 PM, Igor Mammedov wrote: >>> On Tue, 26 Feb 2019 14:11:58 +0100 >>> Auger Eric <eric.auger@redhat.com> wrote: >>> >>>> Hi Igor, >>>> >>>> On 2/26/19 9:40 AM, Auger Eric wrote: >>>>> Hi Igor, >>>>> >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 >>>>>> Auger Eric <eric.auger@redhat.com> wrote: >>>>>> >>>>>>> Hi Igor, >>>>>>> >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: >>>>>>>> >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>>>>>> support device memory in general, and especially PCDIMM/NVDIMM. >>>>>>>>> >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded to >>>>>>>>> the max IPA capacity KVM was able to manage. >>>>>>>>> >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The >>>>>>>>> max GPA size depends on the host kernel configuration and physical CPUs. >>>>>>>>> >>>>>>>>> In this series we use this feature and allow the RAM to grow without >>>>>>>>> any other limit than the one put by the host kernel. >>>>>>>>> >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size >>>>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable >>>>>>>>> depending on the instantiated memory objects. >>>>>>>>> >>>>>>>>> IO regions previously located between 256GB and 1TB are moved after >>>>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size >>>>>>>>> and maxram_size. Size alignment is enforced. >>>>>>>>> >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory map >>>>>>>>> still is used. The change of memory map becomes effective from 4.0 >>>>>>>>> onwards. >>>>>>>>> >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>>>>>> that job at the moment. >>>>>>>>> >>>>>>>>> Device memory being put just after the initial RAM, it is possible >>>>>>>>> to get access to this feature while keeping a 1TB address map. >>>>>>>>> >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>>>>>> >>>>>>>>> Functionally, the series is split into 3 parts: >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>>>>>> the memory map >>>>>>>> >>>>>>>>> 2) Support of PC-DIMM [10 - 13] >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>>>>>> visible to the guest. It might be that DT is masking problem >>>>>>>> but well, that won't work on ACPI only guests. >>>>>>> >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem >>>>>>> added with the DIMM slots. >>>>>> Question is how does it get there? Does it come from DT or from firmware >>>>>> via UEFI interfaces? 
>>>>>> >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as normal >>>>>> memory early at boot and later put that memory into zone normal and hence >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based means >>>>>> of discovery. >>>>>> (That's guest issue but it's easy to workaround it not putting hotpluggable >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it properly) >>>>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages >>>>>> and doesn't get locked up. >>>>>> >>>>>>> What else would you expect in the dsdt? >>>>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS >>>>>> describing memory ranges >>>>> >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory >>>>> nodes or assuming that PNP0C80 is properly described, it will "override" >>>>> DT description? >>>> >>>> After further investigations, I think the pieces you pointed out are >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() >>>> call. So I suggest we separate the concerns: this series brings support >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures >>>> will be added later on by Shameer. >>> >>> Maybe we should not put pc-dimms in DT for this series until it gets clear >>> if it doesn't conflict with ACPI in some way. >> >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI >> mode? Otherwise you simply remove the DIMM feature, right? > Something like this so DT won't get in conflict with ACPI. > Only we don't have a switch for it something like, -machine fdt=on (with default off) > >> I double checked and if you remove the hotpluggable memory DT nodes in >> ACPI mode: >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I >> guess you're right, if the DT nodes are available, that memory is >> considered as not unpluggable by the guest. >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX >> system. >> >> Hotplug/unplug is clearly not supported by this series and any attempt >> results in "memory hotplug is not supported". Is it really an issue if >> the guest does not consider DIMM slots as not hot-unpluggable memory? I >> am not even sure the guest kernel would support to unplug that memory. >> >> In case we want all ACPI tables to be ready for making this memory seen >> as hot-unpluggable we need some Shameer's patches on top of this series. > May be we should push for this way (into 4.0), it's just a several patches > after all or even merge them in your series (I'd guess it would need to be > rebased on top of your latest work) Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series (without the reduced hw_reduced_acpi flag) in this series and isolate in a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml called in virt code? Then would remain the GED/GPIO actual integration. Thanks Eric > >> Also don't DIMM slots already make sense in DT mode. Usually we accept >> to add one feature in DT and then in ACPI. For instance we can benefit > usually it doesn't conflict with each other (at least I'm not aware of it) > but I see a problem with in this case. 
> >> from nvdimm in dt mode right? So, considering an incremental approach I >> would be in favour of keeping the DT nodes. > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > more versatile. > > I consider target application of arm/virt as a board that's used to > run in production generic ACPI capable guest in most use cases and > various DT only guests as secondary ones. It's hard to make > both usecases be happy with defaults (that's probably one of the > reasons why 'sbsa' board is being added). > > So I'd give priority to ACPI based arm/virt versus DT when defaults are > considered. > >> Thanks >> >> Eric >>> >>> >>> >>> >
Hi Eric, > -----Original Message----- > From: Auger Eric [mailto:eric.auger@redhat.com] > Sent: 27 February 2019 10:27 > To: Igor Mammedov <imammedo@redhat.com> > Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; > dgilbert@redhat.com; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; > qemu-arm@nongnu.org; eric.auger.pro@gmail.com; > david@gibson.dropbear.id.au > Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion > and PCDIMM/NVDIMM support > > Hi Igor, Shameer, > > On 2/27/19 11:10 AM, Igor Mammedov wrote: > > On Tue, 26 Feb 2019 18:53:24 +0100 > > Auger Eric <eric.auger@redhat.com> wrote: > > > >> Hi Igor, > >> > >> On 2/26/19 5:56 PM, Igor Mammedov wrote: > >>> On Tue, 26 Feb 2019 14:11:58 +0100 > >>> Auger Eric <eric.auger@redhat.com> wrote: > >>> > >>>> Hi Igor, > >>>> > >>>> On 2/26/19 9:40 AM, Auger Eric wrote: > >>>>> Hi Igor, > >>>>> > >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 > >>>>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>>>> > >>>>>>> Hi Igor, > >>>>>>> > >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>>>>>> > >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>>>>>> support device memory in general, and especially > PCDIMM/NVDIMM. > >>>>>>>>> > >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as > the > >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high > PCIe > >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded > to > >>>>>>>>> the max IPA capacity KVM was able to manage. > >>>>>>>>> > >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. > The > >>>>>>>>> max GPA size depends on the host kernel configuration and physical > CPUs. > >>>>>>>>> > >>>>>>>>> In this series we use this feature and allow the RAM to grow > without > >>>>>>>>> any other limit than the one put by the host kernel. > >>>>>>>>> > >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>>>>>> maxram_size - ram_size. The device memory is potentially > hotpluggable > >>>>>>>>> depending on the instantiated memory objects. > >>>>>>>>> > >>>>>>>>> IO regions previously located between 256GB and 1TB are moved > after > >>>>>>>>> the RAM. Their offset is dynamically computed, depends on > ram_size > >>>>>>>>> and maxram_size. Size alignment is enforced. > >>>>>>>>> > >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory > map > >>>>>>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>>>>>> onwards. > >>>>>>>>> > >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to > do > >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>>>>>> that job at the moment. > >>>>>>>>> > >>>>>>>>> Device memory being put just after the initial RAM, it is possible > >>>>>>>>> to get access to this feature while keeping a 1TB address map. > >>>>>>>>> > >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. 
> >>>>>>>>> > >>>>>>>>> Functionally, the series is split into 3 parts: > >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>>>>>> the memory map > >>>>>>>> > >>>>>>>>> 2) Support of PC-DIMM [10 - 13] > >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>>>>>> visible to the guest. It might be that DT is masking problem > >>>>>>>> but well, that won't work on ACPI only guests. > >>>>>>> > >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of > mem > >>>>>>> added with the DIMM slots. > >>>>>> Question is how does it get there? Does it come from DT or from > firmware > >>>>>> via UEFI interfaces? > >>>>>> > >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? > >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as > normal > >>>>>> memory early at boot and later put that memory into zone normal and > hence > >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based > means > >>>>>> of discovery. > >>>>>> (That's guest issue but it's easy to workaround it not putting > hotpluggable > >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it > properly) > >>>>>> That way memory doesn't get (ab)used by firmware or early boot > kernel stages > >>>>>> and doesn't get locked up. > >>>>>> > >>>>>>> What else would you expect in the dsdt? > >>>>>> Memory device descriptions, look for code that adds PNP0C80 with > _CRS > >>>>>> describing memory ranges > >>>>> > >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. > >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory > >>>>> nodes or assuming that PNP0C80 is properly described, it will "override" > >>>>> DT description? > >>>> > >>>> After further investigations, I think the pieces you pointed out are > >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() > >>>> call. So I suggest we separate the concerns: this series brings support > >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures > >>>> will be added later on by Shameer. > >>> > >>> Maybe we should not put pc-dimms in DT for this series until it gets clear > >>> if it doesn't conflict with ACPI in some way. > >> > >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI > >> mode? Otherwise you simply remove the DIMM feature, right? > > Something like this so DT won't get in conflict with ACPI. > > Only we don't have a switch for it something like, -machine fdt=on (with > default off) > > > >> I double checked and if you remove the hotpluggable memory DT nodes in > >> ACPI mode: > >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > >> guess you're right, if the DT nodes are available, that memory is > >> considered as not unpluggable by the guest. > >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > >> system. > >> > >> Hotplug/unplug is clearly not supported by this series and any attempt > >> results in "memory hotplug is not supported". Is it really an issue if > >> the guest does not consider DIMM slots as not hot-unpluggable memory? I > >> am not even sure the guest kernel would support to unplug that memory. 
> >> > >> In case we want all ACPI tables to be ready for making this memory seen > >> as hot-unpluggable we need some Shameer's patches on top of this series. > > May be we should push for this way (into 4.0), it's just a several patches > > after all or even merge them in your series (I'd guess it would need to be > > rebased on top of your latest work) > > Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series > (without the reduced hw_reduced_acpi flag) in this series and isolate in > a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml > called in virt code? Sure, that’s fine with me. So what would you use for the event_handler_method in build_memory_hotplug_aml()? GPO0 device? Thanks, Shameer > Then would remain the GED/GPIO actual integration. > > Thanks > > Eric > > > >> Also don't DIMM slots already make sense in DT mode. Usually we accept > >> to add one feature in DT and then in ACPI. For instance we can benefit > > usually it doesn't conflict with each other (at least I'm not aware of it) > > but I see a problem with in this case. > > > >> from nvdimm in dt mode right? So, considering an incremental approach I > >> would be in favour of keeping the DT nodes. > > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > > more versatile. > > > > I consider target application of arm/virt as a board that's used to > > run in production generic ACPI capable guest in most use cases and > > various DT only guests as secondary ones. It's hard to make > > both usecases be happy with defaults (that's probably one of the > > reasons why 'sbsa' board is being added). > > > > So I'd give priority to ACPI based arm/virt versus DT when defaults are > > considered. > > > >> Thanks > >> > >> Eric > >>> > >>> > >>> > >>> > >
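For context on the event_handler_method question: at the time of this thread build_memory_hotplug_aml() took (table, nr_mem, res_root, event_handler_method), and on x86 it is called with roughly "\\_SB.PCI0" and "\\_GPE._E03", the last argument naming the AML method that Notify()s the memory devices on a hotplug event. For a cold-plug-only virt DSDT, a method name that nothing ever invokes would do, which is what Igor suggests in his reply below. A sketch with a made-up name:

/*
 * Sketch only, not Shameer's actual patch.  Assumes the signature of that
 * period: build_memory_hotplug_aml(table, nr_mem, res_root,
 * event_handler_method).  "\\_SB.MHPX.SCAN" is a placeholder method that
 * no event path ever triggers.
 */
#include "qemu/osdep.h"
#include "hw/acpi/aml-build.h"
#include "hw/acpi/memory_hotplug.h"
#include "hw/boards.h"

static void virt_dsdt_add_memory_hotplug(Aml *dsdt, MachineState *ms)
{
    /* one ACPI memory device per ",slots" slot */
    build_memory_hotplug_aml(dsdt, ms->ram_slots, "\\_SB", "\\_SB.MHPX.SCAN");
}

Note that the generated AML also reads the memory-hotplug register block, so the MMIO region backing it has to exist even for cold-plug, which is the point Igor makes next.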
On Wed, 27 Feb 2019 10:41:45 +0000 Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > Hi Eric, > > > -----Original Message----- > > From: Auger Eric [mailto:eric.auger@redhat.com] > > Sent: 27 February 2019 10:27 > > To: Igor Mammedov <imammedo@redhat.com> > > Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; > > dgilbert@redhat.com; Shameerali Kolothum Thodi > > <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; > > qemu-arm@nongnu.org; eric.auger.pro@gmail.com; > > david@gibson.dropbear.id.au > > Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion > > and PCDIMM/NVDIMM support > > > > Hi Igor, Shameer, > > > > On 2/27/19 11:10 AM, Igor Mammedov wrote: > > > On Tue, 26 Feb 2019 18:53:24 +0100 > > > Auger Eric <eric.auger@redhat.com> wrote: > > > > > >> Hi Igor, > > >> > > >> On 2/26/19 5:56 PM, Igor Mammedov wrote: > > >>> On Tue, 26 Feb 2019 14:11:58 +0100 > > >>> Auger Eric <eric.auger@redhat.com> wrote: > > >>> > > >>>> Hi Igor, > > >>>> > > >>>> On 2/26/19 9:40 AM, Auger Eric wrote: > > >>>>> Hi Igor, > > >>>>> > > >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > > >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 > > >>>>>> Auger Eric <eric.auger@redhat.com> wrote: > > >>>>>> > > >>>>>>> Hi Igor, > > >>>>>>> > > >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > > >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > > >>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: > > >>>>>>>> > > >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > > >>>>>>>>> support device memory in general, and especially > > PCDIMM/NVDIMM. > > >>>>>>>>> > > >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > > >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as > > the > > >>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high > > PCIe > > >>>>>>>>> MMIO region. The address map was 1TB large. This corresponded > > to > > >>>>>>>>> the max IPA capacity KVM was able to manage. > > >>>>>>>>> > > >>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > > >>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. > > The > > >>>>>>>>> max GPA size depends on the host kernel configuration and physical > > CPUs. > > >>>>>>>>> > > >>>>>>>>> In this series we use this feature and allow the RAM to grow > > without > > >>>>>>>>> any other limit than the one put by the host kernel. > > >>>>>>>>> > > >>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > > >>>>>>>>> ram_size and then comes the device memory (,maxmem) of size > > >>>>>>>>> maxram_size - ram_size. The device memory is potentially > > hotpluggable > > >>>>>>>>> depending on the instantiated memory objects. > > >>>>>>>>> > > >>>>>>>>> IO regions previously located between 256GB and 1TB are moved > > after > > >>>>>>>>> the RAM. Their offset is dynamically computed, depends on > > ram_size > > >>>>>>>>> and maxram_size. Size alignment is enforced. > > >>>>>>>>> > > >>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory > > map > > >>>>>>>>> still is used. The change of memory map becomes effective from 4.0 > > >>>>>>>>> onwards. > > >>>>>>>>> > > >>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to > > do > > >>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > > >>>>>>>>> that job at the moment. 
> > >>>>>>>>> > > >>>>>>>>> Device memory being put just after the initial RAM, it is possible > > >>>>>>>>> to get access to this feature while keeping a 1TB address map. > > >>>>>>>>> > > >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer > > >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > > >>>>>>>>> > > >>>>>>>>> Functionally, the series is split into 3 parts: > > >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > > >>>>>>>>> the memory map > > >>>>>>>> > > >>>>>>>>> 2) Support of PC-DIMM [10 - 13] > > >>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > > >>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > > >>>>>>>> visible to the guest. It might be that DT is masking problem > > >>>>>>>> but well, that won't work on ACPI only guests. > > >>>>>>> > > >>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of > > mem > > >>>>>>> added with the DIMM slots. > > >>>>>> Question is how does it get there? Does it come from DT or from > > firmware > > >>>>>> via UEFI interfaces? > > >>>>>> > > >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? > > >>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > > >>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > > >>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as > > normal > > >>>>>> memory early at boot and later put that memory into zone normal and > > hence > > >>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based > > means > > >>>>>> of discovery. > > >>>>>> (That's guest issue but it's easy to workaround it not putting > > hotpluggable > > >>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it > > properly) > > >>>>>> That way memory doesn't get (ab)used by firmware or early boot > > kernel stages > > >>>>>> and doesn't get locked up. > > >>>>>> > > >>>>>>> What else would you expect in the dsdt? > > >>>>>> Memory device descriptions, look for code that adds PNP0C80 with > > _CRS > > >>>>>> describing memory ranges > > >>>>> > > >>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. > > >>>>> Does it mean that in ACPI mode we must not output DT hotplug memory > > >>>>> nodes or assuming that PNP0C80 is properly described, it will "override" > > >>>>> DT description? > > >>>> > > >>>> After further investigations, I think the pieces you pointed out are > > >>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() > > >>>> call. So I suggest we separate the concerns: this series brings support > > >>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures > > >>>> will be added later on by Shameer. > > >>> > > >>> Maybe we should not put pc-dimms in DT for this series until it gets clear > > >>> if it doesn't conflict with ACPI in some way. > > >> > > >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI > > >> mode? Otherwise you simply remove the DIMM feature, right? > > > Something like this so DT won't get in conflict with ACPI. > > > Only we don't have a switch for it something like, -machine fdt=on (with > > default off) > > > > > >> I double checked and if you remove the hotpluggable memory DT nodes in > > >> ACPI mode: > > >> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > > >> guess you're right, if the DT nodes are available, that memory is > > >> considered as not unpluggable by the guest. 
> > >> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > > >> system. > > >> > > >> Hotplug/unplug is clearly not supported by this series and any attempt > > >> results in "memory hotplug is not supported". Is it really an issue if > > >> the guest does not consider DIMM slots as not hot-unpluggable memory? I > > >> am not even sure the guest kernel would support to unplug that memory. > > >> > > >> In case we want all ACPI tables to be ready for making this memory seen > > >> as hot-unpluggable we need some Shameer's patches on top of this series. > > > May be we should push for this way (into 4.0), it's just a several patches > > > after all or even merge them in your series (I'd guess it would need to be > > > rebased on top of your latest work) > > > > Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series > > (without the reduced hw_reduced_acpi flag) in this series and isolate in > > a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml > > called in virt code? probably we can do it as transitional step as we need working mmio interface in place for build_memory_hotplug_aml() to work, provided it won't create migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?). What about dummy initial GED (empty device), that manages mmio region only and then later it will be filled with remaining logic IRQ. In this case mmio region and vmstate won't change (maybe) so it won't cause ABI or migration issues. > Sure, that’s fine with me. So what would you use for the event_handler_method in > build_memory_hotplug_aml()? GPO0 device? a method name not defined in spec, so it won't be called might do. > > Thanks, > Shameer > > > Then would remain the GED/GPIO actual integration. > > > > Thanks > > > > Eric > > > > > >> Also don't DIMM slots already make sense in DT mode. Usually we accept > > >> to add one feature in DT and then in ACPI. For instance we can benefit > > > usually it doesn't conflict with each other (at least I'm not aware of it) > > > but I see a problem with in this case. > > > > > >> from nvdimm in dt mode right? So, considering an incremental approach I > > >> would be in favour of keeping the DT nodes. > > > I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > > > more versatile. > > > > > > I consider target application of arm/virt as a board that's used to > > > run in production generic ACPI capable guest in most use cases and > > > various DT only guests as secondary ones. It's hard to make > > > both usecases be happy with defaults (that's probably one of the > > > reasons why 'sbsa' board is being added). > > > > > > So I'd give priority to ACPI based arm/virt versus DT when defaults are > > > considered. > > > > > >> Thanks > > >> > > >> Eric > > >>> > > >>> > > >>> > > >>> > > >
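The firmware-map point made above can be made concrete with a small standalone C model: present-but-hotpluggable ranges stay out of what UEFI GetMemoryMap() or the DT would report and are left to the ACPI (PNP0C80) description, so the early kernel cannot absorb them as ordinary RAM. Every name and size below is invented for illustration; this is not code from QEMU or from the series.

#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ram_region {
    const char *id;
    uint64_t base;
    uint64_t size;
    bool hotpluggable;          /* device memory: PC-DIMM/NVDIMM */
};

/* Illustrative layout: 16G of initial RAM at 1G, one 2G DIMM behind it. */
static const struct ram_region regions[] = {
    { "initial-ram", 1ULL << 30, 16ULL << 30, false },
    { "dimm0", (1ULL << 30) + (16ULL << 30), 2ULL << 30, true },
};

int main(void)
{
    for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
        const struct ram_region *r = &regions[i];
        if (!r->hotpluggable) {
            /* Cold-boot RAM is reported in the firmware map
             * (UEFI GetMemoryMap() / DT memory nodes). */
            printf("firmware map: %s 0x%" PRIx64 " + 0x%" PRIx64 "\n",
                   r->id, r->base, r->size);
        } else {
            /* Hotpluggable ranges are described only by ACPI, i.e. a
             * PNP0C80 memory device whose _CRS reports the range. */
            printf("DSDT only   : %s 0x%" PRIx64 " + 0x%" PRIx64 "\n",
                   r->id, r->base, r->size);
        }
    }
    return 0;
}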
Hi Igor, Shameer, On 2/27/19 6:51 PM, Igor Mammedov wrote: > On Wed, 27 Feb 2019 10:41:45 +0000 > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > >> Hi Eric, >> >>> -----Original Message----- >>> From: Auger Eric [mailto:eric.auger@redhat.com] >>> Sent: 27 February 2019 10:27 >>> To: Igor Mammedov <imammedo@redhat.com> >>> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; >>> dgilbert@redhat.com; Shameerali Kolothum Thodi >>> <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; >>> qemu-arm@nongnu.org; eric.auger.pro@gmail.com; >>> david@gibson.dropbear.id.au >>> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion >>> and PCDIMM/NVDIMM support >>> >>> Hi Igor, Shameer, >>> >>> On 2/27/19 11:10 AM, Igor Mammedov wrote: >>>> On Tue, 26 Feb 2019 18:53:24 +0100 >>>> Auger Eric <eric.auger@redhat.com> wrote: >>>> >>>>> Hi Igor, >>>>> >>>>> On 2/26/19 5:56 PM, Igor Mammedov wrote: >>>>>> On Tue, 26 Feb 2019 14:11:58 +0100 >>>>>> Auger Eric <eric.auger@redhat.com> wrote: >>>>>> >>>>>>> Hi Igor, >>>>>>> >>>>>>> On 2/26/19 9:40 AM, Auger Eric wrote: >>>>>>>> Hi Igor, >>>>>>>> >>>>>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: >>>>>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 >>>>>>>>> Auger Eric <eric.auger@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Igor, >>>>>>>>>> >>>>>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: >>>>>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 >>>>>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to >>>>>>>>>>>> support device memory in general, and especially >>> PCDIMM/NVDIMM. >>>>>>>>>>>> >>>>>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can >>>>>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as >>> the >>>>>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high >>> PCIe >>>>>>>>>>>> MMIO region. The address map was 1TB large. This corresponded >>> to >>>>>>>>>>>> the max IPA capacity KVM was able to manage. >>>>>>>>>>>> >>>>>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic >>>>>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. >>> The >>>>>>>>>>>> max GPA size depends on the host kernel configuration and physical >>> CPUs. >>>>>>>>>>>> >>>>>>>>>>>> In this series we use this feature and allow the RAM to grow >>> without >>>>>>>>>>>> any other limit than the one put by the host kernel. >>>>>>>>>>>> >>>>>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size >>>>>>>>>>>> ram_size and then comes the device memory (,maxmem) of size >>>>>>>>>>>> maxram_size - ram_size. The device memory is potentially >>> hotpluggable >>>>>>>>>>>> depending on the instantiated memory objects. >>>>>>>>>>>> >>>>>>>>>>>> IO regions previously located between 256GB and 1TB are moved >>> after >>>>>>>>>>>> the RAM. Their offset is dynamically computed, depends on >>> ram_size >>>>>>>>>>>> and maxram_size. Size alignment is enforced. >>>>>>>>>>>> >>>>>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory >>> map >>>>>>>>>>>> still is used. The change of memory map becomes effective from 4.0 >>>>>>>>>>>> onwards. >>>>>>>>>>>> >>>>>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to >>> do >>>>>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do >>>>>>>>>>>> that job at the moment. 
>>>>>>>>>>>> >>>>>>>>>>>> Device memory being put just after the initial RAM, it is possible >>>>>>>>>>>> to get access to this feature while keeping a 1TB address map. >>>>>>>>>>>> >>>>>>>>>>>> This series reuses/rebases patches initially submitted by Shameer >>>>>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. >>>>>>>>>>>> >>>>>>>>>>>> Functionally, the series is split into 3 parts: >>>>>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in >>>>>>>>>>>> the memory map >>>>>>>>>>> >>>>>>>>>>>> 2) Support of PC-DIMM [10 - 13] >>>>>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed >>>>>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be >>>>>>>>>>> visible to the guest. It might be that DT is masking problem >>>>>>>>>>> but well, that won't work on ACPI only guests. >>>>>>>>>> >>>>>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of >>> mem >>>>>>>>>> added with the DIMM slots. >>>>>>>>> Question is how does it get there? Does it come from DT or from >>> firmware >>>>>>>>> via UEFI interfaces? >>>>>>>>> >>>>>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? >>>>>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). >>>>>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed >>>>>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as >>> normal >>>>>>>>> memory early at boot and later put that memory into zone normal and >>> hence >>>>>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based >>> means >>>>>>>>> of discovery. >>>>>>>>> (That's guest issue but it's easy to workaround it not putting >>> hotpluggable >>>>>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it >>> properly) >>>>>>>>> That way memory doesn't get (ab)used by firmware or early boot >>> kernel stages >>>>>>>>> and doesn't get locked up. >>>>>>>>> >>>>>>>>>> What else would you expect in the dsdt? >>>>>>>>> Memory device descriptions, look for code that adds PNP0C80 with >>> _CRS >>>>>>>>> describing memory ranges >>>>>>>> >>>>>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. >>>>>>>> Does it mean that in ACPI mode we must not output DT hotplug memory >>>>>>>> nodes or assuming that PNP0C80 is properly described, it will "override" >>>>>>>> DT description? >>>>>>> >>>>>>> After further investigations, I think the pieces you pointed out are >>>>>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() >>>>>>> call. So I suggest we separate the concerns: this series brings support >>>>>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures >>>>>>> will be added later on by Shameer. >>>>>> >>>>>> Maybe we should not put pc-dimms in DT for this series until it gets clear >>>>>> if it doesn't conflict with ACPI in some way. >>>>> >>>>> I guess you mean removing the DT hotpluggable memory nodes only in ACPI >>>>> mode? Otherwise you simply remove the DIMM feature, right? >>>> Something like this so DT won't get in conflict with ACPI. >>>> Only we don't have a switch for it something like, -machine fdt=on (with >>> default off) >>>> >>>>> I double checked and if you remove the hotpluggable memory DT nodes in >>>>> ACPI mode: >>>>> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I >>>>> guess you're right, if the DT nodes are available, that memory is >>>>> considered as not unpluggable by the guest. >>>>> - You can see the NVDIMM slots using ndctl list -u. 
You can mount a DAX >>>>> system. >>>>> >>>>> Hotplug/unplug is clearly not supported by this series and any attempt >>>>> results in "memory hotplug is not supported". Is it really an issue if >>>>> the guest does not consider DIMM slots as not hot-unpluggable memory? I >>>>> am not even sure the guest kernel would support to unplug that memory. >>>>> >>>>> In case we want all ACPI tables to be ready for making this memory seen >>>>> as hot-unpluggable we need some Shameer's patches on top of this series. >>>> May be we should push for this way (into 4.0), it's just a several patches >>>> after all or even merge them in your series (I'd guess it would need to be >>>> rebased on top of your latest work) >>> >>> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series >>> (without the reduced hw_reduced_acpi flag) in this series and isolate in >>> a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml >>> called in virt code? > probably we can do it as transitional step as we need working mmio interface > in place for build_memory_hotplug_aml() to work, provided it won't create > migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?). > > What about dummy initial GED (empty device), that manages mmio region only > and then later it will be filled with remaining logic IRQ. In this case mmio region > and vmstate won't change (maybe) so it won't cause ABI or migration issues. > > >> Sure, that’s fine with me. So what would you use for the event_handler_method in >> build_memory_hotplug_aml()? GPO0 device? > > a method name not defined in spec, so it won't be called might do. At this point the event_handler_method, ie. \_SB.GPO0._E02, is not supposed to be called, right? So effectivily we should be able to use any other method name (unlinked to any GPIO/GED). I guess at this stage only the PNP0C80 definition blocks + methods are used. What still remains fuzzy for me is in case of cold plug the mmio hotplug control region part only is read (despite the slot selection of course) and returns 0 for addr/size and also flags meaning the slot is not enabled. So despite the slots are advertised as hotpluggable/enabled in the SRAT; I am not sure for the OS it actually makes any difference whether the DSDT definition blocks are described or not. To be honest I am afraid this is too late to add those additional features for 4.0 now. This is going to jeopardize the first preliminary part which is the introduction of the new memory map, allowing the expansion of the initial RAM and paving the way for device memory introduction. So I think I am going to resend the first 10 patches in a standalone series. And we can iterate on the PCDIMM/NVDIMM parts independently. Thanks Eric > > >> >> Thanks, >> Shameer >> >>> Then would remain the GED/GPIO actual integration. >>> >>> Thanks >>> >>> Eric >>>> >>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept >>>>> to add one feature in DT and then in ACPI. For instance we can benefit >>>> usually it doesn't conflict with each other (at least I'm not aware of it) >>>> but I see a problem with in this case. >>>> >>>>> from nvdimm in dt mode right? So, considering an incremental approach I >>>>> would be in favour of keeping the DT nodes. >>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much >>>> more versatile. 
>>>> >>>> I consider target application of arm/virt as a board that's used to >>>> run in production generic ACPI capable guest in most use cases and >>>> various DT only guests as secondary ones. It's hard to make >>>> both usecases be happy with defaults (that's probably one of the >>>> reasons why 'sbsa' board is being added). >>>> >>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are >>>> considered. >>>> >>>>> Thanks >>>>> >>>>> Eric >>>>>> >>>>>> >>>>>> >>>>>> >>>> > >
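Since the SRAT comes up above ("the slots are advertised as hotpluggable/enabled in the SRAT"), the sketch below shows, in standalone C, the ACPI SRAT Memory Affinity structure and the flag bits involved. The layout follows my reading of the spec, with the 32-bit low/high base and length fields folded into 64-bit members, and the example values are invented; such an entry pins down affinity for present and possible RAM, it does not tell the guest how much memory is actually plugged.

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the ACPI SRAT Memory Affinity structure (type 1, 40 bytes),
 * illustration only: the spec's 32-bit base/length low/high pairs are
 * folded into 64-bit fields here.
 */
struct srat_mem_affinity {
    uint8_t  type;              /* 1 = Memory Affinity */
    uint8_t  length;            /* 40 */
    uint32_t proximity;         /* NUMA node the range belongs to */
    uint16_t reserved1;
    uint64_t base;              /* base address low + high */
    uint64_t range_length;      /* length low + high */
    uint32_t reserved2;
    uint32_t flags;
    uint64_t reserved3;
} __attribute__((packed));

_Static_assert(sizeof(struct srat_mem_affinity) == 40, "SRAT memory entry is 40 bytes");

enum {
    SRAT_MEM_ENABLED       = 1u << 0,   /* entry is valid for the OS */
    SRAT_MEM_HOT_PLUGGABLE = 1u << 1,   /* RAM may (dis)appear in this range */
    SRAT_MEM_NON_VOLATILE  = 1u << 2,
};

int main(void)
{
    /* Invented example: a device-memory window advertised as enabled and
     * hot-pluggable; the guest learns the affinity of possible RAM here,
     * not how much of it is actually present. */
    struct srat_mem_affinity hotplug_area = {
        .type = 1, .length = 40,
        .proximity = 0,
        .base = 17ULL << 30,
        .range_length = 16ULL << 30,
        .flags = SRAT_MEM_ENABLED | SRAT_MEM_HOT_PLUGGABLE,
    };
    printf("flags: 0x%x\n", (unsigned)hotplug_area.flags);
    return 0;
}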
On Thu, 28 Feb 2019 08:48:18 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, Shameer, > > On 2/27/19 6:51 PM, Igor Mammedov wrote: > > On Wed, 27 Feb 2019 10:41:45 +0000 > > Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > > > >> Hi Eric, > >> > >>> -----Original Message----- > >>> From: Auger Eric [mailto:eric.auger@redhat.com] > >>> Sent: 27 February 2019 10:27 > >>> To: Igor Mammedov <imammedo@redhat.com> > >>> Cc: peter.maydell@linaro.org; drjones@redhat.com; david@redhat.com; > >>> dgilbert@redhat.com; Shameerali Kolothum Thodi > >>> <shameerali.kolothum.thodi@huawei.com>; qemu-devel@nongnu.org; > >>> qemu-arm@nongnu.org; eric.auger.pro@gmail.com; > >>> david@gibson.dropbear.id.au > >>> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion > >>> and PCDIMM/NVDIMM support > >>> > >>> Hi Igor, Shameer, > >>> > >>> On 2/27/19 11:10 AM, Igor Mammedov wrote: > >>>> On Tue, 26 Feb 2019 18:53:24 +0100 > >>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>> > >>>>> Hi Igor, > >>>>> > >>>>> On 2/26/19 5:56 PM, Igor Mammedov wrote: > >>>>>> On Tue, 26 Feb 2019 14:11:58 +0100 > >>>>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>>>> > >>>>>>> Hi Igor, > >>>>>>> > >>>>>>> On 2/26/19 9:40 AM, Auger Eric wrote: > >>>>>>>> Hi Igor, > >>>>>>>> > >>>>>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote: > >>>>>>>>> On Fri, 22 Feb 2019 18:35:26 +0100 > >>>>>>>>> Auger Eric <eric.auger@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Igor, > >>>>>>>>>> > >>>>>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote: > >>>>>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100 > >>>>>>>>>>> Eric Auger <eric.auger@redhat.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to > >>>>>>>>>>>> support device memory in general, and especially > >>> PCDIMM/NVDIMM. > >>>>>>>>>>>> > >>>>>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can > >>>>>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as > >>> the > >>>>>>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high > >>> PCIe > >>>>>>>>>>>> MMIO region. The address map was 1TB large. This corresponded > >>> to > >>>>>>>>>>>> the max IPA capacity KVM was able to manage. > >>>>>>>>>>>> > >>>>>>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic > >>>>>>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. > >>> The > >>>>>>>>>>>> max GPA size depends on the host kernel configuration and physical > >>> CPUs. > >>>>>>>>>>>> > >>>>>>>>>>>> In this series we use this feature and allow the RAM to grow > >>> without > >>>>>>>>>>>> any other limit than the one put by the host kernel. > >>>>>>>>>>>> > >>>>>>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size > >>>>>>>>>>>> ram_size and then comes the device memory (,maxmem) of size > >>>>>>>>>>>> maxram_size - ram_size. The device memory is potentially > >>> hotpluggable > >>>>>>>>>>>> depending on the instantiated memory objects. > >>>>>>>>>>>> > >>>>>>>>>>>> IO regions previously located between 256GB and 1TB are moved > >>> after > >>>>>>>>>>>> the RAM. Their offset is dynamically computed, depends on > >>> ram_size > >>>>>>>>>>>> and maxram_size. Size alignment is enforced. > >>>>>>>>>>>> > >>>>>>>>>>>> In case maxmem value is inferior to 255GB, the legacy memory > >>> map > >>>>>>>>>>>> still is used. The change of memory map becomes effective from 4.0 > >>>>>>>>>>>> onwards. 
> >>>>>>>>>>>> > >>>>>>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to > >>> do > >>>>>>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do > >>>>>>>>>>>> that job at the moment. > >>>>>>>>>>>> > >>>>>>>>>>>> Device memory being put just after the initial RAM, it is possible > >>>>>>>>>>>> to get access to this feature while keeping a 1TB address map. > >>>>>>>>>>>> > >>>>>>>>>>>> This series reuses/rebases patches initially submitted by Shameer > >>>>>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts. > >>>>>>>>>>>> > >>>>>>>>>>>> Functionally, the series is split into 3 parts: > >>>>>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in > >>>>>>>>>>>> the memory map > >>>>>>>>>>> > >>>>>>>>>>>> 2) Support of PC-DIMM [10 - 13] > >>>>>>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed > >>>>>>>>>>> DSDT AML here no E820 changes, so ACPI wise pc-dimm shouldn't be > >>>>>>>>>>> visible to the guest. It might be that DT is masking problem > >>>>>>>>>>> but well, that won't work on ACPI only guests. > >>>>>>>>>> > >>>>>>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of > >>> mem > >>>>>>>>>> added with the DIMM slots. > >>>>>>>>> Question is how does it get there? Does it come from DT or from > >>> firmware > >>>>>>>>> via UEFI interfaces? > >>>>>>>>> > >>>>>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter? > >>>>>>>>> sorry for misleading, I've meant is UEFI GetMemoryMap(). > >>>>>>>>> On x86, I'm wary of adding PC-DIMMs to E802 which then gets exposed > >>>>>>>>> via UEFI GetMemoryMap() as guest kernel might start using it as > >>> normal > >>>>>>>>> memory early at boot and later put that memory into zone normal and > >>> hence > >>>>>>>>> make it non-hot-un-pluggable. The same concerns apply to DT based > >>> means > >>>>>>>>> of discovery. > >>>>>>>>> (That's guest issue but it's easy to workaround it not putting > >>> hotpluggable > >>>>>>>>> memory into UEFI GetMemoryMap() or DT and let DSDT describe it > >>> properly) > >>>>>>>>> That way memory doesn't get (ab)used by firmware or early boot > >>> kernel stages > >>>>>>>>> and doesn't get locked up. > >>>>>>>>> > >>>>>>>>>> What else would you expect in the dsdt? > >>>>>>>>> Memory device descriptions, look for code that adds PNP0C80 with > >>> _CRS > >>>>>>>>> describing memory ranges > >>>>>>>> > >>>>>>>> OK thank you for the explanations. I will work on PNP0C80 addition then. > >>>>>>>> Does it mean that in ACPI mode we must not output DT hotplug memory > >>>>>>>> nodes or assuming that PNP0C80 is properly described, it will "override" > >>>>>>>> DT description? > >>>>>>> > >>>>>>> After further investigations, I think the pieces you pointed out are > >>>>>>> added by Shameer's series, ie. through the build_memory_hotplug_aml() > >>>>>>> call. So I suggest we separate the concerns: this series brings support > >>>>>>> for DIMM coldplug. hotplug, including all the relevant ACPI structures > >>>>>>> will be added later on by Shameer. > >>>>>> > >>>>>> Maybe we should not put pc-dimms in DT for this series until it gets clear > >>>>>> if it doesn't conflict with ACPI in some way. > >>>>> > >>>>> I guess you mean removing the DT hotpluggable memory nodes only in ACPI > >>>>> mode? Otherwise you simply remove the DIMM feature, right? > >>>> Something like this so DT won't get in conflict with ACPI. 
> >>>> Only we don't have a switch for it something like, -machine fdt=on (with > >>> default off) > >>>> > >>>>> I double checked and if you remove the hotpluggable memory DT nodes in > >>>>> ACPI mode: > >>>>> - you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I > >>>>> guess you're right, if the DT nodes are available, that memory is > >>>>> considered as not unpluggable by the guest. > >>>>> - You can see the NVDIMM slots using ndctl list -u. You can mount a DAX > >>>>> system. > >>>>> > >>>>> Hotplug/unplug is clearly not supported by this series and any attempt > >>>>> results in "memory hotplug is not supported". Is it really an issue if > >>>>> the guest does not consider DIMM slots as not hot-unpluggable memory? I > >>>>> am not even sure the guest kernel would support to unplug that memory. > >>>>> > >>>>> In case we want all ACPI tables to be ready for making this memory seen > >>>>> as hot-unpluggable we need some Shameer's patches on top of this series. > >>>> May be we should push for this way (into 4.0), it's just a several patches > >>>> after all or even merge them in your series (I'd guess it would need to be > >>>> rebased on top of your latest work) > >>> > >>> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series > >>> (without the reduced hw_reduced_acpi flag) in this series and isolate in > >>> a second PATCH the acpi_memory_hotplug_init() + build_memory_hotplug_aml > >>> called in virt code? > > probably we can do it as transitional step as we need working mmio interface > > in place for build_memory_hotplug_aml() to work, provided it won't create > > migration issues (do we need VMSTATE_MEMORY_HOTPLUG for cold-plug case?). > > > > What about dummy initial GED (empty device), that manages mmio region only > > and then later it will be filled with remaining logic IRQ. In this case mmio region > > and vmstate won't change (maybe) so it won't cause ABI or migration issues. > > > > > > > >> Sure, that’s fine with me. So what would you use for the event_handler_method in > >> build_memory_hotplug_aml()? GPO0 device? > > > > a method name not defined in spec, so it won't be called might do. > > At this point the event_handler_method, ie. \_SB.GPO0._E02, is not > supposed to be called, right? So effectivily we should be able to use > any other method name (unlinked to any GPIO/GED). I guess at this stage > only the PNP0C80 definition blocks + methods are used. pretty much yes. > What still remains fuzzy for me is in case of cold plug the mmio hotplug > control region part only is read (despite the slot selection of course) > and returns 0 for addr/size and also flags meaning the slot is not > enabled. If you mean guest reads 0s than it looks broken, could you show trace log with mhp_* tracepoints enabled during a dimm hotplug. > So despite the slots are advertised as hotpluggable/enabled in > the SRAT; I am not sure for the OS it actually makes any difference > whether the DSDT definition blocks are described or not. SRAT isn't used fro informing guests about amount of present RAM, it holds affinity information for present and possible RAM > To be honest I am afraid this is too late to add those additional > features for 4.0 now. This is going to jeopardize the first preliminary > part which is the introduction of the new memory map, allowing the > expansion of the initial RAM and paving the way for device memory > introduction. So I think I am going to resend the first 10 patches in a > standalone series. 
And we can iterate on the PCDIMM/NVDIMM parts > independently. sounds good to me, I'll try to review 1-10 today > Thanks > > Eric > > > > > >> > >> Thanks, > >> Shameer > >> > >>> Then would remain the GED/GPIO actual integration. > >>> > >>> Thanks > >>> > >>> Eric > >>>> > >>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept > >>>>> to add one feature in DT and then in ACPI. For instance we can benefit > >>>> usually it doesn't conflict with each other (at least I'm not aware of it) > >>>> but I see a problem with in this case. > >>>> > >>>>> from nvdimm in dt mode right? So, considering an incremental approach I > >>>>> would be in favour of keeping the DT nodes. > >>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > >>>> more versatile. > >>>> > >>>> I consider target application of arm/virt as a board that's used to > >>>> run in production generic ACPI capable guest in most use cases and > >>>> various DT only guests as secondary ones. It's hard to make > >>>> both usecases be happy with defaults (that's probably one of the > >>>> reasons why 'sbsa' board is being added). > >>>> > >>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are > >>>> considered. > >>>> > >>>>> Thanks > >>>>> > >>>>> Eric > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>> > > > >
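Before the trace Eric posts next, it may help to model what the mhp_acpi_* tracepoints instrument. The standalone C sketch below mimics the control region's behaviour: AML selects a slot, then reads the base address and size as 32-bit halves plus a flags word, and a slot with no PC-DIMM behind it reads back as zeros. The function names mirror the tracepoint names; the single "enabled" flag bit and everything else here is a simplification for illustration, not QEMU's actual device code.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NR_SLOTS 4

/* Toy model of the per-slot state behind the hotplug control region. */
struct mem_slot {
    uint64_t addr;
    uint64_t size;
    bool     enabled;            /* a DIMM is plugged into this slot */
};

struct mhp_regs {
    uint32_t selector;           /* which slot the reads refer to */
    struct mem_slot slot[NR_SLOTS];
};

/* Register accesses, named after the tracepoints in the log below. */
static void mhp_write_slot(struct mhp_regs *r, uint32_t s) { r->selector = s; }
static uint32_t mhp_read_addr_lo(struct mhp_regs *r) { return (uint32_t)r->slot[r->selector].addr; }
static uint32_t mhp_read_addr_hi(struct mhp_regs *r) { return (uint32_t)(r->slot[r->selector].addr >> 32); }
static uint32_t mhp_read_size_lo(struct mhp_regs *r) { return (uint32_t)r->slot[r->selector].size; }
static uint32_t mhp_read_size_hi(struct mhp_regs *r) { return (uint32_t)(r->slot[r->selector].size >> 32); }
static uint32_t mhp_read_flags(struct mhp_regs *r) { return r->slot[r->selector].enabled ? 1u : 0u; }

int main(void)
{
    struct mhp_regs regs = { 0 };   /* all slots empty: cold boot, no PC-DIMM */

    /* What the PNP0C80 _CRS/_STA methods do for every slot at boot. */
    for (uint32_t s = 0; s < NR_SLOTS; s++) {
        mhp_write_slot(&regs, s);
        unsigned ahi = mhp_read_addr_hi(&regs), alo = mhp_read_addr_lo(&regs);
        unsigned shi = mhp_read_size_hi(&regs), slo = mhp_read_size_lo(&regs);
        unsigned flags = mhp_read_flags(&regs);
        printf("slot[%u] addr hi/lo 0x%x 0x%x size hi/lo 0x%x 0x%x flags 0x%x\n",
               (unsigned)s, ahi, alo, shi, slo, flags);
    }
    return 0;
}

Compiled and run, this prints the same all-zero pattern per slot as the trace that follows.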
Hi Igor, [..] > >> What still remains fuzzy for me is in case of cold plug the mmio hotplug >> control region part only is read (despite the slot selection of course) >> and returns 0 for addr/size and also flags meaning the slot is not >> enabled. > If you mean guest reads 0s than it looks broken, could you show > trace log with mhp_* tracepoints enabled during a dimm hotplug. Please find the traces + cmd line on x86 /qemu-system-x86_64 -M q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial tcp:localhost:4444,server -trace events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split -object memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop -drive file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0 -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2 -netdev tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on -net none -d guest_errors ****************************************************************** ioctl(TUNSETIFF): Device or resource busy qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU waiting for connection on: disconnected:tcp:::1:4444,server qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0 29556@1551449303.339512:mhp_acpi_read_size_hi slot[0x0] size hi: 0x0 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0 29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3 
29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0 That's the only traces I get until I get the login prompt. Thanks Eric > >> So despite the slots are advertised as hotpluggable/enabled in >> the SRAT; I am not sure for the OS it actually makes any difference >> whether the DSDT definition blocks are described or not. > SRAT isn't used fro informing guests about amount of present RAM, > it holds affinity information for present and possible RAM > >> To be honest I am afraid this is too late to add those additional >> features for 4.0 now. This is going to jeopardize the first preliminary >> part which is the introduction of the new memory map, allowing the >> expansion of the initial RAM and paving the way for device memory >> introduction. So I think I am going to resend the first 10 patches in a >> standalone series. And we can iterate on the PCDIMM/NVDIMM parts >> independently. > sounds good to me, I'll try to review 1-10 today > >> Thanks >> >> Eric >>> >>> >>>> >>>> Thanks, >>>> Shameer >>>> >>>>> Then would remain the GED/GPIO actual integration. >>>>> >>>>> Thanks >>>>> >>>>> Eric >>>>>> >>>>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept >>>>>>> to add one feature in DT and then in ACPI. For instance we can benefit >>>>>> usually it doesn't conflict with each other (at least I'm not aware of it) >>>>>> but I see a problem with in this case. >>>>>> >>>>>>> from nvdimm in dt mode right? So, considering an incremental approach I >>>>>>> would be in favour of keeping the DT nodes. >>>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much >>>>>> more versatile. >>>>>> >>>>>> I consider target application of arm/virt as a board that's used to >>>>>> run in production generic ACPI capable guest in most use cases and >>>>>> various DT only guests as secondary ones. It's hard to make >>>>>> both usecases be happy with defaults (that's probably one of the >>>>>> reasons why 'sbsa' board is being added). >>>>>> >>>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are >>>>>> considered. 
>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Eric >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>> >>> >
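To read the log above: each base and size comes back as two 32-bit halves that the _CRS method assembles into 64-bit values, as sketched below, and with every half and every flags read returning 0, no slot reports a plugged PC-DIMM. As Igor explains in the next message, that is expected here, since the command line has only NVDIMMs and those are not discovered through this interface. The snippet is plain arithmetic, not QEMU code.

#include <stdint.h>
#include <stdio.h>

/* Assemble a 64-bit value from the hi/lo register reads seen in the log. */
static uint64_t combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    /* Values taken from the slot 0 reads above. */
    uint32_t addr_hi = 0x0, addr_lo = 0x0, size_hi = 0x0, size_lo = 0x0, flags = 0x0;

    printf("slot 0: base 0x%llx size 0x%llx %s\n",
           (unsigned long long)combine(addr_hi, addr_lo),
           (unsigned long long)combine(size_hi, size_lo),
           (flags & 1) ? "populated" : "no PC-DIMM plugged");
    return 0;
}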
On Fri, 1 Mar 2019 15:18:14 +0100 Auger Eric <eric.auger@redhat.com> wrote: > Hi Igor, > > [..] > > > > >> What still remains fuzzy for me is in case of cold plug the mmio hotplug > >> control region part only is read (despite the slot selection of course) > >> and returns 0 for addr/size and also flags meaning the slot is not > >> enabled. > > If you mean guest reads 0s than it looks broken, could you show > > trace log with mhp_* tracepoints enabled during a dimm hotplug. > > Please find the traces + cmd line on x86 I've thought that you where talking about pc-dimms, so here it goes: nvdimm is not part of memory hotplug interface, they have their own mmio region through which they let guest read completely rebuilt NFIT table and their own GPE._E04 event handler. you see 0's in trace because guest enumerates all PNP0C80 in DSDT to check if for any present pc-dimm for which mem hotplug interface reports 0s since there is none. PS: In ACPI spec there is an example of NVDIMMs where they also have associated memory device (PNP0C80) and that it's somehow related to nvdimm hotplug, but it's not described in sufficient detail so I'm honestly do not know what to do with it. Hence QEMU doesn't have PNP0C80 counterpart for nvdimm. To me it looks more like a mistake in the spec, but that's a topic for another discussion. > /qemu-system-x86_64 -M > q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu > Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m > 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial > tcp:localhost:4444,server -trace > events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp > unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc > base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime > mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global > PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split > -object > memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M > -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object > memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M > -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device > virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop > -drive > file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0 > -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2 > -netdev > tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on > -net none -d guest_errors > > ****************************************************************** > ioctl(TUNSETIFF): Device or resource busy > qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU > waiting for connection on: disconnected:tcp:::1:4444,server > qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used > qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used > 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0 > 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0 > 29556@1551449303.339512:mhp_acpi_read_size_hi slot[0x0] size hi: 0x0 > 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0 > 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0 > 
29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0 > 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0 > 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0 > 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0 > 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0 > 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0 > 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0 > 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0 > 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0 > 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0 > 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0 > 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0 > 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0 > 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0 > 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0 > 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0 > 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0 > 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0 > 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0 > 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0 > 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0 > 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1 > 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0 > 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2 > 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0 > 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3 > 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0 > > That's the only traces I get until I get the login prompt. > > Thanks > > Eric > > > > > >> So despite the slots are advertised as hotpluggable/enabled in > >> the SRAT; I am not sure for the OS it actually makes any difference > >> whether the DSDT definition blocks are described or not. > > SRAT isn't used fro informing guests about amount of present RAM, > > it holds affinity information for present and possible RAM > > > >> To be honest I am afraid this is too late to add those additional > >> features for 4.0 now. This is going to jeopardize the first preliminary > >> part which is the introduction of the new memory map, allowing the > >> expansion of the initial RAM and paving the way for device memory > >> introduction. So I think I am going to resend the first 10 patches in a > >> standalone series. And we can iterate on the PCDIMM/NVDIMM parts > >> independently. 
> > sounds good to me, I'll try to review 1-10 today > > > >> Thanks > >> > >> Eric > >>> > >>> > >>>> > >>>> Thanks, > >>>> Shameer > >>>> > >>>>> Then would remain the GED/GPIO actual integration. > >>>>> > >>>>> Thanks > >>>>> > >>>>> Eric > >>>>>> > >>>>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept > >>>>>>> to add one feature in DT and then in ACPI. For instance we can benefit > >>>>>> usually it doesn't conflict with each other (at least I'm not aware of it) > >>>>>> but I see a problem with in this case. > >>>>>> > >>>>>>> from nvdimm in dt mode right? So, considering an incremental approach I > >>>>>>> would be in favour of keeping the DT nodes. > >>>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much > >>>>>> more versatile. > >>>>>> > >>>>>> I consider target application of arm/virt as a board that's used to > >>>>>> run in production generic ACPI capable guest in most use cases and > >>>>>> various DT only guests as secondary ones. It's hard to make > >>>>>> both usecases be happy with defaults (that's probably one of the > >>>>>> reasons why 'sbsa' board is being added). > >>>>>> > >>>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are > >>>>>> considered. > >>>>>> > >>>>>>> Thanks > >>>>>>> > >>>>>>> Eric > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>> > >>> > >
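To illustrate the separate NVDIMM path Igor describes above (the guest reads a rebuilt NFIT through the NVDIMM's own MMIO region, with GPE._E04 as the event handler, instead of going through the PNP0C80 slot registers), here is a standalone C sketch of the NFIT System Physical Address Range structure. The field layout is written from my reading of the ACPI 6.x spec and is for illustration only, not taken from QEMU.

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of the NFIT System Physical Address (SPA) Range structure
 * (NFIT structure type 0, 56 bytes): the kind of entry through which
 * an NVDIMM range reaches the guest, in contrast to PC-DIMMs, which
 * are exposed via PNP0C80 devices and the slot registers traced above.
 */
struct nfit_spa_range {
    uint16_t type;                      /* 0 = SPA Range Structure */
    uint16_t length;                    /* 56 */
    uint16_t spa_index;
    uint16_t flags;
    uint32_t reserved;
    uint32_t proximity_domain;
    uint8_t  range_type_guid[16];       /* e.g. the persistent-memory GUID */
    uint64_t base;
    uint64_t range_length;
    uint64_t memory_mapping_attribute;
} __attribute__((packed));

_Static_assert(sizeof(struct nfit_spa_range) == 56, "SPA range entry is 56 bytes");

int main(void)
{
    printf("one NFIT SPA range entry: %zu bytes\n", sizeof(struct nfit_spa_range));
    return 0;
}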
Hi Igor, On 3/1/19 5:33 PM, Igor Mammedov wrote: > On Fri, 1 Mar 2019 15:18:14 +0100 > Auger Eric <eric.auger@redhat.com> wrote: > >> Hi Igor, >> >> [..] >> >>> >>>> What still remains fuzzy for me is in case of cold plug the mmio hotplug >>>> control region part only is read (despite the slot selection of course) >>>> and returns 0 for addr/size and also flags meaning the slot is not >>>> enabled. >>> If you mean guest reads 0s than it looks broken, could you show >>> trace log with mhp_* tracepoints enabled during a dimm hotplug. >> >> Please find the traces + cmd line on x86 > I've thought that you where talking about pc-dimms, so here it goes: > > nvdimm is not part of memory hotplug interface, they have their own > mmio region through which they let guest read completely rebuilt > NFIT table and their own GPE._E04 event handler. > > you see 0's in trace because guest enumerates all PNP0C80 in DSDT > to check if for any present pc-dimm for which mem hotplug interface > reports 0s since there is none. > > PS: > In ACPI spec there is an example of NVDIMMs where they also have > associated memory device (PNP0C80) and that it's somehow related > to nvdimm hotplug, but it's not described in sufficient detail > so I'm honestly do not know what to do with it. Hence QEMU doesn't > have PNP0C80 counterpart for nvdimm. To me it looks more like > a mistake in the spec, but that's a topic for another discussion. Oh my bad. Indeed with PCDIMM I can see the actual addresses being read. Thank you for the explanation! Have a nice WE. Eric > > >> /qemu-system-x86_64 -M >> q35,usb=off,dump-guest-core=off,kernel_irqchip=split,nvdimm -cpu >> Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m >> 16G,maxmem=32G,slots=4 -display none --enable-kvm -serial >> tcp:localhost:4444,server -trace >> events=/home/augere/UPSTREAM/qemu2/nvdimm.txt -qmp >> unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc >> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime >> mlock=off -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global >> PIIX4_PM.disable_s4=1 -boot strict=on -machine kernel_irqchip=split >> -object >> memory-backend-file,id=mem3,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-3,size=2G,align=128M >> -device nvdimm,memdev=mem3,id=dimm3,label-size=2M -object >> memory-backend-file,id=mem4,share,mem-path=/home/augere/TEST/QEMU/nv-dimm-4,size=2G,align=128M >> -device nvdimm,memdev=mem4,id=dimm4,label-size=2M -device >> virtio-blk-pci,bus=pcie.0,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,werror=stop,rerror=stop >> -drive >> file=/home/augere/VM/IMAGES/x86_64-vm1-f28.raw,format=raw,if=none,cache=writethrough,id=drv0 >> -device virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2 >> -netdev >> tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown,vhost=on >> -net none -d guest_errors >> >> ****************************************************************** >> ioctl(TUNSETIFF): Device or resource busy >> qemu-system-x86_64: -serial tcp:localhost:4444,server: info: QEMU >> waiting for connection on: disconnected:tcp:::1:4444,server >> qemu-system-x86_64: warning: global PIIX4_PM.disable_s3=1 not used >> qemu-system-x86_64: warning: global PIIX4_PM.disable_s4=1 not used >> 29556@1551449303.339464:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.339496:mhp_acpi_read_addr_hi slot[0x0] addr hi: 0x0 >> 29556@1551449303.339505:mhp_acpi_read_addr_lo slot[0x0] addr lo: 0x0 >> 29556@1551449303.339512:mhp_acpi_read_size_hi 
slot[0x0] size hi: 0x0 >> 29556@1551449303.339520:mhp_acpi_read_size_lo slot[0x0] size lo: 0x0 >> 29556@1551449303.339563:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.339574:mhp_acpi_read_flags slot[0x0] flags: 0x0 >> 29556@1551449303.339621:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.339643:mhp_acpi_read_addr_hi slot[0x1] addr hi: 0x0 >> 29556@1551449303.339651:mhp_acpi_read_addr_lo slot[0x1] addr lo: 0x0 >> 29556@1551449303.339659:mhp_acpi_read_size_hi slot[0x1] size hi: 0x0 >> 29556@1551449303.339667:mhp_acpi_read_size_lo slot[0x1] size lo: 0x0 >> 29556@1551449303.339705:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.339713:mhp_acpi_read_flags slot[0x1] flags: 0x0 >> 29556@1551449303.339757:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.339779:mhp_acpi_read_addr_hi slot[0x2] addr hi: 0x0 >> 29556@1551449303.339787:mhp_acpi_read_addr_lo slot[0x2] addr lo: 0x0 >> 29556@1551449303.339796:mhp_acpi_read_size_hi slot[0x2] size hi: 0x0 >> 29556@1551449303.339804:mhp_acpi_read_size_lo slot[0x2] size lo: 0x0 >> 29556@1551449303.339861:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.339870:mhp_acpi_read_flags slot[0x2] flags: 0x0 >> 29556@1551449303.339916:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.339944:mhp_acpi_read_addr_hi slot[0x3] addr hi: 0x0 >> 29556@1551449303.339954:mhp_acpi_read_addr_lo slot[0x3] addr lo: 0x0 >> 29556@1551449303.339963:mhp_acpi_read_size_hi slot[0x3] size hi: 0x0 >> 29556@1551449303.339971:mhp_acpi_read_size_lo slot[0x3] size lo: 0x0 >> 29556@1551449303.340012:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.340020:mhp_acpi_read_flags slot[0x3] flags: 0x0 >> 29556@1551449303.439695:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.439713:mhp_acpi_read_flags slot[0x0] flags: 0x0 >> 29556@1551449303.439733:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.439740:mhp_acpi_read_flags slot[0x1] flags: 0x0 >> 29556@1551449303.439759:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.439767:mhp_acpi_read_flags slot[0x2] flags: 0x0 >> 29556@1551449303.439793:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.439801:mhp_acpi_read_flags slot[0x3] flags: 0x0 >> 29556@1551449303.539590:mhp_acpi_write_slot set active slot: 0x0 >> 29556@1551449303.539606:mhp_acpi_read_flags slot[0x0] flags: 0x0 >> 29556@1551449303.539627:mhp_acpi_write_slot set active slot: 0x1 >> 29556@1551449303.539634:mhp_acpi_read_flags slot[0x1] flags: 0x0 >> 29556@1551449303.539652:mhp_acpi_write_slot set active slot: 0x2 >> 29556@1551449303.539659:mhp_acpi_read_flags slot[0x2] flags: 0x0 >> 29556@1551449303.539677:mhp_acpi_write_slot set active slot: 0x3 >> 29556@1551449303.539684:mhp_acpi_read_flags slot[0x3] flags: 0x0 >> >> That's the only traces I get until I get the login prompt. >> >> Thanks >> >> Eric >> >> >>> >>>> So despite the slots are advertised as hotpluggable/enabled in >>>> the SRAT; I am not sure for the OS it actually makes any difference >>>> whether the DSDT definition blocks are described or not. >>> SRAT isn't used fro informing guests about amount of present RAM, >>> it holds affinity information for present and possible RAM >>> >>>> To be honest I am afraid this is too late to add those additional >>>> features for 4.0 now. This is going to jeopardize the first preliminary >>>> part which is the introduction of the new memory map, allowing the >>>> expansion of the initial RAM and paving the way for device memory >>>> introduction. 
So I think I am going to resend the first 10 patches in a >>>> standalone series. And we can iterate on the PCDIMM/NVDIMM parts >>>> independently. >>> sounds good to me, I'll try to review 1-10 today >>> >>>> Thanks >>>> >>>> Eric >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Shameer >>>>>> >>>>>>> Then would remain the GED/GPIO actual integration. >>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Eric >>>>>>>> >>>>>>>>> Also don't DIMM slots already make sense in DT mode. Usually we accept >>>>>>>>> to add one feature in DT and then in ACPI. For instance we can benefit >>>>>>>> usually it doesn't conflict with each other (at least I'm not aware of it) >>>>>>>> but I see a problem with in this case. >>>>>>>> >>>>>>>>> from nvdimm in dt mode right? So, considering an incremental approach I >>>>>>>>> would be in favour of keeping the DT nodes. >>>>>>>> I'd guess it is the same as for DIMMs, ACPI support for NVDIMMs is much >>>>>>>> more versatile. >>>>>>>> >>>>>>>> I consider target application of arm/virt as a board that's used to >>>>>>>> run in production generic ACPI capable guest in most use cases and >>>>>>>> various DT only guests as secondary ones. It's hard to make >>>>>>>> both usecases be happy with defaults (that's probably one of the >>>>>>>> reasons why 'sbsa' board is being added). >>>>>>>> >>>>>>>> So I'd give priority to ACPI based arm/virt versus DT when defaults are >>>>>>>> considered. >>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> Eric >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>> >>>>> >>> > >