[Qemu-devel] [RFC v2 PATCH 0/3] ppc: spapr: virtual NVDIMM support

Posted by Shivaprasad G Bhat 4 years, 10 months ago
The patchset attempts to implement virtual NVDIMM support for pseries.

PAPR semantics are such that each NVDIMM device is comprised of multiple
SCM (Storage Class Memory) blocks. The hypervisor is expected to prepare the
FDT for the NVDIMM device and send the guest a hotplug interrupt with the new
type RTAS_LOG_V6_HP_TYPE_PMEM, which is already handled by the upstream
kernel. In response to that interrupt, the guest requests the hypervisor to
bind each of the SCM blocks of the NVDIMM device using hcalls. SCM block
unbind requests can also occur for driver-error or unplug (not supported yet)
use cases. The NVDIMM label reads/writes are done through hcalls.
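
For orientation, the sketch below shows roughly how such an SCM hcall could
hook into QEMU's existing hypercall dispatch. It is only an illustration:
the handler body and the argument layout are simplified assumptions, not the
exact code from patch 3, and H_SCM_BIND_MEM is assumed to be defined by the
series alongside the other PAPR SCM hcall numbers.

/* Illustrative sketch only, not the code from patch 3.  It assumes
 * H_SCM_BIND_MEM is defined by the series; spapr_register_hypercall()
 * and spapr_drc_by_index() are existing QEMU helpers. */
#include "qemu/osdep.h"
#include "hw/ppc/spapr.h"
#include "hw/ppc/spapr_drc.h"

static target_ulong h_scm_bind_mem(PowerPCCPU *cpu, SpaprMachineState *spapr,
                                   target_ulong opcode, target_ulong *args)
{
    uint32_t drc_index = args[0];  /* DRC index identifying the NVDIMM */
    uint64_t nblocks = args[2];    /* args[1] is the starting SCM block */

    /* Since the whole device is already mapped at object_add time, the
     * handler mostly validates the request; a real handler would also hand
     * the bound range's guest physical address back through args[]. */
    if (!spapr_drc_by_index(drc_index) || nblocks == 0) {
        return H_PARAMETER;
    }
    return H_SUCCESS;
}

static void spapr_scm_register_hcalls(void)
{
    /* Called once during hcall setup in this sketch. */
    spapr_register_hypercall(H_SCM_BIND_MEM, h_scm_bind_mem);
}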

There are also new hcalls added for future use (currently unused in the
kernel) for querying information such as the binding and logical addresses
of the SCM blocks. The current patchset leaves them unimplemented.

Since each virtual NVDIMM device is divided into multiple SCM blocks, the
bind, unbind, and query hcalls on those blocks can arrive independently. This
doesn't fit well into the QEMU device semantics, where map/unmap is done at
whole-device/object granularity. The patchset uses the existing NVDIMM class
structures for the implementation. The bind/unbind is therefore done at the
object_add/del phase itself instead of on demand in the hcalls.

The guest kernel makes bind/unbind requests for the virtual NVDIMM device at
region granularity. Without interleaving, each virtual NVDIMM device is
presented as a separate region. There is currently no way to configure
virtual NVDIMM interleaving for guests, so an hcall can never request a
partial bind/unbind for only a subset of a virtual NVDIMM's SCM blocks.
Hence it is safe to bind/unbind everything during object_add/del.
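
A minimal sketch of that whole-device mapping at plug time is shown below.
The function name and the address handling are hypothetical, not the exact
code from patch 2; memory_region_add_subregion() and the PC-DIMM class hook
are existing QEMU APIs.

/* Hypothetical sketch: map the entire NVDIMM backing region into the
 * guest's device-memory area when the device is plugged, so that the later
 * H_SCM_BIND_MEM/H_SCM_UNBIND_MEM hcalls only have to report addresses. */
#include "qemu/osdep.h"
#include "hw/boards.h"
#include "hw/mem/nvdimm.h"
#include "hw/ppc/spapr.h"

static void spapr_nvdimm_plug_sketch(SpaprMachineState *spapr,
                                     NVDIMMDevice *nvdimm, Error **errp)
{
    MachineState *ms = MACHINE(spapr);
    PCDIMMDevice *dimm = PC_DIMM(nvdimm);
    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
    MemoryRegion *mr = ddc->get_memory_region(dimm, errp);
    uint64_t addr;

    if (!mr) {
        return;
    }
    addr = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP, errp);
    /* One mapping covers all SCM blocks of this device. */
    memory_region_add_subregion(&ms->device_memory->mr,
                                addr - ms->device_memory->base, mr);
}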

The free device-memory region used for memory hotplug is managed as multiple
LMBs of 256 MiB, which are expected to be aligned to 256 MiB. As the SCM
blocks are mapped into the same region, they also need to be aligned to this
size for subsequent memory hotplug to work. The minimum SCM block size is set
to this size for that reason and can be made user configurable in the future
if required.
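
Concretely, the resulting check can be as simple as the sketch below (also
reflected in the v1 changelog further down); the macro and function names
are assumptions, not necessarily what patch 2 uses.

/* Sketch of the alignment check; SPAPR_MINIMUM_SCM_BLOCK_SIZE and the
 * function name are assumptions, not necessarily the patch 2 code. */
#include "qemu/osdep.h"
#include "qemu/units.h"
#include "qapi/error.h"

#define SPAPR_MINIMUM_SCM_BLOCK_SIZE (256 * MiB)  /* matches the LMB size */

static void spapr_nvdimm_validate_size(uint64_t nvdimm_size,
                                       uint64_t label_size, Error **errp)
{
    uint64_t data_size = nvdimm_size - label_size;

    if (!QEMU_IS_ALIGNED(data_size, SPAPR_MINIMUM_SCM_BLOCK_SIZE)) {
        error_setg(errp, "NVDIMM size excluding the label area must be "
                   "aligned to 256 MiB");
    }
}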

The first patch moves an existing static function to a common area so it can
be used by the subsequent patches. The second patch adds the FDT entries and
basic device support, and the third patch adds the hcall implementations.

The patches are also available on the pseries-nvdimm branch of
https://github.com/ShivaprasadGBhat/qemu.git and can be used with the
upstream kernel. ndctl can be used for configuring the NVDIMMs inside the
guest.

This is how it can be used:
Add nvdimm=on to the QEMU machine argument,
Ex : -machine pseries,nvdimm=on
For coldplug, add the device on the QEMU command line as shown below:
-object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
-device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

For hotplug, add the device from the monitor as shown below:
object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896
device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0

---
v1 : http://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg01545.html
Changes from v1:
     - Rebased to upstream, this required a dt_populate implementation
       for nvdimm hotplug support
     - Added uuid option to nvdimm device
     - Removed the memory region sizing-down code as suggested by Igor;
       now erroring out if the NVDIMM size excluding the label area is not
       aligned to 256 MiB, so patch 2 from the previous series is no longer
       needed.
     - Removed un-implemented hcalls
     - Changed the hcalls to do different kinds of checks and return
       different values.
     - Addressed comments for v1

Shivaprasad G Bhat (3):
      mem: make nvdimm_device_list global
      spapr: Add NVDIMM device support
      spapr: Add Hcalls to support PAPR NVDIMM device


 default-configs/ppc64-softmmu.mak |    1 
 hw/acpi/nvdimm.c                  |   27 -----
 hw/mem/Kconfig                    |    2 
 hw/mem/nvdimm.c                   |   70 +++++++++++++
 hw/ppc/spapr.c                    |  202 +++++++++++++++++++++++++++++++++++--
 hw/ppc/spapr_drc.c                |   18 +++
 hw/ppc/spapr_events.c             |    4 +
 hw/ppc/spapr_hcall.c              |  202 +++++++++++++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h           |    8 +
 include/hw/ppc/spapr.h            |   19 +++
 include/hw/ppc/spapr_drc.h        |    9 ++
 11 files changed, 523 insertions(+), 39 deletions(-)

--
Signature