This patch series implements "virtio pmem".
"virtio pmem" is fake persistent memory (nvdimm) in the guest
that allows the guest page cache to be bypassed. The series also
implements a VIRTIO-based asynchronous flush mechanism.
This patchset shares the guest kernel driver with the
changes suggested in v2. It was tested with the Qemu-side device
emulation for virtio-pmem [6].
Details of the project idea for the 'virtio pmem' flushing interface
are shared in [3] & [4].
The implementation is divided into two parts:
a new virtio pmem guest driver, and Qemu code changes for the new
virtio pmem paravirtualized device.
1. Guest virtio-pmem kernel driver
---------------------------------
- Reads persistent memory range from paravirt device and
registers with 'nvdimm_bus'.
- 'nvdimm/pmem' driver uses this information to allocate a
persistent memory region and set up filesystem operations
on the allocated memory.
- virtio pmem driver implements asynchronous flushing
interface to flush from guest to host.
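The flush path described in this list can be modelled in user space as a per-region callback with a generic fallback. This is only a minimal sketch under stated assumptions, not the driver code: 'struct region_model', 'region_flush()', 'generic_flush()' and 'host_backed_flush()' are hypothetical names, and the host is simulated by an integer status. Only the return convention (0 or -EIO) is taken from the changelog in this cover letter.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model of a region with an optional flush hook. */
struct region_model {
	int (*flush)(struct region_model *region); /* provider hook, may be NULL */
	void *provider_data;                        /* opaque provider state */
	int generic_flush_calls;                    /* bookkeeping for this sketch */
};

/* Stand-in for the generic (non-paravirtual) flush path; always succeeds. */
static int generic_flush(struct region_model *region)
{
	region->generic_flush_calls++;
	return 0;
}

/*
 * Stand-in for a paravirtual flush. provider_data is assumed to point to
 * an int holding the simulated host status (0 means the host flushed).
 */
static int host_backed_flush(struct region_model *region)
{
	const int *host_status = region->provider_data;

	return (host_status && *host_status == 0) ? 0 : -EIO;
}

/* Indirect call: use the provider hook if present, else the generic path. */
static int region_flush(struct region_model *region)
{
	if (region->flush)
		return region->flush(region);
	return generic_flush(region);
}
```

A region without a hook takes the generic path, while a region whose hook reports a failed host flush returns -EIO to the caller.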
2. Qemu virtio-pmem device
---------------------------------
- Creates virtio pmem device and exposes a memory range to
KVM guest.
- On the host side this is file-backed memory which acts as
persistent memory.
- The Qemu-side flush uses the aio thread pool APIs and virtio
for asynchronous handling of multiple guest requests.
David Hildenbrand (CCed) also posted a modified version [6] of the
Qemu virtio-pmem code based on the updated Qemu memory device API.
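Since the host memory is file backed, one way to picture the host-side flush is a sync of the backing file. The sketch below assumes the flush ultimately amounts to an fsync() on the backing file descriptor, which this cover letter does not spell out. 'flush_backing_file()' is a hypothetical helper, and the thread pool dispatch is left out.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: sync a backing file and collapse any failure to
 * -EIO, mirroring the 0 / -EIO convention used on the guest side.
 */
static int flush_backing_file(int fd)
{
	if (fsync(fd) != 0)
		return -EIO;
	return 0;
}
```

In an asynchronous design a worker thread would run this helper and report the result back over the virtqueue, so that the guest request completes only after the sync has finished.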
Virtio-pmem error handling:
----------------------------------------
I checked the behaviour of virtio-pmem for the types of errors below.
Suggestions are needed on the expected behaviour for handling them.
- Hardware errors: uncorrectable recoverable errors:
a] virtio-pmem:
- With the current logic, if the error page belongs to the Qemu process,
the host MCE handler isolates (hwpoison) that page and sends SIGBUS.
The Qemu SIGBUS handler injects an exception into the KVM guest.
- The KVM guest then isolates the page and sends SIGBUS to the guest
userspace process that has mapped the page.
b] Existing implementation for ACPI pmem driver:
- Handles such errors with an MCE notifier and creates a list
of bad blocks. Read/direct access DAX operations return EIO
if the accessed memory page falls in the bad block list.
- It also starts background scrubbing.
- Could similar functionality be reused in virtio-pmem with an MCE
notifier but without scrubbing (no ACPI/ARS)? Input is needed to
confirm whether this behaviour is ok or needs any change.
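The bad block behaviour described in b] can be illustrated with a small user-space model. This is a sketch under assumptions, not the libnvdimm badblocks code: 'struct bad_range', 'range_is_bad()' and 'checked_read()' are hypothetical names, ranges are byte based, and arithmetic overflow is ignored.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bad-range record covering [start, start + len) in bytes. */
struct bad_range {
	uint64_t start;
	uint64_t len;
};

/* Return 1 if [offset, offset + len) overlaps any recorded bad range. */
static int range_is_bad(const struct bad_range *list, size_t count,
			uint64_t offset, uint64_t len)
{
	for (size_t i = 0; i < count; i++) {
		uint64_t bad_end = list[i].start + list[i].len;

		if (offset < bad_end && list[i].start < offset + len)
			return 1;
	}
	return 0;
}

/* Model of an access that fails with -EIO instead of touching a bad range. */
static int checked_read(const struct bad_range *list, size_t count,
			uint64_t offset, uint64_t len)
{
	if (len == 0)
		return 0;
	return range_is_bad(list, count, offset, len) ? -EIO : 0;
}
```

An MCE notifier would append entries to such a list, after which any access overlapping a recorded range fails with -EIO rather than reading poisoned memory.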
Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
- Use name 'virtio pmem' in place of 'fake dax'
Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm
Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() returns 0, or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Replace BSD license text with SPDX license text
- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow
Changes from RFC v3:
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush - Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- Fix request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c
Changes from RFC v2:
- Add flush function in the nd_region in place of switching
on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan
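The "Use LE fields" item means the request layout must not depend on guest or host byte order. The sketch below shows the idea with a made-up wire format: 'struct wire_req' and its single 32-bit type field are assumptions for illustration, not the contents of include/uapi/linux/virtio_pmem.h, and 'put_le32()' / 'get_le32()' are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical wire layout: one 32-bit little-endian request type. */
struct wire_req {
	uint8_t type_le[4];
};

/* Store v as little-endian bytes, independent of CPU byte order. */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into CPU byte order. */
static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Kernel code would normally express the same thing with __le32 fields and the cpu_to_le32() / le32_to_cpu() helpers rather than open-coded shifts.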
Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent
memory and other operations instead of creating an entirely
new block driver.
- Use VIRTIO driver to register memory information with
nvdimm_bus and create region_type accordingly.
- Call VIRTIO flush from existing pmem driver.
Pankaj Gupta (5):
libnvdimm: nd_region flush callback support
virtio-pmem: Add virtio-pmem guest driver
libnvdimm: add nd_region buffered dax_dev flag
ext4: disable map_sync for virtio pmem
xfs: disable map_sync for virtio pmem
[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
drivers/acpi/nfit/core.c | 4 -
drivers/dax/super.c | 17 +++++
drivers/nvdimm/claim.c | 6 +
drivers/nvdimm/nd.h | 1
drivers/nvdimm/pmem.c | 15 +++-
drivers/nvdimm/region_devs.c | 45 +++++++++++++-
drivers/nvdimm/virtio_pmem.c | 84 ++++++++++++++++++++++++++
drivers/virtio/Kconfig | 10 +++
drivers/virtio/Makefile | 1
drivers/virtio/pmem.c | 125 +++++++++++++++++++++++++++++++++++++++
fs/ext4/file.c | 11 +++
fs/xfs/xfs_file.c | 8 ++
include/linux/dax.h | 9 ++
include/linux/libnvdimm.h | 11 +++
include/linux/virtio_pmem.h | 60 ++++++++++++++++++
include/uapi/linux/virtio_ids.h | 1
include/uapi/linux/virtio_pmem.h | 10 +++
17 files changed, 406 insertions(+), 12 deletions(-)
Please ignore this series, as my network went down while sending it.
I will send the series again.
Thanks,
Pankaj