Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').
A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
https://patchwork.kernel.org/patch/10028151/
To make sure that the file metadata is in sync after a fault while
writing to a shared, DAX-capable backend file, this patch set enables
QEMU to use the MAP_SYNC flag for memory-backend-dax-file.
Since the DAX vs. DMA truncate issue has been solved, we refined the code
and sent out this feature again as the v5 version.
QEMU passes MAP_SYNC to mmap(2) only if MAP_SYNC is supported and both
'share=on' and 'pmem=on' are set.
Otherwise, QEMU does not pass this flag to mmap(2).
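The flag selection described above can be sketched in C roughly as follows.
This is only an illustration, not the code in util/mmap-alloc.c:
pick_mmap_flags() is a hypothetical helper, and the fallback #defines assume
the values from the Linux asm-generic UAPI headers, for toolchains whose libc
headers predate Linux 4.15. MAP_SHARED_VALIDATE is included because the
kernel ignores MAP_SYNC without it (see the v3 changes below).

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Assumption: values from the Linux asm-generic UAPI headers, used only
 * when the libc headers do not provide the flags themselves. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper (not a QEMU function): choose the mmap(2) sharing
 * flags for a memory backend from its 'share' and 'pmem' options.
 */
int pick_mmap_flags(bool share, bool pmem)
{
    if (share && pmem) {
        /* Ask for synchronous faults; the kernel validates the flags. */
        return MAP_SHARED_VALIDATE | MAP_SYNC;
    }
    /* In every other case MAP_SYNC is not requested at all. */
    return share ? MAP_SHARED : MAP_PRIVATE;
}
```

With this sketch, only the share=on,pmem=on combination asks for MAP_SYNC;
whether the kernel and the backing file actually honour it is a separate
question that is only answered when mmap(2) is called.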
Tested with the cases below:
1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a DAX-capable file.
      1) start VM1 with options:
         -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
         -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M
      2) start VM2 with options:
         -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2},size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
         -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M
      3) live migrate from VM1 to VM2.
      4) make the host crash suddenly or lose power.
      5) check DAX_FILE_1 and DAX_FILE_2: no corruption.
   b: backend is a regular file.
      1) start with options:
         -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
         -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M
      QEMU warns "failed to validate with mapping flags: Operation not supported".
      FILE_1 and FILE_2 may be randomly corrupted.
2. Other cases:
   FILE_1 and FILE_2 may be randomly corrupted.
Changes in V14:
* 1/2 rebase on top of current upstream and tested
Changes in V13:
* 4/5 Michael: move the include to mmap-alloc.c.
* 4/5 Michael: refine the warning message.
* 5/5 Michael: refine the documentation.
Changes in V12:
* 2/5: Michael: Update update-linux-headers.sh
* 3/5: Michael: Use the script to add linux/mman.h
* 4/5: Pankaj, Michael: 1) fall back to mmap without
  MAP_SYNC & MAP_SHARED_VALIDATE if sync is not supported or fails
  2) Replace the include with the linux/mman.h added in 3/5
* 5/5: Michael: Refine the documentation.
Changes in V11:
* 1/3: Michael: Change to just add a bool is_pmem in qemu_ram_mmap.
* 2/3: Michael: Fix the compatibility with old kernels.
* 2/3 & 3/3: Michael & Eduardo: Update the behavior as below:
  Warn on a non-DAX backend and continue without MAP_SYNC.
  If it fails again, for compatibility, remove MAP_SHARED_VALIDATE and
  silently proceed.
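The warn-and-fall-back behavior described here could look roughly like the
sketch below. It is a simplified illustration under stated assumptions, not
the actual qemu_ram_mmap() code: map_shared_prefer_sync() is a hypothetical
name, the warning text is made up, and the fallback #defines assume the
values from the Linux asm-generic UAPI headers. The sketch also assumes that
a kernel which knows the flags reports a non-DAX file with EOPNOTSUPP (this
matches the "Operation not supported" warning in the test notes above),
while any other failure of the first attempt is retried silently.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: values from the Linux asm-generic UAPI headers, used only
 * when the libc headers do not provide the flags themselves. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: map 'size' bytes of 'fd' as a shared mapping,
 * preferring MAP_SYNC and falling back to plain MAP_SHARED.
 * Returns MAP_FAILED only if the fallback mapping fails as well.
 */
void *map_shared_prefer_sync(int fd, size_t size)
{
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (ptr != MAP_FAILED) {
        return ptr; /* synchronous faults are in effect */
    }

    if (errno == EOPNOTSUPP) {
        /* The kernel validated the flags, but the file is not DAX. */
        fprintf(stderr, "warning: MAP_SYNC not usable for this file: %s\n",
                strerror(errno));
    }

    /* Any other error (e.g. an old kernel) falls through silently. */
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

A caller would still have to compare the result against MAP_FAILED, and it
cannot tell from the return value alone whether MAP_SYNC is in effect. That
limitation is in line with the documentation's note that no way is provided
to test for these conditions.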
Changes in V10:
* 4/4: refine the document.
* 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
* 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
* 2/4: Fix the wrong include header
Changes in V9:
* 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
* 2/6: Newly added: Michael: use the sparse feature to define RAM_FLAG.
  Since I don't have much knowledge about the sparse feature, @Michael, could you
  add some documentation/commit message to this patch? Thank you very much.
* 3/6: from 2/5: Eduardo: updated the commit message.
* 4/6: from 3/5: Michael: don't ignore MAP_SYNC failures silently.
* 5/6: from 4/5: Eduardo: updated the commit message.
* 6/6: from 5/5: Michael: Drop the sync option, document MAP_SYNC.
Changes in v8:
* Michael: 3/5, remove the duplicated define in osdep.h
* Michael: 2/5, make the type definition safe.
* Michael: 2/5, fix the incorrect define of MAP_SHARE in qemu_anon_ram_alloc.
* 4/6 removed: we removed the on/off/auto definition of sync since, for now,
  MAP_SYNC only works with pmem=on.
* @Michael, I still reuse the RAM_SYNC flag; it is much more straightforward to parse
  all the flags in one parameter.
Changes in v7:
* Michael: [3,4,6]/6 limit the "sync" flag to nvdimm backends only (pmem=on).
Changes in v6:
* Pankaj: 3/7 is squashed into 2/7
* Pankaj: 7/7 update comments to "consistent filesystem metadata".
* Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
* Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
* Stefan, 5/7 Add missing "munmap"
* Stefan, 2/7 refine the shared/flag.
Changes in v5:
* Add patch 1 to fix a memory leak issue.
* Refine patches 4-6
* Remove patch 3 as we already changed the parameter from "shared" to
  "flags"
Changes in v4:
* Add patch 1-3 to switch some functions to a single 'flags'
parameters. (Michael S. Tsirkin)
* v3 patch 1-3 become v4 patch 4-6.
* Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
* Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
Changes in v3:
* Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
  cases, and add back the retry mechanism. MAP_SYNC will be ignored
  by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
* Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
platforms in order to make qemu_ram_mmap() compile on those platforms.
* Patch 2&3: include more information in the error messages of
  memory-backend, in the hope of helping the user identify the error.
  (Dr. David Alan Gilbert)
* Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
Changes in v2:
* Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
* Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
* Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
to osdep.h. (Michael S. Tsirkin)
Zhang Yi (2):
util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
docs: Added MAP_SYNC documentation
docs/nvdimm.txt | 22 +++++++++++++++++++---
qemu-options.hx | 5 +++++
util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
3 files changed, 64 insertions(+), 4 deletions(-)
--
2.19.1
On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
>
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> https://patchwork.kernel.org/patch/10028151/
>
> In order to make sure that the file metadata is in sync after a fault
> while we are writing a shared DAX supporting backend files, this
> patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
>
> As the DAX vs DMA truncated issue was solved, we refined the code and
> send out this feature for the v5 version.
>
> We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> 'share=on' & 'pmem=on'.
> Or QEMU will not pass this flag to mmap(2)
OK this is in a good shape. As we are in freeze anyway,
there's still a bit more time to polish it. I have a couple of
suggestions:
- squash docs in same patch with code, no need for two patches
- mmap errors are not silently ignored as the doc says,
a warning is produced
Also, it might make sense to send the warnings to an errp object and not stderr.
I would leave that to a follow-up patch.
> Test with below cases:
> 1. pmem=on is set, shared=on is set, MAP_SYNC supported:
> a: backend is a dax supporting file.
> 1) start VM1 with options:
> -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
> -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
>
> 2) start VM2 with options:
> -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2,size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
> -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
>
> 3) live migrate from VM1 to VM2.
>
> 4) Suddenly let Host crash or power failure.
>
> 5) check DAX_FILE_1 and DAX_FILE_2, no corrupt.
>
> b: backend is a regular file.
> 1) start with options
> -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
> -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M.
>
> will warning "failed to validate with mapping flags: Operation not supported"
> FILE_1 and FILE_2 random corrupt.
>
> 2. Other cases:
> FILE_1 and FILE_2 random corrupt.
>
> Changes in V14:
> * 1/2 rebase on top of current upstream and tested
>
> Changes in V13:
> * 4/5 Micheal: move the inlcude to mmap_alloc.c.
> * 4/5 Micheal: refine the warning message.
> * 5/5 Micheal: refine the Documentations.
>
> Changes in V12:
> * 2/5: Micheal: Update update-linux-headers.sh
> * 3/5: Micheal: Use script update add linux/mman.h
> * 4/5: Pankaj,Micheal: 1) fallback to mmap without
> MAP_SYNC & MAP_SHARED_VALIDATE if sync not supported or failed
> 2) Replace the include with 3/5 added linux/mman.h
> * 5/5: Micheal: Refine the Documentations.
>
> Changes in V11:
> * 1/3: Micheal: Change to just add a bool is_pmem in qemu_ram_mmap.
> * 2/3: Micheal: Fix the compatibility for old kernel.
> * 2/3&3/3: Micheal&Eduardo :Update the behavior below:
> Waning at no-dax and continue without MAP_SYNC.
> Test if fails again for compatibility, then remove the MAP_VALIDATE and
> silently proceed.
>
> Changes in V10:
> * 4/4: refine the document.
> * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
> * 2/4: Fix the wrong include header
>
> Changes in V9:
> * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> * 2/6: New Added: Micheal: use sparse feature define RAM_FLAG.
> since I don't have much knowledge about the sparse feature, @Micheal Could you
> add some documentation/commit message on this patch? Thank you very much.
> * 3/6: from 2/5: Eduardo: updated the commit message.
> * 4/6: from 3/5: Micheal: don't ignore MAP_SYNC failures silently.
> * 5/6: from 4/5: Eduardo: updated the commit message.
> * 6/6: from 5/5: Micheal: Drop the sync option, document the MAP_SYNC.
>
> Changes in v8:
> * Micheal: 3/5, remove the duplicated define in the os_dep.h
> * Micheal: 2/5, make type define safety.
> * Micheal: 2/5, fixed the incorrect define MAP_SHARE on qemu_anon_ram_alloc.
> * 4/6 removed, we remove the on/off/auto define of sync, as by now,
> MAP_SYNC only worked with pmem=on.
> * @Micheal, I still reuse the RAM_SYNC flag, it is much straightforward to parse
> all the flags in one parameter.
>
> Changes in v7:
> * Micheal: [3,4,6]/6 limited the "sync" flag only on a nvdimm backend.(pmem=on)
>
> Changes in v6:
> * Pankaj: 3/7 are squashed with 2/7
> * Pankaj: 7/7 update comments to "consistent filesystem metadata".
> * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
> * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
> * Stefan, 5/7 Add missing "munmap"
> * Stefan, 2/7 refine the shared/flag.
>
> Changes in v5:
> * Add patch 1 to fix a memory leak issue.
> * Refine the patch 4-6
> * Remove the patch 3 as we already change the parameter from "shared" to
> "flags"
>
> Changes in v4:
> * Add patch 1-3 to switch some functions to a single 'flags'
> parameters. (Michael S. Tsirkin)
> * v3 patch 1-3 become v4 patch 4-6.
> * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
> new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
> * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)
>
> Changes in v3:
> * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
> cases, and add back the retry mechanism. MAP_SYNC will be ignored
> by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missed.
> * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
> platforms in order to make qemu_ram_mmap() compile on those platforms.
> * Patch 2&3: include more information in error messages of
> memory-backend in hope to help user to identify the error.
> (Dr. David Alan Gilbert)
> * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)
>
> Changes in v2:
> * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
> * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
> the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
> * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
> to osdep.h. (Michael S. Tsirkin)
>
> Zhang Yi (2):
> util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
> docs: Added MAP_SYNC documentation
>
> docs/nvdimm.txt | 22 +++++++++++++++++++---
> qemu-options.hx | 5 +++++
> util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 64 insertions(+), 4 deletions(-)
>
> --
> 2.19.1
On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > files on ext4/xfs file system mounted with '-o dax').
> >
> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > https://patchwork.kernel.org/patch/10028151/
> >
> > In order to make sure that the file metadata is in sync after a fault
> > while we are writing a shared DAX supporting backend files, this
> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> >
> > As the DAX vs DMA truncated issue was solved, we refined the code and
> > send out this feature for the v5 version.
> >
> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > 'share=on' & 'pmem=on'.
> > Or QEMU will not pass this flag to mmap(2)
>
> OK this is in a good shape. As we are in freeze anyway,
> there's still a bit more time to polish it. I have a couple of
> suggestions:
>
> - squash docs in same patch with code, no need for two patches
> - mmap errors are not silently ignored as the doc says,
>   a warning is produced

Note that a warning is produced only if both share=on and pmem=on
is specified. If using pmem=on without share=on, no warning is
printed at all.

I agree we could squash the docs in the same patch, but I don't
want to prevent the code from being merged and require v15 to be
sent just because we are still polishing the documentation.

If there are no objections, I plan to apply this version of the
series including the following fixup (just removing the word
"silently"), and I suggest further improvements to be sent as
follow up patches.

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index bcd1456e72..b531cacd35 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
 are not set, if the backend file does not support DAX or if MAP_SYNC
 is not supported by the host kernel, write persistence is not
 guaranteed after a system crash. For compatibility reasons, these
-conditions are silently ignored if not satisfied. Currently, no way
-is provided to test for them.
+conditions are ignored if not satisfied. Currently, no way is
+provided to test for them.

 For more details, please reference mmap(2) man page:
 http://man7.org/linux/man-pages/man2/mmap.2.html.

--
Eduardo
On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
>On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
>> On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
>> > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
>> > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
>> > files on ext4/xfs file system mounted with '-o dax').
>> >
>> > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>> > https://patchwork.kernel.org/patch/10028151/
>> >
>> > In order to make sure that the file metadata is in sync after a fault
>> > while we are writing a shared DAX supporting backend files, this
>> > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
>> >
>> > As the DAX vs DMA truncated issue was solved, we refined the code and
>> > send out this feature for the v5 version.
>> >
>> > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
>> > 'share=on' & 'pmem=on'.
>> > Or QEMU will not pass this flag to mmap(2)
>>
>> OK this is in a good shape. As we are in freeze anyway,
>> there's still a bit more time to polish it. I have a couple of
>> suggestions:
>>
>> - squash docs in same patch with code, no need for two patches
>> - mmap errors are not silently ignored as the doc says,
>>   a warning is produced
>
>Note that a warning is produced only if both share=on and pmem=on
>is specified. If using pmem=on without share=on, no warning is
>printed at all.
>
>I agree we could squash the docs in the same patch, but I don't
>want to prevent the code from being merged and require v15 to be
>sent just because we are still polishing the documentation.
>
>If there are no objections, I plan to apply this version of the
>series including the following fixup (just removing the word
>"silently"), and I suggest further improvements to be sent as
>follow up patches.
>

If my understanding is correct, the follow-up patch is:

"send the warnings to an errp object and not stderr"

--
Wei Yang
Help you, Help me
On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> > > Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> > > guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> > > files on ext4/xfs file system mounted with '-o dax').
> > >
> > > A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
> > > https://patchwork.kernel.org/patch/10028151/
> > >
> > > In order to make sure that the file metadata is in sync after a fault
> > > while we are writing a shared DAX supporting backend files, this
> > > patch-set enables QEMU to use MAP_SYNC flag for memory-backend-dax-file.
> > >
> > > As the DAX vs DMA truncated issue was solved, we refined the code and
> > > send out this feature for the v5 version.
> > >
> > > We will pass MAP_SYNC to mmap(2); if MAP_SYNC is supported and
> > > 'share=on' & 'pmem=on'.
> > > Or QEMU will not pass this flag to mmap(2)
> >
> > OK this is in a good shape. As we are in freeze anyway,
> > there's still a bit more time to polish it. I have a couple of
> > suggestions:
> >
> > - squash docs in same patch with code, no need for two patches
> > - mmap errors are not silently ignored as the doc says,
> >   a warning is produced
>
> Note that a warning is produced only if both share=on and pmem=on
> is specified. If using pmem=on without share=on, no warning is
> printed at all.
>
> I agree we could squash the docs in the same patch, but I don't
> want to prevent the code from being merged and require v15 to be
> sent just because we are still polishing the documentation.
>
> If there are no objections, I plan to apply this version of the
> series including the following fixup (just removing the word
> "silently"), and I suggest further improvements to be sent as
> follow up patches.
>
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index bcd1456e72..b531cacd35 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
>  are not set, if the backend file does not support DAX or if MAP_SYNC
>  is not supported by the host kernel, write persistence is not
>  guaranteed after a system crash. For compatibility reasons, these
> -conditions are silently ignored if not satisfied. Currently, no way
> -is provided to test for them.
> +conditions are ignored if not satisfied. Currently, no way is
> +provided to test for them.
>
>  For more details, please reference mmap(2) man page:
>  http://man7.org/linux/man-pages/man2/mmap.2.html.

with the two being squashed, and above fix:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

> --
> Eduardo