From nobody Fri Nov 7 18:31:47 2025
From: "Zhang, Yi"
To: xiaoguangrong.eric@gmail.com, stefanha@redhat.com, pbonzini@redhat.com,
 pagupta@redhat.com, yu.c.zhang@linux.intel.com, richardw.yang@linux.intel.com, mst@redhat.com, ehabkost@redhat.com
Date: Tue, 29 Jan 2019 22:49:01 +0800
Message-Id: <8d5b28ca0350dfb6a0313f06c6ceb0f15201c8d7.1548771590.git.yi.z.zhang@linux.intel.com>
Subject: [Qemu-devel] [PATCH v11 1/3] util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap
Cc: imammedo@redhat.com, dan.j.williams@intel.com, qemu-devel@nongnu.org, Zhang Yi

From: Zhang Yi

Besides the existing 'shared' flag, add an 'is_pmem' parameter to
qemu_ram_mmap(), which indicates that the memory backend file is
persistent memory.

Signed-off-by: Haozhong Zhang
Signed-off-by: Zhang Yi
Reviewed-by: pagupta@redhat.com
---
 exec.c                    |  2 +-
 include/qemu/mmap-alloc.h | 21 ++++++++++++++++++++-
 util/mmap-alloc.c         |  6 +++++-
 util/oslib-posix.c        |  2 +-
 4 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..27cea52 100644
--- a/exec.c
+++ b/exec.c
@@ -1860,7 +1860,7 @@ static void *file_ram_alloc(RAMBlock *block,
     }
 
     area = qemu_ram_mmap(fd, memory, block->mr->align,
-                         block->flags & RAM_SHARED);
+                         block->flags & RAM_SHARED, block->flags & RAM_PMEM);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
                          "unable to map backing store for guest RAM");
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3..190688a 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -7,7 +7,26 @@ size_t qemu_fd_getpagesize(int fd);
 
 size_t qemu_mempath_getpagesize(const char *mem_path);
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
+/**
+ * qemu_ram_mmap: mmap the specified file or device.
+ *
+ * Parameters:
+ *  @fd: the file or the device to mmap
+ *  @size: the number of bytes to be mmaped
+ *  @align: if not zero, specify the alignment of the starting mapping address;
+ *          otherwise, the alignment in use will be determined by QEMU.
+ *  @shared: map has RAM_SHARED flag.
+ *  @is_pmem: map has RAM_PMEM flag.
+ *
+ * Return:
+ *  On success, return a pointer to the mapped area.
+ *  On failure, return MAP_FAILED.
+ */
+void *qemu_ram_mmap(int fd,
+                    size_t size,
+                    size_t align,
+                    bool shared,
+                    bool is_pmem);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329ec..97bbeed 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -75,7 +75,11 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
     return getpagesize();
 }
 
-void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
+void *qemu_ram_mmap(int fd,
+                    size_t size,
+                    size_t align,
+                    bool shared,
+                    bool is_pmem)
 {
     /*
      * Note: this always allocates at least one extra page of virtual address
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index fbd0dc8..040937f 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -203,7 +203,7 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_anon_ram_alloc(size_t size, uint64_t *alignment, bool shared)
 {
     size_t align = QEMU_VMALLOC_ALIGN;
-    void *ptr = qemu_ram_mmap(-1, size, align, shared);
+    void *ptr = qemu_ram_mmap(-1, size, align, shared, false);
 
     if (ptr == MAP_FAILED) {
         return NULL;
-- 
2.7.4

From nobody Fri Nov 7 18:31:47 2025
From: "Zhang, Yi"
To: xiaoguangrong.eric@gmail.com, stefanha@redhat.com, pbonzini@redhat.com, pagupta@redhat.com, yu.c.zhang@linux.intel.com, richardw.yang@linux.intel.com, mst@redhat.com, ehabkost@redhat.com
Date: Tue, 29 Jan 2019 22:49:09 +0800
Subject: [Qemu-devel] [PATCH v11 2/3] util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
Cc: imammedo@redhat.com, dan.j.williams@intel.com, qemu-devel@nongnu.org, Zhang Yi

From: Zhang Yi

When a file supporting DAX is used as a vNVDIMM backend, mmap it with the
MAP_SYNC flag as well. This ensures that file system metadata is synced on
each guest write to the backend file, without further action by QEMU
(e.g., periodic fsync()).

Currently, there are the following possible use cases:

1. pmem=on is set, share=on is set, MAP_SYNC supported:

   a: backend is a file supporting DAX.
      - MAP_SYNC takes effect.

   b: backend is not a file supporting DAX.
      - mmap triggers a warning, then the MAP_SYNC flag is ignored.

2. All other cases:
      - MAP_SYNC is never passed to mmap(2).

Signed-off-by: Haozhong Zhang
Signed-off-by: Zhang Yi
---
 include/qemu/osdep.h | 21 +++++++++++++++++++++
 util/mmap-alloc.c    | 28 +++++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 457d24e..96209bb 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -419,6 +419,27 @@ void qemu_anon_ram_free(void *ptr, size_t size);
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
 
+/*
+ * MAP_SHARED_VALIDATE and MAP_SYNC are introduced in Linux kernel
+ * 4.15, so they may not be defined when compiling on older kernels.
+ */
+#ifdef CONFIG_LINUX
+
+#include
+
+#ifndef MAP_SYNC
+#define MAP_SYNC 0x80000
+#endif
+
+#ifndef MAP_SHARED_VALIDATE
+#define MAP_SHARED_VALIDATE 0x03
+#endif
+
+#else /* !CONFIG_LINUX */
+#define MAP_SYNC 0x0
+#define MAP_SHARED_VALIDATE 0x0
+#endif /* CONFIG_LINUX */
+
 #ifdef CONFIG_POSIX
 struct qemu_signalfd_siginfo {
     uint32_t ssi_signo; /* Signal number */
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 97bbeed..2c86ad2 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -101,6 +101,7 @@ void *qemu_ram_mmap(int fd,
 #else
     void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 #endif
+    int mmap_xflags = 0;
     size_t offset;
     void *ptr1;
 
@@ -111,13 +112,38 @@ void *qemu_ram_mmap(int fd,
     assert(is_power_of_2(align));
     /* Always align to host page size */
     assert(align >= getpagesize());
+    if (shared && is_pmem) {
+        mmap_xflags = MAP_SYNC | MAP_SHARED_VALIDATE;
+    }
 
     offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
+retry_mmap:
     ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
                 MAP_FIXED |
                 (fd == -1 ? MAP_ANONYMOUS : 0) |
-                (shared ? MAP_SHARED : MAP_PRIVATE),
+                (shared ? MAP_SHARED : MAP_PRIVATE) | mmap_xflags,
                 fd, 0);
+
+    /* if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
+     * we try with MAP_SHARED_VALIDATE without MAP_SYNC
+     */
+    if (ptr1 == MAP_FAILED &&
+        mmap_xflags == (MAP_SYNC | MAP_SHARED_VALIDATE)) {
+        if (errno == ENOTSUP) {
+            perror("failed to validate with mapping flags");
+        }
+        mmap_xflags = MAP_SHARED_VALIDATE;
+        goto retry_mmap;
+    }
+    /* MAP_SHARED_VALIDATE flag is available since Linux 4.15
+     * Test only with MAP_SHARED_VALIDATE flag for compatibility.
+     * Then ignore the MAP_SHARED_VALIDATE flag and retry again
+     */
+    if (mmap_xflags == MAP_SHARED_VALIDATE &&
+        ptr1 == MAP_FAILED) {
+        mmap_xflags &= ~MAP_SHARED_VALIDATE;
+        goto retry_mmap;
+    }
     if (ptr1 == MAP_FAILED) {
         munmap(ptr, total);
         return MAP_FAILED;
-- 
2.7.4

From nobody Fri Nov 7 18:31:47 2025
From: "Zhang, Yi"
To: xiaoguangrong.eric@gmail.com, stefanha@redhat.com, pbonzini@redhat.com, pagupta@redhat.com, yu.c.zhang@linux.intel.com, richardw.yang@linux.intel.com, mst@redhat.com, ehabkost@redhat.com
Date: Tue, 29 Jan 2019 22:49:18 +0800
Message-Id: <4f91d5a46fc26d5672c91d80b78e986d4267c612.1548771590.git.yi.z.zhang@linux.intel.com>
Subject: [Qemu-devel] [PATCH v11 3/3] docs: Added MAP_SYNC documentation
Cc: imammedo@redhat.com, dan.j.williams@intel.com, qemu-devel@nongnu.org, Zhang Yi

From: Zhang Yi

Signed-off-by: Zhang Yi
---
 docs/nvdimm.txt | 29 ++++++++++++++++++++++++++++-
 qemu-options.hx |  4 ++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index 5f158a6..9da96aa 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -142,11 +142,38 @@ backend of vNVDIMM:
 Guest Data Persistence
 ----------------------
 
+vNVDIMM is designed and implemented to guarantee guest data
+persistence on the backends in case of a host crash or a power failure.
+However, there are still some requirements and limitations,
+as explained below.
+
 Though QEMU supports multiple types of vNVDIMM backends on Linux,
-currently the only one that can guarantee the guest write persistence
+if MAP_SYNC is not supported by the host kernel and the backends,
+the only backend that can guarantee the guest write persistence
 is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
 which all guest access do not involve any host-side kernel cache.
 
+The mmap(2) flag MAP_SYNC was added in Linux kernel 4.15. On such
+systems, QEMU can mmap(2) the DAX backend files with MAP_SYNC, which
+ensures filesystem metadata consistency in case of a host crash or a power
+failure. Enabling MAP_SYNC in QEMU requires the following conditions:
+
+ - 'pmem' option of memory-backend-file is 'on':
+   The backend is a file supporting DAX, e.g., a file on an ext4 or
+   xfs file system mounted with '-o dax'. If pmem=on but the backend is
+   not a file supporting DAX, mapping with this flag results in an EOPNOTSUPP
+   warning, and MAP_SYNC will be ignored.
+
+ - 'share' option of memory-backend-file is 'on':
+   The MAP_SYNC flag is available only with the MAP_SHARED_VALIDATE mapping type.
+
+ - MAP_SYNC is supported by the host Linux kernel (available since Linux 4.15).
+
+Otherwise, the MAP_SYNC flag is ignored.
+
+For more details, please refer to the mmap(2) man page:
+http://man7.org/linux/man-pages/man2/mmap.2.html.
+
 When using other types of backends, it's suggested to set 'unarmed'
 option of '-device nvdimm' to 'on', which sets the unarmed flag of the
 guest NVDIMM region mapping structure. This unarmed flag indicates
diff --git a/qemu-options.hx b/qemu-options.hx
index 08f8516..0cd41f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4002,6 +4002,10 @@ using the SNIA NVM programming model (e.g. Intel NVDIMM).
 If @option{pmem} is set to 'on', QEMU will take necessary operations to
 guarantee the persistence of its own writes to @option{mem-path}
 (e.g. in vNVDIMM label emulation and live migration).
+Also, QEMU will map the backend file with the MAP_SYNC flag, which ensures
+that the file metadata is in sync with @option{mem-path} in case of a host
+crash or a power failure. MAP_SYNC requires support from both the host kernel
+(since Linux kernel 4.15) and @option{mem-path} (only files supporting DAX).
 
 @item -object memory-backend-ram,id=@var{id},merge=@var{on|off},dump=@var{on|off},share=@var{on|off},prealloc=@var{on|off},size=@var{size},host-nodes=@var{host-nodes},policy=@var{default|preferred|bind|interleave}
 
-- 
2.7.4
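
For readers who want to try the MAP_SYNC fallback described in patch 2/3 outside of QEMU, the following is a minimal standalone sketch for Linux. It is not the code from the series: the helper names map_shared_prefer_sync() and demo_map_scratch_file() are hypothetical, and the three-step retry in qemu_ram_mmap() is collapsed here into two steps (try MAP_SHARED_VALIDATE | MAP_SYNC, then fall back to plain MAP_SHARED). The fallback flag values are the same ones the series adds to include/qemu/osdep.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback values for kernel headers older than Linux 4.15. */
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif

/*
 * Hypothetical helper (not part of QEMU): map 'size' bytes of 'fd' shared,
 * preferring MAP_SYNC. *synced is set to 1 only if the MAP_SYNC mapping
 * succeeded, otherwise to 0.
 */
static void *map_shared_prefer_sync(int fd, size_t size, int *synced)
{
    void *p;

    assert(synced != NULL);

    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED) {
        *synced = 1;
        return p;
    }

    /* EOPNOTSUPP: file is not DAX. EINVAL: kernel predates the flags. */
    if (errno == EOPNOTSUPP) {
        fprintf(stderr, "MAP_SYNC not supported by this file, ignoring it\n");
    }
    *synced = 0;
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Usage: map a scratch file, write through the mapping, unmap again. */
static int demo_map_scratch_file(void)
{
    const size_t size = 4096;
    int synced = 0;
    FILE *f = tmpfile();
    void *p;

    if (f == NULL) {
        return -1;
    }
    if (ftruncate(fileno(f), (off_t)size) != 0) {
        fclose(f);
        return -1;
    }
    p = map_shared_prefer_sync(fileno(f), size, &synced);
    if (p == MAP_FAILED) {
        fclose(f);
        return -1;
    }
    memcpy(p, "guest data", 11);
    printf("mapped with%s MAP_SYNC\n", synced ? "" : "out");
    munmap(p, size);
    fclose(f);
    return 0;
}
```

On a file system without DAX the first mmap() is expected to fail and the helper falls back silently apart from the warning, which mirrors case 1.b in the commit message of patch 2/3. A caller that needs the persistence guarantee should check 'synced' rather than assume it.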