util/oslib-posix.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.
Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
NB, I have not tested performance of this, so no idea if this
makes it better/worse/no-change. Would appreciate if Jitendra
could repeat tests to see if it impacts scalability at all.
util/oslib-posix.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 35012b9..2a5bb93 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
/* MAP_POPULATE silently ignores failures */
for (i = 0; i < numpages; i++) {
- memset(area + (hpagesize * i), 0, 1);
+ /*
+ * Read & write back the same value, so we don't
+ * corrupt existinng user/app data that might be
+ * stored.
+ *
+ * 'volatile' to stop compiler optimizing this away
+ * to a no-op
+ *
+ * TODO: get a better solution from kernel so we
+ * don't need to write at all so we don't cause
+ * wear on the storage backing the region...
+ */
+ volatile char val = *(area + (hpagesize * i));
+ *(area + (hpagesize * i)) = val;
}
}
--
2.9.3
Hi, This series seems to have some coding style problems. See output below for more information: Message-id: 20170224172714.26026-1-berrange@redhat.com Type: series Subject: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 # Useful git options git config --local diff.renamelimit 0 git config --local diff.renames True commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu - [tag update] patchew/1487262963-11519-1-git-send-email-peter.maydell@linaro.org -> patchew/1487262963-11519-1-git-send-email-peter.maydell@linaro.org - [tag update] patchew/1487920971-16519-1-git-send-email-zhangchen.fnst@cn.fujitsu.com -> patchew/1487920971-16519-1-git-send-email-zhangchen.fnst@cn.fujitsu.com * [new tag] patchew/20170224172714.26026-1-berrange@redhat.com -> patchew/20170224172714.26026-1-berrange@redhat.com Switched to a new branch 'test' 131074e os: don't corrupt pre-existing memory-backend data with prealloc === OUTPUT BEGIN === Checking PATCH 1/1: os: don't corrupt pre-existing memory-backend data with prealloc... ERROR: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt #42: FILE: util/oslib-posix.c:370: + volatile char val = *(area + (hpagesize * i)); total: 1 errors, 0 warnings, 21 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-devel@freelists.org
On Fri, Feb 24, 2017 at 09:33:05AM -0800, no-reply@patchew.org wrote: > === OUTPUT BEGIN === > Checking PATCH 1/1: os: don't corrupt pre-existing memory-backend data with prealloc... > ERROR: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt ERROR: checkpatch.pl is usually wrong ;-P Heh, it is refering to a doc in the kernel source tree, which does not even exist at that path location anymore :-) > #42: FILE: util/oslib-posix.c:370: > + volatile char val = *(area + (hpagesize * i)); > > total: 1 errors, 0 warnings, 21 lines checked > > Your patch has style problems, please review. If any of these errors > are false positives report them to the maintainer, see > CHECKPATCH in MAINTAINERS. > > === OUTPUT END === Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
On 02/24/2017 11:27 AM, Daniel P. Berrange wrote: > When using a memory-backend object with prealloc turned on, QEMU > will memset() the first byte in every memory page to zero. While > this might have been acceptable for memory backends associated > with RAM, this corrupts application data for NVDIMMs. > > Instead of setting every page to zero, read the current byte > value and then just write that same value back, so we are not > corrupting the original data. Directly write the value instead > of memset()ing it, since there's no benefit to memset for a > single byte write. > > Signed-off-by: Daniel P. Berrange <berrange@redhat.com> > --- > > /* MAP_POPULATE silently ignores failures */ > for (i = 0; i < numpages; i++) { > - memset(area + (hpagesize * i), 0, 1); > + /* > + * Read & write back the same value, so we don't > + * corrupt existinng user/app data that might be s/existinng/existing/ > + * stored. > + * > + * 'volatile' to stop compiler optimizing this away > + * to a no-op > + * > + * TODO: get a better solution from kernel so we > + * don't need to write at all so we don't cause > + * wear on the storage backing the region... > + */ > + volatile char val = *(area + (hpagesize * i)); > + *(area + (hpagesize * i)) = val; > } > } > > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
On Fri, Feb 24, 2017 at 05:27:14PM +0000, Daniel P. Berrange wrote: > When using a memory-backend object with prealloc turned on, QEMU > will memset() the first byte in every memory page to zero. While > this might have been acceptable for memory backends associated > with RAM, this corrupts application data for NVDIMMs. > > Instead of setting every page to zero, read the current byte > value and then just write that same value back, so we are not > corrupting the original data. Directly write the value instead > of memset()ing it, since there's no benefit to memset for a > single byte write. > > Signed-off-by: Daniel P. Berrange <berrange@redhat.com> > --- > > NB, I have not tested performance of this, so no idea if this > makes it better/worse/no-change. Would appreciate if Jitendra > could repeat tests to see if it impacts scalability at all. > > util/oslib-posix.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Hello, On Fri, Feb 24, 2017 at 05:27:14PM +0000, Daniel P. Berrange wrote: > diff --git a/util/oslib-posix.c b/util/oslib-posix.c > index 35012b9..2a5bb93 100644 > --- a/util/oslib-posix.c > +++ b/util/oslib-posix.c > @@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp) > > /* MAP_POPULATE silently ignores failures */ > for (i = 0; i < numpages; i++) { > - memset(area + (hpagesize * i), 0, 1); > + /* > + * Read & write back the same value, so we don't > + * corrupt existinng user/app data that might be > + * stored. > + * > + * 'volatile' to stop compiler optimizing this away > + * to a no-op > + * > + * TODO: get a better solution from kernel so we > + * don't need to write at all so we don't cause > + * wear on the storage backing the region... > + */ It would be better that if MAP_POPULATE failed like mlock does, at least for those get_user_pages errors like -ENOMEM that are already a retval for mmap it sounds weird that it's not being forwarded. It'd be enough to call do_munmap to rollback all mmap work before returning -ENOMEM from mmap instead of succees. -EFAULT or other get_user_pages errors are more troublesome, as mmap wouldn't otherwise return -EFAULT, but those would generally be qemu bugs or hardware errors and MAP_POPULATE I doubt is meant to be a debug/robustness aid. All we care about here is memory resource management and not to risk -ENOMEM at runtime and immediate peak performance. So perhaps a kernel patch that forwards -ENOMEM from __mm_populate to mmap retval would be enough to make it usable? > + volatile char val = *(area + (hpagesize * i)); > + *(area + (hpagesize * i)) = val; It's not "var" that should be volatile, nor the pointer, but only the memory area you write to, because the write to the memory are is the only thing we care about. If "val" is volatile gcc could read the memory area and cache the value of the memory area, write it into the volatile var, then read the volatile var again, compare it with the cached value and skip the write to the memory if it's equal. In practice it's likely not doing that, and making var volatile has the same effect, so it's probably only a theoretical issue of course. I would suggest this to correct it (untested): char *tmparea = area + (hpagesize * i); *(volatile char *)tmparea = *tmparea; Thanks, Andrea
© 2016 - 2024 Red Hat, Inc.