[Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc

Daniel P. Berrange posted 1 patch 7 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170224172714.26026-1-berrange@redhat.com
Test checkpatch failed
Test docker passed
Test s390x passed
There is a newer version of this series
util/oslib-posix.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
[Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
Posted by Daniel P. Berrange 7 years, 2 months ago
When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.

Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---

NB, I have not tested performance of this, so no idea if this
makes it better/worse/no-change. Would appreciate if Jitendra
could repeat tests to see if it impacts scalability at all.

 util/oslib-posix.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 35012b9..2a5bb93 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
 
         /* MAP_POPULATE silently ignores failures */
         for (i = 0; i < numpages; i++) {
-            memset(area + (hpagesize * i), 0, 1);
+            /*
+             * Read & write back the same value, so we don't
+             * corrupt existinng user/app data that might be
+             * stored.
+             *
+             * 'volatile' to stop compiler optimizing this away
+             * to a no-op
+             *
+             * TODO: get a better solution from kernel so we
+             * don't need to write at all so we don't cause
+             * wear on the storage backing the region...
+             */
+            volatile char val = *(area + (hpagesize * i));
+            *(area + (hpagesize * i)) = val;
         }
     }
 
-- 
2.9.3


Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
Posted by no-reply@patchew.org 7 years, 2 months ago
Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20170224172714.26026-1-berrange@redhat.com
Type: series
Subject: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/1487262963-11519-1-git-send-email-peter.maydell@linaro.org -> patchew/1487262963-11519-1-git-send-email-peter.maydell@linaro.org
 - [tag update]      patchew/1487920971-16519-1-git-send-email-zhangchen.fnst@cn.fujitsu.com -> patchew/1487920971-16519-1-git-send-email-zhangchen.fnst@cn.fujitsu.com
 * [new tag]         patchew/20170224172714.26026-1-berrange@redhat.com -> patchew/20170224172714.26026-1-berrange@redhat.com
Switched to a new branch 'test'
131074e os: don't corrupt pre-existing memory-backend data with prealloc

=== OUTPUT BEGIN ===
Checking PATCH 1/1: os: don't corrupt pre-existing memory-backend data with prealloc...
ERROR: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt
#42: FILE: util/oslib-posix.c:370:
+            volatile char val = *(area + (hpagesize * i));

total: 1 errors, 0 warnings, 21 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
Posted by Daniel P. Berrange 7 years, 1 month ago
On Fri, Feb 24, 2017 at 09:33:05AM -0800, no-reply@patchew.org wrote:
> === OUTPUT BEGIN ===
> Checking PATCH 1/1: os: don't corrupt pre-existing memory-backend data with prealloc...
> ERROR: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt

ERROR: checkpatch.pl is usually wrong ;-P

Heh, it is refering to a doc in the kernel source tree, which does not even
exist at that path location anymore :-)


> #42: FILE: util/oslib-posix.c:370:
> +            volatile char val = *(area + (hpagesize * i));
> 
> total: 1 errors, 0 warnings, 21 lines checked
> 
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> === OUTPUT END ===

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
Posted by Eric Blake 7 years, 2 months ago
On 02/24/2017 11:27 AM, Daniel P. Berrange wrote:
> When using a memory-backend object with prealloc turned on, QEMU
> will memset() the first byte in every memory page to zero. While
> this might have been acceptable for memory backends associated
> with RAM, this corrupts application data for NVDIMMs.
> 
> Instead of setting every page to zero, read the current byte
> value and then just write that same value back, so we are not
> corrupting the original data. Directly write the value instead
> of memset()ing it, since there's no benefit to memset for a
> single byte write.
> 
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
> 

>          /* MAP_POPULATE silently ignores failures */
>          for (i = 0; i < numpages; i++) {
> -            memset(area + (hpagesize * i), 0, 1);
> +            /*
> +             * Read & write back the same value, so we don't
> +             * corrupt existinng user/app data that might be

s/existinng/existing/

> +             * stored.
> +             *
> +             * 'volatile' to stop compiler optimizing this away
> +             * to a no-op
> +             *
> +             * TODO: get a better solution from kernel so we
> +             * don't need to write at all so we don't cause
> +             * wear on the storage backing the region...
> +             */
> +            volatile char val = *(area + (hpagesize * i));
> +            *(area + (hpagesize * i)) = val;
>          }
>      }
>  
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
Posted by Stefan Hajnoczi 7 years, 1 month ago
On Fri, Feb 24, 2017 at 05:27:14PM +0000, Daniel P. Berrange wrote:
> When using a memory-backend object with prealloc turned on, QEMU
> will memset() the first byte in every memory page to zero. While
> this might have been acceptable for memory backends associated
> with RAM, this corrupts application data for NVDIMMs.
> 
> Instead of setting every page to zero, read the current byte
> value and then just write that same value back, so we are not
> corrupting the original data. Directly write the value instead
> of memset()ing it, since there's no benefit to memset for a
> single byte write.
> 
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
> 
> NB, I have not tested performance of this, so no idea if this
> makes it better/worse/no-change. Would appreciate if Jitendra
> could repeat tests to see if it impacts scalability at all.
> 
>  util/oslib-posix.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Re: [Qemu-devel] [PATCH] os: don't corrupt pre-existing memory-backend data with prealloc
Posted by Andrea Arcangeli 7 years, 1 month ago
Hello,

On Fri, Feb 24, 2017 at 05:27:14PM +0000, Daniel P. Berrange wrote:
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 35012b9..2a5bb93 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -355,7 +355,20 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>  
>          /* MAP_POPULATE silently ignores failures */
>          for (i = 0; i < numpages; i++) {
> -            memset(area + (hpagesize * i), 0, 1);
> +            /*
> +             * Read & write back the same value, so we don't
> +             * corrupt existinng user/app data that might be
> +             * stored.
> +             *
> +             * 'volatile' to stop compiler optimizing this away
> +             * to a no-op
> +             *
> +             * TODO: get a better solution from kernel so we
> +             * don't need to write at all so we don't cause
> +             * wear on the storage backing the region...
> +             */

It would be better that if MAP_POPULATE failed like mlock does, at
least for those get_user_pages errors like -ENOMEM that are already a
retval for mmap it sounds weird that it's not being forwarded. It'd be
enough to call do_munmap to rollback all mmap work before returning
-ENOMEM from mmap instead of succees.

-EFAULT or other get_user_pages errors are more troublesome, as mmap
wouldn't otherwise return -EFAULT, but those would generally be qemu
bugs or hardware errors and MAP_POPULATE I doubt is meant to be a
debug/robustness aid.  All we care about here is memory resource
management and not to risk -ENOMEM at runtime and immediate peak
performance. So perhaps a kernel patch that forwards -ENOMEM from
__mm_populate to mmap retval would be enough to make it usable?

> +            volatile char val = *(area + (hpagesize * i));
> +            *(area + (hpagesize * i)) = val;

It's not "var" that should be volatile, nor the pointer, but only the
memory area you write to, because the write to the memory are is the
only thing we care about.

If "val" is volatile gcc could read the memory area and cache the
value of the memory area, write it into the volatile var, then read
the volatile var again, compare it with the cached value and skip the
write to the memory if it's equal. In practice it's likely not doing
that, and making var volatile has the same effect, so it's probably
only a theoretical issue of course.

I would suggest this to correct it (untested):

	char *tmparea = area + (hpagesize * i);
	*(volatile char *)tmparea = *tmparea;

Thanks,
Andrea