[PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

David Hildenbrand posted 3 patches 2 years, 6 months ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>
[PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
From: Thiner Logoer <logoerthiner1@163.com>

Users may specify
* "-mem-path" or
* "-object memory-backend-file,share=off,readonly=off"
and expect such COW (MAP_PRIVATE) mappings to work, even if the user
does not have write permissions to open the file.

Currently, we always fail in that case, requiring file write permissions.
Let's detect when that failure happens and fall back to opening the file
readonly.

Warn the user, since there are other use cases where we want the file to
be opened writable: ftruncate() and fallocate() will fail if the file
was not opened with write permissions.

Signed-off-by: Thiner Logoer <logoerthiner1@163.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/physmem.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..d1ae694b20 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
 static int file_ram_open(const char *path,
                          const char *region_name,
                          bool readonly,
-                         bool *created,
-                         Error **errp)
+                         bool *created)
 {
     char *filename;
     char *sanitized_name;
@@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
             g_free(filename);
         }
         if (errno != EEXIST && errno != EINTR) {
-            error_setg_errno(errp, errno,
-                             "can't open backing store %s for guest RAM",
-                             path);
-            return -1;
+            return -errno;
         }
         /*
          * Try again on EINTR and EEXIST.  The latter happens when
@@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
     bool created;
     RAMBlock *block;
 
-    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
-                       errp);
+    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
+    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
+        /*
+         * We can have a writable MAP_PRIVATE mapping of a readonly file.
+         * However, some operations like ftruncate() or fallocate() might fail
+         * later, let's warn the user.
+         */
+        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
+        if (fd >= 0) {
+            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
+                        " readonly because the file is not writable", mem_path);
+        }
+    }
     if (fd < 0) {
+        error_setg_errno(errp, -fd,
+                         "can't open backing store %s for guest RAM",
+                         mem_path);
         return NULL;
     }
 
-- 
2.41.0
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:
> From: Thiner Logoer <logoerthiner1@163.com>
> 
> Users may specify
> * "-mem-path" or
> * "-object memory-backend-file,share=off,readonly=off"
> and expect such COW (MAP_PRIVATE) mappings to work, even if the user
> does not have write permissions to open the file.
> 
> For now, we would always fail in that case, always requiring file write
> permissions. Let's detect when that failure happens and fallback to opening
> the file readonly.
> 
> Warn the user, since there are other use cases where we want the file to
> be mapped writable: ftruncate() and fallocate() will fail if the file
> was not opened with write permissions.
> 
> Signed-off-by: Thiner Logoer <logoerthiner1@163.com>
> Co-developed-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  softmmu/physmem.c | 26 ++++++++++++++++++--------
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3df73542e1..d1ae694b20 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
>  static int file_ram_open(const char *path,
>                           const char *region_name,
>                           bool readonly,
> -                         bool *created,
> -                         Error **errp)
> +                         bool *created)
>  {
>      char *filename;
>      char *sanitized_name;
> @@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
>              g_free(filename);
>          }
>          if (errno != EEXIST && errno != EINTR) {
> -            error_setg_errno(errp, errno,
> -                             "can't open backing store %s for guest RAM",
> -                             path);
> -            return -1;
> +            return -errno;
>          }
>          /*
>           * Try again on EINTR and EEXIST.  The latter happens when
> @@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>      bool created;
>      RAMBlock *block;
>  
> -    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
> -                       errp);
> +    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
> +    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
> +        /*
> +         * We can have a writable MAP_PRIVATE mapping of a readonly file.
> +         * However, some operations like ftruncate() or fallocate() might fail
> +         * later, let's warn the user.
> +         */
> +        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
> +        if (fd >= 0) {
> +            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
> +                        " readonly because the file is not writable", mem_path);

I can understand the use case, but this will be slightly unwanted,
especially since the user doesn't yet have a way to predict when it will
happen.

Meanwhile, this changes the behavior; is it a concern that someone may want
to rely on the current behavior of failing?

Thinking about the current use case from a higher level, the ideal solution
seems to me to be: if the RAM file can be put on a file system that supports
CoW itself (like btrfs), we can snapshot that RAM file and make the snapshot
RW for the QEMU instance. Then QEMU will be able to open the file here. We'd
keep the interface working as before, and I assume it would also work with
fallocate or truncation.

Would that be better instead of changing QEMU?

Thanks,

> +        }
> +    }
>      if (fd < 0) {
> +        error_setg_errno(errp, -fd,
> +                         "can't open backing store %s for guest RAM",
> +                         mem_path);
>          return NULL;
>      }
>  
> -- 
> 2.41.0
> 

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
Hi Peter!

>> -    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
>> -                       errp);
>> +    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
>> +    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
>> +        /*
>> +         * We can have a writable MAP_PRIVATE mapping of a readonly file.
>> +         * However, some operations like ftruncate() or fallocate() might fail
>> +         * later, let's warn the user.
>> +         */
>> +        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
>> +        if (fd >= 0) {
>> +            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
>> +                        " readonly because the file is not writable", mem_path);
> 
> I can understand the use case, but this will be slightly unwanted,
> especially the user doesn't yet have a way to predict when will it happen.

Users can set the file permissions accordingly, I guess. If they want the 
file to never ever be modified via QEMU, set it R/O.

> 
> Meanwhile this changes the behavior, is it a concern that someone may want
> to rely on current behavior of failing?

The scenario would be that someone passes a readonly file to "-mem-path" 
or "-object memory-backend-file,share=off,readonly=off", with the 
expectation that it would currently fail.

If it now doesn't fail (and we warn instead), what would happen is:
* In file_ram_alloc() we won't even try ftruncate(), because the file
   already had a size > 0. So ftruncate() is not a concern as I now
   realize.
* fallocate might fail later. AFAICS, that only applies to
   ram_block_discard_range().
  -> virtio-mem performs an initial ram_block_discard_range() check and
     fails gracefully early.
  -> virtio-balloon ignores any errors
  -> ram_discard_range() in migration code fails early for postcopy in
     init_range() and loadvm_postcopy_ram_handle_discard(), handling it
     gracefully.

So mostly nothing "bad" would happen; it might just be undesirable, and 
we properly warn.
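
For completeness, here is a tiny standalone sketch (not QEMU code; the
path is made up) of why the fallocate()-based discard fails once the
backing fd is opened readonly: fallocate(2) wants a descriptor opened for
writing and otherwise fails with EBADF.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Any pre-existing, non-writable backing file will do. */
    int fd = open("/tmp/guest-ram.img", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Punch a hole without changing the file size, as the discard path does. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 4096)) {
        /* Expected on a read-only fd: EBADF (not open for writing). */
        printf("fallocate: %s\n", strerror(errno));
    }
    close(fd);
    return 0;
}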

Most importantly, we won't be corrupting/touching the original file in 
any case, because it is R/O.

If we really want to be careful, we could glue that behavior to compat 
machines. I'm not sure yet if we really have to go down that path.

Any other alternatives? I'd like to avoid new flags where not really 
required.

> 
> To think from a higher level of current use case, the ideal solution seems
> to me that if the ram file can be put on a file system that supports CoW
> itself (like btrfs), we can snapshot that ram file and make it RW for the
> qemu instance. Then here it'll be able to open the file.  We'll be able to
> keep the interface working as before, meanwhile it'll work with fallocate
> or truncations too I assume.
> 
> Would that be better instead of changing QEMU?

As I recently learned, using file-backed VMs (on real ssd/disks, not 
shmem/hugetlb) is usually undesired, because the dirtied pages will 
constantly get rewritten to disk by background writeback threads, 
eventually resulting in bad performance and SSD wear.

So while using a COW filesystem sounds cleaner in theory, it's not 
applicable in practice -- unless one disables any background writeback, 
which has different side effects because it cannot be configured on a 
per-file basis.

So for VM templating, it makes sense to capture the guest RAM and store 
it in a file, to then use a COW (MAP_PRIVATE) mapping. Using a read-only 
file makes perfect sense in that scenario IMHO.

[I'm curious at what point a filesystem will actually break COW. If it's 
wired up to the writenotify infrastructure, it would happen when 
actually writing to a page, not at mmap time. I know that filesystems 
use writenotify for lazy allocation of disk blocks on file holes; maybe 
they also do that for lazy allocation of disk blocks on COW.]

Thanks!

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Wed, Aug 09, 2023 at 11:20:11AM +0200, David Hildenbrand wrote:
> Hi Peter!

Hi, David,

> 
> > > -    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
> > > -                       errp);
> > > +    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
> > > +    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
> > > +        /*
> > > +         * We can have a writable MAP_PRIVATE mapping of a readonly file.
> > > +         * However, some operations like ftruncate() or fallocate() might fail
> > > +         * later, let's warn the user.
> > > +         */
> > > +        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
> > > +        if (fd >= 0) {
> > > +            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
> > > +                        " readonly because the file is not writable", mem_path);
> > 
> > I can understand the use case, but this will be slightly unwanted,
> > especially the user doesn't yet have a way to predict when will it happen.
> 
> Users can set the file permissions accordingly I guess. If they don't want
> the file to never ever be modified via QEMU, set it R/O.
> 
> > 
> > Meanwhile this changes the behavior, is it a concern that someone may want
> > to rely on current behavior of failing?
> 
> The scenario would be that someone passes a readonly file to "-mem-path" or
> "-object memory-backend-file,share=off,readonly=off", with the expectation
> that it would currently fail.
> 
> If it now doesn't fail (and we warn instead), what would happen is:
> * In file_ram_alloc() we won't even try ftruncate(), because the file
>   already had a size > 0. So ftruncate() is not a concern as I now
>   realize.
> * fallocate might fail later. AFAIKS, that only applies to
>   ram_block_discard_range().
>  -> virtio-mem performs an initial ram_block_discard_range() check and
>     fails gracefully early.
>  -> virtio-ballooon ignores any errors
>  -> ram_discard_range() in migration code fails early for postcopy in
>     init_range() and loadvm_postcopy_ram_handle_discard(), handling it
>     gracefully.
> 
> So mostly nothing "bad" would happen, it might just be undesirable, and we
> properly warn.

Indeed, all of them can fail gracefully, while balloon one is harmless.
Definitely good news.

> 
> Most importantly, we won't be corrupting/touching the original file in any
> case, because it is R/O.
> 
> If we really want to be careful, we could clue that behavior to compat
> machines. I'm not really sure yet if we really have to go down that path.
> 
> Any other alternatives? I'd like to avoid new flags where not really
> required.

I was just thinking of a new flag. :) So have you already discussed that
possibility and decided it's not a good idea?

The root issue to me here is we actually have two resources (memory map of
the process, and the file) but we only have one way to describe the
permissions upon the two objects.  I'd think it makes a lot more sense if a
new flag is added, when there's a need to differentiate the two.

Consider if you see a bunch of qemu instances with:

  -mem-path $RAM_FILE

On the same host, which looks as weird as it could be to me. At least
'-mem-path' still looks like a way to exclusively own a RAM file for an
instance. I worry the new fallback can confuse people too, while that's
so far not the major use case.

Nobody may really rely on any existing behavior of the failure, but
changing existing behavior is just never wanted.  The guideline here
to me is: do we want existing "-mem-path XXX" users to start using the
fallback in general?  If it's "no", then maybe it implies a new flag is
better?

> 
> > 
> > To think from a higher level of current use case, the ideal solution seems
> > to me that if the ram file can be put on a file system that supports CoW
> > itself (like btrfs), we can snapshot that ram file and make it RW for the
> > qemu instance. Then here it'll be able to open the file.  We'll be able to
> > keep the interface working as before, meanwhile it'll work with fallocate
> > or truncations too I assume.
> > 
> > Would that be better instead of changing QEMU?
> 
> As I recently learned, using file-backed VMs (on real ssd/disks, not
> shmem/hugetlb) is usually undesired, because the dirtied pages will
> constantly get rewritten to disk by background writeback threads, eventually
> resulting in bad performance and SSD wear.
> 
> So while using a COW filesystem sounds cleaner in theory, it's not
> applicable in practice -- unless one disables any background writeback,
> which has different side effects because it cannot be configured on a
> per-file basis.
> 
> So for VM templating, it makes sense to capture the guest RAM and store it
> in a file, to then use a COW (MAP_PRIVATE) mapping. Using a read-only file
> makes perfect sense in that scenario IMHO.

Valid point.

> 
> [I'm curious at what point a filesystem will actually break COW. if it's
> wired up to the writenotify infrastructure, it would happen when actually
> writing to a page, not at mmap time. I know that filesystems use writenotify
> for lazy allocation of disk blocks on file holes, maybe they also do that
> for lazy allocation of disk blocks on COW]

I don't know either, but it definitely looks more promising and reasonable
if the CoW only happens until being written, rather than being mapped RW.

Thanks,

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
>> Most importantly, we won't be corrupting/touching the original file in any
>> case, because it is R/O.
>>
>> If we really want to be careful, we could clue that behavior to compat
>> machines. I'm not really sure yet if we really have to go down that path.
>>
>> Any other alternatives? I'd like to avoid new flags where not really
>> required.
> 
> I was just thinking of a new flag. :) So have you already discussed that
> possibility and decided that not a good idea?

Not really. I was briefly playing with that idea but already struggled 
to come up with a reasonable name :)

Fewer toggles, and just have it work nicely, if possible.

> 
> The root issue to me here is we actually have two resources (memory map of
> the process, and the file) but we only have one way to describe the
> permissions upon the two objects.  I'd think it makes a lot more sense if a
> new flag is added, when there's a need to differentiate the two.
> 
> Consider if you see a bunch of qemu instances with:
> 
>    -mem-path $RAM_FILE
> 
> On the same host, which can be as weird as it could be to me.. At least
> '-mem-path' looks still like a way to exclusively own a ram file for an
> instance. I hesitate the new fallback can confuse people too, while that's
> so far not the major use case.

Once I learned that this is not a MAP_SHARED mapping, I was extremely 
confused. For example, vhost-user will absolutely not work with 
"-mem-path", even though the documentation explicitly spells out that 
combination (I still have to send a patch to fix that).

I guess "-mem-path" was primarily only used to consume hugetlb. Even for 
tmpfs it will already result in a double memory consumption, just like 
when using -memory-backend-memfd,share=no.

I guess deprecating it was the right decision.

But memory-backend-file also defaults to "share=no" ... so the same 
default behavior unfortunately.

> 
> Nobody may really rely on any existing behavior of the failure, but
> changing existing behavior is just always not wanted.  The guideline here
> to me is: whether we want existing "-mem-path XXX" users to start using the
> fallback in general?  If it's "no", then maybe it implies a new flag is
> better?

I think we have the following options (there might be more)

1) This patch.

2) New flag for memory-backend-file. We already have "readonly" and 
"share=". I'm having a hard time coming up with a good name that really 
describes the subtle difference.

3) Glue behavior to the QEMU machine


For 3), one option would be to always open a COW file readonly (as 
Thiner originally proposed). We could leave "-mem-path" behavior alone 
and only change memory-backend-file semantics. If the COW file does 
*not* exist yet, we would refuse to create the file, like patches 2+3 do. 
Therefore, no ftruncate() errors, and fallocate() would simply always fail.
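
Roughly, option 3) would boil down to something like this (illustrative
sketch only, not the actual patch; the helper name is made up):

#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: for a COW (MAP_PRIVATE) mapping, never create the
 * backing file and only ever open it readonly; the file must already exist.
 */
static int open_cow_backing_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);

    return fd < 0 ? -errno : fd;
}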


What are your thoughts?

[...]

>>
>> [I'm curious at what point a filesystem will actually break COW. if it's
>> wired up to the writenotify infrastructure, it would happen when actually
>> writing to a page, not at mmap time. I know that filesystems use writenotify
>> for lazy allocation of disk blocks on file holes, maybe they also do that
>> for lazy allocation of disk blocks on COW]
> 
> I don't know either, but it definitely looks more promising and reasonable
> if the CoW only happens until being written, rather than being mapped RW.

That would be my best guess. But then, we have multiple pagecache pages 
pointing at the same disk block until COW happens ... maybe that's how it 
already works. Once I have some spare time, I might play with that.

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Daniel P. Berrangé 2 years, 5 months ago
On Thu, Aug 10, 2023 at 04:19:45PM +0200, David Hildenbrand wrote:
> > > Most importantly, we won't be corrupting/touching the original file in any
> > > case, because it is R/O.
> > > 
> > > If we really want to be careful, we could clue that behavior to compat
> > > machines. I'm not really sure yet if we really have to go down that path.
> > > 
> > > Any other alternatives? I'd like to avoid new flags where not really
> > > required.
> > 
> > I was just thinking of a new flag. :) So have you already discussed that
> > possibility and decided that not a good idea?
> 
> Not really. I was briefly playing with that idea but already struggled to
> come up with a reasonable name :)
> 
> Less toggles and just have it working nice, if possible.

IMHO having a new flag is desirable, because it directly
expresses the desired deployment scenario, such that we get
good error reporting upon deployment mistakes, while at the
same time allowing the readonly usage.

> > The root issue to me here is we actually have two resources (memory map of
> > the process, and the file) but we only have one way to describe the
> > permissions upon the two objects.  I'd think it makes a lot more sense if a
> > new flag is added, when there's a need to differentiate the two.
> > 
> > Consider if you see a bunch of qemu instances with:
> > 
> >    -mem-path $RAM_FILE
> > 
> > On the same host, which can be as weird as it could be to me.. At least
> > '-mem-path' looks still like a way to exclusively own a ram file for an
> > instance. I hesitate the new fallback can confuse people too, while that's
> > so far not the major use case.
> 
> Once I learned that this is not a MAP_SHARED mapping, I was extremely
> confused. For example, vhost-user with "-mem-path" will absolutely not work
> with "-mem-path", even though the documentation explicitly spells that out
> (I still have to send a patch to fix that).
> 
> I guess "-mem-path" was primarily only used to consume hugetlb. Even for
> tmpfs it will already result in a double memory consumption, just like when
> using -memory-backend-memfd,share=no.
> 
> I guess deprecating it was the right decision.

Regardless of whether it's deprecated or not, I think it's fine to just
say people need to use the more verbose memory-backend-file syntax
if they want to use an unusual deployment configuration where there is
a readonly backing file.

> > Nobody may really rely on any existing behavior of the failure, but
> > changing existing behavior is just always not wanted.  The guideline here
> > to me is: whether we want existing "-mem-path XXX" users to start using the
> > fallback in general?  If it's "no", then maybe it implies a new flag is
> > better?
> 
> I think we have the following options (there might be more)
> 
> 1) This patch.
> 
> 2) New flag for memory-backend-file. We already have "readonly" and
> "share=". I'm having a hard time coming up with a good name that really
> describes the subtle difference.
> 
> 3) Glue behavior to the QEMU machine
> 
> 
> For 3), one option would be to always open a COW file readonly (as Thiner
> originally proposed). We could leave "-mem-path" behavior alone and only
> change memory-backend-file semantics. If the COW file does *not* exist yet,
> we would refuse to create the file like patch 2+3 do. Therefore, no
> ftruncate() errors, and fallocate() errors would always happen.

I'm for (2).


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by ThinerLogoer 2 years, 6 months ago
At 2023-08-10 22:19:45, "David Hildenbrand" <david@redhat.com> wrote:
>>> Most importantly, we won't be corrupting/touching the original file in any
>>> case, because it is R/O.
>>>
>>> If we really want to be careful, we could clue that behavior to compat
>>> machines. I'm not really sure yet if we really have to go down that path.
>>>
>>> Any other alternatives? I'd like to avoid new flags where not really
>>> required.
>> 
>> I was just thinking of a new flag. :) So have you already discussed that
>> possibility and decided that not a good idea?
>
>Not really. I was briefly playing with that idea but already struggled 
>to come up with a reasonable name :)
>
>Less toggles and just have it working nice, if possible.
>
>> 
>> The root issue to me here is we actually have two resources (memory map of
>> the process, and the file) but we only have one way to describe the
>> permissions upon the two objects.  I'd think it makes a lot more sense if a
>> new flag is added, when there's a need to differentiate the two.
>> 
>> Consider if you see a bunch of qemu instances with:
>> 
>>    -mem-path $RAM_FILE
>> 
>> On the same host, which can be as weird as it could be to me.. At least
>> '-mem-path' looks still like a way to exclusively own a ram file for an
>> instance. I hesitate the new fallback can confuse people too, while that's
>> so far not the major use case.
>
>Once I learned that this is not a MAP_SHARED mapping, I was extremely 
>confused. For example, vhost-user with "-mem-path" will absolutely not 
>work with "-mem-path", even though the documentation explicitly spells 
>that out (I still have to send a patch to fix that).
>
>I guess "-mem-path" was primarily only used to consume hugetlb. Even for 
>tmpfs it will already result in a double memory consumption, just like 
>when using -memory-backend-memfd,share=no.
>
>I guess deprecating it was the right decision.
>
>But memory-backend-file also defaults to "share=no" ... so the same 
>default behavior unfortunately.
>
>> 
>> Nobody may really rely on any existing behavior of the failure, but
>> changing existing behavior is just always not wanted.  The guideline here
>> to me is: whether we want existing "-mem-path XXX" users to start using the
>> fallback in general?  If it's "no", then maybe it implies a new flag is
>> better?
>
>I think we have the following options (there might be more)
>
>1) This patch.
>
>2) New flag for memory-backend-file. We already have "readonly" and 
>"share=". I'm having a hard time coming up with a good name that really 
>describes the subtle difference.
>
>3) Glue behavior to the QEMU machine
>

4) '-deny-private-discard' argv, or environment variable, or both

I have proposed a 4) earlier in the discussion, which is to add a global QEMU flag like
'-deny-private-discard' or '-disallow-private-discard' (let's find a better name!)
for some duration, until the private-discard behavior phases out. Without the flag,
we do everything exactly the same as before; with the flag,
private CoW-mapped files are strictly opened readonly, discard on a private
memory backend is denied very early, as soon as the possibility arises,
and a private memory-backed file is always opened readonly without
creating any file (so the file must exist, no more nasty edge cases).

This has the benefit that it can also help diagnose and debug all existing
private-discard usages, which could be required in the long run. With this
flag we directly solve the immediate demand of <readonly file + private map>
while deferring the hard problem indefinitely. I think this solution is the
most promising and acceptable to most people. At least for my use case, I
would be glad to add such a flag to my argv if it is all I need, since it
does not hurt the flexibility I care about.

Note that a difference in this option probably should not cause a difference
in the machine specification. Otherwise migration would fail because one
machine has this option and the other does not, which would be absurd, since
it is a backend implementation flag.

>
>For 3), one option would be to always open a COW file readonly (as 
>Thiner originally proposed). We could leave "-mem-path" behavior alone 
>and only change memory-backend-file semantics. If the COW file does 
>*not* exist yet, we would refuse to create the file like patch 2+3 do. 
>Therefore, no ftruncate() errors, and fallocate() errors would always 
>happen.
>
>
>What are your thoughts?
>
>[...]
>

I would be happy if -mem-path stayed supported, since in that case I would
not need knowledge of the memory backend before migration.

--

Regards,

logoerthiner
Re: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> >I think we have the following options (there might be more)
> >
> >1) This patch.
> >
> >2) New flag for memory-backend-file. We already have "readonly" and 
> >"share=". I'm having a hard time coming up with a good name that really 
> >describes the subtle difference.
> >
> >3) Glue behavior to the QEMU machine
> >
> 
> 4) '-deny-private-discard' argv, or environment variable, or both

I'd personally vote for (2).  How about "fdperm"?  To describe when we want
to use different rw permissions on the file (besides the access permission
of the memory we already provided with "readonly"=XXX).  IIUC the only sane
value will be ro/rw/default, where "default" should just use the same rw
permission as the memory ("readonly"=XXX).

Would that be relatively clean and also work in this use case?
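
To make it concrete, a rough sketch of how such an fdperm= could resolve
to an open(2) access mode (names purely illustrative; nothing like this
exists in QEMU today):

#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical values for a new "fdperm" property. */
typedef enum FdPerm {
    FDPERM_DEFAULT,  /* follow the memory permission ("readonly"=XXX) */
    FDPERM_RO,
    FDPERM_RW,
} FdPerm;

static int fdperm_to_open_flags(FdPerm fdperm, bool mem_readonly)
{
    switch (fdperm) {
    case FDPERM_RO:
        return O_RDONLY;
    case FDPERM_RW:
        return O_RDWR;
    case FDPERM_DEFAULT:
    default:
        return mem_readonly ? O_RDONLY : O_RDWR;
    }
}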

(the other reason I wish we didn't have that fallback is that, as long as
 we have any such "fallback", we'll need to stay compatible with it from
 then on, forever...)

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
On 10.08.23 23:24, Peter Xu wrote:
> On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>>> I think we have the following options (there might be more)
>>>
>>> 1) This patch.
>>>
>>> 2) New flag for memory-backend-file. We already have "readonly" and
>>> "share=". I'm having a hard time coming up with a good name that really
>>> describes the subtle difference.
>>>
>>> 3) Glue behavior to the QEMU machine
>>>
>>
>> 4) '-deny-private-discard' argv, or environment variable, or both
> 
> I'd personally vote for (2).  How about "fdperm"?  To describe when we want
> to use different rw permissions on the file (besides the access permission
> of the memory we already provided with "readonly"=XXX).  IIUC the only sane
> value will be ro/rw/default, where "default" should just use the same rw
> permission as the memory ("readonly"=XXX).

Hmm, I'm not particularly happy about that.

> 
> Would that be relatively clean and also work in this use case?
> 

I get the feeling that we are over-engineering something that probably 
should never have been allowed: MAP_PRIVATE mapping of a file and 
opening it rw because someone might punch holes into it.

Once we start adding new parameters just for that, I get a bit skeptical 
that this is what we want. The number of people that care about this is 
probably close to 0.

The only real use case where this used to make sense (by accident I 
assume) was with hugetlb. And somehow, we decided that it was a good 
idea for "-mem-path" to use MAP_PRIVATE.

So, what stops us from

a) Leaving -mem-path alone. Keep opening files rw.
b) Make memory-backend-file with shared=off,readonly=off open the file
    read-only
c) Gluing that behavior to a QEMU compat machine

fallocate(PUNCH_HOLE) will fail, and we can probably let 
virtio-mem/virtio-balloon and postcopy refuse to even start (virtio-mem 
already does that) as early as possible.

People that care about any such use case would already get a warning 
when punching a hole today.

If we ever support discarding RAM in that configuration, we can simply 
unlock it again.

Am I missing any important use case?

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 04:59:56PM +0200, David Hildenbrand wrote:
> On 10.08.23 23:24, Peter Xu wrote:
> > On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> > > > I think we have the following options (there might be more)
> > > > 
> > > > 1) This patch.
> > > > 
> > > > 2) New flag for memory-backend-file. We already have "readonly" and
> > > > "share=". I'm having a hard time coming up with a good name that really
> > > > describes the subtle difference.
> > > > 
> > > > 3) Glue behavior to the QEMU machine
> > > > 
> > > 
> > > 4) '-deny-private-discard' argv, or environment variable, or both
> > 
> > I'd personally vote for (2).  How about "fdperm"?  To describe when we want
> > to use different rw permissions on the file (besides the access permission
> > of the memory we already provided with "readonly"=XXX).  IIUC the only sane
> > value will be ro/rw/default, where "default" should just use the same rw
> > permission as the memory ("readonly"=XXX).
> 
> Hmm, I'm not particularly happy about that.
> 
> > 
> > Would that be relatively clean and also work in this use case?
> > 
> 
> I get the feeling that we are over-engineering something that probably
> should never have been allowed: MAP_PRIVATE mapping of a file and opening it
> rw because someone might punch holes into it.
> 
> Once we start adding new parameters just for that, I get a bit skeptical
> that this is what we want. The number of people that care about that are
> probably close to 0.
> 
> The only real use case where this used to make sense (by accident I assume)
> was with hugetlb. And somehow, we decided that it was a good idea for
> "-mem-path" to use MAP_PRIVATE.
> 
> So, what stops us from
> 
> a) Leaving -mem-path alone. Keep opening files rw.
> b) Make memory-backend-file with shared=off,readonly=off open the file
>    read-only
> c) Gluing that behavior to a QEMU compat machine

So we want all users with shared=off + readonly=off to always open the
file RO, failing file write ops rather than crashing anything else.

Sounds reasonable to me.

In that case, do we even need a compat bit, if we're 100% sure it won't
break anyone but only help, with the fact that everything should be
transparent?

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
On 11.08.23 16:59, David Hildenbrand wrote:
> On 10.08.23 23:24, Peter Xu wrote:
>> On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>>>> I think we have the following options (there might be more)
>>>>
>>>> 1) This patch.
>>>>
>>>> 2) New flag for memory-backend-file. We already have "readonly" and
>>>> "share=". I'm having a hard time coming up with a good name that really
>>>> describes the subtle difference.
>>>>
>>>> 3) Glue behavior to the QEMU machine
>>>>
>>>
>>> 4) '-deny-private-discard' argv, or environment variable, or both
>>
>> I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>> to use different rw permissions on the file (besides the access permission
>> of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>> value will be ro/rw/default, where "default" should just use the same rw
>> permission as the memory ("readonly"=XXX).
> 
> Hmm, I'm not particularly happy about that.
> 
>>
>> Would that be relatively clean and also work in this use case?
>>
> 
> I get the feeling that we are over-engineering something that probably
> should never have been allowed: MAP_PRIVATE mapping of a file and
> opening it rw because someone might punch holes into it.
> 
> Once we start adding new parameters just for that, I get a bit skeptical
> that this is what we want. The number of people that care about that are
> probably close to 0.
> 
> The only real use case where this used to make sense (by accident I
> assume) was with hugetlb. And somehow, we decided that it was a good
> idea for "-mem-path" to use MAP_PRIVATE.
> 
> So, what stops us from
> 
> a) Leaving -mem-path alone. Keep opening files rw.
> b) Make memory-backend-file with shared=off,readonly=off open the file
>      read-only
> c) Gluing that behavior to a QEMU compat machine
> 
> fallocate(PUNCH_HOLE) will fail, and we can probably let
> virtio-mem/virtio-balloon and postcopy refuse to even start (virtio-mem
> already does that) as early as possible.
> 
> People that care about any such use case would already get a warning
> when punching a hole today.
> 
> If we ever support discarding RAM in that configuration, we can simply
> unlock it again.
> 
> Am I missing any important use case?


I just started looking into the origins of "-mem-path".

Originally c902760fb2 ("Add option to use file backed guest memory"):

* Without MAP_POPULATE support, we use MAP_PRIVATE
* With MAP_POPULATE support we use MAP_PRIVATE if mem_prealloc was not
   defined.

It was only used for hugetlb. The shared memory case didn't really 
matter: they just needed a way to get hugetlb pages into the VM. Opening 
the file R/W even with MAP_PRIVATE kind-of made sense in that case; QEMU 
was the exclusive owner.

Discarding of RAM was not very popular back then I guess: virtio-mem 
didn't exist, virtio-balloon doesn't even handle hugetlb today really, 
postcopy didn't exist.


I guess that's why nobody really cared about "-mem-path" MAP_PRIVATE vs. 
MAP_SHARED semantics: just get hugetlb pages into the VM somehow.

Nowadays, "-mem-path" always defaults to MAP_PRIVATE. For the original 
hugetlb use case, it's still good enough. For anything else, I'm not so 
sure.


-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Igor Mammedov 2 years, 5 months ago
On Fri, 11 Aug 2023 17:26:24 +0200
David Hildenbrand <david@redhat.com> wrote:

[...]
> Nowadays, "-mem-path" always defaults to MAP_PRIVATE. For the original 
> hugetlb use case, it's still good enough. For anything else, I'm not so 
> sure.

We can deprecate -mem-path and direct users to an explicitly
defined memory backend, with a legacy-compatible example.
That way users won't keep coming back trying to fix it or
inventing options to alter its behavior, when an explicit
'-machine memory-backend=foo' can do the job.
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 05:26:24PM +0200, David Hildenbrand wrote:
> I just started looking into the origins of "-mem-path".
> 
> Originally c902760fb2 ("Add option to use file backed guest memory"):
> 
> * Without MAP_POPULATE support, we use MAP_PRIVATE
> * With MAP_POPULATE support we use MAP_PRIVATE if mem_prealloc was not
>   defined.
> 
> It was only used for hugetlb. The shared memory case didn't really matter:
> they just needed a way to get hugetlb pages into the VM. Opening the file
> R/W even with MAP_PRIVATE kind-of made sense in that case, it was an
> exclusive owner.
> 
> Discarding of RAM was not very popular back then I guess: virtio-mem didn't
> exist, virtio-balloon doesn't even handle hugetlb today really, postcopy
> didn't exist.
> 
> 
> I guess that's why nobody really cared about "-mem-path" MAP_PRIVATE vs.
> MAP_SHARED semantics: just get hugetlb pages into the VM somehow.
> 
> Nowadays, "-mem-path" always defaults to MAP_PRIVATE. For the original
> hugetlb use case, it's still good enough. For anything else, I'm not so
> sure.

Ok, this answers my other question on the compat bit then... thanks.  Feel
free to ignore it there.

But then I'd lean back towards simply adding a fdperm=; that seems the
simplest, since even with a compat bit we still face the risk of breaking
-mem-path users on new machine types.

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
On 11.08.23 18:16, Peter Xu wrote:
> On Fri, Aug 11, 2023 at 05:26:24PM +0200, David Hildenbrand wrote:
>> I just started looking into the origins of "-mem-path".
>>
>> Originally c902760fb2 ("Add option to use file backed guest memory"):
>>
>> * Without MAP_POPULATE support, we use MAP_PRIVATE
>> * With MAP_POPULATE support we use MAP_PRIVATE if mem_prealloc was not
>>    defined.
>>
>> It was only used for hugetlb. The shared memory case didn't really matter:
>> they just needed a way to get hugetlb pages into the VM. Opening the file
>> R/W even with MAP_PRIVATE kind-of made sense in that case, it was an
>> exclusive owner.
>>
>> Discarding of RAM was not very popular back then I guess: virtio-mem didn't
>> exist, virtio-balloon doesn't even handle hugetlb today really, postcopy
>> didn't exist.
>>
>>
>> I guess that's why nobody really cared about "-mem-path" MAP_PRIVATE vs.
>> MAP_SHARED semantics: just get hugetlb pages into the VM somehow.
>>
>> Nowadays, "-mem-path" always defaults to MAP_PRIVATE. For the original
>> hugetlb use case, it's still good enough. For anything else, I'm not so
>> sure.
> 
> Ok this answers my other question then on the compat bit.. thanks.  Feel
> free to ignore there.
> 
> But then I'd lean back towards simply adding a fdperm=; that seems the
> simplest, since even if with a compat bit, we still face risk of breaking
> -mem-path users for anyone using new machine types.

We wouldn't touch "-mem-path".

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
> We wouldn't touch "-mem-path".

But still the same issue when someone uses -object memory-backend-file for
hugetlb, mapping privately, expecting ram discard to work?

Basically I see that example as: "hugetlb" in general made the private
mapping over an RW file usable, so forbidding that anywhere may take a risk.

Thanks,

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
On 11.08.23 18:22, Peter Xu wrote:
> On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
>> We wouldn't touch "-mem-path".
> 
> But still the same issue when someone uses -object memory-backend-file for
> hugetlb, mapping privately, expecting ram discard to work?
> 
> Basically I see that example as, "hugetlb" in general made the private
> mapping over RW file usable, so forbidden that anywhere may take a risk.

These users can be directed to using hugetlb

a) using MAP_SHARED
b) using memory-backend-memfd, if MAP_PRIVATE is desired

Am I missing any important use case? Are we being a bit too careful about 
virtio-balloon and postcopy simply not being available for these corner 
cases?

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 06:25:14PM +0200, David Hildenbrand wrote:
> On 11.08.23 18:22, Peter Xu wrote:
> > On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
> > > We wouldn't touch "-mem-path".
> > 
> > But still the same issue when someone uses -object memory-backend-file for
> > hugetlb, mapping privately, expecting ram discard to work?
> > 
> > Basically I see that example as, "hugetlb" in general made the private
> > mapping over RW file usable, so forbidden that anywhere may take a risk.
> 
> These users can be directed to using hugetlb
> 
> a) using MAP_SHARED
> b) using memory-backend-memfd, if MAP_PRIVATE is desired
> 
> Am I missing any important use case? Are we being a bit to careful about
> virtio-balloon and postcopy simply not being available for these corner
> cases?

The current immediate issue is not really the mem=rw + fd=rw + private case
(which was a known issue), but how to make mem=rw + fd=ro + private work
for ThinerLogoer, iiuc.

I'd just think it safer to expose that cap to solve problem A (VM
templating) without affecting problem B (fallocate-over-private not working
right), when B is uncertain.

I'm also copying Daniel & the libvirt list in case there's a quick comment
from there. Say, maybe libvirt never uses private mappings of hugetlb files
via memory-backend-file at all, then it's probably fine.

In all cases, you and Igor should have the final say; no strong opinion
from my side.

-- 
Peter Xu
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
On 11.08.23 18:54, Peter Xu wrote:
> On Fri, Aug 11, 2023 at 06:25:14PM +0200, David Hildenbrand wrote:
>> On 11.08.23 18:22, Peter Xu wrote:
>>> On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
>>>> We wouldn't touch "-mem-path".
>>>
>>> But still the same issue when someone uses -object memory-backend-file for
>>> hugetlb, mapping privately, expecting ram discard to work?
>>>
>>> Basically I see that example as, "hugetlb" in general made the private
>>> mapping over RW file usable, so forbidden that anywhere may take a risk.
>>
>> These users can be directed to using hugetlb
>>
>> a) using MAP_SHARED
>> b) using memory-backend-memfd, if MAP_PRIVATE is desired
>>
>> Am I missing any important use case? Are we being a bit to careful about
>> virtio-balloon and postcopy simply not being available for these corner
>> cases?
> 
> The current immediate issue is not really mem=rw + fd=rw + private case
> (which was a known issue), but how to make mem=rw + fd=ro + private work
> for ThinnerBloger, iiuc.
> 
> I'd just think it safer to expose that cap to solve problem A (vm
> templating) without affecting problem B (fallcate-over-private not working
> right), when B is uncertain.

Right, and I'm thinking about if B is worth the effort.

> 
> I'm also copy Daniel & libvirt list in case there's quick comment from
> there. Say, maybe libvirt never use private mapping on hugetlb files over
> memory-backend-file at all, then it's probably fine.

libvirt certainly allows setting <access mode="shared"/> with <source 
type="file">.

Could be that they also end up mapping "<hugepages>" to 
memory-backend-file instead of memory-backend-memfd (e.g., compatibility 
with older kernels?).

> 
> In all cases, you and Igor should have the final grasp; no stand on a
> strong opinon from my side.

I do value your opinion, so I'm still trying to figure out if there are 
sane use cases that really need a new parameter. Let's recap:

When opening the file R/O, resulting in fallocate() refusing to work:
* virtio-balloon will fail to discard RAM but continue to "be alive"
* virtio-mem will discard any private pages, but cannot free up disk
   blocks using fallocate.
* postcopy would fail early

Postcopy:
* Works on shmem (MAP_SHARED / MAP_PRIVATE)
* Works on hugetlb (MAP_SHARED / MAP_PRIVATE)
* Does not work on file-backed memory (including MAP_PRIVATE)

We can ignore virtio-mem for now. What remains is postcopy and 
virtio-balloon.

memory-backend-file with MAP_PRIVATE on shmem/tmpfs results in a double 
memory consumption, so we can mostly cross that out as "sane use case". 
Rather make such users aware of that :D

memory-backend-file with MAP_PRIVATE on hugetlb works. virtio-balloon is 
not really compatible with hugetlb; free-page-reporting might work 
(although it is quite nonsensical there). So postcopy remains as the most 
important use case.

memory-backend-file with MAP_PRIVATE on file-backed memory works. 
Postcopy does not apply. virtio-balloon should work, I guess.


So the two use cases that are left are:
* postcopy with hugetlb would fail
* virtio-balloon with file-backed memory cannot free up disk blocks

Am I missing a case?

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 07:39:37PM +0200, David Hildenbrand wrote:
> On 11.08.23 18:54, Peter Xu wrote:
> > On Fri, Aug 11, 2023 at 06:25:14PM +0200, David Hildenbrand wrote:
> > > On 11.08.23 18:22, Peter Xu wrote:
> > > > On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
> > > > > We wouldn't touch "-mem-path".
> > > > 
> > > > But still the same issue when someone uses -object memory-backend-file for
> > > > hugetlb, mapping privately, expecting ram discard to work?
> > > > 
> > > > Basically I see that example as, "hugetlb" in general made the private
> > > > mapping over RW file usable, so forbidden that anywhere may take a risk.
> > > 
> > > These users can be directed to using hugetlb
> > > 
> > > a) using MAP_SHARED
> > > b) using memory-backend-memfd, if MAP_PRIVATE is desired
> > > 
> > > Am I missing any important use case? Are we being a bit to careful about
> > > virtio-balloon and postcopy simply not being available for these corner
> > > cases?
> > 
> > The current immediate issue is not really mem=rw + fd=rw + private case
> > (which was a known issue), but how to make mem=rw + fd=ro + private work
> > for ThinnerBloger, iiuc.
> > 
> > I'd just think it safer to expose that cap to solve problem A (vm
> > templating) without affecting problem B (fallcate-over-private not working
> > right), when B is uncertain.
> 
> Right, and I'm thinking about if B is worth the effort.
> 
> > 
> > I'm also copy Daniel & libvirt list in case there's quick comment from
> > there. Say, maybe libvirt never use private mapping on hugetlb files over
> > memory-backend-file at all, then it's probably fine.
> 
> libvirt certainly allows setting <access mode="shared"/> with <source
> type="file">.
> 
> Could be that they also end up mapping "<hugepages>" to memory-backend-file
> instead of memory-backend-memfd (e.g., compatibility with older kernels?).
> 
> > 
> > In all cases, you and Igor should have the final grasp; no stand on a
> > strong opinon from my side.
> 
> I do value your opinion, so I'm still trying to figure out if there are sane
> use cases that really need a new parameter. Let's recap:
> 
> When opening the file R/O, resulting in fallocate() refusing to work:
> * virtio-balloon will fail to discard RAM but continue to "be alive"
> * virtio-mem will discard any private pages, but cannot free up disk
>   blocks using fallocate.
> * postcopy would fail early
> 
> Postcopy:
> * Works on shmem (MAP_SHARED / MAP_PRIVATE)
> * Works on hugetlb (MAP_SHARED / MAP_PRIVATE)
> * Does not work on file-backed memory (including MAP_PRIVATE)
> 
> We can ignore virtio-mem for now. What remains is postcopy and
> virtio-balloon.
> 
> memory-backend-file with MAP_PRIVATE on shmem/tmpfs results in a double
> memory consumption, so we can mostly cross that out as "sane use case".
> Rather make such users aware of that :D
> 
> memory-backend-file with MAP_PRIVATE on hugetlb works. virtio-balloon is not
> really compatible with hugetlb, free-page-reporting might work (although
> quite non-nonsensical). So postcopy as the most important use case remains.
> 
> memory-backend-file with MAP_PRIVATE on file-backed memory works. postcopy
> does not apply. virtio-balloon should work I guess.
> 
> 
> So the two use cases that are left are:
> * postcopy with hugetlb would fail
> * virtio-balloon with file-backed memory cannot free up disk blocks
> 
> Am I missing a case?

Looks complete.  I don't want to say so, but afaik postcopy should be a
"corner case" in most cases, I'd say; people do still rely mostly on
precopy.  It's probably a matter of whether we'd like to take the risk.

-- 
Peter Xu
Re:Re: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by ThinerLogoer 2 years, 6 months ago
At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
>On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>> >I think we have the following options (there might be more)
>> >
>> >1) This patch.
>> >
>> >2) New flag for memory-backend-file. We already have "readonly" and 
>> >"share=". I'm having a hard time coming up with a good name that really 
>> >describes the subtle difference.
>> >
>> >3) Glue behavior to the QEMU machine
>> >
>> 
>> 4) '-deny-private-discard' argv, or environment variable, or both
>
>I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>to use different rw permissions on the file (besides the access permission
>of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>value will be ro/rw/default, where "default" should just use the same rw
>permission as the memory ("readonly"=XXX).
>
>Would that be relatively clean and also work in this use case?
>
>(the other thing I'd wish we don't have that fallback is, as long as we
> have any of that "fallback" we'll need to be compatible with it since
> then, and for ever...)

If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
Mainly because (private+discard) is itself not a good practice and (4) serves
as a good tool to help catch existing (private+discard) problems.

Actually (readonly+private) is more reasonable than (private+discard), so I
want at least one room for a default (readonly+private) behavior.

Also, in my case I kind of have to use "-mem-path" despite it being considered
close to deprecated. Only with it can I avoid needing knowledge of the memory
backend before migration. Actually, there seems to be no working after-migration
setup of "-object memory-backend-file,... -machine q35,mem=..." that can match a
before-migration setup of "-machine q35" (specifying nothing). Therefore
I must make a plan and choose a migration method BEFORE I boot the
machine and prepare to migrate, reducing the operational freedom.
Considering that, I have to use "-mem-path", which keeps the freedom but
has no configurable arguments, so I have to rely on the default config.

Is there any "-object memory-backend-file..." setup equivalent to "-machine q35"
such that the two can migrate from and to each other? If there is, I want to
try it out. By the way, "-object memory-backend-file,id=pc.ram" has just been
killed by an earlier commit.

Either (4) or fixing this should help my config. I hope you can consider this
more deeply and figure out a more systematic solution that helps more users.

--

Regards,
logoerthiner
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 6 months ago
On 11.08.23 07:49, ThinerLogoer wrote:
> At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
>> On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>>>> I think we have the following options (there might be more)
>>>>
>>>> 1) This patch.
>>>>
>>>> 2) New flag for memory-backend-file. We already have "readonly" and
>>>> "share=". I'm having a hard time coming up with a good name that really
>>>> describes the subtle difference.
>>>>
>>>> 3) Glue behavior to the QEMU machine
>>>>
>>>
>>> 4) '-deny-private-discard' argv, or environment variable, or both
>>
>> I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>> to use different rw permissions on the file (besides the access permission
>> of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>> value will be ro/rw/default, where "default" should just use the same rw
>> permission as the memory ("readonly"=XXX).
>>
>> Would that be relatively clean and also work in this use case?
>>
>> (the other thing I'd wish we don't have that fallback is, as long as we
>> have any of that "fallback" we'll need to be compatible with it since
>> then, and for ever...)
> 
> If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
> Mainly because (private+discard) is itself not a good practice and (4) serves
> as a good tool to help catch existing (private+discard) problems.

Instead of fdperm, maybe we could find a better name.

The man page of "open" says: The argument flags must include one of the 
following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request 
opening the file read-only, write-only, or read/write, respectively.

So maybe something a bit more of a mouthful, like "file-access-mode", would
be better.

We could have the options
*readwrite
*readonly
*auto

Whereby "auto" is the default.

Specifying:

* "readonly=true,file-access-mode=readwrite" would fail.
* "shared=true,readonly=false,file-access-mode=readonly" would fail.
* "shared=false,readonly=false,file-access-mode=readonly" would work.

In theory, we could adjust the mode of "auto" with compat machines. But 
maybe it would be sufficient to do something like the following

1) shared=true,readonly=false
	-> readwrite
2) shared=true,readonly=true
	> readonly
2) shared=false,readonly=true
	-> readonly
3) shared=false,readonly=true
	hugetlb? -> readwrite
	otherwise -> readonly

Reason being the traditional usage of hugetlb with MAP_PRIVATE where I 
can see possible users with postcopy. Further, VM templating with 
hugetlb might not be that common ... users could still modify the 
behavior if they really have to.

That would make your use case work automatically with file-backed memory.

-- 
Cheers,

David / dhildenb
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Daniel P. Berrangé 2 years, 5 months ago
On Fri, Aug 11, 2023 at 09:00:54PM +0200, David Hildenbrand wrote:
> On 11.08.23 07:49, ThinerLogoer wrote:
> > At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
> > > On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> > > > > I think we have the following options (there might be more)
> > > > > 
> > > > > 1) This patch.
> > > > > 
> > > > > 2) New flag for memory-backend-file. We already have "readonly" and
> > > > > "share=". I'm having a hard time coming up with a good name that really
> > > > > describes the subtle difference.
> > > > > 
> > > > > 3) Glue behavior to the QEMU machine
> > > > > 
> > > > 
> > > > 4) '-deny-private-discard' argv, or environment variable, or both
> > > 
> > > I'd personally vote for (2).  How about "fdperm"?  To describe when we want
> > > to use different rw permissions on the file (besides the access permission
> > > of the memory we already provided with "readonly"=XXX).  IIUC the only sane
> > > value will be ro/rw/default, where "default" should just use the same rw
> > > permission as the memory ("readonly"=XXX).
> > > 
> > > Would that be relatively clean and also work in this use case?
> > > 
> > > (the other thing I'd wish we don't have that fallback is, as long as we
> > > have any of that "fallback" we'll need to be compatible with it since
> > > then, and for ever...)
> > 
> > If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
> > Mainly because (private+discard) is itself not a good practice and (4) serves
> > as a good tool to help catch existing (private+discard) problems.
> 
> Instead of fdperm, maybe we could find a better name.
> 
> The man page of "open" says: The argument flags must include one of the
> following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request
> opening the file read-only, write-only, or read/write, respectively.
> 
> So maybe something a bit more mouthful like "file-access-mode" would be
> better.

I don't think we should directly express the config in terms
of file-access-mode, as that's a low-level impl detail. The
required file access mode is an artifact of the higher-level
goal, i.e. whether the RAM should be process-private vs. shared,
and whether we want QEMU to be able to create the backing
file or use a pre-created one.

IOW, we should express whether or not we want QEMU to try to
pre-create the file.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by ThinerLogoer 2 years, 6 months ago
At 2023-08-12 03:00:54, "David Hildenbrand" <david@redhat.com> wrote:
>On 11.08.23 07:49, ThinerLogoer wrote:
>> At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
>>> On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>>>>> I think we have the following options (there might be more)
>>>>>
>>>>> 1) This patch.
>>>>>
>>>>> 2) New flag for memory-backend-file. We already have "readonly" and
>>>>> "share=". I'm having a hard time coming up with a good name that really
>>>>> describes the subtle difference.
>>>>>
>>>>> 3) Glue behavior to the QEMU machine
>>>>>
>>>>
>>>> 4) '-deny-private-discard' argv, or environment variable, or both
>>>
>>> I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>>> to use different rw permissions on the file (besides the access permission
>>> of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>>> value will be ro/rw/default, where "default" should just use the same rw
>>> permission as the memory ("readonly"=XXX).
>>>
>>> Would that be relatively clean and also work in this use case?
>>>
>>> (the other thing I'd wish we don't have that fallback is, as long as we
>>> have any of that "fallback" we'll need to be compatible with it since
>>> then, and for ever...)
>> 
>> If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
>> Mainly because (private+discard) is itself not a good practice and (4) serves
>> as a good tool to help catch existing (private+discard) problems.
>
>Instead of fdperm, maybe we could find a better name.
>
>The man page of "open" says: The argument flags must include one of the 
>following access modes: O_RDONLY, O_WRONLY, or O_RDWR.  These request 
>opening the file read-only, write-only, or read/write, respectively.
>
>So maybe something a bit more mouthful like "file-access-mode" would be 
>better.

Maybe "fd-mode"or "open-mode" to make it shorter and also non ambiguous.

"open-access-mode" can also be considered.

Btw my chatgpt agent suggested 10 names to me, in case you find it hard to
decide on a name:

    access-mode
    file-mode
    open-mode
    file-permission
    file-access-mode (aha!)
    file-access-permission
    file-io-access-mode
    file-open-permission
    file-operation-mode
    file-read-write-mode


>
>We could have the options
>*readwrite
>*readonly
>*auto
>
>Whereby "auto" is the default.

I like the "auto" here.

>
>Specifying:
>
>* "readonly=true,file-access-mode=readwrite" would fail.
>* "shared=true,readonly=false,file-access-mode=readonly" would fail.
>* "shared=false,readonly=false,file-access-mode=readonly" would work.

Yeah, sanitizing the arguments here is wise.

>
>In theory, we could adjust the mode of "auto" with compat machines. But 
>maybe it would be sufficient to do something like the following
>
>1) shared=true,readonly=false
>	-> readwrite
>2) shared=true,readonly=true
>	> readonly
>2) shared=false,readonly=true
>	-> readonly
>3) shared=false,readonly=true
>	hugetlb? -> readwrite
>	otherwise -> readonly

Looks like there are some typos here. I assume it was meant to be:

1) shared=true,readonly=false
	-> readwrite
2) shared=true,readonly=true
	-> readonly
3) shared=false,readonly=true
	-> readonly
4) shared=false,readonly=false
	hugetlb? -> readwrite
	otherwise -> readonly
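
Or, in code form, the corrected mapping would be roughly (sketch only,
the function name is made up, not actual qemu code):

```
/* Sketch only -- decide whether "auto" opens the backing file readonly. */
static bool auto_open_file_readonly(bool share, bool readonly,
                                    bool is_hugetlb)
{
    if (readonly) {
        return true;            /* cases 2) and 3) */
    }
    if (share) {
        return false;           /* case 1): open readwrite */
    }
    /* case 4): private+writable mapping */
    return !is_hugetlb;         /* hugetlb -> readwrite, otherwise readonly */
}
```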


>Reason being the traditional usage of hugetlb with MAP_PRIVATE where I 
>can see possible users with postcopy. Further, VM templating with 
>hugetlb might not be that common ... users could still modify the 
>behavior if they really have to.
>
>That would make your use case work automatically with file-backed memory.

This seems like a good idea. I am fine with the solution you proposed
here as well.


--

Regards,

logoerthiner
Re: Re: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by Peter Xu 2 years, 6 months ago
On Fri, Aug 11, 2023 at 01:49:52PM +0800, ThinerLogoer wrote:
> At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
> >On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> >> >I think we have the following options (there might be more)
> >> >
> >> >1) This patch.
> >> >
> >> >2) New flag for memory-backend-file. We already have "readonly" and 
> >> >"share=". I'm having a hard time coming up with a good name that really 
> >> >describes the subtle difference.
> >> >
> >> >3) Glue behavior to the QEMU machine
> >> >
> >> 
> >> 4) '-deny-private-discard' argv, or environment variable, or both
> >
> >I'd personally vote for (2).  How about "fdperm"?  To describe when we want
> >to use different rw permissions on the file (besides the access permission
> >of the memory we already provided with "readonly"=XXX).  IIUC the only sane
> >value will be ro/rw/default, where "default" should just use the same rw
> >permission as the memory ("readonly"=XXX).
> >
> >Would that be relatively clean and also work in this use case?
> >
> >(the other thing I'd wish we don't have that fallback is, as long as we
> > have any of that "fallback" we'll need to be compatible with it since
> > then, and for ever...)
> 
> If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
> Mainly because (private+discard) is itself not a good practice and (4) serves
> as a good tool to help catch existing (private+discard) problems.
> 
> Actually (readonly+private) is more reasonable than (private+discard), so I
> want at least one room for a default (readonly+private) behavior.

Just for discussion purposes: I think maybe someday private+discard
could work.  IIUC what we're missing is a syscall interface to install a
zero page for a MAP_PRIVATE mapping, atomically freeing what's underneath: it
seems neither punching a hole nor DONTNEED will suffice here.  It'll just be
another problem when having zero pages involved in file mappings, at least.
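
For illustration, this rough (untested) sketch shows the current semantics:
after MADV_DONTNEED the private copy is dropped, but the next read brings
back the *file* contents rather than a zero page.  It assumes a file
"backing.img" that is at least one page long and starts with 'A':

```
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Error handling omitted for brevity. */
    int fd = open("backing.img", O_RDONLY);  /* assumed: >= 1 page of 'A' */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    p[0] = 'B';                      /* CoW: modifies a private copy only */
    madvise(p, 4096, MADV_DONTNEED); /* drop the private copy ...         */
    printf("%c\n", p[0]);            /* ... prints 'A' again, not '\0'    */

    munmap(p, 4096);
    close(fd);
    return 0;
}
```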

> 
> Also in my case I kind of have to use "-mem-path" despite it being considered
> to be close to deprecated. Only with this I can avoid knowledge of memory
> backend before migration. Actually there seems to be no equivalent working after-migration
> setup of "-object memory-backend-file,... -machine q35,mem=..." that can match
> before-migration setup of "-machine q35" (specifying nothing). Therefore
> I must make a plan and choose a migration method BEFORE I boot the
> machine and prepare to migrate, reducing the operation freedom.
> Considering that, I have to use "-mem-path" which keeps the freedom but
> has no configurable argument and I have to rely on default config.
> 
> Are there any "-object memory-backend-file..." setup equivalent to "-machine q35"
> that can migrate from and to each other? If there is, I want to try it out.
> By the way "-object memory-backend-file,id=pc.ram" has just been killed by an earlier
> commit.

I'm actually not familiar enough with the interfaces here, but I just checked
the man page; would this work for you, together with option (2)?

        memory-backend='id'
                An alternative to legacy -mem-path and mem-prealloc options.  Allows to use a memory backend as main RAM.

                For example:

                -object memory-backend-file,id=pc.ram,size=512M,mem-path=/hugetlbfs,prealloc=on,share=on
                -machine memory-backend=pc.ram
                -m 512M

-- 
Peter Xu
Re:Re: Re: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by ThinerLogoer 2 years, 6 months ago
At 2023-08-11 22:31:36, "Peter Xu" <peterx@redhat.com> wrote:
>On Fri, Aug 11, 2023 at 01:49:52PM +0800, ThinerLogoer wrote:
>> At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
>> >On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>> >> >I think we have the following options (there might be more)
>> >> >
>> >> >1) This patch.
>> >> >
>> >> >2) New flag for memory-backend-file. We already have "readonly" and 
>> >> >"share=". I'm having a hard time coming up with a good name that really 
>> >> >describes the subtle difference.
>> >> >
>> >> >3) Glue behavior to the QEMU machine
>> >> >
>> >> 
>> >> 4) '-deny-private-discard' argv, or environment variable, or both
>> >
>> >I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>> >to use different rw permissions on the file (besides the access permission
>> >of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>> >value will be ro/rw/default, where "default" should just use the same rw
>> >permission as the memory ("readonly"=XXX).
>> >
>> >Would that be relatively clean and also work in this use case?
>> >
>> >(the other thing I'd wish we don't have that fallback is, as long as we
>> > have any of that "fallback" we'll need to be compatible with it since
>> > then, and for ever...)
>> 
>> If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
>> Mainly because (private+discard) is itself not a good practice and (4) serves
>> as a good tool to help catch existing (private+discard) problems.
>> 
>> Actually (readonly+private) is more reasonable than (private+discard), so I
>> want at least one room for a default (readonly+private) behavior.
>
>Just for purely discussion purpose: I think maybe someday private+discard
>could work.  IIUC what we're missing is an syscall interface to install a
>zero page for a MAP_PRIVATE, atomically freeing what's underneath: it seems
>either punching a hole or DONTNEED won't suffice here.  It'll just be
>another problem when having zero page involved in file mappings at least.
>
>> 
>> Also in my case I kind of have to use "-mem-path" despite it being considered
>> to be close to deprecated. Only with this I can avoid knowledge of memory
>> backend before migration. Actually there seems to be no equivalent working after-migration
>> setup of "-object memory-backend-file,... -machine q35,mem=..." that can match
>> before-migration setup of "-machine q35" (specifying nothing). Therefore
>> I must make a plan and choose a migration method BEFORE I boot the
>> machine and prepare to migrate, reducing the operation freedom.
>> Considering that, I have to use "-mem-path" which keeps the freedom but
>> has no configurable argument and I have to rely on default config.
>> 
>> Are there any "-object memory-backend-file..." setup equivalent to "-machine q35"
>> that can migrate from and to each other? If there is, I want to try it out.
>> By the way "-object memory-backend-file,id=pc.ram" has just been killed by an earlier
>> commit.
>
>I'm actually not familiar enough on the interfaces here, but I just checked
>up the man page; would this work for you, together with option (2)?
>
>        memory-backend='id'
>                An alternative to legacy -mem-path and mem-prealloc options.  Allows to use a memory backend as main RAM.
>
>                For example:
>
>                -object memory-backend-file,id=pc.ram,size=512M,mem-path=/hugetlbfs,prealloc=on,share=on
>                -machine memory-backend=pc.ram
>                -m 512M
>

Wait ... I thought it should not work but it did work today. How bad am I at reading
the correct part of documentation ...

'-machine q35 -m 512M' is equivalent to '-object
memory-backend-file,id=pc.ram,size=512M
-machine q35,memory-backend=pc.ram',
the latter works, and the two mentioned setups can be
migrated from one to another.

What I was consistently trying was '-object
memory-backend-file,id=pc.ram,size=512M -machine q35', and qemu raises an error
for this in a recent update:

```
qemu-system-x86_64: object name 'pc.ram' is reserved for the default RAM backend, it can't be used for any other purposes. Change the object's 'id' to something else
```

This error is misleading. Actually in this case, the error report message should be
closer to:
```
object name 'pc.ram' is reserved for the default RAM backend, it can't
be used for any other purposes. Change the object's 'id' to something
else, or append "memory-backend=pc.ram" to -machine arguments
```

(I suggest rewording the error message along these lines because of the confusion I just had)


Even though the default memory backend name is pc.ram, the
'-machine q35,memory-backend=pc.ram' part explicitly marks that qemu
uses a memory backend named pc.ram, rather than relying on the default.

It seems that if it relies on the default and a memory-backend-file has an id
of "pc.ram" (on x86_64 of course), it will fail.

Great. Now I will consider using "-object
memory-backend-file,id=pc.ram,size=512M
-machine q35,memory-backend=pc.ram".

--

Regards,

logoerthiner
Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by David Hildenbrand 2 years, 5 months ago
On 12.08.23 08:21, ThinerLogoer wrote:
> At 2023-08-11 22:31:36, "Peter Xu" <peterx@redhat.com> wrote:
>> On Fri, Aug 11, 2023 at 01:49:52PM +0800, ThinerLogoer wrote:
>>> At 2023-08-11 05:24:43, "Peter Xu" <peterx@redhat.com> wrote:
>>>> On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>>>>>> I think we have the following options (there might be more)
>>>>>>
>>>>>> 1) This patch.
>>>>>>
>>>>>> 2) New flag for memory-backend-file. We already have "readonly" and
>>>>>> "share=". I'm having a hard time coming up with a good name that really
>>>>>> describes the subtle difference.
>>>>>>
>>>>>> 3) Glue behavior to the QEMU machine
>>>>>>
>>>>>
>>>>> 4) '-deny-private-discard' argv, or environment variable, or both
>>>>
>>>> I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>>>> to use different rw permissions on the file (besides the access permission
>>>> of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>>>> value will be ro/rw/default, where "default" should just use the same rw
>>>> permission as the memory ("readonly"=XXX).
>>>>
>>>> Would that be relatively clean and also work in this use case?
>>>>
>>>> (the other thing I'd wish we don't have that fallback is, as long as we
>>>> have any of that "fallback" we'll need to be compatible with it since
>>>> then, and for ever...)
>>>
>>> If it must be (2), I would vote (2) + (4), with (4) adjust the default behavior of said `fdperm`.
>>> Mainly because (private+discard) is itself not a good practice and (4) serves
>>> as a good tool to help catch existing (private+discard) problems.
>>>
>>> Actually (readonly+private) is more reasonable than (private+discard), so I
>>> want at least one room for a default (readonly+private) behavior.
>>
>> Just for purely discussion purpose: I think maybe someday private+discard
>> could work.  IIUC what we're missing is an syscall interface to install a
>> zero page for a MAP_PRIVATE, atomically freeing what's underneath: it seems
>> either punching a hole or DONTNEED won't suffice here.  It'll just be
>> another problem when having zero page involved in file mappings at least.
>>
>>>
>>> Also in my case I kind of have to use "-mem-path" despite it being considered
>>> to be close to deprecated. Only with this I can avoid knowledge of memory
>>> backend before migration. Actually there seems to be no equivalent working after-migration
>>> setup of "-object memory-backend-file,... -machine q35,mem=..." that can match
>>> before-migration setup of "-machine q35" (specifying nothing). Therefore
>>> I must make a plan and choose a migration method BEFORE I boot the
>>> machine and prepare to migrate, reducing the operation freedom.
>>> Considering that, I have to use "-mem-path" which keeps the freedom but
>>> has no configurable argument and I have to rely on default config.
>>>
>>> Are there any "-object memory-backend-file..." setup equivalent to "-machine q35"
>>> that can migrate from and to each other? If there is, I want to try it out.
>>> By the way "-object memory-backend-file,id=pc.ram" has just been killed by an earlier
>>> commit.
>>
>> I'm actually not familiar enough on the interfaces here, but I just checked
>> up the man page; would this work for you, together with option (2)?
>>
>>         memory-backend='id'
>>                 An alternative to legacy -mem-path and mem-prealloc options.  Allows to use a memory backend as main RAM.
>>
>>                 For example:
>>
>>                 -object memory-backend-file,id=pc.ram,size=512M,mem-path=/hugetlbfs,prealloc=on,share=on
>>                 -machine memory-backend=pc.ram
>>                 -m 512M
>>
> 
> Wait ... I thought it should not work but it did work today. How bad am I at reading
> the correct part of documentation ...
> 
> '-machine q35 -m 512M' is equivalent to '-object
> memory-backend-file,id=pc.ram,size=512M
> -machine q35,memory-backend=pc.ram',
> the latter works, and the two mentioned setup can be
> migrated from one to another.
> 
> What I was consistently trying was '-object
> memory-backend-file,id=pc.ram,size=512M -machine q35', and qemu raises an error
> for this in a recent update:
> 
> ```
> qemu-system-x86_64: object name 'pc.ram' is reserved for the default RAM backend, it can't be used for any other purposes. Change the object's 'id' to something else
> ```
> 
> This error is misleading. Actually in this case, the error report message should be more
> close to:
> ```
> object name 'pc.ram' is reserved for the default RAM backend, it can't
> be used for any other purposes. Change the object's 'id' to something
> else, or append "memory-backend=pc.ram" to -machine arguments
> ```

What about:

$ ./build/qemu-system-x86_64 -object 
memory-backend-file,id=pc.ram,size=512M,mem-path=tmp -machine q35
qemu-system-x86_64: object name 'pc.ram' is reserved for the default RAM 
backend, it can't be used for any other purposes. Change the object's 
'id' to something else or disable automatic creation of the default RAM 
backend by setting the 'memory-backend' machine property

?

-- 
Cheers,

David / dhildenb
Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping
Posted by ThinerLogoer 2 years, 6 months ago
At 2023-08-09 05:01:17, "Peter Xu" <peterx@redhat.com> wrote:
>On Mon, Aug 07, 2023 at 09:07:32PM +0200, David Hildenbrand wrote:
>> From: Thiner Logoer <logoerthiner1@163.com>
>> 
>> Users may specify
>> * "-mem-path" or
>> * "-object memory-backend-file,share=off,readonly=off"
>> and expect such COW (MAP_PRIVATE) mappings to work, even if the user
>> does not have write permissions to open the file.
>> 
>> For now, we would always fail in that case, always requiring file write
>> permissions. Let's detect when that failure happens and fallback to opening
>> the file readonly.
>> 
>> Warn the user, since there are other use cases where we want the file to
>> be mapped writable: ftruncate() and fallocate() will fail if the file
>> was not opened with write permissions.
>> 
>> Signed-off-by: Thiner Logoer <logoerthiner1@163.com>
>> Co-developed-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  softmmu/physmem.c | 26 ++++++++++++++++++--------
>>  1 file changed, 18 insertions(+), 8 deletions(-)
>> 
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 3df73542e1..d1ae694b20 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -1289,8 +1289,7 @@ static int64_t get_file_align(int fd)
>>  static int file_ram_open(const char *path,
>>                           const char *region_name,
>>                           bool readonly,
>> -                         bool *created,
>> -                         Error **errp)
>> +                         bool *created)
>>  {
>>      char *filename;
>>      char *sanitized_name;
>> @@ -1334,10 +1333,7 @@ static int file_ram_open(const char *path,
>>              g_free(filename);
>>          }
>>          if (errno != EEXIST && errno != EINTR) {
>> -            error_setg_errno(errp, errno,
>> -                             "can't open backing store %s for guest RAM",
>> -                             path);
>> -            return -1;
>> +            return -errno;
>>          }
>>          /*
>>           * Try again on EINTR and EEXIST.  The latter happens when
>> @@ -1946,9 +1942,23 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>>      bool created;
>>      RAMBlock *block;
>>  
>> -    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
>> -                       errp);
>> +    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
>> +    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
>> +        /*
>> +         * We can have a writable MAP_PRIVATE mapping of a readonly file.
>> +         * However, some operations like ftruncate() or fallocate() might fail
>> +         * later, let's warn the user.
>> +         */
>> +        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
>> +        if (fd >= 0) {
>> +            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
>> +                        " readonly because the file is not writable", mem_path);
>
>I can understand the use case, but this will be slightly unwanted,
>especially the user doesn't yet have a way to predict when will it happen.
>
>Meanwhile this changes the behavior, is it a concern that someone may want
>to rely on current behavior of failing?
>

I am happy to learn if there is some solid evidence supporting this view.

The target of compatibility is "private+discard", which itself seemed a pathological
practice in the earlier discussion. The only difference in behavior that might be
unwanted, per your argument, lies in the failure time of "readonly file+private+discard":
before the patch it fails early, after the patch it fails later and may do additional work.
Do you think a bug reporting mechanism that relies on qemu failure
timing is valid?

If someone argues that the early-failure behavior of "readonly file+private+discard"
is something their system relies on, I would be happy to learn why. Actually this
is much easier to solve outside qemu, by checking the memory file permissions,
compared to the "readonly+private" alternative plan that requires btrfs.

In the long run the "private+discard" setup may itself be warned about and
deprecated, rather than "private+readonly file", which is supported by the
linux kernel.

The current patch is already a compromise for compatibility, as it
always tries to open the file in readwrite mode first, to preserve the behavior of
the "readwrite+private+discard" case. Personally I would prefer to try opening the file
readonly first, as this would reduce the risk of accidentally opening the RAM file
with write permission.

However I can accept a second level of compromise, i.e. adding a "-disallow-private-discard"
argument to qemu for at least three releases before "private+discard"
gets deprecated and removed, in case strict compatibility is a concern.
With this argument, readonly+private is enabled and private+discard
fails immediately when a discard request is detected; without it,
readonly+private is disabled and the behavior is unchanged.
This argument is also useful because it helps
find existing "private+discard" usage and assists debugging.
Do you think this solution pays off?
(I am no expert qemu programmer; I hope someone else can handle the command
line arguments if this is preferred.)
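
To make the intended behavior concrete, a rough sketch (the flag and names
are invented, not actual qemu code):

```
/* Sketch only -- flag and function names are invented. */
#include <stdbool.h>
#include <errno.h>

static bool disallow_private_discard;   /* set by "-disallow-private-discard" */

/* Called when something requests a discard of guest RAM. */
static int guest_ram_discard(bool is_private_mapping)
{
    if (is_private_mapping && disallow_private_discard) {
        return -EPERM;   /* fail immediately instead of discarding */
    }
    /* ... perform the actual discard here ... */
    return 0;
}
```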

>To think from a higher level of current use case, the ideal solution seems
>to me that if the ram file can be put on a file system that supports CoW
>itself (like btrfs), we can snapshot that ram file and make it RW for the
>qemu instance. Then here it'll be able to open the file.  We'll be able to
>keep the interface working as before, meanwhile it'll work with fallocate
>or truncations too I assume.
>
>Would that be better instead of changing QEMU?
>

I am afraid your alternative solution would still make the use case
impossible on a rootless system, while MAP_PRIVATE-based CoW works
anywhere. For example, it will not work on ext4 when the file mode is
readonly, when the file is owned by root and you are a normal user, or on
squashfs. If basically everything on the machine is readonly and you are
only an unprivileged user, then only the in-qemu choice is possible.

I believe the warning that the file is opened readonly is enough
for a potential user to consider doing otherwise. Do you think that
looking solely at the qemu exit status while piping all qemu stderr output
to /dev/null all the time is one of the supported use cases?

--

Regards,

logoerthiner