[PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory

David Hildenbrand posted 12 patches 4 years, 8 months ago
Maintainers: Thomas Huth <huth@tuxfamily.org>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Juan Quintela <quintela@redhat.com>, Stefan Weil <sw@weilnetz.de>, Paolo Bonzini <pbonzini@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>
There is a newer version of this series
[PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by David Hildenbrand 4 years, 8 months ago
We can create shared anonymous memory via
    "-object memory-backend-ram,share=on,..."
which is, for example, required by PVRDMA for mremap() to work.

Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.

Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 softmmu/physmem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 62ea4abbdd..2ba815fec6 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
         /* The logic here is messy;
          *    madvise DONTNEED fails for hugepages
          *    fallocate works on hugepages and shmem
+         *    shared anonymous memory requires madvise REMOVE
          */
         need_madvise = (rb->page_size == qemu_host_page_size);
         need_fallocate = rb->fd != -1;
@@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
              * fallocate'd away).
              */
 #if defined(CONFIG_MADVISE)
-            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
+            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
+                ret = madvise(host_startaddr, length, MADV_REMOVE);
+            } else {
+                ret = madvise(host_startaddr, length, MADV_DONTNEED);
+            }
             if (ret) {
                 ret = -errno;
                 error_report("ram_block_discard_range: Failed to discard range "
-- 
2.29.2


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by Dr. David Alan Gilbert 4 years, 8 months ago
* David Hildenbrand (david@redhat.com) wrote:
> We can create shared anonymous memory via
>     "-object memory-backend-ram,share=on,..."
> which is, for example, required by PVRDMA for mremap() to work.
> 
> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.

OK, I wonder how stable these rules are; is it defined anywhere that
it's required?

Still,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  softmmu/physmem.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 62ea4abbdd..2ba815fec6 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>          /* The logic here is messy;
>           *    madvise DONTNEED fails for hugepages
>           *    fallocate works on hugepages and shmem
> +         *    shared anonymous memory requires madvise REMOVE
>           */
>          need_madvise = (rb->page_size == qemu_host_page_size);
>          need_fallocate = rb->fd != -1;
> @@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>               * fallocate'd away).
>               */
>  #if defined(CONFIG_MADVISE)
> -            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
> +            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
> +                ret = madvise(host_startaddr, length, MADV_REMOVE);
> +            } else {
> +                ret = madvise(host_startaddr, length, MADV_DONTNEED);
> +            }
>              if (ret) {
>                  ret = -errno;
>                  error_report("ram_block_discard_range: Failed to discard range "
> -- 
> 2.29.2
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by David Hildenbrand 4 years, 8 months ago
On 11.03.21 17:39, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> We can create shared anonymous memory via
>>      "-object memory-backend-ram,share=on,..."
>> which is, for example, required by PVRDMA for mremap() to work.
>>
>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
> 
> OK, I wonder how stable these rules are; is it defined anywhere that
> it's required?
> 

I had a look at the Linux implementation: it's essentially shmem ... but 
we don't have an fd exposed, so we cannot use fallocate() ... :)

MADV_REMOVE documents (man):

"In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; 
but since Linux 3.5, any filesystem which supports the fallocate(2) 
FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE."


Thanks!

> Still,
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
>> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   softmmu/physmem.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 62ea4abbdd..2ba815fec6 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -3506,6 +3506,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>>           /* The logic here is messy;
>>            *    madvise DONTNEED fails for hugepages
>>            *    fallocate works on hugepages and shmem
>> +         *    shared anonymous memory requires madvise REMOVE
>>            */
>>           need_madvise = (rb->page_size == qemu_host_page_size);
>>           need_fallocate = rb->fd != -1;
>> @@ -3539,7 +3540,11 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>>                * fallocate'd away).
>>                */
>>   #if defined(CONFIG_MADVISE)
>> -            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
>> +            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
>> +                ret = madvise(host_startaddr, length, MADV_REMOVE);
>> +            } else {
>> +                ret = madvise(host_startaddr, length, MADV_DONTNEED);
>> +            }
>>               if (ret) {
>>                   ret = -errno;
>>                   error_report("ram_block_discard_range: Failed to discard range "
>> -- 
>> 2.29.2
>>


-- 
Thanks,

David / dhildenb


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by Peter Xu 4 years, 8 months ago
On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote:
> On 11.03.21 17:39, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> > > We can create shared anonymous memory via
> > >      "-object memory-backend-ram,share=on,..."
> > > which is, for example, required by PVRDMA for mremap() to work.
> > > 
> > > Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
> > > have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
> > 
> > OK, I wonder how stable these rules are; is it defined anywhere that
> > it's required?
> > 
> 
> I had a look at the Linux implementation: it's essentially shmem ... but we
> don't have an fd exposed, so we cannot use fallocate() ... :)
> 
> MADV_REMOVE documents (man):
> 
> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but
> since Linux 3.5, any filesystem which supports the fallocate(2)
> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE."

Hmm, I see that MADV_DONTNEED will still tear down all mappings even for
anonymous shmem.. what did I miss?

-- 
Peter Xu


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by David Hildenbrand 4 years, 8 months ago
On 11.03.21 18:11, Peter Xu wrote:
> On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote:
>> On 11.03.21 17:39, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> We can create shared anonymous memory via
>>>>       "-object memory-backend-ram,share=on,..."
>>>> which is, for example, required by PVRDMA for mremap() to work.
>>>>
>>>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
>>>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
>>>
>>> OK, I wonder how stable these rules are; is it defined anywhere that
>>> it's required?
>>>
>>
>> I had a look at the Linux implementation: it's essentially shmem ... but we
>> don't have an fd exposed, so we cannot use fallocate() ... :)
>>
>> MADV_REMOVE documents (man):
>>
>> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but
>> since Linux 3.5, any filesystem which supports the fallocate(2)
>> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE."
> 
> Hmm, I see that MADV_DONTNEED will still tear down all mappings even for
> anonymous shmem.. what did I miss?

Where did you see that?

> 

MADV_DONTNEED only invalidates private copies in the pagecache. It's 
essentially useless for any kind of shared mappings.

(I am 99.9% sure that we can replace fallocate()+MADV_DONTNEED by 
fallocate() for fd-based shared mappings, but that's a different story)

-- 
Thanks,

David / dhildenb


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by Peter Xu 4 years, 8 months ago
On Thu, Mar 11, 2021 at 06:15:15PM +0100, David Hildenbrand wrote:
> On 11.03.21 18:11, Peter Xu wrote:
> > On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote:
> > > On 11.03.21 17:39, Dr. David Alan Gilbert wrote:
> > > > * David Hildenbrand (david@redhat.com) wrote:
> > > > > We can create shared anonymous memory via
> > > > >       "-object memory-backend-ram,share=on,..."
> > > > > which is, for example, required by PVRDMA for mremap() to work.
> > > > > 
> > > > > Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
> > > > > have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
> > > > 
> > > > OK, I wonder how stable these rules are; is it defined anywhere that
> > > > it's required?
> > > > 
> > > 
> > > I had a look at the Linux implementation: it's essentially shmem ... but we
> > > don't have an fd exposed, so we cannot use fallocate() ... :)
> > > 
> > > MADV_REMOVE documents (man):
> > > 
> > > "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but
> > > since Linux 3.5, any filesystem which supports the fallocate(2)
> > > FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE."
> > 
> > Hmm, I see that MADV_DONTNEED will still tear down all mappings even for
> > anonymous shmem.. what did I miss?
> 
> Where did you see that?

I see madvise_dontneed_free() calls zap_page_range().

> 
> > 
> 
> MADV_DONTNEED only invalidates private copies in the pagecache. It's
> essentially useless for any kind of shared mappings.

Since it's about zapping page tables, then I don't understand why it won't work
for shmem..

-- 
Peter Xu


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by David Hildenbrand 4 years, 8 months ago
On 11.03.21 18:22, Peter Xu wrote:
> On Thu, Mar 11, 2021 at 06:15:15PM +0100, David Hildenbrand wrote:
>> On 11.03.21 18:11, Peter Xu wrote:
>>> On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote:
>>>> On 11.03.21 17:39, Dr. David Alan Gilbert wrote:
>>>>> * David Hildenbrand (david@redhat.com) wrote:
>>>>>> We can create shared anonymous memory via
>>>>>>        "-object memory-backend-ram,share=on,..."
>>>>>> which is, for example, required by PVRDMA for mremap() to work.
>>>>>>
>>>>>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
>>>>>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
>>>>>
>>>>> OK, I wonder how stable these rules are; is it defined anywhere that
>>>>> it's required?
>>>>>
>>>>
>>>> I had a look at the Linux implementation: it's essentially shmem ... but we
>>>> don't have an fd exposed, so we cannot use fallocate() ... :)
>>>>
>>>> MADV_REMOVE documents (man):
>>>>
>>>> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but
>>>> since Linux 3.5, any filesystem which supports the fallocate(2)
>>>> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE."
>>>
>>> Hmm, I see that MADV_DONTNEED will still tear down all mappings even for
>>> anonymous shmem.. what did I miss?
>>
>> Where did you see that?
> 
> I see madvise_dontneed_free() calls zap_page_range().
> 
>>
>>>
>>
>> MADV_DONTNEED only invalidates private copies in the pagecache. It's
>> essentially useless for any kind of shared mappings.

Let me rephrase because it was wrong: MADV_DONTNEED invalidates private 
COW pages referenced in the page tables :)

> 
> Since it's about zapping page tables, then I don't understand why it won't work
> for shmem..

It zaps the page tables but the shmem pages are still referenced (in the 
pagecache AFAIU). On next user space access, you would fill the page 
tables with the previous content.

That's why MADV_DONTNEED works properly on private anonymous memory, but 
not on shared anonymous memory - the only valid references are in the 
page tables in case of private mappings (well, unless we have other 
references like GUP etc.).


I did wonder, however, if there is benefit in doing both:

MADV_REMOVE followed by MADV_DONTNEED or the other way around. Like, 
will the extra MADV_DONTNEED also remove page tables and not just 
invalidate/zap the entries. Doesn't make a difference 
functionality-wise, but memory-consumption-wise.

I'll still have to have a look.

-- 
Thanks,

David / dhildenb


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by Peter Xu 4 years, 8 months ago
On Thu, Mar 11, 2021 at 06:41:29PM +0100, David Hildenbrand wrote:
> It zaps the page tables but the shmem pages are still referenced (in the
> pagecache AFAIU). On next user space access, you would fill the page tables
> with the previous content.
> 
> That's why MADV_DONTNEED works properly on private anonymous memory, but not
> on shared anonymous memory - the only valid references are in the page
> tables in case of private mappings (well, unless we have other references
> like GUP etc.).

For some reason I thought anonymous shared memory could do auto-recycle, but
after a second thought what you said makes perfect sense.

> 
> 
> I did wonder, however, if there is benefit in doing both:
> 
> MADV_REMOVE followed by MADV_DONTNEED or the other way around. Like, will
> the extra MADV_DONTNEED also remove page tables and not just invalidate/zap
> the entries. Doesn't make a difference functionality-wise, but
> memory-consumption-wise.
> 
> I'll still have to have a look.

I saw your other email - that'll be another topic of course.  For now I believe
it's not necessary, and your current patch looks valid.

I just hope when qemu decides to disgard the range, we're sure the rdma
mremap() region have been unmaped - iiuc that's the only use case of that.
Otherwise data would corrupt.

-- 
Peter Xu


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by David Hildenbrand 4 years, 8 months ago
On 11.03.21 18:15, David Hildenbrand wrote:
> On 11.03.21 18:11, Peter Xu wrote:
>> On Thu, Mar 11, 2021 at 05:45:46PM +0100, David Hildenbrand wrote:
>>> On 11.03.21 17:39, Dr. David Alan Gilbert wrote:
>>>> * David Hildenbrand (david@redhat.com) wrote:
>>>>> We can create shared anonymous memory via
>>>>>        "-object memory-backend-ram,share=on,..."
>>>>> which is, for example, required by PVRDMA for mremap() to work.
>>>>>
>>>>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
>>>>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
>>>>
>>>> OK, I wonder how stable these rules are; is it defined anywhere that
>>>> it's required?
>>>>
>>>
>>> I had a look at the Linux implementation: it's essentially shmem ... but we
>>> don't have an fd exposed, so we cannot use fallocate() ... :)
>>>
>>> MADV_REMOVE documents (man):
>>>
>>> "In the initial implementation, only tmpfs(5) was supported MADV_REMOVE; but
>>> since Linux 3.5, any filesystem which supports the fallocate(2)
>>> FALLOC_FL_PUNCH_HOLE mode also supports MADV_REMOVE."
>>
>> Hmm, I see that MADV_DONTNEED will still tear down all mappings even for
>> anonymous shmem.. what did I miss?
> 
> Where did you see that?
> 
>>
> 
> MADV_DONTNEED only invalidates private copies in the pagecache. It's
> essentially useless for any kind of shared mappings.


And to clarify, I think what you see is that the mapping gets torn down, 
but not the backend storage released/freed.

Removing the backend storage (MADV_REMOVE/fallocate()) will implicitly 
tear down the mapping from what I can tell (and what my experiments show).

-- 
Thanks,

David / dhildenb


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by Peter Xu 4 years, 8 months ago
On Mon, Mar 08, 2021 at 04:05:50PM +0100, David Hildenbrand wrote:
> We can create shared anonymous memory via
>     "-object memory-backend-ram,share=on,..."
> which is, for example, required by PVRDMA for mremap() to work.
> 
> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
> 
> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")

I'm thinking whether we should keep this fixes - it's valid, however it could
unveil issues if those remapped ranges didn't get unmapped in time.  After all
"not releasing some memory existed" seems not a huge deal for stable.  No
strong opinion, just raise it up as a pure question.

> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


Re: [PATCH v3 02/12] softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory
Posted by David Hildenbrand 4 years, 8 months ago
On 11.03.21 22:37, Peter Xu wrote:
> On Mon, Mar 08, 2021 at 04:05:50PM +0100, David Hildenbrand wrote:
>> We can create shared anonymous memory via
>>      "-object memory-backend-ram,share=on,..."
>> which is, for example, required by PVRDMA for mremap() to work.
>>
>> Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
>> have to use MADV_REMOVE. MADV_DONTNEED fails silently and does nothing.
>>
>> Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
> 
> I'm thinking whether we should keep this fixes - it's valid, however it could
> unveil issues if those remapped ranges didn't get unmapped in time.  After all
> "not releasing some memory existed" seems not a huge deal for stable.  No
> strong opinion, just raise it up as a pure question.
> 

If someone would be using it along with postcopy (which should work 
apart from that issue) you could be in trouble. That's why i think at 
least the Fixes: tag is valid. CC: stable might be debatable indeed.

>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 

Thanks!

-- 
Thanks,

David / dhildenb