[PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()

David Hildenbrand posted 3 patches 2 years, 9 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20210714112306.67793-1-david@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
include/qemu/osdep.h |   7 ++
util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
2 files changed, 126 insertions(+), 48 deletions(-)
[PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
Posted by David Hildenbrand 2 years, 9 months ago
#1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
global variables and prepare for concurrency and #3 makes os_mem_prealloc()
safe to be called from multiple threads concurrently.

Details regarding MADV_POPULATE_WRITE can be found in the upstream Linux
commit that introduced it, 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables"), and in the latest man
page patch [1].

[1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com
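
For illustration only, the populate-with-fallback idea behind #1 might look
roughly like the sketch below. It is not the code from this series:
prealloc_range() is a hypothetical name, the fallback value 23 is taken from
the Linux UAPI headers for libc versions that do not define
MADV_POPULATE_WRITE yet, and treating EINVAL as "not supported by this
kernel" is an assumption of the sketch. SIGBUS handling and multi-threaded
population are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack the constant; 23 is the Linux UAPI value. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper, not QEMU's os_mem_prealloc(): prefault the
 * page-aligned range [addr, addr + len) writable.
 * Returns 0 on success or a negative errno value on failure.
 */
int prealloc_range(void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);

    assert(page_size > 0);

    /* Preferred path: let the kernel populate the page tables. */
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0) {
        return 0;
    }
    if (errno != EINVAL) {
        return -errno;   /* e.g. ENOMEM: report instead of crashing later */
    }

    /*
     * Assumption: EINVAL means the kernel does not know the advice value.
     * Fall back to touching one byte per page. Reading and writing back
     * the same value keeps any existing page content intact.
     */
    for (size_t off = 0; off < len; off += (size_t)page_size) {
        volatile char *p = (volatile char *)addr + off;
        *p = *p;
    }
    return 0;
}
```

A caller would pass a mapping obtained from mmap() and check the return
value; with the madvise() path a lack of memory shows up as an error code
rather than as a signal while touching pages.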

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

David Hildenbrand (3):
  util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
  util/oslib-posix: Introduce and use MemsetContext for
    touch_all_pages()
  util/oslib-posix: Support concurrent os_mem_prealloc() invocation

 include/qemu/osdep.h |   7 ++
 util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
 2 files changed, 126 insertions(+), 48 deletions(-)

-- 
2.31.1


Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
Posted by Pankaj Gupta 2 years, 9 months ago
> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> safe to be called from multiple threads concurrently.
>
> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> Linux commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> page patch [1].
>
> [1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Marek Kedzierski <mkedzier@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>
> David Hildenbrand (3):
>   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
>   util/oslib-posix: Introduce and use MemsetContext for
>     touch_all_pages()
>   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
>
>  include/qemu/osdep.h |   7 ++
>  util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
>  2 files changed, 126 insertions(+), 48 deletions(-)
>

Nice implementation. For the prealloc case, it avoids wear on the memory
device, avoids touching all of the memory, and avoids an abrupt exit of the
VM due to lack of memory. Populating the page tables with madvise is the
better way.

The plan is to use this infrastructure for virtio-mem, I guess?

For patches 1 & 3:
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>


Thanks,
Pankaj

Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
Posted by Pankaj Gupta 2 years, 9 months ago
> > #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> > global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> > safe to be called from multiple threads concurrently.
> >
> > Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> > Linux commit 4ca9b3859dac ("mm/madvise: introduce
> > MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> > page patch [1].
> >
> > [1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: Marek Kedzierski <mkedzier@redhat.com>
> > Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >
> > David Hildenbrand (3):
> >   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
> >   util/oslib-posix: Introduce and use MemsetContext for
> >     touch_all_pages()
> >   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
> >
> >  include/qemu/osdep.h |   7 ++
> >  util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
> >  2 files changed, 126 insertions(+), 48 deletions(-)
> >
>
> Nice implementation to avoid wear of memory device for prealloc case

For the prealloc case, I mean.

> and to avoid touching of
> all the memory and abrupt exit of VM because of lack of memory. Instead better
> way to populate the page tables with madvise.
>
> Plan is to use this infrastructure for virtio-mem, I guess?
>
> For the patches 1 & 3:
> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
>
>
> Thanks,
> Pankaj

Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
Posted by Daniel P. Berrangé 2 years, 9 months ago
On Wed, Jul 14, 2021 at 01:23:03PM +0200, David Hildenbrand wrote:
> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> safe to be called from multiple threads concurrently.
> 
> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> Linux commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> page patch [1].

Looking at that commit message, I see your caveat about POPULATE_WRITE
used together with shared file mappings, causing an undesirable glut
of dirty pages that needs to be flushed back to the underlying storage.

Is this something we need to be concerned with for the hostmem-file.c
implementation? While it is mostly used to point to files on tmpfs
or hugetlbfs, I think users do sometimes point it to a plain file
on a normal filesystem. So will we need to optimize to use the
fallocate+POPULATE_READ combination at some point?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH v1 0/3] util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
Posted by David Hildenbrand 2 years, 9 months ago
On 20.07.21 16:45, Daniel P. Berrangé wrote:
> On Wed, Jul 14, 2021 at 01:23:03PM +0200, David Hildenbrand wrote:
>> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
>> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
>> safe to be called from multiple threads concurrently.
>>
>> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
>> Linux commit 4ca9b3859dac ("mm/madvise: introduce
>> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
>> page patch [1].
> 
> Looking at that commit message, I see your caveat about POPULATE_WRITE
> used together with shared file mappings, causing an undesirable glut
> of dirty pages that needs to be flushed back to the underlying storage.
> 
> Is this something we need to be concerned with for the hostmem-file.c
> implementation? While it is mostly used to point to files on tmpfs
> or hugetlbfs, I think users do sometimes point it to a plain file
> on a normal filesystem. So will we need to optimize to use the
> fallocate+POPULATE_READ combination at some point?

In the future, it might make sense to use fallocate() only when it comes 
to shared file mappings.

AFAIKS os_mem_prealloc() currently serves the following purposes:

1) Preallocate anonymous memory or backend storage (file, hugetlbfs, ...)
2) Apply mbind() policy, preallocating it from the right node when 
applicable.
3) Prefault page tables

For shared mappings, it's a little bit difficult, though: mbind() does
not seem to work on shared mappings (which to some degree makes
logical sense, but I don't think QEMU users are aware that it is like
that): "The specified policy will be ignored for any MAP_SHARED
mappings in the specified memory range. Rather the pages will be
allocated according to the memory policy of the thread that caused the
page to be allocated. Again, this may not be the thread that called
mbind()."

So 2) does not apply. A simple fallocate() can get 1) done more efficiently.

So whether we want to use MADV_POPULATE_READ depends entirely on whether we
want 3). It can make sense to prefault page tables for RT workloads;
however, there is usually nothing stopping the OS from clearing the page
cache and requiring a refault later -- except with mlock.

So whether we want fallocate() or fallocate()+MADV_POPULATE_READ for 
shared file mappings really depends on the use case, and on the system 
setup. If the system won't immediately free up the page cache and undo 
what MADV_POPULATE_READ did, it might make sense to use it.
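
To make the two options concrete, here is a minimal sketch of
"preallocate with fallocate, optionally prefault with MADV_POPULATE_READ"
for a shared file mapping. It is not QEMU code: map_shared_prealloc() and
its prefault flag are hypothetical, posix_fallocate() stands in for
fallocate() for portability, the value 22 is the Linux UAPI value of
MADV_POPULATE_READ for libc headers that lack it, and a kernel that rejects
the advice with EINVAL is simply treated as "no prefault available".

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Older libc headers may lack the constant; 22 is the Linux UAPI value. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/*
 * Hypothetical helper: open (or create) path, preallocate len bytes of
 * backing storage, map the file MAP_SHARED and, if prefault is non-zero,
 * prefault the page tables without dirtying any page.
 * Returns the mapping, or MAP_FAILED with errno set.
 */
void *map_shared_prealloc(const char *path, size_t len, int prefault)
{
    void *addr = MAP_FAILED;
    int saved_errno;
    int fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return MAP_FAILED;
    }

    /* Step 1: allocate the storage itself; no page is written via the mapping. */
    saved_errno = posix_fallocate(fd, 0, (off_t)len);
    if (saved_errno == 0) {
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        saved_errno = errno;
    }
    close(fd);   /* the mapping keeps the file referenced */

    if (addr == MAP_FAILED) {
        errno = saved_errno;
        return MAP_FAILED;
    }

    /* Step 2 (optional): prefault page tables; EINVAL means unsupported. */
    if (prefault && madvise(addr, len, MADV_POPULATE_READ) != 0 &&
        errno != EINVAL) {
        saved_errno = errno;
        munmap(addr, len);
        errno = saved_errno;
        return MAP_FAILED;
    }
    return addr;
}
```

Calling it with prefault set to 0 corresponds to the plain fallocate()
variant; whether the extra prefault step pays off depends on the points
above, since the page cache can still be reclaimed afterwards unless the
memory is locked.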

Long story short: it's complicated :)

-- 
Thanks,

David / dhildenb