#1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
global variables and prepare for concurrency, and #3 makes os_mem_prealloc()
safe to be called from multiple threads concurrently.

Details regarding MADV_POPULATE_WRITE can be found in the upstream Linux
commit that introduced it, 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables"), and in the latest
man page patch [1].

[1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Marek Kedzierski <mkedzier@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

David Hildenbrand (3):
  util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
  util/oslib-posix: Introduce and use MemsetContext for
    touch_all_pages()
  util/oslib-posix: Support concurrent os_mem_prealloc() invocation

 include/qemu/osdep.h |   7 ++
 util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
 2 files changed, 126 insertions(+), 48 deletions(-)

-- 
2.31.1
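The cover letter combines two ideas: preallocate with MADV_POPULATE_WRITE when the kernel supports it, and keep all state in a per-call context instead of globals. A minimal sketch of both follows. It is not QEMU's code. `PreallocCtx`, `touch_pages()` and `prealloc_area()` are hypothetical names; QEMU's own structure is `MemsetContext`. The sketch makes two assumptions. First, EINVAL from madvise() means the kernel is older than Linux 5.14. Second, the fallback constant 23 is the asm-generic value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack the constant; 23 is the value in Linux's
 * asm-generic/mman-common.h (assumption: a generic-ABI architecture). */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Hypothetical per-call context: all state lives here instead of in
 * file-scope globals, so concurrent callers never share results. */
typedef struct PreallocCtx {
    char *area;         /* start of the mapping, page aligned */
    size_t size;        /* length in bytes, a multiple of pagesize */
    size_t pagesize;    /* 0 means "use the system page size" */
    bool used_madvise;  /* true if MADV_POPULATE_WRITE did the work */
} PreallocCtx;

/* Fallback: read and write back one byte per page. The write fault makes
 * the kernel allocate the page; writing the same value keeps contents. */
static void touch_pages(PreallocCtx *ctx)
{
    for (size_t off = 0; off < ctx->size; off += ctx->pagesize) {
        volatile char *p = ctx->area + off;
        *p = *p;
    }
}

/* Returns 0 on success or a negative errno value on failure. */
static int prealloc_area(PreallocCtx *ctx)
{
    if (ctx->pagesize == 0) {
        ctx->pagesize = (size_t)sysconf(_SC_PAGESIZE);
    }
    assert(ctx->pagesize > 0 && (ctx->pagesize & (ctx->pagesize - 1)) == 0);

    if (madvise(ctx->area, ctx->size, MADV_POPULATE_WRITE) == 0) {
        ctx->used_madvise = true;
        return 0;
    }
    if (errno != EINVAL) {
        /* e.g. ENOMEM: report it to the caller instead of dying on a fault */
        return -errno;
    }
    /* EINVAL: assume the kernel predates MADV_POPULATE_WRITE */
    ctx->used_madvise = false;
    touch_pages(ctx);
    return 0;
}
```

The madvise() path returns an error code when memory runs out. The page-touching fallback instead takes the fault directly. That difference is part of why the madvise() path is preferred when the kernel offers it.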
> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> safe to be called from multiple threads concurrently.
>
> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> Linux commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> page patch [1].
>
> [1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Marek Kedzierski <mkedzier@redhat.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>
> David Hildenbrand (3):
>   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
>   util/oslib-posix: Introduce and use MemsetContext for
>     touch_all_pages()
>   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
>
>  include/qemu/osdep.h |   7 ++
>  util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
>  2 files changed, 126 insertions(+), 48 deletions(-)
>

Nice implementation: it avoids wear of the memory device in the prealloc
case, and it avoids touching all of the memory and an abrupt exit of the VM
because of lack of memory. Populating the page tables with madvise() is the
better way.

The plan is to use this infrastructure for virtio-mem, I guess?

For patches 1 & 3:
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>

Thanks,
Pankaj
> > #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> > global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> > safe to be called from multiple threads concurrently.
> >
> > Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> > Linux commit 4ca9b3859dac ("mm/madvise: introduce
> > MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> > page patch [1].
> >
> > [1] https://lkml.kernel.org/r/20210712083917.16361-1-david@redhat.com
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Michael S. Tsirkin" <mst@redhat.com>
> > Cc: Igor Mammedov <imammedo@redhat.com>
> > Cc: Eduardo Habkost <ehabkost@redhat.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: Marek Kedzierski <mkedzier@redhat.com>
> > Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >
> > David Hildenbrand (3):
> >   util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()
> >   util/oslib-posix: Introduce and use MemsetContext for
> >     touch_all_pages()
> >   util/oslib-posix: Support concurrent os_mem_prealloc() invocation
> >
> >  include/qemu/osdep.h |   7 ++
> >  util/oslib-posix.c   | 167 ++++++++++++++++++++++++++++++-------------
> >  2 files changed, 126 insertions(+), 48 deletions(-)
> >
>
> Nice implementation to avoid wear of memory device for prealloc case

I mean for the prealloc case.

> and to avoid touching of
> all the memory and abrupt exit of VM because of lack of memory. Instead better
> way to populate the page tables with madvise.
>
> Plan is to use this infrastructure for virtio-mem, I guess?
>
> For the patches 1 & 3:
> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
>
>
> Thanks,
> Pankaj
On Wed, Jul 14, 2021 at 01:23:03PM +0200, David Hildenbrand wrote:
> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
> safe to be called from multiple threads concurrently.
>
> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
> Linux commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
> page patch [1].

Looking at that commit message, I see your caveat about MADV_POPULATE_WRITE
used together with shared file mappings: it causes an undesirable glut of
dirty pages that need to be flushed back to the underlying storage.

Is this something we need to be concerned with for the hostmem-file.c
implementation? While it is mostly used to point to files on tmpfs
or hugetlbfs, I think users do sometimes point it to a plain file
on a normal filesystem. So will we need to optimize to use the
fallocate+MADV_POPULATE_READ combination at some point?

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 20.07.21 16:45, Daniel P. Berrangé wrote:
> On Wed, Jul 14, 2021 at 01:23:03PM +0200, David Hildenbrand wrote:
>> #1 adds support for MADV_POPULATE_WRITE, #2 cleans up the code to avoid
>> global variables and prepare for concurrency and #3 makes os_mem_prealloc()
>> safe to be called from multiple threads concurrently.
>>
>> Details regarding MADV_POPULATE_WRITE can be found in introducing upstream
>> Linux commit 4ca9b3859dac ("mm/madvise: introduce
>> MADV_POPULATE_(READ|WRITE) to prefault page tables") and in the latest man
>> page patch [1].
>
> Looking at that commit message, I see your caveat about POPULATE_WRITE
> used together with shared file mappings, causing an undesirable glut
> of dirty pages that needs to be flushed back to the underlying storage.
>
> Is this something we need to be concerned with for the hostmem-file.c
> implementation ? While it is mostly used to point to files on tmpfs
> or hugetlbfs, I think users do something point it to a plain file
> on a normal filesystem. So will we need to optimize to use the
> fallocate+POPULATE_READ combination at some point ?

In the future, it might make sense to use only fallocate() for shared file
mappings. AFAICS, os_mem_prealloc() currently serves the following purposes:

1) Preallocate anonymous memory or backend storage (file, hugetlbfs, ...)
2) Apply the mbind() policy, preallocating memory from the right node when
   applicable
3) Prefault page tables

For shared mappings, it's a little bit difficult, though. mbind() does not
seem to work on shared mappings. To some degree that makes sense logically,
but I don't think QEMU users are aware of it. The man page says:

"The specified policy will be ignored for any MAP_SHARED mappings in the
specified memory range. Rather the pages will be allocated according to
the memory policy of the thread that caused the page to be allocated.
Again, this may not be the thread that called mbind()."

So 2) does not apply.
A simple fallocate() can get 1) done more efficiently. So whether we want to
use MADV_POPULATE_READ depends entirely on whether we want 3). It can make
sense to prefault page tables for RT workloads. However, there is usually
nothing stopping the OS from clearing the page cache and requiring a refault
later, except with mlock().

So whether we want fallocate() or fallocate()+MADV_POPULATE_READ for shared
file mappings really depends on the use case and on the system setup. If the
system won't immediately free up the page cache and undo what
MADV_POPULATE_READ did, it might make sense to use it.

Long story short: it's complicated :)

-- 
Thanks,

David / dhildenb
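For shared file mappings, David weighs two options: fallocate() alone, or fallocate() followed by MADV_POPULATE_READ. The hypothetical helper below sketches that choice. `prealloc_shared_file()` is not a QEMU function. The sketch assumes Linux with MADV_POPULATE_READ (5.14+); on an older kernel the prefault step is skipped. Whether to prefault at all is left to the caller, because the thread says it depends on the use case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* 22 is the asm-generic value (assumption, as for MADV_POPULATE_WRITE). */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/*
 * Hypothetical helper for a MAP_SHARED file mapping of `len` bytes at
 * `area`, backed by `fd` starting at file offset `offset`.
 *  - Purpose 1) (allocate backend storage): fallocate() allocates storage
 *    directly instead of write-faulting every page through the mapping.
 *  - Purpose 2) (mbind() policy) is skipped: it is ignored for MAP_SHARED.
 *  - Purpose 3) (prefault page tables) is optional, since the page cache
 *    may be reclaimed later anyway unless the range is mlock()ed.
 * Returns 0 on success or a negative errno value.
 */
static int prealloc_shared_file(int fd, off_t offset, void *area, size_t len,
                                bool prefault)
{
    assert(area != NULL && len > 0);

    /* mode 0: allocate the range and extend the file size if needed */
    if (fallocate(fd, 0, offset, (off_t)len) != 0) {
        return -errno;
    }
    if (prefault && madvise(area, len, MADV_POPULATE_READ) != 0) {
        /* EINVAL: assume an older kernel; storage is still allocated */
        if (errno != EINVAL) {
            return -errno;
        }
    }
    return 0;
}
```

MADV_POPULATE_READ maps the pages without write-faulting them. That avoids the dirty-page glut Daniel raised with MADV_POPULATE_WRITE on shared file mappings.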