[PATCH v4 0/4] overcommit: introduce mem-lock-onfault

Daniil Tatianin posted 4 patches 2 months, 1 week ago
There is a newer version of this series
Posted by Daniil Tatianin 2 months, 1 week ago
Currently, passing mem-lock=on to QEMU causes memory usage to increase
dramatically (RSS values below are in KiB, as reported by ps):

no memlock:
    $ ./qemu-system-x86_64 -overcommit mem-lock=off
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    45652

    $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    39756

memlock:
    $ ./qemu-system-x86_64 -overcommit mem-lock=on
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    1309876

    $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    259956

This happens because mlockall(2) with MCL_CURRENT | MCL_FUTURE
immediately write-faults every existing and future anonymous mapping
in the process.
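To observe the eager population directly, the sketch below locks the address space with MCL_CURRENT | MCL_FUTURE, maps a fresh anonymous region, and counts resident pages with mincore(2) before anything is written to it. It is a Linux-only illustration, not QEMU code; the helper names are hypothetical, and mlockall() may be refused when RLIMIT_MEMLOCK is low, in which case the helper returns -1:

```c
#define _GNU_SOURCE   /* for MAP_ANONYMOUS and mincore() on glibc */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len) via mincore(2); -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages ? npages : 1);
    long resident = 0;

    if (vec == NULL || mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        resident += vec[i] & 1;   /* bit 0 set = page is resident */
    }
    free(vec);
    return resident;
}

/*
 * Call mlockall(flags), create a fresh private anonymous mapping of npages
 * pages, and return how many are resident before the program writes to
 * them. Returns -1 if mlockall() or mmap() is refused, e.g. because
 * RLIMIT_MEMLOCK is too small. The lock is dropped again before returning.
 */
static long resident_after_mlockall(int flags, size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    long resident = -1;

    if (mlockall(flags) != 0) {
        return -1;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        /* With MCL_FUTURE (and no MCL_ONFAULT), Linux populates the new
         * mapping at mmap() time, so this is expected to equal npages. */
        resident = count_resident_pages(p, len);
        munmap(p, len);
    }
    munlockall();
    return resident;
}
```

On a typical Linux system, resident_after_mlockall(MCL_CURRENT | MCL_FUTURE, 16) is expected to report all 16 pages resident even though none were touched.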

One of the reasons to enable mem-lock is to protect a QEMU process's
pages from being compacted and migrated by kcompactd, which rewrites a
live process's page tables, triggering thousands of TLB-flush IPIs per
second and effectively stealing all guest time while it is active.

mem-lock=on helps against this (provided vm.compact_unevictable_allowed
is 0), but the memory overhead it introduces is an undesirable side
effect. We can avoid almost all of that overhead by passing MCL_ONFAULT
to mlockall(2), which this series makes possible through a new mem-lock
value, on-fault.
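The protection above relies on compaction not being allowed to migrate mlocked (unevictable) pages. A minimal sketch of a startup check, assuming Linux procfs; the helper name is hypothetical and not part of this series. The sysctl typically defaults to 1 and can be changed with `sysctl -w vm.compact_unevictable_allowed=0`:

```c
#include <stdio.h>

/*
 * Read /proc/sys/vm/compact_unevictable_allowed (Linux-specific).
 * Returns 0 or 1 on success, -1 if the file is missing or unreadable
 * (for example on a kernel built without compaction support).
 */
static int compact_unevictable_allowed(void)
{
    FILE *f = fopen("/proc/sys/vm/compact_unevictable_allowed", "r");
    int value = -1;

    if (f == NULL) {
        return -1;
    }
    if (fscanf(f, "%d", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}
```

A launcher could warn when this returns 1 and mem-lock is requested, since compaction may then still migrate the locked guest pages.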

memlock-onfault:
    $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    54004

    $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    47772
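Adding MCL_ONFAULT to the same kind of check illustrates why the on-fault numbers stay close to the mem-lock=off baseline: fresh anonymous pages are locked only once they are first touched. This is a Linux-only sketch with hypothetical helper names, not QEMU code; it returns -1 when the headers lack MCL_ONFAULT or the lock is refused:

```c
#define _GNU_SOURCE   /* for MAP_ANONYMOUS and mincore() on glibc */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len) via mincore(2); -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages ? npages : 1);
    long resident = 0;

    if (vec == NULL || mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++) {
        resident += vec[i] & 1;   /* bit 0 set = page is resident */
    }
    free(vec);
    return resident;
}

/*
 * Lock the address space with MCL_ONFAULT, map npages fresh anonymous
 * pages, and record residency before and after writing to the first
 * `touch` pages. Returns 0 on success, -1 if MCL_ONFAULT is not defined
 * by the headers or if mlockall()/mmap() is refused.
 */
static int onfault_residency(size_t npages, size_t touch,
                             long *before, long *after)
{
#ifndef MCL_ONFAULT
    (void)npages; (void)touch; (void)before; (void)after;
    return -1;
#else
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    int ret = -1;

    if (touch > npages ||
        mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) != 0) {
        return -1;
    }
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        *before = count_resident_pages(p, len); /* expected: 0 on Linux */
        for (size_t i = 0; i < touch; i++) {
            p[i * page] = 1;                    /* write-fault page i */
        }
        *after = count_resident_pages(p, len);  /* expected: touch */
        munmap(p, len);
        ret = (*before >= 0 && *after >= 0) ? 0 : -1;
    }
    munlockall();
    munlockall();
    return ret;
#endif
}
```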

You may notice that memory usage is still slightly higher than in the
mem-lock=off case, by a few megabytes (roughly 8 MiB in both runs above).
I traced this to a bug in the Linux kernel: MCL_ONFAULT is not honored
for the early process heap (set up via brk(2) etc.), so that heap is
still write-faulted. Even so, the overhead is far smaller than with plain
mem-lock=on.
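The RSS figures above were collected externally with ps(1). For scripted before/after comparisons, a process can read the corresponding counter itself: on Linux, /proc/self/status contains a VmRSS line in kB, which should agree closely with what `ps -o rss=` prints. A minimal sketch; the function name is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the calling process's resident set size in kB, parsed from the
 * "VmRSS:" line of /proc/self/status (Linux-specific), or -1 on error.
 */
static long read_vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL) {
        return -1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* Line looks like "VmRSS:     1234 kB"; %ld skips the blanks. */
            if (sscanf(line + 6, "%ld", &kb) != 1) {
                kb = -1;
            }
            break;
        }
    }
    fclose(f);
    return kb;
}
```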

Changes since v1:
    - Instead of a separate mem-lock-onfault option, add an on-fault value to mem-lock

Changes since v2:
    - Move overcommit option parsing out of line
    - Make enable_mlock an enum instead

Changes since v3:
    - Rebase onto latest master to pick up the recent sysemu -> system renames
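The v1 and v2 changes above (an on-fault value for mem-lock, and enable_mlock becoming an enum) can be sketched roughly as follows. The names MlockMode, parse_mem_lock() and mlockall_flags() are hypothetical; the series' actual MlockState enum and QEMU's option parsing may be structured differently. The sketch only shows how each mem-lock value could map onto mlockall(2) flags, with a compile-time fallback when the headers lack MCL_ONFAULT:

```c
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the series' MlockState enum. */
typedef enum {
    MLOCK_OFF,
    MLOCK_ON,
    MLOCK_ON_FAULT,
} MlockMode;

/* Map a "-overcommit mem-lock=" value to a mode. Returns 0 on success. */
static int parse_mem_lock(const char *value, MlockMode *mode)
{
    if (strcmp(value, "off") == 0) {
        *mode = MLOCK_OFF;
    } else if (strcmp(value, "on") == 0) {
        *mode = MLOCK_ON;
    } else if (strcmp(value, "on-fault") == 0) {
        *mode = MLOCK_ON_FAULT;
    } else {
        return -1;   /* unknown value */
    }
    return 0;
}

/* Flags to pass to mlockall(2): 0 means "don't lock", -1 "unsupported". */
static int mlockall_flags(MlockMode mode)
{
    switch (mode) {
    case MLOCK_ON:
        return MCL_CURRENT | MCL_FUTURE;
    case MLOCK_ON_FAULT:
#ifdef MCL_ONFAULT
        return MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT;
#else
        return -1;   /* headers do not provide MCL_ONFAULT */
#endif
    case MLOCK_OFF:
    default:
        return 0;
    }
}
```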

Daniil Tatianin (4):
  os: add an ability to lock memory on_fault
  system/vl: extract overcommit option parsing into a helper
  system: introduce a new MlockState enum
  overcommit: introduce mem-lock=on-fault

 hw/virtio/virtio-mem.c    |  2 +-
 include/system/os-posix.h |  2 +-
 include/system/os-win32.h |  3 ++-
 include/system/system.h   | 12 ++++++++-
 migration/postcopy-ram.c  |  4 +--
 os-posix.c                | 10 ++++++--
 qemu-options.hx           | 14 +++++++----
 system/globals.c          | 12 ++++++++-
 system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
 9 files changed, 87 insertions(+), 24 deletions(-)

-- 
2.34.1
Re: [PATCH v4 0/4] overcommit: introduce mem-lock-onfault
Posted by Peter Xu 2 months, 1 week ago
On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
> [...]

Considering it's a very memory-relevant change and looks pretty benign, I
can pick this up if nobody disagrees (or beats me to it, which I'd
appreciate).

I'll also allow at least one week for people to stop me.

Thanks,

-- 
Peter Xu
Re: [PATCH v4 0/4] overcommit: introduce mem-lock-onfault
Posted by Daniil Tatianin 1 month, 4 weeks ago
On 1/23/25 7:31 PM, Peter Xu wrote:
> On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
>> [...]
> Considering it's very mem relevant change and looks pretty benign.. I can
> pick this if nobody disagrees (or beats me to it, which I'd appreciate).
>
> I'll also provide at least one week for people to stop me.

I think it's been almost two weeks, so it should be good now :)

Thanks!

> Thanks,
>
Re: [PATCH v4 0/4] overcommit: introduce mem-lock-onfault
Posted by Peter Xu 1 month, 4 weeks ago
On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
> 
> On 1/23/25 7:31 PM, Peter Xu wrote:
> > On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
> > > [...]
> > Considering it's very mem relevant change and looks pretty benign.. I can
> > pick this if nobody disagrees (or beats me to it, which I'd appreciate).
> > 
> > I'll also provide at least one week for people to stop me.
> 
> I think it's been almost two weeks, so should be good now :)

Don't worry, this is on track. I'll probably send it in a few days.

Thanks,

-- 
Peter Xu
Re: [PATCH v4 0/4] overcommit: introduce mem-lock-onfault
Posted by Daniil Tatianin 1 month, 4 weeks ago
On 2/4/25 5:47 PM, Peter Xu wrote:

> On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
>> On 1/23/25 7:31 PM, Peter Xu wrote:
>>> On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
>>>> [...]
>>> Considering it's very mem relevant change and looks pretty benign.. I can
>>> pick this if nobody disagrees (or beats me to it, which I'd appreciate).
>>>
>>> I'll also provide at least one week for people to stop me.
>> I think it's been almost two weeks, so should be good now :)
> Don't worry, this is in track.  I'll send it maybe in a few days.
>
> Thanks,

Amazing, thank you!