[PATCH 0/2] KVM: x86/mmu: Repurpose MMU shrinker into page cache shrinker

Vipin Sharma posted 2 patches 2 months, 2 weeks ago
There is a newer version of this series
arch/x86/include/asm/kvm_host.h |   7 +-
arch/x86/kvm/mmu/mmu.c          | 139 +++++++++++++-------------------
arch/x86/kvm/mmu/paging_tmpl.h  |  14 ++--
include/linux/kvm_host.h        |   1 +
virt/kvm/kvm_main.c             |   8 +-
5 files changed, 78 insertions(+), 91 deletions(-)
[PATCH 0/2] KVM: x86/mmu: Repurpose MMU shrinker into page cache shrinker
Posted by Vipin Sharma 2 months, 2 weeks ago
This series is extracted out from the NUMA aware page table series[1].
MMU shrinker changes were in patches 1 to 9 in the old series.

This series is changing KVM MMU shrinker behaviour by emptying MMU page
caches which are used during page fault and MMU load operations. It also
incorporates feedback from the NUMA aware page table series[1] regarding
MMU shrinker.

KVM MMU shrinker has not been very effective in alleviating pain under
memory pressure. It frees up the pages actively being used which results
in VM degradation. VM will take fault and bring them again in page
tables. More discussions happened at [2]. Overall, consensus was to
reprupose it into the code which frees pages from KVM MMU page caches.

Recently [3], there was a discussion to disable shrinker for TDP MMU.
Revival of this series is result of that discussion.

There are two major differences from the old series.
1. There is no global accounting of cache pages. It is dynamically
   calculated in mmu_shrink_count(). This has two effects; i) counting will
   be inaccurate but code is much simpler, and ii) kvm_lock being used
   here, this should be fine as mmu_shrink_scan() also holds the lock
   for its operation.
2. Only empty mmu_shadow_page_cache and mmu_shadowed_info_cache. This
   version doesn't empty split_shadow_page_cache as it is used only
   during dirty logging operation and is one per VM unlike other two
   which are per vCPU. I am not fully convinced that adding it is needed
   as it will add the cost of adding one more mutex and synchronizing it
   in shrinker. Also, if a VM is being dirty tracked most likely it will
   be migrated (memory pressure might be the reason in the first place)
   so better to not hinder migration effort and let vCPUs free up their
   caches. If someone convinces me to add split cache as well then I can
   send a separate patch to add that as well.

[1] https://lore.kernel.org/kvm/20230306224127.1689967-1-vipinsh@google.com/
[2] https://lore.kernel.org/lkml/Y45dldZnI6OIf+a5@google.com/
[3] https://lore.kernel.org/kvm/20240819214014.GA2313467.vipinsh@google.com/#t


v1:
- No global counting of pages in cache. As this number might not remain
  same between calls of mmu_shrink_count() and mmu_shrink_scan().
- Count cache pages in mmu_shrink_count(). KVM can tolerate inaccuracy
  here.
- Empty mmu_shadow_page_cache and mmu_shadowed_info_cache only. Don't
  empty split_shadow_page_cache.

v0: Patches 1-9 from NUMA aware page table series.
https://lore.kernel.org/kvm/20230306224127.1689967-1-vipinsh@google.com/

Vipin Sharma (2):
  KVM: x86/mmu: Change KVM mmu shrinker to no-op
  KVM: x86/mmu: Use MMU shrinker to shrink KVM MMU memory caches

 arch/x86/include/asm/kvm_host.h |   7 +-
 arch/x86/kvm/mmu/mmu.c          | 139 +++++++++++++-------------------
 arch/x86/kvm/mmu/paging_tmpl.h  |  14 ++--
 include/linux/kvm_host.h        |   1 +
 virt/kvm/kvm_main.c             |   8 +-
 5 files changed, 78 insertions(+), 91 deletions(-)


base-commit: 12680d7b8ac4db2eba6237a21a93d2b0e78a52a6
-- 
2.46.0.662.g92d0881bb0-goog
Re: [PATCH 0/2] KVM: x86/mmu: Repurpose MMU shrinker into page cache shrinker
Posted by David Matlack 2 months ago
On 2024-09-13 02:43 PM, Vipin Sharma wrote:
> This series is extracted out from the NUMA aware page table series[1].
> MMU shrinker changes were in patches 1 to 9 in the old series.

I'm curious how you tested this series. Would it be posisble to write a
selftest to exercise KVM's shrinker interactions? I don't think it needs
to be anything fancy to be useful (e.g. just run a VM, trigger lots of
shrinking, and make sure nothing blows up).

There appears to be a debugfs interface which could be used to trigger
shrinking from a selftest.

https://docs.kernel.org/admin-guide/mm/shrinker_debugfs.html
Re: [PATCH 0/2] KVM: x86/mmu: Repurpose MMU shrinker into page cache shrinker
Posted by Vipin Sharma 2 months ago
On Wed, Sep 25, 2024 at 4:51 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2024-09-13 02:43 PM, Vipin Sharma wrote:
> > This series is extracted out from the NUMA aware page table series[1].
> > MMU shrinker changes were in patches 1 to 9 in the old series.
>
> I'm curious how you tested this series. Would it be posisble to write a
> selftest to exercise KVM's shrinker interactions? I don't think it needs
> to be anything fancy to be useful (e.g. just run a VM, trigger lots of
> shrinking, and make sure nothing blows up).

My testing was dropping caches (echo 2 > /proc/sys/vm/drop_caches) in
background while running dirty_log_perf_test selftest multiple times.
I added printk in shrink_count() and shrink_scan() to make sure pages
are being reported and released.

I can write a test which can spawn a thread to drop caches and a VM
which touches all of its pages to generate page faults. Only downside
is it will not detect if KVM MMU shrinker is being invoked, counting
and freeing pages.

>
> There appears to be a debugfs interface which could be used to trigger
> shrinking from a selftest.
>
> https://docs.kernel.org/admin-guide/mm/shrinker_debugfs.html

This is interesting and it does what is needed to test KVM MMU
shrinker. However, this needs CONFIG_DEBUG_FS and
CONFIG_SHRINKER_DEBUG. I think using shrinker_debugfs will be better,
selftest can just skip if it cannot find shrinker_debugfs files. One
downside is that this test will not run if these configs are not
enabled.

Which one do you prefer? I am preferring shrinker_debugfs but
concerned about its dependency on those two configs, not sure if it is
okay to have this kind of dependency in a selftests.
Re: [PATCH 0/2] KVM: x86/mmu: Repurpose MMU shrinker into page cache shrinker
Posted by David Matlack 2 months ago
On Mon, Sep 30, 2024 at 9:43 AM Vipin Sharma <vipinsh@google.com> wrote:
>
> On Wed, Sep 25, 2024 at 4:51 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On 2024-09-13 02:43 PM, Vipin Sharma wrote:
> > > This series is extracted out from the NUMA aware page table series[1].
> > > MMU shrinker changes were in patches 1 to 9 in the old series.
> >
> > I'm curious how you tested this series. Would it be posisble to write a
> > selftest to exercise KVM's shrinker interactions? I don't think it needs
> > to be anything fancy to be useful (e.g. just run a VM, trigger lots of
> > shrinking, and make sure nothing blows up).
>
> My testing was dropping caches (echo 2 > /proc/sys/vm/drop_caches) in
> background while running dirty_log_perf_test selftest multiple times.
> I added printk in shrink_count() and shrink_scan() to make sure pages
> are being reported and released.
>
> I can write a test which can spawn a thread to drop caches and a VM
> which touches all of its pages to generate page faults. Only downside
> is it will not detect if KVM MMU shrinker is being invoked, counting
> and freeing pages.
>
> >
> > There appears to be a debugfs interface which could be used to trigger
> > shrinking from a selftest.
> >
> > https://docs.kernel.org/admin-guide/mm/shrinker_debugfs.html
>
> This is interesting and it does what is needed to test KVM MMU
> shrinker. However, this needs CONFIG_DEBUG_FS and
> CONFIG_SHRINKER_DEBUG. I think using shrinker_debugfs will be better,
> selftest can just skip if it cannot find shrinker_debugfs files. One
> downside is that this test will not run if these configs are not
> enabled.
>
> Which one do you prefer? I am preferring shrinker_debugfs but
> concerned about its dependency on those two configs, not sure if it is
> okay to have this kind of dependency in a selftests.

Configs are there to be enabled. If shrinker_debugfs is the right way
to test this, I don't see any reason to shy away from using it.