This is version 4 of switching the THP shrinker to list_lru.

Changes in v4:
- guard folio_memcg_alloc_deferred() with mem_cgroup_disabled() to fix
  NULL deref in __memcg_list_lru_alloc() when booting with
  cgroup_disable=memory (e.g., kdump capture kernel) -- reported and
  tested by Mikhail Zaslonko on s390 and x86
- flatten if (folio) branches in alloc_swap_folio() and alloc_anon_folio()
  in a prep patch so the list_lru allocation additions are a clean minimal
  diff (Lorenzo)
- folio_memcg_alloc_deferred() moved out of alloc_charge_folio() into the
  anon-only collapse_huge_page() path; collapse_file() shares that helper
  but its pages don't go on the THP shrinker queue (David)
- guard folio_memcg_alloc_deferred() with order > 1; mTHPs below order-2
  can't be queued on the deferred split list (David)
- make deferred_split_lru static, hide behind folio_memcg_alloc_deferred()
  wrapper with GFP_KERNEL (Lorenzo)
- rename l -> lru throughout huge_memory.c (Lorenzo)
- kdoc for folio_memcg_list_lru_alloc() (Lorenzo)
- list_lru_lock_irq()/unlock_irq()/add_irq() irq-disabling variants;
  use list_lru_add_irq() in deferred_split_scan() (Lorenzo)
- reorder shrinker_free() before list_lru_destroy() (Lorenzo)

Changes in v3:
- dedicated lockdep_key for irqsafe deferred_split_lru.lock (syzbot)
- conditional list_lru ops in __folio_freeze_and_split_unmapped() (syzbot)
- annotate runs of inscrutable false, NULL, false function arguments (David)
- rename to folio_memcg_list_lru_alloc() (David)

Changes in v2:
- explicit rcu_read_lock() in __folio_freeze_and_split_unmapped() (Usama)
- split out list_lru prep bits (Dave)

The open-coded deferred split queue has two problems: it is not
NUMA-aware when cgroups are enabled, and it complicates the callsites
that interact with it. Switching to list_lru fixes the NUMA problem and
streamlines those callsites. It also simplifies planned shrinker work.
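
For illustration only, here is a userspace toy model of what NUMA
awareness buys: items live on one list per node, so reclaim aimed at a
node walks only that node's items. This is not kernel code and not the
list_lru implementation; NR_NODES and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2	/* hypothetical node count for this sketch */

struct toy_item {
	struct toy_item *next;
	int nid;		/* node the backing memory lives on */
};

struct toy_node_list {
	struct toy_item *head;
	unsigned long nr_items;
};

struct toy_lru {
	struct toy_node_list node[NR_NODES];	/* one list per node */
};

/* Queue an item on the list of the node it belongs to. */
static void toy_lru_add(struct toy_lru *lru, struct toy_item *it)
{
	struct toy_node_list *nl;

	assert(it->nid >= 0 && it->nid < NR_NODES);
	nl = &lru->node[it->nid];
	it->next = nl->head;
	nl->head = it;
	nl->nr_items++;
}

/* Remove up to nr_to_scan items from one node; other nodes are untouched. */
static unsigned long toy_lru_shrink_node(struct toy_lru *lru, int nid,
					 unsigned long nr_to_scan)
{
	struct toy_node_list *nl = &lru->node[nid];
	unsigned long freed = 0;

	while (nl->head && freed < nr_to_scan) {
		struct toy_item *it = nl->head;

		nl->head = it->next;
		it->next = NULL;
		nl->nr_items--;
		freed++;
	}
	return freed;
}
```

With a single shared queue, the same shrink call would have to skip
over items from every other node to find the ones it wants.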

Patches 1-4 are cleanups and small refactors in list_lru code. They're
basically independent, but make the THP shrinker conversion easier.

Patch 5 extends the list_lru API to allow the caller to control the
locking scope. The THP shrinker needs this because it has private state
that must be kept synchronized with the LRU state.
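
As a rough userspace analogy (pthread mutex instead of the kernel's
spinlock; all toy_* names and the "queued" flag are hypothetical and do
not mirror the actual list_lru API or THP state), caller-controlled
locking lets a private flag and the list membership change inside one
critical section:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_item {
	struct toy_item *next;
	bool queued;		/* private state owned by the caller */
};

struct toy_list {
	pthread_mutex_t lock;
	struct toy_item *head;
	unsigned long nr_items;
};

static void toy_list_init(struct toy_list *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->head = NULL;
	l->nr_items = 0;
}

/* Lock scope is exposed to the caller instead of hidden in add(). */
static void toy_list_lock(struct toy_list *l)
{
	pthread_mutex_lock(&l->lock);
}

static void toy_list_unlock(struct toy_list *l)
{
	pthread_mutex_unlock(&l->lock);
}

/* List update only; the caller must already hold l->lock. */
static void toy_list_add_locked(struct toy_list *l, struct toy_item *it)
{
	it->next = l->head;
	l->head = it;
	l->nr_items++;
}

/* Flag check, flag update and list insertion form one critical section. */
static bool toy_queue_item(struct toy_list *l, struct toy_item *it)
{
	bool added = false;

	toy_list_lock(l);
	if (!it->queued) {
		it->queued = true;
		toy_list_add_locked(l, it);
		added = true;
	}
	toy_list_unlock(l);
	return added;
}
```

If the add helper took and dropped the lock internally, the flag could
be observed out of sync with the list between the two steps.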

Patch 6 extends the list_lru API with a convenience helper to do
list_lru head allocation (memcg_list_lru_alloc) when coming from a
folio. Anon THPs are instantiated in several places, and with the
folio reparenting patches pending, folio_memcg() access is now a more
delicate dance. This avoids having to replicate that dance everywhere.

Patch 7 flattens the folio allocation retry loops in alloc_swap_folio()
and alloc_anon_folio() without functional change, in preparation for
patch 8.

Patch 8 finally switches the deferred_split_queue to list_lru.

Based on mm-unstable.

 include/linux/huge_mm.h    |   7 +-
 include/linux/list_lru.h   |  68 +++++++++
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 355 ++++++++++++++-----------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   3 +
 mm/list_lru.c              | 220 ++++++++++++++++++---------
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  52 ++++---
 mm/mm_init.c               |  15 --
 11 files changed, 374 insertions(+), 376 deletions(-)