[PATCH RFC 00/13] mm/rmap: support arbitrary folio mappings

David Hildenbrand (Arm) posted 13 patches 2 months ago
Posted by David Hildenbrand (Arm) 2 months ago
This series is related to my LSF/MM/BPF topic:

	[LSF/MM/BPF TOPIC] Towards removing CONFIG_PAGE_MAPCOUNT [1]

And does the following things:

(a) Gets rid of CONFIG_PAGE_MAPCOUNT, so that rmap-related code no
    longer uses page->_mapcount.

(b) Converts the entire mapcount to a "total mapped pages" counter that
    can trivially be used to calculate the per-page average mapcount in
    a folio.

(c) Cleans up the code heavily.

(d) Teaches RMAP code to support arbitrary folio mappings: For example,
    supporting PMD-mapping of folios that span multiple PMDs.

Initially, I wanted to use a PMD + PUD mapcount, but once I realized that
we can do the same thing much more easily with a "total mapped pages"
counter, I tried that, and was surprised by how clean it looks.
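
A minimal userspace sketch of the "total mapped pages" idea, to make the
arithmetic concrete. It is not the code from this series: struct demo_folio
and the demo_* helpers are hypothetical names, and the rounding (closest
integer, at least 1 while mapped) is an assumption about how the average
might be derived.

```c
#include <stdbool.h>

/* Hypothetical model of a large folio; NOT the kernel's struct folio. */
struct demo_folio {
	unsigned int order;		/* folio spans (1 << order) pages */
	long total_mapped_pages;	/* pages covered, summed over all mappings */
};

/*
 * Account one mapping that covers nr_pages pages of the folio. The page
 * table level does not matter: a PTE adds 1, a PMD adds 512 (assuming 4K
 * pages and 2M PMDs), and so on.
 */
static void demo_add_mapping(struct demo_folio *f, long nr_pages)
{
	f->total_mapped_pages += nr_pages;
}

/* Undo demo_add_mapping() for a mapping of the same size. */
static void demo_remove_mapping(struct demo_folio *f, long nr_pages)
{
	f->total_mapped_pages -= nr_pages;
}

/*
 * Per-page average mapcount. Assumption: round to the closest integer and
 * never report 0 while at least one page is mapped.
 */
static long demo_average_mapcount(const struct demo_folio *f)
{
	long nr_pages = 1L << f->order;
	long avg;

	if (f->total_mapped_pages <= 0)
		return 0;
	avg = (f->total_mapped_pages + nr_pages / 2) >> f->order;
	return avg > 0 ? avg : 1;
}

/* True if no mapping of the folio is left. */
static bool demo_is_unmapped(const struct demo_folio *f)
{
	return f->total_mapped_pages == 0;
}
```

With an order-10 folio (1024 pages), two PMD mappings of 512 pages each
give a total of 1024 mapped pages and an average of 1, the same result as
1024 individual PTE mappings would give. That is why a folio spanning
multiple PMDs needs no dedicated per-level counter in this model.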

More details in the last patch.

Functional Changes
------------------

The kernel now always behaves like CONFIG_PAGE_NO_MAPCOUNT currently
does, in particular:

(1) System/node/memcg stats account large folios as fully mapped as soon
    as a single page is mapped, instead of the precise number of pages
    a partially-mapped folio has mapped. For example, this affects
    "AnonPages:", "Mapped:" and "Shmem:" in /proc/meminfo.

(2) "mapmax" part of /proc/$PID/numa_maps uses the average page mapcount
    in a folio instead of the effective page mapcount.

(3) Determining the PM_MMAP_EXCLUSIVE flag for /proc/$PID/pagemap is based on
    folio_maybe_mapped_shared() instead of the effective page mapcount.

(4) /proc/kpagecount exposes the average page mapcount in a folio
    instead of the effective page mapcount.

(5) Calculating the Pss for /proc/$PID/smaps and /proc/$PID/smaps_rollup
    uses the average page mapcount in a folio instead of the effective
    page mapcount.

(6) Calculating the Uss for /proc/$PID/smaps and /proc/$PID/smaps_rollup
    uses folio_maybe_mapped_shared() instead of the effective page
    mapcount.

(7) Detecting partially-mapped anonymous folios uses the average
    per-page mapcount. This implies that we cannot detect partial
    mappings of shared anonymous folios in all cases.
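
To illustrate (5) and (7), here is a simplified userspace sketch. Both
helpers are hypothetical: the Pss helper ignores the fixed-point
arithmetic the kernel uses, and the partial-mapping check assumes the
heuristic is "fewer total mapped pages than pages in the folio", which may
differ from what the series actually implements.

```c
#include <stdbool.h>

/*
 * Pss contribution of one mapped page, in bytes: the page size divided by
 * the folio's average per-page mapcount. Simplified; no fixed-point math.
 */
static unsigned long demo_pss_bytes_per_page(unsigned long page_size,
					     long avg_mapcount)
{
	if (avg_mapcount <= 1)
		return page_size;
	return page_size / (unsigned long)avg_mapcount;
}

/*
 * Assumed heuristic: a mapped folio looks partially mapped if its mappings
 * cover fewer pages in total than the folio contains.
 */
static bool demo_looks_partially_mapped(long total_mapped_pages,
					long nr_pages)
{
	return total_mapped_pages > 0 && total_mapped_pages < nr_pages;
}
```

Consider a 16-page anonymous folio. If a single process maps only 8 of its
pages, the total is 8 and the folio is flagged. If a parent and a child
each map the same 8 pages, the total is 16, the check does not fire, and
the 8 unmapped pages go unnoticed. This is the kind of shared case that
(7) says cannot always be detected.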

TODOs
-----

Partially-mapped folios:

If deemed relevant, we could detect more partially-mapped shared
anonymous folios on the memory reclaim path (e.g., during access-bit
harvesting) and flag them accordingly, so they can get deferred-split.
We might also just let the deferred splitting logic perform more such
scanning of possible candidates.

Mapcount overflows:

It may already be possible to overflow a large folio's mapcount
(+refcount). With this series, it may be possible to overflow
"total mapped pages" on 32bit; and I'd like to avoid making it an
unsigned long long on 32bit.

In a distant future, we may want a 64bit mapcount value, but for
the time being (no relevant use cases), we should likely reject new
folio mappings early if there is the possibility of mapcount +
"total mapped pages" overflows. I assume doing some basic checks
during fork() + file folio mapping should be good enough (e.g., stop
once it would turn negative).
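
A minimal sketch of such an early check, assuming the counter is a signed
32bit int that is never negative. demo_try_add_mapped_pages() is a
hypothetical helper, not part of this series; real code would also have to
deal with atomicity and with the refcount.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Add nr_pages to the counter only if the result still fits into an int.
 * Returns false and leaves the counter untouched otherwise, so the caller
 * can reject the new mapping before anything turns negative.
 * Assumes *total_mapped_pages >= 0 on entry.
 */
static bool demo_try_add_mapped_pages(int *total_mapped_pages, int nr_pages)
{
	if (nr_pages < 0 || *total_mapped_pages > INT_MAX - nr_pages)
		return false;
	*total_mapped_pages += nr_pages;
	return true;
}
```

Under these assumptions, a folio whose 512 pages are mapped by a single
PMD could be mapped roughly 4 million times (INT_MAX / 512) before the
check rejects the next mapping.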

This series saw only very basic testing on 64bit and no performance
fine-tuning yet.

[1] https://lore.kernel.org/all/fe6afcc3-7539-4650-863b-04d971e89cfb@kernel.org/

---
David Hildenbrand (Arm) (13):
      mm/rmap: remove folio->_nr_pages_mapped
      fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for "mapmax"
      fs/proc/page: remove CONFIG_PAGE_MAPCOUNT handling for kpagecount
      fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling for PM_MMAP_EXCLUSIVE
      fs/proc/task_mmu: remove mapcount comment in smaps_account()
      fs/proc/task_mmu: remove CONFIG_PAGE_MAPCOUNT handling in smaps_account()
      mm/rmap: remove CONFIG_PAGE_MAPCOUNT
      mm: re-consolidate folio->_entire_mapcount
      mm: move _large_mapcount to _mapcount in page[1] of a large folio
      mm: re-consolidate folio->_pincount
      mm/rmap: stop using the entire mapcount for hugetlb folios
      mm/rmap: large mapcount interface cleanups
      mm/rmap: support arbitrary folio mappings

 Documentation/admin-guide/cgroup-v1/memory.rst |   6 +-
 Documentation/admin-guide/cgroup-v2.rst        |  13 +-
 Documentation/admin-guide/mm/pagemap.rst       |  30 ++-
 Documentation/filesystems/proc.rst             |  41 ++--
 Documentation/mm/transhuge.rst                 |  29 +--
 fs/proc/internal.h                             |  58 +----
 fs/proc/page.c                                 |  10 +-
 fs/proc/task_mmu.c                             |  69 ++----
 include/linux/mm.h                             |  37 +--
 include/linux/mm_types.h                       |  22 +-
 include/linux/pgtable.h                        |  22 ++
 include/linux/rmap.h                           | 221 ++++++++----------
 mm/Kconfig                                     |  17 --
 mm/debug.c                                     |  10 +-
 mm/internal.h                                  |  30 +--
 mm/memory.c                                    |   3 +-
 mm/page_alloc.c                                |  31 +--
 mm/rmap.c                                      | 302 ++++++++-----------------
 18 files changed, 325 insertions(+), 626 deletions(-)
---
base-commit: 196ab4af58d724f24335fed3da62920c3cea945f
change-id: 20260330-mapcount-32066c687010

Best regards,
-- 
David Hildenbrand (Arm) <david@kernel.org>