Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++ include/linux/vm_event_item.h | 2 ++ mm/memcontrol.c | 4 ++++ mm/page_io.c | 16 ++++++++++++++++ mm/vmstat.c | 2 ++ 5 files changed, 33 insertions(+)
From: Barry Song <v-songbaohua@oppo.com>
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling. However, it’s easy
to construct a scenario where this becomes an issue—for example,
allocating 1 GB of memory, writing zeros from userspace, followed by
MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out
and swap-in counts seem to vanish into a black hole, potentially
causing semantic ambiguity.
On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some app in Android
have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm: store
zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented
similar optimizations, leading to these optimized-out pages being counted
in either zswap or zRAM counters (with pswpin/pswpout also increasing for
zRAM). With zeromap functioning prior to both zswap and zRAM, userspace
will no longer detect these swap-out and swap-in actions.
We have three ways to address this:
1. Introduce a dedicated counter specifically for the zeromap.
2. Use pswpin/pswpout accounting, treating the zero map as a standard
backend. This approach aligns with zRAM's current handling of
same-page fills at the device level. However, it would mean losing
the optimized-out page counters previously available in zRAM and
would not align with systems using zswap. Additionally, as noted by
Nhat Pham, pswpin/pswpout counters apply only to I/O done directly
to the backend device.
3. Count zeromap pages under zswap, aligning with system behavior when
zswap is enabled. However, this would not be consistent with zRAM,
nor would it align with systems lacking both zswap and zRAM.
Given the complications with options 2 and 3, this patch selects
option 1.
We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the interested memcg).
For example:
$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536
$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985
This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing
and may mislead user-space agents that rely on changes in these counters
as indicators. Therefore, we add a Fixes tag to encourage the inclusion
of this counter in any kernel versions with zeromap.
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
-v3:
* collected Nhat's reviewed-by, thanks!
* refine doc per Usama and David, thanks!
* refine changelog
Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++
include/linux/vm_event_item.h | 2 ++
mm/memcontrol.c | 4 ++++
mm/page_io.c | 16 ++++++++++++++++
mm/vmstat.c | 2 ++
5 files changed, 33 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index db3799f1483e..13736a94edfd 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1599,6 +1599,15 @@ The following nested keys are defined.
pglazyfreed (npn)
Amount of reclaimed lazyfree pages
+ swpin_zero
+ Number of pages swapped into memory and filled with zero, where I/O
+ was optimized out because the page content was detected to be zero
+ during swapout.
+
+ swpout_zero
+ Number of zero-filled pages swapped out with I/O skipped due to the
+ content being detected as zero.
+
zswpin
Number of pages moved in to memory from zswap.
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index aed952d04132..f70d0958095c 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_SWAP
SWAP_RA,
SWAP_RA_HIT,
+ SWPIN_ZERO,
+ SWPOUT_ZERO,
#ifdef CONFIG_KSM
KSM_SWPIN_COPY,
#endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5e44d6e7591e..7b3503d12aaf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -441,6 +441,10 @@ static const unsigned int memcg_vm_event_stat[] = {
PGDEACTIVATE,
PGLAZYFREE,
PGLAZYFREED,
+#ifdef CONFIG_SWAP
+ SWPIN_ZERO,
+ SWPOUT_ZERO,
+#endif
#ifdef CONFIG_ZSWAP
ZSWPIN,
ZSWPOUT,
diff --git a/mm/page_io.c b/mm/page_io.c
index 5d9b6e6cf96c..4b4ea8e49cf6 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -204,7 +204,9 @@ static bool is_folio_zero_filled(struct folio *folio)
static void swap_zeromap_folio_set(struct folio *folio)
{
+ struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
struct swap_info_struct *sis = swp_swap_info(folio->swap);
+ int nr_pages = folio_nr_pages(folio);
swp_entry_t entry;
unsigned int i;
@@ -212,6 +214,12 @@ static void swap_zeromap_folio_set(struct folio *folio)
entry = page_swap_entry(folio_page(folio, i));
set_bit(swp_offset(entry), sis->zeromap);
}
+
+ count_vm_events(SWPOUT_ZERO, nr_pages);
+ if (objcg) {
+ count_objcg_events(objcg, SWPOUT_ZERO, nr_pages);
+ obj_cgroup_put(objcg);
+ }
}
static void swap_zeromap_folio_clear(struct folio *folio)
@@ -507,6 +515,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
static bool swap_read_folio_zeromap(struct folio *folio)
{
int nr_pages = folio_nr_pages(folio);
+ struct obj_cgroup *objcg;
bool is_zeromap;
/*
@@ -521,6 +530,13 @@ static bool swap_read_folio_zeromap(struct folio *folio)
if (!is_zeromap)
return false;
+ objcg = get_obj_cgroup_from_folio(folio);
+ count_vm_events(SWPIN_ZERO, nr_pages);
+ if (objcg) {
+ count_objcg_events(objcg, SWPIN_ZERO, nr_pages);
+ obj_cgroup_put(objcg);
+ }
+
folio_zero_range(folio, 0, folio_size(folio));
folio_mark_uptodate(folio);
return true;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 22a294556b58..c8ef7352f9ed 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1418,6 +1418,8 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_SWAP
"swap_ra",
"swap_ra_hit",
+ "swpin_zero",
+ "swpout_zero",
#ifdef CONFIG_KSM
"ksm_swpin_copy",
#endif
--
2.39.3 (Apple Git-146)
On Wed, Nov 06, 2024 at 10:19:34AM +1300, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > When the proportion of folios from the zeromap is small, missing their > accounting may not significantly impact profiling. However, it’s easy > to construct a scenario where this becomes an issue—for example, > allocating 1 GB of memory, writing zeros from userspace, followed by > MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out > and swap-in counts seem to vanish into a black hole, potentially > causing semantic ambiguity. > > On the other hand, Usama reported that zero-filled pages can exceed 10% in > workloads utilizing zswap, while Hailong noted that some app in Android > have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm: store > zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented > similar optimizations, leading to these optimized-out pages being counted > in either zswap or zRAM counters (with pswpin/pswpout also increasing for > zRAM). With zeromap functioning prior to both zswap and zRAM, userspace > will no longer detect these swap-out and swap-in actions. > > We have three ways to address this: > > 1. Introduce a dedicated counter specifically for the zeromap. > 2. Use pswpin/pswpout accounting, treating the zero map as a standard > backend. This approach aligns with zRAM's current handling of > same-page fills at the device level. However, it would mean losing > the optimized-out page counters previously available in zRAM and > would not align with systems using zswap. Additionally, as noted by > Nhat Pham, pswpin/pswpout counters apply only to I/O done directly > to the backend device. > 3. Count zeromap pages under zswap, aligning with system behavior when > zswap is enabled. However, this would not be consistent with zRAM, > nor would it align with systems lacking both zswap and zRAM. > > Given the complications with options 2 and 3, this patch selects > option 1. > > We can find these counters from /proc/vmstat (counters for the whole > system) and memcg's memory.stat (counters for the interested memcg). > > For example: > > $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat > swpin_zero 1648 > swpout_zero 33536 > > $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat > swpin_zero 3905 > swpout_zero 3985 > > This patch does not address any specific zeromap bug, but the missing > swpout and swpin counts for zero-filled pages can be highly confusing > and may mislead user-space agents that rely on changes in these counters > as indicators. Therefore, we add a Fixes tag to encourage the inclusion > of this counter in any kernel versions with zeromap. > > Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap") > Reviewed-by: Nhat Pham <nphamcs@gmail.com> > Cc: Usama Arif <usamaarif642@gmail.com> > Cc: Chengming Zhou <chengming.zhou@linux.dev> > Cc: Yosry Ahmed <yosryahmed@google.com> > Cc: Hailong Liu <hailong.liu@oppo.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: David Hildenbrand <david@redhat.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Shakeel Butt <shakeel.butt@linux.dev> > Cc: Andi Kleen <ak@linux.intel.com> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > Cc: Chris Li <chrisl@kernel.org> > Cc: "Huang, Ying" <ying.huang@intel.com> > Cc: Kairui Song <kasong@tencent.com> > Cc: Ryan Roberts <ryan.roberts@arm.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Thu, Nov 7, 2024 at 4:06 AM Johannes Weiner <hannes@cmpxchg.org> wrote: > > On Wed, Nov 06, 2024 at 10:19:34AM +1300, Barry Song wrote: > > From: Barry Song <v-songbaohua@oppo.com> > > > > When the proportion of folios from the zeromap is small, missing their > > accounting may not significantly impact profiling. However, it’s easy > > to construct a scenario where this becomes an issue—for example, > > allocating 1 GB of memory, writing zeros from userspace, followed by > > MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out > > and swap-in counts seem to vanish into a black hole, potentially > > causing semantic ambiguity. > > > > On the other hand, Usama reported that zero-filled pages can exceed 10% in > > workloads utilizing zswap, while Hailong noted that some app in Android > > have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm: store > > zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented > > similar optimizations, leading to these optimized-out pages being counted > > in either zswap or zRAM counters (with pswpin/pswpout also increasing for > > zRAM). With zeromap functioning prior to both zswap and zRAM, userspace > > will no longer detect these swap-out and swap-in actions. > > > > We have three ways to address this: > > > > 1. Introduce a dedicated counter specifically for the zeromap. > > 2. Use pswpin/pswpout accounting, treating the zero map as a standard > > backend. This approach aligns with zRAM's current handling of > > same-page fills at the device level. However, it would mean losing > > the optimized-out page counters previously available in zRAM and > > would not align with systems using zswap. Additionally, as noted by > > Nhat Pham, pswpin/pswpout counters apply only to I/O done directly > > to the backend device. > > 3. Count zeromap pages under zswap, aligning with system behavior when > > zswap is enabled. However, this would not be consistent with zRAM, > > nor would it align with systems lacking both zswap and zRAM. > > > > Given the complications with options 2 and 3, this patch selects > > option 1. > > > > We can find these counters from /proc/vmstat (counters for the whole > > system) and memcg's memory.stat (counters for the interested memcg). > > > > For example: > > > > $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat > > swpin_zero 1648 > > swpout_zero 33536 > > > > $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat > > swpin_zero 3905 > > swpout_zero 3985 > > > > This patch does not address any specific zeromap bug, but the missing > > swpout and swpin counts for zero-filled pages can be highly confusing > > and may mislead user-space agents that rely on changes in these counters > > as indicators. Therefore, we add a Fixes tag to encourage the inclusion > > of this counter in any kernel versions with zeromap. > > > > Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap") > > Reviewed-by: Nhat Pham <nphamcs@gmail.com> > > Cc: Usama Arif <usamaarif642@gmail.com> > > Cc: Chengming Zhou <chengming.zhou@linux.dev> > > Cc: Yosry Ahmed <yosryahmed@google.com> > > Cc: Hailong Liu <hailong.liu@oppo.com> > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > Cc: David Hildenbrand <david@redhat.com> > > Cc: Hugh Dickins <hughd@google.com> > > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > > Cc: Shakeel Butt <shakeel.butt@linux.dev> > > Cc: Andi Kleen <ak@linux.intel.com> > > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > > Cc: Chris Li <chrisl@kernel.org> > > Cc: "Huang, Ying" <ying.huang@intel.com> > > Cc: Kairui Song <kasong@tencent.com> > > Cc: Ryan Roberts <ryan.roberts@arm.com> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> Thanks! Oops, it seems that it depends on Kanchana's 'mm: change count_objcg_event() to count_objcg_events() for batch event updates,' which also isn't present in 6.12. Otherwise, it won't build, as reported here: https://lore.kernel.org/linux-mm/CAGsJ_4whD31+Lk0m2uq-o=ygvkRsw1uXcPeqxBONV-RUXkeEzg@mail.gmail.com/ Hi Andrew, What’s the best approach here? Should we include Kanchana's patch that extends the nr argument for count_objcg_events() in 6.12-rc as well? Thanks barry
On Thu, 7 Nov 2024 09:01:14 +1300 Barry Song <21cnbao@gmail.com> wrote: > Oops, it seems that it depends on Kanchana's 'mm: change count_objcg_event() to > count_objcg_events() for batch event updates,' which also isn't present in 6.12. > > Otherwise, it won't build, as reported here: > https://lore.kernel.org/linux-mm/CAGsJ_4whD31+Lk0m2uq-o=ygvkRsw1uXcPeqxBONV-RUXkeEzg@mail.gmail.com/ argh. > Hi Andrew, > What’s the best approach here? Should we include Kanchana's patch that extends > the nr argument for count_objcg_events() in 6.12-rc as well? Let's do the right thing here. I'll drop this patch from mm-hotfixes. Please send a v4 against Linus mainline fairly soon then I'll redo Kanchana's series around that.
On Thu, Nov 7, 2024 at 9:42 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 7 Nov 2024 09:01:14 +1300 Barry Song <21cnbao@gmail.com> wrote: > > > Oops, it seems that it depends on Kanchana's 'mm: change count_objcg_event() to > > count_objcg_events() for batch event updates,' which also isn't present in 6.12. > > > > Otherwise, it won't build, as reported here: > > https://lore.kernel.org/linux-mm/CAGsJ_4whD31+Lk0m2uq-o=ygvkRsw1uXcPeqxBONV-RUXkeEzg@mail.gmail.com/ > > argh. > Apologies for the inconvenience. > > Hi Andrew, > > What’s the best approach here? Should we include Kanchana's patch that extends > > the nr argument for count_objcg_events() in 6.12-rc as well? > > Let's do the right thing here. I'll drop this patch from mm-hotfixes. > Please send a v4 against Linus mainline fairly soon then I'll redo > Kanchana's series around that. Alright. The question is whether we should integrate Kanchana's 'mm: change count_objcg_event() to count_objcg_events() for batch event updates' into 'mm: count zeromap read and set for swapout and swapin,' or keep it as a separate patch as patch 1/2? I guess integration would be better, as hotfixes may not be ideal for a patch series? Thanks Barry
On Thu, 7 Nov 2024 10:00:47 +1300 Barry Song <21cnbao@gmail.com> wrote: > On Thu, Nov 7, 2024 at 9:42 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Thu, 7 Nov 2024 09:01:14 +1300 Barry Song <21cnbao@gmail.com> wrote: > > > > > Oops, it seems that it depends on Kanchana's 'mm: change count_objcg_event() to > > > count_objcg_events() for batch event updates,' which also isn't present in 6.12. > > > > > > Otherwise, it won't build, as reported here: > > > https://lore.kernel.org/linux-mm/CAGsJ_4whD31+Lk0m2uq-o=ygvkRsw1uXcPeqxBONV-RUXkeEzg@mail.gmail.com/ > > > > argh. > > > > Apologies for the inconvenience. > > > > Hi Andrew, > > > What’s the best approach here? Should we include Kanchana's patch that extends > > > the nr argument for count_objcg_events() in 6.12-rc as well? > > > > Let's do the right thing here. I'll drop this patch from mm-hotfixes. > > Please send a v4 against Linus mainline fairly soon then I'll redo > > Kanchana's series around that. > > Alright. The question is whether we should integrate Kanchana's 'mm: > change count_objcg_event() to count_objcg_events() for batch event > updates' into 'mm: count zeromap read and set for swapout and swapin,' > or keep it as a separate patch as patch 1/2? > > I guess integration would be better, as hotfixes may not be ideal for a patch > series? I don't fully understand what you're asking here. I'm suggesting that you prepare a minimal patch that fixes the bug in Linus's kernel. Then we figure out what to do with Kanchana's 6.13-rc1 material after the bugfix is sorted out.
On Thu, Nov 7, 2024 at 10:44 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 7 Nov 2024 10:00:47 +1300 Barry Song <21cnbao@gmail.com> wrote: > > > On Thu, Nov 7, 2024 at 9:42 AM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Thu, 7 Nov 2024 09:01:14 +1300 Barry Song <21cnbao@gmail.com> wrote: > > > > > > > Oops, it seems that it depends on Kanchana's 'mm: change count_objcg_event() to > > > > count_objcg_events() for batch event updates,' which also isn't present in 6.12. > > > > > > > > Otherwise, it won't build, as reported here: > > > > https://lore.kernel.org/linux-mm/CAGsJ_4whD31+Lk0m2uq-o=ygvkRsw1uXcPeqxBONV-RUXkeEzg@mail.gmail.com/ > > > > > > argh. > > > > > > > Apologies for the inconvenience. > > > > > > Hi Andrew, > > > > What’s the best approach here? Should we include Kanchana's patch that extends > > > > the nr argument for count_objcg_events() in 6.12-rc as well? > > > > > > Let's do the right thing here. I'll drop this patch from mm-hotfixes. > > > Please send a v4 against Linus mainline fairly soon then I'll redo > > > Kanchana's series around that. > > > > Alright. The question is whether we should integrate Kanchana's 'mm: > > change count_objcg_event() to count_objcg_events() for batch event > > updates' into 'mm: count zeromap read and set for swapout and swapin,' > > or keep it as a separate patch as patch 1/2? > > > > I guess integration would be better, as hotfixes may not be ideal for a patch > > series? > > I don't fully understand what you're asking here. > > I'm suggesting that you prepare a minimal patch that fixes the bug in > Linus's kernel. Then we figure out what to do with Kanchana's 6.13-rc1 > material after the bugfix is sorted out. Kanchana's commit 'mm: change count_objcg_event() to count_objcg_events()' changes count_objcg_event() to count_objcg_events() and supports nr_pages more than 1. This is what we need for the minimal patch of fixing zeromap as zeromap could be nr_pages > 1 for large folios. So my question is that, do I combine Kanchana's change into my patch and send a single patch, or do I send a patch series with 2 patches: 1: Kanchana's mm: change count_objcg_event() to count_objcg_events() 2: mm: count zeromap read and set for swapout and swapin If we combine them into a single patch, I'll need to incorporate the changes from 1 into 2. I'm also unsure how to acknowledge Kanchana's contribution —perhaps mark it as co-developed? Cc'd Kanchana as well. Thanks barry
On Thu, 7 Nov 2024 10:53:11 +1300 Barry Song <21cnbao@gmail.com> wrote: > > Kanchana's commit 'mm: change count_objcg_event() to count_objcg_events()' > changes count_objcg_event() to count_objcg_events() and supports > nr_pages more than 1. This is what we need for the minimal patch of > fixing zeromap > as zeromap could be nr_pages > 1 for large folios. > > So my question is that, do I combine Kanchana's change into my patch > and send a single patch, or do I send a patch series with 2 patches: > > 1: Kanchana's mm: change count_objcg_event() to count_objcg_events() > 2: mm: count zeromap read and set for swapout and swapin A single self-contained backportable patch is preferable, please. > If we combine them into a single patch, I'll need to incorporate the changes > from 1 into 2. I'm also unsure how to acknowledge Kanchana's contribution > —perhaps mark it as co-developed? Sure.
On 2024/11/6 05:19, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > When the proportion of folios from the zeromap is small, missing their > accounting may not significantly impact profiling. However, it’s easy > to construct a scenario where this becomes an issue—for example, > allocating 1 GB of memory, writing zeros from userspace, followed by > MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out > and swap-in counts seem to vanish into a black hole, potentially > causing semantic ambiguity. > > On the other hand, Usama reported that zero-filled pages can exceed 10% in > workloads utilizing zswap, while Hailong noted that some app in Android > have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm: store > zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented > similar optimizations, leading to these optimized-out pages being counted > in either zswap or zRAM counters (with pswpin/pswpout also increasing for > zRAM). With zeromap functioning prior to both zswap and zRAM, userspace > will no longer detect these swap-out and swap-in actions. > > We have three ways to address this: > > 1. Introduce a dedicated counter specifically for the zeromap. > 2. Use pswpin/pswpout accounting, treating the zero map as a standard > backend. This approach aligns with zRAM's current handling of > same-page fills at the device level. However, it would mean losing > the optimized-out page counters previously available in zRAM and > would not align with systems using zswap. Additionally, as noted by > Nhat Pham, pswpin/pswpout counters apply only to I/O done directly > to the backend device. > 3. Count zeromap pages under zswap, aligning with system behavior when > zswap is enabled. However, this would not be consistent with zRAM, > nor would it align with systems lacking both zswap and zRAM. > > Given the complications with options 2 and 3, this patch selects > option 1. > > We can find these counters from /proc/vmstat (counters for the whole > system) and memcg's memory.stat (counters for the interested memcg). > > For example: > > $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat > swpin_zero 1648 > swpout_zero 33536 > > $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat > swpin_zero 3905 > swpout_zero 3985 > > This patch does not address any specific zeromap bug, but the missing > swpout and swpin counts for zero-filled pages can be highly confusing > and may mislead user-space agents that rely on changes in these counters > as indicators. Therefore, we add a Fixes tag to encourage the inclusion > of this counter in any kernel versions with zeromap. > > Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap") > Reviewed-by: Nhat Pham <nphamcs@gmail.com> > Cc: Usama Arif <usamaarif642@gmail.com> > Cc: Chengming Zhou <chengming.zhou@linux.dev> > Cc: Yosry Ahmed <yosryahmed@google.com> > Cc: Hailong Liu <hailong.liu@oppo.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: David Hildenbrand <david@redhat.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Shakeel Butt <shakeel.butt@linux.dev> > Cc: Andi Kleen <ak@linux.intel.com> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com> > Cc: Chris Li <chrisl@kernel.org> > Cc: "Huang, Ying" <ying.huang@intel.com> > Cc: Kairui Song <kasong@tencent.com> > Cc: Ryan Roberts <ryan.roberts@arm.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> Looks good to me! Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Thanks. > --- > -v3: > * collected Nhat's reviewed-by, thanks! > * refine doc per Usama and David, thanks! > * refine changelog > > Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++ > include/linux/vm_event_item.h | 2 ++ > mm/memcontrol.c | 4 ++++ > mm/page_io.c | 16 ++++++++++++++++ > mm/vmstat.c | 2 ++ > 5 files changed, 33 insertions(+) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index db3799f1483e..13736a94edfd 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -1599,6 +1599,15 @@ The following nested keys are defined. > pglazyfreed (npn) > Amount of reclaimed lazyfree pages > > + swpin_zero > + Number of pages swapped into memory and filled with zero, where I/O > + was optimized out because the page content was detected to be zero > + during swapout. > + > + swpout_zero > + Number of zero-filled pages swapped out with I/O skipped due to the > + content being detected as zero. > + > zswpin > Number of pages moved in to memory from zswap. > > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h > index aed952d04132..f70d0958095c 100644 > --- a/include/linux/vm_event_item.h > +++ b/include/linux/vm_event_item.h > @@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > #ifdef CONFIG_SWAP > SWAP_RA, > SWAP_RA_HIT, > + SWPIN_ZERO, > + SWPOUT_ZERO, > #ifdef CONFIG_KSM > KSM_SWPIN_COPY, > #endif > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 5e44d6e7591e..7b3503d12aaf 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -441,6 +441,10 @@ static const unsigned int memcg_vm_event_stat[] = { > PGDEACTIVATE, > PGLAZYFREE, > PGLAZYFREED, > +#ifdef CONFIG_SWAP > + SWPIN_ZERO, > + SWPOUT_ZERO, > +#endif > #ifdef CONFIG_ZSWAP > ZSWPIN, > ZSWPOUT, > diff --git a/mm/page_io.c b/mm/page_io.c > index 5d9b6e6cf96c..4b4ea8e49cf6 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -204,7 +204,9 @@ static bool is_folio_zero_filled(struct folio *folio) > > static void swap_zeromap_folio_set(struct folio *folio) > { > + struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio); > struct swap_info_struct *sis = swp_swap_info(folio->swap); > + int nr_pages = folio_nr_pages(folio); > swp_entry_t entry; > unsigned int i; > > @@ -212,6 +214,12 @@ static void swap_zeromap_folio_set(struct folio *folio) > entry = page_swap_entry(folio_page(folio, i)); > set_bit(swp_offset(entry), sis->zeromap); > } > + > + count_vm_events(SWPOUT_ZERO, nr_pages); > + if (objcg) { > + count_objcg_events(objcg, SWPOUT_ZERO, nr_pages); > + obj_cgroup_put(objcg); > + } > } > > static void swap_zeromap_folio_clear(struct folio *folio) > @@ -507,6 +515,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret) > static bool swap_read_folio_zeromap(struct folio *folio) > { > int nr_pages = folio_nr_pages(folio); > + struct obj_cgroup *objcg; > bool is_zeromap; > > /* > @@ -521,6 +530,13 @@ static bool swap_read_folio_zeromap(struct folio *folio) > if (!is_zeromap) > return false; > > + objcg = get_obj_cgroup_from_folio(folio); > + count_vm_events(SWPIN_ZERO, nr_pages); > + if (objcg) { > + count_objcg_events(objcg, SWPIN_ZERO, nr_pages); > + obj_cgroup_put(objcg); > + } > + > folio_zero_range(folio, 0, folio_size(folio)); > folio_mark_uptodate(folio); > return true; > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 22a294556b58..c8ef7352f9ed 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1418,6 +1418,8 @@ const char * const vmstat_text[] = { > #ifdef CONFIG_SWAP > "swap_ra", > "swap_ra_hit", > + "swpin_zero", > + "swpout_zero", > #ifdef CONFIG_KSM > "ksm_swpin_copy", > #endif
© 2016 - 2024 Red Hat, Inc.