[v1] mm: ZSWAP swap-out of mTHP folios

[RFC PATCH v1 4/4] mm: page_io: Count successful mTHP zswap stores in vmstat.

Posted by Kanchana P Sridhar 1 year, 5 months ago

Add count_zswap_thp_swpout_vm_event(), which increments the
appropriate mTHP/PMD vmstat event counters when zswap_store() succeeds
for a folio:

A zswap_store() of a folio of order [0, HPAGE_PMD_ORDER-1] increments
one of these vmstat event counters, chosen by the folio order:

  ZSWPOUT_4KB_FOLIO
  mTHP_ZSWPOUT_8kB
  mTHP_ZSWPOUT_16kB
  mTHP_ZSWPOUT_32kB
  mTHP_ZSWPOUT_64kB
  mTHP_ZSWPOUT_128kB
  mTHP_ZSWPOUT_256kB
  mTHP_ZSWPOUT_512kB
  mTHP_ZSWPOUT_1024kB

A zswap_store() of a PMD-size THP, i.e., mTHP order HPAGE_PMD_ORDER,
increments both of these vmstat event counters:

  ZSWPOUT_PMD_THP_FOLIO
  mTHP_ZSWPOUT_2048kB

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/page_io.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index 0a150c240bf4..ab54d2060cc4 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -172,6 +172,49 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
 	goto out;
 }
 
+/*
+ * Count vmstats for ZSWAP store of large folios (mTHP and PMD-size THP).
+ */
+static inline void count_zswap_thp_swpout_vm_event(struct folio *folio)
+{
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_pmd_mappable(folio)) {
+		count_vm_event(ZSWPOUT_PMD_THP_FOLIO);
+		count_vm_event(mTHP_ZSWPOUT_2048kB);
+	} else if (folio_order(folio) == 0) {
+		count_vm_event(ZSWPOUT_4KB_FOLIO);
+	} else if (IS_ENABLED(CONFIG_THP_SWAP)) {
+		switch (folio_order(folio)) {
+		case 1:
+			count_vm_event(mTHP_ZSWPOUT_8kB);
+			break;
+		case 2:
+			count_vm_event(mTHP_ZSWPOUT_16kB);
+			break;
+		case 3:
+			count_vm_event(mTHP_ZSWPOUT_32kB);
+			break;
+		case 4:
+			count_vm_event(mTHP_ZSWPOUT_64kB);
+			break;
+		case 5:
+			count_vm_event(mTHP_ZSWPOUT_128kB);
+			break;
+		case 6:
+			count_vm_event(mTHP_ZSWPOUT_256kB);
+			break;
+		case 7:
+			count_vm_event(mTHP_ZSWPOUT_512kB);
+			break;
+		case 8:
+			count_vm_event(mTHP_ZSWPOUT_1024kB);
+			break;
+		case 9:
+			count_vm_event(mTHP_ZSWPOUT_2048kB);
+			break;
+		}
+	}
+}
+
 /*
  * We may have stale swap cache pages in memory: notice
  * them here and get rid of the unnecessary final write.
@@ -196,6 +239,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		return ret;
 	}
 	if (zswap_store(folio)) {
+		count_zswap_thp_swpout_vm_event(folio);
 		folio_start_writeback(folio);
 		folio_unlock(folio);
 		folio_end_writeback(folio);
-- 
2.27.0

Re: [RFC PATCH v1 4/4] mm: page_io: Count successful mTHP zswap stores in vmstat.

Posted by Barry Song 1 year, 5 months ago

On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> Added count_zswap_thp_swpout_vm_event() that will increment the
> appropriate mTHP/PMD vmstat event counters if zswap_store succeeds for
> a large folio:
>
> zswap_store mTHP order [0, HPAGE_PMD_ORDER-1] will increment these
> vmstat event counters:
>
>   ZSWPOUT_4KB_FOLIO
>   mTHP_ZSWPOUT_8kB
>   mTHP_ZSWPOUT_16kB
>   mTHP_ZSWPOUT_32kB
>   mTHP_ZSWPOUT_64kB
>   mTHP_ZSWPOUT_128kB
>   mTHP_ZSWPOUT_256kB
>   mTHP_ZSWPOUT_512kB
>   mTHP_ZSWPOUT_1024kB
>
> zswap_store of a PMD-size THP, i.e., mTHP order HPAGE_PMD_ORDER, will
> increment both these vmstat event counters:
>
>   ZSWPOUT_PMD_THP_FOLIO
>   mTHP_ZSWPOUT_2048kB
>
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> ---
>  mm/page_io.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 0a150c240bf4..ab54d2060cc4 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -172,6 +172,49 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
>         goto out;
>  }
>
> +/*
> + * Count vmstats for ZSWAP store of large folios (mTHP and PMD-size THP).
> + */
> +static inline void count_zswap_thp_swpout_vm_event(struct folio *folio)
> +{
> +       if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_pmd_mappable(folio)) {
> +               count_vm_event(ZSWPOUT_PMD_THP_FOLIO);
> +               count_vm_event(mTHP_ZSWPOUT_2048kB);
> +       } else if (folio_order(folio) == 0) {
> +               count_vm_event(ZSWPOUT_4KB_FOLIO);
> +       } else if (IS_ENABLED(CONFIG_THP_SWAP)) {
> +               switch (folio_order(folio)) {
> +               case 1:
> +                       count_vm_event(mTHP_ZSWPOUT_8kB);
> +                       break;
> +               case 2:
> +                       count_vm_event(mTHP_ZSWPOUT_16kB);
> +                       break;
> +               case 3:
> +                       count_vm_event(mTHP_ZSWPOUT_32kB);
> +                       break;
> +               case 4:
> +                       count_vm_event(mTHP_ZSWPOUT_64kB);
> +                       break;
> +               case 5:
> +                       count_vm_event(mTHP_ZSWPOUT_128kB);
> +                       break;
> +               case 6:
> +                       count_vm_event(mTHP_ZSWPOUT_256kB);
> +                       break;
> +               case 7:
> +                       count_vm_event(mTHP_ZSWPOUT_512kB);
> +                       break;
> +               case 8:
> +                       count_vm_event(mTHP_ZSWPOUT_1024kB);
> +                       break;
> +               case 9:
> +                       count_vm_event(mTHP_ZSWPOUT_2048kB);
> +                       break;
> +               }

The highest order is PMD_ORDER, i.e. ilog2(MAX_PTRS_PER_PTE), and
PMD_ORDER isn't necessarily 9. It seems we need a general way to
handle this that avoids the duplicated case 1, case 2, ..., case 9.
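
To illustrate why the order cannot be hard-coded, here is a small
userspace sketch (not kernel code). It assumes 8-byte page-table
entries, so that one page-table page holds PAGE_SIZE / 8 entries; the
model_* names are hypothetical and exist only for this example.

```c
#include <assert.h>

/*
 * Model of PMD_ORDER as log2(entries per page-table page).
 * Assumption: each PTE is 8 bytes, so entries = page_size / 8
 * and the order is page_shift - 3.
 */
static int model_pmd_order(unsigned int page_shift)
{
	assert(page_shift > 3);
	return (int)page_shift - 3;
}

/* Size in kB of a folio of the given order for a given base page size. */
static unsigned long model_folio_size_kb(unsigned int page_shift, int order)
{
	return ((1UL << page_shift) << order) / 1024;
}
```

Under that assumption, 4 KiB base pages give order 9 (2048 kB), while
16 KiB and 64 KiB base pages give orders 11 and 13, and an order-9
folio on 64 KiB pages is 32768 kB rather than 2048 kB. So both the
upper bound of the switch and the size in each counter name depend on
the configuration.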

> +       }
> +}
> +
>  /*
>   * We may have stale swap cache pages in memory: notice
>   * them here and get rid of the unnecessary final write.
> @@ -196,6 +239,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>                 return ret;
>         }
>         if (zswap_store(folio)) {
> +               count_zswap_thp_swpout_vm_event(folio);
>                 folio_start_writeback(folio);
>                 folio_unlock(folio);
>                 folio_end_writeback(folio);
> --
> 2.27.0
>

Thanks
Barry

RE: [RFC PATCH v1 4/4] mm: page_io: Count successful mTHP zswap stores in vmstat.

Posted by Sridhar, Kanchana P 1 year, 5 months ago

Hi Barry,

> -----Original Message-----
> From: Barry Song <21cnbao@gmail.com>
> Sent: Wednesday, August 14, 2024 12:53 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com;
> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@intel.com>;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [RFC PATCH v1 4/4] mm: page_io: Count successful mTHP zswap
> stores in vmstat.
> 
> On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > Added count_zswap_thp_swpout_vm_event() that will increment the
> > appropriate mTHP/PMD vmstat event counters if zswap_store succeeds for
> > a large folio:
> >
> > zswap_store mTHP order [0, HPAGE_PMD_ORDER-1] will increment these
> > vmstat event counters:
> >
> >   ZSWPOUT_4KB_FOLIO
> >   mTHP_ZSWPOUT_8kB
> >   mTHP_ZSWPOUT_16kB
> >   mTHP_ZSWPOUT_32kB
> >   mTHP_ZSWPOUT_64kB
> >   mTHP_ZSWPOUT_128kB
> >   mTHP_ZSWPOUT_256kB
> >   mTHP_ZSWPOUT_512kB
> >   mTHP_ZSWPOUT_1024kB
> >
> > zswap_store of a PMD-size THP, i.e., mTHP order HPAGE_PMD_ORDER, will
> > increment both these vmstat event counters:
> >
> >   ZSWPOUT_PMD_THP_FOLIO
> >   mTHP_ZSWPOUT_2048kB
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > ---
> >  mm/page_io.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 44 insertions(+)
> >
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index 0a150c240bf4..ab54d2060cc4 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -172,6 +172,49 @@ int generic_swapfile_activate(struct swap_info_struct *sis,
> >         goto out;
> >  }
> >
> > +/*
> > + * Count vmstats for ZSWAP store of large folios (mTHP and PMD-size THP).
> > + */
> > +static inline void count_zswap_thp_swpout_vm_event(struct folio *folio)
> > +{
> > +       if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_pmd_mappable(folio)) {
> > +               count_vm_event(ZSWPOUT_PMD_THP_FOLIO);
> > +               count_vm_event(mTHP_ZSWPOUT_2048kB);
> > +       } else if (folio_order(folio) == 0) {
> > +               count_vm_event(ZSWPOUT_4KB_FOLIO);
> > +       } else if (IS_ENABLED(CONFIG_THP_SWAP)) {
> > +               switch (folio_order(folio)) {
> > +               case 1:
> > +                       count_vm_event(mTHP_ZSWPOUT_8kB);
> > +                       break;
> > +               case 2:
> > +                       count_vm_event(mTHP_ZSWPOUT_16kB);
> > +                       break;
> > +               case 3:
> > +                       count_vm_event(mTHP_ZSWPOUT_32kB);
> > +                       break;
> > +               case 4:
> > +                       count_vm_event(mTHP_ZSWPOUT_64kB);
> > +                       break;
> > +               case 5:
> > +                       count_vm_event(mTHP_ZSWPOUT_128kB);
> > +                       break;
> > +               case 6:
> > +                       count_vm_event(mTHP_ZSWPOUT_256kB);
> > +                       break;
> > +               case 7:
> > +                       count_vm_event(mTHP_ZSWPOUT_512kB);
> > +                       break;
> > +               case 8:
> > +                       count_vm_event(mTHP_ZSWPOUT_1024kB);
> > +                       break;
> > +               case 9:
> > +                       count_vm_event(mTHP_ZSWPOUT_2048kB);
> > +                       break;
> > +               }
> 
> The highest order is PMD_ORDER, i.e. ilog2(MAX_PTRS_PER_PTE), and
> PMD_ORDER isn't necessarily 9. It seems we need a general way to
> handle this that avoids the duplicated case 1, case 2, ..., case 9.

Thanks for the suggestion. The general way to do this appears to be
to call count_mthp_stat(folio_order(folio), MTHP_STAT_[Z]SWPOUT),
potentially adding a new "MTHP_STAT_ZSWPOUT" item to
"enum mthp_stat_item".

I will make this change in v2 accordingly.
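
As a rough illustration of the idea, here is a userspace model (not
kernel code) of a per-order counter table that replaces the switch.
Everything prefixed model_/MODEL_ is hypothetical; the bounds check is
an assumption about how such a helper might ignore order 0 and orders
above the PMD order, and MODEL_PMD_ORDER is fixed at 9 only for the
example.

```c
#include <assert.h>

/* Assumption for this model only: 4 KiB pages with a 2 MiB PMD. */
#define MODEL_PMD_ORDER 9

enum model_mthp_stat_item {
	MODEL_MTHP_STAT_SWPOUT,
	MODEL_MTHP_STAT_ZSWPOUT,	/* hypothetical new item */
	MODEL_MTHP_STAT_COUNT,
};

/* One counter per (order, item) pair instead of one named event per size. */
static unsigned long model_stats[MODEL_PMD_ORDER + 1][MODEL_MTHP_STAT_COUNT];

/* Bump the counter for this order; silently ignore out-of-range orders. */
static void model_count_mthp_stat(int order, enum model_mthp_stat_item item)
{
	if (order <= 0 || order > MODEL_PMD_ORDER)
		return;
	model_stats[order][item]++;
}

/* Read back a counter; the caller must pass an order within the table. */
static unsigned long model_read_mthp_stat(int order,
					  enum model_mthp_stat_item item)
{
	assert(order >= 0 && order <= MODEL_PMD_ORDER);
	return model_stats[order][item];
}
```

With a table like this, the call site reduces to a single call keyed
by the folio order, and no case labels need to change when the PMD
order differs between configurations.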

Thanks,
Kanchana

> 
> > +       }
> > +}
> > +
> >  /*
> >   * We may have stale swap cache pages in memory: notice
> >   * them here and get rid of the unnecessary final write.
> > @@ -196,6 +239,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> >                 return ret;
> >         }
> >         if (zswap_store(folio)) {
> > +               count_zswap_thp_swpout_vm_event(folio);
> >                 folio_start_writeback(folio);
> >                 folio_unlock(folio);
> >                 folio_end_writeback(folio);
> > --
> > 2.27.0
> >
> 
> Thanks
> Barry

[RFC PATCH v1 1/4] mm: zswap: zswap_is_folio_same_filled() takes an index in the folio.
[RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store vmstat event counters.
[RFC PATCH v1 3/4] mm: zswap: zswap_store() extended to handle mTHP folios.
[RFC PATCH v1 4/4] mm: page_io: Count successful mTHP zswap stores in vmstat.