[PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.

Posted by Kanchana P Sridhar 2 months ago
For zswap_store() to support large folios, we need to be able to do
a batch update of zswap_stored_pages upon successful store of all pages
in the folio. For this, we need to add folio_nr_pages(), which returns
a long, to zswap_stored_pages.

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 fs/proc/meminfo.c     |  2 +-
 include/linux/zswap.h |  2 +-
 mm/zswap.c            | 19 +++++++++++++------
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 245171d9164b..8ba9b1472390 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -91,7 +91,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_ZSWAP
 	show_val_kb(m, "Zswap:          ", zswap_total_pages());
 	seq_printf(m,  "Zswapped:       %8lu kB\n",
-		   (unsigned long)atomic_read(&zswap_stored_pages) <<
+		   (unsigned long)atomic_long_read(&zswap_stored_pages) <<
 		   (PAGE_SHIFT - 10));
 #endif
 	show_val_kb(m, "Dirty:          ",
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 9cd1beef0654..d961ead91bf1 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -7,7 +7,7 @@
 
 struct lruvec;
 
-extern atomic_t zswap_stored_pages;
+extern atomic_long_t zswap_stored_pages;
 
 #ifdef CONFIG_ZSWAP
 
diff --git a/mm/zswap.c b/mm/zswap.c
index 0f281e50a034..43e4e216db41 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -43,7 +43,7 @@
 * statistics
 **********************************/
 /* The number of compressed pages currently stored in zswap */
-atomic_t zswap_stored_pages = ATOMIC_INIT(0);
+atomic_long_t zswap_stored_pages = ATOMIC_INIT(0);
 
 /*
  * The statistics below are not protected from concurrent access for
@@ -802,7 +802,7 @@ static void zswap_entry_free(struct zswap_entry *entry)
 		obj_cgroup_put(entry->objcg);
 	}
 	zswap_entry_cache_free(entry);
-	atomic_dec(&zswap_stored_pages);
+	atomic_long_dec(&zswap_stored_pages);
 }
 
 /*********************************
@@ -1232,7 +1232,7 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
 		nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
 	} else {
 		nr_backing = zswap_total_pages();
-		nr_stored = atomic_read(&zswap_stored_pages);
+		nr_stored = atomic_long_read(&zswap_stored_pages);
 	}
 
 	if (!nr_stored)
@@ -1501,7 +1501,7 @@ bool zswap_store(struct folio *folio)
 	}
 
 	/* update stats */
-	atomic_inc(&zswap_stored_pages);
+	atomic_long_inc(&zswap_stored_pages);
 	count_vm_event(ZSWPOUT);
 
 	return true;
@@ -1650,6 +1650,13 @@ static int debugfs_get_total_size(void *data, u64 *val)
 }
 DEFINE_DEBUGFS_ATTRIBUTE(total_size_fops, debugfs_get_total_size, NULL, "%llu\n");
 
+static int debugfs_get_stored_pages(void *data, u64 *val)
+{
+	*val = atomic_long_read(&zswap_stored_pages);
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(stored_pages_fops, debugfs_get_stored_pages, NULL, "%llu\n");
+
 static int zswap_debugfs_init(void)
 {
 	if (!debugfs_initialized())
@@ -1673,8 +1680,8 @@ static int zswap_debugfs_init(void)
 			   zswap_debugfs_root, &zswap_written_back_pages);
 	debugfs_create_file("pool_total_size", 0444,
 			    zswap_debugfs_root, NULL, &total_size_fops);
-	debugfs_create_atomic_t("stored_pages", 0444,
-				zswap_debugfs_root, &zswap_stored_pages);
+	debugfs_create_file("stored_pages", 0444,
+			    zswap_debugfs_root, NULL, &stored_pages_fops);
 
 	return 0;
 }
-- 
2.27.0
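
For context, the batch update this change enables later in the series would
look roughly like the sketch below. It is illustrative only: the real hunk
lands in a later patch of this series, and the placement and surrounding code
here are assumptions.

	/*
	 * Sketch: with zswap_stored_pages as an atomic_long_t, a successful
	 * store of a large folio can bump the counter once for the whole
	 * folio instead of once per subpage.
	 */
	long nr_pages = folio_nr_pages(folio);

	atomic_long_add(nr_pages, &zswap_stored_pages);
	count_vm_events(ZSWPOUT, nr_pages);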
Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Nhat Pham 2 months ago
On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> For zswap_store() to support large folios, we need to be able to do
> a batch update of zswap_stored_pages upon successful store of all pages
> in the folio. For this, we need to add folio_nr_pages(), which returns
> a long, to zswap_stored_pages.
>
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>

Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Johannes Weiner 2 months ago
On Fri, Sep 27, 2024 at 07:16:17PM -0700, Kanchana P Sridhar wrote:
> For zswap_store() to support large folios, we need to be able to do
> a batch update of zswap_stored_pages upon successful store of all pages
> in the folio. For this, we need to add folio_nr_pages(), which returns
> a long, to zswap_stored_pages.
> 
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>

Long for pages makes sense to me even independent of the large folios
coming in. An int is just 8TB in 4k (base) pages.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
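
For reference, the arithmetic behind the 8TB figure, as a standalone
userspace sketch (not anything in the patch):

	#include <stdio.h>

	int main(void)
	{
		/* ~2^31 counted pages of 4 KiB each: where a 32-bit counter wraps */
		unsigned long long bytes = (1ULL << 31) * 4096ULL;	/* 2^43 bytes */

		printf("%llu bytes = %llu TiB\n", bytes, bytes >> 40);	/* 8 TiB */
		return 0;
	}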
RE: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Sridhar, Kanchana P 1 month, 4 weeks ago
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Saturday, September 28, 2024 6:54 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> yosryahmed@google.com; nphamcs@gmail.com;
> chengming.zhou@linux.dev; usamaarif642@gmail.com;
> shakeel.butt@linux.dev; ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be
> atomic_long_t.
> 
> On Fri, Sep 27, 2024 at 07:16:17PM -0700, Kanchana P Sridhar wrote:
> > For zswap_store() to support large folios, we need to be able to do
> > a batch update of zswap_stored_pages upon successful store of all pages
> > in the folio. For this, we need to add folio_nr_pages(), which returns
> > a long, to zswap_stored_pages.
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> 
> Long for pages makes sense to me even independent of the large folios
> coming in. An int is just 8TB in 4k (base) pages.
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks Johannes for the Acked-by's!

Thanks,
Kanchana
Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Yosry Ahmed 2 months ago
On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> For zswap_store() to support large folios, we need to be able to do
> a batch update of zswap_stored_pages upon successful store of all pages
> in the folio. For this, we need to add folio_nr_pages(), which returns
> a long, to zswap_stored_pages.
>
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> ---
>  fs/proc/meminfo.c     |  2 +-
>  include/linux/zswap.h |  2 +-
>  mm/zswap.c            | 19 +++++++++++++------
>  3 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 245171d9164b..8ba9b1472390 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -91,7 +91,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  #ifdef CONFIG_ZSWAP
>         show_val_kb(m, "Zswap:          ", zswap_total_pages());
>         seq_printf(m,  "Zswapped:       %8lu kB\n",
> -                  (unsigned long)atomic_read(&zswap_stored_pages) <<
> +                  (unsigned long)atomic_long_read(&zswap_stored_pages) <<

Do we still need this cast? "HardwareCorrupted" seems to be using
atomic_long_read() without a cast.
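
For reference, the HardwareCorrupted precedent in fs/proc/meminfo.c looks
roughly like this (quoted from memory, so the exact field width may differ):

	seq_printf(m, "HardwareCorrupted: %5lu kB\n",
		   atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10));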

Otherwise this LGTM:
Acked-by: Yosry Ahmed <yosryahmed@google.com>

RE: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Sridhar, Kanchana P 1 month, 4 weeks ago
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Saturday, September 28, 2024 1:13 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; shakeel.butt@linux.dev; ryan.roberts@arm.com;
> Huang, Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> foundation.org; Zou, Nanhai <nanhai.zou@intel.com>; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be
> atomic_long_t.
> 
> On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > For zswap_store() to support large folios, we need to be able to do
> > a batch update of zswap_stored_pages upon successful store of all pages
> > in the folio. For this, we need to add folio_nr_pages(), which returns
> > a long, to zswap_stored_pages.
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > ---
> >  fs/proc/meminfo.c     |  2 +-
> >  include/linux/zswap.h |  2 +-
> >  mm/zswap.c            | 19 +++++++++++++------
> >  3 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > index 245171d9164b..8ba9b1472390 100644
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -91,7 +91,7 @@ static int meminfo_proc_show(struct seq_file *m,
> void *v)
> >  #ifdef CONFIG_ZSWAP
> >         show_val_kb(m, "Zswap:          ", zswap_total_pages());
> >         seq_printf(m,  "Zswapped:       %8lu kB\n",
> > -                  (unsigned long)atomic_read(&zswap_stored_pages) <<
> > +                  (unsigned long)atomic_long_read(&zswap_stored_pages) <<
> 
> Do we still need this cast? "HardwareCorrupted" seems to be using
> atomic_long_read() without a cast.
> 
> Otherwise this LGTM:
> Acked-by: Yosry Ahmed <yosryahmed@google.com>

Thanks Yosry for the Acked-by's!

Thanks,
Kanchana

Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Yosry Ahmed 2 months ago
On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> For zswap_store() to support large folios, we need to be able to do
> a batch update of zswap_stored_pages upon successful store of all pages
> in the folio. For this, we need to add folio_nr_pages(), which returns
> a long, to zswap_stored_pages.

Do we really need this? A lot of places in the kernel assign the
result of folio_nr_pages() to an int (thp_nr_pages(),
split_huge_pages_all(), etc). I don't think we need to worry about
folio_nr_pages() exceeding INT_MAX for a while.

Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Matthew Wilcox 2 months ago
On Fri, Sep 27, 2024 at 07:57:49PM -0700, Yosry Ahmed wrote:
> On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > For zswap_store() to support large folios, we need to be able to do
> > a batch update of zswap_stored_pages upon successful store of all pages
> > in the folio. For this, we need to add folio_nr_pages(), which returns
> > a long, to zswap_stored_pages.
> 
> Do we really need this? A lot of places in the kernel assign the
> result of folio_nr_pages() to an int (thp_nr_pages(),
> split_huge_pages_all(), etc). I don't think we need to worry about
> folio_nr_pages() exceeding INT_MAX for a while.

You'd be surprised.  Let's assume we add support for PUD-sized pages
(personally I think this is too large to make sense, but some people can't
be told).  On arm64, we can have a 64kB page size, so that's 13 bits per
level for a total of 2^26 pages per PUD.  That feels uncomfortably close
to 2^32 to me.
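
The numbers work out as follows; a standalone sketch assuming 8-byte page
table entries, not kernel code:

	#include <stdio.h>

	int main(void)
	{
		const unsigned int page_shift = 16;	/* 64 KiB base pages */
		const unsigned int bits_per_level = 13;	/* 64 KiB table / 8-byte PTEs */
		unsigned long long pmd_pages = 1ULL << bits_per_level;
		unsigned long long pud_pages = 1ULL << (2 * bits_per_level);

		printf("PMD folio: %llu pages, %llu MiB\n",
		       pmd_pages, (pmd_pages << page_shift) >> 20);	/* 8192, 512 MiB */
		printf("PUD folio: %llu pages, %llu TiB\n",
		       pud_pages, (pud_pages << page_shift) >> 40);	/* 2^26, 4 TiB */
		return 0;
	}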

Anywhere you've found that's using an int to store folio_nr_pages() is
somewhere we should probably switch to long.  And this, btw, is why I've
moved from using an int to store folio_size() to using size_t.  A PMD is
already 512MB (with a 64KB page size), and so a PUD will be 4TB.

thp_nr_pages() is not a good example.  I'll be happy when we kill it;
we're actually almost there.
Re: [PATCH v8 5/8] mm: zswap: Modify zswap_stored_pages to be atomic_long_t.
Posted by Yosry Ahmed 2 months ago
On Fri, Sep 27, 2024 at 9:51 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Sep 27, 2024 at 07:57:49PM -0700, Yosry Ahmed wrote:
> > On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > > For zswap_store() to support large folios, we need to be able to do
> > > a batch update of zswap_stored_pages upon successful store of all pages
> > > in the folio. For this, we need to add folio_nr_pages(), which returns
> > > a long, to zswap_stored_pages.
> >
> > Do we really need this? A lot of places in the kernel assign the
> > result of folio_nr_pages() to an int (thp_nr_pages(),
> > split_huge_pages_all(), etc). I don't think we need to worry about
> > folio_nr_pages() exceeding INT_MAX for a while.
>
> You'd be surprised.  Let's assume we add support for PUD-sized pages
> (personally I think this is too large to make sense, but some people can't
> be told).  On arm64, we can have a 64kB page size, so that's 13 bits per
> level for a total of 2^26 pages per PUD.  That feels uncomfortably close
> to 2^32 to me.
>
> Anywhere you've found that's using an int to store folio_nr_pages() is
> somewhere we should probably switch to long.

There are a lot of them: rmap.c, shmem.c, khugepaged.c, etc.

> And this, btw, is why I've
> moved from using an int to store folio_size() to using size_t.  A PMD is
> already 512MB (with a 64KB page size), and so a PUD will be 4TB.

Thanks for pointing this out. I assumed the presence of many places
using int to store folio_nr_pages() meant that it was a general
assumption.

Also, if we think it's possible that a single folio's page count may approach
INT_MAX, then we are in bigger trouble for zswap_stored_pages, because
that's the total number of pages stored in zswap on the entire system.
That's much more likely to exceed INT_MAX than a single folio.

>
> thp_nr_pages() is not a good example.  I'll be happy when we kill it;
> we're actually almost there.

Yeah I can only see 2 callers.