[v2] zswap same-filled and limit checking cleanups

[PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by Yosry Ahmed 1 year, 10 months ago

Currently, we calculate the zswap global limit, and potentially the
acceptance threshold in the zswap, in pages in the zswap store path.
This is unnecessary because the values rarely change.

Instead, precalculate the them when the module parameters are updated,
which should be rare. Since we are adding custom handlers for setting
the percentages now, add proper validation that they are <= 100.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/zswap.c | 86 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 70 insertions(+), 16 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 1cf3ab4b22e64..832e3f56232f0 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -116,12 +116,29 @@ module_param_cb(zpool, &zswap_zpool_param_ops, &zswap_zpool_type, 0644);
 
 /* The maximum percentage of memory that the compressed pool can occupy */
 static unsigned int zswap_max_pool_percent = 20;
-module_param_named(max_pool_percent, zswap_max_pool_percent, uint, 0644);
+static unsigned long zswap_max_pages;
+
+static int zswap_max_pool_param_set(const char *,
+				       const struct kernel_param *);
+static const struct kernel_param_ops zswap_max_pool_param_ops = {
+	.set =		zswap_max_pool_param_set,
+	.get =		param_get_uint,
+};
+module_param_cb(max_pool_percent, &zswap_max_pool_param_ops,
+		&zswap_max_pool_percent, 0644);
 
 /* The threshold for accepting new pages after the max_pool_percent was hit */
 static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
-module_param_named(accept_threshold_percent, zswap_accept_thr_percent,
-		   uint, 0644);
+unsigned long zswap_accept_thr_pages;
+
+static int zswap_accept_thr_param_set(const char *,
+				      const struct kernel_param *);
+static const struct kernel_param_ops zswap_accept_thr_param_ops = {
+	.set =		zswap_accept_thr_param_set,
+	.get =		param_get_uint,
+};
+module_param_cb(accept_threshold_percent, &zswap_accept_thr_param_ops,
+		&zswap_accept_thr_percent, 0644);
 
 /*
  * Enable/disable handling same-value filled pages (enabled by default).
@@ -490,14 +507,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
 	return NULL;
 }
 
-static unsigned long zswap_max_pages(void)
+static void zswap_update_max_pages(void)
 {
-	return totalram_pages() * zswap_max_pool_percent / 100;
+	WRITE_ONCE(zswap_max_pages,
+		   totalram_pages() * zswap_max_pool_percent / 100);
 }
 
-static unsigned long zswap_accept_thr_pages(void)
+static void zswap_update_accept_thr_pages(void)
 {
-	return zswap_max_pages() * zswap_accept_thr_percent / 100;
+	WRITE_ONCE(zswap_accept_thr_pages,
+		   READ_ONCE(zswap_max_pages) * zswap_accept_thr_percent / 100);
 }
 
 unsigned long zswap_total_pages(void)
@@ -684,6 +703,43 @@ static int zswap_enabled_param_set(const char *val,
 	return ret;
 }
 
+static int __zswap_percent_param_set(const char *val,
+				     const struct kernel_param *kp)
+{
+	unsigned int n;
+	int ret;
+
+	ret = kstrtouint(val, 10, &n);
+	if (ret || n > 100)
+		return -EINVAL;
+
+	return param_set_uint(val, kp);
+}
+
+static int zswap_max_pool_param_set(const char *val,
+				    const struct kernel_param *kp)
+{
+	int err = __zswap_percent_param_set(val, kp);
+
+	if (!err) {
+		zswap_update_max_pages();
+		zswap_update_accept_thr_pages();
+	}
+
+	return err;
+}
+
+static int zswap_accept_thr_param_set(const char *val,
+				      const struct kernel_param *kp)
+{
+	int err = __zswap_percent_param_set(val, kp);
+
+	if (!err)
+		zswap_update_accept_thr_pages();
+
+	return err;
+}
+
 /*********************************
 * lru functions
 **********************************/
@@ -1305,10 +1361,6 @@ static void shrink_worker(struct work_struct *w)
 {
 	struct mem_cgroup *memcg;
 	int ret, failures = 0;
-	unsigned long thr;
-
-	/* Reclaim down to the accept threshold */
-	thr = zswap_accept_thr_pages();
 
 	/* global reclaim will select cgroup in a round-robin fashion. */
 	do {
@@ -1358,7 +1410,8 @@ static void shrink_worker(struct work_struct *w)
 			break;
 resched:
 		cond_resched();
-	} while (zswap_total_pages() > thr);
+		/* Reclaim down to the accept threshold */
+	} while (zswap_total_pages() > READ_ONCE(zswap_accept_thr_pages));
 }
 
 static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
@@ -1424,16 +1477,14 @@ bool zswap_store(struct folio *folio)
 
 	/* Check global limits */
 	cur_pages = zswap_total_pages();
-	max_pages = zswap_max_pages();
-
-	if (cur_pages >= max_pages) {
+	if (cur_pages >= READ_ONCE(zswap_max_pages)) {
 		zswap_pool_limit_hit++;
 		zswap_pool_reached_full = true;
 		goto reject;
 	}
 
 	if (zswap_pool_reached_full) {
-		if (cur_pages > zswap_accept_thr_pages())
+		if (cur_pages > READ_ONCE(zswap_accept_thr_pages))
 			goto reject;
 		else
 			zswap_pool_reached_full = false;
@@ -1734,6 +1785,9 @@ static int zswap_setup(void)
 		zswap_enabled = false;
 	}
 
+	zswap_update_max_pages();
+	zswap_update_accept_thr_pages();
+
 	if (zswap_debugfs_init())
 		pr_warn("debugfs initialization failed\n");
 	zswap_init_state = ZSWAP_INIT_SUCCEED;
-- 
2.44.0.478.gd926399ef9-goog

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by kernel test robot 1 year, 10 months ago

Hi Yosry,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20240405]
[cannot apply to linus/master v6.9-rc2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-zswap-always-shrink-in-zswap_store-if-zswap_pool_reached_full/20240405-133623
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240405053510.1948982-3-yosryahmed%40google.com
patch subject: [PATCH v2 2/5] mm: zswap: calculate limits only when updated
config: i386-randconfig-061-20240405 (https://download.01.org/0day-ci/archive/20240406/202404060445.zMPxumi9-lkp@intel.com/config)
compiler: gcc-10 (Ubuntu 10.5.0-1ubuntu1) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240406/202404060445.zMPxumi9-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404060445.zMPxumi9-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> mm/zswap.c:132:15: sparse: sparse: symbol 'zswap_accept_thr_pages' was not declared. Should it be static?
   mm/zswap.c:1188:23: sparse: sparse: context imbalance in 'shrink_memcg_cb' - unexpected unlock

vim +/zswap_accept_thr_pages +132 mm/zswap.c

   120	
   121	static int zswap_max_pool_param_set(const char *,
   122					       const struct kernel_param *);
   123	static const struct kernel_param_ops zswap_max_pool_param_ops = {
   124		.set =		zswap_max_pool_param_set,
   125		.get =		param_get_uint,
   126	};
   127	module_param_cb(max_pool_percent, &zswap_max_pool_param_ops,
   128			&zswap_max_pool_percent, 0644);
   129	
   130	/* The threshold for accepting new pages after the max_pool_percent was hit */
   131	static unsigned int zswap_accept_thr_percent = 90; /* of max pool size */
 > 132	unsigned long zswap_accept_thr_pages;
   133	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by Yosry Ahmed 1 year, 10 months ago

On Fri, Apr 5, 2024 at 1:24 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Yosry,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on akpm-mm/mm-everything]
> [also build test WARNING on next-20240405]
> [cannot apply to linus/master v6.9-rc2]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-zswap-always-shrink-in-zswap_store-if-zswap_pool_reached_full/20240405-133623
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20240405053510.1948982-3-yosryahmed%40google.com
> patch subject: [PATCH v2 2/5] mm: zswap: calculate limits only when updated
> config: i386-randconfig-061-20240405 (https://download.01.org/0day-ci/archive/20240406/202404060445.zMPxumi9-lkp@intel.com/config)
> compiler: gcc-10 (Ubuntu 10.5.0-1ubuntu1) 10.5.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240406/202404060445.zMPxumi9-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202404060445.zMPxumi9-lkp@intel.com/
>
> sparse warnings: (new ones prefixed by >>)
> >> mm/zswap.c:132:15: sparse: sparse: symbol 'zswap_accept_thr_pages' was not declared. Should it be static?

Yes, I already have it static for v3, which will be sent after the
discussion on this patch settles.

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by Johannes Weiner 1 year, 10 months ago

On Fri, Apr 05, 2024 at 05:35:07AM +0000, Yosry Ahmed wrote:
> Currently, we calculate the zswap global limit, and potentially the
> acceptance threshold in the zswap, in pages in the zswap store path.
> This is unnecessary because the values rarely change.
> 
> Instead, precalculate the them when the module parameters are updated,
> which should be rare. Since we are adding custom handlers for setting
> the percentages now, add proper validation that they are <= 100.
> 
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Nice! Getting that stuff out of the hotpath!

Two comments below:

> @@ -684,6 +703,43 @@ static int zswap_enabled_param_set(const char *val,
>  	return ret;
>  }
>  
> +static int __zswap_percent_param_set(const char *val,
> +				     const struct kernel_param *kp)
> +{
> +	unsigned int n;
> +	int ret;
> +
> +	ret = kstrtouint(val, 10, &n);
> +	if (ret || n > 100)
> +		return -EINVAL;
> +
> +	return param_set_uint(val, kp);
> +}
> +
> +static int zswap_max_pool_param_set(const char *val,
> +				    const struct kernel_param *kp)
> +{
> +	int err = __zswap_percent_param_set(val, kp);
> +
> +	if (!err) {
> +		zswap_update_max_pages();
> +		zswap_update_accept_thr_pages();
> +	}
> +
> +	return err;
> +}
> +
> +static int zswap_accept_thr_param_set(const char *val,
> +				      const struct kernel_param *kp)
> +{
> +	int err = __zswap_percent_param_set(val, kp);
> +
> +	if (!err)
> +		zswap_update_accept_thr_pages();
> +
> +	return err;
> +}

I think you can keep this simple and just always update both if
anything changes for whatever reason. It's an extremely rare event
after all. That should cut it from 3 functions to 1.

Note that totalram_pages can also change during memory onlining and
offlining. For that you need a memory notifier that also calls that
refresh function. It's simple enough, though, check out the code
around register_memory_notifier() in drivers/xen/balloon.c.

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by Yosry Ahmed 1 year, 10 months ago

On Fri, Apr 5, 2024 at 8:26 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Apr 05, 2024 at 05:35:07AM +0000, Yosry Ahmed wrote:
> > Currently, we calculate the zswap global limit, and potentially the
> > acceptance threshold in the zswap, in pages in the zswap store path.
> > This is unnecessary because the values rarely change.
> >
> > Instead, precalculate the them when the module parameters are updated,
> > which should be rare. Since we are adding custom handlers for setting
> > the percentages now, add proper validation that they are <= 100.
> >
> > Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> Nice! Getting that stuff out of the hotpath!
>
> Two comments below:
>
> > @@ -684,6 +703,43 @@ static int zswap_enabled_param_set(const char *val,
> >       return ret;
> >  }
> >
> > +static int __zswap_percent_param_set(const char *val,
> > +                                  const struct kernel_param *kp)
> > +{
> > +     unsigned int n;
> > +     int ret;
> > +
> > +     ret = kstrtouint(val, 10, &n);
> > +     if (ret || n > 100)
> > +             return -EINVAL;
> > +
> > +     return param_set_uint(val, kp);
> > +}
> > +
> > +static int zswap_max_pool_param_set(const char *val,
> > +                                 const struct kernel_param *kp)
> > +{
> > +     int err = __zswap_percent_param_set(val, kp);
> > +
> > +     if (!err) {
> > +             zswap_update_max_pages();
> > +             zswap_update_accept_thr_pages();
> > +     }
> > +
> > +     return err;
> > +}
> > +
> > +static int zswap_accept_thr_param_set(const char *val,
> > +                                   const struct kernel_param *kp)
> > +{
> > +     int err = __zswap_percent_param_set(val, kp);
> > +
> > +     if (!err)
> > +             zswap_update_accept_thr_pages();
> > +
> > +     return err;
> > +}
>
> I think you can keep this simple and just always update both if
> anything changes for whatever reason. It's an extremely rare event
> after all. That should cut it from 3 functions to 1.

Yeah we can also combine both update functions in this case.

>
> Note that totalram_pages can also change during memory onlining and
> offlining. For that you need a memory notifier that also calls that
> refresh function. It's simple enough, though, check out the code
> around register_memory_notifier() in drivers/xen/balloon.c.

Good point, I completely missed this. It seems like totalram_pages can
actually change from contexts other than memory hotplug as well,
specifically through adjust_managed_page_count(), and mostly through
ballooning drivers. Do we trigger the notifiers in this case? I can't
find such logic.

It seems like in this case the actual amount of memory does not
change, but the drivers take it away from the system. It makes some
sense to me that the zswap limits do not change in this case,
especially that userspace just sets those limits as a percentage of
total memory. I wouldn't expect userspace to take ballooning into
account here.

However, it would be a behavioral change from today where we always
rely on totalram_pages(). Also, if userspace happens to change the
limit when a driver is holding a big chunk of memory away from
totalram_pages, then the limit would be constantly underestimated.

I do not have enough familiarity with memory ballooning to know for
sure if this is okay. How much memory can memory ballooning add/remove
from totalram_pages(), and is it usually transient or does it stick
around for a while?

Also CCing David here.

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by David Hildenbrand 1 year, 10 months ago

On 05.04.24 20:43, Yosry Ahmed wrote:
> On Fri, Apr 5, 2024 at 8:26 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> On Fri, Apr 05, 2024 at 05:35:07AM +0000, Yosry Ahmed wrote:
>>> Currently, we calculate the zswap global limit, and potentially the
>>> acceptance threshold in the zswap, in pages in the zswap store path.
>>> This is unnecessary because the values rarely change.
>>>
>>> Instead, precalculate the them when the module parameters are updated,
>>> which should be rare. Since we are adding custom handlers for setting
>>> the percentages now, add proper validation that they are <= 100.
>>>
>>> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
>>> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>>
>> Nice! Getting that stuff out of the hotpath!
>>
>> Two comments below:
>>
>>> @@ -684,6 +703,43 @@ static int zswap_enabled_param_set(const char *val,
>>>        return ret;
>>>   }
>>>
>>> +static int __zswap_percent_param_set(const char *val,
>>> +                                  const struct kernel_param *kp)
>>> +{
>>> +     unsigned int n;
>>> +     int ret;
>>> +
>>> +     ret = kstrtouint(val, 10, &n);
>>> +     if (ret || n > 100)
>>> +             return -EINVAL;
>>> +
>>> +     return param_set_uint(val, kp);
>>> +}
>>> +
>>> +static int zswap_max_pool_param_set(const char *val,
>>> +                                 const struct kernel_param *kp)
>>> +{
>>> +     int err = __zswap_percent_param_set(val, kp);
>>> +
>>> +     if (!err) {
>>> +             zswap_update_max_pages();
>>> +             zswap_update_accept_thr_pages();
>>> +     }
>>> +
>>> +     return err;
>>> +}
>>> +
>>> +static int zswap_accept_thr_param_set(const char *val,
>>> +                                   const struct kernel_param *kp)
>>> +{
>>> +     int err = __zswap_percent_param_set(val, kp);
>>> +
>>> +     if (!err)
>>> +             zswap_update_accept_thr_pages();
>>> +
>>> +     return err;
>>> +}
>>
>> I think you can keep this simple and just always update both if
>> anything changes for whatever reason. It's an extremely rare event
>> after all. That should cut it from 3 functions to 1.
> 
> Yeah we can also combine both update functions in this case.
> 
>>
>> Note that totalram_pages can also change during memory onlining and
>> offlining. For that you need a memory notifier that also calls that
>> refresh function. It's simple enough, though, check out the code
>> around register_memory_notifier() in drivers/xen/balloon.c.
> 
> Good point, I completely missed this. It seems like totalram_pages can
> actually change from contexts other than memory hotplug as well,
> specifically through adjust_managed_page_count(), and mostly through
> ballooning drivers. Do we trigger the notifiers in this case? I can't
> find such logic.

Things like virtio-balloon never online/offline memory and would never 
call it.

Things like virtio-mem sometimes will online/offline memory and would 
sometimes call it (but not always). Things like the Hyper-V balloon and 
XEN balloon never offline memory, and would only call it when onlining 
memory.

> 
> It seems like in this case the actual amount of memory does not
> change, but the drivers take it away from the system. It makes some
> sense to me that the zswap limits do not change in this case,
> especially that userspace just sets those limits as a percentage of
> total memory. I wouldn't expect userspace to take ballooning into
> account here.
> 

For virtio-mem, it does change ("actual amount of memory"). For 
virtio-balloon, it's tricky. When using virtio-balloon for VM resizing, 
it would similarly change. When using it for pure memory overcommit, it 
depends on whatever the policy in the hypervisor is ... might be that 
under memory pressure that memory is simply given back to the VM.

> However, it would be a behavioral change from today where we always
> rely on totalram_pages(). Also, if userspace happens to change the
> limit when a driver is holding a big chunk of memory away from
> totalram_pages, then the limit would be constantly underestimated.
> 
> I do not have enough familiarity with memory ballooning to know for
> sure if this is okay. How much memory can memory ballooning add/remove
> from totalram_pages(), and is it usually transient or does it stick
> around for a while?
> 
> Also CCing David here.

It can be a lot. Especially with the Hyper-V balloon (but also on 
environments where other forms of memory hotunplug are not supported), 
memory ballooning can be used to unplug memory. So that memory can be 
gone for good and it can end up being quite a lot of memory.

The clean thing to do would be to have a way for other subsystems to get 
notified on any totalram_pages() changes, so they can adjust accordingly.

-- 
Cheers,

David / dhildenb

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by Yosry Ahmed 1 year, 10 months ago

[..]
> >> Note that totalram_pages can also change during memory onlining and
> >> offlining. For that you need a memory notifier that also calls that
> >> refresh function. It's simple enough, though, check out the code
> >> around register_memory_notifier() in drivers/xen/balloon.c.
> >
> > Good point, I completely missed this. It seems like totalram_pages can
> > actually change from contexts other than memory hotplug as well,
> > specifically through adjust_managed_page_count(), and mostly through
> > ballooning drivers. Do we trigger the notifiers in this case? I can't
> > find such logic.
>
> Things like virtio-balloon never online/offline memory and would never
> call it.

I see calls to adjust_managed_page_count() from
drivers/virtio/virtio_balloon.c, but I don't understand enough to know
what they are doing.

>
> Things like virtio-mem sometimes will online/offline memory and would
> sometimes call it (but not always). Things like the Hyper-V balloon and
> XEN balloon never offline memory, and would only call it when onlining
> memory.

Thanks for the details.

>
> >
> > It seems like in this case the actual amount of memory does not
> > change, but the drivers take it away from the system. It makes some
> > sense to me that the zswap limits do not change in this case,
> > especially that userspace just sets those limits as a percentage of
> > total memory. I wouldn't expect userspace to take ballooning into
> > account here.
> >
>
> For virtio-mem, it does change ("actual amount of memory"). For
> virtio-balloon, it's tricky. When using virtio-balloon for VM resizing,
> it would similarly change. When using it for pure memory overcommit, it
> depends on whatever the policy in the hypervisor is ... might be that
> under memory pressure that memory is simply given back to the VM.

That's good to know, it seems like we need to take these into account,
and not just because the users may happen to change zswap limits while
they are onlining/offlining memory.

>
> > However, it would be a behavioral change from today where we always
> > rely on totalram_pages(). Also, if userspace happens to change the
> > limit when a driver is holding a big chunk of memory away from
> > totalram_pages, then the limit would be constantly underestimated.
> >
> > I do not have enough familiarity with memory ballooning to know for
> > sure if this is okay. How much memory can memory ballooning add/remove
> > from totalram_pages(), and is it usually transient or does it stick
> > around for a while?
> >
> > Also CCing David here.
>
> It can be a lot. Especially with the Hyper-V balloon (but also on
> environments where other forms of memory hotunplug are not supported),
> memory ballooning can be used to unplug memory. So that memory can be
> gone for good and it can end up being quite a lot of memory.
>
> The clean thing to do would be to have a way for other subsystems to get
> notified on any totalram_pages() changes, so they can adjust accordingly.

Yeah I agree. I imagined that register_memory_notifier() would be the
way to do that. Apparently it is only effective for memory hotplug.
From your description, it sounds like the ballooning drivers may have
a very similar effect to memory hotplug, but I don't see
memory_notify() being called in these paths.

Do we need a separate notifier chain for totalram_pages() updates?

Re: [PATCH v2 2/5] mm: zswap: calculate limits only when updated

Posted by David Hildenbrand 1 year, 10 months ago

On 08.04.24 10:07, Yosry Ahmed wrote:
> [..]
>>>> Note that totalram_pages can also change during memory onlining and
>>>> offlining. For that you need a memory notifier that also calls that
>>>> refresh function. It's simple enough, though, check out the code
>>>> around register_memory_notifier() in drivers/xen/balloon.c.
>>>
>>> Good point, I completely missed this. It seems like totalram_pages can
>>> actually change from contexts other than memory hotplug as well,
>>> specifically through adjust_managed_page_count(), and mostly through
>>> ballooning drivers. Do we trigger the notifiers in this case? I can't
>>> find such logic.
>>
>> Things like virtio-balloon never online/offline memory and would never
>> call it.
> 
> I see calls to adjust_managed_page_count() from
> drivers/virtio/virtio_balloon.c, but I don't understand enough to know
> what they are doing.

Essentially fake removing/adding pages. :)

> 
>>
>> Things like virtio-mem sometimes will online/offline memory and would
>> sometimes call it (but not always). Things like the Hyper-V balloon and
>> XEN balloon never offline memory, and would only call it when onlining
>> memory.
> 
> Thanks for the details.
> 
>>
>>>
>>> It seems like in this case the actual amount of memory does not
>>> change, but the drivers take it away from the system. It makes some
>>> sense to me that the zswap limits do not change in this case,
>>> especially that userspace just sets those limits as a percentage of
>>> total memory. I wouldn't expect userspace to take ballooning into
>>> account here.
>>>
>>
>> For virtio-mem, it does change ("actual amount of memory"). For
>> virtio-balloon, it's tricky. When using virtio-balloon for VM resizing,
>> it would similarly change. When using it for pure memory overcommit, it
>> depends on whatever the policy in the hypervisor is ... might be that
>> under memory pressure that memory is simply given back to the VM.
> 
> That's good to know, it seems like we need to take these into account,
> and not just because the users may happen to change zswap limits while
> they are onlining/offlining memory.

Yes. Likely other parts of the system would want to know when available 
memory changes (especially, if it's quite significant).

> 
>>
>>> However, it would be a behavioral change from today where we always
>>> rely on totalram_pages(). Also, if userspace happens to change the
>>> limit when a driver is holding a big chunk of memory away from
>>> totalram_pages, then the limit would be constantly underestimated.
>>>
>>> I do not have enough familiarity with memory ballooning to know for
>>> sure if this is okay. How much memory can memory ballooning add/remove
>>> from totalram_pages(), and is it usually transient or does it stick
>>> around for a while?
>>>
>>> Also CCing David here.
>>
>> It can be a lot. Especially with the Hyper-V balloon (but also on
>> environments where other forms of memory hotunplug are not supported),
>> memory ballooning can be used to unplug memory. So that memory can be
>> gone for good and it can end up being quite a lot of memory.
>>
>> The clean thing to do would be to have a way for other subsystems to get
>> notified on any totalram_pages() changes, so they can adjust accordingly.
> 
> Yeah I agree. I imagined that register_memory_notifier() would be the
> way to do that. Apparently it is only effective for memory hotplug.

Yes. Right now it's always called under the memory hotplug lock. We 
could reuse it for this fake hotplug/unplug as well, but we'd have to 
clarify how exactly we expect this to interact with the existing 
notifications+notifiers.

>  From your description, it sounds like the ballooning drivers may have
> a very similar effect to memory hotplug, but I don't see
> memory_notify() being called in these paths.

Right, the existing notifications (MEM_ONLINE etc.) are called when 
memory blocks change their state. That's not what the fake 
hotplug/unplug does, that operates on much smaller ranges.

> 
> Do we need a separate notifier chain for totalram_pages() updates?

Good question. I actually might have the requirement to notify some arch 
code (s390x) from virtio-mem when fake adding/removing memory, and 
already wondered how to best wire that up.

Maybe we can squeeze that into the existing notifier chain, but needs a 
bit of thought.

-- 
Cheers,

David / dhildenb

[PATCH v2 1/5] mm: zswap: always shrink in zswap_store() if zswap_pool_reached_full
[PATCH v2 2/5] mm: zswap: calculate limits only when updated
[PATCH v2 3/5] mm: zswap: refactor limit checking from zswap_store()
[PATCH v2 4/5] mm: zswap: move more same-filled pages checks outside of zswap_store()
[PATCH v2 5/5] mm: zswap: remove same_filled module params