DAMON supports only starting and stopping of its execution. When it is
stopped, the internal data that it has self-trained goes away. It would
be useful if the execution could be paused and resumed while keeping the
previously self-trained data.
Introduce a per-context API parameter, 'pause', for the purpose. The
parameter can be set and unset while DAMON is running or paused, using
the online parameters update helper functions (damon_commit_ctx() and
damon_call()). Once 'pause' is set, the kdamond_fn() main loop does
only limited work, sleeping for the sampling interval between
iterations. The limited work includes handling of online parameters
updates, so that users can unset 'pause' and resume the execution when
they want. It also keeps checking DAMON stop conditions and handles
them, so that DAMON can be stopped while paused if needed.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 2 ++
mm/damon/core.c | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 04c8a052fcfbe..65c8d5ef510fe 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -787,6 +787,7 @@ struct damon_attrs {
* @ops: Set of monitoring operations for given use cases.
* @addr_unit: Scale factor for core to ops address conversion.
* @min_region_sz: Minimum region size.
+ * @pause: Pause kdamond main loop.
* @adaptive_targets: Head of monitoring targets (&damon_target) list.
* @schemes: Head of schemes (&damos) list.
*/
@@ -838,6 +839,7 @@ struct damon_ctx {
struct damon_operations ops;
unsigned long addr_unit;
unsigned long min_region_sz;
+ bool pause;
struct list_head adaptive_targets;
struct list_head schemes;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index db6c67e52d2b8..0ab2cfa848e69 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1349,6 +1349,7 @@ int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
if (err)
return err;
}
+ dst->pause = src->pause;
dst->ops = src->ops;
dst->addr_unit = src->addr_unit;
dst->min_region_sz = src->min_region_sz;
@@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
kdamond_call(ctx, false);
if (ctx->maybe_corrupted)
break;
+ while (ctx->pause) {
+ damos_walk_cancel(ctx);
+ kdamond_usleep(ctx->attrs.sample_interval);
+ /* allow caller unset pause via damon_call() */
+ kdamond_call(ctx, false);
+ if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
+ goto done;
+ }
if (!list_empty(&ctx->schemes))
kdamond_apply_schemes(ctx);
else
--
2.47.3
Forwarding the Sashiko review so the discussion can happen via email.
# review url: https://sashiko.dev/#/patchset/20260323231538.84452-2-sj@kernel.org
# start of sashiko.dev inline review
commit 011f7b451b6e4a657677ee09eada15a7980e8480
Author: SeongJae Park <sj@kernel.org>
mm/damon/core: introduce damon_ctx->paused
This commit introduces a pause parameter to the DAMON context, allowing
the kdamond main loop to be paused and resumed without losing its
self-trained internal data.
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index db6c67e52d2b..0ab2cfa848e6 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
[ ... ]
> @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> kdamond_call(ctx, false);
> if (ctx->maybe_corrupted)
> break;
> + while (ctx->pause) {
> + damos_walk_cancel(ctx);
> + kdamond_usleep(ctx->attrs.sample_interval);
> + /* allow caller unset pause via damon_call() */
> + kdamond_call(ctx, false);
> + if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
> + goto done;
> + }
Since this loop is located after ops.prepare_access_checks(), kdamond_usleep(),
and ops.check_accesses(), does this unconditionally perform a full monitoring
cycle before honoring the paused state?
If a context is started with pause set to true, or if pause is enabled while
waiting in kdamond_wait_activation(), it appears it will still execute one
complete active sampling cycle. Could this cause unexpected disruption, such
as clearing PTE accessed bits, when the context is explicitly configured to
be paused?
Additionally, when the context is paused, kdamond_fn remains in this sleep
loop and skips calling damos_adjust_quota(), which means quota->charged_from
is frozen.
If the pause lasts for more than LONG_MAX jiffies (e.g., about 24.8 days on
a 32-bit system with HZ=1000), will system jiffies wrap past the half-range
limit of the unsigned long type relative to the frozen timestamp?
If this wraparound occurs, time_after_eq(jiffies, quota->charged_from +
interval) would evaluate to false when unpaused. Would this cause the quota
window to not reset for another 24.8 days, accumulating charged_sz and
blocking the scheme from being applied?
# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260323231538.84452-2-sj@kernel.org
#
# hkml [1] generated a draft of this mail. It can be regenerated
# using below command:
#
# hkml patch sashiko_dev --for_forwarding \
# 20260323231538.84452-2-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail
Sent using hkml (https://github.com/sjp38/hackermail)
On Mon, 23 Mar 2026 18:28:00 -0700 SeongJae Park <sj@kernel.org> wrote:
> Forwarding Sashiko review for doing discussions via mails.
>
> # review url: https://sashiko.dev/#/patchset/20260323231538.84452-2-sj@kernel.org
> # start of sashiko.dev inline review
> commit 011f7b451b6e4a657677ee09eada15a7980e8480
> Author: SeongJae Park <sj@kernel.org>
>
> mm/damon/core: introduce damon_ctx->paused
>
> This commit introduces a pause parameter to the DAMON context, allowing
> the kdamond main loop to be paused and resumed without losing its
> self-trained internal data.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..0ab2cfa848e6 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> > kdamond_call(ctx, false);
> > if (ctx->maybe_corrupted)
> > break;
> > + while (ctx->pause) {
> > + damos_walk_cancel(ctx);
> > + kdamond_usleep(ctx->attrs.sample_interval);
> > + /* allow caller unset pause via damon_call() */
> > + kdamond_call(ctx, false);
> > + if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
> > + goto done;
> > + }
>
> Since this loop is located after ops.prepare_access_checks(), kdamond_usleep(),
> and ops.check_accesses(), does this unconditionally perform a full monitoring
> cycle before honoring the paused state?
>
> If a context is started with pause set to true, or if pause is enabled while
> waiting in kdamond_wait_activation(), it appears it will still execute one
> complete active sampling cycle. Could this cause unexpected disruption, such
> as clearing PTE accessed bits, when the context is explicitly configured to
> be paused?
Yes, DAMON would behave that way. But doing one more round of the
sampling work doesn't cause a real issue.
>
> Additionally, when the context is paused, kdamond_fn remains in this sleep
> loop and skips calling damos_adjust_quota(), which means quota->charged_from
> is frozen.
>
> If the pause lasts for more than LONG_MAX jiffies (e.g., about 24.8 days on
> a 32-bit system with HZ=1000), will system jiffies wrap past the half-range
> limit of the unsigned long type relative to the frozen timestamp?
>
> If this wraparound occurs, time_after_eq(jiffies, quota->charged_from +
> interval) would evaluate to false when unpaused. Would this cause the quota
> window to not reset for another 24.8 days, accumulating charged_sz and
> blocking the scheme from being applied?
That's a wild corner case, but I agree it is better to avoid the
problematic case. I'm still thinking about a good way to do that.
Anyway, I will address this in the next spin.
Thanks,
SJ
[...]
On Mon, 23 Mar 2026 21:07:21 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Mon, 23 Mar 2026 18:28:00 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> > Forwarding Sashiko review for doing discussions via mails.

[...]

> > Additionally, when the context is paused, kdamond_fn remains in this sleep
> > loop and skips calling damos_adjust_quota(), which means quota->charged_from
> > is frozen.
> >
> > If the pause lasts for more than LONG_MAX jiffies (e.g., about 24.8 days on
> > a 32-bit system with HZ=1000), will system jiffies wrap past the half-range
> > limit of the unsigned long type relative to the frozen timestamp?
> >
> > If this wraparound occurs, time_after_eq(jiffies, quota->charged_from +
> > interval) would evaluate to false when unpaused. Would this cause the quota
> > window to not reset for another 24.8 days, accumulating charged_sz and
> > blocking the scheme from being applied?
>
> That's a wild corner case, but I agree it is better to avoid the problematic
> case. I'm still thinking about the good way for that. Anyway, I will address
> this in the next spin.

The root cause of the issue was introduced before this patch, so I fixed it
with another hotfix [1]. So next spin of this patch will have no change.

[1] https://lore.kernel.org/20260329152306.45796-1-sj@kernel.org

Thanks,
SJ

[...]