DAMON supports only starting and stopping of the execution. When it is
stopped, the internal data that it has self-trained goes away. It would
be useful if the execution could be paused and resumed with the previous
self-trained data preserved.
Introduce a per-context API parameter, 'pause', for the purpose. The
parameter can be set and unset while DAMON is running and paused, using
the online parameters commit helper functions (damon_commit_ctx() and
damon_call()). Once 'pause' is set, the kdamond_fn() main loop does
only limited work, sleeping for the sampling interval in between. The
limited work includes handling of online parameter updates, so that
users can unset 'pause' and resume the execution when they want. It
also keeps checking and handling DAMON stop conditions, so that DAMON
can be stopped while paused if needed.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 2 ++
mm/damon/core.c | 9 +++++++++
2 files changed, 11 insertions(+)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index d9a3babbafc16..ea1649a09395d 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -787,6 +787,7 @@ struct damon_attrs {
* @ops: Set of monitoring operations for given use cases.
* @addr_unit: Scale factor for core to ops address conversion.
* @min_region_sz: Minimum region size.
+ * @pause: Pause kdamond main loop.
* @adaptive_targets: Head of monitoring targets (&damon_target) list.
* @schemes: Head of schemes (&damos) list.
*/
@@ -838,6 +839,7 @@ struct damon_ctx {
struct damon_operations ops;
unsigned long addr_unit;
unsigned long min_region_sz;
+ bool pause;
struct list_head adaptive_targets;
struct list_head schemes;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index db6c67e52d2b8..ce010009ca136 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1349,6 +1349,7 @@ int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
if (err)
return err;
}
+ dst->pause = src->pause;
dst->ops = src->ops;
dst->addr_unit = src->addr_unit;
dst->min_region_sz = src->min_region_sz;
@@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
kdamond_call(ctx, false);
if (ctx->maybe_corrupted)
break;
+ while (ctx->pause) {
+ if (kdamond_need_stop(ctx))
+ goto done;
+ kdamond_usleep(ctx->attrs.sample_interval);
+ /* allow caller unset pause via damon_call() */
+ kdamond_call(ctx, false);
+ damos_walk_cancel(ctx);
+ }
if (!list_empty(&ctx->schemes))
kdamond_apply_schemes(ctx);
else
--
2.47.3
Forwarding the Sashiko review so the discussion can happen via mail.
# review url: https://sashiko.dev/#/patchset/20260321181343.93971-2-sj@kernel.org
# start of sashiko.dev inline review
commit b56d17c42c1dc6d883f3a75d785a05ba53b5c35b
Author: SeongJae Park <sj@kernel.org>
mm/damon/core: introduce damon_ctx->paused
This commit introduces a paused parameter to the DAMON context API.
It allows users to pause and resume kdamond execution while preserving
self-trained data. The main loop is updated to perform limited work
and sleep when the context is paused.
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index db6c67e52d2b..ce010009ca13 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
[ ... ]
> @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> kdamond_call(ctx, false);
> if (ctx->maybe_corrupted)
> break;
> + while (ctx->pause) {
> + if (kdamond_need_stop(ctx))
> + goto done;
> + kdamond_usleep(ctx->attrs.sample_interval);
This isn't a bug, but polling with kdamond_usleep() using the sample
interval could cause many wakeups per second even while the context is
paused. Would it be more efficient to use a waitqueue here to consume
near-zero CPU until the thread is explicitly woken?
> + /* allow caller unset pause via damon_call() */
> + kdamond_call(ctx, false);
If a parameter update fails during kdamond_call() and sets
ctx->maybe_corrupted to true, the loop will continue to run or exit the
pause loop and proceed to kdamond_apply_schemes() with corrupted
structures.
Is it possible to bypass the corruption safeguard here? Should we check
ctx->maybe_corrupted inside this loop to safely terminate the thread?
> + damos_walk_cancel(ctx);
If kdamond_call() processes a request that unpauses the context, this
will immediately cancel any pending or concurrent damos_walk() requests
before the loop condition is re-evaluated.
Could this cause spurious cancellations of walk requests when unpausing
the context?
> + }
> if (!list_empty(&ctx->schemes))
> kdamond_apply_schemes(ctx);
> else
# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260321181343.93971-2-sj@kernel.org
#
# hkml [1] generated a draft of this mail. It can be regenerated
# using below command:
#
# hkml patch sashiko_dev --for_forwarding \
# 20260321181343.93971-2-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail
TL; DR: I think this needs a few changes. That can be a fixup (picked by
Andrew) or respin for v2.
On Sat, 21 Mar 2026 13:12:04 -0700 SeongJae Park <sj@kernel.org> wrote:
> Forwarding Sashiko review for doing discussions via mails.
>
> # review url: https://sashiko.dev/#/patchset/20260321181343.93971-2-sj@kernel.org
> # start of sashiko.dev inline review
> commit b56d17c42c1dc6d883f3a75d785a05ba53b5c35b
> Author: SeongJae Park <sj@kernel.org>
>
> mm/damon/core: introduce damon_ctx->paused
>
> This commit introduces a paused parameter to the DAMON context API.
> It allows users to pause and resume kdamond execution while preserving
> self-trained data. The main loop is updated to perform limited work
> and sleep when the context is paused.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..ce010009ca13 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
>
> [ ... ]
>
> > @@ -3003,6 +3004,14 @@ static int kdamond_fn(void *data)
> > kdamond_call(ctx, false);
> > if (ctx->maybe_corrupted)
> > break;
> > + while (ctx->pause) {
> > + if (kdamond_need_stop(ctx))
> > + goto done;
> > + kdamond_usleep(ctx->attrs.sample_interval);
>
> This isn't a bug, but polling with kdamond_usleep() using the sample
> interval could cause many wakeups per second even while the context is
> paused. Would it be more efficient to use a waitqueue here to consume
> near-zero CPU until the thread is explicitly woken?
The sample interval is 5ms by default, and the recommended auto-tuning
setup makes it hundreds of milliseconds. So I don't think such a change
is required.
>
> > + /* allow caller unset pause via damon_call() */
> > + kdamond_call(ctx, false);
>
> If a parameter update fails during kdamond_call() and sets
> ctx->maybe_corrupted to true, the loop will continue to run or exit the
> pause loop and proceed to kdamond_apply_schemes() with corrupted
> structures.
>
> Is it possible to bypass the corruption safeguard here? Should we check
> ctx->maybe_corrupted inside this loop to safely terminate the thread?
Good finding. I will add that in the next spin.
>
> > + damos_walk_cancel(ctx);
>
> If kdamond_call() processes a request that unpauses the context, this
> will immediately cancel any pending or concurrent damos_walk() requests
> before the loop condition is re-evaluated.
>
> Could this cause spurious cancellations of walk requests when unpausing
> the context?
As I replied on the previous patch, that can happen and is fine. Users
can do the walk again in that case. But now I agree it may be better to
reduce such spurious failures if it can be done in a simple way. And it
seems it can be, just by doing this before the sleep.
>
> > + }
> > if (!list_empty(&ctx->schemes))
> > kdamond_apply_schemes(ctx);
> > else
>
>
> # end of sashiko.dev inline review
> # review url: https://sashiko.dev/#/patchset/20260321181343.93971-2-sj@kernel.org
So, I think the fixup attached below is needed. Unless Andrew picks it
up together, I will post v2 tomorrow.
Thanks,
SJ
=== >8 ===
From 9b07cde80e61bb4cc7a559a62465dd0ee6e2b945 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj@kernel.org>
Date: Sat, 21 Mar 2026 13:22:52 -0700
Subject: [PATCH] mm/damon/core: fixup: care corruption and cancel damos_walk()
immediately
As Sashiko suggested and I agreed.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index ce010009ca136..0ab2cfa848e69 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -3005,12 +3005,12 @@ static int kdamond_fn(void *data)
if (ctx->maybe_corrupted)
break;
while (ctx->pause) {
- if (kdamond_need_stop(ctx))
- goto done;
+ damos_walk_cancel(ctx);
kdamond_usleep(ctx->attrs.sample_interval);
/* allow caller unset pause via damon_call() */
kdamond_call(ctx, false);
- damos_walk_cancel(ctx);
+ if (kdamond_need_stop(ctx) || ctx->maybe_corrupted)
+ goto done;
}
if (!list_empty(&ctx->schemes))
kdamond_apply_schemes(ctx);
--
2.47.3
On Sat, 21 Mar 2026 13:28:57 -0700 SeongJae Park <sj@kernel.org> wrote:

> TL; DR: I think this needs a few changes. That can be a fixup (picked by
> Andrew) or respin for v2.

We already have 436 patches in mm-unstable and 40 patches in mm-new [1].
That's 476 patches in total. The total number was 392 five days ago,
which already made Andrew a little bit concerned [2]. I share the
concern, so I will stop sending non-RFC non-hotfix DAMON patches until
the v7.1-rc1 merge window. That applies to this series, too. That is, I
will post the next spin of this series with the RFC tag again, tomorrow.

[1] https://github.com/sjp38/mm_git_dashboard/blob/master/summary/summary.md
[2] https://lore.kernel.org/20260316142636.0c29d431b1c8bf03e01c3630@linux-foundation.org/

Thanks,
SJ

[...]