mm/damon/stat: roll back context on damon_call() failure

[PATCH v1] mm/damon/stat: roll back context on damon_call() failure

Posted by Yuho Choi 1 week, 3 days ago

damon_stat_start() allocates and starts damon_stat_context before
registering the repeated damon_call() callback.  If damon_call() fails,
the function currently returns the error while leaving the context
allocated and stored in the global pointer.

The retry-time cleanup added for this path only runs if users try to
enable DAMON_STAT again.  If no retry happens, the failed start leaves
the context allocated indefinitely.

Roll back the failed start by stopping the kdamond before destroying
the context and clearing the global pointer.  damon_stop() waits for a
live kdamond via kthread_stop_put(); if the worker has already completed
teardown, there is no kdamond left to wait on and the context can be
destroyed.

Fixes: 405f61996d9d ("mm/damon/stat: use damon_call() repeat mode instead of damon_callback")
Signed-off-by: Yuho Choi <dbgh9129@gmail.com>
---
 mm/damon/stat.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/damon/stat.c b/mm/damon/stat.c
index 3951b762cbdd..7f222b5b7193 100644
--- a/mm/damon/stat.c
+++ b/mm/damon/stat.c
@@ -266,7 +266,14 @@ static int damon_stat_start(void)
 
 	damon_stat_last_refresh_jiffies = jiffies;
 	call_control.data = damon_stat_context;
-	return damon_call(damon_stat_context, &call_control);
+	err = damon_call(damon_stat_context, &call_control);
+	if (err) {
+		damon_stop(&damon_stat_context, 1);
+		damon_destroy_ctx(damon_stat_context);
+		damon_stat_context = NULL;
+		return err;
+	}
+	return 0;
 }
 
 static void damon_stat_stop(void)
-- 
2.43.0

Re: [PATCH v1] mm/damon/stat: roll back context on damon_call() failure

Posted by SeongJae Park 1 week, 2 days ago

Hello Yuho,


Thank you for sharing this patch!

On Thu, 28 May 2026 23:15:19 -0400 Yuho Choi <dbgh9129@gmail.com> wrote:

> damon_stat_start() allocates and starts damon_stat_context before
> registering the repeated damon_call() callback.  If damon_call() fails,
> the function currently returns the error while leaving the context
> allocated and stored in the global pointer.
> 
> The retry-time cleanup added for this path only runs if users try to
> enable DAMON_STAT again.  If no retry happens, the failed start leaves
> the context allocated indefinitely.
> 
> Roll back the failed start by stopping the kdamond before destroying
> the context and clearing the global pointer.  damon_stop() waits for a
> live kdamond via kthread_stop_put(); if the worker has already completed
> teardown, there is no kdamond left to wait on and the context can be
> destroyed.

But, having one damon_ctx object in the memory is a real problem?  Why?


Thanks,
SJ

[...]

Re: [PATCH v1] mm/damon/stat: roll back context on damon_call() failure

Posted by 최유호 1 week, 2 days ago

Dear SJ

On Fri, 29 May 2026 at 10:44, SeongJae Park <sj@kernel.org> wrote:

> But, having one damon_ctx object in the memory is a real problem?  Why?

I agree that keeping one `damon_ctx` is unlikely to be a serious problem.
My intention was not to address memory pressure but to maintain
failure-path consistency. If initialization fails after the context is
created and started, the internal state remains in place despite the
operation returning an error. The cleanup is then deferred until a
subsequent retry.

For this reason, I think this is worth fixing as a correctness issue.
That said, if you feel the current cleanup is sufficient, I can drop
this patch.

Best regards,
Yuho

> [...]

Re: [PATCH v1] mm/damon/stat: roll back context on damon_call() failure

Posted by SeongJae Park 1 week, 2 days ago

On Fri, 29 May 2026 11:17:01 -0400 최유호 <dbgh9129@gmail.com> wrote:

> Dear SJ
> 
> On Fri, 29 May 2026 at 10:44, SeongJae Park <sj@kernel.org> wrote:
> 
> > But, having one damon_ctx object in the memory is a real problem?  Why?
> 
> I agree that keeping one `damon_ctx` is unlikely to be a serious problem.
> My intention was not to address memory pressure but to maintain
> failure-path consistency. If initialization fails after the context is
> created and started, the internal state remains in place despite the
> operation returning an error. The cleanup is then deferred until a
> subsequent retry.
> 
> For this reason, I think this is worth fixing as a correctness issue.
> That said, if you feel the current cleanup is sufficient, I can drop
> this patch.

Thank you for clarifying, Yuho.  I think the current cleanup is ok, but I
understand it is confusing to read.  Apparently that confused you at least, and
that's not ok.  What about adding a comment explaining the context?


Thanks,
SJ

[...]

Re: [PATCH v1] mm/damon/stat: roll back context on damon_call() failure

Posted by 최유호 1 week, 2 days ago

On Fri, 29 May 2026 at 12:00, SeongJae Park <sj@kernel.org> wrote:

> Thank you for clarifying, Yuho.  I think the current cleanup is ok, but I
> understand it is confusing to read.  Apparently that confused you at least, and
> that's not ok.  What about adding a comment explaining the context?

Thank you for the explanation.

That makes sense. Looking again, I noticed the same retry-time cleanup
pattern is also used in lru_sort.c and reclaim.c, so the behavior
seems intentional.

I'll send v2 patch adding a comment explaining deferred cleanup and
retry-time handling.

Best,
Yuho

Re: [PATCH v1] mm/damon/stat: roll back context on damon_call() failure

Posted by SeongJae Park 1 week, 2 days ago

On Fri, 29 May 2026 16:29:44 -0400 최유호 <dbgh9129@gmail.com> wrote:

> On Fri, 29 May 2026 at 12:00, SeongJae Park <sj@kernel.org> wrote:
> 
> > Thank you for clarifying, Yuho.  I think the current cleanup is ok, but I
> > understand it is confusing to read.  Apparently that confused you at least, and
> > that's not ok.  What about adding a comment explaining the context?
> 
> Thank you for the explanation.
> 
> That makes sense. Looking again, I noticed the same retry-time cleanup
> pattern is also used in lru_sort.c and reclaim.c, so the behavior
> seems intentional.
> 
> I'll send v2 patch adding a comment explaining deferred cleanup and
> retry-time handling.

Sounds good, I'm looking forward to the v2!


Thanks,
SJ

[...]