mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails

[PATCH] mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails

Posted by SeongJae Park 6 days, 16 hours ago

damon_call() for repeat_call_control of DAMON_SYSFS could fail if
somehow the kdamond is stopped before the damon_call().  It could
happen, for example, when te damon context was made for monitroing of a
virtual address processes, and the process is terminated immediately,
before the damon_call() invocation.  In the case, the dyanmically
allocated repeat_call_control is not deallocated and leaked.

Fix the leak by deallocating the repeat_call_control under the
damon_call() failure.

This issue is discovered by sashiko [1].

[1] https://lore.kernel.org/20260320020630.962-1-sj@kernel.org

Fixes: 04a06b139ec0 ("mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Changes from RFC
(https://lore.kernel.org/20260326062347.88569-2-sj@kernel.org)
- Split out from the series.
- Drop RFC tag.
- Add Fixes: and Cc: stable.

 mm/damon/sysfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index 6a44a2f3d8fc..eefa959aa30a 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1670,7 +1670,8 @@ static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
 	repeat_call_control->data = kdamond;
 	repeat_call_control->repeat = true;
 	repeat_call_control->dealloc_on_cancel = true;
-	damon_call(ctx, repeat_call_control);
+	if (damon_call(ctx, repeat_call_control))
+		kfree(repeat_call_control);
 	return err;
 }
 

base-commit: dd478b2be41492a9f7be5abbdbd4dceddc46818f
-- 
2.47.3

Re: (sashiko review) [PATCH] mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails

Posted by SeongJae Park 6 days, 15 hours ago

Forwarding Sashiko review for doing discussions via mails.

# review url: https://sashiko.dev/#/patchset/20260327003224.55752-1-sj@kernel.org
# start of sashiko.dev inline review
commit aba7f94d0accfda5ab9b573157f691de9e9acf00
Author: SeongJae Park <sj@kernel.org>

mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails

This commit attempts to fix a memory leak where repeat_call_control is
not deallocated if damon_call() fails because the kdamond stopped before
the invocation. It adds a check to free repeat_call_control when
damon_call() returns an error.

Link: https://lore.kernel.org/20260320020630.962-1-sj@kernel.org

> diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
> index 6a44a2f3d8fc..eefa959aa30a 100644
> --- a/mm/damon/sysfs.c
> +++ b/mm/damon/sysfs.c
> @@ -1670,7 +1670,8 @@ static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
>  	repeat_call_control->data = kdamond;
>  	repeat_call_control->repeat = true;
>  	repeat_call_control->dealloc_on_cancel = true;
> -	damon_call(ctx, repeat_call_control);
> +	if (damon_call(ctx, repeat_call_control))
> +		kfree(repeat_call_control);

Does this fully resolve the leak of repeat_call_control?

If a sysfs writer thread calls damon_call() during kdamond shutdown,
after the worker thread in kdamond_fn() flushes pending controls but
before it sets ctx->kdamond = NULL:

kdamond_fn()
   kdamond_call(ctx, true);
   ...
   // Before ctx->kdamond = NULL

Would damon_call() add the new request to ctx->call_controls and return 0
since ctx->kdamond is not yet NULL and repeat is true?

If so, does this leave repeat_call_control in the queue and leaked since
the worker is already exiting and this check is bypassed?

Could this same window also cause a deadlock for synchronous commands like
DAMON_SYSFS_CMD_UPDATE_SCHEMES_STATS?

If a synchronous command with repeat = false is added to the queue during
this same window, would the caller hang indefinitely on
wait_for_completion() since the worker thread will never process the queue
again?

>  	return err;
>  }


# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260327003224.55752-1-sj@kernel.org
#
# hkml [1] generated a draft of this mail.  It can be regenerated
# using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260327003224.55752-1-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail

Sent using hkml (https://github.com/sjp38/hackermail)

Re: (sashiko review) [PATCH] mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails

Posted by SeongJae Park 6 days, 15 hours ago

TL; DR: nice finding but orthogonal to this patch.

On Thu, 26 Mar 2026 18:17:47 -0700 SeongJae Park <sj@kernel.org> wrote:

> Forwarding Sashiko review for doing discussions via mails.
> 
> # review url: https://sashiko.dev/#/patchset/20260327003224.55752-1-sj@kernel.org
> # start of sashiko.dev inline review
> commit aba7f94d0accfda5ab9b573157f691de9e9acf00
> Author: SeongJae Park <sj@kernel.org>
> 
> mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails
> 
> This commit attempts to fix a memory leak where repeat_call_control is
> not deallocated if damon_call() fails because the kdamond stopped before
> the invocation. It adds a check to free repeat_call_control when
> damon_call() returns an error.
> 
> Link: https://lore.kernel.org/20260320020630.962-1-sj@kernel.org
> 
> > diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
> > index 6a44a2f3d8fc..eefa959aa30a 100644
> > --- a/mm/damon/sysfs.c
> > +++ b/mm/damon/sysfs.c
> > @@ -1670,7 +1670,8 @@ static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
> >  	repeat_call_control->data = kdamond;
> >  	repeat_call_control->repeat = true;
> >  	repeat_call_control->dealloc_on_cancel = true;
> > -	damon_call(ctx, repeat_call_control);
> > +	if (damon_call(ctx, repeat_call_control))
> > +		kfree(repeat_call_control);
> 
> Does this fully resolve the leak of repeat_call_control?
> 
> If a sysfs writer thread calls damon_call() during kdamond shutdown,
> after the worker thread in kdamond_fn() flushes pending controls but
> before it sets ctx->kdamond = NULL:
> 
> kdamond_fn()
>    kdamond_call(ctx, true);
>    ...
>    // Before ctx->kdamond = NULL
> 
> Would damon_call() add the new request to ctx->call_controls and return 0
> since ctx->kdamond is not yet NULL and repeat is true?
> 
> If so, does this leave repeat_call_control in the queue and leaked since
> the worker is already exiting and this check is bypassed?
> 
> Could this same window also cause a deadlock for synchronous commands like
> DAMON_SYSFS_CMD_UPDATE_SCHEMES_STATS?
> 
> If a synchronous command with repeat = false is added to the queue during
> this same window, would the caller hang indefinitely on
> wait_for_completion() since the worker thread will never process the queue
> again?

Nice catch, but orthogonal to this patch.

I'm working on fixing the race with a separate patch:
https://lore.kernel.org/20260327004952.58266-1-sj@kernel.org

The patch would fix the two issues at once.

Nonetheless, the seaprate patch was motivated for only the deadlock issue.
Finding of the leak possibility is a good call.


Thanks,
SJ

[...]