mm/damon/sysfs: fix memory leak and NULL dereference issues

[PATCH v3 3/3] mm/damon/sysfs: check contexts->nr in repeat_call_fn

Posted by SeongJae Park 1 week, 6 days ago

From: Josh Law <objecting@objecting.org>

damon_sysfs_repeat_call_fn() calls damon_sysfs_upd_tuned_intervals(),
damon_sysfs_upd_schemes_stats(), and
damon_sysfs_upd_schemes_effective_quotas() without checking
contexts->nr.  If nr_contexts is set to 0 via sysfs while DAMON is
running, these functions dereference contexts_arr[0] and cause a NULL
pointer dereference.  Add the missing check.

For example, the issue can be reproduced using the DAMON sysfs interface
and the DAMON user-space tool (damo) [1], as below.

    $ sudo damo start --refresh_interval 1s
    $ echo 0 | sudo tee \
            /sys/kernel/mm/damon/admin/kdamonds/0/contexts/nr_contexts

[1] https://github.com/damonitor/damo

Link: https://patch.msgid.link/20260320163559.178101-3-objecting@objecting.org
Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work")
Cc: <stable@vger.kernel.org> # 6.17.x
Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/sysfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index ddc30586c0e61..6a44a2f3d8fc9 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1620,9 +1620,12 @@ static int damon_sysfs_repeat_call_fn(void *data)
 
 	if (!mutex_trylock(&damon_sysfs_lock))
 		return 0;
+	if (sysfs_kdamond->contexts->nr != 1)
+		goto out;
 	damon_sysfs_upd_tuned_intervals(sysfs_kdamond);
 	damon_sysfs_upd_schemes_stats(sysfs_kdamond);
 	damon_sysfs_upd_schemes_effective_quotas(sysfs_kdamond);
+out:
 	mutex_unlock(&damon_sysfs_lock);
 	return 0;
 }
-- 
2.47.3
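
For readers who want to try the guard outside the kernel, below is a
minimal user-space sketch of the check this patch adds.  The model_*
names are hypothetical stand-ins, not the real DAMON sysfs structures,
and the sketch assumes contexts_arr is NULL whenever nr is 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the DAMON sysfs structures. */
struct model_context {
	unsigned long nr_updates;
};

struct model_contexts {
	struct model_context **contexts_arr;	/* assumed NULL when nr == 0 */
	int nr;
};

/* Returns true if the update ran, false if it was skipped. */
static bool model_repeat_call_fn(struct model_contexts *contexts)
{
	/* Mirrors the added check: touch contexts_arr[0] only if nr == 1. */
	if (contexts->nr != 1)
		return false;
	contexts->contexts_arr[0]->nr_updates++;
	return true;
}
```

Calling model_repeat_call_fn() on a model_contexts with nr == 0 returns
false without touching contexts_arr, which is the behavior the patch
aims for after nr_contexts is rewritten to 0.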

Re: [PATCH v3 3/3] mm/damon/sysfs: check contexts->nr in repeat_call_fn

Posted by SeongJae Park 1 week, 6 days ago

Subject: (Sashiko) Re: [PATCH v3 3/3] mm/damon/sysfs: check contexts->nr in repeat_call_fn

Forwarding the Sashiko review so that it can be discussed over email.

# review url: https://sashiko.dev/#/patchset/20260321175427.86000-4-sj@kernel.org
# start of sashiko.dev inline review
commit 349107c02ee8a122119ac77a51451bb3d9311a34
Author: Josh Law <objecting@objecting.org>

mm/damon/sysfs: check contexts->nr in repeat_call_fn

damon_sysfs_repeat_call_fn() calls damon_sysfs_upd_tuned_intervals(),
damon_sysfs_upd_schemes_stats(), and
damon_sysfs_upd_schemes_effective_quotas() without checking
contexts->nr.  If nr_contexts is set to 0 via sysfs while DAMON is
running, these functions dereference contexts_arr[0] and cause a NULL
pointer dereference.  Add the missing check.

Link: https://patch.msgid.link/20260320163559.178101-3-objecting@objecting.org

> diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
> index ddc30586c0e6..6a44a2f3d8fc 100644
> --- a/mm/damon/sysfs.c
> +++ b/mm/damon/sysfs.c
> @@ -1620,9 +1620,12 @@ static int damon_sysfs_repeat_call_fn(void *data)
>  
>  	if (!mutex_trylock(&damon_sysfs_lock))
>  		return 0;
> +	if (sysfs_kdamond->contexts->nr != 1)
> +		goto out;
>  	damon_sysfs_upd_tuned_intervals(sysfs_kdamond);

Does this code cause starvation or incorrect statistics when multiple
kdamonds are active?

It appears that damon_sysfs_next_update_jiffies is declared as a static
global variable and used by this function to throttle updates:

static unsigned long damon_sysfs_next_update_jiffies;

static int damon_sysfs_repeat_call_fn(void *data)
{
    ...
    if (time_before(jiffies, damon_sysfs_next_update_jiffies))
        return 0;
    damon_sysfs_next_update_jiffies = jiffies +
        msecs_to_jiffies(sysfs_kdamond->refresh_ms);
    ...
}

If multiple kdamonds run concurrently, wouldn't they all share and
overwrite this single global timestamp?

If one kdamond sets the timestamp to a future value, another kdamond's
thread might incorrectly evaluate time_before() to true and skip its
update, potentially leading to starvation and incorrect sysfs
statistics for all but the winning kdamond.
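
One way to picture the concern is a user-space model in which each
kdamond keeps its own deadline instead of sharing a single global one.
This is only a sketch under assumptions: the model_* names, the tick
counter, and the next_update field are hypothetical and are not taken
from mm/damon/sysfs.c, and this is not necessarily how the issue would
be fixed upstream.

```c
#include <stdbool.h>

/* Hypothetical per-kdamond state; field names are illustrative only. */
struct model_kdamond {
	unsigned long refresh_ticks;	/* refresh interval, in ticks */
	unsigned long next_update;	/* per-kdamond deadline, in ticks */
	unsigned long nr_updates;
};

/* Wrap-safe "a is before b", in the spirit of the kernel's time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Returns true if this kdamond refreshed its stats at tick 'now'. */
static bool model_refresh(struct model_kdamond *k, unsigned long now)
{
	if (model_time_before(now, k->next_update))
		return false;
	k->next_update = now + k->refresh_ticks;
	k->nr_updates++;
	return true;
}
```

With the deadline stored per kdamond, a refresh by one model_kdamond
does not move the deadline of another, so two instances polled at the
same tick can both update.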

>  	damon_sysfs_upd_schemes_stats(sysfs_kdamond);
>  	damon_sysfs_upd_schemes_effective_quotas(sysfs_kdamond);
> +out:
>  	mutex_unlock(&damon_sysfs_lock);
>  	return 0;
>  }

Could there also be a memory leak of repeat_call_control in
damon_sysfs_turn_damon_on() if the target kdamond exits early?

In damon_sysfs_turn_damon_on(), repeat_call_control is allocated:

    repeat_call_control = kmalloc_obj(*repeat_call_control);

If the kdamond thread encounters an error or has no work, it can exit
and set ctx->kdamond = NULL before damon_call() is fully executed.

In this race window, if damon_call() detects that the context is
inactive it calls damon_call_handle_inactive_ctx(). This removes the
control from the list and returns -EINVAL, but it does not free the
memory.

Because damon_sysfs_turn_damon_on() ignores the return value of
damon_call(), does this result in a permanent leak of the memory
allocated for repeat_call_control?
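
The ownership question can be illustrated with a small user-space
model.  It is a sketch only: model_call() is a hypothetical stand-in
that fails with -EINVAL for an inactive context and takes ownership of
the control only on success.  Whether the real caller or some other
path should free the memory is a separate question for the maintainers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real DAMON structures. */
struct model_call_control {
	bool repeat;
};

struct model_ctx {
	bool active;
	struct model_call_control *registered;	/* owned by ctx once set */
};

/* Fails with -EINVAL and leaves ownership with the caller if inactive. */
static int model_call(struct model_ctx *ctx,
		      struct model_call_control *control)
{
	if (!ctx->active)
		return -EINVAL;
	ctx->registered = control;
	return 0;
}

static int model_turn_on(struct model_ctx *ctx)
{
	struct model_call_control *control = malloc(sizeof(*control));
	int err;

	if (!control)
		return -ENOMEM;
	control->repeat = true;
	err = model_call(ctx, control);
	if (err)
		free(control);	/* the callee did not take ownership */
	return err;
}
```

In this model the allocation is released whenever model_call() reports
failure, so an inactive context does not leave the control behind.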


# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260321175427.86000-4-sj@kernel.org
#
# hkml [1] generated a draft of this mail.  It can be regenerated
# using the command below:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260321175427.86000-4-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail

Sent using hkml (https://github.com/sjp38/hackermail)

Re: [PATCH v3 3/3] mm/damon/sysfs: check contexts->nr in repeat_call_fn

Posted by SeongJae Park 1 week, 6 days ago

TL;DR: no blocker for this patch was found.

On Sat, 21 Mar 2026 13:05:22 -0700 SeongJae Park <sj@kernel.org> wrote:

> Subject: (Sashiko) Re: [PATCH v3 3/3] mm/damon/sysfs: check contexts->nr in repeat_call_fn
> 
> Forwarding the Sashiko review so that it can be discussed over email.
> 
> # review url: https://sashiko.dev/#/patchset/20260321175427.86000-4-sj@kernel.org
> # start of sashiko.dev inline review
> commit 349107c02ee8a122119ac77a51451bb3d9311a34
> Author: Josh Law <objecting@objecting.org>
> 
> mm/damon/sysfs: check contexts->nr in repeat_call_fn
> 
> damon_sysfs_repeat_call_fn() calls damon_sysfs_upd_tuned_intervals(),
> damon_sysfs_upd_schemes_stats(), and
> damon_sysfs_upd_schemes_effective_quotas() without checking
> contexts->nr.  If nr_contexts is set to 0 via sysfs while DAMON is
> running, these functions dereference contexts_arr[0] and cause a NULL
> pointer dereference.  Add the missing check.
> 
> Link: https://patch.msgid.link/20260320163559.178101-3-objecting@objecting.org
> 
> > diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
> > index ddc30586c0e6..6a44a2f3d8fc 100644
> > --- a/mm/damon/sysfs.c
> > +++ b/mm/damon/sysfs.c
> > @@ -1620,9 +1620,12 @@ static int damon_sysfs_repeat_call_fn(void *data)
> >  
> >  	if (!mutex_trylock(&damon_sysfs_lock))
> >  		return 0;
> > +	if (sysfs_kdamond->contexts->nr != 1)
> > +		goto out;
> >  	damon_sysfs_upd_tuned_intervals(sysfs_kdamond);
> 
> Does this code cause starvation or incorrect statistics when multiple
> kdamonds are active?
> 
> It appears that damon_sysfs_next_update_jiffies is declared as a static
> global variable and used by this function to throttle updates:
> 
> static unsigned long damon_sysfs_next_update_jiffies;
> 
> static int damon_sysfs_repeat_call_fn(void *data)
> {
>     ...
>     if (time_before(jiffies, damon_sysfs_next_update_jiffies))
>         return 0;
>     damon_sysfs_next_update_jiffies = jiffies +
>         msecs_to_jiffies(sysfs_kdamond->refresh_ms);
>     ...
> }
> 
> If multiple kdamonds run concurrently, wouldn't they all share and
> overwrite this single global timestamp?
> 
> If one kdamond sets the timestamp to a future value, another kdamond's
> thread might incorrectly evaluate time_before() to true and skip its
> update, potentially leading to starvation and incorrect sysfs
> statistics for all but the winning kdamond.

I answered this on the previous version of this patch.  In short: good
finding, but not a blocker for this patch.  I will work on it.

> 
> >  	damon_sysfs_upd_schemes_stats(sysfs_kdamond);
> >  	damon_sysfs_upd_schemes_effective_quotas(sysfs_kdamond);
> > +out:
> >  	mutex_unlock(&damon_sysfs_lock);
> >  	return 0;
> >  }
> 
> Could there also be a memory leak of repeat_call_control in
> damon_sysfs_turn_damon_on() if the target kdamond exits early?
> 
> In damon_sysfs_turn_damon_on(), repeat_call_control is allocated:
> 
>     repeat_call_control = kmalloc_obj(*repeat_call_control);
> 
> If the kdamond thread encounters an error or has no work, it can exit
> and set ctx->kdamond = NULL before damon_call() is fully executed.
> 
> In this race window, if damon_call() detects that the context is
> inactive it calls damon_call_handle_inactive_ctx(). This removes the
> control from the list and returns -EINVAL, but it does not free the
> memory.
> 
> Because damon_sysfs_turn_damon_on() ignores the return value of
> damon_call(), does this result in a permanent leak of the memory
> allocated for repeat_call_control?

I answered this on the previous version of this patch.  In short: good
finding, but not a blocker for this patch.  I will work on it.

> 
> 
> # end of sashiko.dev inline review
> # review url: https://sashiko.dev/#/patchset/20260321175427.86000-4-sj@kernel.org


Thanks,
SJ


[PATCH v3 1/3] mm/damon/sysfs: fix param_ctx leak on damon_sysfs_new_test_ctx() failure
[PATCH v3 2/3] mm/damon/sysfs: check contexts->nr before accessing contexts_arr[0]
[PATCH v3 3/3] mm/damon/sysfs: check contexts->nr in repeat_call_fn