[PATCH] cgroup/dmem: accept only one region per limit write

Eric Chanudet posted 1 patch 2 days, 6 hours ago
kernel/cgroup/dmem.c | 70 +++++++++++++++++++---------------------------------
1 file changed, 26 insertions(+), 44 deletions(-)
[PATCH] cgroup/dmem: accept only one region per limit write
Posted by Eric Chanudet 2 days, 6 hours ago
Accept only one "region value" pair entry for the dmem.max, dmem.min,
dmem.low files.

This changes the UAPI that otherwise accepted multiple lines for setting
multiple entries in one write. No existing user is known to rely on
writing multiple regions in a single write.

Processing multiple regions in dmemcg_limit_write() could quietly change
first limits before failing on a later one and returning an error to the
writer, with no indication some changes occurred.

Signed-off-by: Eric Chanudet <echanude@redhat.com>
---
Follow up from discussions on a previous thread[1].
If Albert's series[2] lands, I can cleanup and prepare some kunits for
these as well.
[1] https://lore.kernel.org/all/158bc103-7f99-4df4-8d3b-2da9b04ac0ed@lankhorst.se/
[2] https://lore.kernel.org/all/20260519-kunit_cgroups-v4-1-f6c2f498fae4@redhat.com/
---
 kernel/cgroup/dmem.c | 70 +++++++++++++++++++---------------------------------
 1 file changed, 26 insertions(+), 44 deletions(-)

diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 6430c7ce1e0372f59f1313163fb7630ce49ac1ef..113ee88e276296bccb4def546adf5cc175d7f0be 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -734,57 +734,39 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
 				 void (*apply)(struct dmem_cgroup_pool_state *, u64))
 {
 	struct dmemcg_state *dmemcs = css_to_dmemcs(of_css(of));
-	int err = 0;
-
-	while (buf && !err) {
-		struct dmem_cgroup_pool_state *pool = NULL;
-		char *options, *region_name;
-		struct dmem_cgroup_region *region;
-		u64 new_limit;
-
-		options = buf;
-		buf = strchr(buf, '\n');
-		if (buf)
-			*buf++ = '\0';
-
-		options = strstrip(options);
-
-		/* eat empty lines */
-		if (!options[0])
-			continue;
-
-		region_name = strsep(&options, " \t");
-		if (!region_name[0])
-			continue;
-
-		if (!options || !*options)
-			return -EINVAL;
+	struct dmem_cgroup_pool_state *pool;
+	struct dmem_cgroup_region *region;
+	char *region_name;
+	u64 new_limit;
+	int err;
 
-		rcu_read_lock();
-		region = dmemcg_get_region_by_name(region_name);
-		rcu_read_unlock();
+	buf = strstrip(buf);
+	region_name = strsep(&buf, " \t");
+	if (!region_name[0] || !buf)
+		return -EINVAL;
 
-		if (!region)
-			return -EINVAL;
+	rcu_read_lock();
+	region = dmemcg_get_region_by_name(region_name);
+	rcu_read_unlock();
+	if (!region)
+		return -EINVAL;
 
-		err = dmemcg_parse_limit(options, &new_limit);
-		if (err < 0)
-			goto out_put;
+	buf = strstrip(buf);
+	err = dmemcg_parse_limit(buf, &new_limit);
+	if (err < 0)
+		goto out_put;
 
-		pool = get_cg_pool_unlocked(dmemcs, region);
-		if (IS_ERR(pool)) {
-			err = PTR_ERR(pool);
-			goto out_put;
-		}
+	pool = get_cg_pool_unlocked(dmemcs, region);
+	if (IS_ERR(pool)) {
+		err = PTR_ERR(pool);
+		goto out_put;
+	}
 
-		/* And commit */
-		apply(pool, new_limit);
-		dmemcg_pool_put(pool);
+	apply(pool, new_limit);
+	dmemcg_pool_put(pool);
 
 out_put:
-		kref_put(&region->ref, dmemcg_free_region);
-	}
-
+	kref_put(&region->ref, dmemcg_free_region);
 
 	return err ?: nbytes;
 }

---
base-commit: 640c57d6ca1346a1c2363a3f473b405af979e046
change-id: 20260605-cgroup-dmem-write-single-region-9bf05b6d995d

Best regards,
-- 
Eric Chanudet <echanude@redhat.com>
Re: [PATCH] cgroup/dmem: accept only one region per limit write
Posted by Natalie Vock 1 day, 13 hours ago
On 6/6/26 00:44, Eric Chanudet wrote:
> Accept only one "region value" pair entry for the dmem.max, dmem.min,
> dmem.low files.>
> This changes the UAPI that otherwise accepted multiple lines for setting
> multiple entries in one write. No existing user is known to rely on
> writing multiple regions in a single write.

Ugh, shoot.

For dmem.low specifically, there already are some userspace thingies 
floating around that may write more than one region/value pairs.

These thingies all depend on that one patchset for dmemcg protection 
that I should really get around to merging[1]. Since the userspace 
utilities depend on not-yet-merged patches, they sort of have to expect 
stuff changing under their belts, so I wouldn't really consider those 
users a blocker by necessity.

As I see it, we could go down one of two paths:
1. We go ahead with the patch as proposed, and I make sure that the 
users I know of adapt. Could be a bit icky wrt. "do not break userspace" 
rules, but since the already use non-merged UAPIs in one place, you can 
argue that these users kind of have to expect breakage.
2. We use the old handling allowing multiple lines for dmem.min and 
dmem.low only. This preserves compatibility but uglifies the code by 
quite a bit.

All things considered, I think I personally would prefer going with 1. 
and taking the patch as proposed and just having one codepath handling 
every limit file. Just highlighting this so we don't do it on accident.

[1] https://patchwork.freedesktop.org/series/163183/

Some more review comments inline.

> 
> Processing multiple regions in dmemcg_limit_write() could quietly change
> first limits before failing on a later one and returning an error to the
> writer, with no indication some changes occurred.
> 
> Signed-off-by: Eric Chanudet <echanude@redhat.com>
> ---
> Follow up from discussions on a previous thread[1].
> If Albert's series[2] lands, I can cleanup and prepare some kunits for
> these as well.
> [1] https://lore.kernel.org/all/158bc103-7f99-4df4-8d3b-2da9b04ac0ed@lankhorst.se/
> [2] https://lore.kernel.org/all/20260519-kunit_cgroups-v4-1-f6c2f498fae4@redhat.com/
> ---
>   kernel/cgroup/dmem.c | 70 +++++++++++++++++++---------------------------------
>   1 file changed, 26 insertions(+), 44 deletions(-)
> 
> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 6430c7ce1e0372f59f1313163fb7630ce49ac1ef..113ee88e276296bccb4def546adf5cc175d7f0be 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -734,57 +734,39 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
>   				 void (*apply)(struct dmem_cgroup_pool_state *, u64))
>   {
>   	struct dmemcg_state *dmemcs = css_to_dmemcs(of_css(of));
> -	int err = 0;
> -
> -	while (buf && !err) {
> -		struct dmem_cgroup_pool_state *pool = NULL;
> -		char *options, *region_name;
> -		struct dmem_cgroup_region *region;
> -		u64 new_limit;
> -
> -		options = buf;
> -		buf = strchr(buf, '\n');
> -		if (buf)
> -			*buf++ = '\0';
> -
> -		options = strstrip(options);
> -
> -		/* eat empty lines */
> -		if (!options[0])
> -			continue;
> -
> -		region_name = strsep(&options, " \t");
> -		if (!region_name[0])
> -			continue;
> -
> -		if (!options || !*options)
> -			return -EINVAL;
> +	struct dmem_cgroup_pool_state *pool;
> +	struct dmem_cgroup_region *region;
> +	char *region_name;
> +	u64 new_limit;
> +	int err;
>   
> -		rcu_read_lock();
> -		region = dmemcg_get_region_by_name(region_name);
> -		rcu_read_unlock();
> +	buf = strstrip(buf);
> +	region_name = strsep(&buf, " \t");
> +	if (!region_name[0] || !buf)

If buf is NULL, isn't strsep(&buf, ...) also NULL? region_name[0] would 
therefore be a NULL pointer deref. Flipping the order of the logical or 
should be enough to prevent this.

> +		return -EINVAL;
>   
> -		if (!region)
> -			return -EINVAL;
> +	rcu_read_lock();
> +	region = dmemcg_get_region_by_name(region_name);
> +	rcu_read_unlock();
> +	if (!region)
> +		return -EINVAL;
>   
> -		err = dmemcg_parse_limit(options, &new_limit);
> -		if (err < 0)
> -			goto out_put;
> +	buf = strstrip(buf);

Do we start allowing extra spaces between region and limit as well? 
Would also be fine by me, it doesn't break anything, just highlighting 
that it's a change in behavior. Should perhaps be documented in the 
commit message, too.

Also, you should be able to use skip_spaces() here for an equivalent 
result. I'm not strongly opinionated on either way, but using 
skip_spaces() indicates more clearly that this can only remove spaces at 
the start.

Best,
Natalie

> +	err = dmemcg_parse_limit(buf, &new_limit);
> +	if (err < 0)
> +		goto out_put;
>   
> -		pool = get_cg_pool_unlocked(dmemcs, region);
> -		if (IS_ERR(pool)) {
> -			err = PTR_ERR(pool);
> -			goto out_put;
> -		}
> +	pool = get_cg_pool_unlocked(dmemcs, region);
> +	if (IS_ERR(pool)) {
> +		err = PTR_ERR(pool);
> +		goto out_put;
> +	}
>   
> -		/* And commit */
> -		apply(pool, new_limit);
> -		dmemcg_pool_put(pool);
> +	apply(pool, new_limit);
> +	dmemcg_pool_put(pool);
>   
>   out_put:
> -		kref_put(&region->ref, dmemcg_free_region);
> -	}
> -
> +	kref_put(&region->ref, dmemcg_free_region);
>   
>   	return err ?: nbytes;
>   }
> 
> ---
> base-commit: 640c57d6ca1346a1c2363a3f473b405af979e046
> change-id: 20260605-cgroup-dmem-write-single-region-9bf05b6d995d
> 
> Best regards,
Re: [PATCH] cgroup/dmem: accept only one region per limit write
Posted by Maarten Lankhorst 1 day, 7 hours ago
Hey,

On 6/6/26 18:31, Natalie Vock wrote:
> On 6/6/26 00:44, Eric Chanudet wrote:
>> Accept only one "region value" pair entry for the dmem.max, dmem.min,
>> dmem.low files.>
>> This changes the UAPI that otherwise accepted multiple lines for setting
>> multiple entries in one write. No existing user is known to rely on
>> writing multiple regions in a single write.
> 
> Ugh, shoot.
> 
> For dmem.low specifically, there already are some userspace thingies floating around that may write more than one region/value pairs.
> 
> These thingies all depend on that one patchset for dmemcg protection that I should really get around to merging[1]. Since the userspace utilities depend on not-yet-merged patches, they sort of have to expect stuff changing under their belts, so I wouldn't really consider those users a blocker by necessity.
> 
> As I see it, we could go down one of two paths:
> 1. We go ahead with the patch as proposed, and I make sure that the users I know of adapt. Could be a bit icky wrt. "do not break userspace" rules, but since the already use non-merged UAPIs in one place, you can argue that these users kind of have to expect breakage.
> 2. We use the old handling allowing multiple lines for dmem.min and dmem.low only. This preserves compatibility but uglifies the code by quite a bit.
> 
> All things considered, I think I personally would prefer going with 1. and taking the patch as proposed and just having one codepath handling every limit file. Just highlighting this so we don't do it on accident.
> 
> [1] https://patchwork.freedesktop.org/series/163183/
> 

I prefer option 1 as well, but would like an ack from one of the core cgroup maintainers too,
and what Maxime's opinion on this as well.

Kind regards,

~Maarten Lankhorst

> Some more review comments inline.
> 
>>
>> Processing multiple regions in dmemcg_limit_write() could quietly change
>> first limits before failing on a later one and returning an error to the
>> writer, with no indication some changes occurred.
>>
>> Signed-off-by: Eric Chanudet <echanude@redhat.com>
>> ---
>> Follow up from discussions on a previous thread[1].
>> If Albert's series[2] lands, I can cleanup and prepare some kunits for
>> these as well.
>> [1] https://lore.kernel.org/all/158bc103-7f99-4df4-8d3b-2da9b04ac0ed@lankhorst.se/
>> [2] https://lore.kernel.org/all/20260519-kunit_cgroups-v4-1-f6c2f498fae4@redhat.com/
>> ---
>>   kernel/cgroup/dmem.c | 70 +++++++++++++++++++---------------------------------
>>   1 file changed, 26 insertions(+), 44 deletions(-)
>>
>> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
>> index 6430c7ce1e0372f59f1313163fb7630ce49ac1ef..113ee88e276296bccb4def546adf5cc175d7f0be 100644
>> --- a/kernel/cgroup/dmem.c
>> +++ b/kernel/cgroup/dmem.c
>> @@ -734,57 +734,39 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
>>                    void (*apply)(struct dmem_cgroup_pool_state *, u64))
>>   {
>>       struct dmemcg_state *dmemcs = css_to_dmemcs(of_css(of));
>> -    int err = 0;
>> -
>> -    while (buf && !err) {
>> -        struct dmem_cgroup_pool_state *pool = NULL;
>> -        char *options, *region_name;
>> -        struct dmem_cgroup_region *region;
>> -        u64 new_limit;
>> -
>> -        options = buf;
>> -        buf = strchr(buf, '\n');
>> -        if (buf)
>> -            *buf++ = '\0';
>> -
>> -        options = strstrip(options);
>> -
>> -        /* eat empty lines */
>> -        if (!options[0])
>> -            continue;
>> -
>> -        region_name = strsep(&options, " \t");
>> -        if (!region_name[0])
>> -            continue;
>> -
>> -        if (!options || !*options)
>> -            return -EINVAL;
>> +    struct dmem_cgroup_pool_state *pool;
>> +    struct dmem_cgroup_region *region;
>> +    char *region_name;
>> +    u64 new_limit;
>> +    int err;
>>   -        rcu_read_lock();
>> -        region = dmemcg_get_region_by_name(region_name);
>> -        rcu_read_unlock();
>> +    buf = strstrip(buf);
>> +    region_name = strsep(&buf, " \t");
>> +    if (!region_name[0] || !buf)
> 
> If buf is NULL, isn't strsep(&buf, ...) also NULL? region_name[0] would therefore be a NULL pointer deref. Flipping the order of the logical or should be enough to prevent this.
> 
>> +        return -EINVAL;
>>   -        if (!region)
>> -            return -EINVAL;
>> +    rcu_read_lock();
>> +    region = dmemcg_get_region_by_name(region_name);
>> +    rcu_read_unlock();
>> +    if (!region)
>> +        return -EINVAL;
>>   -        err = dmemcg_parse_limit(options, &new_limit);
>> -        if (err < 0)
>> -            goto out_put;
>> +    buf = strstrip(buf);
> 
> Do we start allowing extra spaces between region and limit as well? Would also be fine by me, it doesn't break anything, just highlighting that it's a change in behavior. Should perhaps be documented in the commit message, too.
> 
> Also, you should be able to use skip_spaces() here for an equivalent result. I'm not strongly opinionated on either way, but using skip_spaces() indicates more clearly that this can only remove spaces at the start.

> 
> Best,
> Natalie
> 
>> +    err = dmemcg_parse_limit(buf, &new_limit);
>> +    if (err < 0)
>> +        goto out_put;
>>   -        pool = get_cg_pool_unlocked(dmemcs, region);
>> -        if (IS_ERR(pool)) {
>> -            err = PTR_ERR(pool);
>> -            goto out_put;
>> -        }
>> +    pool = get_cg_pool_unlocked(dmemcs, region);
>> +    if (IS_ERR(pool)) {
>> +        err = PTR_ERR(pool);
>> +        goto out_put;
>> +    }
>>   -        /* And commit */
>> -        apply(pool, new_limit);
>> -        dmemcg_pool_put(pool);
>> +    apply(pool, new_limit);
>> +    dmemcg_pool_put(pool);
>>     out_put:
>> -        kref_put(&region->ref, dmemcg_free_region);
>> -    }
>> -
>> +    kref_put(&region->ref, dmemcg_free_region);
>>         return err ?: nbytes;
>>   }
>>
>> ---
>> base-commit: 640c57d6ca1346a1c2363a3f473b405af979e046
>> change-id: 20260605-cgroup-dmem-write-single-region-9bf05b6d995d
>>
>> Best regards,
> 

Re: [PATCH] cgroup/dmem: accept only one region per limit write
Posted by Tejun Heo 22 hours ago
On Sat, Jun 06, 2026 at 11:44:10PM +0200, Maarten Lankhorst wrote:
> > As I see it, we could go down one of two paths:
> > 1. We go ahead with the patch as proposed, and I make sure that the users I know of adapt. Could be a bit icky wrt. "do not break userspace" rules, but since the already use non-merged UAPIs in one place, you can argue that these users kind of have to expect breakage.
> > 2. We use the old handling allowing multiple lines for dmem.min and dmem.low only. This preserves compatibility but uglifies the code by quite a bit.
> > 
> > All things considered, I think I personally would prefer going with 1. and taking the patch as proposed and just having one codepath handling every limit file. Just highlighting this so we don't do it on accident.
> > 
> > [1] https://patchwork.freedesktop.org/series/163183/
> > 
> 
> I prefer option 1 as well, but would like an ack from one of the core cgroup maintainers too,
> and what Maxime's opinion on this as well.

Yeah, if at all possible, please drop the multi region write support if at
all possible. This shouldn't have gotten in and yeah it's on the boundary
but the fact that users need external patches works in our favor - both
qualitatively and quantatively (the usage is likely tiny).

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun