kernel/cgroup/dmem.c | 74 +++++++++++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 42 deletions(-)
When writing to dmem.max, it was noticed that some writes did not take
effect, even though the write was successful.
It turns out that page_counter_set_max checks if the new max value is above
the current usage and returns -EBUSY in case it is, which was being ignored
by dmemcg_limit_write.
It was also noticed that when setting limits for multiple regions in a
single write, while setting one region's limit may fail, others might have
succeeded before. Tejun Heo brought up that this breaks atomicity.
Maarten Lankhorst and Michal Koutný have brought up that instead of
failing, setting max below the current usage should behave like memcg and
start eviction until usage goes below the new max.
However, since there is no current mechanism to evict a given region, they
suggest that setting the new max will at least prevent further allocations.
v1 kept the multiple region support when writing to limit files, while
returning -EBUSY as soon as setting a region's max was above the usage.
This version (v2) only allows setting a single region limit per write,
while allowing any new max to be set.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Changes in v2:
- Remove support for setting multiple regions' limits.
- Allow any new max limit to be set.
- Link to v1: https://lore.kernel.org/r/20260318-dmem_max_ebusy-v1-1-b7e461157b29@igalia.com
---
Thadeu Lima de Souza Cascardo (3):
cgroup/dmem: remove region parameter from dmemcg_parse_limit
cgroup/dmem: accept a single region when writing to attributes
cgroup/dmem: allow max to be set below current usage
kernel/cgroup/dmem.c | 74 +++++++++++++++++++++++-----------------------------
1 file changed, 32 insertions(+), 42 deletions(-)
---
base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c
change-id: 20260318-dmem_max_ebusy-bfd33564f2c3
Best regards,
--
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Hello, Generally looks okay to me. One comment on 3/3 \ufffd\ufffd\ufffd the naked xchg() in set_resource_max() needs a comment explaining why it's used instead of page_counter_set_max() and what the semantics are (unconditionally sets max regardless of current usage to prevent further allocations, since there's no eviction mechanism yet). Applied 1/3. Maarten, Michal, what do you think? Thanks. -- tejun
Hey, Den 2026-03-21 kl. 20:27, skrev Tejun Heo: > Hello, > > Generally looks okay to me. One comment on 3/3 — the naked xchg() in > set_resource_max() needs a comment explaining why it's used instead of > page_counter_set_max() and what the semantics are (unconditionally sets max > regardless of current usage to prevent further allocations, since there's no > eviction mechanism yet). > > Applied 1/3. Maarten, Michal, what do you think? Yeah probably drop 2/3 too since there is no longer a case where setting a limit may fail. Kind regards, ~Maarten Lankhorst
On Mon, Mar 23, 2026 at 05:46:28PM +0100, Maarten Lankhorst wrote: > Hey, > > > Den 2026-03-21 kl. 20:27, skrev Tejun Heo: > > Hello, > > > > Generally looks okay to me. One comment on 3/3 — the naked xchg() in > > set_resource_max() needs a comment explaining why it's used instead of > > page_counter_set_max() and what the semantics are (unconditionally sets max > > regardless of current usage to prevent further allocations, since there's no > > eviction mechanism yet). > > > > Applied 1/3. Maarten, Michal, what do you think? > > Yeah probably drop 2/3 too since there is no longer a case where setting a limit may fail. > > Kind regards, > ~Maarten Lankhorst Actually, this can still happen if an invalid region name is given. So, one could write: echo -e 'region1 max\ninvalidregion2 max\n' > dmem.max And even though setting the value for region1 would be applied, the write would return -EINVAL. Cascardo.
Hey, Den 2026-03-23 kl. 18:37, skrev Thadeu Lima de Souza Cascardo: > On Mon, Mar 23, 2026 at 05:46:28PM +0100, Maarten Lankhorst wrote: >> Hey, >> >> >> Den 2026-03-21 kl. 20:27, skrev Tejun Heo: >>> Hello, >>> >>> Generally looks okay to me. One comment on 3/3 — the naked xchg() in >>> set_resource_max() needs a comment explaining why it's used instead of >>> page_counter_set_max() and what the semantics are (unconditionally sets max >>> regardless of current usage to prevent further allocations, since there's no >>> eviction mechanism yet). >>> >>> Applied 1/3. Maarten, Michal, what do you think? >> >> Yeah probably drop 2/3 too since there is no longer a case where setting a limit may fail. >> >> Kind regards, >> ~Maarten Lankhorst > > Actually, this can still happen if an invalid region name is given. > > So, one could write: > > echo -e 'region1 max\ninvalidregion2 max\n' > dmem.max > > And even though setting the value for region1 would be applied, the write > would return -EINVAL. Makes sense. It would be good to validate in advance then. If that's not possible we should at least not error when we try to update 2 regions simultaneously. Likely the best to do so anyway if we want to handle eviction, which may be handled in a blocking fashion. Kind regards, ~Maarten Lankhorst
On Wed, Mar 25, 2026 at 06:40:40PM +0100, Maarten Lankhorst wrote: > Hey, > > Den 2026-03-23 kl. 18:37, skrev Thadeu Lima de Souza Cascardo: > > On Mon, Mar 23, 2026 at 05:46:28PM +0100, Maarten Lankhorst wrote: > >> Hey, > >> > >> > >> Den 2026-03-21 kl. 20:27, skrev Tejun Heo: > >>> Hello, > >>> > >>> Generally looks okay to me. One comment on 3/3 — the naked xchg() in > >>> set_resource_max() needs a comment explaining why it's used instead of > >>> page_counter_set_max() and what the semantics are (unconditionally sets max > >>> regardless of current usage to prevent further allocations, since there's no > >>> eviction mechanism yet). > >>> > >>> Applied 1/3. Maarten, Michal, what do you think? > >> > >> Yeah probably drop 2/3 too since there is no longer a case where setting a limit may fail. > >> > >> Kind regards, > >> ~Maarten Lankhorst > > > > Actually, this can still happen if an invalid region name is given. > > > > So, one could write: > > > > echo -e 'region1 max\ninvalidregion2 max\n' > dmem.max > > > > And even though setting the value for region1 would be applied, the write > > would return -EINVAL. > > Makes sense. It would be good to validate in advance then. If that's not possible we > should at least not error when we try to update 2 regions simultaneously. Likely the > best to do so anyway if we want to handle eviction, which may be handled in a blocking > fashion. > > Kind regards, > ~Maarten Lankhorst I have submitted only the last patch with the additional comment for now. If it turns out that eviction is handled, you bring up a good point that doing it in a blocking fashion may lead to an underisable effect where setting the max and starting eviction for one region may be delayed by the eviction of a previous region. Is there any reason to keep the support for handling multiple regions in a single write? Thanks. Cascardo.
© 2016 - 2026 Red Hat, Inc.