[PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller

Andrei Vagin posted 3 patches 2 weeks ago
[PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Andrei Vagin 2 weeks ago
This patch series introduces a mechanism to mask hardware capabilities
(AT_HWCAP) reported to user-space processes via the misc cgroup
controller.

To support C/R operations (snapshots, live migration) in heterogeneous
clusters, we must ensure that processes utilize CPU features available
on all potential target nodes. To solve this, we need to advertise a
common feature set across the cluster. This patchset allows users to
configure a mask for AT_HWCAP, AT_HWCAP2. This ensures that applications
within a container only detect and use features guaranteed to be
available on all potential target hosts.

The first patch adds the mask interface to the misc cgroup controller,
allowing users to set masks for AT_HWCAP, AT_HWCAP2...

The second patch adds a selftest to verify the functionality of the new
interface, ensuring masks are applied and inherited correctly.

The third patch updates the documentation.

Cc: Kees Cook <kees@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Vipin Sharma <vipinsh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>

Andrei Vagin (3):
  cgroup, binfmt_elf: Add hwcap masks to the misc controller
  selftests/cgroup: Add a test for the misc.mask cgroup interface
  Documentation: cgroup-v2: Document misc.mask interface

 Documentation/admin-guide/cgroup-v2.rst    |  25 ++++
 Documentation/arch/arm64/elf_hwcaps.rst    |  21 ++++
 fs/binfmt_elf.c                            |  24 +++-
 include/linux/misc_cgroup.h                |  25 ++++
 kernel/cgroup/misc.c                       | 126 +++++++++++++++++++++
 tools/testing/selftests/cgroup/.gitignore  |   1 +
 tools/testing/selftests/cgroup/Makefile    |   2 +
 tools/testing/selftests/cgroup/config      |   1 +
 tools/testing/selftests/cgroup/test_misc.c | 114 +++++++++++++++++++
 9 files changed, 335 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/cgroup/test_misc.c
Re: [PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Chen Ridong 2 weeks ago

On 2025/12/5 8:58, Andrei Vagin wrote:
> This patch series introduces a mechanism to mask hardware capabilities
> (AT_HWCAP) reported to user-space processes via the misc cgroup
> controller.
> 
> To support C/R operations (snapshots, live migration) in heterogeneous
> clusters, we must ensure that processes utilize CPU features available
> on all potential target nodes. To solve this, we need to advertise a
> common feature set across the cluster. This patchset allows users to
> configure a mask for AT_HWCAP, AT_HWCAP2. This ensures that applications
> within a container only detect and use features guaranteed to be
> available on all potential target hosts.
> 

Could you elaborate on how this mask mechanism would be used in practice?

Based on my understanding of the implementation, the parent’s mask is effectively a subset of the
child’s mask, meaning the parent does not impose any additional restrictions on its children. This
behavior appears to differ from typical cgroup controllers, where children are further constrained
by their parent’s settings. This raises the question: is the cgroup model an appropriate fit for
this functionality?

> The first patch adds the mask interface to the misc cgroup controller,
> allowing users to set masks for AT_HWCAP, AT_HWCAP2...
> 
> The second patch adds a selftest to verify the functionality of the new
> interface, ensuring masks are applied and inherited correctly.
> 
> The third patch updates the documentation.
> 
> Cc: Kees Cook <kees@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: "Michal Koutný" <mkoutny@suse.com>
> Cc: Vipin Sharma <vipinsh@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> 
> Andrei Vagin (3):
>   cgroup, binfmt_elf: Add hwcap masks to the misc controller
>   selftests/cgroup: Add a test for the misc.mask cgroup interface
>   Documentation: cgroup-v2: Document misc.mask interface
> 
>  Documentation/admin-guide/cgroup-v2.rst    |  25 ++++
>  Documentation/arch/arm64/elf_hwcaps.rst    |  21 ++++
>  fs/binfmt_elf.c                            |  24 +++-
>  include/linux/misc_cgroup.h                |  25 ++++
>  kernel/cgroup/misc.c                       | 126 +++++++++++++++++++++
>  tools/testing/selftests/cgroup/.gitignore  |   1 +
>  tools/testing/selftests/cgroup/Makefile    |   2 +
>  tools/testing/selftests/cgroup/config      |   1 +
>  tools/testing/selftests/cgroup/test_misc.c | 114 +++++++++++++++++++
>  9 files changed, 335 insertions(+), 4 deletions(-)
>  create mode 100644 tools/testing/selftests/cgroup/test_misc.c

-- 
Best regards,
Ridong

Re: [PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Andrei Vagin 2 weeks ago
On Thu, Dec 4, 2025 at 6:52 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>
>
> On 2025/12/5 8:58, Andrei Vagin wrote:
> > This patch series introduces a mechanism to mask hardware capabilities
> > (AT_HWCAP) reported to user-space processes via the misc cgroup
> > controller.
> >
> > To support C/R operations (snapshots, live migration) in heterogeneous
> > clusters, we must ensure that processes utilize CPU features available
> > on all potential target nodes. To solve this, we need to advertise a
> > common feature set across the cluster. This patchset allows users to
> > configure a mask for AT_HWCAP, AT_HWCAP2. This ensures that applications
> > within a container only detect and use features guaranteed to be
> > available on all potential target hosts.
> >
>
> Could you elaborate on how this mask mechanism would be used in practice?
>
> Based on my understanding of the implementation, the parent’s mask is effectively a subset of the
> child’s mask, meaning the parent does not impose any additional restrictions on its children. This
> behavior appears to differ from typical cgroup controllers, where children are further constrained
> by their parent’s settings. This raises the question: is the cgroup model an appropriate fit for
> this functionality?

Chen,

Thank you for the question. I think I was not clear enough in the
description.

The misc.mask file works by masking out available features; any feature
bit set in the mask will not be advertised to processes within that
cgroup. When a child cgroup is created, its effective mask is a
combination of its own mask and its parent's effective mask. This means
any feature masked by either the parent or the child will be hidden from
processes in the child cgroup.

For example:
- If a parent cgroup masks out feature A (mask=0b001), processes in it
  won't see feature A.
- If we create a child cgroup under it and set its mask to hide feature
  B (mask=0b010), the effective mask for processes in the child cgroup
  becomes 0b011. They will see neither feature A nor B.

This ensures that a feature hidden by a parent cannot be re-enabled by a
child. A child can only impose further restrictions by masking out
additional features. I think this behaviour is well aligned with the cgroup
model.
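
A minimal sketch of the combination rule described above. It assumes the
"combination" is a bitwise OR, as the 0b001 / 0b010 -> 0b011 example
suggests, and that the advertised value is the hardware value with the
masked bits cleared. The function and constant names are hypothetical and
are not taken from the patch.

```python
# Illustrative model only: names are hypothetical, not from the patch.

FEATURE_A = 0b001
FEATURE_B = 0b010
FEATURE_C = 0b100


def effective_mask(own_mask: int, parent_effective: int = 0) -> int:
    """A bit hidden by the parent or by the cgroup itself stays hidden."""
    return own_mask | parent_effective


def visible_hwcap(hw_hwcap: int, eff_mask: int) -> int:
    """Clear every masked bit from the value the hardware would report."""
    return hw_hwcap & ~eff_mask


# Parent hides feature A; the child additionally hides feature B.
parent_eff = effective_mask(FEATURE_A)
child_eff = effective_mask(FEATURE_B, parent_eff)

hw = FEATURE_A | FEATURE_B | FEATURE_C
print(bin(parent_eff), bin(child_eff), bin(visible_hwcap(hw, child_eff)))
```

Because the masks are only ever OR-ed together on the way down the
hierarchy, no value a child writes can clear a bit set by an ancestor.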

Thanks,
Andrei
Re: [PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Chen Ridong 2 weeks ago

On 2025/12/5 14:39, Andrei Vagin wrote:
> On Thu, Dec 4, 2025 at 6:52 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
>>
>>
>>
>> On 2025/12/5 8:58, Andrei Vagin wrote:
>>> This patch series introduces a mechanism to mask hardware capabilities
>>> (AT_HWCAP) reported to user-space processes via the misc cgroup
>>> controller.
>>>
>>> To support C/R operations (snapshots, live migration) in heterogeneous
>>> clusters, we must ensure that processes utilize CPU features available
>>> on all potential target nodes. To solve this, we need to advertise a
>>> common feature set across the cluster. This patchset allows users to
>>> configure a mask for AT_HWCAP, AT_HWCAP2. This ensures that applications
>>> within a container only detect and use features guaranteed to be
>>> available on all potential target hosts.
>>>
>>
>> Could you elaborate on how this mask mechanism would be used in practice?
>>
>> Based on my understanding of the implementation, the parent’s mask is effectively a subset of the
>> child’s mask, meaning the parent does not impose any additional restrictions on its children. This
>> behavior appears to differ from typical cgroup controllers, where children are further constrained
>> by their parent’s settings. This raises the question: is the cgroup model an appropriate fit for
>> this functionality?
> 
> Chen,
> 
> Thank you for the question. I think I was not clear enough in the
> description.
> 
> The misc.mask file works by masking out available features; any feature
> bit set in the mask will not be advertised to processes within that
> cgroup. When a child cgroup is created, its effective mask is  a
> combination of its own mask and its parent's effective mask. This means
> any feature masked by either the parent or the child will be hidden from
> processes in the child cgroup.
> 
> For example:
> - If a parent cgroup masks out feature A (mask=0b001), processes in it
>   won't see feature A.
> - If we create a child cgroup under it and set its mask to hide feature
>   B (mask=0b010), the effective mask for processes in the child cgroup
>   becomes 0b011. They will see neither feature A nor B.
> 

Let me ask some basic questions:

When is the misc.mask typically set? Is it only configured before starting a container (e.g., before
docker run), or can it be adjusted dynamically while processes are already running?

I'm concerned about a potential scenario: If a child process initially has access to a CPU feature,
but then its parent cgroup masks that feature out, could the child process remain unaware of this
change?

Specifically, if a process has already cached or relied on a CPU capability before the mask was
applied, would it continue to assume it has that capability, leading to potential issues if it
attempts to use instructions that are now masked out?

Does such a scenario exist in practice?

> This ensures that a feature hidden by a parent cannot be re-enabled by a
> child. A child can only impose further restrictions by masking out
> additional features. I think this behaviour is well aligned with the cgroup
> model.
> 
> Thanks,
> Andrei

-- 
Best regards,
Ridong

Re: [PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Andrei Vagin 1 week, 6 days ago
On Fri, Dec 5, 2025 at 2:04 AM Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>
>
> On 2025/12/5 14:39, Andrei Vagin wrote:
> > On Thu, Dec 4, 2025 at 6:52 PM Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>
> >>
> >>
> >> On 2025/12/5 8:58, Andrei Vagin wrote:
> >>> This patch series introduces a mechanism to mask hardware capabilities
> >>> (AT_HWCAP) reported to user-space processes via the misc cgroup
> >>> controller.
> >>>
> >>> To support C/R operations (snapshots, live migration) in heterogeneous
> >>> clusters, we must ensure that processes utilize CPU features available
> >>> on all potential target nodes. To solve this, we need to advertise a
> >>> common feature set across the cluster. This patchset allows users to
> >>> configure a mask for AT_HWCAP, AT_HWCAP2. This ensures that applications
> >>> within a container only detect and use features guaranteed to be
> >>> available on all potential target hosts.
> >>>
> >>
> >> Could you elaborate on how this mask mechanism would be used in practice?
> >>
> >> Based on my understanding of the implementation, the parent’s mask is effectively a subset of the
> >> child’s mask, meaning the parent does not impose any additional restrictions on its children. This
> >> behavior appears to differ from typical cgroup controllers, where children are further constrained
> >> by their parent’s settings. This raises the question: is the cgroup model an appropriate fit for
> >> this functionality?
> >
> > Chen,
> >
> > Thank you for the question. I think I was not clear enough in the
> > description.
> >
> > The misc.mask file works by masking out available features; any feature
> > bit set in the mask will not be advertised to processes within that
> > cgroup. When a child cgroup is created, its effective mask is  a
> > combination of its own mask and its parent's effective mask. This means
> > any feature masked by either the parent or the child will be hidden from
> > processes in the child cgroup.
> >
> > For example:
> > - If a parent cgroup masks out feature A (mask=0b001), processes in it
> >   won't see feature A.
> > - If we create a child cgroup under it and set its mask to hide feature
> >   B (mask=0b010), the effective mask for processes in the child cgroup
> >   becomes 0b011. They will see neither feature A nor B.
> >
> Let me ask some basic questions:
>
> When is the misc.mask typically set? Is it only configured before starting a container (e.g., before
> docker run), or can it be adjusted dynamically while processes are already running?

If we are talking about C/R use cases, it should be configured when
the container is started. It can be adjusted dynamically, but all changes
will affect only new processes. The auxiliary vectors are set on execve.
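
To illustrate why only new processes are affected: a process can inspect
the auxiliary vector it was given at execve, and that vector does not
change afterwards. The sketch below parses /proc/self/auxv on Linux and
prints AT_HWCAP and AT_HWCAP2 (keys 16 and 26 in the kernel's auxvec.h).
Whether the proposed mask is reflected in this file is an assumption and
is not stated in this thread. The helper names are hypothetical.

```python
import os
import struct

AT_NULL = 0      # terminator entry
AT_HWCAP = 16
AT_HWCAP2 = 26

# Each auxv entry is a (key, value) pair of native unsigned longs.
ENTRY = struct.Struct("@LL")


def parse_auxv(raw: bytes) -> dict[int, int]:
    """Decode raw auxv bytes into a {key: value} dict, stopping at AT_NULL."""
    usable = len(raw) - len(raw) % ENTRY.size  # ignore a trailing partial entry
    entries = {}
    for key, value in ENTRY.iter_unpack(raw[:usable]):
        if key == AT_NULL:
            break
        entries[key] = value
    return entries


def read_own_hwcaps(path: str = "/proc/self/auxv") -> tuple[int, int]:
    """Return (AT_HWCAP, AT_HWCAP2) for the calling process, 0 if absent."""
    with open(path, "rb") as f:
        auxv = parse_auxv(f.read())
    return auxv.get(AT_HWCAP, 0), auxv.get(AT_HWCAP2, 0)


if os.path.exists("/proc/self/auxv"):
    hwcap, hwcap2 = read_own_hwcaps()
    print(f"AT_HWCAP={hwcap:#x} AT_HWCAP2={hwcap2:#x}")
```

Running this twice inside one long-lived process gives the same values,
whatever is written to the cgroup file in between; only a fresh execve
produces a new vector.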

>
> I'm concerned about a potential scenario: If a child process initially has access to a CPU feature,
> but then its parent cgroup masks that feature out, could the child process remain unaware of this
> change?
>
> Specifically, if a process has already cached or relied on a CPU capability before the mask was
> applied, would it continue to assume it has that capability, leading to potential issues if it
> attempts to use instructions that are now masked out?

I wouldn't classify this behavior as an issue; it's designed to function
this way. It's important to understand that this isn't enforcement, but
rather information for processes regarding which features are
"guaranteed" to them. A process can choose to utilize unexposed
features at its own risk, potentially encountering problems after
migration to a different host.

Thanks,
Andrei
Re: [PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Michal Koutný 1 week, 3 days ago
Hello Andrei.

On Fri, Dec 05, 2025 at 12:19:04PM -0800, Andrei Vagin <avagin@gmail.com> wrote:
> If we are talking about C/R use cases, it should be configured when
> container is started. It can be adjusted dynamically, but all changes
> will affect only new processes. The auxiliary vectors are set on execve.

The questions by Ridong are getting at the reasons why the cgroup API
doesn't sound like a good match for these values.
I understand it's tempting to implement this by simply copying some
masks from the enclosing cgroup, but since there's little to be done upon
a (dynamic) change or a process migration, it's overkill.

So I'd look at how other [1] adjustments between fork-exec are done and
fit it with them. I guess prctl would be an option as a substitute for
the non-existent setauxval().

Thanks,
Michal

[1] Yes, I admit cgroup migration is among them too. Another one is
setns(2), which is IMO a closer concept for this modified view of HW. I'm
not sure whether hardware namespaces have been brought up (and rejected)
in the past.

Re: [PATCH 0/3] cgroup/misc: Add hwcap masks to the misc controller
Posted by Chen Ridong 1 week, 3 days ago

On 2025/12/9 0:48, Michal Koutný wrote:
> Hello Andrei.
> 
> On Fri, Dec 05, 2025 at 12:19:04PM -0800, Andrei Vagin <avagin@gmail.com> wrote:
>> If we are talking about C/R use cases, it should be configured when
>> container is started. It can be adjusted dynamically, but all changes
>> will affect only new processes. The auxiliary vectors are set on execve.
> 
> The questions by Ridong are getting at the reasons why cgroup API
> doesn't sound like a good match for these values.

Eh, the statement "it can be adjusted dynamically, but all changes will affect only new processes"
means that processes created within the same cgroup could end up with different capabilities. This
does not sound like how cgroups typically operate.
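
A toy model of that scenario, assuming (as described earlier in the thread) that the mask is sampled
only when a process is exec'd. The classes and the exec_in() helper are hypothetical and exist only to
show two processes in one cgroup ending up with different views.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    mask: int = 0          # bits set here are hidden from newly exec'd processes


@dataclass(frozen=True)
class Process:
    hwcap: int             # snapshot taken at exec time, never updated afterwards


def exec_in(cgroup: Cgroup, hw_hwcap: int) -> Process:
    """Model exec: the process gets the hardware bits minus the current mask."""
    return Process(hwcap=hw_hwcap & ~cgroup.mask)


HW = 0b111                 # pretend the host advertises three features

cg = Cgroup()
early = exec_in(cg, HW)    # started before the mask is written
cg.mask = 0b001            # the administrator hides feature A afterwards
late = exec_in(cg, HW)     # started after the mask is written

# Same cgroup, different advertised capabilities.
print(bin(early.hwcap), bin(late.hwcap))
```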

> I understand it's tempting to implement this by simply copying some
> masks from the enclosing cgroup but since there's little to be done upon
> (dynamic) change or a process migration it's overkill.
> 
> So I'd look at how other [1] adjustments between fork-exec are done and
> fit it with them. I guess prctl would be an option as a substitute for
> non-existent setauxval().
> 
> Thanks,
> Michal
> 
> [1] Yes, I admit cgroup migration is among them too. Another one is
> setns(2) which is IMO a closer concept for this modified view of HW, I'm
> not sure whether hardware namespaces had been brought up (and rejected)
> in the past.
> 

-- 
Best regards,
Ridong