[PATCH 00/10] mm: make mm->flags a bitmap and 64-bit on all arches

Lorenzo Stoakes posted 10 patches 1 month, 3 weeks ago
[PATCH 00/10] mm: make mm->flags a bitmap and 64-bit on all arches
Posted by Lorenzo Stoakes 1 month, 3 weeks ago
We are currently in the bizarre situation where we are constrained on the
number of flags we can set in an mm_struct based on whether this is a
32-bit or 64-bit kernel.

This is because mm->flags is an unsigned long field, which is 32-bits on a
32-bit system and 64-bits on a 64-bit system.

In order to keep things functional across both architectures, we do not
permit mm flag bits to be set above flag 31 (i.e. the 32nd bit).

This is a silly situation, especially given how profligate we are in
storing metadata in mm_struct, so let's convert mm->flags into a bitmap and
allow ourselves as many bits as we like.

In order to execute this change, we introduce a new opaque type -
mm_flags_t - which wraps a bitmap.

We go further and mark the bitmap field __private, which forces users to
have to use accessors, which allows us to enforce atomicity rules around
mm->flags (except on those occasions they are not required - fork, etc.)
and makes it far easier to keep track of how mm flags are being utilised.
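
As a rough userspace illustration of the idea (not the code from this series), an opaque flags type with accessor-only access might look like the sketch below. Every name here (sketch_flags_t, sketch_flags_set() and so on) is hypothetical, C11 atomics stand in for the kernel's atomic bit operations, and the kernel's __private annotation is only approximated by a naming convention.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue; none of these names come from the series. */
#define SKETCH_NUM_FLAG_BITS 64
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NUM_WORDS \
	((SKETCH_NUM_FLAG_BITS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

typedef struct {
	/* Callers are expected to use the accessors, never this field. */
	_Atomic unsigned long private_words[SKETCH_NUM_WORDS];
} sketch_flags_t;

/* Locate the word that holds a given flag bit. */
static inline _Atomic unsigned long *sketch_word(sketch_flags_t *f,
						 unsigned int bit)
{
	return &f->private_words[bit / SKETCH_BITS_PER_WORD];
}

/* Build the single-bit mask within that word, in an unsigned type. */
static inline unsigned long sketch_mask(unsigned int bit)
{
	return 1UL << (bit % SKETCH_BITS_PER_WORD);
}

static inline void sketch_flags_set(sketch_flags_t *f, unsigned int bit)
{
	atomic_fetch_or(sketch_word(f, bit), sketch_mask(bit));
}

static inline void sketch_flags_clear(sketch_flags_t *f, unsigned int bit)
{
	atomic_fetch_and(sketch_word(f, bit), ~sketch_mask(bit));
}

static inline bool sketch_flags_test(sketch_flags_t *f, unsigned int bit)
{
	return (atomic_load(sketch_word(f, bit)) & sketch_mask(bit)) != 0;
}
```

Because callers only ever name a bit number, a flag such as bit 40 works the same way whether it lands in the first word (64-bit) or the second (32-bit).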

In order to implement this change sensibly and in an iterative way, we
start by introducing the type with the same bitsize as the current mm flags
(system word size) and place it in union with mm->flags.

We are then able to gradually update users as we go without being forced to
do everything in a single patch.
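
A minimal sketch of that transitional layout, assuming a hypothetical struct sketch_mm and a one-word sketch_flags_t (neither is the real definition), could be:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names throughout; this is not the layout from the series. */
typedef struct {
	unsigned long words[1];	/* one system word during the transition */
} sketch_flags_t;

struct sketch_mm {
	union {
		unsigned long flags;		/* legacy view, unconverted callers */
		sketch_flags_t flags_bitmap;	/* new view, converted callers */
	};
};

/* Both views must occupy exactly the same storage for this to be safe. */
_Static_assert(sizeof(sketch_flags_t) == sizeof(unsigned long),
	       "both views must share one word");

/* A converted caller reads a bit through the new view. */
static inline int sketch_test_bit(const struct sketch_mm *mm, unsigned int bit)
{
	return (int)((mm->flags_bitmap.words[0] >> bit) & 1UL);
}
```

A bit set through the legacy field is visible through the new view, which is what lets call sites be converted one patch at a time.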

In the course of working on this series I noticed the MMF_* flag masks
encounter a sign extension bug that, due to the 32-bit limit on mm->flags
thus far, has not caused any issues in practice, but required fixing for
this series.
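
To illustrate the general class of bug (this is not a reproduction of the actual MMF_* definitions), consider a mask held in a plain int with its top bit set. The sketch below assumes a 32-bit int and uses INT_MIN as a stand-in for such a mask; the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * An int-typed mask whose top bit is set is a negative value. Converting
 * it to a 64-bit unsigned type sign-extends, setting the upper 32 bits.
 */
static inline uint64_t widen_int_mask(int mask)
{
	return (uint64_t)mask;
}

/* Building the mask in an unsigned 64-bit type avoids the problem. */
static inline uint64_t unsigned_bit(unsigned int bit)
{
	return (uint64_t)1 << bit;
}
```

With a 32-bit int, widen_int_mask(INT_MIN) yields 0xFFFFFFFF80000000 whereas unsigned_bit(31) yields 0x80000000. While no flag lives above bit 31 the stray upper bits are harmless, which would explain why nothing broke in practice.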

We must make special dispensation for two cases - coredump and
initialisation on fork, both of which use masks extensively.

Since coredump flags are set in stone, we can safely assume they will
remain in the first 32-bits of the flags. We therefore provide special
non-atomic accessors for this case that access the first system word of
flags, keeping everything there essentially the same.

For mm->flags initialisation on fork, we adjust the logic to ensure all
bits are cleared correctly, and then adjust the existing intialisation
logic, dubbing the implementation utilising flags as legacy.

This means we get the same fast operations as we do now, but in future we
can also choose to update the forking logic to additionally propagate flags
beyond 32-bits across fork.
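
The fork-time behaviour described above can be sketched roughly as follows. The struct, the function name and the mask value are all hypothetical (the mask simply pretends the low four bits are inherited), and plain non-atomic stores are used on the assumption that nothing else can see the child's flags yet.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NUM_WORDS \
	((64 + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* Hypothetical: pretend the low four bits are the ones a child inherits. */
#define SKETCH_LEGACY_INHERIT_MASK 0x0FUL

struct sketch_flags {
	unsigned long words[SKETCH_NUM_WORDS];
};

static inline void sketch_flags_init_on_fork(struct sketch_flags *child,
					     const struct sketch_flags *parent)
{
	/* Clear every word, including any beyond the first. */
	memset(child, 0, sizeof(*child));
	/* Then apply the word-sized legacy mask to the first word only. */
	child->words[0] = parent->words[0] & SKETCH_LEGACY_INHERIT_MASK;
}
```

The masking remains a single word-sized operation, and inheriting bits from later words would be an additive change to this function.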

With this change in place we can, in future, decide to have as many bits as
we please.

Since the size of the bitmap will scale in system word multiples, there
should be no issues with changes in alignment in mm_struct. Additionally,
the really sensitive field (mmap_lock) is located prior to the flags field
so this should have no impact on that either.
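
The sizing argument can be checked with a small sketch. The macro and function names below are hypothetical stand-ins for the usual round-up-to-words calculation, and the byte counts assume 8-bit bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold nbits, rounded up. */
#define SKETCH_BITS_TO_LONGS(nbits) \
	(((nbits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Storage in bytes for a bitmap of nbits; always a whole number of words. */
static inline size_t sketch_bitmap_bytes(size_t nbits)
{
	return SKETCH_BITS_TO_LONGS(nbits) * sizeof(unsigned long);
}
```

A 64-bit bitmap takes 8 bytes either way: one word on a 64-bit build, two on a 32-bit build. Any other bit count rounds up to the next whole word.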

Lorenzo Stoakes (10):
  mm: add bitmap mm->flags field
  mm: convert core mm to mm_flags_*() accessors
  mm: convert prctl to mm_flags_*() accessors
  mm: convert arch-specific code to mm_flags_*() accessors
  mm: convert uprobes to mm_flags_*() accessors
  mm: update coredump logic to correctly use bitmap mm flags
  mm: correct sign-extension issue in MMF_* flag masks
  mm: update fork mm->flags initialisation to use bitmap
  mm: convert remaining users to mm_flags_*() accessors
  mm: replace mm->flags with bitmap entirely and set to 64 bits

 arch/s390/mm/mmap.c              |  4 +-
 arch/sparc/kernel/sys_sparc_64.c |  4 +-
 arch/x86/mm/mmap.c               |  4 +-
 fs/coredump.c                    |  4 +-
 fs/exec.c                        |  2 +-
 fs/pidfs.c                       |  7 +++-
 fs/proc/array.c                  |  2 +-
 fs/proc/base.c                   | 12 +++---
 fs/proc/task_mmu.c               |  2 +-
 include/linux/huge_mm.h          |  2 +-
 include/linux/khugepaged.h       |  6 ++-
 include/linux/ksm.h              |  6 +--
 include/linux/mm.h               | 34 +++++++++++++++-
 include/linux/mm_types.h         | 67 +++++++++++++++++++++++++-------
 include/linux/mman.h             |  2 +-
 include/linux/oom.h              |  2 +-
 include/linux/sched/coredump.h   | 21 +++++++++-
 kernel/events/uprobes.c          | 32 +++++++--------
 kernel/fork.c                    |  9 +++--
 kernel/sys.c                     | 16 ++++----
 mm/debug.c                       |  4 +-
 mm/gup.c                         | 10 ++---
 mm/huge_memory.c                 |  8 ++--
 mm/khugepaged.c                  | 10 ++---
 mm/ksm.c                         | 32 +++++++--------
 mm/mmap.c                        |  8 ++--
 mm/oom_kill.c                    | 26 ++++++-------
 mm/util.c                        |  6 +--
 tools/testing/vma/vma_internal.h | 19 ++++++++-
 29 files changed, 239 insertions(+), 122 deletions(-)

--
2.50.1
Re: [PATCH 00/10] mm: make mm->flags a bitmap and 64-bit on all arches
Posted by SeongJae Park 1 month, 3 weeks ago
On Tue, 12 Aug 2025 16:44:09 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> We are currently in the bizarre situation where we are constrained on the
> number of flags we can set in an mm_struct based on whether this is a
> 32-bit or 64-bit kernel.
> 
> This is because mm->flags is an unsigned long field, which is 32-bits on a
> 32-bit system and 64-bits on a 64-bit system.
> 
> In order to keep things functional across both architectures, we do not
> permit mm flag bits to be set above flag 31 (i.e. the 32nd bit).
> 
> This is a silly situation, especially given how profligate we are in
> storing metadata in mm_struct, so let's convert mm->flags into a bitmap and
> allow ourselves as many bits as we like.

I like this conversion.

[...]
> 
> In order to execute this change, we introduce a new opaque type -
> mm_flags_t - which wraps a bitmap.

I have no strong opinion here, but I think coding-style.rst[1] has one?  To
quote,

    Please don't use things like ``vps_t``.
    It's a **mistake** to use typedef for structures and pointers. 

checkpatch.pl also complains similarly.

Again, I have no strong opinion, but I think adding a clarification about why
we use typedef despite the documented recommendation here might be nice?

[...]
> For mm->flags initialisation on fork, we adjust the logic to ensure all
> bits are cleared correctly, and then adjust the existing intialisation

Nit.  s/intialisation/initialisation/ ?

[...]

[1] https://docs.kernel.org/process/coding-style.html#typedefs


Thanks,
SJ
Re: [PATCH 00/10] mm: make mm->flags a bitmap and 64-bit on all arches
Posted by Lorenzo Stoakes 1 month, 3 weeks ago
On Tue, Aug 12, 2025 at 01:13:26PM -0700, SeongJae Park wrote:
> On Tue, 12 Aug 2025 16:44:09 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > We are currently in the bizarre situation where we are constrained on the
> > number of flags we can set in an mm_struct based on whether this is a
> > 32-bit or 64-bit kernel.
> >
> > This is because mm->flags is an unsigned long field, which is 32-bits on a
> > 32-bit system and 64-bits on a 64-bit system.
> >
> > In order to keep things functional across both architectures, we do not
> > permit mm flag bits to be set above flag 31 (i.e. the 32nd bit).
> >
> > This is a silly situation, especially given how profligate we are in
> > storing metadata in mm_struct, so let's convert mm->flags into a bitmap and
> > allow ourselves as many bits as we like.
>
> I like this conversion.

Thanks!

>
> [...]
> >
> > In order to execute this change, we introduce a new opaque type -
> > mm_flags_t - which wraps a bitmap.
>
> I have no strong opinion here, but I think coding-style.rst[1] has one?  To
> quote,
>
>     Please don't use things like ``vps_t``.
>     It's a **mistake** to use typedef for structures and pointers.

You stopped reading the relevant section in [1] :) Keep going and you see:

	Lots of people think that typedefs help readability. Not so. They
	are useful only for: totally opaque objects (where the typedef is
	actively used to hide what the object is).  Example: pte_t
	etc. opaque objects that you can only access using the proper
	accessor functions.

So this is what this is.

The point is that it's opaque, that is you aren't supposed to know about or
care about what's inside, you use the accessors.

This means we can extend the size of this thing as we like, and can enforce
atomicity through the accessors.

We further highlight the opaqueness through the use of the __private.

>
> checkpatch.pl also complains similarly.
>
> Again, I have no strong opinion, but I think adding a clarification about why
> we use typedef despite the documented recommendation here might be nice?

I already gave one, I clearly indicate it's opaque.

>
> [...]
> > For mm->flags initialisation on fork, we adjust the logic to ensure all
> > bits are cleared correctly, and then adjust the existing intialisation
>
> Nit.  s/intialisation/initialisation/ ?

Ack thanks!

>
> [...]
>
> [1] https://docs.kernel.org/process/coding-style.html#typedefs
>
>
> Thanks,
> SJ
Re: [PATCH 00/10] mm: make mm->flags a bitmap and 64-bit on all arches
Posted by SeongJae Park 1 month, 3 weeks ago
On Wed, 13 Aug 2025 05:18:31 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> On Tue, Aug 12, 2025 at 01:13:26PM -0700, SeongJae Park wrote:
> > On Tue, 12 Aug 2025 16:44:09 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
[...]
> > > In order to execute this change, we introduce a new opaque type -
> > > mm_flags_t - which wraps a bitmap.
> >
> > I have no strong opinion here, but I think coding-style.rst[1] has one?  To
> > quote,
> >
> >     Please don't use things like ``vps_t``.
> >     It's a **mistake** to use typedef for structures and pointers.
> 
> You stopped reading the relevant section in [1] :) Keep going and you see:
> 
> 	Lots of people think that typedefs help readability. Not so. They
> 	are useful only for: totally opaque objects (where the typedef is
> 	actively used to hide what the object is).  Example: pte_t
> 	etc. opaque objects that you can only access using the proper
> 	accessor functions.
> 
> So this is what this is.
> 
> The point is that it's opaque, that is you aren't supposed to know about or
> care about what's inside, you use the accessors.
> 
> This means we can extend the size of this thing as we like, and can enforce
> atomicity through the accessors.
> 
> We further highlight the opaqueness through the use of the __private.
> 
> >
> > checkpatch.pl also complains similarly.
> >
> > Again, I have no strong opinion, but I think adding a clarification about why
> > we use typedef despite the documented recommendation here might be nice?
> 
> I already gave one, I clearly indicate it's opaque.

You're completely right and I agree with all the points.  Thank you for kindly
enlightening me :)


Thanks,
SJ

[...]