[PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging

Lorenzo Stoakes posted 4 patches 6 months, 4 weeks ago
Posted by Lorenzo Stoakes 6 months, 4 weeks ago
If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.

This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
(in order to indicate they are KSM mergeable), as well as setting this flag
for all existing VMAs.

However it also entirely and completely breaks VMA merging of new mappings
for the process and all forked (and fork/exec'd) processes.

This is because when a new mapping is proposed, the flags specified will
never have VM_MERGEABLE set. However all adjacent VMAs will already have
VM_MERGEABLE set, rendering VMAs unmergeable by default.

To work around this, we try to set the VM_MERGEABLE flag prior to
attempting a merge. In the case of brk() this can always be done.

However on mmap() things are more complicated - while KSM is not supported
for shared file-backed mappings, it is supported for MAP_PRIVATE
file-backed mappings.

And these mappings may have deprecated .mmap() callbacks specified which
could, in theory, adjust flags and thus KSM eligibility.

This is unlikely to cause an issue on merge, as any adjacent file-backed
mappings would already have the same post-.mmap() callback attributes, and
thus would naturally not be merged.

But for the purposes of establishing a VMA as KSM-eligible (as well as
initially scanning the VMA), this is potentially very problematic.

So we check to determine whether this is at all possible. If not, we set
VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
previous behaviour.

When .mmap_prepare() is more widely used, we can remove this precaution.

While this doesn't quite cover all cases, it covers a great many (all
anonymous memory, for instance), meaning we should already see a
significant improvement in VMA mergeability.

When it comes to file-backed mappings (other than shmem), we are really
only interested in MAP_PRIVATE mappings, which have anon pages available by
default. The VM_SPECIAL restriction therefore makes less sense for KSM.

In a future series we therefore intend to remove this limitation, which
ought to simplify this implementation. However it makes sense to defer
doing so until a later stage so we can first address this mergeability
issue.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
---
 include/linux/ksm.h |  8 +++++---
 mm/ksm.c            | 18 +++++++++++------
 mm/vma.c            | 49 +++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index d73095b5cd96..51787f0b0208 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -17,8 +17,8 @@
 #ifdef CONFIG_KSM
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, int advice, unsigned long *vm_flags);
-
-void ksm_add_vma(struct vm_area_struct *vma);
+vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
+			 vm_flags_t vm_flags);
 int ksm_enable_merge_any(struct mm_struct *mm);
 int ksm_disable_merge_any(struct mm_struct *mm);
 int ksm_disable(struct mm_struct *mm);
@@ -97,8 +97,10 @@ bool ksm_process_mergeable(struct mm_struct *mm);

 #else  /* !CONFIG_KSM */

-static inline void ksm_add_vma(struct vm_area_struct *vma)
+static inline vm_flags_t ksm_vma_flags(const struct mm_struct *mm,
+		const struct file *file, vm_flags_t vm_flags)
 {
+	return vm_flags;
 }

 static inline int ksm_disable(struct mm_struct *mm)
diff --git a/mm/ksm.c b/mm/ksm.c
index d0c763abd499..18b3690bb69a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2731,16 +2731,22 @@ static int __ksm_del_vma(struct vm_area_struct *vma)
 	return 0;
 }
 /**
- * ksm_add_vma - Mark vma as mergeable if compatible
+ * ksm_vma_flags - Update VMA flags to mark as mergeable if compatible
  *
- * @vma:  Pointer to vma
+ * @mm:       Proposed VMA's mm_struct
+ * @file:     Proposed VMA's file-backed mapping, if any.
+ * @vm_flags: Proposed VMA's flags.
+ *
+ * Returns: @vm_flags possibly updated to mark mergeable.
  */
-void ksm_add_vma(struct vm_area_struct *vma)
+vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
+			 vm_flags_t vm_flags)
 {
-	struct mm_struct *mm = vma->vm_mm;
+	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags) &&
+	    __ksm_should_add_vma(file, vm_flags))
+		vm_flags |= VM_MERGEABLE;

-	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
-		__ksm_add_vma(vma);
+	return vm_flags;
 }

 static void ksm_add_vmas(struct mm_struct *mm)
diff --git a/mm/vma.c b/mm/vma.c
index 3ff6cfbe3338..5bebe55ea737 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2482,7 +2482,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
 	 */
 	if (!vma_is_anonymous(vma))
 		khugepaged_enter_vma(vma, map->flags);
-	ksm_add_vma(vma);
 	*vmap = vma;
 	return 0;

@@ -2585,6 +2584,45 @@ static void set_vma_user_defined_fields(struct vm_area_struct *vma,
 	vma->vm_private_data = map->vm_private_data;
 }

+static void update_ksm_flags(struct mmap_state *map)
+{
+	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
+}
+
+/*
+ * Are we guaranteed no driver can change state such as to preclude KSM merging?
+ * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
+ *
+ * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
+ * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
+ *
+ * If this is not the case, then we set the flag after considering mergeability,
+ * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
+ * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
+ * preventing any merge.
+ */
+static bool can_set_ksm_flags_early(struct mmap_state *map)
+{
+	struct file *file = map->file;
+
+	/* Anonymous mappings have no driver which can change them. */
+	if (!file)
+		return true;
+
+	/* shmem is safe. */
+	if (shmem_file(file))
+		return true;
+
+	/*
+	 * If .mmap_prepare() is specified, then the driver will have already
+	 * manipulated state prior to updating KSM flags.
+	 */
+	if (file->f_op->mmap_prepare)
+		return true;
+
+	return false;
+}
+
 static unsigned long __mmap_region(struct file *file, unsigned long addr,
 		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
 		struct list_head *uf)
@@ -2595,6 +2633,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
 	VMA_ITERATOR(vmi, mm, addr);
 	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
+	bool check_ksm_early = can_set_ksm_flags_early(&map);

 	error = __mmap_prepare(&map, uf);
 	if (!error && have_mmap_prepare)
@@ -2602,6 +2641,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 	if (error)
 		goto abort_munmap;

+	if (check_ksm_early)
+		update_ksm_flags(&map);
+
 	/* Attempt to merge with adjacent VMAs... */
 	if (map.prev || map.next) {
 		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
@@ -2611,6 +2653,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,

 	/* ...but if we can't, allocate a new VMA. */
 	if (!vma) {
+		if (!check_ksm_early)
+			update_ksm_flags(&map);
+
 		error = __mmap_new_vma(&map, &vma);
 		if (error)
 			goto unacct_error;
@@ -2713,6 +2758,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	 * Note: This happens *after* clearing old mappings in some code paths.
 	 */
 	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
+	flags = ksm_vma_flags(mm, NULL, flags);
 	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
 		return -ENOMEM;

@@ -2756,7 +2802,6 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,

 	mm->map_count++;
 	validate_mm(mm);
-	ksm_add_vma(vma);
 out:
 	perf_event_mmap(vma);
 	mm->total_vm += len >> PAGE_SHIFT;
--
2.49.0
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by Vlastimil Babka 6 months, 2 weeks ago
On 5/21/25 20:20, Lorenzo Stoakes wrote:
> If a user wishes to enable KSM mergeability for an entire process and all
> fork/exec'd processes that come after it, they use the prctl()
> PR_SET_MEMORY_MERGE operation.
> 
> This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
> (in order to indicate they are KSM mergeable), as well as setting this flag
> for all existing VMAs.
> 
> However it also entirely and completely breaks VMA merging for the process
> and all forked (and fork/exec'd) processes.

I think merging due to e.g. mprotect() should still work, but for new VMAs,
yeah.

> This is because when a new mapping is proposed, the flags specified will
> never have VM_MERGEABLE set. However all adjacent VMAs will already have
> VM_MERGEABLE set, rendering VMAs unmergeable by default.
> 
> To work around this, we try to set the VM_MERGEABLE flag prior to
> attempting a merge. In the case of brk() this can always be done.
> 
> However on mmap() things are more complicated - while KSM is not supported
> for file-backed mappings, it is supported for MAP_PRIVATE file-backed

     ^ insert "shared" to make it obvious?

> mappings.
> 
> And these mappings may have deprecated .mmap() callbacks specified which
> could, in theory, adjust flags and thus KSM eligiblity.

Right, however your can_set_ksm_flags_early() isn't testing exactly that?
More on that there.

> This is unlikely to cause an issue on merge, as any adjacent file-backed
> mappings would already have the same post-.mmap() callback attributes, and
> thus would naturally not be merged.

I'm getting a bit lost as two kinds of merging have to be discussed. If the
VMAs around have the same attributes, they would be VMA-merged, no?

> But for the purposes of establishing a VMA as KSM-eligible (as well as
> initially scanning the VMA), this is potentially very problematic.

This part I understand as we have to check if we can add VM_MERGEABLE after
mmap() has adjusted the flags, as it might have an effect on the result of
ksm_compatible()?

> So we check to determine whether this at all possible. If not, we set
> VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
> previous behaviour.
> 
> When .mmap_prepare() is more widely used, we can remove this precaution.
> 
> While this doesn't quite cover all cases, it covers a great many (all
> anonymous memory, for instance), meaning we should already see a
> significant improvement in VMA mergeability.
> 
> Since, when it comes to file-backed mappings (other than shmem) we are
> really only interested in MAP_PRIVATE mappings which have an available anon
> page by default. Therefore, the VM_SPECIAL restriction makes less sense for
> KSM.
> 
> In a future series we therefore intend to remove this limitation, which
> ought to simplify this implementation. However it makes sense to defer
> doing so until a later stage so we can first address this mergeability
> issue.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

<snip>

> +/*
> + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> + *
> + * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
> + * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
> + *
> + * If this is not the case, then we set the flag after considering mergeability,

								     ^ "VMA"

> + * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new

			^ "VMA"

> + * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
> + * preventing any merge.

		    ^ "VMA"

tedious I know, but more obvious, IMHO

> + */
> +static bool can_set_ksm_flags_early(struct mmap_state *map)
> +{
> +	struct file *file = map->file;
> +
> +	/* Anonymous mappings have no driver which can change them. */
> +	if (!file)
> +		return true;
> +
> +	/* shmem is safe. */
> +	if (shmem_file(file))
> +		return true;
> +
> +	/*
> +	 * If .mmap_prepare() is specified, then the driver will have already
> +	 * manipulated state prior to updating KSM flags.
> +	 */
> +	if (file->f_op->mmap_prepare)
> +		return true;
> +
> +	return false;

So back to my reply in the commit log, why test for mmap_prepare and
otherwise assume false, and not instead test for f_op->mmap which would
result in false, and otherwise return true? Or am I assuming wrong that
there are f_ops that have neither of those two callbacks?

> +}
> +
>  static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
>  		struct list_head *uf)
> @@ -2595,6 +2633,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
>  	VMA_ITERATOR(vmi, mm, addr);
>  	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
> +	bool check_ksm_early = can_set_ksm_flags_early(&map);
> 
>  	error = __mmap_prepare(&map, uf);
>  	if (!error && have_mmap_prepare)
> @@ -2602,6 +2641,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	if (error)
>  		goto abort_munmap;
> 
> +	if (check_ksm_early)
> +		update_ksm_flags(&map);
> +
>  	/* Attempt to merge with adjacent VMAs... */
>  	if (map.prev || map.next) {
>  		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
> @@ -2611,6 +2653,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> 
>  	/* ...but if we can't, allocate a new VMA. */
>  	if (!vma) {
> +		if (!check_ksm_early)
> +			update_ksm_flags(&map);
> +
>  		error = __mmap_new_vma(&map, &vma);
>  		if (error)
>  			goto unacct_error;
> @@ -2713,6 +2758,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	 * Note: This happens *after* clearing old mappings in some code paths.
>  	 */
>  	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
> +	flags = ksm_vma_flags(mm, NULL, flags);
>  	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
>  		return -ENOMEM;
> 
> @@ -2756,7 +2802,6 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> 
>  	mm->map_count++;
>  	validate_mm(mm);
> -	ksm_add_vma(vma);
>  out:
>  	perf_event_mmap(vma);
>  	mm->total_vm += len >> PAGE_SHIFT;
> --
> 2.49.0
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by Lorenzo Stoakes 6 months, 2 weeks ago
On Thu, May 29, 2025 at 04:50:16PM +0200, Vlastimil Babka wrote:
> On 5/21/25 20:20, Lorenzo Stoakes wrote:
> > If a user wishes to enable KSM mergeability for an entire process and all
> > fork/exec'd processes that come after it, they use the prctl()
> > PR_SET_MEMORY_MERGE operation.
> >
> > This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
> > (in order to indicate they are KSM mergeable), as well as setting this flag
> > for all existing VMAs.
> >
> > However it also entirely and completely breaks VMA merging for the process
> > and all forked (and fork/exec'd) processes.
>
> I think merging due to e.g. mprotect() should still work, but for new VMAs,
> yeah.

Yes for new VMAs. I'll update the cover letter subject line + commit
messages accordingly... I may have over-egged the pudding, but it's still
really serious.

But you're right, this is misleading, it's not _all_ merging, it's just
_all merging of new mappings, ever_. Which is still, you know, not great :)

>
> > This is because when a new mapping is proposed, the flags specified will
> > never have VM_MERGEABLE set. However all adjacent VMAs will already have
> > VM_MERGEABLE set, rendering VMAs unmergeable by default.
> >
> > To work around this, we try to set the VM_MERGEABLE flag prior to
> > attempting a merge. In the case of brk() this can always be done.
> >
> > However on mmap() things are more complicated - while KSM is not supported
> > for file-backed mappings, it is supported for MAP_PRIVATE file-backed
>
>      ^ insert "shared" to make it obvious?

Good spot, this is confusing as-is, will fixup on respin.

>
> > mappings.
> >
> > And these mappings may have deprecated .mmap() callbacks specified which
> > could, in theory, adjust flags and thus KSM eligiblity.
>
> Right, however your can_set_ksm_flags_early() isn't testing exactly that?
> More on that there.

It's testing to see whether we are in a known case where you can go ahead
and set VM_MERGEABLE because either you know .mmap() can't change _KSM_
merge eligibility, or it won't be invoked so can't hurt us.

Realistically almost certainly the only cases where this applies are ones
where VM_PFNMAP + friends are set, which a bunch of drivers do.

David actually proposes to stop disallowing this for KSM, at which point we
can drop this function anyway.

But that's best done as a follow-up.

>
> > This is unlikely to cause an issue on merge, as any adjacent file-backed
> > mappings would already have the same post-.mmap() callback attributes, and
> > thus would naturally not be merged.
>
> I'm getting a bit lost as two kinds of merging have to be discussed. If the
> vma's around have the same afftributes, they would be VMA-merged, no?

The overloading of this term is very annoying.

But yeah I need to drop this bit, the VMA mergeability isn't really
applicable - I'll explain why...

My concern was that you'd set VM_MERGEABLE then attempt mergeability then
get merged with an adjacent VMA.

But _later_ the .mmap() hook, had the merge not occurred, would have set
some flags that would have made the prior merge invalid (oopsy!)

However this isn't correct.

The vma->vm_file would need to be the same for both, and therefore any
adjacent VMA would already have had .mmap() called and had their VMA flags
changed.

And therefore TL;DR I should drop this bit from the commit message...

>
> > But for the purposes of establishing a VMA as KSM-eligible (as well as
> > initially scanning the VMA), this is potentially very problematic.
>
> This part I understand as we have to check if we can add VM_MERGEABLE after
> mmap() has adjusted the flags, as it might have an effect on the result of
> ksm_compatible()?

Yes.

>
> > So we check to determine whether this at all possible. If not, we set
> > VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
> > previous behaviour.
> >
> > When .mmap_prepare() is more widely used, we can remove this precaution.
> >
> > While this doesn't quite cover all cases, it covers a great many (all
> > anonymous memory, for instance), meaning we should already see a
> > significant improvement in VMA mergeability.
> >
> > Since, when it comes to file-backed mappings (other than shmem) we are
> > really only interested in MAP_PRIVATE mappings which have an available anon
> > page by default. Therefore, the VM_SPECIAL restriction makes less sense for
> > KSM.
> >
> > In a future series we therefore intend to remove this limitation, which
> > ought to simplify this implementation. However it makes sense to defer
> > doing so until a later stage so we can first address this mergeability
> > issue.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
> > Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
>
> <snip>
>
> > +/*
> > + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> > + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> > + *
> > + * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
> > + * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
> > + *
> > + * If this is not the case, then we set the flag after considering mergeability,
>
> 								     ^ "VMA"
>
> > + * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
>
> 			^ "VMA"
>
> > + * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
> > + * preventing any merge.
>
> 		    ^ "VMA"
>
> tedious I know, but more obvious, IMHO

Ack will fixup.

>
> > + */
> > +static bool can_set_ksm_flags_early(struct mmap_state *map)
> > +{
> > +	struct file *file = map->file;
> > +
> > +	/* Anonymous mappings have no driver which can change them. */
> > +	if (!file)
> > +		return true;
> > +
> > +	/* shmem is safe. */
> > +	if (shmem_file(file))
> > +		return true;
> > +
> > +	/*
> > +	 * If .mmap_prepare() is specified, then the driver will have already
> > +	 * manipulated state prior to updating KSM flags.
> > +	 */
> > +	if (file->f_op->mmap_prepare)
> > +		return true;
> > +
> > +	return false;
>
> So back to my reply in the commit log, why test for mmap_prepare and
> otherwise assume false, and not instead test for f_op->mmap which would
> result in false, and otherwise return true? Or am I assuming wrong that
> there are f_ops that have neither of those two callbacks?

Because shmem has .mmap() set but we know it's safe.

I mean, we should probably put the mmap_prepare check before shmem_file()
to make things clearer.

We plan to drop this function soon (see above) anyway. Just being mighty
cautious.

>
> > +}
> > +
> >  static unsigned long __mmap_region(struct file *file, unsigned long addr,
> >  		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> >  		struct list_head *uf)
> > @@ -2595,6 +2633,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> >  	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
> >  	VMA_ITERATOR(vmi, mm, addr);
> >  	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
> > +	bool check_ksm_early = can_set_ksm_flags_early(&map);
> >
> >  	error = __mmap_prepare(&map, uf);
> >  	if (!error && have_mmap_prepare)
> > @@ -2602,6 +2641,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> >  	if (error)
> >  		goto abort_munmap;
> >
> > +	if (check_ksm_early)
> > +		update_ksm_flags(&map);
> > +
> >  	/* Attempt to merge with adjacent VMAs... */
> >  	if (map.prev || map.next) {
> >  		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
> > @@ -2611,6 +2653,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> >
> >  	/* ...but if we can't, allocate a new VMA. */
> >  	if (!vma) {
> > +		if (!check_ksm_early)
> > +			update_ksm_flags(&map);
> > +
> >  		error = __mmap_new_vma(&map, &vma);
> >  		if (error)
> >  			goto unacct_error;
> > @@ -2713,6 +2758,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  	 * Note: This happens *after* clearing old mappings in some code paths.
> >  	 */
> >  	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
> > +	flags = ksm_vma_flags(mm, NULL, flags);
> >  	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
> >  		return -ENOMEM;
> >
> > @@ -2756,7 +2802,6 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >
> >  	mm->map_count++;
> >  	validate_mm(mm);
> > -	ksm_add_vma(vma);
> >  out:
> >  	perf_event_mmap(vma);
> >  	mm->total_vm += len >> PAGE_SHIFT;
> > --
> > 2.49.0
>
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by xu.xin16@zte.com.cn 6 months, 3 weeks ago
> +static void update_ksm_flags(struct mmap_state *map)
> +{
> +	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
> +}
> +
> +/*
> + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> + *
> + * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
> + * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
> + *
> + * If this is not the case, then we set the flag after considering mergeability,
> + * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
> + * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
> + * preventing any merge.
> + */
> +static bool can_set_ksm_flags_early(struct mmap_state *map)
> +{
> +	struct file *file = map->file;
> +
> +	/* Anonymous mappings have no driver which can change them. */
> +	if (!file)
> +		return true;
> +
> +	/* shmem is safe. */

Excuse me, why is it safe here? Does KSM support shmem?

> +	if (shmem_file(file))
> +		return true;
> +
> +	/*
> +	 * If .mmap_prepare() is specified, then the driver will have already
> +	 * manipulated state prior to updating KSM flags.
> +	 */

Recommend expanding the comment here with a slightly more verbose
explanation to improve code comprehension. Consider adding the following
note (even though your commit log is already sufficiently clear). :)
/*
* If .mmap_prepare() is specified, then the driver will have already
* manipulated state prior to updating KSM flags. So no need to worry
* about mmap callbacks modifying vm_flags after the KSM flag has been
* updated here, which could otherwise affect KSM eligibility.
*/


> +	if (file->f_op->mmap_prepare)
> +		return true;
> +
> +	return false;
> +}
> +
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by Lorenzo Stoakes 6 months, 3 weeks ago
On Wed, May 28, 2025 at 11:38:32PM +0800, xu.xin16@zte.com.cn wrote:
> > +static void update_ksm_flags(struct mmap_state *map)
> > +{
> > +	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
> > +}
> > +
> > +/*
> > + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> > + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> > + *
> > + * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
> > + * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
> > + *
> > + * If this is not the case, then we set the flag after considering mergeability,
> > + * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
> > + * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
> > + * preventing any merge.
> > + */
> > +static bool can_set_ksm_flags_early(struct mmap_state *map)
> > +{
> > +	struct file *file = map->file;
> > +
> > +	/* Anonymous mappings have no driver which can change them. */
> > +	if (!file)
> > +		return true;
> > +
> > +	/* shmem is safe. */
>
> Excuse me, why it's safe here? Does KSM support shmem?

Because shmem_mmap() doesn't do anything which would invalidate KSM eligibility.

Yeah I think I misinterpreted actually - looks like shmem isn't supported
(otherwise VM_SHARED would be set rendering the VMA incompatible), _but_
as with all file-backed mappings, MAP_PRIVATE mappings _are_.

So this is still relevant :)

>
> > +	if (shmem_file(file))
> > +		return true;
> > +
> > +	/*
> > +	 * If .mmap_prepare() is specified, then the driver will have already
> > +	 * manipulated state prior to updating KSM flags.
> > +	 */
>
> Recommend expanding the comments here with slightly more verbose explanations to improve
> code comprehension. Consider adding the following note (even though your commit log is
> already sufficiently clear.   :)
> /*
> * If .mmap_prepare() is specified, then the driver will have already
> * manipulated state prior to updating KSM flags. So no need to worry
> * about mmap callbacks modifying vm_flags after the KSM flag has been
> * updated here, which could otherwise affect KSM eligibility.
> */

While this comment is really nice actually, I think we're probably ok with the
shorter version given the commit log goes into substantial detail.

>
>
> > +	if (file->f_op->mmap_prepare)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by Lorenzo Stoakes 6 months, 2 weeks ago
On Wed, May 28, 2025 at 04:50:18PM +0100, Lorenzo Stoakes wrote:
> On Wed, May 28, 2025 at 11:38:32PM +0800, xu.xin16@zte.com.cn wrote:
> > > +static void update_ksm_flags(struct mmap_state *map)
> > > +{
> > > +	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
> > > +}
> > > +
> > > +/*
> > > + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> > > + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> > > + *
> > > + * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
> > > + * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
> > > + *
> > > + * If this is not the case, then we set the flag after considering mergeability,
> > > + * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
> > > + * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
> > > + * preventing any merge.
> > > + */
> > > +static bool can_set_ksm_flags_early(struct mmap_state *map)
> > > +{
> > > +	struct file *file = map->file;
> > > +
> > > +	/* Anonymous mappings have no driver which can change them. */
> > > +	if (!file)
> > > +		return true;
> > > +
> > > +	/* shmem is safe. */
> >
> > Excuse me, why it's safe here? Does KSM support shmem?
>
> Because shmem_mmap() doesn't do anything which would invalidate the KSM.
>
> Yeah I think I misinterpreted actually - looks like shmem isn't supported
> (otherwise VM_SHARED would be set rendering the VMA incompatible), _but_
> as with all file-backed mappings, MAP_PRIVATE mappings _are_.
>
> So this is still relevant :)

Will update the commit message to correct the erroneous reference to shmem.

>
> >
> > > +	if (shmem_file(file))
> > > +		return true;
> > > +
> > > +	/*
> > > +	 * If .mmap_prepare() is specified, then the driver will have already
> > > +	 * manipulated state prior to updating KSM flags.
> > > +	 */
> >
> > Recommend expanding the comments here with slightly more verbose explanations to improve
> > code comprehension. Consider adding the following note (even though your commit log is
> > already sufficiently clear.   :)
> > /*
> > * If .mmap_prepare() is specified, then the driver will have already
> > * manipulated state prior to updating KSM flags. So no need to worry
> > * about mmap callbacks modifying vm_flags after the KSM flag has been
> > * updated here, which could otherwise affect KSM eligibility.
> > */
>
> While this comment is really nice actually, I think we're probably ok with the
> shorter version given the commit log goes into substantial detail.

Actually on second thoughts, as I'm respinning, I'll update this and replace
with (a slightly adjusted version of) your excellent comment! :)

>
> >
> >
> > > +	if (file->f_op->mmap_prepare)
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by Liam R. Howlett 6 months, 3 weeks ago
* Lorenzo Stoakes <lorenzo.stoakes@oracle.com> [250521 14:20]:
> If a user wishes to enable KSM mergeability for an entire process and all
> fork/exec'd processes that come after it, they use the prctl()
> PR_SET_MEMORY_MERGE operation.
> 
> This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
> (in order to indicate they are KSM mergeable), as well as setting this flag
> for all existing VMAs.
> 
> However it also entirely and completely breaks VMA merging for the process
> and all forked (and fork/exec'd) processes.
> 
> This is because when a new mapping is proposed, the flags specified will
> never have VM_MERGEABLE set. However all adjacent VMAs will already have
> VM_MERGEABLE set, rendering VMAs unmergeable by default.
> 
> To work around this, we try to set the VM_MERGEABLE flag prior to
> attempting a merge. In the case of brk() this can always be done.
> 
> However on mmap() things are more complicated - while KSM is not supported
> for shared file-backed mappings, it is supported for MAP_PRIVATE file-backed
> mappings.
> 
> And these mappings may have deprecated .mmap() callbacks specified which
> could, in theory, adjust flags and thus KSM eligibility.
> 
> This is unlikely to cause an issue on merge, as any adjacent file-backed
> mappings would already have the same post-.mmap() callback attributes, and
> thus would naturally not be merged.
> 
> But for the purposes of establishing a VMA as KSM-eligible (as well as
> initially scanning the VMA), this is potentially very problematic.
> 
> So we check to determine whether this is at all possible. If not, we set
> VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
> previous behaviour.
> 
> When .mmap_prepare() is more widely used, we can remove this precaution.
> 
> While this doesn't quite cover all cases, it covers a great many (all
> anonymous memory, for instance), meaning we should already see a
> significant improvement in VMA mergeability.
> 
> When it comes to file-backed mappings (other than shmem), we are really
> only interested in MAP_PRIVATE mappings, which have an available anon page
> by default. Therefore, the VM_SPECIAL restriction makes less sense for
> KSM.
> 
> In a future series we therefore intend to remove this limitation, which
> ought to simplify this implementation. However it makes sense to defer
> doing so until a later stage so we can first address this mergeability
> issue.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  include/linux/ksm.h |  8 +++++---
>  mm/ksm.c            | 18 +++++++++++------
>  mm/vma.c            | 49 +++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 64 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index d73095b5cd96..51787f0b0208 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -17,8 +17,8 @@
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>  		unsigned long end, int advice, unsigned long *vm_flags);
> -
> -void ksm_add_vma(struct vm_area_struct *vma);
> +vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
> +			 vm_flags_t vm_flags);
>  int ksm_enable_merge_any(struct mm_struct *mm);
>  int ksm_disable_merge_any(struct mm_struct *mm);
>  int ksm_disable(struct mm_struct *mm);
> @@ -97,8 +97,10 @@ bool ksm_process_mergeable(struct mm_struct *mm);
> 
>  #else  /* !CONFIG_KSM */
> 
> -static inline void ksm_add_vma(struct vm_area_struct *vma)
> +static inline vm_flags_t ksm_vma_flags(const struct mm_struct *mm,
> +		const struct file *file, vm_flags_t vm_flags)
>  {
> +	return vm_flags;
>  }
> 
>  static inline int ksm_disable(struct mm_struct *mm)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index d0c763abd499..18b3690bb69a 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2731,16 +2731,22 @@ static int __ksm_del_vma(struct vm_area_struct *vma)
>  	return 0;
>  }
>  /**
> - * ksm_add_vma - Mark vma as mergeable if compatible
> + * ksm_vma_flags - Update VMA flags to mark as mergeable if compatible
>   *
> - * @vma:  Pointer to vma
> + * @mm:       Proposed VMA's mm_struct
> + * @file:     Proposed VMA's file-backed mapping, if any.
> > + * @vm_flags: Proposed VMA's flags.
> + *
> + * Returns: @vm_flags possibly updated to mark mergeable.
>   */
> -void ksm_add_vma(struct vm_area_struct *vma)
> +vm_flags_t ksm_vma_flags(const struct mm_struct *mm, const struct file *file,
> +			 vm_flags_t vm_flags)
>  {
> -	struct mm_struct *mm = vma->vm_mm;
> +	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags) &&
> +	    __ksm_should_add_vma(file, vm_flags))
> +		vm_flags |= VM_MERGEABLE;
> 
> -	if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> -		__ksm_add_vma(vma);
> +	return vm_flags;
>  }
> 
>  static void ksm_add_vmas(struct mm_struct *mm)
> diff --git a/mm/vma.c b/mm/vma.c
> index 3ff6cfbe3338..5bebe55ea737 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2482,7 +2482,6 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap)
>  	 */
>  	if (!vma_is_anonymous(vma))
>  		khugepaged_enter_vma(vma, map->flags);
> -	ksm_add_vma(vma);
>  	*vmap = vma;
>  	return 0;
> 
> @@ -2585,6 +2584,45 @@ static void set_vma_user_defined_fields(struct vm_area_struct *vma,
>  	vma->vm_private_data = map->vm_private_data;
>  }
> 
> +static void update_ksm_flags(struct mmap_state *map)
> +{
> +	map->flags = ksm_vma_flags(map->mm, map->file, map->flags);
> +}
> +
> +/*
> + * Are we guaranteed no driver can change state such as to preclude KSM merging?
> + * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
> + *
> + * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
> + * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
> + *
> + * If this is not the case, then we set the flag after considering mergeability,
> + * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
> + * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
> + * preventing any merge.
> + */
> +static bool can_set_ksm_flags_early(struct mmap_state *map)
> +{
> +	struct file *file = map->file;
> +
> +	/* Anonymous mappings have no driver which can change them. */
> +	if (!file)
> +		return true;
> +
> +	/* shmem is safe. */
> +	if (shmem_file(file))
> +		return true;
> +
> +	/*
> +	 * If .mmap_prepare() is specified, then the driver will have already
> +	 * manipulated state prior to updating KSM flags.
> +	 */
> +	if (file->f_op->mmap_prepare)
> +		return true;
> +
> +	return false;
> +}
> +
>  static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
>  		struct list_head *uf)
> @@ -2595,6 +2633,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
>  	VMA_ITERATOR(vmi, mm, addr);
>  	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
> +	bool check_ksm_early = can_set_ksm_flags_early(&map);
> 
>  	error = __mmap_prepare(&map, uf);
>  	if (!error && have_mmap_prepare)
> @@ -2602,6 +2641,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
>  	if (error)
>  		goto abort_munmap;
> 
> +	if (check_ksm_early)
> +		update_ksm_flags(&map);
> +
>  	/* Attempt to merge with adjacent VMAs... */
>  	if (map.prev || map.next) {
>  		VMG_MMAP_STATE(vmg, &map, /* vma = */ NULL);
> @@ -2611,6 +2653,9 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
> 
>  	/* ...but if we can't, allocate a new VMA. */
>  	if (!vma) {
> +		if (!check_ksm_early)
> +			update_ksm_flags(&map);
> +
>  		error = __mmap_new_vma(&map, &vma);
>  		if (error)
>  			goto unacct_error;
> @@ -2713,6 +2758,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	 * Note: This happens *after* clearing old mappings in some code paths.
>  	 */
>  	flags |= VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
> +	flags = ksm_vma_flags(mm, NULL, flags);
>  	if (!may_expand_vm(mm, flags, len >> PAGE_SHIFT))
>  		return -ENOMEM;
> 
> @@ -2756,7 +2802,6 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> 
>  	mm->map_count++;
>  	validate_mm(mm);
> -	ksm_add_vma(vma);

Well, that was in the wrong place.

>  out:
>  	perf_event_mmap(vma);
>  	mm->total_vm += len >> PAGE_SHIFT;
> --
> 2.49.0
Re: [PATCH v2 3/4] mm: prevent KSM from completely breaking VMA merging
Posted by David Hildenbrand 6 months, 3 weeks ago
On 21.05.25 20:20, Lorenzo Stoakes wrote:
> If a user wishes to enable KSM mergeability for an entire process and all
> fork/exec'd processes that come after it, they use the prctl()
> PR_SET_MEMORY_MERGE operation.
> 
> [...]
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>


Acked-by: David Hildenbrand <david@redhat.com>


-- 
Cheers,

David / dhildenb