[PATCH 2/3] mm: implement sticky, copy on fork VMA flags

Lorenzo Stoakes posted 3 patches 3 months, 1 week ago
It's useful to be able to force a VMA to be copied on fork outside of the
parameters specified by vma_needs_copy(), which otherwise only copies page
tables if:

* The destination VMA has VM_UFFD_WP set
* The mapping is a PFN or mixed map
* The mapping is anonymous and faulted in (i.e. vma->anon_vma is non-NULL)

Setting this flag implies that the page tables mapping the VMA are such
that simply re-faulting the VMA will not re-establish them in identical
form.
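
The fork-time decision described above can be sketched as follows. This is an illustrative userland stub, not the kernel function: the flag bit values and the struct are made up, and only the shape of vma_needs_copy() in mm/memory.c after this patch is mirrored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only -- not the kernel's real assignments. */
typedef unsigned long vm_flags_t;
#define VM_UFFD_WP      (1UL << 0)
#define VM_PFNMAP       (1UL << 1)
#define VM_MIXEDMAP     (1UL << 2)
#define VM_MAYBE_GUARD  (1UL << 3)
#define VM_COPY_ON_FORK VM_MAYBE_GUARD

struct vma_stub {
	vm_flags_t vm_flags;
	void *anon_vma;	/* non-NULL once anonymous memory is faulted in */
};

/* Mirrors the shape of vma_needs_copy() after this patch. */
static bool stub_vma_needs_copy(const struct vma_stub *dst,
				const struct vma_stub *src)
{
	if (dst->vm_flags & VM_UFFD_WP)
		return true;
	if (src->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
		return true;
	if (src->anon_vma)
		return true;
	/* New: any VM_COPY_ON_FORK flag forces a page table copy. */
	if (src->vm_flags & VM_COPY_ON_FORK)
		return true;
	return false;
}
```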

We introduce VM_COPY_ON_FORK to clearly identify which flags require this
behaviour, which currently is only VM_MAYBE_GUARD.

Any VMA flags which require this behaviour are inherently 'sticky', that
is, should we merge two VMAs together, this implies that the newly merged
VMA maps a range that requires page table copying on fork.

In order to implement this we must both introduce the concept of a 'sticky'
VMA flag and adjust the VMA merge logic accordingly, and also have VMA
merge still succeed should one VMA have the flag set and another not.

Note that we update the VMA expand logic to handle new VMA merging, as this
function is ultimately called by all paths which merge new VMAs.

This patch implements this, establishing VM_STICKY to contain all such
flags and VM_IGNORE_MERGE for those flags which should be ignored when
comparing adjacent VMAs' flags for the purposes of merging.

As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE, as it
already had this behaviour, alongside VM_STICKY, since sticky flags by
implication must not disallow merging.
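
The interaction between the two masks can be sketched in isolation. Again this is an illustrative stub with made-up bit values, not the kernel code: the compatibility check masks off VM_IGNORE_MERGE before comparing, while the sticky bits from either VMA are OR-ed into the merged result.

```c
#include <stdbool.h>

/* Illustrative bit assignments only -- not the kernel's real values. */
typedef unsigned long vm_flags_t;
#define VM_SOFTDIRTY    (1UL << 0)
#define VM_MAYBE_GUARD  (1UL << 1)
#define VM_COPY_ON_FORK VM_MAYBE_GUARD
#define VM_STICKY       VM_COPY_ON_FORK
#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)

/* Flags differing only in ignored bits do not preclude a merge. */
static bool stub_flags_mergeable(vm_flags_t a, vm_flags_t b)
{
	return ((a ^ b) & ~VM_IGNORE_MERGE) == 0;
}

/* Sticky bits set on either side survive in the merged VMA. */
static vm_flags_t stub_merged_flags(vm_flags_t a, vm_flags_t b)
{
	return a | (b & VM_STICKY);
}
```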

We update the VMA userland tests to account for these changes and,
furthermore, in order to assert that the functionality is working
correctly, update the new VMA and existing VMA merge tests to consider
every permutation of the flag being set/not set in all VMAs being
considered for merge.

As a result of this change, installing guard regions in a VMA will no
longer impact its merge behaviour, and such VMAs can be freely merged with
other VMAs which do not have VM_MAYBE_GUARD set.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm.h               | 32 ++++++++++++
 mm/memory.c                      |  3 +-
 mm/vma.c                         | 22 ++++----
 tools/testing/vma/vma.c          | 89 ++++++++++++++++++++++++++++----
 tools/testing/vma/vma_internal.h | 32 ++++++++++++
 5 files changed, 156 insertions(+), 22 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f963afa1b9de..a8811ba57150 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -522,6 +522,38 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #define VM_FLAGS_CLEAR	(ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)
 
+/* Flags which should result in page tables being copied on fork. */
+#define VM_COPY_ON_FORK VM_MAYBE_GUARD
+
+/*
+ * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
+ * possesses it but the other does not, the merged VMA should nonetheless have
+ * applied to it:
+ *
+ * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
+ *                   metadata which should be unconditionally propagated upon
+ *                   fork. When merging two VMAs, we encapsulate this range in
+ *                   the merged VMA, so the flag should be 'sticky' as a result.
+ */
+#define VM_STICKY VM_COPY_ON_FORK
+
+/*
+ * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
+ * of these flags and the other not does not preclude a merge.
+ *
+ * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
+ *                dirty bit -- the caller should mark merged VMA as dirty. If
+ *                dirty bit won't be excluded from comparison, we increase
+ *                pressure on the memory system forcing the kernel to generate
+ *                new VMAs when old one could be extended instead.
+ *
+ *    VM_STICKY - If one VMA has flags which must be 'sticky', that is ones
+ *                which should propagate to all VMAs, but the other does not,
+ *                the merge should still proceed with the merge logic applying
+ *                sticky flags to the final VMA.
+ */
+#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
+
 /*
  * mapping from the currently active vm_flags protection bits (the
  * low four bits) to a page protection mask..
diff --git a/mm/memory.c b/mm/memory.c
index a2c79ee43d68..9528133e5147 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1478,8 +1478,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	if (src_vma->anon_vma)
 		return true;
 
-	/* Guard regions have momdified page tables that require copying. */
-	if (src_vma->vm_flags & VM_MAYBE_GUARD)
+	if (src_vma->vm_flags & VM_COPY_ON_FORK)
 		return true;
 
 	/*
diff --git a/mm/vma.c b/mm/vma.c
index 919d1fc63a52..50a6909c4be3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -89,15 +89,7 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
 
 	if (!mpol_equal(vmg->policy, vma_policy(vma)))
 		return false;
-	/*
-	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
-	 * match the flags but dirty bit -- the caller should mark
-	 * merged VMA as dirty. If dirty bit won't be excluded from
-	 * comparison, we increase pressure on the memory system forcing
-	 * the kernel to generate new VMAs when old one could be
-	 * extended instead.
-	 */
-	if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_SOFTDIRTY)
+	if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE)
 		return false;
 	if (vma->vm_file != vmg->file)
 		return false;
@@ -809,6 +801,7 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
 static __must_check struct vm_area_struct *vma_merge_existing_range(
 		struct vma_merge_struct *vmg)
 {
+	vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;
 	struct vm_area_struct *middle = vmg->middle;
 	struct vm_area_struct *prev = vmg->prev;
 	struct vm_area_struct *next;
@@ -901,11 +894,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 	if (merge_right) {
 		vma_start_write(next);
 		vmg->target = next;
+		sticky_flags |= (next->vm_flags & VM_STICKY);
 	}
 
 	if (merge_left) {
 		vma_start_write(prev);
 		vmg->target = prev;
+		sticky_flags |= (prev->vm_flags & VM_STICKY);
 	}
 
 	if (merge_both) {
@@ -975,6 +970,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 	if (err || commit_merge(vmg))
 		goto abort;
 
+	vm_flags_set(vmg->target, sticky_flags);
 	khugepaged_enter_vma(vmg->target, vmg->vm_flags);
 	vmg->state = VMA_MERGE_SUCCESS;
 	return vmg->target;
@@ -1125,6 +1121,10 @@ int vma_expand(struct vma_merge_struct *vmg)
 	bool remove_next = false;
 	struct vm_area_struct *target = vmg->target;
 	struct vm_area_struct *next = vmg->next;
+	vm_flags_t sticky_flags;
+
+	sticky_flags = vmg->vm_flags & VM_STICKY;
+	sticky_flags |= target->vm_flags & VM_STICKY;
 
 	VM_WARN_ON_VMG(!target, vmg);
 
@@ -1134,6 +1134,7 @@ int vma_expand(struct vma_merge_struct *vmg)
 	if (next && (target != next) && (vmg->end == next->vm_end)) {
 		int ret;
 
+		sticky_flags |= next->vm_flags & VM_STICKY;
 		remove_next = true;
 		/* This should already have been checked by this point. */
 		VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
@@ -1160,6 +1161,7 @@ int vma_expand(struct vma_merge_struct *vmg)
 	if (commit_merge(vmg))
 		goto nomem;
 
+	vm_flags_set(target, sticky_flags);
 	return 0;
 
 nomem:
@@ -1903,7 +1905,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
 	return a->vm_end == b->vm_start &&
 		mpol_equal(vma_policy(a), vma_policy(b)) &&
 		a->vm_file == b->vm_file &&
-		!((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) &&
+		!((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_IGNORE_MERGE)) &&
 		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
 }
 
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 656e1c75b711..ee9d3547c421 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
@@ -48,6 +48,8 @@ static struct anon_vma dummy_anon_vma;
 #define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
 #define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
 
+#define IS_SET(_val, _flags) (((_val) & (_flags)) == (_flags))
+
 static struct task_struct __current;
 
 struct task_struct *get_current(void)
@@ -441,7 +443,7 @@ static bool test_simple_shrink(void)
 	return true;
 }
 
-static bool test_merge_new(void)
+static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky, bool c_is_sticky)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
 	struct mm_struct mm = {};
@@ -469,23 +471,32 @@ static bool test_merge_new(void)
 	struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
 	bool merged;
 
+	if (is_sticky)
+		vm_flags |= VM_STICKY;
+
 	/*
 	 * 0123456789abc
 	 * AA B       CC
 	 */
 	vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, vm_flags);
 	ASSERT_NE(vma_a, NULL);
+	if (a_is_sticky)
+		vm_flags_set(vma_a, VM_STICKY);
 	/* We give each VMA a single avc so we can test anon_vma duplication. */
 	INIT_LIST_HEAD(&vma_a->anon_vma_chain);
 	list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);
 
 	vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, vm_flags);
 	ASSERT_NE(vma_b, NULL);
+	if (b_is_sticky)
+		vm_flags_set(vma_b, VM_STICKY);
 	INIT_LIST_HEAD(&vma_b->anon_vma_chain);
 	list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);
 
 	vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, vm_flags);
 	ASSERT_NE(vma_c, NULL);
+	if (c_is_sticky)
+		vm_flags_set(vma_c, VM_STICKY);
 	INIT_LIST_HEAD(&vma_c->anon_vma_chain);
 	list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
 
@@ -520,6 +531,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 3);
+	if (is_sticky || a_is_sticky || b_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
 
 	/*
 	 * Merge to PREVIOUS VMA.
@@ -537,6 +550,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 3);
+	if (is_sticky || a_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
 
 	/*
 	 * Merge to NEXT VMA.
@@ -556,6 +571,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 3);
+	if (is_sticky) /* D uses is_sticky. */
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
 
 	/*
 	 * Merge BOTH sides.
@@ -574,6 +591,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 2);
+	if (is_sticky || a_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
 
 	/*
 	 * Merge to NEXT VMA.
@@ -592,6 +611,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 2);
+	if (is_sticky || c_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
 
 	/*
 	 * Merge BOTH sides.
@@ -609,6 +630,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 1);
+	if (is_sticky || a_is_sticky || c_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
 
 	/*
 	 * Final state.
@@ -637,6 +660,20 @@ static bool test_merge_new(void)
 	return true;
 }
 
+static bool test_merge_new(void)
+{
+	int i, j, k, l;
+
+	/* Generate every possible permutation of sticky flags. */
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < 2; j++)
+			for (k = 0; k < 2; k++)
+				for (l = 0; l < 2; l++)
+					ASSERT_TRUE(__test_merge_new(i, j, k, l));
+
+	return true;
+}
+
 static bool test_vma_merge_special_flags(void)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
@@ -973,9 +1010,11 @@ static bool test_vma_merge_new_with_close(void)
 	return true;
 }
 
-static bool test_merge_existing(void)
+static bool __test_merge_existing(bool prev_is_sticky, bool middle_is_sticky, bool next_is_sticky)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+	vm_flags_t prev_flags = vm_flags;
+	vm_flags_t next_flags = vm_flags;
 	struct mm_struct mm = {};
 	VMA_ITERATOR(vmi, &mm, 0);
 	struct vm_area_struct *vma, *vma_prev, *vma_next;
@@ -988,6 +1027,13 @@ static bool test_merge_existing(void)
 	};
 	struct anon_vma_chain avc = {};
 
+	if (prev_is_sticky)
+		prev_flags |= VM_STICKY;
+	if (middle_is_sticky)
+		vm_flags |= VM_STICKY;
+	if (next_is_sticky)
+		next_flags |= VM_STICKY;
+
 	/*
 	 * Merge right case - partial span.
 	 *
@@ -1000,7 +1046,7 @@ static bool test_merge_existing(void)
 	 */
 	vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
 	vma->vm_ops = &vm_ops; /* This should have no impact. */
-	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
 	vma_next->vm_ops = &vm_ops; /* This should have no impact. */
 	vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, vm_flags, &dummy_anon_vma);
 	vmg.middle = vma;
@@ -1018,6 +1064,8 @@ static bool test_merge_existing(void)
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_TRUE(vma_write_started(vma_next));
 	ASSERT_EQ(mm.map_count, 2);
+	if (middle_is_sticky || next_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));
 
 	/* Clear down and reset. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
@@ -1033,7 +1081,7 @@ static bool test_merge_existing(void)
 	 *   NNNNNNN
 	 */
 	vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
-	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
 	vma_next->vm_ops = &vm_ops; /* This should have no impact. */
 	vmg_set_range_anon_vma(&vmg, 0x2000, 0x6000, 2, vm_flags, &dummy_anon_vma);
 	vmg.middle = vma;
@@ -1046,6 +1094,8 @@ static bool test_merge_existing(void)
 	ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma_next));
 	ASSERT_EQ(mm.map_count, 1);
+	if (middle_is_sticky || next_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));
 
 	/* Clear down and reset. We should have deleted vma. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
@@ -1060,7 +1110,7 @@ static bool test_merge_existing(void)
 	 * 0123456789
 	 * PPPPPPV
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
 	vma->vm_ops = &vm_ops; /* This should have no impact. */
@@ -1080,6 +1130,8 @@ static bool test_merge_existing(void)
 	ASSERT_TRUE(vma_write_started(vma_prev));
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 2);
+	if (prev_is_sticky || middle_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
 
 	/* Clear down and reset. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
@@ -1094,7 +1146,7 @@ static bool test_merge_existing(void)
 	 * 0123456789
 	 * PPPPPPP
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
 	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
@@ -1109,6 +1161,8 @@ static bool test_merge_existing(void)
 	ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma_prev));
 	ASSERT_EQ(mm.map_count, 1);
+	if (prev_is_sticky || middle_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
 
 	/* Clear down and reset. We should have deleted vma. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
@@ -1123,10 +1177,10 @@ static bool test_merge_existing(void)
 	 * 0123456789
 	 * PPPPPPPPPP
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
-	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, next_flags);
 	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
@@ -1139,6 +1193,8 @@ static bool test_merge_existing(void)
 	ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma_prev));
 	ASSERT_EQ(mm.map_count, 1);
+	if (prev_is_sticky || middle_is_sticky || next_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
 
 	/* Clear down and reset. We should have deleted prev and next. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
@@ -1158,9 +1214,9 @@ static bool test_merge_existing(void)
 	 * PPPVVVVVNNN
 	 */
 
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, vm_flags);
-	vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, next_flags);
 
 	vmg_set_range(&vmg, 0x4000, 0x5000, 4, vm_flags);
 	vmg.prev = vma;
@@ -1203,6 +1259,19 @@ static bool test_merge_existing(void)
 	return true;
 }
 
+static bool test_merge_existing(void)
+{
+	int i, j, k;
+
+	/* Generate every possible permutation of sticky flags. */
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < 2; j++)
+			for (k = 0; k < 2; k++)
+				ASSERT_TRUE(__test_merge_existing(i, j, k));
+
+	return true;
+}
+
 static bool test_anon_vma_non_mergeable(void)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index e40c93edc5a7..3d9cb3a9411a 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -117,6 +117,38 @@ extern unsigned long dac_mmap_min_addr;
 #define VM_SEALED	VM_NONE
 #endif
 
+/* Flags which should result in page tables being copied on fork. */
+#define VM_COPY_ON_FORK VM_MAYBE_GUARD
+
+/*
+ * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
+ * possesses it but the other does not, the merged VMA should nonetheless have
+ * applied to it:
+ *
+ * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
+ *                   metadata which should be unconditionally propagated upon
+ *                   fork. When merging two VMAs, we encapsulate this range in
+ *                   the merged VMA, so the flag should be 'sticky' as a result.
+ */
+#define VM_STICKY VM_COPY_ON_FORK
+
+/*
+ * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
+ * of these flags and the other not does not preclude a merge.
+ *
+ * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
+ *                dirty bit -- the caller should mark merged VMA as dirty. If
+ *                dirty bit won't be excluded from comparison, we increase
+ *                pressure on the memory system forcing the kernel to generate
+ *                new VMAs when old one could be extended instead.
+ *
+ *    VM_STICKY - If one VMA has flags which must be 'sticky', that is ones
+ *                which should propagate to all VMAs, but the other does not,
+ *                the merge should still proceed with the merge logic applying
+ *                sticky flags to the final VMA.
+ */
+#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
+
 #define FIRST_USER_ADDRESS	0UL
 #define USER_PGTABLES_CEILING	0UL
 
-- 
2.51.0
Re: [PATCH 2/3] mm: implement sticky, copy on fork VMA flags
Posted by Pedro Falcato 3 months, 1 week ago
On Wed, Oct 29, 2025 at 04:50:32PM +0000, Lorenzo Stoakes wrote:
> It's useful to be able to force a VMA to be copied on fork outside of the
> parameters specified by vma_needs_copy(), which otherwise only copies page
> tables if:
> 
> * The destination VMA has VM_UFFD_WP set
> * The mapping is a PFN or mixed map
> * The mapping is anonymous and forked in (i.e. vma->anon_vma is non-NULL)
> 
> Setting this flag implies that the page tables mapping the VMA are such
> that simply re-faulting the VMA will not re-establish them in identical
> form.
> 
> We introduce VM_COPY_ON_FORK to clearly identify which flags require this
> behaviour, which currently is only VM_MAYBE_GUARD.

Do we want this to be sticky though? If you're looking for more granularity
with this flag, the best option might be to stop merges from happening there.
If not, I can imagine a VMA that merges with other VMAs far past the original
guarded range, and thus you get no granularity (possibly, not even useful).

If you're _not_ looking for granularity, then maybe using a per-mm flag for
guard ranges or some other solution would be superior?

The rest of the patch (superficially) looks good to me, though.

-- 
Pedro
Re: [PATCH 2/3] mm: implement sticky, copy on fork VMA flags
Posted by Lorenzo Stoakes 3 months, 1 week ago
On Thu, Oct 30, 2025 at 04:25:54PM +0000, Pedro Falcato wrote:
> On Wed, Oct 29, 2025 at 04:50:32PM +0000, Lorenzo Stoakes wrote:
> > It's useful to be able to force a VMA to be copied on fork outside of the
> > parameters specified by vma_needs_copy(), which otherwise only copies page
> > tables if:
> >
> > * The destination VMA has VM_UFFD_WP set
> > * The mapping is a PFN or mixed map
> > * The mapping is anonymous and forked in (i.e. vma->anon_vma is non-NULL)
> >
> > Setting this flag implies that the page tables mapping the VMA are such
> > that simply re-faulting the VMA will not re-establish them in identical
> > form.
> >
> > We introduce VM_COPY_ON_FORK to clearly identify which flags require this
> > behaviour, which currently is only VM_MAYBE_GUARD.
>
> Do we want this to be sticky though? If you're looking for more granularity

Yes?

> with this flag, the best option might be to stop merges from happening there.

No?

That'd entirely break VMA merging for any VMA you are installing guard regions
into.

That'd be a regression, as the property of guard regions belongs to the page
tables which can propagate across split/merge.

Also, a key purpose of this flag is to be able to correctly propagate page
tables on fork for file-backed VMAs.

Without this we have to install an anon_vma in file-backed VMAs (what we do
now), which has all the same drawbacks.

> If not, I can imagine a VMA that merges with other VMAs far past the original
> guarded range, and thus you get no granularity (possibly, not even useful).

Err? What? It gets you VMA granularity. You are always going to do better than
'anywhere in the entire mm'. Of course you can imagine scenarios where one VMA
somehow dominates everything, or guard regions are removed etc. but in most
cases you're not going to encounter that.

Also again, the _more important_ purpose here is correct behaviour on fork.

>
> If you're _not_ looking for granularity, then maybe using a per-mm flag for
> guard ranges or some other solution would be superior?

I'm not sure what solution you think would be superior that wouldn't involve
significant overhead in having to look up guard regions on split/merge.

This is a property that belongs to the page tables that we're relating to VMAs
which may or may not contain page tables which have this property.

mm granularity would be utterly pointless and leave us with the same anon_vma
hack.

>
> The rest of the patch (superficially) looks good to me, though.

Well there's that at least :)

>
> --
> Pedro

Thanks, Lorenzo
Re: [PATCH 2/3] mm: implement sticky, copy on fork VMA flags
Posted by Suren Baghdasaryan 3 months, 1 week ago
On Wed, Oct 29, 2025 at 9:51 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> It's useful to be able to force a VMA to be copied on fork outside of the
> parameters specified by vma_needs_copy(), which otherwise only copies page
> tables if:
>
> * The destination VMA has VM_UFFD_WP set
> * The mapping is a PFN or mixed map
> * The mapping is anonymous and forked in (i.e. vma->anon_vma is non-NULL)
>
> Setting this flag implies that the page tables mapping the VMA are such
> that simply re-faulting the VMA will not re-establish them in identical
> form.
>
> We introduce VM_COPY_ON_FORK to clearly identify which flags require this
> behaviour, which currently is only VM_MAYBE_GUARD.
>
> Any VMA flags which require this behaviour are inherently 'sticky', that
> is, should we merge two VMAs together, this implies that the newly merged
> VMA maps a range that requires page table copying on fork.
>
> In order to implement this we must both introduce the concept of a 'sticky'
> VMA flag and adjust the VMA merge logic accordingly, and also have VMA
> merge still successfully succeed should one VMA have the flag set and
> another not.

"successfully succeed" sounds weird. Just "succeed"?

>
> Note that we update the VMA expand logic to handle new VMA merging, as this
> function is the one ultimately called by all instances of merging of new
> VMAs.
>
> This patch implements this, establishing VM_STICKY to contain all such
> flags and VM_IGNORE_MERGE for those flags which should be ignored when
> comparing adjacent VMA's flags for the purposes of merging.
>
> As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it
> already had this behaviour, alongside VM_STICKY as sticky flags by
> implication must not disallow merge.
>
> We update the VMA userland tests to account for the changes and,
> furthermore, in order to assert that the functionality is workingly

s/workingly/working

> correctly, update the new VMA and existing VMA merging logic to consider
> every permutation of the flag being set/not set in all VMAs being
> considered for merge.
>
> As a result of this change, VMAs with guard ranges will now not have their
> merge behaviour impacted by doing so and can be freely merged with other
> VMAs without VM_MAYBE_GUARD set.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mm.h               | 32 ++++++++++++
>  mm/memory.c                      |  3 +-
>  mm/vma.c                         | 22 ++++----
>  tools/testing/vma/vma.c          | 89 ++++++++++++++++++++++++++++----
>  tools/testing/vma/vma_internal.h | 32 ++++++++++++
>  5 files changed, 156 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f963afa1b9de..a8811ba57150 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -522,6 +522,38 @@ extern unsigned int kobjsize(const void *objp);
>  #endif
>  #define VM_FLAGS_CLEAR (ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)
>
> +/* Flags which should result in page tables being copied on fork. */
> +#define VM_COPY_ON_FORK VM_MAYBE_GUARD
> +
> +/*
> + * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
> + * possesses it but the other does not, the merged VMA should nonetheless have
> + * applied to it:
> + *
> + * VM_COPY_ON_FORK - These flags indicates that a VMA maps a range that contains
> + *                   metadata which should be unconditionally propagated upon
> + *                   fork. When merging two VMAs, we encapsulate this range in
> + *                   the merged VMA, so the flag should be 'sticky' as a result.

It's probably worth noting that after a split, we do not remove
"sticky" flags even if the VMA acquired them as a result of a previous
merge.

> + */
> +#define VM_STICKY VM_COPY_ON_FORK
> +
> +/*
> + * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
> + * of these flags and the other not does not preclude a merge.
> + *
> + * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
> + *                dirty bit -- the caller should mark merged VMA as dirty. If
> + *                dirty bit won't be excluded from comparison, we increase
> + *                pressure on the memory system forcing the kernel to generate
> + *                new VMAs when old one could be extended instead.
> + *
> + *    VM_STICKY - If one VMA has flags which most be 'sticky', that is ones

s/most/must ?

> + *                which should propagate to all VMAs, but the other does not,
> + *                the merge should still proceed with the merge logic applying
> + *                sticky flags to the final VMA.
> + */
> +#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
> +
>  /*
>   * mapping from the currently active vm_flags protection bits (the
>   * low four bits) to a page protection mask..
> diff --git a/mm/memory.c b/mm/memory.c
> index a2c79ee43d68..9528133e5147 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1478,8 +1478,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>         if (src_vma->anon_vma)
>                 return true;
>
> -       /* Guard regions have momdified page tables that require copying. */
> -       if (src_vma->vm_flags & VM_MAYBE_GUARD)
> +       if (src_vma->vm_flags & VM_COPY_ON_FORK)
>                 return true;
>
>         /*
> diff --git a/mm/vma.c b/mm/vma.c
> index 919d1fc63a52..50a6909c4be3 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -89,15 +89,7 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
>
>         if (!mpol_equal(vmg->policy, vma_policy(vma)))
>                 return false;
> -       /*
> -        * VM_SOFTDIRTY should not prevent from VMA merging, if we
> -        * match the flags but dirty bit -- the caller should mark
> -        * merged VMA as dirty. If dirty bit won't be excluded from
> -        * comparison, we increase pressure on the memory system forcing
> -        * the kernel to generate new VMAs when old one could be
> -        * extended instead.
> -        */
> -       if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_SOFTDIRTY)
> +       if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE)
>                 return false;
>         if (vma->vm_file != vmg->file)
>                 return false;
> @@ -809,6 +801,7 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
>  static __must_check struct vm_area_struct *vma_merge_existing_range(
>                 struct vma_merge_struct *vmg)
>  {
> +       vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;
>         struct vm_area_struct *middle = vmg->middle;
>         struct vm_area_struct *prev = vmg->prev;
>         struct vm_area_struct *next;
> @@ -901,11 +894,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
>         if (merge_right) {
>                 vma_start_write(next);
>                 vmg->target = next;
> +               sticky_flags |= (next->vm_flags & VM_STICKY);
>         }
>
>         if (merge_left) {
>                 vma_start_write(prev);
>                 vmg->target = prev;
> +               sticky_flags |= (prev->vm_flags & VM_STICKY);
>         }
>
>         if (merge_both) {
> @@ -975,6 +970,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
>         if (err || commit_merge(vmg))
>                 goto abort;
>
> +       vm_flags_set(vmg->target, sticky_flags);
>         khugepaged_enter_vma(vmg->target, vmg->vm_flags);
>         vmg->state = VMA_MERGE_SUCCESS;
>         return vmg->target;
> @@ -1125,6 +1121,10 @@ int vma_expand(struct vma_merge_struct *vmg)
>         bool remove_next = false;
>         struct vm_area_struct *target = vmg->target;
>         struct vm_area_struct *next = vmg->next;
> +       vm_flags_t sticky_flags;
> +
> +       sticky_flags = vmg->vm_flags & VM_STICKY;
> +       sticky_flags |= target->vm_flags & VM_STICKY;
>
>         VM_WARN_ON_VMG(!target, vmg);
>
> @@ -1134,6 +1134,7 @@ int vma_expand(struct vma_merge_struct *vmg)
>         if (next && (target != next) && (vmg->end == next->vm_end)) {
>                 int ret;
>
> +               sticky_flags |= next->vm_flags & VM_STICKY;
>                 remove_next = true;
>                 /* This should already have been checked by this point. */
>                 VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
> @@ -1160,6 +1161,7 @@ int vma_expand(struct vma_merge_struct *vmg)
>         if (commit_merge(vmg))
>                 goto nomem;
>
> +       vm_flags_set(target, sticky_flags);
>         return 0;
>
>  nomem:
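
As an aside, the accumulation this hunk performs can be modelled in isolation. A minimal userland sketch — the VM_STICKY value and the helper name are invented for illustration, not the real <linux/mm.h> bit or any kernel function:

```c
#include <assert.h>

typedef unsigned long vm_flags_t;

#define VM_STICKY 0x800UL /* stand-in bit, for illustration only */

/*
 * Model of the sticky-flag accumulation in vma_expand(): the expanded
 * target inherits sticky bits from the incoming vmg flags, from its own
 * flags, and -- when the next VMA is merged away -- from next's flags too.
 */
static vm_flags_t expand_sticky_flags(vm_flags_t vmg_flags,
				      vm_flags_t target_flags,
				      vm_flags_t next_flags, int remove_next)
{
	vm_flags_t sticky = (vmg_flags | target_flags) & VM_STICKY;

	if (remove_next)
		sticky |= next_flags & VM_STICKY;

	return target_flags | sticky;
}
```

i.e. next's sticky bit only propagates when next is actually consumed by the expand.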
> @@ -1903,7 +1905,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
>         return a->vm_end == b->vm_start &&
>                 mpol_equal(vma_policy(a), vma_policy(b)) &&
>                 a->vm_file == b->vm_file &&
> -               !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) &&
> +               !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_IGNORE_MERGE)) &&
>                 b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
>  }
>
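To make the shared flag test concrete: both is_mergeable_vma() and anon_vma_compatible() now mask the XOR of the two sides' flags with ~VM_IGNORE_MERGE, so a difference confined to ignored bits no longer blocks a merge. A hedged userland sketch — the flag values below are made up for illustration, not the real <linux/mm.h> bits:

```c
#include <assert.h>

typedef unsigned long vm_flags_t;

/* Stand-in bit values, for illustration only. */
#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_SOFTDIRTY	0x4UL
#define VM_STICKY	0x8UL
#define VM_IGNORE_MERGE	(VM_SOFTDIRTY | VM_STICKY)

/*
 * Model of the flag comparison: any bit outside VM_IGNORE_MERGE that
 * differs between the two sides precludes the merge; differences in
 * VM_SOFTDIRTY or VM_STICKY alone do not.
 */
static int flags_mergeable(vm_flags_t a, vm_flags_t b)
{
	return ((a ^ b) & ~VM_IGNORE_MERGE) == 0;
}
```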
> diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> index 656e1c75b711..ee9d3547c421 100644
> --- a/tools/testing/vma/vma.c
> +++ b/tools/testing/vma/vma.c

I prefer tests in a separate patch, but that might just be me. Feel
free to ignore.

> @@ -48,6 +48,8 @@ static struct anon_vma dummy_anon_vma;
>  #define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
>  #define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
>
> +#define IS_SET(_val, _flags) ((_val & _flags) == _flags)
> +
>  static struct task_struct __current;
>
>  struct task_struct *get_current(void)
> @@ -441,7 +443,7 @@ static bool test_simple_shrink(void)
>         return true;
>  }
>
> -static bool test_merge_new(void)
> +static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky, bool c_is_sticky)
>  {
>         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
>         struct mm_struct mm = {};
> @@ -469,23 +471,32 @@ static bool test_merge_new(void)
>         struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
>         bool merged;
>
> +       if (is_sticky)
> +               vm_flags |= VM_STICKY;
> +
>         /*
>          * 0123456789abc
>          * AA B       CC
>          */
>         vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, vm_flags);
>         ASSERT_NE(vma_a, NULL);
> +       if (a_is_sticky)
> +               vm_flags_set(vma_a, VM_STICKY);
>         /* We give each VMA a single avc so we can test anon_vma duplication. */
>         INIT_LIST_HEAD(&vma_a->anon_vma_chain);
>         list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);
>
>         vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, vm_flags);
>         ASSERT_NE(vma_b, NULL);
> +       if (b_is_sticky)
> +               vm_flags_set(vma_b, VM_STICKY);
>         INIT_LIST_HEAD(&vma_b->anon_vma_chain);
>         list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);
>
>         vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, vm_flags);
>         ASSERT_NE(vma_c, NULL);
> +       if (c_is_sticky)
> +               vm_flags_set(vma_c, VM_STICKY);
>         INIT_LIST_HEAD(&vma_c->anon_vma_chain);
>         list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
>
> @@ -520,6 +531,8 @@ static bool test_merge_new(void)
>         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 3);
> +       if (is_sticky || a_is_sticky || b_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
>
>         /*
>          * Merge to PREVIOUS VMA.
> @@ -537,6 +550,8 @@ static bool test_merge_new(void)
>         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 3);
> +       if (is_sticky || a_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
>
>         /*
>          * Merge to NEXT VMA.
> @@ -556,6 +571,8 @@ static bool test_merge_new(void)
>         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 3);
> +       if (is_sticky) /* D uses is_sticky. */
> +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
>
>         /*
>          * Merge BOTH sides.
> @@ -574,6 +591,8 @@ static bool test_merge_new(void)
>         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 2);
> +       if (is_sticky || a_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
>
>         /*
>          * Merge to NEXT VMA.
> @@ -592,6 +611,8 @@ static bool test_merge_new(void)
>         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 2);
> +       if (is_sticky || c_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
>
>         /*
>          * Merge BOTH sides.
> @@ -609,6 +630,8 @@ static bool test_merge_new(void)
>         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 1);
> +       if (is_sticky || a_is_sticky || c_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
>
>         /*
>          * Final state.
> @@ -637,6 +660,20 @@ static bool test_merge_new(void)
>         return true;
>  }
>
> +static bool test_merge_new(void)
> +{
> +       int i, j, k, l;
> +
> +       /* Generate every possible permutation of sticky flags. */
> +       for (i = 0; i < 2; i++)
> +               for (j = 0; j < 2; j++)
> +                       for (k = 0; k < 2; k++)
> +                               for (l = 0; l < 2; l++)
> +                                       ASSERT_TRUE(__test_merge_new(i, j, k, l));
> +
> +       return true;
> +}
> +
>  static bool test_vma_merge_special_flags(void)
>  {
>         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> @@ -973,9 +1010,11 @@ static bool test_vma_merge_new_with_close(void)
>         return true;
>  }
>
> -static bool test_merge_existing(void)
> +static bool __test_merge_existing(bool prev_is_sticky, bool middle_is_sticky, bool next_is_sticky)
>  {
>         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> +       vm_flags_t prev_flags = vm_flags;
> +       vm_flags_t next_flags = vm_flags;
>         struct mm_struct mm = {};
>         VMA_ITERATOR(vmi, &mm, 0);
>         struct vm_area_struct *vma, *vma_prev, *vma_next;
> @@ -988,6 +1027,13 @@ static bool test_merge_existing(void)
>         };
>         struct anon_vma_chain avc = {};
>
> +       if (prev_is_sticky)
> +               prev_flags |= VM_STICKY;
> +       if (middle_is_sticky)
> +               vm_flags |= VM_STICKY;
> +       if (next_is_sticky)
> +               next_flags |= VM_STICKY;
> +
>         /*
>          * Merge right case - partial span.
>          *
> @@ -1000,7 +1046,7 @@ static bool test_merge_existing(void)
>          */
>         vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
>         vma->vm_ops = &vm_ops; /* This should have no impact. */
> -       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
> +       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
>         vma_next->vm_ops = &vm_ops; /* This should have no impact. */
>         vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, vm_flags, &dummy_anon_vma);
>         vmg.middle = vma;
> @@ -1018,6 +1064,8 @@ static bool test_merge_existing(void)
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_TRUE(vma_write_started(vma_next));
>         ASSERT_EQ(mm.map_count, 2);
> +       if (middle_is_sticky || next_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));
>
>         /* Clear down and reset. */
>         ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> @@ -1033,7 +1081,7 @@ static bool test_merge_existing(void)
>          *   NNNNNNN
>          */
>         vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
> -       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
> +       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
>         vma_next->vm_ops = &vm_ops; /* This should have no impact. */
>         vmg_set_range_anon_vma(&vmg, 0x2000, 0x6000, 2, vm_flags, &dummy_anon_vma);
>         vmg.middle = vma;
> @@ -1046,6 +1094,8 @@ static bool test_merge_existing(void)
>         ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma_next));
>         ASSERT_EQ(mm.map_count, 1);
> +       if (middle_is_sticky || next_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));
>
>         /* Clear down and reset. We should have deleted vma. */
>         ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> @@ -1060,7 +1110,7 @@ static bool test_merge_existing(void)
>          * 0123456789
>          * PPPPPPV
>          */
> -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
>         vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
>         vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
>         vma->vm_ops = &vm_ops; /* This should have no impact. */
> @@ -1080,6 +1130,8 @@ static bool test_merge_existing(void)
>         ASSERT_TRUE(vma_write_started(vma_prev));
>         ASSERT_TRUE(vma_write_started(vma));
>         ASSERT_EQ(mm.map_count, 2);
> +       if (prev_is_sticky || middle_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
>
>         /* Clear down and reset. */
>         ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> @@ -1094,7 +1146,7 @@ static bool test_merge_existing(void)
>          * 0123456789
>          * PPPPPPP
>          */
> -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
>         vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
>         vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
>         vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
> @@ -1109,6 +1161,8 @@ static bool test_merge_existing(void)
>         ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma_prev));
>         ASSERT_EQ(mm.map_count, 1);
> +       if (prev_is_sticky || middle_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
>
>         /* Clear down and reset. We should have deleted vma. */
>         ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> @@ -1123,10 +1177,10 @@ static bool test_merge_existing(void)
>          * 0123456789
>          * PPPPPPPPPP
>          */
> -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
>         vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
>         vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
> -       vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, vm_flags);
> +       vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, next_flags);
>         vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
>         vmg.prev = vma_prev;
>         vmg.middle = vma;
> @@ -1139,6 +1193,8 @@ static bool test_merge_existing(void)
>         ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
>         ASSERT_TRUE(vma_write_started(vma_prev));
>         ASSERT_EQ(mm.map_count, 1);
> +       if (prev_is_sticky || middle_is_sticky || next_is_sticky)
> +               ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
>
>         /* Clear down and reset. We should have deleted prev and next. */
>         ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> @@ -1158,9 +1214,9 @@ static bool test_merge_existing(void)
>          * PPPVVVVVNNN
>          */
>
> -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
>         vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, vm_flags);
> -       vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, vm_flags);
> +       vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, next_flags);
>
>         vmg_set_range(&vmg, 0x4000, 0x5000, 4, vm_flags);
>         vmg.prev = vma;
> @@ -1203,6 +1259,19 @@ static bool test_merge_existing(void)
>         return true;
>  }
>
> +static bool test_merge_existing(void)
> +{
> +       int i, j, k;
> +
> +       /* Generate every possible permutation of sticky flags. */
> +       for (i = 0; i < 2; i++)
> +               for (j = 0; j < 2; j++)
> +                       for (k = 0; k < 2; k++)
> +                               ASSERT_TRUE(__test_merge_existing(i, j, k));
> +
> +       return true;
> +}
> +
>  static bool test_anon_vma_non_mergeable(void)
>  {
>         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index e40c93edc5a7..3d9cb3a9411a 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -117,6 +117,38 @@ extern unsigned long dac_mmap_min_addr;
>  #define VM_SEALED      VM_NONE
>  #endif
>
> +/* Flags which should result in page tables being copied on fork. */
> +#define VM_COPY_ON_FORK VM_MAYBE_GUARD
> +
> +/*
> + * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
> + * possesses it but the other does not, the merged VMA should nonetheless have
> + * applied to it:
> + *
> + * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
> + *                   metadata which should be unconditionally propagated upon
> + *                   fork. When merging two VMAs, we encapsulate this range in
> + *                   the merged VMA, so the flag should be 'sticky' as a result.
> + */
> +#define VM_STICKY VM_COPY_ON_FORK
> +
> +/*
> + * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
> + * of these flags and the other not does not preclude a merge.
> + *
> + * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
> + *                dirty bit -- the caller should mark merged VMA as dirty. If
> + *                dirty bit won't be excluded from comparison, we increase
> + *                pressure on the memory system forcing the kernel to generate
> + *                new VMAs when old one could be extended instead.
> + *
> + *    VM_STICKY - If one VMA has flags which must be 'sticky', that is ones
> + *                which should propagate to all VMAs, but the other does not,
> + *                the merge should still proceed with the merge logic applying
> + *                sticky flags to the final VMA.
> + */
> +#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
> +
>  #define FIRST_USER_ADDRESS     0UL
>  #define USER_PGTABLES_CEILING  0UL
>
> --
> 2.51.0
>
Re: [PATCH 2/3] mm: implement sticky, copy on fork VMA flags
Posted by Lorenzo Stoakes 3 months, 1 week ago
On Wed, Oct 29, 2025 at 09:35:25PM -0700, Suren Baghdasaryan wrote:
> On Wed, Oct 29, 2025 at 9:51 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > It's useful to be able to force a VMA to be copied on fork outside of the
> > parameters specified by vma_needs_copy(), which otherwise only copies page
> > tables if:
> >
> > * The destination VMA has VM_UFFD_WP set
> > * The mapping is a PFN or mixed map
> > * The mapping is anonymous and forked in (i.e. vma->anon_vma is non-NULL)
> >
> > Setting this flag implies that the page tables mapping the VMA are such
> > that simply re-faulting the VMA will not re-establish them in identical
> > form.
> >
> > We introduce VM_COPY_ON_FORK to clearly identify which flags require this
> > behaviour, which currently is only VM_MAYBE_GUARD.
> >
> > Any VMA flags which require this behaviour are inherently 'sticky', that
> > is, should we merge two VMAs together, this implies that the newly merged
> > VMA maps a range that requires page table copying on fork.
> >
> > In order to implement this we must both introduce the concept of a 'sticky'
> > VMA flag and adjust the VMA merge logic accordingly, and also have VMA
> > merge still successfully succeed should one VMA have the flag set and
> > another not.
>
> "successfully succeed" sounds weird. Just "succeed"?

Yeah... typo bonanza this series :) will fix.

>
> >
> > Note that we update the VMA expand logic to handle new VMA merging, as this
> > function is the one ultimately called by all instances of merging of new
> > VMAs.
> >
> > This patch implements this, establishing VM_STICKY to contain all such
> > flags and VM_IGNORE_MERGE for those flags which should be ignored when
> > comparing adjacent VMA's flags for the purposes of merging.
> >
> > As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it
> > already had this behaviour, alongside VM_STICKY as sticky flags by
> > implication must not disallow merge.
> >
> > We update the VMA userland tests to account for the changes and,
> > furthermore, in order to assert that the functionality is workingly
>
> s/workingly/working

Haha good lord. Will fix also!

>
> > correctly, update the new VMA and existing VMA merging logic to consider
> > every permutation of the flag being set/not set in all VMAs being
> > considered for merge.
> >
> > As a result of this change, VMAs with guard ranges will now not have their
> > merge behaviour impacted by doing so and can be freely merged with other
> > VMAs without VM_MAYBE_GUARD set.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  include/linux/mm.h               | 32 ++++++++++++
> >  mm/memory.c                      |  3 +-
> >  mm/vma.c                         | 22 ++++----
> >  tools/testing/vma/vma.c          | 89 ++++++++++++++++++++++++++++----
> >  tools/testing/vma/vma_internal.h | 32 ++++++++++++
> >  5 files changed, 156 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index f963afa1b9de..a8811ba57150 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -522,6 +522,38 @@ extern unsigned int kobjsize(const void *objp);
> >  #endif
> >  #define VM_FLAGS_CLEAR (ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)
> >
> > +/* Flags which should result in page tables being copied on fork. */
> > +#define VM_COPY_ON_FORK VM_MAYBE_GUARD
> > +
> > +/*
> > + * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
> > + * possesses it but the other does not, the merged VMA should nonetheless have
> > + * applied to it:
> > + *
> > + * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
> > + *                   metadata which should be unconditionally propagated upon
> > + *                   fork. When merging two VMAs, we encapsulate this range in
> > + *                   the merged VMA, so the flag should be 'sticky' as a result.
>
> It's probably worth noting that after a split, we do not remove
> "sticky" flags even if the VMA acquired them as a result of a previous
> merge.

Hm I thought this was implied. Will update to be clear however!
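
For the record, the behaviour being described is that a split merely copies vm_flags into both halves, so a sticky bit acquired via an earlier merge survives any later split. A toy sketch — struct and names invented for illustration, this is not the kernel's split path:

```c
#include <assert.h>

#define VM_STICKY 0x8UL /* stand-in bit, for illustration only */

/* Toy VMA: a split duplicates flags into both resulting halves. */
struct toy_vma {
	unsigned long start, end, flags;
};

static void toy_split(const struct toy_vma *v, unsigned long addr,
		      struct toy_vma *lo, struct toy_vma *hi)
{
	lo->start = v->start;
	lo->end = addr;
	lo->flags = v->flags;	/* sticky bit retained */

	hi->start = addr;
	hi->end = v->end;
	hi->flags = v->flags;	/* sticky bit retained */
}
```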

>
> > + */
> > +#define VM_STICKY VM_COPY_ON_FORK
> > +
> > +/*
> > + * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
> > + * of these flags and the other not does not preclude a merge.
> > + *
> > + * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
> > + *                dirty bit -- the caller should mark merged VMA as dirty. If
> > + *                dirty bit won't be excluded from comparison, we increase
> > + *                pressure on the memory system forcing the kernel to generate
> > + *                new VMAs when old one could be extended instead.
> > + *
> > + *    VM_STICKY - If one VMA has flags which most be 'sticky', that is ones
>
> s/most/must ?

I most learn to not typo so much :)

Yes you're right, will fix! :P

>
> > + *                which should propagate to all VMAs, but the other does not,
> > + *                the merge should still proceed with the merge logic applying
> > + *                sticky flags to the final VMA.
> > + */
> > +#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
> > +
> >  /*
> >   * mapping from the currently active vm_flags protection bits (the
> >   * low four bits) to a page protection mask..
> > diff --git a/mm/memory.c b/mm/memory.c
> > index a2c79ee43d68..9528133e5147 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1478,8 +1478,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> >         if (src_vma->anon_vma)
> >                 return true;
> >
> > -       /* Guard regions have momdified page tables that require copying. */
> > -       if (src_vma->vm_flags & VM_MAYBE_GUARD)
> > +       if (src_vma->vm_flags & VM_COPY_ON_FORK)
> >                 return true;
> >
> >         /*
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 919d1fc63a52..50a6909c4be3 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -89,15 +89,7 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
> >
> >         if (!mpol_equal(vmg->policy, vma_policy(vma)))
> >                 return false;
> > -       /*
> > -        * VM_SOFTDIRTY should not prevent from VMA merging, if we
> > -        * match the flags but dirty bit -- the caller should mark
> > -        * merged VMA as dirty. If dirty bit won't be excluded from
> > -        * comparison, we increase pressure on the memory system forcing
> > -        * the kernel to generate new VMAs when old one could be
> > -        * extended instead.
> > -        */
> > -       if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_SOFTDIRTY)
> > +       if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE)
> >                 return false;
> >         if (vma->vm_file != vmg->file)
> >                 return false;
> > @@ -809,6 +801,7 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
> >  static __must_check struct vm_area_struct *vma_merge_existing_range(
> >                 struct vma_merge_struct *vmg)
> >  {
> > +       vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;
> >         struct vm_area_struct *middle = vmg->middle;
> >         struct vm_area_struct *prev = vmg->prev;
> >         struct vm_area_struct *next;
> > @@ -901,11 +894,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
> >         if (merge_right) {
> >                 vma_start_write(next);
> >                 vmg->target = next;
> > +               sticky_flags |= (next->vm_flags & VM_STICKY);
> >         }
> >
> >         if (merge_left) {
> >                 vma_start_write(prev);
> >                 vmg->target = prev;
> > +               sticky_flags |= (prev->vm_flags & VM_STICKY);
> >         }
> >
> >         if (merge_both) {
> > @@ -975,6 +970,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
> >         if (err || commit_merge(vmg))
> >                 goto abort;
> >
> > +       vm_flags_set(vmg->target, sticky_flags);
> >         khugepaged_enter_vma(vmg->target, vmg->vm_flags);
> >         vmg->state = VMA_MERGE_SUCCESS;
> >         return vmg->target;
> > @@ -1125,6 +1121,10 @@ int vma_expand(struct vma_merge_struct *vmg)
> >         bool remove_next = false;
> >         struct vm_area_struct *target = vmg->target;
> >         struct vm_area_struct *next = vmg->next;
> > +       vm_flags_t sticky_flags;
> > +
> > +       sticky_flags = vmg->vm_flags & VM_STICKY;
> > +       sticky_flags |= target->vm_flags & VM_STICKY;
> >
> >         VM_WARN_ON_VMG(!target, vmg);
> >
> > @@ -1134,6 +1134,7 @@ int vma_expand(struct vma_merge_struct *vmg)
> >         if (next && (target != next) && (vmg->end == next->vm_end)) {
> >                 int ret;
> >
> > +               sticky_flags |= next->vm_flags & VM_STICKY;
> >                 remove_next = true;
> >                 /* This should already have been checked by this point. */
> >                 VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
> > @@ -1160,6 +1161,7 @@ int vma_expand(struct vma_merge_struct *vmg)
> >         if (commit_merge(vmg))
> >                 goto nomem;
> >
> > +       vm_flags_set(target, sticky_flags);
> >         return 0;
> >
> >  nomem:
> > @@ -1903,7 +1905,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
> >         return a->vm_end == b->vm_start &&
> >                 mpol_equal(vma_policy(a), vma_policy(b)) &&
> >                 a->vm_file == b->vm_file &&
> > -               !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) &&
> > +               !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_IGNORE_MERGE)) &&
> >                 b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
> >  }
> >
> > diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
> > index 656e1c75b711..ee9d3547c421 100644
> > --- a/tools/testing/vma/vma.c
> > +++ b/tools/testing/vma/vma.c
>
> I prefer tests in a separate patch, but that might just be me. Feel
> free to ignore.

Yeah can split it out! I do tend to do that actually, not sure why I
deviated from that here.

>
> > @@ -48,6 +48,8 @@ static struct anon_vma dummy_anon_vma;
> >  #define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
> >  #define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))
> >
> > +#define IS_SET(_val, _flags) ((_val & _flags) == _flags)
> > +
> >  static struct task_struct __current;
> >
> >  struct task_struct *get_current(void)
> > @@ -441,7 +443,7 @@ static bool test_simple_shrink(void)
> >         return true;
> >  }
> >
> > -static bool test_merge_new(void)
> > +static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky, bool c_is_sticky)
> >  {
> >         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> >         struct mm_struct mm = {};
> > @@ -469,23 +471,32 @@ static bool test_merge_new(void)
> >         struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
> >         bool merged;
> >
> > +       if (is_sticky)
> > +               vm_flags |= VM_STICKY;
> > +
> >         /*
> >          * 0123456789abc
> >          * AA B       CC
> >          */
> >         vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, vm_flags);
> >         ASSERT_NE(vma_a, NULL);
> > +       if (a_is_sticky)
> > +               vm_flags_set(vma_a, VM_STICKY);
> >         /* We give each VMA a single avc so we can test anon_vma duplication. */
> >         INIT_LIST_HEAD(&vma_a->anon_vma_chain);
> >         list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);
> >
> >         vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, vm_flags);
> >         ASSERT_NE(vma_b, NULL);
> > +       if (b_is_sticky)
> > +               vm_flags_set(vma_b, VM_STICKY);
> >         INIT_LIST_HEAD(&vma_b->anon_vma_chain);
> >         list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);
> >
> >         vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, vm_flags);
> >         ASSERT_NE(vma_c, NULL);
> > +       if (c_is_sticky)
> > +               vm_flags_set(vma_c, VM_STICKY);
> >         INIT_LIST_HEAD(&vma_c->anon_vma_chain);
> >         list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
> >
> > @@ -520,6 +531,8 @@ static bool test_merge_new(void)
> >         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 3);
> > +       if (is_sticky || a_is_sticky || b_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
> >
> >         /*
> >          * Merge to PREVIOUS VMA.
> > @@ -537,6 +550,8 @@ static bool test_merge_new(void)
> >         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 3);
> > +       if (is_sticky || a_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
> >
> >         /*
> >          * Merge to NEXT VMA.
> > @@ -556,6 +571,8 @@ static bool test_merge_new(void)
> >         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 3);
> > +       if (is_sticky) /* D uses is_sticky. */
> > +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
> >
> >         /*
> >          * Merge BOTH sides.
> > @@ -574,6 +591,8 @@ static bool test_merge_new(void)
> >         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 2);
> > +       if (is_sticky || a_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
> >
> >         /*
> >          * Merge to NEXT VMA.
> > @@ -592,6 +611,8 @@ static bool test_merge_new(void)
> >         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 2);
> > +       if (is_sticky || c_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
> >
> >         /*
> >          * Merge BOTH sides.
> > @@ -609,6 +630,8 @@ static bool test_merge_new(void)
> >         ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 1);
> > +       if (is_sticky || a_is_sticky || c_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));
> >
> >         /*
> >          * Final state.
> > @@ -637,6 +660,20 @@ static bool test_merge_new(void)
> >         return true;
> >  }
> >
> > +static bool test_merge_new(void)
> > +{
> > +       int i, j, k, l;
> > +
> > +       /* Generate every possible permutation of sticky flags. */
> > +       for (i = 0; i < 2; i++)
> > +               for (j = 0; j < 2; j++)
> > +                       for (k = 0; k < 2; k++)
> > +                               for (l = 0; l < 2; l++)
> > +                                       ASSERT_TRUE(__test_merge_new(i, j, k, l));
> > +
> > +       return true;
> > +}
> > +
> >  static bool test_vma_merge_special_flags(void)
> >  {
> >         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> > @@ -973,9 +1010,11 @@ static bool test_vma_merge_new_with_close(void)
> >         return true;
> >  }
> >
> > -static bool test_merge_existing(void)
> > +static bool __test_merge_existing(bool prev_is_sticky, bool middle_is_sticky, bool next_is_sticky)
> >  {
> >         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> > +       vm_flags_t prev_flags = vm_flags;
> > +       vm_flags_t next_flags = vm_flags;
> >         struct mm_struct mm = {};
> >         VMA_ITERATOR(vmi, &mm, 0);
> >         struct vm_area_struct *vma, *vma_prev, *vma_next;
> > @@ -988,6 +1027,13 @@ static bool test_merge_existing(void)
> >         };
> >         struct anon_vma_chain avc = {};
> >
> > +       if (prev_is_sticky)
> > +               prev_flags |= VM_STICKY;
> > +       if (middle_is_sticky)
> > +               vm_flags |= VM_STICKY;
> > +       if (next_is_sticky)
> > +               next_flags |= VM_STICKY;
> > +
> >         /*
> >          * Merge right case - partial span.
> >          *
> > @@ -1000,7 +1046,7 @@ static bool test_merge_existing(void)
> >          */
> >         vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
> >         vma->vm_ops = &vm_ops; /* This should have no impact. */
> > -       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
> > +       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
> >         vma_next->vm_ops = &vm_ops; /* This should have no impact. */
> >         vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, vm_flags, &dummy_anon_vma);
> >         vmg.middle = vma;
> > @@ -1018,6 +1064,8 @@ static bool test_merge_existing(void)
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_TRUE(vma_write_started(vma_next));
> >         ASSERT_EQ(mm.map_count, 2);
> > +       if (middle_is_sticky || next_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));
> >
> >         /* Clear down and reset. */
> >         ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> > @@ -1033,7 +1081,7 @@ static bool test_merge_existing(void)
> >          *   NNNNNNN
> >          */
> >         vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
> > -       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
> > +       vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
> >         vma_next->vm_ops = &vm_ops; /* This should have no impact. */
> >         vmg_set_range_anon_vma(&vmg, 0x2000, 0x6000, 2, vm_flags, &dummy_anon_vma);
> >         vmg.middle = vma;
> > @@ -1046,6 +1094,8 @@ static bool test_merge_existing(void)
> >         ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma_next));
> >         ASSERT_EQ(mm.map_count, 1);
> > +       if (middle_is_sticky || next_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));
> >
> >         /* Clear down and reset. We should have deleted vma. */
> >         ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> > @@ -1060,7 +1110,7 @@ static bool test_merge_existing(void)
> >          * 0123456789
> >          * PPPPPPV
> >          */
> > -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> > +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
> >         vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
> >         vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
> >         vma->vm_ops = &vm_ops; /* This should have no impact. */
> > @@ -1080,6 +1130,8 @@ static bool test_merge_existing(void)
> >         ASSERT_TRUE(vma_write_started(vma_prev));
> >         ASSERT_TRUE(vma_write_started(vma));
> >         ASSERT_EQ(mm.map_count, 2);
> > +       if (prev_is_sticky || middle_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
> >
> >         /* Clear down and reset. */
> >         ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
> > @@ -1094,7 +1146,7 @@ static bool test_merge_existing(void)
> >          * 0123456789
> >          * PPPPPPP
> >          */
> > -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> > +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
> >         vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
> >         vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
> >         vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
> > @@ -1109,6 +1161,8 @@ static bool test_merge_existing(void)
> >         ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma_prev));
> >         ASSERT_EQ(mm.map_count, 1);
> > +       if (prev_is_sticky || middle_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
> >
> >         /* Clear down and reset. We should have deleted vma. */
> >         ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> > @@ -1123,10 +1177,10 @@ static bool test_merge_existing(void)
> >          * 0123456789
> >          * PPPPPPPPPP
> >          */
> > -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> > +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
> >         vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
> >         vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
> > -       vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, vm_flags);
> > +       vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, next_flags);
> >         vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
> >         vmg.prev = vma_prev;
> >         vmg.middle = vma;
> > @@ -1139,6 +1193,8 @@ static bool test_merge_existing(void)
> >         ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
> >         ASSERT_TRUE(vma_write_started(vma_prev));
> >         ASSERT_EQ(mm.map_count, 1);
> > +       if (prev_is_sticky || middle_is_sticky || next_is_sticky)
> > +               ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));
> >
> >         /* Clear down and reset. We should have deleted prev and next. */
> >         ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
> > @@ -1158,9 +1214,9 @@ static bool test_merge_existing(void)
> >          * PPPVVVVVNNN
> >          */
> >
> > -       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
> > +       vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
> >         vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, vm_flags);
> > -       vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, vm_flags);
> > +       vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, next_flags);
> >
> >         vmg_set_range(&vmg, 0x4000, 0x5000, 4, vm_flags);
> >         vmg.prev = vma;
> > @@ -1203,6 +1259,19 @@ static bool test_merge_existing(void)
> >         return true;
> >  }
> >
> > +static bool test_merge_existing(void)
> > +{
> > +       int i, j, k;
> > +
> > +       /* Generate every possible permutation of sticky flags. */
> > +       for (i = 0; i < 2; i++)
> > +               for (j = 0; j < 2; j++)
> > +                       for (k = 0; k < 2; k++)
> > +                               ASSERT_TRUE(__test_merge_existing(i, j, k));
> > +
> > +       return true;
> > +}
> > +
> >  static bool test_anon_vma_non_mergeable(void)
> >  {
> >         vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
> > diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> > index e40c93edc5a7..3d9cb3a9411a 100644
> > --- a/tools/testing/vma/vma_internal.h
> > +++ b/tools/testing/vma/vma_internal.h
> > @@ -117,6 +117,38 @@ extern unsigned long dac_mmap_min_addr;
> >  #define VM_SEALED      VM_NONE
> >  #endif
> >
> > +/* Flags which should result in page tables being copied on fork. */
> > +#define VM_COPY_ON_FORK VM_MAYBE_GUARD
> > +
> > +/*
> > + * Flags which should be 'sticky' on merge - that is, flags which, when one
> > + * VMA possesses them but the other does not, should nonetheless be applied to
> > + * the merged VMA:
> > + *
> > + * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range containing
> > + *                   metadata which should be unconditionally propagated upon
> > + *                   fork. When merging two VMAs, we encapsulate this range in
> > + *                   the merged VMA, so the flag should be 'sticky' as a result.
> > + */
> > +#define VM_STICKY VM_COPY_ON_FORK
> > +
> > +/*
> > + * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
> > + * of these flags and the other not does not preclude a merge.
> > + *
> > + * VM_SOFTDIRTY - Should not prevent VMA merging: if the flags match except
> > + *                for the dirty bit, the caller should simply mark the merged
> > + *                VMA as dirty. Were the dirty bit not excluded from the
> > + *                comparison, we would increase pressure on the memory system
> > + *                by forcing the kernel to generate new VMAs where old ones
> > + *                could be extended instead.
> > + *
> > + *    VM_STICKY - If one VMA has 'sticky' flags set - that is, flags which
> > + *                must propagate to the merged VMA - but the other does not,
> > + *                the merge should still proceed, with the merge logic
> > + *                applying the sticky flags to the final VMA.
> > + */
> > +#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
> > +
> >  #define FIRST_USER_ADDRESS     0UL
> >  #define USER_PGTABLES_CEILING  0UL
> >
> > --
> > 2.51.0
> >

Thanks for the review!