[PATCH v2 08/16] mm: add ability to take further action in vm_area_desc

Lorenzo Stoakes posted 16 patches 3 weeks, 1 day ago
There is a newer version of this series
[PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 3 weeks, 1 day ago
Some drivers/filesystems need to perform additional tasks after the VMA is
set up. This is typically in the form of pre-population.

The forms of pre-population most likely to be performed are a PFN remap or
insertion of a mixed map, so we provide this functionality, ensuring that
we perform the appropriate actions at the appropriate time - that is
setting flags at the point of .mmap_prepare, and performing the actual
remap at the point at which the VMA is fully established.

This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.

Unfortunately callers still do often require some kind of custom action, so
we add an optional success/error _hook to allow the caller to do something
after the action has succeeded or failed.

This is done at the point when the VMA has already been established, so the
harm that can be done is limited.

The error hook can be used to filter errors if necessary.

We implement actions as abstracted from the vm_area_desc, so we provide the
ability for custom hooks to invoke actions distinct from the vma
descriptor.

If any error arises on these final actions, we simply unmap the VMA
altogether.

Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.

For drivers which perform truly custom logic, we provide a custom action
hook which is invoked at the point of action execution.

This can then, in turn, update the desc object and perform other actions,
such as partially remapping ranges for instance. We export
vma_desc_action_prepare() and vma_desc_action_complete() for drivers to do
this.

This is performed at a stage where the VMA is already established,
immediately prior to mapping completion, so it is considerably less
problematic than a general mmap hook.

Note that at the point of the action being taken, the VMA is visible via
the rmap, only the VMA write lock is held, so if anything needs to access
the VMA, it is able to.

Essentially the action is taken as if it were performed after the mapping,
but is kept atomic with VMA state.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm.h               |  30 ++++++
 include/linux/mm_types.h         |  61 ++++++++++++
 mm/util.c                        | 150 +++++++++++++++++++++++++++-
 mm/vma.c                         |  70 ++++++++-----
 tools/testing/vma/vma_internal.h | 164 ++++++++++++++++++++++++++++++-
 5 files changed, 447 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cca149bb8ef1..2ceead3ffcf0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3597,6 +3597,36 @@ static inline unsigned long vma_desc_pages(struct vm_area_desc *desc)
 	return vma_desc_size(desc) >> PAGE_SHIFT;
 }
 
+static inline void mmap_action_remap(struct mmap_action *action,
+		unsigned long addr, unsigned long pfn, unsigned long size,
+		pgprot_t pgprot)
+{
+	action->type = MMAP_REMAP_PFN;
+
+	action->remap.addr = addr;
+	action->remap.pfn = pfn;
+	action->remap.size = size;
+	action->remap.pgprot = pgprot;
+}
+
+static inline void mmap_action_mixedmap(struct mmap_action *action,
+		unsigned long addr, unsigned long pfn, unsigned long num_pages)
+{
+	action->type = MMAP_INSERT_MIXED;
+
+	action->mixedmap.addr = addr;
+	action->mixedmap.pfn = pfn;
+	action->mixedmap.num_pages = num_pages;
+}
+
+struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
+		unsigned long addr, unsigned long num_pages);
+
+void mmap_action_prepare(struct mmap_action *action,
+			     struct vm_area_desc *desc);
+int mmap_action_complete(struct mmap_action *action,
+			     struct vm_area_struct *vma);
+
 /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
 static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
 				unsigned long vm_start, unsigned long vm_end)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a441f78340d..ae6c7a0a18a7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -770,6 +770,64 @@ struct pfnmap_track_ctx {
 };
 #endif
 
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+	MMAP_NOTHING,		 /* Mapping is complete, no further action. */
+	MMAP_REMAP_PFN,		 /* Remap PFN range based on desc->remap. */
+	MMAP_INSERT_MIXED,	 /* Mixed map based on desc->mixedmap. */
+	MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
+	MMAP_CUSTOM_ACTION,	 /* User-provided hook. */
+};
+
+struct mmap_action {
+	union {
+		/* Remap range. */
+		struct {
+			unsigned long addr;
+			unsigned long pfn;
+			unsigned long size;
+			pgprot_t pgprot;
+		} remap;
+		/* Insert mixed map. */
+		struct {
+			unsigned long addr;
+			unsigned long pfn;
+			unsigned long num_pages;
+		} mixedmap;
+		/* Insert specific mixed map pages. */
+		struct {
+			unsigned long addr;
+			struct page **pages;
+			unsigned long num_pages;
+			/* kfree pages on completion? */
+			bool kfree_pages :1;
+		} mixedmap_pages;
+		struct {
+			int (*action_hook)(struct vm_area_struct *vma);
+		} custom;
+	};
+	enum mmap_action_type type;
+
+	/*
+	 * If specified, this hook is invoked after the selected action has been
+	 * successfully completed. Not that the VMA write lock still held.
+	 *
+	 * The absolute minimum ought to be done here.
+	 *
+	 * Returns 0 on success, or an error code.
+	 */
+	int (*success_hook)(struct vm_area_struct *vma);
+
+	/*
+	 * If specified, this hook is invoked when an error occurred when
+	 * attempting the selection action.
+	 *
+	 * The hook can return an error code in order to filter the error, but
+	 * it is not valid to clear the error here.
+	 */
+	int (*error_hook)(int err);
+};
+
 /*
  * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
  * manipulate mutable fields which will cause those fields to be updated in the
@@ -793,6 +851,9 @@ struct vm_area_desc {
 	/* Write-only fields. */
 	const struct vm_operations_struct *vm_ops;
 	void *private_data;
+
+	/* Take further action? */
+	struct mmap_action action;
 };
 
 /*
diff --git a/mm/util.c b/mm/util.c
index 248f877f629b..11752d67b89c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1155,15 +1155,18 @@ int __compat_vma_mmap_prepare(const struct file_operations *f_op,
 		.vm_file = vma->vm_file,
 		.vm_flags = vma->vm_flags,
 		.page_prot = vma->vm_page_prot,
+
+		.action.type = MMAP_NOTHING, /* Default */
 	};
 	int err;
 
 	err = f_op->mmap_prepare(&desc);
 	if (err)
 		return err;
-	set_vma_from_desc(vma, &desc);
 
-	return 0;
+	mmap_action_prepare(&desc.action, &desc);
+	set_vma_from_desc(vma, &desc);
+	return mmap_action_complete(&desc.action, vma);
 }
 EXPORT_SYMBOL(__compat_vma_mmap_prepare);
 
@@ -1279,6 +1282,149 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
 	}
 }
 
+struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
+		unsigned long addr, unsigned long num_pages)
+{
+	struct page **pages;
+
+	pages = kmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!pages)
+		return NULL;
+
+	action->type = MMAP_INSERT_MIXED_PAGES;
+
+	action->mixedmap_pages.addr = addr;
+	action->mixedmap_pages.num_pages = num_pages;
+	action->mixedmap_pages.kfree_pages = true;
+	action->mixedmap_pages.pages = pages;
+
+	return pages;
+}
+EXPORT_SYMBOL(mmap_action_mixedmap_pages);
+
+/**
+ * mmap_action_prepare - Perform preparatory setup for an VMA descriptor
+ * action which need to be performed.
+ * @desc: The VMA descriptor to prepare for @action.
+ * @action: The action to perform.
+ *
+ * Other than internal mm use, this is intended to be used by mmap_prepare code
+ * which specifies a custom action hook and needs to prepare for another action
+ * it wishes to perform.
+ */
+void mmap_action_prepare(struct mmap_action *action,
+			     struct vm_area_desc *desc)
+{
+	switch (action->type) {
+	case MMAP_NOTHING:
+	case MMAP_CUSTOM_ACTION:
+		break;
+	case MMAP_REMAP_PFN:
+		remap_pfn_range_prepare(desc, action->remap.pfn);
+		break;
+	case MMAP_INSERT_MIXED:
+	case MMAP_INSERT_MIXED_PAGES:
+		desc->vm_flags |= VM_MIXEDMAP;
+		break;
+	}
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+/**
+ * mmap_action_complete - Execute VMA descriptor action.
+ * @action: The action to perform.
+ * @vma: The VMA to perform the action upon.
+ *
+ * Similar to mmap_action_prepare(), other than internal mm usage this is
+ * intended for mmap_prepare users who implement a custom hook - with this
+ * function being called from the custom hook itself.
+ *
+ * Return: 0 on success, or error, at which point the VMA will be unmapped.
+ */
+int mmap_action_complete(struct mmap_action *action,
+			     struct vm_area_struct *vma)
+{
+	int err = 0;
+
+	switch (action->type) {
+	case MMAP_NOTHING:
+		break;
+	case MMAP_REMAP_PFN:
+		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
+				VM_REMAP_FLAGS);
+
+		err = remap_pfn_range_complete(vma, action->remap.addr,
+				action->remap.pfn, action->remap.size,
+				action->remap.pgprot);
+
+		break;
+	case MMAP_INSERT_MIXED:
+	{
+		unsigned long pgnum = 0;
+		unsigned long pfn = action->mixedmap.pfn;
+		unsigned long addr = action->mixedmap.addr;
+		unsigned long vaddr = vma->vm_start;
+
+		VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
+
+		for (; pgnum < action->mixedmap.num_pages;
+		     pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
+			vm_fault_t vmf;
+
+			vmf = vmf_insert_mixed(vma, vaddr, addr);
+			if (vmf & VM_FAULT_ERROR) {
+				err = vm_fault_to_errno(vmf, 0);
+				break;
+			}
+		}
+
+		break;
+	}
+	case MMAP_INSERT_MIXED_PAGES:
+	{
+		struct page **pages = action->mixedmap_pages.pages;
+		unsigned long nr_pages = action->mixedmap_pages.num_pages;
+
+		VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
+
+		err = vm_insert_pages(vma, action->mixedmap_pages.addr,
+				pages, &nr_pages);
+		if (action->mixedmap_pages.kfree_pages)
+			kfree(pages);
+		break;
+	}
+	case MMAP_CUSTOM_ACTION:
+		err = action->custom.action_hook(vma);
+		break;
+	}
+
+	/*
+	 * If an error occurs, unmap the VMA altogether and return an error. We
+	 * only clear the newly allocated VMA, since this function is only
+	 * invoked if we do NOT merge, so we only clean up the VMA we created.
+	 */
+	if (err) {
+		const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+		do_munmap(current->mm, vma->vm_start, len, NULL);
+
+		if (action->error_hook) {
+			/* We may want to filter the error. */
+			err = action->error_hook(err);
+
+			/* The caller should not clear the error. */
+			VM_WARN_ON_ONCE(!err);
+		}
+		return err;
+	}
+
+	if (action->success_hook)
+		err = action->success_hook(vma);
+
+	return 0;
+}
+EXPORT_SYMBOL(mmap_action_complete);
+
 #ifdef CONFIG_MMU
 /**
  * folio_pte_batch - detect a PTE batch for a large folio
diff --git a/mm/vma.c b/mm/vma.c
index 36a9f4d453be..a1ec405bda25 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2328,17 +2328,33 @@ static void update_ksm_flags(struct mmap_state *map)
 	map->vm_flags = ksm_vma_flags(map->mm, map->file, map->vm_flags);
 }
 
+static void set_desc_from_map(struct vm_area_desc *desc,
+		const struct mmap_state *map)
+{
+	desc->start = map->addr;
+	desc->end = map->end;
+
+	desc->pgoff = map->pgoff;
+	desc->vm_file = map->file;
+	desc->vm_flags = map->vm_flags;
+	desc->page_prot = map->page_prot;
+}
+
 /*
  * __mmap_setup() - Prepare to gather any overlapping VMAs that need to be
  * unmapped once the map operation is completed, check limits, account mapping
  * and clean up any pre-existing VMAs.
  *
+ * As a result it sets up the @map and @desc objects.
+ *
  * @map: Mapping state.
+ * @desc: VMA descriptor
  * @uf:  Userfaultfd context list.
  *
  * Returns: 0 on success, error code otherwise.
  */
-static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
+static int __mmap_setup(struct mmap_state *map, struct vm_area_desc *desc,
+			struct list_head *uf)
 {
 	int error;
 	struct vma_iterator *vmi = map->vmi;
@@ -2395,6 +2411,7 @@ static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
 	 */
 	vms_clean_up_area(vms, &map->mas_detach);
 
+	set_desc_from_map(desc, map);
 	return 0;
 }
 
@@ -2567,34 +2584,26 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
  *
  * Returns 0 on success, or an error code otherwise.
  */
-static int call_mmap_prepare(struct mmap_state *map)
+static int call_mmap_prepare(struct mmap_state *map,
+		struct vm_area_desc *desc)
 {
 	int err;
-	struct vm_area_desc desc = {
-		.mm = map->mm,
-		.file = map->file,
-		.start = map->addr,
-		.end = map->end,
-
-		.pgoff = map->pgoff,
-		.vm_file = map->file,
-		.vm_flags = map->vm_flags,
-		.page_prot = map->page_prot,
-	};
 
 	/* Invoke the hook. */
-	err = vfs_mmap_prepare(map->file, &desc);
+	err = vfs_mmap_prepare(map->file, desc);
 	if (err)
 		return err;
 
+	mmap_action_prepare(&desc->action, desc);
+
 	/* Update fields permitted to be changed. */
-	map->pgoff = desc.pgoff;
-	map->file = desc.vm_file;
-	map->vm_flags = desc.vm_flags;
-	map->page_prot = desc.page_prot;
+	map->pgoff = desc->pgoff;
+	map->file = desc->vm_file;
+	map->vm_flags = desc->vm_flags;
+	map->page_prot = desc->page_prot;
 	/* User-defined fields. */
-	map->vm_ops = desc.vm_ops;
-	map->vm_private_data = desc.private_data;
+	map->vm_ops = desc->vm_ops;
+	map->vm_private_data = desc->private_data;
 
 	return 0;
 }
@@ -2642,16 +2651,24 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = NULL;
-	int error;
 	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
 	VMA_ITERATOR(vmi, mm, addr);
 	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
+	struct vm_area_desc desc = {
+		.mm = mm,
+		.file = file,
+		.action = {
+			.type = MMAP_NOTHING, /* Default to no further action. */
+		},
+	};
+	bool allocated_new = false;
+	int error;
 
 	map.check_ksm_early = can_set_ksm_flags_early(&map);
 
-	error = __mmap_setup(&map, uf);
+	error = __mmap_setup(&map, &desc, uf);
 	if (!error && have_mmap_prepare)
-		error = call_mmap_prepare(&map);
+		error = call_mmap_prepare(&map, &desc);
 	if (error)
 		goto abort_munmap;
 
@@ -2670,6 +2687,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 		error = __mmap_new_vma(&map, &vma);
 		if (error)
 			goto unacct_error;
+		allocated_new = true;
 	}
 
 	if (have_mmap_prepare)
@@ -2677,6 +2695,12 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 
 	__mmap_complete(&map, vma);
 
+	if (have_mmap_prepare && allocated_new) {
+		error = mmap_action_complete(&desc.action, vma);
+		if (error)
+			return error;
+	}
+
 	return addr;
 
 	/* Accounting was done by __mmap_setup(). */
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 07167446dcf4..c21642974798 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -170,6 +170,28 @@ typedef __bitwise unsigned int vm_fault_t;
 #define swap(a, b) \
 	do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
 
+enum vm_fault_reason {
+	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
+	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
+	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
+	VM_FAULT_HWPOISON       = (__force vm_fault_t)0x000010,
+	VM_FAULT_HWPOISON_LARGE = (__force vm_fault_t)0x000020,
+	VM_FAULT_SIGSEGV        = (__force vm_fault_t)0x000040,
+	VM_FAULT_NOPAGE         = (__force vm_fault_t)0x000100,
+	VM_FAULT_LOCKED         = (__force vm_fault_t)0x000200,
+	VM_FAULT_RETRY          = (__force vm_fault_t)0x000400,
+	VM_FAULT_FALLBACK       = (__force vm_fault_t)0x000800,
+	VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
+	VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
+	VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
+	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
+};
+#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS |	\
+			VM_FAULT_SIGSEGV | VM_FAULT_HWPOISON |	\
+			VM_FAULT_HWPOISON_LARGE | VM_FAULT_FALLBACK)
+
+#define FOLL_HWPOISON  (1 << 6)
+
 struct kref {
 	refcount_t refcount;
 };
@@ -274,6 +296,92 @@ struct mm_struct {
 
 struct vm_area_struct;
 
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+	MMAP_NOTHING,		 /* Mapping is complete, no further action. */
+	MMAP_REMAP_PFN,		 /* Remap PFN range based on desc->remap. */
+	MMAP_INSERT_MIXED,	 /* Mixed map based on desc->mixedmap. */
+	MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
+	MMAP_CUSTOM_ACTION,	 /* User-provided hook. */
+};
+
+struct mmap_action {
+	union {
+		/* Remap range. */
+		struct {
+			unsigned long addr;
+			unsigned long pfn;
+			unsigned long size;
+			pgprot_t pgprot;
+		} remap;
+		/* Insert mixed map. */
+		struct {
+			unsigned long addr;
+			unsigned long pfn;
+			unsigned long num_pages;
+		} mixedmap;
+		/* Insert specific mixed map pages. */
+		struct {
+			unsigned long addr;
+			struct page **pages;
+			unsigned long num_pages;
+			/* kfree pages on completion? */
+			bool kfree_pages :1;
+		} mixedmap_pages;
+		struct {
+			int (*action_hook)(struct vm_area_struct *vma);
+		} custom;
+	};
+	enum mmap_action_type type;
+
+	/*
+	 * If specified, this hook is invoked after the selected action has been
+	 * successfully completed. Not that the VMA write lock still held.
+	 *
+	 * The absolute minimum ought to be done here.
+	 *
+	 * Returns 0 on success, or an error code.
+	 */
+	int (*success_hook)(struct vm_area_struct *vma);
+
+	/*
+	 * If specified, this hook is invoked when an error occurred when
+	 * attempting the selection action.
+	 *
+	 * The hook can return an error code in order to filter the error, but
+	 * it is not valid to clear the error here.
+	 */
+	int (*error_hook)(int err);
+};
+
+/*
+ * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
+ * manipulate mutable fields which will cause those fields to be updated in the
+ * resultant VMA.
+ *
+ * Helper functions are not required for manipulating any field.
+ */
+struct vm_area_desc {
+	/* Immutable state. */
+	const struct mm_struct *const mm;
+	struct file *const file; /* May vary from vm_file in stacked callers. */
+	unsigned long start;
+	unsigned long end;
+
+	/* Mutable fields. Populated with initial state. */
+	pgoff_t pgoff;
+	struct file *vm_file;
+	vm_flags_t vm_flags;
+	pgprot_t page_prot;
+
+	/* Write-only fields. */
+	const struct vm_operations_struct *vm_ops;
+	void *private_data;
+
+	/* Take further action? */
+	struct mmap_action action;
+};
+
 /*
  * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
  * manipulate mutable fields which will cause those fields to be updated in the
@@ -297,6 +405,9 @@ struct vm_area_desc {
 	/* Write-only fields. */
 	const struct vm_operations_struct *vm_ops;
 	void *private_data;
+
+	/* Take further action? */
+	struct mmap_action action;
 };
 
 struct file_operations {
@@ -1466,12 +1577,23 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
 static inline void set_vma_from_desc(struct vm_area_struct *vma,
 		struct vm_area_desc *desc);
 
+static inline void mmap_action_prepare(struct mmap_action *action,
+					   struct vm_area_desc *desc)
+{
+}
+
+static inline int mmap_action_complete(struct mmap_action *action,
+					   struct vm_area_struct *vma)
+{
+	return 0;
+}
+
 static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
 		struct file *file, struct vm_area_struct *vma)
 {
 	struct vm_area_desc desc = {
 		.mm = vma->vm_mm,
-		.file = vma->vm_file,
+		.file = file,
 		.start = vma->vm_start,
 		.end = vma->vm_end,
 
@@ -1479,15 +1601,18 @@ static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
 		.vm_file = vma->vm_file,
 		.vm_flags = vma->vm_flags,
 		.page_prot = vma->vm_page_prot,
+
+		.action.type = MMAP_NOTHING, /* Default */
 	};
 	int err;
 
 	err = f_op->mmap_prepare(&desc);
 	if (err)
 		return err;
-	set_vma_from_desc(vma, &desc);
 
-	return 0;
+	mmap_action_prepare(&desc.action, &desc);
+	set_vma_from_desc(vma, &desc);
+	return mmap_action_complete(&desc.action, vma);
 }
 
 static inline int compat_vma_mmap_prepare(struct file *file,
@@ -1548,4 +1673,37 @@ static inline vm_flags_t ksm_vma_flags(const struct mm_struct *, const struct fi
 	return vm_flags;
 }
 
+static inline void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+}
+
+static inline int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t pgprot)
+{
+	return 0;
+}
+
+static inline vm_fault_t vmf_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn)
+{
+	return 0;
+}
+
+static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
+{
+	if (vm_fault & VM_FAULT_OOM)
+		return -ENOMEM;
+	if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
+		return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
+	if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
+		return -EFAULT;
+	return 0;
+}
+
+static inline int do_munmap(struct mm_struct *, unsigned long, size_t,
+		struct list_head *uf)
+{
+	return 0;
+}
+
 #endif	/* __MM_VMA_INTERNAL_H */
-- 
2.51.0
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Jason Gunthorpe 2 weeks, 3 days ago
On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> +static inline void mmap_action_remap(struct mmap_action *action,
> +		unsigned long addr, unsigned long pfn, unsigned long size,
> +		pgprot_t pgprot)
> +{
> +	action->type = MMAP_REMAP_PFN;
> +
> +	action->remap.addr = addr;
> +	action->remap.pfn = pfn;
> +	action->remap.size = size;
> +	action->remap.pgprot = pgprot;
> +}

These helpers drivers are supposed to call really should have kdocs.

Especially since 'addr' is sort of ambigous.

And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
we be strongly discouraging using anything other than
vma->vm_page_prot as the last argument?

I'd probably also have a small helper wrapper for the very common case
of whole vma:

/* Fill the entire VMA with pfns starting at pfn. Caller must have 
 * already checked desc has an appropriate size */
mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)

It is not normal for a driver to partially populate a VMA, lets call
those out as something weird.

> +struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
> +		unsigned long addr, unsigned long num_pages)
> +{
> +	struct page **pages;
> +
> +	pages = kmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
> +	if (!pages)
> +		return NULL;

This allocation seems like a shame, I doubt many places actually need
it .. A callback to get each pfn would be better?

Jason
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> > +static inline void mmap_action_remap(struct mmap_action *action,
> > +		unsigned long addr, unsigned long pfn, unsigned long size,
> > +		pgprot_t pgprot)
> > +{
> > +	action->type = MMAP_REMAP_PFN;
> > +
> > +	action->remap.addr = addr;
> > +	action->remap.pfn = pfn;
> > +	action->remap.size = size;
> > +	action->remap.pgprot = pgprot;
> > +}
>
> These helpers drivers are supposed to call really should have kdocs.
>
> Especially since 'addr' is sort of ambigous.

OK.

>
> And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
> we be strongly discouraging using anything other than
> vma->vm_page_prot as the last argument?

I need to abstract desc from action so custom handlers can perform
sub-actions. It's unfortunate but there we go.

There'd be horrible confusion passing around a desc that has an action in
it that you then ignore, otherwise. Better to abstract the concept of
action altogether.

>
> I'd probably also have a small helper wrapper for the very common case
> of whole vma:
>
> /* Fill the entire VMA with pfns starting at pfn. Caller must have
>  * already checked desc has an appropriate size */
> mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)

See above re: desc vs. action.



>
> It is not normal for a driver to partially populate a VMA, lets call
> those out as something weird.
>
> > +struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
> > +		unsigned long addr, unsigned long num_pages)
> > +{
> > +	struct page **pages;
> > +
> > +	pages = kmalloc_array(num_pages, sizeof(struct page *), GFP_KERNEL);
> > +	if (!pages)
> > +		return NULL;
>
> This allocation seems like a shame, I doubt many places actually need
> it .. A callback to get each pfn would be better?

It'd be hard to know how to get the context right that'd need to be supplied to
the callback.

In kcov's case it'd be kcov->area + an offset.

So we'd need an offset parameter, the struct file *, whatever else to be
passed.

And then we'll find a driver where that doesn't work and we're screwed.

I don't think optimising for mmap setup is really important.

We can always go back and refactor things later once this pattern is
established.

And again with ~230 odd drivers to update, I'd rather keep things as simple
as possible for now.

>
> Jason

Cheers, Lorenzo
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Jason Gunthorpe 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 01:23:30PM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> > > +static inline void mmap_action_remap(struct mmap_action *action,
> > > +		unsigned long addr, unsigned long pfn, unsigned long size,
> > > +		pgprot_t pgprot)
> > > +{
> > > +	action->type = MMAP_REMAP_PFN;
> > > +
> > > +	action->remap.addr = addr;
> > > +	action->remap.pfn = pfn;
> > > +	action->remap.size = size;
> > > +	action->remap.pgprot = pgprot;
> > > +}
> >
> > These helpers drivers are supposed to call really should have kdocs.
> >
> > Especially since 'addr' is sort of ambigous.
> 
> OK.
> 
> >
> > And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
> > we be strongly discouraging using anything other than
> > vma->vm_page_prot as the last argument?
> 
> I need to abstract desc from action so custom handlers can perform
> sub-actions. It's unfortunate but there we go.

Why? I don't see this as required

Just mark the functions as manipulating the action using the 'action'
in the fuction name.

> > I'd probably also have a small helper wrapper for the very common case
> > of whole vma:
> >
> > /* Fill the entire VMA with pfns starting at pfn. Caller must have
> >  * already checked desc has an appropriate size */
> > mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)
> 
> See above re: desc vs. action.

Yet, this is the API most places actually want.
 
> It'd be hard to know how to get the context right that'd need to be supplied to
> the callback.
> 
> In kcov's case it'd be kcov->area + an offset.

Just use pgoff
 
> So we'd need an offset parameter, the struct file *, whatever else to be
> passed.

Yes
 
> And then we'll find a driver where that doesn't work and we're screwed.

Bah, you keep saying that but we also may never even find one.

Jason
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 09:42:59AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 01:23:30PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 15, 2025 at 09:11:12AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 10, 2025 at 09:22:03PM +0100, Lorenzo Stoakes wrote:
> > > > +static inline void mmap_action_remap(struct mmap_action *action,
> > > > +		unsigned long addr, unsigned long pfn, unsigned long size,
> > > > +		pgprot_t pgprot)
> > > > +{
> > > > +	action->type = MMAP_REMAP_PFN;
> > > > +
> > > > +	action->remap.addr = addr;
> > > > +	action->remap.pfn = pfn;
> > > > +	action->remap.size = size;
> > > > +	action->remap.pgprot = pgprot;
> > > > +}
> > >
> > > These helpers drivers are supposed to call really should have kdocs.
> > >
> > > Especially since 'addr' is sort of ambigous.
> >
> > OK.
> >
> > >
> > > And I'm wondering why they don't take in the vm_area_desc? Eg shouldn't
> > > we be strongly discouraging using anything other than
> > > vma->vm_page_prot as the last argument?
> >
> > I need to abstract desc from action so custom handlers can perform
> > sub-actions. It's unfortunate but there we go.
>
> Why? I don't see this as required
>
> Just mark the functions as manipulating the action using the 'action'
> in the fuction name.

Because now sub-callers that partially map using one method and partially map
using another now need to have a desc too that they have to 'just know' which
fields to update or artificially set up.

The vmcore case does something like this.

Instead, we have actions where it's 100% clear what's going to happen.

>
> > > I'd probably also have a small helper wrapper for the very common case
> > > of whole vma:
> > >
> > > /* Fill the entire VMA with pfns starting at pfn. Caller must have
> > >  * already checked desc has an appropriate size */
> > > mmap_action_remap_full(struct vm_area_desc *desc, unsigned long pfn)
> >
> > See above re: desc vs. action.
>
> Yet, this is the API most places actually want.
>
> > It'd be hard to know how to get the context right that'd need to be supplied to
> > the callback.
> >
> > In kcov's case it'd be kcov->area + an offset.
>
> Just use pgoff
>
> > So we'd need an offset parameter, the struct file *, whatever else to be
> > passed.
>
> Yes
>
> > And then we'll find a driver where that doesn't work and we're screwed.
>
> Bah, you keep saying that but we also may never even find one.

OK let me try something like this, then. I guess I can update it later if
we discover such a dirver.

>
> Jason

Cheers, Lorenzo
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Jason Gunthorpe 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 01:54:05PM +0100, Lorenzo Stoakes wrote:
> > Just mark the functions as manipulating the action using the 'action'
> > in the fuction name.
> 
> Because now sub-callers that partially map using one method and partially map
> using another now need to have a desc too that they have to 'just know' which
> fields to update or artificially set up.

Huh? There is only on desc->action, how can you have more than one
action with this scheme?

One action is the right thing anyhow, we can't meaningfully mix
different action types in the same VMA. That's nonsense.

You may need more flexible ways to get the address lists down the road
because not every driver will be contiguous, but that should still be
one action.

> The vmcore case does something like this.

vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
helpers just aren't good for what mixedmap needs.. Mixed map need a
list of physical pfns with a bit indicating if they are "special" or
not. If you do it with a callback or a kmalloc allocation it doesn't
matter.

vmcore would then populate that list with its mixture of special and
non-sepcial memory and do a single mixedmem action.

I think this series should drop the mixedmem stuff, it is the most
complicated action type. A vmalloc_user action is better for kcov.

And maybe that is just a comment overall. This would be nicer if each
series focused on adding one action with a three-four mmap users
converted to use it as an example case.

Eg there are not that many places calling vmalloc_user(), a single
series could convert alot of them.

If you did it this way we'd discover that there are already
helpers for vmalloc_user():

	return remap_vmalloc_range(vma, mdev_state->memblk, 0);

And kcov looks buggy to not be using it already. The above gets the
VMA type right and doesn't force mixedmap :)

Then the series goals are a bit better we can actually fully convert
and remove things like remap_vmalloc_range() in single series. That
looks feasible to me.

Jason
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 10:11:42AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 01:54:05PM +0100, Lorenzo Stoakes wrote:
> > > Just mark the functions as manipulating the action using the 'action'
> > > in the fuction name.
> >
> > Because now sub-callers that partially map using one method and partially map
> > using another now need to have a desc too that they have to 'just know' which
> > fields to update or artificially set up.
>
> Huh? There is only on desc->action, how can you have more than one
> action with this scheme?

Because you use a custom hook that can in turn perform actions? As I've
implemented for vmcore?

>
> One action is the right thing anyhow, we can't meaningfully mix
> different action types in the same VMA. That's nonsense.

OK, except that's how 'true' mixed maps work though right? As vmcore is doing?

>
> You may need more flexible ways to get the address lists down the road
> because not every driver will be contiguous, but that should still be
> one action.
>
> > The vmcore case does something like this.
>
> vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
> helpers just aren't good for what mixedmap needs.. Mixed map need a
> list of physical pfns with a bit indicating if they are "special" or
> not. If you do it with a callback or a kmalloc allocation it doesn't
> matter.

Well it's a mix of actions to accomodate PFNs and normal pages as
implemented via a custom hook that can invoke each.

>
> vmcore would then populate that list with its mixture of special and
> non-sepcial memory and do a single mixedmem action.

I'm confused as to why you say a helper would be no good here, then go on
to delineate how a helper could work...

>
> I think this series should drop the mixedmem stuff, it is the most
> complicated action type. A vmalloc_user action is better for kcov.

Fine, I mean if we could find a way to explicitly just give a list of stuff
to map that'd be _great_ vs. having a custom hook.

If we can avoid custom hooks altogether that'd be ideal.

Anyway I'll drop the mixed map stuff, fine.

>
> And maybe that is just a comment overall. This would be nicer if each
> series focused on adding one action with a three-four mmap users
> converted to use it as an example case.

In future series I'll try to group by the action type.

This series is _setting up this to be a possibility at all_.

The idea was that I could put fundamentals in that should cover most cases,
I could then go on to implement them in (relative) peace...

I mean once I drop the mixed map stuff, and refactor to vmalloc_user(),
then we are pretty much doing that, modulo a single vmalloc_user() case.

So maybe I should drop the vmalloc_user() bits too and make this a
remap-only change...

But I don't want to tackle _all_ remap cases here.

I want to add this functionality in and have it ready for next cycle (yeah
not so sure about that now...) so I can then do follow up work.

Am trying to do it before Kernel Recipes which I'll be at and then a (very
very very needed) couple weeks vacaation.

Anyway maybe if I simplify there's still a shot at this landing in time...

>
> Eg there are not that many places calling vmalloc_user(), a single
> series could convert alot of them.
>
> If you did it this way we'd discover that there are already
> helpers for vmalloc_user():
>
> 	return remap_vmalloc_range(vma, mdev_state->memblk, 0);
>
> And kcov looks buggy to not be using it already. The above gets the
> VMA type right and doesn't force mixedmap :)

Right, I mean maybe.

If I can take care of low hanging fruit relatively easily then maybe it'll
be more practical to refactor the 'odd ones out'.

>
> Then the series goals are a bit better we can actually fully convert
> and remove things like remap_vmalloc_range() in single series. That
> looks feasible to me.

Right.

I'd love to drop unused stuff earlier, so _that_ is not an unreasonable
requirement.

>
> Jason

I guess I'll do a respin then as per above.

Cheers, Lorenzo
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Jason Gunthorpe 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 02:51:52PM +0100, Lorenzo Stoakes wrote:
> > vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
> > helpers just aren't good for what mixedmap needs.. Mixed map need a
> > list of physical pfns with a bit indicating if they are "special" or
> > not. If you do it with a callback or a kmalloc allocation it doesn't
> > matter.
> 
> Well it's a mix of actions to accomodate PFNs and normal pages as
> implemented via a custom hook that can invoke each.

No it's not a mix of actions. The mixedmap helpers are just
wrong for actual mixedmap usage:

+static inline void mmap_action_remap(struct mmap_action *action,
+		unsigned long addr, unsigned long pfn, unsigned long size,
+		pgprot_t pgprot)
+
+static inline void mmap_action_mixedmap(struct mmap_action *action,
+		unsigned long addr, unsigned long pfn, unsigned long num_pages)

Mixed map is a list of PFNs and a flag if the PFN is special or
not. That's what makes mixed map different from the other mapping
cases.

One action per VMA, and mixed map is handled by supporting the above
lis tin some way.

> > I think this series should drop the mixedmem stuff, it is the most
> > complicated action type. A vmalloc_user action is better for kcov.
> 
> Fine, I mean if we could find a way to explicitly just give a list of stuff
> to map that'd be _great_ vs. having a custom hook.

You already proposed to allocate memory to hold an array, I suggested
to have a per-range callback. Either could work as an API for
mixedmap.

> So maybe I should drop the vmalloc_user() bits too and make this a
> remap-only change...

Sure
 
> But I don't want to tackle _all_ remap cases here.

Due 4-5 or something to show the API is working. Things like my remark
to have a better helper that does whole-vma only should show up more
clearly with a few more conversions.

It is generally a good idea when doing these reworks to look across
all the use cases patterns and try to simplify them. This is why a
series per pattern is a good idea because you are saying you found a
pattern, and here are N examples of the pattern to prove it.

Eg if a huge number of drivers are just mmaping a linear range of
memory with a fixed pgoff then a helper to support exactly that
pattern with minimal driver code should be developed.

Like below, apparently vmalloc_user() is already a pattern and already
has a simplifying safe helper.

> Anyway maybe if I simplify there's still a shot at this landing in time...

Simplify is always good to help things get merged :)
 
> > Eg there are not that many places calling vmalloc_user(), a single
> > series could convert alot of them.
> >
> > If you did it this way we'd discover that there are already
> > helpers for vmalloc_user():
> >
> > 	return remap_vmalloc_range(vma, mdev_state->memblk, 0);
> >
> > And kcov looks buggy to not be using it already. The above gets the
> > VMA type right and doesn't force mixedmap :)
> 
> Right, I mean maybe.

Maybe send out a single patch to change kcov to remap_vmalloc_range()
for this cycle? Answer the maybe?

Jason
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 11:34:14AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 15, 2025 at 02:51:52PM +0100, Lorenzo Stoakes wrote:
> > > vmcore is a true MIXEDMAP, it isn't doing two actions. These mixedmap
> > > helpers just aren't good for what mixedmap needs.. Mixed map need a
> > > list of physical pfns with a bit indicating if they are "special" or
> > > not. If you do it with a callback or a kmalloc allocation it doesn't
> > > matter.
> >
> > Well it's a mix of actions to accomodate PFNs and normal pages as
> > implemented via a custom hook that can invoke each.
>
> No it's not a mix of actions. The mixedmap helpers are just
> wrong for actual mixedmap usage:
>
> +static inline void mmap_action_remap(struct mmap_action *action,
> +		unsigned long addr, unsigned long pfn, unsigned long size,
> +		pgprot_t pgprot)
> +
> +static inline void mmap_action_mixedmap(struct mmap_action *action,
> +		unsigned long addr, unsigned long pfn, unsigned long num_pages)
>
> Mixed map is a list of PFNs and a flag if the PFN is special or
> not. That's what makes mixed map different from the other mapping
> cases.
>
> One action per VMA, and mixed map is handled by supporting the above
> lis tin some way.

I don't think any of the above is really useful for me to respond to, I
think you've misunderstood what I'm saying, but it doesn't really matter
because I agree that the interface you propose is better for mixed map.

>
> > > I think this series should drop the mixedmem stuff, it is the most
> > > complicated action type. A vmalloc_user action is better for kcov.
> >
> > Fine, I mean if we could find a way to explicitly just give a list of stuff
> > to map that'd be _great_ vs. having a custom hook.
>
> You already proposed to allocate memory to hold an array, I suggested
> to have a per-range callback. Either could work as an API for
> mixedmap.

Again, I think you've misunderstood me, but it's moot, because I agree,
this kind of interface is better.

>
> > So maybe I should drop the vmalloc_user() bits too and make this a
> > remap-only change...
>
> Sure
>
> > But I don't want to tackle _all_ remap cases here.
>
> Due 4-5 or something to show the API is working. Things like my remark
> to have a better helper that does whole-vma only should show up more
> clearly with a few more conversions.

I was trying to limit to mm or mm-adjacent as per the cover letter.

But sure I will do that.

>
> It is generally a good idea when doing these reworks to look across

It's not a rework :) cover letter describes why I'm doing this.

> all the use cases patterns and try to simplify them. This is why a
> series per pattern is a good idea because you are saying you found a
> pattern, and here are N examples of the pattern to prove it.
>
> Eg if a huge number of drivers are just mmaping a linear range of
> memory with a fixed pgoff then a helper to support exactly that
> pattern with minimal driver code should be developed.

Fine in spirit, let's be pragmatic also though.

Again this isn't a refactoring exercise. But I agree we should try to get
the API right as best we can.

>
> Like below, apparently vmalloc_user() is already a pattern and already
> has a simplifying safe helper.
>
> > Anyway maybe if I simplify there's still a shot at this landing in time...
>
> Simplify is always good to help things get merged :)

Yup :)

>
> > > Eg there are not that many places calling vmalloc_user(), a single
> > > series could convert alot of them.
> > >
> > > If you did it this way we'd discover that there are already
> > > helpers for vmalloc_user():
> > >
> > > 	return remap_vmalloc_range(vma, mdev_state->memblk, 0);
> > >
> > > And kcov looks buggy to not be using it already. The above gets the
> > > VMA type right and doesn't force mixedmap :)
> >
> > Right, I mean maybe.
>
> Maybe send out a single patch to change kcov to remap_vmalloc_range()
> for this cycle? Answer the maybe?

Sure I can probably do that.

The question is time and, because most of my days are full of review as per
my self-inflicted^W -selected role as a maintainer.

This series will be the priority obviously :)

>
> Jason

Cheers, Lorenzo
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
Hi Andrew,

Could you apply the below fixpatch?

Thanks, Lorenzo

----8<----
From 35b96b949b44397c744b18f10b40a9989d4a92d2 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Mon, 15 Sep 2025 11:01:06 +0100
Subject: [PATCH] mm: fix incorrect mixedmap implementation

This was typo'd due to staring too long at the cramfs implementation.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/util.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 9bfef9509d35..23a2ec675344 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1364,15 +1364,14 @@ int mmap_action_complete(struct mmap_action *action,
 		unsigned long pgnum = 0;
 		unsigned long pfn = action->mixedmap.pfn;
 		unsigned long addr = action->mixedmap.addr;
-		unsigned long vaddr = vma->vm_start;

 		VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));

 		for (; pgnum < action->mixedmap.num_pages;
-		    pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
+		    pgnum++, pfn++, addr += PAGE_SIZE) {
 			vm_fault_t vmf;

-			vmf = vmf_insert_mixed(vma, vaddr, addr);
+			vmf = vmf_insert_mixed(vma, addr, pfn);
 			if (vmf & VM_FAULT_ERROR) {
 				err = vm_fault_to_errno(vmf, 0);
 				break;
--
2.51.0
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Chris Mason 2 weeks, 4 days ago
Hi Lorzeno,

On 9/10/25 4:22 PM, Lorenzo Stoakes wrote:
> Some drivers/filesystems need to perform additional tasks after the VMA is
> set up. This is typically in the form of pre-population.
> 
> The forms of pre-population most likely to be performed are a PFN remap or
> insertion of a mixed map, so we provide this functionality, ensuring that
> we perform the appropriate actions at the appropriate time - that is
> setting flags at the point of .mmap_prepare, and performing the actual
> remap at the point at which the VMA is fully established.
> 
> This prevents the driver from doing anything too crazy with a VMA at any
> stage, and we retain complete control over how the mm functionality is
> applied.
> 
> Unfortunately callers still do often require some kind of custom action, so
> we add an optional success/error _hook to allow the caller to do something
> after the action has succeeded or failed.
> 
> This is done at the point when the VMA has already been established, so the
> harm that can be done is limited.
> 
> The error hook can be used to filter errors if necessary.
> 
> We implement actions as abstracted from the vm_area_desc, so we provide the
> ability for custom hooks to invoke actions distinct from the vma
> descriptor.
> 
> If any error arises on these final actions, we simply unmap the VMA
> altogether.
> 
> Also update the stacked filesystem compatibility layer to utilise the
> action behaviour, and update the VMA tests accordingly.
> 
> For drivers which perform truly custom logic, we provide a custom action
> hook which is invoked at the point of action execution.
> 
> This can then, in turn, update the desc object and perform other actions,
> such as partially remapping ranges for instance. We export
> vma_desc_action_prepare() and vma_desc_action_complete() for drivers to do
> this.
> 
> This is performed at a stage where the VMA is already established,
> immediately prior to mapping completion, so it is considerably less
> problematic than a general mmap hook.
> 
> Note that at the point of the action being taken, the VMA is visible via
> the rmap, only the VMA write lock is held, so if anything needs to access
> the VMA, it is able to.
> 
> Essentially the action is taken as if it were performed after the mapping,
> but is kept atomic with VMA state.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mm.h               |  30 ++++++
>  include/linux/mm_types.h         |  61 ++++++++++++
>  mm/util.c                        | 150 +++++++++++++++++++++++++++-
>  mm/vma.c                         |  70 ++++++++-----
>  tools/testing/vma/vma_internal.h | 164 ++++++++++++++++++++++++++++++-
>  5 files changed, 447 insertions(+), 28 deletions(-)
> 

[ ... ]

> +/**
> + * mmap_action_complete - Execute VMA descriptor action.
> + * @action: The action to perform.
> + * @vma: The VMA to perform the action upon.
> + *
> + * Similar to mmap_action_prepare(), other than internal mm usage this is
> + * intended for mmap_prepare users who implement a custom hook - with this
> + * function being called from the custom hook itself.
> + *
> + * Return: 0 on success, or error, at which point the VMA will be unmapped.
> + */
> +int mmap_action_complete(struct mmap_action *action,
> +			     struct vm_area_struct *vma)
> +{
> +	int err = 0;
> +
> +	switch (action->type) {
> +	case MMAP_NOTHING:
> +		break;
> +	case MMAP_REMAP_PFN:
> +		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
> +				VM_REMAP_FLAGS);
> +
> +		err = remap_pfn_range_complete(vma, action->remap.addr,
> +				action->remap.pfn, action->remap.size,
> +				action->remap.pgprot);
> +
> +		break;
> +	case MMAP_INSERT_MIXED:
> +	{
> +		unsigned long pgnum = 0;
> +		unsigned long pfn = action->mixedmap.pfn;
> +		unsigned long addr = action->mixedmap.addr;
> +		unsigned long vaddr = vma->vm_start;
> +
> +		VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
> +
> +		for (; pgnum < action->mixedmap.num_pages;
> +		     pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
> +			vm_fault_t vmf;
> +
> +			vmf = vmf_insert_mixed(vma, vaddr, addr);
                                                          ^^^^^
Should this be pfn instead of addr?

-chris
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 3 days ago
On Sat, Sep 13, 2025 at 06:54:06PM -0400, Chris Mason wrote:
> Hi Lorzeno,
>
> On 9/10/25 4:22 PM, Lorenzo Stoakes wrote:
> > Some drivers/filesystems need to perform additional tasks after the VMA is
> > set up. This is typically in the form of pre-population.
> >
> > The forms of pre-population most likely to be performed are a PFN remap or
> > insertion of a mixed map, so we provide this functionality, ensuring that
> > we perform the appropriate actions at the appropriate time - that is
> > setting flags at the point of .mmap_prepare, and performing the actual
> > remap at the point at which the VMA is fully established.
> >
> > This prevents the driver from doing anything too crazy with a VMA at any
> > stage, and we retain complete control over how the mm functionality is
> > applied.
> >
> > Unfortunately callers still do often require some kind of custom action, so
> > we add an optional success/error _hook to allow the caller to do something
> > after the action has succeeded or failed.
> >
> > This is done at the point when the VMA has already been established, so the
> > harm that can be done is limited.
> >
> > The error hook can be used to filter errors if necessary.
> >
> > We implement actions as abstracted from the vm_area_desc, so we provide the
> > ability for custom hooks to invoke actions distinct from the vma
> > descriptor.
> >
> > If any error arises on these final actions, we simply unmap the VMA
> > altogether.
> >
> > Also update the stacked filesystem compatibility layer to utilise the
> > action behaviour, and update the VMA tests accordingly.
> >
> > For drivers which perform truly custom logic, we provide a custom action
> > hook which is invoked at the point of action execution.
> >
> > This can then, in turn, update the desc object and perform other actions,
> > such as partially remapping ranges for instance. We export
> > vma_desc_action_prepare() and vma_desc_action_complete() for drivers to do
> > this.
> >
> > This is performed at a stage where the VMA is already established,
> > immediately prior to mapping completion, so it is considerably less
> > problematic than a general mmap hook.
> >
> > Note that at the point of the action being taken, the VMA is visible via
> > the rmap, only the VMA write lock is held, so if anything needs to access
> > the VMA, it is able to.
> >
> > Essentially the action is taken as if it were performed after the mapping,
> > but is kept atomic with VMA state.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  include/linux/mm.h               |  30 ++++++
> >  include/linux/mm_types.h         |  61 ++++++++++++
> >  mm/util.c                        | 150 +++++++++++++++++++++++++++-
> >  mm/vma.c                         |  70 ++++++++-----
> >  tools/testing/vma/vma_internal.h | 164 ++++++++++++++++++++++++++++++-
> >  5 files changed, 447 insertions(+), 28 deletions(-)
> >
>
> [ ... ]
>
> > +/**
> > + * mmap_action_complete - Execute VMA descriptor action.
> > + * @action: The action to perform.
> > + * @vma: The VMA to perform the action upon.
> > + *
> > + * Similar to mmap_action_prepare(), other than internal mm usage this is
> > + * intended for mmap_prepare users who implement a custom hook - with this
> > + * function being called from the custom hook itself.
> > + *
> > + * Return: 0 on success, or error, at which point the VMA will be unmapped.
> > + */
> > +int mmap_action_complete(struct mmap_action *action,
> > +			     struct vm_area_struct *vma)
> > +{
> > +	int err = 0;
> > +
> > +	switch (action->type) {
> > +	case MMAP_NOTHING:
> > +		break;
> > +	case MMAP_REMAP_PFN:
> > +		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
> > +				VM_REMAP_FLAGS);
> > +
> > +		err = remap_pfn_range_complete(vma, action->remap.addr,
> > +				action->remap.pfn, action->remap.size,
> > +				action->remap.pgprot);
> > +
> > +		break;
> > +	case MMAP_INSERT_MIXED:
> > +	{
> > +		unsigned long pgnum = 0;
> > +		unsigned long pfn = action->mixedmap.pfn;
> > +		unsigned long addr = action->mixedmap.addr;
> > +		unsigned long vaddr = vma->vm_start;
> > +
> > +		VM_WARN_ON_ONCE(!(vma->vm_flags & VM_MIXEDMAP));
> > +
> > +		for (; pgnum < action->mixedmap.num_pages;
> > +		     pgnum++, pfn++, addr += PAGE_SIZE, vaddr += PAGE_SIZE) {
> > +			vm_fault_t vmf;
> > +
> > +			vmf = vmf_insert_mixed(vma, vaddr, addr);
>                                                           ^^^^^
> Should this be pfn instead of addr?

Yeah, sigh, this is a direct product of cramfs seemingly having a bug where it
was passing PA's and not PFNs.

I thought I had fixed this but clearly I missed this here.

Let me send a fix-patch!

>
> -chris

Cheers, Lorenzo
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 6 days ago
Hi Andrew,

Could you apply the below fix-patch to make nommu happy? It also has a couple
trivial whitespace fixes in it.

Thanks, Lorenzo

----8<----
From 94d0d29ab23b48bd301eb7e4e9abe88546565d7a Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Fri, 12 Sep 2025 10:56:39 +0100
Subject: [PATCH] nommu fix

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/util.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 11752d67b89c..f0730efd34eb 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1302,6 +1302,7 @@ struct page **mmap_action_mixedmap_pages(struct mmap_action *action,
 }
 EXPORT_SYMBOL(mmap_action_mixedmap_pages);

+#ifdef CONFIG_MMU
 /**
  * mmap_action_prepare - Perform preparatory setup for an VMA descriptor
  * action which need to be performed.
@@ -1313,7 +1314,7 @@ EXPORT_SYMBOL(mmap_action_mixedmap_pages);
  * it wishes to perform.
  */
 void mmap_action_prepare(struct mmap_action *action,
-			    struct vm_area_desc *desc)
+			struct vm_area_desc *desc)
 {
 	switch (action->type) {
 	case MMAP_NOTHING:
@@ -1342,7 +1343,7 @@ EXPORT_SYMBOL(mmap_action_prepare);
  * Return: 0 on success, or error, at which point the VMA will be unmapped.
  */
 int mmap_action_complete(struct mmap_action *action,
-			    struct vm_area_struct *vma)
+			struct vm_area_struct *vma)
 {
 	int err = 0;

@@ -1424,6 +1425,69 @@ int mmap_action_complete(struct mmap_action *action,
 	return 0;
 }
 EXPORT_SYMBOL(mmap_action_complete);
+#else
+void mmap_action_prepare(struct mmap_action *action,
+			struct vm_area_desc *desc)
+{
+	switch (action->type) {
+	case MMAP_NOTHING:
+	case MMAP_CUSTOM_ACTION:
+		break;
+	case MMAP_REMAP_PFN:
+	case MMAP_INSERT_MIXED:
+	case MMAP_INSERT_MIXED_PAGES:
+		WARN_ON_ONCE(1); /* nommu cannot handle these. */
+		break;
+	}
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+int mmap_action_complete(struct mmap_action *action,
+			struct vm_area_struct *vma)
+{
+	int err = 0;
+
+	switch (action->type) {
+	case MMAP_NOTHING:
+		break;
+	case MMAP_REMAP_PFN:
+	case MMAP_INSERT_MIXED:
+	case MMAP_INSERT_MIXED_PAGES:
+		WARN_ON_ONCE(1); /* nommu cannot handle these. */
+
+		break;
+	case MMAP_CUSTOM_ACTION:
+		err = action->custom.action_hook(vma);
+		break;
+	}
+
+	/*
+	* If an error occurs, unmap the VMA altogether and return an error. We
+	* only clear the newly allocated VMA, since this function is only
+	* invoked if we do NOT merge, so we only clean up the VMA we created.
+	*/
+	if (err) {
+		const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+		do_munmap(current->mm, vma->vm_start, len, NULL);
+
+		if (action->error_hook) {
+			/* We may want to filter the error. */
+			err = action->error_hook(err);
+
+			/* The caller should not clear the error. */
+			VM_WARN_ON_ONCE(!err);
+		}
+		return err;
+	}
+
+	if (action->success_hook)
+		err = action->success_hook(vma);
+
+	return 0;
+}
+EXPORT_SYMBOL(mmap_action_complete);
+#endif

 #ifdef CONFIG_MMU
 /**
--
2.51.0
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Reinette Chatre 2 weeks, 6 days ago
Hi Lorenzo,

On 9/10/25 1:22 PM, Lorenzo Stoakes wrote:

...

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 4a441f78340d..ae6c7a0a18a7 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -770,6 +770,64 @@ struct pfnmap_track_ctx {
>  };
>  #endif
>  
> +/* What action should be taken after an .mmap_prepare call is complete? */
> +enum mmap_action_type {
> +	MMAP_NOTHING,		 /* Mapping is complete, no further action. */
> +	MMAP_REMAP_PFN,		 /* Remap PFN range based on desc->remap. */
> +	MMAP_INSERT_MIXED,	 /* Mixed map based on desc->mixedmap. */
> +	MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
> +	MMAP_CUSTOM_ACTION,	 /* User-provided hook. */
> +};
> +
> +struct mmap_action {
> +	union {
> +		/* Remap range. */
> +		struct {
> +			unsigned long addr;
> +			unsigned long pfn;
> +			unsigned long size;
> +			pgprot_t pgprot;
> +		} remap;
> +		/* Insert mixed map. */
> +		struct {
> +			unsigned long addr;
> +			unsigned long pfn;
> +			unsigned long num_pages;
> +		} mixedmap;
> +		/* Insert specific mixed map pages. */
> +		struct {
> +			unsigned long addr;
> +			struct page **pages;
> +			unsigned long num_pages;
> +			/* kfree pages on completion? */
> +			bool kfree_pages :1;
> +		} mixedmap_pages;
> +		struct {
> +			int (*action_hook)(struct vm_area_struct *vma);
> +		} custom;
> +	};
> +	enum mmap_action_type type;
> +
> +	/*
> +	 * If specified, this hook is invoked after the selected action has been
> +	 * successfully completed. Not that the VMA write lock still held.

A typo that may trip tired eyes: Not -> Note ? (perhaps also "is still held"?)
(also in the duplicate changes to tools/testing/vma/vma_internal.h)

> +	 *
> +	 * The absolute minimum ought to be done here.
> +	 *
> +	 * Returns 0 on success, or an error code.
> +	 */
> +	int (*success_hook)(struct vm_area_struct *vma);
> +
> +	/*
> +	 * If specified, this hook is invoked when an error occurred when
> +	 * attempting the selection action.
> +	 *
> +	 * The hook can return an error code in order to filter the error, but
> +	 * it is not valid to clear the error here.
> +	 */
> +	int (*error_hook)(int err);
> +};

Reinette
Re: [PATCH v2 08/16] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 6 days ago
On Thu, Sep 11, 2025 at 03:07:21PM -0700, Reinette Chatre wrote:
> Hi Lorenzo,
>
> On 9/10/25 1:22 PM, Lorenzo Stoakes wrote:
>
> ...
>
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 4a441f78340d..ae6c7a0a18a7 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -770,6 +770,64 @@ struct pfnmap_track_ctx {
> >  };
> >  #endif
> >
> > +/* What action should be taken after an .mmap_prepare call is complete? */
> > +enum mmap_action_type {
> > +	MMAP_NOTHING,		 /* Mapping is complete, no further action. */
> > +	MMAP_REMAP_PFN,		 /* Remap PFN range based on desc->remap. */
> > +	MMAP_INSERT_MIXED,	 /* Mixed map based on desc->mixedmap. */
> > +	MMAP_INSERT_MIXED_PAGES, /* Mixed map based on desc->mixedmap_pages. */
> > +	MMAP_CUSTOM_ACTION,	 /* User-provided hook. */
> > +};
> > +
> > +struct mmap_action {
> > +	union {
> > +		/* Remap range. */
> > +		struct {
> > +			unsigned long addr;
> > +			unsigned long pfn;
> > +			unsigned long size;
> > +			pgprot_t pgprot;
> > +		} remap;
> > +		/* Insert mixed map. */
> > +		struct {
> > +			unsigned long addr;
> > +			unsigned long pfn;
> > +			unsigned long num_pages;
> > +		} mixedmap;
> > +		/* Insert specific mixed map pages. */
> > +		struct {
> > +			unsigned long addr;
> > +			struct page **pages;
> > +			unsigned long num_pages;
> > +			/* kfree pages on completion? */
> > +			bool kfree_pages :1;
> > +		} mixedmap_pages;
> > +		struct {
> > +			int (*action_hook)(struct vm_area_struct *vma);
> > +		} custom;
> > +	};
> > +	enum mmap_action_type type;
> > +
> > +	/*
> > +	 * If specified, this hook is invoked after the selected action has been
> > +	 * successfully completed. Not that the VMA write lock still held.
>
> A typo that may trip tired eyes: Not -> Note ? (perhaps also "is still held"?)
> (also in the duplicate changes to tools/testing/vma/vma_internal.h)

Yeah good catch! Will fix if respin :)

Cheers, Lorenzo