[PATCH v3 08/13] mm: add ability to take further action in vm_area_desc

Posted by Lorenzo Stoakes 2 weeks, 1 day ago
Some drivers/filesystems need to perform additional tasks after the VMA is
set up.  This is typically in the form of pre-population.

The forms of pre-population most likely to be performed are a PFN remap
or the insertion of normal folios and PFNs into a mixed map.

We start by implementing the PFN remap functionality, ensuring that we
perform the appropriate actions at the appropriate time - that is setting
flags at the point of .mmap_prepare, and performing the actual remap at the
point at which the VMA is fully established.

This prevents the driver from doing anything too crazy with a VMA at any
stage, and we retain complete control over how the mm functionality is
applied.

Unfortunately, callers often still require some kind of custom action, so we
add optional success/error hooks to allow the caller to do something after
the action has succeeded or failed.

This is done at the point when the VMA has already been established, so
the harm that can be done is limited.

The error hook can be used to filter errors if necessary.

If any error arises on these final actions, we simply unmap the VMA
altogether.
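
For illustration, a driver's .mmap_prepare hook might make use of this roughly
as follows (hypothetical driver, field and function names assumed):

	static int mydrv_mmap_success(const struct vm_area_struct *vma)
	{
		pr_debug("mydrv: mapped [%lx, %lx)\n", vma->vm_start,
			 vma->vm_end);
		return 0;
	}

	static int mydrv_mmap_prepare(struct vm_area_desc *desc)
	{
		struct mydrv_data *data = desc->file->private_data;

		desc->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;

		/* Request that the whole VMA be PFN remapped once set up. */
		mmap_action_remap_full(desc, data->base_pfn);
		desc->action.success_hook = mydrv_mmap_success;

		return 0;
	}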

Also update the stacked filesystem compatibility layer to utilise the
action behaviour, and update the VMA tests accordingly.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm.h               |  74 ++++++++++++++++
 include/linux/mm_types.h         |  46 ++++++++++
 mm/util.c                        | 145 ++++++++++++++++++++++++++++++-
 mm/vma.c                         |  70 ++++++++++-----
 tools/testing/vma/vma_internal.h |  83 +++++++++++++++++-
 5 files changed, 390 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6d4cc7cdf1e1..ee4efb394ed3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3596,6 +3596,80 @@ static inline unsigned long vma_desc_pages(const struct vm_area_desc *desc)
 	return vma_desc_size(desc) >> PAGE_SHIFT;
 }
 
+/**
+ * mmap_action_remap - helper for mmap_prepare hook to specify that a pure PFN
+ * remap is required.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start: The virtual address to start the remap from, must be within the VMA.
+ * @start_pfn: The first PFN in the range to remap.
+ * @size: The size of the range to remap, in bytes, at most spanning to the end
+ * of the VMA.
+ */
+static inline void mmap_action_remap(struct vm_area_desc *desc,
+				     unsigned long start,
+				     unsigned long start_pfn,
+				     unsigned long size)
+{
+	struct mmap_action *action = &desc->action;
+
+	/* [start, start + size) must be within the VMA. */
+	WARN_ON_ONCE(start < desc->start || start >= desc->end);
+	WARN_ON_ONCE(start + size > desc->end);
+
+	action->type = MMAP_REMAP_PFN;
+	action->remap.start = start;
+	action->remap.start_pfn = start_pfn;
+	action->remap.size = size;
+	action->remap.pgprot = desc->page_prot;
+}
+
+/**
+ * mmap_action_remap_full - helper for mmap_prepare hook to specify that the
+ * entirety of a VMA should be PFN remapped.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start_pfn: The first PFN in the range to remap.
+ */
+static inline void mmap_action_remap_full(struct vm_area_desc *desc,
+					  unsigned long start_pfn)
+{
+	mmap_action_remap(desc, desc->start, start_pfn, vma_desc_size(desc));
+}
+
+/**
+ * mmap_action_ioremap - helper for mmap_prepare hook to specify that a pure PFN
+ * I/O remap is required.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start: The virtual address to start the remap from, must be within the VMA.
+ * @start_pfn: The first PFN in the range to remap.
+ * @size: The size of the range to remap, in bytes, at most spanning to the end
+ * of the VMA.
+ */
+static inline void mmap_action_ioremap(struct vm_area_desc *desc,
+				       unsigned long start,
+				       unsigned long start_pfn,
+				       unsigned long size)
+{
+	mmap_action_remap(desc, start, start_pfn, size);
+	desc->action.remap.is_io_remap = true;
+}
+
+/**
+ * mmap_action_ioremap_full - helper for mmap_prepare hook to specify that the
+ * entirety of a VMA should be PFN I/O remapped.
+ * @desc: The VMA descriptor for the VMA requiring remap.
+ * @start_pfn: The first PFN in the range to remap.
+ */
+static inline void mmap_action_ioremap_full(struct vm_area_desc *desc,
+					  unsigned long start_pfn)
+{
+	mmap_action_ioremap(desc, desc->start, start_pfn, vma_desc_size(desc));
+}
+
+void mmap_action_prepare(struct mmap_action *action,
+			     struct vm_area_desc *desc);
+int mmap_action_complete(struct mmap_action *action,
+			     struct vm_area_struct *vma);
+
 /* Look up the first VMA which exactly match the interval vm_start ... vm_end */
 static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm,
 				unsigned long vm_start, unsigned long vm_end)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 31b27086586d..aa1e2003f366 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -775,6 +775,49 @@ struct pfnmap_track_ctx {
 };
 #endif
 
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+	MMAP_NOTHING,		/* Mapping is complete, no further action. */
+	MMAP_REMAP_PFN,		/* Remap PFN range. */
+};
+
+/*
+ * Describes an action which an mmap_prepare hook can request be taken to
+ * complete the mapping of a VMA. Specified in vm_area_desc.
+ */
+struct mmap_action {
+	union {
+		/* Remap range. */
+		struct {
+			unsigned long start;
+			unsigned long start_pfn;
+			unsigned long size;
+			pgprot_t pgprot;
+			bool is_io_remap;
+		} remap;
+	};
+	enum mmap_action_type type;
+
+	/*
+	 * If specified, this hook is invoked after the selected action has been
+	 * successfully completed. Note that the VMA write lock is still held.
+	 *
+	 * The absolute minimum ought to be done here.
+	 *
+	 * Returns 0 on success, or an error code.
+	 */
+	int (*success_hook)(const struct vm_area_struct *vma);
+
+	/*
+	 * If specified, this hook is invoked if an error occurs while
+	 * attempting the selected action.
+	 *
+	 * The hook can return an error code in order to filter the error, but
+	 * it is not valid to clear the error here.
+	 */
+	int (*error_hook)(int err);
+};
+
 /*
  * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
  * manipulate mutable fields which will cause those fields to be updated in the
@@ -798,6 +841,9 @@ struct vm_area_desc {
 	/* Write-only fields. */
 	const struct vm_operations_struct *vm_ops;
 	void *private_data;
+
+	/* Take further action? */
+	struct mmap_action action;
 };
 
 /*
diff --git a/mm/util.c b/mm/util.c
index 6c1d64ed0221..64e0f28e251a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1155,15 +1155,18 @@ int __compat_vma_mmap_prepare(const struct file_operations *f_op,
 		.vm_file = vma->vm_file,
 		.vm_flags = vma->vm_flags,
 		.page_prot = vma->vm_page_prot,
+
+		.action.type = MMAP_NOTHING, /* Default */
 	};
 	int err;
 
 	err = f_op->mmap_prepare(&desc);
 	if (err)
 		return err;
-	set_vma_from_desc(vma, &desc);
 
-	return 0;
+	mmap_action_prepare(&desc.action, &desc);
+	set_vma_from_desc(vma, &desc);
+	return mmap_action_complete(&desc.action, vma);
 }
 EXPORT_SYMBOL(__compat_vma_mmap_prepare);
 
@@ -1279,6 +1282,144 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
 	}
 }
 
+#ifdef CONFIG_MMU
+/**
+ * mmap_action_prepare - Perform preparatory setup for a VMA descriptor
+ * action which needs to be performed.
+ * @action: The action to prepare.
+ * @desc: The VMA descriptor to prepare for @action.
+ */
+void mmap_action_prepare(struct mmap_action *action,
+			 struct vm_area_desc *desc)
+{
+	switch (action->type) {
+	case MMAP_NOTHING:
+		break;
+	case MMAP_REMAP_PFN:
+		if (action->remap.is_io_remap)
+			io_remap_pfn_range_prepare(desc, action->remap.start_pfn,
+				action->remap.size);
+		else
+			remap_pfn_range_prepare(desc, action->remap.start_pfn);
+		break;
+	}
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+/**
+ * mmap_action_complete - Execute VMA descriptor action.
+ * @action: The action to perform.
+ * @vma: The VMA to perform the action upon.
+ *
+ * Similar to mmap_action_prepare().
+ *
+ * Return: 0 on success, or error, at which point the VMA will be unmapped.
+ */
+int mmap_action_complete(struct mmap_action *action,
+			 struct vm_area_struct *vma)
+{
+	int err = 0;
+
+	switch (action->type) {
+	case MMAP_NOTHING:
+		break;
+	case MMAP_REMAP_PFN:
+		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
+				VM_REMAP_FLAGS);
+
+		if (action->remap.is_io_remap)
+			err = io_remap_pfn_range_complete(vma, action->remap.start,
+				action->remap.start_pfn, action->remap.size,
+				action->remap.pgprot);
+		else
+			err = remap_pfn_range_complete(vma, action->remap.start,
+				action->remap.start_pfn, action->remap.size,
+				action->remap.pgprot);
+		break;
+	}
+
+	/*
+	 * If an error occurs, unmap the VMA altogether and return an error. We
+	 * only clear the newly allocated VMA, since this function is only
+	 * invoked if we do NOT merge, so we only clean up the VMA we created.
+	 */
+	if (err) {
+		const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+		do_munmap(current->mm, vma->vm_start, len, NULL);
+
+		if (action->error_hook) {
+			/* We may want to filter the error. */
+			err = action->error_hook(err);
+
+			/* The caller should not clear the error. */
+			VM_WARN_ON_ONCE(!err);
+		}
+		return err;
+	}
+
+	if (action->success_hook)
+		err = action->success_hook(vma);
+
+	return err;
+}
+EXPORT_SYMBOL(mmap_action_complete);
+#else
+void mmap_action_prepare(struct mmap_action *action,
+			struct vm_area_desc *desc)
+{
+	switch (action->type) {
+	case MMAP_NOTHING:
+		break;
+	case MMAP_REMAP_PFN:
+		WARN_ON_ONCE(1); /* nommu cannot handle these. */
+		break;
+	}
+}
+EXPORT_SYMBOL(mmap_action_prepare);
+
+int mmap_action_complete(struct mmap_action *action,
+			struct vm_area_struct *vma)
+{
+	int err = 0;
+
+	switch (action->type) {
+	case MMAP_NOTHING:
+		break;
+	case MMAP_REMAP_PFN:
+		WARN_ON_ONCE(1); /* nommu cannot handle this. */
+
+		break;
+	}
+
+	/*
+	 * If an error occurs, unmap the VMA altogether and return an error. We
+	 * only clear the newly allocated VMA, since this function is only
+	 * invoked if we do NOT merge, so we only clean up the VMA we created.
+	 */
+	if (err) {
+		const size_t len = vma_pages(vma) << PAGE_SHIFT;
+
+		do_munmap(current->mm, vma->vm_start, len, NULL);
+
+		if (action->error_hook) {
+			/* We may want to filter the error. */
+			err = action->error_hook(err);
+
+			/* The caller should not clear the error. */
+			VM_WARN_ON_ONCE(!err);
+		}
+		return err;
+	}
+
+	if (action->success_hook)
+		err = action->success_hook(vma);
+
+	return 0;
+}
+EXPORT_SYMBOL(mmap_action_complete);
+#endif
+
 #ifdef CONFIG_MMU
 /**
  * folio_pte_batch - detect a PTE batch for a large folio
diff --git a/mm/vma.c b/mm/vma.c
index bdb070a62a2e..1be297f7bb00 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2328,17 +2328,33 @@ static void update_ksm_flags(struct mmap_state *map)
 	map->vm_flags = ksm_vma_flags(map->mm, map->file, map->vm_flags);
 }
 
+static void set_desc_from_map(struct vm_area_desc *desc,
+		const struct mmap_state *map)
+{
+	desc->start = map->addr;
+	desc->end = map->end;
+
+	desc->pgoff = map->pgoff;
+	desc->vm_file = map->file;
+	desc->vm_flags = map->vm_flags;
+	desc->page_prot = map->page_prot;
+}
+
 /*
  * __mmap_setup() - Prepare to gather any overlapping VMAs that need to be
  * unmapped once the map operation is completed, check limits, account mapping
  * and clean up any pre-existing VMAs.
  *
+ * As a result it sets up the @map and @desc objects.
+ *
  * @map: Mapping state.
+ * @desc: VMA descriptor to be set up.
  * @uf:  Userfaultfd context list.
  *
  * Returns: 0 on success, error code otherwise.
  */
-static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
+static int __mmap_setup(struct mmap_state *map, struct vm_area_desc *desc,
+			struct list_head *uf)
 {
 	int error;
 	struct vma_iterator *vmi = map->vmi;
@@ -2395,6 +2411,7 @@ static int __mmap_setup(struct mmap_state *map, struct list_head *uf)
 	 */
 	vms_clean_up_area(vms, &map->mas_detach);
 
+	set_desc_from_map(desc, map);
 	return 0;
 }
 
@@ -2567,34 +2584,26 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
  *
  * Returns 0 on success, or an error code otherwise.
  */
-static int call_mmap_prepare(struct mmap_state *map)
+static int call_mmap_prepare(struct mmap_state *map,
+		struct vm_area_desc *desc)
 {
 	int err;
-	struct vm_area_desc desc = {
-		.mm = map->mm,
-		.file = map->file,
-		.start = map->addr,
-		.end = map->end,
-
-		.pgoff = map->pgoff,
-		.vm_file = map->file,
-		.vm_flags = map->vm_flags,
-		.page_prot = map->page_prot,
-	};
 
 	/* Invoke the hook. */
-	err = vfs_mmap_prepare(map->file, &desc);
+	err = vfs_mmap_prepare(map->file, desc);
 	if (err)
 		return err;
 
+	mmap_action_prepare(&desc->action, desc);
+
 	/* Update fields permitted to be changed. */
-	map->pgoff = desc.pgoff;
-	map->file = desc.vm_file;
-	map->vm_flags = desc.vm_flags;
-	map->page_prot = desc.page_prot;
+	map->pgoff = desc->pgoff;
+	map->file = desc->vm_file;
+	map->vm_flags = desc->vm_flags;
+	map->page_prot = desc->page_prot;
 	/* User-defined fields. */
-	map->vm_ops = desc.vm_ops;
-	map->vm_private_data = desc.private_data;
+	map->vm_ops = desc->vm_ops;
+	map->vm_private_data = desc->private_data;
 
 	return 0;
 }
@@ -2642,16 +2651,24 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = NULL;
-	int error;
 	bool have_mmap_prepare = file && file->f_op->mmap_prepare;
 	VMA_ITERATOR(vmi, mm, addr);
 	MMAP_STATE(map, mm, &vmi, addr, len, pgoff, vm_flags, file);
+	struct vm_area_desc desc = {
+		.mm = mm,
+		.file = file,
+		.action = {
+			.type = MMAP_NOTHING, /* Default to no further action. */
+		},
+	};
+	bool allocated_new = false;
+	int error;
 
 	map.check_ksm_early = can_set_ksm_flags_early(&map);
 
-	error = __mmap_setup(&map, uf);
+	error = __mmap_setup(&map, &desc, uf);
 	if (!error && have_mmap_prepare)
-		error = call_mmap_prepare(&map);
+		error = call_mmap_prepare(&map, &desc);
 	if (error)
 		goto abort_munmap;
 
@@ -2670,6 +2687,7 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 		error = __mmap_new_vma(&map, &vma);
 		if (error)
 			goto unacct_error;
+		allocated_new = true;
 	}
 
 	if (have_mmap_prepare)
@@ -2677,6 +2695,12 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
 
 	__mmap_complete(&map, vma);
 
+	if (have_mmap_prepare && allocated_new) {
+		error = mmap_action_complete(&desc.action, vma);
+		if (error)
+			return error;
+	}
+
 	return addr;
 
 	/* Accounting was done by __mmap_setup(). */
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 07167446dcf4..8c4722c2eced 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -274,6 +274,50 @@ struct mm_struct {
 
 struct vm_area_struct;
 
+
+/* What action should be taken after an .mmap_prepare call is complete? */
+enum mmap_action_type {
+	MMAP_NOTHING,		/* Mapping is complete, no further action. */
+	MMAP_REMAP_PFN,		/* Remap PFN range. */
+};
+
+/*
+ * Describes an action which an mmap_prepare hook can request be taken to
+ * complete the mapping of a VMA. Specified in vm_area_desc.
+ */
+struct mmap_action {
+	union {
+		/* Remap range. */
+		struct {
+			unsigned long start;
+			unsigned long start_pfn;
+			unsigned long size;
+			pgprot_t pgprot;
+			bool is_io_remap;
+		} remap;
+	};
+	enum mmap_action_type type;
+
+	/*
+	 * If specified, this hook is invoked after the selected action has been
+	 * successfully completed. Note that the VMA write lock is still held.
+	 *
+	 * The absolute minimum ought to be done here.
+	 *
+	 * Returns 0 on success, or an error code.
+	 */
+	int (*success_hook)(const struct vm_area_struct *vma);
+
+	/*
+	 * If specified, this hook is invoked if an error occurs while
+	 * attempting the selected action.
+	 *
+	 * The hook can return an error code in order to filter the error, but
+	 * it is not valid to clear the error here.
+	 */
+	int (*error_hook)(int err);
+};
+
 /*
  * Describes a VMA that is about to be mmap()'ed. Drivers may choose to
  * manipulate mutable fields which will cause those fields to be updated in the
@@ -297,6 +341,9 @@ struct vm_area_desc {
 	/* Write-only fields. */
 	const struct vm_operations_struct *vm_ops;
 	void *private_data;
+
+	/* Take further action? */
+	struct mmap_action action;
 };
 
 struct file_operations {
@@ -1466,12 +1513,23 @@ static inline void free_anon_vma_name(struct vm_area_struct *vma)
 static inline void set_vma_from_desc(struct vm_area_struct *vma,
 		struct vm_area_desc *desc);
 
+static inline void mmap_action_prepare(struct mmap_action *action,
+					   struct vm_area_desc *desc)
+{
+}
+
+static inline int mmap_action_complete(struct mmap_action *action,
+					   struct vm_area_struct *vma)
+{
+	return 0;
+}
+
 static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
 		struct file *file, struct vm_area_struct *vma)
 {
 	struct vm_area_desc desc = {
 		.mm = vma->vm_mm,
-		.file = vma->vm_file,
+		.file = file,
 		.start = vma->vm_start,
 		.end = vma->vm_end,
 
@@ -1479,15 +1537,18 @@ static inline int __compat_vma_mmap_prepare(const struct file_operations *f_op,
 		.vm_file = vma->vm_file,
 		.vm_flags = vma->vm_flags,
 		.page_prot = vma->vm_page_prot,
+
+		.action.type = MMAP_NOTHING, /* Default */
 	};
 	int err;
 
 	err = f_op->mmap_prepare(&desc);
 	if (err)
 		return err;
-	set_vma_from_desc(vma, &desc);
 
-	return 0;
+	mmap_action_prepare(&desc.action, &desc);
+	set_vma_from_desc(vma, &desc);
+	return mmap_action_complete(&desc.action, vma);
 }
 
 static inline int compat_vma_mmap_prepare(struct file *file,
@@ -1548,4 +1609,20 @@ static inline vm_flags_t ksm_vma_flags(const struct mm_struct *, const struct fi
 	return vm_flags;
 }
 
+static inline void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+}
+
+static inline int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t pgprot)
+{
+	return 0;
+}
+
+static inline int do_munmap(struct mm_struct *, unsigned long, size_t,
+		struct list_head *uf)
+{
+	return 0;
+}
+
 #endif	/* __MM_VMA_INTERNAL_H */
-- 
2.51.0
Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc
Posted by Pedro Falcato 2 weeks, 1 day ago
On Tue, Sep 16, 2025 at 03:11:54PM +0100, Lorenzo Stoakes wrote:
> Some drivers/filesystems need to perform additional tasks after the VMA is
> set up.  This is typically in the form of pre-population.
> 
> The forms of pre-population most likely to be performed are a PFN remap
> or the insertion of normal folios and PFNs into a mixed map.
> 
> We start by implementing the PFN remap functionality, ensuring that we
> perform the appropriate actions at the appropriate time - that is setting
> flags at the point of .mmap_prepare, and performing the actual remap at the
> point at which the VMA is fully established.
> 
> This prevents the driver from doing anything too crazy with a VMA at any
> stage, and we retain complete control over how the mm functionality is
> applied.
> 
> Unfortunately callers still do often require some kind of custom action,
> so we add an optional success/error _hook to allow the caller to do
> something after the action has succeeded or failed.

Do we have any idea for rules regarding ->mmap_prepare() and ->*_hook()?
It feels spooky to e.g grab locks in mmap_prepare, and hold them across core
mmap(). And I guess it might be needed?

> 
> This is done at the point when the VMA has already been established, so
> the harm that can be done is limited.
> 
> The error hook can be used to filter errors if necessary.
> 
> If any error arises on these final actions, we simply unmap the VMA
> altogether.
> 
> Also update the stacked filesystem compatibility layer to utilise the
> action behaviour, and update the VMA tests accordingly.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
<snip>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 31b27086586d..aa1e2003f366 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -775,6 +775,49 @@ struct pfnmap_track_ctx {
>  };
>  #endif
>  
> +/* What action should be taken after an .mmap_prepare call is complete? */
> +enum mmap_action_type {
> +	MMAP_NOTHING,		/* Mapping is complete, no further action. */
> +	MMAP_REMAP_PFN,		/* Remap PFN range. */
> +};
> +
> +/*
> + * Describes an action an mmap_prepare hook can instruct to be taken to complete
> + * the mapping of a VMA. Specified in vm_area_desc.
> + */
> +struct mmap_action {
> +	union {
> +		/* Remap range. */
> +		struct {
> +			unsigned long start;
> +			unsigned long start_pfn;
> +			unsigned long size;
> +			pgprot_t pgprot;
> +			bool is_io_remap;
> +		} remap;
> +	};
> +	enum mmap_action_type type;
> +
> +	/*
> +	 * If specified, this hook is invoked after the selected action has been
> +	 * successfully completed. Note that the VMA write lock still held.
> +	 *
> +	 * The absolute minimum ought to be done here.
> +	 *
> +	 * Returns 0 on success, or an error code.
> +	 */
> +	int (*success_hook)(const struct vm_area_struct *vma);
> +
> +	/*
> +	 * If specified, this hook is invoked when an error occurred when
> +	 * attempting the selection action.
> +	 *
> +	 * The hook can return an error code in order to filter the error, but
> +	 * it is not valid to clear the error here.
> +	 */
> +	int (*error_hook)(int err);

Do we need two hooks? It might be more ergonomic to simply have a:

	int (*finish)(int err);


	int random_driver_finish(int err)
	{
		if (err)
			pr_err("ahhhhhhhhh\n");
		mutex_unlock(&big_lock);
		return err;
	}

It's also unclear to me if/why we need the capability to switch error codes,
but I might've missed some discussion on this.


-- 
Pedro
Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks ago
On Wed, Sep 17, 2025 at 12:32:10PM +0100, Pedro Falcato wrote:
> On Tue, Sep 16, 2025 at 03:11:54PM +0100, Lorenzo Stoakes wrote:
> > Some drivers/filesystems need to perform additional tasks after the VMA is
> > set up.  This is typically in the form of pre-population.
> >
> > The forms of pre-population most likely to be performed are a PFN remap
> > or the insertion of normal folios and PFNs into a mixed map.
> >
> > We start by implementing the PFN remap functionality, ensuring that we
> > perform the appropriate actions at the appropriate time - that is setting
> > flags at the point of .mmap_prepare, and performing the actual remap at the
> > point at which the VMA is fully established.
> >
> > This prevents the driver from doing anything too crazy with a VMA at any
> > stage, and we retain complete control over how the mm functionality is
> > applied.
> >
> > Unfortunately callers still do often require some kind of custom action,
> > so we add an optional success/error _hook to allow the caller to do
> > something after the action has succeeded or failed.
>
> Do we have any idea for rules regarding ->mmap_prepare() and ->*_hook()?
> It feels spooky to e.g grab locks in mmap_prepare, and hold them across core
> mmap(). And I guess it might be needed?

I already did a bunch of logic around this, but several respins later we don't
currently support it - as Jason pointed out, we probably don't actually need
to, at least so far.

I don't think it's really worth saying 'do this, don't do that', as wayward
drivers will do whatever.

Sadly we do need those hooks because of error filtering and e.g. debug output on
success.

On success, however, I discourage anything too stupid by making the vma
parameter const, so you'd have to do a const cast there.

On error you only get the error code so good luck with that.

Obviously there could be a static mutex, but I think that's unavoidable.

>
> >
> > This is done at the point when the VMA has already been established, so
> > the harm that can be done is limited.
> >
> > The error hook can be used to filter errors if necessary.
> >
> > If any error arises on these final actions, we simply unmap the VMA
> > altogether.
> >
> > Also update the stacked filesystem compatibility layer to utilise the
> > action behaviour, and update the VMA tests accordingly.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> <snip>
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 31b27086586d..aa1e2003f366 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -775,6 +775,49 @@ struct pfnmap_track_ctx {
> >  };
> >  #endif
> >
> > +/* What action should be taken after an .mmap_prepare call is complete? */
> > +enum mmap_action_type {
> > +	MMAP_NOTHING,		/* Mapping is complete, no further action. */
> > +	MMAP_REMAP_PFN,		/* Remap PFN range. */
> > +};
> > +
> > +/*
> > + * Describes an action an mmap_prepare hook can instruct to be taken to complete
> > + * the mapping of a VMA. Specified in vm_area_desc.
> > + */
> > +struct mmap_action {
> > +	union {
> > +		/* Remap range. */
> > +		struct {
> > +			unsigned long start;
> > +			unsigned long start_pfn;
> > +			unsigned long size;
> > +			pgprot_t pgprot;
> > +			bool is_io_remap;
> > +		} remap;
> > +	};
> > +	enum mmap_action_type type;
> > +
> > +	/*
> > +	 * If specified, this hook is invoked after the selected action has been
> > +	 * successfully completed. Note that the VMA write lock still held.
> > +	 *
> > +	 * The absolute minimum ought to be done here.
> > +	 *
> > +	 * Returns 0 on success, or an error code.
> > +	 */
> > +	int (*success_hook)(const struct vm_area_struct *vma);
> > +
> > +	/*
> > +	 * If specified, this hook is invoked when an error occurred when
> > +	 * attempting the selection action.
> > +	 *
> > +	 * The hook can return an error code in order to filter the error, but
> > +	 * it is not valid to clear the error here.
> > +	 */
> > +	int (*error_hook)(int err);
>
> Do we need two hooks? It might be more ergonomic to simply have a:
>
> 	int (*finish)(int err);
>
>
> 	int random_driver_finish(int err)
> 	{
> 		if (err)
> 			pr_err("ahhhhhhhhh\n");
> 		mutex_unlock(&big_lock);
> 		return err;
> 	}

No I think that's less clear. Better to spell it out.

>
> It's also unclear to me if/why we need the capability to switch error codes,
> but I might've missed some discussion on this.

There are drivers that do it. That's why.
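
e.g. something like this (hypothetical, but representative of drivers which
report any remap failure as -EAGAIN):

	static int mydrv_mmap_error(int err)
	{
		/* Preserve the error code this driver has always returned. */
		return -EAGAIN;
	}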

>
>
> --
> Pedro
Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc
Posted by Jason Gunthorpe 2 weeks, 1 day ago
On Tue, Sep 16, 2025 at 03:11:54PM +0100, Lorenzo Stoakes wrote:

> +/* What action should be taken after an .mmap_prepare call is complete? */
> +enum mmap_action_type {
> +	MMAP_NOTHING,		/* Mapping is complete, no further action. */
> +	MMAP_REMAP_PFN,		/* Remap PFN range. */

Seems like it would be a bit tidier to include MMAP_IO_REMAP_PFN here
instead of having the is_io_remap bool.
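
i.e. roughly (a sketch of what I mean):

	enum mmap_action_type {
		MMAP_NOTHING,		/* Mapping is complete, no further action. */
		MMAP_REMAP_PFN,		/* Remap PFN range. */
		MMAP_IO_REMAP_PFN,	/* I/O remap PFN range. */
	};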

> @@ -1155,15 +1155,18 @@ int __compat_vma_mmap_prepare(const struct file_operations *f_op,
>  		.vm_file = vma->vm_file,
>  		.vm_flags = vma->vm_flags,
>  		.page_prot = vma->vm_page_prot,
> +
> +		.action.type = MMAP_NOTHING, /* Default */
>  	};
>  	int err;
>  
>  	err = f_op->mmap_prepare(&desc);
>  	if (err)
>  		return err;
> -	set_vma_from_desc(vma, &desc);
>  
> -	return 0;
> +	mmap_action_prepare(&desc.action, &desc);
> +	set_vma_from_desc(vma, &desc);
> +	return mmap_action_complete(&desc.action, vma);
>  }
>  EXPORT_SYMBOL(__compat_vma_mmap_prepare);

A function called prepare that now calls complete has become a bit oddly named??

> +int mmap_action_complete(struct mmap_action *action,
> +			 struct vm_area_struct *vma)
> +{
> +	int err = 0;
> +
> +	switch (action->type) {
> +	case MMAP_NOTHING:
> +		break;
> +	case MMAP_REMAP_PFN:
> +		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
> +				VM_REMAP_FLAGS);

This is checked in remap_pfn_range_complete() IIRC? Probably not
needed here as well then.

> +		if (action->remap.is_io_remap)
> +			err = io_remap_pfn_range_complete(vma, action->remap.start,
> +				action->remap.start_pfn, action->remap.size,
> +				action->remap.pgprot);
> +		else
> +			err = remap_pfn_range_complete(vma, action->remap.start,
> +				action->remap.start_pfn, action->remap.size,
> +				action->remap.pgprot);
> +		break;
> +	}
> +
> +	/*
> +	 * If an error occurs, unmap the VMA altogether and return an error. We
> +	 * only clear the newly allocated VMA, since this function is only
> +	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> +	 */
> +	if (err) {
> +		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> +		do_munmap(current->mm, vma->vm_start, len, NULL);
> +
> +		if (action->error_hook) {
> +			/* We may want to filter the error. */
> +			err = action->error_hook(err);
> +
> +			/* The caller should not clear the error. */
> +			VM_WARN_ON_ONCE(!err);
> +		}
> +		return err;
> +	}
> +
> +	if (action->success_hook)
> +		err = action->success_hook(vma);
> +
> +	return err;

I would write this as

	if (action->success_hook)
		return action->success_hook(vma);

	return 0;

Just for emphasis this is the success path.

> +int mmap_action_complete(struct mmap_action *action,
> +			struct vm_area_struct *vma)
> +{
> +	int err = 0;
> +
> +	switch (action->type) {
> +	case MMAP_NOTHING:
> +		break;
> +	case MMAP_REMAP_PFN:
> +		WARN_ON_ONCE(1); /* nommu cannot handle this. */
> +
> +		break;
> +	}
> +
> +	/*
> +	 * If an error occurs, unmap the VMA altogether and return an error. We
> +	 * only clear the newly allocated VMA, since this function is only
> +	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> +	 */
> +	if (err) {
> +		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> +
> +		do_munmap(current->mm, vma->vm_start, len, NULL);
> +
> +		if (action->error_hook) {
> +			/* We may want to filter the error. */
> +			err = action->error_hook(err);
> +
> +			/* The caller should not clear the error. */
> +			VM_WARN_ON_ONCE(!err);
> +		}
> +		return err;
> +	}

err is never !0 here, so this should go to a later patch/series.

Also seems like this cleanup wants to be in a function that is not
protected by #ifdef nommu since the code is identical on both branches.

> +	if (action->success_hook)
> +		err = action->success_hook(vma);
> +
> +	return 0;

return err, though prefer to match above, and probably this sequence
should be pulled into the same shared function as above too.

Jason
Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 1 day ago
On Tue, Sep 16, 2025 at 02:28:36PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 16, 2025 at 03:11:54PM +0100, Lorenzo Stoakes wrote:
>
> > +/* What action should be taken after an .mmap_prepare call is complete? */
> > +enum mmap_action_type {
> > +	MMAP_NOTHING,		/* Mapping is complete, no further action. */
> > +	MMAP_REMAP_PFN,		/* Remap PFN range. */
>
> Seems like it would be a bit tider to include MMAP_IO_REMAP_PFN here
> instead of having the is_io_remap bool.

Well, I did start with this, but it felt simpler to treat it as a remap - also
semantically it's more-or-less a remap, just with maybe a different PFN...

But you know, thinking about it, yeah that's probably nicer, will change.

Often with these things you are back and forth on 'hmm maybe this maybe that' :)

>
> > @@ -1155,15 +1155,18 @@ int __compat_vma_mmap_prepare(const struct file_operations *f_op,
> >  		.vm_file = vma->vm_file,
> >  		.vm_flags = vma->vm_flags,
> >  		.page_prot = vma->vm_page_prot,
> > +
> > +		.action.type = MMAP_NOTHING, /* Default */
> >  	};
> >  	int err;
> >
> >  	err = f_op->mmap_prepare(&desc);
> >  	if (err)
> >  		return err;
> > -	set_vma_from_desc(vma, &desc);
> >
> > -	return 0;
> > +	mmap_action_prepare(&desc.action, &desc);
> > +	set_vma_from_desc(vma, &desc);
> > +	return mmap_action_complete(&desc.action, vma);
> >  }
> >  EXPORT_SYMBOL(__compat_vma_mmap_prepare);
>
> A function called prepare that now calls complete has become a bit oddly named??

That's a very good point... :) I mean, the name is kinda right in a way,
because it is a compatibility layer for mmap_prepare for stacked filesystems
etc. that can (for now) only call .mmap() and are confronted with an underlying
thing that has .mmap_prepare. But it is confusing now.

Will rename.

>
> > +int mmap_action_complete(struct mmap_action *action,
> > +			 struct vm_area_struct *vma)
> > +{
> > +	int err = 0;
> > +
> > +	switch (action->type) {
> > +	case MMAP_NOTHING:
> > +		break;
> > +	case MMAP_REMAP_PFN:
> > +		VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) !=
> > +				VM_REMAP_FLAGS);
>
> This is checked in remap_pfn_range_complete() IIRC? Probably not
> needed here as well then.

Ah ok will drop then.

>
> > +		if (action->remap.is_io_remap)
> > +			err = io_remap_pfn_range_complete(vma, action->remap.start,
> > +				action->remap.start_pfn, action->remap.size,
> > +				action->remap.pgprot);
> > +		else
> > +			err = remap_pfn_range_complete(vma, action->remap.start,
> > +				action->remap.start_pfn, action->remap.size,
> > +				action->remap.pgprot);
> > +		break;
> > +	}
> > +
> > +	/*
> > +	 * If an error occurs, unmap the VMA altogether and return an error. We
> > +	 * only clear the newly allocated VMA, since this function is only
> > +	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> > +	 */
> > +	if (err) {
> > +		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > +		do_munmap(current->mm, vma->vm_start, len, NULL);
> > +
> > +		if (action->error_hook) {
> > +			/* We may want to filter the error. */
> > +			err = action->error_hook(err);
> > +
> > +			/* The caller should not clear the error. */
> > +			VM_WARN_ON_ONCE(!err);
> > +		}
> > +		return err;
> > +	}
> > +
> > +	if (action->success_hook)
> > +		err = action->success_hook(vma);
> > +
> > +	return err;
>
> I would write this as
>
> 	if (action->success_hook)
> 		return action->success_hook(vma);
>
> 	return 0;
>
> Just for emphasis this is the success path.

Ack. That is nicer actually.

>
> > +int mmap_action_complete(struct mmap_action *action,
> > +			struct vm_area_struct *vma)
> > +{
> > +	int err = 0;
> > +
> > +	switch (action->type) {
> > +	case MMAP_NOTHING:
> > +		break;
> > +	case MMAP_REMAP_PFN:
> > +		WARN_ON_ONCE(1); /* nommu cannot handle this. */
> > +
> > +		break;
> > +	}
> > +
> > +	/*
> > +	 * If an error occurs, unmap the VMA altogether and return an error. We
> > +	 * only clear the newly allocated VMA, since this function is only
> > +	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> > +	 */
> > +	if (err) {
> > +		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > +
> > +		do_munmap(current->mm, vma->vm_start, len, NULL);
> > +
> > +		if (action->error_hook) {
> > +			/* We may want to filter the error. */
> > +			err = action->error_hook(err);
> > +
> > +			/* The caller should not clear the error. */
> > +			VM_WARN_ON_ONCE(!err);
> > +		}
> > +		return err;
> > +	}
>
> err is never !0 here, so this should go to a later patch/series.

Right yeah. Doh! Will drop.

>
> Also seems like this cleanup wants to be in a function that is not
> protected by #ifdef nommu since the code is identical on both branches.

Not sure which cleanup you mean, this is new code :)

I don't at all like functions that have #ifdef nommu embedded in them.

And I frankly resent that we support nommu so I'm not inclined to share code
between that and code for arches that people actually use.

Anyway, we can probably simplify this quite a bit.

	WARN_ON_ONCE(action->type != MMAP_NOTHING);
	return 0;

>
> > +	if (action->success_hook)
> > +		err = action->success_hook(vma);
> > +
> > +	return 0;
>
> return err, though prefer to match above, and probably this sequence
> should be pulled into the same shared function as above too.

Yeah I mean, you're not going to make me actually have to ack nommu properly are
you?..

I suppose we could be tasteful and have a separate 'handle hooks' function or
something here.

Let me put my bias aside and take a look at that.

>
> Jason

Cheers, Lorenzo
Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc
Posted by Jason Gunthorpe 2 weeks, 1 day ago
On Tue, Sep 16, 2025 at 06:57:56PM +0100, Lorenzo Stoakes wrote:
> > > +	/*
> > > +	 * If an error occurs, unmap the VMA altogether and return an error. We
> > > +	 * only clear the newly allocated VMA, since this function is only
> > > +	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> > > +	 */
> > > +	if (err) {
> > > +		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > +
> > > +		do_munmap(current->mm, vma->vm_start, len, NULL);
> > > +
> > > +		if (action->error_hook) {
> > > +			/* We may want to filter the error. */
> > > +			err = action->error_hook(err);
> > > +
> > > +			/* The caller should not clear the error. */
> > > +			VM_WARN_ON_ONCE(!err);
> > > +		}
> > > +		return err;
> > > +	}
> > Also seems like this cleanup wants to be in a function that is not
> > protected by #ifdef nommu since the code is identical on both branches.
> 
> Not sure which cleanup you mean, this is new code :)

I mean the code I quoted right above that cleans up the VMA on
error.. It is always the same finishing sequence, there is no nommu
dependency in it.

Just put it all in some "finish mmap complete" function and call it in
both mmu and nommu versions.

Jason
Re: [PATCH v3 08/13] mm: add ability to take further action in vm_area_desc
Posted by Lorenzo Stoakes 2 weeks, 1 day ago
On Tue, Sep 16, 2025 at 03:08:54PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 16, 2025 at 06:57:56PM +0100, Lorenzo Stoakes wrote:
> > > > +	/*
> > > > +	 * If an error occurs, unmap the VMA altogether and return an error. We
> > > > +	 * only clear the newly allocated VMA, since this function is only
> > > > +	 * invoked if we do NOT merge, so we only clean up the VMA we created.
> > > > +	 */
> > > > +	if (err) {
> > > > +		const size_t len = vma_pages(vma) << PAGE_SHIFT;
> > > > +
> > > > +		do_munmap(current->mm, vma->vm_start, len, NULL);
> > > > +
> > > > +		if (action->error_hook) {
> > > > +			/* We may want to filter the error. */
> > > > +			err = action->error_hook(err);
> > > > +
> > > > +			/* The caller should not clear the error. */
> > > > +			VM_WARN_ON_ONCE(!err);
> > > > +		}
> > > > +		return err;
> > > > +	}
> > > Also seems like this cleanup wants to be in a function that is not
> > > protected by #ifdef nommu since the code is identical on both branches.
> >
> > Not sure which cleanup you mean, this is new code :)
>
> I mean the code I quoted right abouve that cleans up the VMA on
> error.. It is always the same finishing sequence, there is no nommu
> dependency in it.

Ah right yeah.

I wonder if it's useful if err != 0 for nommu anyway.

>
> Just put it all in some "finish mmap complete" function and call it in
> both mmu and nommu versions.

Ack!
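
Something roughly like this perhaps (a sketch only, helper name assumed):

	static int mmap_action_finish(struct mmap_action *action,
				      struct vm_area_struct *vma, int err)
	{
		/*
		 * On error, unmap the (newly allocated) VMA altogether and
		 * optionally let the caller filter the error.
		 */
		if (err) {
			const size_t len = vma_pages(vma) << PAGE_SHIFT;

			do_munmap(current->mm, vma->vm_start, len, NULL);

			if (action->error_hook) {
				err = action->error_hook(err);
				/* The caller should not clear the error. */
				VM_WARN_ON_ONCE(!err);
			}
			return err;
		}

		if (action->success_hook)
			return action->success_hook(vma);

		return 0;
	}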

>
> Jason
>

Cheers, Lorenzo