[v1] mm: expand mmap_prepare functionality and usage

[PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Lorenzo Stoakes (Oracle) 3 weeks, 4 days ago

Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
.mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
the deprecated mmap callback.

However, it did not account for the fact that mmap_prepare can fail to map
due to an out of memory error, and thus should not be incrementing a
reference count on mmap_prepare.

With the newly added vm_ops->mapped callback available, we can simply defer
this operation to that callback which is only invoked once the mapping is
successfully in place (but not yet visible to userspace as the mmap and VMA
write locks are held).

Therefore add afs_mapped() to implement this callback for AFS.

In practice the mapping allocations are 'too small to fail' so this is
something that realistically should never happen in practice (or would do
so in a case where the process is about to die anyway), but we should still
handle this.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 fs/afs/file.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index f609366fd2ac..69ef86f5e274 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
 static void afs_vm_open(struct vm_area_struct *area);
 static void afs_vm_close(struct vm_area_struct *area);
 static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
+static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+		      const struct file *file, void **vm_private_data);
 
 const struct file_operations afs_file_operations = {
 	.open		= afs_open,
@@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
 };
 
 static const struct vm_operations_struct afs_vm_ops = {
+	.mapped		= afs_mapped,
 	.open		= afs_vm_open,
 	.close		= afs_vm_close,
 	.fault		= filemap_fault,
@@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
 	afs_add_open_mmap(vnode);
 
 	ret = generic_file_mmap_prepare(desc);
-	if (ret == 0)
-		desc->vm_ops = &afs_vm_ops;
-	else
-		afs_drop_open_mmap(vnode);
+	if (ret)
+		return ret;
+
+	desc->vm_ops = &afs_vm_ops;
 	return ret;
 }
 
+static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
+		      const struct file *file, void **vm_private_data)
+{
+	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+
+	afs_add_open_mmap(vnode);
+	return 0;
+}
+
 static void afs_vm_open(struct vm_area_struct *vma)
 {
 	afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
-- 
2.53.0

Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Usama Arif 3 weeks, 4 days ago

On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> the deprecated mmap callback.
> 
> However, it did not account for the fact that mmap_prepare can fail to map
> due to an out of memory error, and thus should not be incrementing a
> reference count on mmap_prepare.
> 
> With the newly added vm_ops->mapped callback available, we can simply defer
> this operation to that callback which is only invoked once the mapping is
> successfully in place (but not yet visible to userspace as the mmap and VMA
> write locks are held).
> 
> Therefore add afs_mapped() to implement this callback for AFS.
> 
> In practice the mapping allocations are 'too small to fail' so this is
> something that realistically should never happen in practice (or would do
> so in a case where the process is about to die anyway), but we should still
> handle this.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>  fs/afs/file.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index f609366fd2ac..69ef86f5e274 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
>  static void afs_vm_open(struct vm_area_struct *area);
>  static void afs_vm_close(struct vm_area_struct *area);
>  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      const struct file *file, void **vm_private_data);
>  
>  const struct file_operations afs_file_operations = {
>  	.open		= afs_open,
> @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
>  };
>  
>  static const struct vm_operations_struct afs_vm_ops = {
> +	.mapped		= afs_mapped,
>  	.open		= afs_vm_open,
>  	.close		= afs_vm_close,
>  	.fault		= filemap_fault,
> @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
>  	afs_add_open_mmap(vnode);

Is the above afs_add_open_mmap an additional one, which could cause a reference
leak? Does the above one need to be removed and only the one in afs_mapped()
needs to be kept?

>  
>  	ret = generic_file_mmap_prepare(desc);
> -	if (ret == 0)
> -		desc->vm_ops = &afs_vm_ops;
> -	else
> -		afs_drop_open_mmap(vnode);
> +	if (ret)
> +		return ret;
> +
> +	desc->vm_ops = &afs_vm_ops;
>  	return ret;
>  }
>  
> +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> +		      const struct file *file, void **vm_private_data)
> +{
> +	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> +
> +	afs_add_open_mmap(vnode);
> +	return 0;
> +}
> +
>  static void afs_vm_open(struct vm_area_struct *vma)
>  {
>  	afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> -- 
> 2.53.0
> 
>

Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Lorenzo Stoakes (Oracle) 3 weeks, 4 days ago

On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > the deprecated mmap callback.
> >
> > However, it did not account for the fact that mmap_prepare can fail to map
> > due to an out of memory error, and thus should not be incrementing a
> > reference count on mmap_prepare.
> >
> > With the newly added vm_ops->mapped callback available, we can simply defer
> > this operation to that callback which is only invoked once the mapping is
> > successfully in place (but not yet visible to userspace as the mmap and VMA
> > write locks are held).
> >
> > Therefore add afs_mapped() to implement this callback for AFS.
> >
> > In practice the mapping allocations are 'too small to fail' so this is
> > something that realistically should never happen in practice (or would do
> > so in a case where the process is about to die anyway), but we should still
> > handle this.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >  fs/afs/file.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > index f609366fd2ac..69ef86f5e274 100644
> > --- a/fs/afs/file.c
> > +++ b/fs/afs/file.c
> > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> >  static void afs_vm_open(struct vm_area_struct *area);
> >  static void afs_vm_close(struct vm_area_struct *area);
> >  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      const struct file *file, void **vm_private_data);
> >
> >  const struct file_operations afs_file_operations = {
> >  	.open		= afs_open,
> > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> >  };
> >
> >  static const struct vm_operations_struct afs_vm_ops = {
> > +	.mapped		= afs_mapped,
> >  	.open		= afs_vm_open,
> >  	.close		= afs_vm_close,
> >  	.fault		= filemap_fault,
> > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> >  	afs_add_open_mmap(vnode);
>
> Is the above afs_add_open_mmap an additional one, which could cause a reference
> leak? Does the above one need to be removed and only the one in afs_mapped()
> needs to be kept?

Ah yeah good spot, will fix thanks!

>
> >
> >  	ret = generic_file_mmap_prepare(desc);
> > -	if (ret == 0)
> > -		desc->vm_ops = &afs_vm_ops;
> > -	else
> > -		afs_drop_open_mmap(vnode);
> > +	if (ret)
> > +		return ret;
> > +
> > +	desc->vm_ops = &afs_vm_ops;
> >  	return ret;
> >  }
> >
> > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > +		      const struct file *file, void **vm_private_data)
> > +{
> > +	struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > +
> > +	afs_add_open_mmap(vnode);
> > +	return 0;
> > +}
> > +
> >  static void afs_vm_open(struct vm_area_struct *vma)
> >  {
> >  	afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > --
> > 2.53.0
> >
> >

Cheers, Lorenzo

Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Suren Baghdasaryan 3 weeks, 1 day ago

On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >
> > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > the deprecated mmap callback.
> > >
> > > However, it did not account for the fact that mmap_prepare can fail to map
> > > due to an out of memory error, and thus should not be incrementing a
> > > reference count on mmap_prepare.

This is a bit confusing. I see the current implementation does
afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
Is there really a problem?

> > >
> > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > this operation to that callback which is only invoked once the mapping is
> > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > write locks are held).
> > >
> > > Therefore add afs_mapped() to implement this callback for AFS.
> > >
> > > In practice the mapping allocations are 'too small to fail' so this is
> > > something that realistically should never happen in practice (or would do
> > > so in a case where the process is about to die anyway), but we should still
> > > handle this.

nit: I would drop the above paragraph. If it's impossible why are you
handling it? If it's unlikely, then handling it is even more
important.

> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > >  fs/afs/file.c | 20 ++++++++++++++++----
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > > index f609366fd2ac..69ef86f5e274 100644
> > > --- a/fs/afs/file.c
> > > +++ b/fs/afs/file.c
> > > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > >  static void afs_vm_open(struct vm_area_struct *area);
> > >  static void afs_vm_close(struct vm_area_struct *area);
> > >  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > +                 const struct file *file, void **vm_private_data);
> > >
> > >  const struct file_operations afs_file_operations = {
> > >     .open           = afs_open,
> > > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > >  };
> > >
> > >  static const struct vm_operations_struct afs_vm_ops = {
> > > +   .mapped         = afs_mapped,
> > >     .open           = afs_vm_open,
> > >     .close          = afs_vm_close,
> > >     .fault          = filemap_fault,
> > > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > >     afs_add_open_mmap(vnode);
> >
> > Is the above afs_add_open_mmap an additional one, which could cause a reference
> > leak? Does the above one need to be removed and only the one in afs_mapped()
> > needs to be kept?
>
> Ah yeah good spot, will fix thanks!
>
> >
> > >
> > >     ret = generic_file_mmap_prepare(desc);
> > > -   if (ret == 0)
> > > -           desc->vm_ops = &afs_vm_ops;
> > > -   else
> > > -           afs_drop_open_mmap(vnode);
> > > +   if (ret)
> > > +           return ret;
> > > +
> > > +   desc->vm_ops = &afs_vm_ops;
> > >     return ret;
> > >  }
> > >
> > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > +                 const struct file *file, void **vm_private_data)
> > > +{
> > > +   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > > +
> > > +   afs_add_open_mmap(vnode);
> > > +   return 0;
> > > +}
> > > +
> > >  static void afs_vm_open(struct vm_area_struct *vma)
> > >  {
> > >     afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > > --
> > > 2.53.0
> > >
> > >
>
> Cheers, Lorenzo

Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Lorenzo Stoakes (Oracle) 3 weeks, 1 day ago

On Sun, Mar 15, 2026 at 07:32:54PM -0700, Suren Baghdasaryan wrote:
> On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > >
> > > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > > the deprecated mmap callback.
> > > >
> > > > However, it did not account for the fact that mmap_prepare can fail to map
> > > > due to an out of memory error, and thus should not be incrementing a
> > > > reference count on mmap_prepare.
>
> This is a bit confusing. I see the current implementation does
> afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
> does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
> Is there really a problem?

Firstly, mmap_prepare is invoked before we try to merge, so the VMA could in
theory get merged and then the refcounting will be wrong.

Secondly, mmap_prepare occurs at such at time where it is _possible_ that
allocation failures as described below could happen.

I'll update the commit message to reflect the merge aspect actually.

>
> > > >
> > > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > > this operation to that callback which is only invoked once the mapping is
> > > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > > write locks are held).
> > > >
> > > > Therefore add afs_mapped() to implement this callback for AFS.
> > > >
> > > > In practice the mapping allocations are 'too small to fail' so this is
> > > > something that realistically should never happen in practice (or would do
> > > > so in a case where the process is about to die anyway), but we should still
> > > > handle this.
>
> nit: I would drop the above paragraph. If it's impossible why are you
> handling it? If it's unlikely, then handling it is even more
> important.

Sure I can drop it, but it's an ongoing thing with these small allocations.

I wish we could just move to a scenario where we can simpy assume allocations
will always succeed :)

Vlasta - thoughts?

Cheers, Lorenzo

>
> > > >
> > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > ---
> > > >  fs/afs/file.c | 20 ++++++++++++++++----
> > > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > > > index f609366fd2ac..69ef86f5e274 100644
> > > > --- a/fs/afs/file.c
> > > > +++ b/fs/afs/file.c
> > > > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > > >  static void afs_vm_open(struct vm_area_struct *area);
> > > >  static void afs_vm_close(struct vm_area_struct *area);
> > > >  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > +                 const struct file *file, void **vm_private_data);
> > > >
> > > >  const struct file_operations afs_file_operations = {
> > > >     .open           = afs_open,
> > > > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > > >  };
> > > >
> > > >  static const struct vm_operations_struct afs_vm_ops = {
> > > > +   .mapped         = afs_mapped,
> > > >     .open           = afs_vm_open,
> > > >     .close          = afs_vm_close,
> > > >     .fault          = filemap_fault,
> > > > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > > >     afs_add_open_mmap(vnode);
> > >
> > > Is the above afs_add_open_mmap an additional one, which could cause a reference
> > > leak? Does the above one need to be removed and only the one in afs_mapped()
> > > needs to be kept?
> >
> > Ah yeah good spot, will fix thanks!
> >
> > >
> > > >
> > > >     ret = generic_file_mmap_prepare(desc);
> > > > -   if (ret == 0)
> > > > -           desc->vm_ops = &afs_vm_ops;
> > > > -   else
> > > > -           afs_drop_open_mmap(vnode);
> > > > +   if (ret)
> > > > +           return ret;
> > > > +
> > > > +   desc->vm_ops = &afs_vm_ops;
> > > >     return ret;
> > > >  }
> > > >
> > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > +                 const struct file *file, void **vm_private_data)
> > > > +{
> > > > +   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > > > +
> > > > +   afs_add_open_mmap(vnode);
> > > > +   return 0;
> > > > +}
> > > > +
> > > >  static void afs_vm_open(struct vm_area_struct *vma)
> > > >  {
> > > >     afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > > > --
> > > > 2.53.0
> > > >
> > > >
> >
> > Cheers, Lorenzo

Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Suren Baghdasaryan 3 weeks ago

On Mon, Mar 16, 2026 at 7:29 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> On Sun, Mar 15, 2026 at 07:32:54PM -0700, Suren Baghdasaryan wrote:
> > On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > >
> > > On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > > > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > > >
> > > > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > > > the deprecated mmap callback.
> > > > >
> > > > > However, it did not account for the fact that mmap_prepare can fail to map
> > > > > due to an out of memory error, and thus should not be incrementing a
> > > > > reference count on mmap_prepare.
> >
> > This is a bit confusing. I see the current implementation does
> > afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
> > does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
> > Is there really a problem?
>
> Firstly, mmap_prepare is invoked before we try to merge, so the VMA could in
> theory get merged and then the refcounting will be wrong.

I see now. Ok, makes sense.

>
> Secondly, mmap_prepare occurs at such at time where it is _possible_ that
> allocation failures as described below could happen.

Right, but in that case afs_file_mmap_prepare() would drop its
refcount and return an error, so refcounting is still good, no?

>
> I'll update the commit message to reflect the merge aspect actually.

Thanks!

>
> >
> > > > >
> > > > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > > > this operation to that callback which is only invoked once the mapping is
> > > > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > > > write locks are held).
> > > > >
> > > > > Therefore add afs_mapped() to implement this callback for AFS.
> > > > >
> > > > > In practice the mapping allocations are 'too small to fail' so this is
> > > > > something that realistically should never happen in practice (or would do
> > > > > so in a case where the process is about to die anyway), but we should still
> > > > > handle this.
> >
> > nit: I would drop the above paragraph. If it's impossible why are you
> > handling it? If it's unlikely, then handling it is even more
> > important.
>
> Sure I can drop it, but it's an ongoing thing with these small allocations.
>
> I wish we could just move to a scenario where we can simpy assume allocations
> will always succeed :)

That would be really nice but unfortunately the world is not that
perfect. I just don't want to be chasing some rarely reproducible bug
because of the assumption that an allocation is too small to fail.

>
> Vlasta - thoughts?
>
> Cheers, Lorenzo
>
> >
> > > > >
> > > > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > > > ---
> > > > >  fs/afs/file.c | 20 ++++++++++++++++----
> > > > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/afs/file.c b/fs/afs/file.c
> > > > > index f609366fd2ac..69ef86f5e274 100644
> > > > > --- a/fs/afs/file.c
> > > > > +++ b/fs/afs/file.c
> > > > > @@ -28,6 +28,8 @@ static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
> > > > >  static void afs_vm_open(struct vm_area_struct *area);
> > > > >  static void afs_vm_close(struct vm_area_struct *area);
> > > > >  static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff);
> > > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > +                 const struct file *file, void **vm_private_data);
> > > > >
> > > > >  const struct file_operations afs_file_operations = {
> > > > >     .open           = afs_open,
> > > > > @@ -61,6 +63,7 @@ const struct address_space_operations afs_file_aops = {
> > > > >  };
> > > > >
> > > > >  static const struct vm_operations_struct afs_vm_ops = {
> > > > > +   .mapped         = afs_mapped,
> > > > >     .open           = afs_vm_open,
> > > > >     .close          = afs_vm_close,
> > > > >     .fault          = filemap_fault,
> > > > > @@ -500,13 +503,22 @@ static int afs_file_mmap_prepare(struct vm_area_desc *desc)
> > > > >     afs_add_open_mmap(vnode);
> > > >
> > > > Is the above afs_add_open_mmap an additional one, which could cause a reference
> > > > leak? Does the above one need to be removed and only the one in afs_mapped()
> > > > needs to be kept?
> > >
> > > Ah yeah good spot, will fix thanks!
> > >
> > > >
> > > > >
> > > > >     ret = generic_file_mmap_prepare(desc);
> > > > > -   if (ret == 0)
> > > > > -           desc->vm_ops = &afs_vm_ops;
> > > > > -   else
> > > > > -           afs_drop_open_mmap(vnode);
> > > > > +   if (ret)
> > > > > +           return ret;
> > > > > +
> > > > > +   desc->vm_ops = &afs_vm_ops;
> > > > >     return ret;
> > > > >  }
> > > > >
> > > > > +static int afs_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
> > > > > +                 const struct file *file, void **vm_private_data)
> > > > > +{
> > > > > +   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
> > > > > +
> > > > > +   afs_add_open_mmap(vnode);
> > > > > +   return 0;
> > > > > +}
> > > > > +
> > > > >  static void afs_vm_open(struct vm_area_struct *vma)
> > > > >  {
> > > > >     afs_add_open_mmap(AFS_FS_I(file_inode(vma->vm_file)));
> > > > > --
> > > > > 2.53.0
> > > > >
> > > > >
> > >
> > > Cheers, Lorenzo

Re: [PATCH 05/15] fs: afs: correctly drop reference count on mapping failure

Posted by Lorenzo Stoakes (Oracle) 3 weeks ago

On Mon, Mar 16, 2026 at 08:41:48PM -0700, Suren Baghdasaryan wrote:
> On Mon, Mar 16, 2026 at 7:29 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> >
> > On Sun, Mar 15, 2026 at 07:32:54PM -0700, Suren Baghdasaryan wrote:
> > > On Fri, Mar 13, 2026 at 5:00 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > > >
> > > > On Fri, Mar 13, 2026 at 04:07:43AM -0700, Usama Arif wrote:
> > > > > On Thu, 12 Mar 2026 20:27:20 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> > > > >
> > > > > > Commit 9d5403b1036c ("fs: convert most other generic_file_*mmap() users to
> > > > > > .mmap_prepare()") updated AFS to use the mmap_prepare callback in favour of
> > > > > > the deprecated mmap callback.
> > > > > >
> > > > > > However, it did not account for the fact that mmap_prepare can fail to map
> > > > > > due to an out of memory error, and thus should not be incrementing a
> > > > > > reference count on mmap_prepare.
> > >
> > > This is a bit confusing. I see the current implementation does
> > > afs_add_open_mmap() and then if generic_file_mmap_prepare() fails it
> > > does afs_drop_open_mmap(), therefore refcounting seems to be balanced.
> > > Is there really a problem?
> >
> > Firstly, mmap_prepare is invoked before we try to merge, so the VMA could in
> > theory get merged and then the refcounting will be wrong.
>
> I see now. Ok, makes sense.
>
> >
> > Secondly, mmap_prepare occurs at such at time where it is _possible_ that
> > allocation failures as described below could happen.
>
> Right, but in that case afs_file_mmap_prepare() would drop its
> refcount and return an error, so refcounting is still good, no?

Nope, in __mmap_region():

call_mmap_prepare()
-> __mmap_new_vma()
vm_area_alloc() -> can fail
vma_iter_prealloc() -> can fail
__mmap_new_file_vma() / shmem_zero_setup() -> can fail

If any of those fail the VMA is not even set up, so no close() will be called
because there's no VMA to call close on.

This is what makes mmap_prepare very different from mmap which passes in (a
partially established) VMA.

That and of course a potential merge would mean any refcount increment would be
wrong.

>
> >
> > I'll update the commit message to reflect the merge aspect actually.
>
> Thanks!

You're welcome, and done in v2 :)

>
> >
> > >
> > > > > >
> > > > > > With the newly added vm_ops->mapped callback available, we can simply defer
> > > > > > this operation to that callback which is only invoked once the mapping is
> > > > > > successfully in place (but not yet visible to userspace as the mmap and VMA
> > > > > > write locks are held).
> > > > > >
> > > > > > Therefore add afs_mapped() to implement this callback for AFS.
> > > > > >
> > > > > > In practice the mapping allocations are 'too small to fail' so this is
> > > > > > something that realistically should never happen in practice (or would do
> > > > > > so in a case where the process is about to die anyway), but we should still
> > > > > > handle this.
> > >
> > > nit: I would drop the above paragraph. If it's impossible why are you
> > > handling it? If it's unlikely, then handling it is even more
> > > important.
> >
> > Sure I can drop it, but it's an ongoing thing with these small allocations.
> >
> > I wish we could just move to a scenario where we can simpy assume allocations
> > will always succeed :)
>
> That would be really nice but unfortunately the world is not that
> perfect. I just don't want to be chasing some rarely reproducible bug
> because of the assumption that an allocation is too small to fail.

I mean I agree, we should handle all error paths.

Cheers, Lorenzo