[PATCH] fs: aio: reject partial mremap to avoid Null-pointer-dereference error

Zizhi Wo posted 1 patch 1 month, 4 weeks ago
There is a newer version of this series
fs/aio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] fs: aio: reject partial mremap to avoid Null-pointer-dereference error
Posted by Zizhi Wo 1 month, 4 weeks ago
From: Zizhi Wo <wozizhi@huawei.com>

[BUG]
Recently, our internal syzkaller testing uncovered a null pointer
dereference issue:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[   51.111664]  filemap_read_folio+0x25/0xe0
[   51.112410]  filemap_fault+0xad7/0x1250
[   51.113112]  __do_fault+0x4b/0x460
[   51.113699]  do_pte_missing+0x5bc/0x1db0
[   51.114250]  ? __pte_offset_map+0x23/0x170
[   51.114822]  __handle_mm_fault+0x9f8/0x1680
...
Crash analysis showed the file involved was an AIO ring file. The
phenomenon triggered is the same as the issue described in [1].

[CAUSE]
Consider the following scenario: userspace sets up an AIO context via
io_setup(), which creates a VMA covering the entire ring buffer. Then
userspace calls mremap() with the AIO ring address as the source, an
old_len smaller than the full ring size, MREMAP_MAYMOVE set, and
MREMAP_DONTUNMAP clear. The kernel relocates only the requested portion
to a new destination address.

During this move, __split_vma() splits the original AIO ring VMA. The
requested portion is unmapped from the source and re-established at the
destination, while the remainder stays at the original source address as
an orphan VMA. The aio_ring_mremap() callback fires on the new destination
VMA, updating ctx->mmap_base to the destination address. But the callback
is unaware that only a partial region was moved and that an orphan VMA
still exists at the source:

  source(AIO):
  +-------------------+---------------------+
  |  moved to dest    |  orphan VMA (AIO)   |
  +-------------------+---------------------+
  A                 A+partial_len        A+ctx->mmap_size

  dest:
  +-------------------+
  |  moved VMA (AIO)  |
  +-------------------+
  B                 B+partial_len
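
The split shown in the diagram can be observed from userspace without
touching AIO at all. The following is a minimal sketch on an ordinary
anonymous mapping, not an AIO ring, so it does not reproduce the crash. It
only shows that a partial move leaves the tail mapped at the source. The
use of MREMAP_FIXED with a pre-reserved destination is an assumption made
here to force a relocation, and the names partial_move_demo and
partial_move_result are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for mremap() and MREMAP_* in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Observations from one partial move (illustrative names). */
struct partial_move_result {
	int moved_to_dest;	 /* mremap() returned the requested address */
	int data_followed;	 /* marker written at the source is at dest */
	int orphan_still_mapped; /* tail of the source range is still there */
	int head_unmapped;	 /* head of the source range is gone */
};

/* Returns 0 if the experiment ran, -1 if a setup step failed. */
int partial_move_demo(struct partial_move_result *r)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t full = 4 * page, part = 2 * page;
	unsigned char vec[2];
	char *src, *dst, *res;

	src = mmap(NULL, full, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return -1;

	/* Reserve a destination so MREMAP_FIXED has a known-safe target. */
	dst = mmap(NULL, part, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED)
		return -1;

	src[0] = 'H';		/* marker in the part that will move */
	src[part] = 'T';	/* marker in the part that stays behind */

	/* old_len == new_len, both smaller than the full mapping. */
	res = mremap(src, part, part, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	if (res == MAP_FAILED)
		return -1;

	r->moved_to_dest = (res == dst);
	r->data_followed = (res[0] == 'H');

	/* mincore() fails with ENOMEM if the range has unmapped pages. */
	r->orphan_still_mapped =
		(mincore(src + part, full - part, vec) == 0) &&
		(src[part] == 'T');
	r->head_unmapped =
		(mincore(src, part, vec) == -1 && errno == ENOMEM);

	munmap(res, part);
	munmap(src + part, full - part);
	return 0;
}
```

On a Linux system all four fields are expected to come back non-zero: the
head of the range has moved to the destination and the tail remains at the
source, which corresponds to the orphan VMA in the diagram.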

Later, io_destroy() calls vm_munmap(ctx->mmap_base, ctx->mmap_size), which
now targets the destination. This not only fails to unmap the orphan VMA
at the source, but also overshoots the destination VMA, because
ctx->mmap_size still holds the full ring size, and may unmap unrelated
mappings adjacent to it. After put_aio_ring_file() calls truncate_setsize()
to remove all pages from the pagecache, any subsequent access to the orphan
VMA triggers filemap_fault(), which calls a_ops->read_folio(). Since aio
does not implement read_folio, the call goes through a NULL function
pointer, which is the NULL pointer dereference shown above.
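
The address arithmetic behind the teardown problem can be written out as a
small userspace model. This is a sketch only: struct range, overlap() and
model_teardown() are hypothetical helpers, and the model assumes that
teardown unmaps exactly [mmap_base, mmap_base + mmap_size) with mmap_base
already updated to the destination address B.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
	uint64_t start, end;
};

/* Number of bytes shared by two half-open ranges. */
uint64_t overlap(struct range a, struct range b)
{
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = a.end < b.end ? a.end : b.end;

	return hi > lo ? hi - lo : 0;
}

struct teardown_effect {
	uint64_t orphan_left;	/* bytes of the orphan VMA left mapped */
	uint64_t overshoot;	/* bytes unmapped past the moved VMA */
};

/*
 * src/dst are the addresses A and B from the diagram, mmap_size is the
 * full ring size and partial_len is the length passed to mremap().
 */
struct teardown_effect model_teardown(uint64_t src, uint64_t dst,
				      uint64_t mmap_size,
				      uint64_t partial_len)
{
	struct range orphan, moved, unmapped;
	struct teardown_effect e;

	assert(partial_len <= mmap_size);

	orphan.start = src + partial_len;
	orphan.end = src + mmap_size;
	moved.start = dst;
	moved.end = dst + partial_len;
	/* Models vm_munmap(ctx->mmap_base, ctx->mmap_size), mmap_base == dst. */
	unmapped.start = dst;
	unmapped.end = dst + mmap_size;

	e.orphan_left = (orphan.end - orphan.start) -
			overlap(orphan, unmapped);
	e.overshoot = (unmapped.end - unmapped.start) -
		      overlap(moved, unmapped);
	return e;
}
```

With a 0x4000-byte ring and partial_len of 0x2000, and source and
destination far apart, the model reports 0x2000 bytes of orphan left mapped
and 0x2000 bytes of overshoot. When partial_len equals mmap_size both
values are zero, which is the only case the fix allows.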

[FIX]
Note that expanding mremap (new_len > old_len) is already rejected because
AIO ring VMAs are created with VM_DONTEXPAND. The only problematic case is
a partial move where "old_len == new_len" but both are smaller than the
full ring size.

Fix this by checking in aio_ring_mremap() that the new VMA covers the
entire ring. This ensures the AIO ring is always moved as a whole,
preventing orphan VMAs and the subsequent crash.
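
For illustration, the added condition can be exercised in isolation with a
userspace model. This is a sketch, not kernel code: vma_model and
kioctx_model are hypothetical stand-ins that carry only the fields the
check reads, the table lookup and RCU locking are left out, and returning
-EINVAL for a rejected move is assumed to match the callback's default
result.

```c
#include <assert.h>
#include <errno.h>

/* Reduced stand-in for struct vm_area_struct (illustrative). */
struct vma_model {
	unsigned long vm_start, vm_end;
};

/* Reduced stand-in for struct kioctx (illustrative). */
struct kioctx_model {
	unsigned long mmap_base, mmap_size, user_id;
	int dead;
};

/*
 * Accept the move only if the context is alive and the new VMA covers
 * the whole ring; otherwise leave the context untouched.
 */
int ring_mremap_model(struct kioctx_model *ctx, const struct vma_model *vma)
{
	assert(vma->vm_end >= vma->vm_start);

	if (ctx->dead)
		return -EINVAL;
	if (ctx->mmap_size != vma->vm_end - vma->vm_start)
		return -EINVAL;

	ctx->user_id = ctx->mmap_base = vma->vm_start;
	return 0;
}
```

A destination VMA of half the ring size is rejected and mmap_base keeps its
old value, while a destination VMA of exactly mmap_size bytes is accepted
and mmap_base follows it.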

[1]: https://lore.kernel.org/all/20260413010814.548568-1-wozizhi@huawei.com/

Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
---
 fs/aio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index a07bdd1aaaa6..48d049ff5267 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -369,7 +369,8 @@ static int aio_ring_mremap(struct vm_area_struct *vma)
 
 		ctx = rcu_dereference(table->table[i]);
 		if (ctx && ctx->aio_ring_file == file) {
-			if (!atomic_read(&ctx->dead)) {
+			if (!atomic_read(&ctx->dead) &&
+			    (ctx->mmap_size == (vma->vm_end - vma->vm_start))) {
 				ctx->user_id = ctx->mmap_base = vma->vm_start;
 				res = 0;
 			}
-- 
2.39.2
Re: [PATCH] fs: aio: reject partial mremap to avoid Null-pointer-dereference error
Posted by Christian Brauner 1 month, 3 weeks ago
On Sat, 18 Apr 2026 14:06:34 +0800, Zizhi Wo wrote:
> [BUG]
> Recently, our internal syzkaller testing uncovered a null pointer
> dereference issue:
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> [   51.111664]  filemap_read_folio+0x25/0xe0
> [   51.112410]  filemap_fault+0xad7/0x1250
> [   51.113112]  __do_fault+0x4b/0x460
> [   51.113699]  do_pte_missing+0x5bc/0x1db0
> [   51.114250]  ? __pte_offset_map+0x23/0x170
> [   51.114822]  __handle_mm_fault+0x9f8/0x1680
> ...
> Crash analysis showed the file involved was an AIO ring file. The
> phenomenon triggered is the same as the issue described in [1].
> 
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] fs: aio: reject partial mremap to avoid Null-pointer-dereference error
      https://git.kernel.org/vfs/vfs/c/db80e11b54d2
Re: [PATCH] fs: aio: reject partial mremap to avoid Null-pointer-dereference error
Posted by Jan Kara 1 month, 4 weeks ago
On Sat 18-04-26 14:06:34, Zizhi Wo wrote:
> From: Zizhi Wo <wozizhi@huawei.com>
> 
> [BUG]
> Recently, our internal syzkaller testing uncovered a null pointer
> dereference issue:
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> [   51.111664]  filemap_read_folio+0x25/0xe0
> [   51.112410]  filemap_fault+0xad7/0x1250
> [   51.113112]  __do_fault+0x4b/0x460
> [   51.113699]  do_pte_missing+0x5bc/0x1db0
> [   51.114250]  ? __pte_offset_map+0x23/0x170
> [   51.114822]  __handle_mm_fault+0x9f8/0x1680
> ...
> Crash analysis showed the file involved was an AIO ring file. The
> phenomenon triggered is the same as the issue described in [1].
> 
> [...]

Looks good! Thanks! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR