[BUG]
Recently, our internal syzkaller testing uncovered a null pointer
dereference issue:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 51.111664] filemap_read_folio+0x25/0xe0
[ 51.112410] filemap_fault+0xad7/0x1250
[ 51.113112] __do_fault+0x4b/0x460
[ 51.113699] do_pte_missing+0x5bc/0x1db0
[ 51.114250] ? __pte_offset_map+0x23/0x170
[ 51.114822] __handle_mm_fault+0x9f8/0x1680
[ 51.115408] handle_mm_fault+0x24c/0x570
[ 51.115958] do_user_addr_fault+0x226/0xa50
...
Crash analysis showed the file involved was an AIO ring file.
[CAUSE]
      PARENT process                  CHILD process
t=0   io_setup(1, &ctx)
      [access ctx addr]
      fork()
      io_destroy
        vm_munmap // not affect child vma
        percpu_ref_put
        ...
        put_aio_ring_file
t=1                                   [access ctx addr] // pagefault
                                      ...
                                      __do_fault
                                        filemap_fault
                                          max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE)
t=2       truncate_setsize
            truncate_pagecache
t=3                                   filemap_get_folio // no folio, create folio
                                      __filemap_get_folio(..., FGP_CREAT, ...) // page_not_uptodate
                                      filemap_read_folio(file, mapping->a_ops->read_folio, folio) // oops!
At t=0, the parent process calls io_setup() and then fork(). The child
gets its own copy of the AIO ring VMA, but without any PTEs. The parent
then calls io_destroy(). At t=1, before i_size has been truncated to 0, the
child accesses the AIO ctx address and takes a page fault. After the
child's max_idx check has passed, the parent calls truncate_setsize() and
truncate_pagecache() at t=2. At t=3 the child fails to find the folio,
creates a new one, falls into the "page_not_uptodate" path, and
dereferences a NULL pointer because the AIO ring file's
address_space_operations do not implement "read_folio".
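
The ordering above can be replayed deterministically in a small userspace
model. This is only a sketch of the check-then-use window, not kernel code:
every toy_* name and TOY_PAGE_SIZE is hypothetical, the page cache is reduced
to a single slot, and the NULL read_folio call is reported as a result code
instead of being executed.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-ins for the kernel objects involved; not kernel API. */
struct toy_folio {
	int uptodate;
};

struct toy_mapping {
	unsigned long i_size;                  /* models i_size_read(inode) */
	struct toy_folio *cached;              /* models the page cache slot for index 0 */
	int (*read_folio)(struct toy_folio *); /* NULL for the AIO ring file */
};

enum toy_result { TOY_MAPPED, TOY_SIGBUS, TOY_NULL_READ_FOLIO };

/* Models truncate_setsize() + truncate_pagecache() run by the parent. */
static void toy_truncate(struct toy_mapping *m)
{
	m->i_size = 0;
	m->cached = NULL;
}

/*
 * Models the child's fault on page `index`. When `race` is non-zero, the
 * parent's truncate runs between the size check and the folio lookup,
 * which is the t=1 -> t=2 -> t=3 ordering from the diagram.
 */
static enum toy_result toy_fault(struct toy_mapping *m, unsigned long index,
				 int race)
{
	static struct toy_folio fresh; /* folio "created" on a cache miss */
	unsigned long max_idx = DIV_ROUND_UP(m->i_size, TOY_PAGE_SIZE); /* t=1 */

	if (index >= max_idx)
		return TOY_SIGBUS;

	if (race)
		toy_truncate(m); /* t=2 */

	struct toy_folio *folio = m->cached; /* t=3 */
	if (!folio) {
		fresh.uptodate = 0;
		folio = &fresh;
	}
	if (folio->uptodate)
		return TOY_MAPPED;

	/* page_not_uptodate: the real fault path calls read_folio here. */
	if (!m->read_folio)
		return TOY_NULL_READ_FOLIO; /* stands in for the NULL dereference */
	return m->read_folio(folio) == 0 ? TOY_MAPPED : TOY_SIGBUS;
}
```

In this model, a fault with race == 0 on a one-page mapping with a cached
up-to-date folio is mapped normally, the same fault with race == 1 reaches
the NULL read_folio, and a fault issued after the truncate has completed is
rejected by the max_idx check.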
[FIX]
Fix this by marking the AIO ring buffer VMA with VM_DONTCOPY (setting
VMA_DONTCOPY_BIT in aio_ring_mmap_prepare()) so that fork()'s dup_mmap()
skips it entirely. These are the correct semantics because:

1) The child's ioctx_table is already reset to NULL by mm_init_aio() during
fork(), so the child has no AIO context and no way to perform any AIO
operations on this mapping.
2) The AIO ring VMA is only meaningful in conjunction with its associated
kioctx, which is never inherited across fork(). A child process with no
AIO context therefore has no legitimate reason to access the ring buffer.
Delivering SIGSEGV on such an erroneous access is preferable to a kernel
crash.
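
The resulting behaviour for the child can be illustrated from userspace with
madvise(MADV_DONTFORK), which sets VM_DONTCOPY on an ordinary mapping. This is
a minimal sketch of the semantics only, assuming Linux and an anonymous
mapping rather than the AIO ring itself; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a child that touches a MADV_DONTFORK mapping is killed by
 * SIGSEGV, 0 if it is not, and -1 if the setup fails.
 */
static int child_segfaults_on_dontfork_mapping(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 'x'; /* populate the page in the parent */

	/* Ask the kernel not to copy this VMA into children (VM_DONTCOPY). */
	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return -1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		munmap(p, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: the address is unmapped here, so this read faults. */
		signal(SIGSEGV, SIG_DFL);
		volatile char *q = p;
		_exit(q[0] == 'x' ? 0 : 1);
	}

	int status = 0;
	pid_t w;
	do {
		w = waitpid(pid, &status, 0);
	} while (w < 0 && errno == EINTR);
	munmap(p, len);
	if (w < 0)
		return -1;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

The parent keeps full access to the mapping; only the child, which never had
a use for it, sees the address as unmapped.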
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
---
fs/aio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/aio.c b/fs/aio.c
index 4fe163b8bf67..74e0e40b9636 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -397,11 +397,11 @@ static const struct vm_operations_struct aio_ring_vm_ops = {
 #endif
 };
 
 static int aio_ring_mmap_prepare(struct vm_area_desc *desc)
 {
-	vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT);
+	vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT, VMA_DONTCOPY_BIT);
 	desc->vm_ops = &aio_ring_vm_ops;
 	return 0;
 }
 
 static const struct file_operations aio_ring_fops = {
--
2.39.2
On Mon, 13 Apr 2026 09:08:14 +0800, Zizhi Wo wrote:
> [BUG]
> Recently, our internal syzkaller testing uncovered a null pointer
> dereference issue:
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> [ 51.111664] filemap_read_folio+0x25/0xe0
> [ 51.112410] filemap_fault+0xad7/0x1250
> [ 51.113112] __do_fault+0x4b/0x460
> [ 51.113699] do_pte_missing+0x5bc/0x1db0
> [ 51.114250] ? __pte_offset_map+0x23/0x170
> [ 51.114822] __handle_mm_fault+0x9f8/0x1680
> [ 51.115408] handle_mm_fault+0x24c/0x570
> [ 51.115958] do_user_addr_fault+0x226/0xa50
> ...
> Crash analysis showed the file involved was an AIO ring file.
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review of the original patch series, allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible, patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/1] fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference error
https://git.kernel.org/vfs/vfs/c/66f2b15e0c07
On Mon 13-04-26 09:08:14, Zizhi Wo wrote:
> [BUG]
> Recently, our internal syzkaller testing uncovered a null pointer
> dereference issue:
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> [ 51.111664] filemap_read_folio+0x25/0xe0
> [ 51.112410] filemap_fault+0xad7/0x1250
> [ 51.113112] __do_fault+0x4b/0x460
> [ 51.113699] do_pte_missing+0x5bc/0x1db0
> [ 51.114250] ? __pte_offset_map+0x23/0x170
> [ 51.114822] __handle_mm_fault+0x9f8/0x1680
> [ 51.115408] handle_mm_fault+0x24c/0x570
> [ 51.115958] do_user_addr_fault+0x226/0xa50
> ...
> Crash analysis showed the file involved was an AIO ring file.
>
> [CAUSE]
>       PARENT process                  CHILD process
> t=0   io_setup(1, &ctx)
>       [access ctx addr]
>       fork()
>       io_destroy
>         vm_munmap // not affect child vma
>         percpu_ref_put
>         ...
>         put_aio_ring_file
> t=1                                   [access ctx addr] // pagefault
>                                       ...
>                                       __do_fault
>                                         filemap_fault
>                                           max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE)
> t=2       truncate_setsize
>             truncate_pagecache
> t=3                                   filemap_get_folio // no folio, create folio
>                                       __filemap_get_folio(..., FGP_CREAT, ...) // page_not_uptodate
>                                       filemap_read_folio(file, mapping->a_ops->read_folio, folio) // oops!
>
> At t=0, the parent process calls io_setup and then fork. The child process
> gets its own VMA but without any PTEs. The parent then calls io_destroy.
> Before i_size is truncated to 0, at t=1 the child process accesses this AIO
> ctx address and triggers a pagefault. After the max_idx check passes, at
> t=2 the parent calls truncate_setsize and truncate_pagecache. At t=3 the
> child fails to obtain the folio, falls into the "page_not_uptodate" path,
> and hits this problem because AIO does not implement "read_folio".
>
> [Fix]
> Fix this by marking the AIO ring buffer VMA with VM_DONTCOPY so
> that fork()'s dup_mmap() skips it entirely. This is the correct
> semantic because:
>
> 1) The child's ioctx_table is already reset to NULL by mm_init_aio() during
> fork(), so the child has no AIO context and no way to perform any AIO
> operations on this mapping.
> 2) The AIO ring VMA is only meaningful in conjunction with its associated
> kioctx, which is never inherited across fork(). So child process with no
> AIO context has no legitimate reason to access the ring buffer. Delivering
> SIGSEGV on such an erroneous access is preferable to a kernel crash.
>
> Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Thanks for the fix! I agree it would have to be a rather contrived setup to
rely on the AIO ring buffer being inherited across fork(2), for the reasons
you mention above. Plus, I think the AIO ring buffer is mostly a legacy thing
these days, with people moving towards io_uring, so the simpler we can make
the code the better. So I'm OK with trying this simple fix and seeing whether
somebody complains. Let me add Jens to CC just in case he's aware of a
situation where this could be problematic. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/aio.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 4fe163b8bf67..74e0e40b9636 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -397,11 +397,11 @@ static const struct vm_operations_struct aio_ring_vm_ops = {
> #endif
> };
>
> static int aio_ring_mmap_prepare(struct vm_area_desc *desc)
> {
> - vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT);
> + vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT, VMA_DONTCOPY_BIT);
> desc->vm_ops = &aio_ring_vm_ops;
> return 0;
> }
>
> static const struct file_operations aio_ring_fops = {
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR