Overview
========

This patch series adds huge page support for remap_pfn_range(),
automatically creating huge mappings when prerequisites are satisfied
(size, alignment, architecture support, etc.) and falling back to
normal page mappings otherwise.

This work builds on Peter Xu's previous efforts on huge pfnmap
support [0].

TODO
====

- Add PUD-level huge page support. Currently, only PMD-level huge
  pages are supported.
- Consider the logic related to vmap_page_range and extract
  reusable common code.

Tests Done
==========

- Cross-build tests.
- Performance tests with custom device driver implementing mmap()
  with remap_pfn_range():

  - lat_mem_rd benchmark modified to use mmap(device_fd) instead of
    malloc() shows around 40% improvement in memory access latency with
    huge page support compared to normal page mappings.

    numactl -C 0 lat_mem_rd -t 4096M (stride=64)

    Memory Size (MB)  Without Huge Mapping  With Huge Mapping  Improvement
    ----------------  --------------------  -----------------  -----------
               64.00            148.858 ns         100.780 ns        32.3%
              128.00            164.745 ns         103.537 ns        37.2%
              256.00            169.907 ns         103.179 ns        39.3%
              512.00            171.285 ns         103.072 ns        39.8%
             1024.00            173.054 ns         103.055 ns        40.4%
             2048.00            172.820 ns         103.091 ns        40.3%
             4096.00            172.877 ns         103.115 ns        40.4%

  - Custom memory copy operations on mmap(device_fd) show around 18%
    performance improvement with huge page support compared to normal
    page mappings.

    numactl -C 0 memcpy_test (memory copy performance test)

    Memory Size (MB)  Without Huge Mapping  With Huge Mapping  Improvement
    ----------------  --------------------  -----------------  -----------
             1024.00              95.76 ms           77.91 ms        18.6%
             2048.00             190.87 ms          155.64 ms        18.5%
             4096.00             380.84 ms          311.45 ms        18.2%

[0] https://lore.kernel.org/all/20240826204353.2228736-2-peterx@redhat.com/T/#u

Yin Tirui (2):
  pgtable: add pte_clrhuge() implementation for arm64 and riscv
  mm: add PMD-level huge page support for remap_pfn_range()

 arch/arm64/include/asm/pgtable.h |  8 ++++
 arch/riscv/include/asm/pgtable.h |  5 +++
 include/linux/pgtable.h          |  6 ++-
 mm/huge_memory.c                 | 22 +++++++---
 mm/memory.c                      | 74 ++++++++++++++++++++++++++++----
 5 files changed, 98 insertions(+), 17 deletions(-)

-- 
2.43.0
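
The size and alignment prerequisites named in the overview can be
illustrated with a small user-space sketch. This is not code from the
series: the helper name sketch_can_map_pmd() and the SKETCH_* constants
are hypothetical, and the values assume 4 KiB base pages with 2 MiB
PMD mappings (x86-64, or arm64 with a 4K granule). The architecture
support check is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD-level mappings. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SHIFT  21
#define SKETCH_PMD_SIZE   (UINT64_C(1) << SKETCH_PMD_SHIFT)
#define SKETCH_PMD_MASK   (~(SKETCH_PMD_SIZE - 1))

/*
 * Hypothetical helper, not taken from the patch: returns true when the
 * range starting at addr could be covered by one PMD-sized mapping.
 */
static bool sketch_can_map_pmd(uint64_t addr, uint64_t end, uint64_t pfn)
{
	uint64_t phys = pfn << SKETCH_PAGE_SHIFT;

	/* The virtual address must sit on a PMD boundary. */
	if (addr & ~SKETCH_PMD_MASK)
		return false;

	/* At least one full PMD worth of range must remain. */
	if (end - addr < SKETCH_PMD_SIZE)
		return false;

	/* The physical address must be PMD-aligned as well. */
	if (phys & ~SKETCH_PMD_MASK)
		return false;

	return true;
}
```

With these assumptions, a range such as [0x200000, 0x400000) backed by
pfn 0x200 passes all three checks, while shifting the start address,
shortening the range, or using pfn 0x201 would take the normal page
mapping fallback.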
syzbot ci has tested the following series

[v1] mm: add huge pfnmap support for remap_pfn_range()
https://lore.kernel.org/all/20250923133104.926672-1-yintirui@huawei.com
* [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv
* [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range()

and found the following issues:
* BUG: non-zero pgtables_bytes on freeing mm: NUM
* stack segment fault in pgtable_trans_huge_withdraw

Full report is available here:
https://ci.syzbot.org/series/633cbff7-ef54-4f3a-9133-71cc271396ee

***

BUG: non-zero pgtables_bytes on freeing mm: NUM

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      07e27ad16399afcd693be20211b0dfae63e0615f
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/72b4b6cf-5400-40d6-94b6-1cfc0e85050d/config
C repro:   https://ci.syzbot.org/findings/3450ef75-3540-4c00-8b33-5625d4aa40ef/c_repro
syz repro: https://ci.syzbot.org/findings/3450ef75-3540-4c00-8b33-5625d4aa40ef/syz_repro

BUG: non-zero pgtables_bytes on freeing mm: 4096

***

stack segment fault in pgtable_trans_huge_withdraw

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      07e27ad16399afcd693be20211b0dfae63e0615f
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/72b4b6cf-5400-40d6-94b6-1cfc0e85050d/config
C repro:   https://ci.syzbot.org/findings/dcfb72b5-c263-48da-830a-7f51aaa927db/c_repro
syz repro: https://ci.syzbot.org/findings/dcfb72b5-c263-48da-830a-7f51aaa927db/syz_repro

Oops: stack segment: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6000 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:pgtable_trans_huge_withdraw+0x115/0x310 mm/pgtable-generic.c:188
Code: c3 10 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 5d 38 13 00 48 8b 03 48 89 04 24 4c 8d 78 08 4c 89 fd 48 c1 ed 03 <42> 80 7c 2d 00 00 74 08 4c 89 ff e8 3b 38 13 00 49 8b 07 48 8d 48
RSP: 0018:ffffc90002d5f300 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea0000fb3dd0 RCX: ffff888107769cc0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888022b90843 R09: 1ffff11004572108
R10: dffffc0000000000 R11: ffffed1004572109 R12: ffff88803ecf7000
R13: dffffc0000000000 R14: ffff88803ecf7000 R15: 0000000000000008
FS:  0000555576e7a500(0000) GS:ffff8880b8612000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000107d74000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 zap_deposited_table mm/huge_memory.c:2177 [inline]
 zap_huge_pmd+0xa25/0xf50 mm/huge_memory.c:2205
 zap_pmd_range mm/memory.c:1798 [inline]
 zap_pud_range mm/memory.c:1847 [inline]
 zap_p4d_range mm/memory.c:1868 [inline]
 unmap_page_range+0x9fe/0x4370 mm/memory.c:1889
 unmap_single_vma mm/memory.c:1932 [inline]
 unmap_vmas+0x399/0x580 mm/memory.c:1976
 exit_mmap+0x248/0xb50 mm/mmap.c:1280
 __mmput+0x118/0x430 kernel/fork.c:1129
 copy_process+0x2910/0x3c00 kernel/fork.c:2454
 kernel_clone+0x21e/0x840 kernel/fork.c:2605
 __do_sys_clone kernel/fork.c:2748 [inline]
 __se_sys_clone kernel/fork.c:2732 [inline]
 __x64_sys_clone+0x18b/0x1e0 kernel/fork.c:2732
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f96b638ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc07e618c8 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f96b65d5fa0 RCX: 00007f96b638ec29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000002001000
RBP: 00007f96b6411e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007f96b65d5fa0 R14: 00007f96b65d5fa0 R15: 0000000000000006
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:pgtable_trans_huge_withdraw+0x115/0x310 mm/pgtable-generic.c:188
Code: c3 10 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 5d 38 13 00 48 8b 03 48 89 04 24 4c 8d 78 08 4c 89 fd 48 c1 ed 03 <42> 80 7c 2d 00 00 74 08 4c 89 ff e8 3b 38 13 00 49 8b 07 48 8d 48
RSP: 0018:ffffc90002d5f300 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea0000fb3dd0 RCX: ffff888107769cc0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888022b90843 R09: 1ffff11004572108
R10: dffffc0000000000 R11: ffffed1004572109 R12: ffff88803ecf7000
R13: dffffc0000000000 R14: ffff88803ecf7000 R15: 0000000000000008
FS:  0000555576e7a500(0000) GS:ffff8880b8612000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000107d74000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	c3                   	ret
   1:	10 48 89             	adc    %cl,-0x77(%rax)
   4:	d8 48 c1             	fmuls  -0x3f(%rax)
   7:	e8 03 42 80 3c       	call   0x3c80420f
   c:	28 00                	sub    %al,(%rax)
   e:	74 08                	je     0x18
  10:	48 89 df             	mov    %rbx,%rdi
  13:	e8 5d 38 13 00       	call   0x133875
  18:	48 8b 03             	mov    (%rbx),%rax
  1b:	48 89 04 24          	mov    %rax,(%rsp)
  1f:	4c 8d 78 08          	lea    0x8(%rax),%r15
  23:	4c 89 fd             	mov    %r15,%rbp
  26:	48 c1 ed 03          	shr    $0x3,%rbp
* 2a:	42 80 7c 2d 00 00    	cmpb   $0x0,0x0(%rbp,%r13,1) <-- trapping instruction
  30:	74 08                	je     0x3a
  32:	4c 89 ff             	mov    %r15,%rdi
  35:	e8 3b 38 13 00       	call   0x133875
  3a:	49 8b 07             	mov    (%r15),%rax
  3d:	48                   	rex.W
  3e:	8d                   	.byte 0x8d
  3f:	48                   	rex.W

***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
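
As a reading aid for the register dump in the second finding (not a
root-cause analysis), the trapping instruction looks like a KASAN
shadow check on address 0x8: RAX is 0 after the load from (%rbx),
R15 = RAX + 8, RBP = R15 >> 3, and R13 holds the x86-64 KASAN shadow
offset. The sketch below reproduces that arithmetic in user space. The
helper names are hypothetical, and the canonical check assumes 48-bit
virtual addresses (4-level paging).

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 generic KASAN: one shadow byte covers 8 bytes of memory. */
#define SKETCH_KASAN_SHADOW_SCALE_SHIFT 3
/* Matches the value seen in R13 in the dump above. */
#define SKETCH_KASAN_SHADOW_OFFSET UINT64_C(0xdffffc0000000000)

/* Hypothetical helper: shadow byte address for a given address. */
static uint64_t sketch_kasan_shadow(uint64_t addr)
{
	return (addr >> SKETCH_KASAN_SHADOW_SCALE_SHIFT) +
	       SKETCH_KASAN_SHADOW_OFFSET;
}

/*
 * Hypothetical helper: an address is canonical under 48-bit virtual
 * addressing when bits 63..47 are all zeros or all ones.
 */
static bool sketch_is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == UINT64_C(0x1ffff);
}
```

For address 0x8 the shadow lookup lands at 0xdffffc0000000001, which
is not canonical, so the access faults before any shadow byte is read.
Because the instruction uses %rbp as its base register, the fault is
most likely reported as a stack segment fault rather than a general
protection fault.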