[RESEND RFC PATCH 0/2] Enable vmalloc huge mappings by default on arm64
Posted by Dev Jain 1 month, 3 weeks ago
In the quest to reduce TLB pressure via block mappings, enable huge
vmalloc by default on arm64 for BBML2-noabort systems, which support
splitting live kernel mappings.

This series is an RFC because I have not been able to measure a
performance improvement on our usual benchmarks. Currently, vmalloc
follows an opt-in approach to block mappings: the callers using
vmalloc_huge() are the ones expected to benefit most from them. Most
users of vmalloc(), kvmalloc() and kvzalloc() map a single page. After
this series, a considerable number of users are expected to produce
contiguous-PTE (cont) mappings, and probably none will produce PMD
mappings.
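
To make the current opt-in contrast concrete, here is a minimal sketch
(the 4 MiB size and function names are illustrative, not taken from the
series):

  #include <linux/gfp.h>
  #include <linux/vmalloc.h>

  /* Today: plain vmalloc() is always mapped with base (PAGE_SIZE) pages. */
  static void *alloc_buf(void)
  {
          return vmalloc(4 << 20);
  }

  /*
   * Opt-in: may be backed by block mappings where supported, e.g.
   * cont-PTE (64 KiB with 4K pages) or PMD (2 MiB) on arm64.
   */
  static void *alloc_buf_huge(void)
  {
          return vmalloc_huge(4 << 20, GFP_KERNEL);
  }

With this series, plain vmalloc() would be allowed the same treatment
by default on BBML2-noabort arm64.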

I am asking the community for help with testing. I believe one suitable
test suite is xfstests, since a lot of code uses the APIs mentioned
above. I am hoping someone can jump in and run at least xfstests, and
perhaps other tests that can take advantage of the reduced TLB pressure
from vmalloc cont mappings.

---
Patchset applies to Linus' master (d358e5254674).

Dev Jain (2):
  mm/vmalloc: Do not align size to huge size
  arm64/mm: Enable huge-vmalloc by default

 arch/arm64/include/asm/vmalloc.h |  6 +++++
 arch/arm64/mm/pageattr.c         |  4 +--
 include/linux/vmalloc.h          |  7 ++++++
 mm/vmalloc.c                     | 43 +++++++++++++++++++++++++-------
 4 files changed, 48 insertions(+), 12 deletions(-)

-- 
2.30.2
Re: [RESEND RFC PATCH 0/2] Enable vmalloc huge mappings by default on arm64
Posted by Uladzislau Rezki 3 weeks, 5 days ago
On Fri, Dec 12, 2025 at 09:56:59AM +0530, Dev Jain wrote:
> In the quest to reduce TLB pressure via block mappings, enable huge
> vmalloc by default on arm64 for BBML2-noabort systems, which support
> splitting live kernel mappings.
> 
> This series is an RFC because I have not been able to measure a
> performance improvement on our usual benchmarks. Currently, vmalloc
> follows an opt-in approach to block mappings: the callers using
> vmalloc_huge() are the ones expected to benefit most from them. Most
> users of vmalloc(), kvmalloc() and kvzalloc() map a single page. After
> this series, a considerable number of users are expected to produce
> contiguous-PTE (cont) mappings, and probably none will produce PMD
> mappings.
> 
> I am asking the community for help with testing. I believe one suitable
> test suite is xfstests, since a lot of code uses the APIs mentioned
> above. I am hoping someone can jump in and run at least xfstests, and
> perhaps other tests that can take advantage of the reduced TLB pressure
> from vmalloc cont mappings.
> 
I checked how often vmalloc/vmap is triggered when I run xfstests. I
think it also depends on the environment and can differ from one setup
to another. I enabled the tracepoint via tracefs (a userspace C sketch
of the same capture is at the end of this mail):

"echo vmalloc:alloc_vmap_area > set_event"

urezki@milan:~/data/optane/xfs-test/xfstests.git$ wc -l ./vmalloc_traces/*.trace
    2875 ./vmalloc_traces/generic_036.trace
   30117 ./vmalloc_traces/generic_038.trace
    8481 ./vmalloc_traces/generic_051.trace
   16986 ./vmalloc_traces/generic_055.trace
    6079 ./vmalloc_traces/generic_068.trace
    2792 ./vmalloc_traces/generic_070.trace
   26945 ./vmalloc_traces/generic_072.trace
    2772 ./vmalloc_traces/generic_076.trace
    2750 ./vmalloc_traces/generic_083.trace
    3319 ./vmalloc_traces/generic_095.trace
    2855 ./vmalloc_traces/generic_232.trace
    3537 ./vmalloc_traces/generic_269.trace
   21265 ./vmalloc_traces/generic_299.trace
    3231 ./vmalloc_traces/generic_300.trace
    3050 ./vmalloc_traces/generic_323.trace
    2831 ./vmalloc_traces/generic_390.trace
    4296 ./vmalloc_traces/generic_461.trace
    4807 ./vmalloc_traces/generic_476.trace
    3198 ./vmalloc_traces/generic_551.trace
    3096 ./vmalloc_traces/generic_616.trace
    6495 ./vmalloc_traces/generic_627.trace
   11232 ./vmalloc_traces/generic_642.trace
   11706 ./vmalloc_traces/generic_650.trace
    3135 ./vmalloc_traces/generic_750.trace
    5926 ./vmalloc_traces/generic_751.trace
   77623 ./vmalloc_traces/xfs_013.trace
    9172 ./vmalloc_traces/xfs_017.trace
    4145 ./vmalloc_traces/xfs_068.trace
    2982 ./vmalloc_traces/xfs_104.trace
    7293 ./vmalloc_traces/xfs_167.trace
   18851 ./vmalloc_traces/xfs_168.trace
    4373 ./vmalloc_traces/xfs_442.trace
    3550 ./vmalloc_traces/xfs_609.trace
  321765 total
urezki@milan:~/data/optane/xfs-test/xfstests.git$

Execution time differs from test to test. For example, the "xfs_013"
test takes around 200 seconds on my system and has the highest call
count:

77623 / 200 = 388.115 calls/sec
200 / 77623 = 0.002576 s, i.e. one call every ~2.6 ms

Please note that I have not checked the impact of your patches on
execution time or on how TLB pressure is affected.
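
For reference, the capture can also be scripted. A minimal userspace C
sketch, assuming tracefs is mounted at /sys/kernel/tracing and root
privileges (plain shell redirection as above does the same job):

  #include <stdio.h>

  int main(void)
  {
          /* Enable the vmalloc:alloc_vmap_area tracepoint. */
          FILE *f = fopen("/sys/kernel/tracing/set_event", "w");
          if (!f) { perror("set_event"); return 1; }
          fputs("vmalloc:alloc_vmap_area\n", f);
          fclose(f);

          /*
           * Stream events; redirect stdout to a per-test .trace file
           * and interrupt with Ctrl-C when the test finishes.
           */
          f = fopen("/sys/kernel/tracing/trace_pipe", "r");
          if (!f) { perror("trace_pipe"); return 1; }
          for (int c; (c = fgetc(f)) != EOF; )
                  putchar(c);
          fclose(f);
          return 0;
  }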

--
Uladzislau Rezki