Hello All,

This is a quick PoC to add ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH support to
powerpc for book3s64. Section 6.10 of the ISA, "Translation Table Update
Synchronization Requirements", says the architecture allows translation
cache invalidation to be optimized by doing it in bulk, after the PTE
change has been made. That means that if we add
ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH support, the reclaim and migrate-pages
paths can make use of the existing batching optimizations, deferring the
TLB invalidations and issuing them in bulk once all the page unmap
operations have completed. (A rough sketch of the arch hooks this
involves is appended at the end of this mail.)

This is a quick PoC for the same. Note that this may not be a complete
patch yet; TLB handling on Power is already complex on the hardware
side :) and there are many further optimizations done in software on
top (e.g. exit_lazy_flush_tlb to avoid tlbies). But since the current
patch looked somewhat sane to me, I wanted to share it to get early
feedback from people who are well versed with this side of the code.
Meanwhile I have many TODOs which I am working on in parallel for this
work. Later I will also get some benchmarks w.r.t. promotion / demotion.

I ran a micro-benchmark that was shared in other commits which add this
support on other archs, and I can see some good initial improvements.

without patch (perf report shows 7% in radix__flush_tlb_page_psize,
even with a single thread)
==============================================================

root# time ./a.out
real    0m23.538s
user    0m0.191s
sys     0m5.270s

# Overhead  Command  Shared Object     Symbol
# ........  .......  ................  ...............................
#
     7.19%  a.out    [kernel.vmlinux]  [k] radix__flush_tlb_page_psize
     5.63%  a.out    [kernel.vmlinux]  [k] _raw_spin_lock
     3.21%  a.out    a.out             [.] main
     2.93%  a.out    [kernel.vmlinux]  [k] page_counter_cancel
     2.58%  a.out    [kernel.vmlinux]  [k] page_counter_try_charge
     2.56%  a.out    [kernel.vmlinux]  [k] _raw_spin_lock_irq
     2.30%  a.out    [kernel.vmlinux]  [k] try_to_unmap_one

with patch
==========

root# time ./a.out
real    0m8.593s
user    0m0.064s
sys     0m1.610s

# Overhead  Command  Shared Object     Symbol
# ........  .......  ................  ...............................
#
     5.10%  a.out    [kernel.vmlinux]  [k] _raw_spin_lock
     3.55%  a.out    [kernel.vmlinux]  [k] __mod_memcg_lruvec_state
     3.13%  a.out    a.out             [.] main
     3.00%  a.out    [kernel.vmlinux]  [k] page_counter_try_charge
     2.62%  a.out    [kernel.vmlinux]  [k] _raw_spin_lock_irq
     2.58%  a.out    [kernel.vmlinux]  [k] page_counter_cancel
     2.22%  a.out    [kernel.vmlinux]  [k] try_to_unmap_one

<micro-benchmark>
=================
#include <string.h>
#include <sys/mman.h>

#define PAGESIZE 65536
#define SIZE (1 * 1024 * 1024 * 10)

int main()
{
	volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
					 MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	memset((void *)p, 0x88, SIZE);

	for (int k = 0; k < 10000; k++) {
		/* swap in */
		for (int i = 0; i < SIZE; i += PAGESIZE)
			(void)p[i];

		/* swap out */
		madvise((void *)p, SIZE, MADV_PAGEOUT);
	}
	return 0;
}

Ritesh Harjani (IBM) (1):
  powerpc: Add support for batched unmap TLB flush

 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/include/asm/book3s/64/tlbflush.h |  5 +++
 arch/powerpc/include/asm/tlbbatch.h           | 14 ++++++++
 arch/powerpc/mm/book3s64/radix_tlb.c          | 32 +++++++++++++++++++
 4 files changed, 52 insertions(+)
 create mode 100644 arch/powerpc/include/asm/tlbbatch.h

--
2.46.0
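Appendix: for readers unfamiliar with the interface, here is a rough
sketch of the contract an arch has to fill in when it selects
ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH. The struct name and hook signatures
are the real generic mm interface that x86/arm64 already implement; the
powerpc bodies below are only plausible placeholders (an x86-style
"accumulate a cpumask, flush in bulk" scheme), and flush_local_tlb_all()
is a hypothetical helper standing in for whatever local invalidation the
actual patch uses. This is NOT the patch body.

/*
 * Sketch only. Hook signatures match mm/rmap.c's expectations; the
 * bodies illustrate one possible scheme, not this patch's approach.
 *
 * arch/powerpc/Kconfig would gain:
 *   select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if PPC_BOOK3S_64
 */

/* arch/powerpc/include/asm/tlbbatch.h */
#ifndef _ASM_POWERPC_TLBBATCH_H
#define _ASM_POWERPC_TLBBATCH_H

#include <linux/cpumask.h>

struct arch_tlbflush_unmap_batch {
	/* CPUs that may cache stale translations for the unmapped pages */
	struct cpumask cpumask;
};

#endif /* _ASM_POWERPC_TLBBATCH_H */

/* arch/powerpc/include/asm/book3s/64/tlbflush.h */
static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
{
	/* Deferring invalidations is only known-safe under the radix MMU */
	return radix_enabled();
}

static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
					     struct mm_struct *mm,
					     unsigned long uaddr)
{
	/* Defer: just remember which CPUs may hold stale TLB entries */
	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
}

/* arch/powerpc/mm/book3s64/radix_tlb.c */
static void do_batched_flush(void *unused)
{
	/*
	 * flush_local_tlb_all() is a hypothetical stand-in for the real
	 * local invalidation (e.g. a tlbiel loop over all sets).
	 */
	flush_local_tlb_all();
}

void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
{
	/* One bulk invalidation instead of one tlbie per unmapped page */
	on_each_cpu_mask(&batch->cpumask, do_batched_flush, NULL, 1);
	cpumask_clear(&batch->cpumask);
}

For context on how these get driven: try_to_unmap_one() consults
arch_tlbbatch_should_defer() via should_defer_flush(), records each
cleared PTE with arch_tlbbatch_add_pending() instead of flushing
immediately, and the reclaim path later calls try_to_unmap_flush(),
which issues the single bulk arch_tlbbatch_flush(). That is where the
radix__flush_tlb_page_psize overhead in the "without patch" profile
above goes away.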