1. Problem Scenario
On systems with ZRAM and swap enabled, simultaneous process exits create
contention. The primary bottleneck occurs during swap entry release
operations, causing exiting processes to monopolize CPU resources. This
leads to scheduling delays for high-priority processes.

2. Android Use Case
During camera launch, LMKD terminates background processes to free memory.
Exiting processes compete for CPU cycles, delaying the camera preview
thread and causing visible stuttering - directly impacting user
experience.

3. Root Cause Analysis
When background applications heavily utilize swap space, process exit
profiling reveals 55% of time spent in free_swap_and_cache_nr():

Function                Duration (ms)  Percentage
do_signal               791.813        **********100%
do_group_exit           791.813        **********100%
do_exit                 791.813        **********100%
exit_mm                 577.859        *******73%
exit_mmap               577.497        *******73%
zap_pte_range           558.645        *******71%
free_swap_and_cache_nr  433.381        *****55%
free_swap_slot          403.568        *****51%
swap_entry_free         393.863        *****50%
swap_range_free         372.602        ****47%

4. Optimization Approach
a) For processes exceeding a swap entry threshold: aggregate and isolate
   swap entries to enable fast exit
b) Asynchronously release batched entries when isolation reaches the
   configured threshold

5. Performance Gains (User Scenario: Camera Cold Launch)
a) 74% reduction in process exit latency (>500ms cases)
b) ~4% lower peak CPU load during concurrent process exits
c) ~70MB additional free memory during camera preview initialization
d) 40% reduction in camera preview stuttering probability

6. Prior Art & Improvements
Reference: Zhiguo Jiang's patch
(https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/)
Key enhancements:
a) Reimplemented logic moved from mmu_gather.c to swapfile.c for clarity
b) Async release delegated to workqueue kworkers with configurable
   max_active for NUMA-optimized concurrency

Lei Liu (2):
  mm: swap: Gather swap entries and batch async release core
  mm: swap: Forced swap entries release under memory pressure

 include/linux/oom.h           |  23 ++++++
 include/linux/swapfile.h      |   2 +
 include/linux/vm_event_item.h |   1 +
 kernel/exit.c                 |   2 +
 mm/memcontrol.c               |   6 --
 mm/memory.c                   |   4 +-
 mm/page_alloc.c               |   4 +
 mm/swapfile.c                 | 134 ++++++++++++++++++++++++++++++++++
 mm/vmstat.c                   |   1 +
 9 files changed, 170 insertions(+), 7 deletions(-)

--
2.34.1
On Tue, Sep 09, 2025 at 02:53:39PM +0800, Lei Liu wrote:
> 1. Problem Scenario
> On systems with ZRAM and swap enabled, simultaneous process exits create
> contention. The primary bottleneck occurs during swap entry release
> operations, causing exiting processes to monopolize CPU resources. This
> leads to scheduling delays for high-priority processes.
>
> 2. Android Use Case
> During camera launch, LMKD terminates background processes to free memory.

How does LMKD trigger the kills? SIGKILL or cgroup.kill?

> Exiting processes compete for CPU cycles, delaying the camera preview
> thread and causing visible stuttering - directly impacting user
> experience.

Since the exit/kill is due to low memory situation, punting the memory
freeing to a low priority async mechanism will help in improving user
experience. Most probably the application (camera preview here) will get
into global reclaim and will compete for CPU with the async memory
freeing.

What we really need is faster memory freeing and we should explore all
possible ways. As others suggested, fix/improve the bottleneck in the
memory freeing path. In addition, I think we should explore parallelizing
this as well.

On Android, I suppose most of the memory is associated with a single or
small set of processes and parallelizing memory freeing would be
challenging. BTW is LMKD using process_mrelease() to release the killed
process memory?
On Tue, Sep 9, 2025 at 12:21 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Sep 09, 2025 at 02:53:39PM +0800, Lei Liu wrote:
> > [...]
> > 2. Android Use Case
> > During camera launch, LMKD terminates background processes to free memory.
>
> How does LMKD trigger the kills? SIGKILL or cgroup.kill?

SIGKILL

> [...]
>
> On Android, I suppose most of the memory is associated with single or
> small set of processes and parallelizing memory freeing would be
> challenging. BTW is LMKD using process_mrelease() to release the killed
> process memory?

Yes, LMKD has a reaper thread which wakes up and calls
process_mrelease() after the main LMKD thread issued SIGKILL.
On Tue, Sep 09, 2025 at 12:48:02PM -0700, Suren Baghdasaryan wrote:
> On Tue, Sep 9, 2025 at 12:21 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > [...]
> > BTW is LMKD using process_mrelease() to release the killed
> > process memory?
>
> Yes, LMKD has a reaper thread which wakes up and calls
> process_mrelease() after the main LMKD thread issued SIGKILL.

Thanks Suren. I remember Android is planning to put apps in cgroups. Is
that still the plan? I am actually looking into cgroup.kill: besides
sending SIGKILL, putting the processes of the target cgroup on the oom
reaper list, and in addition making the oom reaper able to reap
processes in parallel. I am hoping that functionality will be useful to
Android as well.
On Wed, Sep 10, 2025 at 1:10 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, Sep 09, 2025 at 12:48:02PM -0700, Suren Baghdasaryan wrote:
> > [...]
> > Yes, LMKD has a reaper thread which wakes up and calls
> > process_mrelease() after the main LMKD thread issued SIGKILL.
>
> Thanks Suren. I remember Android is planning to use Apps in cgroup. Is
> that still the plan? I am actually looking into cgroup.kill, beside
> sending SIGKILL, putting the processes of the target cgroup in the oom
> reaper list. In addition, making oom reaper able to reap processes in
> parallel. I am hoping that functionality to be useful to Android as
> well.

Yes, cgroups v2 with per-app hierarchy is already enabled on Android as
of about a year or so ago. The first usecase was the freezer. TJ (CC'ing
him here) also changed how ActivityManager Service (AMS) kills process
groups to use cgroup.kill (think when you force-stop an app, that's what
will happen). LMKD has not been changed to use cgroup.kill but that
might be worth doing now. TJ, WDYT?
On Wed, Sep 10, 2025 at 1:41 PM Suren Baghdasaryan <surenb@google.com> wrote:
> [...]
> Yes, cgroups v2 with per-app hierarchy is already enabled on Android
> as of about a year or so ago. The first usecase was the freezer. TJ
> (CC'ing him here) also changed how ActivityManager Service (AMS) kills
> process groups to use cgroup.kill (think when you force-stop an app
> that's what will happen). LMKD has not been changed to use cgroup.kill
> but that might be worth doing now. TJ, WDYT?

Sounds like it's worth trying here [1].

One potential downside of cgroup.kill is that it requires taking the
cgroup_mutex, which is one of our most heavily contended locks.

We already have logic that waits for exits in libprocessgroup's
KillProcessGroup [2], but I don't think LMKD needs or wants that from
its main thread. I think we'll still want process_mrelease [3] from
LMKD's reaper thread.

[1] https://cs.android.com/android/platform/superproject/main/+/main:system/memory/lmkd/reaper.cpp;drc=88ca1a4963004011669da415bc421b846936071f;l=233
[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libprocessgroup/processgroup.cpp;drc=61197364367c9e404c7da6900658f1b16c42d0da;l=537
[3] https://cs.android.com/android/platform/superproject/main/+/main:system/memory/lmkd/reaper.cpp;drc=88ca1a4963004011669da415bc421b846936071f;l=123

Shakeel, could we not also invoke the oom reaper's help for regular
kill(SIGKILL)s?
On Wed, Sep 10, 2025 at 03:10:29PM -0700, T.J. Mercier wrote:
> [...]
> Sounds like it's worth trying here [1].
>
> One potential downside of cgroup.kill is that it requires taking the
> cgroup_mutex, which is one of our most heavily contended locks.

Oh, let me look into that and see if we can remove cgroup_mutex from
that interface.

> We already have logic that waits for exits in libprocessgroup's
> KillProcessGroup [2], but I don't think LMKD needs or wants that from
> its main thread. I think we'll still want process_mrelease [3] from
> LMKD's reaper thread.

I imagine once the kernel oom reaper can work on killed processes
transparently, it would be much easier to let it do the job instead of
a manual process_mrelease() on all the processes in a cgroup.

> [1] https://cs.android.com/android/platform/superproject/main/+/main:system/memory/lmkd/reaper.cpp;drc=88ca1a4963004011669da415bc421b846936071f;l=233
> [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libprocessgroup/processgroup.cpp;drc=61197364367c9e404c7da6900658f1b16c42d0da;l=537
> [3] https://cs.android.com/android/platform/superproject/main/+/main:system/memory/lmkd/reaper.cpp;drc=88ca1a4963004011669da415bc421b846936071f;l=123
>
> Shakeel could we not also invoke the oom reaper's help for regular
> kill(SIGKILL)s?

I don't see why this can not be done. I will take a look.
On Tue, Sep 9, 2025 at 12:48 PM Suren Baghdasaryan <surenb@google.com> wrote:
> [...]
> Yes, LMKD has a reaper thread which wakes up and calls
> process_mrelease() after the main LMKD thread issued SIGKILL.

I feel this is a better way to address a process exit that is too slow.
We are basically optimizing the exit() system call; I feel there should
be something we can do in userspace before exit() to help us, without
the kernel putting too much complexity into exit(). process_mrelease()
sounds like it fits the bill pretty well.

Chris
On 2025/9/10 3:48, Suren Baghdasaryan wrote:
> [...]
> Yes, LMKD has a reaper thread which wakes up and calls
> process_mrelease() after the main LMKD thread issued SIGKILL.

Hi Suren,

Our current issue is that after lmkd kills a process, exit_mm takes
considerable time. The interface you provided might help quickly free
memory, potentially allowing us to release some memory from processes
before lmkd kills them. This could be a good idea.

We will take your suggestion into consideration.

Thank you
On Wed, Sep 10, 2025 at 10:14:04PM +0800, Lei Liu wrote:
> [...]
> Hi Suren
>
> our current issue is that after lmkd kills a process, exit_mm takes
> considerable time. The interface you provided might help quickly free
> memory, potentially allowing us to release some memory from processes
> before lmkd kills them. This could be a good idea.
>
> We will take your suggestion into consideration.

But LMKD already does the process_mrelease(). Is that not happening on
your setup?
On 2025/9/11 4:12, Shakeel Butt wrote:
> [...]
> But LMKD already does the process_mrelease(). Is that not happening on
> your setup?

Hi Shakeel,

Thank you for your consideration. In our product, we have observed that
in scenarios where multiple processes are being killed, the load on the
lmkd_reaper thread can become very heavy, leading to issues with power
consumption and lag. This problem also occurs in the current camera
launch scenario.

Best regards,
Lei
On Wed, Sep 10, 2025 at 7:14 AM Lei Liu <liulei.rjpt@vivo.com> wrote:
> [...]
> The interface you provided might help quickly free
> memory, potentially allowing us to release some memory from processes
> before lmkd kills them. This could be a good idea.
>
> We will take your suggestion into consideration.

Hi Lei,

I do want to help with your usage case. From my previous analysis of the
swap fault time breakdown, the amount of time spent on batch freeing of
swap entries is not that much. Yes, it has a long tail, but that is on a
very small percentage of page faults. It shouldn't have such a huge
impact on the global average time.

https://services.google.com/fh/files/misc/zswap-breakdown.png
https://services.google.com/fh/files/misc/zswap-breakdown-detail.png

That is what I am trying to get at: the batch free of swap entries is
just the surface level. By itself it does not contribute much. Your exit
latency is largely a different issue.

However, the approach you take (I briefly went over your patch) is to
add another batching layer for swap entry freeing, which impacts not
only the exit() path but other, non-exit() freeing of swap entries as
well. The swap entry is a resource best managed by the swap allocator.
The swap allocator knows best when it is better to cache an entry vs
freeing it under pressure. Before the threshold triggers, the extra
batch of swap entries is just sitting in the batch queue. The allocator
has no internal knowledge of this batching behavior, and it interferes
with the global view of the swap entry allocator. You need to address
this before your patch can be reconsidered.

It feels like a CFO needing to do a company-wide budget and revenue
projection while the sales department keeps a side-pocket account to
defer revenue and sandbag the sales numbers, which can jeopardize the
CFO's ability to budget and project. BTW, what I describe is probably
illegal for public companies. Kids, don't try this at home.

I think you can do some of the following:
1) Redo the test with the latest kernel, which does not have the swap
   slot caching batching any more. Report back what you got.
2) Try out process_mrelease().

Please share your findings; I am happy to work with you to address the
problem you encounter.

Chris
On Wed, Sep 10, 2025 at 7:14 AM Lei Liu <liulei.rjpt@vivo.com> wrote:
> [...]
> Hi Suren
>
> our current issue is that after lmkd kills a process, exit_mm takes
> considerable time. The interface you provided might help quickly free
> memory, potentially allowing us to release some memory from processes
> before lmkd kills them. This could be a good idea.
>
> We will take your suggestion into consideration.

I wasn't really suggesting anything, just explaining how LMKD works
today.

> Thank you
On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@vivo.com> wrote: > Hi Lei, > 1. Problem Scenario > On systems with ZRAM and swap enabled, simultaneous process exits create > contention. The primary bottleneck occurs during swap entry release > operations, causing exiting processes to monopolize CPU resources. This > leads to scheduling delays for high-priority processes. > > 2. Android Use Case > During camera launch, LMKD terminates background processes to free memory. > Exiting processes compete for CPU cycles, delaying the camera preview > thread and causing visible stuttering - directly impacting user > experience. > > 3. Root Cause Analysis > When background applications heavily utilize swap space, process exit > profiling reveals 55% of time spent in free_swap_and_cache_nr(): > > Function Duration (ms) Percentage > do_signal 791.813 **********100% > do_group_exit 791.813 **********100% > do_exit 791.813 **********100% > exit_mm 577.859 *******73% > exit_mmap 577.497 *******73% > zap_pte_range 558.645 *******71% > free_swap_and_cache_nr 433.381 *****55% > free_swap_slot 403.568 *****51% Thanks for sharing this case. One problem is that now the free_swap_slot function no longer exists after 0ff67f990bd4. Have you tested the latest kernel? Or what is the actual overhead here? Some batch freeing optimizations are introduced. And we have reworked the whole locking mechanism for swap, so even on a system with 96t the contention seems barely observable with common workloads. And another series is further reducing the contention and the overall overhead (24% faster freeing for phase 1): https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/ Will these be helpful for you? I think optimizing the root problem is better than just deferring the overhead with async workers, which may increase the overall overhead and complexity. > swap_entry_free 393.863 *****50% > swap_range_free 372.602 ****47% > > 4. 
Optimization Approach > a) For processes exceeding swap entry threshold: aggregate and isolate > swap entries to enable fast exit > b) Asynchronously release batched entries when isolation reaches > configured threshold > > 5. Performance Gains (User Scenario: Camera Cold Launch) > a) 74% reduction in process exit latency (>500ms cases) > b) ~4% lower peak CPU load during concurrent process exits > c) ~70MB additional free memory during camera preview initialization > d) 40% reduction in camera preview stuttering probability > > 6. Prior Art & Improvements > Reference: Zhiguo Jiang's patch > (https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/) > > Key enhancements: > a) Reimplemented logic moved from mmu_gather.c to swapfile.c for clarity > b) Async release delegated to workqueue kworkers with configurable > max_active for NUMA-optimized concurrency > > Lei Liu (2): > mm: swap: Gather swap entries and batch async release core > mm: swap: Forced swap entries release under memory pressure > > include/linux/oom.h | 23 ++++++ > include/linux/swapfile.h | 2 + > include/linux/vm_event_item.h | 1 + > kernel/exit.c | 2 + > mm/memcontrol.c | 6 -- > mm/memory.c | 4 +- > mm/page_alloc.c | 4 + > mm/swapfile.c | 134 ++++++++++++++++++++++++++++++++++ > mm/vmstat.c | 1 + > 9 files changed, 170 insertions(+), 7 deletions(-) > > -- > 2.34.1 > >
On 2025/9/9 15:30, Kairui Song wrote: > On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@vivo.com> wrote: > Hi Lei, > >> 1. Problem Scenario >> On systems with ZRAM and swap enabled, simultaneous process exits create >> contention. The primary bottleneck occurs during swap entry release >> operations, causing exiting processes to monopolize CPU resources. This >> leads to scheduling delays for high-priority processes. >> >> 2. Android Use Case >> During camera launch, LMKD terminates background processes to free memory. >> Exiting processes compete for CPU cycles, delaying the camera preview >> thread and causing visible stuttering - directly impacting user >> experience. >> >> 3. Root Cause Analysis >> When background applications heavily utilize swap space, process exit >> profiling reveals 55% of time spent in free_swap_and_cache_nr(): >> >> Function Duration (ms) Percentage >> do_signal 791.813 **********100% >> do_group_exit 791.813 **********100% >> do_exit 791.813 **********100% >> exit_mm 577.859 *******73% >> exit_mmap 577.497 *******73% >> zap_pte_range 558.645 *******71% >> free_swap_and_cache_nr 433.381 *****55% >> free_swap_slot 403.568 *****51% > Thanks for sharing this case. > > One problem is that now the free_swap_slot function no longer exists > after 0ff67f990bd4. Have you tested the latest kernel? Or what is the > actual overhead here? > > Some batch freeing optimizations are introduced. And we have reworked > the whole locking mechanism for swap, so even on a system with 96t the > contention seems barely observable with common workloads. > > And another series is further reducing the contention and the overall > overhead (24% faster freeing for phase 1): > https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/ > > Will these be helpful for you?
> I think optimizing the root problem is
> better than just deferring the overhead with async workers, which may
> increase the overall overhead and complexity.

Hi Kairui

Thank you for your optimization suggestions. We believe your patch may help our scenario, and we'll try integrating it to evaluate the benefits. However, it may not fully solve our issue. Below is our problem description.

Flame graph of time distribution for TikTok process exit (~400MB swapped):
do_notify_resume 3.89%
get_signal 3.89%
do_signal_exit 3.88%
do_exit 3.88%
mmput 3.22%
exit_mmap 3.22%
unmap_vmas 3.08%
unmap_page_range 3.07%
free_swap_and_cache_nr 1.31%****
swap_entry_range_free 1.17%****
zram_slot_free_notify 1.11%****
zram_free_hw_entry_dc 0.43%
free_zspage[zsmalloc] 0.09%

CPU: 8-core ARM64 (1*4.21GHz + 3*3.5GHz + 4*2.7GHz), 12GB RAM

Process with ~400MB swap exit situation:
Exit takes 200-300ms, ~4% CPU load.
With more zram compression/swap, exit time increases to 400-500ms.
free_swap_and_cache_nr avg: 0.5ms, max: ~1.5ms (running time).
free_swap_and_cache_nr dominates exit time (33%, up to 50% in worst cases). The main cost is zram resource freeing (0.25ms per operation). With dozens of simultaneous exits, the cumulative time becomes significant.

Optimization approach: the focus isn't on optimizing hot functions (limited improvement potential); the high load comes from too many simultaneous exits. We'll make the time-consuming interfaces in do_exit asynchronous to accelerate exit completion while allowing non-swap page (file/anonymous) freeing by other processes.

Camera startup scenario: 20-30 background apps, anonymous pages compressed to zram (200-500MB). Camera launch triggers lmkd to kill 10+ apps - their exits consume 25%+ CPU. System services/third-party processes use 60%+ CPU, leaving the camera startup process CPU-starved and delayed.

Sincere wishes,
Lei

> > >> swap_entry_free 393.863 *****50% >> swap_range_free 372.602 ****47% >> >> 4.
Optimization Approach >> a) For processes exceeding swap entry threshold: aggregate and isolate >> swap entries to enable fast exit >> b) Asynchronously release batched entries when isolation reaches >> configured threshold >> >> 5. Performance Gains (User Scenario: Camera Cold Launch) >> a) 74% reduction in process exit latency (>500ms cases) >> b) ~4% lower peak CPU load during concurrent process exits >> c) ~70MB additional free memory during camera preview initialization >> d) 40% reduction in camera preview stuttering probability >> >> 6. Prior Art & Improvements >> Reference: Zhiguo Jiang's patch >> (https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/) >> >> Key enhancements: >> a) Reimplemented logic moved from mmu_gather.c to swapfile.c for clarity >> b) Async release delegated to workqueue kworkers with configurable >> max_active for NUMA-optimized concurrency >> >> Lei Liu (2): >> mm: swap: Gather swap entries and batch async release core >> mm: swap: Forced swap entries release under memory pressure >> >> include/linux/oom.h | 23 ++++++ >> include/linux/swapfile.h | 2 + >> include/linux/vm_event_item.h | 1 + >> kernel/exit.c | 2 + >> mm/memcontrol.c | 6 -- >> mm/memory.c | 4 +- >> mm/page_alloc.c | 4 + >> mm/swapfile.c | 134 ++++++++++++++++++++++++++++++++++ >> mm/vmstat.c | 1 + >> 9 files changed, 170 insertions(+), 7 deletions(-) >> >> -- >> 2.34.1 >> >>
On Tue, Sep 9, 2025 at 12:31 AM Kairui Song <ryncsn@gmail.com> wrote: > > On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@vivo.com> wrote: > > > > Hi Lei, > > > 1. Problem Scenario > > On systems with ZRAM and swap enabled, simultaneous process exits create > > contention. The primary bottleneck occurs during swap entry release > > operations, causing exiting processes to monopolize CPU resources. This > > leads to scheduling delays for high-priority processes. > > > > 2. Android Use Case > > During camera launch, LMKD terminates background processes to free memory. > > Exiting processes compete for CPU cycles, delaying the camera preview > > thread and causing visible stuttering - directly impacting user > > experience. > > > > 3. Root Cause Analysis > > When background applications heavily utilize swap space, process exit > > profiling reveals 55% of time spent in free_swap_and_cache_nr(): > > > > Function Duration (ms) Percentage > > do_signal 791.813 **********100% > > do_group_exit 791.813 **********100% > > do_exit 791.813 **********100% > > exit_mm 577.859 *******73% > > exit_mmap 577.497 *******73% > > zap_pte_range 558.645 *******71% > > free_swap_and_cache_nr 433.381 *****55% > > free_swap_slot 403.568 *****51% > > Thanks for sharing this case. > > One problem is that now the free_swap_slot function no longer exists > after 0ff67f990bd4. Have you tested the latest kernel? Or what is the > actual overhead here? > > Some batch freeing optimizations are introduced. And we have reworked > the whole locking mechanism for swap, so even on a system with 96t the > contention seems barely observable with common workloads. > > And another series is further reducing the contention and the overall > overhead (24% faster freeing for phase 1): > https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/ > > Will these be helpful for you? 
I think optimizing the root problem is > better than just deferring the overhead with async workers, which may > increase the overall overhead and complexity. +100.

Hi Lei,

This CC list is very long :-) Is it similar to this one a while back? https://lore.kernel.org/linux-mm/20240213-async-free-v3-1-b89c3cc48384@kernel.org/

I ultimately abandoned this approach and considered it harmful. Yes, I can be as harsh as I like about my own previous bad ideas. The better solution is what Kairui did: just remove the swap slot caching completely. It is the harder path to take, but it gets better results. I recall having a discussion with Kairui on this, and we are aligned on removing the swap slot caching eventually. Thanks Kairui for the heavy lifting of actually removing the swap slot cache. I am just cheerleading on the side :-)

So no, we are not getting the async free of swap slot caching again. We shouldn't need to.

Chris

> > > > swap_entry_free 393.863 *****50% > > swap_range_free 372.602 ****47% > > > > 4. Optimization Approach > > a) For processes exceeding swap entry threshold: aggregate and isolate > > swap entries to enable fast exit > > b) Asynchronously release batched entries when isolation reaches > > configured threshold > > > > 5. Performance Gains (User Scenario: Camera Cold Launch) > > a) 74% reduction in process exit latency (>500ms cases) > > b) ~4% lower peak CPU load during concurrent process exits > > c) ~70MB additional free memory during camera preview initialization > > d) 40% reduction in camera preview stuttering probability > > > > 6.
Prior Art & Improvements > > Reference: Zhiguo Jiang's patch > > (https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/) > > > > Key enhancements: > > a) Reimplemented logic moved from mmu_gather.c to swapfile.c for clarity > > b) Async release delegated to workqueue kworkers with configurable > > max_active for NUMA-optimized concurrency > > > > Lei Liu (2): > > mm: swap: Gather swap entries and batch async release core > > mm: swap: Forced swap entries release under memory pressure > > > > include/linux/oom.h | 23 ++++++ > > include/linux/swapfile.h | 2 + > > include/linux/vm_event_item.h | 1 + > > kernel/exit.c | 2 + > > mm/memcontrol.c | 6 -- > > mm/memory.c | 4 +- > > mm/page_alloc.c | 4 + > > mm/swapfile.c | 134 ++++++++++++++++++++++++++++++++++ > > mm/vmstat.c | 1 + > > 9 files changed, 170 insertions(+), 7 deletions(-) > > > > -- > > 2.34.1 > > > > >
On Tue, Sep 9, 2025 at 3:30 PM Kairui Song <ryncsn@gmail.com> wrote: > > On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@vivo.com> wrote: > > > > Hi Lei, > > > 1. Problem Scenario > > On systems with ZRAM and swap enabled, simultaneous process exits create > > contention. The primary bottleneck occurs during swap entry release > > operations, causing exiting processes to monopolize CPU resources. This > > leads to scheduling delays for high-priority processes. > > > > 2. Android Use Case > > During camera launch, LMKD terminates background processes to free memory. > > Exiting processes compete for CPU cycles, delaying the camera preview > > thread and causing visible stuttering - directly impacting user > > experience. > > > > 3. Root Cause Analysis > > When background applications heavily utilize swap space, process exit > > profiling reveals 55% of time spent in free_swap_and_cache_nr(): > > > > Function Duration (ms) Percentage > > do_signal 791.813 **********100% > > do_group_exit 791.813 **********100% > > do_exit 791.813 **********100% > > exit_mm 577.859 *******73% > > exit_mmap 577.497 *******73% > > zap_pte_range 558.645 *******71% > > free_swap_and_cache_nr 433.381 *****55% > > free_swap_slot 403.568 *****51% > > Thanks for sharing this case. > > One problem is that now the free_swap_slot function no longer exists > after 0ff67f990bd4. Have you tested the latest kernel? Or what is the > actual overhead here? > > Some batch freeing optimizations are introduced. And we have reworked > the whole locking mechanism for swap, so even on a system with 96t the > contention seems barely observable with common workloads. > > And another series is further reducing the contention and the overall > overhead (24% faster freeing for phase 1): > https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/ > > Will these be helpful for you? 
I think optimizing the root problem is > better than just deferring the overhead with async workers, which may > increase the overall overhead and complexity. > I feel the cover letter does not clearly describe where the bottleneck occurs or where the performance gains originate. To be honest, even the versions submitted last year did not present the bottleneck clearly. For example, is this due to lock contention (in which case we would need performance data to see how much CPU time is spent waiting for locks), or simply because we can simultaneously zap present and non-present PTEs? Thanks Barry
On 2025/9/9 17:24, Barry Song wrote: > On Tue, Sep 9, 2025 at 3:30 PM Kairui Song <ryncsn@gmail.com> wrote: >> On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@vivo.com> wrote: >> Hi Lei, >> >>> 1. Problem Scenario >>> On systems with ZRAM and swap enabled, simultaneous process exits create >>> contention. The primary bottleneck occurs during swap entry release >>> operations, causing exiting processes to monopolize CPU resources. This >>> leads to scheduling delays for high-priority processes. >>> >>> 2. Android Use Case >>> During camera launch, LMKD terminates background processes to free memory. >>> Exiting processes compete for CPU cycles, delaying the camera preview >>> thread and causing visible stuttering - directly impacting user >>> experience. >>> >>> 3. Root Cause Analysis >>> When background applications heavily utilize swap space, process exit >>> profiling reveals 55% of time spent in free_swap_and_cache_nr(): >>> >>> Function Duration (ms) Percentage >>> do_signal 791.813 **********100% >>> do_group_exit 791.813 **********100% >>> do_exit 791.813 **********100% >>> exit_mm 577.859 *******73% >>> exit_mmap 577.497 *******73% >>> zap_pte_range 558.645 *******71% >>> free_swap_and_cache_nr 433.381 *****55% >>> free_swap_slot 403.568 *****51% >> Thanks for sharing this case. >> >> One problem is that now the free_swap_slot function no longer exists >> after 0ff67f990bd4. Have you tested the latest kernel? Or what is the >> actual overhead here? >> >> Some batch freeing optimizations are introduced. And we have reworked >> the whole locking mechanism for swap, so even on a system with 96t the >> contention seems barely observable with common workloads.
>> And another series is further reducing the contention and the overall
>> overhead (24% faster freeing for phase 1):
>> https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/
>>
>> Will these be helpful for you? I think optimizing the root problem is
>> better than just deferring the overhead with async workers, which may
>> increase the overall overhead and complexity.
>>
> I feel the cover letter does not clearly describe where the bottleneck
> occurs or where the performance gains originate. To be honest, even
> the versions submitted last year did not present the bottleneck clearly.
>
> For example, is this due to lock contention (in which case we would
> need performance data to see how much CPU time is spent waiting for
> locks), or simply because we can simultaneously zap present and
> non-present PTEs?
>
> Thanks
> Barry

Hi Barry

Thank you for your question. Here is the issue we are encountering.

Flame graph of time distribution for douyin (TikTok) process exit (~400MB swapped):
do_notify_resume 3.89%
get_signal 3.89%
do_signal_exit 3.88%
do_exit 3.88%
mmput 3.22%
exit_mmap 3.22%
unmap_vmas 3.08%
unmap_page_range 3.07%
free_swap_and_cache_nr 1.31%****
swap_entry_range_free 1.17%****
zram_slot_free_notify 1.11%****
zram_free_hw_entry_dc 0.43%
free_zspage[zsmalloc] 0.09%

CPU: 8-core ARM64 (1*4.21GHz + 3*3.5GHz + 4*2.7GHz), 12GB RAM

Process with ~400MB swap exit situation:
Exit takes 200-300ms, ~4% CPU load.
With more zram compression/swap, exit time increases to 400-500ms.
free_swap_and_cache_nr avg: 0.5ms, max: ~1.5ms (running time).
free_swap_and_cache_nr dominates exit time (33%, up to 50% in worst cases). The main cost is zram resource freeing (0.25ms per operation). With dozens of simultaneous exits, the cumulative time becomes significant.

Optimization approach: the focus isn't on optimizing hot functions (limited improvement potential); the high load comes from too many simultaneous exits. We'll make the time-consuming interfaces in do_exit asynchronous to accelerate exit completion while allowing non-swap page (file/anonymous) freeing by other processes.

Camera startup scenario: 20-30 background apps, anonymous pages compressed to zram (200-500MB). Camera launch triggers lmkd to kill 10+ apps - their exits consume 25%+ CPU. System services/third-party processes use 60%+ CPU, leaving the camera startup process CPU-starved and delayed.

Sincere wishes,
Lei
On Tue, Sep 9, 2025 at 2:24 AM Barry Song <21cnbao@gmail.com> wrote: > I feel the cover letter does not clearly describe where the bottleneck > occurs or where the performance gains originate. To be honest, even > the versions submitted last year did not present the bottleneck clearly. > > For example, is this due to lock contention (in which case we would > need performance data to see how much CPU time is spent waiting for > locks), or simply because we can simultaneously zap present and > non-present PTEs? I have done some long tail analysis of the zswap page fault a while back, before zswap converted to the xarray. For the zswap page fault, a good chunk of the long tail is the batch freeing of swap slots. The breakdown inside shows a huge chunk is clear_shadow() followed by memsw_uncharge(). I will post the link to the breakdown image once it is available. Chris
On Tue, Sep 9, 2025 at 9:15 AM Chris Li <chrisl@kernel.org> wrote: > > On Tue, Sep 9, 2025 at 2:24 AM Barry Song <21cnbao@gmail.com> wrote: > > I feel the cover letter does not clearly describe where the bottleneck > > occurs or where the performance gains originate. To be honest, even > > the versions submitted last year did not present the bottleneck clearly. > > > > For example, is this due to lock contention (in which case we would > > need performance data to see how much CPU time is spent waiting for > > locks), or simply because we can simultaneously zap present and > > non-present PTEs? > > I have done some long tail analysis of the zswap page fault a while > back, before zswap converted to the xarray. For the zswap page fault, > a good chunk of the long tail is the batch freeing of swap slots. The > breakdown inside shows a huge chunk is clear_shadow() followed by > memsw_uncharge(). I will post the link to the breakdown image once it > is available.

Here is a graph; the high-level breakdown shows the batch freeing of swap slots contributing to the long tail: https://services.google.com/fh/files/misc/zswap-breakdown.png

The detailed breakdown inside the batch freeing of swap slots: https://services.google.com/fh/files/misc/zswap-breakdown-detail.png

That data is fairly old, from before zswap used the xarray. Now the batched freeing of swap entries is gone. I am wondering if the new kernel shows any bottleneck for Lei's zram test case.

Hi Lei, please report back on your new findings. In this case, with the removal of the swap slot cache, the performance profile will likely be very different. Let me know if you have difficulties running the latest kernel on your test bench.

Chris