From: Youngjun Park <youngjun.park@lge.com>

Introduction
============
I am a kernel developer working on platforms deployed in commercial consumer
devices. Due to real-world product requirements, I needed to modify the Linux
kernel to support a new swap management mechanism. The proposed mechanism
allows assigning different swap priorities to swap devices per cgroup.

I believe this mechanism can be generally useful in similar constrained-device
scenarios, and I would like to propose it for upstream inclusion and solicit
feedback from the community.

Motivation
==========
The core requirement was to improve application responsiveness and loading
time, especially for latency-critical applications, without adding RAM or
storage hardware.

Device constraints:
- Linux-based embedded platform
- Limited system RAM
- Small local swap
- No option to expand RAM or local swap

To mitigate this, we explored utilizing idle RAM and storage from nearby
devices as remote swap space. To maximize its effectiveness, we needed the
ability to control which swap devices are used by different cgroups:
- Assign faster local swap devices to latency-critical apps
- Assign remote swap devices to background apps

However, the current Linux kernel swap infrastructure does not support
per-cgroup swap device assignment. To solve this, I propose a mechanism that
allows each cgroup to specify its own swap device priorities.

Evaluated Alternatives
======================
1. **Per-cgroup dedicated swap devices**
   - Previously proposed upstream [1]
   - Challenges in managing global vs. per-cgroup swap state
   - Difficult to integrate with existing memory.limit / swap.max semantics

2. **Multi-backend swap device with cgroup-aware routing**
   - Considered something of a layering violation (block device cgroup
     awareness); swap devices are commonly meant to be plain physical
     block devices
   - A similar idea was mentioned in [2]

3. **Per-cgroup swap device enable/disable with swap usage control**
   - Expand swap.max with zswap.writeback usage
   - Discussed in the context of zswap writeback [3]
   - Cannot express arbitrary priority orderings
     (e.g. a global priority order A-B-C with a cgroup ordering of C-A-B
     is impossible)
   - Less flexible than the per-device priority approach

4. **Per-namespace swap priority configuration**
   - In short, a swap namespace for swap device priority
   - Overly complex for our use case
   - Cgroups are the natural scope for this mechanism

Based on these findings, we chose to prototype per-cgroup swap priority
configuration as the most natural, least invasive extension of the existing
kernel mechanisms.
Design and Semantics
====================
- Each swap device gets a unique ID at `swapon` time
- Each cgroup has a `memory.swap.priority` interface:
  - Reading the interface shows each device's unique ID and effective priority
  - Write format: `unique_id:priority,unique_id:priority,...`
  - All currently-active swap devices must be listed
  - Priorities follow existing swap infrastructure semantics
  - The interface is writable and updatable at runtime
  - A priority configuration can be reset via `echo "" > memory.swap.priority`
- Swap on/off events propagate to all cgroups with priority configurations

Example Usage
-------------
# swap devices already on
$ swapon
NAME      TYPE      SIZE USED PRIO
/dev/sdb  partition 300M   0B   10
/dev/sdc  partition 300M   0B    5

# assign custom priorities in a cgroup
$ echo "1:5,2:10" > memory.swap.priority
$ cat memory.swap.priority
Active
/dev/sdb unique:1 prio:5
/dev/sdc unique:2 prio:10

# adding a new swap device later
$ swapon /dev/sdd --priority -1
$ cat memory.swap.priority
Active
/dev/sdb unique:1 prio:5
/dev/sdc unique:2 prio:10
/dev/sdd unique:3 prio:-2

# reset the cgroup priority configuration
$ echo "" > memory.swap.priority
$ cat memory.swap.priority
Inactive
/dev/sdb unique:1 prio:10
/dev/sdc unique:2 prio:5
/dev/sdd unique:3 prio:-2

(A minimal userspace sketch of driving this interface appears after this
message.)

Implementation Notes
====================
The items below are to be addressed in follow-up patch work:

- Keep the workaround using the per-CPU swap cluster, as before
- Priority propagation to child cgroups
- Other remaining TODO / XXX items
- Refactoring for reviewability and maintainability, plus comprehensive
  testing and performance evaluation

Future Work
===========
These items would benefit from further consideration and potential
implementation:

- Support for per-process (or other non-cgroup) swap prioritization
- Optional usage limits per swap device (e.g., ratio, max bytes)
- Generalizing the interface beyond cgroups

References
==========
[1] https://lkml.iu.edu/hypermail/linux/kernel/1404.0/02530.html
[2] https://lore.kernel.org/linux-mm/CAMgjq7DGMS5A4t6nOQmwyLy5Px96aoejBkiwFHgy9uMk-F8Y-w@mail.gmail.com
[3] https://lore.kernel.org/lkml/CAF8kJuN-4UE0skVHvjUzpGefavkLULMonjgkXUZSBVJrcGFXCA@mail.gmail.com

All comments and feedback are greatly appreciated. The patches follow.

Sincerely,
Youngjun Park

youngjun.park (2):
  mm/swap, memcg: basic structure and logic for per cgroup swap priority
    control
  mm: swap: apply per cgroup swap priority mechanism on swap layer

 include/linux/memcontrol.h |   3 +
 include/linux/swap.h       |  11 ++
 mm/Kconfig                 |   7 +
 mm/memcontrol.c            |  55 ++++++
 mm/swap.h                  |  18 ++
 mm/swap_cgroup_priority.c  | 335 +++++++++++++++++++++++++++++++++++++
 mm/swapfile.c              | 129 ++++++++++----
 7 files changed, 523 insertions(+), 35 deletions(-)
 create mode 100644 mm/swap_cgroup_priority.c

base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.34.1
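[Editor's note: for concreteness, below is a minimal userspace sketch of how a
management daemon might drive the interface described above. It is only an
illustration under stated assumptions: the cgroup v2 mount point, the cgroup
name "latency-critical", and the helper write_swap_priority() are invented
here, and the memory.swap.priority file exists only with this series applied.]

/*
 * Sketch: configure the proposed per-cgroup swap priorities from userspace.
 * Paths, cgroup names, and id:prio values are illustrative assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_swap_priority(const char *cgroup, const char *conf)
{
	char path[256];
	ssize_t len = (ssize_t)strlen(conf);
	int fd, ret = 0;

	snprintf(path, sizeof(path),
		 "/sys/fs/cgroup/%s/memory.swap.priority", cgroup);
	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror(path);
		return -1;
	}
	/* Proposed write format: "unique_id:priority,unique_id:priority,..." */
	if (write(fd, conf, len) != len) {
		perror("write");
		ret = -1;
	}
	close(fd);
	return ret;
}

int main(void)
{
	/* Prefer device 2 (sdc above) by giving it the higher priority. */
	if (write_swap_priority("latency-critical", "1:5,2:10"))
		return 1;
	/* Reset to the global order, mirroring `echo "" > ...`. */
	return write_swap_priority("latency-critical", "\n") ? 1 : 0;
}

Note that, per the semantics above, a write must list every currently-active
swap device, so a real tool would first parse the read-side output to learn
the unique IDs.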
On Thu, Jun 12, 2025 at 6:38 PM <youngjun.park@lge.com> wrote:
>
> From: Youngjun Park <youngjun.park@lge.com>
>
> [... full proposal quoted above; trimmed ...]
> [... design, example usage, and implementation notes quoted above; trimmed ...]

Hi Youngjun,

Interesting idea. For your current approach, I think all we need is per-cgroup
swap meta info structures (and infrastructure for maintaining and manipulating
them).

So we have a global version and a cgroup version of "plist, next cluster list,
and maybe something else", right? Then, once the allocator is folio-aware, it
can simply prefer the cgroup ones (as I mentioned in another reply), reusing
all the same other routines. Changes are minimal, and the cgroup swap meta
info and control plane are separately maintained.

That aligns quite well with what I wanted to do, and it can be done in a clean
and easy-to-maintain way.

Meanwhile, with virtual swap things could be even more flexible: beyond
changing the priority at swapout time, it would also provide capabilities to
migrate and balance devices adaptively, and to solve long-term issues like
mTHP fragmentation, min-order swapout, etc.

Maybe they can be combined: a cgroup could be limited to the virtual device or
to physical ones depending on priority. Seems all solvable. Just some ideas
here.

Vswap can cover the priority part too, and I think we want to avoid duplicated
interfaces.

So, just imagining things now: would it be good if we had something like this
(following your design)?

$ cat memcg1/memory.swap.priority
Active
/dev/vswap:(zram/zswap? with compression params?) unique:0 prio:5

$ cat memcg2/memory.swap.priority
Active
/dev/vswap:/dev/nvme1 unique:1 prio:5
/dev/vswap:/dev/nvme2 unique:2 prio:10
/dev/vswap:/dev/vda   unique:3 prio:15
/dev/sda              unique:4 prio:20

$ cat memcg3/memory.swap.priority
Active
/dev/vda unique:3 prio:5
/dev/sda unique:4 prio:15

Meaning memcg1 (high priority) is allowed to use compressed memory only
through vswap, memcg2 (mid priority) uses disks through vswap with fallback to
HDD, and memcg3 (low priority) is only allowed to use slow devices.
The global fallback just uses everything the system has. It might be
over-complex, though?

> [... future work, references, shortlog, and diffstat quoted above; trimmed ...]
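[Editor's note: to make the allocation order sketched above concrete — prefer
the cgroup's own priority list, and fall back to the global one only when a
cgroup has no configuration — here is a small standalone sketch. All names
(struct swap_dev, swap_alloc_pick(), etc.) are hypothetical, and a simple
singly linked list sorted by descending priority stands in for the kernel's
plist; this is not code from the series.]

#include <stdio.h>

struct swap_dev {
	struct swap_dev *next;	/* kept sorted by descending priority */
	const char *name;
	int prio;
	long free_slots;
};

struct swap_priority_list {
	struct swap_dev *head;
};

/* Per-cgroup override; a NULL head means "no per-cgroup configuration". */
struct mem_cgroup_swap {
	struct swap_priority_list plist;
};

static struct swap_priority_list global_plist;

/* Walk one priority list and take the first device with free slots. */
static struct swap_dev *pick_from(struct swap_priority_list *pl)
{
	struct swap_dev *dev;

	for (dev = pl->head; dev; dev = dev->next)
		if (dev->free_slots > 0)
			return dev;
	return NULL;
}

/*
 * Folio-aware entry point: use the owning cgroup's ordering when one is
 * configured, otherwise the global ordering, reusing the same walk.
 */
static struct swap_dev *swap_alloc_pick(struct mem_cgroup_swap *memcg_swap)
{
	struct swap_priority_list *pl = &global_plist;

	if (memcg_swap && memcg_swap->plist.head)
		pl = &memcg_swap->plist;
	return pick_from(pl);
}

int main(void)
{
	/* Global order: sdb (prio 10) before sdc (prio 5). */
	struct swap_dev sdc = { NULL, "sdc", 5, 100 };
	struct swap_dev sdb = { &sdc, "sdb", 10, 100 };
	/* One cgroup's view inverts the order, as with `echo "1:5,2:10"`. */
	struct swap_dev cg_sdb = { NULL, "sdb", 5, 100 };
	struct swap_dev cg_sdc = { &cg_sdb, "sdc", 10, 100 };
	struct mem_cgroup_swap configured = { { &cg_sdc } };
	struct mem_cgroup_swap unconfigured = { { NULL } };

	global_plist.head = &sdb;
	printf("unconfigured cgroup -> %s\n",
	       swap_alloc_pick(&unconfigured)->name);	/* sdb (global) */
	printf("configured cgroup   -> %s\n",
	       swap_alloc_pick(&configured)->name);	/* sdc (override) */
	return 0;
}

Running it picks sdb for the unconfigured cgroup and sdc for the configured
one, matching the `echo "1:5,2:10"` example in the cover letter.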
On Thu, Jun 12, 2025 at 08:24:08PM +0800, Kairui Song wrote:
> On Thu, Jun 12, 2025 at 6:38 PM <youngjun.park@lge.com> wrote:
> >
> > From: Youngjun Park <youngjun.park@lge.com>
> >
> > [... full proposal quoted above; trimmed ...]
> > [... design, example usage, and implementation notes quoted above; trimmed ...]
>
> Hi Youngjun,
>
> Interesting idea. For your current approach, I think all we need is
> per-cgroup swap meta info structures (and infrastructure for maintaining
> and manipulating them).
>
> [...]
>
> Maybe they can be combined: a cgroup could be limited to the virtual
> device or to physical ones depending on priority. Seems all solvable.
> Just some ideas here.

I had been thinking about how this work aligns with vswap, so I am glad to
hear that the two can harmonize.

> Vswap can cover the priority part too, and I think we want to avoid
> duplicated interfaces.
>
> So, just imagining things now: would it be good if we had something like
> this (following your design)?
>
> $ cat memcg1/memory.swap.priority
> Active
> /dev/vswap:(zram/zswap? with compression params?) unique:0 prio:5
>
> $ cat memcg2/memory.swap.priority
> Active
> /dev/vswap:/dev/nvme1 unique:1 prio:5
> /dev/vswap:/dev/nvme2 unique:2 prio:10
> /dev/vswap:/dev/vda   unique:3 prio:15
> /dev/sda              unique:4 prio:20
>
> $ cat memcg3/memory.swap.priority
> Active
> /dev/vda unique:3 prio:5
> /dev/sda unique:4 prio:15
>
> Meaning memcg1 (high priority) is allowed to use compressed memory only
> through vswap, memcg2 (mid priority) uses disks through vswap with
> fallback to HDD, and memcg3 (low priority) is only allowed to use slow
> devices.
>
> The global fallback just uses everything the system has. It might be
> over-complex, though?

Just looking at the example usage you mention, it seems flexible and good.
I will think about this further in that direction.
On Thu, Jun 12, 2025 at 5:24 AM Kairui Song <ryncsn@gmail.com> wrote:
> On Thu, Jun 12, 2025 at 6:38 PM <youngjun.park@lge.com> wrote:
> >
> > From: Youngjun Park <youngjun.park@lge.com>
> >
> > [... introduction quoted above; trimmed ...]

We're mostly just using zswap and disk swap for now, so I don't have too much
input on this. Kairui, would this design satisfy your zram use case as well?

> > [... motivation and evaluated alternatives quoted above; trimmed ...]
> > [... design, example usage, and implementation notes quoted above; trimmed ...]
>
> Hi Youngjun,
>
> Interesting idea. For your current approach, I think all we need is
> per-cgroup swap meta info structures (and infrastructure for maintaining
> and manipulating them).

Agreed.

> So we have a global version and a cgroup version of "plist, next cluster
> list, and maybe something else", right? [...]
>
> Meanwhile, with virtual swap things could be even more flexible [...]

Agreed.

> Maybe they can be combined: a cgroup could be limited to the virtual
> device or to physical ones depending on priority. Seems all solvable.
> Just some ideas here.

100%

> Vswap can cover the priority part too, and I think we want to avoid
> duplicated interfaces.

Yeah, as long as we have a reasonable cgroup interface, we can always change
the implementation later; we can move things to virtual swap, etc. at a later
time.

> So, just imagining things now: would it be good if we had something like
> this (following your design)?
>
> $ cat memcg1/memory.swap.priority
> Active
> /dev/vswap:(zram/zswap? with compression params?) unique:0 prio:5
> [... memcg2 and memcg3 example listings quoted above; trimmed ...]
>
> Meaning memcg1 (high priority) is allowed to use compressed memory only
> through vswap, memcg2 (mid priority) uses disks through vswap with
> fallback to HDD, and memcg3 (low priority) is only allowed to use slow
> devices.
>
> The global fallback just uses everything the system has. It might be
> over-complex, though?

Sounds good to me.

> > Future Work
> > ===========
> > These items would benefit from further consideration and potential
> > implementation:
> >
> > - Support for per-process (or other non-cgroup) swap prioritization

This might be too granular.

> > - Optional usage limits per swap device (e.g., ratio, max bytes)
> > - Generalizing the interface beyond cgroups
> >
> > [... references, shortlog, and diffstat quoted above; trimmed ...]