Atomic allocations (GFP_ATOMIC), particularly in network interrupt
contexts, are prone to failure during bursts of traffic if the
pre-configured min_free_kbytes (atomic reserve) is insufficient. These
failures lead to packet drops and performance degradation.

Static tuning of vm.min_free_kbytes is often challenging: setting it
too low risks drops, while setting it too high wastes valuable memory.

This patch series introduces a reactive mechanism that:

1. Detects critical order-0 GFP_ATOMIC allocation failures.
2. Automatically doubles vm.min_free_kbytes to reserve more memory for
   future bursts.
3. Enforces a safety cap (1% of total RAM) to prevent OOM or excessive
   waste.

This allows the system to self-adjust to the workload's specific atomic
memory requirements without manual intervention.

wujing (1):
  mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure

 mm/page_alloc.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

--
2.39.5
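For readers following along, a minimal sketch of what the v1 mechanism
described above could look like. This is illustrative only, not the
posted 33-line patch: min_free_kbytes, totalram_pages(), min() and
setup_per_zone_wmarks() are existing kernel interfaces, but the function
name and its call site (the order-0 GFP_ATOMIC failure path) are
assumptions. A real implementation would likely defer the watermark
rebuild out of the interrupt path, e.g. via a workqueue, since the
failure itself can occur in hard-IRQ context.

/*
 * Sketch only: grow the atomic reserve after an order-0 GFP_ATOMIC
 * failure.  Would be invoked from the allocation slow path (names and
 * call site are assumptions).
 */
static void grow_min_free_kbytes_on_atomic_failure(void)
{
	/* Safety cap: 1% of total RAM, expressed in kilobytes. */
	unsigned long cap = (totalram_pages() << (PAGE_SHIFT - 10)) / 100;
	unsigned long new_min = min(2UL * min_free_kbytes, cap);

	if (new_min <= min_free_kbytes)
		return;		/* already at (or above) the cap */

	min_free_kbytes = new_min;
	/* Recompute the per-zone watermarks from the enlarged reserve. */
	setup_per_zone_wmarks();
}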
This is v3 of the auto-tuning patch, addressing feedback from Vlastimil
Babka, Andrew Morton, and Matthew Wilcox.

Major shift in v3:

Following Vlastimil's suggestion, this version abandons the direct
modification of min_free_kbytes. Instead, it leverages the existing
watermark_boost infrastructure. This approach is more idiomatic as it:

- Avoids conflicts with administrative sysctl settings.
- Only affects specific zones experiencing pressure.
- Utilizes standard kswapd logic for natural decay after reclamation.

Responses to Vlastimil Babka's feedback:

> "Were they really packet drops observed? AFAIK the receive is deferred
> to non-irq context if those atomic allocations fail, it shouldn't mean
> a drop."

In our high-concurrency production environment, we observed that while
the network stack tries to defer processing, persistent GFP_ATOMIC
failures eventually lead to NIC-level drops due to RX buffer exhaustion.

> "As for the implementation I'd rather not be changing min_free_kbytes
> directly... We already have watermark_boost to dynamically change
> watermarks"

Agreed and implemented in v3.

Changes in v3:
- Replaced min_free_kbytes modification with watermark_boost calls.
- Removed all complex decay/persistence logic from v2, relying on
  kswapd's standard behavior.
- Maintained the 10-second debounce mechanism.
- Engaged the netdev@ community as requested by Andrew Morton.

Thanks for the thoughtful reviews!

wujing (1):
  mm/page_alloc: auto-tune watermarks on atomic allocation failure

 mm/page_alloc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

--
2.39.5
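To make the v3 shape concrete, here is a hedged sketch of how a
watermark_boost-based hook could look. boost_watermark(),
wakeup_kswapd() and the zonelist iterator are existing mm internals,
but their exact signatures vary between kernel versions, and the hook
name, the static timestamp and the call site are assumptions for
illustration, not the posted 20-line patch.

/*
 * Sketch of the v3 approach: on an order-0 GFP_ATOMIC failure, boost
 * the watermarks of the zones the allocation could have used and wake
 * kswapd; watermark_boost then decays through the normal kswapd path.
 */
static unsigned long last_atomic_boost_jiffies;

static void boost_watermarks_on_atomic_failure(struct alloc_context *ac,
					       gfp_t gfp_mask,
					       unsigned int order)
{
	struct zoneref *z;
	struct zone *zone;

	/* Debounce: boost at most once every 10 seconds. */
	if (time_before(jiffies, last_atomic_boost_jiffies + 10 * HZ))
		return;
	last_atomic_boost_jiffies = jiffies;

	for_each_zone_zonelist(zone, z, ac->zonelist, ac->highest_zoneidx) {
		if (boost_watermark(zone))
			/*
			 * kswapd reclaims up to the boosted watermark and
			 * then lets watermark_boost decay as usual.
			 */
			wakeup_kswapd(zone, gfp_mask, order,
				      ac->highest_zoneidx);
	}
}

The appeal of this shape, as noted above, is that no new state needs to
decay by hand: once kswapd has reclaimed past the boosted watermark, the
existing boost-decay logic restores the original thresholds.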
This is v2 of the auto-tuning patch, addressing feedback from Andrew
Morton and Matthew Wilcox.

## Responses to Andrew Morton's feedback:

> "But no attempt to reduce it again after the load spike has gone away."

v2 implements a decay mechanism: min_free_kbytes automatically reduces
by 5% every 5 minutes after being increased. However, it stops at 1.2x
the initial value rather than returning to baseline, ensuring the system
"remembers" previous pressure patterns.

> "Probably this should be selectable and tunable via a kernel boot
> parameter or a procfs tunable."

Per Matthew Wilcox's preference to avoid new tunables, v2 implements an
algorithm designed to work automatically without configuration. The
parameters (50% increase, 5% decay, 10s debounce) are chosen to be
responsive yet stable.

> "Can I suggest that you engage with [the networking people]? netdev@"

Done - netdev@ is now CC'd on this v2 submission.

## Responses to Matthew Wilcox's feedback:

> "Is doubling too aggressive? Would an increase of, say, 10% or 20% be
> more appropriate?"

v2 uses a 50% increase as a compromise between responsiveness and
conservatism; 20% felt too slow for burst-traffic scenarios based on our
observations.

> "Do we have to wait for failure before increasing? Could we schedule
> the increase for when we get to within, say, 10% of the current limit?"

We considered proactive monitoring but concluded it would add overhead
and complexity. The debounce mechanism (10s) ensures we don't thrash
while still being reactive.

> "Hm, how would we do that? Automatically decay by 5%, 300 seconds after
> increasing; then schedule another decay for 300 seconds after that..."

Exactly as you suggested - v2 implements this decay chain. The only
addition is stopping at 1.2x baseline to preserve learning.

> "Ugh, please, no new tunables. Let's just implement an algorithm that
> works."

Agreed - v2 has zero new tunables.

## Changes in v2:

- Reduced aggressiveness: +50% increase instead of doubling
- Added debounce: only trigger once per 10 seconds to prevent storms
- Added decay: automatically reduce by 5% every 5 minutes
- Preserved learning: decay stops at 1.2x the initial value, not baseline
- Engaged the networking community (netdev@)

Thanks for the thoughtful reviews!

wujing (1):
  mm/page_alloc: auto-tune min_free_kbytes on atomic allocation failure

 mm/page_alloc.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

--
2.39.5
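For reference, a hedged sketch of the v2 decay chain discussed above.
Delayed work, min_free_kbytes and setup_per_zone_wmarks() are existing
kernel facilities; the function and variable names, and the exact 1.2x
floor bookkeeping, are assumptions for illustration rather than the
posted 85-line patch. The +50% bump path (triggered by an atomic
failure, debounced to once per 10 seconds) would record the baseline
and kick off the first decay step.

/*
 * Sketch of the v2 decay chain: -5% every 5 minutes, stopping at 1.2x
 * the recorded baseline so that learned headroom is preserved.
 */
static unsigned long baseline_min_free_kbytes;	/* snapshot before first bump */

static void min_free_decay_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(min_free_decay_work, min_free_decay_fn);

static void min_free_decay_fn(struct work_struct *work)
{
	/* Never decay below 1.2x the baseline. */
	unsigned long floor = baseline_min_free_kbytes +
			      baseline_min_free_kbytes / 5;
	unsigned long next = min_free_kbytes - min_free_kbytes / 20; /* -5% */

	if (next <= floor) {
		min_free_kbytes = floor;
		setup_per_zone_wmarks();
		return;			/* decay chain ends at the floor */
	}

	min_free_kbytes = next;
	setup_per_zone_wmarks();
	/* Re-arm: another 5% step in 5 minutes. */
	schedule_delayed_work(&min_free_decay_work, 5 * 60 * HZ);
}

The bump path would then do something like
schedule_delayed_work(&min_free_decay_work, 5 * 60 * HZ) after raising
min_free_kbytes, so each increase starts its own decay chain.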