RFC [PATCH1] mm: vmscan: Limit isolate_lru_folios() IRQ holdoff latency impact

Mike Galbraith posted 1 patch 3 weeks, 3 days ago
mm/vmscan.c |   19 +++++++++++++++++++
1 file changed, 19 insertions(+)
(was regression: mm: vmscan: - size XL irqoff time increase v6.10+)


While investigating a service latency regression beginning at v6.10, it was
discovered that isolate_lru_folios() runtimes could cause IRQ holdoffs
spanning several ticks.  The worst measured was 78ms.

Given it's called with IRQs disabled, cap its runtime to around a millisecond.

With this patch, plus one mitigating move_expired_inodes() induced
wakeup latency hits of up to 66ms, the load that repeatably produced this:

(6.12.0-rc5 running osc kernel package build + ram sized bonnie)
T: 0 ( 4541) P:99 I:1000 C: 773124 Min:      1 Act:    6 Avg:    3 Max:   53381
T: 1 ( 4542) P:99 I:1500 C: 515381 Min:      1 Act:    7 Avg:    3 Max:   60119
T: 2 ( 4543) P:99 I:2000 C: 386465 Min:      1 Act:    5 Avg:    4 Max:   67469
T: 3 ( 4544) P:99 I:2500 C: 309182 Min:      1 Act:    7 Avg:    4 Max:   65534
T: 4 ( 4545) P:99 I:3000 C: 257682 Min:      1 Act:    6 Avg:    4 Max:   57776
T: 5 ( 4546) P:99 I:3500 C: 220874 Min:      1 Act:    4 Avg:    4 Max:   64737
T: 6 ( 4547) P:99 I:4000 C: 193289 Min:      1 Act:    6 Avg:    4 Max:   52567
T: 7 ( 4548) P:99 I:4500 C: 171799 Min:      1 Act:    6 Avg:    5 Max:   52606

latency went to this:

T: 0 ( 6078) P:99 I:1000 C:1031230 Min:      1 Act:    7 Avg:    4 Max:    4449
T: 1 ( 6079) P:99 I:1500 C: 687505 Min:      1 Act:    5 Avg:    4 Max:    2075
T: 2 ( 6080) P:99 I:2000 C: 515627 Min:      1 Act:  334 Avg:    5 Max:    2987
T: 3 ( 6081) P:99 I:2500 C: 412503 Min:      1 Act:    8 Avg:    5 Max:    2425
T: 4 ( 6082) P:99 I:3000 C: 343752 Min:      1 Act:   11 Avg:    5 Max:    1466
T: 5 ( 6083) P:99 I:3500 C: 294643 Min:      1 Act:  282 Avg:    5 Max:    3844
T: 6 ( 6084) P:99 I:4000 C: 257813 Min:      1 Act:    7 Avg:    5 Max:    3291
T: 7 ( 6085) P:99 I:4500 C: 229167 Min:      1 Act:    9 Avg:    5 Max:    3516

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/vmscan.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -63,6 +63,9 @@
 #include <linux/swapops.h>
 #include <linux/balloon_compaction.h>
 #include <linux/sched/sysctl.h>
+#ifndef CONFIG_PREEMPT_RT
+#include <linux/sched/clock.h>
+#endif

 #include "internal.h"
 #include "swap.h"
@@ -1657,6 +1660,10 @@ static unsigned long isolate_lru_folios(
 	unsigned long skipped = 0;
 	unsigned long scan, total_scan, nr_pages;
 	LIST_HEAD(folios_skipped);
+#ifndef CONFIG_PREEMPT_RT
+	u64 then = local_clock();
+	int batch = 0;
+#endif

 	total_scan = 0;
 	scan = 0;
@@ -1709,6 +1716,18 @@ static unsigned long isolate_lru_folios(
 		move_to = dst;
 move:
 		list_move(&folio->lru, move_to);
+
+#ifndef CONFIG_PREEMPT_RT
+		/*
+		 * IRQs are disabled, cap holdoff at a millisecond or so.
+		 */
+		batch += nr_pages;
+		if (batch < 1024)
+			continue;
+		if (local_clock() - then >= NSEC_PER_MSEC)
+			break;
+		batch = 0;
+#endif
 	}

 	/*

Below is an irqsoff trace capturing a 77.8ms IRQ-disabled span in
isolate_lru_folios():

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 6.12.0.g4236f913-master
# --------------------------------------------------------------------
# latency: 77766 us, #12/12, CPU#5 | (M:unknown VP:0, KP:0, SP:0 HP:0 #P:8)
#    -----------------
#    | task: kswapd0-90 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: _raw_spin_lock_irq+0x16/0x40
#  => ended at:   _raw_spin_unlock_irq+0x10/0x30
#
#
#                    _------=> CPU#            
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched    
#                  || / _---=> hardirq/softirq 
#                  ||| / _--=> preempt-depth   
#                  |||| / _-=> migrate-disable 
#                  ||||| /     delay           
#  cmd     pid     |||||| time  |   caller     
#     \   /        ||||||  \    |    /       
 kswapd0-90        5d....    0us : _raw_spin_lock_irq+0x16/0x40
 kswapd0-90        5d....    0us : preempt_count_add+0x0/0x50 <-_raw_spin_lock_irq+0x20/0x40
 kswapd0-90        5d..1.    0us*: isolate_lru_folios+0x0/0x3e0 <-shrink_inactive_list+0x10f/0x440
 kswapd0-90        5d..1. 77765us : __rcu_read_lock+0x0/0x20 <-isolate_lru_folios+0x160/0x3e0
 kswapd0-90        5d..1. 77765us : __rcu_read_unlock+0x0/0x180 <-isolate_lru_folios+0x199/0x3e0
 kswapd0-90        5d..1. 77766us : __mod_node_page_state+0x0/0x70 <-isolate_lru_folios+0x2ee/0x3e0
 kswapd0-90        5d..1. 77766us : __mod_zone_page_state+0x0/0x60 <-isolate_lru_folios+0x2fc/0x3e0
 kswapd0-90        5d..1. 77766us : __mod_node_page_state+0x0/0x70 <-shrink_inactive_list+0x12a/0x440
 kswapd0-90        5d..1. 77766us : _raw_spin_unlock_irq+0x0/0x30 <-shrink_inactive_list+0x16a/0x440
 kswapd0-90        5d..1. 77766us : _raw_spin_unlock_irq+0x10/0x30
 kswapd0-90        5d..1. 77767us : tracer_hardirqs_on+0xcb/0xe0 <-_raw_spin_unlock_irq+0x10/0x30
 kswapd0-90        5d..1. 77774us : <stack trace>
 => shrink_inactive_list+0x16a/0x440
 => shrink_lruvec+0x2fc/0x470
 => shrink_node+0x1d6/0x4a0
 => balance_pgdat+0x280/0x5c0
 => kswapd+0x10f/0x350
 => kthread+0xfa/0x130
 => ret_from_fork+0x2d/0x50
 => ret_from_fork_asm+0x11/0x20