From: Breno Leitao
Date: Wed, 01 Apr 2026 06:57:50 -0700
Subject: [PATCH] mm/vmstat: spread vmstat_update requeue across the stat interval
Message-Id: <20260401-vmstat-v1-1-b68ce4a35055@debian.org>
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kas@kernel.org,
    shakeel.butt@linux.dev, usama.arif@linux.dev, kernel-team@meta.com,
    Breno Leitao

vmstat_update uses round_jiffies_relative() when re-queuing itself, which
aligns all CPUs' timers to the same second boundary. When many CPUs have
pending PCP pages to drain, they all call decay_pcp_high() ->
free_pcppages_bulk() simultaneously, serializing on zone->lock and hitting
contention.

Introduce vmstat_spread_delay() which distributes each CPU's vmstat_update
evenly across the stat interval instead of aligning them.

This does not increase the number of timer interrupts: each CPU still
fires once per interval. The timers are simply staggered rather than
aligned.
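
For illustration only, the spreading arithmetic can be modelled in
userspace. This is a minimal sketch, not part of the patch: spread_delay()
is a hypothetical name, and its interval, cpu and nr_cpus parameters are
assumed stand-ins for sysctl_stat_interval, smp_processor_id() and
num_online_cpus(). The single-CPU round_jiffies_relative() path is left
out. With this formula the returned delay runs from interval (CPU 0) up to
just under 2 * interval (the highest-numbered CPU).

```c
#include <assert.h>

/*
 * Userspace model of the delay arithmetic used by vmstat_spread_delay().
 * All three parameters are hypothetical stand-ins for kernel state:
 *   interval -> sysctl_stat_interval (in jiffies)
 *   cpu      -> smp_processor_id()
 *   nr_cpus  -> num_online_cpus()
 */
static unsigned long spread_delay(unsigned long interval, unsigned int cpu,
				  unsigned int nr_cpus)
{
	/* Guard against division by zero; the kernel never has 0 CPUs. */
	assert(nr_cpus > 0);

	/* Base interval plus this CPU's share of one extra interval. */
	return interval + (interval * (cpu % nr_cpus)) / nr_cpus;
}
```

For example, with interval = 100 jiffies and four CPUs, the model yields
delays of 100, 125, 150 and 175 jiffies for CPUs 0 through 3.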

Additionally, vmstat_work is DEFERRABLE_WORK, so it does not wake idle
CPUs regardless of scheduling; the spread only affects CPUs that are
already active.

With KASAN enabled, `perf lock contention` shows a 7.5x reduction in
zone->lock contentions (872 -> 117) and total wait dropping from 199ms to
81ms on a 72-CPU aarch64 system under memory pressure.

Tested on a 72-CPU aarch64 system using stress-ng --vm to generate
memory allocation bursts. Lock contention was measured with:

  perf lock contention -a -b -S free_pcppages_bulk

Results with KASAN enabled:

  free_pcppages_bulk contention (KASAN):
  +--------------+----------+----------+
  | Metric       | No fix   | With fix |
  +--------------+----------+----------+
  | Contentions  |      872 |      117 |
  | Total wait   | 199.43ms |  80.76ms |
  | Max wait     |   4.19ms |  35.76ms |
  +--------------+----------+----------+

Results without KASAN:

  free_pcppages_bulk contention (no KASAN):
  +--------------+----------+----------+
  | Metric       | No fix   | With fix |
  +--------------+----------+----------+
  | Contentions  |      240 |      133 |
  | Total wait   |  34.01ms |  24.61ms |
  | Max wait     |    965us |   1.35ms |
  +--------------+----------+----------+

Signed-off-by: Breno Leitao
Acked-by: Johannes Weiner
Acked-by: Kiryl Shutsemau (Meta)
Acked-by: Usama Arif
---
 mm/vmstat.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2370c6fb1fcd..2e94bd765606 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2032,6 +2032,29 @@ static int vmstat_refresh(const struct ctl_table *table, int write,
 }
 #endif /* CONFIG_PROC_FS */
 
+/*
+ * Return a per-cpu delay that spreads vmstat_update work across the stat
+ * interval. Without this, round_jiffies_relative() aligns every CPU's
+ * timer to the same second boundary, causing a thundering-herd on
+ * zone->lock when multiple CPUs drain PCP pages simultaneously via
+ * decay_pcp_high() -> free_pcppages_bulk().
+ */
+static unsigned long vmstat_spread_delay(void)
+{
+	unsigned long interval = sysctl_stat_interval;
+	unsigned int nr_cpus = num_online_cpus();
+
+	if (nr_cpus <= 1)
+		return round_jiffies_relative(interval);
+
+	/*
+	 * Spread per-cpu vmstat work evenly across the interval. Don't
+	 * use round_jiffies_relative() here -- it would snap every CPU
+	 * back to the same second boundary, defeating the spread.
+	 */
+	return interval + (interval * (smp_processor_id() % nr_cpus)) / nr_cpus;
+}
+
 static void vmstat_update(struct work_struct *w)
 {
 	if (refresh_cpu_vm_stats(true)) {
@@ -2042,7 +2065,7 @@ static void vmstat_update(struct work_struct *w)
 		 */
 		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
 				this_cpu_ptr(&vmstat_work),
-				round_jiffies_relative(sysctl_stat_interval));
+				vmstat_spread_delay());
 	}
 }
 
---
base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
change-id: 20260401-vmstat-048e0feaf344

Best regards,
-- 
Breno Leitao