From: fujunjie
To: akpm@linux-foundation.org
Cc: vbabka@suse.cz, surenb@google.com, mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, fujunjie
Subject: [PATCH v2] mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
Date: Sat, 15 Nov 2025 03:02:55 +0000
MIME-Version: 1.0
Content-Type:
 text/plain; charset="utf-8"

calculate_totalreserve_pages() currently finds the maximum
lowmem_reserve[j] for a zone by scanning the full forward range
[j = zone_idx .. MAX_NR_ZONES).

However, for a given zone i, the entries lowmem_reserve[j] (j > i) are
expected to form a monotonically non-decreasing sequence in j. This is
not an implementation detail; it follows from the semantics of
lowmem_reserve[].

For zone i, lowmem_reserve[j] expresses how many pages in zone i must
effectively be kept in reserve when deciding whether an allocation class
that may allocate from zones up to j is allowed to fall back into zone i.
It protects less flexible allocation classes (which cannot use higher
zones) from being starved by more flexible ones.

These semantics imply an ordering in j: as j increases, the allocation
class gains access to a strictly larger set of fallback zones. Therefore
lowmem_reserve[j] is expected to be monotonically non-decreasing in j:
more flexible allocation classes must not be allowed to deplete low zones
more aggressively than less flexible ones. In other words, if
lowmem_reserve[j] were ever observed to *decrease* as j grows, that would
contradict the reserve semantics and would likely indicate a semantic
change or a misconfiguration.

The current implementation in setup_per_zone_lowmem_reserve() reflects
this policy: it accumulates the managed pages of the higher zones and
applies the configured ratio, which yields a non-decreasing sequence.

This patch makes calculate_totalreserve_pages() rely on that
monotonicity explicitly. It scans backward from the highest zone index
and stops at the first non-zero entry, skipping any zero entries at the
top of the array; that entry is the maximum. This avoids unnecessary
iteration and reflects the conceptual model more directly.

No functional change.
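To illustrate why the backward scan returns the same value as the old forward scan, here is a minimal userspace sketch. It is not kernel code: SKETCH_NR_ZONES, fill_reserve(), the managed page counts and the ratio are hypothetical, and fill_reserve() is only a simplified model of the accumulation described above (it ignores any special cases the real setup_per_zone_lowmem_reserve() may have).

```c
/* Hypothetical zone count for this sketch; the kernel uses MAX_NR_ZONES. */
#define SKETCH_NR_ZONES 4

/* Old approach: maximum over every j in [i, SKETCH_NR_ZONES). */
static long max_reserve_forward(const long reserve[SKETCH_NR_ZONES], int i)
{
	long max = 0;

	for (int j = i; j < SKETCH_NR_ZONES; j++)
		if (reserve[j] > max)
			max = reserve[j];
	return max;
}

/*
 * New approach: scan from the top and return the first non-zero entry.
 * This equals the maximum only if the non-zero entries never decrease.
 */
static long max_reserve_backward(const long reserve[SKETCH_NR_ZONES], int i)
{
	for (int j = SKETCH_NR_ZONES - 1; j > i; j--)
		if (reserve[j])
			return reserve[j];
	return 0;
}

/*
 * Simplified, hypothetical model of the reserve setup for zone i:
 * reserve[j] = (managed pages of zones i+1..j) / ratio, all zero if
 * ratio is zero. Because the sum only grows, the result never decreases.
 */
static void fill_reserve(long reserve[SKETCH_NR_ZONES],
			 const unsigned long managed[SKETCH_NR_ZONES],
			 int i, unsigned long ratio)
{
	unsigned long sum = 0;

	for (int j = 0; j < SKETCH_NR_ZONES; j++)
		reserve[j] = 0;
	if (!ratio)
		return;
	for (int j = i + 1; j < SKETCH_NR_ZONES; j++) {
		sum += managed[j];
		reserve[j] = (long)(sum / ratio);
	}
}
```

With managed = {4000, 1000000, 3000000, 0}, i = 0 and ratio = 256, fill_reserve() yields {0, 3906, 15625, 15625} and both scans return 15625. For {0, 256, 1024, 0} both return 1024, because the backward scan skips the trailing zero. For a sequence that violates the invariant, such as {0, 1024, 256, 0}, the backward scan returns 256 while the forward scan returns 1024, which is why the assumption needs to be documented.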
To keep this assumption explicit, a comment is added to
setup_per_zone_lowmem_reserve() documenting the monotonicity
expectation and noting that calculate_totalreserve_pages() relies on it.

Changes in v2:
- Reword the semantic explanation of lowmem_reserve[] monotonicity to
  clarify that it arises naturally from its semantics.
- Keep only a minimal reference to the invariant in
  calculate_totalreserve_pages(), with the full documentation placed in
  setup_per_zone_lowmem_reserve().

Signed-off-by: fujunjie
Acked-by: Zi Yan
---
 mm/page_alloc.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23d..d13a81de2203b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6285,10 +6285,21 @@ static void calculate_totalreserve_pages(void)
 		long max = 0;
 		unsigned long managed_pages = zone_managed_pages(zone);
 
-		/* Find valid and maximum lowmem_reserve in the zone */
-		for (j = i; j < MAX_NR_ZONES; j++)
-			max = max(max, zone->lowmem_reserve[j]);
+		/*
+		 * lowmem_reserve[j] is monotonically non-decreasing
+		 * in j for a given zone (see
+		 * setup_per_zone_lowmem_reserve()). The maximum
+		 * valid reserve lives at the highest index with a
+		 * non-zero value, so scan backwards and stop at the
+		 * first hit.
+		 */
+		for (j = MAX_NR_ZONES - 1; j > i; j--) {
+			if (!zone->lowmem_reserve[j])
+				continue;
 
+			max = zone->lowmem_reserve[j];
+			break;
+		}
 		/* we treat the high watermark as reserved pages. */
 		max += high_wmark_pages(zone);
 
@@ -6313,7 +6324,21 @@ static void setup_per_zone_lowmem_reserve(void)
 {
 	struct pglist_data *pgdat;
 	enum zone_type i, j;
-
+	/*
+	 * For a given zone node_zones[i], lowmem_reserve[j] (j > i)
+	 * represents how many pages in zone i must effectively be kept
+	 * in reserve when deciding whether an allocation class that is
+	 * allowed to allocate from zones up to j may fall back into
+	 * zone i.
+	 *
+	 * As j increases, the allocation class can use a strictly larger
+	 * set of fallback zones and therefore must not be allowed to
+	 * deplete low zones more aggressively than a less flexible one.
+	 * As a result, lowmem_reserve[j] is required to be monotonically
+	 * non-decreasing in j for each zone i. Callers such as
+	 * calculate_totalreserve_pages() rely on this monotonicity when
+	 * selecting the maximum reserve entry.
+	 */
 	for_each_online_pgdat(pgdat) {
 		for (i = 0; i < MAX_NR_ZONES - 1; i++) {
 			struct zone *zone = &pgdat->node_zones[i];
-- 
2.34.1
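The comment added to setup_per_zone_lowmem_reserve() states an invariant that callers depend on. As a hypothetical illustration only (the patch adds no such check), a userspace helper along these lines could test it. reserve_nonzero_monotonic() and SKETCH_NR_ZONES are made-up names. The helper checks a sufficient condition for the backward scan to return the maximum: the non-zero entries above zone i never decrease. Zero entries are skipped because the backward scan skips them as well.

```c
#include <stdbool.h>

/* Hypothetical zone count for this sketch; stands in for MAX_NR_ZONES. */
#define SKETCH_NR_ZONES 4

/*
 * Hypothetical check: return true if the non-zero entries of
 * reserve[i + 1 .. SKETCH_NR_ZONES - 1] are non-decreasing in j.
 */
static bool reserve_nonzero_monotonic(const long reserve[SKETCH_NR_ZONES], int i)
{
	long prev = 0;

	for (int j = i + 1; j < SKETCH_NR_ZONES; j++) {
		if (!reserve[j])
			continue;	/* ignored by the backward scan too */
		if (reserve[j] < prev)
			return false;	/* a later, more flexible class reserves less */
		prev = reserve[j];
	}
	return true;
}
```

For example, {0, 3906, 15625, 15625} and {0, 256, 1024, 0} pass for i = 0, while {0, 1024, 256, 0} fails, matching the case in which a backward scan would return the wrong maximum.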