From nobody Fri Dec 19 07:26:14 2025
From: Joshua Hahn
To: Andrew Morton
Cc: Chris Mason, Kiryl Shutsemau, "Liam R. Howlett", Brendan Jackman,
	David Hildenbrand, Johannes Weiner, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka, Zi Yan,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-team@meta.com
Subject: [PATCH v5 1/3] mm/page_alloc/vmstat: Simplify refresh_cpu_vm_stats change detection
Date: Tue, 14 Oct 2025 07:50:08 -0700
Message-ID: <20251014145011.3427205-2-joshua.hahnjy@gmail.com>
In-Reply-To: <20251014145011.3427205-1-joshua.hahnjy@gmail.com>
References: <20251014145011.3427205-1-joshua.hahnjy@gmail.com>

Currently, refresh_cpu_vm_stats returns an int, indicating how many
changes were made during its updates. Using this information, callers
like vmstat_update can heuristically determine if more work will be
done in the future.

However, all of refresh_cpu_vm_stats's callers either (a) ignore the
result, only caring about performing the updates, or (b) only care
about whether changes were made, but not *how many* changes were made.

Simplify the code by returning a bool instead to indicate if updates
were made. In addition, simplify fold_diff and decay_pcp_high to return
a bool for the same reason.

Reviewed-by: Vlastimil Babka
Reviewed-by: SeongJae Park
Signed-off-by: Joshua Hahn
---
 include/linux/gfp.h |  2 +-
 mm/page_alloc.c     |  8 ++++----
 mm/vmstat.c         | 28 +++++++++++++++-------------
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0ceb4e09306c..f46b066c7661 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -386,7 +386,7 @@ extern void free_pages(unsigned long addr, unsigned int order);
 #define free_page(addr) free_pages((addr), 0)
 
 void page_alloc_init_cpuhp(void);
-int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
+bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..bbc3282fdffc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2557,10 +2557,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
  * Called from the vmstat counter updater to decay the PCP high.
  * Return whether there are addition works to do.
  */
-int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
+bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 {
 	int high_min, to_drain, batch;
-	int todo = 0;
+	bool todo = false;
 
 	high_min = READ_ONCE(pcp->high_min);
 	batch = READ_ONCE(pcp->batch);
@@ -2573,7 +2573,7 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 		pcp->high = max3(pcp->count - (batch << CONFIG_PCP_BATCH_SCALE_MAX),
 				 pcp->high - (pcp->high >> 3), high_min);
 		if (pcp->high > high_min)
-			todo++;
+			todo = true;
 	}
 
 	to_drain = pcp->count - pcp->high;
@@ -2581,7 +2581,7 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 		spin_lock(&pcp->lock);
 		free_pcppages_bulk(zone, to_drain, pcp, 0);
 		spin_unlock(&pcp->lock);
-		todo++;
+		todo = true;
 	}
 
 	return todo;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index bb09c032eecf..98855f31294d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -771,25 +771,25 @@ EXPORT_SYMBOL(dec_node_page_state);
 
 /*
  * Fold a differential into the global counters.
- * Returns the number of counters updated.
+ * Returns whether counters were updated.
  */
 static int fold_diff(int *zone_diff, int *node_diff)
 {
 	int i;
-	int changes = 0;
+	bool changed = false;
 
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		if (zone_diff[i]) {
 			atomic_long_add(zone_diff[i], &vm_zone_stat[i]);
-			changes++;
+			changed = true;
 	}
 
 	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
 		if (node_diff[i]) {
 			atomic_long_add(node_diff[i], &vm_node_stat[i]);
-			changes++;
+			changed = true;
 	}
-	return changes;
+	return changed;
 }
 
 /*
@@ -806,16 +806,16 @@ static int fold_diff(int *zone_diff, int *node_diff)
  * with the global counters. These could cause remote node cache line
  * bouncing and will have to be only done when necessary.
  *
- * The function returns the number of global counters updated.
+ * The function returns whether global counters were updated.
 */
-static int refresh_cpu_vm_stats(bool do_pagesets)
+static bool refresh_cpu_vm_stats(bool do_pagesets)
 {
 	struct pglist_data *pgdat;
 	struct zone *zone;
 	int i;
 	int global_zone_diff[NR_VM_ZONE_STAT_ITEMS] = { 0, };
 	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
-	int changes = 0;
+	bool changed = false;
 
 	for_each_populated_zone(zone) {
 		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
@@ -839,7 +839,8 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 		if (do_pagesets) {
 			cond_resched();
 
-			changes += decay_pcp_high(zone, this_cpu_ptr(pcp));
+			if (decay_pcp_high(zone, this_cpu_ptr(pcp)))
+				changed = true;
 #ifdef CONFIG_NUMA
 			/*
 			 * Deal with draining the remote pageset of this
@@ -861,13 +862,13 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 			}
 
 			if (__this_cpu_dec_return(pcp->expire)) {
-				changes++;
+				changed = true;
 				continue;
 			}
 
 			if (__this_cpu_read(pcp->count)) {
 				drain_zone_pages(zone, this_cpu_ptr(pcp));
-				changes++;
+				changed = true;
 			}
 #endif
 		}
@@ -887,8 +888,9 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 		}
 	}
 
-	changes += fold_diff(global_zone_diff, global_node_diff);
-	return changes;
+	if (fold_diff(global_zone_diff, global_node_diff))
+		changed = true;
+	return changed;
 }
 
 /*
-- 
2.47.3
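The refactor above leans on the fact that refresh_cpu_vm_stats's callers
only branch on whether anything changed at all. The following is a small
standalone sketch of that calling pattern, not kernel code: refresh_stats
is a made-up stand-in for refresh_cpu_vm_stats and the two puts() calls
stand in for the vmstat worker deciding whether to stay scheduled; it only
illustrates that a bool return carries all the information the caller uses.

/* build: cc -o vmstat_model vmstat_model.c */
#include <stdbool.h>
#include <stdio.h>

static int pending[4] = { 3, 0, 1, 0 };	/* fake per-cpu differentials */

static bool refresh_stats(void)
{
	bool changed = false;

	for (int i = 0; i < 4; i++) {
		if (pending[i]) {
			pending[i] = 0;	/* "fold" into the global counters */
			changed = true;	/* how many no longer matters */
		}
	}
	return changed;
}

int main(void)
{
	if (refresh_stats())	/* caller only tests truthiness */
		puts("work was done, keep the worker scheduled");
	else
		puts("counters quiescent, let the worker go idle");
	return 0;
}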
From nobody Fri Dec 19 07:26:14 2025
From: Joshua Hahn
To: Andrew Morton
Cc: Chris Mason, Kiryl Shutsemau, Brendan Jackman, Johannes Weiner,
	Michal Hocko, Suren Baghdasaryan, Vlastimil Babka, Zi Yan,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-team@meta.com
Subject: [PATCH v5 2/3] mm/page_alloc: Batch page freeing in decay_pcp_high
Date: Tue, 14 Oct 2025 07:50:09 -0700
Message-ID: <20251014145011.3427205-3-joshua.hahnjy@gmail.com>
In-Reply-To: <20251014145011.3427205-1-joshua.hahnjy@gmail.com>
References: <20251014145011.3427205-1-joshua.hahnjy@gmail.com>

It is possible for pcp->count - pcp->high to exceed pcp->batch by a lot.
When this happens, we should perform batching to ensure that
free_pcppages_bulk isn't called with too many pages to free at once,
starving out other threads that need the pcp or zone lock.

Since we are still only freeing the difference between the initial
pcp->count and pcp->high values, there should be no change to how many
pages are freed.
Suggested-by: Chris Mason
Suggested-by: Andrew Morton
Co-developed-by: Johannes Weiner
Reviewed-by: Vlastimil Babka
Signed-off-by: Joshua Hahn
---
 mm/page_alloc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bbc3282fdffc..8ecd48be8bdd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2559,7 +2559,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
  */
 bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 {
-	int high_min, to_drain, batch;
+	int high_min, to_drain, to_drain_batched, batch;
 	bool todo = false;
 
 	high_min = READ_ONCE(pcp->high_min);
@@ -2577,11 +2577,14 @@ bool decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
 	}
 
 	to_drain = pcp->count - pcp->high;
-	if (to_drain > 0) {
+	while (to_drain > 0) {
+		to_drain_batched = min(to_drain, batch);
 		spin_lock(&pcp->lock);
-		free_pcppages_bulk(zone, to_drain, pcp, 0);
+		free_pcppages_bulk(zone, to_drain_batched, pcp, 0);
 		spin_unlock(&pcp->lock);
 		todo = true;
+
+		to_drain -= to_drain_batched;
 	}
 
 	return todo;
-- 
2.47.3
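A minimal userspace model of the batching the patch above introduces, not
kernel code: a plain pthread mutex stands in for the pcp spinlock and a
counter decrement stands in for free_pcppages_bulk(). It shows that
draining in min(to_drain, batch) chunks frees the same total number of
pages while bounding how long the lock is held per iteration; other
threads can take the lock between chunks. All names and sizes here are
illustrative only.

/* build: cc -pthread -o decay_model decay_model.c */
#include <stdio.h>
#include <pthread.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

static pthread_mutex_t pcp_lock = PTHREAD_MUTEX_INITIALIZER;

static int count = 2500;	/* pcp->count */
static int high  = 100;		/* pcp->high  */
static int batch = 63;		/* pcp->batch */

static void free_bulk(int n)
{
	count -= n;		/* stand-in for free_pcppages_bulk() */
}

int main(void)
{
	int to_drain = count - high;
	int freed = 0;

	while (to_drain > 0) {
		int to_drain_batched = MIN(to_drain, batch);

		pthread_mutex_lock(&pcp_lock);
		free_bulk(to_drain_batched);
		pthread_mutex_unlock(&pcp_lock);	/* others may run here */

		freed += to_drain_batched;
		to_drain -= to_drain_batched;
	}
	printf("freed %d pages in chunks of at most %d, count now %d\n",
	       freed, batch, count);
	return 0;
}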
From nobody Fri Dec 19 07:26:14 2025
From: Joshua Hahn
To: Andrew Morton
Cc: Chris Mason, Kiryl Shutsemau, Brendan Jackman, Johannes Weiner,
	Michal Hocko, Suren Baghdasaryan, Vlastimil Babka, Zi Yan,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-team@meta.com
Subject: [PATCH v5 3/3] mm/page_alloc: Batch page freeing in free_frozen_page_commit
Date: Tue, 14 Oct 2025 07:50:10 -0700
Message-ID: <20251014145011.3427205-4-joshua.hahnjy@gmail.com>
In-Reply-To: <20251014145011.3427205-1-joshua.hahnjy@gmail.com>
References: <20251014145011.3427205-1-joshua.hahnjy@gmail.com>

Before returning, free_frozen_page_commit calls free_pcppages_bulk using
nr_pcp_free to determine how many pages can appropriately be freed, based
on the tunable parameters stored in pcp. While this number is an accurate
representation of how many pages should be freed in total, it is not an
appropriate number of pages to free at once with free_pcppages_bulk: we
have seen the value consistently go above 2000 in the Meta fleet on
larger machines.

As such, perform the page freeing in free_frozen_page_commit in batches
of pcp->batch pages. To ensure that other processes are not starved of
the zone lock, release both the zone lock and the pcp lock between
batches to yield to other threads.
Note that because free_frozen_page_commit now re-takes the pcp spinlock
inside the function (and that trylock can fail), the function may now
return with an unlocked pcp. To handle this, return true if the pcp is
locked on exit and false otherwise.

In addition, since free_frozen_page_commit must now be aware of what UP
flags were stored at the time of the spin lock, and because we must be
able to report new UP flags to the callers, add a new unsigned long*
parameter UP_flags to keep track of this.

The following are a few synthetic benchmarks, made on three machines. The
first is a large machine with 754GiB memory and 316 processors. The
second is a relatively smaller machine with 251GiB memory and 176
processors. The third and final is the smallest of the three, which has
62GiB memory and 36 processors.

On all machines, I kick off a kernel build with -j$(nproc). Negative
delta is better (faster compilation).

Large machine (754GiB memory, 316 processors) make -j$(nproc)
+------------+---------------+-----------+
| Metric (s) | Variation (%) | Delta (%) |
+------------+---------------+-----------+
| real       | 0.8070        | - 1.4865  |
| user       | 0.2823        | + 0.4081  |
| sys        | 5.0267        | -11.8737  |
+------------+---------------+-----------+

Medium machine (251GiB memory, 176 processors) make -j$(nproc)
+------------+---------------+-----------+
| Metric (s) | Variation (%) | Delta (%) |
+------------+---------------+-----------+
| real       | 0.2806        | +0.0351   |
| user       | 0.0994        | +0.3170   |
| sys        | 0.6229        | -0.6277   |
+------------+---------------+-----------+

Small machine (62GiB memory, 36 processors) make -j$(nproc)
+------------+---------------+-----------+
| Metric (s) | Variation (%) | Delta (%) |
+------------+---------------+-----------+
| real       | 0.1503        | -2.6585   |
| user       | 0.0431        | -2.2984   |
| sys        | 0.1870        | -3.2013   |
+------------+---------------+-----------+

Here, variation is the coefficient of variation, i.e. standard
deviation / mean.

Suggested-by: Chris Mason
Co-developed-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/page_alloc.c | 65 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ecd48be8bdd..6d544521e49c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2818,12 +2818,22 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 	return high;
 }
 
-static void free_frozen_page_commit(struct zone *zone,
+/*
+ * Tune pcp alloc factor and adjust count & free_count. Free pages to bring the
+ * pcp's watermarks below high.
+ *
+ * May return a freed pcp, if during page freeing the pcp spinlock cannot be
+ * reacquired. Return true if pcp is locked, false otherwise.
+ */
+static bool free_frozen_page_commit(struct zone *zone,
 		struct per_cpu_pages *pcp, struct page *page, int migratetype,
-		unsigned int order, fpi_t fpi_flags)
+		unsigned int order, fpi_t fpi_flags, unsigned long *UP_flags)
 {
 	int high, batch;
+	int to_free, to_free_batched;
 	int pindex;
+	int cpu = smp_processor_id();
+	int ret = true;
 	bool free_high = false;
 
 	/*
@@ -2861,15 +2871,46 @@ static void free_frozen_page_commit(struct zone *zone,
 		 * Do not attempt to take a zone lock. Let pcp->count get
 		 * over high mark temporarily.
 		 */
-		return;
+		return true;
 	}
 
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count < high)
-		return;
+		return true;
+
+	to_free = nr_pcp_free(pcp, batch, high, free_high);
+	while (to_free > 0 && pcp->count > 0) {
+		to_free_batched = min(to_free, batch);
+		free_pcppages_bulk(zone, to_free_batched, pcp, pindex);
+		to_free -= to_free_batched;
+
+		if (to_free <= 0 || pcp->count <= 0)
+			break;
+
+		pcp_spin_unlock(pcp);
+		pcp_trylock_finish(*UP_flags);
+
+		pcp_trylock_prepare(*UP_flags);
+		pcp = pcp_spin_trylock(zone->per_cpu_pageset);
+		if (!pcp) {
+			pcp_trylock_finish(*UP_flags);
+			ret = false;
+			break;
+		}
+
+		/*
+		 * Check if this thread has been migrated to a different CPU.
+		 * If that is the case, give up and indicate that the pcp is
+		 * returned in an unlocked state.
+		 */
+		if (smp_processor_id() != cpu) {
+			pcp_spin_unlock(pcp);
+			pcp_trylock_finish(*UP_flags);
+			ret = false;
+			break;
+		}
+	}
 
-	free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
-			   pcp, pindex);
 	if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
 	    zone_watermark_ok(zone, 0, high_wmark_pages(zone),
 			      ZONE_MOVABLE, 0)) {
@@ -2887,6 +2928,7 @@ static void free_frozen_page_commit(struct zone *zone,
 		    next_memory_node(pgdat->node_id) < MAX_NUMNODES)
 			atomic_set(&pgdat->kswapd_failures, 0);
 	}
+	return ret;
 }
 
 /*
@@ -2934,7 +2976,9 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
+		if (!free_frozen_page_commit(zone, pcp, page, migratetype,
+					     order, fpi_flags, &UP_flags))
+			return;
 		pcp_spin_unlock(pcp);
 	} else {
 		free_one_page(zone, page, pfn, order, fpi_flags);
@@ -3034,8 +3078,11 @@ void free_unref_folios(struct folio_batch *folios)
 			migratetype = MIGRATE_MOVABLE;
 
 		trace_mm_page_free_batched(&folio->page);
-		free_frozen_page_commit(zone, pcp, &folio->page, migratetype,
-					order, FPI_NONE);
+		if (!free_frozen_page_commit(zone, pcp, &folio->page,
+				migratetype, order, FPI_NONE, &UP_flags)) {
+			pcp = NULL;
+			locked_zone = NULL;
+		}
 	}
 
 	if (pcp) {
-- 
2.47.3
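A minimal userspace model of the locking contract described in the patch
above, not kernel code: between batches the lock is dropped and re-taken
with a trylock, and the function reports whether the lock is still held
on return so the caller knows whether it may unlock. The CPU-migration
check has no userspace equivalent and is omitted; names and sizes are
illustrative only.

/* build: cc -pthread -o commit_model commit_model.c */
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

static pthread_mutex_t pcp_lock = PTHREAD_MUTEX_INITIALIZER;
static int count = 1000, high = 100, batch = 63;

/* Caller holds pcp_lock on entry; returns true iff it is still held. */
static bool commit_free(void)
{
	int to_free = count - high;

	while (to_free > 0 && count > 0) {
		int chunk = MIN(to_free, batch);

		count -= chunk;		/* stand-in for free_pcppages_bulk() */
		to_free -= chunk;
		if (to_free <= 0 || count <= 0)
			break;

		pthread_mutex_unlock(&pcp_lock);	/* yield between batches */
		if (pthread_mutex_trylock(&pcp_lock))	/* nonzero: not re-taken */
			return false;			/* returned unlocked */
	}
	return true;
}

int main(void)
{
	pthread_mutex_lock(&pcp_lock);
	if (commit_free())		/* mirrors the __free_frozen_pages() caller */
		pthread_mutex_unlock(&pcp_lock);
	printf("count now %d\n", count);
	return 0;
}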