From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka,
 Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/7 v2] mm/page_counter: introduce per-page_counter stock
Date: Fri, 22 May 2026 15:06:19 -0700
Message-ID: <20260522220627.1150804-2-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

To avoid expensive hierarchy walks on every memcg charge and limit
check, memcontrol uses per-cpu stocks (memcg_stock_pcp) to cache
pre-charged pages, which gives try_charge_memcg a fast path. However,
there are a few quirks with the current implementation that could be
improved upon.

First, each memcg_stock_pcp can only cache the charges of 7 memcgs
(defined as NR_MEMCG_STOCK). Once a CPU starts handling the charging of
more than 7 memcgs, it randomly selects a victim memcg to evict and
drain from the CPU. This can cause unnecessarily increased latencies and
thrashing as memcgs continually evict each other's stock.

Second, stock is tightly coupled with memcg, which means that all page
counters in a memcg share the same resource. This may simplify some of
the charging logic, but it prevents new page counters from being added
and using a separate stock.

We can address these concerns by pushing the concept of stock down to
the page_counter level. This addresses the random eviction problem by
getting rid of the 7-slot limit, and makes it simpler to enable separate
stock caches for other page_counters.

Introduce a generic per-cpu stock directly in struct page_counter.
Stock can optionally be enabled per-page_counter, limiting the overhead
increase for page_counters that do not benefit greatly from caching
charges.

This patch introduces the page_counter_stock struct and its
enable/disable/free functions, but does not use these yet.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/page_counter.h | 13 ++++++++
 mm/page_counter.c            | 64 ++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index d649b6bbbc871..c7e3ab3356d20 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -5,8 +5,15 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
+struct page_counter_stock {
+	local_trylock_t lock;
+	unsigned long nr_pages;
+};
+
 struct page_counter {
 	/*
 	 * Make sure 'usage' does not share cacheline with any other field in
@@ -41,6 +48,8 @@ struct page_counter {
 	unsigned long high;
 	unsigned long max;
 	struct page_counter *parent;
+	struct page_counter_stock __percpu *stock;
+	unsigned int batch;
 } ____cacheline_internodealigned_in_smp;
 
 #if BITS_PER_LONG == 32
@@ -99,6 +108,10 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 	counter->watermark = usage;
 }
 
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
+void page_counter_disable_stock(struct page_counter *counter);
+void page_counter_free_stock(struct page_counter *counter);
+
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 void page_counter_calculate_protection(struct page_counter *root,
 				       struct page_counter *counter,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 661e0f2a5127a..a1a871a9d5c49 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -289,6 +290,69 @@ int page_counter_memparse(const char *buf, const char *max,
 	return 0;
 }
 
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch)
+{
+	struct page_counter_stock __percpu *stock;
+	int cpu;
+
+	stock = alloc_percpu(struct page_counter_stock);
+	if (!stock)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		struct page_counter_stock *s = per_cpu_ptr(stock, cpu);
+
+		local_trylock_init(&s->lock);
+	}
+	counter->stock = stock;
+	counter->batch = batch;
+
+	return 0;
+}
+
+static void page_counter_drain_stock_nolock(struct page_counter *counter)
+{
+	unsigned long stock_to_drain = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct page_counter_stock *stock;
+
+		stock = per_cpu_ptr(counter->stock, cpu);
+		stock_to_drain += stock->nr_pages;
+		stock->nr_pages = 0;
+	}
+
+	if (stock_to_drain)
+		page_counter_uncharge(counter, stock_to_drain);
+}
+
+void page_counter_disable_stock(struct page_counter *counter)
+{
+	if (!counter->stock)
+		return;
+
+	/* This prevents future charges from trying to deposit pages */
+	WRITE_ONCE(counter->batch, 0);
+
+	/*
+	 * Charges can still be in-flight at this time. Instead of locking here,
+	 * do the majority of the drains here without locking to free up pages
+	 * now. Any remaining stock will be drained in page_counter_free_stock.
+	 */
+	page_counter_drain_stock_nolock(counter);
+}
+
+void page_counter_free_stock(struct page_counter *counter)
+{
+	if (!counter->stock)
+		return;
+
+	page_counter_drain_stock_nolock(counter);
+	free_percpu(counter->stock);
+	counter->stock = NULL;
+}
+
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 /*
-- 
2.53.0-Meta

From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 2/7 v2] mm/page_counter: use page_counter_stock in page_counter_try_charge
Date: Fri, 22 May 2026 15:06:20 -0700
Message-ID: <20260522220627.1150804-3-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Make page_counter_try_charge() stock-aware. We preserve the same
semantics as the existing stock handling logic in try_charge_memcg:

1. Limit-check against the stock. If there is enough, charge to the
   stock (non-hierarchical) and return immediately.
2. Greedily attempt to fulfill the charge request and fill the stock up
   at the same time via a hierarchical charge.
3. If the greedy charge fails, retry (once) with the exact number of
   pages requested.
4. If we succeed with the greedy attempt, then try to add those extra
   pages to the stock. If that fails (trylock), then uncharge those
   surplus pages hierarchically.

As of this patch, the page_counter_stock is unused, as it has not been
enabled on any memcg yet. No functional changes intended.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/page_counter.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/mm/page_counter.c b/mm/page_counter.c
index a1a871a9d5c49..e002688bf7f1a 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -121,9 +121,25 @@ bool page_counter_try_charge(struct page_counter *counter,
 			     struct page_counter **fail)
 {
 	struct page_counter *c;
+	unsigned long charge = nr_pages;
+	unsigned long batch = READ_ONCE(counter->batch);
 	bool protection = track_protection(counter);
 	bool track_failcnt = counter->track_failcnt;
 
+	if (counter->stock && local_trylock(&counter->stock->lock)) {
+		struct page_counter_stock *stock = this_cpu_ptr(counter->stock);
+
+		if (stock->nr_pages >= charge) {
+			stock->nr_pages -= charge;
+			local_unlock(&counter->stock->lock);
+			return true;
+		}
+		local_unlock(&counter->stock->lock);
+	}
+
+	charge = max_t(unsigned long, batch, nr_pages);
+
+retry:
 	for (c = counter; c; c = c->parent) {
 		long new;
 		/*
@@ -140,9 +156,9 @@ bool page_counter_try_charge(struct page_counter *counter,
 		 * we either see the new limit or the setter sees the
 		 * counter has changed and retries.
 		 */
-		new = atomic_long_add_return(nr_pages, &c->usage);
+		new = atomic_long_add_return(charge, &c->usage);
 		if (new > c->max) {
-			atomic_long_sub(nr_pages, &c->usage);
+			atomic_long_sub(charge, &c->usage);
 			/*
 			 * This is racy, but we can live with some
 			 * inaccuracy in the failcnt which is only used
@@ -163,11 +179,31 @@ bool page_counter_try_charge(struct page_counter *counter,
 				WRITE_ONCE(c->watermark, new);
 		}
 	}
+
+	/* charge > nr_pages implies this page_counter has stock enabled */
+	if (charge > nr_pages) {
+		if (local_trylock(&counter->stock->lock)) {
+			struct page_counter_stock *stock;
+
+			stock = this_cpu_ptr(counter->stock);
+			stock->nr_pages += charge - nr_pages;
+			local_unlock(&counter->stock->lock);
+		} else {
+			page_counter_uncharge(counter, charge - nr_pages);
+		}
+	}
+
 	return true;
 
 failed:
 	for (c = counter; c != *fail; c = c->parent)
-		page_counter_cancel(c, nr_pages);
+		page_counter_cancel(c, charge);
+
+	if (charge > nr_pages) {
+		/* Retry without trying to grab extra pages to refill stock */
+		charge = nr_pages;
+		goto retry;
+	}
 
 	return false;
 }
-- 
2.53.0-Meta

From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka,
 Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/7 v2] mm/page_counter: introduce stock drain APIs
Date: Fri, 22 May 2026 15:06:21 -0700
Message-ID: <20260522220627.1150804-4-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Introduce page_counter variants to replace memcg stock draining
functions.

page_counter_drain_stock_local() drains the stock of the local CPU,
taking a local stock lock to serialize against concurrent charges.

page_counter_drain_stock_cpu() drains the stock of a given CPU without
taking the local lock. This is possible because it will only be called
from the CPU hotplug path, where the CPU is dead and there cannot be any
more charges.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/page_counter.h |  3 +++
 mm/page_counter.c            | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index c7e3ab3356d20..ffe13224213c9 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -111,6 +111,9 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
 void page_counter_disable_stock(struct page_counter *counter);
 void page_counter_free_stock(struct page_counter *counter);
+void page_counter_drain_stock_local(struct page_counter *counter);
+void page_counter_drain_stock_cpu(struct page_counter *counter,
+				  unsigned int cpu);
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 void page_counter_calculate_protection(struct page_counter *root,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index e002688bf7f1a..fbfe9a1b29d2e 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -389,6 +389,40 @@ void page_counter_free_stock(struct page_counter *counter)
 	counter->stock = NULL;
 }
 
+void page_counter_drain_stock_local(struct page_counter *counter)
+{
+	struct page_counter_stock *stock;
+	unsigned long nr_pages;
+
+	if (!counter->stock)
+		return;
+
+	local_lock(&counter->stock->lock);
+	stock = this_cpu_ptr(counter->stock);
+	nr_pages = stock->nr_pages;
+	stock->nr_pages = 0;
+	local_unlock(&counter->stock->lock);
+
+	if (nr_pages)
+		page_counter_uncharge(counter, nr_pages);
+}
+
+void page_counter_drain_stock_cpu(struct page_counter *counter,
+				  unsigned int cpu)
+{
+	struct page_counter_stock *stock;
+	unsigned long nr_pages;
+
+	if (!counter->stock)
+		return;
+
+	stock = per_cpu_ptr(counter->stock, cpu);
+	nr_pages = stock->nr_pages;
+	if (nr_pages) {
+		stock->nr_pages = 0;
+		page_counter_uncharge(counter, nr_pages);
+	}
+}
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 /*
-- 
2.53.0-Meta

From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 4/7 v2] mm/memcontrol: convert memcg to use page_counter_stock
Date: Fri, 22 May 2026 15:06:22 -0700
Message-ID: <20260522220627.1150804-5-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Now that all of the memcg_stock handling logic is replicated in
page_counter_stock, switch memcg to use the page_counter_stock. There
are a few details that have changed:

First, the old special-casing for the !allow_spinning check, which
avoided refilling and flushing the old stock, is removed. This special
casing was important previously, because refilling the stock could do a
lot of extra work by evicting a random victim memcg from one of the 7
percpu memcg_stock slots. In the new per-counter design, refilling stock
just adds pages to the counter's own local cache without affecting other
memcgs, so the original reason for the special case no longer applies.
Also, we can now fail during page_counter_enable_stock(), if there is not enough memory to allocate a percpu page_counter_stock. This failure is rare and nonfatal; the system can continue to operate, with the page counter working without stock and falling back to walking the hierarchy. Finally, drain_all_stock is restructured to iterate CPUs in the outer loop (rather than memcgs) to be able to schedule draining all memcgs via a single work_on_cpu call. It reduces the number of synchronous per-CPU work calls from O(memcgs * CPUs) to just O(CPUs). Note that obj_stock remains untouched by these changes. Suggested-by: Johannes Weiner Signed-off-by: Joshua Hahn --- mm/memcontrol.c | 78 +++++++++++++++++++++---------------------------- 1 file changed, 34 insertions(+), 44 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 051b82ebf371c..cb1ea17e03730 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2053,6 +2053,17 @@ static void schedule_drain_work(int cpu, struct work= _struct *work) queue_work_on(cpu, memcg_wq, work); } =20 +static long drain_stock_on_cpu(void *arg) +{ + struct mem_cgroup *root_memcg =3D arg; + struct mem_cgroup *memcg; + + for_each_mem_cgroup_tree(memcg, root_memcg) + page_counter_drain_stock_local(&memcg->memory); + + return 0; +} + /* * Drains all per-CPU charge caches for given root_memcg resp. subtree * of the hierarchy under it. @@ -2064,28 +2075,16 @@ void drain_all_stock(struct mem_cgroup *root_memcg) /* If someone's already draining, avoid adding running more workers. */ if (!mutex_trylock(&percpu_charge_mutex)) return; - /* - * Notify other cpus that system-wide "drain" is running - * We do not care about races with the cpu hotplug because cpu down - * as well as workers from this path always operate on the local - * per-cpu data. CPU up doesn't touch memcg_stock at all. 
-	 */
+
+	for_each_online_cpu(cpu)
+		work_on_cpu(cpu, drain_stock_on_cpu, root_memcg);
+
+	/* Drain obj_stock on all online CPUs */
 	migrate_disable();
 	curcpu = smp_processor_id();
 	for_each_online_cpu(cpu) {
-		struct memcg_stock_pcp *memcg_st = &per_cpu(memcg_stock, cpu);
 		struct obj_stock_pcp *obj_st = &per_cpu(obj_stock, cpu);
 
-		if (!test_bit(FLUSHING_CACHED_CHARGE, &memcg_st->flags) &&
-		    is_memcg_drain_needed(memcg_st, root_memcg) &&
-		    !test_and_set_bit(FLUSHING_CACHED_CHARGE,
-				      &memcg_st->flags)) {
-			if (cpu == curcpu)
-				drain_local_memcg_stock(&memcg_st->work);
-			else
-				schedule_drain_work(cpu, &memcg_st->work);
-		}
-
 		if (!test_bit(FLUSHING_CACHED_CHARGE, &obj_st->flags) &&
 		    obj_stock_flush_required(obj_st, root_memcg) &&
 		    !test_and_set_bit(FLUSHING_CACHED_CHARGE,
@@ -2102,9 +2101,13 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
+	struct mem_cgroup *memcg;
+
 	/* no need for the local lock */
 	drain_obj_stock(&per_cpu(obj_stock, cpu));
-	drain_stock_fully(&per_cpu(memcg_stock, cpu));
+
+	for_each_mem_cgroup(memcg)
+		page_counter_drain_stock_cpu(&memcg->memory, cpu);
 
 	return 0;
 }
@@ -2379,7 +2382,6 @@ void __mem_cgroup_handle_over_high(gfp_t gfp_mask)
 static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			    unsigned int nr_pages)
 {
-	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MAX_RECLAIM_RETRIES;
 	struct mem_cgroup *mem_over_limit;
 	struct page_counter *counter;
@@ -2392,31 +2394,19 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool allow_spinning = gfpflags_allow_spinning(gfp_mask);
 
 retry:
-	if (consume_stock(memcg, nr_pages))
-		return 0;
-
-	if (!allow_spinning)
-		/* Avoid the refill and flush of the older stock */
-		batch = nr_pages;
-
 	reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
 	if (!do_memsw_account() ||
-	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
-		if (page_counter_try_charge(&memcg->memory, batch, &counter))
+	    page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) {
+		if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
 			goto done_restock;
 		if (do_memsw_account())
-			page_counter_uncharge(&memcg->memsw, batch);
+			page_counter_uncharge(&memcg->memsw, nr_pages);
 		mem_over_limit = mem_cgroup_from_counter(counter, memory);
 	} else {
 		mem_over_limit = mem_cgroup_from_counter(counter, memsw);
 		reclaim_options &= ~MEMCG_RECLAIM_MAY_SWAP;
 	}
 
-	if (batch > nr_pages) {
-		batch = nr_pages;
-		goto retry;
-	}
-
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
@@ -2513,9 +2503,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return 0;
 
 done_restock:
-	if (batch > nr_pages)
-		refill_stock(memcg, batch - nr_pages);
-
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland. We can perform reclaim here
@@ -2552,7 +2539,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			 * and distribute reclaim work and delay penalties
 			 * based on how much each task is actually allocating.
 			 */
-			current->memcg_nr_pages_over_high += batch;
+			current->memcg_nr_pages_over_high += nr_pages;
 			set_notify_resume(current);
 			break;
 		}
@@ -2858,7 +2845,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 	account_kmem_nmi_safe(memcg, -nr_pages);
 	memcg1_account_kmem(memcg, -nr_pages);
 	if (!mem_cgroup_is_root(memcg))
-		refill_stock(memcg, nr_pages);
+		memcg_uncharge(memcg, nr_pages);
 
 	css_put(&memcg->css);
 }
@@ -3797,6 +3784,8 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 
 static void mem_cgroup_free(struct mem_cgroup *memcg)
 {
+	page_counter_free_stock(&memcg->memory);
+	page_counter_free_stock(&memcg->memsw);
 	lru_gen_exit_memcg(memcg);
 	memcg_wb_domain_exit(memcg);
 	__mem_cgroup_free(memcg);
@@ -3956,6 +3945,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	refcount_set(&memcg->id.ref, 1);
 	css_get(css);
 
+	/* failure is nonfatal, charges fall back to direct hierarchy */
+	page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH);
+
 	/*
 	 * Ensure mem_cgroup_from_private_id() works once we're fully online.
 	 *
@@ -3994,6 +3986,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	lru_gen_offline_memcg(memcg);
 
 	drain_all_stock(memcg);
+	page_counter_disable_stock(&memcg->memory);
 
 	mem_cgroup_private_id_put(memcg, 1);
 }
@@ -5185,7 +5178,7 @@ void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
 
-	refill_stock(memcg, nr_pages);
+	page_counter_uncharge(&memcg->memory, nr_pages);
 }
 
 void mem_cgroup_flush_workqueue(void)
@@ -5238,12 +5231,9 @@ int __init mem_cgroup_init(void)
 	memcg_wq = alloc_workqueue("memcg", WQ_PERCPU, 0);
 	WARN_ON(!memcg_wq);
 
-	for_each_possible_cpu(cpu) {
-		INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
-			  drain_local_memcg_stock);
+	for_each_possible_cpu(cpu)
 		INIT_WORK(&per_cpu_ptr(&obj_stock, cpu)->work,
 			  drain_local_obj_stock);
-	}
 
 	memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
 	memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
-- 
2.53.0-Meta

From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@meta.com
Subject: [PATCH 5/7 v2] mm/memcontrol: optimize memsw stock for cgroup v1
Date: Fri, 22 May 2026 15:06:23 -0700
Message-ID: <20260522220627.1150804-6-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References:
 <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Previously, each memcg had its own stock, which was shared by all page
counters within it. Specifically in try_charge_memcg, the stock limit
check would occur before the memsw and memory page_counters were charged
hierarchically.

Now that the memcg stock has been folded into the page_counter level,
and try_charge_memcg's stock check has been replaced by the memory
page_counter's own stock, no fast path is left for cgroup v1's memsw
check.

Introduce a new stock for the memsw page_counter, charged independently
from the memory page_counter. This provides better caching on cgroup v1:

The best case scenario is when both the memsw and memory page_counters
can use their cached stock charge; this matches the old behavior.

The halfway scenario is when either the memsw or memory page_counter is
within the stock size, but the other isn't. This requires one
hierarchical charge.

The worst case scenario is when both memsw and memory page_counters are
over their limit, and must walk two page_counter hierarchies. This is
the same as the old behavior.

By introducing an independent stock for memsw, we can avoid the worst
case scenario more often and can fail or succeed separately from the
memory page counter.

One user-visible change is that reported memsw usage may transiently be
lower than memory usage. This happens because each counter independently
batches the stock charges, so the visible values can differ by up to the
stock batch size (MEMCG_CHARGE_BATCH) pages.
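
The scenarios above can be modeled in userspace. This is only a minimal
sketch under stated assumptions, not the kernel implementation:
toy_counter, toy_try_charge, toy_uncharge, toy_drain, toy_charge_v1 and
TOY_BATCH are hypothetical names, the hierarchy is flattened into a
single usage field, and the refill policy (charge a full batch on a
stock miss) is an assumption, since page_counter_stock itself is
introduced earlier in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MEMCG_CHARGE_BATCH; the value is illustrative. */
#define TOY_BATCH 64U

/* Toy model of one counter; this is not the kernel's struct page_counter. */
struct toy_counter {
	unsigned long usage;	/* pages charged, including cached stock */
	unsigned long limit;	/* hard limit in pages */
	unsigned int stock;	/* pre-charged pages cached locally */
	unsigned int walks;	/* slow-path (hierarchical) charges so far */
};

/* Charge nr pages, preferring the stock. Returns true on success. */
static bool toy_try_charge(struct toy_counter *c, unsigned int nr)
{
	unsigned int want = nr > TOY_BATCH ? nr : TOY_BATCH;

	if (c->stock >= nr) {		/* fast path: no walk needed */
		c->stock -= nr;
		return true;
	}

	c->walks++;			/* slow path: one hierarchical charge */
	if (c->usage + want <= c->limit) {
		c->usage += want;	/* assumed policy: overcharge a batch */
		c->stock += want - nr;
		return true;
	}
	if (c->usage + nr <= c->limit) {
		c->usage += nr;		/* near the limit: exact charge only */
		return true;
	}
	return false;
}

/* Give back nr pages that were previously charged. */
static void toy_uncharge(struct toy_counter *c, unsigned int nr)
{
	assert(c->usage >= nr);
	c->usage -= nr;
}

/* Return all cached stock to the counter. */
static void toy_drain(struct toy_counter *c)
{
	assert(c->usage >= c->stock);
	c->usage -= c->stock;
	c->stock = 0;
}

/* Mirrors the memsw-then-memory ordering used by try_charge_memcg(). */
static bool toy_charge_v1(struct toy_counter *memsw,
			  struct toy_counter *memory, unsigned int nr)
{
	if (!toy_try_charge(memsw, nr))
		return false;
	if (toy_try_charge(memory, nr))
		return true;
	toy_uncharge(memsw, nr);	/* roll back the memsw charge */
	return false;
}
```

Starting from two empty counters with a limit of 1000 pages, the first
one-page charge takes the slow path on both counters and the second is
served from both stocks. After toy_drain() on the memory counter, the
next charge walks only the memory side (the halfway scenario). In this
model the counters then read 64 (memsw) and 66 (memory), which
illustrates how memsw can transiently appear lower than memory.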

Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cb1ea17e03730..06ec4d26cb519 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2058,8 +2058,11 @@ static long drain_stock_on_cpu(void *arg)
 	struct mem_cgroup *root_memcg = arg;
 	struct mem_cgroup *memcg;
 
-	for_each_mem_cgroup_tree(memcg, root_memcg)
+	for_each_mem_cgroup_tree(memcg, root_memcg) {
 		page_counter_drain_stock_local(&memcg->memory);
+		if (do_memsw_account())
+			page_counter_drain_stock_local(&memcg->memsw);
+	}
 
 	return 0;
 }
@@ -2106,8 +2109,11 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 	/* no need for the local lock */
 	drain_obj_stock(&per_cpu(obj_stock, cpu));
 
-	for_each_mem_cgroup(memcg)
+	for_each_mem_cgroup(memcg) {
 		page_counter_drain_stock_cpu(&memcg->memory, cpu);
+		if (do_memsw_account())
+			page_counter_drain_stock_cpu(&memcg->memsw, cpu);
+	}
 
 	return 0;
 }
@@ -3947,6 +3953,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 
 	/* failure is nonfatal, charges fall back to direct hierarchy */
 	page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH);
+	if (do_memsw_account())
+		page_counter_enable_stock(&memcg->memsw, MEMCG_CHARGE_BATCH);
 
 	/*
 	 * Ensure mem_cgroup_from_private_id() works once we're fully online.
@@ -3987,6 +3995,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 
 	drain_all_stock(memcg);
 	page_counter_disable_stock(&memcg->memory);
+	if (do_memsw_account())
+		page_counter_disable_stock(&memcg->memsw);
 
 	mem_cgroup_private_id_put(memcg, 1);
 }
-- 
2.53.0-Meta

From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@meta.com
Subject: [PATCH 6/7 v2] mm/memcontrol: optimize stock usage for cgroup v2
Date: Fri, 22 May 2026 15:06:24 -0700
Message-ID: <20260522220627.1150804-7-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In cgroup v2, it is unlikely for memcg charges to happen directly on
non-leaf cgroups. There are a few exceptions, such as processes that
have yet to be migrated to children, and tasks that are reparented on
memcg destruction, that charge to the parent.

Because it is rare for parent cgroups to receive direct charges, stock
that remains in them is wasted memory. Drain parent stocks when the
first child is created to return those pages for other memcgs to use.
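
The first-child rule can be illustrated with a small userspace model.
This is a sketch only: toy_group, toy_drain_stock and toy_online_child
are hypothetical names that do not exist in the kernel, and a group is
treated as the root when it has no parent.

```c
#include <assert.h>

/* Toy cgroup node; this is not the kernel's struct mem_cgroup. */
struct toy_group {
	struct toy_group *parent;	/* unset for the root */
	unsigned int online_children;
	unsigned long usage;		/* pages charged, including stock */
	unsigned int stock;		/* cached pre-charged pages */
};

/* Return cached stock to the counter; reports how many pages were freed. */
static unsigned int toy_drain_stock(struct toy_group *g)
{
	unsigned int freed = g->stock;

	assert(g->usage >= freed);
	g->usage -= freed;
	g->stock = 0;
	return freed;
}

/*
 * Bring a child online under parent. Follows the condition in the patch:
 * drain only when the parent is not the root and has no online children
 * yet. Returns the number of pages drained from the parent.
 */
static unsigned int toy_online_child(struct toy_group *child,
				     struct toy_group *parent)
{
	unsigned int freed = 0;

	child->parent = parent;
	if (parent && parent->parent && parent->online_children == 0)
		freed = toy_drain_stock(parent);
	if (parent)
		parent->online_children++;
	return freed;
}
```

In this model, only the first child of a non-root group triggers a
drain; later children leave whatever stock the parent has accumulated
since then untouched, and the root is never drained.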

This optimization is not applied on cgroup v1, where tasks can be
attached to any cgroup in the hierarchy, meaning stock can be consumed
and refilled for non-leaf cgroups as well.

Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 06ec4d26cb519..fc1e1b10b6ab6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3968,6 +3968,21 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	 */
 	xa_store(&mem_cgroup_private_ids, memcg->id.id, memcg, GFP_KERNEL);
 
+	/*
+	 * It is unlikely for non-leaf memcgs to get direct charges on v2.
+	 * Drain the parent's stock if we are the first child.
+	 */
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+		int cpu;
+
+		if (parent && !mem_cgroup_is_root(parent) &&
+		    !css_has_online_children(&parent->css)) {
+			for_each_online_cpu(cpu)
+				work_on_cpu(cpu, drain_stock_on_cpu, parent);
+		}
+	}
+
 	return 0;
 offline_kmem:
 	memcg_offline_kmem(memcg);
-- 
2.53.0-Meta

From nobody Sun May 24 20:35:14 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@meta.com
Subject: [PATCH 7/7 v2] mm/memcontrol: remove unused memcg_stock code
Date: Fri, 22 May 2026 15:06:25 -0700
Message-ID: <20260522220627.1150804-8-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
References:
 <20260522220627.1150804-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Now that all memcg_stock logic has been moved to page_counter_stock, we
can remove all code related to handling memcg_stock.

Note that obj_stock is untouched and is still needed.
FLUSHING_CACHED_CHARGE is preserved so that it can be used by obj_stock
as well.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 183 ------------------------------------------------
 1 file changed, 183 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fc1e1b10b6ab6..24774c272ef8c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1810,24 +1810,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 	pr_cont(" are going to be killed due to memory.oom.group set\n");
 }
 
-/*
- * The value of NR_MEMCG_STOCK is selected to keep the cached memcgs and their
- * nr_pages in a single cacheline. This may change in future.
- */
-#define NR_MEMCG_STOCK 7
 #define FLUSHING_CACHED_CHARGE 0
-struct memcg_stock_pcp {
-	local_trylock_t lock;
-	uint8_t nr_pages[NR_MEMCG_STOCK];
-	struct mem_cgroup *cached[NR_MEMCG_STOCK];
-
-	struct work_struct work;
-	unsigned long flags;
-};
-
-static DEFINE_PER_CPU_ALIGNED(struct memcg_stock_pcp, memcg_stock) = {
-	.lock = INIT_LOCAL_TRYLOCK(lock),
-};
 
 struct obj_stock_pcp {
 	local_trylock_t lock;
@@ -1851,47 +1834,6 @@ static void drain_obj_stock(struct obj_stock_pcp *stock);
 static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 
-/**
- * consume_stock: Try to consume stocked charge on this cpu.
- * @memcg: memcg to consume from.
- * @nr_pages: how many pages to charge.
- *
- * Consume the cached charge if enough nr_pages are present otherwise return
- * failure. Also return failure for charge request larger than
- * MEMCG_CHARGE_BATCH or if the local lock is already taken.
- *
- * returns true if successful, false otherwise.
- */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct memcg_stock_pcp *stock;
-	uint8_t stock_pages;
-	bool ret = false;
-	int i;
-
-	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock(&memcg_stock.lock))
-		return ret;
-
-	stock = this_cpu_ptr(&memcg_stock);
-
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		if (memcg != READ_ONCE(stock->cached[i]))
-			continue;
-
-		stock_pages = READ_ONCE(stock->nr_pages[i]);
-		if (stock_pages >= nr_pages) {
-			WRITE_ONCE(stock->nr_pages[i], stock_pages - nr_pages);
-			ret = true;
-		}
-		break;
-	}
-
-	local_unlock(&memcg_stock.lock);
-
-	return ret;
-}
-
 static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	page_counter_uncharge(&memcg->memory, nr_pages);
@@ -1899,51 +1841,6 @@ static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 		page_counter_uncharge(&memcg->memsw, nr_pages);
 }
 
-/*
- * Returns stocks cached in percpu and reset cached information.
- */
-static void drain_stock(struct memcg_stock_pcp *stock, int i)
-{
-	struct mem_cgroup *old = READ_ONCE(stock->cached[i]);
-	uint8_t stock_pages;
-
-	if (!old)
-		return;
-
-	stock_pages = READ_ONCE(stock->nr_pages[i]);
-	if (stock_pages) {
-		memcg_uncharge(old, stock_pages);
-		WRITE_ONCE(stock->nr_pages[i], 0);
-	}
-
-	css_put(&old->css);
-	WRITE_ONCE(stock->cached[i], NULL);
-}
-
-static void drain_stock_fully(struct memcg_stock_pcp *stock)
-{
-	int i;
-
-	for (i = 0; i < NR_MEMCG_STOCK; ++i)
-		drain_stock(stock, i);
-}
-
-static void drain_local_memcg_stock(struct work_struct *dummy)
-{
-	struct memcg_stock_pcp *stock;
-
-	if (WARN_ONCE(!in_task(), "drain in non-task context"))
-		return;
-
-	local_lock(&memcg_stock.lock);
-
-	stock = this_cpu_ptr(&memcg_stock);
-	drain_stock_fully(stock);
-	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
-
-	local_unlock(&memcg_stock.lock);
-}
-
 static void drain_local_obj_stock(struct work_struct *dummy)
 {
 	struct obj_stock_pcp *stock;
@@ -1960,86 +1857,6 @@ static void drain_local_obj_stock(struct work_struct *dummy)
 	local_unlock(&obj_stock.lock);
 }
 
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct memcg_stock_pcp *stock;
-	struct mem_cgroup *cached;
-	uint8_t stock_pages;
-	bool success = false;
-	int empty_slot = -1;
-	int i;
-
-	/*
-	 * For now limit MEMCG_CHARGE_BATCH to 127 and less. In future if we
-	 * decide to increase it more than 127 then we will need more careful
-	 * handling of nr_pages[] in struct memcg_stock_pcp.
-	 */
-	BUILD_BUG_ON(MEMCG_CHARGE_BATCH > S8_MAX);
-
-	VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
-
-	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock(&memcg_stock.lock)) {
-		/*
-		 * In case of larger than batch refill or unlikely failure to
-		 * lock the percpu memcg_stock.lock, uncharge memcg directly.
-		 */
-		memcg_uncharge(memcg, nr_pages);
-		return;
-	}
-
-	stock = this_cpu_ptr(&memcg_stock);
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		cached = READ_ONCE(stock->cached[i]);
-		if (!cached && empty_slot == -1)
-			empty_slot = i;
-		if (memcg == READ_ONCE(stock->cached[i])) {
-			stock_pages = READ_ONCE(stock->nr_pages[i]) + nr_pages;
-			WRITE_ONCE(stock->nr_pages[i], stock_pages);
-			if (stock_pages > MEMCG_CHARGE_BATCH)
-				drain_stock(stock, i);
-			success = true;
-			break;
-		}
-	}
-
-	if (!success) {
-		i = empty_slot;
-		if (i == -1) {
-			i = get_random_u32_below(NR_MEMCG_STOCK);
-			drain_stock(stock, i);
-		}
-		css_get(&memcg->css);
-		WRITE_ONCE(stock->cached[i], memcg);
-		WRITE_ONCE(stock->nr_pages[i], nr_pages);
-	}
-
-	local_unlock(&memcg_stock.lock);
-}
-
-static bool is_memcg_drain_needed(struct memcg_stock_pcp *stock,
-				  struct mem_cgroup *root_memcg)
-{
-	struct mem_cgroup *memcg;
-	bool flush = false;
-	int i;
-
-	rcu_read_lock();
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		memcg = READ_ONCE(stock->cached[i]);
-		if (!memcg)
-			continue;
-
-		if (READ_ONCE(stock->nr_pages[i]) &&
-		    mem_cgroup_is_descendant(memcg, root_memcg)) {
-			flush = true;
-			break;
-		}
-	}
-	rcu_read_unlock();
-	return flush;
-}
-
 static void schedule_drain_work(int cpu, struct work_struct *work)
 {
 	/*
-- 
2.53.0-Meta