From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/7 v3] mm/page_counter: introduce per-page_counter stock
Date: Mon, 25 May 2026 12:04:48 -0700
Message-ID: <20260525190455.2843786-2-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

To avoid expensive hierarchy walks on every memcg charge and limit check, memcontrol uses per-CPU stocks (memcg_stock_pcp). These cache pre-charged pages and provide a fast path in try_charge_memcg(). However, the current implementation has a few shortcomings.

First, each memcg_stock_pcp can cache charges for at most 7 memcgs (NR_MEMCG_STOCK). Once a CPU handles charges for more than 7 memcgs, it randomly selects a victim memcg to evict and drain from that CPU. This can cause unnecessary latency and thrashing as memcgs repeatedly evict each other's stock.
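
The slot limit described above can be modeled in userspace. This is a minimal sketch that assumes a single CPU and a simplified, hypothetical slot_stock structure. The real memcg_stock_pcp in mm/memcontrol.c differs in detail, and slot_stock_init()/slot_stock_refill() are illustrative names only. With 7 or fewer memcgs refilling, every memcg keeps its slot. An eighth memcg forces at least one random eviction per round.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of one CPU's slot-limited charge cache.
 * NR_SLOTS mirrors NR_MEMCG_STOCK; all names here are illustrative.
 */
#define NR_SLOTS 7

struct slot_stock {
	int owner[NR_SLOTS];              /* memcg id cached in each slot, -1 if empty */
	unsigned long nr_pages[NR_SLOTS]; /* pre-charged pages held per slot */
	unsigned long evictions;          /* how many victims were drained */
};

void slot_stock_init(struct slot_stock *s)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		s->owner[i] = -1;
		s->nr_pages[i] = 0;
	}
	s->evictions = 0;
}

/* Deposit surplus pages for memcg 'id', evicting a random victim when full. */
void slot_stock_refill(struct slot_stock *s, int id, unsigned long pages)
{
	int slot = -1;

	for (int i = 0; i < NR_SLOTS; i++) {
		if (s->owner[i] == id) {
			s->nr_pages[i] += pages;  /* already cached: just top up */
			return;
		}
		if (s->owner[i] == -1 && slot < 0)
			slot = i;                 /* remember the first free slot */
	}

	if (slot < 0) {
		/*
		 * Every slot belongs to another memcg: pick a random victim.
		 * In the kernel, the victim's cached pages would be uncharged
		 * back up its hierarchy at this point.
		 */
		slot = rand() % NR_SLOTS;
		s->evictions++;
	}
	s->owner[slot] = id;
	s->nr_pages[slot] = pages;
}
```

Cycling 8 memcgs through 7 slots evicts at least once per round, because the slots can hold only 7 distinct owners at a time. A per-page_counter stock avoids the owner lookup altogether. Each counter has its own per-CPU slot, so refilling one counter never evicts another.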

Second, stock is tightly coupled to memcg, so all page counters in a memcg share the same stock. This simplifies some of the charging logic, but it prevents new page counters from being added with a separate stock of their own.

Both concerns can be addressed by pushing the concept of stock down to the page_counter level. This removes the 7-slot limit, and with it the random eviction problem. It also makes it simpler to enable separate stock caches for other page_counters.

Introduce a generic per-CPU stock directly in struct page_counter. Stock is enabled per page_counter, which limits the added overhead for page_counters that do not benefit much from caching charges.

This patch introduces the page_counter_stock struct and its enable/disable/free functions, but does not use them yet.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/page_counter.h | 13 ++++++++
 mm/page_counter.c            | 64 ++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index d649b6bbbc871..c7e3ab3356d20 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -5,8 +5,15 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
+struct page_counter_stock {
+	local_trylock_t lock;
+	unsigned long nr_pages;
+};
+
 struct page_counter {
 	/*
 	 * Make sure 'usage' does not share cacheline with any other field in
@@ -41,6 +48,8 @@ struct page_counter {
 	unsigned long high;
 	unsigned long max;
 	struct page_counter *parent;
+	struct page_counter_stock __percpu *stock;
+	unsigned int batch;
 } ____cacheline_internodealigned_in_smp;
 
 #if BITS_PER_LONG == 32
@@ -99,6 +108,10 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 	counter->watermark = usage;
 }
 
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
+void page_counter_disable_stock(struct page_counter *counter);
+void page_counter_free_stock(struct page_counter *counter);
+
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 void page_counter_calculate_protection(struct page_counter *root,
 				       struct page_counter *counter,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 661e0f2a5127a..a1a871a9d5c49 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -289,6 +290,69 @@ int page_counter_memparse(const char *buf, const char *max,
 	return 0;
 }
 
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch)
+{
+	struct page_counter_stock __percpu *stock;
+	int cpu;
+
+	stock = alloc_percpu(struct page_counter_stock);
+	if (!stock)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		struct page_counter_stock *s = per_cpu_ptr(stock, cpu);
+
+		local_trylock_init(&s->lock);
+	}
+	counter->stock = stock;
+	counter->batch = batch;
+
+	return 0;
+}
+
+static void page_counter_drain_stock_nolock(struct page_counter *counter)
+{
+	unsigned long stock_to_drain = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct page_counter_stock *stock;
+
+		stock = per_cpu_ptr(counter->stock, cpu);
+		stock_to_drain += stock->nr_pages;
+		stock->nr_pages = 0;
+	}
+
+	if (stock_to_drain)
+		page_counter_uncharge(counter, stock_to_drain);
+}
+
+void page_counter_disable_stock(struct page_counter *counter)
+{
+	if (!counter->stock)
+		return;
+
+	/* This prevents future charges from trying to deposit pages */
+	WRITE_ONCE(counter->batch, 0);
+
+	/*
+	 * Charges can still be in-flight at this time. Instead of locking here,
+	 * do the majority of the drains here without locking to free up pages
+	 * now. Any remaining stock will be drained in page_counter_free_stock.
+	 */
+	page_counter_drain_stock_nolock(counter);
+}
+
+void page_counter_free_stock(struct page_counter *counter)
+{
+	if (!counter->stock)
+		return;
+
+	page_counter_drain_stock_nolock(counter);
+	free_percpu(counter->stock);
+	counter->stock = NULL;
+}
+
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 /*
-- 
2.53.0-Meta

From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 2/7 v3] mm/page_counter: use page_counter_stock in page_counter_try_charge
Date: Mon, 25 May 2026 12:04:49 -0700
Message-ID: <20260525190455.2843786-3-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

Make page_counter_try_charge() stock-aware, preserving the semantics of the existing stock handling logic in try_charge_memcg():

1. Check the local stock. If it holds enough pages, take them from the stock (no hierarchy walk) and return immediately.
2. Otherwise, greedily charge max(batch, nr_pages) pages up the hierarchy, satisfying the request and refilling the stock in one go.
3. If that charge fails, retry once with exactly the number of pages requested.
4. If the greedy charge succeeds, add the surplus pages to the stock. If the stock lock cannot be taken (local_trylock() fails), uncharge the surplus hierarchically instead.

As of this patch, page_counter_stock is still unused, since it has not been enabled on any memcg yet. No functional change intended.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/page_counter.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/mm/page_counter.c b/mm/page_counter.c
index a1a871a9d5c49..e002688bf7f1a 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -121,9 +121,25 @@ bool page_counter_try_charge(struct page_counter *counter,
 			     struct page_counter **fail)
 {
 	struct page_counter *c;
+	unsigned long charge = nr_pages;
+	unsigned long batch = READ_ONCE(counter->batch);
 	bool protection = track_protection(counter);
 	bool track_failcnt = counter->track_failcnt;
 
+	if (counter->stock && local_trylock(&counter->stock->lock)) {
+		struct page_counter_stock *stock = this_cpu_ptr(counter->stock);
+
+		if (stock->nr_pages >= charge) {
+			stock->nr_pages -= charge;
+			local_unlock(&counter->stock->lock);
+			return true;
+		}
+		local_unlock(&counter->stock->lock);
+	}
+
+	charge = max_t(unsigned long, batch, nr_pages);
+
+retry:
 	for (c = counter; c; c = c->parent) {
 		long new;
 		/*
@@ -140,9 +156,9 @@ bool page_counter_try_charge(struct page_counter *counter,
 		 * we either see the new limit or the setter sees the
 		 * counter has changed and retries.
 		 */
-		new = atomic_long_add_return(nr_pages, &c->usage);
+		new = atomic_long_add_return(charge, &c->usage);
 		if (new > c->max) {
-			atomic_long_sub(nr_pages, &c->usage);
+			atomic_long_sub(charge, &c->usage);
 			/*
 			 * This is racy, but we can live with some
 			 * inaccuracy in the failcnt which is only used
@@ -163,11 +179,31 @@ bool page_counter_try_charge(struct page_counter *counter,
 				WRITE_ONCE(c->watermark, new);
 		}
 	}
+
+	/* charge > nr_pages implies this page_counter has stock enabled */
+	if (charge > nr_pages) {
+		if (local_trylock(&counter->stock->lock)) {
+			struct page_counter_stock *stock;
+
+			stock = this_cpu_ptr(counter->stock);
+			stock->nr_pages += charge - nr_pages;
+			local_unlock(&counter->stock->lock);
+		} else {
+			page_counter_uncharge(counter, charge - nr_pages);
+		}
+	}
+
 	return true;
 
 failed:
 	for (c = counter; c != *fail; c = c->parent)
-		page_counter_cancel(c, nr_pages);
+		page_counter_cancel(c, charge);
+
+	if (charge > nr_pages) {
+		/* Retry without trying to grab extra pages to refill stock */
+		charge = nr_pages;
+		goto retry;
+	}
 
 	return false;
 }
-- 
2.53.0-Meta

From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/7 v3] mm/page_counter: introduce stock drain APIs
Date: Mon, 25 May 2026 12:04:50 -0700
Message-ID: <20260525190455.2843786-4-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

Introduce page_counter variants of the memcg stock draining functions.

page_counter_drain_stock_local() drains the stock of the local CPU, taking the local stock lock to serialize against concurrent charges.

page_counter_drain_stock_cpu() drains the stock of a given CPU without taking the lock. This is safe because it is only called from the CPU hotplug path, after the CPU is dead and no further charges can happen on it.
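
The difference between the two drain paths can be sketched in userspace. This single-threaded model assumes a fixed CPU count and uses hypothetical names (model_counter, model_drain_stock_local(), model_drain_stock_cpu()). It shows only the accounting: draining a CPU's stock returns those pre-charged pages to the counter, and the lockless variant is valid only for an offline CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for the model */

/* Hypothetical model of a page_counter with a per-CPU stock. */
struct model_counter {
	unsigned long usage;                  /* charged pages, stock included */
	unsigned long stock[MODEL_NR_CPUS];   /* pre-charged pages cached per CPU */
	bool cpu_online[MODEL_NR_CPUS];
};

/* Stand-in for page_counter_uncharge(): give pages back to the counter. */
void model_uncharge(struct model_counter *c, unsigned long nr_pages)
{
	assert(c->usage >= nr_pages);
	c->usage -= nr_pages;
}

/* Mirrors page_counter_drain_stock_local() for the CPU we run on. */
void model_drain_stock_local(struct model_counter *c, int this_cpu)
{
	unsigned long nr_pages;

	/* The kernel holds the stock's local lock around this read-and-zero. */
	nr_pages = c->stock[this_cpu];
	c->stock[this_cpu] = 0;

	/* The uncharge itself runs after the lock has been dropped. */
	if (nr_pages)
		model_uncharge(c, nr_pages);
}

/*
 * Mirrors page_counter_drain_stock_cpu(): no lock, which is only safe
 * because the target CPU is offline and cannot charge concurrently.
 */
void model_drain_stock_cpu(struct model_counter *c, int cpu)
{
	unsigned long nr_pages;

	assert(!c->cpu_online[cpu]);
	nr_pages = c->stock[cpu];
	if (nr_pages) {
		c->stock[cpu] = 0;
		model_uncharge(c, nr_pages);
	}
}
```

In both paths, only the drained CPU's cached pages leave the counter's usage. The other CPUs' stocks stay intact until they are drained themselves.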

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/page_counter.h |  3 +++
 mm/page_counter.c            | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index c7e3ab3356d20..ffe13224213c9 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -111,6 +111,9 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
 void page_counter_disable_stock(struct page_counter *counter);
 void page_counter_free_stock(struct page_counter *counter);
+void page_counter_drain_stock_local(struct page_counter *counter);
+void page_counter_drain_stock_cpu(struct page_counter *counter,
+				  unsigned int cpu);
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 void page_counter_calculate_protection(struct page_counter *root,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index e002688bf7f1a..fbfe9a1b29d2e 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -389,6 +389,40 @@ void page_counter_free_stock(struct page_counter *counter)
 	counter->stock = NULL;
 }
 
+void page_counter_drain_stock_local(struct page_counter *counter)
+{
+	struct page_counter_stock *stock;
+	unsigned long nr_pages;
+
+	if (!counter->stock)
+		return;
+
+	local_lock(&counter->stock->lock);
+	stock = this_cpu_ptr(counter->stock);
+	nr_pages = stock->nr_pages;
+	stock->nr_pages = 0;
+	local_unlock(&counter->stock->lock);
+
+	if (nr_pages)
+		page_counter_uncharge(counter, nr_pages);
+}
+
+void page_counter_drain_stock_cpu(struct page_counter *counter,
+				  unsigned int cpu)
+{
+	struct page_counter_stock *stock;
+	unsigned long nr_pages;
+
+	if (!counter->stock)
+		return;
+
+	stock = per_cpu_ptr(counter->stock, cpu);
+	nr_pages = stock->nr_pages;
+	if (nr_pages) {
+		stock->nr_pages = 0;
+		page_counter_uncharge(counter, nr_pages);
+	}
+}
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 /*
-- 
2.53.0-Meta

From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 4/7 v3] mm/memcontrol: convert memcg to use page_counter_stock
Date: Mon, 25 May 2026 12:04:51 -0700
Message-ID: <20260525190455.2843786-5-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

Now that all of the memcg_stock handling logic is replicated in page_counter_stock, switch memcg over to page_counter_stock. A few details change along the way.

First, the !allow_spinning special case is removed. It existed to avoid refilling the stock and flushing the old one. It mattered previously because refilling the stock could do a lot of extra work by evicting a random victim memcg from one of the 7 per-CPU memcg_stock slots. In the new per-counter design, refilling the stock only adds pages to the counter's own per-CPU cache without affecting other memcgs. The original reason for the special case therefore no longer applies.
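
The following single-threaded userspace model illustrates the stock-aware charge path from patch 2, in which a refill only touches the charging counter's own stock. It is a sketch under stated assumptions. model_pc, charge_hierarchy() and model_try_charge() are hypothetical names, and the kernel version's atomics, __percpu stock pointer and local_trylock() are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model of a stock-aware page counter.
 * The kernel version uses atomics, a __percpu stock and local_trylock();
 * all of that is omitted here.
 */
struct model_pc {
	unsigned long usage;
	unsigned long max;
	struct model_pc *parent;
	unsigned long stock;   /* this CPU's cached pre-charge */
	unsigned long batch;   /* 0 means stock is disabled */
};

/* Charge nr pages at every level, or at none if any level would exceed max. */
static bool charge_hierarchy(struct model_pc *pc, unsigned long nr)
{
	for (struct model_pc *c = pc; c; c = c->parent) {
		if (c->usage + nr > c->max) {
			/* Roll back the levels below the one that failed. */
			for (struct model_pc *u = pc; u != c; u = u->parent)
				u->usage -= nr;
			return false;
		}
		c->usage += nr;
	}
	return true;
}

bool model_try_charge(struct model_pc *pc, unsigned long nr_pages)
{
	unsigned long charge;

	/* 1. Fast path: consume from the local stock without walking up. */
	if (pc->stock >= nr_pages) {
		pc->stock -= nr_pages;
		return true;
	}

	/* 2. Greedy: charge max(batch, nr_pages) to refill the stock as well. */
	charge = pc->batch > nr_pages ? pc->batch : nr_pages;
	if (!charge_hierarchy(pc, charge)) {
		/* 3. Retry once with exactly nr_pages (only if we asked for more). */
		if (charge == nr_pages || !charge_hierarchy(pc, nr_pages))
			return false;
		charge = nr_pages;
	}

	/*
	 * 4. Keep the surplus in this counter's own stock. No other counter's
	 * cache is touched, so there is nothing to evict.
	 */
	pc->stock += charge - nr_pages;
	return true;
}
```

With batch = 64, charging a single page charges 64 pages up the hierarchy and leaves 63 in the stock. Suppose a later request is larger than the stock and the greedy batch would push the counter over max. That request falls back to an exact charge, and it fails only if even the exact amount does not fit.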
Also, we can now fail during page_counter_enable_stock(), if there is not enough memory to allocate a percpu page_counter_stock. This failure is rare and nonfatal; the system can continue to operate, with the page counter working without stock and falling back to walking the hierarchy. Finally, drain_all_stock is restructured to iterate CPUs in the outer loop (rather than memcgs) to be able to schedule draining all memcgs via a single work_on_cpu call. It reduces the number of synchronous per-CPU work calls from O(memcgs * CPUs) to just O(CPUs). Note that obj_stock remains untouched by these changes. Suggested-by: Johannes Weiner Signed-off-by: Joshua Hahn --- mm/memcontrol.c | 78 +++++++++++++++++++++---------------------------- 1 file changed, 34 insertions(+), 44 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 368efc1455e35..952c6f7430395 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2260,6 +2260,17 @@ static void schedule_drain_work(int cpu, struct work= _struct *work) queue_work_on(cpu, memcg_wq, work); } =20 +static long drain_stock_on_cpu(void *arg) +{ + struct mem_cgroup *root_memcg =3D arg; + struct mem_cgroup *memcg; + + for_each_mem_cgroup_tree(memcg, root_memcg) + page_counter_drain_stock_local(&memcg->memory); + + return 0; +} + /* * Drains all per-CPU charge caches for given root_memcg resp. subtree * of the hierarchy under it. @@ -2271,28 +2282,16 @@ void drain_all_stock(struct mem_cgroup *root_memcg) /* If someone's already draining, avoid adding running more workers. */ if (!mutex_trylock(&percpu_charge_mutex)) return; - /* - * Notify other cpus that system-wide "drain" is running - * We do not care about races with the cpu hotplug because cpu down - * as well as workers from this path always operate on the local - * per-cpu data. CPU up doesn't touch memcg_stock at all. 
-	 */
+
+	for_each_online_cpu(cpu)
+		work_on_cpu(cpu, drain_stock_on_cpu, root_memcg);
+
+	/* Drain obj_stock on all online CPUs */
 	migrate_disable();
 	curcpu = smp_processor_id();
 	for_each_online_cpu(cpu) {
-		struct memcg_stock_pcp *memcg_st = &per_cpu(memcg_stock, cpu);
 		struct obj_stock_pcp *obj_st = &per_cpu(obj_stock, cpu);
 
-		if (!test_bit(FLUSHING_CACHED_CHARGE, &memcg_st->flags) &&
-		    is_memcg_drain_needed(memcg_st, root_memcg) &&
-		    !test_and_set_bit(FLUSHING_CACHED_CHARGE,
-				      &memcg_st->flags)) {
-			if (cpu == curcpu)
-				drain_local_memcg_stock(&memcg_st->work);
-			else
-				schedule_drain_work(cpu, &memcg_st->work);
-		}
-
 		if (!test_bit(FLUSHING_CACHED_CHARGE, &obj_st->flags) &&
 		    obj_stock_flush_required(obj_st, root_memcg) &&
 		    !test_and_set_bit(FLUSHING_CACHED_CHARGE,
@@ -2309,9 +2308,13 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
+	struct mem_cgroup *memcg;
+
 	/* no need for the local lock */
 	drain_obj_stock(&per_cpu(obj_stock, cpu));
-	drain_stock_fully(&per_cpu(memcg_stock, cpu));
+
+	for_each_mem_cgroup(memcg)
+		page_counter_drain_stock_cpu(&memcg->memory, cpu);
 
 	return 0;
 }
@@ -2586,7 +2589,6 @@ void __mem_cgroup_handle_over_high(gfp_t gfp_mask)
 static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			    unsigned int nr_pages)
 {
-	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
 	int nr_retries = MAX_RECLAIM_RETRIES;
 	struct mem_cgroup *mem_over_limit;
 	struct page_counter *counter;
@@ -2599,31 +2601,19 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool allow_spinning = gfpflags_allow_spinning(gfp_mask);
 
 retry:
-	if (consume_stock(memcg, nr_pages))
-		return 0;
-
-	if (!allow_spinning)
-		/* Avoid the refill and flush of the older stock */
-		batch = nr_pages;
-
 	reclaim_options = MEMCG_RECLAIM_MAY_SWAP;
 	if (!do_memsw_account() ||
-	    page_counter_try_charge(&memcg->memsw, batch, &counter)) {
-		if (page_counter_try_charge(&memcg->memory, batch, &counter))
+	    page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) {
+		if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
 			goto done_restock;
 		if (do_memsw_account())
-			page_counter_uncharge(&memcg->memsw, batch);
+			page_counter_uncharge(&memcg->memsw, nr_pages);
 		mem_over_limit = mem_cgroup_from_counter(counter, memory);
 	} else {
 		mem_over_limit = mem_cgroup_from_counter(counter, memsw);
 		reclaim_options &= ~MEMCG_RECLAIM_MAY_SWAP;
 	}
 
-	if (batch > nr_pages) {
-		batch = nr_pages;
-		goto retry;
-	}
-
 	/*
 	 * Prevent unbounded recursion when reclaim operations need to
 	 * allocate memory. This might exceed the limits temporarily,
@@ -2720,9 +2710,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	return 0;
 
 done_restock:
-	if (batch > nr_pages)
-		refill_stock(memcg, batch - nr_pages);
-
 	/*
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland. We can perform reclaim here
@@ -2759,7 +2746,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			 * and distribute reclaim work and delay penalties
 			 * based on how much each task is actually allocating.
 			 */
-			current->memcg_nr_pages_over_high += batch;
+			current->memcg_nr_pages_over_high += nr_pages;
 			set_notify_resume(current);
 			break;
 		}
@@ -3064,7 +3051,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 	account_kmem_nmi_safe(memcg, -nr_pages);
 	memcg1_account_kmem(memcg, -nr_pages);
 	if (!mem_cgroup_is_root(memcg))
-		refill_stock(memcg, nr_pages);
+		memcg_uncharge(memcg, nr_pages);
 
 	css_put(&memcg->css);
 }
@@ -4096,6 +4083,8 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 
 static void mem_cgroup_free(struct mem_cgroup *memcg)
 {
+	page_counter_free_stock(&memcg->memory);
+	page_counter_free_stock(&memcg->memsw);
 	lru_gen_exit_memcg(memcg);
 	memcg_wb_domain_exit(memcg);
 	__mem_cgroup_free(memcg);
@@ -4268,6 +4257,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	refcount_set(&memcg->id.ref, 1);
 	css_get(css);
 
+	/* failure is nonfatal, charges fall back to direct hierarchy */
+	page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH);
+
 	/*
 	 * Ensure mem_cgroup_from_private_id() works once we're fully online.
 	 *
@@ -4330,6 +4322,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	lru_gen_offline_memcg(memcg);
 
 	drain_all_stock(memcg);
+	page_counter_disable_stock(&memcg->memory);
 
 	mem_cgroup_private_id_put(memcg, 1);
 }
@@ -5524,7 +5517,7 @@ void mem_cgroup_sk_uncharge(const struct sock *sk, unsigned int nr_pages)
 
 	mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
 
-	refill_stock(memcg, nr_pages);
+	page_counter_uncharge(&memcg->memory, nr_pages);
 }
 
 void mem_cgroup_flush_workqueue(void)
@@ -5577,12 +5570,9 @@ int __init mem_cgroup_init(void)
 	memcg_wq = alloc_workqueue("memcg", WQ_PERCPU, 0);
 	WARN_ON(!memcg_wq);
 
-	for_each_possible_cpu(cpu) {
-		INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
-			  drain_local_memcg_stock);
+	for_each_possible_cpu(cpu)
 		INIT_WORK(&per_cpu_ptr(&obj_stock, cpu)->work,
 			  drain_local_obj_stock);
-	}
 
 	memcg_size = struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids);
 	memcg_cachep = kmem_cache_create("mem_cgroup", memcg_size, 0,
-- 
2.53.0-Meta

From nobody Mon Jun 8 22:53:11 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 5/7 v3] mm/memcontrol: optimize memsw stock for cgroup v1
Date: Mon, 25 May 2026 12:04:52 -0700
Message-ID: <20260525190455.2843786-6-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

Previously, each memcg had a single stock that was shared by all of its
page counters. Specifically, in try_charge_memcg() the stock was checked
before the memsw and memory page_counters were charged hierarchically.

Now that the memcg stock has been folded into the page_counter, and
try_charge_memcg()'s stock check has been replaced by the memory
page_counter's own stock, cgroup v1's memsw charge is left without a fast
path.

Introduce a new stock for the memsw page_counter, charged independently
of the memory page_counter. This provides better caching on cgroup v1:

The best case is when both the memsw and memory page_counters can be
charged from their cached stock; this matches the old behavior.

The halfway case is when only one of the memsw and memory page_counters
has enough cached stock. This requires one hierarchical charge.

The worst case is when neither page_counter has enough cached stock, so
both page_counter hierarchies must be walked. This also matches the old
behavior.

With an independent memsw stock, the worst case is hit less often, and
the memsw charge can succeed or fail separately from the memory charge.

One user-visible change is that reported memsw usage may transiently be
lower than memory usage. Because each counter batches its stock charges
independently, the reported values can differ by up to the stock batch
size (MEMCG_CHARGE_BATCH pages).
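To make the best, halfway, and worst cases concrete, here is a hypothetical single-CPU model that counts how many hierarchy walks a cgroup v1 charge needs when memsw and memory keep independent stocks. Limits, charge failures, and the actual page_counter_try_charge() walk are deliberately left out; `model_stock`, `model_stock_charge`, and `model_charge_v1` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model of one page_counter's local stock.
 * Limits and charge failures are deliberately left out.
 */
struct model_stock {
	unsigned long cached; /* pre-charged pages available locally */
	unsigned long batch;  /* pages charged per hierarchy walk */
	unsigned long walks;  /* hierarchy walks performed so far */
};

/* Returns true on a stock hit; on a miss, walks once and keeps the surplus. */
static bool model_stock_charge(struct model_stock *s, unsigned long nr)
{
	assert(nr > 0);
	if (s->cached >= nr) {
		s->cached -= nr;
		return true;
	}
	s->walks++; /* stands in for charging every ancestor */
	if (s->batch > nr)
		s->cached += s->batch - nr;
	return false;
}

/*
 * cgroup v1 with memsw accounting: memsw and memory each consult their
 * own stock. Returns how many hierarchy walks this charge needed (0-2).
 */
static int model_charge_v1(struct model_stock *memsw,
			   struct model_stock *memory, unsigned long nr)
{
	int walks = 0;

	if (!model_stock_charge(memsw, nr))
		walks++;
	if (!model_stock_charge(memory, nr))
		walks++;
	return walks;
}
```

In this model a cold start costs two walks (worst case), a warm charge costs none (best case), and if only the memory stock has been drained, the next charge costs a single walk (halfway case) where a shared stock would have needed two.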
Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 952c6f7430395..f20c9b829224a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2265,8 +2265,11 @@ static long drain_stock_on_cpu(void *arg)
 	struct mem_cgroup *root_memcg = arg;
 	struct mem_cgroup *memcg;
 
-	for_each_mem_cgroup_tree(memcg, root_memcg)
+	for_each_mem_cgroup_tree(memcg, root_memcg) {
 		page_counter_drain_stock_local(&memcg->memory);
+		if (do_memsw_account())
+			page_counter_drain_stock_local(&memcg->memsw);
+	}
 
 	return 0;
 }
@@ -2313,8 +2316,11 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 	/* no need for the local lock */
 	drain_obj_stock(&per_cpu(obj_stock, cpu));
 
-	for_each_mem_cgroup(memcg)
+	for_each_mem_cgroup(memcg) {
 		page_counter_drain_stock_cpu(&memcg->memory, cpu);
+		if (do_memsw_account())
+			page_counter_drain_stock_cpu(&memcg->memsw, cpu);
+	}
 
 	return 0;
 }
@@ -4259,6 +4265,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 
 	/* failure is nonfatal, charges fall back to direct hierarchy */
 	page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH);
+	if (do_memsw_account())
+		page_counter_enable_stock(&memcg->memsw, MEMCG_CHARGE_BATCH);
 
 	/*
 	 * Ensure mem_cgroup_from_private_id() works once we're fully online.
@@ -4323,6 +4331,8 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 
 	drain_all_stock(memcg);
 	page_counter_disable_stock(&memcg->memory);
+	if (do_memsw_account())
+		page_counter_disable_stock(&memcg->memsw);
 
 	mem_cgroup_private_id_put(memcg, 1);
 }
-- 
2.53.0-Meta

From nobody Mon Jun 8 22:53:11 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 6/7 v3] mm/memcontrol: optimize stock usage for cgroup v2
Date: Mon, 25 May 2026 12:04:53 -0700
Message-ID: <20260525190455.2843786-7-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

In cgroup v2, memcg charges rarely happen directly on non-leaf cgroups.
There are a few exceptions that do charge the parent, such as processes
that have yet to be migrated to a child, and tasks that are reparented
on memcg destruction.

Because parent cgroups rarely receive direct charges, any stock that
remains in them is wasted memory. Drain the parent's stock when its first
child is created, returning those pages for other memcgs to use.
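The first-child rule can be sketched as a hypothetical model: when a child comes online under a non-root parent that has no online children yet, the parent's cached pages on every CPU are returned. The model tracks only the parent's own usage (no propagation to ancestors, and no draining of the rest of the subtree as drain_stock_on_cpu() does), replaces the per-CPU work_on_cpu() calls with a plain loop, and uses an explicit child counter in place of css_has_online_children(); all `model_*` names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical number of simulated CPUs. */
#define MODEL_NR_CPUS 4

/* Hypothetical model of a memcg with per-CPU stock (own level only). */
struct model_memcg {
	struct model_memcg *parent;
	bool is_root;
	int online_children;
	unsigned long usage;                /* charged pages, incl. stock */
	unsigned long stock[MODEL_NR_CPUS]; /* pre-charged, unused pages */
};

/* Give one CPU's cached pages back (stand-in for a local drain). */
static void model_drain_cpu(struct model_memcg *m, int cpu)
{
	assert(m->usage >= m->stock[cpu]);
	m->usage -= m->stock[cpu];
	m->stock[cpu] = 0;
}

/*
 * Called as a child comes online on the v2 hierarchy. If it is the first
 * child of a non-root parent, the parent is unlikely to be charged
 * directly again, so its stock on every CPU is drained.
 */
static void model_child_online(struct model_memcg *child)
{
	struct model_memcg *parent = child->parent;
	int cpu;

	if (!parent)
		return;
	if (!parent->is_root && parent->online_children == 0) {
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			model_drain_cpu(parent, cpu);
	}
	parent->online_children++;
}
```

Only the first child triggers the drain; stock the parent accumulates afterwards (for example from reparented charges) is left alone when later siblings come online, and a root parent is never drained.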
This optimization does not apply to cgroup v1, where tasks can be
attached to any cgroup in the hierarchy, so stock can be consumed and
refilled for non-leaf cgroups as well.

Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f20c9b829224a..64b82f1782720 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4280,6 +4280,21 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	 */
 	xa_store(&mem_cgroup_private_ids, memcg->id.id, memcg, GFP_KERNEL);
 
+	/*
+	 * It is unlikely for non-leaf memcgs to get direct charges on v2.
+	 * Drain the parent's stock if we are the first child.
+	 */
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+		struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+		int cpu;
+
+		if (parent && !mem_cgroup_is_root(parent) &&
+		    !css_has_online_children(&parent->css)) {
+			for_each_online_cpu(cpu)
+				work_on_cpu(cpu, drain_stock_on_cpu, parent);
+		}
+	}
+
 	return 0;
 free_objcg:
 	for_each_node(nid) {
-- 
2.53.0-Meta

From nobody Mon Jun 8 22:53:11 2026
From: Joshua Hahn
To: Johannes Weiner, Michal Hocko
Cc: Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 7/7 v3] mm/memcontrol: remove unused memcg_stock code
Date: Mon, 25 May 2026 12:04:54 -0700
Message-ID: <20260525190455.2843786-8-joshua.hahnjy@gmail.com>
In-Reply-To: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>
References: <20260525190455.2843786-1-joshua.hahnjy@gmail.com>

Now that all memcg_stock logic has been moved to page_counter_stock,
remove all code related to handling memcg_stock.

Note that obj_stock is untouched and still needed. FLUSHING_CACHED_CHARGE
is kept because obj_stock still uses it.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 186 ------------------------------------------------
 1 file changed, 186 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 64b82f1782720..5319219d0dcb5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1998,25 +1998,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 	pr_cont(" are going to be killed due to memory.oom.group set\n");
 }
 
-/*
- * The value of NR_MEMCG_STOCK is selected to keep the cached memcgs and their
- * nr_pages in a single cacheline. This may change in future.
- */
-#define NR_MEMCG_STOCK 7
 #define FLUSHING_CACHED_CHARGE 0
-struct memcg_stock_pcp {
-	local_trylock_t lock;
-	uint8_t nr_pages[NR_MEMCG_STOCK];
-	struct mem_cgroup *cached[NR_MEMCG_STOCK];
-
-	struct work_struct work;
-	unsigned long flags;
-	uint8_t drain_idx;
-};
-
-static DEFINE_PER_CPU_ALIGNED(struct memcg_stock_pcp, memcg_stock) = {
-	.lock = INIT_LOCAL_TRYLOCK(lock),
-};
 
 /*
  * NR_OBJ_STOCK is sized so the entire hot path of obj_stock_pcp
@@ -2056,47 +2038,6 @@ static void drain_obj_stock(struct obj_stock_pcp *stock);
 static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 
-/**
- * consume_stock: Try to consume stocked charge on this cpu.
- * @memcg: memcg to consume from.
- * @nr_pages: how many pages to charge.
- *
- * Consume the cached charge if enough nr_pages are present otherwise return
- * failure. Also return failure for charge request larger than
- * MEMCG_CHARGE_BATCH or if the local lock is already taken.
- *
- * returns true if successful, false otherwise.
- */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct memcg_stock_pcp *stock;
-	uint8_t stock_pages;
-	bool ret = false;
-	int i;
-
-	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock(&memcg_stock.lock))
-		return ret;
-
-	stock = this_cpu_ptr(&memcg_stock);
-
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		if (memcg != READ_ONCE(stock->cached[i]))
-			continue;
-
-		stock_pages = READ_ONCE(stock->nr_pages[i]);
-		if (stock_pages >= nr_pages) {
-			WRITE_ONCE(stock->nr_pages[i], stock_pages - nr_pages);
-			ret = true;
-		}
-		break;
-	}
-
-	local_unlock(&memcg_stock.lock);
-
-	return ret;
-}
-
 static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	page_counter_uncharge(&memcg->memory, nr_pages);
@@ -2104,51 +2045,6 @@ static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 		page_counter_uncharge(&memcg->memsw, nr_pages);
 }
 
-/*
- * Returns stocks cached in percpu and reset cached information.
- */
-static void drain_stock(struct memcg_stock_pcp *stock, int i)
-{
-	struct mem_cgroup *old = READ_ONCE(stock->cached[i]);
-	uint8_t stock_pages;
-
-	if (!old)
-		return;
-
-	stock_pages = READ_ONCE(stock->nr_pages[i]);
-	if (stock_pages) {
-		memcg_uncharge(old, stock_pages);
-		WRITE_ONCE(stock->nr_pages[i], 0);
-	}
-
-	css_put(&old->css);
-	WRITE_ONCE(stock->cached[i], NULL);
-}
-
-static void drain_stock_fully(struct memcg_stock_pcp *stock)
-{
-	int i;
-
-	for (i = 0; i < NR_MEMCG_STOCK; ++i)
-		drain_stock(stock, i);
-}
-
-static void drain_local_memcg_stock(struct work_struct *dummy)
-{
-	struct memcg_stock_pcp *stock;
-
-	if (WARN_ONCE(!in_task(), "drain in non-task context"))
-		return;
-
-	local_lock(&memcg_stock.lock);
-
-	stock = this_cpu_ptr(&memcg_stock);
-	drain_stock_fully(stock);
-	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
-
-	local_unlock(&memcg_stock.lock);
-}
-
 static void drain_local_obj_stock(struct work_struct *dummy)
 {
 	struct obj_stock_pcp *stock;
@@ -2165,88 +2061,6 @@ static void drain_local_obj_stock(struct work_struct *dummy)
 	local_unlock(&obj_stock.lock);
 }
 
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct memcg_stock_pcp *stock;
-	struct mem_cgroup *cached;
-	uint8_t stock_pages;
-	bool success = false;
-	int empty_slot = -1;
-	int i;
-
-	/*
-	 * For now limit MEMCG_CHARGE_BATCH to 127 and less. In future if we
-	 * decide to increase it more than 127 then we will need more careful
-	 * handling of nr_pages[] in struct memcg_stock_pcp.
-	 */
-	BUILD_BUG_ON(MEMCG_CHARGE_BATCH > S8_MAX);
-
-	VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
-
-	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock(&memcg_stock.lock)) {
-		/*
-		 * In case of larger than batch refill or unlikely failure to
-		 * lock the percpu memcg_stock.lock, uncharge memcg directly.
-		 */
-		memcg_uncharge(memcg, nr_pages);
-		return;
-	}
-
-	stock = this_cpu_ptr(&memcg_stock);
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		cached = READ_ONCE(stock->cached[i]);
-		if (!cached && empty_slot == -1)
-			empty_slot = i;
-		if (memcg == READ_ONCE(stock->cached[i])) {
-			stock_pages = READ_ONCE(stock->nr_pages[i]) + nr_pages;
-			WRITE_ONCE(stock->nr_pages[i], stock_pages);
-			if (stock_pages > MEMCG_CHARGE_BATCH)
-				drain_stock(stock, i);
-			success = true;
-			break;
-		}
-	}
-
-	if (!success) {
-		i = empty_slot;
-		if (i == -1) {
-			i = stock->drain_idx++;
-			if (stock->drain_idx == NR_MEMCG_STOCK)
-				stock->drain_idx = 0;
-			drain_stock(stock, i);
-		}
-		css_get(&memcg->css);
-		WRITE_ONCE(stock->cached[i], memcg);
-		WRITE_ONCE(stock->nr_pages[i], nr_pages);
-	}
-
-	local_unlock(&memcg_stock.lock);
-}
-
-static bool is_memcg_drain_needed(struct memcg_stock_pcp *stock,
-				  struct mem_cgroup *root_memcg)
-{
-	struct mem_cgroup *memcg;
-	bool flush = false;
-	int i;
-
-	rcu_read_lock();
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		memcg = READ_ONCE(stock->cached[i]);
-		if (!memcg)
-			continue;
-
-		if (READ_ONCE(stock->nr_pages[i]) &&
-		    mem_cgroup_is_descendant(memcg, root_memcg)) {
-			flush = true;
-			break;
-		}
-	}
-	rcu_read_unlock();
-	return flush;
-}
-
 static void schedule_drain_work(int cpu, struct work_struct *work)
 {
 	/*
-- 
2.53.0-Meta