From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
    Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/8 RFC] mm/page_counter: introduce per-page_counter stock
Date: Fri, 10 Apr 2026 14:06:55 -0700
Message-ID: <20260410210742.550489-2-joshua.hahnjy@gmail.com>
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>

In order to avoid expensive hierarchy walks on every memcg charge and
limit check, memcontrol uses per-cpu stocks (memcg_stock_pcp) to cache
pre-charged pages and introduce a fast path to try_charge_memcg.

However, there are a few quirks with the current implementation that
could be improved upon.

First, each memcg_stock_pcp can only cache the charges of 7 memcgs
(defined as NR_MEMCG_STOCK), which means that once a CPU starts handling
the charging of more than 7 memcgs, it randomly selects a victim memcg
to evict and drain from the cpu, which can cause unnecessarily increased
latencies and thrashing as memcgs continually evict each other's stock.

Second, stock is tightly coupled with memcg, which means that all page
counters in a memcg share the same resource. This may simplify some of
the charging logic, but it prevents new page counters from being added
and using a separate stock.

We can address these concerns by pushing the concept of stock down to
the page_counter level, which addresses the random eviction problem by
getting rid of the 7 slot limit, and makes enabling separate stock
caches for other page_counters simpler.

Introduce a generic per-cpu stock directly in struct page_counter.
Stock can optionally be enabled per-page_counter, limiting the overhead
increase for page_counters that do not benefit greatly from caching
charges.

This patch introduces the page_counter_stock struct and its
enable/disable/free functions, but does not use these yet.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/page_counter.h | 13 ++++++++
 mm/page_counter.c            | 60 ++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index d649b6bbbc871..c7e3ab3356d20 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -5,8 +5,15 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
+struct page_counter_stock {
+	local_trylock_t lock;
+	unsigned long nr_pages;
+};
+
 struct page_counter {
 	/*
 	 * Make sure 'usage' does not share cacheline with any other field in
@@ -41,6 +48,8 @@ struct page_counter {
 	unsigned long high;
 	unsigned long max;
 	struct page_counter *parent;
+	struct page_counter_stock __percpu *stock;
+	unsigned int batch;
 } ____cacheline_internodealigned_in_smp;
 
 #if BITS_PER_LONG == 32
@@ -99,6 +108,10 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 	counter->watermark = usage;
 }
 
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
+void page_counter_disable_stock(struct page_counter *counter);
+void page_counter_free_stock(struct page_counter *counter);
+
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 void page_counter_calculate_protection(struct page_counter *root,
 				       struct page_counter *counter,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 661e0f2a5127a..965021993e161 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -289,6 +290,65 @@ int page_counter_memparse(const char *buf, const char *max,
 	return 0;
 }
 
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch)
+{
+	struct page_counter_stock __percpu *stock;
+	int cpu;
+
+	stock = alloc_percpu(struct page_counter_stock);
+	if (!stock)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		struct page_counter_stock *s = per_cpu_ptr(stock, cpu);
+
+		local_trylock_init(&s->lock);
+	}
+	counter->stock = stock;
+	counter->batch = batch;
+
+	return 0;
+}
+
+void page_counter_disable_stock(struct page_counter *counter)
+{
+	unsigned int stock_to_drain = 0;
+	int cpu;
+
+	if (!counter->stock)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct page_counter_stock *stock;
+
+		/*
+		 * No need for local lock; this is called during css_offline,
+		 * after the cgroup has already been removed.
+		 */
+		stock = per_cpu_ptr(counter->stock, cpu);
+		stock_to_drain += stock->nr_pages;
+	}
+
+	if (stock_to_drain) {
+		struct page_counter *c;
+
+		for (c = counter; c; c = c->parent)
+			page_counter_cancel(c, stock_to_drain);
+	}
+
+	/* This prevents future charges from trying to deposit pages */
+	counter->batch = 0;
+}
+
+void page_counter_free_stock(struct page_counter *counter)
+{
+	if (!counter->stock)
+		return;
+
+	free_percpu(counter->stock);
+	counter->stock = NULL;
+}
+
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 /*
-- 
2.52.0

From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 2/8 RFC] mm/page_counter: use page_counter_stock in page_counter_try_charge
Date: Fri, 10 Apr 2026 14:06:56 -0700
Message-ID: <20260410210742.550489-3-joshua.hahnjy@gmail.com>
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>

Make page_counter_try_charge() stock-aware. We preserve the same
semantics as the existing stock handling logic in try_charge_memcg:

1. Limit-check against the stock. If there is enough, charge to the
   stock (non-hierarchical) and return immediately.
2. Greedily attempt to fulfill the charge request and fill the stock up
   at the same time via a hierarchical charge.
3. If we fail with this charge, retry once with the exact number of
   pages requested.
4. If we succeed with the greedy attempt, then try to add those extra
   pages to the stock. If that fails (trylock), then uncharge those
   surplus pages hierarchically.

As of this patch, the page_counter_stock is unused, as it has not been
enabled on any memcg yet. No functional changes intended.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/page_counter.c | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/mm/page_counter.c b/mm/page_counter.c
index 965021993e161..7a921872079b8 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -121,9 +121,24 @@ bool page_counter_try_charge(struct page_counter *counter,
 			     struct page_counter **fail)
 {
 	struct page_counter *c;
+	unsigned long charge = nr_pages;
 	bool protection = track_protection(counter);
 	bool track_failcnt = counter->track_failcnt;
 
+	if (counter->stock && local_trylock(&counter->stock->lock)) {
+		struct page_counter_stock *stock = this_cpu_ptr(counter->stock);
+
+		if (stock->nr_pages >= charge) {
+			stock->nr_pages -= charge;
+			local_unlock(&counter->stock->lock);
+			return true;
+		}
+		local_unlock(&counter->stock->lock);
+	}
+
+	charge = max_t(unsigned long, counter->batch, nr_pages);
+
+retry:
 	for (c = counter; c; c = c->parent) {
 		long new;
 		/*
@@ -140,9 +155,9 @@ bool page_counter_try_charge(struct page_counter *counter,
 		 * we either see the new limit or the setter sees the
 		 * counter has changed and retries.
 		 */
-		new = atomic_long_add_return(nr_pages, &c->usage);
+		new = atomic_long_add_return(charge, &c->usage);
 		if (new > c->max) {
-			atomic_long_sub(nr_pages, &c->usage);
+			atomic_long_sub(charge, &c->usage);
 			/*
 			 * This is racy, but we can live with some
 			 * inaccuracy in the failcnt which is only used
@@ -163,11 +178,31 @@ bool page_counter_try_charge(struct page_counter *counter,
 				WRITE_ONCE(c->watermark, new);
 		}
 	}
+
+	/* charge > nr_pages implies this page_counter has stock enabled */
+	if (charge > nr_pages) {
+		if (local_trylock(&counter->stock->lock)) {
+			struct page_counter_stock *stock;
+
+			stock = this_cpu_ptr(counter->stock);
+			stock->nr_pages += charge - nr_pages;
+			local_unlock(&counter->stock->lock);
+		} else {
+			page_counter_uncharge(counter, charge - nr_pages);
+		}
+	}
+
 	return true;
 
 failed:
 	for (c = counter; c != *fail; c = c->parent)
-		page_counter_cancel(c, nr_pages);
+		page_counter_cancel(c, charge);
+
+	if (charge > nr_pages) {
+		/* Retry without trying to grab extra pages to refill stock */
+		charge = nr_pages;
+		goto retry;
+	}
 
 	return false;
 }
-- 
2.52.0

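The four-step charge flow described in patch 2/8 can be sketched as a single-threaded userspace model. This is only an illustration under stated assumptions, not the kernel code: struct model_counter, model_charge_hierarchy() and model_try_charge() are hypothetical names, one unsigned long stands in for a single CPU's stock, and there is no locking, so the "trylock failed" branch of step 4 cannot occur here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page_counter. */
struct model_counter {
	unsigned long usage;
	unsigned long max;
	struct model_counter *parent;
	bool has_stock;       /* models counter->stock != NULL */
	unsigned long stock;  /* models one CPU's stock->nr_pages */
	unsigned long batch;
};

/* Charge n pages to counter and every ancestor, or charge nothing. */
static bool model_charge_hierarchy(struct model_counter *counter,
				   unsigned long n)
{
	struct model_counter *c, *undo;

	for (c = counter; c; c = c->parent) {
		if (c->usage + n > c->max) {
			/* Roll back the levels that were already charged. */
			for (undo = counter; undo != c; undo = undo->parent) {
				assert(undo->usage >= n);
				undo->usage -= n;
			}
			return false;
		}
		c->usage += n;
	}
	return true;
}

static bool model_try_charge(struct model_counter *counter,
			     unsigned long nr_pages)
{
	unsigned long charge;

	/* Step 1: serve the request from the stock, no hierarchy walk. */
	if (counter->has_stock && counter->stock >= nr_pages) {
		counter->stock -= nr_pages;
		return true;
	}

	/* Step 2: greedy charge, large enough to also refill the stock. */
	charge = counter->batch > nr_pages ? counter->batch : nr_pages;
	if (model_charge_hierarchy(counter, charge)) {
		/* Step 4: keep the surplus as stock. */
		counter->stock += charge - nr_pages;
		return true;
	}

	/* Step 3: retry once with exactly the requested amount. */
	if (charge > nr_pages)
		return model_charge_hierarchy(counter, nr_pages);
	return false;
}
```

With a batch of 32 and an empty stock, a one-page charge in this model raises usage by 32 at every level and leaves 31 pages in the stock, so the following small charges touch only the stock.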
From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/8 RFC] mm/page_counter: use page_counter_stock in page_counter_uncharge
Date: Fri, 10 Apr 2026 14:06:57 -0700
Message-ID: <20260410210742.550489-4-joshua.hahnjy@gmail.com>
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>

Make page_counter_uncharge() stock-aware. We preserve the same
semantics as the existing stock handling logic in try_charge_memcg:

1. Instead of immediately walking the page_counter hierarchy, see if
   depositing the charge to the stock puts it over the batch limit.
   If not, deposit the charge and return immediately.
2. If we put the stock over the batch limit, walk up the page_counter
   hierarchy and uncharge the excess.

Extract the repeated work of hierarchically cancelling page_counter
charges into a helper function as well.

As of this patch, the page_counter_stock is unused, as it has not been
enabled on any memcg yet. No functional changes intended.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/page_counter.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/mm/page_counter.c b/mm/page_counter.c
index 7a921872079b8..7be214034bfad 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -207,6 +207,15 @@ bool page_counter_try_charge(struct page_counter *counter,
 	return false;
 }
 
+static void page_counter_cancel_hierarchy(struct page_counter *counter,
+					  unsigned long nr_pages)
+{
+	struct page_counter *c;
+
+	for (c = counter; c; c = c->parent)
+		page_counter_cancel(c, nr_pages);
+}
+
 /**
  * page_counter_uncharge - hierarchically uncharge pages
  * @counter: counter
@@ -214,10 +223,23 @@ bool page_counter_try_charge(struct page_counter *counter,
  */
 void page_counter_uncharge(struct page_counter *counter, unsigned long nr_pages)
 {
-	struct page_counter *c;
+	unsigned long charge = nr_pages;
 
-	for (c = counter; c; c = c->parent)
-		page_counter_cancel(c, nr_pages);
+	if (counter->stock && local_trylock(&counter->stock->lock)) {
+		struct page_counter_stock *stock = this_cpu_ptr(counter->stock);
+
+		stock->nr_pages += nr_pages;
+		if (stock->nr_pages > counter->batch) {
+			charge = stock->nr_pages - counter->batch;
+			stock->nr_pages = counter->batch;
+			local_unlock(&counter->stock->lock);
+		} else {
+			local_unlock(&counter->stock->lock);
+			return;
+		}
+	}
+
+	page_counter_cancel_hierarchy(counter, charge);
 }
 
 /**
@@ -364,12 +386,8 @@ void page_counter_disable_stock(struct page_counter *counter)
 		stock_to_drain += stock->nr_pages;
 	}
 
-	if (stock_to_drain) {
-		struct page_counter *c;
-
-		for (c = counter; c; c = c->parent)
-			page_counter_cancel(c, stock_to_drain);
-	}
+	if (stock_to_drain)
+		page_counter_cancel_hierarchy(counter, stock_to_drain);
 
 	/* This prevents future charges from trying to deposit pages */
 	counter->batch = 0;
-- 
2.52.0

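The deposit-then-cap behaviour of the stock-aware uncharge in patch 3/8 can be sketched the same way. Again this is a hedged userspace model rather than the kernel implementation: struct model_counter, model_cancel_hierarchy() and model_uncharge() are hypothetical names, the stock is a single unsigned long instead of a per-cpu structure, and the local_trylock() is left out, so the "lock not taken" fallback is only reachable here through has_stock being false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page_counter. */
struct model_counter {
	unsigned long usage;
	struct model_counter *parent;
	bool has_stock;       /* models counter->stock != NULL */
	unsigned long stock;  /* models one CPU's stock->nr_pages */
	unsigned long batch;
};

/* Subtract n pages from counter and every ancestor. */
static void model_cancel_hierarchy(struct model_counter *counter,
				   unsigned long n)
{
	struct model_counter *c;

	for (c = counter; c; c = c->parent) {
		assert(c->usage >= n); /* usage must not underflow */
		c->usage -= n;
	}
}

static void model_uncharge(struct model_counter *counter,
			   unsigned long nr_pages)
{
	unsigned long excess = nr_pages;

	if (counter->has_stock) {
		/* Deposit into the stock first. */
		counter->stock += nr_pages;
		if (counter->stock <= counter->batch)
			return; /* fits: no hierarchy walk needed */

		/* Over the batch limit: cap the stock, return the rest. */
		excess = counter->stock - counter->batch;
		counter->stock = counter->batch;
	}

	model_cancel_hierarchy(counter, excess);
}
```

Note that in this model, as in the patch, pages parked in the stock still count toward usage at every level until they are either reused by a later charge or drained.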
From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
    David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
    Mike Rapoport, Suren Baghdasaryan, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 4/8 RFC] mm/page_counter: introduce stock drain APIs
Date: Fri, 10 Apr 2026 14:06:58 -0700
Message-ID: <20260410210742.550489-5-joshua.hahnjy@gmail.com>
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>

Introduce page_counter_drain_stock() and page_counter_drain_cpu() to
replace memcg stock draining functions.

page_counter_drain_stock() runs from drain_all_stock, which is called
when the system is under memory pressure or a cgroup is dying. Because
it is a rare operation, it uses work_on_cpu() to synchronously drain
each online CPU's stock and synchronizes with concurrent
charge/uncharge via local_lock.

page_counter_drain_cpu() handles the CPU hotplug dead path, where the
stock can be accessed directly without locking since the CPU is dead.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 include/linux/page_counter.h |  2 ++
 mm/page_counter.c            | 51 ++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index c7e3ab3356d20..c6772531074b5 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -111,6 +111,8 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
 int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
 void page_counter_disable_stock(struct page_counter *counter);
 void page_counter_free_stock(struct page_counter *counter);
+void page_counter_drain_stock(struct page_counter *counter);
+void page_counter_drain_cpu(struct page_counter *counter, unsigned int cpu);
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 void page_counter_calculate_protection(struct page_counter *root,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 7be214034bfad..28c2e6442f7d3 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -12,6 +12,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
 static bool track_protection(struct page_counter *c)
@@ -402,6 +404,55 @@ void page_counter_free_stock(struct page_counter *counter)
 	counter->stock = NULL;
 }
 
+static long page_counter_drain_stock_cpu(void *arg)
+{
+	struct page_counter *counter = arg;
+	struct page_counter_stock *stock;
+	unsigned long nr_pages;
+
+	local_lock(&counter->stock->lock);
+	stock = this_cpu_ptr(counter->stock);
+	nr_pages = stock->nr_pages;
+	stock->nr_pages = 0;
+	local_unlock(&counter->stock->lock);
+
+	if (nr_pages)
+		page_counter_cancel_hierarchy(counter, nr_pages);
+
+	return 0;
+}
+/*
+ * Drain per-cpu stock across all online CPUs. Caller (drain_all_stock) is
+ * already protected by a mutex, all future callers must serialize as well.
+ */
+void page_counter_drain_stock(struct page_counter *counter)
+{
+	int cpu;
+
+	if (!counter->stock)
+		return;
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu)
+		work_on_cpu(cpu, page_counter_drain_stock_cpu, counter);
+	cpus_read_unlock();
+}
+
+void page_counter_drain_cpu(struct page_counter *counter, unsigned int cpu)
+{
+	struct page_counter_stock *stock;
+	unsigned long nr_pages;
+
+	if (!counter->stock)
+		return;
+
+	stock = per_cpu_ptr(counter->stock, cpu);
+	nr_pages = stock->nr_pages;
+	if (nr_pages) {
+		stock->nr_pages = 0;
+		page_counter_cancel_hierarchy(counter, nr_pages);
+	}
+}
 
 #if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
 /*
-- 
2.52.0

From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 5/8 RFC] mm/memcontrol: convert memcg to use page_counter_stock
Date: Fri, 10 Apr 2026 14:06:59 -0700
Message-ID: <20260410210742.550489-6-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Now that all of the memcg_stock handling logic is replicated in
page_counter_stock, switch memcg to use page_counter_stock. A few
details have changed:

First, the old special case that checked !allow_spinning to avoid
refilling and flushing the old stock is removed. The special case
mattered previously because refilling the stock could do a lot of extra
work by evicting a random victim from one of the 7 memcg slots in the
percpu memcg_stock. Now that we no longer randomly evict other memcgs'
stock, refilling just adds extra pages to the local cache. A refill may
still attempt more work than servicing exactly the number of pages
requested, but this is much less work than flushing other memcgs' stock.

Second, stock checking is folded into the memory page_counter. As a
result, cgroup v1 users who use the memsw page_counter always incur the
cost of hierarchically charging memsw. One possible workaround is to
introduce a separate stock for memsw, which would allow separate stock
checks for memsw and memory and restore the fast-path behavior.

Finally, page_counter_enable_stock() can now fail if there is not enough
memory to allocate a percpu page_counter_stock. This failure is rare and
nonfatal: the system continues to operate, with the page counter working
without stock and falling back to walking the hierarchy.

Note that obj_stock remains untouched by these changes.
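
As an illustration of the nonfatal fallback described above, here is a
minimal userspace sketch, not kernel code. Every toy_* name is
hypothetical, and the refill policy is an assumption that only mirrors
the batch-then-exact retry the old try_charge_memcg() performed; the
real page_counter_stock code lives in earlier patches of this series
and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: same value the series quotes for MEMCG_CHARGE_BATCH. */
#define TOY_BATCH 64UL

/* Hypothetical single-CPU model of a page counter with optional stock. */
struct toy_counter {
	struct toy_counter *parent;
	unsigned long usage;	/* pages charged, including cached stock */
	unsigned long max;	/* limit for this level */
	bool has_stock;		/* false models a failed enable_stock */
	unsigned long stock;	/* pre-charged pages cached locally */
	unsigned long walks;	/* hierarchy walks started from here */
};

/* Charge nr pages at every level from c to the root, or none at all. */
static bool toy_charge_hierarchy(struct toy_counter *c, unsigned long nr)
{
	struct toy_counter *it, *undo;

	c->walks++;
	for (it = c; it; it = it->parent) {
		if (it->usage + nr > it->max)
			goto fail;
		it->usage += nr;
	}
	return true;
fail:
	/* Roll back the levels charged before the failing one. */
	for (undo = c; undo != it; undo = undo->parent)
		undo->usage -= nr;
	return false;
}

static bool toy_try_charge(struct toy_counter *c, unsigned long nr)
{
	/* No stock: every charge walks the hierarchy, but still works. */
	if (!c->has_stock)
		return toy_charge_hierarchy(c, nr);

	/* Fast path: serve the request from the local cache. */
	if (c->stock >= nr) {
		c->stock -= nr;
		return true;
	}

	/* Refill: charge a whole batch and keep the surplus as stock. */
	if (nr < TOY_BATCH && toy_charge_hierarchy(c, TOY_BATCH)) {
		c->stock += TOY_BATCH - nr;
		return true;
	}

	/* The batch did not fit (or nr is large): charge exactly nr. */
	return toy_charge_hierarchy(c, nr);
}

/* Return cached stock to the hierarchy, as a drain would. */
static void toy_drain(struct toy_counter *c)
{
	unsigned long nr = c->stock;
	struct toy_counter *it;

	c->stock = 0;
	for (it = c; it; it = it->parent) {
		assert(it->usage >= nr);
		it->usage -= nr;
	}
}
```

In this model a counter with has_stock set pays for one hierarchy walk
per batch, while a counter without stock pays for one walk per charge.
Both stay correct, which is why a failed stock allocation can be
treated as nonfatal.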
Suggested-by: Johannes Weiner Signed-off-by: Joshua Hahn --- mm/memcontrol.c | 68 +++++++++++++++++------------------------------ mm/page_counter.c | 5 +--- 2 files changed, 25 insertions(+), 48 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c3d98ab41f1f1..27d2edd5a7832 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2238,33 +2238,22 @@ static void schedule_drain_work(int cpu, struct wor= k_struct *work) */ void drain_all_stock(struct mem_cgroup *root_memcg) { + struct mem_cgroup *memcg; int cpu, curcpu; =20 /* If someone's already draining, avoid adding running more workers. */ if (!mutex_trylock(&percpu_charge_mutex)) return; - /* - * Notify other cpus that system-wide "drain" is running - * We do not care about races with the cpu hotplug because cpu down - * as well as workers from this path always operate on the local - * per-cpu data. CPU up doesn't touch memcg_stock at all. - */ + + for_each_mem_cgroup_tree(memcg, root_memcg) + page_counter_drain_stock(&memcg->memory); + + /* Drain obj_stock on all online CPUs */ migrate_disable(); curcpu =3D smp_processor_id(); for_each_online_cpu(cpu) { - struct memcg_stock_pcp *memcg_st =3D &per_cpu(memcg_stock, cpu); struct obj_stock_pcp *obj_st =3D &per_cpu(obj_stock, cpu); =20 - if (!test_bit(FLUSHING_CACHED_CHARGE, &memcg_st->flags) && - is_memcg_drain_needed(memcg_st, root_memcg) && - !test_and_set_bit(FLUSHING_CACHED_CHARGE, - &memcg_st->flags)) { - if (cpu =3D=3D curcpu) - drain_local_memcg_stock(&memcg_st->work); - else - schedule_drain_work(cpu, &memcg_st->work); - } - if (!test_bit(FLUSHING_CACHED_CHARGE, &obj_st->flags) && obj_stock_flush_required(obj_st, root_memcg) && !test_and_set_bit(FLUSHING_CACHED_CHARGE, @@ -2281,9 +2270,13 @@ void drain_all_stock(struct mem_cgroup *root_memcg) =20 static int memcg_hotplug_cpu_dead(unsigned int cpu) { + struct mem_cgroup *memcg; + /* no need for the local lock */ drain_obj_stock(&per_cpu(obj_stock, cpu)); - drain_stock_fully(&per_cpu(memcg_stock, 
cpu)); + + for_each_mem_cgroup(memcg) + page_counter_drain_cpu(&memcg->memory, cpu); =20 return 0; } @@ -2558,7 +2551,6 @@ void __mem_cgroup_handle_over_high(gfp_t gfp_mask) static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages) { - unsigned int batch =3D max(MEMCG_CHARGE_BATCH, nr_pages); int nr_retries =3D MAX_RECLAIM_RETRIES; struct mem_cgroup *mem_over_limit; struct page_counter *counter; @@ -2571,31 +2563,19 @@ static int try_charge_memcg(struct mem_cgroup *memc= g, gfp_t gfp_mask, bool allow_spinning =3D gfpflags_allow_spinning(gfp_mask); =20 retry: - if (consume_stock(memcg, nr_pages)) - return 0; - - if (!allow_spinning) - /* Avoid the refill and flush of the older stock */ - batch =3D nr_pages; - reclaim_options =3D MEMCG_RECLAIM_MAY_SWAP; if (!do_memsw_account() || - page_counter_try_charge(&memcg->memsw, batch, &counter)) { - if (page_counter_try_charge(&memcg->memory, batch, &counter)) + page_counter_try_charge(&memcg->memsw, nr_pages, &counter)) { + if (page_counter_try_charge(&memcg->memory, nr_pages, &counter)) goto done_restock; if (do_memsw_account()) - page_counter_uncharge(&memcg->memsw, batch); + page_counter_uncharge(&memcg->memsw, nr_pages); mem_over_limit =3D mem_cgroup_from_counter(counter, memory); } else { mem_over_limit =3D mem_cgroup_from_counter(counter, memsw); reclaim_options &=3D ~MEMCG_RECLAIM_MAY_SWAP; } =20 - if (batch > nr_pages) { - batch =3D nr_pages; - goto retry; - } - /* * Prevent unbounded recursion when reclaim operations need to * allocate memory. This might exceed the limits temporarily, @@ -2692,9 +2672,6 @@ static int try_charge_memcg(struct mem_cgroup *memcg,= gfp_t gfp_mask, return 0; =20 done_restock: - if (batch > nr_pages) - refill_stock(memcg, batch - nr_pages); - /* * If the hierarchy is above the normal consumption range, schedule * reclaim on returning to userland. 
We can perform reclaim here @@ -2731,7 +2708,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg,= gfp_t gfp_mask, * and distribute reclaim work and delay penalties * based on how much each task is actually allocating. */ - current->memcg_nr_pages_over_high +=3D batch; + current->memcg_nr_pages_over_high +=3D nr_pages; set_notify_resume(current); break; } @@ -3036,7 +3013,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgro= up *objcg, account_kmem_nmi_safe(memcg, -nr_pages); memcg1_account_kmem(memcg, -nr_pages); if (!mem_cgroup_is_root(memcg)) - refill_stock(memcg, nr_pages); + memcg_uncharge(memcg, nr_pages); =20 css_put(&memcg->css); } @@ -3957,6 +3934,8 @@ static void __mem_cgroup_free(struct mem_cgroup *memc= g) =20 static void mem_cgroup_free(struct mem_cgroup *memcg) { + page_counter_free_stock(&memcg->memory); + page_counter_free_stock(&memcg->memsw); lru_gen_exit_memcg(memcg); memcg_wb_domain_exit(memcg); __mem_cgroup_free(memcg); @@ -4130,6 +4109,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys= _state *css) refcount_set(&memcg->id.ref, 1); css_get(css); =20 + /* failure is nonfatal, charges fall back to direct hierarchy */ + page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH); + /* * Ensure mem_cgroup_from_private_id() works once we're fully online. 
* @@ -4192,6 +4174,7 @@ static void mem_cgroup_css_offline(struct cgroup_subs= ys_state *css) lru_gen_offline_memcg(memcg); =20 drain_all_stock(memcg); + page_counter_disable_stock(&memcg->memory); =20 mem_cgroup_private_id_put(memcg, 1); } @@ -5382,7 +5365,7 @@ void mem_cgroup_sk_uncharge(const struct sock *sk, un= signed int nr_pages) =20 mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages); =20 - refill_stock(memcg, nr_pages); + page_counter_uncharge(&memcg->memory, nr_pages); } =20 void mem_cgroup_flush_workqueue(void) @@ -5435,12 +5418,9 @@ int __init mem_cgroup_init(void) memcg_wq =3D alloc_workqueue("memcg", WQ_PERCPU, 0); WARN_ON(!memcg_wq); =20 - for_each_possible_cpu(cpu) { - INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work, - drain_local_memcg_stock); + for_each_possible_cpu(cpu) INIT_WORK(&per_cpu_ptr(&obj_stock, cpu)->work, drain_local_obj_stock); - } =20 memcg_size =3D struct_size_t(struct mem_cgroup, nodeinfo, nr_node_ids); memcg_cachep =3D kmem_cache_create("mem_cgroup", memcg_size, 0, diff --git a/mm/page_counter.c b/mm/page_counter.c index 28c2e6442f7d3..51148ca3a5b63 100644 --- a/mm/page_counter.c +++ b/mm/page_counter.c @@ -421,10 +421,7 @@ static long page_counter_drain_stock_cpu(void *arg) =20 return 0; } -/* - * Drain per-cpu stock across all online CPUs. Caller (drain_all_stock) is - * already protected by a mutex, all future callers must serialize as well. 
- */ + void page_counter_drain_stock(struct page_counter *counter) { int cpu; --=20 2.52.0 From nobody Sat Jun 20 19:59:04 2026 Received: from mail-ot1-f51.google.com (mail-ot1-f51.google.com [209.85.210.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 22C8D3A9DAD for ; Fri, 10 Apr 2026 21:07:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775855276; cv=none; b=tL97Pmo8JE1IGQ6V0uUbzZ4vr6N2gi9k4SRGXnaPzyZ5Xg3jqi2VbYRw4nMCz7gDtC/rlVT2a0C7axqBzv0J89CHK8lOjx8ac/K/iTo0ew855Fl4W9IHp90xPJMAOXo3nvMLRllN1aOPJMcsN1vdMvRUwpMlaExf9jOy+EHg2QQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775855276; c=relaxed/simple; bh=fNBJAySEqaEJuZHafhBMXn/gZ8VxBkV7g0IuVKIaHHw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Kx5B2ThBBLPOPnRzNuuUDGkVB+M3X1fSscsmHKKbSzQaH6CPh/ovJ3RX1ci3QKqOP28lHKjSf/Y01CQD3yUj4jKm7Vo/qU5D6k5hRplwFYYSJ3XlEaKqg2GoSgSLfbxkdHGTIUN0NO7EL3IzyUaoaSbREQIhJnTAl7374dz0RuU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=akywdp/s; arc=none smtp.client-ip=209.85.210.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="akywdp/s" Received: by mail-ot1-f51.google.com with SMTP id 46e09a7af769-7dbd8c6fc84so1539516a34.3 for ; Fri, 10 Apr 2026 14:07:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1775855274; 
From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 6/8 RFC] mm/memcontrol: optimize memsw stock for cgroup v1
Date: Fri, 10 Apr 2026 14:07:00 -0700
Message-ID: <20260410210742.550489-7-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Previously, each memcg had its own stock, which was shared by all page
counters within it. Specifically, in try_charge_memcg the stock limit
check occurred before the memsw and memory page_counters were charged
hierarchically.

Now that the memcg stock has been folded into the page_counter level,
and try_charge_memcg's stock check has been replaced by the memory
page_counter's stock, no fast path is left for cgroup v1's memsw check.

Introduce a new stock for the memsw page_counter, charged and uncharged
independently of the memory page_counter. This provides better caching
on cgroup v1:

The best case is when both the memsw and memory page_counters can use
their cached stock; this matches the old behavior.

The halfway case is when either the memsw or the memory page_counter is
within its stock size but the other is not. This requires one
hierarchical charge.

The worst case is when both the memsw and memory page_counters are over
their limit and must walk two page_counter hierarchies. This also
matches the old behavior.

By introducing an independent stock for memsw, we can avoid the worst
case more often, and the memsw charge can fail or succeed separately
from the memory page counter.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
--- mm/memcontrol.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 27d2edd5a7832..6d50f5d667434 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2245,8 +2245,10 @@ void drain_all_stock(struct mem_cgroup *root_memcg) if (!mutex_trylock(&percpu_charge_mutex)) return; =20 - for_each_mem_cgroup_tree(memcg, root_memcg) + for_each_mem_cgroup_tree(memcg, root_memcg) { page_counter_drain_stock(&memcg->memory); + page_counter_drain_stock(&memcg->memsw); + } =20 /* Drain obj_stock on all online CPUs */ migrate_disable(); @@ -2275,8 +2277,10 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) /* no need for the local lock */ drain_obj_stock(&per_cpu(obj_stock, cpu)); =20 - for_each_mem_cgroup(memcg) + for_each_mem_cgroup(memcg) { page_counter_drain_cpu(&memcg->memory, cpu); + page_counter_drain_cpu(&memcg->memsw, cpu); + } =20 return 0; } @@ -4111,6 +4115,8 @@ static int mem_cgroup_css_online(struct cgroup_subsys= _state *css) =20 /* failure is nonfatal, charges fall back to direct hierarchy */ page_counter_enable_stock(&memcg->memory, MEMCG_CHARGE_BATCH); + if (do_memsw_account()) + page_counter_enable_stock(&memcg->memsw, MEMCG_CHARGE_BATCH); =20 /* * Ensure mem_cgroup_from_private_id() works once we're fully online.
@@ -4175,6 +4181,7 @@ static void mem_cgroup_css_offline(struct cgroup_subs= ys_state *css) =20 drain_all_stock(memcg); page_counter_disable_stock(&memcg->memory); + page_counter_disable_stock(&memcg->memsw); =20 mem_cgroup_private_id_put(memcg, 1); } --=20 2.52.0 From nobody Sat Jun 20 19:59:04 2026 Received: from mail-ot1-f47.google.com (mail-ot1-f47.google.com [209.85.210.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7AF922F74A for ; Fri, 10 Apr 2026 21:07:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775855279; cv=none; b=ZC27+pgZ6veaI2iE80fFq7sy/4rp6tq9VAArQ0NyC9v+fs6D4HoJDvGvlFNELS0kMasA0pAXJJtc+lDZvDPUtMdOYQalHUy6tXA+lsorBRbJiMcQ7DSBO6fx6Xoyv6slCuGlbCAwIDoByb7eKnoBrmET3AnVbSU1IDIXBmtn9Jc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775855279; c=relaxed/simple; bh=vs3QDWwNH10RCc4zFoNuAaKqRY15PIf14IbBQYxdpfE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gui6XhuPEBnBYGyLe5yHlNByE7h3K/XD+0B7mDKE2GHr/6I0EtXzWUkdogsXsdcCnHaMYs7p/ulKTaB2PJ1vWeuJ7QFqU+3NDNtj+2ZaEJC3H/NniJtBKu6xgEKTlw5wsvZw4b46NKa1lkS5mRRuOOJivwajv3W30sDg/o55yZE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=mTZbeYOG; arc=none smtp.client-ip=209.85.210.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="mTZbeYOG" Received: by mail-ot1-f47.google.com with SMTP id 
From: Joshua Hahn
To: Johannes Weiner
Cc: Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 7/8 RFC] mm/memcontrol: optimize stock usage for cgroup v2
Date: Fri, 10 Apr 2026 14:07:01 -0700
Message-ID: <20260410210742.550489-8-joshua.hahnjy@gmail.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
References: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

In cgroup v2, tasks can only belong to leaf cgroups, meaning non-leaf
cgroups never receive direct charges. Stock left in these cgroups is
therefore wasted percpu memory that will never be consumed unless all
of their children are removed.

To avoid leaving unused but accounted charges in non-leaf cgroups,
drain the stock when a leaf cgroup becomes a parent.

There is one caveat: concurrent charging and child creation. When a
leaf cgroup becomes a parent while it is still charging a task, there
is a race in which the parent's stock is drained and then refilled by
the charge. Instead of adding expensive synchronization, accept that
these pages are held captive by the parent page_counter, which cannot
use the stock until all of its children are offlined. The race is rare,
and the captive stock is bounded by MEMCG_CHARGE_BATCH = 64 pages.

This optimization does not apply to cgroup v1, where tasks can be
attached to any cgroup in the hierarchy, so stock can be consumed and
refilled for non-leaf cgroups as well.

Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
--- mm/memcontrol.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6d50f5d667434..4be1638dde180 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4130,6 +4130,17 @@ static int mem_cgroup_css_online(struct cgroup_subsy= s_state *css) */ xa_store(&mem_cgroup_private_ids, memcg->id.id, memcg, GFP_KERNEL); =20 + /* + * On v2, non-leaf memcgs cannot directly be charged. This child's + * parent is no longer a leaf, so drain the parent's stock.
+ */ + if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) { + struct mem_cgroup *parent =3D parent_mem_cgroup(memcg); + + if (parent) + page_counter_drain_stock(&parent->memory); + } + return 0; free_objcg: for_each_node(nid) { --=20 2.52.0 From nobody Sat Jun 20 19:59:04 2026 Received: from mail-oo1-f44.google.com (mail-oo1-f44.google.com [209.85.161.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E0A8A3A9D87 for ; Fri, 10 Apr 2026 21:07:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775855286; cv=none; b=PnQEH74Zb/Z3ftZSIedusYkcjXRL+js9LkbelooQ68QKKIIqzb6vNcPags9dbRme+KpdSKwbGRYmjk6O7ietJEk2u/Fo4XK/oFcuzXFGNTdon0HtXL14lrdk2kbJoV0Mf/JzLsi1cGTnnxB8k10UAyAy3X+zt7zg1GwOR6KbXqs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775855286; c=relaxed/simple; bh=uHZSosdPaz/5klkEV1FY2gO/CN0MqSO80Sx627Bwhzw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sVGL/cM/p5bq/QvMa1xC2uiFhT9U5cEsu/LzbaoZ3FReM3WdyL790nVd6k+/OBc/9TB3tqsTDedt71MZRo6pY4vB56ZZPzZBO5BXpjfncGrKmTZm8ppURSBi7bAYxW2DplzIv99EIuao605nsHYjS8URTGrwft0MgZFvkTyZ40s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ktEoy/Jo; arc=none smtp.client-ip=209.85.161.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ktEoy/Jo" Received: by mail-oo1-f44.google.com with SMTP id 
HHNTkUEV7QTyXr0bDcveO4T4I+0v10Z+Y7GLLnxA4Bv2baQ8ygP58gq79TE8GMxoK5uW1X4z2hw uosuA9YHQvREWqZlcYzciJk47CR6ZvGEP4AFRFz1Gyr9At0CstaJgvJNoqRaAHGyBodOCOT2nMd Ds3ActfrYm3ithpEHf3i8eLdaHwK0OzBfo8Kvk0E3iD9bHENgRu+T2RMCcCRDBwgRb/CJE5nuar dO54sngQaMW1YllWaAFQjWAhMxXnOb9WUu X-Received: by 2002:a05:6820:1389:b0:67d:f88f:d853 with SMTP id 006d021491bc7-68be5c5dd0amr2359578eaf.6.1775855276856; Fri, 10 Apr 2026 14:07:56 -0700 (PDT) Received: from localhost ([2a03:2880:10ff:59::]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-423dcf9726fsm3229555fac.0.2026.04.10.14.07.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 Apr 2026 14:07:56 -0700 (PDT) From: Joshua Hahn To: Johannes Weiner Cc: Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com Subject: [PATCH 8/8 RFC] mm/memcontrol: remove unused memcg_stock code Date: Fri, 10 Apr 2026 14:07:02 -0700 Message-ID: <20260410210742.550489-9-joshua.hahnjy@gmail.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com> References: <20260410210742.550489-1-joshua.hahnjy@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that all memcg_stock logic has been moved to page_counter_stock, we can remove all code related to handling memcg_stock. Note that obj_stock is untouched and is still needed. FLUSHING_CACHED_CHARGE is preserved so that it can be used by obj_stock as well. 
Suggested-by: Johannes Weiner
Signed-off-by: Joshua Hahn
---
 mm/memcontrol.c | 183 ------------------------------------------------
 1 file changed, 183 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4be1638dde180..7de23ecd7cef6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1989,24 +1989,7 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
 	pr_cont(" are going to be killed due to memory.oom.group set\n");
 }
 
-/*
- * The value of NR_MEMCG_STOCK is selected to keep the cached memcgs and their
- * nr_pages in a single cacheline. This may change in future.
- */
-#define NR_MEMCG_STOCK 7
 #define FLUSHING_CACHED_CHARGE	0
-struct memcg_stock_pcp {
-	local_trylock_t lock;
-	uint8_t nr_pages[NR_MEMCG_STOCK];
-	struct mem_cgroup *cached[NR_MEMCG_STOCK];
-
-	struct work_struct work;
-	unsigned long flags;
-};
-
-static DEFINE_PER_CPU_ALIGNED(struct memcg_stock_pcp, memcg_stock) = {
-	.lock = INIT_LOCAL_TRYLOCK(lock),
-};
 
 struct obj_stock_pcp {
 	local_trylock_t lock;
@@ -2030,47 +2013,6 @@ static void drain_obj_stock(struct obj_stock_pcp *stock);
 static bool obj_stock_flush_required(struct obj_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 
-/**
- * consume_stock: Try to consume stocked charge on this cpu.
- * @memcg: memcg to consume from.
- * @nr_pages: how many pages to charge.
- *
- * Consume the cached charge if enough nr_pages are present otherwise return
- * failure. Also return failure for charge request larger than
- * MEMCG_CHARGE_BATCH or if the local lock is already taken.
- *
- * returns true if successful, false otherwise.
- */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct memcg_stock_pcp *stock;
-	uint8_t stock_pages;
-	bool ret = false;
-	int i;
-
-	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock(&memcg_stock.lock))
-		return ret;
-
-	stock = this_cpu_ptr(&memcg_stock);
-
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		if (memcg != READ_ONCE(stock->cached[i]))
-			continue;
-
-		stock_pages = READ_ONCE(stock->nr_pages[i]);
-		if (stock_pages >= nr_pages) {
-			WRITE_ONCE(stock->nr_pages[i], stock_pages - nr_pages);
-			ret = true;
-		}
-		break;
-	}
-
-	local_unlock(&memcg_stock.lock);
-
-	return ret;
-}
-
 static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	page_counter_uncharge(&memcg->memory, nr_pages);
@@ -2078,51 +2020,6 @@ static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
 		page_counter_uncharge(&memcg->memsw, nr_pages);
 }
 
-/*
- * Returns stocks cached in percpu and reset cached information.
- */
-static void drain_stock(struct memcg_stock_pcp *stock, int i)
-{
-	struct mem_cgroup *old = READ_ONCE(stock->cached[i]);
-	uint8_t stock_pages;
-
-	if (!old)
-		return;
-
-	stock_pages = READ_ONCE(stock->nr_pages[i]);
-	if (stock_pages) {
-		memcg_uncharge(old, stock_pages);
-		WRITE_ONCE(stock->nr_pages[i], 0);
-	}
-
-	css_put(&old->css);
-	WRITE_ONCE(stock->cached[i], NULL);
-}
-
-static void drain_stock_fully(struct memcg_stock_pcp *stock)
-{
-	int i;
-
-	for (i = 0; i < NR_MEMCG_STOCK; ++i)
-		drain_stock(stock, i);
-}
-
-static void drain_local_memcg_stock(struct work_struct *dummy)
-{
-	struct memcg_stock_pcp *stock;
-
-	if (WARN_ONCE(!in_task(), "drain in non-task context"))
-		return;
-
-	local_lock(&memcg_stock.lock);
-
-	stock = this_cpu_ptr(&memcg_stock);
-	drain_stock_fully(stock);
-	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
-
-	local_unlock(&memcg_stock.lock);
-}
-
 static void drain_local_obj_stock(struct work_struct *dummy)
 {
 	struct obj_stock_pcp *stock;
@@ -2139,86 +2036,6 @@ static void drain_local_obj_stock(struct work_struct *dummy)
 	local_unlock(&obj_stock.lock);
 }
 
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	struct memcg_stock_pcp *stock;
-	struct mem_cgroup *cached;
-	uint8_t stock_pages;
-	bool success = false;
-	int empty_slot = -1;
-	int i;
-
-	/*
-	 * For now limit MEMCG_CHARGE_BATCH to 127 and less. In future if we
-	 * decide to increase it more than 127 then we will need more careful
-	 * handling of nr_pages[] in struct memcg_stock_pcp.
-	 */
-	BUILD_BUG_ON(MEMCG_CHARGE_BATCH > S8_MAX);
-
-	VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
-
-	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock(&memcg_stock.lock)) {
-		/*
-		 * In case of larger than batch refill or unlikely failure to
-		 * lock the percpu memcg_stock.lock, uncharge memcg directly.
-		 */
-		memcg_uncharge(memcg, nr_pages);
-		return;
-	}
-
-	stock = this_cpu_ptr(&memcg_stock);
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		cached = READ_ONCE(stock->cached[i]);
-		if (!cached && empty_slot == -1)
-			empty_slot = i;
-		if (memcg == READ_ONCE(stock->cached[i])) {
-			stock_pages = READ_ONCE(stock->nr_pages[i]) + nr_pages;
-			WRITE_ONCE(stock->nr_pages[i], stock_pages);
-			if (stock_pages > MEMCG_CHARGE_BATCH)
-				drain_stock(stock, i);
-			success = true;
-			break;
-		}
-	}
-
-	if (!success) {
-		i = empty_slot;
-		if (i == -1) {
-			i = get_random_u32_below(NR_MEMCG_STOCK);
-			drain_stock(stock, i);
-		}
-		css_get(&memcg->css);
-		WRITE_ONCE(stock->cached[i], memcg);
-		WRITE_ONCE(stock->nr_pages[i], nr_pages);
-	}
-
-	local_unlock(&memcg_stock.lock);
-}
-
-static bool is_memcg_drain_needed(struct memcg_stock_pcp *stock,
-				  struct mem_cgroup *root_memcg)
-{
-	struct mem_cgroup *memcg;
-	bool flush = false;
-	int i;
-
-	rcu_read_lock();
-	for (i = 0; i < NR_MEMCG_STOCK; ++i) {
-		memcg = READ_ONCE(stock->cached[i]);
-		if (!memcg)
-			continue;
-
-		if (READ_ONCE(stock->nr_pages[i]) &&
-		    mem_cgroup_is_descendant(memcg, root_memcg)) {
-			flush = true;
-			break;
-		}
-	}
-	rcu_read_unlock();
-	return flush;
-}
-
 static void schedule_drain_work(int cpu, struct work_struct *work)
 {
 	/*
-- 
2.52.0