From: Shakeel Butt
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Meta kernel team
Subject: [PATCH 8/9] memcg: combine slab obj stock charging and accounting
Date: Sat, 15 Mar 2025 10:49:29 -0700
Message-ID: <20250315174930.1769599-9-shakeel.butt@linux.dev>
In-Reply-To: <20250315174930.1769599-1-shakeel.butt@linux.dev>
References: <20250315174930.1769599-1-shakeel.butt@linux.dev>

From: Vlastimil Babka

When handling slab objects, we use obj_cgroup_[un]charge() for
(un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
All these operations use the percpu stock for performance. However, with
the calls being separate, the stock_lock is taken twice in each case.

By refactoring the code, we can turn mod_objcg_state() into
__account_obj_stock(), which is called on a stock that's already locked
and validated.
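
To make the locking change concrete, below is a minimal userspace model
of the before/after pattern. This is an illustrative sketch only: the
pthread mutex stands in for the percpu stock_lock (taken via
localtry_lock_irqsave() in the kernel), the counters are plain longs
rather than percpu stock/vmstat state, and account_obj_stock_locked() is
a hypothetical stand-in for __account_obj_stock(), which likewise
expects the caller to already hold the lock:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t stock_lock = PTHREAD_MUTEX_INITIALIZER;
static long stock_bytes = 1024;	/* model of cached pre-charged bytes */
static long vmstat_slab_b;	/* model of NR_SLAB_[UN]RECLAIMABLE_B */

/* mirrors __account_obj_stock(): caller must already hold stock_lock */
static void account_obj_stock_locked(long nr)
{
	vmstat_slab_b += nr;
}

/* old pattern: obj_cgroup_charge() + mod_objcg_state(), two lock holds */
static void charge_then_account_old(long nr)
{
	pthread_mutex_lock(&stock_lock);
	stock_bytes -= nr;		/* consume from the stock */
	pthread_mutex_unlock(&stock_lock);

	pthread_mutex_lock(&stock_lock);	/* second acquisition */
	account_obj_stock_locked(nr);
	pthread_mutex_unlock(&stock_lock);
}

/* new pattern: charge and account under a single acquisition */
static void charge_and_account_new(long nr)
{
	pthread_mutex_lock(&stock_lock);
	stock_bytes -= nr;
	account_obj_stock_locked(nr);
	pthread_mutex_unlock(&stock_lock);
}

int main(void)
{
	charge_then_account_old(64);
	charge_and_account_new(64);
	printf("stock_bytes=%ld vmstat=%ld\n", stock_bytes, vmstat_slab_b);
	return 0;
}

Both paths end in the same state; the difference is one fewer lock
round-trip per object on the fast path.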
On the charging side, we can call this function from consume_obj_stock()
when it succeeds, and from refill_obj_stock() in the fallback; we just
expand the parameters of these functions as necessary. The uncharge side
from __memcg_slab_free_hook() is just the call to refill_obj_stock().

Other callers of obj_cgroup_[un]charge() (i.e. not slab) simply pass the
extra parameters as NULL/zeroes to skip the __account_obj_stock()
operation.

In __memcg_slab_post_alloc_hook() we now charge each object separately,
but that's not a problem, as we previously called mod_objcg_state() for
each object separately anyway, and most allocations are non-bulk. This
could be improved by batching all operations until slab_pgdat(slab)
changes.

Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
iterations, with and without __GFP_ACCOUNT:

Before the patch:
  kmalloc/kfree !memcg: 581390144 cycles
  kmalloc/kfree memcg:  783689984 cycles

After the patch:
  kmalloc/kfree memcg:  658723808 cycles

More than half of the overhead of __GFP_ACCOUNT relative to the
non-accounted case appears to be eliminated.
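
The benchmark loop has roughly the shape below (a sketch: the commit
does not include the harness, the 64-byte object size is an assumption,
and the cycle counting is omitted):

	for (i = 0; i < 10000000; i++)
		kfree(kmalloc(64, flags));

where flags is GFP_KERNEL with or without __GFP_ACCOUNT.

For quick reference, here are the resulting entry points, condensed from
the diff below (signatures as introduced by this patch; surrounding
context omitted):

/* slab charge path: one stock_lock hold charges the bytes and
 * accounts NR_SLAB_[UN]RECLAIMABLE_B against the slab's node */
ret = obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
				slab_pgdat(slab), cache_vmstat_idx(s));

/* non-slab callers pass NULL/0, so __account_obj_stock() is skipped */
int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
{
	return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
}

/* slab uncharge path: refill the stock and account a negative delta
 * under the same single lock hold */
refill_obj_stock(objcg, obj_size, true, -obj_size,
		 slab_pgdat(slab), cache_vmstat_idx(s));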
Signed-off-by: Vlastimil Babka
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
---
 mm/memcontrol.c | 77 +++++++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 31 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dfe9c2eb7816..553eb1d7250a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2753,25 +2753,17 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
 	WRITE_ONCE(stock->cached_objcg, objcg);
 }
 
-static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
-			    enum node_stat_item idx, int nr)
+static void __account_obj_stock(struct obj_cgroup *objcg,
+				struct memcg_stock_pcp *stock, int nr,
+				struct pglist_data *pgdat, enum node_stat_item idx)
 {
-	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 	int *bytes;
 
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
-	stock = this_cpu_ptr(&memcg_stock);
-
 	/*
 	 * Save vmstat data in stock and skip vmstat array update unless
-	 * accumulating over a page of vmstat data or when pgdat or idx
-	 * changes.
+	 * accumulating over a page of vmstat data or when pgdat changes.
 	 */
-	if (READ_ONCE(stock->cached_objcg) != objcg) {
-		replace_stock_objcg(stock, objcg);
-		stock->cached_pgdat = pgdat;
-	} else if (stock->cached_pgdat != pgdat) {
+	if (stock->cached_pgdat != pgdat) {
 		/* Flush the existing cached vmstat data */
 		struct pglist_data *oldpg = stock->cached_pgdat;
 
@@ -2808,11 +2800,10 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	}
 	if (nr)
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
-
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
-static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
+static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
+			      struct pglist_data *pgdat, enum node_stat_item idx)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
@@ -2824,6 +2815,9 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
 		stock->nr_bytes -= nr_bytes;
 		ret = true;
+
+		if (pgdat)
+			__account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
 	}
 
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2908,7 +2902,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 }
 
 static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
-		bool allow_uncharge)
+		bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
+		enum node_stat_item idx)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
@@ -2923,6 +2918,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	}
 	stock->nr_bytes += nr_bytes;
 
+	if (pgdat)
+		__account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
+
 	if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
 		nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		stock->nr_bytes &= (PAGE_SIZE - 1);
@@ -2934,12 +2932,13 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	obj_cgroup_uncharge_pages(objcg, nr_pages);
 }
 
-int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+static int obj_cgroup_charge_account(struct obj_cgroup *objcg, gfp_t gfp, size_t size,
+				     struct pglist_data *pgdat, enum node_stat_item idx)
 {
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
-	if (consume_obj_stock(objcg, size))
+	if (likely(consume_obj_stock(objcg, size, pgdat, idx)))
 		return 0;
 
 	/*
@@ -2972,15 +2971,21 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 		nr_pages += 1;
 
 	ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
-	if (!ret && nr_bytes)
-		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes, false);
+	if (!ret && (nr_bytes || pgdat))
+		refill_obj_stock(objcg, nr_bytes ? PAGE_SIZE - nr_bytes : 0,
+				 false, size, pgdat, idx);
 
 	return ret;
 }
 
+int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+{
+	return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
+}
+
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 {
-	refill_obj_stock(objcg, size, true);
+	refill_obj_stock(objcg, size, true, 0, NULL, 0);
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
@@ -3032,23 +3037,32 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 		return false;
 	}
 
-	if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s)))
-		return false;
-
 	for (i = 0; i < size; i++) {
 		slab = virt_to_slab(p[i]);
 
 		if (!slab_obj_exts(slab) &&
 		    alloc_slab_obj_exts(slab, s, flags, false)) {
-			obj_cgroup_uncharge(objcg, obj_full_size(s));
 			continue;
 		}
 
+		/*
+		 * if we fail and size is 1, memcg_alloc_abort_single() will
+		 * just free the object, which is ok as we have not assigned
+		 * objcg to its obj_ext yet
+		 *
+		 * for larger sizes, kmem_cache_free_bulk() will uncharge
+		 * any objects that were already charged and obj_ext assigned
+		 *
+		 * TODO: we could batch this until slab_pgdat(slab) changes
+		 * between iterations, with a more complicated undo
+		 */
+		if (obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
+					      slab_pgdat(slab), cache_vmstat_idx(s)))
+			return false;
+
 		off = obj_to_index(s, slab, p[i]);
 		obj_cgroup_get(objcg);
 		slab_obj_exts(slab)[off].objcg = objcg;
-		mod_objcg_state(objcg, slab_pgdat(slab),
-				cache_vmstat_idx(s), obj_full_size(s));
 	}
 
 	return true;
@@ -3057,6 +3071,8 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			    void **p, int objects, struct slabobj_ext *obj_exts)
 {
+	size_t obj_size = obj_full_size(s);
+
 	for (int i = 0; i < objects; i++) {
 		struct obj_cgroup *objcg;
 		unsigned int off;
@@ -3067,9 +3083,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			continue;
 
 		obj_exts[off].objcg = NULL;
-		obj_cgroup_uncharge(objcg, obj_full_size(s));
-		mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
-				-obj_full_size(s));
+		refill_obj_stock(objcg, obj_size, true, -obj_size,
+				 slab_pgdat(slab), cache_vmstat_idx(s));
 		obj_cgroup_put(objcg);
 	}
 }
-- 
2.47.1