From nobody Mon Dec 1 21:33:24 2025
From: Gabriel Krisman Bertazi
To: linux-mm@kvack.org
Cc: Gabriel Krisman Bertazi, linux-kernel@vger.kernel.org, jack@suse.cz,
    Mateusz Guzik, Shakeel Butt, Michal Hocko, Mathieu Desnoyers,
    Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton,
    David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan
Subject: [RFC PATCH 1/4] lib/percpu_counter: Split out a helper to insert into hotplug list
Date: Thu, 27 Nov 2025 18:36:28 -0500
Message-ID: <20251127233635.4170047-2-krisman@suse.de>
In-Reply-To: <20251127233635.4170047-1-krisman@suse.de>
References: <20251127233635.4170047-1-krisman@suse.de>

In preparation for using it with the lazy pcpu counter.
Signed-off-by: Gabriel Krisman Bertazi
---
 lib/percpu_counter.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 2891f94a11c6..c2322d53f3b1 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -185,11 +185,26 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
+static int cpu_hotplug_add_watchlist(struct percpu_counter *fbc, int nr_counters)
+{
+#ifdef CONFIG_HOTPLUG_CPU
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&percpu_counters_lock, flags);
+	for (i = 0; i < nr_counters; i++) {
+		INIT_LIST_HEAD(&fbc[i].list);
+		list_add(&fbc[i].list, &percpu_counters);
+	}
+	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+#endif
+	return 0;
+}
+
 int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount,
 			       gfp_t gfp, u32 nr_counters,
 			       struct lock_class_key *key)
 {
-	unsigned long flags __maybe_unused;
 	size_t counter_size;
 	s32 __percpu *counters;
 	u32 i;
@@ -205,21 +220,12 @@ int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount,
 	for (i = 0; i < nr_counters; i++) {
 		raw_spin_lock_init(&fbc[i].lock);
 		lockdep_set_class(&fbc[i].lock, key);
-#ifdef CONFIG_HOTPLUG_CPU
-		INIT_LIST_HEAD(&fbc[i].list);
-#endif
 		fbc[i].count = amount;
 		fbc[i].counters = (void __percpu *)counters + i * counter_size;
 
 		debug_percpu_counter_activate(&fbc[i]);
 	}
-
-#ifdef CONFIG_HOTPLUG_CPU
-	spin_lock_irqsave(&percpu_counters_lock, flags);
-	for (i = 0; i < nr_counters; i++)
-		list_add(&fbc[i].list, &percpu_counters);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
-#endif
+	cpu_hotplug_add_watchlist(fbc, nr_counters);
 	return 0;
 }
 EXPORT_SYMBOL(__percpu_counter_init_many);
-- 
2.51.0

From nobody Mon Dec 1 21:33:24 2025
From: Gabriel Krisman Bertazi
To: linux-mm@kvack.org
Cc: Gabriel Krisman Bertazi, linux-kernel@vger.kernel.org, jack@suse.cz,
    Mateusz Guzik, Shakeel Butt, Michal Hocko, Mathieu Desnoyers,
    Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton,
    David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan
Subject: [RFC PATCH 2/4] lib: Support lazy initialization of per-cpu counters
Date: Thu, 27 Nov 2025 18:36:29 -0500
Message-ID: <20251127233635.4170047-3-krisman@suse.de>
In-Reply-To: <20251127233635.4170047-1-krisman@suse.de>
References: <20251127233635.4170047-1-krisman@suse.de>
While per-cpu counters are efficient when there is a need for frequent
updates from different cpus, they have a non-trivial upfront
initialization cost, mainly due to the percpu variable allocation.
This cost becomes relevant both for short-lived counters and for cases
where we don't know beforehand if there will be frequent updates from
remote cpus. In both cases, it would have been better to just use a
simple counter.

The prime example is the rss_stats of single-threaded tasks, where the
vast majority of counter updates happen from a single-cpu context at a
time, except in slowpath cases such as OOM and khugepaged. For those
workloads, a simple counter would have sufficed and likely yielded
better overall performance if the tasks were sufficiently short. There
is no shortage of examples of short-lived single-threaded workloads, in
particular the coreutils tools.

This patch introduces a new counter flavor that delays the percpu
initialization until needed. It is a dual-mode counter. It starts as a
two-part counter that can be updated either from a local context
through simple arithmetic or from a remote context through an atomic
operation. Once remote accesses become more frequent, and the user
considers that the overhead of atomic updates surpasses the cost of
initializing a fully-fledged per-cpu counter, the user can seamlessly
upgrade the counter to a per-cpu counter.

The first users of this are the rss_stat counters. Benchmark results
are provided in that patch.
Suggested-by: Jan Kara
Signed-off-by: Gabriel Krisman Bertazi
---
 include/linux/lazy_percpu_counter.h | 145 ++++++++++++++++++++++++++++
 include/linux/percpu_counter.h      |   5 +-
 lib/percpu_counter.c                |  40 ++++++++
 3 files changed, 189 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/lazy_percpu_counter.h

diff --git a/include/linux/lazy_percpu_counter.h b/include/linux/lazy_percpu_counter.h
new file mode 100644
index 000000000000..7300b8c33507
--- /dev/null
+++ b/include/linux/lazy_percpu_counter.h
@@ -0,0 +1,145 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/percpu_counter.h>
+#ifndef _LAZY_PERCPU_COUNTER
+#define _LAZY_PERCPU_COUNTER
+
+/* Lazy percpu counter is a bi-modal distributed counter structure that
+ * starts off as a simple counter and can be upgraded to a full per-cpu
+ * counter when the user considers more non-local updates are likely to
+ * happen more frequently in the future. It is useful when non-local
+ * updates are rare, but might become more frequent after other
+ * operations.
+ *
+ * - Lazy-mode:
+ *
+ *   Local updates are handled with a simple variable write, while
+ *   non-local updates are handled through an atomic operation. Once
+ *   non-local updates become more likely to happen in the future, the
+ *   user can upgrade the counter, turning it into a normal
+ *   per-cpu counter.
+ *
+ *   Concurrency safety of 'local' accesses must be guaranteed by the
+ *   caller API, either through task-local accesses or by external locks.
+ *
+ *   In the initial lazy-mode, read is guaranteed to be exact only when
+ *   reading from the local context with lazy_percpu_counter_sum_local.
+ *
+ * - Non-lazy-mode:
+ *   Behaves as a per-cpu counter.
+ */
+
+struct lazy_percpu_counter {
+	struct percpu_counter c;
+};
+
+#define LAZY_INIT_BIAS (1<<0)
+
+static inline s64 add_bias(long val)
+{
+	return (val << 1) | LAZY_INIT_BIAS;
+}
+static inline s64 remove_bias(long val)
+{
+	return val >> 1;
+}
+
+static inline bool lazy_percpu_counter_initialized(struct lazy_percpu_counter *lpc)
+{
+	return !(atomic_long_read(&lpc->c.remote) & LAZY_INIT_BIAS);
+}
+
+static inline void lazy_percpu_counter_init_many(struct lazy_percpu_counter *lpc, int amount,
+						 int nr_counters)
+{
+	for (int i = 0; i < nr_counters; i++) {
+		lpc[i].c.count = amount;
+		atomic_long_set(&lpc[i].c.remote, LAZY_INIT_BIAS);
+		raw_spin_lock_init(&lpc[i].c.lock);
+	}
+}
+
+static inline void lazy_percpu_counter_add_atomic(struct lazy_percpu_counter *lpc, s64 amount)
+{
+	long x = amount << 1;
+	long counter;
+
+	do {
+		counter = atomic_long_read(&lpc->c.remote);
+		if (!(counter & LAZY_INIT_BIAS)) {
+			percpu_counter_add(&lpc->c, amount);
+			return;
+		}
+	} while (atomic_long_cmpxchg_relaxed(&lpc->c.remote, counter, (counter + x)) != counter);
+}
+
+static inline void lazy_percpu_counter_add_fast(struct lazy_percpu_counter *lpc, s64 amount)
+{
+	if (lazy_percpu_counter_initialized(lpc))
+		percpu_counter_add(&lpc->c, amount);
+	else
+		lpc->c.count += amount;
+}
+
+/*
+ * lazy_percpu_counter_sync needs to be protected against concurrent
+ * local updates.
+ */
+static inline s64 lazy_percpu_counter_sum_local(struct lazy_percpu_counter *lpc)
+{
+	if (lazy_percpu_counter_initialized(lpc))
+		return percpu_counter_sum(&lpc->c);
+
+	lazy_percpu_counter_add_atomic(lpc, lpc->c.count);
+	lpc->c.count = 0;
+	return remove_bias(atomic_long_read(&lpc->c.remote));
+}
+
+static inline s64 lazy_percpu_counter_sum(struct lazy_percpu_counter *lpc)
+{
+	if (lazy_percpu_counter_initialized(lpc))
+		return percpu_counter_sum(&lpc->c);
+	return remove_bias(atomic_long_read(&lpc->c.remote)) + lpc->c.count;
+}
+
+static inline s64 lazy_percpu_counter_sum_positive(struct lazy_percpu_counter *lpc)
+{
+	s64 val = lazy_percpu_counter_sum(lpc);
+
+	return (val > 0) ? val : 0;
+}
+
+static inline s64 lazy_percpu_counter_read(struct lazy_percpu_counter *lpc)
+{
+	if (lazy_percpu_counter_initialized(lpc))
+		return percpu_counter_read(&lpc->c);
+	return remove_bias(atomic_long_read(&lpc->c.remote)) + lpc->c.count;
+}
+
+static inline s64 lazy_percpu_counter_read_positive(struct lazy_percpu_counter *lpc)
+{
+	s64 val = lazy_percpu_counter_read(lpc);
+
+	return (val > 0) ? val : 0;
+}
+
+int __lazy_percpu_counter_upgrade_many(struct lazy_percpu_counter *c,
+				       int nr_counters, gfp_t gfp);
+static inline int lazy_percpu_counter_upgrade_many(struct lazy_percpu_counter *c,
+						   int nr_counters, gfp_t gfp)
+{
+	/* Only check the first element, as batches are expected to be
+	 * upgraded together.
+	 */
+	if (!lazy_percpu_counter_initialized(c))
+		return __lazy_percpu_counter_upgrade_many(c, nr_counters, gfp);
+	return 0;
+}
+
+static inline void lazy_percpu_counter_destroy_many(struct lazy_percpu_counter *lpc,
+						    u32 nr_counters)
+{
+	/* Only check the first element, as they must have been initialized together.
+	 */
+	if (lazy_percpu_counter_initialized(lpc))
+		percpu_counter_destroy_many((struct percpu_counter *)lpc, nr_counters);
+}
+#endif
diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 3a44dd1e33d2..e6fada9cba44 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -25,7 +25,10 @@ struct percpu_counter {
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
-	s32 __percpu *counters;
+	union {
+		s32 __percpu *counters;
+		atomic_long_t remote;
+	};
 };
 
 extern int percpu_counter_batch;
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index c2322d53f3b1..0a210496f219 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/percpu_counter.h>
+#include <linux/lazy_percpu_counter.h>
 #include <linux/mutex.h>
 #include <linux/init.h>
 #include <linux/cpu.h>
@@ -397,6 +398,45 @@ bool __percpu_counter_limited_add(struct percpu_counter *fbc,
 	return good;
 }
 
+int __lazy_percpu_counter_upgrade_many(struct lazy_percpu_counter *counters,
+				       int nr_counters, gfp_t gfp)
+{
+	s32 __percpu *pcpu_mem;
+	size_t counter_size;
+
+	counter_size = ALIGN(sizeof(*pcpu_mem), __alignof__(*pcpu_mem));
+	pcpu_mem = __alloc_percpu_gfp(nr_counters * counter_size,
+				      __alignof__(*pcpu_mem), gfp);
+	if (!pcpu_mem)
+		return -ENOMEM;
+
+	for (int i = 0; i < nr_counters; i++) {
+		struct lazy_percpu_counter *lpc = &(counters[i]);
+		s32 __percpu *n_counter;
+		s64 remote = 0;
+
+		WARN_ON(lazy_percpu_counter_initialized(lpc));
+
+		/*
+		 * After the xchg, lazy_percpu_counter behaves as a
+		 * regular percpu counter.
+		 */
+		n_counter = (void __percpu *)pcpu_mem + i * counter_size;
+		remote = (s64) atomic_long_xchg(&lpc->c.remote, (s64)(uintptr_t) n_counter);
+
+		BUG_ON(!(remote & LAZY_INIT_BIAS));
+
+		percpu_counter_add_local(&lpc->c, remove_bias(remote));
+	}
+
+	for (int i = 0; i < nr_counters; i++)
+		debug_percpu_counter_activate(&counters[i].c);
+
+	cpu_hotplug_add_watchlist((struct percpu_counter *) counters, nr_counters);
+
+	return 0;
+}
+
 static int __init percpu_counter_startup(void)
 {
 	int ret;
-- 
2.51.0

From nobody Mon Dec 1 21:33:24 2025
From: Gabriel Krisman Bertazi
To: linux-mm@kvack.org
Cc: Gabriel Krisman Bertazi, linux-kernel@vger.kernel.org, jack@suse.cz,
    Mateusz Guzik, Shakeel Butt, Michal Hocko, Mathieu Desnoyers,
    Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton,
    David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan
Subject: [RFC PATCH 3/4] mm: Avoid percpu MM counters on single-threaded tasks
Date: Thu, 27 Nov 2025 18:36:30 -0500
Message-ID: <20251127233635.4170047-4-krisman@suse.de>
In-Reply-To: <20251127233635.4170047-1-krisman@suse.de>
References: <20251127233635.4170047-1-krisman@suse.de>
The cost of the pcpu memory allocation when forking a new task is
non-negligible, as reported on a few occasions, such as [1]. But it can
also be fully avoided for single-threaded applications, where we know
the vast majority of updates happen from the local task context.

For a trivial benchmark (bound to CPU 0 to reduce the cost of
migrations) like the one below:

  for (( i = 0; i < 20000; i++ )); do /bin/true; done

on an 80c machine, this patchset yielded a 6% improvement in system
time. On a 256c machine, the system time was reduced by 11%. Profiling
shows mm_init went from 13.5% of samples to less than 3.33% on the same
256c machine:

Before:

  - 13.50%  3.93%  benchmark.sh  [kernel.kallsyms]  [k] mm_init
     - 9.57% mm_init
        + 4.80% pcpu_alloc_noprof
        + 3.87% __percpu_counter_init_many

After:

  - 3.33%  0.80%  benchmark.sh  [kernel.kallsyms]  [k] mm_init
     - 2.53% mm_init
        + 2.05% pcpu_alloc_noprof

For kernbench on the 256c machine, the patchset yields a 1.4%
improvement in system time. For gitsource, the improvement in system
time I'm measuring is around 3.12%.

The upgrade adds some overhead to the second fork, in particular an
atomic operation, besides the expensive allocation that was moved from
the first fork to the second. So a fair question is the impact of this
patchset on multi-threaded applications. I wrote a microbenchmark
similar to the /bin/true one above, but that just spawns a second
pthread and waits for it to finish. The second thread just returns
immediately.
This is executed in a loop, bound to a single NUMA node, with:

  for (( i = 0; i < 20000; i++ )); do /bin/parallel-true; done

Profiling shows the lazy upgrade's impact on performance is minimal:

  - 0.68%  0.04%  parallel-true  [kernel.kallsyms]  [k] __lazy_percpu_counter_upgrade_many
     - 0.64% __lazy_percpu_counter_upgrade_many
          0.62% pcpu_alloc_noprof

This is confirmed by the measured system time. With 20k runs, I'm still
getting a slight improvement from baseline for the 2t case (2-4%).

[1] https://lore.kernel.org/all/20230608111408.s2minsenlcjow7q3@quack3

Suggested-by: Jan Kara
Signed-off-by: Gabriel Krisman Bertazi
---
 include/linux/mm.h          | 24 ++++++++----------------
 include/linux/mm_types.h    |  4 ++--
 include/trace/events/kmem.h |  4 ++--
 kernel/fork.c               | 14 ++++++--------
 4 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d16b33bacc32..29de4c60ac6c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2679,36 +2679,28 @@ static inline bool get_user_page_fast_only(unsigned long addr,
  */
 static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
 {
-	return percpu_counter_read_positive(&mm->rss_stat[member]);
+	return lazy_percpu_counter_read_positive(&mm->rss_stat[member]);
 }
 
 static inline unsigned long get_mm_counter_sum(struct mm_struct *mm, int member)
 {
-	return percpu_counter_sum_positive(&mm->rss_stat[member]);
+	return lazy_percpu_counter_sum_positive(&mm->rss_stat[member]);
 }
 
 void mm_trace_rss_stat(struct mm_struct *mm, int member);
 
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
-	percpu_counter_add(&mm->rss_stat[member], value);
-
-	mm_trace_rss_stat(mm, member);
-}
-
-static inline void inc_mm_counter(struct mm_struct *mm, int member)
-{
-	percpu_counter_inc(&mm->rss_stat[member]);
+	if (READ_ONCE(current->mm) == mm)
+		lazy_percpu_counter_add_fast(&mm->rss_stat[member], value);
+	else
+		lazy_percpu_counter_add_atomic(&mm->rss_stat[member], value);
 
 	mm_trace_rss_stat(mm, member);
 }
 
-static inline void dec_mm_counter(struct mm_struct *mm, int member)
-{
-	percpu_counter_dec(&mm->rss_stat[member]);
-
-	mm_trace_rss_stat(mm, member);
-}
+#define inc_mm_counter(mm, member) add_mm_counter(mm, member, 1)
+#define dec_mm_counter(mm, member) add_mm_counter(mm, member, -1)
 
 /* Optimized variant when folio is already known not to be anon */
 static inline int mm_counter_file(struct folio *folio)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 90e5790c318f..5a8d677efa85 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -18,7 +18,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 
@@ -1119,7 +1119,7 @@ struct mm_struct {
 	unsigned long saved_e_flags;
 #endif
 
-	struct percpu_counter rss_stat[NR_MM_COUNTERS];
+	struct lazy_percpu_counter rss_stat[NR_MM_COUNTERS];
 
 	struct linux_binfmt *binfmt;
 
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 7f93e754da5c..e21572f4d8a6 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -442,8 +442,8 @@ TRACE_EVENT(rss_stat,
 		__entry->mm_id = mm_ptr_to_hash(mm);
 		__entry->curr = !!(current->mm == mm);
 		__entry->member = member;
-		__entry->size = (percpu_counter_sum_positive(&mm->rss_stat[member])
-				<< PAGE_SHIFT);
+		__entry->size = (lazy_percpu_counter_sum_positive(&mm->rss_stat[member])
+				<< PAGE_SHIFT);
 	),
 
 	TP_printk("mm_id=%u curr=%d type=%s size=%ldB",
diff --git a/kernel/fork.c b/kernel/fork.c
index 3da0f08615a9..92698c60922e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -583,7 +583,7 @@ static void check_mm(struct mm_struct *mm)
 			 "Please make sure 'struct resident_page_types[]' is updated as well");
 
 	for (i = 0; i < NR_MM_COUNTERS; i++) {
-		long x = percpu_counter_sum(&mm->rss_stat[i]);
+		long x = lazy_percpu_counter_sum_local(&mm->rss_stat[i]);
 
 		if (unlikely(x)) {
 			pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
@@ -688,7 +688,7 @@ void __mmdrop(struct mm_struct *mm)
 	put_user_ns(mm->user_ns);
 	mm_pasid_drop(mm);
 	mm_destroy_cid(mm);
-	percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
+	lazy_percpu_counter_destroy_many(mm->rss_stat, NR_MM_COUNTERS);
 
 	free_mm(mm);
 }
@@ -1083,16 +1083,11 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	if (mm_alloc_cid(mm, p))
 		goto fail_cid;
 
-	if (percpu_counter_init_many(mm->rss_stat, 0, GFP_KERNEL_ACCOUNT,
-				     NR_MM_COUNTERS))
-		goto fail_pcpu;
-
+	lazy_percpu_counter_init_many(mm->rss_stat, 0, NR_MM_COUNTERS);
 	mm->user_ns = get_user_ns(user_ns);
 	lru_gen_init_mm(mm);
 	return mm;
 
-fail_pcpu:
-	mm_destroy_cid(mm);
 fail_cid:
 	destroy_context(mm);
 fail_nocontext:
@@ -1535,6 +1530,9 @@ static int copy_mm(u64 clone_flags, struct task_struct *tsk)
 		return 0;
 
 	if (clone_flags & CLONE_VM) {
+		if (lazy_percpu_counter_upgrade_many(oldmm->rss_stat,
+				NR_MM_COUNTERS, GFP_KERNEL_ACCOUNT))
+			return -ENOMEM;
 		mmget(oldmm);
 		mm = oldmm;
 	} else {
-- 
2.51.0

From nobody Mon Dec 1 21:33:24 2025
From: Gabriel Krisman Bertazi
To: linux-mm@kvack.org
Cc: Gabriel Krisman Bertazi, linux-kernel@vger.kernel.org, jack@suse.cz,
	Mateusz Guzik, Shakeel Butt, Michal Hocko, Mathieu Desnoyers,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton,
	David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan
Subject: [RFC PATCH 4/4] mm: Split a slow path for updating mm counters
Date: Thu, 27 Nov 2025 18:36:31 -0500
Message-ID: <20251127233635.4170047-5-krisman@suse.de>
In-Reply-To: <20251127233635.4170047-1-krisman@suse.de>
References: <20251127233635.4170047-1-krisman@suse.de>
For cases where we know we are not coming from the local task context,
there is no point in touching current when incrementing/decrementing
the counters. Split this path into a separate helper to avoid that
cost.

Signed-off-by: Gabriel Krisman Bertazi
---
 arch/s390/mm/gmap_helpers.c |  4 ++--
 arch/s390/mm/pgtable.c      |  4 ++--
 fs/exec.c                   |  2 +-
 include/linux/mm.h          | 14 +++++++++++---
 kernel/events/uprobes.c     |  2 +-
 mm/filemap.c                |  2 +-
 mm/huge_memory.c            | 22 +++++++++++-----------
 mm/khugepaged.c             |  6 +++---
 mm/ksm.c                    |  2 +-
 mm/madvise.c                |  2 +-
 mm/memory.c                 | 20 ++++++++++----------
 mm/migrate.c                |  2 +-
 mm/migrate_device.c         |  2 +-
 mm/rmap.c                   | 16 ++++++++--------
 mm/swapfile.c               |  6 +++---
 mm/userfaultfd.c            |  2 +-
 16 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/arch/s390/mm/gmap_helpers.c b/arch/s390/mm/gmap_helpers.c
index d4c3c36855e2..6d8498c56d08 100644
--- a/arch/s390/mm/gmap_helpers.c
+++ b/arch/s390/mm/gmap_helpers.c
@@ -29,9 +29,9 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 {
 	if (!non_swap_entry(entry))
-		dec_mm_counter(mm, MM_SWAPENTS);
+		dec_mm_counter_other(mm, MM_SWAPENTS);
 	else if (is_migration_entry(entry))
-		dec_mm_counter(mm, mm_counter(pfn_swap_entry_folio(entry)));
+		dec_mm_counter_other(mm, mm_counter(pfn_swap_entry_folio(entry)));
 	free_swap_and_cache(entry);
 }
 
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 0fde20bbc50b..021a04f958e5 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -686,11 +686,11 @@ void ptep_unshadow_pte(struct mm_struct *mm, unsigned long saddr, pte_t *ptep)
 static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
 {
 	if (!non_swap_entry(entry))
-		dec_mm_counter(mm, MM_SWAPENTS);
+		dec_mm_counter_other(mm, MM_SWAPENTS);
 	else if (is_migration_entry(entry)) {
 		struct folio *folio = pfn_swap_entry_folio(entry);
 
-		dec_mm_counter(mm, mm_counter(folio));
+		dec_mm_counter_other(mm, mm_counter(folio));
 	}
 	free_swap_and_cache(entry);
 }
diff --git a/fs/exec.c b/fs/exec.c
index 4298e7e08d5d..33d0eb00d315 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -137,7 +137,7 @@ static void acct_arg_size(struct linux_binprm *bprm, unsigned long pages)
 		return;
 
 	bprm->vma_pages = pages;
-	add_mm_counter(mm, MM_ANONPAGES, diff);
+	add_mm_counter_local(mm, MM_ANONPAGES, diff);
 }
 
 static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 29de4c60ac6c..2db12280e938 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2689,7 +2689,7 @@ static inline unsigned long get_mm_counter_sum(struct mm_struct *mm, int member)
 
 void mm_trace_rss_stat(struct mm_struct *mm, int member);
 
-static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
+static inline void add_mm_counter_local(struct mm_struct *mm, int member, long value)
 {
 	if (READ_ONCE(current->mm) == mm)
 		lazy_percpu_counter_add_fast(&mm->rss_stat[member], value);
@@ -2698,9 +2698,17 @@ static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 
 	mm_trace_rss_stat(mm, member);
 }
+static inline void add_mm_counter_other(struct mm_struct *mm, int member, long value)
+{
+	lazy_percpu_counter_add_atomic(&mm->rss_stat[member], value);
+
+	mm_trace_rss_stat(mm, member);
+}
 
-#define inc_mm_counter(mm, member) add_mm_counter(mm, member, 1)
-#define dec_mm_counter(mm, member) add_mm_counter(mm, member, -1)
+#define inc_mm_counter_local(mm, member) add_mm_counter_local(mm, member, 1)
+#define dec_mm_counter_local(mm, member) add_mm_counter_local(mm, member, -1)
+#define inc_mm_counter_other(mm, member) add_mm_counter_other(mm, member, 1)
+#define dec_mm_counter_other(mm, member) add_mm_counter_other(mm, member, -1)
 
 /* Optimized variant when folio is already known not to be anon */
 static inline int mm_counter_file(struct folio *folio)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8709c69118b5..9c0e73dd2948 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -447,7 +447,7 @@ static int __uprobe_write(struct vm_area_struct *vma,
 	if (!orig_page_is_identical(vma, vaddr, fw->page, &pmd_mappable))
 		goto remap;
 
-	dec_mm_counter(vma->vm_mm, MM_ANONPAGES);
+	dec_mm_counter_other(vma->vm_mm, MM_ANONPAGES);
 	folio_remove_rmap_pte(folio, fw->page, vma);
 	if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
 	    folio_trylock(folio)) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 13f0259d993c..5d1656e63602 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3854,7 +3854,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 
 		folio_unlock(folio);
 	} while ((folio = next_uptodate_folio(&xas, mapping, end_pgoff)) != NULL);
-	add_mm_counter(vma->vm_mm, folio_type, rss);
+	add_mm_counter_other(vma->vm_mm, folio_type, rss);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	trace_mm_filemap_map_pages(mapping, start_pgoff, end_pgoff);
 out:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b81680b4225..614b0a8e168b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1228,7 +1228,7 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
 	folio_add_lru_vma(folio, vma);
 	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
 	update_mmu_cache_pmd(vma, haddr, pmd);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter_local(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 	count_vm_event(THP_FAULT_ALLOC);
 	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
 	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
@@ -1444,7 +1444,7 @@ static vm_fault_t insert_pmd(struct vm_area_struct *vma, unsigned long addr,
 		} else {
 			folio_get(fop.folio);
 			folio_add_file_rmap_pmd(fop.folio, &fop.folio->page, vma);
-			add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
+			add_mm_counter_local(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
 		}
 	} else {
 		entry = pmd_mkhuge(pfn_pmd(fop.pfn, prot));
@@ -1563,7 +1563,7 @@ static vm_fault_t insert_pud(struct vm_area_struct *vma, unsigned long addr,
 
 		folio_get(fop.folio);
 		folio_add_file_rmap_pud(fop.folio, &fop.folio->page, vma);
-		add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PUD_NR);
+		add_mm_counter_local(mm, mm_counter_file(fop.folio), HPAGE_PUD_NR);
 	} else {
 		entry = pud_mkhuge(pfn_pud(fop.pfn, prot));
 		entry = pud_mkspecial(entry);
@@ -1714,7 +1714,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			pmd = pmd_swp_mkuffd_wp(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
 		}
-		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+		add_mm_counter_local(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(dst_mm);
 		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 		if (!userfaultfd_wp(dst_vma))
@@ -1758,7 +1758,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		__split_huge_pmd(src_vma, src_pmd, addr, false);
 		return -EAGAIN;
 	}
-	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+	add_mm_counter_local(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
@@ -2223,11 +2223,11 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		if (folio_test_anon(folio)) {
 			zap_deposited_table(tlb->mm, pmd);
-			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+			add_mm_counter_other(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 		} else {
 			if (arch_needs_pgtable_deposit())
 				zap_deposited_table(tlb->mm, pmd);
-			add_mm_counter(tlb->mm, mm_counter_file(folio),
+			add_mm_counter_other(tlb->mm, mm_counter_file(folio),
 				       -HPAGE_PMD_NR);
 
 			/*
@@ -2719,7 +2719,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	page = pud_page(orig_pud);
 	folio = page_folio(page);
 	folio_remove_rmap_pud(folio, page, vma);
-	add_mm_counter(tlb->mm, mm_counter_file(folio), -HPAGE_PUD_NR);
+	add_mm_counter_other(tlb->mm, mm_counter_file(folio), -HPAGE_PUD_NR);
 
 	spin_unlock(ptl);
 	tlb_remove_page_size(tlb, page, HPAGE_PUD_SIZE);
@@ -2755,7 +2755,7 @@ static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
 	folio_set_referenced(folio);
 	folio_remove_rmap_pud(folio, page, vma);
 	folio_put(folio);
-	add_mm_counter(vma->vm_mm, mm_counter_file(folio),
+	add_mm_counter_local(vma->vm_mm, mm_counter_file(folio),
 		       -HPAGE_PUD_NR);
 }
 
@@ -2874,7 +2874,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			folio_remove_rmap_pmd(folio, page, vma);
 			folio_put(folio);
 		}
-		add_mm_counter(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
+		add_mm_counter_local(mm, mm_counter_file(folio), -HPAGE_PMD_NR);
 		return;
 	}
 
@@ -3188,7 +3188,7 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
 
 	folio_remove_rmap_pmd(folio, pmd_page(orig_pmd), vma);
 	zap_deposited_table(mm, pmdp);
-	add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	add_mm_counter_local(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 	if (vma->vm_flags & VM_LOCKED)
 		mlock_drain_local();
 	folio_put(folio);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..a6634ca0667d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -691,7 +691,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 		nr_ptes = 1;
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
+			add_mm_counter_other(vma->vm_mm, MM_ANONPAGES, 1);
 			if (is_zero_pfn(pte_pfn(pteval))) {
 				/*
 				 * ptl mostly unnecessary.
@@ -1664,7 +1664,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	/* step 3: set proper refcount and mm_counters. */
 	if (nr_mapped_ptes) {
 		folio_ref_sub(folio, nr_mapped_ptes);
-		add_mm_counter(mm, mm_counter_file(folio), -nr_mapped_ptes);
+		add_mm_counter_other(mm, mm_counter_file(folio), -nr_mapped_ptes);
 	}
 
 	/* step 4: remove empty page table */
@@ -1700,7 +1700,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 	if (nr_mapped_ptes) {
 		flush_tlb_mm(mm);
 		folio_ref_sub(folio, nr_mapped_ptes);
-		add_mm_counter(mm, mm_counter_file(folio), -nr_mapped_ptes);
+		add_mm_counter_other(mm, mm_counter_file(folio), -nr_mapped_ptes);
 	}
 unlock:
 	if (start_pte)
diff --git a/mm/ksm.c b/mm/ksm.c
index 7bc726b50b2f..7434cf1f4925 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1410,7 +1410,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 		 * will get wrong values in /proc, and a BUG message in dmesg
 		 * when tearing down the mm.
 		 */
-		dec_mm_counter(mm, MM_ANONPAGES);
+		dec_mm_counter_other(mm, MM_ANONPAGES);
 	}
 
 	flush_cache_page(vma, addr, pte_pfn(ptep_get(ptep)));
diff --git a/mm/madvise.c b/mm/madvise.c
index fb1c86e630b6..ba7ea134f5ad 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -776,7 +776,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	}
 
 	if (nr_swap)
-		add_mm_counter(mm, MM_SWAPENTS, nr_swap);
+		add_mm_counter_local(mm, MM_SWAPENTS, nr_swap);
 	if (start_pte) {
 		arch_leave_lazy_mmu_mode();
 		pte_unmap_unlock(start_pte, ptl);
diff --git a/mm/memory.c b/mm/memory.c
index 74b45e258323..9a18ac25955c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -488,7 +488,7 @@ static inline void add_mm_rss_vec(struct mm_struct *mm, int *rss)
 
 	for (i = 0; i < NR_MM_COUNTERS; i++)
 		if (rss[i])
-			add_mm_counter(mm, i, rss[i]);
+			add_mm_counter_other(mm, i, rss[i]);
 }
 
 static bool is_bad_page_map_ratelimited(void)
@@ -2306,7 +2306,7 @@ static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
 			pteval = pte_mkyoung(pteval);
 			pteval = maybe_mkwrite(pte_mkdirty(pteval), vma);
 		}
-		inc_mm_counter(vma->vm_mm, mm_counter_file(folio));
+		inc_mm_counter_local(vma->vm_mm, mm_counter_file(folio));
 		folio_add_file_rmap_pte(folio, page, vma);
 	}
 	set_pte_at(vma->vm_mm, addr, pte, pteval);
@@ -3716,12 +3716,12 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 	if (likely(vmf->pte && pte_same(ptep_get(vmf->pte), vmf->orig_pte))) {
 		if (old_folio) {
 			if (!folio_test_anon(old_folio)) {
-				dec_mm_counter(mm, mm_counter_file(old_folio));
-				inc_mm_counter(mm, MM_ANONPAGES);
+				dec_mm_counter_other(mm, mm_counter_file(old_folio));
+				inc_mm_counter_other(mm, MM_ANONPAGES);
 			}
 		} else {
 			ksm_might_unmap_zero_page(mm, vmf->orig_pte);
-			inc_mm_counter(mm, MM_ANONPAGES);
+			inc_mm_counter_other(mm, MM_ANONPAGES);
 		}
 		flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
 		entry = folio_mk_pte(new_folio, vma->vm_page_prot);
@@ -4916,8 +4916,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (should_try_to_free_swap(folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
-	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
+	add_mm_counter_other(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	add_mm_counter_other(vma->vm_mm, MM_SWAPENTS, -nr_pages);
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
@@ -5223,7 +5223,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	}
 
 	folio_ref_add(folio, nr_pages - 1);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	add_mm_counter_other(vma->vm_mm, MM_ANONPAGES, nr_pages);
 	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
 	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
 	folio_add_lru_vma(folio, vma);
@@ -5375,7 +5375,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *pa
 	if (write)
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 
-	add_mm_counter(vma->vm_mm, mm_counter_file(folio), HPAGE_PMD_NR);
+	add_mm_counter_other(vma->vm_mm, mm_counter_file(folio), HPAGE_PMD_NR);
 	folio_add_file_rmap_pmd(folio, page, vma);
 
 	/*
@@ -5561,7 +5561,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 	folio_ref_add(folio, nr_pages - 1);
 	set_pte_range(vmf, folio, page, nr_pages, addr);
 	type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
-	add_mm_counter(vma->vm_mm, type, nr_pages);
+	add_mm_counter_other(vma->vm_mm, type, nr_pages);
 	ret = 0;
 
unlock:
diff --git a/mm/migrate.c b/mm/migrate.c
index e3065c9edb55..dd8c6e6224f9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -329,7 +329,7 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
 
 	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
 
-	dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
+	dec_mm_counter_other(pvmw->vma->vm_mm, mm_counter(folio));
 	return true;
 }
 
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index abd9f6850db6..7f3e5d7b3109 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -676,7 +676,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	if (userfaultfd_missing(vma))
 		goto unlock_abort;
 
-	inc_mm_counter(mm, MM_ANONPAGES);
+	inc_mm_counter_other(mm, MM_ANONPAGES);
 	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
 	if (!folio_is_zone_device(folio))
 		folio_add_lru_vma(folio, vma);
diff --git a/mm/rmap.c b/mm/rmap.c
index ac4f783d6ec2..0f6023ffb65d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2085,7 +2085,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz);
 			} else {
-				dec_mm_counter(mm, mm_counter(folio));
+				dec_mm_counter_other(mm, mm_counter(folio));
 				set_pte_at(mm, address, pvmw.pte, pteval);
 			}
 		} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
@@ -2100,7 +2100,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * migration) will not expect userfaults on already
 			 * copied pages.
 			 */
-			dec_mm_counter(mm, mm_counter(folio));
+			dec_mm_counter_other(mm, mm_counter(folio));
 		} else if (folio_test_anon(folio)) {
 			swp_entry_t entry = page_swap_entry(subpage);
 			pte_t swp_pte;
@@ -2155,7 +2155,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 					set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 					goto walk_abort;
 				}
-				add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
+				add_mm_counter_other(mm, MM_ANONPAGES, -nr_pages);
 				goto discard;
 			}
 
@@ -2188,8 +2188,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				list_add(&mm->mmlist, &init_mm.mmlist);
 				spin_unlock(&mmlist_lock);
 			}
-			dec_mm_counter(mm, MM_ANONPAGES);
-			inc_mm_counter(mm, MM_SWAPENTS);
+			dec_mm_counter_other(mm, MM_ANONPAGES);
+			inc_mm_counter_other(mm, MM_SWAPENTS);
 			swp_pte = swp_entry_to_pte(entry);
 			if (anon_exclusive)
 				swp_pte = pte_swp_mkexclusive(swp_pte);
@@ -2217,7 +2217,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 *
 			 * See Documentation/mm/mmu_notifier.rst
 			 */
-			dec_mm_counter(mm, mm_counter_file(folio));
+			dec_mm_counter_other(mm, mm_counter_file(folio));
 		}
 discard:
 		if (unlikely(folio_test_hugetlb(folio))) {
@@ -2476,7 +2476,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				set_huge_pte_at(mm, address, pvmw.pte, pteval, hsz);
 			} else {
-				dec_mm_counter(mm, mm_counter(folio));
+				dec_mm_counter_other(mm, mm_counter(folio));
 				set_pte_at(mm, address, pvmw.pte, pteval);
 			}
 		} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
@@ -2491,7 +2491,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * migration) will not expect userfaults on already
 			 * copied pages.
 			 */
-			dec_mm_counter(mm, mm_counter(folio));
+			dec_mm_counter_other(mm, mm_counter(folio));
 		} else {
 			swp_entry_t entry;
 			pte_t swp_pte;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 10760240a3a2..70f7d31c0854 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2163,7 +2163,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	if (unlikely(hwpoisoned || !folio_test_uptodate(folio))) {
 		swp_entry_t swp_entry;
 
-		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+		dec_mm_counter_other(vma->vm_mm, MM_SWAPENTS);
 		if (hwpoisoned) {
 			swp_entry = make_hwpoison_entry(page);
 		} else {
@@ -2181,8 +2181,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	 */
 	arch_swap_restore(folio_swap(entry, folio), folio);
 
-	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
+	dec_mm_counter_other(vma->vm_mm, MM_SWAPENTS);
+	inc_mm_counter_other(vma->vm_mm, MM_ANONPAGES);
 	folio_get(folio);
 	if (folio == swapcache) {
 		rmap_t rmap_flags = RMAP_NONE;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index af61b95c89e4..34e760c37b7b 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -221,7 +221,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
 	 * Must happen after rmap, as mm_counter() checks mapping (via
 	 * PageAnon()), which is set by __page_set_anon_rmap().
 	 */
-	inc_mm_counter(dst_mm, mm_counter(folio));
+	inc_mm_counter_other(dst_mm, mm_counter(folio));
 
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
-- 
2.51.0