From nobody Fri Sep 12 13:01:10 2025
Message-ID: <20230515180138.442505633@redhat.com>
User-Agent: quilt/0.67
Date: Mon, 15 May 2023 15:00:16 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
 Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka,
 Michal Hocko, Marcelo Tosatti
Subject: [PATCH v8 01/13] vmstat: allow_direct_reclaim should use
 zone_page_state_snapshot
References: <20230515180015.016409657@redhat.com>

A customer provided evidence indicating that a process
was stalled in direct reclaim:

 - The process was trapped in throttle_direct_reclaim().
   The function wait_event_killable() was called to wait for the
   condition allow_direct_reclaim(pgdat) to become true for the
   current node. allow_direct_reclaim(pgdat) examined the number of
   free pages on the node via zone_page_state(), which just returns
   the value in zone->vm_stat[NR_FREE_PAGES].

 - On node #1, zone->vm_stat[NR_FREE_PAGES] was 0.
   However, the freelist on this node was not empty.

 - This inconsistency in the vmstat value was caused by percpu vmstat
   on nohz_full cpus. Every increment/decrement of vmstat is performed
   on the percpu vmstat counter first, then the pooled diffs are
   accumulated into the zone's vmstat counter in a timely manner.
   However, on nohz_full cpus (48 of 52 cpus on this customer's
   system) these pooled diffs were no longer accumulated once a cpu
   had no events on it and went to sleep indefinitely.
   I checked the percpu vmstat and found a total of 69 counts not yet
   accumulated into the zone's vmstat counter.

 - In this situation, kswapd did not help the trapped process.
   In pgdat_balanced(), zone_watermark_ok_safe() examined the number
   of free pages on the node via zone_page_state_snapshot(), which
   also checks the pending counts in the percpu vmstat.
   Therefore kswapd correctly saw that there were 69 free pages.
   Since zone->_watermark = {8, 20, 32}, kswapd did nothing because
   69 is greater than the high watermark of 32.

Change allow_direct_reclaim to use zone_page_state_snapshot, which
allows a more precise version of the vmstat counters to be used.

allow_direct_reclaim will only be called from try_to_free_pages,
which is not a hot path.

Suggested-by: Michal Hocko
Signed-off-by: Marcelo Tosatti
Acked-by: Michal Hocko
Reviewed-by: Aaron Tomlin

---

Index: linux-vmstat-remote/mm/vmscan.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmscan.c
+++ linux-vmstat-remote/mm/vmscan.c
@@ -6886,7 +6886,7 @@ static bool allow_direct_reclaim(pg_data
 			continue;
 
 		pfmemalloc_reserve += min_wmark_pages(zone);
-		free_pages += zone_page_state(zone, NR_FREE_PAGES);
+		free_pages += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 	}
 
 	/* If there are no reserves (unexpected config) then do not throttle */

From nobody Fri Sep 12 13:01:10 2025
Message-ID: <20230515180138.467548391@redhat.com>
User-Agent: quilt/0.67
Date: Mon, 15 May 2023 15:00:17 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
 Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka,
 Michal Hocko, Marcelo Tosatti
Subject: [PATCH v8 02/13] this_cpu_cmpxchg: ARM64: switch this_cpu_cmpxchg
 to locked, add _local function
References: <20230515180015.016409657@redhat.com>

The goal is to have vmstat_shepherd transfer from
per-CPU counters to global counters remotely. For this,
an atomic this_cpu_cmpxchg is necessary.

Following the kernel convention for cmpxchg/cmpxchg_local,
change ARM64's this_cpu_cmpxchg_ helpers to be atomic,
and add this_cpu_cmpxchg_local_ helpers which are not atomic.

Signed-off-by: Marcelo Tosatti

---

Index: linux-vmstat-remote/arch/arm64/include/asm/percpu.h
===================================================================
--- linux-vmstat-remote.orig/arch/arm64/include/asm/percpu.h
+++ linux-vmstat-remote/arch/arm64/include/asm/percpu.h
@@ -232,13 +232,23 @@ PERCPU_RET_OP(add, add, ldadd)
 	_pcp_protect_return(xchg_relaxed, pcp, val)
 
 #define this_cpu_cmpxchg_1(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+	_pcp_protect_return(cmpxchg, pcp, o, n)
 #define this_cpu_cmpxchg_2(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+	_pcp_protect_return(cmpxchg, pcp, o, n)
 #define this_cpu_cmpxchg_4(pcp, o, n)	\
-	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+	_pcp_protect_return(cmpxchg, pcp, o, n)
 #define this_cpu_cmpxchg_8(pcp, o, n)	\
+	_pcp_protect_return(cmpxchg, pcp, o, n)
+
+#define this_cpu_cmpxchg_local_1(pcp, o, n)	\
 	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_local_2(pcp, o, n)	\
+	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_local_4(pcp, o, n)	\
+	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+#define this_cpu_cmpxchg_local_8(pcp, o, n)	\
+	_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+
 
 #ifdef __KVM_NVHE_HYPERVISOR__
 extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);

From nobody Fri Sep 12 13:01:10 2025
Message-ID: <20230515180138.491271416@redhat.com>
User-Agent: quilt/0.67
Date: Mon, 15 May 2023 15:00:18 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
 Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka,
 Michal Hocko, Marcelo Tosatti
Subject: [PATCH v8 03/13] this_cpu_cmpxchg: loongarch: switch
 this_cpu_cmpxchg to locked, add _local function
References: <20230515180015.016409657@redhat.com>

The goal is to have vmstat_shepherd transfer from
per-CPU counters to global counters remotely. For this,
an atomic this_cpu_cmpxchg is necessary.

Following the kernel convention for cmpxchg/cmpxchg_local,
add this_cpu_cmpxchg_local helpers to Loongarch.

Signed-off-by: Marcelo Tosatti

---

Index: linux-vmstat-remote/arch/loongarch/include/asm/percpu.h
===================================================================
--- linux-vmstat-remote.orig/arch/loongarch/include/asm/percpu.h
+++ linux-vmstat-remote/arch/loongarch/include/asm/percpu.h
@@ -150,6 +150,16 @@ static inline unsigned long __percpu_xch
 }
 
 /* this_cpu_cmpxchg */
+#define _protect_cmpxchg(pcp, o, n)				\
+({								\
+	typeof(*raw_cpu_ptr(&(pcp))) __ret;			\
+	preempt_disable_notrace();				\
+	__ret = cmpxchg(raw_cpu_ptr(&(pcp)), o, n);		\
+	preempt_enable_notrace();				\
+	__ret;							\
+})
+
+/* this_cpu_cmpxchg_local */
 #define _protect_cmpxchg_local(pcp, o, n)			\
 ({								\
 	typeof(*raw_cpu_ptr(&(pcp))) __ret;			\
@@ -222,10 +232,15 @@ do {						\
 #define this_cpu_xchg_4(pcp, val) _percpu_xchg(pcp, val)
 #define this_cpu_xchg_8(pcp, val) _percpu_xchg(pcp, val)
 
-#define this_cpu_cmpxchg_1(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
-#define this_cpu_cmpxchg_2(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
-#define this_cpu_cmpxchg_4(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
-#define this_cpu_cmpxchg_8(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_local_1(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_local_2(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_local_4(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+#define this_cpu_cmpxchg_local_8(ptr, o, n) _protect_cmpxchg_local(ptr, o, n)
+
+#define this_cpu_cmpxchg_1(ptr, o, n) _protect_cmpxchg(ptr, o, n)
+#define this_cpu_cmpxchg_2(ptr, o, n) _protect_cmpxchg(ptr, o, n)
+#define this_cpu_cmpxchg_4(ptr, o, n) _protect_cmpxchg(ptr, o, n)
+#define this_cpu_cmpxchg_8(ptr, o, n) _protect_cmpxchg(ptr, o, n)
 
 #include

From nobody Fri Sep 12 13:01:10 2025
Message-ID: <20230515180138.516507215@redhat.com>
User-Agent: quilt/0.67
Date: Mon, 15 May 2023 15:00:19 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
 Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka,
 Michal Hocko, Marcelo Tosatti
Subject: [PATCH v8 04/13] this_cpu_cmpxchg: S390: switch this_cpu_cmpxchg
 to locked, add _local function
References: <20230515180015.016409657@redhat.com>

The goal is to have vmstat_shepherd transfer from
per-CPU counters to global counters remotely. For this,
an atomic this_cpu_cmpxchg is necessary.

Following the kernel convention for cmpxchg/cmpxchg_local,
add S390's this_cpu_cmpxchg_local.
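
As an editorial illustration of why a remote fold needs an atomic
compare-and-exchange, here is a minimal userspace sketch using C11
atomics. It is not kernel code and is not taken from this series:
NR_CPUS_SKETCH, pcp_diff, global_count, counter_add(), fold_remote()
and demo_fold() are all hypothetical names. The owner of a per-CPU
delta updates it with a CAS loop, and a remote folder claims the
pending delta with a CAS to zero before adding it to the global
counter, so no update is lost even if the two race.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count, illustration only */

/* Userspace stand-ins for the per-CPU diffs and the global counter. */
static _Atomic int pcp_diff[NR_CPUS_SKETCH];
static _Atomic long global_count;

/* Owner-side update: retry until the CAS installs old + delta. */
static void counter_add(int cpu, int delta)
{
	int old = atomic_load(&pcp_diff[cpu]);

	/* On failure, old is refreshed with the current value. */
	while (!atomic_compare_exchange_weak(&pcp_diff[cpu], &old, old + delta))
		;
}

/* Remote fold: claim the pending delta of one CPU, add it to the global. */
static void fold_remote(int cpu)
{
	int old = atomic_load(&pcp_diff[cpu]);

	while (!atomic_compare_exchange_weak(&pcp_diff[cpu], &old, 0))
		;
	atomic_fetch_add(&global_count, old);
}

/* Small single-threaded walk-through; returns the folded global value. */
static long demo_fold(void)
{
	counter_add(1, 5);
	counter_add(1, -2);
	counter_add(3, 7);
	assert(atomic_load(&global_count) == 0);	/* nothing folded yet */

	fold_remote(1);
	fold_remote(3);
	assert(atomic_load(&pcp_diff[1]) == 0);		/* delta was claimed */
	return atomic_load(&global_count);
}
```

If counter_add() used a plain read-modify-write instead, a fold running
on another thread between the read and the write could be overwritten,
which is the kind of race the atomic variant is meant to rule out.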

Signed-off-by: Marcelo Tosatti

---

Index: linux-vmstat-remote/arch/s390/include/asm/percpu.h
===================================================================
--- linux-vmstat-remote.orig/arch/s390/include/asm/percpu.h
+++ linux-vmstat-remote/arch/s390/include/asm/percpu.h
@@ -148,6 +148,11 @@
 #define this_cpu_cmpxchg_4(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
 #define this_cpu_cmpxchg_8(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
 
+#define this_cpu_cmpxchg_local_1(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
+#define this_cpu_cmpxchg_local_2(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
+#define this_cpu_cmpxchg_local_4(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
+#define this_cpu_cmpxchg_local_8(pcp, oval, nval) arch_this_cpu_cmpxchg(pcp, oval, nval)
+
 #define arch_this_cpu_xchg(pcp, nval)					\
 ({									\
 	typeof(pcp) *ptr__;						\

From nobody Fri Sep 12 13:01:10 2025
Message-ID: <20230515180138.541141182@redhat.com>
User-Agent: quilt/0.67
Date: Mon, 15 May 2023 15:00:20 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
 Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka,
 Michal Hocko, Marcelo Tosatti
Subject: [PATCH v8 05/13] this_cpu_cmpxchg: x86: switch this_cpu_cmpxchg
 to locked, add _local function
References: <20230515180015.016409657@redhat.com>

The goal is to have vmstat_shepherd transfer from
per-CPU counters to global counters remotely. For this,
an atomic this_cpu_cmpxchg is necessary.

Following the kernel convention for cmpxchg/cmpxchg_local,
change x86's this_cpu_cmpxchg_ helpers to be atomic,
and add this_cpu_cmpxchg_local_ helpers which are not atomic.

Signed-off-by: Marcelo Tosatti

---

Index: linux-vmstat-remote/arch/x86/include/asm/percpu.h
===================================================================
--- linux-vmstat-remote.orig/arch/x86/include/asm/percpu.h
+++ linux-vmstat-remote/arch/x86/include/asm/percpu.h
@@ -197,11 +197,11 @@ do {									\
  * cmpxchg has no such implied lock semantics as a result it is much
  * more efficient for cpu local operations.
  */
-#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval)		\
+#define percpu_cmpxchg_op(size, qual, _var, _oval, _nval, lockp)	\
 ({									\
 	__pcpu_type_##size pco_old__ = __pcpu_cast_##size(_oval);	\
 	__pcpu_type_##size pco_new__ = __pcpu_cast_##size(_nval);	\
-	asm qual (__pcpu_op2_##size("cmpxchg", "%[nval]",		\
+	asm qual (__pcpu_op2_##size(lockp "cmpxchg", "%[nval]",		\
 				    __percpu_arg([var]))		\
 		  : [oval] "+a" (pco_old__),				\
 		    [var] "+m" (_var)					\
@@ -279,16 +279,20 @@ do {									\
 #define raw_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, , pcp, val)
 #define raw_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, , pcp, val)
 #define raw_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, , pcp, val)
-#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval)
-#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval)
-#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, , pcp, oval, nval, "")
+#define raw_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, , pcp, oval, nval, "")
+#define raw_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, , pcp, oval, nval, "")
 
 #define this_cpu_add_return_1(pcp, val)		percpu_add_return_op(1, volatile, pcp, val)
 #define this_cpu_add_return_2(pcp, val)		percpu_add_return_op(2, volatile, pcp, val)
 #define this_cpu_add_return_4(pcp, val)		percpu_add_return_op(4, volatile, pcp, val)
-#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval)
-#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval)
+#define this_cpu_cmpxchg_local_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval, "")
+#define this_cpu_cmpxchg_local_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval, "")
+#define this_cpu_cmpxchg_local_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval, "")
+
+#define this_cpu_cmpxchg_1(pcp, oval, nval)	percpu_cmpxchg_op(1, volatile, pcp, oval, nval, LOCK_PREFIX)
+#define this_cpu_cmpxchg_2(pcp, oval, nval)	percpu_cmpxchg_op(2, volatile, pcp, oval, nval, LOCK_PREFIX)
+#define this_cpu_cmpxchg_4(pcp, oval, nval)	percpu_cmpxchg_op(4, volatile, pcp, oval, nval, LOCK_PREFIX)
 
 #ifdef CONFIG_X86_CMPXCHG64
 #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2)		\
@@ -319,16 +323,17 @@ do {									\
 #define raw_cpu_or_8(pcp, val)			percpu_to_op(8, , "or", (pcp), val)
 #define raw_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, , pcp, val)
 #define raw_cpu_xchg_8(pcp, nval)		raw_percpu_xchg_op(pcp, nval)
-#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval)
+#define raw_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, , pcp, oval, nval, "")
 
-#define this_cpu_read_8(pcp)			percpu_from_op(8, volatile, "mov", pcp)
-#define this_cpu_write_8(pcp, val)		percpu_to_op(8, volatile, "mov", (pcp), val)
-#define this_cpu_add_8(pcp, val)		percpu_add_op(8, volatile, (pcp), val)
-#define this_cpu_and_8(pcp, val)		percpu_to_op(8, volatile, "and", (pcp), val)
-#define this_cpu_or_8(pcp, val)			percpu_to_op(8, volatile, "or", (pcp), val)
-#define this_cpu_add_return_8(pcp, val)		percpu_add_return_op(8, volatile, pcp, val)
-#define this_cpu_xchg_8(pcp, nval)		percpu_xchg_op(8, volatile, pcp, nval)
-#define this_cpu_cmpxchg_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval)
+#define this_cpu_read_8(pcp)				percpu_from_op(8, volatile, "mov", pcp)
+#define this_cpu_write_8(pcp, val)			percpu_to_op(8, volatile, "mov", (pcp), val)
+#define this_cpu_add_8(pcp, val)			percpu_add_op(8, volatile, (pcp), val)
+#define this_cpu_and_8(pcp, val)			percpu_to_op(8, volatile, "and", (pcp), val)
+#define this_cpu_or_8(pcp, val)				percpu_to_op(8, volatile, "or", (pcp), val)
+#define this_cpu_add_return_8(pcp, val)			percpu_add_return_op(8, volatile, pcp, val)
+#define this_cpu_xchg_8(pcp, nval)			percpu_xchg_op(8, volatile, pcp, nval)
+#define this_cpu_cmpxchg_local_8(pcp, oval, nval)	percpu_cmpxchg_op(8, volatile, pcp, oval, nval, "")
+#define this_cpu_cmpxchg_8(pcp, oval, nval)		percpu_cmpxchg_op(8, volatile, pcp, oval, nval, LOCK_PREFIX)
 
 /*
  * Pretty complex macro to generate cmpxchg16 instruction.  The instruction

From nobody Fri Sep 12 13:01:10 2025
Message-ID: <20230515180138.566700916@redhat.com>
User-Agent: quilt/0.67
Date: Mon, 15 May 2023 15:00:21 -0300
From: Marcelo Tosatti
To: Christoph Lameter
Cc: Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
 Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka,
 Michal Hocko, Marcelo Tosatti
Subject: [PATCH v8 06/13] add this_cpu_cmpxchg_local and asm-generic
 definitions
References: <20230515180015.016409657@redhat.com>

The goal is to have vmstat_shepherd transfer from
per-CPU counters to global counters remotely. For this,
an atomic this_cpu_cmpxchg is necessary.

Add this_cpu_cmpxchg_local_ helpers to asm-generic/percpu.h.
Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/include/asm-generic/percpu.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/include/asm-generic/percpu.h +++ linux-vmstat-remote/include/asm-generic/percpu.h @@ -424,6 +424,23 @@ do { \ this_cpu_generic_cmpxchg(pcp, oval, nval) #endif =20 +#ifndef this_cpu_cmpxchg_local_1 +#define this_cpu_cmpxchg_local_1(pcp, oval, nval) \ + this_cpu_generic_cmpxchg(pcp, oval, nval) +#endif +#ifndef this_cpu_cmpxchg_local_2 +#define this_cpu_cmpxchg_local_2(pcp, oval, nval) \ + this_cpu_generic_cmpxchg(pcp, oval, nval) +#endif +#ifndef this_cpu_cmpxchg_local_4 +#define this_cpu_cmpxchg_local_4(pcp, oval, nval) \ + this_cpu_generic_cmpxchg(pcp, oval, nval) +#endif +#ifndef this_cpu_cmpxchg_local_8 +#define this_cpu_cmpxchg_local_8(pcp, oval, nval) \ + this_cpu_generic_cmpxchg(pcp, oval, nval) +#endif + #ifndef this_cpu_cmpxchg_double_1 #define this_cpu_cmpxchg_double_1(pcp1, pcp2, oval1, oval2, nval1, nval2) \ this_cpu_generic_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2) Index: linux-vmstat-remote/include/linux/percpu-defs.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/include/linux/percpu-defs.h +++ linux-vmstat-remote/include/linux/percpu-defs.h @@ -513,6 +513,8 @@ do { \ #define this_cpu_xchg(pcp, nval) __pcpu_size_call_return2(this_cpu_xchg_, = pcp, nval) #define this_cpu_cmpxchg(pcp, oval, nval) \ __pcpu_size_call_return2(this_cpu_cmpxchg_, pcp, oval, nval) +#define this_cpu_cmpxchg_local(pcp, oval, nval) \ + __pcpu_size_call_return2(this_cpu_cmpxchg_local_, pcp, oval, nval) #define this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, 
nval1, nval2) \ __pcpu_double_call_return_bool(this_cpu_cmpxchg_double_, pcp1, pcp2, oval= 1, oval2, nval1, nval2) From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE2FFC77B7D for ; Mon, 15 May 2023 18:07:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244849AbjEOSHm (ORCPT ); Mon, 15 May 2023 14:07:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245020AbjEOSGr (ORCPT ); Mon, 15 May 2023 14:06:47 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A968F659B for ; Mon, 15 May 2023 11:03:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=BJ6i2oDSiY3YdcuHRj+4qVRIpmZay1wZiVJ33jyxj48=; b=PjSIb+LnT91LxWlqsyWhcdESZ9VlPFgya2xGxH8emPMgoTgXS8CONHs3Xl5vKmaAkaFmCC H+CxZfaEBtEtPMBeFoeKPf34gJGx1iVvHFRRe4gjYhpS0QJinxBIOZ1W3JQp2k9VEEY9dy 7EgEWmQt5Mhd6b9a6h5gOnD8tpRuFdQ= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-159-HcyxXZ_fOKim-3NV21rk8g-1; Mon, 15 May 2023 14:03:42 -0400 X-MC-Unique: HcyxXZ_fOKim-3NV21rk8g-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com 
(Postfix) with ESMTPS id 64E8A382C96A; Mon, 15 May 2023 18:03:41 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 2E3102026D25; Mon, 15 May 2023 18:03:41 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 65C8A4161E529; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.591710284@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:22 -0300 From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Peter Xu , Marcelo Tosatti Subject: [PATCH v8 07/13] convert this_cpu_cmpxchg users to this_cpu_cmpxchg_local References: <20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

this_cpu_cmpxchg was changed to an atomic version, which can be more costly than the non-atomic version.

Switch the existing users of this_cpu_cmpxchg to this_cpu_cmpxchg_local, which preserves the previous (non-atomic) this_cpu_cmpxchg behaviour.
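The converted callers share one idiom: walk a small per-CPU cache and claim the first empty slot with a compare-and-exchange from NULL. A rough userspace sketch of that idiom follows. It assumes the slots are only ever touched by their owning CPU, which is why a local variant is enough. The `model_*` names and the cache size are hypothetical, and the plain read/compare/write here does not model the kernel's protection against local interrupts or preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CACHED 2	/* hypothetical cache size */

/* Non-atomic compare-and-exchange on one pointer slot (model only). */
static void *model_ptr_cmpxchg_local(void **slot, void *oval, void *nval)
{
	void *cur = *slot;

	if (cur == oval)
		*slot = nval;
	return cur;
}

/* Park item in the first free slot; returns false when the cache is full. */
static bool model_cache_put(void *slots[MODEL_NR_CACHED], void *item)
{
	unsigned int i;

	assert(item != NULL);	/* NULL is reserved to mark an empty slot */

	for (i = 0; i < MODEL_NR_CACHED; i++) {
		if (model_ptr_cmpxchg_local(&slots[i], NULL, item) != NULL)
			continue;	/* slot already taken, try the next one */
		return true;
	}
	return false;	/* caller has to release the item itself */
}
```

Because no other CPU writes these slots, paying for a cross-CPU atomic instruction on every put would buy nothing, which is the cost this conversion avoids.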
Acked-by: Peter Xu Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/kernel/fork.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/kernel/fork.c +++ linux-vmstat-remote/kernel/fork.c @@ -205,7 +205,7 @@ static bool try_release_thread_stack_to_ unsigned int i; =20 for (i =3D 0; i < NR_CACHED_STACKS; i++) { - if (this_cpu_cmpxchg(cached_stacks[i], NULL, vm) !=3D NULL) + if (this_cpu_cmpxchg_local(cached_stacks[i], NULL, vm) !=3D NULL) continue; return true; } Index: linux-vmstat-remote/kernel/scs.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/kernel/scs.c +++ linux-vmstat-remote/kernel/scs.c @@ -83,7 +83,7 @@ void scs_free(void *s) */ =20 for (i =3D 0; i < NR_CACHED_SCS; i++) - if (this_cpu_cmpxchg(scs_cache[i], 0, s) =3D=3D NULL) + if (this_cpu_cmpxchg_local(scs_cache[i], 0, s) =3D=3D NULL) return; =20 kasan_unpoison_vmalloc(s, SCS_SIZE, KASAN_VMALLOC_PROT_NORMAL); From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC8FAC77B75 for ; Mon, 15 May 2023 18:07:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245077AbjEOSHX (ORCPT ); Mon, 15 May 2023 14:07:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241894AbjEOSGm (ORCPT ); Mon, 15 May 2023 14:06:42 -0400 Received: from us-smtp-delivery-124.mimecast.com 
(us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5801F1CBB9 for ; Mon, 15 May 2023 11:03:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173825; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=yXMqTzjpRwgaI/E9PhdUEcRLH1Fh3WNfw2d4qEVmv+k=; b=hZx4pEfJXosw1etoJhSEDn7l+IJRppOowSmzPpcrRsbItxyWyvxyb8A1J+CgdXdvTi1fj+ UDnLtHFiZr8KYvVo+RwlPN2yr9fupVMGcT1GZ6sme2CsXPP2dAhRW1YEjxMGqtEu8PQMYm dOEUKcTEFN/+MnTMsqzozJ3QtNuPLQA= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-653-YaqFy-ZENrSao26e_j8GBw-1; Mon, 15 May 2023 14:03:42 -0400 X-MC-Unique: YaqFy-ZENrSao26e_j8GBw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 68332101A553; Mon, 15 May 2023 18:03:41 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 096C914152F6; Mon, 15 May 2023 18:03:41 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 6C1034161E52B; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.617500477@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:23 -0300 From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Marcelo Tosatti Subject: [PATCH v8 08/13] mm/vmstat: switch counter modification to cmpxchg References: 
<20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In preparation for switching vmstat shepherd to flush per-CPU counters remotely, switch the __{mod,inc,dec} functions that modify the counters to use cmpxchg. To facilitate reviewing, functions are ordered in the text file, as: __{mod,inc,dec}_{zone,node}_page_state #ifdef CONFIG_HAVE_CMPXCHG_LOCAL {mod,inc,dec}_{zone,node}_page_state #else {mod,inc,dec}_{zone,node}_page_state #endif This patch defines the __ versions for the CONFIG_HAVE_CMPXCHG_LOCAL case to be their non-"__" counterparts: #ifdef CONFIG_HAVE_CMPXCHG_LOCAL {mod,inc,dec}_{zone,node}_page_state __{mod,inc,dec}_{zone,node}_page_state =3D {mod,inc,dec}_{zone,node}_page_s= tate #else {mod,inc,dec}_{zone,node}_page_state __{mod,inc,dec}_{zone,node}_page_state #endif To test the performance difference, a page allocator microbenchmark: https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/benc= h/page_bench01.c=20 with loops=3D1000000 was used, on Intel Core i7-11850H @ 2.50GHz. For the single_page_alloc_free test, which does /** Loop to measure **/ for (i =3D 0; i < rec->loops; i++) { my_page =3D alloc_page(gfp_mask); if (unlikely(my_page =3D=3D NULL)) return 0; __free_page(my_page); } Unit is cycles. 
Vanilla Patched Diff 115.25 117 1.4% Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/mm/vmstat.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -334,6 +334,188 @@ void set_pgdat_percpu_threshold(pg_data_ } } =20 +#ifdef CONFIG_HAVE_CMPXCHG_LOCAL +/* + * If we have cmpxchg_local support then we do not need to incur the overh= ead + * that comes with local_irq_save/restore if we use this_cpu_cmpxchg. + * + * mod_state() modifies the zone counter state through atomic per cpu + * operations. + * + * Overstep mode specifies how overstep should handled: + * 0 No overstepping + * 1 Overstepping half of threshold + * -1 Overstepping minus half of threshold + */ +static inline void mod_zone_state(struct zone *zone, enum zone_stat_item i= tem, + long delta, int overstep_mode) +{ + struct per_cpu_zonestat __percpu *pcp =3D zone->per_cpu_zonestats; + s8 __percpu *p =3D pcp->vm_stat_diff + item; + long o, n, t, z; + + do { + z =3D 0; /* overflow to zone counters */ + + /* + * The fetching of the stat_threshold is racy. We may apply + * a counter threshold to the wrong the cpu if we get + * rescheduled while executing here. However, the next + * counter update will apply the threshold again and + * therefore bring the counter under the threshold again. + * + * Most of the time the thresholds are the same anyways + * for all cpus in a zone. 
+ */ + t =3D this_cpu_read(pcp->stat_threshold); + + o =3D this_cpu_read(*p); + n =3D delta + o; + + if (abs(n) > t) { + int os =3D overstep_mode * (t >> 1); + + /* Overflow must be added to zone counters */ + z =3D n + os; + n =3D -os; + } + } while (this_cpu_cmpxchg(*p, o, n) !=3D o); + + if (z) + zone_page_state_add(z, zone, item); +} + +void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, + long delta) +{ + mod_zone_state(zone, item, delta, 0); +} +EXPORT_SYMBOL(mod_zone_page_state); + +void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item, + long delta) +{ + mod_zone_state(zone, item, delta, 0); +} +EXPORT_SYMBOL(__mod_zone_page_state); + +void inc_zone_page_state(struct page *page, enum zone_stat_item item) +{ + mod_zone_state(page_zone(page), item, 1, 1); +} +EXPORT_SYMBOL(inc_zone_page_state); + +void __inc_zone_page_state(struct page *page, enum zone_stat_item item) +{ + mod_zone_state(page_zone(page), item, 1, 1); +} +EXPORT_SYMBOL(__inc_zone_page_state); + +void dec_zone_page_state(struct page *page, enum zone_stat_item item) +{ + mod_zone_state(page_zone(page), item, -1, -1); +} +EXPORT_SYMBOL(dec_zone_page_state); + +void __dec_zone_page_state(struct page *page, enum zone_stat_item item) +{ + mod_zone_state(page_zone(page), item, -1, -1); +} +EXPORT_SYMBOL(__dec_zone_page_state); + +static inline void mod_node_state(struct pglist_data *pgdat, + enum node_stat_item item, + int delta, int overstep_mode) +{ + struct per_cpu_nodestat __percpu *pcp =3D pgdat->per_cpu_nodestats; + s8 __percpu *p =3D pcp->vm_node_stat_diff + item; + long o, n, t, z; + + if (vmstat_item_in_bytes(item)) { + /* + * Only cgroups use subpage accounting right now; at + * the global level, these items still change in + * multiples of whole pages. Store them as pages + * internally to keep the per-cpu counters compact. 
+ */ + VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1)); + delta >>=3D PAGE_SHIFT; + } + + do { + z =3D 0; /* overflow to node counters */ + + /* + * The fetching of the stat_threshold is racy. We may apply + * a counter threshold to the wrong the cpu if we get + * rescheduled while executing here. However, the next + * counter update will apply the threshold again and + * therefore bring the counter under the threshold again. + * + * Most of the time the thresholds are the same anyways + * for all cpus in a node. + */ + t =3D this_cpu_read(pcp->stat_threshold); + + o =3D this_cpu_read(*p); + n =3D delta + o; + + if (abs(n) > t) { + int os =3D overstep_mode * (t >> 1); + + /* Overflow must be added to node counters */ + z =3D n + os; + n =3D -os; + } + } while (this_cpu_cmpxchg(*p, o, n) !=3D o); + + if (z) + node_page_state_add(z, pgdat, item); +} + +void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item it= em, + long delta) +{ + mod_node_state(pgdat, item, delta, 0); +} +EXPORT_SYMBOL(mod_node_page_state); + +void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item = item, + long delta) +{ + mod_node_state(pgdat, item, delta, 0); +} +EXPORT_SYMBOL(__mod_node_page_state); + +void inc_node_state(struct pglist_data *pgdat, enum node_stat_item item) +{ + mod_node_state(pgdat, item, 1, 1); +} + +void inc_node_page_state(struct page *page, enum node_stat_item item) +{ + mod_node_state(page_pgdat(page), item, 1, 1); +} +EXPORT_SYMBOL(inc_node_page_state); + +void __inc_node_page_state(struct page *page, enum node_stat_item item) +{ + mod_node_state(page_pgdat(page), item, 1, 1); +} +EXPORT_SYMBOL(__inc_node_page_state); + +void dec_node_page_state(struct page *page, enum node_stat_item item) +{ + mod_node_state(page_pgdat(page), item, -1, -1); +} +EXPORT_SYMBOL(dec_node_page_state); + +void __dec_node_page_state(struct page *page, enum node_stat_item item) +{ + mod_node_state(page_pgdat(page), item, -1, -1); +} 
+EXPORT_SYMBOL(__dec_node_page_state); +#else /* * For use when we know that interrupts are disabled, * or when we know that preemption is disabled and that @@ -541,149 +723,6 @@ void __dec_node_page_state(struct page * } EXPORT_SYMBOL(__dec_node_page_state); =20 -#ifdef CONFIG_HAVE_CMPXCHG_LOCAL -/* - * If we have cmpxchg_local support then we do not need to incur the overh= ead - * that comes with local_irq_save/restore if we use this_cpu_cmpxchg. - * - * mod_state() modifies the zone counter state through atomic per cpu - * operations. - * - * Overstep mode specifies how overstep should handled: - * 0 No overstepping - * 1 Overstepping half of threshold - * -1 Overstepping minus half of threshold -*/ -static inline void mod_zone_state(struct zone *zone, - enum zone_stat_item item, long delta, int overstep_mode) -{ - struct per_cpu_zonestat __percpu *pcp =3D zone->per_cpu_zonestats; - s8 __percpu *p =3D pcp->vm_stat_diff + item; - long o, n, t, z; - - do { - z =3D 0; /* overflow to zone counters */ - - /* - * The fetching of the stat_threshold is racy. We may apply - * a counter threshold to the wrong the cpu if we get - * rescheduled while executing here. However, the next - * counter update will apply the threshold again and - * therefore bring the counter under the threshold again. - * - * Most of the time the thresholds are the same anyways - * for all cpus in a zone. 
- */ - t =3D this_cpu_read(pcp->stat_threshold); - - o =3D this_cpu_read(*p); - n =3D delta + o; - - if (abs(n) > t) { - int os =3D overstep_mode * (t >> 1) ; - - /* Overflow must be added to zone counters */ - z =3D n + os; - n =3D -os; - } - } while (this_cpu_cmpxchg(*p, o, n) !=3D o); - - if (z) - zone_page_state_add(z, zone, item); -} - -void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, - long delta) -{ - mod_zone_state(zone, item, delta, 0); -} -EXPORT_SYMBOL(mod_zone_page_state); - -void inc_zone_page_state(struct page *page, enum zone_stat_item item) -{ - mod_zone_state(page_zone(page), item, 1, 1); -} -EXPORT_SYMBOL(inc_zone_page_state); - -void dec_zone_page_state(struct page *page, enum zone_stat_item item) -{ - mod_zone_state(page_zone(page), item, -1, -1); -} -EXPORT_SYMBOL(dec_zone_page_state); - -static inline void mod_node_state(struct pglist_data *pgdat, - enum node_stat_item item, int delta, int overstep_mode) -{ - struct per_cpu_nodestat __percpu *pcp =3D pgdat->per_cpu_nodestats; - s8 __percpu *p =3D pcp->vm_node_stat_diff + item; - long o, n, t, z; - - if (vmstat_item_in_bytes(item)) { - /* - * Only cgroups use subpage accounting right now; at - * the global level, these items still change in - * multiples of whole pages. Store them as pages - * internally to keep the per-cpu counters compact. - */ - VM_WARN_ON_ONCE(delta & (PAGE_SIZE - 1)); - delta >>=3D PAGE_SHIFT; - } - - do { - z =3D 0; /* overflow to node counters */ - - /* - * The fetching of the stat_threshold is racy. We may apply - * a counter threshold to the wrong the cpu if we get - * rescheduled while executing here. However, the next - * counter update will apply the threshold again and - * therefore bring the counter under the threshold again. - * - * Most of the time the thresholds are the same anyways - * for all cpus in a node. 
- */ - t =3D this_cpu_read(pcp->stat_threshold); - - o =3D this_cpu_read(*p); - n =3D delta + o; - - if (abs(n) > t) { - int os =3D overstep_mode * (t >> 1) ; - - /* Overflow must be added to node counters */ - z =3D n + os; - n =3D -os; - } - } while (this_cpu_cmpxchg(*p, o, n) !=3D o); - - if (z) - node_page_state_add(z, pgdat, item); -} - -void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item it= em, - long delta) -{ - mod_node_state(pgdat, item, delta, 0); -} -EXPORT_SYMBOL(mod_node_page_state); - -void inc_node_state(struct pglist_data *pgdat, enum node_stat_item item) -{ - mod_node_state(pgdat, item, 1, 1); -} - -void inc_node_page_state(struct page *page, enum node_stat_item item) -{ - mod_node_state(page_pgdat(page), item, 1, 1); -} -EXPORT_SYMBOL(inc_node_page_state); - -void dec_node_page_state(struct page *page, enum node_stat_item item) -{ - mod_node_state(page_pgdat(page), item, -1, -1); -} -EXPORT_SYMBOL(dec_node_page_state); -#else /* * Use interrupt disable to serialize counter updates */ Index: linux-vmstat-remote/mm/page_alloc.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/page_alloc.c +++ linux-vmstat-remote/mm/page_alloc.c @@ -6249,9 +6249,6 @@ static int page_alloc_cpu_dead(unsigned /* * Zero the differential counters of the dead processor * so that the vm statistics are consistent. - * - * This is only okay since the processor is dead and cannot - * race with what we are doing. 
*/ cpu_vm_stats_fold(cpu); From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E255C7EE24 for ; Mon, 15 May 2023 18:07:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245046AbjEOSHI (ORCPT ); Mon, 15 May 2023 14:07:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244969AbjEOSGj (ORCPT ); Mon, 15 May 2023 14:06:39 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45AE21EC12 for ; Mon, 15 May 2023 11:03:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173825; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=17wik0CX4Ctne5k/8Sx2ByUswspOZ54CsQaysWp8u+M=; b=QdNVUz6F/nJkm1b882emExwQJc/MSqAMKse/fXgAsiuZ1nmj1LdhnDUBq6GcBfQoAFVG8v 2Syr7Cv6R3ozRw+oBtQL2UheKVQ+ugqK2rC0CdRoQy9WP8rtSHlqRYkiV0pOSa1+phUuFZ Kv+NHDq/cNLDiUkl8vrdxyC3+9pNfIc= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-133-OxnDqdCGMg-hW8ImTw0DVg-1; Mon, 15 May 2023 14:03:42 -0400 X-MC-Unique: OxnDqdCGMg-hW8ImTw0DVg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6554286C60D; Mon, 15 May 2023 18:03:41 +0000 (UTC) Received: 
from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 047482026D16; Mon, 15 May 2023 18:03:41 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 74B2A4161E52C; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.642582847@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:24 -0300 From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Marcelo Tosatti Subject: [PATCH v8 09/13] vmstat: switch per-cpu vmstat counters to 32-bits References: <20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Some architectures only provide xchg/cmpxchg in 32/64-bit quantities. Since the next patch is about to use xchg on per-CPU vmstat counters, switch them to s32. 
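To make the role of the 32-bit differential concrete, the threshold and overstep logic of the counter update can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the kernel code: `struct model_counter` and the `model_*` functions are hypothetical, a single `_Atomic int32_t` stands in for one per-CPU diff slot, and `atomic_compare_exchange_weak()` stands in for `this_cpu_cmpxchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct model_counter {
	_Atomic int32_t diff;	/* stands in for one per-CPU diff slot */
	int32_t threshold;	/* stands in for stat_threshold */
	atomic_long global;	/* stands in for the zone/node counter */
};

static void model_counter_init(struct model_counter *c, int32_t threshold)
{
	assert(threshold >= 0);
	atomic_init(&c->diff, 0);
	atomic_init(&c->global, 0);
	c->threshold = threshold;
}

/*
 * overstep_mode: 0 = fold everything, 1 = leave the diff at minus half
 * the threshold, -1 = leave it at plus half the threshold.
 */
static void model_mod_state(struct model_counter *c, long delta,
			    int overstep_mode)
{
	int32_t o = atomic_load(&c->diff);
	long n, t, z;

	do {
		z = 0;	/* amount that overflows to the global counter */
		t = c->threshold;
		n = delta + o;

		if (labs(n) > t) {
			long os = overstep_mode * (t >> 1);

			z = n + os;
			n = -os;
		}
		/* On failure, o is reloaded and the loop recomputes n and z. */
	} while (!atomic_compare_exchange_weak(&c->diff, &o, (int32_t)n));

	if (z)
		atomic_fetch_add(&c->global, z);
}
```

With a threshold of 2, three increments in overstep mode 1 leave `diff` at -1 and `global` at 4, so the sum is still 3. The overstep only delays the next fold; it never changes the total.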
Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/include/linux/mmzone.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/include/linux/mmzone.h +++ linux-vmstat-remote/include/linux/mmzone.h @@ -675,8 +675,8 @@ struct per_cpu_pages { =20 struct per_cpu_zonestat { #ifdef CONFIG_SMP - s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS]; - s8 stat_threshold; + s32 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS]; + s32 stat_threshold; #endif #ifdef CONFIG_NUMA /* @@ -689,8 +689,8 @@ struct per_cpu_zonestat { }; =20 struct per_cpu_nodestat { - s8 stat_threshold; - s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS]; + s32 stat_threshold; + s32 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS]; }; =20 #endif /* !__GENERATING_BOUNDS.H */ Index: linux-vmstat-remote/mm/vmstat.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -351,7 +351,7 @@ static inline void mod_zone_state(struct long delta, int overstep_mode) { struct per_cpu_zonestat __percpu *pcp =3D zone->per_cpu_zonestats; - s8 __percpu *p =3D pcp->vm_stat_diff + item; + s32 __percpu *p =3D pcp->vm_stat_diff + item; long o, n, t, z; =20 do { @@ -428,7 +428,7 @@ static inline void mod_node_state(struct int delta, int overstep_mode) { struct per_cpu_nodestat __percpu *pcp =3D pgdat->per_cpu_nodestats; - s8 __percpu *p =3D pcp->vm_node_stat_diff + item; + s32 __percpu *p =3D pcp->vm_node_stat_diff + item; long o, n, t, z; =20 if (vmstat_item_in_bytes(item)) { @@ -525,7 +525,7 @@ void __mod_zone_page_state(struct zone * long delta) { struct per_cpu_zonestat __percpu *pcp =3D zone->per_cpu_zonestats; - s8 __percpu *p =3D 
pcp->vm_stat_diff + item; + s32 __percpu *p =3D pcp->vm_stat_diff + item; long x; long t; =20 @@ -556,7 +556,7 @@ void __mod_node_page_state(struct pglist long delta) { struct per_cpu_nodestat __percpu *pcp =3D pgdat->per_cpu_nodestats; - s8 __percpu *p =3D pcp->vm_node_stat_diff + item; + s32 __percpu *p =3D pcp->vm_node_stat_diff + item; long x; long t; =20 @@ -614,8 +614,8 @@ EXPORT_SYMBOL(__mod_node_page_state); void __inc_zone_state(struct zone *zone, enum zone_stat_item item) { struct per_cpu_zonestat __percpu *pcp =3D zone->per_cpu_zonestats; - s8 __percpu *p =3D pcp->vm_stat_diff + item; - s8 v, t; + s32 __percpu *p =3D pcp->vm_stat_diff + item; + s32 v, t; =20 /* See __mod_node_page_state */ preempt_disable_nested(); @@ -623,7 +623,7 @@ void __inc_zone_state(struct zone *zone, v =3D __this_cpu_inc_return(*p); t =3D __this_cpu_read(pcp->stat_threshold); if (unlikely(v > t)) { - s8 overstep =3D t >> 1; + s32 overstep =3D t >> 1; =20 zone_page_state_add(v + overstep, zone, item); __this_cpu_write(*p, -overstep); @@ -635,8 +635,8 @@ void __inc_zone_state(struct zone *zone, void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item) { struct per_cpu_nodestat __percpu *pcp =3D pgdat->per_cpu_nodestats; - s8 __percpu *p =3D pcp->vm_node_stat_diff + item; - s8 v, t; + s32 __percpu *p =3D pcp->vm_node_stat_diff + item; + s32 v, t; =20 VM_WARN_ON_ONCE(vmstat_item_in_bytes(item)); =20 @@ -646,7 +646,7 @@ void __inc_node_state(struct pglist_data v =3D __this_cpu_inc_return(*p); t =3D __this_cpu_read(pcp->stat_threshold); if (unlikely(v > t)) { - s8 overstep =3D t >> 1; + s32 overstep =3D t >> 1; =20 node_page_state_add(v + overstep, pgdat, item); __this_cpu_write(*p, -overstep); @@ -670,8 +670,8 @@ EXPORT_SYMBOL(__inc_node_page_state); void __dec_zone_state(struct zone *zone, enum zone_stat_item item) { struct per_cpu_zonestat __percpu *pcp =3D zone->per_cpu_zonestats; - s8 __percpu *p =3D pcp->vm_stat_diff + item; - s8 v, t; + s32 __percpu *p =3D 
pcp->vm_stat_diff + item; + s32 v, t; =20 /* See __mod_node_page_state */ preempt_disable_nested(); @@ -679,7 +679,7 @@ void __dec_zone_state(struct zone *zone, v =3D __this_cpu_dec_return(*p); t =3D __this_cpu_read(pcp->stat_threshold); if (unlikely(v < - t)) { - s8 overstep =3D t >> 1; + s32 overstep =3D t >> 1; =20 zone_page_state_add(v - overstep, zone, item); __this_cpu_write(*p, overstep); @@ -691,8 +691,8 @@ void __dec_zone_state(struct zone *zone, void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item) { struct per_cpu_nodestat __percpu *pcp =3D pgdat->per_cpu_nodestats; - s8 __percpu *p =3D pcp->vm_node_stat_diff + item; - s8 v, t; + s32 __percpu *p =3D pcp->vm_node_stat_diff + item; + s32 v, t; =20 VM_WARN_ON_ONCE(vmstat_item_in_bytes(item)); =20 @@ -702,7 +702,7 @@ void __dec_node_state(struct pglist_data v =3D __this_cpu_dec_return(*p); t =3D __this_cpu_read(pcp->stat_threshold); if (unlikely(v < - t)) { - s8 overstep =3D t >> 1; + s32 overstep =3D t >> 1; =20 node_page_state_add(v - overstep, pgdat, item); __this_cpu_write(*p, overstep); From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E154FC77B75 for ; Mon, 15 May 2023 18:07:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245071AbjEOSHU (ORCPT ); Mon, 15 May 2023 14:07:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243798AbjEOSGl (ORCPT ); Mon, 15 May 2023 14:06:41 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2FADDC7F for ; Mon, 15 May 2023 11:03:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=Hcv+7yXWUE6dz6IV3pdXnHOeNZqOsphpnY/hyL0+XpA=; b=OLxlu9HJyXEZhnoWAvLkuiMTn6LUCC7NF4hYWkUqjNJyHpyu+zJSDbGKvoAA6XWTnGDDUp Gl+By/C5QTu33vea79vJvDZgpkeD3zS4fU18TTQRIV3MYUGZLYbDzPbRwNOcMRbIflECRA M/OPO6xbCjnh+jwBvREjrU8iJZZmZKM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-411-lHcv7IUkO2a5FK4nyxKaQw-1; Mon, 15 May 2023 14:03:43 -0400 X-MC-Unique: lHcv7IUkO2a5FK4nyxKaQw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0080E88CC41; Mon, 15 May 2023 18:03:43 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id C62E92166B26; Mon, 15 May 2023 18:03:42 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 7B6454133CFAA; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.668066640@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:25 -0300 From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Marcelo Tosatti Subject: [PATCH v8 10/13] mm/vmstat: use xchg in cpu_vm_stats_fold References: <20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8" In preparation to switch vmstat shepherd to flush per-CPU counters remotely, use xchg instead of a pair of read/write instructions. Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/mm/vmstat.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -924,7 +924,7 @@ static int refresh_cpu_vm_stats(bool do_ } =20 /* - * Fold the data for an offline cpu into the global array. + * Fold the data for a cpu into the global array. * There cannot be any access by the offline cpu and therefore * synchronization is simplified. */ @@ -945,8 +945,7 @@ void cpu_vm_stats_fold(int cpu) if (pzstats->vm_stat_diff[i]) { int v; =20 - v =3D pzstats->vm_stat_diff[i]; - pzstats->vm_stat_diff[i] =3D 0; + v =3D xchg(&pzstats->vm_stat_diff[i], 0); atomic_long_add(v, &zone->vm_stat[i]); global_zone_diff[i] +=3D v; } @@ -956,8 +955,7 @@ void cpu_vm_stats_fold(int cpu) if (pzstats->vm_numa_event[i]) { unsigned long v; =20 - v =3D pzstats->vm_numa_event[i]; - pzstats->vm_numa_event[i] =3D 0; + v =3D xchg(&pzstats->vm_numa_event[i], 0); zone_numa_event_add(v, zone, i); } } @@ -973,8 +971,7 @@ void cpu_vm_stats_fold(int cpu) if (p->vm_node_stat_diff[i]) { int v; =20 - v =3D p->vm_node_stat_diff[i]; - p->vm_node_stat_diff[i] =3D 0; + v =3D xchg(&p->vm_node_stat_diff[i], 0); atomic_long_add(v, &pgdat->vm_stat[i]); global_node_diff[i] +=3D v; } From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95314C77B7D for ; Mon, 15 May 2023 18:07:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S245082AbjEOSH1 (ORCPT ); Mon, 15 May 2023 14:07:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244982AbjEOSGn (ORCPT ); Mon, 15 May 2023 14:06:43 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E5931560A for ; Mon, 15 May 2023 11:03:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=jkb4dlPC4zyep6CJ8cMLWwNNaJdusa90HtfOUuDaHEw=; b=XfGnLl3Np54PnRFn+fQaF1eUHilrj4ugMZE4hvTwrjlQREw9jZQujPOZkUUnUpK+Efn2DZ blHFzb1+Yk/ZMWufurmEjQLBq4vIFM6LLYJoC6kgDvLsga9r+13RNJk4NgHRpWWZ1E56fj NWEjOkUbvMxaSWqEwzIgwpwM3yBxk0Y= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-646-gp1bspFaPzeFbxO02PyHjg-1; Mon, 15 May 2023 14:03:43 -0400 X-MC-Unique: gp1bspFaPzeFbxO02PyHjg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 27D73185A79C; Mon, 15 May 2023 18:03:43 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F1C7340C2063; Mon, 15 May 2023 18:03:42 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 81D6B4161E52E; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.692835888@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:26 -0300 
From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Marcelo Tosatti Subject: [PATCH v8 11/13] mm/vmstat: switch vmstat shepherd to flush per-CPU counters remotely References: <20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" With a task that busy loops on a given CPU, the kworker interruption to execute vmstat_update is undesired and may exceed latency thresholds for certain applications. Performance details for the kworker interruption: oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000) oslat 1094.456971: workqueue_queue_work: ... function=3Dvmstat_update ... oslat 1094.456974: sched_switch: prev_comm=3Doslat ... =3D=3D> next_comm= =3Dkworker/5:1 ... kworker 1094.456978: sched_switch: prev_comm=3Dkworker/5:1 =3D=3D> next_com= m=3Doslat ... The example above shows an additional 7us for the oslat -> kworker -> oslat switches. In the case of a virtualized CPU, with the vmstat_update interruption occurring in the host (of a qemu-kvm vcpu), the latency penalty observed in the guest is higher than 50us, violating the acceptable latency threshold for certain applications. To fix this, now that the counters are modified via cmpxchg both CPU locally (via the account functions), and remotely (via cpu_vm_stats_fold), it's possible to switch vmstat_shepherd to perform the per-CPU vmstats folding remotely. 
Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/mm/vmstat.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -2049,6 +2049,23 @@ static void vmstat_shepherd(struct work_ =20 static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd); =20 +#ifdef CONFIG_HAVE_CMPXCHG_LOCAL +/* Flush counters remotely if CPU uses cmpxchg to update its per-CPU count= ers */ +static void vmstat_shepherd(struct work_struct *w) +{ + int cpu; + + cpus_read_lock(); + for_each_online_cpu(cpu) { + cpu_vm_stats_fold(cpu); + cond_resched(); + } + cpus_read_unlock(); + + schedule_delayed_work(&shepherd, + round_jiffies_relative(sysctl_stat_interval)); +} +#else static void vmstat_shepherd(struct work_struct *w) { int cpu; @@ -2068,6 +2085,7 @@ static void vmstat_shepherd(struct work_ schedule_delayed_work(&shepherd, round_jiffies_relative(sysctl_stat_interval)); } +#endif =20 static void __init start_shepherd_timer(void) { From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C028C7EE24 for ; Mon, 15 May 2023 18:07:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245029AbjEOSHC (ORCPT ); Mon, 15 May 2023 14:07:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39776 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244957AbjEOSGf (ORCPT ); Mon, 15 May 2023 14:06:35 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45A521CB87 for ; 
Mon, 15 May 2023 11:03:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173825; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=0t7wAXZL5USXQxpN4vCajFBy/hf1+LlNt2JScd0fQmU=; b=DO004ozI2GWIZFr4RLYziM4iVkGLBQYiJxbQnW344J3DQ1N4Y3XUGRmwLop/WIrqbSBoxV 9vTzmr/YTVS1BiVReiSENDmN4QmmBYxOx+71fxXexnv/vRU5BkUzzUEqLjTDrjx0nmFnTu Fd3dDi/cY+3HTqfR7YS0lnCWAvDk0RA= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-411-S8EJZ_4aPK6en7Q5TLj-6A-1; Mon, 15 May 2023 14:03:43 -0400 X-MC-Unique: S8EJZ_4aPK6en7Q5TLj-6A-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 0C0671C0F2E3; Mon, 15 May 2023 18:03:43 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D41A42166B29; Mon, 15 May 2023 18:03:42 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 890914161E531; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.717342763@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:27 -0300 From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Marcelo Tosatti Subject: [PATCH v8 12/13] mm/vmstat: refresh stats remotely instead of via work item References: <20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Refresh per-CPU stats remotely, instead of queueing work items, for the stat_refresh procfs method. This fixes a sosreport hang (sosreport uses vmstat_refresh) that occurs when a SCHED_FIFO process is spinning. Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/mm/vmstat.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -1907,11 +1907,20 @@ static DEFINE_PER_CPU(struct delayed_wor int sysctl_stat_interval __read_mostly =3D HZ; =20 #ifdef CONFIG_PROC_FS +#ifdef CONFIG_HAVE_CMPXCHG_LOCAL +static int refresh_all_vm_stats(void); +#else static void refresh_vm_stats(struct work_struct *work) { refresh_cpu_vm_stats(true); } =20 +static int refresh_all_vm_stats(void) +{ + return schedule_on_each_cpu(refresh_vm_stats); +} +#endif + int vmstat_refresh(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -1931,7 +1940,7 @@ int vmstat_refresh(struct ctl_table *tab * transiently negative values, report an error here if any of * the stats is negative, so we know to go looking for imbalance. 
*/ - err =3D schedule_on_each_cpu(refresh_vm_stats); + err =3D refresh_all_vm_stats(); if (err) return err; for (i =3D 0; i < NR_VM_ZONE_STAT_ITEMS; i++) { @@ -2051,7 +2060,7 @@ static DECLARE_DEFERRABLE_WORK(shepherd, =20 #ifdef CONFIG_HAVE_CMPXCHG_LOCAL /* Flush counters remotely if CPU uses cmpxchg to update its per-CPU count= ers */ -static void vmstat_shepherd(struct work_struct *w) +static int refresh_all_vm_stats(void) { int cpu; =20 @@ -2061,7 +2070,12 @@ static void vmstat_shepherd(struct work_ cond_resched(); } cpus_read_unlock(); + return 0; +} =20 +static void vmstat_shepherd(struct work_struct *w) +{ + refresh_all_vm_stats(); schedule_delayed_work(&shepherd, round_jiffies_relative(sysctl_stat_interval)); } From nobody Fri Sep 12 13:01:10 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77F6AC77B7D for ; Mon, 15 May 2023 18:07:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S245021AbjEOSHf (ORCPT ); Mon, 15 May 2023 14:07:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245017AbjEOSGp (ORCPT ); Mon, 15 May 2023 14:06:45 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F42071EC0D for ; Mon, 15 May 2023 11:03:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684173829; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=RFL7dZ0ubDZf1tDjIuRkhAMaPrNe1DHD6pj/+UPZboc=; b=HGDm1vb4P7oeLIXHQ1mp1nGrWXOfIuI4cDDWr+V/SQgY3zPbuA1dnlRxYImQiU5AIheVs0 
FuzKDLERDpDSUM+7DB8okC+Vn7XcBYy9cFUdlO40QNXPmn7TE6IgyPWCYfJ2heOTQSHu5I zxYK67PyU68738Ij8auVpfF8T+z8/bY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-605-pdz-PnGCOay5sxJCDmG24w-1; Mon, 15 May 2023 14:03:43 -0400 X-MC-Unique: pdz-PnGCOay5sxJCDmG24w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3A68188CC44; Mon, 15 May 2023 18:03:43 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-4.gru2.redhat.com [10.97.112.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 033C935453; Mon, 15 May 2023 18:03:43 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 8FD874161E532; Mon, 15 May 2023 15:02:17 -0300 (-03) Message-ID: <20230515180138.742158693@redhat.com> User-Agent: quilt/0.67 Date: Mon, 15 May 2023 15:00:28 -0300 From: Marcelo Tosatti To: Christoph Lameter Cc: Aaron Tomlin , Frederic Weisbecker , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King , Huacai Chen , Heiko Carstens , x86@kernel.org, Vlastimil Babka , Michal Hocko , Marcelo Tosatti Subject: [PATCH v8 13/13] vmstat: add pcp remote node draining via cpu_vm_stats_fold References: <20230515180015.016409657@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" On large NUMA systems, a significant portion of system memory might be trapped in pcp queues. The number of pcps is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps, which is okay. 
But a system with 1024 processors and 512 nodes has 512k pcps, with a high potential for a large amount of memory being caught in them. Enable remote node draining for the CONFIG_HAVE_CMPXCHG_LOCAL case, where vmstat_shepherd will perform the aging and draining via cpu_vm_stats_fold. Suggested-by: Vlastimil Babka Signed-off-by: Marcelo Tosatti --- Index: linux-vmstat-remote/mm/vmstat.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/vmstat.c +++ linux-vmstat-remote/mm/vmstat.c @@ -928,7 +928,7 @@ static int refresh_cpu_vm_stats(bool do_ * There cannot be any access by the offline cpu and therefore * synchronization is simplified. */ -void cpu_vm_stats_fold(int cpu) +void cpu_vm_stats_fold(int cpu, bool do_pagesets) { struct pglist_data *pgdat; struct zone *zone; @@ -938,6 +938,9 @@ void cpu_vm_stats_fold(int cpu) =20 for_each_populated_zone(zone) { struct per_cpu_zonestat *pzstats; +#ifdef CONFIG_NUMA + struct per_cpu_pages *pcp =3D per_cpu_ptr(zone->per_cpu_pageset, cpu); +#endif =20 pzstats =3D per_cpu_ptr(zone->per_cpu_zonestats, cpu); =20 @@ -948,6 +951,11 @@ void cpu_vm_stats_fold(int cpu) v =3D xchg(&pzstats->vm_stat_diff[i], 0); atomic_long_add(v, &zone->vm_stat[i]); global_zone_diff[i] +=3D v; +#ifdef CONFIG_NUMA + /* 3 seconds idle till flush */ + if (do_pagesets) + pcp->expire =3D 3; +#endif } } #ifdef CONFIG_NUMA @@ -959,6 +967,38 @@ void cpu_vm_stats_fold(int cpu) zone_numa_event_add(v, zone, i); } } + + if (do_pagesets) { + cond_resched(); + /* + * Deal with draining the remote pageset of a + * processor + * + * Check if there are pages remaining in this pageset + * if not then there is nothing to expire. + */ + if (!pcp->expire || !pcp->count) + continue; + + /* + * We never drain zones local to this processor. 
+ */ + if (zone_to_nid(zone) =3D=3D cpu_to_node(cpu)) { + pcp->expire =3D 0; + continue; + } + + WARN_ON(pcp->expire < 0); + /* + * pcp->expire is only accessed from vmstat_shepherd context, + * therefore no locking is required. + */ + if (--pcp->expire) + continue; + + if (pcp->count) + drain_zone_pages(zone, pcp); + } #endif } =20 @@ -2066,7 +2106,7 @@ static int refresh_all_vm_stats(void) =20 cpus_read_lock(); for_each_online_cpu(cpu) { - cpu_vm_stats_fold(cpu); + cpu_vm_stats_fold(cpu, true); cond_resched(); } cpus_read_unlock(); Index: linux-vmstat-remote/include/linux/vmstat.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/include/linux/vmstat.h +++ linux-vmstat-remote/include/linux/vmstat.h @@ -297,7 +297,7 @@ extern void __dec_zone_state(struct zone extern void __dec_node_state(struct pglist_data *, enum node_stat_item); =20 void quiet_vmstat(void); -void cpu_vm_stats_fold(int cpu); +void cpu_vm_stats_fold(int cpu, bool do_pagesets); void refresh_zone_stat_thresholds(void); =20 struct ctl_table; Index: linux-vmstat-remote/mm/page_alloc.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- linux-vmstat-remote.orig/mm/page_alloc.c +++ linux-vmstat-remote/mm/page_alloc.c @@ -6250,7 +6250,7 @@ static int page_alloc_cpu_dead(unsigned * Zero the differential counters of the dead processor * so that the vm statistics are consistent. */ - cpu_vm_stats_fold(cpu); + cpu_vm_stats_fold(cpu, false); =20 for_each_populated_zone(zone) zone_pcp_update(zone, 0);
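For readers following the series, the remote-fold idea from patches 10 to 12 can be illustrated in userspace. What follows is only a minimal sketch under stated assumptions, not the kernel implementation: it uses C11 <stdatomic.h> in place of the kernel's xchg() and atomic_long_add(), and every name in it (cpu_diff, global_stat, mod_stat, fold_cpu, fold_all, read_stat, the *_SKETCH constants) is hypothetical. It shows why an atomic exchange lets a remote folder collect a per-CPU delta without losing a concurrent local update, which a separate read followed by a write of zero could do.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define NR_CPUS_SKETCH  4
#define NR_ITEMS_SKETCH 3

/* Per-CPU pending deltas and the global counters they are folded into. */
static _Atomic int cpu_diff[NR_CPUS_SKETCH][NR_ITEMS_SKETCH];
static atomic_long global_stat[NR_ITEMS_SKETCH];

/* Local update path: stands in for the cmpxchg-based accounting helpers. */
static void mod_stat(int cpu, int item, int delta)
{
    atomic_fetch_add(&cpu_diff[cpu][item], delta);
}

/*
 * Remote fold of one CPU's deltas. atomic_exchange() reads the delta and
 * resets it to zero in a single step, so an update that lands between
 * "read" and "clear" cannot be dropped.
 */
static void fold_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    for (int i = 0; i < NR_ITEMS_SKETCH; i++) {
        int v = atomic_exchange(&cpu_diff[cpu][i], 0);

        if (v)
            atomic_fetch_add(&global_stat[i], (long)v);
    }
}

/* Shepherd-like pass: fold every CPU from the caller's context. */
static void fold_all(void)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        fold_cpu(cpu);
}

/* Read the folded (global) value only, like a plain counter read. */
static long read_stat(int item)
{
    return atomic_load(&global_stat[item]);
}
```

In this sketch a reader that looks only at global_stat sees stale values until fold_all() runs, which mirrors the motivation of the series; the kernel versions additionally handle node counters, NUMA events and CPU hotplug locking, none of which is modeled here.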