From nobody Sun Dec 14 06:34:19 2025
From: Aaron Tomlin
To: frederic@kernel.org, mtosatti@redhat.com
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, atomlin@atomlin.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v8 1/5] mm/vmstat: Add CPU-specific variable to track a vmstat discrepancy
Date: Sat, 24 Sep 2022 16:22:23 +0100
Message-Id: <20220924152227.819815-2-atomlin@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220924152227.819815-1-atomlin@redhat.com>
References: <20220924152227.819815-1-atomlin@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Introduce a CPU-specific variable, namely vmstat_dirty, to indicate
whether a vmstat imbalance is present for a given CPU. All the
remaining differentials can then be folded at the appropriate time.

This patch also provides trivial helpers to modify and test the
variable.

Signed-off-by: Aaron Tomlin
---
 mm/vmstat.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 90af9a8572f5..24c67b2e58fd 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -195,6 +195,22 @@ void fold_vm_numa_events(void)
 #endif
 
 #ifdef CONFIG_SMP
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void vmstat_mark_dirty(void)
+{
+	this_cpu_write(vmstat_dirty, true);
+}
+
+static inline void vmstat_clear_dirty(void)
+{
+	this_cpu_write(vmstat_dirty, false);
+}
+
+static inline bool is_vmstat_dirty(void)
+{
+	return this_cpu_read(vmstat_dirty);
+}
 
 int calculate_pressure_threshold(struct zone *zone)
 {
-- 
2.37.1

From nobody Sun Dec 14 06:34:19 2025
From: Aaron Tomlin
To: frederic@kernel.org, mtosatti@redhat.com
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, atomlin@atomlin.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v8 2/5] mm/vmstat: Use vmstat_dirty to track CPU-specific vmstat discrepancies
Date: Sat, 24 Sep 2022 16:22:24 +0100
Message-Id: <20220924152227.819815-3-atomlin@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220924152227.819815-1-atomlin@redhat.com>
References: <20220924152227.819815-1-atomlin@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Use the previously introduced CPU-specific variable, namely
vmstat_dirty, to indicate whether a vmstat differential (or imbalance)
is present for a given CPU, so that vmstat processing can be initiated
at the appropriate time. The hope is that this approach is "cheaper"
than need_update(), which scans every per-CPU differential.

The idea is based on Marcelo's patch [1].

[1]: https://lore.kernel.org/lkml/20220204173554.763888172@fedora.localdomain/

Signed-off-by: Aaron Tomlin
---
 mm/vmstat.c | 48 ++++++++++++++----------------------------------
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 24c67b2e58fd..472175642bd9 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -383,6 +383,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	vmstat_mark_dirty();
 
 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
 		preempt_enable();
@@ -421,6 +422,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	vmstat_mark_dirty();
 
 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
 		preempt_enable();
@@ -619,6 +621,7 @@ static inline void mod_zone_state(struct zone *zone,
 
 	if (z)
 		zone_page_state_add(z, zone, item);
+	vmstat_mark_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -687,6 +690,7 @@ static inline void mod_node_state(struct pglist_data *pgdat,
 
 	if (z)
 		node_page_state_add(z, pgdat, item);
+	vmstat_mark_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -841,6 +845,14 @@ static int refresh_cpu_vm_stats(bool do_pagesets)
 	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
 	int changes = 0;
 
+	/*
+	 * Clear vmstat_dirty before clearing the percpu vmstats.
+	 * If interrupts are enabled, it is possible that an interrupt
+	 * or another task modifies a percpu vmstat, which will
+	 * set vmstat_dirty to true.
+	 */
+	vmstat_clear_dirty();
+
 	for_each_populated_zone(zone) {
 		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
 #ifdef CONFIG_NUMA
@@ -1971,35 +1983,6 @@ static void vmstat_update(struct work_struct *w)
 	}
 }
 
-/*
- * Check if the diffs for a certain cpu indicate that
- * an update is needed.
- */
-static bool need_update(int cpu)
-{
-	pg_data_t *last_pgdat = NULL;
-	struct zone *zone;
-
-	for_each_populated_zone(zone) {
-		struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
-		struct per_cpu_nodestat *n;
-
-		/*
-		 * The fast way of checking if there are any vmstat diffs.
-		 */
-		if (memchr_inv(pzstats->vm_stat_diff, 0, sizeof(pzstats->vm_stat_diff)))
-			return true;
-
-		if (last_pgdat == zone->zone_pgdat)
-			continue;
-		last_pgdat = zone->zone_pgdat;
-		n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
-		if (memchr_inv(n->vm_node_stat_diff, 0, sizeof(n->vm_node_stat_diff)))
-			return true;
-	}
-	return false;
-}
-
 /*
  * Switch off vmstat processing and then fold all the remaining differentials
  * until the diffs stay at zero. The function is used by NOHZ and can only be
@@ -2010,10 +1993,7 @@ void quiet_vmstat(void)
 	if (system_state != SYSTEM_RUNNING)
 		return;
 
-	if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
-		return;
-
-	if (!need_update(smp_processor_id()))
+	if (!is_vmstat_dirty())
 		return;
 
 	/*
@@ -2044,7 +2024,7 @@ static void vmstat_shepherd(struct work_struct *w)
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
-		if (!delayed_work_pending(dw) && need_update(cpu))
+		if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
 
 		cond_resched();
-- 
2.37.1

From nobody Sun Dec 14 06:34:19 2025
From: Aaron Tomlin
To: frederic@kernel.org, mtosatti@redhat.com
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, atomlin@atomlin.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v8 3/5] mm/vmstat: Do not queue vmstat_update if tick is stopped
Date: Sat, 24 Sep 2022 16:22:25 +0100
Message-Id: <20220924152227.819815-4-atomlin@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220924152227.819815-1-atomlin@redhat.com>
References: <20220924152227.819815-1-atomlin@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Marcelo Tosatti

From the vmstat shepherd, for CPUs that have the tick stopped,
do not queue local work to flush the per-CPU vmstats, since
in that case the flush is performed on return to
userspace or when entering idle.

Also cancel any delayed work on the local CPU, when entering
idle on nohz full CPUs.

Per-CPU pages can be freed remotely from housekeeping CPUs.

Signed-off-by: Marcelo Tosatti
---
 mm/vmstat.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 472175642bd9..3b9a497965b4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -1990,19 +1991,23 @@ static void vmstat_update(struct work_struct *w)
  */
 void quiet_vmstat(void)
 {
+	struct delayed_work *dw;
+
 	if (system_state != SYSTEM_RUNNING)
 		return;
 
 	if (!is_vmstat_dirty())
 		return;
 
+	refresh_cpu_vm_stats(false);
+
 	/*
-	 * Just refresh counters and do not care about the pending delayed
-	 * vmstat_update. It doesn't fire that often to matter and canceling
-	 * it would be too expensive from this path.
-	 * vmstat_shepherd will take care about that for us.
+	 * If the tick is stopped, cancel any delayed work to avoid
+	 * interruptions to this CPU in the future.
 	 */
-	refresh_cpu_vm_stats(false);
+	dw = &per_cpu(vmstat_work, smp_processor_id());
+	if (delayed_work_pending(dw) && tick_nohz_tick_stopped())
+		cancel_delayed_work(dw);
 }
 
 /*
@@ -2024,6 +2029,9 @@ static void vmstat_shepherd(struct work_struct *w)
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
+		if (tick_nohz_tick_stopped_cpu(cpu))
+			continue;
+
 		if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
 
-- 
2.37.1

From nobody Sun Dec 14 06:34:19 2025
From: Aaron Tomlin
To: frederic@kernel.org, mtosatti@redhat.com
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, atomlin@atomlin.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v8 4/5] tick/nohz_full: Ensure quiet_vmstat() is called on exit to user-mode when the idle tick is stopped
Date: Sat, 24 Sep 2022 16:22:26 +0100
Message-Id: <20220924152227.819815-5-atomlin@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220924152227.819815-1-atomlin@redhat.com>
References: <20220924152227.819815-1-atomlin@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Ensure that no CPU-specific vmstat differentials remain before exiting
to user-mode when the scheduling-clock tick is stopped, in the context
of nohz_full only.

A trivial test program was used to compare the proposed changes against
a vanilla kernel. The mlock(2) and munlock(2) system calls were used
solely to modify the vmstat item 'NR_MLOCK'.
The following is an average count of CPU-cycles across the
aforementioned system calls:

                          Vanilla      Modified

  Cycles per syscall         8461          8690    (+2.6%)

Signed-off-by: Aaron Tomlin
---
 include/linux/tick.h     |  5 +++--
 kernel/time/tick-sched.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfd571f18cfd..a2bbd6d32e33 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -11,7 +11,6 @@
 #include
 #include
 #include
-#include
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -272,6 +271,7 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,
 
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void __tick_nohz_task_switch(void);
+void __tick_nohz_user_enter_prepare(void);
 extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
 #else
 static inline bool tick_nohz_full_enabled(void) { return false; }
@@ -296,6 +296,7 @@ static inline void tick_dep_clear_signal(struct signal_struct *signal,
 
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void __tick_nohz_task_switch(void) { }
+static inline void __tick_nohz_user_enter_prepare(void) { }
 static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
 #endif
 
@@ -308,7 +309,7 @@ static inline void tick_nohz_task_switch(void)
 static inline void tick_nohz_user_enter_prepare(void)
 {
 	if (tick_nohz_full_cpu(smp_processor_id()))
-		rcu_nocb_flush_deferred_wakeup();
+		__tick_nohz_user_enter_prepare();
 }
 
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c9205946..634cd0fac267 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -519,6 +520,20 @@ void __tick_nohz_task_switch(void)
 	}
 }
 
+void __tick_nohz_user_enter_prepare(void)
+{
+	struct tick_sched *ts;
+
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+		ts = this_cpu_ptr(&tick_cpu_sched);
+
+		if (ts->tick_stopped)
+			quiet_vmstat();
+		rcu_nocb_flush_deferred_wakeup();
+	}
+}
+EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
+
 /* Get the boot-time nohz CPU list from the kernel parameters. */
 void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 {
-- 
2.37.1

From nobody Sun Dec 14 06:34:19 2025
From: Aaron Tomlin
To: frederic@kernel.org, mtosatti@redhat.com
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, atomlin@atomlin.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH v8 5/5] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too
Date: Sat, 24 Sep 2022 16:24:41 +0100
Message-Id: <20220924152441.822460-1-atomlin@redhat.com>
X-Mailer: git-send-email 2.37.1
In-Reply-To: <20220924152227.819815-1-atomlin@redhat.com>
References: <20220924152227.819815-1-atomlin@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

In the context of the idle task and an adaptive-tick mode or a
nohz_full CPU, quiet_vmstat() can be called: before stopping the idle
tick, entering an idle state and on exit. In particular, for the latter
case, when the idle task is required to reschedule, the idle tick can
remain stopped and the timer expiration time endless, i.e. KTIME_MAX.
Now, indeed before a nohz_full CPU enters an idle state, CPU-specific
vmstat counters should be processed to ensure the respective values
have been reset and folded into the zone specific 'vm_stat[]'. That
being said, it can only occur when: the idle tick was previously
stopped, and reprogramming of the timer is not required.

A customer provided some evidence which indicates that the idle tick
was stopped, yet CPU-specific vmstat counters still remained populated.
Thus one can only assume quiet_vmstat() was not invoked on return to
the idle loop.

If I understand correctly, I suspect this divergence might erroneously
prevent a reclaim attempt by kswapd. If the number of zone specific
free pages is below the per-cpu drift value then
zone_page_state_snapshot() is used to compute a more accurate view of
the aforementioned statistic. Thus any task blocked on the NUMA node
specific pfmemalloc_wait queue will be unable to make significant
progress via direct reclaim unless it is killed after being woken up by
kswapd (see throttle_direct_reclaim()).

Consider the following theoretical scenario:

  - Note: CPU X is part of 'tick_nohz_full_mask'

    1.  CPU Y migrated running task A to CPU X that was in an idle
        state, i.e. waiting for an IRQ; marked the current task on
        CPU X as needing a reschedule, i.e. set TIF_NEED_RESCHED, and
        sent a reschedule IPI to CPU X (see sched_move_task())

    2.  CPU X acknowledged the reschedule IPI. Generic idle loop code
        noticed the TIF_NEED_RESCHED flag against the idle task, exited
        the loop and called the main scheduler function, i.e.
        __schedule().

        Since the idle tick was previously stopped, no scheduling-clock
        tick would occur. So, no deferred timers would be handled

    3.  After the transition to kernel execution, task A running on
        CPU X indirectly released a few pages (e.g. see
        __free_one_page()); CPU X's 'vm_stat_diff[NR_FREE_PAGES]' was
        updated and the zone specific 'vm_stat[]' update was deferred
        as per the CPU-specific stat threshold

    4.  Task A invoked exit(2) and the kernel removed the task from the
        run-queue; the idle task was selected to execute next since
        there were no other runnable tasks assigned to the given CPU
        (see pick_next_task() and pick_next_task_idle())

    5.  On return to the idle loop, since the idle tick was already
        stopped and can remain so (see (1) below), e.g. no pending soft
        IRQs, no attempt is made to zero and fold CPU X's vmstat
        counters since reprogramming of the scheduling-clock tick is
        not required (see (2))

          ...

          do_idle
          {

            __current_set_polling()
            tick_nohz_idle_enter()

            while (!need_resched()) {

              local_irq_disable()

              ...

              /* No polling or broadcast event */
              cpuidle_idle_call()
              {

                if (cpuidle_not_available(drv, dev)) {
                  tick_nohz_idle_stop_tick()

                    __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched))
                    {
                      int cpu = smp_processor_id()

                      if (ts->timer_expires_base)
                        expires = ts->timer_expires
                      else if (can_stop_idle_tick(cpu, ts))
            (1) ------->  expires = tick_nohz_next_event(ts, cpu)
                      else
                        return

                      ts->idle_calls++

                      if (expires > 0LL) {

                        tick_nohz_stop_tick(ts, cpu)
                        {

                          if (ts->tick_stopped && (expires == ts->next_tick)) {
            (2) ------->    if (tick == KTIME_MAX || ts->next_tick ==
                                hrtimer_get_expires(&ts->sched_timer))
                              return
                          }
                          ...
                        }

So, the idea of this patch is to ensure that refresh_cpu_vm_stats(false)
is also called, when appropriate, on return to the idle loop if the idle
tick was previously stopped.

A trivial test program was used to compare the proposed changes against
a vanilla kernel. The nanosleep(2) system call was used several times to
suspend execution for a period of time, to approximate the number of
CPU-cycles spent in the idle code path. The following is an average
count of CPU-cycles:

                          Vanilla      Modified

  Cycles per idle loop     151858        153258    (+1.0%)

Signed-off-by: Aaron Tomlin
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 634cd0fac267..88a3e9fc3824 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -926,13 +926,14 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
+	/* Attempt to fold when the idle tick is stopped or not */
+	quiet_vmstat();
 	ts->next_tick = tick;
 
 	/*
-- 
2.37.1
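
For readers who want to experiment with the vmstat_dirty idea from
patches 1/5 and 2/5 outside the kernel, the following is a minimal
userspace sketch, not kernel code. Every name in it (model_mod_state(),
model_refresh(), model_is_dirty(), model_need_update_scan(),
model_demo(), the MODEL_* sizes) is hypothetical and exists only for
illustration; plain arrays stand in for the per-CPU differentials and
the global counters, and the per-CPU stat threshold that the real code
applies is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  4	/* hypothetical CPU count for the model */
#define MODEL_NR_ITEMS 8	/* hypothetical number of vmstat items */

static long model_global[MODEL_NR_ITEMS];		/* stands in for vm_stat[] */
static int model_diff[MODEL_NR_CPUS][MODEL_NR_ITEMS];	/* stands in for vm_stat_diff[] */
static bool model_dirty[MODEL_NR_CPUS];			/* stands in for vmstat_dirty */

/* Writer side: record a delta locally and mark the CPU dirty. */
void model_mod_state(int cpu, int item, int delta)
{
	model_diff[cpu][item] += delta;
	model_dirty[cpu] = true;
}

/* Old-style check: scan every differential, as need_update() did. */
bool model_need_update_scan(int cpu)
{
	for (int i = 0; i < MODEL_NR_ITEMS; i++)
		if (model_diff[cpu][i])
			return true;
	return false;
}

/* New-style check: a single flag read, as is_vmstat_dirty() does. */
bool model_is_dirty(int cpu)
{
	return model_dirty[cpu];
}

/*
 * Fold the differentials of one CPU into the global counters.
 * The flag is cleared before folding, mirroring the ordering used in
 * refresh_cpu_vm_stats(), so a concurrent update would re-mark the CPU.
 * Returns the number of items that were folded.
 */
int model_refresh(int cpu)
{
	int changes = 0;

	model_dirty[cpu] = false;
	for (int i = 0; i < MODEL_NR_ITEMS; i++) {
		if (model_diff[cpu][i]) {
			model_global[i] += model_diff[cpu][i];
			model_diff[cpu][i] = 0;
			changes++;
		}
	}
	return changes;
}

/* Walk through one mark/fold cycle; returns the folded global value. */
long model_demo(void)
{
	model_mod_state(1, 3, +5);
	model_mod_state(1, 3, -2);
	assert(model_is_dirty(1) && model_need_update_scan(1));
	assert(!model_is_dirty(0));
	model_refresh(1);
	assert(!model_is_dirty(1) && !model_need_update_scan(1));
	return model_global[3];
}
```

One property the model makes visible: the flag is conservative. If a CPU
applies +1 and then -1 to the same item, model_is_dirty() reports true
while model_need_update_scan() reports false, so the flag-based check
can trigger a fold that turns out to have nothing to do. That is the
price of replacing the scan with a single read.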