From nobody Sun Feb  8 02:45:03 2026
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 76967C7EE25
	for <linux-kernel@archiver.kernel.org>; Thu,  8 Jun 2023 01:12:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232918AbjFHBMf (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 7 Jun 2023 21:12:35 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41560 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230418AbjFHBMd (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 7 Jun 2023 21:12:33 -0400
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61E6E269F
        for <linux-kernel@vger.kernel.org>;
 Wed,  7 Jun 2023 18:11:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
        s=mimecast20190719; t=1686186705;
        h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
         to:to:cc:cc:mime-version:mime-version:content-type:content-type;
        bh=mdQ0lF7C7zgaRMoTdeqNXNG4mPlYrnhL6rmrdXZfaoI=;
        b=XZmp0ACewuoRonIncg5Uzv8nIIX0l73ii6rj0oYTpleAgsyAl+F0lpq8hZme1AYS9MpXo7
        rOfjs8R4t8IUMKCVdDjKa2SAeqDswcPMTrL/B7EFLDqwLR4pAooyzuLqbSIptLGpkym0Xh
        YVkuPuGdryIm35U3OPg4sG8ehmiK2Tc=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-124-y1ZFZL7DPKeBSvgVD9UArA-1; Wed, 07 Jun 2023 21:11:42 -0400
X-MC-Unique: y1ZFZL7DPKeBSvgVD9UArA-1
Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com
 [10.11.54.10])
        (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
        (No client certificate requested)
        by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D75D5185A78E;
        Thu,  8 Jun 2023 01:11:41 +0000 (UTC)
Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2])
        by smtp.corp.redhat.com (Postfix) with ESMTPS id A20F6403367;
        Thu,  8 Jun 2023 01:11:41 +0000 (UTC)
Received: by tpad.localdomain (Postfix, from userid 1000)
        id C29CC40E16DC2; Wed,  7 Jun 2023 17:28:07 -0300 (-03)
Date: Wed, 7 Jun 2023 17:28:07 -0300
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org,
        Vlastimil Babka <vbabka@suse.cz>,
        Michal Hocko <mhocko@suse.com>
Subject: [PATCH] vmstat: skip periodic vmstat update for isolated CPUs
Message-ID: <ZIDoV/zxFKVmQl7W@tpad>
MIME-Version: 1.0
Content-Disposition: inline
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"


Problem: The interruption caused by vmstat_update is undesirable
for certain applications.

The affected workloads run on isolated CPUs in nohz_full mode, which is
meant to shield them from any kernel interruption. One example is a VM
running a time-sensitive application with a 50us maximum acceptable
interruption (use case: soft PLC).

oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...

The example above shows an additional 7us for the
        oslat -> kworker -> oslat

switches. When the interrupted task is a qemu-kvm vcpu and the vmstat_update
interruption happens in the host, the latency penalty observed in the guest
is higher than 50us, violating the acceptable latency threshold.

The isolated vCPU can perform operations that modify per-CPU page counters,
for example to complete I/O operations:

      CPU 11/KVM-9540    [001] dNh1.  2314.248584: mod_zone_page_state <-__folio_end_writeback
      CPU 11/KVM-9540    [001] dNh1.  2314.248585: <stack trace>
 => 0xffffffffc042b083
 => mod_zone_page_state
 => __folio_end_writeback
 => folio_end_writeback
 => iomap_finish_ioend
 => blk_mq_end_request_batch
 => nvme_irq
 => __handle_irq_event_percpu
 => handle_irq_event
 => handle_edge_irq
 => __common_interrupt
 => common_interrupt
 => asm_common_interrupt
 => vmx_do_interrupt_nmi_irqoff
 => vmx_handle_exit_irqoff
 => vcpu_enter_guest
 => vcpu_run
 => kvm_arch_vcpu_ioctl_run
 => kvm_vcpu_ioctl
 => __x64_sys_ioctl
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

In-kernel users of vmstat counters either require the precise value, in
which case they use the zone_page_state_snapshot interface, or they can live
with imprecision, since the regular flushing can happen at an arbitrary time
and the cumulative error can grow (see calculate_normal_threshold).

From that POV the regular flushing can be postponed for CPUs that have
been isolated from the kernel interference without critical
infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
for all isolated CPUs to avoid interference with the isolated workload.

Suggested by Michal Hocko.

Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v3: improve changelog		(Michal Hocko)
v2: use cpu_is_isolated		(Michal Hocko)
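
For illustration only (not part of the patch): a minimal userspace sketch
of the behaviour described in the changelog. It assumes a single counter,
ignores the per-CPU thresholds, and folds deltas directly instead of
queueing work on the target CPU. NR_CPUS, global_count, cpu_diff, isolated,
mod_state, read_state, read_state_snapshot and shepherd_pass are all
hypothetical names for this sketch, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4			/* hypothetical CPU count for the sketch */

/* Userspace stand-ins; none of these are the kernel's data structures. */
static long global_count;		/* folded, globally visible counter */
static long cpu_diff[NR_CPUS];		/* per-CPU deltas not yet folded */
static bool isolated[NR_CPUS];		/* stand-in for cpu_is_isolated() */

/* Per-CPU update: only the local delta changes (thresholds ignored here). */
void mod_state(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_diff[cpu] += delta;
}

/* Cheap read: ignores unfolded per-CPU deltas, so it may lag behind. */
long read_state(void)
{
	return global_count;
}

/* Precise read, in the spirit of zone_page_state_snapshot. */
long read_state_snapshot(void)
{
	long sum = global_count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += cpu_diff[cpu];
	return sum;
}

/*
 * One shepherd pass: fold pending deltas into the global counter,
 * skipping isolated CPUs. Returns the number of CPUs flushed.
 */
int shepherd_pass(void)
{
	int flushed = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (isolated[cpu])
			continue;
		if (cpu_diff[cpu] != 0) {	/* stand-in for need_update() */
			global_count += cpu_diff[cpu];
			cpu_diff[cpu] = 0;
			flushed++;
		}
	}
	return flushed;
}
```

In this sketch, a delta left on an isolated CPU stays invisible to
read_state() after a shepherd pass, while read_state_snapshot() still
accounts for it. That is the trade-off the changelog relies on.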

Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -2022,6 +2023,20 @@ static void vmstat_shepherd(struct work_
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
+		/*
+		 * In kernel users of vmstat counters either require the precise value and
+		 * they are using zone_page_state_snapshot interface or they can live with
+		 * an imprecision as the regular flushing can happen at arbitrary time and
+		 * cumulative error can grow (see calculate_normal_threshold).
+		 *
+		 * From that POV the regular flushing can be postponed for CPUs that have
+		 * been isolated from the kernel interference without critical
+		 * infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
+		 * for all isolated CPUs to avoid interference with the isolated workload.
+		 */
+		if (cpu_is_isolated(cpu))
+			continue;
+
 		if (!delayed_work_pending(dw) && need_update(cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);