From nobody Sun May 24 21:38:46 2026
From: Imran Khan <imran.f.khan@oracle.com>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
        vincent.guittot@linaro.org
Cc: dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
        mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: [RFC PATCH RESEND] sched/rt: skip RT bandwidth accounting for
 unobserved CPU stalls
Date: Thu, 21 May 2026 11:21:14 +0800
Message-Id: <20260521032114.2694984-1-imran.f.khan@oracle.com>
X-Mailer: git-send-email 2.34.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

After a CPU stall that the guest scheduler did not observe (for example,
KVM live migration where the stop_and_copy phase takes a long time), the
next update_curr_rt() charges a delta_exec equal to the entire stall to the
current RT task and also to rt_rq::rt_time. With the default
sched_rt_runtime_us=950000 and sched_rt_period_us=1000000, even a stall of a
few seconds can set rt_throttled, dequeue the current RT task and keep it
from running for multiple seconds.

For example, the following snippet shows one such instance, where pid 30274
was the current task on CPU 45 during live migration. After the migration it
was preempted and has been on the runqueue for the last ~10 seconds. The CPU
is idle, but the RT task cannot run on it because the rt_runtime overrun has
not been paid back yet:

crash> runq -c 45
CPU 45 RUNQUEUE: ff1c8cb63d972840
  CURRENT: PID: 0      TASK: ff1c8c77c6c7a080  COMMAND: "swapper/45"
  RT PRIO_ARRAY: ff1c8cb63d972ac0
     [  0] PID: 30274  TASK: ff1c8c7d9aad4100  COMMAND: "NMSending"
     [  0] PID: 30791  TASK: ff1c8c7c2098a080  COMMAND: "cssdagent"

>>> per_cpu(prog["runqueues"], 45).clock_task.value_()
10537385941842
>>> per_cpu(prog["runqueues"], 45).rt.rt_time.value_()
6571872703

>>> per_cpu(prog["runqueues"], 45).clock_task.value_() - \
    find_task(30274).se.exec_start.value_()
10537394410

This snippet is from a system running a v5.15.y kernel. I don't have a
vmcore from current upstream tip yet, but I could reproduce a similar time
jump on current tip as well.

This change resets delta_exec to zero when a guest pause is detected, or
when a single delta_exec exceeds one full RT period, and thus prevents huge
jumps in rt_rq::rt_time.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---
 kernel/sched/rt.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

I have kept this patch as RFC because I am not sure whether this should
instead be fixed on the KVM side.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f69e1f16d9238..e8d83080c3842 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -7,6 +7,8 @@
 #include "sched.h"
 #include "pelt.h"
 
+#include <linux/kvm_para.h>
+
 int sched_rr_timeslice = RR_TIMESLICE;
 /* More than 4 hours if BW_SHIFT equals 20. */
 static const u64 max_rt_runtime = MAX_BW;
@@ -989,6 +991,18 @@ static void update_curr_rt(struct rq *rq)
 	if (!rt_bandwidth_enabled())
 		return;
 
+	/*
+	 * Forgive RT bandwidth charged across an unobserved CPU stall,
+	 * such as the stop_and_copy phase of KVM live migration.
+	 *
+	 * The magnitude check avoids a race where the local softlockup
+	 * hrtimer consumed the PVCLOCK_GUEST_STOPPED bit before this
+	 * update_curr_rt() call.
+	 */
+	if (kvm_check_and_clear_guest_paused() ||
+	    unlikely(delta_exec > (u64)sysctl_sched_rt_period * NSEC_PER_USEC))
+		delta_exec = 0;
+
 	for_each_sched_rt_entity(rt_se) {
 		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 		int exceeded;

base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
-- 
2.34.1