From nobody Sun May 24 21:38:46 2026
From: Imran Khan
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org
Cc: dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: [RFC PATCH RESEND] sched/rt: skip RT bandwidth accounting for unobserved CPU stalls
Date: Thu, 21 May 2026 11:21:14 +0800
Message-Id: <20260521032114.2694984-1-imran.f.khan@oracle.com>
X-Mailer: git-send-email 2.34.1
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

After a CPU stall that the guest scheduler did not observe (for example,
a KVM live migration where stop_and_copy takes a long time), the next
update_curr_rt() charges a delta_exec equal to the entire stall to the
current RT task and also to rt_rq::rt_time.
With the default sched_rt_runtime_us=950000 and
sched_rt_period_us=1000000, even a few seconds of stall can set
rt_throttled, dequeue the current RT task and keep it off the runq for
multiple seconds.

For example, the following snippet shows one such instance where pid 30274
was the current task on CPU 45 during live migration. After live migration
it got preempted and has been on the runq for the last ~10 secs. The CPU is
idle but the RT task can't get on it because the rt_runtime overrun has not
been compensated yet:

    crash> runq -c 45
    CPU 45 RUNQUEUE: ff1c8cb63d972840
      CURRENT: PID: 0      TASK: ff1c8c77c6c7a080  COMMAND: "swapper/45"
      RT PRIO_ARRAY: ff1c8cb63d972ac0
         [  0] PID: 30274  TASK: ff1c8c7d9aad4100  COMMAND: "NMSending"
         [  0] PID: 30791  TASK: ff1c8c7c2098a080  COMMAND: "cssdagent"

    >>> per_cpu(prog["runqueues"], 45).clock_task.value_()
    10537385941842
    >>> per_cpu(prog["runqueues"], 45).rt.rt_time.value_()
    6571872703
    >>> per_cpu(prog["runqueues"], 45).clock_task.value_() - \
            find_task(30274).se.exec_start.value_()
    10537394410

This snippet is from a system using a v5.15.y kernel. As of now I don't
have a vmcore with the current upstream tip, but I could reproduce a
similar time jump on the current tip as well.

This change resets delta_exec to zero upon detecting a guest pause and
hence prevents exorbitant jumps in rt_rq::rt_time.

Signed-off-by: Imran Khan
---
 kernel/sched/rt.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

I have kept the patch as RFC because I am not sure if it should be fixed
on the KVM side.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f69e1f16d9238..e8d83080c3842 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -7,6 +7,8 @@
 #include "sched.h"
 #include "pelt.h"
 
+#include
+
 int sched_rr_timeslice = RR_TIMESLICE;
 /* More than 4 hours if BW_SHIFT equals 20. */
 static const u64 max_rt_runtime = MAX_BW;
@@ -989,6 +991,18 @@ static void update_curr_rt(struct rq *rq)
 	if (!rt_bandwidth_enabled())
 		return;
 
+	/*
+	 * Forgive RT bandwidth charged across an unobserved CPU stall
+	 * like KVM live-migration stop_and_copy.
+	 *
+	 * The magnitude check is to avoid race where the local softlockup
+	 * hrtimer consumed PVCLOCK_GUEST_STOPPED bit before this
+	 * update_curr_rt() call.
+	 */
+	if (kvm_check_and_clear_guest_paused() ||
+	    unlikely(delta_exec > (u64)sysctl_sched_rt_period * NSEC_PER_USEC))
+		delta_exec = 0;
+
 	for_each_sched_rt_entity(rt_se) {
 		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 		int exceeded;

base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
-- 
2.34.1
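
The arithmetic behind "a few seconds of stall can keep the task off the
runq for multiple seconds" can be sketched in user space. This is only a
simplified model, not kernel code: it assumes the RT period timer fires
once per period, pays back at most one runtime quantum of rt_time each
time, and lifts the throttle once rt_time drops below the runtime. It also
assumes no runtime is borrowed from other CPUs and no RT task runs in the
meantime. The function name throttled_periods() is hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model (an assumption, not the kernel implementation):
 * return how many RT periods elapse before an rt_rq carrying
 * rt_time_ns of charged time would be unthrottled, given a per-period
 * budget of runtime_ns.
 */
uint64_t throttled_periods(uint64_t rt_time_ns, uint64_t runtime_ns)
{
	uint64_t periods = 0;

	/* Not throttled in this model unless the charge exceeds the budget. */
	if (runtime_ns == 0 || rt_time_ns <= runtime_ns)
		return 0;

	do {
		/* Each period pays back at most one runtime quantum. */
		rt_time_ns -= (rt_time_ns < runtime_ns) ? rt_time_ns : runtime_ns;
		periods++;
	} while (rt_time_ns >= runtime_ns);

	return periods;
}
```

With the rt_time value from the vmcore output (6571872703 ns) and the
default runtime of 950000 us (950000000 ns), this model returns 6, that
is, about six more one-second periods before the rt_rq would run again.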