From: Imran Khan
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
    rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
    vschneid@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH] sched/rt: skip RT bandwidth accounting for unobserved CPU stalls
Date: Thu, 7 May 2026 21:00:27 +0800
Message-Id: <20260507130027.3281306-1-imran.f.khan@oracle.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

After a CPU stall that the guest scheduler did not observe (for example,
KVM live migration where stop_and_copy takes long), the next
update_curr_rt() charges a delta_exec equal to the entire stall to the
current RT task and also to rt_rq::rt_time. With the default
sched_rt_runtime_us=950000 and sched_rt_period_us=1000000, even a few
seconds of stall can set rt_throttled, dequeue the current RT task and
keep it off the runq for multiple seconds.

For example, the following snippet shows one such instance where pid 30274
was the current task on CPU 45 during live migration. After live
migration it got preempted and has been on the runq for the last ~10 secs.
The CPU is idle, but the RT task can't get on it because the rt_runtime
overrun has not been compensated yet:

    crash> runq -c 45
    CPU 45 RUNQUEUE: ff1c8cb63d972840
      CURRENT: PID: 0      TASK: ff1c8c77c6c7a080  COMMAND: "swapper/45"
      RT PRIO_ARRAY: ff1c8cb63d972ac0
         [  0] PID: 30274  TASK: ff1c8c7d9aad4100  COMMAND: "NMSending"
         [  0] PID: 30791  TASK: ff1c8c7c2098a080  COMMAND: "cssdagent"

    >>> per_cpu(prog["runqueues"], 45).clock_task.value_()
    10537385941842
    >>> per_cpu(prog["runqueues"], 45).rt.rt_time.value_()
    6571872703
    >>> per_cpu(prog["runqueues"], 45).clock_task.value_() - \
        find_task(30274).se.exec_start.value_()
    10537394410

This snippet is from a system running a v5.15.y kernel. As of now I don't
have a vmcore from the current upstream tip, but I could reproduce a
similar time jump on the current tip as well.

This change resets delta_exec to zero upon detecting a guest pause and
hence prevents exorbitant jumps in rt_rq::rt_time.

Signed-off-by: Imran Khan
---
 kernel/sched/rt.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

I have kept the patch RFC because I am not sure if it should be fixed on
the KVM side.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f69e1f16d9238..e8d83080c3842 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -7,6 +7,8 @@
 #include "sched.h"
 #include "pelt.h"
 
+#include
+
 int sched_rr_timeslice = RR_TIMESLICE;
 /* More than 4 hours if BW_SHIFT equals 20. */
 static const u64 max_rt_runtime = MAX_BW;
@@ -989,6 +991,18 @@ static void update_curr_rt(struct rq *rq)
 	if (!rt_bandwidth_enabled())
 		return;
 
+	/*
+	 * Forgive RT bandwidth charged across an unobserved CPU stall
+	 * like KVM live-migration stop_and_copy.
+	 *
+	 * The magnitude check is to avoid race where the local softlockup
+	 * hrtimer consumed PVCLOCK_GUEST_STOPPED bit before this
+	 * update_curr_rt() call.
+	 */
+	if (kvm_check_and_clear_guest_paused() ||
+	    unlikely(delta_exec > (u64)sysctl_sched_rt_period * NSEC_PER_USEC))
+		delta_exec = 0;
+
 	for_each_sched_rt_entity(rt_se) {
 		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 		int exceeded;

base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
-- 
2.34.1
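
For readers who want to check the arithmetic behind the numbers in the
changelog, here is a minimal userspace sketch. It is not kernel code. The
helper names charged_delta() and periods_until_unthrottled() are
hypothetical, the guest-pause signal is passed in as a plain flag, and the
replenishment model is a simplification which assumes that each period
timer expiry subtracts at most one rt_runtime from rt_time and that the
throttle lifts once rt_time drops below rt_runtime. Runtime borrowing
between CPUs and RT group scheduling are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Defaults quoted in the changelog, converted to nanoseconds. */
static const uint64_t rt_period_ns  = 1000000ULL * NSEC_PER_USEC; /* sched_rt_period_us  */
static const uint64_t rt_runtime_ns =  950000ULL * NSEC_PER_USEC; /* sched_rt_runtime_us */

/*
 * Hypothetical stand-in for the check added to update_curr_rt():
 * forgive the whole delta if a guest pause was reported, or if the
 * delta is larger than one RT period (the magnitude fallback).
 */
static uint64_t charged_delta(uint64_t delta_exec_ns, bool guest_paused)
{
	if (guest_paused || delta_exec_ns > rt_period_ns)
		return 0;
	return delta_exec_ns;
}

/*
 * Simplified replenishment model (an assumption, see above): count how
 * many period expiries are needed before rt_time falls below rt_runtime.
 */
static unsigned int periods_until_unthrottled(uint64_t rt_time_ns)
{
	unsigned int periods = 0;

	assert(rt_runtime_ns > 0);
	while (rt_time_ns >= rt_runtime_ns) {
		rt_time_ns -= rt_runtime_ns;
		periods++;
	}
	return periods;
}
```

Under these assumptions, the rt_time value from the vmcore (6571872703 ns)
still needs six more period expiries before the RT tasks can be enqueued
again, while a delta of the size seen there would be charged as zero by
the proposed check.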