From: Juri Lelli
Date: Mon, 02 Mar 2026 16:45:40 +0100
Subject: [PATCH v3] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
Message-Id: <20260302-upstream-fix-deadline-piboost-b4-v3-1-6ba32184a9e0@redhat.com>
X-Change-ID: 20260205-upstream-fix-deadline-piboost-b4-2d924be17182
X-Mailing-List: linux-kernel@vger.kernel.org
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: Philip Auld, Gabriele Monaco, linux-kernel@vger.kernel.org,
 Bruno Goncalves, Juri Lelli

Running stress-ng --schedpolicy 0 on an RT kernel on a big machine
might lead to the following WARNINGs (edited).

 sched: DL de-boosted task PID 22725: REPLENISH flag missing

 WARNING: CPU: 93 PID: 0 at kernel/sched/deadline.c:239 dequeue_task_dl+0x15c/0x1f8
 ... (running_bw underflow)
 Call trace:
  dequeue_task_dl+0x15c/0x1f8 (P)
  dequeue_task+0x80/0x168
  deactivate_task+0x24/0x50
  push_dl_task+0x264/0x2e0
  dl_task_timer+0x1b0/0x228
  __hrtimer_run_queues+0x188/0x378
  hrtimer_interrupt+0xfc/0x260
  ...

The problem is that when a SCHED_DEADLINE task (lock holder) is changed
to a lower priority class via sched_setscheduler(), it may fail to
properly inherit the parameters of potential DEADLINE donors if it
didn't already inherit them in the past (because its deadline was
shorter than the donor's at that time). This might lead to bandwidth
accounting corruption, as enqueue_task_dl() won't recognize the lock
holder as boosted.

The scenario occurs when:

1. A DEADLINE task (donor) blocks on a PI mutex held by another
   DEADLINE task (holder), but the holder doesn't inherit parameters
   (e.g., it already has a shorter deadline)

2. sched_setscheduler() changes the holder from DEADLINE to a lower
   class while still holding the mutex

3. The holder should now inherit DEADLINE parameters from the donor and
   be enqueued with ENQUEUE_REPLENISH, but this doesn't happen

Fix the issue by introducing __setscheduler_dl_pi(), which detects when
a DEADLINE (proper or boosted) task gets setscheduled to a lower
priority class. In that case, the function makes the task inherit the
DEADLINE parameters of the donor (pi_se) and sets the ENQUEUE_REPLENISH
flag to ensure proper bandwidth accounting during the next enqueue
operation.

Reported-by: Bruno Goncalves
Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
Signed-off-by: Juri Lelli
---
Hello,

v3 of the fix for the issue described in the changelog.

The issue was discovered by Bruno Goncalves while running stress-ng
--schedpolicy 0 on RT kernels on large systems (I believe lots of CPUs
and PI-enabled in-kernel mutexes make it easier to trigger).

Later on, a simpler and more focused reproducer was created (with
Claude Code help) and is available at

https://github.com/jlelli/sched-deadline-tests/blob/master/test_dl_replenish_bug.c

Fix also available from

git@github.com:jlelli/linux.git fix-deadline-piboost-v3
---
Changes in v3:
- Add Fixes tag and trim WARN splat (Peter).
- Link to v2: https://patch.msgid.link/20260302-upstream-fix-deadline-piboost-b4-v2-1-0c92b737f13c@redhat.com

Changes in v2:
- Rebased to tip/sched/core as of today
- Fix things inside !KEEP_PARAMS (Peter)
- Create a different helper function
- Link to v1: https://patch.msgid.link/20260206-upstream-fix-deadline-piboost-b4-v1-1-14043567b89c@redhat.com
---
 kernel/sched/syscalls.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index a288ac0a633d7..b215b0ead9a60 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -284,6 +284,35 @@ static bool check_same_owner(struct task_struct *p)
 		 uid_eq(cred->euid, pcred->uid));
 }
 
+#ifdef CONFIG_RT_MUTEXES
+static inline void __setscheduler_dl_pi(int newprio, int policy,
+					struct task_struct *p,
+					struct sched_change_ctx *scope)
+{
+	/*
+	 * In case a DEADLINE task (either proper or boosted) gets
+	 * setscheduled to a lower priority class, check if it needs to
+	 * inherit parameters from a potential pi_task. In that case make
+	 * sure replenishment happens with the next enqueue.
+	 */
+
+	if (dl_prio(newprio) && !dl_policy(policy)) {
+		struct task_struct *pi_task = rt_mutex_get_top_task(p);
+
+		if (pi_task) {
+			p->dl.pi_se = pi_task->dl.pi_se;
+			scope->flags |= ENQUEUE_REPLENISH;
+		}
+	}
+}
+#else /* !CONFIG_RT_MUTEXES */
+static inline void __setscheduler_dl_pi(int newprio, int policy,
+					struct task_struct *p,
+					struct sched_change_ctx *scope)
+{
+}
+#endif /* !CONFIG_RT_MUTEXES */
+
 #ifdef CONFIG_UCLAMP_TASK
 
 static int uclamp_validate(struct task_struct *p,
@@ -655,6 +684,7 @@ int __sched_setscheduler(struct task_struct *p,
 			__setscheduler_params(p, attr);
 			p->sched_class = next_class;
 			p->prio = newprio;
+			__setscheduler_dl_pi(newprio, policy, p, scope);
 		}
 		__setscheduler_uclamp(p, attr);
 

---
base-commit: 2e7af192697ef2a71c76fd57860b0fcd02754e14
change-id: 20260205-upstream-fix-deadline-piboost-b4-2d924be17182

Best regards,
-- 
Juri Lelli
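
For anyone who wants to poke at the condition checked by
__setscheduler_dl_pi() without booting a kernel, here is a minimal
user-space sketch of it. It is only a model, not kernel code: every
toy_* name, the TOY_* constants and their numeric values, and the
top_waiter field (standing in for what rt_mutex_get_top_task() would
return) are assumptions made for illustration. Likewise, the dl_prio()
and dl_policy() helpers are reduced to plain comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are arbitrary for this sketch. */
#define TOY_POLICY_NORMAL	0
#define TOY_POLICY_DEADLINE	6
#define TOY_DL_PRIO		(-1)	/* effective prio while DEADLINE or boosted */
#define TOY_NORMAL_PRIO		120
#define TOY_ENQUEUE_REPLENISH	0x1u

struct toy_dl_entity {
	/* Entity whose parameters are used; points to itself when not boosted. */
	struct toy_dl_entity *pi_se;
};

struct toy_task {
	struct toy_dl_entity dl;
	/* Models the top PI waiter (donor), or NULL if nobody is blocked. */
	struct toy_task *top_waiter;
};

void toy_task_init(struct toy_task *t)
{
	assert(t != NULL);
	t->dl.pi_se = &t->dl;
	t->top_waiter = NULL;
}

bool toy_dl_prio(int prio)
{
	return prio < 0;
}

bool toy_dl_policy(int policy)
{
	return policy == TOY_POLICY_DEADLINE;
}

/* Same shape as the check in the patch, on the toy types above. */
void toy_setscheduler_dl_pi(int newprio, int policy,
			    struct toy_task *p, unsigned int *flags)
{
	/* Still at DEADLINE priority although the new policy is not DEADLINE. */
	if (toy_dl_prio(newprio) && !toy_dl_policy(policy)) {
		struct toy_task *pi_task = p->top_waiter;

		if (pi_task) {
			p->dl.pi_se = pi_task->dl.pi_se;
			*flags |= TOY_ENQUEUE_REPLENISH;
		}
	}
}

/* Returns the enqueue flags the model would use for this policy change. */
unsigned int toy_change_policy(struct toy_task *p, int newprio, int policy)
{
	unsigned int flags = 0;

	toy_setscheduler_dl_pi(newprio, policy, p, &flags);
	return flags;
}
```

To model the scenario from the changelog, initialise a donor and a
holder with toy_task_init(), point holder.top_waiter at the donor, and
call toy_change_policy(&holder, TOY_DL_PRIO, TOY_POLICY_NORMAL). The
model then returns TOY_ENQUEUE_REPLENISH and leaves holder.dl.pi_se
pointing at the donor's entity. With no waiter, or when the new policy
is still DEADLINE, the flags stay clear and pi_se is untouched.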