From: Juri Lelli
Date: Fri, 06 Feb 2026 14:25:52 +0100
Subject: [PATCH] sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
Message-Id: <20260206-upstream-fix-deadline-piboost-b4-v1-1-14043567b89c@redhat.com>
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: Philip Auld, Gabriele Monaco, linux-kernel@vger.kernel.org, Bruno Goncalves, Juri Lelli

Running stress-ng --schedpolicy 0 on an RT kernel on a big machine might
lead to the following WARNINGs (edited).

  sched: DL de-boosted task PID 22725: REPLENISH flag missing

  WARNING: CPU: 93 PID: 0 at kernel/sched/deadline.c:239 dequeue_task_dl+0x15c/0x1f8
  ...
  (running_bw underflow)
  Call trace:
   dequeue_task_dl+0x15c/0x1f8 (P)
   dequeue_task+0x80/0x168
   deactivate_task+0x24/0x50
   push_dl_task+0x264/0x2e0
   dl_task_timer+0x1b0/0x228
   __hrtimer_run_queues+0x188/0x378
   hrtimer_interrupt+0xfc/0x260
   arch_timer_handler_phys+0x34/0x60
   handle_percpu_devid_irq+0xa4/0x230
   generic_handle_domain_irq+0x34/0x60
   __gic_handle_irq_from_irqson.isra.0+0x158/0x298
   gic_handle_irq+0x28/0x80
   call_on_irq_stack+0x30/0x48
   do_interrupt_handler+0xdc/0xe8
   el1_interrupt+0x44/0xc0
   el1h_64_irq_handler+0x18/0x28
   el1h_64_irq+0x80/0x88
   cpuidle_enter_state+0xc4/0x520 (P)
   cpuidle_enter+0x40/0x60
   cpuidle_idle_call+0x13c/0x220
   do_idle+0xa4/0x120
   cpu_startup_entry+0x40/0x50
   secondary_start_kernel+0xe4/0x128
   __secondary_switched+0xc0/0xc8

The problem is that when a SCHED_DEADLINE task (lock holder) is changed
to a lower priority class via sched_setscheduler(), it may fail to
properly inherit the parameters of potential DEADLINE donors if it
didn't already inherit them in the past (shorter deadline than donor's
at that time). This might lead to bandwidth accounting corruption, as
enqueue_task_dl() won't recognize the lock holder as boosted.

The scenario occurs when:

1. A DEADLINE task (donor) blocks on a PI mutex held by another
   DEADLINE task (holder), but the holder doesn't inherit parameters
   (e.g., it already has a shorter deadline)
2. sched_setscheduler() changes the holder from DEADLINE to a lower
   class while still holding the mutex
3. The holder should now inherit DEADLINE parameters from the donor and
   be enqueued with ENQUEUE_REPLENISH, but this doesn't happen

Fix the issue by introducing __setscheduler_dl(), which detects when a
task's normal priority class differs from its PI-boosted class. When a
(now!) non-DEADLINE task (normal_prio) is being boosted by a DEADLINE
pi_task (effective prio), it inherits the DEADLINE parameters (pi_se)
and sets the ENQUEUE_REPLENISH flag to ensure proper bandwidth
accounting during the next enqueue operation.

Reported-by: Bruno Goncalves
Signed-off-by: Juri Lelli
---
Hello,

The underlying big(ger) issue is that PI is broken for DEADLINE. We know
this, proxy exec is progressing well and will hopefully soon replace all
this. In the meantime, here comes another piece of duct tape trying to
fix the issue described in the changelog.

The issue was discovered by Bruno Goncalves while running stress-ng
--schedpolicy 0 on RT kernels on large systems (I believe lots of CPUs
and PI-enabled in-kernel mutexes make it easier to trigger). Later on, a
simpler and more focused reproducer was created (with Claude Code help)
and is available at

https://github.com/jlelli/sched-deadline-tests/blob/master/test_dl_replenish_bug.c

Fix also available from

git@github.com:jlelli/linux.git upstream/fix-deadline-piboost
---
 kernel/sched/syscalls.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 6f10db3646e7f..369e47b4ea863 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -7,6 +7,7 @@
  *  Copyright (C) 1991-2002  Linus Torvalds
  *  Copyright (C) 1998-2024  Ingo Molnar, Red Hat
  */
+#include "linux/sched/rt.h"
 #include
 #include
 #include
@@ -284,6 +285,33 @@ static bool check_same_owner(struct task_struct *p)
 		uid_eq(cred->euid, pcred->uid));
 }
 
+#ifdef CONFIG_RT_MUTEXES
+static void __setscheduler_dl(struct task_struct *p,
+			      struct sched_change_ctx *scope)
+{
+	struct task_struct *pi_task = rt_mutex_get_top_task(p);
+
+	/*
+	 * In case a former DEADLINE task (either proper or boosted) gets
+	 * setscheduled to a lower priority class, check if it needs to
+	 * inherit parameters from a potential pi_task. In that case make
+	 * sure replenishment happens with the next enqueue.
+	 */
+	if (!dl_prio(p->normal_prio) &&
+	    (pi_task && dl_prio(pi_task->prio))) {
+		p->dl.pi_se = pi_task->dl.pi_se;
+
+		if (scope && scope->queued)
+			scope->flags |= ENQUEUE_REPLENISH;
+	}
+}
+#else /* !CONFIG_RT_MUTEXES */
+static void __setscheduler_dl(struct task_struct *p,
+			      struct sched_change_ctx *scope)
+{
+}
+#endif /* !CONFIG_RT_MUTEXES */
+
 #ifdef CONFIG_UCLAMP_TASK
 
 static int uclamp_validate(struct task_struct *p,
@@ -657,6 +685,7 @@ int __sched_setscheduler(struct task_struct *p,
 		p->prio = newprio;
 	}
 	__setscheduler_uclamp(p, attr);
+	__setscheduler_dl(p, scope);
 
 	if (scope->queued) {
 		/*

---
base-commit: e34881c84c255bc300f24d9fe685324be20da3d1
change-id: 20260205-upstream-fix-deadline-piboost-b4-2d924be17182

Best regards,
-- 
Juri Lelli