Date: Wed, 22 Apr 2026 13:21:27 +0000
From: Kuba Piecuch
To: Tejun Heo, Andrea Righi, Changwoo Min, David Vernet
Subject: SCX_ENQ_IMMED potentially leaving dispatched tasks lingering on local DSQs
X-Mailing-List: linux-kernel@vger.kernel.org

Hi folks,

I recently saw that scx_qmap got rid of the sched_switch tracepoint hook,
claiming that SCX_OPS_ALWAYS_ENQ_IMMED is sufficient to keep tasks from
lingering on local DSQs.

This prompted me to think about some possible edge cases, and I think we can
end up with lingering tasks on the local DSQ in the following scenario:

Initial conditions:
rq->curr == rq->idle && rq->next_class == &idle_sched_class

1. We enter schedule() for whatever reason, e.g. BPF scheduler kick from
   another CPU.

2. In __pick_next_task(), all sched classes above SCX fail to pick a task.
   We still have rq->next_class == &idle_sched_class.

3. We enter do_pick_task_scx(). rq_modified_begin() does nothing because
   sched_class_above(rq->next_class, &ext_sched_class) is false.

4. ops.dispatch() dispatches two tasks. The first one goes to the local DSQ,
   and the second one goes to a remote CPU's local DSQ.
   The first task is dispatched without interference.

5. During dispatch of the second task, while the local CPU's rq lock is
   dropped during insertion into the remote CPU's local DSQ, an RT task
   wakes up on the local CPU. Since rq->next_class is still idle,
   wakeup_preempt() calls wakeup_preempt_idle() which calls resched_curr(rq).
   This effectively does nothing since need_resched is cleared in
   __schedule() after pick. rq->next_class is set to &rt_sched_class.

6. At the end of balance_one(), we don't trigger a reenqueue because the
   local DSQ has only one task.

7. do_pick_task_scx() notices rq_modified_above(rq, &ext_sched_class) and
   returns RETRY_TASK.

8. The RT task ends up being picked and runs. SCX is not notified of the
   switch because we're switching from the idle task to an RT task.

If my understanding is correct and I didn't miss anything important, then
at no point does SCX reenqueue the first task, even though it should.

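The steps above can be replayed in a small user-space toy model. This is
only a sketch under stated assumptions, not kernel code: every toy_* name is
hypothetical, the end-of-balance check is modelled purely from the wording of
step 6 (reenqueue only when more than one task is queued), and the with_fix
flag stands in for an nr_immed check placed right before RETRY_TASK is
returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sched classes, ordered by priority (larger value = higher class). */
enum toy_class { TOY_IDLE = 0, TOY_EXT = 1, TOY_RT = 2 };

/* Hypothetical stand-in for the few rq fields the scenario touches. */
struct toy_rq {
	enum toy_class next_class;	/* models rq->next_class */
	int local_dsq_len;		/* tasks sitting on the local DSQ */
	int nr_immed;			/* models rq->scx.nr_immed */
	bool reenq_scheduled;		/* a local reenqueue was requested */
};

/* Step 5: an RT wakeup raises next_class if RT is above the current value. */
static void toy_rt_wakeup(struct toy_rq *rq)
{
	if (TOY_RT > rq->next_class)
		rq->next_class = TOY_RT;
}

/*
 * Replay steps 4-8 and return how many tasks are left lingering on the
 * local DSQ. @with_fix enables the assumed nr_immed check before the
 * RETRY_TASK return.
 */
int toy_lingering_tasks(bool with_fix)
{
	struct toy_rq rq = { .next_class = TOY_IDLE };

	/* Step 4: the first task lands on the local DSQ as an IMMED task. */
	rq.local_dsq_len++;
	rq.nr_immed++;

	/* Step 5: RT task wakes up while the rq lock is dropped. */
	toy_rt_wakeup(&rq);
	assert(rq.next_class == TOY_RT);

	/* Step 6: end-of-balance check, as described: needs more than one task. */
	if (rq.nr_immed && rq.local_dsq_len > 1)
		rq.reenq_scheduled = true;

	/* Step 7: a class above SCX became runnable, so the pick is retried. */
	if (rq.next_class > TOY_EXT) {
		if (with_fix && rq.nr_immed)
			rq.reenq_scheduled = true;
	}

	/* Step 8: RT runs; only a scheduled reenqueue drains the local DSQ. */
	if (rq.reenq_scheduled) {
		rq.local_dsq_len = 0;
		rq.nr_immed = 0;
	}
	return rq.local_dsq_len;
}
```

In this model, toy_lingering_tasks(false) leaves one task on the local DSQ,
while toy_lingering_tasks(true) leaves none. The model covers only the case
where the RT task wakes up on the dispatching CPU.
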
This particular scenario may not apply to scx_qmap, but I think it proves
that it's possible to have dispatched tasks lingering on the local DSQ even
with SCX_OPS_ALWAYS_ENQ_IMMED.

I was thinking we could fix this by adding a nr_immed check right before
returning RETRY_TASK:

diff --git i/kernel/sched/ext.c w/kernel/sched/ext.c
index d66fea57ee69..480627fdc203 100644
--- i/kernel/sched/ext.c
+++ w/kernel/sched/ext.c
@@ -3079,8 +3079,11 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
 	 * If @force_scx is true, always try to pick a SCHED_EXT task,
 	 * regardless of any higher-priority sched classes activity.
 	 */
-	if (!force_scx && rq_modified_above(rq, &ext_sched_class))
+	if (!force_scx && rq_modified_above(rq, &ext_sched_class)) {
+		if (rq->scx.nr_immed)
+			schedule_reenq_local(rq, 0);
 		return RETRY_TASK;
+	}
 
 	keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;
 	if (unlikely(keep_prev &&

...but I think this only fixes the case where the RT task wakes up on the
CPU that is doing the dispatch. The other case is one where the RT task
wakes up on the remote CPU (the one the second task was dispatched to) after
insertion of the second task, assuming the remote CPU is initially idle.

To fix both cases, one potential solution that comes to mind is bumping
rq->next_class to &ext_sched_class when inserting a task into
rq->scx.local_dsq. Perhaps we should call wakeup_preempt() in
dispatch_to_local_dsq()?

Let me know what you think!

Thanks,
Kuba