The previous patch lifted the deadline bandwidth check during the kexec
process, which raises a potential issue: as the number of online CPUs
decreases, DL tasks may be crowded onto a few CPUs, which may starve the
CPU hotplug kthread. As a result, the hot-removal cannot proceed in
practice. On the other hand, as CPUs are offlined one by one, all tasks
will eventually be migrated to the kexec CPU.
Therefore, this patch marks all other CPUs as inactive to signal the
scheduler to migrate tasks to the kexec CPU during hot-removal.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Pierre Gondois <pierre.gondois@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kexec@lists.infradead.org
To: linux-kernel@vger.kernel.org
---
kernel/cpu.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index db9f6c539b28c..76aa0f784602b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1546,6 +1546,16 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 	if (!cpu_online(primary_cpu))
 		primary_cpu = cpumask_first(cpu_online_mask);
 
+	/*
+	 * Mark all other CPUs as inactive so the scheduler won't select them as
+	 * migration targets.
+	 */
+	for_each_online_cpu(cpu) {
+		if (cpu == primary_cpu)
+			continue;
+		set_cpu_active(cpu, false);
+	}
+
 	for_each_online_cpu(cpu) {
 		if (cpu == primary_cpu)
 			continue;
--
2.49.0
On Wed, Oct 22 2025 at 20:13, Pingfan Liu wrote:
> The previous patch lifted the deadline bandwidth check during the kexec
Once this is applied 'The previous patch' is meaningless.
> process, which raises a potential issue: as the number of online CPUs
> decreases, DL tasks may be crowded onto a few CPUs, which may starve the
> CPU hotplug kthread. As a result, the hot-removal cannot proceed in
> practice. On the other hand, as CPUs are offlined one by one, all tasks
> will eventually be migrated to the kexec CPU.
>
> Therefore, this patch marks all other CPUs as inactive to signal the
git grep "This patch" Documentation/process/
> scheduler to migrate tasks to the kexec CPU during hot-removal.
I'm not seeing what this solves. It just changes the timing of moving
tasks off to the boot CPU where they compete for the CPU for nothing.
When kexec() is in progress, then running user space tasks at all is a
completely pointless exercise.
So the obvious solution to the problem is to freeze all user space tasks
when kexec() is invoked. No horrible hacks in the deadline scheduler and
elsewhere required to make that work. No?
Thanks,
tglx
On Mon, Oct 27, 2025 at 06:06:32PM +0100, Thomas Gleixner wrote:
> On Wed, Oct 22 2025 at 20:13, Pingfan Liu wrote:
> > The previous patch lifted the deadline bandwidth check during the kexec
>
> Once this is applied 'The previous patch' is meaningless.
>

I will rephrase it.

> > process, which raises a potential issue: as the number of online CPUs
> > decreases, DL tasks may be crowded onto a few CPUs, which may starve the
> > CPU hotplug kthread. As a result, the hot-removal cannot proceed in
> > practice. On the other hand, as CPUs are offlined one by one, all tasks
> > will eventually be migrated to the kexec CPU.
> >
> > Therefore, this patch marks all other CPUs as inactive to signal the
>
> git grep "This patch" Documentation/process/
>

I will rephrase it.

> > scheduler to migrate tasks to the kexec CPU during hot-removal.
>
> I'm not seeing what this solves. It just changes the timing of moving
> tasks off to the boot CPU where they compete for the CPU for nothing.
>
> When kexec() is in progress, then running user space tasks at all is a
> completely pointless exercise.
>
> So the obvious solution to the problem is to freeze all user space tasks

I agree, but what about a less intrusive approach? Simply stopping the
DL tasks should suffice, as everything works correctly without them.

I have a draft patch ready. Let's discuss it and go from there.

> when kexec() is invoked. No horrible hacks in the deadline scheduler and
> elsewhere required to make that work. No?

To clarify, skipping the dl_bw_deactivate() validation is necessary
because it prevents CPU hot-removal.

Thanks,

Pingfan
On Tue, Oct 28 2025 at 10:51, Pingfan Liu wrote:
> On Mon, Oct 27, 2025 at 06:06:32PM +0100, Thomas Gleixner wrote:
>> When kexec() is in progress, then running user space tasks at all is a
>> completely pointless exercise.
>>
>> So the obvious solution to the problem is to freeze all user space tasks
>
> I agree, but what about a less intrusive approach? Simply stopping the
> DL tasks should suffice, as everything works correctly without them.
What's intrusive about that? Task freezing exists already.
> I have a draft patch ready. Let's discuss it and go from there.
>
>> when kexec() is invoked. No horrible hacks in the deadline scheduler and
>> elsewhere required to make that work. No?
>
> To clarify, skipping the dl_bw_deactivate() validation is necessary
> because it prevents CPU hot-removal.
If you freeze stuff there is nothing to do. Hibernation works exactly
that way without any magic hacks in a particular scheduling class, no?
Thanks,
tglx
On Tue, Oct 28, 2025 at 01:59:11PM +0100, Thomas Gleixner wrote:
> On Tue, Oct 28 2025 at 10:51, Pingfan Liu wrote:
> > On Mon, Oct 27, 2025 at 06:06:32PM +0100, Thomas Gleixner wrote:
> >> When kexec() is in progress, then running user space tasks at all is a
> >> completely pointless exercise.
> >>
> >> So the obvious solution to the problem is to freeze all user space tasks
> >
> > I agree, but what about a less intrusive approach? Simply stopping the
> > DL tasks should suffice, as everything works correctly without them.
>
> What's intrusive about that? Task freezing exists already.
>

Thanks for your guidance. That's a good point -- system suspending is a
good analogy. I will check how PM handles it.

> > I have a draft patch ready. Let's discuss it and go from there.
> >
> >> when kexec() is invoked. No horrible hacks in the deadline scheduler and
> >> elsewhere required to make that work. No?
> >
> > To clarify, skipping the dl_bw_deactivate() validation is necessary
> > because it prevents CPU hot-removal.
>
> If you freeze stuff there is nothing to do. Hibernation works exactly
> that way without any magic hacks in a particular scheduling class, no?
>

There is a nuance: DL bandwidth represents a commitment, not necessarily
the actual payload. Even a blocked DL task still occupies DL bandwidth.
The system's DL bandwidth remains unchanged as long as the CPUs stay
online, which is the case in hibernation.

Best Regards,

Pingfan
On Wed, Oct 29 2025 at 19:36, Pingfan Liu wrote:
> On Tue, Oct 28, 2025 at 01:59:11PM +0100, Thomas Gleixner wrote:
>> If you freeze stuff there is nothing to do. Hibernation works exactly
>> that way without any magic hacks in a particular scheduling class, no?
>>
>
> There is a nuance: DL bandwidth represents a commitment, not necessarily
> the actual payload. Even a blocked DL task still occupies DL bandwidth.
> The system's DL bandwidth remains unchanged as long as the CPUs stay
> online, which is the case in hibernation.
No. Hibernation brings the non-boot CPUs down in order to create the
disk image.
Thanks,
tglx
On Wed, Oct 29, 2025 at 8:13 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, Oct 29 2025 at 19:36, Pingfan Liu wrote:
> > On Tue, Oct 28, 2025 at 01:59:11PM +0100, Thomas Gleixner wrote:
> >> If you freeze stuff there is nothing to do. Hibernation works exactly
> >> that way without any magic hacks in a particular scheduling class, no?
> >>
> >
> > There is a nuance: DL bandwidth represents a commitment, not necessarily
> > the actual payload. Even a blocked DL task still occupies DL bandwidth.
> > The system's DL bandwidth remains unchanged as long as the CPUs stay
> > online, which is the case in hibernation.
>
> No. Hibernation brings the non-boot CPUs down in order to create the
> disk image.
>

Oh, I see. Since there are no DL tasks in the runqueue, no migration
occurs to activate the DL bandwidth. This approach, similar to PM, is
perfect for addressing this issue.

Thanks,

Pingfan