cpu/hotplug: disallow writing any state in atomic AP section to sysfs target

[PATCH] cpu/hotplug: disallow writing any state in atomic AP section to sysfs target

Posted by Koichiro Den 1 year, 1 month ago

When CONFIG_CPU_HOTPLUG_STATE_CONTROL=y, writing a state within the
atomic AP section to 'hotplug/target' file for a fully online cpu can
cause a kernel crash [1]. This occurs because take_cpu_down() disables
the CPU, but the state machine does not reach CPUHP_AP_OFFLINE. As a
result, when cpu stopper thread finishes its work and idle task takes
over, cpuhp_report_idle_dead() crashes on 'BUG_ON(st->state !=
CPUHP_AP_OFFLINE)'.

In the opposite direction, start_secondary() assumes all startup
callbacks have been invoked and transitions to CPUHP_AP_ONLINE_IDLE,
regardless of the written target. This can result in some callbacks in
the section being silently skipped.

To address the issue, disable writing any state within the atomic AP
states to sysfs target. Additionally, set cant_stop to true for both
CPUHP_BP_KICK_AP (when CONFIG_HOTPLUG_SPLIT_STARTUP=y) and
CPUHP_AP_ONLINE since we do not automatically make the state machine
proceed to the other end of the atomic states.

[1]:

  # grep 'tick:dying' /sys/devices/system/cpu/hotplug/states
    143: tick:dying
  # cat /sys/devices/system/cpu/cpu7/hotplug/target
    238  # fully online
  # echo 143 > /sys/devices/system/cpu/cpu7/hotplug/target

    [  145.091832] ------------[ cut here ]------------
    [  145.092928] kernel BUG at kernel/cpu.c:1365!
    [  145.093960] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    --(snip)--

Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
Previous attempt:
https://lore.kernel.org/all/20241207144721.2828390-1-koichiro.den@canonical.com/
---
 kernel/cpu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 34f1a09349fc..c877443f5888 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2127,6 +2127,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
 	[CPUHP_BP_KICK_AP] = {
 		.name			= "cpu:kick_ap",
 		.startup.single		= cpuhp_kick_ap_alive,
+		.cant_stop		= true,
 	},
 
 	/*
@@ -2192,6 +2193,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
 	 * state for synchronsization */
 	[CPUHP_AP_ONLINE] = {
 		.name			= "ap:online",
+		.cant_stop		= true,
 	},
 	/*
 	 * Handled on control processor until the plugged processor manages
@@ -2759,7 +2761,8 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
 		return ret;
 
 #ifdef CONFIG_CPU_HOTPLUG_STATE_CONTROL
-	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE)
+	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE ||
+	    cpuhp_is_atomic_state(target))
 		return -EINVAL;
 #else
 	if (target != CPUHP_OFFLINE && target != CPUHP_ONLINE)
-- 
2.43.0

Re: [PATCH] cpu/hotplug: disallow writing any state in atomic AP section to sysfs target

Posted by Vishal Chourasia 1 year ago

On Fri, Dec 20, 2024 at 11:15:38PM +0900, Koichiro Den wrote:
> When CONFIG_CPU_HOTPLUG_STATE_CONTROL=y, writing a state within the
> atomic AP section to 'hotplug/target' file for a fully online cpu can
> cause a kernel crash [1]. This occurs because take_cpu_down() disables
> the CPU, but the state machine does not reach CPUHP_AP_OFFLINE. As a
> result, when cpu stopper thread finishes its work and idle task takes
> over, cpuhp_report_idle_dead() crashes on 'BUG_ON(st->state !=
> CPUHP_AP_OFFLINE)'.
> 
> In the opposite direction, start_secondary() assumes all startup
> callbacks have been invoked and transitions to CPUHP_AP_ONLINE_IDLE,
> regardless of the written target. This can result in some callbacks in
> the section being silently skipped.
> 
> To address the issue, disable writing any state within the atomic AP
> states to sysfs target. Additionally, set cant_stop to true for both
> CPUHP_BP_KICK_AP (when CONFIG_HOTPLUG_SPLIT_STARTUP=y) and
> CPUHP_AP_ONLINE since we do not automatically make the state machine
> proceed to the other end of the atomic states.
> 
> [1]:
> 
>   # grep 'tick:dying' /sys/devices/system/cpu/hotplug/states
>     143: tick:dying
>   # cat /sys/devices/system/cpu/cpu7/hotplug/target
>     238  # fully online
>   # echo 143 > /sys/devices/system/cpu/cpu7/hotplug/target
After applying the patch, I don't see any system crash. 

Thank you for the patch!

Tested-By: Vishal Chourasia <vishalc@linux.ibm.com>
> 
>     [  145.091832] ------------[ cut here ]------------
>     [  145.092928] kernel BUG at kernel/cpu.c:1365!
>     [  145.093960] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>     --(snip)--
> 
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> ---
> Previous attempt:
> https://lore.kernel.org/all/20241207144721.2828390-1-koichiro.den@canonical.com/
> ---
>  kernel/cpu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 34f1a09349fc..c877443f5888 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2127,6 +2127,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  	[CPUHP_BP_KICK_AP] = {
>  		.name			= "cpu:kick_ap",
>  		.startup.single		= cpuhp_kick_ap_alive,
> +		.cant_stop		= true,
>  	},
>  
>  	/*
> @@ -2192,6 +2193,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  	 * state for synchronsization */
>  	[CPUHP_AP_ONLINE] = {
>  		.name			= "ap:online",
> +		.cant_stop		= true,
>  	},
>  	/*
>  	 * Handled on control processor until the plugged processor manages
> @@ -2759,7 +2761,8 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
>  		return ret;
>  
>  #ifdef CONFIG_CPU_HOTPLUG_STATE_CONTROL
> -	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE)
> +	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE ||
> +	    cpuhp_is_atomic_state(target))
>  		return -EINVAL;
>  #else
>  	if (target != CPUHP_OFFLINE && target != CPUHP_ONLINE)
pretty straight forward...

While I agree that this is one way to patch things up and, at the very
least, prevent a kernel crash, I wonder what infrastructure is needed to
make CONFIG_CPU_HOTPLUG_STATE_CONTROL=y work for the prepare phase.

IIUC:  
1. The control CPU invokes `takedown_cpu()`, which sets the new
target state as `CPUHP_AP_OFFLINE`.  

2. `take_cpu_down()` is executed on the AP in the stopper thread context
and starts invoking the prepare phase callbacks.  

3. Suppose the user wants to stop at some state in the
prepare phase; after executing that state's teardown callback, the
stopper thread returns control back to `takedown_cpu()` on the control
CPU.  

4. Here’s the part I am not sure about: On CPU 6, after the stopper
thread finishes, a context switch occurs, the idle thread starts
executing, and it encounters `BUG_ON(st->state != CPUHP_AP_OFFLINE)` in
`cpuhp_report_idle_dead()` → `do_idle()`.

So, the problem arises from the stopper thread going away in the middle
(depending on which step in the prepare phase was set as the target).
The idle thread starts next and expects that, if the AP is marked as
offline, its current cpuhp_state should be CPUHP_AP_OFFLINE. However, it
won't be, causing a kernel crash in cpuhp_report_idle_dead().


 vishalc
> -- 
> 2.43.0
>

Re: [PATCH] cpu/hotplug: disallow writing any state in atomic AP section to sysfs target

Posted by Thomas Gleixner 1 year ago

On Fri, Dec 20 2024 at 23:15, Koichiro Den wrote:
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 34f1a09349fc..c877443f5888 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2127,6 +2127,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  	[CPUHP_BP_KICK_AP] = {
>  		.name			= "cpu:kick_ap",
>  		.startup.single		= cpuhp_kick_ap_alive,
> +		.cant_stop		= true,

Why? If it stops here, then no harm is done. The AP just waits for being
released. It won't change the state as that's a seperate handshake
mechanism.

>  	},
>  
>  	/*
> @@ -2192,6 +2193,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  	 * state for synchronsization */
>  	[CPUHP_AP_ONLINE] = {
>  		.name			= "ap:online",
> +		.cant_stop		= true,

Your change log is pretty unclear about the reason for this change.

Thanks,

        tglx

Re: [PATCH] cpu/hotplug: disallow writing any state in atomic AP section to sysfs target

Posted by Koichiro Den 1 year ago

On Thu, Jan 16, 2025 at 02:21:14PM GMT, Thomas Gleixner wrote:
> On Fri, Dec 20 2024 at 23:15, Koichiro Den wrote:
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 34f1a09349fc..c877443f5888 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2127,6 +2127,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> >  	[CPUHP_BP_KICK_AP] = {
> >  		.name			= "cpu:kick_ap",
> >  		.startup.single		= cpuhp_kick_ap_alive,
> > +		.cant_stop		= true,
> 
> Why? If it stops here, then no harm is done. The AP just waits for being
> released. It won't change the state as that's a seperate handshake
> mechanism.

Thanks for the feedback.
I see my commit message was poorly worded. Please see below.

> 
> >  	},
> >  
> >  	/*
> > @@ -2192,6 +2193,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
> >  	 * state for synchronsization */
> >  	[CPUHP_AP_ONLINE] = {
> >  		.name			= "ap:online",
> > +		.cant_stop		= true,
> 
> Your change log is pretty unclear about the reason for this change.

Looking back, I agree it's unclear.

The reason why I set cant_stop for CPUHP_BP_KICK_AP is to prevent state
transitions, such as CPUHP_ONLINE -> CPUHP_BP_KICK_AP -> CPUHP_ONLINE, from
being stuck on its return trip. Kicking the AP needs to happen as idle task
is playing dead in this scenario.

I think it's inherently and unavoidably asymmetric because the atomic AP
states handle leading the idle task to dead, while kicking the AP occurs so
outside of the atomic AP states. It appears to me that just prohibiting
stops at CPUHP_BP_KICK_AP is the simplest solution.

My change log should have been something like this:

(before)
  [...]
  states to sysfs target. Additionally, set cant_stop to true for both
  CPUHP_BP_KICK_AP (when CONFIG_HOTPLUG_SPLIT_STARTUP=y) and
  CPUHP_AP_ONLINE since we do not automatically make the state machine
  proceed to the other end of the atomic states.

(after)
  [...]
  states to sysfs target. Additionally, set cant_stop to true for both
  CPUHP_BP_KICK_AP (when CONFIG_HOTPLUG_SPLIT_STARTUP=y) and
  CPUHP_AP_ONLINE due to the following reasons:
  * For CPUHP_BP_KICK_AP: To ensure transitions from a higher state can
    successfully return to a higher state, kicking the AP must take place.
  * For CPUHP_AP_ONLINE: we do not automatically make the state machine
    proceed to the other end of the atomic states.

Thanks,

-Koichiro Den

> 
> Thanks,
> 
>         tglx