[v3] x86/smp: Cure stop_other_cpus() and kexec() troubles

[patch v3 7/7] x86/smp: Put CPUs into INIT on shutdown if possible

Posted by Thomas Gleixner 2 years, 7 months ago

Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.

So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
V3: Renamed the function to smp_park_other_cpus_in_init() so it can
    be reused for crash eventually.
---
 arch/x86/include/asm/smp.h |    2 ++
 arch/x86/kernel/smp.c      |   39 ++++++++++++++++++++++++++++++++-------
 arch/x86/kernel/smpboot.c  |   19 +++++++++++++++++++
 3 files changed, 53 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -139,6 +139,8 @@ void native_send_call_func_ipi(const str
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_other_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -131,7 +131,7 @@ static int smp_stop_nmi_callback(unsigne
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
@@ -172,13 +172,17 @@ static void native_stop_other_cpus(int w
 	 * 2) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * 3) If #2 timed out send an NMI to the CPUs which did not
-	 *    yet report
+	 * 3) If the system uses INIT/STARTUP for CPU bringup, then
+	 *    send all present CPUs an INIT vector, which brings them
+	 *    completely out of the way.
 	 *
-	 * 4) Wait for all other CPUs to report that they reached the
+	 * 4) If #3 is not possible and #2 timed out send an NMI to the
+	 *    CPUs which did not yet report
+	 *
+	 * 5) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * #3 can obviously race against a CPU reaching the HLT loop late.
+	 * #4 can obviously race against a CPU reaching the HLT loop late.
 	 * That CPU will have reported already and the "have all CPUs
 	 * reached HLT" condition will be true despite the fact that the
 	 * other CPU is still handling the NMI. Again, there is no
@@ -194,7 +198,7 @@ static void native_stop_other_cpus(int w
 		/*
 		 * Don't wait longer than a second for IPI completion. The
 		 * wait request is not checked here because that would
-		 * prevent an NMI shutdown attempt in case that not all
+		 * prevent an NMI/INIT shutdown in case that not all
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
@@ -202,7 +206,27 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	/*
+	 * Park all other CPUs in INIT including "offline" CPUs, if
+	 * possible. That's a safe place where they can't resume execution
+	 * of HLT and then execute the HLT loop from overwritten text or
+	 * page tables.
+	 *
+	 * The only downside is a broadcast MCE, but up to the point where
+	 * the kexec() kernel brought all APs online again an MCE will just
+	 * make HLT resume and handle the MCE. The machine crashs and burns
+	 * due to overwritten text, page tables and data. So there is a
+	 * choice between fire and frying pan. The result is pretty much
+	 * the same. Chose frying pan until x86 provides a sane mechanism
+	 * to park a CPU.
+	 */
+	if (smp_park_other_cpus_in_init())
+		goto done;
+
+	/*
+	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
+	 * take all secondary CPUs offline, try with the NMI.
+	 */
 	if (!cpumask_empty(&cpus_stop_mask)) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
@@ -234,6 +258,7 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
+done:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1465,6 +1465,25 @@ void arch_thaw_secondary_cpus_end(void)
 	cache_aps_init();
 }
 
+bool smp_park_other_cpus_in_init(void)
+{
+	unsigned int cpu, this_cpu = smp_processor_id();
+	unsigned int apicid;
+
+	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+		return false;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+		apicid = apic->cpu_present_to_apicid(cpu);
+		if (apicid == BAD_APICID)
+			continue;
+		send_init_sequence(apicid);
+	}
+	return true;
+}
+
 /*
  * Early setup to make printk work.
  */
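
To make the selection logic in smp_park_other_cpus_in_init() easier to
follow outside the kernel, here is a minimal userspace sketch of it. It is
a model under stated assumptions, not the kernel implementation: struct
model_cpu, MODEL_BAD_APICID, the custom_wakeup flag and the got_init marker
are hypothetical stand-ins for the present CPU mask, BAD_APICID, the
apic->wakeup_secondary_cpu* callbacks and send_init_sequence().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BAD_APICID. */
#define MODEL_BAD_APICID 0xFFFFu

/* Hypothetical per-CPU record replacing the kernel's CPU masks. */
struct model_cpu {
	unsigned int apicid;	/* APIC ID, or MODEL_BAD_APICID */
	bool present;		/* models membership in the present mask */
	bool got_init;		/* set where the kernel would send INIT */
};

/*
 * Model of the parking loop: refuse if a special wakeup mechanism is
 * installed, otherwise mark every other present CPU with a valid APIC ID.
 * Returns true if INIT parking was used.
 */
bool model_park_other_cpus_in_init(struct model_cpu *cpus, size_t ncpus,
				   unsigned int this_cpu, bool custom_wakeup)
{
	if (custom_wakeup)
		return false;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (!cpus[cpu].present || cpu == this_cpu)
			continue;
		if (cpus[cpu].apicid == MODEL_BAD_APICID)
			continue;
		cpus[cpu].got_init = true;
	}
	return true;
}
```

In this model the caller never marks itself, and CPUs without a valid APIC
ID are skipped silently, which mirrors the two "continue" cases in the
patch.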

Re: [patch v3 7/7] x86/smp: Put CPUs into INIT on shutdown if possible

Posted by Borislav Petkov 2 years, 7 months ago

On Thu, Jun 15, 2023 at 10:34:00PM +0200, Thomas Gleixner wrote:
> @@ -202,7 +206,27 @@ static void native_stop_other_cpus(int w
>  			udelay(1);
>  	}
>  
> -	/* if the REBOOT_VECTOR didn't work, try with the NMI */
> +	/*
> +	 * Park all other CPUs in INIT including "offline" CPUs, if
> +	 * possible. That's a safe place where they can't resume execution
> +	 * of HLT and then execute the HLT loop from overwritten text or
> +	 * page tables.
> +	 *
> +	 * The only downside is a broadcast MCE, but up to the point where
> +	 * the kexec() kernel brought all APs online again an MCE will just
> +	 * make HLT resume and handle the MCE. The machine crashs and burns

"crashes"

With that

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[BUG REPORT] Triggering a panic in an x86 virtual machine does not wait

Posted by Baokun Li 2 years, 7 months ago

When I manually trigger a panic in a QEMU x86 VM with

        `echo c > /proc/sysrq-trigger`,

I find that the VM will probably reboot directly, even though
PANIC_TIMEOUT is 0. This prevents us from exporting the vmcore via panic,
and even when the vmcore export succeeds, most of the processes in the
vmcore are in stop_this_cpu(). By bisecting we found the patch that
introduced the behavior change

    45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible"),

can anyone help to see what is happening?

Thanks!
-- 
With Best Regards,
Baokun Li
.

Re: [BUG REPORT] Triggering a panic in an x86 virtual machine does not wait

Posted by Thomas Gleixner 2 years, 7 months ago

On Mon, Jul 03 2023 at 11:44, Baokun Li wrote:

> When I manually trigger a panic in a QEMU x86 VM with
>
>         `echo c > /proc/sysrq-trigger`,
>
> I find that the VM will probably reboot directly, even though
> PANIC_TIMEOUT is 0. This prevents us from exporting the vmcore via panic,
> and even when the vmcore export succeeds, most of the processes in the
> vmcore are in stop_this_cpu(). By bisecting we found the patch that
> introduced the behavior change
>
>     45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible"),

Bah, I missed that this is used by crash too. So if this happens to be
invoked on an AP, i.e. not on CPU 0, then the INIT will reset the
machine. Fix below.

Thanks,

        tglx
---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ed2d51960a7d..e1aa2cd7734b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1348,6 +1348,14 @@ bool smp_park_other_cpus_in_init(void)
 	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
 		return false;
 
+	/*
+	 * If this is a crash stop which does not execute on the boot CPU,
+	 * then this cannot use the INIT mechanism because INIT to the boot
+	 * CPU will reset the machine.
+	 */
+	if (this_cpu)
+		return false;
+
 	for_each_present_cpu(cpu) {
 		if (cpu == this_cpu)
 			continue;
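
The effect of the added check can be sketched in userspace as a small
decision function. This is only a model: the enum, the function name and
the custom_wakeup flag are hypothetical, and it assumes, as the check in
the fix does, that the boot CPU is logical CPU 0.

```c
#include <stdbool.h>

/* Hypothetical labels for the two outcomes of the parking attempt. */
enum model_stop_path {
	MODEL_PARK_IN_INIT,	/* other CPUs are kicked into INIT */
	MODEL_FALLBACK_HLT,	/* other CPUs stay in the HLT loop */
};

/*
 * Model of the early exits: INIT parking is refused when a special wakeup
 * mechanism is installed, and also when the caller is not CPU 0, because
 * the loop would otherwise send INIT to the boot CPU.
 */
enum model_stop_path model_choose_stop_path(unsigned int this_cpu,
					    bool custom_wakeup)
{
	if (custom_wakeup)
		return MODEL_FALLBACK_HLT;
	if (this_cpu)
		return MODEL_FALLBACK_HLT;
	return MODEL_PARK_IN_INIT;
}
```

A crash triggered on CPU 2, as in the report, takes the fallback path in
this model, while a regular shutdown running on CPU 0 still parks the
other CPUs in INIT.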

Re: [BUG REPORT] Triggering a panic in an x86 virtual machine does not wait

Posted by Baokun Li 2 years, 7 months ago

On 2023/7/5 16:59, Thomas Gleixner wrote:
> On Mon, Jul 03 2023 at 11:44, Baokun Li wrote:
>
>> When I manually trigger a panic in a QEMU x86 VM with
>>
>>          `echo c > /proc/sysrq-trigger`,
>>
>> I find that the VM will probably reboot directly, even though
>> PANIC_TIMEOUT is 0. This prevents us from exporting the vmcore via panic,
>> and even when the vmcore export succeeds, most of the processes in the
>> vmcore are in stop_this_cpu(). By bisecting we found the patch that
>> introduced the behavior change
>>
>>      45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible"),
> Bah, I missed that this is used by crash too. So if this happens to be
> invoked on an AP, i.e. not on CPU 0, then the INIT will reset the
> machine. Fix below.
>
> Thanks,
>
>          tglx
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ed2d51960a7d..e1aa2cd7734b 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1348,6 +1348,14 @@ bool smp_park_other_cpus_in_init(void)
>   	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
>   		return false;
>   
> +	/*
> +	 * If this is a crash stop which does not execute on the boot CPU,
> +	 * then this cannot use the INIT mechanism because INIT to the boot
> +	 * CPU will reset the machine.
> +	 */
> +	if (this_cpu)
> +		return false;
> +
>   	for_each_present_cpu(cpu) {
>   		if (cpu == this_cpu)
>   			continue;
This patch does fix the problem of rebooting at panic, but the exported
stack stays at stop_this_cpu() like below, instead of showing what the
corresponding process is doing as before.

PID: 681      TASK: ffff9ac2429d3080  CPU: 2    COMMAND: "fsstress"
  #0 [ffffb00200184fd0] stop_this_cpu at ffffffff89a4ffd8
  #1 [ffffb00200184fe8] __sysvec_reboot at ffffffff89a94213
  #2 [ffffb00200184ff0] sysvec_reboot at ffffffff8aee7491
--- <IRQ stack> ---
     RIP: 0000000000000010  RSP: 0000000000000018  RFLAGS: ffffb00200f8bd08
     RAX: ffff9ac256fda9d8  RBX: 0000000009973a85  RCX: ffff9ac256fda078
     RDX: ffff9ac24416e300  RSI: ffff9ac256fda9e0  RDI: ffffffffffffffff
     RBP: ffff9ac2443a5f88   R8: 0000000000000000   R9: ffff9ac2422eeea0
     R10: ffff9ac256fda9d8  R11: 0000000000549921  R12: ffff9ac2422eeea0
     R13: ffff9ac251cd23c8  R14: ffff9ac24269a800  R15: ffff9ac251cd2150
     ORIG_RAX: ffffffff8a1719e4  CS: 0206  SS: ffffffff8a1719c8
bt: WARNING: possibly bogus exception frame

Do you know how this happened? I would be grateful if you could fix it.

Thanks!
-- 
With Best Regards,
Baokun Li
.

Re: [BUG REPORT] Triggering a panic in an x86 virtual machine does not wait

Posted by Thomas Gleixner 2 years, 7 months ago

On Thu, Jul 06 2023 at 14:44, Baokun Li wrote:
> On 2023/7/5 16:59, Thomas Gleixner wrote:
>> +	/*
>> +	 * If this is a crash stop which does not execute on the boot CPU,
>> +	 * then this cannot use the INIT mechanism because INIT to the boot
>> +	 * CPU will reset the machine.
>> +	 */
>> +	if (this_cpu)
>> +		return false;

> This patch does fix the problem of rebooting at panic, but the
> exported stack stays at stop_this_cpu() like below, instead of showing
> what the corresponding process is doing as before.
>
> PID: 681      TASK: ffff9ac2429d3080  CPU: 2    COMMAND: "fsstress"
>   #0 [ffffb00200184fd0] stop_this_cpu at ffffffff89a4ffd8
>   #1 [ffffb00200184fe8] __sysvec_reboot at ffffffff89a94213
>   #2 [ffffb00200184ff0] sysvec_reboot at ffffffff8aee7491
> --- <IRQ stack> ---
>      RIP: 0000000000000010  RSP: 0000000000000018  RFLAGS: ffffb00200f8bd08
>      RAX: ffff9ac256fda9d8  RBX: 0000000009973a85  RCX: ffff9ac256fda078
>      RDX: ffff9ac24416e300  RSI: ffff9ac256fda9e0  RDI: ffffffffffffffff
>      RBP: ffff9ac2443a5f88   R8: 0000000000000000   R9: ffff9ac2422eeea0
>      R10: ffff9ac256fda9d8  R11: 0000000000549921  R12: ffff9ac2422eeea0
>      R13: ffff9ac251cd23c8  R14: ffff9ac24269a800  R15: ffff9ac251cd2150
>      ORIG_RAX: ffffffff8a1719e4  CS: 0206  SS: ffffffff8a1719c8
> bt: WARNING: possibly bogus exception frame
>
> Do you know how this happened? I would be grateful if you could fix it.

No, I don't. But there is clearly a hint:

> bt: WARNING: possibly bogus exception frame

So the exception frame seems to be corrupted. I have no idea why.

The question is, whether this goes away when you revert that commit or not.
I can't oracle that out from your report.

Can you please revert 45e34c8af58f on top of Linus tree and verify that
it makes the issue go away?

Thanks,

        tglx

Re: [BUG REPORT] Triggering a panic in an x86 virtual machine does not wait

Posted by Baokun Li 2 years, 7 months ago

On 2023/7/7 18:18, Thomas Gleixner wrote:
> On Thu, Jul 06 2023 at 14:44, Baokun Li wrote:
>> On 2023/7/5 16:59, Thomas Gleixner wrote:
>>> +	/*
>>> +	 * If this is a crash stop which does not execute on the boot CPU,
>>> +	 * then this cannot use the INIT mechanism because INIT to the boot
>>> +	 * CPU will reset the machine.
>>> +	 */
>>> +	if (this_cpu)
>>> +		return false;

This does solve the problem of x86 VMs not waiting when they panic, so

Reported-and-tested-by: Baokun Li <libaokun1@huawei.com>

>> This patch does fix the problem of rebooting at panic, but the
>> exported stack stays at stop_this_cpu() like below, instead of showing
>> what the corresponding process is doing as before.
>>
>> PID: 681      TASK: ffff9ac2429d3080  CPU: 2    COMMAND: "fsstress"
>>    #0 [ffffb00200184fd0] stop_this_cpu at ffffffff89a4ffd8
>>    #1 [ffffb00200184fe8] __sysvec_reboot at ffffffff89a94213
>>    #2 [ffffb00200184ff0] sysvec_reboot at ffffffff8aee7491
>> --- <IRQ stack> ---
>>       RIP: 0000000000000010  RSP: 0000000000000018  RFLAGS: ffffb00200f8bd08
>>       RAX: ffff9ac256fda9d8  RBX: 0000000009973a85  RCX: ffff9ac256fda078
>>       RDX: ffff9ac24416e300  RSI: ffff9ac256fda9e0  RDI: ffffffffffffffff
>>       RBP: ffff9ac2443a5f88   R8: 0000000000000000   R9: ffff9ac2422eeea0
>>       R10: ffff9ac256fda9d8  R11: 0000000000549921  R12: ffff9ac2422eeea0
>>       R13: ffff9ac251cd23c8  R14: ffff9ac24269a800  R15: ffff9ac251cd2150
>>       ORIG_RAX: ffffffff8a1719e4  CS: 0206  SS: ffffffff8a1719c8
>> bt: WARNING: possibly bogus exception frame
>>
>> Do you know how this happened? I would be grateful if you could fix it.
> No, I don't. But there is clearly a hint:
>
>> bt: WARNING: possibly bogus exception frame
> So the exception frame seems to be corrupted. I have no idea why.
>
> The question is, whether this goes away when you revert that commit or not.
> I can't oracle that out from your report.
>
> Can you please revert 45e34c8af58f on top of Linus tree and verify that
> it makes the issue go away?
>
> Thanks,
>
>          tglx
Yes, the stop_this_cpu() issue persisted after I reverted 45e34c8af58f, so
it has nothing to do with your patch. I will try to bisect to find out
which patch introduced the issue.

Thank you very much for helping locate and rectify the problem that the x86
VM panic does not wait!

Cheers!
-- 
With Best Regards,
Baokun Li
.

[tip: x86/core] x86/smp: Don't send INIT to boot CPU

Posted by tip-bot2 for Thomas Gleixner 2 years, 7 months ago

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     b1472a60a584694875a05cf8bcba8bdf0dc1cd3a
Gitweb:        https://git.kernel.org/tip/b1472a60a584694875a05cf8bcba8bdf0dc1cd3a
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 05 Jul 2023 10:59:23 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 07 Jul 2023 15:42:31 +02:00

x86/smp: Don't send INIT to boot CPU

Parking CPUs in INIT works well, except for the crash case when the CPU
which invokes smp_park_other_cpus_in_init() is not the boot CPU. Sending
INIT to the boot CPU resets the whole machine.

Prevent this by validating that this runs on the boot CPU. If not, fall back
and let CPUs hang in HLT.

Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/87ttui91jo.ffs@tglx
---
 arch/x86/kernel/smpboot.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4ee4339..7417d9b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1473,6 +1473,14 @@ bool smp_park_other_cpus_in_init(void)
 	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
 		return false;
 
+	/*
+	 * If this is a crash stop which does not execute on the boot CPU,
+	 * then this cannot use the INIT mechanism because INIT to the boot
+	 * CPU will reset the machine.
+	 */
+	if (this_cpu)
+		return false;
+
 	for_each_present_cpu(cpu) {
 		if (cpu == this_cpu)
 			continue;

[tip: x86/core] x86/smp: Put CPUs into INIT on shutdown if possible

Posted by tip-bot2 for Thomas Gleixner 2 years, 7 months ago

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     45e34c8af58f23db4474e2bfe79183efec09a18b
Gitweb:        https://git.kernel.org/tip/45e34c8af58f23db4474e2bfe79183efec09a18b
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 15 Jun 2023 22:34:00 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 20 Jun 2023 14:51:47 +02:00

x86/smp: Put CPUs into INIT on shutdown if possible

Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.

So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de

---
 arch/x86/include/asm/smp.h |  2 ++-
 arch/x86/kernel/smp.c      | 39 ++++++++++++++++++++++++++++++-------
 arch/x86/kernel/smpboot.c  | 19 ++++++++++++++++++-
 3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index d4ce5cb..5906aa9 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -139,6 +139,8 @@ void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_other_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 174d623..0076932 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -131,7 +131,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
@@ -172,13 +172,17 @@ static void native_stop_other_cpus(int wait)
 	 * 2) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * 3) If #2 timed out send an NMI to the CPUs which did not
-	 *    yet report
+	 * 3) If the system uses INIT/STARTUP for CPU bringup, then
+	 *    send all present CPUs an INIT vector, which brings them
+	 *    completely out of the way.
 	 *
-	 * 4) Wait for all other CPUs to report that they reached the
+	 * 4) If #3 is not possible and #2 timed out send an NMI to the
+	 *    CPUs which did not yet report
+	 *
+	 * 5) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * #3 can obviously race against a CPU reaching the HLT loop late.
+	 * #4 can obviously race against a CPU reaching the HLT loop late.
 	 * That CPU will have reported already and the "have all CPUs
 	 * reached HLT" condition will be true despite the fact that the
 	 * other CPU is still handling the NMI. Again, there is no
@@ -194,7 +198,7 @@ static void native_stop_other_cpus(int wait)
 		/*
 		 * Don't wait longer than a second for IPI completion. The
 		 * wait request is not checked here because that would
-		 * prevent an NMI shutdown attempt in case that not all
+		 * prevent an NMI/INIT shutdown in case that not all
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
@@ -202,7 +206,27 @@ static void native_stop_other_cpus(int wait)
 			udelay(1);
 	}
 
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	/*
+	 * Park all other CPUs in INIT including "offline" CPUs, if
+	 * possible. That's a safe place where they can't resume execution
+	 * of HLT and then execute the HLT loop from overwritten text or
+	 * page tables.
+	 *
+	 * The only downside is a broadcast MCE, but up to the point where
+	 * the kexec() kernel brought all APs online again an MCE will just
+	 * make HLT resume and handle the MCE. The machine crashes and burns
+	 * due to overwritten text, page tables and data. So there is a
+	 * choice between fire and frying pan. The result is pretty much
+	 * the same. Chose frying pan until x86 provides a sane mechanism
+	 * to park a CPU.
+	 */
+	if (smp_park_other_cpus_in_init())
+		goto done;
+
+	/*
+	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
+	 * take all secondary CPUs offline, try with the NMI.
+	 */
 	if (!cpumask_empty(&cpus_stop_mask)) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
@@ -225,6 +249,7 @@ static void native_stop_other_cpus(int wait)
 			udelay(1);
 	}
 
+done:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b403ead..4ee4339 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1465,6 +1465,25 @@ void arch_thaw_secondary_cpus_end(void)
 	cache_aps_init();
 }
 
+bool smp_park_other_cpus_in_init(void)
+{
+	unsigned int cpu, this_cpu = smp_processor_id();
+	unsigned int apicid;
+
+	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+		return false;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+		apicid = apic->cpu_present_to_apicid(cpu);
+		if (apicid == BAD_APICID)
+			continue;
+		send_init_sequence(apicid);
+	}
+	return true;
+}
+
 /*
  * Early setup to make printk work.
  */
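
The ordering described in the native_stop_other_cpus() comment of the
commit above can be summarized with a small userspace sketch. It is a
simplified model, not the kernel code: the step flags and both parameters
are hypothetical, the wait loops are reduced to a single "all CPUs
reported" input, and the additional kernel conditions for sending the NMI
are left out.

```c
#include <stdbool.h>

/* Hypothetical flags recording which mechanisms were used. */
enum model_stop_step {
	MODEL_STEP_REBOOT_IPI = 1u << 0,	/* steps 1 and 2 */
	MODEL_STEP_INIT_PARK  = 1u << 1,	/* step 3 */
	MODEL_STEP_NMI        = 1u << 2,	/* steps 4 and 5 */
};

/*
 * Model of the sequence: the reboot IPI is always tried first. If INIT
 * parking is possible it ends the sequence, whether or not every CPU
 * reported. Only if INIT parking is not possible and some CPU did not
 * report is the NMI used.
 */
unsigned int model_stop_other_cpus(bool init_park_ok, bool all_cpus_reported)
{
	unsigned int steps = MODEL_STEP_REBOOT_IPI;

	if (init_park_ok)
		return steps | MODEL_STEP_INIT_PARK;

	if (!all_cpus_reported)
		steps |= MODEL_STEP_NMI;

	return steps;
}
```

The model shows that the INIT path and the NMI path are mutually
exclusive, which corresponds to the "goto done" in the patch.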