[PATCH v4] x86/entry_32: Use stack segment selector for VERW operand

Pawan Gupta posted 1 patch 1 year, 5 months ago
There is a newer version of this series
arch/x86/entry/entry_32.S | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
[PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Pawan Gupta 1 year, 5 months ago
Robert Gill reported below #GP when dosemu software was executing vm86()
system call:

  general protection fault: 0000 [#1] PREEMPT SMP
  CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
  Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
  EIP: restore_all_switch_stack+0xbe/0xcf
  EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
  ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
  DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
  CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
  Call Trace:
   show_regs+0x70/0x78
   die_addr+0x29/0x70
   exc_general_protection+0x13c/0x348
   exc_bounds+0x98/0x98
   handle_exception+0x14d/0x14d
   exc_bounds+0x98/0x98
   restore_all_switch_stack+0xbe/0xcf
   exc_bounds+0x98/0x98
   restore_all_switch_stack+0xbe/0xcf

This only happens when VERW based mitigations like MDS/RFDS are enabled.
This is because segment registers with an arbitrary user value can result
in #GP when executing VERW. Intel SDM vol. 2C documents the following
behavior for VERW instruction:

  #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
	   FS, or GS segment limit.

CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
arbitrary user %ds. Also, in NMI return path, move VERW to after
RESTORE_ALL_NMI that touches GPRs.

For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
version is being used:

* entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
  handle_exception_return) do:

restore_all_switch_stack:
  [...]
   mov    %esi,%esi
   verw   %ss:0xc0fc92c0  <-------------
   iret

* Opportunistic SYSEXIT:

   [...]
   verw   %ss:0xc0fc92c0  <-------------
   btrl   $0x9,(%esp)
   popf
   pop    %eax
   sti
   sysexit

*  nmi_return and nmi_from_espfix:
   mov    %esi,%esi
   verw   %ss:0xc0fc92c0  <-------------
   jmp     .Lirq_return

Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Robert Gill <rtgill82@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Suggested-by: Brian Gerst <brgerst@gmail.com> # Use %ss
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
v4:
- Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
- In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).

v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.intel.com
- Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
- Do verw before popf in SYSEXIT path (Jari).

v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.intel.com
- Safe guard against any other system calls like vm86() that might change %ds (Dave).

v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.intel.com
---

---
 arch/x86/entry/entry_32.S | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d3a814efbff6..d54f6002e5a0 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -253,6 +253,16 @@
 .Lend_\@:
 .endm
 
+/*
+ * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
+ * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
+ */
+.macro CLEAR_CPU_BUFFERS_SAFE
+	ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
+	verw	%ss:_ASM_RIP(mds_verw_sel)
+.Lskip_verw\@:
+.endm
+
 .macro RESTORE_INT_REGS
 	popl	%ebx
 	popl	%ecx
@@ -871,6 +881,8 @@ SYM_FUNC_START(entry_SYSENTER_32)
 
 	/* Now ready to switch the cr3 */
 	SWITCH_TO_USER_CR3 scratch_reg=%eax
+	/* Clobbers ZF */
+	CLEAR_CPU_BUFFERS_SAFE
 
 	/*
 	 * Restore all flags except IF. (We restore IF separately because
@@ -881,7 +893,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
 	BUG_IF_WRONG_CR3 no_user_check=1
 	popfl
 	popl	%eax
-	CLEAR_CPU_BUFFERS
 
 	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
@@ -951,7 +962,7 @@ restore_all_switch_stack:
 
 	/* Restore user state */
 	RESTORE_REGS pop=4			# skip orig_eax/error_code
-	CLEAR_CPU_BUFFERS
+	CLEAR_CPU_BUFFERS_SAFE
 .Lirq_return:
 	/*
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1144,7 +1155,6 @@ SYM_CODE_START(asm_exc_nmi)
 
 	/* Not on SYSENTER stack. */
 	call	exc_nmi
-	CLEAR_CPU_BUFFERS
 	jmp	.Lnmi_return
 
 .Lnmi_from_sysenter_stack:
@@ -1165,6 +1175,7 @@ SYM_CODE_START(asm_exc_nmi)
 
 	CHECK_AND_APPLY_ESPFIX
 	RESTORE_ALL_NMI cr3_reg=%edi pop=4
+	CLEAR_CPU_BUFFERS_SAFE
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1206,6 +1217,7 @@ SYM_CODE_START(asm_exc_nmi)
 	 *  1 - orig_ax
 	 */
 	lss	(1+5+6)*4(%esp), %esp			# back to espfix stack
+	CLEAR_CPU_BUFFERS_SAFE
 	jmp	.Lirq_return
 #endif
 SYM_CODE_END(asm_exc_nmi)

---
base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
change-id: 20240426-fix-dosemu-vm86-dd111a01737e
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Peter Zijlstra 1 year, 5 months ago
On Wed, Jul 10, 2024 at 12:06:47PM -0700, Pawan Gupta wrote:
> +/*
> + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
> + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
> + */
> +.macro CLEAR_CPU_BUFFERS_SAFE
> +	ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
> +	verw	%ss:_ASM_RIP(mds_verw_sel)
> +.Lskip_verw\@:
> +.endm

I know this is somewhat of a common pattern, but I think it is silly in
this case. Since we already have the ALTERNATIVE() why not NOP the one
VERW instruction instead?

That is,

	ALTERNATIVE("", "verw %ss:_ASM_RIP(mds_verw_sel)", X86_FEATURE_CLEAR_CPU_BUF)

and call it a day?
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Pawan Gupta 1 year, 5 months ago
On Thu, Jul 11, 2024 at 11:03:29AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 10, 2024 at 12:06:47PM -0700, Pawan Gupta wrote:
> > +/*
> > + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
> > + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
> > + */
> > +.macro CLEAR_CPU_BUFFERS_SAFE
> > +	ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
> > +	verw	%ss:_ASM_RIP(mds_verw_sel)
> > +.Lskip_verw\@:
> > +.endm
> 
> I know this is somewhat of a common pattern, but I think it is silly in
> this case. Since we already have the ALTERNATIVE() why not NOP the one
> VERW instruction instead?
> 
> That is,
> 
> 	ALTERNATIVE("", "verw %ss:_ASM_RIP(mds_verw_sel)", X86_FEATURE_CLEAR_CPU_BUF)

Will do, thanks.
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Jari Ruusu 1 year, 5 months ago
On Wednesday, July 10th, 2024 at 22:06, Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
> Cc: stable@vger.kernel.org # 5.10+
> Reported-by: Robert Gill rtgill82@gmail.com
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
> Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
> Suggested-by: Dave Hansen dave.hansen@linux.intel.com
> Suggested-by: Brian Gerst brgerst@gmail.com # Use %ss
> Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com
> 
> v4:
> - Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
> - In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
> 
> v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.intel.com
> - Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
> - Do verw before popf in SYSEXIT path (Jari).
> 
> v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.intel.com
> - Safe guard against any other system calls like vm86() that might change %ds (Dave).
> 
> v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.intel.com

Pawan,
Your patch looks OK to me.

Greg,
I have verified that patch hunks go correct places on
kernel.org linux-5.10.221, linux-6.1.97 and linux-6.6.38
kernels. All tests run inside 32-bit VM. The patch fixes
show-stopper issues with virtual-8086 mode and dosemu. Once
the patch is accepted upstream, it should go to all 5.10+
stable kernels.

--
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Uros Bizjak 1 year, 5 months ago

On 10. 07. 24 21:06, Pawan Gupta wrote:
> Robert Gill reported below #GP when dosemu software was executing vm86()
> system call:
> 
>    general protection fault: 0000 [#1] PREEMPT SMP
>    CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
>    Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
>    EIP: restore_all_switch_stack+0xbe/0xcf
>    EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
>    ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
>    DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
>    CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
>    Call Trace:
>     show_regs+0x70/0x78
>     die_addr+0x29/0x70
>     exc_general_protection+0x13c/0x348
>     exc_bounds+0x98/0x98
>     handle_exception+0x14d/0x14d
>     exc_bounds+0x98/0x98
>     restore_all_switch_stack+0xbe/0xcf
>     exc_bounds+0x98/0x98
>     restore_all_switch_stack+0xbe/0xcf
> 
> This only happens when VERW based mitigations like MDS/RFDS are enabled.
> This is because segment registers with an arbitrary user value can result
> in #GP when executing VERW. Intel SDM vol. 2C documents the following
> behavior for VERW instruction:
> 
>    #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
> 	   FS, or GS segment limit.
> 
> CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
> space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
> refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
> arbitrary user %ds. Also, in NMI return path, move VERW to after
> RESTORE_ALL_NMI that touches GPRs.
> 
> For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
> version is being used:
> 
> * entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
>    handle_exception_return) do:
> 
> restore_all_switch_stack:
>    [...]
>     mov    %esi,%esi
>     verw   %ss:0xc0fc92c0  <-------------
>     iret
> 
> * Opportunistic SYSEXIT:
> 
>     [...]
>     verw   %ss:0xc0fc92c0  <-------------
>     btrl   $0x9,(%esp)
>     popf
>     pop    %eax
>     sti
>     sysexit
> 
> *  nmi_return and nmi_from_espfix:
>     mov    %esi,%esi
>     verw   %ss:0xc0fc92c0  <-------------
>     jmp     .Lirq_return
> 
> Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
> Cc: stable@vger.kernel.org # 5.10+
> Reported-by: Robert Gill <rtgill82@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
> Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Suggested-by: Brian Gerst <brgerst@gmail.com> # Use %ss
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
> v4:
> - Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
> - In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
> 
> v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.intel.com
> - Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
> - Do verw before popf in SYSEXIT path (Jari).
> 
> v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.intel.com
> - Safe guard against any other system calls like vm86() that might change %ds (Dave).
> 
> v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.intel.com
> ---
> 
> ---
>   arch/x86/entry/entry_32.S | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index d3a814efbff6..d54f6002e5a0 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -253,6 +253,16 @@
>   .Lend_\@:
>   .endm
>   
> +/*
> + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
> + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
> + */
> +.macro CLEAR_CPU_BUFFERS_SAFE
> +	ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
> +	verw	%ss:_ASM_RIP(mds_verw_sel)
> +.Lskip_verw\@:
> +.endm

Why not simply:

.macro CLEAR_CPU_BUFFERS_SAFE
	ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)), 
X86_FEATURE_CLEAR_CPU_BUF
.endm

This is how CLEAR_CPU_BUFFERS .macro is defined in nospec-branch.h.

Uros.

> +
>   .macro RESTORE_INT_REGS
>   	popl	%ebx
>   	popl	%ecx
> @@ -871,6 +881,8 @@ SYM_FUNC_START(entry_SYSENTER_32)
>   
>   	/* Now ready to switch the cr3 */
>   	SWITCH_TO_USER_CR3 scratch_reg=%eax
> +	/* Clobbers ZF */
> +	CLEAR_CPU_BUFFERS_SAFE
>   
>   	/*
>   	 * Restore all flags except IF. (We restore IF separately because
> @@ -881,7 +893,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
>   	BUG_IF_WRONG_CR3 no_user_check=1
>   	popfl
>   	popl	%eax
> -	CLEAR_CPU_BUFFERS
>   
>   	/*
>   	 * Return back to the vDSO, which will pop ecx and edx.
> @@ -951,7 +962,7 @@ restore_all_switch_stack:
>   
>   	/* Restore user state */
>   	RESTORE_REGS pop=4			# skip orig_eax/error_code
> -	CLEAR_CPU_BUFFERS
> +	CLEAR_CPU_BUFFERS_SAFE
>   .Lirq_return:
>   	/*
>   	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
> @@ -1144,7 +1155,6 @@ SYM_CODE_START(asm_exc_nmi)
>   
>   	/* Not on SYSENTER stack. */
>   	call	exc_nmi
> -	CLEAR_CPU_BUFFERS
>   	jmp	.Lnmi_return
>   
>   .Lnmi_from_sysenter_stack:
> @@ -1165,6 +1175,7 @@ SYM_CODE_START(asm_exc_nmi)
>   
>   	CHECK_AND_APPLY_ESPFIX
>   	RESTORE_ALL_NMI cr3_reg=%edi pop=4
> +	CLEAR_CPU_BUFFERS_SAFE
>   	jmp	.Lirq_return
>   
>   #ifdef CONFIG_X86_ESPFIX32
> @@ -1206,6 +1217,7 @@ SYM_CODE_START(asm_exc_nmi)
>   	 *  1 - orig_ax
>   	 */
>   	lss	(1+5+6)*4(%esp), %esp			# back to espfix stack
> +	CLEAR_CPU_BUFFERS_SAFE
>   	jmp	.Lirq_return
>   #endif
>   SYM_CODE_END(asm_exc_nmi)
> 
> ---
> base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
> change-id: 20240426-fix-dosemu-vm86-dd111a01737e
> 
> 
>
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Pawan Gupta 1 year, 5 months ago
On Wed, Jul 10, 2024 at 11:50:50PM +0200, Uros Bizjak wrote:
> 
> 
> On 10. 07. 24 21:06, Pawan Gupta wrote:
> > Robert Gill reported below #GP when dosemu software was executing vm86()
> > system call:
> > 
> >    general protection fault: 0000 [#1] PREEMPT SMP
> >    CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
> >    Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
> >    EIP: restore_all_switch_stack+0xbe/0xcf
> >    EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> >    ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
> >    DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
> >    CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
> >    Call Trace:
> >     show_regs+0x70/0x78
> >     die_addr+0x29/0x70
> >     exc_general_protection+0x13c/0x348
> >     exc_bounds+0x98/0x98
> >     handle_exception+0x14d/0x14d
> >     exc_bounds+0x98/0x98
> >     restore_all_switch_stack+0xbe/0xcf
> >     exc_bounds+0x98/0x98
> >     restore_all_switch_stack+0xbe/0xcf
> > 
> > This only happens when VERW based mitigations like MDS/RFDS are enabled.
> > This is because segment registers with an arbitrary user value can result
> > in #GP when executing VERW. Intel SDM vol. 2C documents the following
> > behavior for VERW instruction:
> > 
> >    #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
> > 	   FS, or GS segment limit.
> > 
> > CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
> > space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
> > refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
> > arbitrary user %ds. Also, in NMI return path, move VERW to after
> > RESTORE_ALL_NMI that touches GPRs.
> > 
> > For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
> > version is being used:
> > 
> > * entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
> >    handle_exception_return) do:
> > 
> > restore_all_switch_stack:
> >    [...]
> >     mov    %esi,%esi
> >     verw   %ss:0xc0fc92c0  <-------------
> >     iret
> > 
> > * Opportunistic SYSEXIT:
> > 
> >     [...]
> >     verw   %ss:0xc0fc92c0  <-------------
> >     btrl   $0x9,(%esp)
> >     popf
> >     pop    %eax
> >     sti
> >     sysexit
> > 
> > *  nmi_return and nmi_from_espfix:
> >     mov    %esi,%esi
> >     verw   %ss:0xc0fc92c0  <-------------
> >     jmp     .Lirq_return
> > 
> > Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
> > Cc: stable@vger.kernel.org # 5.10+
> > Reported-by: Robert Gill <rtgill82@gmail.com>
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
> > Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Suggested-by: Brian Gerst <brgerst@gmail.com> # Use %ss
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > ---
> > v4:
> > - Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
> > - In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
> > 
> > v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.intel.com
> > - Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
> > - Do verw before popf in SYSEXIT path (Jari).
> > 
> > v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.intel.com
> > - Safe guard against any other system calls like vm86() that might change %ds (Dave).
> > 
> > v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.intel.com
> > ---
> > 
> > ---
> >   arch/x86/entry/entry_32.S | 18 +++++++++++++++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > index d3a814efbff6..d54f6002e5a0 100644
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -253,6 +253,16 @@
> >   .Lend_\@:
> >   .endm
> > +/*
> > + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
> > + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
> > + */
> > +.macro CLEAR_CPU_BUFFERS_SAFE
> > +	ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
> > +	verw	%ss:_ASM_RIP(mds_verw_sel)
> > +.Lskip_verw\@:
> > +.endm
> 
> Why not simply:
> 
> .macro CLEAR_CPU_BUFFERS_SAFE
> 	ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)),
> X86_FEATURE_CLEAR_CPU_BUF
> .endm

We can do it this way as well. But, there are stable kernels that don't
support relocations in ALTERNATIVEs. The way it is done in current patch
can be backported without worrying about which kernels support relocations.
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Uros Bizjak 1 year, 5 months ago
On Thu, Jul 11, 2024 at 1:16 AM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Wed, Jul 10, 2024 at 11:50:50PM +0200, Uros Bizjak wrote:
> >
> >
> > On 10. 07. 24 21:06, Pawan Gupta wrote:
> > > Robert Gill reported below #GP when dosemu software was executing vm86()
> > > system call:
> > >
> > >    general protection fault: 0000 [#1] PREEMPT SMP
> > >    CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
> > >    Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
> > >    EIP: restore_all_switch_stack+0xbe/0xcf
> > >    EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> > >    ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
> > >    DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
> > >    CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
> > >    Call Trace:
> > >     show_regs+0x70/0x78
> > >     die_addr+0x29/0x70
> > >     exc_general_protection+0x13c/0x348
> > >     exc_bounds+0x98/0x98
> > >     handle_exception+0x14d/0x14d
> > >     exc_bounds+0x98/0x98
> > >     restore_all_switch_stack+0xbe/0xcf
> > >     exc_bounds+0x98/0x98
> > >     restore_all_switch_stack+0xbe/0xcf
> > >
> > > This only happens when VERW based mitigations like MDS/RFDS are enabled.
> > > This is because segment registers with an arbitrary user value can result
> > > in #GP when executing VERW. Intel SDM vol. 2C documents the following
> > > behavior for VERW instruction:
> > >
> > >    #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
> > >        FS, or GS segment limit.
> > >
> > > CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
> > > space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
> > > refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
> > > arbitrary user %ds. Also, in NMI return path, move VERW to after
> > > RESTORE_ALL_NMI that touches GPRs.
> > >
> > > For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
> > > version is being used:
> > >
> > > * entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
> > >    handle_exception_return) do:
> > >
> > > restore_all_switch_stack:
> > >    [...]
> > >     mov    %esi,%esi
> > >     verw   %ss:0xc0fc92c0  <-------------
> > >     iret
> > >
> > > * Opportunistic SYSEXIT:
> > >
> > >     [...]
> > >     verw   %ss:0xc0fc92c0  <-------------
> > >     btrl   $0x9,(%esp)
> > >     popf
> > >     pop    %eax
> > >     sti
> > >     sysexit
> > >
> > > *  nmi_return and nmi_from_espfix:
> > >     mov    %esi,%esi
> > >     verw   %ss:0xc0fc92c0  <-------------
> > >     jmp     .Lirq_return
> > >
> > > Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
> > > Cc: stable@vger.kernel.org # 5.10+
> > > Reported-by: Robert Gill <rtgill82@gmail.com>
> > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
> > > Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
> > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > Suggested-by: Brian Gerst <brgerst@gmail.com> # Use %ss
> > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > ---
> > > v4:
> > > - Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
> > > - In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
> > >
> > > v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.intel.com
> > > - Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
> > > - Do verw before popf in SYSEXIT path (Jari).
> > >
> > > v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.intel.com
> > > - Safe guard against any other system calls like vm86() that might change %ds (Dave).
> > >
> > > v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.intel.com
> > > ---
> > >
> > > ---
> > >   arch/x86/entry/entry_32.S | 18 +++++++++++++++---
> > >   1 file changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > > index d3a814efbff6..d54f6002e5a0 100644
> > > --- a/arch/x86/entry/entry_32.S
> > > +++ b/arch/x86/entry/entry_32.S
> > > @@ -253,6 +253,16 @@
> > >   .Lend_\@:
> > >   .endm
> > > +/*
> > > + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
> > > + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
> > > + */
> > > +.macro CLEAR_CPU_BUFFERS_SAFE
> > > +   ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
> > > +   verw    %ss:_ASM_RIP(mds_verw_sel)
> > > +.Lskip_verw\@:
> > > +.endm
> >
> > Why not simply:
> >
> > .macro CLEAR_CPU_BUFFERS_SAFE
> >       ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)),
> > X86_FEATURE_CLEAR_CPU_BUF
> > .endm
>
> We can do it this way as well. But, there are stable kernels that don't
> support relocations in ALTERNATIVEs. The way it is done in current patch
> can be backported without worrying about which kernels support relocations.

Then you could use absolute reference in backports to kernels that
don't support relocations in ALTERNATIVEs, "verw %ss:mds_verw_sel"
works as well, but the relocation is one byte larger.

Uros.
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Jiri Slaby 1 year, 5 months ago
On 11. 07. 24, 10:30, Uros Bizjak wrote:
> On Thu, Jul 11, 2024 at 1:16 AM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
>>
>> On Wed, Jul 10, 2024 at 11:50:50PM +0200, Uros Bizjak wrote:
>>>
>>>
>>> On 10. 07. 24 21:06, Pawan Gupta wrote:
>>>> Robert Gill reported below #GP when dosemu software was executing vm86()
>>>> system call:
>>>>
>>>>     general protection fault: 0000 [#1] PREEMPT SMP
>>>>     CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
>>>>     Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
>>>>     EIP: restore_all_switch_stack+0xbe/0xcf
>>>>     EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
>>>>     ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
>>>>     DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
>>>>     CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
>>>>     Call Trace:
>>>>      show_regs+0x70/0x78
>>>>      die_addr+0x29/0x70
>>>>      exc_general_protection+0x13c/0x348
>>>>      exc_bounds+0x98/0x98
>>>>      handle_exception+0x14d/0x14d
>>>>      exc_bounds+0x98/0x98
>>>>      restore_all_switch_stack+0xbe/0xcf
>>>>      exc_bounds+0x98/0x98
>>>>      restore_all_switch_stack+0xbe/0xcf
>>>>
>>>> This only happens when VERW based mitigations like MDS/RFDS are enabled.
>>>> This is because segment registers with an arbitrary user value can result
>>>> in #GP when executing VERW. Intel SDM vol. 2C documents the following
>>>> behavior for VERW instruction:
>>>>
>>>>     #GP(0) - If a memory operand effective address is outside the CS, DS, ES,
>>>>         FS, or GS segment limit.
>>>>
>>>> CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
>>>> space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
>>>> refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
>>>> arbitrary user %ds. Also, in NMI return path, move VERW to after
>>>> RESTORE_ALL_NMI that touches GPRs.
>>>>
>>>> For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
>>>> version is being used:
>>>>
>>>> * entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
>>>>     handle_exception_return) do:
>>>>
>>>> restore_all_switch_stack:
>>>>     [...]
>>>>      mov    %esi,%esi
>>>>      verw   %ss:0xc0fc92c0  <-------------
>>>>      iret
>>>>
>>>> * Opportunistic SYSEXIT:
>>>>
>>>>      [...]
>>>>      verw   %ss:0xc0fc92c0  <-------------
>>>>      btrl   $0x9,(%esp)
>>>>      popf
>>>>      pop    %eax
>>>>      sti
>>>>      sysexit
>>>>
>>>> *  nmi_return and nmi_from_espfix:
>>>>      mov    %esi,%esi
>>>>      verw   %ss:0xc0fc92c0  <-------------
>>>>      jmp     .Lirq_return
>>>>
>>>> Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
>>>> Cc: stable@vger.kernel.org # 5.10+
>>>> Reported-by: Robert Gill <rtgill82@gmail.com>
>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
>>>> Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.info/
>>>> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>> Suggested-by: Brian Gerst <brgerst@gmail.com> # Use %ss
>>>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>>>> ---
>>>> v4:
>>>> - Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
>>>> - In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
>>>>
>>>> v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.intel.com
>>>> - Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
>>>> - Do verw before popf in SYSEXIT path (Jari).
>>>>
>>>> v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.intel.com
>>>> - Safe guard against any other system calls like vm86() that might change %ds (Dave).
>>>>
>>>> v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.intel.com
>>>> ---
>>>>
>>>> ---
>>>>    arch/x86/entry/entry_32.S | 18 +++++++++++++++---
>>>>    1 file changed, 15 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>>>> index d3a814efbff6..d54f6002e5a0 100644
>>>> --- a/arch/x86/entry/entry_32.S
>>>> +++ b/arch/x86/entry/entry_32.S
>>>> @@ -253,6 +253,16 @@
>>>>    .Lend_\@:
>>>>    .endm
>>>> +/*
>>>> + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
>>>> + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
>>>> + */
>>>> +.macro CLEAR_CPU_BUFFERS_SAFE
>>>> +   ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
>>>> +   verw    %ss:_ASM_RIP(mds_verw_sel)
>>>> +.Lskip_verw\@:
>>>> +.endm
>>>
>>> Why not simply:
>>>
>>> .macro CLEAR_CPU_BUFFERS_SAFE
>>>        ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)),
>>> X86_FEATURE_CLEAR_CPU_BUF
>>> .endm
>>
>> We can do it this way as well. But, there are stable kernels that don't
>> support relocations in ALTERNATIVEs. The way it is done in current patch
>> can be backported without worrying about which kernels support relocations.
> 
> Then you could use absolute reference in backports to kernels that
> don't support relocations in ALTERNATIVEs, "verw %ss:mds_verw_sel"
> works as well, but the relocation is one byte larger.

No, the kernels support (and use) relocations. It's only ALTERNATIVEs in 
older kernels don't.

-- 
js
suse labs

Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Jiri Slaby 1 year, 5 months ago
On 11. 07. 24, 1:16, Pawan Gupta wrote:
> On Wed, Jul 10, 2024 at 11:50:50PM +0200, Uros Bizjak wrote:
...
>>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>>> index d3a814efbff6..d54f6002e5a0 100644
>>> --- a/arch/x86/entry/entry_32.S
>>> +++ b/arch/x86/entry/entry_32.S
>>> @@ -253,6 +253,16 @@
>>>    .Lend_\@:
>>>    .endm
>>> +/*
>>> + * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
>>> + * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
>>> + */
>>> +.macro CLEAR_CPU_BUFFERS_SAFE
>>> +	ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
>>> +	verw	%ss:_ASM_RIP(mds_verw_sel)
>>> +.Lskip_verw\@:
>>> +.endm
>>
>> Why not simply:
>>
>> .macro CLEAR_CPU_BUFFERS_SAFE
>> 	ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)),
>> X86_FEATURE_CLEAR_CPU_BUF
>> .endm
> 
> We can do it this way as well. But, there are stable kernels that don't
> support relocations in ALTERNATIVEs. The way it is done in current patch
> can be backported without worrying about which kernels support relocations.

This sounds weird. There are code bases without ALTERNATIVE support at 
all. Will you expand ALTERNATIVE into some cmp & jmp here due to that? No.

Instead, you can send this "backport" to stable for older kernels later, 
once a proper patch is merged.

thanks,
-- 
js
suse labs
Re: [PATCH v4] x86/entry_32: Use stack segment selector for VERW operand
Posted by Pawan Gupta 1 year, 5 months ago
On Thu, Jul 11, 2024 at 07:49:25AM +0200, Jiri Slaby wrote:
> > > Why not simply:
> > > 
> > > .macro CLEAR_CPU_BUFFERS_SAFE
> > > 	ALTERNATIVE "", __stringify(verw %ss:_ASM_RIP(mds_verw_sel)),
> > > X86_FEATURE_CLEAR_CPU_BUF
> > > .endm
> > 
> > We can do it this way as well. But, there are stable kernels that don't
> > support relocations in ALTERNATIVEs. The way it is done in current patch
> > can be backported without worrying about which kernels support relocations.
> 
> This sounds weird. There are code bases without ALTERNATIVE support at all.
> Will you expand ALTERNATIVE into some cmp & jmp here due to that? No.

Agree, will change it to the way Uros and Peter suggested.

> Instead, you can send this "backport" to stable for older kernels later,
> once a proper patch is merged.

Ok, will take care of the differences in the backports.