[PATCH v2] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

Posted by Jan Beulich 1 year, 9 months ago
Since both kernel and user mode run in ring 3, they run in the same
"predictor mode". While the kernel could take care of this itself, doing
so would be yet another item distinguishing PV from native. Additionally
we're in a much better position to issue the barrier command, and we can
save a #GP (for privileged instruction emulation) this way.

To allow performance to be recovered, introduce a new VM assist allowing the
guest kernel to suppress this barrier.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Leverage entry-IBPB. Add VM assist. Re-base.
---
I'm not entirely happy with re-using opt_ibpb_ctxt_switch here (it's a
mode switch after all, but v1 used opt_ibpb here), but it also didn't
seem very reasonable to introduce yet another command line option. The
only feasible alternative I would see is to check the CPUID bits directly.

--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -757,7 +757,8 @@ static inline void pv_inject_sw_interrup
  * but we can't make such requests fail all of the sudden.
  */
 #define PV64_VM_ASSIST_MASK (PV32_VM_ASSIST_MASK                      | \
-                             (1UL << VMASST_TYPE_m2p_strict))
+                             (1UL << VMASST_TYPE_m2p_strict)          | \
+                             (1UL << VMASST_TYPE_mode_switch_no_ibpb))
 #define HVM_VM_ASSIST_MASK  (1UL << VMASST_TYPE_runstate_update_flag)
 
 #define arch_vm_assist_valid_mask(d) \
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -467,7 +467,15 @@ void toggle_guest_mode(struct vcpu *v)
     if ( v->arch.flags & TF_kernel_mode )
         v->arch.pv.gs_base_kernel = gs_base;
     else
+    {
         v->arch.pv.gs_base_user = gs_base;
+
+        if ( opt_ibpb_ctxt_switch &&
+             !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) &&
+             !VM_ASSIST(d, mode_switch_no_ibpb) )
+            wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
+    }
+
     asm volatile ( "swapgs" );
 
     _toggle_guest_pt(v);
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -571,6 +571,16 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
  */
 #define VMASST_TYPE_m2p_strict           32
 
+/*
+ * x86-64 guests: Suppress IBPB on guest-user to guest-kernel mode switch.
+ *
+ * By default (on affected and capable hardware) as a safety measure Xen,
+ * to cover for the fact that guest-kernel and guest-user modes are both
+ * running in ring 3 (and hence share prediction context), would issue a
+ * barrier for user->kernel mode switches of PV guests.
+ */
+#define VMASST_TYPE_mode_switch_no_ibpb  33
+
 #if __XEN_INTERFACE_VERSION__ < 0x00040600
 #define MAX_VMASST_TYPE                  3
 #endif
Re: [PATCH v2] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode
Posted by Roger Pau Monné 1 year, 7 months ago
On Tue, Jul 19, 2022 at 02:55:17PM +0200, Jan Beulich wrote:
> Since both kernel and user mode run in ring 3, they run in the same
> "predictor mode". While the kernel could take care of this itself, doing
> so would be yet another item distinguishing PV from native. Additionally
> we're in a much better position to issue the barrier command, and we can
> save a #GP (for privileged instruction emulation) this way.
> 
> To allow performance to be recovered, introduce a new VM assist allowing the
> guest kernel to suppress this barrier.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Leverage entry-IBPB. Add VM assist. Re-base.
> ---
> I'm not entirely happy with re-using opt_ibpb_ctxt_switch here (it's a
> mode switch after all, but v1 used opt_ibpb here), but it also didn't
> seem very reasonable to introduce yet another command line option. The
> only feasible alternative I would see is to check the CPUID bits directly.

Likely needs a mention in xen-command-line.md that the `ibpb` option
also controls whether a barrier is executed by Xen in PV vCPU context
switches from user-space to kernel.  The current text only mentions
vCPU context switches.

The rest LGTM.

Thanks, Roger.
Re: [PATCH v2] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode
Posted by Jan Beulich 1 year, 7 months ago
On 13.09.2022 17:41, Roger Pau Monné wrote:
> On Tue, Jul 19, 2022 at 02:55:17PM +0200, Jan Beulich wrote:
>> Since both kernel and user mode run in ring 3, they run in the same
>> "predictor mode". While the kernel could take care of this itself, doing
>> so would be yet another item distinguishing PV from native. Additionally
>> we're in a much better position to issue the barrier command, and we can
>> save a #GP (for privileged instruction emulation) this way.
>>
>> To allow performance to be recovered, introduce a new VM assist allowing the
>> guest kernel to suppress this barrier.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Leverage entry-IBPB. Add VM assist. Re-base.
>> ---
>> I'm not entirely happy with re-using opt_ibpb_ctxt_switch here (it's a
>> mode switch after all, but v1 used opt_ibpb here), but it also didn't
>> seem very reasonable to introduce yet another command line option. The
>> only feasible alternative I would see is to check the CPUID bits directly.
> 
> Likely needs a mention in xen-command-line.md that the `ibpb` option
> also controls whether a barrier is executed by Xen in PV vCPU context
> switches from user-space to kernel.  The current text only mentions
> vCPU context switches.

Andrew and I actually discussed this; it would perhaps be better off having
a separate control.

Jan

Re: [PATCH v2] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode
Posted by Roger Pau Monné 1 year, 7 months ago
On Tue, Sep 13, 2022 at 06:05:30PM +0200, Jan Beulich wrote:
> On 13.09.2022 17:41, Roger Pau Monné wrote:
> > On Tue, Jul 19, 2022 at 02:55:17PM +0200, Jan Beulich wrote:
> >> Since both kernel and user mode run in ring 3, they run in the same
> >> "predictor mode". While the kernel could take care of this itself, doing
> >> so would be yet another item distinguishing PV from native. Additionally
> >> we're in a much better position to issue the barrier command, and we can
> >> save a #GP (for privileged instruction emulation) this way.
> >>
> >> To allow performance to be recovered, introduce a new VM assist allowing the
> >> guest kernel to suppress this barrier.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> v2: Leverage entry-IBPB. Add VM assist. Re-base.
> >> ---
> >> I'm not entirely happy with re-using opt_ibpb_ctxt_switch here (it's a
> >> mode switch after all, but v1 used opt_ibpb here), but it also didn't
> >> seem very reasonable to introduce yet another command line option. The
> >> only feasible alternative I would see is to check the CPUID bits directly.
> > 
> > Likely needs a mention in xen-command-line.md that the `ibpb` option
> > also controls whether a barrier is executed by Xen in PV vCPU context
> > switches from user-space to kernel.  The current text only mentions
> > vCPU context switches.
> 
> Andrew and I actually discussed this; it would perhaps be better off having
> a separate control.

OK, didn't know there was some feedback here already.  A separate
control would indeed be clearer.  I guess a new patch will appear
then?

Thanks, Roger.