[PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops

Peter Zijlstra posted 6 patches 10 months ago
There is a newer version of this series
[PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Peter Zijlstra 10 months ago
Since there is only a single fastop() function, convert the FASTOP
stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
thunks and all that jazz.

Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
which not all of them can trivially do (call depth tracing suffers
here).

Objtool strenuously complains about things, therefore fix up the
various problems:

 - indirect call without a .rodata, fails to determine JUMP_TABLE,
   add an annotation for this.
 - fastop functions fall through, create an exception for this case
 - unreachable instruction after fastop_return, save/restore

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/emulate.c              |   20 +++++++++++++++-----
 include/linux/objtool_types.h       |    1 +
 tools/include/linux/objtool_types.h |    1 +
 tools/objtool/check.c               |   11 ++++++++++-
 4 files changed, 27 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -285,8 +285,8 @@ static void invalidate_registers(struct
  * different operand sizes can be reached by calculation, rather than a jump
  * table (which would be bigger than the code).
  *
- * The 16 byte alignment, considering 5 bytes for the RET thunk, 3 for ENDBR
- * and 1 for the straight line speculation INT3, leaves 7 bytes for the
+ * The 16 byte alignment, considering 5 bytes for the JMP, 4 for ENDBR
+ * and 1 for the straight line speculation INT3, leaves 6 bytes for the
  * body of the function.  Currently none is larger than 4.
  */
 static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);
@@ -304,7 +304,7 @@ static int fastop(struct x86_emulate_ctx
 	__FOP_FUNC(#name)
 
 #define __FOP_RET(name) \
-	"11: " ASM_RET \
+	"11: jmp fastop_return; int3 \n\t" \
 	".size " name ", .-" name "\n\t"
 
 #define FOP_RET(name) \
@@ -5044,14 +5044,24 @@ static void fetch_possible_mmx_operand(s
 		kvm_read_mmx_reg(op->addr.mm, &op->mm_val);
 }
 
-static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
+/*
+ * All the FASTOP magic above relies on there being *one* instance of this
+ * so it can JMP back, avoiding RET and it's various thunks.
+ */
+static noinline int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
 {
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
 		fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
 
-	asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
+	asm("push %[flags]; popf \n\t"
+	    UNWIND_HINT(UNWIND_HINT_TYPE_SAVE, 0, 0, 0)
+	    ASM_ANNOTATE(ANNOTYPE_JUMP_TABLE)
+	    JMP_NOSPEC
+	    "fastop_return: \n\t"
+	    UNWIND_HINT(UNWIND_HINT_TYPE_RESTORE, 0, 0, 0)
+	    "pushf; pop %[flags]\n"
 	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
 	      [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT
 	    : "c"(ctxt->src2.val));
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -65,5 +65,6 @@ struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS		6
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
+#define ANNOTYPE_JUMP_TABLE		9
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -65,5 +65,6 @@ struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS		6
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
+#define ANNOTYPE_JUMP_TABLE		9
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2428,6 +2428,14 @@ static int __annotate_late(struct objtoo
 		insn->dead_end = false;
 		break;
 
+	/*
+	 * Must be after add_jump_table(); for it doesn't set a sane
+	 * _jump_table value.
+	 */
+	case ANNOTYPE_JUMP_TABLE:
+		insn->_jump_table = (void *)1;
+		break;
+
 	default:
 		ERROR_INSN(insn, "Unknown annotation type: %d", type);
 		return -1;
@@ -3559,7 +3567,8 @@ static int validate_branch(struct objtoo
 		if (func && insn_func(insn) && func != insn_func(insn)->pfunc) {
 			/* Ignore KCFI type preambles, which always fall through */
 			if (!strncmp(func->name, "__cfi_", 6) ||
-			    !strncmp(func->name, "__pfx_", 6))
+			    !strncmp(func->name, "__pfx_", 6) ||
+			    !strcmp(insn_func(insn)->name, "fastop"))
 				return 0;
 
 			if (file->ignore_unreachables)
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Josh Poimboeuf 10 months ago
On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> Since there is only a single fastop() function, convert the FASTOP
> stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> thunks and all that jazz.
> 
> Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> which not all of them can trivially do (call depth tracing suffers
> here).
> 
> Objtool strenuously complains about things, therefore fix up the
> various problems:
> 
>  - indirect call without a .rodata, fails to determine JUMP_TABLE,
>    add an annotation for this.
>  - fastop functions fall through, create an exception for this case
>  - unreachable instruction after fastop_return, save/restore

I think this breaks unwinding.  Each of the individual fastops inherits
fastop()'s stack but the ORC doesn't reflect that.

Should they just be moved to a proper .S file?

-- 
Josh
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Peter Zijlstra 9 months, 4 weeks ago
On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > Since there is only a single fastop() function, convert the FASTOP
> > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > thunks and all that jazz.
> > 
> > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > which not all of them can trivially do (call depth tracing suffers
> > here).
> > 
> > Objtool strenuously complains about things, therefore fix up the
> > various problems:
> > 
> >  - indirect call without a .rodata, fails to determine JUMP_TABLE,
> >    add an annotation for this.
> >  - fastop functions fall through, create an exception for this case
> >  - unreachable instruction after fastop_return, save/restore
> 
> I think this breaks unwinding.  Each of the individual fastops inherits
> fastop()'s stack but the ORC doesn't reflect that.

I'm not sure I understand. There is only the one location, and we
simply save/restore the state around the one 'call'.
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Josh Poimboeuf 9 months, 4 weeks ago
On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > > Since there is only a single fastop() function, convert the FASTOP
> > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > > thunks and all that jazz.
> > > 
> > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > > which not all of them can trivially do (call depth tracing suffers
> > > here).
> > > 
> > > Objtool strenuously complains about things, therefore fix up the
> > > various problems:
> > > 
> > >  - indirect call without a .rodata, fails to determine JUMP_TABLE,
> > >    add an annotation for this.
> > >  - fastop functions fall through, create an exception for this case
> > >  - unreachable instruction after fastop_return, save/restore
> > 
> > I think this breaks unwinding.  Each of the individual fastops inherits
> > fastop()'s stack but the ORC doesn't reflect that.
> 
> I'm not sure I understand. There is only the one location, and we
> simply save/restore the state around the one 'call'.

The problem isn't fastop() but rather the tiny functions it "calls".
Each of those is marked STT_FUNC so it gets its own ORC data saying the
return address is at RSP+8.

Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
isn't pushed before the branch.  Thus they become part of fastop()
rather than separate functions.  RSP+8 is only correct if it happens to
have not pushed anything to the stack before the indirect JMP.

The addresses aren't stored in an .rodata jump table so objtool doesn't
know the control flow.  Even if we made them non-FUNC, objtool wouldn't
be able to transfer the stack state.

-- 
Josh
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Peter Zijlstra 9 months, 4 weeks ago
On Tue, Apr 15, 2025 at 07:39:41AM -0700, Josh Poimboeuf wrote:
> On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > > > Since there is only a single fastop() function, convert the FASTOP
> > > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > > > thunks and all that jazz.
> > > > 
> > > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > > > which not all of them can trivially do (call depth tracing suffers
> > > > here).
> > > > 
> > > > Objtool strenuously complains about things, therefore fix up the
> > > > various problems:
> > > > 
> > > >  - indirect call without a .rodata, fails to determine JUMP_TABLE,
> > > >    add an annotation for this.
> > > >  - fastop functions fall through, create an exception for this case
> > > >  - unreachable instruction after fastop_return, save/restore
> > > 
> > > I think this breaks unwinding.  Each of the individual fastops inherits
> > > fastop()'s stack but the ORC doesn't reflect that.
> > 
> > I'm not sure I understand. There is only the one location, and we
> > simply save/restore the state around the one 'call'.
> 
> The problem isn't fastop() but rather the tiny functions it "calls".
> Each of those is marked STT_FUNC so it gets its own ORC data saying the
> return address is at RSP+8.
> 
> Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
> isn't pushed before the branch.  Thus they become part of fastop()
> rather than separate functions.  RSP+8 is only correct if it happens to
> have not pushed anything to the stack before the indirect JMP.

Yeah, I finally got there. I'll go cook up something else.
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Peter Zijlstra 9 months, 2 weeks ago
On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:

> Yeah, I finally got there. I'll go cook up something else.

Sean, Paolo, can I once again ask how best to test this fastop crud?
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Sean Christopherson 9 months, 2 weeks ago
On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> 
> > Yeah, I finally got there. I'll go cook up something else.
> 
> Sean, Paolo, can I once again ask how best to test this fastop crud?

Apply the below, build KVM selftests, enable forced emulation in KVM, and then
run fastops_test.  It's well past time we had a selftest for this.  It won't
detect bugs that are specific to 32-bit kernels, e.g. b63f20a778c8 ("x86/retpoline:
Don't clobber RFLAGS during CALL_NOSPEC on i386"), since KVM selftests are 64-bit
only, but for what you're doing, it should suffice.

For 32-bit kernels, it requires a 32-bit QEMU and KVM-Unit-Tests (or maybe even
a full blown 32-bit guest image; I forget how much coverage KUT provides).
Regardless, I don't see any reason to put you through that pain, I can do that
sanity testing.

I'll post a proper patch for the new selftest after testing on AMD.  The test
relies on hardware providing deterministic behavior for undefined output (RFLAGS
and GPRs); I don't know if that holds true on AMD.

To enable forced emulation, set /sys/module/kvm/parameters/force_emulation_prefix
to '1' (for the purposes of this test, the value doesn't matter).  The param is
writable at runtime, so it doesn't matter if kvm.ko is built-in or a module. 

---
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 28 Apr 2025 08:55:44 -0700
Subject: [PATCH] KVM: selftests: Add a test for x86's fastops emulation

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/x86/fastops_test.c  | 165 ++++++++++++++++++
 2 files changed, 166 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/fastops_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..411c3d5eb5b1 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test
 TEST_GEN_PROGS_x86 += x86/feature_msrs_test
 TEST_GEN_PROGS_x86 += x86/exit_on_emulation_failure_test
+TEST_GEN_PROGS_x86 += x86/fastops_test
 TEST_GEN_PROGS_x86 += x86/fix_hypercall_test
 TEST_GEN_PROGS_x86 += x86/hwcr_msr_test
 TEST_GEN_PROGS_x86 += x86/hyperv_clock
diff --git a/tools/testing/selftests/kvm/x86/fastops_test.c b/tools/testing/selftests/kvm/x86/fastops_test.c
new file mode 100644
index 000000000000..c3799edb5d0c
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/fastops_test.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+/*
+ * Execute a fastop() instruction, with or without forced emulation.  BT bit 0
+ * to set RFLAGS.CF based on whether or not the input is even or odd, so that
+ * instructions like ADC and SBB are deterministic.
+ */
+#define guest_execute_fastop_1(FEP, insn, type_t, __val, __flags)			\
+do {											\
+	__asm__ __volatile__("bt $0, %[val]\n\t"					\
+			     FEP insn " %[val]\n\t"					\
+			     "pushfq\n\t"						\
+			     "pop %[flags]\n\t"						\
+			     : [val]"+r"(__val), [flags]"=r"(__flags)			\
+			     : : "cc", "memory");					\
+} while (0)
+
+#define guest_test_fastop_1(insn, type_t, __val)					\
+do {											\
+	type_t val = __val, ex_val = __val, input = __val;				\
+	uint64_t flags, ex_flags;							\
+											\
+	guest_execute_fastop_1("", insn, type_t, ex_val, ex_flags);			\
+	guest_execute_fastop_1(KVM_FEP, insn, type_t, val, flags);			\
+											\
+	__GUEST_ASSERT(val == ex_val,							\
+		       "Wanted 0x%lx for '%s 0x%lx', got 0x%lx",			\
+		       (uint64_t)ex_val, insn, (uint64_t)input, (uint64_t)val);		\
+	__GUEST_ASSERT(flags == ex_flags,						\
+			"Wanted flags 0x%lx for '%s 0x%lx', got 0x%lx",			\
+			ex_flags, insn, (uint64_t)input, flags);			\
+} while (0)
+
+#define guest_execute_fastop_2(FEP, insn, type_t, __input, __output, __flags)		\
+do {											\
+	__asm__ __volatile__("bt $0, %[output]\n\t"					\
+			     FEP insn " %[input], %[output]\n\t"			\
+			     "pushfq\n\t"						\
+			     "pop %[flags]\n\t"						\
+			     : [output]"+r"(__output), [flags]"=r"(__flags)		\
+			     : [input]"r"(__input) : "cc", "memory");			\
+} while (0)
+
+#define guest_test_fastop_2(insn, type_t, __val1, __val2)				\
+do {											\
+	type_t input = __val1, input2 = __val2, output = __val2, ex_output = __val2;	\
+	uint64_t flags, ex_flags;							\
+											\
+	guest_execute_fastop_2("", insn, type_t, input, ex_output, ex_flags);		\
+	guest_execute_fastop_2(KVM_FEP, insn, type_t, input, output, flags);		\
+											\
+	__GUEST_ASSERT(output == ex_output,						\
+		       "Wanted 0x%lx for '%s 0x%lx 0x%lx', got 0x%lx",			\
+		       (uint64_t)ex_output, insn, (uint64_t)input,			\
+		       (uint64_t)input2, (uint64_t)output);				\
+	__GUEST_ASSERT(flags == ex_flags,						\
+			"Wanted flags 0x%lx for '%s 0x%lx, 0x%lx', got 0x%lx",		\
+			ex_flags, insn, (uint64_t)input, (uint64_t)input2, flags);	\
+} while (0)
+
+#define guest_execute_fastop_cl(FEP, insn, type_t, __shift, __output, __flags)		\
+do {											\
+	__asm__ __volatile__("bt $0, %[output]\n\t"					\
+			     FEP insn " %%cl, %[output]\n\t"				\
+			     "pushfq\n\t"						\
+			     "pop %[flags]\n\t"						\
+			     : [output]"+r"(__output), [flags]"=r"(__flags)		\
+			     : "c"(__shift) : "cc", "memory");				\
+} while (0)
+
+#define guest_test_fastop_cl(insn, type_t, __val1, __val2)				\
+do {											\
+	type_t output = __val2, ex_output = __val2, input = __val2;			\
+	uint8_t shift = __val1;								\
+	uint64_t flags, ex_flags;							\
+											\
+	guest_execute_fastop_cl("", insn, type_t, shift, ex_output, ex_flags);		\
+	guest_execute_fastop_cl(KVM_FEP, insn, type_t, shift, output, flags);		\
+											\
+	__GUEST_ASSERT(output == ex_output,						\
+		       "Wanted 0x%lx for '%s 0x%x, 0x%lx', got 0x%lx",			\
+		       (uint64_t)ex_output, insn, shift, (uint64_t)input,		\
+		       (uint64_t)output);						\
+	__GUEST_ASSERT(flags == ex_flags,						\
+			"Wanted flags 0x%lx for '%s 0x%x, 0x%lx', got 0x%lx",		\
+			ex_flags, insn, shift, (uint64_t)input, flags);			\
+} while (0)
+
+static const uint64_t vals[] = {
+	0,
+	1,
+	2,
+	4,
+	7,
+	0x5555555555555555,
+	0xaaaaaaaaaaaaaaaa,
+	0xfefefefefefefefe,
+	0xffffffffffffffff,
+};
+
+#define guest_test_fastops(type_t, suffix)						\
+do {											\
+	int i, j;									\
+											\
+	for (i = 0; i < ARRAY_SIZE(vals); i++) {					\
+		guest_test_fastop_1("dec" suffix, type_t, vals[i]);			\
+		guest_test_fastop_1("inc" suffix, type_t, vals[i]);			\
+		guest_test_fastop_1("neg" suffix, type_t, vals[i]);			\
+		guest_test_fastop_1("not" suffix, type_t, vals[i]);			\
+											\
+		for (j = 0; j < ARRAY_SIZE(vals); j++) {				\
+			guest_test_fastop_2("add" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("adc" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("and" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bsf" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bsr" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bt" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("btc" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("btr" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bts" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("cmp" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("imul" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("or" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("sbb" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("sub" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("test" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("xor" suffix, type_t, vals[i], vals[j]);	\
+											\
+			guest_test_fastop_cl("rol" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("ror" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("rcl" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("rcr" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("sar" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("shl" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("shr" suffix, type_t, vals[i], vals[j]);	\
+		}									\
+	}										\
+} while (0)
+
+static void guest_code(void)
+{
+	guest_test_fastops(uint16_t, "w");
+	guest_test_fastops(uint32_t, "l");
+	guest_test_fastops(uint64_t, "q");
+
+	GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	TEST_REQUIRE(is_forced_emulation_enabled);
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+
+	kvm_vm_free(vm);
+}

base-commit: 661b7ddb2d10258b53106d7c39c309806b00a99c
--
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Peter Zijlstra 9 months, 2 weeks ago
On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > 
> > > Yeah, I finally got there. I'll go cook up something else.
> > 
> > Sean, Paolo, can I once again ask how best to test this fastop crud?
> 
> Apply the below, build KVM selftests, 

Patch applied, my own hackery applied, host kernel built and booted,
foce_emulation_prefix set, but now I'm stuck at this seemingly simple
step..

$ cd tools/testing/selftests/kvm/
$ make
... metric ton of fail ...

Clearly I'm doing something wrong :/
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Sean Christopherson 9 months, 2 weeks ago
On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> > On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > > 
> > > > Yeah, I finally got there. I'll go cook up something else.
> > > 
> > > Sean, Paolo, can I once again ask how best to test this fastop crud?
> > 
> > Apply the below, build KVM selftests, 
> 
> Patch applied, my own hackery applied, host kernel built and booted,
> foce_emulation_prefix set, but now I'm stuck at this seemingly simple
> step..
> 
> $ cd tools/testing/selftests/kvm/
> $ make
> ... metric ton of fail ...
> 
> Clearly I'm doing something wrong :/

Did you install headers in the top level directory?  I.e. make headers_install.
The selftests build system was change a while back to require users to manually
install headers (I forget why, but it is indeed annoying).
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Peter Zijlstra 9 months, 2 weeks ago
On Tue, Apr 29, 2025 at 07:05:35AM -0700, Sean Christopherson wrote:
> On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> > > On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > Yeah, I finally got there. I'll go cook up something else.
> > > > 
> > > > Sean, Paolo, can I once again ask how best to test this fastop crud?
> > > 
> > > Apply the below, build KVM selftests, 
> > 
> > Patch applied, my own hackery applied, host kernel built and booted,
> > foce_emulation_prefix set, but now I'm stuck at this seemingly simple
> > step..
> > 
> > $ cd tools/testing/selftests/kvm/
> > $ make
> > ... metric ton of fail ...
> > 
> > Clearly I'm doing something wrong :/
> 
> Did you install headers in the top level directory?  I.e. make headers_install.

No, of course not :-) I don't use the top directory to build anything,
ever.

All my builds are into build directories, using make O=foo. This allows
me to do parallel builds for multiple architectures etc. Also, much
easier to wipe a complete build directory than it is to clean out the
top level dir.

> The selftests build system was change a while back to require users to manually
> install headers (I forget why, but it is indeed annoying).

Bah, I remember NAK-ing that. Clearly the selftest people don't want
selftests to be usable :-(

Anyway, mingo build me a copy of the fastop selftest, and aside from a
few stupid mistakes, I seem to now pass it \o/

I'll attempt a Changelog and update the hyper-v patches and then post
the lot.
Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
Posted by Sean Christopherson 9 months, 2 weeks ago
On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> On Tue, Apr 29, 2025 at 07:05:35AM -0700, Sean Christopherson wrote:
> > On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> > > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> > > > On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > > > > 
> > > > > > Yeah, I finally got there. I'll go cook up something else.
> > > > > 
> > > > > Sean, Paolo, can I once again ask how best to test this fastop crud?
> > > > 
> > > > Apply the below, build KVM selftests, 
> > > 
> > > Patch applied, my own hackery applied, host kernel built and booted,
> > > foce_emulation_prefix set, but now I'm stuck at this seemingly simple
> > > step..
> > > 
> > > $ cd tools/testing/selftests/kvm/
> > > $ make
> > > ... metric ton of fail ...
> > > 
> > > Clearly I'm doing something wrong :/
> > 
> > Did you install headers in the top level directory?  I.e. make headers_install.
> 
> No, of course not :-) I don't use the top directory to build anything,
> ever.
> 
> All my builds are into build directories, using make O=foo. This allows
> me to do parallel builds for multiple architectures etc. Also, much
> easier to wipe a complete build directory than it is to clean out the
> top level dir.

FWIW, you can do the same with KVM selftests (and presumably others?), although
the syntax is kinda weird (no idea why lib.mk uses OUTPUT instead of O).

E.g. to build KVM selftests in $HOME/build/selftests/x86

  make O=$HOME/build/selftests/x86 headers_install
  make OUTPUT=$HOME/build/selftests/x86