Extend the instruction emulator to recognize and interpret the REX2
prefix byte. Also, detect and flag invalid prefix sequences after a REX2
prefix.
In the existing prefix-decoding loop,
* The loop exits when the first non-prefix byte is encountered.
* Any non-REX prefix clears previously recorded REX information.
For REX2, however, once a REX2 prefix is encountered, most subsequent
prefixes are invalid. So, each subsequent prefix needs to be validated
before continuing the loop.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC note:
The REX2 decoding itself is straightforward. The additional logic is
mainly to detect and handle invalid prefix sequences. If this seems
excessive, this check could be dropped, since VMX would raise '#UD' on
such cases anyway.
---
arch/x86/kvm/emulate.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9bd61ea496e5..f9381a4055d6 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4844,7 +4844,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
ctxt->op_bytes = def_op_bytes;
ctxt->ad_bytes = def_ad_bytes;
- /* Legacy prefixes. */
+ /* Legacy and REX/REX2 prefixes. */
for (;;) {
switch (ctxt->b = insn_fetch(u8, ctxt)) {
case 0x66: /* operand-size override */
@@ -4887,9 +4887,20 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
case 0x40 ... 0x4f: /* REX */
if (mode != X86EMUL_MODE_PROT64)
goto done_prefixes;
+ if (ctxt->rex_prefix == REX2_PREFIX)
+ break;
ctxt->rex_prefix = REX_PREFIX;
ctxt->rex.raw = 0x0f & ctxt->b;
continue;
+ case 0xd5: /* REX2 */
+ if (mode != X86EMUL_MODE_PROT64)
+ goto done_prefixes;
+ if (ctxt->rex_prefix == REX2_PREFIX &&
+ ctxt->rex.bits.m0 == 0)
+ break;
+ ctxt->rex_prefix = REX2_PREFIX;
+ ctxt->rex.raw = insn_fetch(u8, ctxt);
+ continue;
case 0xf0: /* LOCK */
ctxt->lock_prefix = 1;
break;
@@ -4901,6 +4912,17 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
goto done_prefixes;
}
+ if (ctxt->rex_prefix == REX2_PREFIX) {
+ /*
+ * A legacy or REX prefix following a REX2 prefix
+ * forms an invalid byte sequence. Likewise,
+ * a second REX2 prefix following a REX2 prefix
+ * with M0=0 is invalid.
+ */
+ ctxt->rex_prefix = REX2_INVALID;
+ goto done_prefixes;
+ }
+
/* Any legacy prefix after a REX prefix nullifies its effect. */
ctxt->rex_prefix = REX_NONE;
ctxt->rex.raw = 0;
--
2.51.0
On 11/10/25 19:01, Chang S. Bae wrote:
> Extend the instruction emulator to recognize and interpret the REX2
> prefix byte. Also, detect and flag invalid prefix sequences after a REX2
> prefix.
>
> In the existing prefix-decoding loop,
> * The loop exits when the first non-prefix byte is encountered.
> * Any non-REX prefix clears previously recorded REX information.
>
> For REX2, however, once a REX2 prefix is encountered, most subsequent
> prefixes are invalid. So, each subsequent prefix needs to be validated
> before continuing the loop.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> RFC note:
> The REX2 decoding itself is straightforward. The additional logic is
> mainly to detect and handle invalid prefix sequences. If this seems
> excessive, this check could be dropped, since VMX would raise '#UD' on
> such cases anyway.
> ---
> arch/x86/kvm/emulate.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 9bd61ea496e5..f9381a4055d6 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4844,7 +4844,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> ctxt->op_bytes = def_op_bytes;
> ctxt->ad_bytes = def_ad_bytes;
>
> - /* Legacy prefixes. */
> + /* Legacy and REX/REX2 prefixes. */
> for (;;) {
> switch (ctxt->b = insn_fetch(u8, ctxt)) {
> case 0x66: /* operand-size override */
> @@ -4887,9 +4887,20 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> case 0x40 ... 0x4f: /* REX */
> if (mode != X86EMUL_MODE_PROT64)
> goto done_prefixes;
> + if (ctxt->rex_prefix == REX2_PREFIX)
> + break;
> ctxt->rex_prefix = REX_PREFIX;
> ctxt->rex.raw = 0x0f & ctxt->b;
> continue;
> + case 0xd5: /* REX2 */
> + if (mode != X86EMUL_MODE_PROT64)
> + goto done_prefixes;
Here you should also check
if (ctxt->rex_prefix == REX_PREFIX) {
ctxt->rex_prefix = REX2_INVALID;
goto done_prefixes;
}
> + if (ctxt->rex_prefix == REX2_PREFIX &&
> + ctxt->rex.bits.m0 == 0)
> + break;
> + ctxt->rex_prefix = REX2_PREFIX;
> + ctxt->rex.raw = insn_fetch(u8, ctxt);
> + continue;
After REX2 always comes the main opcode byte, so you can "goto
done_prefixes" here. Or even jump here already; in pseudocode:
ctxt->b = insn_fetch(u8, ctxt);
if (rex2 & REX_M)
goto decode_twobyte;
else
goto decode_onebyte;
...
if (ctxt->b == 0x0f) {
decode_twobyte:
...
if (ctxt->b == 0x38 && ctxt->rex_prefix != REX2_PREFIX)
...
} else {
decode_onebyte:
...
}
> case 0xf0: /* LOCK */
> ctxt->lock_prefix = 1;
> break;
> @@ -4901,6 +4912,17 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> goto done_prefixes;
> }
>
> + if (ctxt->rex_prefix == REX2_PREFIX) {
> + /*
> + * A legacy or REX prefix following a REX2 prefix
> + * forms an invalid byte sequence. Likewise,
> + * a second REX2 prefix following a REX2 prefix
> + * with M0=0 is invalid.
> + */
> + ctxt->rex_prefix = REX2_INVALID;
> + goto done_prefixes;
> + }
... and this is not needed.
Paolo
> /* Any legacy prefix after a REX prefix nullifies its effect. */
> ctxt->rex_prefix = REX_NONE;
> ctxt->rex.raw = 0;
On 11/11/2025 9:55 AM, Paolo Bonzini wrote:
> On 11/10/25 19:01, Chang S. Bae wrote:
>>
>> case 0x40 ... 0x4f: /* REX */
>> if (mode != X86EMUL_MODE_PROT64)
>> goto done_prefixes;
>> + if (ctxt->rex_prefix == REX2_PREFIX)
>> + break;
>> ctxt->rex_prefix = REX_PREFIX;
>> ctxt->rex.raw = 0x0f & ctxt->b;
>> continue;
>> + case 0xd5: /* REX2 */
>> + if (mode != X86EMUL_MODE_PROT64)
>> + goto done_prefixes;
> Here you should also check
>
> if (ctxt->rex_prefix == REX_PREFIX) {
> ctxt->rex_prefix = REX2_INVALID;
> goto done_prefixes;
> }
You're right. Section 3.1.2.1 states:
| A REX prefix (0x4*) immediately preceding the REX2 prefix is not
| allowed and triggers #UD.
Now I think REX2_INVALID would just add another condition to handle
later. Instead, for such an invalid case, it might be simpler to mark the
opcode as undefined and jump all the way after the lookup. See the diff
-- please let me know if you dislike it.
>> + if (ctxt->rex_prefix == REX2_PREFIX &&
>> + ctxt->rex.bits.m0 == 0)
>> + break;
>> + ctxt->rex_prefix = REX2_PREFIX;
>> + ctxt->rex.raw = insn_fetch(u8, ctxt);
>> + continue;
> After REX2 always comes the main opcode byte, so you can "goto
> done_prefixes" here. Or even jump here already; in pseudocode:
>
> ctxt->b = insn_fetch(u8, ctxt);
> if (rex2 & REX_M)
> goto decode_twobyte;
> else
> goto decode_onebyte;
Yes, agreed. I think this makes the control flow more explicit.
>> + if (ctxt->rex_prefix == REX2_PREFIX) {
>> + /*
>> + * A legacy or REX prefix following a REX2 prefix
>> + * forms an invalid byte sequence. Likewise,
>> + * a second REX2 prefix following a REX2 prefix
>> + * with M0=0 is invalid.
>> + */
>> + ctxt->rex_prefix = REX2_INVALID;
>> + goto done_prefixes;
>> + }
>
> ... and this is not needed.
I really like that this can go away.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b8a946cbd587..c62d21de14cb 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4479,6 +4479,8 @@ static const struct opcode opcode_map_0f_38[256] = {
N, N, X4(N), X8(N)
};
+static const struct opcode undefined = D(Undefined);
+
#undef D
#undef N
#undef G
@@ -4765,6 +4767,11 @@ static int decode_operand(struct x86_emulate_ctxt *ctxt, struct operand *op,
return rc;
}
+static inline bool emul_egpr_enabled(struct x86_emulate_ctxt *ctxt __maybe_unused)
+{
+ return false;
+}
+
int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int emulation_type)
{
int rc = X86EMUL_CONTINUE;
@@ -4817,7 +4824,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
ctxt->op_bytes = def_op_bytes;
ctxt->ad_bytes = def_ad_bytes;
- /* Legacy prefixes. */
+ /* Legacy and REX/REX2 prefixes. */
for (;;) {
switch (ctxt->b = insn_fetch(u8, ctxt)) {
case 0x66: /* operand-size override */
@@ -4860,9 +4867,29 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
case 0x40 ... 0x4f: /* REX */
if (mode != X86EMUL_MODE_PROT64)
goto done_prefixes;
+ if (ctxt->rex_prefix == REX2_PREFIX) {
+ opcode = undefined;
+ goto decode_done;
+ }
ctxt->rex_prefix = REX_PREFIX;
ctxt->rex = 0x0f & ctxt->b;
continue;
+ case 0xd5: /* REX2 */
+ if (mode != X86EMUL_MODE_PROT64)
+ goto done_prefixes;
+ if ((ctxt->rex_prefix == REX2_PREFIX && (ctxt->rex & REX_M) == 0) ||
+ (ctxt->rex_prefix == REX_PREFIX) ||
+ (!emul_egpr_enabled(ctxt))) {
+ opcode = undefined;
+ goto decode_done;
+ }
+ ctxt->rex_prefix = REX2_PREFIX;
+ ctxt->rex = insn_fetch(u8, ctxt);
+ ctxt->b = insn_fetch(u8, ctxt);
+ if (ctxt->rex & REX_M)
+ goto decode_twobytes;
+ else
+ goto decode_onebyte;
case 0xf0: /* LOCK */
ctxt->lock_prefix = 1;
break;
@@ -4889,6 +4916,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
if (ctxt->b == 0x0f) {
/* Escape byte: start two-byte opcode sequence */
ctxt->b = insn_fetch(u8, ctxt);
+decode_twobytes:
if (ctxt->b == 0x38 && ctxt->rex_prefix != REX2_PREFIX) {
/* Three-byte opcode */
ctxt->opcode_len = 3;
@@ -4900,10 +4928,12 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
opcode = twobyte_table[ctxt->b];
}
} else {
+decode_onebyte:
/* Single-byte opcode */
ctxt->opcode_len = 1;
opcode = opcode_table[ctxt->b];
}
+decode_done:
ctxt->d = opcode.flags;
if (ctxt->d & NoRex2 && ctxt->rex_prefix == REX2_PREFIX)
On 11/13/2025 3:30 PM, Chang S. Bae wrote:
> On 11/11/2025 9:55 AM, Paolo Bonzini wrote:
>> On 11/10/25 19:01, Chang S. Bae wrote:
>>>
>>> case 0x40 ... 0x4f: /* REX */
>>> if (mode != X86EMUL_MODE_PROT64)
>>> goto done_prefixes;
>>> + if (ctxt->rex_prefix == REX2_PREFIX)
>>> + break;
>>> ctxt->rex_prefix = REX_PREFIX;
>>> ctxt->rex.raw = 0x0f & ctxt->b;
>>> continue;
>>> + case 0xd5: /* REX2 */
>>> + if (mode != X86EMUL_MODE_PROT64)
>>> + goto done_prefixes;
[...]
>>> + if (ctxt->rex_prefix == REX2_PREFIX &&
>>> + ctxt->rex.bits.m0 == 0)
>>> + break;
>>> + ctxt->rex_prefix = REX2_PREFIX;
>>> + ctxt->rex.raw = insn_fetch(u8, ctxt);
>>> + continue;
>> After REX2 always comes the main opcode byte, so you can "goto
>> done_prefixes" here. Or even jump here already; in pseudocode:
>>
>> ctxt->b = insn_fetch(u8, ctxt);
>> if (rex2 & REX_M)
>> goto decode_twobyte;
>> else
>> goto decode_onebyte;
>
> Yes, agreed. I think this makes the control flow more explicit.
While rebasing onto your VEX series, I noticed a couple of missing pieces:
(1) Jumping directly to the decode path skips the ctxt->op_bytes
setup.
(2) It also removes the logic that detects the invalid sequence:
REX2->REX (unless intentional).
Perhaps it makes sense to simply continue prefix parsing. Then, at
'done_prefixes', we can check the M bit next to the W-bit check and jump
to the two-byte decode path.
I’ve attached a revised diff on top of the VEX series.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1b3da3ba26b8..3a66741b6c8c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -245,6 +245,7 @@ enum {
REX_X = 2,
REX_R = 4,
REX_W = 8,
+ REX_M = 0x80,
};
static void writeback_registers(struct x86_emulate_ctxt *ctxt)
@@ -4849,6 +4850,18 @@ static int x86_decode_avx(struct x86_emulate_ctxt *ctxt,
return rc;
}
+static inline bool rex2_invalid(struct x86_emulate_ctxt *ctxt)
+{
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ u64 xcr = 0;
+
+ return ctxt->rex_prefix == REX_PREFIX ||
+ (ctxt->rex_prefix == REX2_PREFIX && !(ctxt->rex_bits & REX_M)) ||
+ !(ops->get_cr(ctxt, 4) & X86_CR4_OSXSAVE) ||
+ ops->get_xcr(ctxt, 0, &xcr) ||
+ !(xcr & XFEATURE_MASK_APX);
+}
+
int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int emulation_type)
{
int rc = X86EMUL_CONTINUE;
@@ -4902,7 +4915,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
ctxt->op_bytes = def_op_bytes;
ctxt->ad_bytes = def_ad_bytes;
- /* Legacy prefixes. */
+ /* Legacy and REX/REX2 prefixes. */
for (;;) {
switch (ctxt->b = insn_fetch(u8, ctxt)) {
case 0x66: /* operand-size override */
@@ -4945,9 +4958,23 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
case 0x40 ... 0x4f: /* REX */
if (mode != X86EMUL_MODE_PROT64)
goto done_prefixes;
+ if (ctxt->rex_prefix == REX2_PREFIX) {
+ opcode = ud;
+ goto done_modrm;
+ }
ctxt->rex_prefix = REX_PREFIX;
ctxt->rex_bits = ctxt->b & 0xf;
continue;
+ case 0xd5: /* REX2 */
+ if (mode != X86EMUL_MODE_PROT64)
+ goto done_prefixes;
+ if (rex2_invalid(ctxt)) {
+ opcode = ud;
+ goto done_modrm;
+ }
+ ctxt->rex_prefix = REX2_PREFIX;
+ ctxt->rex_bits = insn_fetch(u8, ctxt);
+ continue;
case 0xf0: /* LOCK */
ctxt->lock_prefix = 1;
break;
@@ -4970,6 +4997,9 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
if (ctxt->rex_bits & REX_W)
ctxt->op_bytes = 8;
+ if (ctxt->rex_bits & REX_M)
+ goto decode_twobytes;
+
/* Opcode byte(s). */
if (ctxt->b == 0xc4 || ctxt->b == 0xc5) {
/* VEX or LDS/LES */
@@ -4987,8 +5017,9 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
goto done;
} else if (ctxt->b == 0x0f) {
/* Two- or three-byte opcode */
- ctxt->opcode_len = 2;
ctxt->b = insn_fetch(u8, ctxt);
+decode_twobytes:
+ ctxt->opcode_len = 2;
opcode = twobyte_table[ctxt->b];
/* 0F_38 opcode map */
On Mon, Nov 17, 2025 at 21:01, Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> While rebasing onto your VEX series, I noticed a couple of missing pieces:
>
> (1) Jumping directly to the decode path skips the ctxt->op_bytes
> setup.
> (2) It also removes the logic that detects the invalid sequence:
> REX2->REX (unless intentional).
>
> Perhaps it makes sense to simply continue prefix parsing. Then, at
> 'done_prefixes', we can check the M bit next to the W-bit check and jump
> to the two-byte decode path.
>
> I’ve attached a revised diff on top of the VEX series.

Yes, that works for me with one change---after REX2 is processed there
should be a "ctxt->b = insn_fetch(u8, ctxt); goto done_prefixes;" because
REX2 is always the last prefix.

This also means that checking "(ctxt->rex_prefix == REX2_PREFIX &&
!(ctxt->rex_bits & REX_M))" is unnecessary. Instead the second REX2
prefix's 0xd5 byte can be treated as a No64 opcode and will trigger #UD.
In fact this is what the manual says: "a REX prefix byte (0x4*), a VEX
prefix byte (0xC4 or 0xC5), an EVEX prefix byte (0x62), or another REX2
prefix byte (0xD5) following a REX2 prefix with REX2.M0 = 0 must #UD,
because none of those bytes is the opcode of a valid instruction in
legacy map 0 in 64-bit mode".

So all you need to do is add the No64 flag to entries 0x40...0x4F of the
opcode_table, and then "goto done_prefixes" will cover that sentence
nicely.

Paolo
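The #UD rules discussed in this thread can be condensed into a small
standalone checker. This is only a toy model of the rules, not the kvm
emulator code: the names (`rex2_sequence_valid`, `REX2_M0`) are made up
for illustration, and sequences with REX2.M0 = 1 are accepted without
consulting the 0x0f opcode map, a deliberate simplification.

```c
#include <stdbool.h>
#include <stddef.h>

#define REX2_M0	0x80	/* bit 7 of the REX2 payload selects the opcode map */

static bool is_legacy_prefix(unsigned char b)
{
	switch (b) {
	case 0x26: case 0x2e: case 0x36: case 0x3e:	/* segment overrides */
	case 0x64: case 0x65:
	case 0x66: case 0x67:				/* size overrides */
	case 0xf0: case 0xf2: case 0xf3:		/* LOCK, REPNE, REP */
		return true;
	default:
		return false;
	}
}

/*
 * Return false if the byte sequence hits one of the #UD rules quoted
 * from the manual: a REX immediately preceding REX2, or a REX2 with
 * M0 = 0 whose "opcode" byte (0x4*, 0xc4/0xc5, 0x62, 0xd5) is not a
 * valid map-0 instruction in 64-bit mode.
 */
static bool rex2_sequence_valid(const unsigned char *insn, size_t len)
{
	bool prev_rex = false;

	for (size_t i = 0; i < len; i++) {
		unsigned char b = insn[i];

		if (b >= 0x40 && b <= 0x4f) {	/* REX */
			prev_rex = true;
			continue;
		}
		if (is_legacy_prefix(b)) {	/* REX no longer "immediately preceding" */
			prev_rex = false;
			continue;
		}
		if (b == 0xd5) {		/* REX2 */
			unsigned char payload, opc;

			if (prev_rex)
				return false;	/* REX right before REX2: #UD */
			if (i + 2 >= len)
				return false;	/* truncated payload/opcode */
			payload = insn[i + 1];
			opc = insn[i + 2];
			if (payload & REX2_M0)
				return true;	/* opcode comes from the 0x0f map */
			/* map 0: these bytes are not valid opcodes in 64-bit mode */
			return !((opc >= 0x40 && opc <= 0x4f) ||
				 opc == 0xc4 || opc == 0xc5 ||
				 opc == 0x62 || opc == 0xd5);
		}
		return true;		/* plain opcode byte: prefixes done */
	}
	return false;			/* ran out of bytes before an opcode */
}
```

For example, `0x48 0xd5 ...` (REX then REX2) is rejected, as is
`0xd5 0x00 0xd5` (second REX2 after a REX2 with M0 = 0), while
`0xd5 0x80 0xd5` is accepted because with M0 = 1 the final byte indexes
the 0x0f map, where 0xd5 is a valid opcode.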
On 11/14/25 00:30, Chang S. Bae wrote:
> On 11/11/2025 9:55 AM, Paolo Bonzini wrote:
>> On 11/10/25 19:01, Chang S. Bae wrote:
>>>
>>> case 0x40 ... 0x4f: /* REX */
>>> if (mode != X86EMUL_MODE_PROT64)
>>> goto done_prefixes;
>>> + if (ctxt->rex_prefix == REX2_PREFIX)
>>> + break;
>>> ctxt->rex_prefix = REX_PREFIX;
>>> ctxt->rex.raw = 0x0f & ctxt->b;
>>> continue;
>>> + case 0xd5: /* REX2 */
>>> + if (mode != X86EMUL_MODE_PROT64)
>>> + goto done_prefixes;
>> Here you should also check
>>
>> if (ctxt->rex_prefix == REX_PREFIX) {
>> ctxt->rex_prefix = REX2_INVALID;
>> goto done_prefixes;
>> }
>
> You're right. Section 3.1.2.1 states:
> | A REX prefix (0x4*) immediately preceding the REX2 prefix is not
> | allowed and triggers #UD.
>
> Now I think REX2_INVALID would just add another condition to handle
> later. Instead, for such invalid case, it might be simpler to mark the
> opcode as undefined and jump all the way after the lookup. See the diff
> -- please let me know if you dislike it.
Yes, I also thought it was unnecessary but waited until we merged the
respective patches.
Paolo