x86/umip: Fix UMIP insn decoder false positives

[PATCH 1/3] x86/umip: Check that the instruction opcode is at least two bytes

Posted by Sean Christopherson 6 months ago

When checking for a potential UMIP violation on #GP, verify the decoder
found at least two opcode bytes to avoid false positives when the kernel
encounters an unknown instruction that starts with 0f.  Because the array
of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1]
will misinterpret garbage as a potential SLDT or STR instruction, and can
incorrectly trigger emulation.

E.g. if a vpalignr instruction

   62 83 c5 05 0f 08 ff     vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff

hits a #GP, the kernel emulates it as STR and squashes the #GP (and
corrupts the userspace code stream).

Arguably the check should look for exactly two bytes, but no three byte
opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
impossible to get a false positive if the first two opcode bytes match
'0f 00' or '0f 01'.  Go with a more conservative check with respect to the
existing code to minimize the chances of breaking userspace, e.g. due to
decoder weirdness.

Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions")
Reported-by: Dan Snyder <dansnyder@google.com>
Analyzed-by; Nick Bray <ncbray@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/umip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 5a4b21389b1d..406ac01ce16d 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -156,8 +156,8 @@ static int identify_insn(struct insn *insn)
 	if (!insn->modrm.nbytes)
 		return -EINVAL;
 
-	/* All the instructions of interest start with 0x0f. */
-	if (insn->opcode.bytes[0] != 0xf)
+	/* The instructions of interest have 2-byte opcodes: 0F 00 or 0F 01. */
+	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0xf)
 		return -EINVAL;
 
 	if (insn->opcode.bytes[1] == 0x1) {
-- 
2.50.1.703.g449372360f-goog

Re: [PATCH 1/3] x86/umip: Check that the instruction opcode is at least two bytes

Posted by Borislav Petkov 4 months, 3 weeks ago

On Fri, Aug 08, 2025 at 10:23:56AM -0700, Sean Christopherson wrote:
> When checking for a potential UMIP violation on #GP, verify the decoder
> found at least two opcode bytes to avoid false positives when the kernel
> encounters an unknown instruction that starts with 0f.  Because the array
> of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1]
> will misinterpret garbage as a potential SLDT or STR instruction, and can
> incorrectly trigger emulation.
> 
> E.g. if a vpalignr instruction
> 
>    62 83 c5 05 0f 08 ff     vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff
> 
> hits a #GP, the kernel emulates it as STR and squashes the #GP (and
> corrupts the userspace code stream).
> 
> Arguably the check should look for exactly two bytes, but no three byte
> opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
> impossible to get a false positive if the first two opcode bytes match
> '0f 00' or '0f 01'.  Go with a more conservative check with respect to the
> existing code to minimize the chances of breaking userspace, e.g. due to
> decoder weirdness.

So I did some staring... I guess this fix is trying to address our insn
decoder shortcoming and calls it "weirdness", right?

$ objdump -d a.out | awk -f ./arch/x86/tools/objdump_reformat.awk | ./arch/x86/tools/insn_decoder_test 
./arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
./arch/x86/tools/insn_decoder_test: warning:    0:      62 83 c5 05 0f 08 ff    vpalignr $0xff,(%r8),%xmm23,%xmm17{%k5}
./arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 6
./arch/x86/tools/insn_decoder_test: warning: Decoded and checked 1 instructions with 1 failures

Looks like it.

a.out has:

0000000000000000 <.text>:
   0:   62 83 c5 05 0f 08 ff    vpalignr $0xff,(%r8),%xmm23,%xmm17{%k5}

I guess just adding the insn to the table doesn't fix it.

Masami?

---
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 262f7ca1fb95..a23ff3c16908 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -23,7 +23,7 @@
 #
 # AVX Superscripts
 #  (ev): this opcode requires EVEX prefix.
-#  (es): this opcode requires EVEX prefix and is SCALABALE.
+#  (es): this opcode requires EVEX prefix and is SCALABLE.
 #  (evo): this opcode is changed by EVEX prefix (EVEX opcode)
 #  (v): this opcode requires VEX prefix.
 #  (v1): this opcode only supports 128bit VEX.
@@ -867,7 +867,7 @@ AVXcode: 3
 0c: vblendps Vx,Hx,Wx,Ib (66)
 0d: vblendpd Vx,Hx,Wx,Ib (66)
 0e: vpblendw Vx,Hx,Wx,Ib (66),(v1)
-0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1)
+0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1) | vpalignr Vx,kz,Hx,Wx,Ib (ev)
 14: vpextrb Rd/Mb,Vdq,Ib (66),(v1)
 15: vpextrw Rd/Mw,Vdq,Ib (66),(v1)
 16: vpextrd/q Ey,Vdq,Ib (66),(v1)


> Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions")
> Reported-by: Dan Snyder <dansnyder@google.com>
> Analyzed-by; Nick Bray <ncbray@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kernel/umip.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> index 5a4b21389b1d..406ac01ce16d 100644
> --- a/arch/x86/kernel/umip.c
> +++ b/arch/x86/kernel/umip.c
> @@ -156,8 +156,8 @@ static int identify_insn(struct insn *insn)
>  	if (!insn->modrm.nbytes)
>  		return -EINVAL;
>  
> -	/* All the instructions of interest start with 0x0f. */
> -	if (insn->opcode.bytes[0] != 0xf)
> +	/* The instructions of interest have 2-byte opcodes: 0F 00 or 0F 01. */
> +	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0xf)
>  		return -EINVAL;
>  
>  	if (insn->opcode.bytes[1] == 0x1) {
> -- 
> 2.50.1.703.g449372360f-goog
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 1/3] x86/umip: Check that the instruction opcode is at least two bytes

Posted by Sean Christopherson 4 months, 3 weeks ago

On Fri, Sep 19, 2025, Borislav Petkov wrote:
> On Fri, Aug 08, 2025 at 10:23:56AM -0700, Sean Christopherson wrote:
> > When checking for a potential UMIP violation on #GP, verify the decoder
> > found at least two opcode bytes to avoid false positives when the kernel
> > encounters an unknown instruction that starts with 0f.  Because the array
> > of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1]
> > will misinterpret garbage as a potential SLDT or STR instruction, and can
> > incorrectly trigger emulation.
> > 
> > E.g. if a vpalignr instruction
> > 
> >    62 83 c5 05 0f 08 ff     vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff
> > 
> > hits a #GP, the kernel emulates it as STR and squashes the #GP (and
> > corrupts the userspace code stream).
> > 
> > Arguably the check should look for exactly two bytes, but no three byte
> > opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
> > impossible to get a false positive if the first two opcode bytes match
> > '0f 00' or '0f 01'.  Go with a more conservative check with respect to the
> > existing code to minimize the chances of breaking userspace, e.g. due to
> > decoder weirdness.
> 
> So I did some staring... I guess this fix is trying to address our insn
> decoder shortcoming and calls it "weirdness", right?
> 
> $ objdump -d a.out | awk -f ./arch/x86/tools/objdump_reformat.awk | ./arch/x86/tools/insn_decoder_test 
> ./arch/x86/tools/insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
> ./arch/x86/tools/insn_decoder_test: warning:    0:      62 83 c5 05 0f 08 ff    vpalignr $0xff,(%r8),%xmm23,%xmm17{%k5}
> ./arch/x86/tools/insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 6
> ./arch/x86/tools/insn_decoder_test: warning: Decoded and checked 1 instructions with 1 failures
> 
> Looks like it.
> 
> a.out has:
> 
> 0000000000000000 <.text>:
>    0:   62 83 c5 05 0f 08 ff    vpalignr $0xff,(%r8),%xmm23,%xmm17{%k5}
> 
> I guess just adding the insn to the table doesn't fix it.

vpalignr is just one example of an unhandled instruction, that's not what I find
weird.

The "weirdness" I am referring to is purely speculative; what I was trying to say
is that I deliberate went with a "bad" check on nbytes, i.e. it really should be
"insn->opcode.nbytes == 2".  But I didn't want to risk breaking some bizarre
userspace that happened to be relying on a quirk of the kernel's decoder (I
haven't dug into the decoder, so I genuinely have/had no idea what all could
happen).

> Masami?
> 
> ---
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index 262f7ca1fb95..a23ff3c16908 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -23,7 +23,7 @@
>  #
>  # AVX Superscripts
>  #  (ev): this opcode requires EVEX prefix.
> -#  (es): this opcode requires EVEX prefix and is SCALABALE.
> +#  (es): this opcode requires EVEX prefix and is SCALABLE.
>  #  (evo): this opcode is changed by EVEX prefix (EVEX opcode)
>  #  (v): this opcode requires VEX prefix.
>  #  (v1): this opcode only supports 128bit VEX.
> @@ -867,7 +867,7 @@ AVXcode: 3
>  0c: vblendps Vx,Hx,Wx,Ib (66)
>  0d: vblendpd Vx,Hx,Wx,Ib (66)
>  0e: vpblendw Vx,Hx,Wx,Ib (66),(v1)
> -0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1)
> +0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1) | vpalignr Vx,kz,Hx,Wx,Ib (ev)
>  14: vpextrb Rd/Mb,Vdq,Ib (66),(v1)
>  15: vpextrw Rd/Mw,Vdq,Ib (66),(v1)
>  16: vpextrd/q Ey,Vdq,Ib (66),(v1)
> 
> 
> > Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions")
> > Reported-by: Dan Snyder <dansnyder@google.com>
> > Analyzed-by; Nick Bray <ncbray@google.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kernel/umip.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> > index 5a4b21389b1d..406ac01ce16d 100644
> > --- a/arch/x86/kernel/umip.c
> > +++ b/arch/x86/kernel/umip.c
> > @@ -156,8 +156,8 @@ static int identify_insn(struct insn *insn)
> >  	if (!insn->modrm.nbytes)
> >  		return -EINVAL;
> >  
> > -	/* All the instructions of interest start with 0x0f. */
> > -	if (insn->opcode.bytes[0] != 0xf)
> > +	/* The instructions of interest have 2-byte opcodes: 0F 00 or 0F 01. */
> > +	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0xf)
> >  		return -EINVAL;
> >  
> >  	if (insn->opcode.bytes[1] == 0x1) {
> > -- 
> > 2.50.1.703.g449372360f-goog
> > 
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 1/3] x86/umip: Check that the instruction opcode is at least two bytes

Posted by Peter Zijlstra 5 months, 4 weeks ago

On Fri, Aug 08, 2025 at 10:23:56AM -0700, Sean Christopherson wrote:
> When checking for a potential UMIP violation on #GP, verify the decoder
> found at least two opcode bytes to avoid false positives when the kernel
> encounters an unknown instruction that starts with 0f.  Because the array
> of opcode.bytes is zero-initialized by insn_init(), peeking at bytes[1]
> will misinterpret garbage as a potential SLDT or STR instruction, and can
> incorrectly trigger emulation.
> 
> E.g. if a vpalignr instruction
> 
>    62 83 c5 05 0f 08 ff     vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff
> 
> hits a #GP, the kernel emulates it as STR and squashes the #GP (and
> corrupts the userspace code stream).
> 
> Arguably the check should look for exactly two bytes, but no three byte
> opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
> impossible to get a false positive if the first two opcode bytes match
> '0f 00' or '0f 01'.  Go with a more conservative check with respect to the
> existing code to minimize the chances of breaking userspace, e.g. due to
> decoder weirdness.
> 
> Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions")
> Reported-by: Dan Snyder <dansnyder@google.com>
> Analyzed-by; Nick Bray <ncbray@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> ---
>  arch/x86/kernel/umip.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> index 5a4b21389b1d..406ac01ce16d 100644
> --- a/arch/x86/kernel/umip.c
> +++ b/arch/x86/kernel/umip.c
> @@ -156,8 +156,8 @@ static int identify_insn(struct insn *insn)
>  	if (!insn->modrm.nbytes)
>  		return -EINVAL;
>  
> -	/* All the instructions of interest start with 0x0f. */
> -	if (insn->opcode.bytes[0] != 0xf)
> +	/* The instructions of interest have 2-byte opcodes: 0F 00 or 0F 01. */
> +	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0xf)
>  		return -EINVAL;
>  
>  	if (insn->opcode.bytes[1] == 0x1) {
> -- 
> 2.50.1.703.g449372360f-goog
>

[tip: x86/cpu] x86/umip: Check that the instruction opcode is at least two bytes

Posted by tip-bot2 for Sean Christopherson 4 months, 3 weeks ago

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     32278c677947ae2f042c9535674a7fff9a245dd3
Gitweb:        https://git.kernel.org/tip/32278c677947ae2f042c9535674a7fff9a245dd3
Author:        Sean Christopherson <seanjc@google.com>
AuthorDate:    Fri, 08 Aug 2025 10:23:56 -07:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Fri, 19 Sep 2025 20:21:12 +02:00

x86/umip: Check that the instruction opcode is at least two bytes

When checking for a potential UMIP violation on #GP, verify the decoder found
at least two opcode bytes to avoid false positives when the kernel encounters
an unknown instruction that starts with 0f.  Because the array of opcode.bytes
is zero-initialized by insn_init(), peeking at bytes[1] will misinterpret
garbage as a potential SLDT or STR instruction, and can incorrectly trigger
emulation.

E.g. if a VPALIGNR instruction

   62 83 c5 05 0f 08 ff     vpalignr xmm17{k5},xmm23,XMMWORD PTR [r8],0xff

hits a #GP, the kernel emulates it as STR and squashes the #GP (and corrupts
the userspace code stream).

Arguably the check should look for exactly two bytes, but no three byte
opcodes use '0f 00 xx' or '0f 01 xx' as an escape, i.e. it should be
impossible to get a false positive if the first two opcode bytes match '0f 00'
or '0f 01'.  Go with a more conservative check with respect to the existing
code to minimize the chances of breaking userspace, e.g. due to decoder
weirdness.

Analyzed by Nick Bray <ncbray@google.com>.

Fixes: 1e5db223696a ("x86/umip: Add emulation code for UMIP instructions")
Reported-by: Dan Snyder <dansnyder@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/umip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 5a4b213..406ac01 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -156,8 +156,8 @@ static int identify_insn(struct insn *insn)
 	if (!insn->modrm.nbytes)
 		return -EINVAL;
 
-	/* All the instructions of interest start with 0x0f. */
-	if (insn->opcode.bytes[0] != 0xf)
+	/* The instructions of interest have 2-byte opcodes: 0F 00 or 0F 01. */
+	if (insn->opcode.nbytes < 2 || insn->opcode.bytes[0] != 0xf)
 		return -EINVAL;
 
 	if (insn->opcode.bytes[1] == 0x1) {

[PATCH 1/3] x86/umip: Check that the instruction opcode is at least two bytes
[PATCH 2/3] x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases)
[PATCH 3/3] *** DO NOT MERGE *** x86/umip: Lazy person's KUnit test for UMIP emulation