The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers. This
was defined as a no-op for 32-bit x86, on the assumption that we only have
eight registers anyway. That assumption no longer holds once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indices were wrong in opcodes and
overflowed into other opcode fields, wreaking havoc.
To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
Fixes: 770c2fc7bb ("Add vector operations")
Signed-off-by: Roman Kapl <rka@sysgo.com>
---
Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
think it would be better or significantly faster.
tcg/i386/tcg-target.inc.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index a91e4f1313..436195894b 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -302,11 +302,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
return 0;
}
-#if TCG_TARGET_REG_BITS == 64
# define LOWREGMASK(x) ((x) & 7)
-#else
-# define LOWREGMASK(x) (x)
-#endif
#define P_EXT 0x100 /* 0x0f opcode prefix */
#define P_EXT38 0x200 /* 0x0f 0x38 opcode prefix */
--
2.11.0
ping http://patchwork.ozlabs.org/patch/961849/
On 08/24/2018 03:17 PM, Roman Kapl wrote:
> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
> was defined as no-op for 32-bit x86, with the assumption that we have eight
> registers anyway. This assumption is not true once we have xmm regs.
>
> Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and
> have overflown into other opcode fields, wreaking havoc.
>
> To trigger these problems, you can try running the "movi d8, #0x0" AArch64
> instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
> but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
>
> Fixes: 770c2fc7bb ("Add vector operations")
> Signed-off-by: Roman Kapl <rka@sysgo.com>
> ---
>
> Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
> think it is better or signigicantly faster.
>
> tcg/i386/tcg-target.inc.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index a91e4f1313..436195894b 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -302,11 +302,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
> return 0;
> }
>
> -#if TCG_TARGET_REG_BITS == 64
> # define LOWREGMASK(x) ((x) & 7)
> -#else
> -# define LOWREGMASK(x) (x)
> -#endif
>
> #define P_EXT 0x100 /* 0x0f opcode prefix */
> #define P_EXT38 0x200 /* 0x0f 0x38 opcode prefix */
>
On 8/24/18 3:17 PM, Roman Kapl wrote:
> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
7 = 0b111: the low 3 bits?
> was defined as no-op for 32-bit x86, with the assumption that we have eight
> registers anyway. This assumption is not true once we have xmm regs.
>
> Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and
> have overflown into other opcode fields, wreaking havoc.
>
> To trigger these problems, you can try running the "movi d8, #0x0" AArch64
> instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
> but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
>
> Fixes: 770c2fc7bb ("Add vector operations")
> Signed-off-by: Roman Kapl <rka@sysgo.com>
> ---
>
> Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
> think it is better or signigicantly faster.
>
> tcg/i386/tcg-target.inc.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index a91e4f1313..436195894b 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -302,11 +302,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
> return 0;
> }
>
> -#if TCG_TARGET_REG_BITS == 64
> # define LOWREGMASK(x) ((x) & 7)
> -#else
> -# define LOWREGMASK(x) (x)
> -#endif
>
> #define P_EXT 0x100 /* 0x0f opcode prefix */
> #define P_EXT38 0x200 /* 0x0f 0x38 opcode prefix */
>
Hi,

On 09/20/2018 02:19 PM, Philippe Mathieu-Daudé wrote:
> On 8/24/18 3:17 PM, Roman Kapl wrote:
>> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
>
> 7 = 0b111: the low 3 bits?

Yes, low 3 bits, 8 registers, the commit message is wrong.

Thanks,
Roman Kapl
On 8/24/18 6:17 AM, Roman Kapl wrote:
> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
> was defined as no-op for 32-bit x86, with the assumption that we have eight
> registers anyway. This assumption is not true once we have xmm regs.
>
> Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and
> have overflown into other opcode fields, wreaking havoc.
>
> To trigger these problems, you can try running the "movi d8, #0x0" AArch64
> instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
> but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
>
> Fixes: 770c2fc7bb ("Add vector operations")
> Signed-off-by: Roman Kapl <rka@sysgo.com>
> ---
>
> Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
> think it is better or signigicantly faster.
Agreed, this does seem to be the clearest solution.
I've queued the patch with the "7" typo in the commit message fixed.
r~