The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers. This
was defined as a no-op for 32-bit x86, on the assumption that we only have
eight registers anyway. That assumption no longer holds once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indices were wrong in opcodes and
overflowed into other opcode fields, wreaking havoc.
To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
Fixes: 770c2fc7bb ("Add vector operations")
Signed-off-by: Roman Kapl <rka@sysgo.com>
---
Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
think it would be better or significantly faster.
tcg/i386/tcg-target.inc.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index a91e4f1313..436195894b 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -302,11 +302,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
return 0;
}
-#if TCG_TARGET_REG_BITS == 64
# define LOWREGMASK(x) ((x) & 7)
-#else
-# define LOWREGMASK(x) (x)
-#endif
#define P_EXT 0x100 /* 0x0f opcode prefix */
#define P_EXT38 0x200 /* 0x0f 0x38 opcode prefix */
--
2.11.0
ping http://patchwork.ozlabs.org/patch/961849/
On 08/24/2018 03:17 PM, Roman Kapl wrote:
> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
> was defined as no-op for 32-bit x86, with the assumption that we have eight
> registers anyway. This assumption is not true once we have xmm regs.
>
> Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and
> have overflown into other opcode fields, wreaking havoc.
>
> To trigger these problems, you can try running the "movi d8, #0x0" AArch64
> instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
> but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
>
> Fixes: 770c2fc7bb ("Add vector operations")
> Signed-off-by: Roman Kapl <rka@sysgo.com>
> ---
>
> Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
> think it is better or signigicantly faster.
>
> tcg/i386/tcg-target.inc.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index a91e4f1313..436195894b 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -302,11 +302,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
> return 0;
> }
>
> -#if TCG_TARGET_REG_BITS == 64
> # define LOWREGMASK(x) ((x) & 7)
> -#else
> -# define LOWREGMASK(x) (x)
> -#endif
>
> #define P_EXT 0x100 /* 0x0f opcode prefix */
> #define P_EXT38 0x200 /* 0x0f 0x38 opcode prefix */
>
On 8/24/18 3:17 PM, Roman Kapl wrote:
> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
7 = 0b111: the low 3 bits?
> was defined as no-op for 32-bit x86, with the assumption that we have eight
> registers anyway. This assumption is not true once we have xmm regs.
>
> Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and
> have overflown into other opcode fields, wreaking havoc.
>
> To trigger these problems, you can try running the "movi d8, #0x0" AArch64
> instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
> but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
>
> Fixes: 770c2fc7bb ("Add vector operations")
> Signed-off-by: Roman Kapl <rka@sysgo.com>
> ---
>
> Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
> think it is better or signigicantly faster.
>
> tcg/i386/tcg-target.inc.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index a91e4f1313..436195894b 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -302,11 +302,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
> return 0;
> }
>
> -#if TCG_TARGET_REG_BITS == 64
> # define LOWREGMASK(x) ((x) & 7)
> -#else
> -# define LOWREGMASK(x) (x)
> -#endif
>
> #define P_EXT 0x100 /* 0x0f opcode prefix */
> #define P_EXT38 0x200 /* 0x0f 0x38 opcode prefix */
>
Hi,

On 09/20/2018 02:19 PM, Philippe Mathieu-Daudé wrote:
> On 8/24/18 3:17 PM, Roman Kapl wrote:
>> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
>
> 7 = 0b111: the low 3 bits?

Yes, low 3 bits, 8 registers, the commit message is wrong.

Thanks,
Roman Kapl
On 8/24/18 6:17 AM, Roman Kapl wrote:
> The TCG backend uses LOWREGMASK to get the low 7 bits of register numbers. This
> was defined as no-op for 32-bit x86, with the assumption that we have eight
> registers anyway. This assumption is not true once we have xmm regs.
>
> Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and
> have overflown into other opcode fields, wreaking havoc.
>
> To trigger these problems, you can try running the "movi d8, #0x0" AArch64
> instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
> but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".
>
> Fixes: 770c2fc7bb ("Add vector operations")
> Signed-off-by: Roman Kapl <rka@sysgo.com>
> ---
>
> Note: It could also be possible to add a dedicated VEC_LOWREGMASK, but I don't
> think it is better or signigicantly faster.
Agreed, this does seem to be the clearest solution.
I've queued the patch with the "7" typo in the commit message fixed.
r~