Normally the requirement that oprsz be a multiple of the line size is
satisfied automatically by the size restrictions placed on vector sizes
coming from the implementation. However, for the legitimate size tuple
[oprsz=8, maxsz=32], we need to clear the final 24 bytes of the vector
register. Without this check, do_dup selects TCG_TYPE_V128 and clears
only 16 bytes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg-op-gvec.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 22db1590d5..61c25f5784 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -287,8 +287,11 @@ void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
    in units of LNSZ. This limits the expansion of inline code. */
 static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
 {
-    uint32_t lnct = oprsz / lnsz;
-    return lnct >= 1 && lnct <= MAX_UNROLL;
+    if (oprsz % lnsz == 0) {
+        uint32_t lnct = oprsz / lnsz;
+        return lnct >= 1 && lnct <= MAX_UNROLL;
+    }
+    return false;
 }
 
 static void expand_clr(uint32_t dofs, uint32_t maxsz);
--
2.17.1
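
For readers following the arithmetic in the commit message, here is a minimal standalone sketch of the two versions of the check applied to the 24-byte tail from [oprsz=8, maxsz=32]. It is not QEMU code: the value of MAX_UNROLL, the candidate line sizes (32, 16, 8), and the helpers pick_line_size, bytes_covered and demo_tail_clear are assumptions made for illustration, and the real selection logic in tcg/tcg-op-gvec.c also considers host capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant is in tcg-op-gvec.c. */
#define MAX_UNROLL 4

/* Pre-patch logic: integer division silently drops any remainder. */
static bool check_size_impl_old(uint32_t oprsz, uint32_t lnsz)
{
    uint32_t lnct = oprsz / lnsz;
    return lnct >= 1 && lnct <= MAX_UNROLL;
}

/* Post-patch logic: reject sizes that are not a whole number of lines. */
static bool check_size_impl_new(uint32_t oprsz, uint32_t lnsz)
{
    if (oprsz % lnsz == 0) {
        uint32_t lnct = oprsz / lnsz;
        return lnct >= 1 && lnct <= MAX_UNROLL;
    }
    return false;
}

typedef bool (*size_check_fn)(uint32_t oprsz, uint32_t lnsz);

/* Hypothetical helper: try the widest line size first, as a stand-in for
   choosing among 256-bit, 128-bit and 64-bit vector types.
   Returns the accepted line size in bytes, or 0 if none is accepted. */
static uint32_t pick_line_size(size_check_fn check, uint32_t oprsz)
{
    static const uint32_t candidates[] = { 32, 16, 8 };

    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (check(oprsz, candidates[i])) {
            return candidates[i];
        }
    }
    return 0;
}

/* Hypothetical helper: bytes written when only whole lines of the
   chosen size are stored. */
static uint32_t bytes_covered(size_check_fn check, uint32_t oprsz)
{
    uint32_t lnsz = pick_line_size(check, oprsz);
    return lnsz ? (oprsz / lnsz) * lnsz : 0;
}

/* Walk through the commit message's case; aborts if an expectation fails. */
void demo_tail_clear(void)
{
    const uint32_t tail = 32 - 8;   /* maxsz - oprsz = 24 bytes to clear */

    assert(bytes_covered(check_size_impl_old, tail) == 16); /* 8 bytes missed */
    assert(bytes_covered(check_size_impl_new, tail) == 24); /* whole tail */
}
```

Under these assumptions the old check accepts a 16-byte line for a 24-byte clear because 24 / 16 truncates to 1, so only one 16-byte store is counted. The new check rejects both 32 and 16 and falls through to an 8-byte line, which covers the tail in three stores.
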
Richard Henderson <richard.henderson@linaro.org> writes:
> Normally this is automatic in the size restrictions that are placed
> on vector sizes coming from the implementation. However, for the
> legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
> 24 bytes of the vector register. Without this check, do_dup selects
> TCG_TYPE_V128 and clears only 16 bytes.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> tcg/tcg-op-gvec.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 22db1590d5..61c25f5784 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -287,8 +287,11 @@ void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
>     in units of LNSZ. This limits the expansion of inline code. */
>  static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
>  {
> -    uint32_t lnct = oprsz / lnsz;
> -    return lnct >= 1 && lnct <= MAX_UNROLL;
> +    if (oprsz % lnsz == 0) {
> +        uint32_t lnct = oprsz / lnsz;
> +        return lnct >= 1 && lnct <= MAX_UNROLL;
> +    }
> +    return false;
>  }
>
>  static void expand_clr(uint32_t dofs, uint32_t maxsz);
--
Alex Bennée