From nobody Tue Feb 10 06:58:46 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=linaro.org Return-Path: Received: from lists.gnu.org (209.51.188.17 [209.51.188.17]) by mx.zohomail.com with SMTPS id 1549684292833698.4135383507901; Fri, 8 Feb 2019 19:51:32 -0800 (PST) Received: from localhost ([127.0.0.1]:39216 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gsJfp-000735-PP for importer@patchew.org; Fri, 08 Feb 2019 22:51:25 -0500 Received: from eggs.gnu.org ([209.51.188.92]:42531) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gsJUD-00074z-3E for qemu-devel@nongnu.org; Fri, 08 Feb 2019 22:39:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gsJU7-0000at-3l for qemu-devel@nongnu.org; Fri, 08 Feb 2019 22:39:23 -0500 Received: from mail-pl1-x641.google.com ([2607:f8b0:4864:20::641]:45148) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1gsJU4-0000X3-VR for qemu-devel@nongnu.org; Fri, 08 Feb 2019 22:39:17 -0500 Received: by mail-pl1-x641.google.com with SMTP id a14so2588494plm.12 for ; Fri, 08 Feb 2019 19:39:08 -0800 (PST) Received: from cloudburst.twiddle.net (97-126-115-157.tukw.qwest.net. [97.126.115.157]) by smtp.gmail.com with ESMTPSA id s84sm6340737pfi.15.2019.02.08.19.39.06 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 08 Feb 2019 19:39:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ekZakn26RrnuoOdbmnjbqOhwwrv1WMbjn8NMae33G2c=; b=nsEbfiuGYIYkuBLnGMZgr3KJIKCc8VH7iNr3HJrXgpJTN9OEdU5NUPuKlc+Bc8kB9+ 3mNTjcQ92F9yme/ORruuw/LN6fI5r/B5r2mQwEosNUgKCgD5r3J31jNKG+e5Lbqp3gFm K7oGe1+/Yy3BWeVmp0NOHMNHyOI/BM4o9whlnrHXyojB6HCz2RWlJQn6MzGatfXv0RdO n8KPAszVjCrOMt+T+HJ/KcOqQKAV62Cb8nBPMQ1y9T/J/FG0MxM8jZ8xKy5+qxGNPlau 4RDh0ED1zEWt8jkSjnASbJeWH1yA01Isj/+rmJ28P/cQ5a5ESxruDB1Vttf0hbe42xO3 nQpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ekZakn26RrnuoOdbmnjbqOhwwrv1WMbjn8NMae33G2c=; b=Ll5l/sbjqfOXTfaBmHDEjIigSLq8L28YXJET3lL3kEZ/26Zl6ppzAobs+C8XIZb5ah 0oRq9nGWGDVp3qMwLS7FBgY6u8wmBKR6thdMOVRUPMGchST4cCsTCsppzSU5lh9u7/pc Blbbyda4A8yC95vCVYOJpGjToUiGVvlvCLnE51n15jzkvRTz4AvtgjnQu/xGhGcDf8gR dfuMSYAL3XmugcrwqI3rfpQWx/M25Mef0i8zoNmy18rccDyUIhvaHwHNKSetsreGlyZQ 0037zyI3yEEp8l3gvq6eKT9oCRMDgyaWjMD5QeVDvwwuzA4aok411RzGfMgy5UOSKfwa 2iGw== X-Gm-Message-State: AHQUAuamqUvhhlYkHzcCSjp92J4Lg51PreLXo8vU4M2vNzIcGq78h7/i lVdJZcIw5cESj4/zssyXks+Ce3fHpyY= X-Google-Smtp-Source: AHgI3IZmtwZjgS/sKfsYvN+vfJoeM5NrTwpExN/GdvYLUdOZzEhDJULK31qlaPzppN2QLByMXax6WQ== X-Received: by 2002:a17:902:a70b:: with SMTP id w11mr26311760plq.84.1549683547636; Fri, 08 Feb 2019 19:39:07 -0800 (PST) From: Richard Henderson To: qemu-devel@nongnu.org Date: Fri, 8 Feb 2019 19:38:46 -0800 Message-Id: <20190209033847.9014-12-richard.henderson@linaro.org> X-Mailer: git-send-email 2.17.2 In-Reply-To: <20190209033847.9014-1-richard.henderson@linaro.org> References: <20190209033847.9014-1-richard.henderson@linaro.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:4864:20::641 Subject: [Qemu-devel] [PATCH v3 11/12] target/arm: Use vector operations for saturation X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" For same-sign saturation, we have tcg vector operations. We can compute the QC bit by comparing the saturated value against the unsaturated value. Signed-off-by: Richard Henderson --- target/arm/helper.h | 33 +++++++ target/arm/translate.h | 4 + target/arm/translate-a64.c | 36 ++++---- target/arm/translate.c | 172 +++++++++++++++++++++++++++++++------ target/arm/vec_helper.c | 130 ++++++++++++++++++++++++++++ 5 files changed, 331 insertions(+), 44 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index 9874c35ea9..923e8e1525 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -641,6 +641,39 @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) =20 +DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqadd_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqadd_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqsub_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqsub_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqsub_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_uqsub_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqsub_b, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqsub_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqsub_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sqsub_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "helper-a64.h" #include "helper-sve.h" diff --git a/target/arm/translate.h b/target/arm/translate.h index 17748ddfb9..f25fe75685 100644 --- a/target/arm/translate.h +++ b/target/arm/translate.h @@ -214,6 +214,10 @@ extern const GVecGen2i ssra_op[4]; extern const GVecGen2i usra_op[4]; extern const GVecGen2i sri_op[4]; extern const GVecGen2i sli_op[4]; +extern const GVecGen4 uqadd_op[4]; +extern const GVecGen4 sqadd_op[4]; +extern const GVecGen4 uqsub_op[4]; +extern const GVecGen4 sqsub_op[4]; void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); =20 /* diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index fd5ceb6613..af8e4fd4be 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -10948,6 +10948,22 @@ static void disas_simd_3same_int(DisasContext *s, = uint32_t insn) } =20 switch (opcode) { + case 0x01: /* SQADD, UQADD */ + tcg_gen_gvec_4(vec_full_reg_offset(s, rd), + offsetof(CPUARMState, vfp.qc), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), + (u ? uqadd_op : sqadd_op) + size); + return; + case 0x05: /* SQSUB, UQSUB */ + tcg_gen_gvec_4(vec_full_reg_offset(s, rd), + offsetof(CPUARMState, vfp.qc), + vec_full_reg_offset(s, rn), + vec_full_reg_offset(s, rm), + is_q ? 16 : 8, vec_full_reg_size(s), + (u ? uqsub_op : sqsub_op) + size); + return; case 0x0c: /* SMAX, UMAX */ if (u) { gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size); @@ -11043,16 +11059,6 @@ static void disas_simd_3same_int(DisasContext *s, = uint32_t insn) genfn =3D fns[size][u]; break; } - case 0x1: /* SQADD, UQADD */ - { - static NeonGenTwoOpEnvFn * const fns[3][2] =3D { - { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 }, - { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 }, - { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 }, - }; - genenvfn =3D fns[size][u]; - break; - } case 0x2: /* SRHADD, URHADD */ { static NeonGenTwoOpFn * const fns[3][2] =3D { @@ -11073,16 +11079,6 @@ static void disas_simd_3same_int(DisasContext *s, = uint32_t insn) genfn =3D fns[size][u]; break; } - case 0x5: /* SQSUB, UQSUB */ - { - static NeonGenTwoOpEnvFn * const fns[3][2] =3D { - { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 }, - { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 }, - { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 }, - }; - genenvfn =3D fns[size][u]; - break; - } case 0x8: /* SSHL, USHL */ { static NeonGenTwoOpFn * const fns[3][2] =3D { diff --git a/target/arm/translate.c b/target/arm/translate.c index 9b426f4271..dac737f6ca 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -6148,6 +6148,142 @@ const GVecGen3 cmtst_op[4] =3D { .vece =3D MO_64 }, }; =20 +static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat, + TCGv_vec a, TCGv_vec b) +{ + TCGv_vec x =3D tcg_temp_new_vec_matching(t); + tcg_gen_add_vec(vece, x, a, b); + tcg_gen_usadd_vec(vece, t, a, b); + tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t); + tcg_gen_or_vec(vece, sat, sat, x); + tcg_temp_free_vec(x); +} + +const GVecGen4 uqadd_op[4] =3D { + { .fniv =3D gen_uqadd_vec, + .fno =3D gen_helper_gvec_uqadd_b, + .opc =3D INDEX_op_usadd_vec, + .write_aofs =3D true, + .vece =3D MO_8 }, + { .fniv =3D gen_uqadd_vec, + .fno =3D gen_helper_gvec_uqadd_h, + .opc =3D INDEX_op_usadd_vec, + .write_aofs =3D true, + .vece =3D MO_16 }, + { .fniv =3D gen_uqadd_vec, + .fno =3D gen_helper_gvec_uqadd_s, + .opc =3D INDEX_op_usadd_vec, + .write_aofs =3D true, + .vece =3D MO_32 }, + { .fniv =3D gen_uqadd_vec, + .fno =3D gen_helper_gvec_uqadd_d, + .opc =3D INDEX_op_usadd_vec, + .write_aofs =3D true, + .vece =3D MO_64 }, +}; + +static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat, + TCGv_vec a, TCGv_vec b) +{ + TCGv_vec x =3D tcg_temp_new_vec_matching(t); + tcg_gen_add_vec(vece, x, a, b); + tcg_gen_ssadd_vec(vece, t, a, b); + tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t); + tcg_gen_or_vec(vece, sat, sat, x); + tcg_temp_free_vec(x); +} + +const GVecGen4 sqadd_op[4] =3D { + { .fniv =3D gen_sqadd_vec, + .fno =3D gen_helper_gvec_sqadd_b, + .opc =3D INDEX_op_ssadd_vec, + .write_aofs =3D true, + .vece =3D MO_8 }, + { .fniv =3D gen_sqadd_vec, + .fno =3D gen_helper_gvec_sqadd_h, + .opc =3D INDEX_op_ssadd_vec, + .write_aofs =3D true, + .vece =3D MO_16 }, + { .fniv =3D gen_sqadd_vec, + .fno =3D gen_helper_gvec_sqadd_s, + .opc =3D INDEX_op_ssadd_vec, + .write_aofs =3D true, + .vece =3D MO_32 }, + { .fniv =3D gen_sqadd_vec, + .fno =3D gen_helper_gvec_sqadd_d, + .opc =3D INDEX_op_ssadd_vec, + .write_aofs =3D true, + .vece =3D MO_64 }, +}; + +static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat, + TCGv_vec a, TCGv_vec b) +{ + TCGv_vec x =3D tcg_temp_new_vec_matching(t); + tcg_gen_sub_vec(vece, x, a, b); + tcg_gen_ussub_vec(vece, t, a, b); + tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t); + tcg_gen_or_vec(vece, sat, sat, x); + tcg_temp_free_vec(x); +} + +const GVecGen4 uqsub_op[4] =3D { + { .fniv =3D gen_uqsub_vec, + .fno =3D gen_helper_gvec_uqsub_b, + .opc =3D INDEX_op_ussub_vec, + .write_aofs =3D true, + .vece =3D MO_8 }, + { .fniv =3D gen_uqsub_vec, + .fno =3D gen_helper_gvec_uqsub_h, + .opc =3D INDEX_op_ussub_vec, + .write_aofs =3D true, + .vece =3D MO_16 }, + { .fniv =3D gen_uqsub_vec, + .fno =3D gen_helper_gvec_uqsub_s, + .opc =3D INDEX_op_ussub_vec, + .write_aofs =3D true, + .vece =3D MO_32 }, + { .fniv =3D gen_uqsub_vec, + .fno =3D gen_helper_gvec_uqsub_d, + .opc =3D INDEX_op_ussub_vec, + .write_aofs =3D true, + .vece =3D MO_64 }, +}; + +static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat, + TCGv_vec a, TCGv_vec b) +{ + TCGv_vec x =3D tcg_temp_new_vec_matching(t); + tcg_gen_sub_vec(vece, x, a, b); + tcg_gen_sssub_vec(vece, t, a, b); + tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t); + tcg_gen_or_vec(vece, sat, sat, x); + tcg_temp_free_vec(x); +} + +const GVecGen4 sqsub_op[4] =3D { + { .fniv =3D gen_sqsub_vec, + .fno =3D gen_helper_gvec_sqsub_b, + .opc =3D INDEX_op_sssub_vec, + .write_aofs =3D true, + .vece =3D MO_8 }, + { .fniv =3D gen_sqsub_vec, + .fno =3D gen_helper_gvec_sqsub_h, + .opc =3D INDEX_op_sssub_vec, + .write_aofs =3D true, + .vece =3D MO_16 }, + { .fniv =3D gen_sqsub_vec, + .fno =3D gen_helper_gvec_sqsub_s, + .opc =3D INDEX_op_sssub_vec, + .write_aofs =3D true, + .vece =3D MO_32 }, + { .fniv =3D gen_sqsub_vec, + .fno =3D gen_helper_gvec_sqsub_d, + .opc =3D INDEX_op_sssub_vec, + .write_aofs =3D true, + .vece =3D MO_64 }, +}; + /* Translate a NEON data processing instruction. Return nonzero if the instruction is invalid. We process data in a mixture of 32-bit and 64-bit chunks. @@ -6331,6 +6467,18 @@ static int disas_neon_data_insn(DisasContext *s, uin= t32_t insn) } return 0; =20 + case NEON_3R_VQADD: + tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc), + rn_ofs, rm_ofs, vec_size, vec_size, + (u ? uqadd_op : sqadd_op) + size); + break; + + case NEON_3R_VQSUB: + tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc), + rn_ofs, rm_ofs, vec_size, vec_size, + (u ? uqsub_op : sqsub_op) + size); + break; + case NEON_3R_VMUL: /* VMUL */ if (u) { /* Polynomial case allows only P8 and is handled below. */ @@ -6395,24 +6543,6 @@ static int disas_neon_data_insn(DisasContext *s, uin= t32_t insn) neon_load_reg64(cpu_V0, rn + pass); neon_load_reg64(cpu_V1, rm + pass); switch (op) { - case NEON_3R_VQADD: - if (u) { - gen_helper_neon_qadd_u64(cpu_V0, cpu_env, - cpu_V0, cpu_V1); - } else { - gen_helper_neon_qadd_s64(cpu_V0, cpu_env, - cpu_V0, cpu_V1); - } - break; - case NEON_3R_VQSUB: - if (u) { - gen_helper_neon_qsub_u64(cpu_V0, cpu_env, - cpu_V0, cpu_V1); - } else { - gen_helper_neon_qsub_s64(cpu_V0, cpu_env, - cpu_V0, cpu_V1); - } - break; case NEON_3R_VSHL: if (u) { gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0); @@ -6528,18 +6658,12 @@ static int disas_neon_data_insn(DisasContext *s, ui= nt32_t insn) case NEON_3R_VHADD: GEN_NEON_INTEGER_OP(hadd); break; - case NEON_3R_VQADD: - GEN_NEON_INTEGER_OP_ENV(qadd); - break; case NEON_3R_VRHADD: GEN_NEON_INTEGER_OP(rhadd); break; case NEON_3R_VHSUB: GEN_NEON_INTEGER_OP(hsub); break; - case NEON_3R_VQSUB: - GEN_NEON_INTEGER_OP_ENV(qsub); - break; case NEON_3R_VSHL: GEN_NEON_INTEGER_OP(shl); break; diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c index 65a18af4e0..10f17e4b5c 100644 --- a/target/arm/vec_helper.c +++ b/target/arm/vec_helper.c @@ -766,3 +766,133 @@ DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4) DO_FMLA_IDX(gvec_fmla_idx_d, float64, ) =20 #undef DO_FMLA_IDX + +#define DO_SAT(NAME, WTYPE, TYPEN, TYPEM, OP, MIN, MAX) \ +void HELPER(NAME)(void *vd, void *vq, void *vn, void *vm, uint32_t desc) = \ +{ = \ + intptr_t i, oprsz =3D simd_oprsz(desc); = \ + TYPEN *d =3D vd, *n =3D vn; TYPEM *m =3D vm; = \ + bool q =3D false; = \ + for (i =3D 0; i < oprsz / sizeof(TYPEN); i++) { = \ + WTYPE dd =3D (WTYPE)n[i] OP m[i]; = \ + if (dd < MIN) { = \ + dd =3D MIN; = \ + q =3D true; = \ + } else if (dd > MAX) { = \ + dd =3D MAX; = \ + q =3D true; = \ + } = \ + d[i] =3D dd; = \ + } = \ + if (q) { = \ + uint32_t *qc =3D vq; = \ + qc[0] =3D 1; = \ + } = \ + clear_tail(d, oprsz, simd_maxsz(desc)); = \ +} + +DO_SAT(gvec_uqadd_b, int, uint8_t, uint8_t, +, 0, UINT8_MAX) +DO_SAT(gvec_uqadd_h, int, uint16_t, uint16_t, +, 0, UINT16_MAX) +DO_SAT(gvec_uqadd_s, int64_t, uint32_t, uint32_t, +, 0, UINT32_MAX) + +DO_SAT(gvec_sqadd_b, int, int8_t, int8_t, +, INT8_MIN, INT8_MAX) +DO_SAT(gvec_sqadd_h, int, int16_t, int16_t, +, INT16_MIN, INT16_MAX) +DO_SAT(gvec_sqadd_s, int64_t, int32_t, int32_t, +, INT32_MIN, INT32_MAX) + +DO_SAT(gvec_uqsub_b, int, uint8_t, uint8_t, -, 0, UINT8_MAX) +DO_SAT(gvec_uqsub_h, int, uint16_t, uint16_t, -, 0, UINT16_MAX) +DO_SAT(gvec_uqsub_s, int64_t, uint32_t, uint32_t, -, 0, UINT32_MAX) + +DO_SAT(gvec_sqsub_b, int, int8_t, int8_t, -, INT8_MIN, INT8_MAX) +DO_SAT(gvec_sqsub_h, int, int16_t, int16_t, -, INT16_MIN, INT16_MAX) +DO_SAT(gvec_sqsub_s, int64_t, int32_t, int32_t, -, INT32_MIN, INT32_MAX) + +#undef DO_SAT + +void HELPER(gvec_uqadd_d)(void *vd, void *vq, void *vn, + void *vm, uint32_t desc) +{ + intptr_t i, oprsz =3D simd_oprsz(desc); + uint64_t *d =3D vd, *n =3D vn, *m =3D vm; + bool q =3D false; + + for (i =3D 0; i < oprsz / 8; i++) { + uint64_t nn =3D n[i], mm =3D m[i], dd =3D nn + mm; + if (dd < nn) { + dd =3D UINT64_MAX; + q =3D true; + } + d[i] =3D dd; + } + if (q) { + uint32_t *qc =3D vq; + qc[0] =3D 1; + } + clear_tail(d, oprsz, simd_maxsz(desc)); +} + +void HELPER(gvec_uqsub_d)(void *vd, void *vq, void *vn, + void *vm, uint32_t desc) +{ + intptr_t i, oprsz =3D simd_oprsz(desc); + uint64_t *d =3D vd, *n =3D vn, *m =3D vm; + bool q =3D false; + + for (i =3D 0; i < oprsz / 8; i++) { + uint64_t nn =3D n[i], mm =3D m[i], dd =3D nn - mm; + if (nn < mm) { + dd =3D 0; + q =3D true; + } + d[i] =3D dd; + } + if (q) { + uint32_t *qc =3D vq; + qc[0] =3D 1; + } + clear_tail(d, oprsz, simd_maxsz(desc)); +} + +void HELPER(gvec_sqadd_d)(void *vd, void *vq, void *vn, + void *vm, uint32_t desc) +{ + intptr_t i, oprsz =3D simd_oprsz(desc); + int64_t *d =3D vd, *n =3D vn, *m =3D vm; + bool q =3D false; + + for (i =3D 0; i < oprsz / 8; i++) { + int64_t nn =3D n[i], mm =3D m[i], dd =3D nn + mm; + if (((dd ^ nn) & ~(nn ^ mm)) & INT64_MIN) { + dd =3D (nn >> 63) ^ ~INT64_MIN; + q =3D true; + } + d[i] =3D dd; + } + if (q) { + uint32_t *qc =3D vq; + qc[0] =3D 1; + } + clear_tail(d, oprsz, simd_maxsz(desc)); +} + +void HELPER(gvec_sqsub_d)(void *vd, void *vq, void *vn, + void *vm, uint32_t desc) +{ + intptr_t i, oprsz =3D simd_oprsz(desc); + int64_t *d =3D vd, *n =3D vn, *m =3D vm; + bool q =3D false; + + for (i =3D 0; i < oprsz / 8; i++) { + int64_t nn =3D n[i], mm =3D m[i], dd =3D nn - mm; + if (((dd ^ nn) & (nn ^ mm)) & INT64_MIN) { + dd =3D (nn >> 63) ^ ~INT64_MIN; + q =3D true; + } + d[i] =3D dd; + } + if (q) { + uint32_t *qc =3D vq; + qc[0] =3D 1; + } + clear_tail(d, oprsz, simd_maxsz(desc)); +} --=20 2.17.2