From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 30 Apr 2019 22:05:24 -0700
Message-Id: <20190501050536.15580-18-richard.henderson@linaro.org>
In-Reply-To: <20190501050536.15580-1-richard.henderson@linaro.org>
References: <20190501050536.15580-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v2 17/29] tcg: Add gvec expanders for vector shift by scalar

Signed-off-by: Richard Henderson
Reviewed-by: Alex Bennée
---
 tcg/tcg-op-gvec.h |   7 ++
 tcg/tcg-op.h      |   4 +
 tcg/tcg-op-gvec.c | 204 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op-vec.c  |  54 ++++++++++++
 4 files changed, 269 insertions(+)

diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 84a6247b16..6ee98f3378 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -318,6 +318,13 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
                        int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
+void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+
 /*
  * Perform vector shift by vector element, modulo the element size.
  * E.g. D[i] = A[i] << (B[i] % (8 << vece)).
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 833c6330b5..472b73cb38 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -986,6 +986,10 @@ void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
+void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 061ef329f1..8fc5ba042b 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2555,6 +2555,210 @@ void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
     }
 }
 
+/*
+ * Specialized generation vector shifts by a non-constant scalar.
+ */
+
+static void expand_2sh_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
+                           uint32_t oprsz, uint32_t tysz, TCGType type,
+                           TCGv_i32 shift,
+                           void (*fni)(unsigned, TCGv_vec, TCGv_vec, TCGv_i32))
+{
+    TCGv_vec t0 = tcg_temp_new_vec(type);
+    uint32_t i;
+
+    for (i = 0; i < oprsz; i += tysz) {
+        tcg_gen_ld_vec(t0, cpu_env, aofs + i);
+        fni(vece, t0, t0, shift);
+        tcg_gen_st_vec(t0, cpu_env, dofs + i);
+    }
+    tcg_temp_free_vec(t0);
+}
+
+static void
+do_gvec_shifts(unsigned vece, uint32_t dofs, uint32_t aofs,
+               TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz,
+               void (*fni4)(TCGv_i32, TCGv_i32, TCGv_i32),
+               void (*fni8)(TCGv_i64, TCGv_i64, TCGv_i64),
+               void (*fniv_s)(unsigned, TCGv_vec, TCGv_vec, TCGv_i32),
+               void (*fniv_v)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec),
+               gen_helper_gvec_2 *fno,
+               const TCGOpcode *s_list, const TCGOpcode *v_list)
+{
+    TCGType type;
+    uint32_t some;
+
+    check_size_align(oprsz, maxsz, dofs | aofs);
+    check_overlap_2(dofs, aofs, maxsz);
+
+    /* If the backend has a scalar expansion, great. */
+    type = choose_vector_type(s_list, vece, oprsz, vece == MO_64);
+    if (type) {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+        switch (type) {
+        case TCG_TYPE_V256:
+            some = QEMU_ALIGN_DOWN(oprsz, 32);
+            expand_2sh_vec(vece, dofs, aofs, some, 32,
+                           TCG_TYPE_V256, shift, fniv_s);
+            if (some == oprsz) {
+                break;
+            }
+            dofs += some;
+            aofs += some;
+            oprsz -= some;
+            maxsz -= some;
+            /* fallthru */
+        case TCG_TYPE_V128:
+            expand_2sh_vec(vece, dofs, aofs, oprsz, 16,
+                           TCG_TYPE_V128, shift, fniv_s);
+            break;
+        case TCG_TYPE_V64:
+            expand_2sh_vec(vece, dofs, aofs, oprsz, 8,
+                           TCG_TYPE_V64, shift, fniv_s);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        tcg_swap_vecop_list(hold_list);
+        goto clear_tail;
+    }
+
+    /* If the backend supports variable vector shifts, also cool. */
+    type = choose_vector_type(v_list, vece, oprsz, vece == MO_64);
+    if (type) {
+        const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+        TCGv_vec v_shift = tcg_temp_new_vec(type);
+
+        if (vece == MO_64) {
+            TCGv_i64 sh64 = tcg_temp_new_i64();
+            tcg_gen_extu_i32_i64(sh64, shift);
+            tcg_gen_dup_i64_vec(MO_64, v_shift, sh64);
+            tcg_temp_free_i64(sh64);
+        } else {
+            tcg_gen_dup_i32_vec(vece, v_shift, shift);
+        }
+
+        switch (type) {
+        case TCG_TYPE_V256:
+            some = QEMU_ALIGN_DOWN(oprsz, 32);
+            expand_2s_vec(vece, dofs, aofs, some, 32, TCG_TYPE_V256,
+                          v_shift, false, fniv_v);
+            if (some == oprsz) {
+                break;
+            }
+            dofs += some;
+            aofs += some;
+            oprsz -= some;
+            maxsz -= some;
+            /* fallthru */
+        case TCG_TYPE_V128:
+            expand_2s_vec(vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128,
+                          v_shift, false, fniv_v);
+            break;
+        case TCG_TYPE_V64:
+            expand_2s_vec(vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64,
+                          v_shift, false, fniv_v);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        tcg_temp_free_vec(v_shift);
+        tcg_swap_vecop_list(hold_list);
+        goto clear_tail;
+    }
+
+    /* Otherwise fall back to integral... */
+    if (fni4 && check_size_impl(oprsz, 4)) {
+        expand_2s_i32(dofs, aofs, oprsz, shift, false, fni4);
+    } else if (fni8 && check_size_impl(oprsz, 8)) {
+        TCGv_i64 sh64 = tcg_temp_new_i64();
+        tcg_gen_extu_i32_i64(sh64, shift);
+        expand_2s_i64(dofs, aofs, oprsz, sh64, false, fni8);
+        tcg_temp_free_i64(sh64);
+    } else {
+        TCGv_ptr a0 = tcg_temp_new_ptr();
+        TCGv_ptr a1 = tcg_temp_new_ptr();
+        TCGv_i32 desc = tcg_temp_new_i32();
+
+        tcg_gen_shli_i32(desc, shift, SIMD_DATA_SHIFT);
+        tcg_gen_ori_i32(desc, desc, simd_desc(oprsz, maxsz, 0));
+        tcg_gen_addi_ptr(a0, cpu_env, dofs);
+        tcg_gen_addi_ptr(a1, cpu_env, aofs);
+
+        fno(a0, a1, desc);
+
+        tcg_temp_free_ptr(a0);
+        tcg_temp_free_ptr(a1);
+        tcg_temp_free_i32(desc);
+        return;
+    }
+
+ clear_tail:
+    if (oprsz < maxsz) {
+        expand_clr(dofs + oprsz, maxsz - oprsz);
+    }
+}
+
+void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode scalar_list[] = { INDEX_op_shls_vec, 0 };
+    static const TCGOpcode vector_list[] = { INDEX_op_shlv_vec, 0 };
+    static gen_helper_gvec_2 * const fno[4] = {
+        gen_helper_gvec_shl8i,
+        gen_helper_gvec_shl16i,
+        gen_helper_gvec_shl32i,
+        gen_helper_gvec_shl64i,
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_gvec_shifts(vece, dofs, aofs, shift, oprsz, maxsz,
+                   vece == MO_32 ? tcg_gen_shl_i32 : NULL,
+                   vece == MO_64 ? tcg_gen_shl_i64 : NULL,
+                   tcg_gen_shls_vec, tcg_gen_shlv_vec, fno[vece],
+                   scalar_list, vector_list);
+}
+
+void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode scalar_list[] = { INDEX_op_shrs_vec, 0 };
+    static const TCGOpcode vector_list[] = { INDEX_op_shrv_vec, 0 };
+    static gen_helper_gvec_2 * const fno[4] = {
+        gen_helper_gvec_shr8i,
+        gen_helper_gvec_shr16i,
+        gen_helper_gvec_shr32i,
+        gen_helper_gvec_shr64i,
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_gvec_shifts(vece, dofs, aofs, shift, oprsz, maxsz,
+                   vece == MO_32 ? tcg_gen_shr_i32 : NULL,
+                   vece == MO_64 ? tcg_gen_shr_i64 : NULL,
+                   tcg_gen_shrs_vec, tcg_gen_shrv_vec, fno[vece],
+                   scalar_list, vector_list);
+}
+
+void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
+                       TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz)
+{
+    static const TCGOpcode scalar_list[] = { INDEX_op_sars_vec, 0 };
+    static const TCGOpcode vector_list[] = { INDEX_op_sarv_vec, 0 };
+    static gen_helper_gvec_2 * const fno[4] = {
+        gen_helper_gvec_sar8i,
+        gen_helper_gvec_sar16i,
+        gen_helper_gvec_sar32i,
+        gen_helper_gvec_sar64i,
+    };
+
+    tcg_debug_assert(vece <= MO_64);
+    do_gvec_shifts(vece, dofs, aofs, shift, oprsz, maxsz,
+                   vece == MO_32 ? tcg_gen_sar_i32 : NULL,
+                   vece == MO_64 ? tcg_gen_sar_i64 : NULL,
+                   tcg_gen_sars_vec, tcg_gen_sarv_vec, fno[vece],
+                   scalar_list, vector_list);
+}
+
 /*
  * Expand D = A << (B % element bits)
  *
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 96317dbd10..16062f5995 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -598,3 +598,57 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
     do_op3(vece, r, a, b, INDEX_op_sarv_vec);
 }
+
+static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
+                      TCGv_i32 s, TCGOpcode opc_s, TCGOpcode opc_v)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGTemp *st = tcgv_i32_temp(s);
+    TCGArg ri = temp_arg(rt);
+    TCGArg ai = temp_arg(at);
+    TCGArg si = temp_arg(st);
+    TCGType type = rt->base_type;
+    const TCGOpcode *hold_list;
+    int can;
+
+    tcg_debug_assert(at->base_type >= type);
+    tcg_assert_listed_vecop(opc_s);
+    hold_list = tcg_swap_vecop_list(NULL);
+
+    can = tcg_can_emit_vec_op(opc_s, type, vece);
+    if (can > 0) {
+        vec_gen_3(opc_s, type, vece, ri, ai, si);
+    } else if (can < 0) {
+        tcg_expand_vec_op(opc_s, type, vece, ri, ai, si);
+    } else {
+        TCGv_vec vec_s = tcg_temp_new_vec(type);
+
+        if (vece == MO_64) {
+            TCGv_i64 s64 = tcg_temp_new_i64();
+            tcg_gen_extu_i32_i64(s64, s);
+            tcg_gen_dup_i64_vec(MO_64, vec_s, s64);
+            tcg_temp_free_i64(s64);
+        } else {
+            tcg_gen_dup_i32_vec(vece, vec_s, s);
+        }
+        do_op3(vece, r, a, vec_s, opc_v);
+        tcg_temp_free_vec(vec_s);
+    }
+    tcg_swap_vecop_list(hold_list);
+}
+
+void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
+{
+    do_shifts(vece, r, a, b, INDEX_op_shls_vec, INDEX_op_shlv_vec);
+}
+
+void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
+{
+    do_shifts(vece, r, a, b, INDEX_op_shrs_vec, INDEX_op_shrv_vec);
+}
+
+void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
+{
+    do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
+}
-- 
2.17.1