From nobody Sun Dec 14 06:17:02 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail header.i=@quicinc.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=quicinc.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1625528508789275.8028091354789; Mon, 5 Jul 2021 16:41:48 -0700 (PDT) Received: from localhost ([::1]:45834 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m0YDj-0000X2-3Q for importer@patchew.org; Mon, 05 Jul 2021 19:41:47 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:35544) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m0Y77-0003kF-R7 for qemu-devel@nongnu.org; Mon, 05 Jul 2021 19:34:57 -0400 Received: from alexa-out-sd-01.qualcomm.com ([199.106.114.38]:37352) by eggs.gnu.org with esmtps (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1m0Y74-0004a6-3A for qemu-devel@nongnu.org; Mon, 05 Jul 2021 19:34:57 -0400 Received: from unknown (HELO ironmsg04-sd.qualcomm.com) ([10.53.140.144]) by alexa-out-sd-01.qualcomm.com with ESMTP; 05 Jul 2021 16:34:39 -0700 Received: from vu-tsimpson-aus.qualcomm.com (HELO vu-tsimpson1-aus.qualcomm.com) ([10.222.150.1]) by ironmsg04-sd.qualcomm.com with ESMTP; 05 Jul 2021 16:34:38 -0700 Received: by vu-tsimpson1-aus.qualcomm.com (Postfix, from userid 47164) id 11D6522A5; Mon, 5 Jul 2021 18:34:38 -0500 (CDT) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=quicinc.com; i=@quicinc.com; q=dns/txt; s=qcdkim; t=1625528094; x=1657064094; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=bLHVoIGHXBHI5Wdfp8l0N/9csUZ0BOyDvX8HNY2ytLI=; b=giIvgvG2aNS0HEsjvnJrIbJ0VNKhUk1qGDFwEL2eXokrzwo0jN1vmZVX KixOsUdicfp00fTXivQvkdeuS0jx1vNU8q0Mk28ZpgcW+RXYjrViy4D+O p8VM/w9C4JUefkrMLQt/xGIstA+CJe1iPlu7UdyYkeOAxj+6TeyXlgqk0 E=; X-QCInternal: smtphost From: Taylor Simpson To: qemu-devel@nongnu.org Subject: [PATCH 19/20] Hexagon HVX (tests/tcg/hexagon) scatter_gather test Date: Mon, 5 Jul 2021 18:34:33 -0500 Message-Id: <1625528074-19440-20-git-send-email-tsimpson@quicinc.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1625528074-19440-1-git-send-email-tsimpson@quicinc.com> References: <1625528074-19440-1-git-send-email-tsimpson@quicinc.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=199.106.114.38; envelope-from=tsimpson@qualcomm.com; helo=alexa-out-sd-01.qualcomm.com X-Spam_score_int: -40 X-Spam_score: -4.1 X-Spam_bar: ---- X-Spam_report: (-4.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ale@rev.ng, peter.maydell@linaro.org, bcain@quicinc.com, richard.henderson@linaro.org, tsimpson@quicinc.com, philmd@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1625528510273100001 Signed-off-by: Taylor Simpson --- tests/tcg/hexagon/scatter_gather.c | 1011 ++++++++++++++++++++++++++++++++= ++++ tests/tcg/hexagon/Makefile.target | 2 + 2 files changed, 1013 insertions(+) create mode 100644 tests/tcg/hexagon/scatter_gather.c diff --git a/tests/tcg/hexagon/scatter_gather.c b/tests/tcg/hexagon/scatter= _gather.c new file mode 100644 index 0000000..b93eb18 --- /dev/null +++ b/tests/tcg/hexagon/scatter_gather.c @@ -0,0 +1,1011 @@ +/* + * Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Res= erved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, see . + */ + +/* + * This example tests the HVX scatter/gather instructions + * + * See section 5.13 of the V68 HVX Programmer's Reference + * + * There are 3 main classes operations + * _16 16-bit elements and 16-bit offsets + * _32 32-bit elements and 32-bit offsets + * _16_32 16-bit elements and 32-bit offsets + * + * There are also masked and accumulate versions + */ + +#include +#include +#include +#include + +typedef long HVX_Vector __attribute__((__vector_size__(128))) + __attribute__((aligned(128))); +typedef long HVX_VectorPair __attribute__((__vector_size__(256))) + __attribute__((aligned(128))); +typedef long HVX_VectorPred __attribute__((__vector_size__(128))) + __attribute__((aligned(128))); + +#define VSCATTER_16(BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermh_128B((int)BASE, RGN, OFF, VALS) +#define VSCATTER_16_MASKED(MASK, BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermhq_128B(MASK, (int)BASE, RGN, OFF, VALS) +#define VSCATTER_32(BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermw_128B((int)BASE, RGN, OFF, VALS) +#define VSCATTER_32_MASKED(MASK, BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermwq_128B(MASK, (int)BASE, RGN, OFF, VALS) +#define VSCATTER_16_32(BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermhw_128B((int)BASE, RGN, OFF, VALS) +#define VSCATTER_16_32_MASKED(MASK, BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermhwq_128B(MASK, (int)BASE, RGN, OFF, VALS) +#define VSCATTER_16_ACC(BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermh_add_128B((int)BASE, RGN, OFF, VALS) +#define VSCATTER_32_ACC(BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermw_add_128B((int)BASE, RGN, OFF, VALS) +#define VSCATTER_16_32_ACC(BASE, RGN, OFF, VALS) \ + __builtin_HEXAGON_V6_vscattermhw_add_128B((int)BASE, RGN, OFF, VALS) + +#define VGATHER_16(DSTADDR, BASE, RGN, OFF) \ + __builtin_HEXAGON_V6_vgathermh_128B(DSTADDR, (int)BASE, RGN, OFF) +#define VGATHER_16_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \ + __builtin_HEXAGON_V6_vgathermhq_128B(DSTADDR, MASK, (int)BASE, RGN, OF= F) +#define VGATHER_32(DSTADDR, BASE, RGN, OFF) \ + __builtin_HEXAGON_V6_vgathermw_128B(DSTADDR, (int)BASE, RGN, OFF) +#define VGATHER_32_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \ + __builtin_HEXAGON_V6_vgathermwq_128B(DSTADDR, MASK, (int)BASE, RGN, OF= F) +#define VGATHER_16_32(DSTADDR, BASE, RGN, OFF) \ + __builtin_HEXAGON_V6_vgathermhw_128B(DSTADDR, (int)BASE, RGN, OFF) +#define VGATHER_16_32_MASKED(DSTADDR, MASK, BASE, RGN, OFF) \ + __builtin_HEXAGON_V6_vgathermhwq_128B(DSTADDR, MASK, (int)BASE, RGN, O= FF) + +#define VSHUFF_H(V) \ + __builtin_HEXAGON_V6_vshuffh_128B(V) +#define VSPLAT_H(X) \ + __builtin_HEXAGON_V6_lvsplath_128B(X) +#define VAND_VAL(PRED, VAL) \ + __builtin_HEXAGON_V6_vandvrt_128B(PRED, VAL) +#define VDEAL_H(V) \ + __builtin_HEXAGON_V6_vdealh_128B(V) + +int err; + +/* define the number of rows/cols in a square matrix */ +#define MATRIX_SIZE 64 + +/* define the size of the scatter buffer */ +#define SCATTER_BUFFER_SIZE (MATRIX_SIZE * MATRIX_SIZE) + +/* fake vtcm - put buffers together and force alignment */ +static struct { + unsigned short vscatter16[SCATTER_BUFFER_SIZE]; + unsigned short vgather16[MATRIX_SIZE]; + unsigned int vscatter32[SCATTER_BUFFER_SIZE]; + unsigned int vgather32[MATRIX_SIZE]; + unsigned short vscatter16_32[SCATTER_BUFFER_SIZE]; + unsigned short vgather16_32[MATRIX_SIZE]; +} vtcm __attribute__((aligned(0x10000))); + +/* declare the arrays of reference values */ +unsigned short vscatter16_ref[SCATTER_BUFFER_SIZE]; +unsigned short vgather16_ref[MATRIX_SIZE]; +unsigned int vscatter32_ref[SCATTER_BUFFER_SIZE]; +unsigned int vgather32_ref[MATRIX_SIZE]; +unsigned short vscatter16_32_ref[SCATTER_BUFFER_SIZE]; +unsigned short vgather16_32_ref[MATRIX_SIZE]; + +/* declare the arrays of offsets */ +unsigned short half_offsets[MATRIX_SIZE]; +unsigned int word_offsets[MATRIX_SIZE]; + +/* declare the arrays of values */ +unsigned short half_values[MATRIX_SIZE]; +unsigned short half_values_acc[MATRIX_SIZE]; +unsigned short half_values_masked[MATRIX_SIZE]; +unsigned int word_values[MATRIX_SIZE]; +unsigned int word_values_acc[MATRIX_SIZE]; +unsigned int word_values_masked[MATRIX_SIZE]; + +/* declare the arrays of predicates */ +unsigned short half_predicates[MATRIX_SIZE]; +unsigned int word_predicates[MATRIX_SIZE]; + +/* make this big enough for all the intrinsics */ +const size_t region_len =3D sizeof(vtcm); + +/* optionally add sync instructions */ +#define SYNC_VECTOR 1 + +static void sync_scatter(void *addr) +{ +#if SYNC_VECTOR + /* + * Do the scatter release followed by a dummy load to complete the + * synchronization. Normally the dummy load would be deferred as + * long as possible to minimize stalls. + */ + asm volatile("vmem(%0 + #0):scatter_release\n" : : "r"(addr)); + /* use volatile to force the load */ + volatile HVX_Vector vDummy =3D *(HVX_Vector *)addr; vDummy =3D vDummy; +#endif +} + +static void sync_gather(void *addr) +{ +#if SYNC_VECTOR + /* use volatile to force the load */ + volatile HVX_Vector vDummy =3D *(HVX_Vector *)addr; vDummy =3D vDummy; +#endif +} + +/* optionally print the results */ +#define PRINT_DATA 0 + +#define FILL_CHAR '.' + +/* fill vtcm scratch with ee */ +void prefill_vtcm_scratch(void) +{ + memset(&vtcm, FILL_CHAR, sizeof(vtcm)); +} + +/* create byte offsets to be a diagonal of the matrix with 16 bit elements= */ +void create_offsets_values_preds_16(void) +{ + unsigned short half_element =3D 0; + unsigned short half_element_masked =3D 0; + char letter =3D 'A'; + char letter_masked =3D '@'; + + for (int i =3D 0; i < MATRIX_SIZE; i++) { + half_offsets[i] =3D i * (2 * MATRIX_SIZE + 2); + + half_element =3D 0; + half_element_masked =3D 0; + for (int j =3D 0; j < 2; j++) { + half_element |=3D letter << j * 8; + half_element_masked |=3D letter_masked << j * 8; + } + + half_values[i] =3D half_element; + half_values_acc[i] =3D ((i % 10) << 8) + (i % 10); + half_values_masked[i] =3D half_element_masked; + + letter++; + /* reset to 'A' */ + if (letter =3D=3D 'M') { + letter =3D 'A'; + } + + half_predicates[i] =3D (i % 3 =3D=3D 0 || i % 5 =3D=3D 0) ? ~0 : 0; + } +} + +/* create byte offsets to be a diagonal of the matrix with 32 bit elements= */ +void create_offsets_values_preds_32(void) +{ + unsigned int word_element =3D 0; + unsigned int word_element_masked =3D 0; + char letter =3D 'A'; + char letter_masked =3D '&'; + + for (int i =3D 0; i < MATRIX_SIZE; i++) { + word_offsets[i] =3D i * (4 * MATRIX_SIZE + 4); + + word_element =3D 0; + word_element_masked =3D 0; + for (int j =3D 0; j < 4; j++) { + word_element |=3D letter << j * 8; + word_element_masked |=3D letter_masked << j * 8; + } + + word_values[i] =3D word_element; + word_values_acc[i] =3D ((i % 10) << 8) + (i % 10); + word_values_masked[i] =3D word_element_masked; + + letter++; + /* reset to 'A' */ + if (letter =3D=3D 'M') { + letter =3D 'A'; + } + + word_predicates[i] =3D (i % 4 =3D=3D 0 || i % 7 =3D=3D 0) ? ~0 : 0; + } +} + +/* + * create byte offsets to be a diagonal of the matrix with 16 bit elements + * and 32 bit offsets + */ +void create_offsets_values_preds_16_32(void) +{ + unsigned short half_element =3D 0; + unsigned short half_element_masked =3D 0; + char letter =3D 'D'; + char letter_masked =3D '$'; + + for (int i =3D 0; i < MATRIX_SIZE; i++) { + word_offsets[i] =3D i * (2 * MATRIX_SIZE + 2); + + half_element =3D 0; + half_element_masked =3D 0; + for (int j =3D 0; j < 2; j++) { + half_element |=3D letter << j * 8; + half_element_masked |=3D letter_masked << j * 8; + } + + half_values[i] =3D half_element; + half_values_acc[i] =3D ((i % 10) << 8) + (i % 10); + half_values_masked[i] =3D half_element_masked; + + letter++; + /* reset to 'A' */ + if (letter =3D=3D 'P') { + letter =3D 'D'; + } + + half_predicates[i] =3D (i % 2 =3D=3D 0 || i % 13 =3D=3D 0) ? ~0 : = 0; + } +} + +/* scatter the 16 bit elements using intrinsics */ +void vector_scatter_16(void) +{ + /* copy the offsets and values to vectors */ + HVX_Vector offsets =3D *(HVX_Vector *)half_offsets; + HVX_Vector values =3D *(HVX_Vector *)half_values; + + VSCATTER_16(&vtcm.vscatter16, region_len, offsets, values); + + sync_scatter(vtcm.vscatter16); +} + +/* scatter-accumulate the 16 bit elements using intrinsics */ +void vector_scatter_16_acc(void) +{ + /* copy the offsets and values to vectors */ + HVX_Vector offsets =3D *(HVX_Vector *)half_offsets; + HVX_Vector values =3D *(HVX_Vector *)half_values_acc; + + VSCATTER_16_ACC(&vtcm.vscatter16, region_len, offsets, values); + + sync_scatter(vtcm.vscatter16); +} + +/* scatter the 16 bit elements using intrinsics */ +void vector_scatter_16_masked(void) +{ + /* copy the offsets and values to vectors */ + HVX_Vector offsets =3D *(HVX_Vector *)half_offsets; + HVX_Vector values =3D *(HVX_Vector *)half_values_masked; + HVX_Vector pred_reg =3D *(HVX_Vector *)half_predicates; + HVX_VectorPred preds =3D VAND_VAL(pred_reg, ~0); + + VSCATTER_16_MASKED(preds, &vtcm.vscatter16, region_len, offsets, value= s); + + sync_scatter(vtcm.vscatter16); +} + +/* scatter the 32 bit elements using intrinsics */ +void vector_scatter_32(void) +{ + /* copy the offsets and values to vectors */ + HVX_Vector offsetslo =3D *(HVX_Vector *)word_offsets; + HVX_Vector offsetshi =3D *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2]; + HVX_Vector valueslo =3D *(HVX_Vector *)word_values; + HVX_Vector valueshi =3D *(HVX_Vector *)&word_values[MATRIX_SIZE / 2]; + + VSCATTER_32(&vtcm.vscatter32, region_len, offsetslo, valueslo); + VSCATTER_32(&vtcm.vscatter32, region_len, offsetshi, valueshi); + + sync_scatter(vtcm.vscatter32); +} + +/* scatter-acc the 32 bit elements using intrinsics */ +void vector_scatter_32_acc(void) +{ + /* copy the offsets and values to vectors */ + HVX_Vector offsetslo =3D *(HVX_Vector *)word_offsets; + HVX_Vector offsetshi =3D *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2]; + HVX_Vector valueslo =3D *(HVX_Vector *)word_values_acc; + HVX_Vector valueshi =3D *(HVX_Vector *)&word_values_acc[MATRIX_SIZE / = 2]; + + VSCATTER_32_ACC(&vtcm.vscatter32, region_len, offsetslo, valueslo); + VSCATTER_32_ACC(&vtcm.vscatter32, region_len, offsetshi, valueshi); + + sync_scatter(vtcm.vscatter32); +} + +/* scatter the 32 bit elements using intrinsics */ +void vector_scatter_32_masked(void) +{ + /* copy the offsets and values to vectors */ + HVX_Vector offsetslo =3D *(HVX_Vector *)word_offsets; + HVX_Vector offsetshi =3D *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2]; + HVX_Vector valueslo =3D *(HVX_Vector *)word_values_masked; + HVX_Vector valueshi =3D *(HVX_Vector *)&word_values_masked[MATRIX_SIZE= / 2]; + HVX_Vector pred_reglo =3D *(HVX_Vector *)word_predicates; + HVX_Vector pred_reghi =3D *(HVX_Vector *)&word_predicates[MATRIX_SIZE = / 2]; + HVX_VectorPred predslo =3D VAND_VAL(pred_reglo, ~0); + HVX_VectorPred predshi =3D VAND_VAL(pred_reghi, ~0); + + VSCATTER_32_MASKED(predslo, &vtcm.vscatter32, region_len, offsetslo, + valueslo); + VSCATTER_32_MASKED(predshi, &vtcm.vscatter32, region_len, offsetshi, + valueshi); + + sync_scatter(vtcm.vscatter16); +} + +/* scatter the 16 bit elements with 32 bit offsets using intrinsics */ +void vector_scatter_16_32(void) +{ + HVX_VectorPair offsets; + HVX_Vector values; + + /* get the word offsets in a vector pair */ + offsets =3D *(HVX_VectorPair *)word_offsets; + + /* these values need to be shuffled for the scatter */ + values =3D *(HVX_Vector *)half_values; + values =3D VSHUFF_H(values); + + VSCATTER_16_32(&vtcm.vscatter16_32, region_len, offsets, values); + + sync_scatter(vtcm.vscatter16_32); +} + +/* scatter-acc the 16 bit elements with 32 bit offsets using intrinsics */ +void vector_scatter_16_32_acc(void) +{ + HVX_VectorPair offsets; + HVX_Vector values; + + /* get the word offsets in a vector pair */ + offsets =3D *(HVX_VectorPair *)word_offsets; + + /* these values need to be shuffled for the scatter */ + values =3D *(HVX_Vector *)half_values_acc; + values =3D VSHUFF_H(values); + + VSCATTER_16_32_ACC(&vtcm.vscatter16_32, region_len, offsets, values); + + sync_scatter(vtcm.vscatter16_32); +} + +/* masked scatter the 16 bit elements with 32 bit offsets using intrinsics= */ +void vector_scatter_16_32_masked(void) +{ + HVX_VectorPair offsets; + HVX_Vector values; + HVX_Vector pred_reg; + + /* get the word offsets in a vector pair */ + offsets =3D *(HVX_VectorPair *)word_offsets; + + /* these values need to be shuffled for the scatter */ + values =3D *(HVX_Vector *)half_values_masked; + values =3D VSHUFF_H(values); + + pred_reg =3D *(HVX_Vector *)half_predicates; + pred_reg =3D VSHUFF_H(pred_reg); + HVX_VectorPred preds =3D VAND_VAL(pred_reg, ~0); + + VSCATTER_16_32_MASKED(preds, &vtcm.vscatter16_32, region_len, offsets, + values); + + sync_scatter(vtcm.vscatter16_32); +} + +/* gather the elements from the scatter16 buffer */ +void vector_gather_16(void) +{ + HVX_Vector *vgather =3D (HVX_Vector *)&vtcm.vgather16; + HVX_Vector offsets =3D *(HVX_Vector *)half_offsets; + + VGATHER_16(vgather, &vtcm.vscatter16, region_len, offsets); + + sync_gather(vgather); +} + +static unsigned short gather_16_masked_init(void) +{ + char letter =3D '?'; + return letter | (letter << 8); +} + +void vector_gather_16_masked(void) +{ + HVX_Vector *vgather =3D (HVX_Vector *)&vtcm.vgather16; + HVX_Vector offsets =3D *(HVX_Vector *)half_offsets; + HVX_Vector pred_reg =3D *(HVX_Vector *)half_predicates; + HVX_VectorPred preds =3D VAND_VAL(pred_reg, ~0); + + *vgather =3D VSPLAT_H(gather_16_masked_init()); + VGATHER_16_MASKED(vgather, preds, &vtcm.vscatter16, region_len, offset= s); + + sync_gather(vgather); +} + +/* gather the elements from the scatter32 buffer */ +void vector_gather_32(void) +{ + HVX_Vector *vgatherlo =3D (HVX_Vector *)&vtcm.vgather32; + HVX_Vector *vgatherhi =3D + (HVX_Vector *)((int)&vtcm.vgather32 + (MATRIX_SIZE * 2)); + HVX_Vector offsetslo =3D *(HVX_Vector *)word_offsets; + HVX_Vector offsetshi =3D *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2]; + + VGATHER_32(vgatherlo, &vtcm.vscatter32, region_len, offsetslo); + VGATHER_32(vgatherhi, &vtcm.vscatter32, region_len, offsetshi); + + sync_gather(vgatherhi); +} + +static unsigned int gather_32_masked_init(void) +{ + char letter =3D '?'; + return letter | (letter << 8) | (letter << 16) | (letter << 24); +} + +void vector_gather_32_masked(void) +{ + HVX_Vector *vgatherlo =3D (HVX_Vector *)&vtcm.vgather32; + HVX_Vector *vgatherhi =3D + (HVX_Vector *)((int)&vtcm.vgather32 + (MATRIX_SIZE * 2)); + HVX_Vector offsetslo =3D *(HVX_Vector *)word_offsets; + HVX_Vector offsetshi =3D *(HVX_Vector *)&word_offsets[MATRIX_SIZE / 2]; + HVX_Vector pred_reglo =3D *(HVX_Vector *)word_predicates; + HVX_VectorPred predslo =3D VAND_VAL(pred_reglo, ~0); + HVX_Vector pred_reghi =3D *(HVX_Vector *)&word_predicates[MATRIX_SIZE = / 2]; + HVX_VectorPred predshi =3D VAND_VAL(pred_reghi, ~0); + + *vgatherlo =3D VSPLAT_H(gather_32_masked_init()); + *vgatherhi =3D VSPLAT_H(gather_32_masked_init()); + VGATHER_32_MASKED(vgatherlo, predslo, &vtcm.vscatter32, region_len, + offsetslo); + VGATHER_32_MASKED(vgatherhi, predshi, &vtcm.vscatter32, region_len, + offsetshi); + + sync_gather(vgatherlo); + sync_gather(vgatherhi); +} + +/* gather the elements from the scatter16_32 buffer */ +void vector_gather_16_32(void) +{ + HVX_Vector *vgather; + HVX_VectorPair offsets; + HVX_Vector values; + + /* get the vtcm address to gather from */ + vgather =3D (HVX_Vector *)&vtcm.vgather16_32; + + /* get the word offsets in a vector pair */ + offsets =3D *(HVX_VectorPair *)word_offsets; + + VGATHER_16_32(vgather, &vtcm.vscatter16_32, region_len, offsets); + + /* deal the elements to get the order back */ + values =3D *(HVX_Vector *)vgather; + values =3D VDEAL_H(values); + + /* write it back to vtcm address */ + *(HVX_Vector *)vgather =3D values; +} + +void vector_gather_16_32_masked(void) +{ + HVX_Vector *vgather; + HVX_VectorPair offsets; + HVX_Vector pred_reg; + HVX_VectorPred preds; + HVX_Vector values; + + /* get the vtcm address to gather from */ + vgather =3D (HVX_Vector *)&vtcm.vgather16_32; + + /* get the word offsets in a vector pair */ + offsets =3D *(HVX_VectorPair *)word_offsets; + pred_reg =3D *(HVX_Vector *)half_predicates; + pred_reg =3D VSHUFF_H(pred_reg); + preds =3D VAND_VAL(pred_reg, ~0); + + *vgather =3D VSPLAT_H(gather_16_masked_init()); + VGATHER_16_32_MASKED(vgather, preds, &vtcm.vscatter16_32, region_len, + offsets); + + /* deal the elements to get the order back */ + values =3D *(HVX_Vector *)vgather; + values =3D VDEAL_H(values); + + /* write it back to vtcm address */ + *(HVX_Vector *)vgather =3D values; +} + +static void check_buffer(const char *name, void *c, void *r, size_t size) +{ + char *check =3D (char *)c; + char *ref =3D (char *)r; + for (int i =3D 0; i < size; i++) { + if (check[i] !=3D ref[i]) { + printf("ERROR %s [%d]: 0x%x (%c) !=3D 0x%x (%c)\n", name, i, + check[i], check[i], ref[i], ref[i]); + err++; + } + } +} + +/* + * These scalar functions are the C equivalents of the vector functions th= at + * use HVX + */ + +/* scatter the 16 bit elements using C */ +void scalar_scatter_16(unsigned short *vscatter16) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vscatter16[half_offsets[i] / 2] =3D half_values[i]; + } +} + +void check_scatter_16() +{ + memset(vscatter16_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); + scalar_scatter_16(vscatter16_ref); + check_buffer(__func__, vtcm.vscatter16, vscatter16_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); +} + +/* scatter the 16 bit elements using C */ +void scalar_scatter_16_acc(unsigned short *vscatter16) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vscatter16[half_offsets[i] / 2] +=3D half_values_acc[i]; + } +} + +void check_scatter_16_acc() +{ + memset(vscatter16_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); + scalar_scatter_16(vscatter16_ref); + scalar_scatter_16_acc(vscatter16_ref); + check_buffer(__func__, vtcm.vscatter16, vscatter16_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); +} + +/* scatter the 16 bit elements using C */ +void scalar_scatter_16_masked(unsigned short *vscatter16) +{ + for (int i =3D 0; i < MATRIX_SIZE; i++) { + if (half_predicates[i]) { + vscatter16[half_offsets[i] / 2] =3D half_values_masked[i]; + } + } + +} + +void check_scatter_16_masked() +{ + memset(vscatter16_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); + scalar_scatter_16(vscatter16_ref); + scalar_scatter_16_acc(vscatter16_ref); + scalar_scatter_16_masked(vscatter16_ref); + check_buffer(__func__, vtcm.vscatter16, vscatter16_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); +} + +/* scatter the 32 bit elements using C */ +void scalar_scatter_32(unsigned int *vscatter32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vscatter32[word_offsets[i] / 4] =3D word_values[i]; + } +} + +void check_scatter_32() +{ + memset(vscatter32_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned int)); + scalar_scatter_32(vscatter32_ref); + check_buffer(__func__, vtcm.vscatter32, vscatter32_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned int)); +} + +/* scatter the 32 bit elements using C */ +void scalar_scatter_32_acc(unsigned int *vscatter32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vscatter32[word_offsets[i] / 4] +=3D word_values_acc[i]; + } +} + +void check_scatter_32_acc() +{ + memset(vscatter32_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned int)); + scalar_scatter_32(vscatter32_ref); + scalar_scatter_32_acc(vscatter32_ref); + check_buffer(__func__, vtcm.vscatter32, vscatter32_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned int)); +} + +/* scatter the 32 bit elements using C */ +void scalar_scatter_32_masked(unsigned int *vscatter32) +{ + for (int i =3D 0; i < MATRIX_SIZE; i++) { + if (word_predicates[i]) { + vscatter32[word_offsets[i] / 4] =3D word_values_masked[i]; + } + } +} + +void check_scatter_32_masked() +{ + memset(vscatter32_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned int)); + scalar_scatter_32(vscatter32_ref); + scalar_scatter_32_acc(vscatter32_ref); + scalar_scatter_32_masked(vscatter32_ref); + check_buffer(__func__, vtcm.vscatter32, vscatter32_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned int)); +} + +/* scatter the 32 bit elements using C */ +void scalar_scatter_16_32(unsigned short *vscatter16_32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vscatter16_32[word_offsets[i] / 2] =3D half_values[i]; + } +} + +void check_scatter_16_32() +{ + memset(vscatter16_32_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); + scalar_scatter_16_32(vscatter16_32_ref); + check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); +} + +/* scatter the 32 bit elements using C */ +void scalar_scatter_16_32_acc(unsigned short *vscatter16_32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vscatter16_32[word_offsets[i] / 2] +=3D half_values_acc[i]; + } +} + +void check_scatter_16_32_acc() +{ + memset(vscatter16_32_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); + scalar_scatter_16_32(vscatter16_32_ref); + scalar_scatter_16_32_acc(vscatter16_32_ref); + check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); +} + +void scalar_scatter_16_32_masked(unsigned short *vscatter16_32) +{ + for (int i =3D 0; i < MATRIX_SIZE; i++) { + if (half_predicates[i]) { + vscatter16_32[word_offsets[i] / 2] =3D half_values_masked[i]; + } + } +} + +void check_scatter_16_32_masked() +{ + memset(vscatter16_32_ref, FILL_CHAR, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); + scalar_scatter_16_32(vscatter16_32_ref); + scalar_scatter_16_32_acc(vscatter16_32_ref); + scalar_scatter_16_32_masked(vscatter16_32_ref); + check_buffer(__func__, vtcm.vscatter16_32, vscatter16_32_ref, + SCATTER_BUFFER_SIZE * sizeof(unsigned short)); +} + +/* gather the elements from the scatter buffer using C */ +void scalar_gather_16(unsigned short *vgather16) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vgather16[i] =3D vtcm.vscatter16[half_offsets[i] / 2]; + } +} + +void check_gather_16() +{ + memset(vgather16_ref, 0, MATRIX_SIZE * sizeof(unsigned short)); + scalar_gather_16(vgather16_ref); + check_buffer(__func__, vtcm.vgather16, vgather16_ref, + MATRIX_SIZE * sizeof(unsigned short)); +} + +void scalar_gather_16_masked(unsigned short *vgather16) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + if (half_predicates[i]) { + vgather16[i] =3D vtcm.vscatter16[half_offsets[i] / 2]; + } + } +} + +void check_gather_16_masked() +{ + memset(vgather16_ref, gather_16_masked_init(), + MATRIX_SIZE * sizeof(unsigned short)); + scalar_gather_16_masked(vgather16_ref); + check_buffer(__func__, vtcm.vgather16, vgather16_ref, + MATRIX_SIZE * sizeof(unsigned short)); +} + +/* gather the elements from the scatter buffer using C */ +void scalar_gather_32(unsigned int *vgather32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vgather32[i] =3D vtcm.vscatter32[word_offsets[i] / 4]; + } +} + +void check_gather_32(void) +{ + memset(vgather32_ref, 0, MATRIX_SIZE * sizeof(unsigned int)); + scalar_gather_32(vgather32_ref); + check_buffer(__func__, vtcm.vgather32, vgather32_ref, + MATRIX_SIZE * sizeof(unsigned int)); +} + +void scalar_gather_32_masked(unsigned int *vgather32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + if (word_predicates[i]) { + vgather32[i] =3D vtcm.vscatter32[word_offsets[i] / 4]; + } + } +} + + +void check_gather_32_masked(void) +{ + memset(vgather32_ref, gather_32_masked_init(), + MATRIX_SIZE * sizeof(unsigned int)); + scalar_gather_32_masked(vgather32_ref); + check_buffer(__func__, vtcm.vgather32, + vgather32_ref, MATRIX_SIZE * sizeof(unsigned int)); +} + +/* gather the elements from the scatter buffer using C */ +void scalar_gather_16_32(unsigned short *vgather16_32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + vgather16_32[i] =3D vtcm.vscatter16_32[word_offsets[i] / 2]; + } +} + +void check_gather_16_32(void) +{ + memset(vgather16_32_ref, 0, MATRIX_SIZE * sizeof(unsigned short)); + scalar_gather_16_32(vgather16_32_ref); + check_buffer(__func__, vtcm.vgather16_32, vgather16_32_ref, + MATRIX_SIZE * sizeof(unsigned short)); +} + +void scalar_gather_16_32_masked(unsigned short *vgather16_32) +{ + for (int i =3D 0; i < MATRIX_SIZE; ++i) { + if (half_predicates[i]) { + vgather16_32[i] =3D vtcm.vscatter16_32[word_offsets[i] / 2]; + } + } + +} + +void check_gather_16_32_masked(void) +{ + memset(vgather16_32_ref, gather_16_masked_init(), + MATRIX_SIZE * sizeof(unsigned short)); + scalar_gather_16_32_masked(vgather16_32_ref); + check_buffer(__func__, vtcm.vgather16_32, vgather16_32_ref, + MATRIX_SIZE * sizeof(unsigned short)); +} + +/* print scatter16 buffer */ +void print_scatter16_buffer(void) +{ + if (PRINT_DATA) { + printf("\n\nPrinting the 16 bit scatter buffer"); + + for (int i =3D 0; i < SCATTER_BUFFER_SIZE; i++) { + if ((i % MATRIX_SIZE) =3D=3D 0) { + printf("\n"); + } + for (int j =3D 0; j < 2; j++) { + printf("%c", (char)((vtcm.vscatter16[i] >> j * 8) & 0xff)); + } + printf(" "); + } + printf("\n"); + } +} + +/* print the gather 16 buffer */ +void print_gather_result_16(void) +{ + if (PRINT_DATA) { + printf("\n\nPrinting the 16 bit gather result\n"); + + for (int i =3D 0; i < MATRIX_SIZE; i++) { + for (int j =3D 0; j < 2; j++) { + printf("%c", (char)((vtcm.vgather16[i] >> j * 8) & 0xff)); + } + printf(" "); + } + printf("\n"); + } +} + +/* print the scatter32 buffer */ +void print_scatter32_buffer(void) +{ + if (PRINT_DATA) { + printf("\n\nPrinting the 32 bit scatter buffer"); + + for (int i =3D 0; i < SCATTER_BUFFER_SIZE; i++) { + if ((i % MATRIX_SIZE) =3D=3D 0) { + printf("\n"); + } + for (int j =3D 0; j < 4; j++) { + printf("%c", (char)((vtcm.vscatter32[i] >> j * 8) & 0xff)); + } + printf(" "); + } + printf("\n"); + } +} + +/* print the gather 32 buffer */ +void print_gather_result_32(void) +{ + if (PRINT_DATA) { + printf("\n\nPrinting the 32 bit gather result\n"); + + for (int i =3D 0; i < MATRIX_SIZE; i++) { + for (int j =3D 0; j < 4; j++) { + printf("%c", (char)((vtcm.vgather32[i] >> j * 8) & 0xff)); + } + printf(" "); + } + printf("\n"); + } +} + +/* print the scatter16_32 buffer */ +void print_scatter16_32_buffer(void) +{ + if (PRINT_DATA) { + printf("\n\nPrinting the 16_32 bit scatter buffer"); + + for (int i =3D 0; i < SCATTER_BUFFER_SIZE; i++) { + if ((i % MATRIX_SIZE) =3D=3D 0) { + printf("\n"); + } + for (int j =3D 0; j < 2; j++) { + printf("%c", + (unsigned char)((vtcm.vscatter16_32[i] >> j * 8) & 0= xff)); + } + printf(" "); + } + printf("\n"); + } +} + +/* print the gather 16_32 buffer */ +void print_gather_result_16_32(void) +{ + if (PRINT_DATA) { + printf("\n\nPrinting the 16_32 bit gather result\n"); + + for (int i =3D 0; i < MATRIX_SIZE; i++) { + for (int j =3D 0; j < 2; j++) { + printf("%c", + (unsigned char)((vtcm.vgather16_32[i] >> j * 8) & 0= xff)); + } + printf(" "); + } + printf("\n"); + } +} + +int main() +{ + prefill_vtcm_scratch(); + + /* 16 bit elements with 16 bit offsets */ + create_offsets_values_preds_16(); + + vector_scatter_16(); + print_scatter16_buffer(); + check_scatter_16(); + + vector_gather_16(); + print_gather_result_16(); + check_gather_16(); + + vector_gather_16_masked(); + print_gather_result_16(); + check_gather_16_masked(); + + vector_scatter_16_acc(); + print_scatter16_buffer(); + check_scatter_16_acc(); + + vector_scatter_16_masked(); + print_scatter16_buffer(); + check_scatter_16_masked(); + + /* 32 bit elements with 32 bit offsets */ + create_offsets_values_preds_32(); + + vector_scatter_32(); + print_scatter32_buffer(); + check_scatter_32(); + + vector_gather_32(); + print_gather_result_32(); + check_gather_32(); + + vector_gather_32_masked(); + print_gather_result_32(); + check_gather_32_masked(); + + vector_scatter_32_acc(); + print_scatter32_buffer(); + check_scatter_32_acc(); + + vector_scatter_32_masked(); + print_scatter32_buffer(); + check_scatter_32_masked(); + + /* 16 bit elements with 32 bit offsets */ + create_offsets_values_preds_16_32(); + + vector_scatter_16_32(); + print_scatter16_32_buffer(); + check_scatter_16_32(); + + vector_gather_16_32(); + print_gather_result_16_32(); + check_gather_16_32(); + + vector_gather_16_32_masked(); + print_gather_result_16_32(); + check_gather_16_32_masked(); + + vector_scatter_16_32_acc(); + print_scatter16_32_buffer(); + check_scatter_16_32_acc(); + + vector_scatter_16_32_masked(); + print_scatter16_32_buffer(); + check_scatter_16_32_masked(); + + puts(err ? "FAIL" : "PASS"); + return err; +} diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile= .target index 6ac9ac6..bde90e7 100644 --- a/tests/tcg/hexagon/Makefile.target +++ b/tests/tcg/hexagon/Makefile.target @@ -47,11 +47,13 @@ HEX_TESTS +=3D brev HEX_TESTS +=3D load_unpack HEX_TESTS +=3D load_align HEX_TESTS +=3D vector_add_int +HEX_TESTS +=3D scatter_gather HEX_TESTS +=3D atomics HEX_TESTS +=3D fpstuff HEX_TESTS +=3D hvx_misc =20 TESTS +=3D $(HEX_TESTS) =20 +scatter_gather: CFLAGS +=3D -mhvx vector_add_int: CFLAGS +=3D -mhvx -fvectorize hvx_misc: CFLAGS +=3D -mhvx --=20 2.7.4