From nobody Sat May 30 20:13:15 2026
From: Molly Chen
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 01/14] target/riscv: rvp: Add option defines and dependency check for packed simd extension
Date: Fri, 17 Apr 2026 18:46:38 +0800
Message-Id: <20260417104652.17857-2-xiaoou@iscas.ac.cn>
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>

Co-authored-by: Yin Zhang
Co-authored-by: Dajun Huang
Co-authored-by: Zhiyuan Yang
Signed-off-by: Molly Chen
---
 target/riscv/cpu.c         |  5 +++--
 target/riscv/cpu.h         |  1 +
 target/riscv/tcg/tcg-cpu.c | 16 ++++++++++++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 72c6f4f0f1..c630faa892 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -41,7 +41,7 @@
 /* RISC-V CPU definitions */
 static const char riscv_single_letter_exts[] = "IEMAFDQCBPVH";
 const uint32_t misa_bits[] = {RVI, RVE, RVM, RVA, RVF, RVD, RVV,
-                              RVC, RVS, RVU, RVH, RVG, RVB, 0};
+                              RVC, RVS, RVU, RVH, RVG, RVB, RVP, 0};
 
 /*
  * From vector_helper.c
@@ -1172,7 +1172,8 @@ static const MISAExtInfo misa_ext_info_arr[] = {
     MISA_EXT_INFO(RVH, "h", "Hypervisor"),
     MISA_EXT_INFO(RVV, "v", "Vector operations"),
     MISA_EXT_INFO(RVG, "g", "General purpose (IMAFD_Zicsr_Zifencei)"),
-    MISA_EXT_INFO(RVB, "b", "Bit manipulation (Zba_Zbb_Zbs)")
+    MISA_EXT_INFO(RVB, "b", "Bit manipulation (Zba_Zbb_Zbs)"),
+    MISA_EXT_INFO(RVP, "x-p", "Packed-SIMD instructions")
 };
 
 static void riscv_cpu_validate_misa_mxl(RISCVCPUClass *mcc)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 4c0676ed53..e08f57d282 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -69,6 +69,7 @@ typedef struct CPUArchState CPURISCVState;
 #define RVH RV('H')
 #define RVG RV('G')
 #define RVB RV('B')
+#define RVP RV('P')
 
 extern const uint32_t misa_bits[];
 const char *riscv_get_misa_ext_name(uint32_t bit);
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index f3f7808895..4545ae721c 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -601,6 +601,11 @@ static void riscv_cpu_validate_b(RISCVCPU *cpu)
     }
 }
 
+static void riscv_cpu_validate_p(RISCVCPU *cpu)
+{
+    /* Enable sub-extensions here. Do nothing for now. */
+}
+
 /*
  * Check consistency between chosen extensions while setting
  * cpu->cfg accordingly.
@@ -619,6 +624,10 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
         riscv_cpu_validate_b(cpu);
     }
 
+    if (riscv_has_ext(env, RVP)) {
+        riscv_cpu_validate_p(cpu);
+    }
+
     if (riscv_has_ext(env, RVI) && riscv_has_ext(env, RVE)) {
         error_setg(errp,
                    "I and E extensions are incompatible");
@@ -683,6 +692,12 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
         return;
     }
 
+    if (riscv_has_ext(env, RVP) &&
+        !(cpu->cfg.ext_zba && cpu->cfg.ext_zbb && cpu->cfg.ext_zbkb)) {
+        error_setg(errp, "P extension requires zba, zbb and zbkb extensions");
+        return;
+    }
+
     riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
@@ -1413,6 +1428,7 @@ static const RISCVCPUMisaExtConfig misa_ext_cfgs[] = {
     MISA_CFG(RVV, false),
     MISA_CFG(RVG, false),
     MISA_CFG(RVB, false),
+    MISA_CFG(RVP, false),
 };
 
 /*
-- 
2.34.1

From nobody Sat May 30 20:13:15 2026
From: Molly Chen
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 02/14] target/riscv: rvp: add arithmetic instructions, including saturating and non-saturating operations
Date: Fri, 17 Apr 2026 18:46:39 +0800
Message-Id: <20260417104652.17857-3-xiaoou@iscas.ac.cn>
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>

Signed-off-by: Molly Chen
---
 target/riscv/helper.h                   |   40 +
 target/riscv/insn32.decode              |   65 ++
 target/riscv/insn_trans/trans_rvp.c.inc |  564 ++++++++++++
 target/riscv/meson.build                |    3 +-
 target/riscv/psimd_helper.c             | 1069 +++++++++++++++++++++++
 target/riscv/translate.c                |    1 +
 6 files changed, 1741 insertions(+), 1 deletion(-)
 create mode 100644 target/riscv/insn_trans/trans_rvp.c.inc
 create mode 100644 target/riscv/psimd_helper.c

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 54d2331966..76bc6583fb 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1351,3 +1351,43 @@ DEF_HELPER_4(vsm4r_vs, void, ptr, ptr, env, i32)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_1(ssamoswap_disabled, void, env)
 #endif
+
+/* Packed SIMD */
+DEF_HELPER_3(padd_b, tl, env, tl, tl)
+DEF_HELPER_3(padd_h, tl, env, tl, tl)
+DEF_HELPER_3(padd_w, i64, env, i64, i64)
+DEF_HELPER_3(padd_bs, tl, env, tl, tl)
+DEF_HELPER_3(padd_hs, tl, env, tl, tl)
+DEF_HELPER_3(padd_ws, i64, env, i64, i64)
+DEF_HELPER_3(psub_b, tl, env, tl, tl)
+DEF_HELPER_3(psub_h, tl, env, tl, tl)
+DEF_HELPER_3(psub_w, i64, env, i64, i64)
+DEF_HELPER_3(psh1add_h, tl, env, tl, tl)
+DEF_HELPER_3(psh1add_w, i64, env, i64, i64)
+DEF_HELPER_3(pssh1sadd_h, tl, env, tl, tl)
+DEF_HELPER_3(pssh1sadd_w, i64, env, i64, i64)
+DEF_HELPER_3(ssh1sadd, i32, env, i32, i32)
+DEF_HELPER_3(psadd_b, tl, env, tl, tl)
+DEF_HELPER_3(psadd_h, tl, env, tl, tl)
+DEF_HELPER_3(psadd_w, i64, env, i64, i64)
+DEF_HELPER_3(psaddu_b, tl, env, tl, tl)
+DEF_HELPER_3(psaddu_h, tl, env, tl, tl)
+DEF_HELPER_3(psaddu_w, i64, env, i64, i64)
+DEF_HELPER_3(sadd, i32, env, i32, i32)
+DEF_HELPER_3(saddu, i32, env, i32, i32)
+DEF_HELPER_3(pssub_b, tl, env, tl, tl)
+DEF_HELPER_3(pssub_h, tl, env, tl, tl)
+DEF_HELPER_3(pssub_w, i64, env, i64, i64)
+DEF_HELPER_3(pssubu_b, tl, env, tl, tl)
+DEF_HELPER_3(pssubu_h, tl, env, tl, tl)
+DEF_HELPER_3(pssubu_w, i64, env, i64, i64)
+DEF_HELPER_3(ssub, i32, env, i32, i32)
+DEF_HELPER_3(ssubu, i32, env, i32, i32)
+DEF_HELPER_3(psati_h, tl, env, tl, tl)
+DEF_HELPER_3(pusati_h, tl, env, tl, tl)
+DEF_HELPER_3(psati_w, i64, env, i64, i64)
+DEF_HELPER_3(pusati_w, i64, env, i64, i64)
+DEF_HELPER_3(sati_32, i32, env, i32, i32)
+DEF_HELPER_3(usati_32, i32, env, i32, i32)
+DEF_HELPER_3(sati_64, i64, env, i64, i64)
+DEF_HELPER_3(usati_64, i64, env, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 6e35c4b1e6..6043eb39cf 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -40,6 +40,9 @@
 %imm_z6 26:1 15:5
 %imm_mop5 30:1 26:2 20:2
 %imm_mop3 30:1 26:2
+%imm_p_ui16 20:4
+%imm_p_ui32 20:5
+%imm_p_ui64 20:6
 
 # Argument sets:
 &empty
@@ -105,6 +108,10 @@
 @mop5 . . .. .. .... .. ..... ... ..... ....... &mop5 imm=%imm_mop5 %rd %rs1
 @mop3 . . .. .. . ..... ..... ... ..... ....... &mop3 imm=%imm_mop3 %rd %rs1 %rs2
 
+@p_ui16 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui16 %rs1 %rd
+@p_ui32 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui32 %rs1 %rd
+@p_ui64 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui64 %rs1 %rd
+
 # Formats 64:
 @sh5 ....... ..... ..... ... ..... ....... &shift shamt=%sh5 %rs1 %rd
 
@@ -1084,3 +1091,61 @@ sb_aqrl 00111 . . ..... ..... 000 ..... 0101111 @atom_st
 sh_aqrl 00111 . . ..... ..... 001 ..... 0101111 @atom_st
 sw_aqrl 00111 . . ..... ..... 010 ..... 0101111 @atom_st
 sd_aqrl 00111 . . ..... ..... 011 ..... 0101111 @atom_st
+
+
+# *** P Experimental Extension Version v018 ***
+# Arithmetic Operations(Non-Saturating and Saturating)
+padd_b 1000010 ..... ..... 000 ..... 0111011 @r
+padd_h 1000000 ..... ..... 000 ..... 0111011 @r
+padd_w 1000001 ..... ..... 000 ..... 0111011 @r
+padd_bs 1001110 ..... ..... 010 ..... 0011011 @r
+padd_hs 1001100 ..... ..... 010 ..... 0011011 @r
+padd_ws 1001101 ..... ..... 010 ..... 0011011 @r
+psub_b 1100010 ..... ..... 000 ..... 0111011 @r
+psub_h 1100000 ..... ..... 000 ..... 0111011 @r
+psub_w 1100001 ..... ..... 000 ..... 0111011 @r
+psh1add_h 1010000 ..... ..... 010 ..... 0111011 @r
+psh1add_w 1010001 ..... ..... 010 ..... 0111011 @r
+pssh1sadd_h 1011000 ..... ..... 010 ..... 0111011 @r
+{
+  ssh1sadd 1011001 ..... ..... 010 ..... 0111011 @r
+  pssh1sadd_w 1011001 ..... ..... 010 ..... 0111011 @r
+}
+psadd_b 1001010 ..... ..... 000 ..... 0111011 @r
+psadd_h 1001000 ..... ..... 000 ..... 0111011 @r
+{
+  sadd 1001001 ..... ..... 000 ..... 0111011 @r
+  psadd_w 1001001 ..... ..... 000 ..... 0111011 @r
+}
+psaddu_b 1011010 ..... ..... 000 ..... 0111011 @r
+psaddu_h 1011000 ..... ..... 000 ..... 0111011 @r
+{
+  saddu 1011001 ..... ..... 000 ..... 0111011 @r
+  psaddu_w 1011001 ..... ..... 000 ..... 0111011 @r
+}
+pssub_b 1101010 ..... ..... 000 ..... 0111011 @r
+pssub_h 1101000 ..... ..... 000 ..... 0111011 @r
+{
+  ssub 1101001 ..... ..... 000 ..... 0111011 @r
+  pssub_w 1101001 ..... ..... 000 ..... 0111011 @r
+}
+pssubu_b 1111010 ..... ..... 000 ..... 0111011 @r
+pssubu_h 1111000 ..... ..... 000 ..... 0111011 @r
+{
+  ssubu 1111001 ..... ..... 000 ..... 0111011 @r
+  pssubu_w 1111001 ..... ..... 000 ..... 0111011 @r
+}
+psati_h 11100 001.... ..... 100 ..... 0011011 @p_ui16
+pusati_h 10100 001.... ..... 100 ..... 0011011 @p_ui16
+{
+  sati_32 11100 01..... ..... 100 ..... 0011011 @p_ui32
+  psati_w 11100 01..... ..... 100 ..... 0011011 @p_ui32
+}
+{
+  usati_32 10100 01..... ..... 100 ..... 0011011 @p_ui32
+  pusati_w 10100 01..... ..... 100 ..... 0011011 @p_ui32
+}
+
+sati_64 111001 ...... ..... 100 ..... 0011011 @p_ui64
+usati_64 101001 ...... ..... 100 ..... 0011011 @p_ui64
+
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
new file mode 100644
index 0000000000..6f7246b563
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -0,0 +1,564 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* RISC-V translation routines for the P Standard Extensions. */
+/* Copyright (c) 2026 ISRC ISCAS. */
+
+#define GEN_SIMD_TRANS(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1, src2); \
+    return true; \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_32(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1, src2); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_32(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1, src2); \
+    return true; \
+}
+#endif
+
+#define GEN_SIMD_TRANS_ACC(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    TCGv t = tcg_temp_new(); \
+    gen_helper_##NAME(t, tcg_env, src1, src2, dest); \
+    gen_set_gpr(ctx, a->rd, t); \
+    return true; \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_ACC_32(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    TCGv t = tcg_temp_new(); \
+    gen_helper_##NAME(t, tcg_env, src1, src2, dest); \
+    gen_set_gpr(ctx, a->rd, t); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_ACC_32(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_ACC_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_ACC_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    TCGv t = tcg_temp_new(); \
+    gen_helper_##NAME(t, tcg_env, src1, src2, dest); \
+    gen_set_gpr(ctx, a->rd, t); \
+    return true; \
+}
+#endif
+
+#define GEN_SIMD_TRANS_R1(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1); \
+    gen_set_gpr(ctx, a->rd, dest); \
+    return true; \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_R1_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_R1_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1); \
+    return true; \
+}
+#endif
+
+#define GEN_SIMD_TRANS_IMM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv imm = tcg_constant_tl(a->imm); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1, imm); \
+    return true; \
+}
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_IMM_32(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv imm = tcg_constant_tl(a->imm); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1, imm); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_IMM_32(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_IMM_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_IMM_64(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_64BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv imm = tcg_constant_tl(a->imm); \
+    TCGv dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, src1, imm); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_1(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv_i32 src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv_i32 src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv_i64 t = tcg_temp_new_i64(); \
+    gen_helper_##NAME(t, tcg_env, src1, src2); \
+    set_pair_regs(ctx, (a->rd) * 2, t); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_1(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_2(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv src2_0 = get_gpr(ctx, (a->rs2) * 2, EXT_NONE); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv src2_1 = get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0); \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_2(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_3(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2); \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_3(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_DW(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv src2_0 = get_gpr(ctx, (a->rs2) * 2, EXT_NONE); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv src2_1 = get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0); \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_DW(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv imm_0 = tcg_constant_tl(a->imm); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv imm_1 = tcg_constant_tl(a->imm); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0); \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv imm_0 = tcg_constant_tl(a->imm); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv imm_1 = tcg_constant_tl(a->imm); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    gen_helper_##HELPER##_32(dest_0, tcg_env, src1_0, imm_0); \
+    gen_helper_##HELPER##_32(dest_1, tcg_env, src1_1, imm_1); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_5(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0); \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1); \
+    gen_set_gpr(ctx, (a->rd) * 2, dest_0); \
+    gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_5(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_IMM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv_i32 imm = tcg_constant_i32(a->imm); \
+    TCGv_i64 t = tcg_temp_new_i64(); \
+    gen_helper_##NAME(t, tcg_env, src1, imm); \
+    set_pair_regs(ctx, (a->rd) * 2, t); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_IMM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_IMM_2(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1_0 = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+    TCGv imm_0 = tcg_constant_tl(a->imm); \
+    TCGv dest_0 = dest_gpr(ctx, (a->rd) * 2); \
+    TCGv src1_1 = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    TCGv imm_1 = tcg_constant_tl(a->imm); \
+    TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \
+    gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0); \
+    gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_IMM_2(INSN, HELPER) \
+static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_ACC_REG_PAIR_1(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \
+    TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv_i64 t = tcg_temp_new_i64(); \
+    if (a->rd == 0) { \
+        tcg_gen_movi_i64(t, 0); \
+    } else { \
+        get_pair_regs(ctx, t, (a->rd) * 2); \
+    } \
+    gen_helper_##NAME(t, tcg_env, src1, src2, t); \
+    set_pair_regs(ctx, (a->rd) * 2, t); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_ACC_REG_PAIR_1(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_REG_PAIR_PREDSUM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv_i32 src1_l; \
+    TCGv_i32 src1_h; \
+    TCGv_i32 src2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv_i32 dest = dest_gpr(ctx, a->rd); \
+    if (a->rs1 == 0) { \
+        src1_l = tcg_temp_new_i32(); \
+        src1_h = tcg_temp_new_i32(); \
+        tcg_gen_movi_i32(src1_l, 0); \
+        tcg_gen_movi_i32(src1_h, 0); \
+    } else { \
+        src1_l = get_gpr(ctx, (a->rs1) * 2, EXT_NONE); \
+        src1_h = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \
+    } \
+    gen_helper_##NAME(dest, tcg_env, src1_l, src1_h, src2); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_REG_PAIR_PREDSUM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_PN_OP_IMM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv_i64 s1 = tcg_temp_new_i64(); \
+    if (a->rs1 == 0) { \
+        tcg_gen_movi_i64(s1, 0); \
+    } else { \
+        get_pair_regs(ctx, s1, a->rs1 * 2); \
+    } \
+    TCGv shamt = tcg_constant_tl(a->imm); \
+    TCGv_i32 dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, s1, shamt); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_PN_OP_IMM(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+#if defined(TARGET_RISCV32)
+#define GEN_SIMD_TRANS_PN_OP_REG(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    REQUIRE_EXT(ctx, RVP); \
+    TCGv_i64 s1 = tcg_temp_new_i64(); \
+    if (a->rs1 == 0) { \
+        tcg_gen_movi_i64(s1, 0); \
+    } else { \
+        get_pair_regs(ctx, s1, a->rs1 * 2); \
+    } \
+    TCGv_i32 rs2 = get_gpr(ctx, a->rs2, EXT_NONE); \
+    TCGv_i32 dest = dest_gpr(ctx, a->rd); \
+    gen_helper_##NAME(dest, tcg_env, s1, rs2); \
+    return true; \
+}
+#else
+#define GEN_SIMD_TRANS_PN_OP_REG(NAME) \
+static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+{ \
+    REQUIRE_32BIT(ctx); \
+    return true; \
+}
+#endif
+
+GEN_SIMD_TRANS(padd_b)
+GEN_SIMD_TRANS(padd_h)
+GEN_SIMD_TRANS_64(padd_w)
+GEN_SIMD_TRANS(padd_bs)
+GEN_SIMD_TRANS(padd_hs)
+GEN_SIMD_TRANS_64(padd_ws)
+GEN_SIMD_TRANS(psub_b)
+GEN_SIMD_TRANS(psub_h)
+GEN_SIMD_TRANS_64(psub_w)
+GEN_SIMD_TRANS(psh1add_h)
+GEN_SIMD_TRANS_64(psh1add_w)
+GEN_SIMD_TRANS(pssh1sadd_h)
+GEN_SIMD_TRANS_64(pssh1sadd_w)
+GEN_SIMD_TRANS_32(ssh1sadd)
+GEN_SIMD_TRANS(psadd_b)
+GEN_SIMD_TRANS(psadd_h)
+GEN_SIMD_TRANS_64(psadd_w)
+GEN_SIMD_TRANS(psaddu_b)
+GEN_SIMD_TRANS(psaddu_h)
+GEN_SIMD_TRANS_64(psaddu_w)
+GEN_SIMD_TRANS_32(sadd)
+GEN_SIMD_TRANS_32(saddu)
+GEN_SIMD_TRANS(pssub_b)
+GEN_SIMD_TRANS(pssub_h) +GEN_SIMD_TRANS_64(pssub_w) +GEN_SIMD_TRANS(pssubu_b) +GEN_SIMD_TRANS(pssubu_h) +GEN_SIMD_TRANS_64(pssubu_w) +GEN_SIMD_TRANS_32(ssub) +GEN_SIMD_TRANS_32(ssubu) +GEN_SIMD_TRANS_IMM(psati_h) +GEN_SIMD_TRANS_IMM(pusati_h) +GEN_SIMD_TRANS_IMM_64(psati_w) +GEN_SIMD_TRANS_IMM_64(pusati_w) +GEN_SIMD_TRANS_IMM_32(sati_32) +GEN_SIMD_TRANS_IMM_32(usati_32) +GEN_SIMD_TRANS_IMM_64(sati_64) +GEN_SIMD_TRANS_IMM_64(usati_64) diff --git a/target/riscv/meson.build b/target/riscv/meson.build index 79f36abd63..45ed7f8d8a 100644 --- a/target/riscv/meson.build +++ b/target/riscv/meson.build @@ -28,7 +28,8 @@ riscv_ss.add(files( 'm128_helper.c', 'crypto_helper.c', 'zce_helper.c', - 'vcrypto_helper.c' + 'vcrypto_helper.c', + 'psimd_helper.c' )) =20 riscv_system_ss =3D ss.source_set() diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c new file mode 100644 index 0000000000..a754ee3b5e --- /dev/null +++ b/target/riscv/psimd_helper.c @@ -0,0 +1,1069 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* RISC-V Packed SIMD Extension Helpers for QEMU. */ +/* Copyright (C) 2026 ISRC ISCAS. 
*/ + +#include "qemu/osdep.h" +#include "cpu.h" +#include "qemu/host-utils.h" +#include "exec/helper-proto.h" +#include "fpu/softfloat.h" +#include "internals.h" + + +/* Helper macros */ + +/* Element count calculations */ +#define ELEMS_B(target) (sizeof(target) * 8 / 8) /* byte elements count= */ +#define ELEMS_H(target) (sizeof(target) * 8 / 16) +#define ELEMS_W(target) (sizeof(target) * 8 / 32) /* word elements count= */ + +/* Element extraction macros - unsigned to avoid sign extension */ +#define EXTRACT8(val, idx) (((val) >> ((idx) * 8)) & 0xFF) +#define EXTRACT16(val, idx) (((val) >> ((idx) * 16)) & 0xFFFF) +#define EXTRACT32(val, idx) (((val) >> ((idx) * 32)) & 0xFFFFFFFF) + +/* Element insertion macros */ +#define INSERT8(val, res, idx) \ + ((val) | ((target_ulong)(uint8_t)(res) << ((idx) * 8))) +#define INSERT16(val, res, idx) \ + ((val) | ((target_ulong)(uint16_t)(res) << ((idx) * 16))) +#define INSERT32(val, res, idx) \ + ((val) | ((target_ulong)(uint32_t)(res) << ((idx) * 32))) + +/* Saturation constants */ +static const int8_t SAT_MAX_B =3D 127; +static const int8_t SAT_MIN_B =3D -128; +static const int16_t SAT_MAX_H =3D 32767; +static const int16_t SAT_MIN_H =3D -32768; +static const int32_t SAT_MAX_W =3D 2147483647; +static const int32_t SAT_MIN_W =3D -2147483648LL; +static const uint8_t USAT_MAX_B =3D 255; +static const uint16_t USAT_MAX_H =3D 65535; +static const uint32_t USAT_MAX_W =3D 4294967295U; + + +/* Saturation helper functions */ + +/** + * Signed saturation for 8-bit elements + * Returns saturated value and sets *sat if saturation occurred + */ +static inline int8_t signed_saturate_b(int32_t val, int *sat) +{ + if (val > SAT_MAX_B) { + *sat =3D 1; + return SAT_MAX_B; + } + if (val < SAT_MIN_B) { + *sat =3D 1; + return SAT_MIN_B; + } + return (int8_t)val; +} + +/** + * Signed saturation for 16-bit elements + */ +static inline int16_t signed_saturate_h(int32_t val, int *sat) +{ + if (val > SAT_MAX_H) { + *sat =3D 1; + return SAT_MAX_H; + } 
+ if (val < SAT_MIN_H) { + *sat =3D 1; + return SAT_MIN_H; + } + return (int16_t)val; +} + +/** + * Signed saturation for 32-bit elements + */ +static inline int32_t signed_saturate_w(int64_t val, int *sat) +{ + if (val > SAT_MAX_W) { + *sat =3D 1; + return SAT_MAX_W; + } + if (val < SAT_MIN_W) { + *sat =3D 1; + return SAT_MIN_W; + } + return (int32_t)val; +} + +/** + * Unsigned saturation for 8-bit elements + */ +static inline uint8_t unsigned_saturate_b(uint32_t val, int *sat) +{ + if (val > USAT_MAX_B) { + *sat =3D 1; + return USAT_MAX_B; + } + return (uint8_t)val; +} + +/** + * Unsigned saturation for 16-bit elements + */ +static inline uint16_t unsigned_saturate_h(uint32_t val, int *sat) +{ + if (val > USAT_MAX_H) { + *sat =3D 1; + return USAT_MAX_H; + } + return (uint16_t)val; +} + +/** + * Unsigned saturation for 32-bit elements + */ +static inline uint32_t unsigned_saturate_w(uint64_t val, int *sat) +{ + if (val > USAT_MAX_W) { + *sat =3D 1; + return USAT_MAX_W; + } + return (uint32_t)val; +} + +/* Basic addition operations (non-saturating) */ + +/** + * PADD.B - Packed 8-bit addition + * For each byte: rd[i] =3D rs1[i] + rs2[i] (modular) + */ +target_ulong HELPER(padd_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D e1 + e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PADD.H - Packed 16-bit addition + * For each halfword: rd[i] =3D rs1[i] + rs2[i] (modular) + */ +target_ulong HELPER(padd_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D e1 + e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PADD.W - Packed 32-bit addition (RV64 only) 
+ * For each word: rd[i] =3D rs1[i] + rs2[i] (modular) + */ +uint64_t HELPER(padd_w)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; /* 2 words in 64-bit */ + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D e1 + e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PADD.BS - Packed 8-bit addition with scalar second operand + * For each byte: rd[i] =3D rs1[i] + rs2[0] (modular) + */ +target_ulong HELPER(padd_bs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t e2 =3D EXTRACT8(rs2, 0); /* Scalar, take least significant by= te */ + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t res =3D e1 + e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PADD.HS - Packed 16-bit addition with scalar second operand + * For each halfword: rd[i] =3D rs1[i] + rs2[0] (modular) + */ +target_ulong HELPER(padd_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint16_t e2 =3D EXTRACT16(rs2, 0); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t res =3D e1 + e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PADD.WS - Packed 32-bit addition with scalar second operand (RV64 only) + * For each word: rd[i] =3D rs1[i] + rs2[0] (modular) + */ +uint64_t HELPER(padd_ws)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint32_t e2 =3D EXTRACT32(rs2, 0); + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t res =3D e1 + e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + + +/* Basic subtraction operations (non-saturating) */ + +/** + * PSUB.B - Packed 8-bit subtraction + * For each byte: rd[i] =3D rs1[i] - rs2[i] (modular) + */ 
+target_ulong HELPER(psub_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D e1 - e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PSUB.H - Packed 16-bit subtraction + * For each halfword: rd[i] =3D rs1[i] - rs2[i] (modular) + */ +target_ulong HELPER(psub_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D e1 - e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PSUB.W - Packed 32-bit subtraction (RV64 only) + * For each word: rd[i] =3D rs1[i] - rs2[i] (modular) + */ +uint64_t HELPER(psub_w)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D e1 - e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/* Shift-left-by-one and add operations */ + +/** + * PSH1ADD.H - Shift left by 1 and add (16-bit) + * For each halfword: rd[i] =3D (rs1[i] << 1) + rs2[i] + */ +target_ulong HELPER(psh1add_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D (e1 << 1) + e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PSH1ADD.W - Shift left by 1 and add (32-bit, RV64 only) + * For each word: rd[i] =3D (rs1[i] << 1) + rs2[i] + */ +uint64_t HELPER(psh1add_w)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) 
{ + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D (e1 << 1) + e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PSSH1SADD.H - Saturating shift left by 1 and saturating add (16-bit) + * For each halfword: rd[i] =3D sat16(sat16(rs1[i] << 1) + rs2[i]) + */ +target_ulong HELPER(pssh1sadd_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t shifted; + + /* Check if shift-left-1 would overflow */ + if (e1 > 0x3FFF || e1 < -0x4000) { + shifted =3D (e1 < 0) ? 0xFFFF8000LL : 0x7FFF; + sat =3D 1; + } else { + shifted =3D e1 << 1; + } + + int32_t sum =3D shifted + e2; + int16_t res =3D signed_saturate_h(sum, &sat); + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSH1SADD.W - Saturating shift left by 1 and add + * with saturation (32-bit, RV64 only) + * For each word: rd[i] =3D sat32(sat32(rs1[i] << 1) + rs2[i]) + */ +uint64_t HELPER(pssh1sadd_w)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int64_t shifted; + + /* Check if shift-left-1 would overflow */ + if (e1 > 0x3FFFFFFF || e1 < -0x40000000) { + shifted =3D (e1 < 0) ? 
0xFFFFFFFF80000000LL : 0x7FFFFFFF; + sat =3D 1; + } else { + shifted =3D (int64_t)e1 << 1; + } + + int64_t sum =3D shifted + e2; + int32_t res =3D signed_saturate_w(sum, &sat); + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * SSH1SADD - 32-bit scalar saturating shift left by 1 and saturating add + */ +uint32_t HELPER(ssh1sadd)(CPURISCVState *env, + uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + int64_t shifted; + int sat =3D 0; + + /* Check if shift-left-1 would overflow */ + if (a > 0x3FFFFFFF || a < -0x40000000) { + shifted =3D (a < 0) ? 0xFFFFFFFF80000000LL : 0x7FFFFFFF; + sat =3D 1; + } else { + shifted =3D (int64_t)a << 1; + } + + int64_t sum =3D shifted + b; + int32_t res =3D signed_saturate_w(sum, &sat); + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)res; +} + +/* Saturating addition operations */ + +/** + * PSADD.B - Packed 8-bit signed saturating addition + * For each byte: rd[i] =3D sat8(rs1[i] + rs2[i]) + */ +target_ulong HELPER(psadd_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int32_t sum =3D (int32_t)e1 + (int32_t)e2; + int8_t res =3D signed_saturate_b(sum, &sat); + rd =3D INSERT8(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSADD.H - Packed 16-bit signed saturating addition + * For each halfword: rd[i] =3D sat16(rs1[i] + rs2[i]) + */ +target_ulong HELPER(psadd_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t sum =3D (int32_t)e1 + (int32_t)e2; + int16_t res =3D 
signed_saturate_h(sum, &sat); + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSADD.W - Packed 32-bit signed saturating addition (RV64 only) + * For each word: rd[i] =3D sat32(rs1[i] + rs2[i]) + */ +uint64_t HELPER(psadd_w)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int64_t sum =3D (int64_t)e1 + (int64_t)e2; + int32_t res =3D signed_saturate_w(sum, &sat); + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSADDU.B - Packed 8-bit unsigned saturating addition + * For each byte: rd[i] =3D usat8(rs1[i] + rs2[i]) + */ +target_ulong HELPER(psaddu_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint32_t sum =3D (uint32_t)e1 + (uint32_t)e2; + uint8_t res =3D unsigned_saturate_b(sum, &sat); + rd =3D INSERT8(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSADDU.H - Packed 16-bit unsigned saturating addition + * For each halfword: rd[i] =3D usat16(rs1[i] + rs2[i]) + */ +target_ulong HELPER(psaddu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint32_t sum =3D (uint32_t)e1 + (uint32_t)e2; + uint16_t res =3D unsigned_saturate_h(sum, &sat); + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSADDU.W - Packed 32-bit unsigned saturating addition (RV64 only) + * For each word: rd[i] =3D usat32(rs1[i] + 
rs2[i]) + */ +uint64_t HELPER(psaddu_w)(CPURISCVState *env, + uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint64_t sum =3D (uint64_t)e1 + (uint64_t)e2; + uint32_t res =3D unsigned_saturate_w(sum, &sat); + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * SADD - 32-bit signed saturating addition + */ +uint32_t HELPER(sadd)(CPURISCVState *env, + uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + int64_t sum =3D (int64_t)a + (int64_t)b; + int sat =3D 0; + int32_t res =3D signed_saturate_w(sum, &sat); + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)res; +} + +/** + * SADDU - 32-bit unsigned saturating addition + */ +uint32_t HELPER(saddu)(CPURISCVState *env, + uint32_t rs1, uint32_t rs2) +{ + uint32_t a =3D rs1; + uint32_t b =3D rs2; + uint64_t sum =3D (uint64_t)a + (uint64_t)b; + int sat =3D 0; + uint32_t res =3D unsigned_saturate_w(sum, &sat); + + if (sat) { + env->vxsat =3D 1; + } + return res; +} + +/* Saturating subtraction operations */ + +/** + * PSSUB.B - Packed 8-bit signed saturating subtraction + * For each byte: rd[i] =3D sat8(rs1[i] - rs2[i]) + */ +target_ulong HELPER(pssub_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int32_t diff =3D (int32_t)e1 - (int32_t)e2; + int8_t res =3D signed_saturate_b(diff, &sat); + rd =3D INSERT8(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSUB.H - Packed 16-bit signed saturating subtraction + * For each halfword: rd[i] =3D sat16(rs1[i] - rs2[i]) + */ +target_ulong HELPER(pssub_h)(CPURISCVState *env, + target_ulong rs1, 
target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+    int sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        int16_t e1 = (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 = (int16_t)EXTRACT16(rs2, i);
+        int32_t diff = (int32_t)e1 - (int32_t)e2;
+        int16_t res = signed_saturate_h(diff, &sat);
+        rd = INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUB.W - Packed 32-bit signed saturating subtraction (RV64 only)
+ * For each word: rd[i] = sat32(rs1[i] - rs2[i])
+ */
+uint64_t HELPER(pssub_w)(CPURISCVState *env,
+                         uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+    int sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        int32_t e1 = (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 = (int32_t)EXTRACT32(rs2, i);
+        int64_t diff = (int64_t)e1 - (int64_t)e2;
+        int32_t res = signed_saturate_w(diff, &sat);
+        rd = INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUBU.B - Packed 8-bit unsigned saturating subtraction
+ * For each byte: rd[i] = usat8(rs1[i] - rs2[i])
+ */
+target_ulong HELPER(pssubu_b)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_B(rd);
+    int sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        uint8_t e1 = EXTRACT8(rs1, i);
+        uint8_t e2 = EXTRACT8(rs2, i);
+        uint8_t res = (e1 >= e2) ? (e1 - e2) : 0; /* clamp underflow to 0 */
+        sat |= (e1 < e2);
+        rd = INSERT8(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUBU.H - Packed 16-bit unsigned saturating subtraction
+ * For each halfword: rd[i] = usat16(rs1[i] - rs2[i])
+ */
+target_ulong HELPER(pssubu_h)(CPURISCVState *env,
+                              target_ulong rs1, target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+    int sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t e1 = EXTRACT16(rs1, i);
+        uint16_t e2 =
EXTRACT16(rs2, i);
+        uint16_t res = (e1 >= e2) ? (e1 - e2) : 0; /* clamp underflow to 0 */
+        sat |= (e1 < e2);
+        rd = INSERT16(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PSSUBU.W - Packed 32-bit unsigned saturating subtraction (RV64 only)
+ * For each word: rd[i] = usat32(rs1[i] - rs2[i])
+ */
+uint64_t HELPER(pssubu_w)(CPURISCVState *env,
+                          uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+    int sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        uint32_t e1 = EXTRACT32(rs1, i);
+        uint32_t e2 = EXTRACT32(rs2, i);
+        uint32_t res = (e1 >= e2) ? (e1 - e2) : 0;
+        if (e1 < e2) {
+            sat = 1;
+        }
+        rd = INSERT32(rd, res, i);
+    }
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * SSUB - 32-bit signed saturating subtraction
+ */
+uint32_t HELPER(ssub)(CPURISCVState *env,
+                      uint32_t rs1, uint32_t rs2)
+{
+    int32_t a = (int32_t)rs1;
+    int32_t b = (int32_t)rs2;
+    int64_t diff = (int64_t)a - (int64_t)b;
+    int sat = 0;
+    int32_t res = signed_saturate_w(diff, &sat);
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return (uint32_t)res;
+}
+
+/**
+ * SSUBU - 32-bit unsigned saturating subtraction
+ */
+uint32_t HELPER(ssubu)(CPURISCVState *env,
+                       uint32_t rs1, uint32_t rs2)
+{
+    uint32_t a = rs1;
+    uint32_t b = rs2;
+    /* Unsigned saturating subtraction clamps to 0 on underflow */
+    int sat = (a < b);
+    uint32_t res = sat ? 0 : (a - b);
+
+    if (sat) {
+        env->vxsat = 1;
+    }
+    return res;
+}
+
+/* Saturation instructions (SAT, USAT) */
+
+/**
+ * PSATI.H - Packed 16-bit signed saturate to immediate bit-width
+ * For each halfword: rd[i] = sat(rs1[i], imm+1 bits)
+ */
+target_ulong HELPER(psati_h)(CPURISCVState *env,
+                             target_ulong rs1, target_ulong imm)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+    int range = (imm & 0x0F) + 1; /* imm specifies bits-1 */
+    int64_t max = (1LL << (range - 1)) - 1;
+    int64_t min = -(1LL << (range - 1));
+    int sat = 0;
+
+    for (int i = 0; i <
elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res; + + if (e1 > max) { + res =3D max; + sat =3D 1; + } else if (e1 < min) { + res =3D min; + sat =3D 1; + } else { + res =3D e1; + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PUSATI.H - Packed 16-bit unsigned saturate to immediate bit-width + * For each halfword: rd[i] =3D usat(rs1[i], imm bits) + */ +target_ulong HELPER(pusati_h)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint32_t max =3D (1U << imm) - 1; + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res; + + if (e1 < 0) { + res =3D 0; + sat =3D 1; + } else if ((uint16_t)e1 > max) { + res =3D max; + sat =3D 1; + } else { + res =3D e1; + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSATI.W - Packed 32-bit signed saturate to immediate bit-width (RV64 on= ly) + * For each word: rd[i] =3D sat(rs1[i], imm+1 bits) + */ +uint64_t HELPER(psati_w)(CPURISCVState *env, + uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int range =3D (imm & 0x1F) + 1; + int64_t max =3D (1LL << (range - 1)) - 1; + int64_t min =3D -(1LL << (range - 1)); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t res; + + if (e1 > max) { + res =3D max; + sat =3D 1; + } else if (e1 < min) { + res =3D min; + sat =3D 1; + } else { + res =3D e1; + } + + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PUSATI.W - Packed 32-bit unsigned saturate to immediate bit-width (RV64= only) + * For each word: rd[i] =3D usat(rs1[i], imm bits) + */ +uint64_t HELPER(pusati_w)(CPURISCVState *env, + uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint64_t max =3D (1ULL << imm) - 1; + int sat 
=3D 0; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t res; + + if (e1 < 0) { + res =3D 0; + sat =3D 1; + } else if ((uint32_t)e1 > max) { + res =3D max; + sat =3D 1; + } else { + res =3D e1; + } + + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * SATI_32 - 32-bit scalar signed saturation with immediate range + */ +uint32_t HELPER(sati_32)(CPURISCVState *env, + uint32_t rs1, uint32_t imm) +{ + int32_t a =3D (int32_t)rs1; + int range =3D (imm & 0x1F) + 1; /* imm specifies bits-1 */ + int64_t max =3D (1LL << (range - 1)) - 1; + int64_t min =3D -(1LL << (range - 1)); + int sat =3D 0; + + if (a > max) { + a =3D max; + sat =3D 1; + } else if (a < min) { + a =3D min; + sat =3D 1; + } + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)a; +} + +/** + * USATI_32 - 32-bit scalar unsigned saturation with immediate range + */ +uint32_t HELPER(usati_32)(CPURISCVState *env, + uint32_t rs1, uint32_t imm) +{ + int32_t a =3D (int32_t)rs1; + uint32_t max =3D (1U << imm) - 1; + int sat =3D 0; + + if (a < 0) { + a =3D 0; + sat =3D 1; + } else if ((uint32_t)a > max) { + a =3D max; + sat =3D 1; + } + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)a; +} + +/** + * SATI_64 - 64-bit scalar signed saturation with immediate range + */ +uint64_t HELPER(sati_64)(CPURISCVState *env, + uint64_t rs1, uint64_t imm) +{ + int64_t a =3D (int64_t)rs1; + int range =3D (imm & 0x3F) + 1; + int64_t max =3D (1LL << (range - 1)) - 1; + int64_t min =3D -(1LL << (range - 1)); + int sat =3D 0; + + if (a > max) { + a =3D max; + sat =3D 1; + } else if (a < min) { + a =3D min; + sat =3D 1; + } + + if (sat) { + env->vxsat =3D 1; + } + return (uint64_t)a; +} + +/** + * USATI_64 - 64-bit scalar unsigned saturation with immediate range + */ +uint64_t HELPER(usati_64)(CPURISCVState *env, + uint64_t rs1, uint64_t imm) +{ + int64_t a =3D (int64_t)rs1; + uint64_t max =3D (1ULL << imm) - 1; + int sat =3D 0; + 
+ if (a < 0) { + a =3D 0; + sat =3D 1; + } else if ((uint64_t)a > max) { + a =3D max; + sat =3D 1; + } + + if (sat) { + env->vxsat =3D 1; + } + return (uint64_t)a; +} diff --git a/target/riscv/translate.c b/target/riscv/translate.c index 81087e0a5d..de3ec7a7ec 100644 --- a/target/riscv/translate.c +++ b/target/riscv/translate.c @@ -1206,6 +1206,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, t= arget_ulong pc) #include "insn_trans/trans_rvh.c.inc" #include "insn_trans/trans_rvv.c.inc" #include "insn_trans/trans_rvb.c.inc" +#include "insn_trans/trans_rvp.c.inc" #include "insn_trans/trans_rvzicond.c.inc" #include "insn_trans/trans_rvzacas.c.inc" #include "insn_trans/trans_rvzabha.c.inc" --=20 2.34.1 From nobody Sat May 30 20:13:15 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1776422968466246.70471034990396; Fri, 17 Apr 2026 03:49:28 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wDgjH-00012U-9I; Fri, 17 Apr 2026 06:47:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wDgjE-00010h-AH; Fri, 17 Apr 2026 06:47:16 -0400 Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1wDgjB-0007w9-75; Fri, 17 Apr 2026 06:47:16 -0400 Received: from Huawei.localdomain (unknown [36.110.52.2]) by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S5; Fri, 17 Apr 2026 18:47:09 +0800 (CST) From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, 
 liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com,
 zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 03/14] target/riscv: rvp: add averaging operations
Date: Fri, 17 Apr 2026 18:46:40 +0800
Message-Id: <20260417104652.17857-4-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org;
helo=lists1p.gnu.org;
Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn; helo=cstnet.cn
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen
---
 target/riscv/helper.h                   |  20 ++
 target/riscv/insn32.decode              |  28 ++-
 target/riscv/insn_trans/trans_rvp.c.inc |  20 ++
 target/riscv/psimd_helper.c             | 266 ++++++++++++++++++++++++
 4 files changed, 333 insertions(+), 1 deletion(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 76bc6583fb..a72e02b44c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1353,6 +1353,7 @@ DEF_HELPER_1(ssamoswap_disabled, void, env)
 #endif
 
 /* Packed SIMD */
+/* Packed SIMD - Arithmetic Operations(Non-Saturating and Saturating) */
 DEF_HELPER_3(padd_b, tl, env, tl, tl)
 DEF_HELPER_3(padd_h, tl, env, tl, tl)
 DEF_HELPER_3(padd_w, i64, env, i64, i64)
@@ -1391,3 +1392,22 @@ DEF_HELPER_3(sati_32, i32, env, i32, i32)
 DEF_HELPER_3(usati_32, i32, env, i32, i32)
 DEF_HELPER_3(sati_64, i64, env, i64, i64)
 DEF_HELPER_3(usati_64, i64, env, i64, i64)
+
+/* Packed SIMD - Averaging and Rounding Operations */
+DEF_HELPER_3(paadd_b, tl, env, tl, tl)
+DEF_HELPER_3(paadd_h, tl, env, tl, tl)
+DEF_HELPER_3(paadd_w, i64, env, i64, i64)
+DEF_HELPER_3(paaddu_b, tl, env, tl, tl)
+DEF_HELPER_3(paaddu_h, tl, env, tl, tl)
+DEF_HELPER_3(paaddu_w, i64, env, i64, i64) +DEF_HELPER_3(aadd, i32, env, i32, i32) +DEF_HELPER_3(aaddu, i32, env, i32, i32) +DEF_HELPER_3(pasub_b, tl, env, tl, tl) +DEF_HELPER_3(pasub_h, tl, env, tl, tl) +DEF_HELPER_3(pasub_w, i64, env, i64, i64) +DEF_HELPER_3(pasubu_b, tl, env, tl, tl) +DEF_HELPER_3(pasubu_h, tl, env, tl, tl) +DEF_HELPER_3(pasubu_w, i64, env, i64, i64) +DEF_HELPER_3(asub, i32, env, i32, i32) +DEF_HELPER_3(asubu, i32, env, i32, i32) + diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 6043eb39cf..f609c38638 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1094,7 +1094,7 @@ sd_aqrl 00111 . . ..... ..... 011 ..... 0101111 @ato= m_st =20 =20 # *** P Experimental Extension Version v018 *** -# Arithmetic Operations(Non-Saturating and Saturating) +# Packed SIMD - Arithmetic Operations(Non-Saturating and Saturating) padd_b 1000010 ..... ..... 000 ..... 0111011 @r padd_h 1000000 ..... ..... 000 ..... 0111011 @r padd_w 1000001 ..... ..... 000 ..... 0111011 @r @@ -1149,3 +1149,29 @@ pusati_h 10100 001.... ..... 100 ..... 0011011 @p_= ui16 sati_64 111001 ...... ..... 100 ..... 0011011 @p_ui64 usati_64 101001 ...... ..... 100 ..... 0011011 @p_ui64 =20 +# Packed SIMD - Averaging and Rounding Operations +paadd_b 1001110 ..... ..... 000 ..... 0111011 @r +paadd_h 1001100 ..... ..... 000 ..... 0111011 @r +{ + aadd 1001101 ..... ..... 000 ..... 0111011 @r + paadd_w 1001101 ..... ..... 000 ..... 0111011 @r +} +paaddu_b 1011110 ..... ..... 000 ..... 0111011 @r +paaddu_h 1011100 ..... ..... 000 ..... 0111011 @r +{ + aaddu 1011101 ..... ..... 000 ..... 0111011 @r + paaddu_w 1011101 ..... ..... 000 ..... 0111011 @r +} +pasub_b 1101110 ..... ..... 000 ..... 0111011 @r +pasub_h 1101100 ..... ..... 000 ..... 0111011 @r +{ + asub 1101101 ..... ..... 000 ..... 0111011 @r + pasub_w 1101101 ..... ..... 000 ..... 0111011 @r +} +pasubu_b 1111110 ..... ..... 000 ..... 0111011 @r +pasubu_h 1111100 ..... ..... 000 ..... 
0111011 @r +{ + asubu 1111101 ..... ..... 000 ..... 0111011 @r + pasubu_w 1111101 ..... ..... 000 ..... 0111011 @r +} + diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index 6f7246b563..e3abb38d18 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -524,6 +524,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME = * a) \ } #endif =20 +/* Packed SIMD - Arithmetic Operations(Non-Saturating and Saturating) */ GEN_SIMD_TRANS(padd_b) GEN_SIMD_TRANS(padd_h) GEN_SIMD_TRANS_64(padd_w) @@ -562,3 +563,22 @@ GEN_SIMD_TRANS_IMM_32(sati_32) GEN_SIMD_TRANS_IMM_32(usati_32) GEN_SIMD_TRANS_IMM_64(sati_64) GEN_SIMD_TRANS_IMM_64(usati_64) + +/* Packed SIMD - Averaging and Rounding Operations */ +GEN_SIMD_TRANS(paadd_b) +GEN_SIMD_TRANS(paadd_h) +GEN_SIMD_TRANS_64(paadd_w) +GEN_SIMD_TRANS(paaddu_b) +GEN_SIMD_TRANS(paaddu_h) +GEN_SIMD_TRANS_64(paaddu_w) +GEN_SIMD_TRANS_32(aadd) +GEN_SIMD_TRANS_32(aaddu) +GEN_SIMD_TRANS(pasub_b) +GEN_SIMD_TRANS(pasub_h) +GEN_SIMD_TRANS_64(pasub_w) +GEN_SIMD_TRANS(pasubu_b) +GEN_SIMD_TRANS(pasubu_h) +GEN_SIMD_TRANS_64(pasubu_w) +GEN_SIMD_TRANS_32(asub) +GEN_SIMD_TRANS_32(asubu) + diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index a754ee3b5e..23c0402de2 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -1067,3 +1067,269 @@ uint64_t HELPER(usati_64)(CPURISCVState *env, } return (uint64_t)a; } + +/* Averaging Operations (non-saturating) */ + +/** + * PAADD.B - Packed 8-bit signed averaging addition + * For each byte: rd[i] =3D (rs1[i] + rs2[i]) >> 1 + */ +target_ulong HELPER(paadd_b)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int16_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int16_t avg =3D (e1 + e2) >> 1; + rd =3D INSERT8(rd, (int8_t)avg, i); + } + 
return rd; +} + +/** + * PAADD.H - Packed 16-bit signed averaging addition + * For each halfword: rd[i] =3D (rs1[i] + rs2[i]) >> 1 + */ +target_ulong HELPER(paadd_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int32_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t avg =3D (e1 + e2) >> 1; + rd =3D INSERT16(rd, (int16_t)avg, i); + } + return rd; +} + +/** + * PAADD.W - Packed 32-bit signed averaging addition (RV64 only) + * For each word: rd[i] =3D (rs1[i] + rs2[i]) >> 1 + */ +uint64_t HELPER(paadd_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int64_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int64_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int64_t avg =3D (e1 + e2) >> 1; + rd =3D INSERT32(rd, (int32_t)avg, i); + } + return rd; +} + +/** + * PAADDU.B - Packed 8-bit unsigned averaging addition + * For each byte: rd[i] =3D (rs1[i] + rs2[i]) >> 1 + */ +target_ulong HELPER(paaddu_b)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT8(rs1, i); + uint16_t e2 =3D EXTRACT8(rs2, i); + uint16_t avg =3D (e1 + e2) >> 1; + rd =3D INSERT8(rd, (uint8_t)avg, i); + } + return rd; +} + +/** + * PAADDU.H - Packed 16-bit unsigned averaging addition + * For each halfword: rd[i] =3D (rs1[i] + rs2[i]) >> 1 + */ +target_ulong HELPER(paaddu_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT16(rs1, i); + uint32_t e2 =3D EXTRACT16(rs2, i); + uint32_t avg =3D (e1 + e2) >> 1; + rd =3D INSERT16(rd, (uint16_t)avg, i); + } + return rd; +} + +/** + * PAADDU.W - Packed 32-bit unsigned averaging addition (RV64 
only) + * For each word: rd[i] =3D (rs1[i] + rs2[i]) >> 1 + */ +uint64_t HELPER(paaddu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint64_t e1 =3D EXTRACT32(rs1, i); + uint64_t e2 =3D EXTRACT32(rs2, i); + uint64_t avg =3D (e1 + e2) >> 1; + rd =3D INSERT32(rd, (uint32_t)avg, i); + } + return rd; +} + +/** + * AADD - 32-bit signed averaging addition + */ +uint32_t HELPER(aadd)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int64_t a =3D (int32_t)rs1; + int64_t b =3D (int32_t)rs2; + return (uint32_t)((a + b) >> 1); +} + +/** + * AADDU - 32-bit unsigned averaging addition + */ +uint32_t HELPER(aaddu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t a =3D rs1; + uint64_t b =3D rs2; + return (uint32_t)((a + b) >> 1); +} + +/** + * PASUB.B - Packed 8-bit signed averaging subtraction + * For each byte: rd[i] =3D (rs1[i] - rs2[i]) >> 1 + */ +target_ulong HELPER(pasub_b)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int16_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int16_t avg =3D (e1 - e2) >> 1; + rd =3D INSERT8(rd, (int8_t)avg, i); + } + return rd; +} + +/** + * PASUB.H - Packed 16-bit signed averaging subtraction + * For each halfword: rd[i] =3D (rs1[i] - rs2[i]) >> 1 + */ +target_ulong HELPER(pasub_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int32_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t avg =3D (e1 - e2) >> 1; + rd =3D INSERT16(rd, (int16_t)avg, i); + } + return rd; +} + +/** + * PASUB.W - Packed 32-bit signed averaging subtraction (RV64 only) + * For each word: rd[i] =3D (rs1[i] - rs2[i]) >> 1 + */ +uint64_t HELPER(pasub_w)(CPURISCVState *env, uint64_t 
rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int64_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int64_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int64_t avg =3D (e1 - e2) >> 1; + rd =3D INSERT32(rd, (int32_t)avg, i); + } + return rd; +} + +/** + * PASUBU.B - Packed 8-bit unsigned averaging subtraction + * For each byte: rd[i] =3D (rs1[i] - rs2[i]) >> 1 + */ +target_ulong HELPER(pasubu_b)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT8(rs1, i); + uint16_t e2 =3D EXTRACT8(rs2, i); + uint16_t avg =3D (e1 - e2) >> 1; + rd =3D INSERT8(rd, (uint8_t)avg, i); + } + return rd; +} + +/** + * PASUBU.H - Packed 16-bit unsigned averaging subtraction + * For each halfword: rd[i] =3D (rs1[i] - rs2[i]) >> 1 + */ +target_ulong HELPER(pasubu_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT16(rs1, i); + uint32_t e2 =3D EXTRACT16(rs2, i); + uint32_t avg =3D (e1 - e2) >> 1; + rd =3D INSERT16(rd, (uint16_t)avg, i); + } + return rd; +} + +/** + * PASUBU.W - Packed 32-bit unsigned averaging subtraction (RV64 only) + * For each word: rd[i] =3D (rs1[i] - rs2[i]) >> 1 + */ +uint64_t HELPER(pasubu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint64_t e1 =3D EXTRACT32(rs1, i); + uint64_t e2 =3D EXTRACT32(rs2, i); + uint64_t avg =3D (e1 - e2) >> 1; + rd =3D INSERT32(rd, (uint32_t)avg, i); + } + return rd; +} + +/** + * ASUB - 32-bit signed averaging subtraction + */ +uint32_t HELPER(asub)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int64_t a =3D (int32_t)rs1; + int64_t b =3D (int32_t)rs2; + return (uint32_t)((a - b) >> 1); +} + +/** + * ASUBU - 32-bit unsigned averaging subtraction + 
*/ +uint32_t HELPER(asubu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t a =3D rs1; + uint64_t b =3D rs2; + return (uint32_t)((a - b) >> 1); +} --=20 2.34.1 From nobody Sat May 30 20:13:15 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 17764228808321020.8274620942657; Fri, 17 Apr 2026 03:48:00 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wDgjJ-00013K-9m; Fri, 17 Apr 2026 06:47:21 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wDgjG-00012J-OJ; Fri, 17 Apr 2026 06:47:18 -0400 Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1wDgjD-0007xe-ED; Fri, 17 Apr 2026 06:47:18 -0400 Received: from Huawei.localdomain (unknown [36.110.52.2]) by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S6; Fri, 17 Apr 2026 18:47:10 +0800 (CST) From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 04/14] target/riscv: rvp: add absolute value and difference, comparison and mask generation operations Date: Fri, 17 Apr 2026 18:46:41 +0800 Message-Id: <20260417104652.17857-5-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S6 X-Originating-IP: [36.110.52.2] X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/ Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists1p.gnu.org; Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn; helo=cstnet.cn X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998, HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29 Precedence: list List-Id: qemu development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1776422884014154100 Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 38 ++ target/riscv/insn32.decode | 44 ++ target/riscv/insn_trans/trans_rvp.c.inc | 38 ++ target/riscv/psimd_helper.c | 634 ++++++++++++++++++++++++ 4 files changed, 754 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index a72e02b44c..f6351ecd43 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1411,3 +1411,41 @@ DEF_HELPER_3(pasubu_w, i64, env, i64, i64) DEF_HELPER_3(asub, i32, env, i32, i32) DEF_HELPER_3(asubu, i32, env, i32, i32) =20 +/* Packed SIMD - Absolute Value and Difference Operations */ +DEF_HELPER_2(psabs_b, tl, env, tl) +DEF_HELPER_2(psabs_h, tl, env, tl) +DEF_HELPER_2(abs, tl, env, tl) +DEF_HELPER_2(absw, i64, env, i64) +DEF_HELPER_3(pabd_b, tl, env, tl, tl) +DEF_HELPER_3(pabdu_b, tl, env, tl, tl) +DEF_HELPER_3(pabd_h, tl, env, tl, tl) +DEF_HELPER_3(pabdu_h, tl, env, tl, tl) +DEF_HELPER_3(pabdsumu_b, tl, env, tl, tl) +DEF_HELPER_4(pabdsumau_b, tl, env, tl, tl, tl) + +/* Packed SIMD - Comparison and Mask Generation Operations */ +DEF_HELPER_3(pmseq_b, tl, env, tl, tl) +DEF_HELPER_3(pmslt_b, tl, env, tl, tl) +DEF_HELPER_3(pmsltu_b, tl, env, tl, tl) +DEF_HELPER_3(pmin_b, tl, env, tl, tl) +DEF_HELPER_3(pminu_b, tl, env, tl, tl) +DEF_HELPER_3(pmax_b, tl, env, tl, tl) +DEF_HELPER_3(pmaxu_b, tl, env, tl, tl) +DEF_HELPER_3(pmseq_h, tl, env, tl, tl) +DEF_HELPER_3(pmslt_h, tl, env, tl, tl) +DEF_HELPER_3(pmsltu_h, tl, env, tl, tl) +DEF_HELPER_3(pmin_h, tl, env, tl, tl) +DEF_HELPER_3(pminu_h, tl, env, tl, tl) +DEF_HELPER_3(pmax_h, tl, env, tl, tl) +DEF_HELPER_3(pmaxu_h, tl, env, tl, tl) +DEF_HELPER_3(pmseq_w, i64, env, i64, i64) 
+DEF_HELPER_3(pmslt_w, i64, env, i64, i64) +DEF_HELPER_3(pmsltu_w, i64, env, i64, i64) +DEF_HELPER_3(pmin_w, i64, env, i64, i64) +DEF_HELPER_3(pminu_w, i64, env, i64, i64) +DEF_HELPER_3(pmax_w, i64, env, i64, i64) +DEF_HELPER_3(pmaxu_w, i64, env, i64, i64) +DEF_HELPER_3(mseq, i32, env, i32, i32) +DEF_HELPER_3(mslt, i32, env, i32, i32) +DEF_HELPER_3(msltu, i32, env, i32, i32) + diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index f609c38638..2034041639 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1175,3 +1175,47 @@ pasubu_h 1111100 ..... ..... 000 ..... 0111011 @r pasubu_w 1111101 ..... ..... 000 ..... 0111011 @r } =20 +# Packed SIMD - Absolute Value and Difference Operations +psabs_b 1110010 00111 ..... 010 ..... 0011011 @r2 +psabs_h 1110000 00111 ..... 010 ..... 0011011 @r2 +abs 01100 0000111 ..... 001 ..... 0010011 @r2 +absw 01100 0000111 ..... 001 ..... 0011011 @r2 +pabd_b 1100110 ..... ..... 000 ..... 0111011 @r +pabdu_b 1110110 ..... ..... 000 ..... 0111011 @r +pabd_h 1100100 ..... ..... 000 ..... 0111011 @r +pabdu_h 1110100 ..... ..... 000 ..... 0111011 @r +pabdsumu_b 1011010 ..... ..... 001 ..... 0111011 @r +pabdsumau_b 1011110 ..... ..... 001 ..... 0111011 @r + +# Packed SIMD - Comparison and Mask Generation Operations +pmseq_b 1100010 ..... ..... 110 ..... 0111011 @r +pmslt_b 1101010 ..... ..... 110 ..... 0111011 @r +pmsltu_b 1101110 ..... ..... 110 ..... 0111011 @r +pmin_b 1110010 ..... ..... 110 ..... 0111011 @r +pminu_b 1110110 ..... ..... 110 ..... 0111011 @r +pmax_b 1111010 ..... ..... 110 ..... 0111011 @r +pmaxu_b 1111110 ..... ..... 110 ..... 0111011 @r +pmseq_h 1100000 ..... ..... 110 ..... 0111011 @r +pmslt_h 1101000 ..... ..... 110 ..... 0111011 @r +pmsltu_h 1101100 ..... ..... 110 ..... 0111011 @r +pmin_h 1110000 ..... ..... 110 ..... 0111011 @r +pminu_h 1110100 ..... ..... 110 ..... 0111011 @r +pmax_h 1111000 ..... ..... 110 ..... 0111011 @r +pmaxu_h 1111100 ..... ..... 110 ..... 
0111011 @r +{ + mseq 1100001 ..... ..... 110 ..... 0111011 @r + pmseq_w 1100001 ..... ..... 110 ..... 0111011 @r +} +{ + mslt 1101001 ..... ..... 110 ..... 0111011 @r + pmslt_w 1101001 ..... ..... 110 ..... 0111011 @r +} +{ + msltu 1101101 ..... ..... 110 ..... 0111011 @r + pmsltu_w 1101101 ..... ..... 110 ..... 0111011 @r +} +pmin_w 1110001 ..... ..... 110 ..... 0111011 @r +pminu_w 1110101 ..... ..... 110 ..... 0111011 @r +pmax_w 1111001 ..... ..... 110 ..... 0111011 @r +pmaxu_w 1111101 ..... ..... 110 ..... 0111011 @r + diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index e3abb38d18..27d482863c 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -582,3 +582,41 @@ GEN_SIMD_TRANS_64(pasubu_w) GEN_SIMD_TRANS_32(asub) GEN_SIMD_TRANS_32(asubu) =20 +/* Packed SIMD - Absolute Value and Difference Operations */ +GEN_SIMD_TRANS_R1(psabs_b) +GEN_SIMD_TRANS_R1(psabs_h) +GEN_SIMD_TRANS_R1(abs) +GEN_SIMD_TRANS_R1_64(absw) +GEN_SIMD_TRANS(pabd_b) +GEN_SIMD_TRANS(pabdu_b) +GEN_SIMD_TRANS(pabd_h) +GEN_SIMD_TRANS(pabdu_h) +GEN_SIMD_TRANS(pabdsumu_b) +GEN_SIMD_TRANS_ACC(pabdsumau_b) + +/* Packed SIMD - Comparison and Mask Generation Operations */ +GEN_SIMD_TRANS(pmseq_b) +GEN_SIMD_TRANS(pmslt_b) +GEN_SIMD_TRANS(pmsltu_b) +GEN_SIMD_TRANS(pmin_b) +GEN_SIMD_TRANS(pminu_b) +GEN_SIMD_TRANS(pmax_b) +GEN_SIMD_TRANS(pmaxu_b) +GEN_SIMD_TRANS(pmseq_h) +GEN_SIMD_TRANS(pmslt_h) +GEN_SIMD_TRANS(pmsltu_h) +GEN_SIMD_TRANS(pmin_h) +GEN_SIMD_TRANS(pminu_h) +GEN_SIMD_TRANS(pmax_h) +GEN_SIMD_TRANS(pmaxu_h) +GEN_SIMD_TRANS_64(pmseq_w) +GEN_SIMD_TRANS_64(pmslt_w) +GEN_SIMD_TRANS_64(pmsltu_w) +GEN_SIMD_TRANS_64(pmin_w) +GEN_SIMD_TRANS_64(pminu_w) +GEN_SIMD_TRANS_64(pmax_w) +GEN_SIMD_TRANS_64(pmaxu_w) +GEN_SIMD_TRANS_32(mseq) +GEN_SIMD_TRANS_32(mslt) +GEN_SIMD_TRANS_32(msltu) + diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index 23c0402de2..38207c3a39 100644 --- 
a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -1333,3 +1333,637 @@ uint32_t HELPER(asubu)(CPURISCVState *env, uint32_t= rs1, uint32_t rs2) uint64_t b =3D rs2; return (uint32_t)((a - b) >> 1); } + +/* Absolute value operations */ + +/** + * PSABS.B - Packed 8-bit absolute value + * For each byte: rd[i] =3D abs(rs1[i]), saturate if MIN + */ +target_ulong HELPER(psabs_b)(CPURISCVState *env, target_ulong rs1) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t res; + + if (e1 =3D=3D INT8_MIN) { + res =3D INT8_MAX; + sat =3D 1; + } else if (e1 < 0) { + res =3D -e1; + } else { + res =3D e1; + } + + rd =3D INSERT8(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSABS.H - Packed 16-bit absolute value + * For each halfword: rd[i] =3D abs(rs1[i]), saturate if MIN + */ +target_ulong HELPER(psabs_h)(CPURISCVState *env, target_ulong rs1) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res; + + if (e1 =3D=3D INT16_MIN) { + res =3D INT16_MAX; + sat =3D 1; + } else if (e1 < 0) { + res =3D -e1; + } else { + res =3D e1; + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * ABS - 32/64-bit scalar absolute value + */ +target_ulong HELPER(abs)(CPURISCVState *env, target_ulong rs1) +{ + target_long a =3D (target_long)rs1; + return (a < 0) ? 
(target_ulong)(-a) : rs1; +} + +/** + * ABSW - Absolute value of low 32 bits (RV64) + */ +uint64_t HELPER(absw)(CPURISCVState *env, uint64_t rs1) +{ + int32_t a =3D (int32_t)EXTRACT32(rs1, 0); + uint32_t res; + + if (a =3D=3D INT32_MIN) { + res =3D 0x80000000; + } else if (a < 0) { + res =3D (uint32_t)(-a); + } else { + res =3D (uint32_t)a; + } + + return (uint64_t)res; +} + + +/* Absolute difference operations */ + +/** + * PABD.B - Packed 8-bit signed absolute difference + * For each byte: rd[i] =3D |rs1[i] - rs2[i]| + */ +target_ulong HELPER(pabd_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int16_t diff =3D (int16_t)e1 - (int16_t)e2; + uint8_t res =3D (diff >=3D 0) ? (uint8_t)diff : (uint8_t)(-diff); + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PABDU.B - Packed 8-bit unsigned absolute difference + * For each byte: rd[i] =3D |rs1[i] - rs2[i]| + */ +target_ulong HELPER(pabdu_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D (e1 > e2) ? (e1 - e2) : (e2 - e1); + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PABD.H - Packed 16-bit signed absolute difference + * For each halfword: rd[i] =3D |rs1[i] - rs2[i]| + */ +target_ulong HELPER(pabd_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t diff =3D (int32_t)e1 - (int32_t)e2; + uint16_t res =3D (diff >=3D 0) ? 
(uint16_t)diff : (uint16_t)(-diff= ); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PABDU.H - Packed 16-bit unsigned absolute difference + * For each halfword: rd[i] =3D |rs1[i] - rs2[i]| + */ +target_ulong HELPER(pabdu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D (e1 > e2) ? (e1 - e2) : (e2 - e1); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PABDSUMU.B - Sum of unsigned absolute differences + * Returns sum(|rs1[i] - rs2[i]|) for all bytes + */ +target_ulong HELPER(pabdsumu_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong sum =3D 0; + int elems =3D ELEMS_B(rs1); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t diff =3D (e1 > e2) ? (e1 - e2) : (e2 - e1); + sum +=3D diff; + } + + return sum; +} + +/** + * PABDSUMAU.B - Accumulated sum of unsigned absolute differences + * rd =3D rd + sum(|rs1[i] - rs2[i]|) + */ +target_ulong HELPER(pabdsumau_b)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong rd) +{ + target_ulong sum =3D rd; + int elems =3D ELEMS_B(rs1); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t diff =3D (e1 > e2) ? (e1 - e2) : (e2 - e1); + sum +=3D diff; + } + + return sum; +} + +/* Comparison operations (producing masks) */ + +/** + * PMSEQ.B - Packed 8-bit equal comparison + * For each byte: rd[i] =3D 0xFF if rs1[i] =3D=3D rs2[i], else 0x00 + */ +target_ulong HELPER(pmseq_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D (e1 =3D=3D e2) ? 
0xFF : 0x00; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMSLT.B - Packed 8-bit signed less-than comparison + * For each byte: rd[i] =3D 0xFF if rs1[i] < rs2[i], else 0x00 + */ +target_ulong HELPER(pmslt_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i); + uint8_t res =3D (e1 < e2) ? 0xFF : 0x00; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMSLTU.B - Packed 8-bit unsigned less-than comparison + * For each byte: rd[i] =3D 0xFF if rs1[i] < rs2[i], else 0x00 + */ +target_ulong HELPER(pmsltu_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D (e1 < e2) ? 0xFF : 0x00; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMIN.B - Packed 8-bit signed minimum + * For each byte: rd[i] =3D min(rs1[i], rs2[i]) + */ +target_ulong HELPER(pmin_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int8_t res =3D (e1 < e2) ? e1 : e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMINU.B - Packed 8-bit unsigned minimum + * For each byte: rd[i] =3D min(rs1[i], rs2[i]) + */ +target_ulong HELPER(pminu_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D (e1 < e2) ? 
e1 : e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMAX.B - Packed 8-bit signed maximum + * For each byte: rd[i] =3D max(rs1[i], rs2[i]) + */ +target_ulong HELPER(pmax_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i); + int8_t res =3D (e1 > e2) ? e1 : e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMAXU.B - Packed 8-bit unsigned maximum + * For each byte: rd[i] =3D max(rs1[i], rs2[i]) + */ +target_ulong HELPER(pmaxu_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i); + uint8_t res =3D (e1 > e2) ? e1 : e2; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PMSEQ.H - Packed 16-bit equal comparison + * For each halfword: rd[i] =3D 0xFFFF if rs1[i] =3D=3D rs2[i], else 0x0000 + */ +target_ulong HELPER(pmseq_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D (e1 =3D=3D e2) ? 0xFFFF : 0x0000; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMSLT.H - Packed 16-bit signed less-than comparison + * For each halfword: rd[i] =3D 0xFFFF if rs1[i] < rs2[i], else 0x0000 + */ +target_ulong HELPER(pmslt_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + uint16_t res =3D (e1 < e2) ? 
0xFFFF : 0x0000; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMSLTU.H - Packed 16-bit unsigned less-than comparison + * For each halfword: rd[i] =3D 0xFFFF if rs1[i] < rs2[i], else 0x0000 + */ +target_ulong HELPER(pmsltu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D (e1 < e2) ? 0xFFFF : 0x0000; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMIN.H - Packed 16-bit signed minimum + * For each halfword: rd[i] =3D min(rs1[i], rs2[i]) + */ +target_ulong HELPER(pmin_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int16_t res =3D (e1 < e2) ? e1 : e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMINU.H - Packed 16-bit unsigned minimum + * For each halfword: rd[i] =3D min(rs1[i], rs2[i]) + */ +target_ulong HELPER(pminu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D (e1 < e2) ? e1 : e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMAX.H - Packed 16-bit signed maximum + * For each halfword: rd[i] =3D max(rs1[i], rs2[i]) + */ +target_ulong HELPER(pmax_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int16_t res =3D (e1 > e2) ? 
e1 : e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMAXU.H - Packed 16-bit unsigned maximum + * For each halfword: rd[i] =3D max(rs1[i], rs2[i]) + */ +target_ulong HELPER(pmaxu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D (e1 > e2) ? e1 : e2; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMSEQ.W - Packed 32-bit equal comparison (RV64 only) + * For each word: rd[i] =3D 0xFFFFFFFF if rs1[i] =3D=3D rs2[i], else 0x000= 00000 + */ +uint64_t HELPER(pmseq_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D (e1 =3D=3D e2) ? 0xFFFFFFFFU : 0x00000000U; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMSLT.W - Packed 32-bit signed less-than comparison (RV64 only) + * For each word: rd[i] =3D 0xFFFFFFFF if rs1[i] < rs2[i], else 0x00000000 + */ +uint64_t HELPER(pmslt_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + uint32_t res =3D (e1 < e2) ? 0xFFFFFFFFU : 0x00000000U; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMSLTU.W - Packed 32-bit unsigned less-than comparison (RV64 only) + * For each word: rd[i] =3D 0xFFFFFFFF if rs1[i] < rs2[i], else 0x00000000 + */ +uint64_t HELPER(pmsltu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D (e1 < e2) ? 
0xFFFFFFFFU : 0x00000000U; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMIN.W - Packed 32-bit signed minimum (RV64 only) + * For each word: rd[i] =3D min(rs1[i], rs2[i]) + */ +uint64_t HELPER(pmin_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int32_t res =3D (e1 < e2) ? e1 : e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMINU.W - Packed 32-bit unsigned minimum (RV64 only) + * For each word: rd[i] =3D min(rs1[i], rs2[i]) + */ +uint64_t HELPER(pminu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D (e1 < e2) ? e1 : e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMAX.W - Packed 32-bit signed maximum (RV64 only) + * For each word: rd[i] =3D max(rs1[i], rs2[i]) + */ +uint64_t HELPER(pmax_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int32_t res =3D (e1 > e2) ? e1 : e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMAXU.W - Packed 32-bit unsigned maximum (RV64 only) + * For each word: rd[i] =3D max(rs1[i], rs2[i]) + */ +uint64_t HELPER(pmaxu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D (e1 > e2) ? 
e1 : e2; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * MSEQ - 32-bit scalar set if equal (mask) + */ +uint32_t HELPER(mseq)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + return (rs1 =3D=3D rs2) ? 0xFFFFFFFFU : 0x00000000U; +} + +/** + * MSLT - 32-bit scalar set if signed less than (mask) + */ +uint32_t HELPER(mslt)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + return ((int32_t)rs1 < (int32_t)rs2) ? 0xFFFFFFFFU : 0x00000000U; +} + +/** + * MSLTU - 32-bit scalar set if unsigned less than (mask) + */ +uint32_t HELPER(msltu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + return (rs1 < rs2) ? 0xFFFFFFFFU : 0x00000000U; +} + --=20 2.34.1 From nobody Sat May 30 20:13:15 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1776422857901955.2302875986313; Fri, 17 Apr 2026 03:47:37 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wDgjS-0001Bn-1U; Fri, 17 Apr 2026 06:47:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wDgjK-00014V-Ds; Fri, 17 Apr 2026 06:47:22 -0400 Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1wDgjG-0007yy-PO; Fri, 17 Apr 2026 06:47:22 -0400 Received: from Huawei.localdomain (unknown [36.110.52.2]) by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S7; Fri, 17 Apr 2026 18:47:12 +0800 (CST) From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, 
zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 05/14] target/riscv: rvp: add shift operations Date: Fri, 17 Apr 2026 18:46:42 +0800 Message-Id: <20260417104652.17857-6-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S7 X-Coremail-Antispam: 1UD129KBjvAXoWftr1ruFyrAw48CrW7Zr1fWFg_yoW8Kr1rWo ZxKw1Yyw1fGr13u348uw48Xr1Iqry2vw1DJr4rZr4UXa97Wr12gF15J34kZF4xJrWayrW5 XFZ3KF95JF1akr93n29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOU7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26F4j6r 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUXVWUAwAv7VC2z280aVAFwI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwCY1x0262kKe7AKxVWUtV W8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJw CI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7VUbnNVPUU UUU== X-Originating-IP: [36.110.52.2] X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/ Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists1p.gnu.org; Received-SPF: pass 
client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn; helo=cstnet.cn X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998, HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: qemu development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1776422860775158500 Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 34 ++ target/riscv/insn32.decode | 44 ++ target/riscv/insn_trans/trans_rvp.c.inc | 34 ++ target/riscv/psimd_helper.c | 736 ++++++++++++++++++++++++ 4 files changed, 848 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index f6351ecd43..d97552eb58 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1449,3 +1449,37 @@ DEF_HELPER_3(mseq, i32, env, i32, i32) DEF_HELPER_3(mslt, i32, env, i32, i32) DEF_HELPER_3(msltu, i32, env, i32, i32) =20 +/* Packed SIMD - Shift Operations */ +DEF_HELPER_3(pslli_b, tl, env, tl, tl) +DEF_HELPER_3(psll_bs, tl, env, tl, tl) +DEF_HELPER_3(pslli_h, tl, env, tl, tl) +DEF_HELPER_3(psll_hs, tl, env, tl, tl) +DEF_HELPER_3(pslli_w, i64, env, i64, i64) +DEF_HELPER_3(psll_ws, i64, env, i64, i64) +DEF_HELPER_3(psrli_b, tl, env, tl, tl) +DEF_HELPER_3(psrl_bs, tl, env, tl, tl) +DEF_HELPER_3(psrli_h, tl, env, tl, tl) +DEF_HELPER_3(psrl_hs, tl, env, tl, tl) +DEF_HELPER_3(psrli_w, i64, env, i64, i64) +DEF_HELPER_3(psrl_ws, i64, env, i64, i64) +DEF_HELPER_3(psrai_b, tl, env, tl, tl) +DEF_HELPER_3(psra_bs, tl, env, tl, tl) +DEF_HELPER_3(psrai_h, tl, env, tl, tl) 
+DEF_HELPER_3(psra_hs, tl, env, tl, tl) +DEF_HELPER_3(psrai_w, i64, env, i64, i64) +DEF_HELPER_3(psra_ws, i64, env, i64, i64) +DEF_HELPER_3(psslai_h, tl, env, tl, tl) +DEF_HELPER_3(psslai_w, i64, env, i64, i64) +DEF_HELPER_3(sslai, i32, env, i32, i32) +DEF_HELPER_3(psrari_h, tl, env, tl, tl) +DEF_HELPER_3(psrari_w, i64, env, i64, i64) +DEF_HELPER_3(srari_32, i32, env, i32, i32) +DEF_HELPER_3(srari_64, i64, env, i64, i64) +DEF_HELPER_3(pssha_hs, tl, env, tl, tl) +DEF_HELPER_3(pssha_ws, i64, env, i64, i64) +DEF_HELPER_3(psshar_hs, tl, env, tl, tl) +DEF_HELPER_3(psshar_ws, i64, env, i64, i64) +DEF_HELPER_3(ssha, i32, env, i32, i32) +DEF_HELPER_3(sshar, i32, env, i32, i32) +DEF_HELPER_3(sha, i64, env, i64, i64) +DEF_HELPER_3(shar, i64, env, i64, i64) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 2034041639..69514e2cb9 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -40,6 +40,7 @@ %imm_z6 26:1 15:5 %imm_mop5 30:1 26:2 20:2 %imm_mop3 30:1 26:2 +%imm_p_ui8 20:3 %imm_p_ui16 20:4 %imm_p_ui32 20:5 %imm_p_ui64 20:6 @@ -108,6 +109,7 @@ @mop5 . . .. .. .... .. ..... ... ..... ....... &mop5 imm=3D%imm_mop5 %rd = %rs1 @mop3 . . .. .. . ..... ..... ... ..... ....... &mop3 imm=3D%imm_mop3 %rd = %rs1 %rs2 =20 +@p_ui8 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui8 %rs1 %= rd @p_ui16 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui16 %rs1 %= rd @p_ui32 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui32 %rs1 %= rd @p_ui64 ..... .... ... ..... ... ..... ....... &i imm=3D%imm_p_ui64 %rs1 %= rd @@ -1219,3 +1221,45 @@ pminu_w 1110101 ..... ..... 110 ..... 0111011 @r pmax_w 1111001 ..... ..... 110 ..... 0111011 @r pmaxu_w 1111101 ..... ..... 110 ..... 0111011 @r =20 +# Packed SIMD - Shift Operations +pslli_b 10000 0001... ..... 010 ..... 0011011 @p_ui8 +psll_bs 1000110 ..... ..... 010 ..... 0011011 @r +pslli_h 10000 001.... ..... 010 ..... 0011011 @p_ui16 +psll_hs 1000100 ..... ..... 010 ..... 
0011011 @r +pslli_w 10000 01..... ..... 010 ..... 0011011 @p_ui32 +psll_ws 1000101 ..... ..... 010 ..... 0011011 @r +psrli_b 10000 0001... ..... 100 ..... 0011011 @p_ui8 +psrl_bs 1000110 ..... ..... 100 ..... 0011011 @r +psrli_h 10000 001.... ..... 100 ..... 0011011 @p_ui16 +psrl_hs 1000100 ..... ..... 100 ..... 0011011 @r +psrli_w 10000 01..... ..... 100 ..... 0011011 @p_ui32 +psrl_ws 1000101 ..... ..... 100 ..... 0011011 @r +psrai_b 11000 0001... ..... 100 ..... 0011011 @p_ui8 +psra_bs 1100110 ..... ..... 100 ..... 0011011 @r +psrai_h 11000 001.... ..... 100 ..... 0011011 @p_ui16 +psra_hs 1100100 ..... ..... 100 ..... 0011011 @r +psrai_w 11000 01..... ..... 100 ..... 0011011 @p_ui32 +psra_ws 1100101 ..... ..... 100 ..... 0011011 @r +psslai_h 11010 001.... ..... 010 ..... 0011011 @p_ui16 +{ + sslai 11010 01..... ..... 010 ..... 0011011 @p_ui32 + psslai_w 11010 01..... ..... 010 ..... 0011011 @p_ui32 +} +psrari_h 11010 001.... ..... 100 ..... 0011011 @p_ui16 +{ + srari_32 11010 01..... ..... 100 ..... 0011011 @p_ui32 + psrari_w 11010 01..... ..... 100 ..... 0011011 @p_ui32 +} +srari_64 110101 ...... ..... 100 ..... 0011011 @p_ui64 +pssha_hs 1110100 ..... ..... 010 ..... 0011011 @r +{ + ssha 1110101 ..... ..... 010 ..... 0011011 @r + pssha_ws 1110101 ..... ..... 010 ..... 0011011 @r +} +psshar_hs 1111100 ..... ..... 010 ..... 0011011 @r +{ + sshar 1111101 ..... ..... 010 ..... 0011011 @r + psshar_ws 1111101 ..... ..... 010 ..... 0011011 @r +} +sha 1110111 ..... ..... 010 ..... 0011011 @r +shar 1111111 ..... ..... 010 ..... 
0011011 @r diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index 27d482863c..d0b645d083 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -620,3 +620,37 @@ GEN_SIMD_TRANS_32(mseq) GEN_SIMD_TRANS_32(mslt) GEN_SIMD_TRANS_32(msltu) =20 +/* Packed SIMD - Shift Operations */ +GEN_SIMD_TRANS_IMM(pslli_b) +GEN_SIMD_TRANS(psll_bs) +GEN_SIMD_TRANS_IMM(pslli_h) +GEN_SIMD_TRANS(psll_hs) +GEN_SIMD_TRANS_IMM_64(pslli_w) +GEN_SIMD_TRANS_64(psll_ws) +GEN_SIMD_TRANS_IMM(psrli_b) +GEN_SIMD_TRANS(psrl_bs) +GEN_SIMD_TRANS_IMM(psrli_h) +GEN_SIMD_TRANS(psrl_hs) +GEN_SIMD_TRANS_IMM_64(psrli_w) +GEN_SIMD_TRANS_64(psrl_ws) +GEN_SIMD_TRANS_IMM(psrai_b) +GEN_SIMD_TRANS(psra_bs) +GEN_SIMD_TRANS_IMM(psrai_h) +GEN_SIMD_TRANS(psra_hs) +GEN_SIMD_TRANS_IMM_64(psrai_w) +GEN_SIMD_TRANS_64(psra_ws) +GEN_SIMD_TRANS_IMM(psslai_h) +GEN_SIMD_TRANS_IMM_64(psslai_w) +GEN_SIMD_TRANS_IMM_32(sslai) +GEN_SIMD_TRANS_IMM(psrari_h) +GEN_SIMD_TRANS_IMM_64(psrari_w) +GEN_SIMD_TRANS_IMM_32(srari_32) +GEN_SIMD_TRANS_IMM_64(srari_64) +GEN_SIMD_TRANS(pssha_hs) +GEN_SIMD_TRANS_64(pssha_ws) +GEN_SIMD_TRANS(psshar_hs) +GEN_SIMD_TRANS_64(psshar_ws) +GEN_SIMD_TRANS_32(ssha) +GEN_SIMD_TRANS_32(sshar) +GEN_SIMD_TRANS_64(sha) +GEN_SIMD_TRANS_64(shar) diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index 38207c3a39..ef556eb007 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -1967,3 +1967,739 @@ uint32_t HELPER(msltu)(CPURISCVState *env, uint32_t= rs1, uint32_t rs2) return (rs1 < rs2) ? 
0xFFFFFFFFU : 0x00000000U; } =20 +/* Shift operations (immediate and register) */ + +/** + * PSLLI.B - Packed 8-bit logical shift left immediate + * For each byte: rd[i] =3D rs1[i] << imm + */ +target_ulong HELPER(pslli_b)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t shamt =3D imm & 0x07; /* 8-bit elements, max shift 7 */ + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t res =3D e1 << shamt; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PSLL.BS - Packed 8-bit logical shift left from register + * For each byte: rd[i] =3D rs1[i] << rs2[2:0] + */ +target_ulong HELPER(psll_bs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t shamt =3D rs2 & 0x07; /* rs2[2:0] for 8-bit */ + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t res =3D e1 << shamt; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PSLLI.H - Packed 16-bit logical shift left immediate + * For each halfword: rd[i] =3D rs1[i] << imm + */ +target_ulong HELPER(pslli_h)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D imm & 0x0F; /* 16-bit elements, max shift 15 */ + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t res =3D e1 << shamt; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PSLL.HS - Packed 16-bit logical shift left from register + * For each halfword: rd[i] =3D rs1[i] << rs2[3:0] + */ +target_ulong HELPER(psll_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D rs2 & 0x0F; /* rs2[3:0] for 16-bit */ + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t res =3D e1 << shamt; + rd =3D INSERT16(rd, res, i); + }
+ return rd; +} + +/** + * PSLLI.W - Packed 32-bit logical shift left immediate (RV64 only) + * For each word: rd[i] =3D rs1[i] << imm + */ +uint64_t HELPER(pslli_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D imm & 0x1F; /* 32-bit elements, max shift 31 */ + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t res =3D e1 << shamt; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PSLL.WS - Packed 32-bit logical shift left from register (RV64 only) + * For each word: rd[i] =3D rs1[i] << rs2[4:0] + */ +uint64_t HELPER(psll_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t res =3D e1 << shamt; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PSRLI.B - Packed 8-bit logical shift right immediate + * For each byte: rd[i] =3D rs1[i] >> imm + */ +target_ulong HELPER(psrli_b)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t shamt =3D imm & 0x07; + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t res =3D e1 >> shamt; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PSRL.BS - Packed 8-bit logical shift right from register + * For each byte: rd[i] =3D rs1[i] >> rs2[2:0] + */ +target_ulong HELPER(psrl_bs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t shamt =3D rs2 & 0x07; + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + uint8_t res =3D e1 >> shamt; + rd =3D INSERT8(rd, res, i); + } + return rd; +} + +/** + * PSRLI.H - Packed 16-bit logical shift right immediate + * For each halfword: rd[i] =3D rs1[i] >> imm + */ +target_ulong HELPER(psrli_h)(CPURISCVState *env, +
target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D imm & 0x0F; + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t res =3D e1 >> shamt; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PSRL.HS - Packed 16-bit logical shift right from register + * For each halfword: rd[i] =3D rs1[i] >> rs2[3:0] + */ +target_ulong HELPER(psrl_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D rs2 & 0x0F; + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t res =3D e1 >> shamt; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PSRLI.W - Packed 32-bit logical shift right immediate (RV64 only) + * For each word: rd[i] =3D rs1[i] >> imm + */ +uint64_t HELPER(psrli_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D imm & 0x1F; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t res =3D e1 >> shamt; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PSRL.WS - Packed 32-bit logical shift right from register (RV64 only) + * For each word: rd[i] =3D rs1[i] >> rs2[4:0] + */ +uint64_t HELPER(psrl_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t res =3D e1 >> shamt; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PSRAI.B - Packed 8-bit arithmetic shift right immediate + * For each byte: rd[i] =3D (int8_t)rs1[i] >> imm + */ +target_ulong HELPER(psrai_b)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t shamt =3D imm & 0x07; + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D
(int8_t)EXTRACT8(rs1, i); + int8_t res =3D e1 >> shamt; /* Arithmetic right shift */ + rd =3D INSERT8(rd, (uint8_t)res, i); + } + return rd; +} + +/** + * PSRA.BS - Packed 8-bit arithmetic shift right from register + * For each byte: rd[i] =3D (int8_t)rs1[i] >> rs2[2:0] + */ +target_ulong HELPER(psra_bs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_B(rd); + uint8_t shamt =3D rs2 & 0x07; + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + int8_t res =3D e1 >> shamt; + rd =3D INSERT8(rd, (uint8_t)res, i); + } + return rd; +} + +/** + * PSRAI.H - Packed 16-bit arithmetic shift right immediate + * For each halfword: rd[i] =3D (int16_t)rs1[i] >> imm + */ +target_ulong HELPER(psrai_h)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D imm & 0x0F; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res =3D e1 >> shamt; + rd =3D INSERT16(rd, (uint16_t)res, i); + } + return rd; +} + +/** + * PSRA.HS - Packed 16-bit arithmetic shift right from register + * For each halfword: rd[i] =3D (int16_t)rs1[i] >> rs2[3:0] + */ +target_ulong HELPER(psra_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D rs2 & 0x0F; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res =3D e1 >> shamt; + rd =3D INSERT16(rd, (uint16_t)res, i); + } + return rd; +} + +/** + * PSRAI.W - Packed 32-bit arithmetic shift right immediate (RV64 only) + * For each word: rd[i] =3D (int32_t)rs1[i] >> imm + */ +uint64_t HELPER(psrai_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D imm & 0x1F; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t res =3D e1 >>
shamt; + rd =3D INSERT32(rd, (uint32_t)res, i); + } + return rd; +} + +/** + * PSRA.WS - Packed 32-bit arithmetic shift right from register (RV64 only) + * For each word: rd[i] =3D (int32_t)rs1[i] >> rs2[4:0] + */ +uint64_t HELPER(psra_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t res =3D e1 >> shamt; + rd =3D INSERT32(rd, (uint32_t)res, i); + } + return rd; +} + +/* Saturating shift operations */ + +/** + * PSSLAI.H - Packed 16-bit saturating shift left immediate + * For each halfword: rd[i] =3D sat16(rs1[i] << imm) + */ +target_ulong HELPER(psslai_h)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + uint8_t shamt =3D imm & 0x0F; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int32_t shifted =3D (int32_t)e1 << shamt; + int16_t res =3D signed_saturate_h(shifted, &sat); + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSLAI.W - Packed 32-bit saturating shift left immediate (RV64 only) + * For each word: rd[i] =3D sat32(rs1[i] << imm) + */ +uint64_t HELPER(psslai_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + uint8_t shamt =3D imm & 0x1F; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int64_t shifted =3D (int64_t)e1 << shamt; + int32_t res =3D signed_saturate_w(shifted, &sat); + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * SSLAI - 32-bit scalar saturating shift left immediate + */ +uint32_t HELPER(sslai)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + int32_t a =3D (int32_t)rs1; + uint8_t shamt =3D imm & 0x1F; + int64_t shifted =3D (int64_t)a << shamt; + int
sat =3D 0; + int32_t res =3D signed_saturate_w(shifted, &sat); + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)res; +} + +/* Rounding shift operations */ + +/** + * PSRARI.H - Packed 16-bit arithmetic shift right with rounding (immediat= e) + * For each halfword: rd[i] =3D round((int16_t)rs1[i] >> imm) + */ +target_ulong HELPER(psrari_h)(CPURISCVState *env, + target_ulong rs1, target_ulong imm) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + uint8_t shamt =3D imm & 0x0F; + + if (shamt =3D=3D 0) { + return rs1; + } + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int32_t rounded =3D ((e1 >> (shamt - 1)) + 1) >> 1; + rd =3D INSERT16(rd, (int16_t)rounded, i); + } + return rd; +} + +/** + * PSRARI.W - Packed 32-bit arithmetic shift right + * with rounding (immediate) (RV64 only) + * For each word: rd[i] =3D round((int32_t)rs1[i] >> imm) + */ +uint64_t HELPER(psrari_w)(CPURISCVState *env, uint64_t rs1, uint64_t imm) +{ + uint64_t rd =3D 0; + int elems =3D 2; + uint8_t shamt =3D imm & 0x1F; + + if (shamt =3D=3D 0) { + return rs1; + } + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int64_t rounded =3D ((e1 >> (shamt - 1)) + 1) >> 1; + rd =3D INSERT32(rd, (int32_t)rounded, i); + } + return rd; +} + +/** + * SRARI_32 - 32-bit scalar arithmetic shift right with rounding + */ +uint32_t HELPER(srari_32)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + int32_t a =3D (int32_t)rs1; + uint8_t shamt =3D imm & 0x1F; + + if (shamt =3D=3D 0) { + return rs1; + } + + return (uint32_t)(((a >> (shamt - 1)) + 1) >> 1); +} + +/** + * SRARI_64 - 64-bit scalar arithmetic shift right with rounding + */ +uint64_t HELPER(srari_64)(CPURISCVState *env, uint64_t rs1, uint64_t imm) +{ + int64_t a =3D (int64_t)rs1; + uint8_t shamt =3D imm & 0x3F; + + if (shamt =3D=3D 0) { + return rs1; + } + + return (uint64_t)(((a >> (shamt - 1)) + 1) >> 1); +} + +/* Variable shift operations (with saturation and 
rounding) */ + +/** + * PSSHA.HS - Packed 16-bit variable shift with saturation + * Positive shift left (saturating), negative shift right (non-saturating) + */ +target_ulong HELPER(pssha_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); /* rs2[7:0] as signed */ + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res; + + if (shamt >=3D 0) { + /* Left shift with saturation */ + int32_t shifted =3D (int32_t)e1 << shamt; + res =3D signed_saturate_h(shifted, &sat); + } else { + /* Right shift (no saturation) */ + int right =3D -shamt; + if (right >=3D 16) { + res =3D (e1 < 0) ? -1 : 0; + } else { + res =3D e1 >> right; + } + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSHA.WS - Packed 32-bit variable shift with saturation (RV64 only) + * Positive shift left (saturating), negative shift right (non-saturating) + */ +uint64_t HELPER(pssha_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t res; + + if (shamt >=3D 0) { + int64_t shifted =3D (int64_t)e1 << shamt; + res =3D signed_saturate_w(shifted, &sat); + } else { + int right =3D -shamt; + if (right >=3D 32) { + res =3D (e1 < 0) ? 
-1 : 0; + } else { + res =3D e1 >> right; + } + } + + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSHAR.HS - Packed 16-bit variable shift with rounding and saturation + * Positive shift left (saturating), negative shift right (rounded) + */ +target_ulong HELPER(psshar_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t res; + + if (shamt >=3D 0) { + /* Left shift with saturation */ + int32_t shifted =3D (int32_t)e1 << shamt; + res =3D signed_saturate_h(shifted, &sat); + } else { + /* Right shift with rounding */ + int right =3D -shamt; + if (right >=3D 16) { + res =3D (e1 < 0) ? -1 : 0; + } else { + int32_t rounded =3D ((e1 >> (right - 1)) + 1) >> 1; + res =3D (int16_t)rounded; + } + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSHAR.WS - Packed 32-bit variable shift with + * rounding and saturation (RV64 only) + * Positive shift left (saturating), negative shift right (rounded) + */ +uint64_t HELPER(psshar_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t res; + + if (shamt >=3D 0) { + int64_t shifted =3D (int64_t)e1 << shamt; + res =3D signed_saturate_w(shifted, &sat); + } else { + int right =3D -shamt; + if (right >=3D 32) { + res =3D (e1 < 0) ? 
-1 : 0; + } else { + int64_t rounded =3D ((e1 >> (right - 1)) + 1) >> 1; + res =3D (int32_t)rounded; + } + } + + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * SSHA - 32-bit scalar variable shift with saturation + */ +uint32_t HELPER(ssha)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + int sat =3D 0; + int32_t res; + + if (shamt >=3D 0) { + int64_t shifted =3D (int64_t)a << shamt; + res =3D signed_saturate_w(shifted, &sat); + } else { + int right =3D -shamt; + if (right >=3D 32) { + res =3D (a < 0) ? -1 : 0; + } else { + res =3D a >> right; + } + } + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)res; +} + +/** + * SSHAR - 32-bit scalar variable shift with rounding and saturation + */ +uint32_t HELPER(sshar)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + int sat =3D 0; + int32_t res; + + if (shamt >=3D 0) { + int64_t shifted =3D (int64_t)a << shamt; + res =3D signed_saturate_w(shifted, &sat); + } else { + int right =3D -shamt; + if (right >=3D 32) { + res =3D (a < 0) ? -1 : 0; + } else { + int64_t rounded =3D ((a >> (right - 1)) + 1) >> 1; + res =3D (int32_t)rounded; + } + } + + if (sat) { + env->vxsat =3D 1; + } + return (uint32_t)res; +} + +/** + * SHA - 64-bit scalar variable shift + */ +uint64_t HELPER(sha)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int64_t a =3D (int64_t)rs1; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + if (shamt >=3D 0) { + return (uint64_t)(a << shamt); + } else { + int right =3D -shamt; + if (right >=3D 64) { + return (a < 0) ? 
(uint64_t)-1 : 0; + } else { + return (uint64_t)(a >> right); + } + } +} + +/** + * SHAR - 64-bit scalar variable shift with rounding + */ +uint64_t HELPER(shar)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int64_t a =3D (int64_t)rs1; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + if (shamt >=3D 0) { + return (uint64_t)(a << shamt); + } else { + int right =3D -shamt; + if (right >=3D 64) { + return (a < 0) ? (uint64_t)-1 : 0; + } else { + __int128_t rounded =3D ((__int128_t)a >> (right - 1)) + 1; + return (uint64_t)((int64_t)(rounded >> 1)); + } + } +} --=20 2.34.1 From nobody Sat May 30 20:13:15 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1776422934665626.6776050674862; Fri, 17 Apr 2026 03:48:54 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wDgjL-000154-FW; Fri, 17 Apr 2026 06:47:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wDgjJ-00013B-0o; Fri, 17 Apr 2026 06:47:21 -0400 Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1wDgjF-0007yf-PG; Fri, 17 Apr 2026 06:47:20 -0400 Received: from Huawei.localdomain (unknown [36.110.52.2]) by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S8; Fri, 17 Apr 2026 18:47:13 +0800 (CST) From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, 
qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 06/14] target/riscv: rvp: add exchange operations Date: Fri, 17 Apr 2026 18:46:43 +0800 Message-Id: <20260417104652.17857-7-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S8 X-Coremail-Antispam: 1UD129KBjvJXoWfJF1rXF1xKF47Kw45ZrW7twb_yoWkAryDpF Wvkry2q3y3JFySgw4fKF1fAw15WwsxJry8GrZxKF1Sqa1fXF1kJrW5tw13urs7GF9rWry5 Wa98A3y8AFyIq37anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUPj14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Cr0_Gr 1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I0E14v26rxl6s0D M2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjx v20xvE14v26r1Y6r17McIj6I8E87Iv67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1l F7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7MxkF7I0En4kS14v26r1q6r 43MxAIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_ Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUtVW8ZwCIc40Y0x 0EwIxGrwCI42IY6xIIjxv20xvE14v26r1I6r4UMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWx JVW8Jr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMI IF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjfU5TmhDUUU U X-Originating-IP: [36.110.52.2] X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/ Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists1p.gnu.org; Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn; helo=cstnet.cn X-Spam_score_int: -21 
X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998, HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: qemu development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1776422936808154100 Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 14 ++ target/riscv/insn32.decode | 14 ++ target/riscv/insn_trans/trans_rvp.c.inc | 14 ++ target/riscv/psimd_helper.c | 294 ++++++++++++++++++++++++ 4 files changed, 336 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index d97552eb58..fc66712570 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1483,3 +1483,17 @@ DEF_HELPER_3(ssha, i32, env, i32, i32) DEF_HELPER_3(sshar, i32, env, i32, i32) DEF_HELPER_3(sha, i64, env, i64, i64) DEF_HELPER_3(shar, i64, env, i64, i64) + +/* Packed SIMD - Exchange Operations */ +DEF_HELPER_3(pas_hx, tl, env, tl, tl) +DEF_HELPER_3(psa_hx, tl, env, tl, tl) +DEF_HELPER_3(psas_hx, tl, env, tl, tl) +DEF_HELPER_3(pssa_hx, tl, env, tl, tl) +DEF_HELPER_3(paas_hx, tl, env, tl, tl) +DEF_HELPER_3(pasa_hx, tl, env, tl, tl) +DEF_HELPER_3(pas_wx, i64, env, i64, i64) +DEF_HELPER_3(psa_wx, i64, env, i64, i64) +DEF_HELPER_3(psas_wx, i64, env, i64, i64) +DEF_HELPER_3(pssa_wx, i64, env, i64, i64) +DEF_HELPER_3(paas_wx, i64, env, i64, i64) +DEF_HELPER_3(pasa_wx, i64, env, i64, i64) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 69514e2cb9..ba003ed513 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1263,3 +1263,17 @@ 
psshar_hs 1111100 ..... ..... 010 ..... 0011011 @r } sha 1110111 ..... ..... 010 ..... 0011011 @r shar 1111111 ..... ..... 010 ..... 0011011 @r + +# Packed SIMD - Exchange Operations +pas_hx 1000000 ..... ..... 110 ..... 0111011 @r +psa_hx 1000010 ..... ..... 110 ..... 0111011 @r +psas_hx 1001000 ..... ..... 110 ..... 0111011 @r +pssa_hx 1001010 ..... ..... 110 ..... 0111011 @r +paas_hx 1001100 ..... ..... 110 ..... 0111011 @r +pasa_hx 1001110 ..... ..... 110 ..... 0111011 @r +pas_wx 1000001 ..... ..... 110 ..... 0111011 @r +psa_wx 1000011 ..... ..... 110 ..... 0111011 @r +psas_wx 1001001 ..... ..... 110 ..... 0111011 @r +pssa_wx 1001011 ..... ..... 110 ..... 0111011 @r +paas_wx 1001101 ..... ..... 110 ..... 0111011 @r +pasa_wx 1001111 ..... ..... 110 ..... 0111011 @r diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index d0b645d083..b24a8ef7c2 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -654,3 +654,17 @@ GEN_SIMD_TRANS_32(ssha) GEN_SIMD_TRANS_32(sshar) GEN_SIMD_TRANS_64(sha) GEN_SIMD_TRANS_64(shar) + +/* Packed SIMD - Exchange Operations */ +GEN_SIMD_TRANS(pas_hx) +GEN_SIMD_TRANS(psa_hx) +GEN_SIMD_TRANS(psas_hx) +GEN_SIMD_TRANS(pssa_hx) +GEN_SIMD_TRANS(paas_hx) +GEN_SIMD_TRANS(pasa_hx) +GEN_SIMD_TRANS_64(pas_wx) +GEN_SIMD_TRANS_64(psa_wx) +GEN_SIMD_TRANS_64(psas_wx) +GEN_SIMD_TRANS_64(pssa_wx) +GEN_SIMD_TRANS_64(paas_wx) +GEN_SIMD_TRANS_64(pasa_wx) diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index ef556eb007..e48c9897ae 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -2703,3 +2703,297 @@ uint64_t HELPER(shar)(CPURISCVState *env, uint64_t = rs1, uint64_t rs2) } } } + +/* Exchange operations (AS/SA/AS/SA with X suffix) */ + +/** + * PAS.HX - Packed add-subtract with exchange + * For each pair: {rd[2i] =3D rs1[2i] - rs2[2i+1], rd[2i+1] =3D rs1[2i+1] = + rs2[2i]} + */ +target_ulong 
HELPER(pas_hx)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i +=3D 2) { + int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1); + int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i); + int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1); + int16_t res_lo =3D s1_lo - s2_hi; + int16_t res_hi =3D s1_hi + s2_lo; + rd =3D INSERT16(rd, res_lo, i); + rd =3D INSERT16(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PSA.HX - Packed subtract-add with exchange + * For each pair: {rd[2i] =3D rs1[2i] + rs2[2i+1], rd[2i+1] =3D rs1[2i+1] = - rs2[2i]} + */ +target_ulong HELPER(psa_hx)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i +=3D 2) { + int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1); + int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i); + int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1); + int16_t res_lo =3D s1_lo + s2_hi; + int16_t res_hi =3D s1_hi - s2_lo; + rd =3D INSERT16(rd, res_lo, i); + rd =3D INSERT16(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PSAS.HX - Packed saturating add-subtract with exchange + */ +target_ulong HELPER(psas_hx)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i +=3D 2) { + int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1); + int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i); + int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1); + int32_t diff =3D (int32_t)s1_lo - (int32_t)s2_hi; + int32_t sum =3D (int32_t)s1_hi + (int32_t)s2_lo; + int16_t res_lo =3D signed_saturate_h(diff, &sat); + int16_t res_hi =3D signed_saturate_h(sum, &sat); + rd =3D INSERT16(rd, res_lo, i); + rd =3D INSERT16(rd, res_hi, i + 1); + } 
+ + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSA.HX - Packed saturating subtract-add with exchange + */ +target_ulong HELPER(pssa_hx)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i +=3D 2) { + int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1); + int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i); + int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1); + int32_t sum =3D (int32_t)s1_lo + (int32_t)s2_hi; + int32_t diff =3D (int32_t)s1_hi - (int32_t)s2_lo; + int16_t res_lo =3D signed_saturate_h(sum, &sat); + int16_t res_hi =3D signed_saturate_h(diff, &sat); + rd =3D INSERT16(rd, res_lo, i); + rd =3D INSERT16(rd, res_hi, i + 1); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PAAS.HX - Packed averaging add-subtract with exchange + */ +target_ulong HELPER(paas_hx)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i +=3D 2) { + int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1); + int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i); + int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1); + int16_t res_lo =3D (s1_lo - s2_hi) >> 1; + int16_t res_hi =3D (s1_hi + s2_lo) >> 1; + rd =3D INSERT16(rd, res_lo, i); + rd =3D INSERT16(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PASA.HX - Packed averaging subtract-add with exchange + */ +target_ulong HELPER(pasa_hx)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i +=3D 2) { + int16_t s1_lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t s1_hi =3D (int16_t)EXTRACT16(rs1, i + 1); + int16_t s2_lo =3D (int16_t)EXTRACT16(rs2, i); + int16_t s2_hi =3D (int16_t)EXTRACT16(rs2, i + 1); + int16_t res_lo =3D (s1_lo + 
s2_hi) >> 1; + int16_t res_hi =3D (s1_hi - s2_lo) >> 1; + rd =3D INSERT16(rd, res_lo, i); + rd =3D INSERT16(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PAS.WX - Word version of packed add-subtract with exchange (RV64 only) + */ +uint64_t HELPER(pas_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i +=3D 2) { + int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i); + int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1); + int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i); + int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1); + int32_t res_lo =3D s1_lo - s2_hi; + int32_t res_hi =3D s1_hi + s2_lo; + rd =3D INSERT32(rd, res_lo, i); + rd =3D INSERT32(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PSA.WX - Word version of packed subtract-add with exchange (RV64 only) + */ +uint64_t HELPER(psa_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i +=3D 2) { + int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i); + int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1); + int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i); + int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1); + int32_t res_lo =3D s1_lo + s2_hi; + int32_t res_hi =3D s1_hi - s2_lo; + rd =3D INSERT32(rd, res_lo, i); + rd =3D INSERT32(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PSAS.WX - Word version of packed saturating + * add-subtract with exchange (RV64 only) + */ +uint64_t HELPER(psas_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i +=3D 2) { + int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i); + int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1); + int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i); + int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1); + int64_t diff =3D (int64_t)s1_lo - (int64_t)s2_hi; + int64_t sum =3D (int64_t)s1_hi + (int64_t)s2_lo; + int32_t res_lo =3D 
signed_saturate_w(diff, &sat); + int32_t res_hi =3D signed_saturate_w(sum, &sat); + rd =3D INSERT32(rd, res_lo, i); + rd =3D INSERT32(rd, res_hi, i + 1); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSA.WX - Word version of packed saturating + * subtract-add with exchange (RV64 only) + */ +uint64_t HELPER(pssa_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i +=3D 2) { + int32_t s1_lo =3D (int32_t)EXTRACT32(rs1, i); + int32_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1); + int32_t s2_lo =3D (int32_t)EXTRACT32(rs2, i); + int32_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1); + int64_t sum =3D (int64_t)s1_lo + (int64_t)s2_hi; + int64_t diff =3D (int64_t)s1_hi - (int64_t)s2_lo; + int32_t res_lo =3D signed_saturate_w(sum, &sat); + int32_t res_hi =3D signed_saturate_w(diff, &sat); + rd =3D INSERT32(rd, res_lo, i); + rd =3D INSERT32(rd, res_hi, i + 1); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PAAS.WX - Word version of packed averaging + * add-subtract with exchange (RV64 only) + */ +uint64_t HELPER(paas_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i +=3D 2) { + int64_t s1_lo =3D (int32_t)EXTRACT32(rs1, i); + int64_t s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1); + int64_t s2_lo =3D (int32_t)EXTRACT32(rs2, i); + int64_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1); + int32_t res_lo =3D (s1_lo - s2_hi) >> 1; + int32_t res_hi =3D (s1_hi + s2_lo) >> 1; + rd =3D INSERT32(rd, res_lo, i); + rd =3D INSERT32(rd, res_hi, i + 1); + } + return rd; +} + +/** + * PASA.WX - Word version of packed averaging + * subtract-add with exchange (RV64 only) + */ +uint64_t HELPER(pasa_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i +=3D 2) { + int64_t s1_lo =3D (int32_t)EXTRACT32(rs1, i); + int64_t 
s1_hi =3D (int32_t)EXTRACT32(rs1, i + 1); + int64_t s2_lo =3D (int32_t)EXTRACT32(rs2, i); + int64_t s2_hi =3D (int32_t)EXTRACT32(rs2, i + 1); + int32_t res_lo =3D (s1_lo + s2_hi) >> 1; + int32_t res_hi =3D (s1_hi - s2_lo) >> 1; + rd =3D INSERT32(rd, res_lo, i); + rd =3D INSERT32(rd, res_hi, i + 1); + } + return rd; +} --=20 2.34.1 From nobody Sat May 30 20:13:15 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1776422951796336.36580170293564; Fri, 17 Apr 2026 03:49:11 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wDgjU-0001DX-F4; Fri, 17 Apr 2026 06:47:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wDgjN-00015O-8P; Fri, 17 Apr 2026 06:47:27 -0400 Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1wDgjJ-0007za-FI; Fri, 17 Apr 2026 06:47:24 -0400 Received: from Huawei.localdomain (unknown [36.110.52.2]) by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S9; Fri, 17 Apr 2026 18:47:15 +0800 (CST) From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 07/14] target/riscv: rvp: add horizontal reduction, pack, merge and count leading operations Date: Fri, 17 Apr 2026 18:46:44 +0800 Message-Id: <20260417104652.17857-8-xiaoou@iscas.ac.cn> X-Mailer:
git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S9 X-Coremail-Antispam: 1UD129KBjvAXoWftFWxXrWkurWUWrW5ZryDAwb_yoW8ur4rXo Z3Gw15A34fGr1fZ34kCw47Xr17ZrZFvw1kWr4rursruas7Wr1agF15t3W8Aa4xGrWSyrW5 X39aqF15J3W3u3sxn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOU7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26F4j6r 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUXVWUAwAv7VC2z280aVAFwI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4U M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwCY1x0262kKe7AKxVWUtV W8ZwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F40E14v2 6r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFylIxkGc2 Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUCVW8JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_ Cr0_Gr1UMIIF0xvE42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJw CI42IY6I8E87Iv6xkF7I0E14v26r4UJVWxJrUvcSsGvfC2KfnxnUUI43ZEXa7VUbnNVPUU UUU== X-Originating-IP: [36.110.52.2] X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/ Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists1p.gnu.org; Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn; helo=cstnet.cn X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998, HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, 
RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: qemu development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1776422952453158500 Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 46 ++ target/riscv/insn32.decode | 44 ++ target/riscv/insn_trans/trans_rvp.c.inc | 44 ++ target/riscv/psimd_helper.c | 619 ++++++++++++++++++++++++ 4 files changed, 753 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index fc66712570..78ae034331 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1497,3 +1497,49 @@ DEF_HELPER_3(psas_wx, i64, env, i64, i64) DEF_HELPER_3(pssa_wx, i64, env, i64, i64) DEF_HELPER_3(paas_wx, i64, env, i64, i64) DEF_HELPER_3(pasa_wx, i64, env, i64, i64) + +/* Packed SIMD - Horizontal Reduction Operations */ +DEF_HELPER_3(predsum_bs, tl, env, tl, tl) +DEF_HELPER_3(predsumu_bs, tl, env, tl, tl) +DEF_HELPER_3(predsum_hs, tl, env, tl, tl) +DEF_HELPER_3(predsumu_hs, tl, env, tl, tl) +DEF_HELPER_3(predsum_ws, i64, env, i64, i64) +DEF_HELPER_3(predsumu_ws, i64, env, i64, i64) + +/* Packed SIMD - Pack, Unpack, and Merge Operations */ +DEF_HELPER_3(ppaire_b, tl, env, tl, tl) +DEF_HELPER_3(ppaireo_b, tl, env, tl, tl) +DEF_HELPER_3(ppairoe_b, tl, env, tl, tl) +DEF_HELPER_3(ppairo_b, tl, env, tl, tl) + +DEF_HELPER_3(ppaire_h, i64, env, i64, i64) +DEF_HELPER_3(ppaireo_h, tl, env, tl, tl) +DEF_HELPER_3(ppairoe_h, tl, env, tl, tl) +DEF_HELPER_3(ppairo_h, tl, env, tl, tl) + +DEF_HELPER_3(ppaireo_w, i64, env, i64, i64) +DEF_HELPER_3(ppairoe_w, i64, env, i64, i64) +DEF_HELPER_3(ppairo_w, i64, env, i64, i64) +DEF_HELPER_2(psext_h_b, tl, env, tl) 
+DEF_HELPER_2(psext_w_b, i64, env, i64) +DEF_HELPER_2(psext_w_h, i64, env, i64) +DEF_HELPER_2(rev, tl, env, tl) +DEF_HELPER_2(rev16, i64, env, i64) +DEF_HELPER_3(zip8p, i64, env, i64, i64) +DEF_HELPER_3(zip8hp, i64, env, i64, i64) +DEF_HELPER_3(unzip8p, i64, env, i64, i64) +DEF_HELPER_3(unzip8hp, i64, env, i64, i64) +DEF_HELPER_3(zip16p, i64, env, i64, i64) +DEF_HELPER_3(zip16hp, i64, env, i64, i64) +DEF_HELPER_3(unzip16p, i64, env, i64, i64) +DEF_HELPER_3(unzip16hp, i64, env, i64, i64) +DEF_HELPER_4(slx, tl, env, tl, tl, tl) +DEF_HELPER_4(srx, tl, env, tl, tl, tl) +DEF_HELPER_4(mvm, tl, env, tl, tl, tl) +DEF_HELPER_4(mvmn, tl, env, tl, tl, tl) +DEF_HELPER_4(merge, tl, env, tl, tl, tl) + +/* Packed SIMD - Count Leading Operations */ +DEF_HELPER_2(cls, tl, env, tl) +DEF_HELPER_2(clsw, i64, env, i64) + diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index ba003ed513..09bb69b302 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1277,3 +1277,47 @@ psas_wx 1001001 ..... ..... 110 ..... 0111011 @r pssa_wx 1001011 ..... ..... 110 ..... 0111011 @r paas_wx 1001101 ..... ..... 110 ..... 0111011 @r pasa_wx 1001111 ..... ..... 110 ..... 0111011 @r + +# Packed SIMD - Horizontal Reduction Operations +predsum_bs 1001110 ..... ..... 100 ..... 0011011 @r +predsumu_bs 1011110 ..... ..... 100 ..... 0011011 @r +predsum_hs 1001100 ..... ..... 100 ..... 0011011 @r +predsumu_hs 1011100 ..... ..... 100 ..... 0011011 @r +predsum_ws 1001101 ..... ..... 100 ..... 0011011 @r +predsumu_ws 1011101 ..... ..... 100 ..... 0011011 @r + +# Packed SIMD - Pack, Unpack, and Merge Operations +ppaire_b 1000000 ..... ..... 100 ..... 0111011 @r +ppaireo_b 1001000 ..... ..... 100 ..... 0111011 @r +ppairoe_b 1010000 ..... ..... 100 ..... 0111011 @r +ppairo_b 1011000 ..... ..... 100 ..... 0111011 @r +ppaireo_h 1001001 ..... ..... 100 ..... 0111011 @r +ppairoe_h 1010001 ..... ..... 100 ..... 0111011 @r +ppairo_h 1011001 ..... ..... 100 ..... 
0111011 @r +ppaire_h 1000001 ..... ..... 100 ..... 0111011 @r +ppaireo_w 1001011 ..... ..... 100 ..... 0111011 @r +ppairoe_w 1010011 ..... ..... 100 ..... 0111011 @r +ppairo_w 1011011 ..... ..... 100 ..... 0111011 @r +psext_h_b 1110000 00100 ..... 010 ..... 0011011 @r2 +psext_w_b 1110001 00100 ..... 010 ..... 0011011 @r2 +psext_w_h 1110001 00101 ..... 010 ..... 0011011 @r2 +rev 01101 0111111 ..... 101 ..... 0010011 @r2 +rev16 01101 0110000 ..... 101 ..... 0010011 @r2 +zip8p 1111000 ..... ..... 010 ..... 0111011 @r +zip8hp 1111010 ..... ..... 010 ..... 0111011 @r +unzip8p 1110000 ..... ..... 010 ..... 0111011 @r +unzip8hp 1110010 ..... ..... 010 ..... 0111011 @r +zip16p 1111001 ..... ..... 010 ..... 0111011 @r +zip16hp 1111011 ..... ..... 010 ..... 0111011 @r +unzip16p 1110001 ..... ..... 010 ..... 0111011 @r +unzip16hp 1110011 ..... ..... 010 ..... 0111011 @r +slx 1000111 ..... ..... 001 ..... 0111011 @r +srx 1010111 ..... ..... 001 ..... 0111011 @r +mvm 1010100 ..... ..... 001 ..... 0111011 @r +mvmn 1010101 ..... ..... 001 ..... 0111011 @r +merge 1010110 ..... ..... 001 ..... 0111011 @r + +# Packed SIMD - Count Leading Operations +cls 01100 0000011 ..... 001 ..... 0010011 @r2 +clsw 01100 0000011 ..... 001 ..... 
0011011 @r2 + diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index b24a8ef7c2..fc6254b395 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -668,3 +668,47 @@ GEN_SIMD_TRANS_64(psas_wx) GEN_SIMD_TRANS_64(pssa_wx) GEN_SIMD_TRANS_64(paas_wx) GEN_SIMD_TRANS_64(pasa_wx) + +/* Packed SIMD - Horizontal Reduction Operations */ +GEN_SIMD_TRANS(predsum_bs) +GEN_SIMD_TRANS(predsumu_bs) +GEN_SIMD_TRANS(predsum_hs) +GEN_SIMD_TRANS(predsumu_hs) +GEN_SIMD_TRANS_64(predsum_ws) +GEN_SIMD_TRANS_64(predsumu_ws) + +/* Packed SIMD - Pack, Unpack, and Merge Operations */ +GEN_SIMD_TRANS(ppaire_b) +GEN_SIMD_TRANS(ppaireo_b) +GEN_SIMD_TRANS(ppairoe_b) +GEN_SIMD_TRANS(ppairo_b) +GEN_SIMD_TRANS_64(ppaire_h) +GEN_SIMD_TRANS(ppaireo_h) +GEN_SIMD_TRANS(ppairoe_h) +GEN_SIMD_TRANS(ppairo_h) +GEN_SIMD_TRANS_64(ppaireo_w) +GEN_SIMD_TRANS_64(ppairoe_w) +GEN_SIMD_TRANS_64(ppairo_w) +GEN_SIMD_TRANS_R1(psext_h_b) +GEN_SIMD_TRANS_R1_64(psext_w_b) +GEN_SIMD_TRANS_R1_64(psext_w_h) +GEN_SIMD_TRANS_R1(rev) +GEN_SIMD_TRANS_R1_64(rev16) +GEN_SIMD_TRANS_64(zip8p) +GEN_SIMD_TRANS_64(zip8hp) +GEN_SIMD_TRANS_64(unzip8p) +GEN_SIMD_TRANS_64(unzip8hp) +GEN_SIMD_TRANS_64(zip16p) +GEN_SIMD_TRANS_64(zip16hp) +GEN_SIMD_TRANS_64(unzip16p) +GEN_SIMD_TRANS_64(unzip16hp) +GEN_SIMD_TRANS_ACC(slx) +GEN_SIMD_TRANS_ACC(srx) +GEN_SIMD_TRANS_ACC(mvm) +GEN_SIMD_TRANS_ACC(mvmn) +GEN_SIMD_TRANS_ACC(merge) + +/* Packed SIMD - Count Leading Operations */ +GEN_SIMD_TRANS_R1(cls) +GEN_SIMD_TRANS_R1_64(clsw) + diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index e48c9897ae..4080aab234 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -2997,3 +2997,622 @@ uint64_t HELPER(pasa_wx)(CPURISCVState *env, uint64= _t rs1, uint64_t rs2) } return rd; } + +/* Horizontal sum operations */ + +/** + * PREDSUM.BS - Signed reduction sum of bytes + * rd =3D rs2 + sum(sign_extend(rs1[i])) + */ 
+target_ulong HELPER(predsum_bs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + int64_t sum =3D (int64_t)(target_long)rs2; + int elems =3D ELEMS_B(rs1); + + for (int i =3D 0; i < elems; i++) { + int8_t e1 =3D (int8_t)EXTRACT8(rs1, i); + sum +=3D e1; + } + + return (target_ulong)sum; +} + +/** + * PREDSUMU.BS - Unsigned reduction sum of bytes + * rd =3D rs2 + sum(zero_extend(rs1[i])) + */ +target_ulong HELPER(predsumu_bs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + uint64_t sum =3D rs2; + int elems =3D ELEMS_B(rs1); + + for (int i =3D 0; i < elems; i++) { + uint8_t e1 =3D EXTRACT8(rs1, i); + sum +=3D e1; + } + + return (target_ulong)sum; +} + +/** + * PREDSUM.HS - Signed reduction sum of halfwords + * rd =3D rs2 + sum(sign_extend(rs1[i])) + */ +target_ulong HELPER(predsum_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + int64_t sum =3D (int64_t)(target_long)rs2; + int elems =3D ELEMS_H(rs1); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + sum +=3D e1; + } + + return (target_ulong)sum; +} + +/** + * PREDSUMU.HS - Unsigned reduction sum of halfwords + * rd =3D rs2 + sum(zero_extend(rs1[i])) + */ +target_ulong HELPER(predsumu_hs)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + uint64_t sum =3D rs2; + int elems =3D ELEMS_H(rs1); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + sum +=3D e1; + } + + return (target_ulong)sum; +} + +/** + * PREDSUM.WS - Signed reduction sum of words (RV64 only) + * rd =3D rs2 + sum(sign_extend(rs1[i])) + */ +uint64_t HELPER(predsum_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int64_t sum =3D (int64_t)rs2; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + sum +=3D e1; + } + + return (uint64_t)sum; +} + +/** + * PREDSUMU.WS - Unsigned reduction sum of words (RV64 only) + * rd =3D rs2 + sum(zero_extend(rs1[i])) + */ +uint64_t
HELPER(predsumu_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs= 2) +{ + uint64_t sum =3D rs2; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + sum +=3D e1; + } + + return sum; +} + +/* Packing/unpacking operations */ + +/** + * PPAIRE.B - Pair low bytes of corresponding halfwords + * For each halfword: rd[i] =3D {rs2[i][7:0], rs1[i][7:0]} + */ +target_ulong HELPER(ppaire_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D ((e2 & 0x00FF) << 8) | (e1 & 0x00FF); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PPAIREO.B - Pair high byte of rs2 with low byte of rs1 + * For each halfword: rd[i] =3D {rs2[i][15:8], rs1[i][7:0]} + */ +target_ulong HELPER(ppaireo_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D ((e2 >> 8) << 8) | (e1 & 0x00FF); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PPAIROE.B - Pair low byte of rs2 with high byte of rs1 + * For each halfword: rd[i] =3D {rs2[i][7:0], rs1[i][15:8]} + */ +target_ulong HELPER(ppairoe_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D ((e2 & 0x00FF) << 8) | ((e1 >> 8) & 0x00FF); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PPAIRO.B - Pair high bytes of corresponding halfwords + * For each halfword: rd[i] =3D {rs2[i][15:8], rs1[i][15:8]} + */ +target_ulong HELPER(ppairo_b)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + 
target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint16_t res =3D ((e2 >> 8) << 8) | ((e1 >> 8) & 0x00FF); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PPAIRE.H - Pair low halfwords of corresponding words + * (RV64 only) + * For each word: rd[i] =3D {rs2[i][15:0], rs1[i][15:0]} + */ +uint64_t HELPER(ppaire_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D ((e2 & 0x0000FFFF) << 16) | (e1 & 0x0000FFFF); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PPAIREO.H - Pair high halfword of rs2 with low halfword of rs1 (RV64 on= ly) + */ +target_ulong HELPER(ppaireo_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D ((e2 >> 16) << 16) | (e1 & 0x0000FFFF); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PPAIROE.H - Pair low halfword of rs2 with high halfword of rs1 (RV64 on= ly) + */ +target_ulong HELPER(ppairoe_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D ((e2 & 0x0000FFFF) << 16) | ((e1 >> 16) & 0x0000F= FFF); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PPAIRO.H - Pair high halfwords of corresponding words (RV64 only) + */ +target_ulong HELPER(ppairo_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D 
EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t res =3D ((e2 >> 16) << 16) | ((e1 >> 16) & 0x0000FFFF); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PPAIREO.W - Pair high word of rs2 with low word of rs1 (RV64 only) + */ +uint64_t HELPER(ppaireo_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + uint32_t e1 =3D EXTRACT32(rs1, 0); + uint32_t e2 =3D EXTRACT32(rs2, 1); + rd =3D ((uint64_t)e2 << 32) | e1; + return rd; +} + +/** + * PPAIROE.W - Pair low word of rs2 with high word of rs1 (RV64 only) + */ +uint64_t HELPER(ppairoe_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + uint32_t e1 =3D EXTRACT32(rs1, 1); + uint32_t e2 =3D EXTRACT32(rs2, 0); + rd =3D ((uint64_t)e2 << 32) | e1; + return rd; +} + +/** + * PPAIRO.W - Pair high word of rs2 with high word of rs1 (RV64 only) + */ +uint64_t HELPER(ppairo_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + uint32_t e1 =3D EXTRACT32(rs1, 1); + uint32_t e2 =3D EXTRACT32(rs2, 1); + rd =3D ((uint64_t)e2 << 32) | e1; + return rd; +} + +/** + * PSEXT.H.B - Sign-extend bytes to halfwords within each halfword + */ +target_ulong HELPER(psext_h_b)(CPURISCVState *env, target_ulong rs1) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + int8_t b0 =3D (int8_t)(e1 & 0xFF); + int16_t res =3D (int16_t)b0; + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PSEXT.W.B - Sign-extend bytes to words (RV64 only) + */ +uint64_t HELPER(psext_w_b)(CPURISCVState *env, uint64_t rs1) +{ + uint64_t rd =3D 0; + int8_t b0 =3D (int8_t)EXTRACT8(rs1, 0); + int8_t b4 =3D (int8_t)EXTRACT8(rs1, 4); + uint32_t lo =3D (uint32_t)(int32_t)b0; + uint32_t hi =3D (uint32_t)(int32_t)b4; + rd =3D ((uint64_t)hi << 32) | lo; + return rd; +} + +/** + * PSEXT.W.H - Sign-extend halfwords to words (RV64 only) + */ +uint64_t HELPER(psext_w_h)(CPURISCVState *env,
uint64_t rs1) +{ + uint64_t rd =3D 0; + int16_t h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t h2 =3D (int16_t)EXTRACT16(rs1, 2); + uint32_t lo =3D (uint32_t)(int32_t)h0; + uint32_t hi =3D (uint32_t)(int32_t)h2; + rd =3D ((uint64_t)hi << 32) | lo; + return rd; +} + +/** + * REV - Reverse all bits + */ +target_ulong HELPER(rev)(CPURISCVState *env, target_ulong rs1) +{ + target_ulong rd =3D 0; + + for (int i =3D 0; i < TARGET_LONG_BITS; i++) { + rd =3D (rd << 1) | (rs1 & 1); + rs1 >>=3D 1; + } + + return rd; +} + +/** + * REV16 - Reverse 16-bit chunks (RV64 only) + */ +uint64_t HELPER(rev16)(CPURISCVState *env, uint64_t rs1) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t chunk =3D EXTRACT16(rs1, i); + rd =3D (rd << 16) | chunk; + } + + return rd; +} + +/** + * ZIP8P - Interleave bytes from rs2 and rs1 (RV64 only) + * rd =3D {rs2[31:24], rs1[31:24], rs2[23:16], rs1[23:16], + * rs2[15:8], rs1[15:8], rs2[7:0], rs1[7:0]} + */ +uint64_t HELPER(zip8p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint8_t b1 =3D EXTRACT8(rs1, 3 - i); + uint8_t b2 =3D EXTRACT8(rs2, 3 - i); + rd =3D (rd << 16) | ((uint16_t)b2 << 8) | b1; + } + + return rd; +} + +/** + * ZIP8HP - Interleave high bytes from rs2 and rs1 (RV64 only) + */ +uint64_t HELPER(zip8hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint8_t b1 =3D EXTRACT8(rs1, 7 - i); + uint8_t b2 =3D EXTRACT8(rs2, 7 - i); + rd =3D (rd << 16) | ((uint16_t)b2 << 8) | b1; + } + + return rd; +} + +/** + * UNZIP8P - De-interleave bytes + * (RV64 only) + */ +uint64_t HELPER(unzip8p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint64_t b1 =3D EXTRACT8(rs1, 2 * i) << 8 * i; + uint64_t b2 =3D EXTRACT8(rs2, 2 * i) << (32 + 8 * i); + rd =3D rd | b2 | b1; + } + + return rd; +} + +/** + * UNZIP8HP - De-interleave high bytes + * (RV64 
only) + */ +uint64_t HELPER(unzip8hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint64_t b1 =3D EXTRACT8(rs1, 2 * i + 1) << 8 * i; + uint64_t b2 =3D EXTRACT8(rs2, 2 * i + 1) << (32 + 8 * i); + rd =3D rd | b2 | b1; + } + + return rd; +} + +/** + * ZIP16P - Interleave halfwords from rs2 and rs1 (RV64 only) + */ +uint64_t HELPER(zip16p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint16_t h1 =3D EXTRACT16(rs1, 1 - i); + uint16_t h2 =3D EXTRACT16(rs2, 1 - i); + rd =3D (rd << 32) | ((uint32_t)h2 << 16) | h1; + } + + return rd; +} + +/** + * ZIP16HP - Interleave high halfwords (RV64 only) + */ +uint64_t HELPER(zip16hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint16_t h1 =3D EXTRACT16(rs1, 3 - i); + uint16_t h2 =3D EXTRACT16(rs2, 3 - i); + rd =3D (rd << 32) | ((uint32_t)h2 << 16) | h1; + } + + return rd; +} + +/** + * UNZIP16P - De-interleave halfwords (RV64 only) + */ +uint64_t HELPER(unzip16p)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint64_t b1 =3D EXTRACT16(rs1, 2 * i) << 16 * i; + uint64_t b2 =3D EXTRACT16(rs2, 2 * i) << (32 + 16 * i); + rd =3D rd | b2 | b1; + } + + return rd; +} + +/** + * UNZIP16HP - De-interleave high halfwords (RV64 only) + */ +uint64_t HELPER(unzip16hp)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint64_t b1 =3D EXTRACT16(rs1, 2 * i + 1) << 16 * i; + uint64_t b2 =3D EXTRACT16(rs2, 2 * i + 1) << (32 + 16 * i); + rd =3D rd | b2 | b1; + } + + return rd; +} + + +/* Merge and mask operations */ + +/** + * SLX - Shift left extended (concatenate rd and rs1, shift left, take upp= er) + */ +target_ulong HELPER(slx)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong rd) +{ + int shamt =3D 
(TARGET_LONG_BITS =3D=3D 32) ? (rs2 & 0x1F) : (rs2 & 0x3= F); + target_ulong xrs1 =3D 0; + target_ulong xrd =3D 0; + + if (shamt <=3D TARGET_LONG_BITS) { + xrs1 =3D shamt ? rs1 >> (TARGET_LONG_BITS - shamt) : 0; + xrd =3D (rd << shamt) + xrs1; + } else { + xrd =3D rs1 << (shamt - TARGET_LONG_BITS); + } + + return xrd; +} + +/** + * SRX - Shift right extended (concatenate rs1 and rd, shift right, take l= ower) + */ +target_ulong HELPER(srx)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong rd) +{ + int shamt =3D (TARGET_LONG_BITS =3D=3D 32) ? (rs2 & 0x1F) : (rs2 & 0x3= F); + target_ulong xrs1 =3D 0; + target_ulong xrd =3D 0; + + if (shamt <=3D TARGET_LONG_BITS) { + xrs1 =3D shamt ? rs1 << (TARGET_LONG_BITS - shamt) : 0; + xrd =3D (rd >> shamt) + xrs1; + } else { + xrd =3D rs1 >> (shamt - TARGET_LONG_BITS); + } + + return xrd; +} + +/** + * MVM - Move masked + * For each bit: rd[i] =3D rs2[i] ? rs1[i] : rd[i] + */ +target_ulong HELPER(mvm)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong rd) +{ + return (~rs2 & rd) | (rs2 & rs1); +} + +/** + * MVMN - Move masked not + * For each bit: rd[i] =3D rs2[i] ? rd[i] : rs1[i] + */ +target_ulong HELPER(mvmn)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong rd) +{ + return (~rs2 & rs1) | (rs2 & rd); +} + +/** + * MERGE - Merge + * For each bit: rd[i] =3D rd[i] ? 
rs2[i] : rs1[i] + */ +target_ulong HELPER(merge)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong rd) +{ + return (~rd & rs1) | (rd & rs2); +} + +/* Count leading operations */ + +/** + * CLS - Count leading redundant sign bits + */ +target_ulong HELPER(cls)(CPURISCVState *env, target_ulong rs1) +{ + target_long a =3D (target_long)rs1; + target_ulong cnt =3D 0; + +#if TARGET_LONG_BITS =3D=3D 64 + target_long lo_bound =3D 0xC000000000000000LL; + target_long hi_bound =3D 0x3FFFFFFFFFFFFFFFLL; +#else + target_long lo_bound =3D 0xC0000000; + target_long hi_bound =3D 0x3FFFFFFF; +#endif + + while (cnt < TARGET_LONG_BITS - 1 && a >=3D lo_bound && a <=3D hi_boun= d) { + cnt++; + a <<=3D 1; + } + + return cnt; +} + +/** + * CLSW - Count leading redundant sign bits of low 32 bits (RV64) + */ +uint64_t HELPER(clsw)(CPURISCVState *env, uint64_t rs1) +{ + int32_t a =3D (int32_t)(rs1 & 0xFFFFFFFF); + int32_t lo_bound =3D 0xC0000000; + int32_t hi_bound =3D 0x3FFFFFFF; + int c =3D 0; + + while (c < 31 && a >=3D lo_bound && a <=3D hi_bound) { + c++; + a <<=3D 1; + } + + return c; +} --=20 2.34.1
From nobody Sat May 30 20:13:15 2026 From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 08/14] target/riscv: rvp: add pure multiplication operations Date: Fri, 17 Apr 2026 18:46:45 +0800 Message-Id: <20260417104652.17857-9-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
Signed-off-by: Molly Chen --- target/riscv/helper.h | 62 ++ target/riscv/insn32.decode | 92 ++ target/riscv/insn_trans/trans_rvp.c.inc | 62 ++ target/riscv/psimd_helper.c | 1066 +++++++++++++++++++++++ 4 files changed, 1282 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index 78ae034331..4b3f01f8d0 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1543,3 +1543,65 @@ DEF_HELPER_4(merge, tl, env, tl, tl, tl) DEF_HELPER_2(cls, tl, env, tl) DEF_HELPER_2(clsw, i64, env, i64) =20 +/* Packed SIMD - Pure Multiplication Operations */ +DEF_HELPER_3(pmulh_h, tl, env, tl, tl) +DEF_HELPER_3(pmulhsu_h, tl, env, tl, tl) +DEF_HELPER_3(pmulhu_h, tl, env, tl, tl) +DEF_HELPER_3(pmulhr_h, tl, env, tl, tl)
+DEF_HELPER_3(pmulhrsu_h, tl, env, tl, tl) +DEF_HELPER_3(pmulhru_h, tl, env, tl, tl) +DEF_HELPER_3(pmulh_w, i64, env, i64, i64) +DEF_HELPER_3(pmulhr_w, i64, env, i64, i64) +DEF_HELPER_3(pmulhsu_w, i64, env, i64, i64) +DEF_HELPER_3(pmulhrsu_w, i64, env, i64, i64) +DEF_HELPER_3(pmulhu_w, i64, env, i64, i64) +DEF_HELPER_3(pmulhru_w, i64, env, i64, i64) +DEF_HELPER_3(mulhr, i32, env, i32, i32) +DEF_HELPER_3(mulhrsu, i32, env, i32, i32) +DEF_HELPER_3(mulhru, i32, env, i32, i32) +DEF_HELPER_3(pmulh_h_b0, tl, env, tl, tl) +DEF_HELPER_3(pmulh_h_b1, tl, env, tl, tl) +DEF_HELPER_3(pmulhsu_h_b0, tl, env, tl, tl) +DEF_HELPER_3(pmulhsu_h_b1, tl, env, tl, tl) +DEF_HELPER_3(mulh_h0, i32, env, i32, i32) +DEF_HELPER_3(mulh_h1, i32, env, i32, i32) +DEF_HELPER_3(mulhsu_h0, i32, env, i32, i32) +DEF_HELPER_3(mulhsu_h1, i32, env, i32, i32) +DEF_HELPER_3(pmulh_w_h0, i64, env, i64, i64) +DEF_HELPER_3(pmulh_w_h1, i64, env, i64, i64) +DEF_HELPER_3(pmulhsu_w_h0, i64, env, i64, i64) +DEF_HELPER_3(pmulhsu_w_h1, i64, env, i64, i64) +DEF_HELPER_3(pmul_h_b00, tl, env, tl, tl) +DEF_HELPER_3(pmul_h_b01, tl, env, tl, tl) +DEF_HELPER_3(pmul_h_b11, tl, env, tl, tl) +DEF_HELPER_3(pmulsu_h_b00, tl, env, tl, tl) +DEF_HELPER_3(pmulsu_h_b11, tl, env, tl, tl) +DEF_HELPER_3(pmulu_h_b00, tl, env, tl, tl) +DEF_HELPER_3(pmulu_h_b01, tl, env, tl, tl) +DEF_HELPER_3(pmulu_h_b11, tl, env, tl, tl) +DEF_HELPER_3(pmul_w_h00, i64, env, i64, i64) +DEF_HELPER_3(pmul_w_h01, i64, env, i64, i64) +DEF_HELPER_3(pmul_w_h11, i64, env, i64, i64) +DEF_HELPER_3(pmulsu_w_h00, i64, env, i64, i64) +DEF_HELPER_3(pmulsu_w_h11, i64, env, i64, i64) +DEF_HELPER_3(pmulu_w_h00, i64, env, i64, i64) +DEF_HELPER_3(pmulu_w_h01, i64, env, i64, i64) +DEF_HELPER_3(pmulu_w_h11, i64, env, i64, i64) +DEF_HELPER_3(pm2sadd_h, tl, env, tl, tl) +DEF_HELPER_3(pm2sadd_hx, tl, env, tl, tl) +DEF_HELPER_3(mul_h00, i32, env, i32, i32) +DEF_HELPER_3(mul_h01, i32, env, i32, i32) +DEF_HELPER_3(mul_h11, i32, env, i32, i32) +DEF_HELPER_3(mulsu_h00, i32, env, i32, 
i32) +DEF_HELPER_3(mulsu_h11, i32, env, i32, i32) +DEF_HELPER_3(mulu_h00, i32, env, i32, i32) +DEF_HELPER_3(mulu_h01, i32, env, i32, i32) +DEF_HELPER_3(mulu_h11, i32, env, i32, i32) +DEF_HELPER_3(mul_w00, i64, env, i64, i64) +DEF_HELPER_3(mul_w01, i64, env, i64, i64) +DEF_HELPER_3(mul_w11, i64, env, i64, i64) +DEF_HELPER_3(mulsu_w00, i64, env, i64, i64) +DEF_HELPER_3(mulsu_w11, i64, env, i64, i64) +DEF_HELPER_3(mulu_w00, i64, env, i64, i64) +DEF_HELPER_3(mulu_w01, i64, env, i64, i64) +DEF_HELPER_3(mulu_w11, i64, env, i64, i64) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 09bb69b302..bd3b14af5b 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1321,3 +1321,95 @@ merge 1010110 ..... ..... 001 ..... 0111011 @r cls 01100 0000011 ..... 001 ..... 0010011 @r2 clsw 01100 0000011 ..... 001 ..... 0011011 @r2 =20 +# Packed SIMD - Pure Multiplication Operations +pmulh_h 10000 00 ..... ..... 111 ..... 0111011 @r +pmulhsu_h 11000 00 ..... ..... 111 ..... 0111011 @r +pmulhu_h 10010 00 ..... ..... 111 ..... 0111011 @r +pmulhr_h 10000 10 ..... ..... 111 ..... 0111011 @r +pmulhrsu_h 11000 10 ..... ..... 111 ..... 0111011 @r +pmulhru_h 10010 10 ..... ..... 111 ..... 0111011 @r +pmulh_w 10000 01 ..... ..... 111 ..... 0111011 @r +{ + mulhr 10000 11 ..... ..... 111 ..... 0111011 @r + pmulhr_w 10000 11 ..... ..... 111 ..... 0111011 @r +} +pmulhsu_w 11000 01 ..... ..... 111 ..... 0111011 @r +{ + mulhrsu 11000 11 ..... ..... 111 ..... 0111011 @r + pmulhrsu_w 11000 11 ..... ..... 111 ..... 0111011 @r +} +pmulhu_w 10010 01 ..... ..... 111 ..... 0111011 @r +{ + mulhru 10010 11 ..... ..... 111 ..... 0111011 @r + pmulhru_w 10010 11 ..... ..... 111 ..... 0111011 @r +} +pmulh_h_b0 10100 00 ..... ..... 111 ..... 0111011 @r +pmulh_h_b1 10110 00 ..... ..... 111 ..... 0111011 @r +pmulhsu_h_b0 10100 10 ..... ..... 111 ..... 0111011 @r +pmulhsu_h_b1 10110 10 ..... ..... 111 ..... 0111011 @r +{ + mulh_h0 10100 01 ..... ..... 111 ..... 
0111011 @r + pmulh_w_h0 10100 01 ..... ..... 111 ..... 0111011 @r +} +{ + mulh_h1 10110 01 ..... ..... 111 ..... 0111011 @r + pmulh_w_h1 10110 01 ..... ..... 111 ..... 0111011 @r +} +{ + mulhsu_h0 10100 11 ..... ..... 111 ..... 0111011 @r + pmulhsu_w_h0 10100 11 ..... ..... 111 ..... 0111011 @r +} +{ + mulhsu_h1 10110 11 ..... ..... 111 ..... 0111011 @r + pmulhsu_w_h1 10110 11 ..... ..... 111 ..... 0111011 @r +} +pmul_h_b00 10000 00 ..... ..... 011 ..... 0111011 @r +pmul_h_b01 10010 00 ..... ..... 001 ..... 0111011 @r +pmul_h_b11 10010 00 ..... ..... 011 ..... 0111011 @r +pmulsu_h_b00 11100 00 ..... ..... 011 ..... 0111011 @r +pmulsu_h_b11 11110 00 ..... ..... 011 ..... 0111011 @r +pmulu_h_b00 10100 00 ..... ..... 011 ..... 0111011 @r +pmulu_h_b01 10110 00 ..... ..... 001 ..... 0111011 @r +pmulu_h_b11 10110 00 ..... ..... 011 ..... 0111011 @r +{ + mul_h00 10000 01 ..... ..... 011 ..... 0111011 @r + pmul_w_h00 10000 01 ..... ..... 011 ..... 0111011 @r +} +{ + mul_h01 10010 01 ..... ..... 001 ..... 0111011 @r + pmul_w_h01 10010 01 ..... ..... 001 ..... 0111011 @r +} +{ + mul_h11 10010 01 ..... ..... 011 ..... 0111011 @r + pmul_w_h11 10010 01 ..... ..... 011 ..... 0111011 @r +} +{ + mulsu_h00 11100 01 ..... ..... 011 ..... 0111011 @r + pmulsu_w_h00 11100 01 ..... ..... 011 ..... 0111011 @r +} +{ + mulsu_h11 11110 01 ..... ..... 011 ..... 0111011 @r + pmulsu_w_h11 11110 01 ..... ..... 011 ..... 0111011 @r +} +{ + mulu_h00 10100 01 ..... ..... 011 ..... 0111011 @r + pmulu_w_h00 10100 01 ..... ..... 011 ..... 0111011 @r +} +{ + mulu_h01 10110 01 ..... ..... 001 ..... 0111011 @r + pmulu_w_h01 10110 01 ..... ..... 001 ..... 0111011 @r +} +{ + mulu_h11 10110 01 ..... ..... 011 ..... 0111011 @r + pmulu_w_h11 10110 01 ..... ..... 011 ..... 0111011 @r +} +pm2sadd_h 11000 10 ..... ..... 101 ..... 0111011 @r +pm2sadd_hx 11010 10 ..... ..... 101 ..... 0111011 @r +mul_w00 10000 11 ..... ..... 011 ..... 0111011 @r +mul_w01 10010 11 ..... ..... 001 ..... 
0111011 @r +mul_w11 10010 11 ..... ..... 011 ..... 0111011 @r +mulsu_w00 11100 11 ..... ..... 011 ..... 0111011 @r +mulsu_w11 11110 11 ..... ..... 011 ..... 0111011 @r +mulu_w00 10100 11 ..... ..... 011 ..... 0111011 @r +mulu_w01 10110 11 ..... ..... 001 ..... 0111011 @r +mulu_w11 10110 11 ..... ..... 011 ..... 0111011 @r diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index fc6254b395..b01656ffb0 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -712,3 +712,65 @@ GEN_SIMD_TRANS_ACC(merge) GEN_SIMD_TRANS_R1(cls) GEN_SIMD_TRANS_R1_64(clsw) =20 +/* Packed SIMD - Pure Multiplication Operations */ +GEN_SIMD_TRANS(pmulh_h) +GEN_SIMD_TRANS(pmulhsu_h) +GEN_SIMD_TRANS(pmulhu_h) +GEN_SIMD_TRANS(pmulhr_h) +GEN_SIMD_TRANS(pmulhrsu_h) +GEN_SIMD_TRANS(pmulhru_h) +GEN_SIMD_TRANS_64(pmulh_w) +GEN_SIMD_TRANS_64(pmulhr_w) +GEN_SIMD_TRANS_64(pmulhsu_w) +GEN_SIMD_TRANS_64(pmulhrsu_w) +GEN_SIMD_TRANS_64(pmulhu_w) +GEN_SIMD_TRANS_64(pmulhru_w) +GEN_SIMD_TRANS_32(mulhr) +GEN_SIMD_TRANS_32(mulhrsu) +GEN_SIMD_TRANS_32(mulhru) +GEN_SIMD_TRANS(pmulh_h_b0) +GEN_SIMD_TRANS(pmulh_h_b1) +GEN_SIMD_TRANS(pmulhsu_h_b0) +GEN_SIMD_TRANS(pmulhsu_h_b1) +GEN_SIMD_TRANS_32(mulh_h0) +GEN_SIMD_TRANS_32(mulh_h1) +GEN_SIMD_TRANS_32(mulhsu_h0) +GEN_SIMD_TRANS_32(mulhsu_h1) +GEN_SIMD_TRANS_64(pmulh_w_h0) +GEN_SIMD_TRANS_64(pmulh_w_h1) +GEN_SIMD_TRANS_64(pmulhsu_w_h0) +GEN_SIMD_TRANS_64(pmulhsu_w_h1) +GEN_SIMD_TRANS(pmul_h_b00) +GEN_SIMD_TRANS(pmul_h_b01) +GEN_SIMD_TRANS(pmul_h_b11) +GEN_SIMD_TRANS(pmulsu_h_b00) +GEN_SIMD_TRANS(pmulsu_h_b11) +GEN_SIMD_TRANS(pmulu_h_b00) +GEN_SIMD_TRANS(pmulu_h_b01) +GEN_SIMD_TRANS(pmulu_h_b11) +GEN_SIMD_TRANS_64(pmul_w_h00) +GEN_SIMD_TRANS_64(pmul_w_h01) +GEN_SIMD_TRANS_64(pmul_w_h11) +GEN_SIMD_TRANS_64(pmulsu_w_h00) +GEN_SIMD_TRANS_64(pmulsu_w_h11) +GEN_SIMD_TRANS_64(pmulu_w_h00) +GEN_SIMD_TRANS_64(pmulu_w_h01) +GEN_SIMD_TRANS_64(pmulu_w_h11) +GEN_SIMD_TRANS(pm2sadd_h) 
+GEN_SIMD_TRANS(pm2sadd_hx) +GEN_SIMD_TRANS_32(mul_h00) +GEN_SIMD_TRANS_32(mul_h01) +GEN_SIMD_TRANS_32(mul_h11) +GEN_SIMD_TRANS_32(mulsu_h00) +GEN_SIMD_TRANS_32(mulsu_h11) +GEN_SIMD_TRANS_32(mulu_h00) +GEN_SIMD_TRANS_32(mulu_h01) +GEN_SIMD_TRANS_32(mulu_h11) +GEN_SIMD_TRANS_64(mul_w00) +GEN_SIMD_TRANS_64(mul_w01) +GEN_SIMD_TRANS_64(mul_w11) +GEN_SIMD_TRANS_64(mulsu_w00) +GEN_SIMD_TRANS_64(mulsu_w11) +GEN_SIMD_TRANS_64(mulu_w00) +GEN_SIMD_TRANS_64(mulu_w01) +GEN_SIMD_TRANS_64(mulu_w11) diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index 4080aab234..b60fd3094c 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -3616,3 +3616,1069 @@ uint64_t HELPER(clsw)(CPURISCVState *env, uint64_t= rs1) =20 return c; } + +/* Pure multiplication operations */ + +/** + * PMULH.H - Packed signed 16-bit multiply high + * For each halfword: rd[i] =3D (rs1[i] * rs2[i]) >> 16 + */ +target_ulong HELPER(pmulh_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t prod =3D (int32_t)e1 * (int32_t)e2; + uint16_t high =3D (uint16_t)(prod >> 16); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHSU.H - Packed signed x unsigned 16-bit multiply high + */ +target_ulong HELPER(pmulhsu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + int32_t prod =3D (int32_t)e1 * (uint32_t)e2; + uint16_t high =3D (uint16_t)(prod >> 16); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHU.H - Packed unsigned 16-bit multiply high + */ +target_ulong HELPER(pmulhu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd 
=3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint32_t prod =3D (uint32_t)e1 * (uint32_t)e2; + uint16_t high =3D (uint16_t)(prod >> 16); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHR.H - Packed signed 16-bit multiply high with rounding + */ +target_ulong HELPER(pmulhr_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + int32_t prod =3D (int32_t)e1 * (int32_t)e2 + (1 << 15); + uint16_t high =3D (uint16_t)(prod >> 16); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHRSU.H - Packed signed x unsigned 16-bit multiply high with rounding + */ +target_ulong HELPER(pmulhrsu_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + int32_t prod =3D (int32_t)e1 * (uint32_t)e2 + (1 << 15); + uint16_t high =3D (uint16_t)(prod >> 16); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHRU.H - Packed unsigned 16-bit multiply high with rounding + */ +target_ulong HELPER(pmulhru_h)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D EXTRACT16(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i); + uint32_t prod =3D (uint32_t)e1 * (uint32_t)e2 + (1 << 15); + uint16_t high =3D (uint16_t)(prod >> 16); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULH.W - Packed signed 32-bit multiply high (RV64 only) + */ +uint64_t HELPER(pmulh_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 
2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int64_t prod =3D (int64_t)e1 * (int64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 32); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHR.W - Packed signed 32-bit multiply high with rounding (RV64 only) + */ +uint64_t HELPER(pmulhr_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + int64_t prod =3D (int64_t)e1 * (int64_t)e2 + (1LL << 31); + uint32_t high =3D (uint32_t)(prod >> 32); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHSU.W - Packed signed x unsigned 32-bit multiply high (RV64 only) + */ +uint64_t HELPER(pmulhsu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 32); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHRSU.W - Packed signed x unsigned 32-bit + * multiply high with rounding (RV64 only) + */ +uint64_t HELPER(pmulhrsu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2 + (1LL << 31); + uint32_t high =3D (uint32_t)(prod >> 32); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHU.W - Packed unsigned 32-bit multiply high (RV64 only) + */ +uint64_t HELPER(pmulhu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D 
EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 32); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHRU.W - Packed unsigned 32-bit multiply high with rounding (RV64 on= ly) + */ +uint64_t HELPER(pmulhru_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2 + (1LL << 31); + uint32_t high =3D (uint32_t)(prod >> 32); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * MULHR - 32-bit signed multiply high with rounding + */ +uint32_t HELPER(mulhr)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 31); + return (uint32_t)(prod >> 32); +} + +/** + * MULHRSU - 32-bit signed x unsigned multiply high with rounding + */ +uint32_t HELPER(mulhrsu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + uint32_t b =3D rs2; + int64_t prod =3D (int64_t)a * (uint64_t)b + (1LL << 31); + return (uint32_t)(prod >> 32); +} + +/** + * MULHRU - 32-bit unsigned multiply high with rounding + */ +uint32_t HELPER(mulhru)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint32_t a =3D rs1; + uint32_t b =3D rs2; + uint64_t prod =3D (uint64_t)a * (uint64_t)b + (1LL << 31); + return (uint32_t)(prod >> 32); +} + +/** + * PMULH.H.B0 - Multiply halfword by low byte, result high halfword + */ +target_ulong HELPER(pmulh_h_b0)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2); + int32_t prod =3D (int32_t)e1 * (int32_t)e2; + uint16_t high =3D 
(uint16_t)(prod >> 8); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULH.H.B1 - Multiply halfword by high byte, result high halfword + */ +target_ulong HELPER(pmulh_h_b1)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2 + 1); + int32_t prod =3D (int32_t)e1 * (int32_t)e2; + uint16_t high =3D (uint16_t)(prod >> 8); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHSU.H.B0 - Multiply signed halfword by unsigned + * low byte, result high halfword + */ +target_ulong HELPER(pmulhsu_h_b0)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i * 2); + int32_t prod =3D (int32_t)e1 * (uint32_t)e2; + uint16_t high =3D (uint16_t)(prod >> 8); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * PMULHSU.H.B1 - Multiply signed halfword by unsigned + * high byte, result high halfword + */ +target_ulong HELPER(pmulhsu_h_b1)(CPURISCVState *env, + target_ulong rs1, target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i * 2 + 1); + int32_t prod =3D (int32_t)e1 * (uint32_t)e2; + uint16_t high =3D (uint16_t)(prod >> 8); + rd =3D INSERT16(rd, high, i); + } + return rd; +} + +/** + * MULH.H0 - 32-bit multiply by low halfword, result high 32 bits + */ +uint32_t HELPER(mulh_h0)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int16_t b =3D (int16_t)(rs2 & 0xFFFF); + int64_t prod =3D (int64_t)a * (int64_t)b; + return (uint32_t)(prod >> 16); +} + +/** + * MULH.H1 - 32-bit multiply by high halfword, result high 
32 bits + */ +uint32_t HELPER(mulh_h1)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int16_t b =3D (int16_t)((rs2 >> 16) & 0xFFFF); + int64_t prod =3D (int64_t)a * (int64_t)b; + return (uint32_t)(prod >> 16); +} + +/** + * MULHSU.H0 - 32-bit signed multiply by unsigned + * low halfword, result high 32 bits + */ +uint32_t HELPER(mulhsu_h0)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + uint16_t b =3D (uint16_t)(rs2 & 0xFFFF); + int64_t prod =3D (int64_t)a * (uint64_t)b; + return (uint32_t)(prod >> 16); +} + +/** + * MULHSU.H1 - 32-bit signed multiply by unsigned + * high halfword, result high 32 bits + */ +uint32_t HELPER(mulhsu_h1)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + uint16_t b =3D (uint16_t)((rs2 >> 16) & 0xFFFF); + int64_t prod =3D (int64_t)a * (uint64_t)b; + return (uint32_t)(prod >> 16); +} + +/** + * PMULH.W.H0 - Multiply word by low halfword, result high word (RV64 only) + */ +uint64_t HELPER(pmulh_w_h0)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2); + int64_t prod =3D (int64_t)e1 * (int64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 16); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULH.W.H1 - Multiply word by high halfword, result high word (RV64 onl= y) + */ +uint64_t HELPER(pmulh_w_h1)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int64_t prod =3D (int64_t)e1 * (int64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 16); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHSU.W.H0 - Multiply signed word by unsigned + * low halfword, result high word (RV64 only) + */ +uint64_t 
HELPER(pmulhsu_w_h0)(CPURISCVState *env, uint64_t rs1, uint64_t r= s2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i * 2); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 16); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMULHSU.W.H1 - Multiply signed word by unsigned + * high halfword, result high word (RV64 only) + */ +uint64_t HELPER(pmulhsu_w_h1)(CPURISCVState *env, uint64_t rs1, uint64_t r= s2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i * 2 + 1); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 16); + rd =3D INSERT32(rd, high, i); + } + return rd; +} + +/** + * PMUL.H.B00 - Multiply halfword by low byte of each halfword + * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][7:0] + */ +target_ulong HELPER(pmul_h_b00)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + int8_t s1_b0 =3D (int8_t)(s1_h & 0xFF); + int8_t s2_b0 =3D (int8_t)(s2_h & 0xFF); + int16_t mul =3D (int16_t)s1_b0 * (int16_t)s2_b0; + rd =3D INSERT16(rd, (uint16_t)mul, i); + } + return rd; +} + +/** + * PMUL.H.B01 - Multiply halfword low byte by halfword high byte + * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][15:8] + */ +target_ulong HELPER(pmul_h_b01)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + int8_t s1_b0 =3D (int8_t)(s1_h & 0xFF); + int8_t s2_b1 =3D (int8_t)((s2_h >> 8) & 0xFF); + int16_t mul =3D (int16_t)s1_b0 * (int16_t)s2_b1; + rd =3D 
INSERT16(rd, (uint16_t)mul, i); + } + return rd; +} + +/** + * PMUL.H.B11 - Multiply halfword high byte by halfword high byte + * For each halfword: rd[i] =3D rs1[i][15:8] * rs2[i][15:8] + */ +target_ulong HELPER(pmul_h_b11)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + int8_t s1_b1 =3D (int8_t)((s1_h >> 8) & 0xFF); + int8_t s2_b1 =3D (int8_t)((s2_h >> 8) & 0xFF); + int16_t mul =3D (int16_t)s1_b1 * (int16_t)s2_b1; + rd =3D INSERT16(rd, (uint16_t)mul, i); + } + return rd; +} + +/** + * PMULSU.H.B00 - Signed x unsigned multiply, low bytes + * For each halfword: rd[i] =3D (signed)rs1[i][7:0] * (unsigned)rs2[i][7:0] + */ +target_ulong HELPER(pmulsu_h_b00)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + int8_t s1_b0 =3D (int8_t)(s1_h & 0xFF); + uint8_t s2_b0 =3D (uint8_t)(s2_h & 0xFF); + int16_t mul =3D (int16_t)s1_b0 * (uint16_t)s2_b0; + rd =3D INSERT16(rd, (uint16_t)mul, i); + } + return rd; +} + +/** + * PMULSU.H.B11 - Signed x unsigned multiply, high bytes + * For each halfword: rd[i] =3D (signed)rs1[i][15:8] * (unsigned)rs2[i][15= :8] + */ +target_ulong HELPER(pmulsu_h_b11)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + int8_t s1_b1 =3D (int8_t)((s1_h >> 8) & 0xFF); + uint8_t s2_b1 =3D (uint8_t)((s2_h >> 8) & 0xFF); + int16_t mul =3D (int16_t)s1_b1 * (uint16_t)s2_b1; + rd =3D INSERT16(rd, (uint16_t)mul, i); + } + return rd; +} + +/** + * PMULU.H.B00 - Unsigned multiply, low bytes + * For each halfword: rd[i] =3D rs1[i][7:0] 
* rs2[i][7:0] (unsigned) + */ +target_ulong HELPER(pmulu_h_b00)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + uint8_t s1_b0 =3D (uint8_t)(s1_h & 0xFF); + uint8_t s2_b0 =3D (uint8_t)(s2_h & 0xFF); + uint16_t mul =3D (uint16_t)s1_b0 * (uint16_t)s2_b0; + rd =3D INSERT16(rd, mul, i); + } + return rd; +} + +/** + * PMULU.H.B01 - Unsigned multiply, rs1 low byte x rs2 high byte + * For each halfword: rd[i] =3D rs1[i][7:0] * rs2[i][15:8] (unsigned) + */ +target_ulong HELPER(pmulu_h_b01)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + uint8_t s1_b0 =3D (uint8_t)(s1_h & 0xFF); + uint8_t s2_b1 =3D (uint8_t)((s2_h >> 8) & 0xFF); + uint16_t mul =3D (uint16_t)s1_b0 * (uint16_t)s2_b1; + rd =3D INSERT16(rd, mul, i); + } + return rd; +} + +/** + * PMULU.H.B11 - Unsigned multiply, high bytes + * For each halfword: rd[i] =3D rs1[i][15:8] * rs2[i][15:8] (unsigned) + */ +target_ulong HELPER(pmulu_h_b11)(CPURISCVState *env, + target_ulong s1, target_ulong s2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h =3D EXTRACT16(s1, i); + uint16_t s2_h =3D EXTRACT16(s2, i); + uint8_t s1_b1 =3D (uint8_t)((s1_h >> 8) & 0xFF); + uint8_t s2_b1 =3D (uint8_t)((s2_h >> 8) & 0xFF); + uint16_t mul =3D (uint16_t)s1_b1 * (uint16_t)s2_b1; + rd =3D INSERT16(rd, mul, i); + } + return rd; +} + +/** + * PMUL.W.H00 - Multiply word by low halfword of each word + * For each word: rd[i] =3D rs1[i][15:0] * rs2[i][15:0] + */ +uint64_t HELPER(pmul_w_h00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int16_t 
s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int32_t mul = (int32_t)s1_h0 * (int32_t)s2_h0;
+        rd = INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.W.H01 - Signed multiply, rs1 low halfword x rs2 high halfword
+ * For each word: rd[i] = rs1[i][15:0] * rs2[i][31:16]
+ */
+uint64_t HELPER(pmul_w_h01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t mul = (int32_t)s1_h0 * (int32_t)s2_h1;
+        rd = INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMUL.W.H11 - Signed multiply, high halfwords of each word
+ * For each word: rd[i] = rs1[i][31:16] * rs2[i][31:16]
+ */
+uint64_t HELPER(pmul_w_h11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t mul = (int32_t)s1_h1 * (int32_t)s2_h1;
+        rd = INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULSU.W.H00 - Signed x unsigned multiply, low halfwords
+ * For each word: rd[i] = (signed)rs1[i][15:0] * (unsigned)rs2[i][15:0]
+ */
+uint64_t HELPER(pmulsu_w_h00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        int32_t mul = (int32_t)s1_h0 * (uint32_t)s2_h0;
+        rd = INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULSU.W.H11 - Signed x unsigned multiply, high halfwords
+ * For each word: rd[i] = (signed)rs1[i][31:16] * (unsigned)rs2[i][31:16]
+ */
+uint64_t HELPER(pmulsu_w_h11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+
uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        int32_t mul = (int32_t)s1_h1 * (uint32_t)s2_h1;
+        rd = INSERT32(rd, (uint32_t)mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.W.H00 - Unsigned multiply, low halfwords
+ * For each word: rd[i] = rs1[i][15:0] * rs2[i][15:0] (unsigned)
+ */
+uint64_t HELPER(pmulu_w_h00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t s1_h0 = EXTRACT16(rs1, i * 2);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint32_t mul = (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        rd = INSERT32(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.W.H01 - Unsigned multiply, low halfword x high halfword
+ * For each word: rd[i] = rs1[i][15:0] * rs2[i][31:16] (unsigned)
+ */
+uint64_t HELPER(pmulu_w_h01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t s1_h0 = EXTRACT16(rs1, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        uint32_t mul = (uint32_t)s1_h0 * (uint32_t)s2_h1;
+        rd = INSERT32(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PMULU.W.H11 - Unsigned multiply, high halfwords
+ * For each word: rd[i] = rs1[i][31:16] * rs2[i][31:16] (unsigned)
+ */
+uint64_t HELPER(pmulu_w_h11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t s1_h1 = EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        uint32_t mul = (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        rd = INSERT32(rd, mul, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SADD.H - Packed saturating multiply-add (non-crossed)
+ *
+ * For each 32-bit word:
+ * result = sat32(rs1[31:16] * rs2[31:16] + rs1[15:0] * rs2[15:0])
+ *
+ * Special case: if both halfwords in
both sources are 0x8000 (-32768),
+ * result saturates to 0x7FFFFFFF and sets vxsat
+ */
+target_ulong HELPER(pm2sadd_h)(CPURISCVState *env,
+                               target_ulong s1, target_ulong s2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd); /* Number of 32-bit words */
+    int global_sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        /* Extract both halfwords from each source for this word */
+        uint32_t s1_word = EXTRACT32(s1, i);
+        uint32_t s2_word = EXTRACT32(s2, i);
+
+        int16_t s1_h0 = (int16_t)EXTRACT16(s1_word, 0);
+        int16_t s1_h1 = (int16_t)EXTRACT16(s1_word, 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(s2_word, 0);
+        int16_t s2_h1 = (int16_t)EXTRACT16(s2_word, 1);
+
+        uint32_t result;
+
+        /* Check for the special saturation case: all halfwords are -32768 */
+        if ((s1_h0 == -32768) && (s1_h1 == -32768) &&
+            (s2_h0 == -32768) && (s2_h1 == -32768)) {
+            result = 0x7FFFFFFF;
+            global_sat = 1;
+        } else {
+            /* Normal case: compute products and sum */
+            int32_t mul_00 = (int32_t)s1_h0 * (int32_t)s2_h0;
+            int32_t mul_11 = (int32_t)s1_h1 * (int32_t)s2_h1;
+
+            /* With that case excluded, the sum always fits in 32 bits. */
+            result = (uint32_t)(mul_00 + mul_11);
+        }
+
+        rd = INSERT32(rd, result, i);
+    }
+
+    if (global_sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * PM2SADD.HX - Packed saturating multiply-add crossed
+ *
+ * For each 32-bit word:
+ * result = sat32(rs1[31:16] * rs2[15:0] + rs1[15:0] * rs2[31:16])
+ *
+ * Special case: if both halfwords in both sources are 0x8000 (-32768),
+ * result saturates to 0x7FFFFFFF and sets vxsat
+ */
+target_ulong HELPER(pm2sadd_hx)(CPURISCVState *env,
+                                target_ulong s1, target_ulong s2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd); /* Number of 32-bit words */
+    int global_sat = 0;
+
+    for (int i = 0; i < elems; i++) {
+        /* Extract both halfwords from each source for this word */
+        uint32_t s1_word = EXTRACT32(s1, i);
+        uint32_t s2_word = EXTRACT32(s2, i);
+
+        int16_t s1_h0 = (int16_t)EXTRACT16(s1_word, 0);
+        int16_t s1_h1 = (int16_t)EXTRACT16(s1_word, 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(s2_word, 0);
+        int16_t s2_h1 = (int16_t)EXTRACT16(s2_word, 1);
+
+        uint32_t result;
+
+        /* Check for the special saturation case: all halfwords are -32768 */
+        if ((s1_h0 == -32768) && (s1_h1 == -32768) &&
+            (s2_h0 == -32768) && (s2_h1 == -32768)) {
+            result = 0x7FFFFFFF;
+            global_sat = 1;
+        } else {
+            /* Crossed products: s1_h0 * s2_h1 and s1_h1 * s2_h0 */
+            int32_t mul_01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+            int32_t mul_10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+
+            /* Sum the crossed products */
+            result = (uint32_t)(mul_01 + mul_10);
+        }
+
+        rd = INSERT32(rd, result, i);
+    }
+
+    if (global_sat) {
+        env->vxsat = 1;
+    }
+    return rd;
+}
+
+/**
+ * MUL.H00 - 32-bit signed multiply, low halfwords
+ * Returns product of low halfwords of rs1 and rs2
+ */
+uint32_t HELPER(mul_h00)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    int16_t s2_h0 = (int16_t)EXTRACT16(rs2, 0);
+    int32_t mul = (int32_t)s1_h0 * (int32_t)s2_h0;
+    return
(uint32_t)mul;
+}
+
+/**
+ * MUL.H01 - 32-bit signed multiply, rs1 low halfword x rs2 high halfword
+ * Returns product of low halfword of rs1 and high halfword of rs2
+ */
+uint32_t HELPER(mul_h01)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    int16_t s2_h1 = (int16_t)EXTRACT16(rs2, 1);
+    int32_t mul = (int32_t)s1_h0 * (int32_t)s2_h1;
+    return (uint32_t)mul;
+}
+
+/**
+ * MUL.H11 - 32-bit signed multiply, high halfwords
+ * Returns product of high halfwords of rs1 and rs2
+ */
+uint32_t HELPER(mul_h11)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h1 = (int16_t)EXTRACT16(rs1, 1);
+    int16_t s2_h1 = (int16_t)EXTRACT16(rs2, 1);
+    int32_t mul = (int32_t)s1_h1 * (int32_t)s2_h1;
+    return (uint32_t)mul;
+}
+
+/**
+ * MULSU.H00 - 32-bit signed x unsigned multiply, low halfwords
+ * Returns product of low halfword of rs1 (signed)
+ * and low halfword of rs2 (unsigned)
+ */
+uint32_t HELPER(mulsu_h00)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    uint16_t s2_h0 = EXTRACT16(rs2, 0);
+    int32_t mul = (int32_t)s1_h0 * (uint32_t)s2_h0;
+    return (uint32_t)mul;
+}
+
+/**
+ * MULSU.H11 - 32-bit signed x unsigned multiply, high halfwords
+ * Returns product of high halfword of rs1 (signed)
+ * and high halfword of rs2 (unsigned)
+ */
+uint32_t HELPER(mulsu_h11)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    int16_t s1_h1 = (int16_t)EXTRACT16(rs1, 1);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    int32_t mul = (int32_t)s1_h1 * (uint32_t)s2_h1;
+    return (uint32_t)mul;
+}
+
+/**
+ * MULU.H00 - 32-bit unsigned multiply, low halfwords
+ * Returns product of low halfwords of rs1 and rs2 (unsigned)
+ */
+uint32_t HELPER(mulu_h00)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h0 = EXTRACT16(rs1, 0);
+    uint16_t s2_h0 = EXTRACT16(rs2, 0);
+    uint32_t mul = (uint32_t)s1_h0 * (uint32_t)s2_h0;
+    return mul;
+}
+
+/**
+ *
MULU.H01 - 32-bit unsigned multiply, rs1 low halfword x rs2 high halfword
+ * Returns product of low halfword of rs1 and high halfword of rs2 (unsigned)
+ */
+uint32_t HELPER(mulu_h01)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h0 = EXTRACT16(rs1, 0);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    uint32_t mul = (uint32_t)s1_h0 * (uint32_t)s2_h1;
+    return mul;
+}
+
+/**
+ * MULU.H11 - 32-bit unsigned multiply, high halfwords
+ * Returns product of high halfwords of rs1 and rs2 (unsigned)
+ */
+uint32_t HELPER(mulu_h11)(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
+{
+    uint16_t s1_h1 = EXTRACT16(rs1, 1);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    uint32_t mul = (uint32_t)s1_h1 * (uint32_t)s2_h1;
+    return mul;
+}
+
+/**
+ * MUL.W00 - 64-bit signed multiply, low word x low word
+ * Returns full 64-bit product of low 32 bits of rs1 and rs2
+ */
+uint64_t HELPER(mul_w00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int64_t mul = (int64_t)s1_w0 * (int64_t)s2_w0;
+    return (uint64_t)mul;
+}
+
+/**
+ * MUL.W01 - 64-bit signed multiply, low word x high word
+ * Returns full 64-bit product of low 32 bits of rs1 and high 32 bits of rs2
+ */
+uint64_t HELPER(mul_w01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t mul = (int64_t)s1_w0 * (int64_t)s2_w1;
+    return (uint64_t)mul;
+}
+
+/**
+ * MUL.W11 - 64-bit signed multiply, high word x high word
+ * Returns full 64-bit product of high 32 bits of rs1 and high 32 bits of rs2
+ */
+uint64_t HELPER(mul_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t mul = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)mul;
+}
+
+/**
+ * MULSU.W00 - 64-bit signed x unsigned multiply,
low word x low word
+ * Returns full 64-bit product of low 32 bits of rs1
+ * (signed) and low 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulsu_w00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    int64_t mul = (int64_t)s1_w0 * (uint64_t)s2_w0;
+    return (uint64_t)mul;
+}
+
+/**
+ * MULSU.W11 - 64-bit signed x unsigned multiply, high word x high word
+ * Returns full 64-bit product of high 32 bits of rs1
+ * (signed) and high 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulsu_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    int64_t mul = (int64_t)s1_w1 * (uint64_t)s2_w1;
+    return (uint64_t)mul;
+}
+
+/**
+ * MULU.W00 - 64-bit unsigned multiply, low word x low word
+ * Returns full 64-bit product of low 32 bits of rs1
+ * and low 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulu_w00)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w0 = EXTRACT32(rs1, 0);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint64_t mul = (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    return mul;
+}
+
+/**
+ * MULU.W01 - 64-bit unsigned multiply, low word x high word
+ * Returns full 64-bit product of low 32 bits of rs1
+ * and high 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulu_w01)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w0 = EXTRACT32(rs1, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    uint64_t mul = (uint64_t)s1_w0 * (uint64_t)s2_w1;
+    return mul;
+}
+
+/**
+ * MULU.W11 - 64-bit unsigned multiply, high word x high word
+ * Returns full 64-bit product of high 32 bits of rs1
+ * and high 32 bits of rs2 (unsigned)
+ */
+uint64_t HELPER(mulu_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w1 = EXTRACT32(rs1, 1);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    uint64_t mul = (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return mul;
+}
-- 
2.34.1

From nobody Sat May 30 20:13:16 2026
From: Molly Chen
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 09/14] target/riscv: rvp: add multiply-accumulate operations
Date: Fri, 17 Apr 2026 18:46:46 +0800
Message-Id: <20260417104652.17857-10-xiaoou@iscas.ac.cn>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
List-Id: qemu development
Errors-To:
qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org
X-ZM-MESSAGEID: 1776422954522158500
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Molly Chen
---
 target/riscv/helper.h                   |  56 ++
 target/riscv/insn32.decode              |  92 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  56 ++
 target/riscv/psimd_helper.c             | 946 ++++++++++++++++++++++++
 4 files changed, 1150 insertions(+)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 4b3f01f8d0..54f8591672 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1605,3 +1605,59 @@ DEF_HELPER_3(mulsu_w11, i64, env, i64, i64)
 DEF_HELPER_3(mulu_w00, i64, env, i64, i64)
 DEF_HELPER_3(mulu_w01, i64, env, i64, i64)
 DEF_HELPER_3(mulu_w11, i64, env, i64, i64)
+
+/* Packed SIMD - Multiply-Accumulate Operations */
+DEF_HELPER_4(pmhacc_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccsu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhracc_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhraccsu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhraccu_h, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhacc_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhracc_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccsu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhraccsu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhraccu_w, i64, env, i64, i64, i64)
+DEF_HELPER_4(mhacc, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhracc, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccsu, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhraccsu, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccu, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhraccu, i32, env, i32, i32, i32)
+DEF_HELPER_4(pmhacc_h_b0, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhacc_h_b1, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccsu_h_b0, tl, env, tl, tl, tl)
+DEF_HELPER_4(pmhaccsu_h_b1, tl, env, tl, tl, tl)
+DEF_HELPER_4(mhacc_h0, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhacc_h1, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccsu_h0, i32, env, i32, i32, i32)
+DEF_HELPER_4(mhaccsu_h1, i32, env, i32, i32, i32)
+DEF_HELPER_4(pmhacc_w_h0, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhacc_w_h1, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccsu_w_h0, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmhaccsu_w_h1, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmacc_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmacc_w_h01, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmacc_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccsu_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccsu_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccu_w_h00, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccu_w_h01, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmaccu_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(macc_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(macc_h01, i32, env, i32, i32, i32)
+DEF_HELPER_4(macc_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccsu_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccsu_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccu_h00, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccu_h01, i32, env, i32, i32, i32)
+DEF_HELPER_4(maccu_h11, i32, env, i32, i32, i32)
+DEF_HELPER_4(macc_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(macc_w01, i64, env, i64, i64, i64)
+DEF_HELPER_4(macc_w11, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccsu_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccsu_w11, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccu_w00, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccu_w01, i64, env, i64, i64, i64)
+DEF_HELPER_4(maccu_w11, i64, env, i64, i64, i64)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index bd3b14af5b..9944d0b52c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -1413,3 +1413,95 @@ mulsu_w11       11110 11 ..... ..... 011 ..... 0111011 @r
 mulu_w00        10100 11 ..... ..... 011 ..... 0111011 @r
 mulu_w01        10110 11 ..... ..... 001 ..... 0111011 @r
 mulu_w11        10110 11 ..... ..... 011 .....
0111011 @r
+
+# Packed SIMD - Multiply-Accumulate Operations
+pmhacc_h        10001 00 ..... ..... 111 ..... 0111011 @r
+pmhaccsu_h      11001 00 ..... ..... 111 ..... 0111011 @r
+pmhaccu_h       10011 00 ..... ..... 111 ..... 0111011 @r
+pmhracc_h       10001 10 ..... ..... 111 ..... 0111011 @r
+pmhraccsu_h     11001 10 ..... ..... 111 ..... 0111011 @r
+pmhraccu_h      10011 10 ..... ..... 111 ..... 0111011 @r
+{
+  mhacc         10001 01 ..... ..... 111 ..... 0111011 @r
+  pmhacc_w      10001 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhracc        10001 11 ..... ..... 111 ..... 0111011 @r
+  pmhracc_w     10001 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccsu       11001 01 ..... ..... 111 ..... 0111011 @r
+  pmhaccsu_w    11001 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhraccsu      11001 11 ..... ..... 111 ..... 0111011 @r
+  pmhraccsu_w   11001 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccu        10011 01 ..... ..... 111 ..... 0111011 @r
+  pmhaccu_w     10011 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhraccu       10011 11 ..... ..... 111 ..... 0111011 @r
+  pmhraccu_w    10011 11 ..... ..... 111 ..... 0111011 @r
+}
+pmhacc_h_b0     10101 00 ..... ..... 111 ..... 0111011 @r
+pmhacc_h_b1     10111 00 ..... ..... 111 ..... 0111011 @r
+pmhaccsu_h_b0   10101 10 ..... ..... 111 ..... 0111011 @r
+pmhaccsu_h_b1   10111 10 ..... ..... 111 ..... 0111011 @r
+{
+  mhacc_h0      10101 01 ..... ..... 111 ..... 0111011 @r
+  pmhacc_w_h0   10101 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhacc_h1      10111 01 ..... ..... 111 ..... 0111011 @r
+  pmhacc_w_h1   10111 01 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccsu_h0    10101 11 ..... ..... 111 ..... 0111011 @r
+  pmhaccsu_w_h0 10101 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  mhaccsu_h1    10111 11 ..... ..... 111 ..... 0111011 @r
+  pmhaccsu_w_h1 10111 11 ..... ..... 111 ..... 0111011 @r
+}
+{
+  macc_h00      10001 01 ..... ..... 011 ..... 0111011 @r
+  pmacc_w_h00   10001 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  macc_h01      10011 01 ..... ..... 001 ..... 0111011 @r
+  pmacc_w_h01   10011 01 ..... ..... 001 .....
0111011 @r
+}
+{
+  macc_h11      10011 01 ..... ..... 011 ..... 0111011 @r
+  pmacc_w_h11   10011 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccsu_h00    11101 01 ..... ..... 011 ..... 0111011 @r
+  pmaccsu_w_h00 11101 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccsu_h11    11111 01 ..... ..... 011 ..... 0111011 @r
+  pmaccsu_w_h11 11111 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccu_h00     10101 01 ..... ..... 011 ..... 0111011 @r
+  pmaccu_w_h00  10101 01 ..... ..... 011 ..... 0111011 @r
+}
+{
+  maccu_h01     10111 01 ..... ..... 001 ..... 0111011 @r
+  pmaccu_w_h01  10111 01 ..... ..... 001 ..... 0111011 @r
+}
+{
+  maccu_h11     10111 01 ..... ..... 011 ..... 0111011 @r
+  pmaccu_w_h11  10111 01 ..... ..... 011 ..... 0111011 @r
+}
+macc_w00        10001 11 ..... ..... 011 ..... 0111011 @r
+macc_w01        10011 11 ..... ..... 001 ..... 0111011 @r
+macc_w11        10011 11 ..... ..... 011 ..... 0111011 @r
+maccsu_w00      11101 11 ..... ..... 011 ..... 0111011 @r
+maccsu_w11      11111 11 ..... ..... 011 ..... 0111011 @r
+maccu_w00       10101 11 ..... ..... 011 ..... 0111011 @r
+maccu_w01       10111 11 ..... ..... 001 ..... 0111011 @r
+maccu_w11       10111 11 ..... ..... 011 .....
0111011 @r
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index b01656ffb0..b3476c26ad 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -774,3 +774,59 @@ GEN_SIMD_TRANS_64(mulsu_w11)
 GEN_SIMD_TRANS_64(mulu_w00)
 GEN_SIMD_TRANS_64(mulu_w01)
 GEN_SIMD_TRANS_64(mulu_w11)
+
+/* Packed SIMD - Multiply-Accumulate Operations */
+GEN_SIMD_TRANS_ACC(pmhacc_h)
+GEN_SIMD_TRANS_ACC(pmhaccsu_h)
+GEN_SIMD_TRANS_ACC(pmhaccu_h)
+GEN_SIMD_TRANS_ACC(pmhracc_h)
+GEN_SIMD_TRANS_ACC(pmhraccsu_h)
+GEN_SIMD_TRANS_ACC(pmhraccu_h)
+GEN_SIMD_TRANS_ACC_64(pmhacc_w)
+GEN_SIMD_TRANS_ACC_64(pmhracc_w)
+GEN_SIMD_TRANS_ACC_64(pmhaccsu_w)
+GEN_SIMD_TRANS_ACC_64(pmhraccsu_w)
+GEN_SIMD_TRANS_ACC_64(pmhaccu_w)
+GEN_SIMD_TRANS_ACC_64(pmhraccu_w)
+GEN_SIMD_TRANS_ACC_32(mhacc)
+GEN_SIMD_TRANS_ACC_32(mhracc)
+GEN_SIMD_TRANS_ACC_32(mhaccsu)
+GEN_SIMD_TRANS_ACC_32(mhraccsu)
+GEN_SIMD_TRANS_ACC_32(mhaccu)
+GEN_SIMD_TRANS_ACC_32(mhraccu)
+GEN_SIMD_TRANS_ACC(pmhacc_h_b0)
+GEN_SIMD_TRANS_ACC(pmhacc_h_b1)
+GEN_SIMD_TRANS_ACC(pmhaccsu_h_b0)
+GEN_SIMD_TRANS_ACC(pmhaccsu_h_b1)
+GEN_SIMD_TRANS_ACC_32(mhacc_h0)
+GEN_SIMD_TRANS_ACC_32(mhacc_h1)
+GEN_SIMD_TRANS_ACC_32(mhaccsu_h0)
+GEN_SIMD_TRANS_ACC_32(mhaccsu_h1)
+GEN_SIMD_TRANS_ACC_64(pmhacc_w_h0)
+GEN_SIMD_TRANS_ACC_64(pmhacc_w_h1)
+GEN_SIMD_TRANS_ACC_64(pmhaccsu_w_h0)
+GEN_SIMD_TRANS_ACC_64(pmhaccsu_w_h1)
+GEN_SIMD_TRANS_ACC_64(pmacc_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmacc_w_h01)
+GEN_SIMD_TRANS_ACC_64(pmacc_w_h11)
+GEN_SIMD_TRANS_ACC_64(pmaccsu_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmaccsu_w_h11)
+GEN_SIMD_TRANS_ACC_64(pmaccu_w_h00)
+GEN_SIMD_TRANS_ACC_64(pmaccu_w_h01)
+GEN_SIMD_TRANS_ACC_64(pmaccu_w_h11)
+GEN_SIMD_TRANS_ACC_32(macc_h00)
+GEN_SIMD_TRANS_ACC_32(macc_h01)
+GEN_SIMD_TRANS_ACC_32(macc_h11)
+GEN_SIMD_TRANS_ACC_32(maccsu_h00)
+GEN_SIMD_TRANS_ACC_32(maccsu_h11)
+GEN_SIMD_TRANS_ACC_32(maccu_h00)
+GEN_SIMD_TRANS_ACC_32(maccu_h01)
+GEN_SIMD_TRANS_ACC_32(maccu_h11)
+GEN_SIMD_TRANS_ACC_64(macc_w00)
+GEN_SIMD_TRANS_ACC_64(macc_w01)
+GEN_SIMD_TRANS_ACC_64(macc_w11)
+GEN_SIMD_TRANS_ACC_64(maccsu_w00)
+GEN_SIMD_TRANS_ACC_64(maccsu_w11)
+GEN_SIMD_TRANS_ACC_64(maccu_w00)
+GEN_SIMD_TRANS_ACC_64(maccu_w01)
+GEN_SIMD_TRANS_ACC_64(maccu_w11)
diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c
index b60fd3094c..7f32a13ba0 100644
--- a/target/riscv/psimd_helper.c
+++ b/target/riscv/psimd_helper.c
@@ -4682,3 +4682,949 @@ uint64_t HELPER(mulu_w11)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
     uint64_t mul = (uint64_t)s1_w1 * (uint64_t)s2_w1;
     return mul;
 }
+
+/* Multiply-Accumulate Operations */
+
+/**
+ * PMHACC.H - Packed signed 16-bit multiply high with accumulate
+ */
+target_ulong HELPER(pmhacc_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t e1 = (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 = (int16_t)EXTRACT16(rs2, i);
+        int16_t d = (int16_t)EXTRACT16(dest, i);
+        int32_t prod = (int32_t)e1 * (int32_t)e2;
+        int16_t high = (int16_t)(prod >> 16);
+        uint16_t res = (uint16_t)(high + d);
+        rd = INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.H - Packed signed x unsigned 16-bit multiply high with accumulate
+ */
+target_ulong HELPER(pmhaccsu_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t e1 = (int16_t)EXTRACT16(rs1, i);
+        uint16_t e2 = EXTRACT16(rs2, i);
+        int16_t d = (int16_t)EXTRACT16(dest, i);
+        int32_t prod = (int32_t)e1 * (uint32_t)e2;
+        int16_t high = (int16_t)(prod >> 16);
+        uint16_t res = (uint16_t)(high + d);
+        rd = INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCU.H - Packed unsigned 16-bit multiply high with accumulate
+ */
+target_ulong HELPER(pmhaccu_h)(CPURISCVState *env,
target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t e1 = EXTRACT16(rs1, i);
+        uint16_t e2 = EXTRACT16(rs2, i);
+        uint16_t d = (uint16_t)EXTRACT16(dest, i);
+        uint32_t prod = (uint32_t)e1 * (uint32_t)e2;
+        uint16_t high = (uint16_t)(prod >> 16);
+        uint16_t res = high + d;
+        rd = INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACC.H - Packed signed 16-bit multiply high with rounding and accumulate
+ */
+target_ulong HELPER(pmhracc_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t e1 = (int16_t)EXTRACT16(rs1, i);
+        int16_t e2 = (int16_t)EXTRACT16(rs2, i);
+        int16_t d = (int16_t)EXTRACT16(dest, i);
+        int32_t prod = (int32_t)e1 * (int32_t)e2 + (1 << 15);
+        int16_t high = (int16_t)(prod >> 16);
+        uint16_t res = (uint16_t)(high + d);
+        rd = INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACCSU.H - Packed signed x unsigned 16-bit
+ * multiply high with rounding and accumulate
+ */
+target_ulong HELPER(pmhraccsu_h)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t e1 = (int16_t)EXTRACT16(rs1, i);
+        uint16_t e2 = EXTRACT16(rs2, i);
+        int16_t d = (int16_t)EXTRACT16(dest, i);
+        int32_t prod = (int32_t)e1 * (uint32_t)e2 + (1 << 15);
+        int16_t high = (int16_t)(prod >> 16);
+        uint16_t res = (uint16_t)(high + d);
+        rd = INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACCU.H - Packed unsigned 16-bit multiply
+ * high with rounding and accumulate
+ */
+target_ulong HELPER(pmhraccu_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_H(rd);
+
+    for (int i = 0;
i < elems; i++) {
+        uint16_t e1 = EXTRACT16(rs1, i);
+        uint16_t e2 = EXTRACT16(rs2, i);
+        uint16_t d = (uint16_t)EXTRACT16(dest, i);
+        uint32_t prod = (uint32_t)e1 * (uint32_t)e2 + (1 << 15);
+        uint16_t high = (uint16_t)(prod >> 16);
+        uint16_t res = high + d;
+        rd = INSERT16(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACC.W - Packed signed 32-bit multiply high with accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhacc_w)(CPURISCVState *env, uint64_t rs1,
+                          uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int32_t e1 = (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 = (int32_t)EXTRACT32(rs2, i);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int64_t prod = (int64_t)e1 * (int64_t)e2;
+        int32_t high = (int32_t)(prod >> 32);
+        uint32_t res = (uint32_t)(high + d);
+        rd = INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHRACC.W - Packed signed 32-bit multiply high
+ * with rounding and accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhracc_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int32_t e1 = (int32_t)EXTRACT32(rs1, i);
+        int32_t e2 = (int32_t)EXTRACT32(rs2, i);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int64_t prod = (int64_t)e1 * (int64_t)e2 + (1LL << 31);
+        int32_t high = (int32_t)(prod >> 32);
+        uint32_t res = (uint32_t)(high + d);
+        rd = INSERT32(rd, res, i);
+    }
+    return rd;
+}
+
+/**
+ * PMHACCSU.W - Packed signed x unsigned 32-bit
+ * multiply high with accumulate (RV64 only)
+ */
+uint64_t HELPER(pmhaccsu_w)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint64_t rd = 0;
+    int elems = 2;
+
+    for (int i = 0; i < elems; i++) {
+        int32_t e1 = (int32_t)EXTRACT32(rs1, i);
+        uint32_t e2 = EXTRACT32(rs2, i);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int64_t prod = (int64_t)e1 * (uint64_t)e2;
+        int32_t high =
(int32_t)(prod >> 32); + uint32_t res =3D (uint32_t)(high + d); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMHRACCSU.W - Packed signed x unsigned 32-bit + * multiply high with rounding and accumulate + * (RV64 only) + */ +uint64_t HELPER(pmhraccsu_w)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2 + (1LL << 31); + int32_t high =3D (int32_t)(prod >> 32); + uint32_t res =3D (uint32_t)(high + d); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMHACCU.W - Packed unsigned 32-bit multiply high with accumulate (RV64 = only) + */ +uint64_t HELPER(pmhaccu_w)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t d =3D EXTRACT32(dest, i); + uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2; + uint32_t high =3D (uint32_t)(prod >> 32); + uint32_t res =3D high + d; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMHRACCU.W - Packed unsigned 32-bit multiply + * high with rounding and accumulate (RV64 only) + */ +uint64_t HELPER(pmhraccu_w)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint32_t e1 =3D EXTRACT32(rs1, i); + uint32_t e2 =3D EXTRACT32(rs2, i); + uint32_t d =3D EXTRACT32(dest, i); + uint64_t prod =3D (uint64_t)e1 * (uint64_t)e2 + (1LL << 31); + uint32_t high =3D (uint32_t)(prod >> 32); + uint32_t res =3D high + d; + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * MHACC - 32-bit signed multiply high with accumulate + */ +uint32_t HELPER(mhacc)(CPURISCVState *env, 
uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (int64_t)b; + return (uint32_t)(d + (prod >> 32)); +} + +/** + * MHRACC - 32-bit signed multiply high with rounding and accumulate + */ +uint32_t HELPER(mhracc)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 31); + return (uint32_t)(d + (prod >> 32)); +} + +/** + * MHACCSU - 32-bit signed x unsigned multiply high with accumulate + */ +uint32_t HELPER(mhaccsu)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + uint32_t b =3D rs2; + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (uint64_t)b; + return (uint32_t)(d + (prod >> 32)); +} + +/** + * MHRACCSU - 32-bit signed x unsigned multiply high + * with rounding and accumulate + */ +uint32_t HELPER(mhraccsu)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + uint32_t b =3D rs2; + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (uint64_t)b + (1LL << 31); + return (uint32_t)(d + (prod >> 32)); +} + +/** + * MHACCU - 32-bit unsigned multiply high with accumulate + */ +uint32_t HELPER(mhaccu)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + uint32_t a =3D rs1; + uint32_t b =3D rs2; + uint32_t d =3D dest; + uint64_t prod =3D (uint64_t)a * (uint64_t)b; + return (uint32_t)(d + (prod >> 32)); +} + +/** + * MHRACCU - 32-bit unsigned multiply high with rounding and accumulate + */ +uint32_t HELPER(mhraccu)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + uint32_t a =3D rs1; + uint32_t b =3D rs2; + uint32_t d =3D dest; + uint64_t prod =3D (uint64_t)a * (uint64_t)b + (1LL << 31); + return (uint32_t)(d + (prod >> 32)); +} + +/** + * 
PMHACC.H.B0 - Multiply halfword by low byte and accumulate (high halfwo= rd) + */ +target_ulong HELPER(pmhacc_h_b0)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong dest) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2); + int16_t d =3D (int16_t)EXTRACT16(dest, i); + int32_t prod =3D (int32_t)e1 * (int32_t)e2; + int16_t high =3D (int16_t)(prod >> 8); + uint16_t res =3D (uint16_t)(high + d); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMHACC.H.B1 - Multiply halfword by high byte and accumulate (high halfw= ord) + */ +target_ulong HELPER(pmhacc_h_b1)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong dest) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int8_t e2 =3D (int8_t)EXTRACT8(rs2, i * 2 + 1); + int16_t d =3D (int16_t)EXTRACT16(dest, i); + int32_t prod =3D (int32_t)e1 * (int32_t)e2; + int16_t high =3D (int16_t)(prod >> 8); + uint16_t res =3D (uint16_t)(high + d); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMHACCSU.H.B0 - Multiply signed halfword by unsigned low byte and accum= ulate + */ +target_ulong HELPER(pmhaccsu_h_b0)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong dest) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i * 2); + int16_t d =3D (int16_t)EXTRACT16(dest, i); + int32_t prod =3D (int32_t)e1 * (uint32_t)e2; + int16_t high =3D (int16_t)(prod >> 8); + uint16_t res =3D (uint16_t)(high + d); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * PMHACCSU.H.B1 - Multiply signed halfword by unsigned high byte and accu= mulate + */ +target_ulong HELPER(pmhaccsu_h_b1)(CPURISCVState *env, 
target_ulong rs1, + target_ulong rs2, target_ulong dest) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + uint8_t e2 =3D EXTRACT8(rs2, i * 2 + 1); + int16_t d =3D (int16_t)EXTRACT16(dest, i); + int32_t prod =3D (int32_t)e1 * (uint32_t)e2; + int16_t high =3D (int16_t)(prod >> 8); + uint16_t res =3D (uint16_t)(high + d); + rd =3D INSERT16(rd, res, i); + } + return rd; +} + +/** + * MHACC.H0 - 32-bit multiply by low halfword high accumulate + */ +uint32_t HELPER(mhacc_h0)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + int16_t b =3D (int16_t)(rs2 & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (int64_t)b; + return (uint32_t)(d + (prod >> 16)); +} + +/** + * MHACC.H1 - 32-bit multiply by high halfword high accumulate + */ +uint32_t HELPER(mhacc_h1)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + int16_t b =3D (int16_t)((rs2 >> 16) & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (int64_t)b; + return (uint32_t)(d + (prod >> 16)); +} + +/** + * MHACCSU.H0 - 32-bit signed multiply by unsigned low halfword high accum= ulate + */ +uint32_t HELPER(mhaccsu_h0)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + uint16_t b =3D (uint16_t)(rs2 & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (uint64_t)b; + return (uint32_t)(d + (prod >> 16)); +} + +/** + * MHACCSU.H1 - 32-bit signed multiply by unsigned high halfword high accu= mulate + */ +uint32_t HELPER(mhaccsu_h1)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int32_t a =3D (int32_t)rs1; + uint16_t b =3D (uint16_t)((rs2 >> 16) & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)a * (uint64_t)b; + return (uint32_t)(d + (prod >> 16)); +} + +/** + * PMHACC.W.H0 - 
Multiply word by low halfword high accumulate (RV64 only) + */ +uint64_t HELPER(pmhacc_w_h0)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)e1 * (int64_t)e2; + int32_t high =3D (int32_t)(prod >> 16); + uint32_t res =3D (uint32_t)(high + d); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMHACC.W.H1 - Multiply word by high halfword high accumulate (RV64 only) + */ +uint64_t HELPER(pmhacc_w_h1)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)e1 * (int64_t)e2; + int32_t high =3D (int32_t)(prod >> 16); + uint32_t res =3D (uint32_t)(high + d); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMHACCSU.W.H0 - Multiply signed word by unsigned low halfword + * high accumulate (RV64 only) + */ +uint64_t HELPER(pmhaccsu_w_h0)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + uint16_t e2 =3D EXTRACT16(rs2, i * 2); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2; + int32_t high =3D (int32_t)(prod >> 16); + uint32_t res =3D (uint32_t)(high + d); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMHACCSU.W.H1 - Multiply signed word by unsigned high halfword + * high accumulate (RV64 only) + */ +uint64_t HELPER(pmhaccsu_w_h1)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + 
uint16_t e2 =3D EXTRACT16(rs2, i * 2 + 1); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)e1 * (uint64_t)e2; + int32_t high =3D (int32_t)(prod >> 16); + uint32_t res =3D (uint32_t)(high + d); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMACC.W.H00 - Packed multiply-accumulate, low halfwords + * For each word: rd[i] =3D dest[i] + (rs1[i][15:0] * rs2[i][15:0]) + */ +uint64_t HELPER(pmacc_w_h00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int32_t d_h =3D (int32_t)EXTRACT32(dest, i); + int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0; + rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i); + } + return rd; +} + +/** + * PMACC.W.H01 - Packed multiply-accumulate, rs1 low x rs2 high + * For each word: rd[i] =3D dest[i] + (rs1[i][15:0] * rs2[i][31:16]) + */ +uint64_t HELPER(pmacc_w_h01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d_h =3D (int32_t)EXTRACT32(dest, i); + int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h1; + rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i); + } + return rd; +} + +/** + * PMACC.W.H11 - Packed multiply-accumulate, high halfwords + * For each word: rd[i] =3D dest[i] + (rs1[i][31:16] * rs2[i][31:16]) + */ +uint64_t HELPER(pmacc_w_h11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d_h =3D (int32_t)EXTRACT32(dest, i); + int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1; + rd =3D 
INSERT32(rd, (uint32_t)(d_h + mul), i); + } + return rd; +} + +/** + * PMACCSU.W.H00 - Packed signed x unsigned multiply-accumulate, low halfw= ords + * For each word: rd[i] =3D dest[i] + + * (signed)rs1[i][15:0] * (unsigned)rs2[i][15:0] + */ +uint64_t HELPER(pmaccsu_w_h00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2); + int32_t d_h =3D (int32_t)EXTRACT32(dest, i); + int32_t mul =3D (int32_t)s1_h0 * (uint32_t)s2_h0; + rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i); + } + return rd; +} + +/** + * PMACCSU.W.H11 - Packed signed x unsigned multiply-accumulate, high half= words + * For each word: rd[i] =3D dest[i] + + * (signed)rs1[i][31:16] * (unsigned)rs2[i][31:16] + */ +uint64_t HELPER(pmaccsu_w_h11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1); + int32_t d_h =3D (int32_t)EXTRACT32(dest, i); + int32_t mul =3D (int32_t)s1_h1 * (uint32_t)s2_h1; + rd =3D INSERT32(rd, (uint32_t)(d_h + mul), i); + } + return rd; +} + +/** + * PMACCU.W.H00 - Packed unsigned multiply-accumulate, low halfwords + * For each word: rd[i] =3D dest[i] + rs1[i][15:0] * rs2[i][15:0] (unsigne= d) + */ +uint64_t HELPER(pmaccu_w_h00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h0 =3D EXTRACT16(rs1, i * 2); + uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2); + uint32_t d_h =3D EXTRACT32(dest, i); + uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h0; + rd =3D INSERT32(rd, d_h + mul, i); + } + return rd; +} + +/** + * PMACCU.W.H01 - Packed unsigned multiply-accumulate, rs1 low x rs2 high + * For each 
word: rd[i] =3D dest[i] + rs1[i][15:0] * rs2[i][31:16] (unsign= ed) + */ +uint64_t HELPER(pmaccu_w_h01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h0 =3D EXTRACT16(rs1, i * 2); + uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1); + uint32_t d_h =3D EXTRACT32(dest, i); + uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h1; + rd =3D INSERT32(rd, d_h + mul, i); + } + return rd; +} + +/** + * PMACCU.W.H11 - Packed unsigned multiply-accumulate, high halfwords + * For each word: rd[i] =3D dest[i] + rs1[i][31:16] * rs2[i][31:16] (unsig= ned) + */ +uint64_t HELPER(pmaccu_w_h11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + int elems =3D 2; + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h1 =3D EXTRACT16(rs1, i * 2 + 1); + uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1); + uint32_t d_h =3D EXTRACT32(dest, i); + uint32_t mul =3D (uint32_t)s1_h1 * (uint32_t)s2_h1; + rd =3D INSERT32(rd, d_h + mul, i); + } + return rd; +} + +/** + * MACC.H00 - 32-bit signed multiply-accumulate, low halfwords + * dest =3D dest + (rs1[15:0] * rs2[15:0]) + */ +uint32_t HELPER(macc_h00)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int32_t d_h =3D (int32_t)dest; + int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h0; + return (uint32_t)(d_h + mul); +} + +/** + * MACC.H01 - 32-bit signed multiply-accumulate, rs1 low x rs2 high + * dest =3D dest + (rs1[15:0] * rs2[31:16]) + */ +uint32_t HELPER(macc_h01)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int32_t d_h =3D (int32_t)dest; + int32_t mul =3D (int32_t)s1_h0 * (int32_t)s2_h1; + return (uint32_t)(d_h + mul); +} + +/** + * MACC.H11 - 32-bit signed 
multiply-accumulate, high halfwords + * dest =3D dest + (rs1[31:16] * rs2[31:16]) + */ +uint32_t HELPER(macc_h11)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int32_t d_h =3D (int32_t)dest; + int32_t mul =3D (int32_t)s1_h1 * (int32_t)s2_h1; + return (uint32_t)(d_h + mul); +} + +/** + * MACCSU.H00 - 32-bit signed x unsigned multiply-accumulate, low halfwords + * dest =3D dest + (rs1[15:0] * rs2[15:0]) with rs2 unsigned + */ +uint32_t HELPER(maccsu_h00)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + uint16_t s2_h0 =3D EXTRACT16(rs2, 0); + int32_t d_h =3D (int32_t)dest; + int32_t mul =3D (int32_t)s1_h0 * (uint32_t)s2_h0; + return (uint32_t)(d_h + mul); +} + +/** + * MACCSU.H11 - 32-bit signed x unsigned multiply-accumulate, high halfwor= ds + * dest =3D dest + (rs1[31:16] * rs2[31:16]) with rs2 unsigned + */ +uint32_t HELPER(maccsu_h11)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + uint16_t s2_h1 =3D EXTRACT16(rs2, 1); + int32_t d_h =3D (int32_t)dest; + int32_t mul =3D (int32_t)s1_h1 * (uint32_t)s2_h1; + return (uint32_t)(d_h + mul); +} + +/** + * MACCU.H00 - 32-bit unsigned multiply-accumulate, low halfwords + * dest =3D dest + (rs1[15:0] * rs2[15:0]) (unsigned) + */ +uint32_t HELPER(maccu_h00)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + uint16_t s1_h0 =3D EXTRACT16(rs1, 0); + uint16_t s2_h0 =3D EXTRACT16(rs2, 0); + uint32_t d_h =3D dest; + uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h0; + return d_h + mul; +} + +/** + * MACCU.H01 - 32-bit unsigned multiply-accumulate, rs1 low x rs2 high + * dest =3D dest + (rs1[15:0] * rs2[31:16]) (unsigned) + */ +uint32_t HELPER(maccu_h01)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + uint16_t s1_h0 =3D EXTRACT16(rs1, 
0); + uint16_t s2_h1 =3D EXTRACT16(rs2, 1); + uint32_t d_h =3D dest; + uint32_t mul =3D (uint32_t)s1_h0 * (uint32_t)s2_h1; + return d_h + mul; +} + +/** + * MACCU.H11 - 32-bit unsigned multiply-accumulate, high halfwords + * dest =3D dest + (rs1[31:16] * rs2[31:16]) (unsigned) + */ +uint32_t HELPER(maccu_h11)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + uint16_t s1_h1 =3D EXTRACT16(rs1, 1); + uint16_t s2_h1 =3D EXTRACT16(rs2, 1); + uint32_t d_h =3D dest; + uint32_t mul =3D (uint32_t)s1_h1 * (uint32_t)s2_h1; + return d_h + mul; +} + +/** + * MACC.W00 - 64-bit signed multiply-accumulate, low word x low word + * dest =3D dest + (rs1[31:0] * rs2[31:0]) + */ +uint64_t HELPER(macc_w00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0); + int64_t d_w =3D (int64_t)dest; + int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w0; + return (uint64_t)(d_w + mul); +} + +/** + * MACC.W01 - 64-bit signed multiply-accumulate, low word x high word + * dest =3D dest + (rs1[31:0] * rs2[63:32]) + */ +uint64_t HELPER(macc_w01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1); + int64_t d_w =3D (int64_t)dest; + int64_t mul =3D (int64_t)s1_w0 * (int64_t)s2_w1; + return (uint64_t)(d_w + mul); +} + +/** + * MACC.W11 - 64-bit signed multiply-accumulate, high word x high word + * dest =3D dest + (rs1[63:32] * rs2[63:32]) + */ +uint64_t HELPER(macc_w11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1); + int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1); + int64_t d_w =3D (int64_t)dest; + int64_t mul =3D (int64_t)s1_w1 * (int64_t)s2_w1; + return (uint64_t)(d_w + mul); +} + +/** + * MACCSU.W00 - 64-bit signed x unsigned + * multiply-accumulate, low word x low word + * dest =3D dest + (rs1[31:0] 
* rs2[31:0]) with rs2 interpreted as unsigned + */ +uint64_t HELPER(maccsu_w00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + uint32_t s2_w0 =3D EXTRACT32(rs2, 0); + int64_t d_w =3D (int64_t)dest; + int64_t mul =3D (int64_t)s1_w0 * (uint64_t)s2_w0; + return (uint64_t)(d_w + mul); +} + +/** + * MACCSU.W11 - 64-bit signed x unsigned + * multiply-accumulate, high word x high word + * dest =3D dest + (rs1[63:32] * rs2[63:32]) with rs2 interpreted as unsig= ned + */ +uint64_t HELPER(maccsu_w11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1); + uint32_t s2_w1 =3D EXTRACT32(rs2, 1); + int64_t d_w =3D (int64_t)dest; + int64_t mul =3D (int64_t)s1_w1 * (uint64_t)s2_w1; + return (uint64_t)(d_w + mul); +} + +/** + * MACCU.W00 - 64-bit unsigned multiply-accumulate, low word x low word + * dest =3D dest + (rs1[31:0] * rs2[31:0]) (unsigned) + */ +uint64_t HELPER(maccu_w00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint32_t s1_w0 =3D EXTRACT32(rs1, 0); + uint32_t s2_w0 =3D EXTRACT32(rs2, 0); + uint64_t d_w =3D dest; + uint64_t mul =3D (uint64_t)s1_w0 * (uint64_t)s2_w0; + return d_w + mul; +} + +/** + * MACCU.W01 - 64-bit unsigned multiply-accumulate, low word x high word + * dest =3D dest + (rs1[31:0] * rs2[63:32]) (unsigned) + */ +uint64_t HELPER(maccu_w01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint32_t s1_w0 =3D EXTRACT32(rs1, 0); + uint32_t s2_w1 =3D EXTRACT32(rs2, 1); + uint64_t d_w =3D dest; + uint64_t mul =3D (uint64_t)s1_w0 * (uint64_t)s2_w1; + return d_w + mul; +} + +/** + * MACCU.W11 - 64-bit unsigned multiply-accumulate, high word x high word + * dest =3D dest + (rs1[63:32] * rs2[63:32]) (unsigned) + */ +uint64_t HELPER(maccu_w11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint32_t s1_w1 =3D EXTRACT32(rs1, 1); + uint32_t s2_w1 =3D EXTRACT32(rs2, 
1); + uint64_t d_w =3D dest; + uint64_t mul =3D (uint64_t)s1_w1 * (uint64_t)s2_w1; + return d_w + mul; +} --=20 2.34.1 From nobody Sat May 30 20:13:16 2026 From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 10/14] target/riscv: rvp: add Q-format multiplication operations Date: Fri, 17 Apr 2026 18:46:47 +0800 Message-Id: <20260417104652.17857-11-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 28 ++ target/riscv/insn32.decode | 43 +++ target/riscv/insn_trans/trans_rvp.c.inc | 28 ++ target/riscv/psimd_helper.c | 446 ++++++++++++++++++++++++ 4 files changed, 545 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index 54f8591672..a5ecf9b7d7 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1661,3 +1661,31 @@ DEF_HELPER_4(maccsu_w11, i64, env, i64, i64, i64) DEF_HELPER_4(maccu_w00, i64, env, i64, i64, i64) DEF_HELPER_4(maccu_w01, i64, env, i64, i64, i64) DEF_HELPER_4(maccu_w11, i64, env, i64, i64, i64) + +/* Packed SIMD - Q-Format Multiplication Operations */ +DEF_HELPER_3(pmulq_h, tl, env, tl, tl) +DEF_HELPER_3(pmulqr_h, tl, env, tl, tl) +DEF_HELPER_3(pmulq_w, i64, env, i64, i64) +DEF_HELPER_3(pmulqr_w, i64, env, i64, i64) +DEF_HELPER_3(mulq, i32, env, i32, i32) +DEF_HELPER_3(mulqr, i32, env, i32, i32) + +/* Packed SIMD - Q-Format Multiply-Accumulate Operations */ +DEF_HELPER_4(mqacc_h00, i32, env, i32, i32, i32) +DEF_HELPER_4(mqacc_h01, i32, env, i32, i32, i32) +DEF_HELPER_4(mqacc_h11, i32, env, i32, i32, i32) +DEF_HELPER_4(mqracc_h00, i32, env, i32, i32, i32) +DEF_HELPER_4(mqracc_h01, i32, env, i32, i32, i32) +DEF_HELPER_4(mqracc_h11, i32, env, i32, i32, i32) +DEF_HELPER_4(mqacc_w00, i64, env, i64, i64, i64) +DEF_HELPER_4(mqacc_w01, i64, env, i64, i64, i64) +DEF_HELPER_4(mqacc_w11, i64, env, i64, i64, i64) +DEF_HELPER_4(mqracc_w00, i64, env, i64, i64, i64) +DEF_HELPER_4(mqracc_w01, i64, env, i64, i64, i64) +DEF_HELPER_4(mqracc_w11, i64, env, i64, i64, i64) +DEF_HELPER_4(pmqacc_w_h00, i64, env, i64, i64, i64) +DEF_HELPER_4(pmqacc_w_h01, i64, env, i64, i64, i64) +DEF_HELPER_4(pmqacc_w_h11, i64, env, i64, i64, i64)
+DEF_HELPER_4(pmqracc_w_h00, i64, env, i64, i64, i64) +DEF_HELPER_4(pmqracc_w_h01, i64, env, i64, i64, i64) +DEF_HELPER_4(pmqracc_w_h11, i64, env, i64, i64, i64) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 9944d0b52c..b2a89e3a1f 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1505,3 +1505,46 @@ maccsu_w11 11111 11 ..... ..... 011 ..... 01110= 11 @r maccu_w00 10101 11 ..... ..... 011 ..... 0111011 @r maccu_w01 10111 11 ..... ..... 001 ..... 0111011 @r maccu_w11 10111 11 ..... ..... 011 ..... 0111011 @r + +# Packed SIMD - Q-Format Multiplication Operations +pmulq_h 11010 00 ..... ..... 111 ..... 0111011 @r +pmulqr_h 11010 10 ..... ..... 111 ..... 0111011 @r +{ + mulq 11010 01 ..... ..... 111 ..... 0111011 @r + pmulq_w 11010 01 ..... ..... 111 ..... 0111011 @r +} +{ + mulqr 11010 11 ..... ..... 111 ..... 0111011 @r + pmulqr_w 11010 11 ..... ..... 111 ..... 0111011 @r +} +# Packed SIMD - Q-Format Multiply-Accumulate Operations +{ + mqacc_h00 11101 00 ..... ..... 111 ..... 0111011 @r + pmqacc_w_h00 11101 00 ..... ..... 111 ..... 0111011 @r +} +{ + mqacc_h01 11111 00 ..... ..... 101 ..... 0111011 @r + pmqacc_w_h01 11111 00 ..... ..... 101 ..... 0111011 @r +} +{ + mqacc_h11 11111 00 ..... ..... 111 ..... 0111011 @r + pmqacc_w_h11 11111 00 ..... ..... 111 ..... 0111011 @r +} +{ + mqracc_h00 11101 10 ..... ..... 111 ..... 0111011 @r + pmqracc_w_h00 11101 10 ..... ..... 111 ..... 0111011 @r +} +{ + mqracc_h01 11111 10 ..... ..... 101 ..... 0111011 @r + pmqracc_w_h01 11111 10 ..... ..... 101 ..... 0111011 @r +} +{ + mqracc_h11 11111 10 ..... ..... 111 ..... 0111011 @r + pmqracc_w_h11 11111 10 ..... ..... 111 ..... 0111011 @r +} +mqacc_w00 11101 01 ..... ..... 111 ..... 0111011 @r +mqacc_w01 11111 01 ..... ..... 101 ..... 0111011 @r +mqacc_w11 11111 01 ..... ..... 111 ..... 0111011 @r +mqracc_w00 11101 11 ..... ..... 111 ..... 0111011 @r +mqracc_w01 11111 11 ..... ..... 101 ..... 0111011 @r +mqracc_w11 11111 11 ..... 
..... 111 ..... 0111011 @r diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index b3476c26ad..3310e23dce 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -830,3 +830,31 @@ GEN_SIMD_TRANS_ACC_64(maccsu_w11) GEN_SIMD_TRANS_ACC_64(maccu_w00) GEN_SIMD_TRANS_ACC_64(maccu_w01) GEN_SIMD_TRANS_ACC_64(maccu_w11) + +/* Packed SIMD - Q-Format Multiplication Operations */ +GEN_SIMD_TRANS(pmulq_h) +GEN_SIMD_TRANS(pmulqr_h) +GEN_SIMD_TRANS_64(pmulq_w) +GEN_SIMD_TRANS_64(pmulqr_w) +GEN_SIMD_TRANS_32(mulq) +GEN_SIMD_TRANS_32(mulqr) + +/* Packed SIMD - Q-Format Multiply-Accumulate Operations */ +GEN_SIMD_TRANS_ACC_32(mqacc_h00) +GEN_SIMD_TRANS_ACC_32(mqacc_h01) +GEN_SIMD_TRANS_ACC_32(mqacc_h11) +GEN_SIMD_TRANS_ACC_32(mqracc_h00) +GEN_SIMD_TRANS_ACC_32(mqracc_h01) +GEN_SIMD_TRANS_ACC_32(mqracc_h11) +GEN_SIMD_TRANS_ACC_64(mqacc_w00) +GEN_SIMD_TRANS_ACC_64(mqacc_w01) +GEN_SIMD_TRANS_ACC_64(mqacc_w11) +GEN_SIMD_TRANS_ACC_64(mqracc_w00) +GEN_SIMD_TRANS_ACC_64(mqracc_w01) +GEN_SIMD_TRANS_ACC_64(mqracc_w11) +GEN_SIMD_TRANS_ACC_64(pmqacc_w_h00) +GEN_SIMD_TRANS_ACC_64(pmqacc_w_h01) +GEN_SIMD_TRANS_ACC_64(pmqacc_w_h11) +GEN_SIMD_TRANS_ACC_64(pmqracc_w_h00) +GEN_SIMD_TRANS_ACC_64(pmqracc_w_h01) +GEN_SIMD_TRANS_ACC_64(pmqracc_w_h11) diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index bddd24c997..d69a2f6453 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -5628,3 +5628,449 @@ uint64_t HELPER(maccu_w11)(CPURISCVState *env, uint= 64_t rs1, uint64_t mul =3D (uint64_t)s1_w1 * (uint64_t)s2_w1; return d_w + mul; } + +/* Q-Format Multiplication Operations */ + +/** + * PMULQ.H - Packed signed Q-format multiply (fractional) + */ +target_ulong HELPER(pmulq_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D 
(int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + uint16_t result; + + if ((e1 =3D=3D -32768) && (e2 =3D=3D -32768)) { + sat =3D 1; + result =3D 0x7FFF; + } else { + int32_t prod =3D (int32_t)e1 * (int32_t)e2; + result =3D (prod >> 15) & 0xFFFF; + } + rd =3D INSERT16(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PMULQR.H - Packed signed Q-format multiply with rounding + */ +target_ulong HELPER(pmulqr_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int16_t e1 =3D (int16_t)EXTRACT16(rs1, i); + int16_t e2 =3D (int16_t)EXTRACT16(rs2, i); + uint16_t result; + + if ((e1 =3D=3D -32768) && (e2 =3D=3D -32768)) { + sat =3D 1; + result =3D 0x7FFF; + } else { + int32_t prod =3D (int32_t)e1 * (int32_t)e2 + (1 << 14); + result =3D (prod >> 15) & 0xFFFF; + } + rd =3D INSERT16(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PMULQ.W - Packed signed 32-bit Q-format multiply (RV64 only) + */ +uint64_t HELPER(pmulq_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D (int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + uint32_t result; + + if ((e1 =3D=3D -2147483647 - 1) && (e2 =3D=3D -2147483647 - 1)) { + sat =3D 1; + result =3D 0x7FFFFFFF; + } else { + int64_t prod =3D (int64_t)e1 * (int64_t)e2; + result =3D (uint32_t)(prod >> 31); + } + rd =3D INSERT32(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PMULQR.W - Packed signed 32-bit Q-format multiply with rounding (RV64 o= nly) + */ +uint64_t HELPER(pmulqr_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int elems =3D 2; + int sat =3D 0; + + for (int i =3D 0; i < elems; i++) { + int32_t e1 =3D 
(int32_t)EXTRACT32(rs1, i); + int32_t e2 =3D (int32_t)EXTRACT32(rs2, i); + uint32_t result; + + if ((e1 =3D=3D -2147483647 - 1) && (e2 =3D=3D -2147483647 - 1)) { + sat =3D 1; + result =3D 0x7FFFFFFF; + } else { + int64_t prod =3D (int64_t)e1 * (int64_t)e2 + (1LL << 30); + result =3D (uint32_t)(prod >> 31); + } + rd =3D INSERT32(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * MULQ - 32-bit signed Q-format multiply + */ +uint32_t HELPER(mulq)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + + if ((a =3D=3D -2147483647 - 1) && (b =3D=3D -2147483647 - 1)) { + env->vxsat =3D 1; + return 0x7FFFFFFF; + } else { + int64_t prod =3D (int64_t)a * (int64_t)b; + return (uint32_t)(prod >> 31); + } +} + +/** + * MULQR - 32-bit signed Q-format multiply with rounding + */ +uint32_t HELPER(mulqr)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int32_t a =3D (int32_t)rs1; + int32_t b =3D (int32_t)rs2; + + if ((a =3D=3D -2147483647 - 1) && (b =3D=3D -2147483647 - 1)) { + env->vxsat =3D 1; + return 0x7FFFFFFF; + } else { + int64_t prod =3D (int64_t)a * (int64_t)b + (1LL << 30); + return (uint32_t)(prod >> 31); + } +} + + +/* Q-Format Multiply-Accumulate Operations */ + +/** + * MQACC.H00 - Q-format multiply accumulate, both operands low halfword + */ +uint32_t HELPER(mqacc_h00)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF); + int16_t s2_h0 =3D (int16_t)(rs2 & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0; + return (uint32_t)(d + (int32_t)(prod >> 15)); +} + +/** + * MQACC.H01 - Q-format multiply accumulate, rs1 low, rs2 high + */ +uint32_t HELPER(mqacc_h01)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF); + int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D 
(int64_t)s1_h0 * (int64_t)s2_h1; + return (uint32_t)(d + (int32_t)(prod >> 15)); +} + +/** + * MQACC.H11 - Q-format multiply accumulate, both operands high halfword + */ +uint32_t HELPER(mqacc_h11)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h1 =3D (int16_t)((rs1 >> 16) & 0xFFFF); + int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1; + return (uint32_t)(d + (int32_t)(prod >> 15)); +} + +/** + * MQRACC.H00 - Q-format multiply accumulate with rounding, both low halfw= ord + */ +uint32_t HELPER(mqracc_h00)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF); + int16_t s2_h0 =3D (int16_t)(rs2 & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0 + (1LL << 14); + return (uint32_t)(d + (int32_t)(prod >> 15)); +} + +/** + * MQRACC.H01 - Q-format multiply accumulate with rounding, rs1 low, rs2 h= igh + */ +uint32_t HELPER(mqracc_h01)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h0 =3D (int16_t)(rs1 & 0xFFFF); + int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1 + (1LL << 14); + return (uint32_t)(d + (int32_t)(prod >> 15)); +} + +/** + * MQRACC.H11 - Q-format multiply accumulate with rounding, both high half= word + */ +uint32_t HELPER(mqracc_h11)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint32_t dest) +{ + int16_t s1_h1 =3D (int16_t)((rs1 >> 16) & 0xFFFF); + int16_t s2_h1 =3D (int16_t)((rs2 >> 16) & 0xFFFF); + int32_t d =3D (int32_t)dest; + int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1 + (1LL << 14); + return (uint32_t)(d + (int32_t)(prod >> 15)); +} + +/** + * MQACC.W00 - Q-format multiply accumulate, both low word (RV64) + */ +uint64_t HELPER(mqacc_w00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t 
s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF); + int32_t s2_w0 =3D (int32_t)(rs2 & 0xFFFFFFFF); + int64_t d =3D (int64_t)dest; + int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w0; + __int128_t prod_95 =3D ((__int128_t)prod) >> 31; + return (uint64_t)(d + (int64_t)prod_95); +} + +/** + * MQACC.W01 - Q-format multiply accumulate, rs1 low, rs2 high (RV64) + */ +uint64_t HELPER(mqacc_w01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF); + int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF); + int64_t d =3D (int64_t)dest; + int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w1; + __int128_t prod_95 =3D ((__int128_t)prod) >> 31; + return (uint64_t)(d + (int64_t)prod_95); +} + +/** + * MQACC.W11 - Q-format multiply accumulate, both high word (RV64) + */ +uint64_t HELPER(mqacc_w11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w1 =3D (int32_t)((rs1 >> 32) & 0xFFFFFFFF); + int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF); + int64_t d =3D (int64_t)dest; + int64_t prod =3D (int64_t)s1_w1 * (int64_t)s2_w1; + __int128_t prod_95 =3D ((__int128_t)prod) >> 31; + return (uint64_t)(d + (int64_t)prod_95); +} + +/** + * MQRACC.W00 - Q-format multiply accumulate with rounding, + * both low word (RV64) + */ +uint64_t HELPER(mqracc_w00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF); + int32_t s2_w0 =3D (int32_t)(rs2 & 0xFFFFFFFF); + int64_t d =3D (int64_t)dest; + int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w0 + (1LL << 30); + __int128_t prod_95 =3D ((__int128_t)prod) >> 31; + return (uint64_t)(d + (int64_t)prod_95); +} + +/** + * MQRACC.W01 - Q-format multiply accumulate with rounding, + * rs1 low, rs2 high (RV64) + */ +uint64_t HELPER(mqracc_w01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)(rs1 & 0xFFFFFFFF); + int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF); + 
int64_t d =3D (int64_t)dest; + int64_t prod =3D (int64_t)s1_w0 * (int64_t)s2_w1 + (1LL << 30); + __int128_t prod_95 =3D ((__int128_t)prod) >> 31; + return (uint64_t)(d + (int64_t)prod_95); +} + +/** + * MQRACC.W11 - Q-format multiply accumulate with rounding, + * both high word (RV64) + */ +uint64_t HELPER(mqracc_w11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w1 =3D (int32_t)((rs1 >> 32) & 0xFFFFFFFF); + int32_t s2_w1 =3D (int32_t)((rs2 >> 32) & 0xFFFFFFFF); + int64_t d =3D (int64_t)dest; + int64_t prod =3D (int64_t)s1_w1 * (int64_t)s2_w1 + (1LL << 30); + __int128_t prod_95 =3D ((__int128_t)prod) >> 31; + return (uint64_t)(d + (int64_t)prod_95); +} + +/** + * PMQACC.W.H00 - Packed Q-format multiply accumulate, + * low halfword (RV64) + */ +uint64_t HELPER(pmqacc_w_h00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0; + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMQACC.W.H01 - Packed Q-format multiply accumulate, + * rs1 low, rs2 high (RV64) + */ +uint64_t HELPER(pmqacc_w_h01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1; + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMQACC.W.H11 - Packed Q-format multiply accumulate, + * both high halfword (RV64) + */ +uint64_t HELPER(pmqacc_w_h11)(CPURISCVState *env, uint64_t rs1, + 
uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1; + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMQRACC.W.H00 - Packed Q-format multiply accumulate + * with rounding, low halfword (RV64) + */ +uint64_t HELPER(pmqracc_w_h00)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h0 + (1LL << 14); + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMQRACC.W.H01 - Packed Q-format multiply accumulate + * with rounding, rs1 low, rs2 high (RV64) + */ +uint64_t HELPER(pmqracc_w_h01)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h0 * (int64_t)s2_h1 + (1LL << 14); + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMQRACC.W.H11 - Packed Q-format multiply accumulate + * with rounding, both high halfword (RV64) + */ +uint64_t HELPER(pmqracc_w_h11)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d_w 
=3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h1 * (int64_t)s2_h1 + (1LL << 14); + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} --=20 2.34.1 From nobody Sat May 30 20:13:16 2026 From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 11/14] target/riscv: rvp: add two-way and four-way multiply and accumulate operations Date: Fri, 17 Apr 2026 18:46:48 +0800 Message-Id: <20260417104652.17857-12-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References:
<20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 48 ++ target/riscv/insn32.decode | 48 ++ target/riscv/insn_trans/trans_rvp.c.inc | 48 ++ target/riscv/psimd_helper.c | 938 ++++++++++++++++++++++++ 4 files changed, 1082 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index a5ecf9b7d7..663ac0e242 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1689,3 +1689,51 @@ DEF_HELPER_4(pmqacc_w_h11, i64, env, i64, i64, i64) DEF_HELPER_4(pmqracc_w_h00, i64, env, i64, i64, i64) DEF_HELPER_4(pmqracc_w_h01, i64, env, i64, i64, i64) DEF_HELPER_4(pmqracc_w_h11, i64, env, i64, i64, i64) + +/* Packed SIMD - Two-Way Multiply and Accumulate Operations */ +DEF_HELPER_3(pmq2add_h, tl, env, tl, tl) +DEF_HELPER_3(pmqr2add_h, tl, env, tl, tl) +DEF_HELPER_4(pmq2adda_h, tl, env, tl, tl, tl) +DEF_HELPER_4(pmqr2adda_h, tl, env, tl, tl, tl) +DEF_HELPER_3(pmq2add_w, i64, env, i64, i64) +DEF_HELPER_3(pmqr2add_w, i64, env, i64, i64) +DEF_HELPER_4(pmq2adda_w, i64, env, i64, i64, i64) +DEF_HELPER_4(pmqr2adda_w, i64, env, i64, i64, i64) +DEF_HELPER_3(pm2add_h, tl, env, tl, tl) +DEF_HELPER_3(pm2addsu_h, tl, env, tl, tl) +DEF_HELPER_3(pm2addu_h, tl, env, tl, tl) +DEF_HELPER_3(pm2add_hx, tl, env, tl, tl) +DEF_HELPER_3(pm2sub_h, tl, env, tl, tl) +DEF_HELPER_3(pm2sub_hx, tl, env, tl, tl) +DEF_HELPER_4(pm2adda_h, tl, env, tl, tl, tl) +DEF_HELPER_4(pm2addasu_h, tl, env, tl, tl, tl) +DEF_HELPER_4(pm2addau_h, tl, env, tl, tl, tl) +DEF_HELPER_4(pm2adda_hx, tl, env, tl, tl, tl) +DEF_HELPER_4(pm2suba_h, tl, env, tl, tl, tl) +DEF_HELPER_4(pm2suba_hx, tl, env, tl, tl, tl)
+DEF_HELPER_3(pm2add_w, i64, env, i64, i64) +DEF_HELPER_3(pm2addsu_w, i64, env, i64, i64) +DEF_HELPER_3(pm2addu_w, i64, env, i64, i64) +DEF_HELPER_3(pm2add_wx, i64, env, i64, i64) +DEF_HELPER_3(pm2sub_w, i64, env, i64, i64) +DEF_HELPER_3(pm2sub_wx, i64, env, i64, i64) +DEF_HELPER_4(pm2adda_w, i64, env, i64, i64, i64) +DEF_HELPER_4(pm2addasu_w, i64, env, i64, i64, i64) +DEF_HELPER_4(pm2addau_w, i64, env, i64, i64, i64) +DEF_HELPER_4(pm2adda_wx, i64, env, i64, i64, i64) +DEF_HELPER_4(pm2suba_w, i64, env, i64, i64, i64) +DEF_HELPER_4(pm2suba_wx, i64, env, i64, i64, i64) + +/* Packed SIMD - Four-Way Multiply and Accumulate Operations */ +DEF_HELPER_3(pm4add_b, tl, env, tl, tl) +DEF_HELPER_3(pm4addsu_b, tl, env, tl, tl) +DEF_HELPER_3(pm4addu_b, tl, env, tl, tl) +DEF_HELPER_4(pm4adda_b, tl, env, tl, tl, tl) +DEF_HELPER_4(pm4addasu_b, tl, env, tl, tl, tl) +DEF_HELPER_4(pm4addau_b, tl, env, tl, tl, tl) +DEF_HELPER_3(pm4add_h, i64, env, i64, i64) +DEF_HELPER_3(pm4addsu_h, i64, env, i64, i64) +DEF_HELPER_3(pm4addu_h, i64, env, i64, i64) +DEF_HELPER_4(pm4adda_h, i64, env, i64, i64, i64) +DEF_HELPER_4(pm4addasu_h, i64, env, i64, i64, i64) +DEF_HELPER_4(pm4addau_h, i64, env, i64, i64, i64) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index b2a89e3a1f..ebfbf8c799 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1548,3 +1548,51 @@ mqacc_w11 11111 01 ..... ..... 111 ..... 01110= 11 @r mqracc_w00 11101 11 ..... ..... 111 ..... 0111011 @r mqracc_w01 11111 11 ..... ..... 101 ..... 0111011 @r mqracc_w11 11111 11 ..... ..... 111 ..... 0111011 @r + +# Packed SIMD - Two-Way Multiply and Accumulate Operations +pmq2add_h 10110 00 ..... ..... 101 ..... 0111011 @r +pmqr2add_h 10110 10 ..... ..... 101 ..... 0111011 @r +pmq2adda_h 10111 00 ..... ..... 101 ..... 0111011 @r +pmqr2adda_h 10111 10 ..... ..... 101 ..... 0111011 @r +pmq2add_w 10110 01 ..... ..... 101 ..... 0111011 @r +pmqr2add_w 10110 11 ..... ..... 101 ..... 
0111011 @r +pmq2adda_w 10111 01 ..... ..... 101 ..... 0111011 @r +pmqr2adda_w 10111 11 ..... ..... 101 ..... 0111011 @r +pm2add_h 10000 00 ..... ..... 101 ..... 0111011 @r +pm2addsu_h 11100 00 ..... ..... 101 ..... 0111011 @r +pm2addu_h 10100 00 ..... ..... 101 ..... 0111011 @r +pm2add_hx 10010 00 ..... ..... 101 ..... 0111011 @r +pm2sub_h 11000 00 ..... ..... 101 ..... 0111011 @r +pm2sub_hx 11010 00 ..... ..... 101 ..... 0111011 @r +pm2adda_h 10001 00 ..... ..... 101 ..... 0111011 @r +pm2addasu_h 11101 00 ..... ..... 101 ..... 0111011 @r +pm2addau_h 10101 00 ..... ..... 101 ..... 0111011 @r +pm2adda_hx 10011 00 ..... ..... 101 ..... 0111011 @r +pm2suba_h 11001 00 ..... ..... 101 ..... 0111011 @r +pm2suba_hx 11011 00 ..... ..... 101 ..... 0111011 @r +pm2add_w 10000 01 ..... ..... 101 ..... 0111011 @r +pm2addsu_w 11100 01 ..... ..... 101 ..... 0111011 @r +pm2addu_w 10100 01 ..... ..... 101 ..... 0111011 @r +pm2add_wx 10010 01 ..... ..... 101 ..... 0111011 @r +pm2sub_w 11000 01 ..... ..... 101 ..... 0111011 @r +pm2sub_wx 11010 01 ..... ..... 101 ..... 0111011 @r +pm2adda_w 10001 01 ..... ..... 101 ..... 0111011 @r +pm2addasu_w 11101 01 ..... ..... 101 ..... 0111011 @r +pm2addau_w 10101 01 ..... ..... 101 ..... 0111011 @r +pm2adda_wx 10011 01 ..... ..... 101 ..... 0111011 @r +pm2suba_w 11001 01 ..... ..... 101 ..... 0111011 @r +pm2suba_wx 11011 01 ..... ..... 101 ..... 0111011 @r + +# Packed SIMD - Four-Way Multiply and Accumulate Operations +pm4add_b 10000 10 ..... ..... 101 ..... 0111011 @r +pm4addsu_b 11100 10 ..... ..... 101 ..... 0111011 @r +pm4addu_b 10100 10 ..... ..... 101 ..... 0111011 @r +pm4adda_b 10001 10 ..... ..... 101 ..... 0111011 @r +pm4addasu_b 11101 10 ..... ..... 101 ..... 0111011 @r +pm4addau_b 10101 10 ..... ..... 101 ..... 0111011 @r +pm4add_h 10000 11 ..... ..... 101 ..... 0111011 @r +pm4addsu_h 11100 11 ..... ..... 101 ..... 0111011 @r +pm4addu_h 10100 11 ..... ..... 101 ..... 0111011 @r +pm4adda_h 10001 11 ..... ..... 101 ..... 
0111011 @r +pm4addasu_h 11101 11 ..... ..... 101 ..... 0111011 @r +pm4addau_h 10101 11 ..... ..... 101 ..... 0111011 @r diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index 3310e23dce..86071d71f7 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -858,3 +858,51 @@ GEN_SIMD_TRANS_ACC_64(pmqacc_w_h11) GEN_SIMD_TRANS_ACC_64(pmqracc_w_h00) GEN_SIMD_TRANS_ACC_64(pmqracc_w_h01) GEN_SIMD_TRANS_ACC_64(pmqracc_w_h11) + +/* Packed SIMD - Two-Way Multiply and Accumulate Operations */ +GEN_SIMD_TRANS(pmq2add_h) +GEN_SIMD_TRANS(pmqr2add_h) +GEN_SIMD_TRANS_ACC(pmq2adda_h) +GEN_SIMD_TRANS_ACC(pmqr2adda_h) +GEN_SIMD_TRANS_64(pmq2add_w) +GEN_SIMD_TRANS_64(pmqr2add_w) +GEN_SIMD_TRANS_ACC_64(pmq2adda_w) +GEN_SIMD_TRANS_ACC_64(pmqr2adda_w) +GEN_SIMD_TRANS(pm2add_h) +GEN_SIMD_TRANS(pm2addsu_h) +GEN_SIMD_TRANS(pm2addu_h) +GEN_SIMD_TRANS(pm2add_hx) +GEN_SIMD_TRANS(pm2sub_h) +GEN_SIMD_TRANS(pm2sub_hx) +GEN_SIMD_TRANS_ACC(pm2adda_h) +GEN_SIMD_TRANS_ACC(pm2addasu_h) +GEN_SIMD_TRANS_ACC(pm2addau_h) +GEN_SIMD_TRANS_ACC(pm2adda_hx) +GEN_SIMD_TRANS_ACC(pm2suba_h) +GEN_SIMD_TRANS_ACC(pm2suba_hx) +GEN_SIMD_TRANS_64(pm2add_w) +GEN_SIMD_TRANS_64(pm2addsu_w) +GEN_SIMD_TRANS_64(pm2addu_w) +GEN_SIMD_TRANS_64(pm2add_wx) +GEN_SIMD_TRANS_64(pm2sub_w) +GEN_SIMD_TRANS_64(pm2sub_wx) +GEN_SIMD_TRANS_ACC_64(pm2adda_w) +GEN_SIMD_TRANS_ACC_64(pm2addasu_w) +GEN_SIMD_TRANS_ACC_64(pm2addau_w) +GEN_SIMD_TRANS_ACC_64(pm2adda_wx) +GEN_SIMD_TRANS_ACC_64(pm2suba_w) +GEN_SIMD_TRANS_ACC_64(pm2suba_wx) + +/* Packed SIMD - Four-Way Multiply and Accumulate Operations */ +GEN_SIMD_TRANS(pm4add_b) +GEN_SIMD_TRANS(pm4addsu_b) +GEN_SIMD_TRANS(pm4addu_b) +GEN_SIMD_TRANS_ACC(pm4adda_b) +GEN_SIMD_TRANS_ACC(pm4addasu_b) +GEN_SIMD_TRANS_ACC(pm4addau_b) +GEN_SIMD_TRANS_64(pm4add_h) +GEN_SIMD_TRANS_64(pm4addsu_h) +GEN_SIMD_TRANS_64(pm4addu_h) +GEN_SIMD_TRANS_ACC_64(pm4adda_h) +GEN_SIMD_TRANS_ACC_64(pm4addasu_h) 
+GEN_SIMD_TRANS_ACC_64(pm4addau_h) diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index d69a2f6453..5eede48581 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -6074,3 +6074,941 @@ uint64_t HELPER(pmqracc_w_h11)(CPURISCVState *env, = uint64_t rs1, } return rd; } + +/* Two-Way Multiply and Accumulate Operations */ + +/** + * PMQ2ADD.H - Add two Q-format products + */ +target_ulong HELPER(pmq2add_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t prod0 =3D (int32_t)s1_h0 * (int32_t)s2_h0; + int64_t prod0_47 =3D ((int64_t)prod0) >> 15; + int32_t prod1 =3D (int32_t)s1_h1 * (int32_t)s2_h1; + int64_t prod1_47 =3D ((int64_t)prod1) >> 15; + uint32_t sum =3D (uint32_t)(prod0_47 + prod1_47); + rd =3D INSERT32(rd, sum, i); + } + return rd; +} + +/** + * PMQR2ADD.H - Add two Q-format products with rounding + */ +target_ulong HELPER(pmqr2add_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t prod0 =3D (int32_t)s1_h0 * (int32_t)s2_h0 + (1LL << 14); + int64_t prod0_47 =3D ((int64_t)prod0) >> 15; + int32_t prod1 =3D (int32_t)s1_h1 * (int32_t)s2_h1 + (1LL << 14); + int64_t prod1_47 =3D ((int64_t)prod1) >> 15; + uint32_t sum =3D (uint32_t)(prod0_47 + prod1_47); + rd =3D INSERT32(rd, sum, i); + } + return rd; +} + +/** + * PMQ2ADDA.H - Add two Q-format products with accumulate + 
*/ +target_ulong HELPER(pmq2adda_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong dest) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int32_t prod0 =3D (int32_t)s1_h0 * (int32_t)s2_h0; + int64_t prod0_47 =3D ((int64_t)prod0) >> 15; + int32_t prod1 =3D (int32_t)s1_h1 * (int32_t)s2_h1; + int64_t prod1_47 =3D ((int64_t)prod1) >> 15; + uint32_t sum =3D (uint32_t)(d + prod0_47 + prod1_47); + rd =3D INSERT32(rd, sum, i); + } + return rd; +} + +/** + * PMQR2ADDA.H - Add two Q-format products with rounding and accumulate + */ +target_ulong HELPER(pmqr2adda_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2, target_ulong dest) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t d =3D (int32_t)EXTRACT32(dest, i); + int32_t prod0 =3D (int32_t)s1_h0 * (int32_t)s2_h0 + (1LL << 14); + int64_t prod0_47 =3D ((int64_t)prod0) >> 15; + int32_t prod1 =3D (int32_t)s1_h1 * (int32_t)s2_h1 + (1LL << 14); + int64_t prod1_47 =3D ((int64_t)prod1) >> 15; + uint32_t sum =3D (uint32_t)(d + prod0_47 + prod1_47); + rd =3D INSERT32(rd, sum, i); + } + return rd; +} + +/** + * PMQ2ADD.W - Add two Q-format products (word, RV64 only) + */ +uint64_t HELPER(pmq2add_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1); + int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0); + int32_t s2_w1 =3D 
(int32_t)EXTRACT32(rs2, 1); + int64_t prod0 =3D (int64_t)s1_w0 * (int64_t)s2_w0; + __int128_t prod0_95 =3D ((__int128_t)prod0) >> 31; + int64_t prod1 =3D (int64_t)s1_w1 * (int64_t)s2_w1; + __int128_t prod1_95 =3D ((__int128_t)prod1) >> 31; + return (uint64_t)(prod0_95 + prod1_95); +} + +/** + * PMQR2ADD.W - Add two Q-format products with rounding (word, RV64 only) + */ +uint64_t HELPER(pmqr2add_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1); + int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0); + int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1); + int64_t prod0 =3D (int64_t)s1_w0 * (int64_t)s2_w0 + (1LL << 30); + __int128_t prod0_95 =3D ((__int128_t)prod0) >> 31; + int64_t prod1 =3D (int64_t)s1_w1 * (int64_t)s2_w1 + (1LL << 30); + __int128_t prod1_95 =3D ((__int128_t)prod1) >> 31; + return (uint64_t)(prod0_95 + prod1_95); +} + +/** + * PMQ2ADDA.W - Add two Q-format products with accumulate (word, RV64 only) + */ +uint64_t HELPER(pmq2adda_w)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1); + int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0); + int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1); + int64_t d =3D (int64_t)dest; + int64_t prod0 =3D (int64_t)s1_w0 * (int64_t)s2_w0; + __int128_t prod0_95 =3D ((__int128_t)prod0) >> 31; + int64_t prod1 =3D (int64_t)s1_w1 * (int64_t)s2_w1; + __int128_t prod1_95 =3D ((__int128_t)prod1) >> 31; + return (uint64_t)(d + prod0_95 + prod1_95); +} + +/** + * PMQR2ADDA.W - Add two Q-format products with rounding + * and accumulate (word, RV64 only) + */ +uint64_t HELPER(pmqr2adda_w)(CPURISCVState *env, uint64_t rs1, + uint64_t rs2, uint64_t dest) +{ + int32_t s1_w0 =3D (int32_t)EXTRACT32(rs1, 0); + int32_t s1_w1 =3D (int32_t)EXTRACT32(rs1, 1); + int32_t s2_w0 =3D (int32_t)EXTRACT32(rs2, 0); + int32_t s2_w1 =3D (int32_t)EXTRACT32(rs2, 1); + 
int64_t d =3D (int64_t)dest; + int64_t prod0 =3D (int64_t)s1_w0 * (int64_t)s2_w0 + (1LL << 30); + __int128_t prod0_95 =3D ((__int128_t)prod0) >> 31; + int64_t prod1 =3D (int64_t)s1_w1 * (int64_t)s2_w1 + (1LL << 30); + __int128_t prod1_95 =3D ((__int128_t)prod1) >> 31; + return (uint64_t)(d + prod0_95 + prod1_95); +} + +/** + * PM2ADD.H - Add two products horizontally + * For each word: rd[i] =3D rs1[2i] * rs2[2i] + rs1[2i+1] * rs2[2i+1] + */ +target_ulong HELPER(pm2add_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, i * 2); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, i * 2 + 1); + int32_t prod0 =3D (int32_t)s1_h0 * (int32_t)s2_h0; + int32_t prod1 =3D (int32_t)s1_h1 * (int32_t)s2_h1; + uint32_t sum =3D (uint32_t)(prod0 + prod1); + rd =3D INSERT32(rd, sum, i); + } + return rd; +} + +/** + * PM2ADDSU.H - Add two products horizontally (signed x unsigned) + */ +target_ulong HELPER(pm2addsu_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, i * 2 + 1); + uint16_t s2_h0 =3D EXTRACT16(rs2, i * 2); + uint16_t s2_h1 =3D EXTRACT16(rs2, i * 2 + 1); + int32_t prod0 =3D (int32_t)s1_h0 * (uint32_t)s2_h0; + int32_t prod1 =3D (int32_t)s1_h1 * (uint32_t)s2_h1; + uint32_t sum =3D (uint32_t)(prod0 + prod1); + rd =3D INSERT32(rd, sum, i); + } + return rd; +} + +/** + * PM2ADDU.H - Add two products horizontally (unsigned) + */ +target_ulong HELPER(pm2addu_h)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_W(rd); + + for (int i =3D 0; i < elems; i++) { + uint16_t s1_h0 =3D 
EXTRACT16(rs1, i * 2);
+        uint16_t s1_h1 = EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        uint32_t prod0 = (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        uint32_t prod1 = (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = prod0 + prod1;
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADD.HX - Add cross products horizontally
+ * For each word: rd[i] = rs1[2i] * rs2[2i+1] + rs1[2i+1] * rs2[2i]
+ */
+target_ulong HELPER(pm2add_hx)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        uint32_t sum = (uint32_t)(prod01 + prod10);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUB.H - Subtract two products horizontally
+ * For each word: rd[i] = rs1[2i] * rs2[2i] - rs1[2i+1] * rs2[2i+1]
+ */
+target_ulong HELPER(pm2sub_h)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(prod0 - prod1);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUB.HX - Subtract cross products horizontally
+ * For each word: rd[i] = rs1[2i+1] * rs2[2i] - rs1[2i] * rs2[2i+1]
+ */
+target_ulong HELPER(pm2sub_hx)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(prod10 - prod01);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDA.H - Add two products horizontally with accumulate
+ */
+target_ulong HELPER(pm2adda_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t sum = (uint32_t)(d + prod0 + prod1);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDASU.H - Add two products horizontally with accumulate
+ * (signed x unsigned)
+ */
+target_ulong HELPER(pm2addasu_h)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (uint32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = (uint32_t)(d + prod0 + prod1);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDAU.H - Add two products horizontally with accumulate (unsigned)
+ */
+target_ulong HELPER(pm2addau_h)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        uint16_t s1_h0 = EXTRACT16(rs1, i * 2);
+        uint16_t s1_h1 = EXTRACT16(rs1, i * 2 + 1);
+        uint16_t s2_h0 = EXTRACT16(rs2, i * 2);
+        uint16_t s2_h1 = EXTRACT16(rs2, i * 2 + 1);
+        uint32_t d = EXTRACT32(dest, i);
+        uint32_t prod0 = (uint32_t)s1_h0 * (uint32_t)s2_h0;
+        uint32_t prod1 = (uint32_t)s1_h1 * (uint32_t)s2_h1;
+        uint32_t sum = d + prod0 + prod1;
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADDA.HX - Add cross products horizontally with accumulate
+ */
+target_ulong HELPER(pm2adda_hx)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        uint32_t sum = (uint32_t)(d + prod01 + prod10);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUBA.H - Subtract two products horizontally with accumulate
+ */
+target_ulong HELPER(pm2suba_h)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_h0 * (int32_t)s2_h0;
+        int32_t prod1 = (int32_t)s1_h1 * (int32_t)s2_h1;
+        uint32_t diff = (uint32_t)(d + prod0 - prod1);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2SUBA.HX - Subtract cross products horizontally with accumulate
+ */
+target_ulong HELPER(pm2suba_hx)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int16_t s1_h0 = (int16_t)EXTRACT16(rs1, i * 2);
+        int16_t s1_h1 = (int16_t)EXTRACT16(rs1, i * 2 + 1);
+        int16_t s2_h0 = (int16_t)EXTRACT16(rs2, i * 2);
+        int16_t s2_h1 = (int16_t)EXTRACT16(rs2, i * 2 + 1);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod01 = (int32_t)s1_h0 * (int32_t)s2_h1;
+        int32_t prod10 = (int32_t)s1_h1 * (int32_t)s2_h0;
+        uint32_t diff = (uint32_t)(d + prod01 - prod10);
+        rd = INSERT32(rd, diff, i);
+    }
+    return rd;
+}
+
+/**
+ * PM2ADD.W - Add two products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2add_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(prod0 + prod1);
+}
+
+/**
+ * PM2ADDSU.W - Add two products horizontally (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addsu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (uint64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (uint64_t)s2_w1;
+    return (uint64_t)(prod0 + prod1);
+}
+
+/**
+ * PM2ADDU.W - Add two products horizontally (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addu_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint32_t s1_w0 = EXTRACT32(rs1, 0);
+    uint32_t s1_w1 = EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    uint64_t prod0 = (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    uint64_t prod1 = (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return prod0 + prod1;
+}
+
+/**
+ * PM2ADD.WX - Add cross products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2add_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    return (uint64_t)(prod01 + prod10);
+}
+
+/**
+ * PM2SUB.W - Subtract two products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2sub_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(prod0 - prod1);
+}
+
+/**
+ * PM2SUB.WX - Subtract cross products horizontally (word, RV64 only)
+ */
+uint64_t HELPER(pm2sub_wx)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    return (uint64_t)(prod10 - prod01);
+}
+
+/**
+ * PM2ADDA.W - Add two products horizontally with accumulate (word, RV64 only)
+ */
+uint64_t HELPER(pm2adda_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(d + prod0 + prod1);
+}
+
+/**
+ * PM2ADDASU.W - Add two products horizontally with accumulate
+ * (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addasu_w)(CPURISCVState *env, uint64_t rs1,
+                             uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (uint64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (uint64_t)s2_w1;
+    return (uint64_t)(d + prod0 + prod1);
+}
+
+/**
+ * PM2ADDAU.W - Add two products horizontally with accumulate
+ * (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm2addau_w)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint32_t s1_w0 = EXTRACT32(rs1, 0);
+    uint32_t s1_w1 = EXTRACT32(rs1, 1);
+    uint32_t s2_w0 = EXTRACT32(rs2, 0);
+    uint32_t s2_w1 = EXTRACT32(rs2, 1);
+    uint64_t d = dest;
+    uint64_t prod0 = (uint64_t)s1_w0 * (uint64_t)s2_w0;
+    uint64_t prod1 = (uint64_t)s1_w1 * (uint64_t)s2_w1;
+    return d + prod0 + prod1;
+}
+
+/**
+ * PM2ADDA.WX - Add cross products horizontally with accumulate
+ * (word, RV64 only)
+ */
+uint64_t HELPER(pm2adda_wx)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    return (uint64_t)(d + prod01 + prod10);
+}
+
+/**
+ * PM2SUBA.W - Subtract two products horizontally with accumulate
+ * (word, RV64 only)
+ */
+uint64_t HELPER(pm2suba_w)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_w0 * (int64_t)s2_w0;
+    int64_t prod1 = (int64_t)s1_w1 * (int64_t)s2_w1;
+    return (uint64_t)(d + prod0 - prod1);
+}
+
+/**
+ * PM2SUBA.WX - Subtract cross products horizontally with accumulate
+ * (word, RV64 only)
+ */
+uint64_t HELPER(pm2suba_wx)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    int32_t s1_w0 = (int32_t)EXTRACT32(rs1, 0);
+    int32_t s1_w1 = (int32_t)EXTRACT32(rs1, 1);
+    int32_t s2_w0 = (int32_t)EXTRACT32(rs2, 0);
+    int32_t s2_w1 = (int32_t)EXTRACT32(rs2, 1);
+    int64_t d = (int64_t)dest;
+    int64_t prod01 = (int64_t)s1_w0 * (int64_t)s2_w1;
+    int64_t prod10 = (int64_t)s1_w1 * (int64_t)s2_w0;
+    return (uint64_t)(d + prod01 - prod10);
+}
+
+
+/* Four-Way Multiply and Accumulate Operations */
+
+/**
+ * PM4ADD.B - Add four products horizontally (byte to word)
+ */
+target_ulong HELPER(pm4add_b)(CPURISCVState *env, target_ulong rs1,
+                              target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int8_t s1_b0 = (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 = (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 = (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 = (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        int8_t s2_b0 = (int8_t)EXTRACT8(rs2, i * 4);
+        int8_t s2_b1 = (int8_t)EXTRACT8(rs2, i * 4 + 1);
+        int8_t s2_b2 = (int8_t)EXTRACT8(rs2, i * 4 + 2);
+        int8_t s2_b3 = (int8_t)EXTRACT8(rs2, i * 4 + 3);
+        int32_t prod0 = (int32_t)s1_b0 * (int32_t)s2_b0;
+        int32_t prod1 = (int32_t)s1_b1 * (int32_t)s2_b1;
+        int32_t prod2 = (int32_t)s1_b2 * (int32_t)s2_b2;
+        int32_t prod3 = (int32_t)s1_b3 * (int32_t)s2_b3;
+        uint32_t sum = (uint32_t)(prod0 + prod1 + prod2 + prod3);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDSU.B - Add four products horizontally (signed x unsigned)
+ */
+target_ulong HELPER(pm4addsu_b)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int8_t s1_b0 = (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 = (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 = (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 = (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 = EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 = EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 = EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 = EXTRACT8(rs2, i * 4 + 3);
+        int32_t prod0 = (int32_t)s1_b0 * (uint32_t)s2_b0;
+        int32_t prod1 = (int32_t)s1_b1 * (uint32_t)s2_b1;
+        int32_t prod2 = (int32_t)s1_b2 * (uint32_t)s2_b2;
+        int32_t prod3 = (int32_t)s1_b3 * (uint32_t)s2_b3;
+        uint32_t sum = (uint32_t)(prod0 + prod1 + prod2 + prod3);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDU.B - Add four products horizontally (unsigned)
+ */
+target_ulong HELPER(pm4addu_b)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        uint8_t s1_b0 = EXTRACT8(rs1, i * 4);
+        uint8_t s1_b1 = EXTRACT8(rs1, i * 4 + 1);
+        uint8_t s1_b2 = EXTRACT8(rs1, i * 4 + 2);
+        uint8_t s1_b3 = EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 = EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 = EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 = EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 = EXTRACT8(rs2, i * 4 + 3);
+        uint32_t prod0 = (uint32_t)s1_b0 * (uint32_t)s2_b0;
+        uint32_t prod1 = (uint32_t)s1_b1 * (uint32_t)s2_b1;
+        uint32_t prod2 = (uint32_t)s1_b2 * (uint32_t)s2_b2;
+        uint32_t prod3 = (uint32_t)s1_b3 * (uint32_t)s2_b3;
+        uint32_t sum = prod0 + prod1 + prod2 + prod3;
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDA.B - Add four products horizontally with accumulate
+ */
+target_ulong HELPER(pm4adda_b)(CPURISCVState *env, target_ulong rs1,
+                               target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int8_t s1_b0 = (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 = (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 = (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 = (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        int8_t s2_b0 = (int8_t)EXTRACT8(rs2, i * 4);
+        int8_t s2_b1 = (int8_t)EXTRACT8(rs2, i * 4 + 1);
+        int8_t s2_b2 = (int8_t)EXTRACT8(rs2, i * 4 + 2);
+        int8_t s2_b3 = (int8_t)EXTRACT8(rs2, i * 4 + 3);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_b0 * (int32_t)s2_b0;
+        int32_t prod1 = (int32_t)s1_b1 * (int32_t)s2_b1;
+        int32_t prod2 = (int32_t)s1_b2 * (int32_t)s2_b2;
+        int32_t prod3 = (int32_t)s1_b3 * (int32_t)s2_b3;
+        uint32_t sum = (uint32_t)(d + prod0 + prod1 + prod2 + prod3);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDASU.B - Add four products horizontally with accumulate
+ * (signed x unsigned)
+ */
+target_ulong HELPER(pm4addasu_b)(CPURISCVState *env, target_ulong rs1,
+                                 target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        int8_t s1_b0 = (int8_t)EXTRACT8(rs1, i * 4);
+        int8_t s1_b1 = (int8_t)EXTRACT8(rs1, i * 4 + 1);
+        int8_t s1_b2 = (int8_t)EXTRACT8(rs1, i * 4 + 2);
+        int8_t s1_b3 = (int8_t)EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 = EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 = EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 = EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 = EXTRACT8(rs2, i * 4 + 3);
+        int32_t d = (int32_t)EXTRACT32(dest, i);
+        int32_t prod0 = (int32_t)s1_b0 * (uint32_t)s2_b0;
+        int32_t prod1 = (int32_t)s1_b1 * (uint32_t)s2_b1;
+        int32_t prod2 = (int32_t)s1_b2 * (uint32_t)s2_b2;
+        int32_t prod3 = (int32_t)s1_b3 * (uint32_t)s2_b3;
+        uint32_t sum = (uint32_t)(d + prod0 + prod1 + prod2 + prod3);
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADDAU.B - Add four products horizontally with accumulate (unsigned)
+ */
+target_ulong HELPER(pm4addau_b)(CPURISCVState *env, target_ulong rs1,
+                                target_ulong rs2, target_ulong dest)
+{
+    target_ulong rd = 0;
+    int elems = ELEMS_W(rd);
+
+    for (int i = 0; i < elems; i++) {
+        uint8_t s1_b0 = EXTRACT8(rs1, i * 4);
+        uint8_t s1_b1 = EXTRACT8(rs1, i * 4 + 1);
+        uint8_t s1_b2 = EXTRACT8(rs1, i * 4 + 2);
+        uint8_t s1_b3 = EXTRACT8(rs1, i * 4 + 3);
+        uint8_t s2_b0 = EXTRACT8(rs2, i * 4);
+        uint8_t s2_b1 = EXTRACT8(rs2, i * 4 + 1);
+        uint8_t s2_b2 = EXTRACT8(rs2, i * 4 + 2);
+        uint8_t s2_b3 = EXTRACT8(rs2, i * 4 + 3);
+        uint32_t d = EXTRACT32(dest, i);
+        uint32_t prod0 = (uint32_t)s1_b0 * (uint32_t)s2_b0;
+        uint32_t prod1 = (uint32_t)s1_b1 * (uint32_t)s2_b1;
+        uint32_t prod2 = (uint32_t)s1_b2 * (uint32_t)s2_b2;
+        uint32_t prod3 = (uint32_t)s1_b3 * (uint32_t)s2_b3;
+        uint32_t sum = d + prod0 + prod1 + prod2 + prod3;
+        rd = INSERT32(rd, sum, i);
+    }
+    return rd;
+}
+
+/**
+ * PM4ADD.H - Add four products horizontally (halfword to doubleword, RV64 only)
+ */
+uint64_t HELPER(pm4add_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 = (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 = (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 = (int16_t)EXTRACT16(rs1, 3);
+    int16_t s2_h0 = (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 = (int16_t)EXTRACT16(rs2, 1);
+    int16_t s2_h2 = (int16_t)EXTRACT16(rs2, 2);
+    int16_t s2_h3 = (int16_t)EXTRACT16(rs2, 3);
+    int64_t prod0 = (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 = (int64_t)s1_h1 * (int64_t)s2_h1;
+    int64_t prod2 = (int64_t)s1_h2 * (int64_t)s2_h2;
+    int64_t prod3 = (int64_t)s1_h3 * (int64_t)s2_h3;
+    rd = (uint64_t)(prod0 + prod1 + prod2 + prod3);
+    return rd;
+}
+
+/**
+ * PM4ADDSU.H - Add four products horizontally (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addsu_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 = (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 = (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 = (int16_t)EXTRACT16(rs1, 3);
+    uint16_t s2_h0 = EXTRACT16(rs2, 0);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    uint16_t s2_h2 = EXTRACT16(rs2, 2);
+    uint16_t s2_h3 = EXTRACT16(rs2, 3);
+    int64_t prod0 = (int64_t)s1_h0 * (uint64_t)s2_h0;
+    int64_t prod1 = (int64_t)s1_h1 * (uint64_t)s2_h1;
+    int64_t prod2 = (int64_t)s1_h2 * (uint64_t)s2_h2;
+    int64_t prod3 = (int64_t)s1_h3 * (uint64_t)s2_h3;
+    rd = (uint64_t)(prod0 + prod1 + prod2 + prod3);
+    return rd;
+}
+
+/**
+ * PM4ADDU.H - Add four products horizontally (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addu_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
+{
+    uint64_t rd = 0;
+    uint16_t s1_h0 = EXTRACT16(rs1, 0);
+    uint16_t s1_h1 = EXTRACT16(rs1, 1);
+    uint16_t s1_h2 = EXTRACT16(rs1, 2);
+    uint16_t s1_h3 = EXTRACT16(rs1, 3);
+    uint16_t s2_h0 = EXTRACT16(rs2, 0);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    uint16_t s2_h2 = EXTRACT16(rs2, 2);
+    uint16_t s2_h3 = EXTRACT16(rs2, 3);
+    uint64_t prod0 = (uint64_t)s1_h0 * (uint64_t)s2_h0;
+    uint64_t prod1 = (uint64_t)s1_h1 * (uint64_t)s2_h1;
+    uint64_t prod2 = (uint64_t)s1_h2 * (uint64_t)s2_h2;
+    uint64_t prod3 = (uint64_t)s1_h3 * (uint64_t)s2_h3;
+    rd = prod0 + prod1 + prod2 + prod3;
+    return rd;
+}
+
+/**
+ * PM4ADDA.H - Add four products horizontally with accumulate (RV64 only)
+ */
+uint64_t HELPER(pm4adda_h)(CPURISCVState *env, uint64_t rs1,
+                           uint64_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 = (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 = (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 = (int16_t)EXTRACT16(rs1, 3);
+    int16_t s2_h0 = (int16_t)EXTRACT16(rs2, 0);
+    int16_t s2_h1 = (int16_t)EXTRACT16(rs2, 1);
+    int16_t s2_h2 = (int16_t)EXTRACT16(rs2, 2);
+    int16_t s2_h3 = (int16_t)EXTRACT16(rs2, 3);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_h0 * (int64_t)s2_h0;
+    int64_t prod1 = (int64_t)s1_h1 * (int64_t)s2_h1;
+    int64_t prod2 = (int64_t)s1_h2 * (int64_t)s2_h2;
+    int64_t prod3 = (int64_t)s1_h3 * (int64_t)s2_h3;
+    return (uint64_t)(d + prod0 + prod1 + prod2 + prod3);
+}
+
+/**
+ * PM4ADDASU.H - Add four products horizontally with accumulate
+ * (signed x unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addasu_h)(CPURISCVState *env, uint64_t rs1,
+                             uint64_t rs2, uint64_t dest)
+{
+    int16_t s1_h0 = (int16_t)EXTRACT16(rs1, 0);
+    int16_t s1_h1 = (int16_t)EXTRACT16(rs1, 1);
+    int16_t s1_h2 = (int16_t)EXTRACT16(rs1, 2);
+    int16_t s1_h3 = (int16_t)EXTRACT16(rs1, 3);
+    uint16_t s2_h0 = EXTRACT16(rs2, 0);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    uint16_t s2_h2 = EXTRACT16(rs2, 2);
+    uint16_t s2_h3 = EXTRACT16(rs2, 3);
+    int64_t d = (int64_t)dest;
+    int64_t prod0 = (int64_t)s1_h0 * (uint64_t)s2_h0;
+    int64_t prod1 = (int64_t)s1_h1 * (uint64_t)s2_h1;
+    int64_t prod2 = (int64_t)s1_h2 * (uint64_t)s2_h2;
+    int64_t prod3 = (int64_t)s1_h3 * (uint64_t)s2_h3;
+    return (uint64_t)(d + prod0 + prod1 + prod2 + prod3);
+}
+
+/**
+ * PM4ADDAU.H - Add four products horizontally with accumulate
+ * (unsigned, RV64 only)
+ */
+uint64_t HELPER(pm4addau_h)(CPURISCVState *env, uint64_t rs1,
+                            uint64_t rs2, uint64_t dest)
+{
+    uint16_t s1_h0 = EXTRACT16(rs1, 0);
+    uint16_t s1_h1 = EXTRACT16(rs1, 1);
+    uint16_t s1_h2 = EXTRACT16(rs1, 2);
+    uint16_t s1_h3 = EXTRACT16(rs1, 3);
+    uint16_t s2_h0 = EXTRACT16(rs2, 0);
+    uint16_t s2_h1 = EXTRACT16(rs2, 1);
+    uint16_t s2_h2 = EXTRACT16(rs2, 2);
+    uint16_t s2_h3 = EXTRACT16(rs2, 3);
+    uint64_t d = dest;
+    uint64_t prod0 = (uint64_t)s1_h0 * (uint64_t)s2_h0;
+    uint64_t prod1 = (uint64_t)s1_h1 * (uint64_t)s2_h1;
+    uint64_t prod2 = (uint64_t)s1_h2 * (uint64_t)s2_h2;
+    uint64_t prod3 = (uint64_t)s1_h3 * (uint64_t)s2_h3;
+    return d + prod0 + prod1 + prod2 + prod3;
+}
-- 
2.34.1

From nobody Sat May 30 20:13:16 2026
From: Molly Chen
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 12/14] target/riscv: rvp: add load and replicate instructions.
Date: Fri, 17 Apr 2026 18:46:49 +0800
Message-Id: <20260417104652.17857-13-xiaoou@iscas.ac.cn>
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>

Signed-off-by: Molly Chen
---
 target/riscv/insn32.decode              | 16 ++++++
 target/riscv/insn_trans/trans_rvp.c.inc | 67 +++++++++++++++++++++++++
 target/riscv/translate.c                |  2 +
 3 files changed, 85 insertions(+)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index ebfbf8c799..b1bde37de4 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -44,6 +44,10 @@
 %imm_p_ui16 20:4
 %imm_p_ui32 20:5
 %imm_p_ui64 20:6
+%imm_p_l1 16:8
+%imm_p_l2 15:s1 16:9
+%imm_p_l3 15:s9 24:1 !function=ex_shift_6
+%imm_p_l4 15:s9 24:1 !function=ex_shift_22
 
 # Argument sets:
 &empty
@@ -64,6 +68,7 @@
 &k_aes shamt rs2 rs1 rd
 &mop5 imm rd rs1
 &mop3 imm rd rs1 rs2
+&p_l imm rd
 
 # Formats 32:
 @r ....... ..... ..... ... ..... ....... &r %rs2 %rs1 %rd
@@ -113,6 +118,10 @@
 @p_ui16 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui16 %rs1 %rd
 @p_ui32 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui32 %rs1 %rd
 @p_ui64 ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui64 %rs1 %rd
+@p_l1 ........ ........ .... ..... ....... &p_l imm=%imm_p_l1 %rd
+@p_l2 ....... .......... ... ..... ....... &p_l imm=%imm_p_l2 %rd
+@p_l3 ....... .......... ... ..... ....... &p_l imm=%imm_p_l3 %rd
+@p_l4 ....... .......... ... ..... ....... &p_l imm=%imm_p_l4 %rd
 
 # Formats 64:
 @sh5 ....... ..... ..... ... ..... ....... &shift shamt=%sh5 %rs1 %rd
@@ -1596,3 +1605,10 @@ pm4addu_h 10100 11 ..... ..... 101 ..... 0111011 @r
 pm4adda_h 10001 11 ..... ..... 101 ..... 0111011 @r
 pm4addasu_h 11101 11 ..... ..... 101 ..... 0111011 @r
 pm4addau_h 10101 11 ..... ..... 101 ..... 0111011 @r
+
+# Packed SIMD - Load and Replicate instructions
+pli_b 10110100 ........ 0010 ..... 0011011 @p_l1
+pli_h 1011000 .......... 010 ..... 0011011 @p_l2
+plui_h 1111000 .......... 010 ..... 0011011 @p_l3
+pli_w 1011001 ..... ..... 010 ..... 0011011 @p_l2
+plui_w 1111001 ..... ..... 010 ..... 0011011 @p_l4
diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc
index 86071d71f7..b82774e00f 100644
--- a/target/riscv/insn_trans/trans_rvp.c.inc
+++ b/target/riscv/insn_trans/trans_rvp.c.inc
@@ -906,3 +906,70 @@ GEN_SIMD_TRANS_64(pm4addu_h)
 GEN_SIMD_TRANS_ACC_64(pm4adda_h)
 GEN_SIMD_TRANS_ACC_64(pm4addasu_h)
 GEN_SIMD_TRANS_ACC_64(pm4addau_h)
+
+static bool trans_pli_b(DisasContext *ctx, arg_pli_b * a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i = 1;
+    target_long imm = a->imm;
+    while (i < TARGET_LONG_SIZE) {
+        imm = ((imm << 8) + a->imm);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_pli_h(DisasContext *ctx, arg_pli_h * a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i = 1;
+    target_long imm = a->imm;
+    while (i < TARGET_LONG_SIZE / 2) {
+        imm = (imm << 16) + (a->imm & 0xFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_plui_h(DisasContext *ctx, arg_plui_h * a)
+{
+    REQUIRE_EXT(ctx, RVP);
+    int i = 1;
+    target_long imm = a->imm;
+    while (i < TARGET_LONG_SIZE / 2) {
+        imm = (imm << 16) + (a->imm & 0xFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_pli_w(DisasContext *ctx, arg_pli_w * a)
+{
+    REQUIRE_64BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    int i = 1;
+    int64_t imm = a->imm;
+    while (i < TARGET_LONG_SIZE / 4) {
+        imm = (imm << 32) + (a->imm & 0xFFFFFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
+
+static bool trans_plui_w(DisasContext *ctx, arg_plui_w * a)
+{
+    REQUIRE_64BIT(ctx);
+    REQUIRE_EXT(ctx, RVP);
+    int i = 1;
+    int64_t imm = a->imm;
+    while (i < TARGET_LONG_SIZE / 4) {
+        imm = (imm << 32) + (a->imm & 0xFFFFFFFF);
+        i++;
+    }
+    gen_set_gpri(ctx, a->rd, imm);
+    return true;
+}
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index de3ec7a7ec..04efc7aced 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -796,7 +796,9 @@ EX_SH(1)
 EX_SH(2)
 EX_SH(3)
 EX_SH(4)
+EX_SH(6)
 EX_SH(12)
+EX_SH(22)
 
 #define REQUIRE_EXT(ctx, ext) do { \
     if (!has_ext(ctx, ext)) { \
-- 
2.34.1

From nobody Sat May 30 20:13:16 2026
From: Molly Chen
To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com,
 daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com,
 chao.liu.zevorn@gmail.com
Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH 13/14] target/riscv: rvp: add rv32-only register-pair instructions
Date: Fri, 17 Apr 2026 18:46:50 +0800
Message-Id: <20260417104652.17857-14-xiaoou@iscas.ac.cn>
In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn>
References: <20260417104652.17857-1-xiaoou@iscas.ac.cn>

Signed-off-by: Molly Chen
---
 target/riscv/helper.h                   |  131 ++
 target/riscv/insn32.decode              |  279 +++
 target/riscv/insn_trans/trans_rvp.c.inc |  786 ++++++++-
 target/riscv/psimd_helper.c             | 2068 +++++++++++++++++++++++
 4 files changed, 3220 insertions(+), 44 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 663ac0e242..85d4fe1b67 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1737,3 +1737,134 @@ DEF_HELPER_3(pm4addu_h, i64, env, i64, i64)
 DEF_HELPER_4(pm4adda_h, i64, env, i64, i64, i64)
 DEF_HELPER_4(pm4addasu_h, i64, env, i64, i64, i64)
 DEF_HELPER_4(pm4addau_h, i64, env, i64, i64, i64)
+
+/* Packed SIMD - Double-Width Operations (RV32 only, register pairs) */
+DEF_HELPER_3(pwadd_b, i64, env, i32, i32)
+DEF_HELPER_4(pwadda_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwaddu_b, i64, env, i32, i32)
+DEF_HELPER_4(pwaddau_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsub_b, i64, env, i32, i32)
+DEF_HELPER_4(pwsuba_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsubu_b, i64, env, i32, i32)
+DEF_HELPER_4(pwsubau_b, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwslli_b, i64, env, i32, i32)
+DEF_HELPER_3(pwsll_bs, i64, env, i32, i32)
+DEF_HELPER_3(pwslai_b, i64, env, i32, i32)
+DEF_HELPER_3(pwsla_bs, i64, env, i32, i32)
+
+DEF_HELPER_3(pwadd_h, i64, env, i32, i32)
+DEF_HELPER_4(pwadda_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwaddu_h, i64, env, i32, i32)
+DEF_HELPER_4(pwaddau_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsub_h, i64, env, i32, i32)
+DEF_HELPER_4(pwsuba_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwsubu_h, i64, env, i32, i32)
+DEF_HELPER_4(pwsubau_h, i64, env, i32, i32, i64)
+DEF_HELPER_3(pwslli_h, i64, env, i32, i32)
+DEF_HELPER_3(pwsll_hs, i64, env, i32, i32)
+DEF_HELPER_3(pwslai_h, i64, env, i32, i32)
+DEF_HELPER_3(pwsla_hs, i64, env, i32, i32)
+
+DEF_HELPER_3(wadd, i64, env, i32, i32)
+DEF_HELPER_4(wadda, i64, env, i32, i32, i64)
+DEF_HELPER_3(waddu, i64, env, i32, i32)
+DEF_HELPER_4(waddau, i64, env, i32, i32, i64)
+DEF_HELPER_3(wsub, i64, env, i32, i32)
+DEF_HELPER_4(wsuba, i64, env, i32, i32, i64)
+DEF_HELPER_3(wsubu, i64, env, i32, i32)
+DEF_HELPER_4(wsubau, i64, env, i32, i32, i64)
+DEF_HELPER_3(wslli, i64, env, i32, i32)
+DEF_HELPER_3(wsll, i64, env, i32, i32)
+DEF_HELPER_3(wslai, i64, env, i32, i32)
+DEF_HELPER_3(wsla, i64, env, i32, i32)
+
+DEF_HELPER_3(wzip8p, i64, env, i32, i32)
+DEF_HELPER_3(wzip16p, i64, env, i32, i32)
+
+DEF_HELPER_4(predsum_dbs, i32, env, i32, i32, i32)
+DEF_HELPER_4(predsumu_dbs, i32, env, i32, i32, i32)
+DEF_HELPER_4(predsum_dhs, i32, env, i32, i32, i32)
+DEF_HELPER_4(predsumu_dhs, i32, env, i32, i32, i32)
+
+DEF_HELPER_3(pnsrli_b, i32, env, i64, i32)
+DEF_HELPER_3(pnsrai_b, i32, env, i64, i32)
+DEF_HELPER_3(pnsrari_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipi_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipri_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipiu_b, i32, env, i64, i32)
+DEF_HELPER_3(pnclipriu_b, i32, env, i64, i32)
+DEF_HELPER_3(pnsrl_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnsra_bs, i32, env, i64, i32)
+DEF_HELPER_3(pnsrar_bs, i32, env,
i64, i32) +DEF_HELPER_3(pnclip_bs, i32, env, i64, i32) +DEF_HELPER_3(pnclipr_bs, i32, env, i64, i32) +DEF_HELPER_3(pnclipu_bs, i32, env, i64, i32) +DEF_HELPER_3(pnclipru_bs, i32, env, i64, i32) + +DEF_HELPER_3(pnsrli_h, i32, env, i64, i32) +DEF_HELPER_3(pnsrai_h, i32, env, i64, i32) +DEF_HELPER_3(pnsrari_h, i32, env, i64, i32) +DEF_HELPER_3(pnclipi_h, i32, env, i64, i32) +DEF_HELPER_3(pnclipri_h, i32, env, i64, i32) +DEF_HELPER_3(pnclipiu_h, i32, env, i64, i32) +DEF_HELPER_3(pnclipriu_h, i32, env, i64, i32) +DEF_HELPER_3(pnsrl_hs, i32, env, i64, i32) +DEF_HELPER_3(pnsra_hs, i32, env, i64, i32) +DEF_HELPER_3(pnsrar_hs, i32, env, i64, i32) +DEF_HELPER_3(pnclip_hs, i32, env, i64, i32) +DEF_HELPER_3(pnclipr_hs, i32, env, i64, i32) +DEF_HELPER_3(pnclipu_hs, i32, env, i64, i32) +DEF_HELPER_3(pnclipru_hs, i32, env, i64, i32) + +DEF_HELPER_3(nsrli, i32, env, i64, i32) +DEF_HELPER_3(nsrai, i32, env, i64, i32) +DEF_HELPER_3(nsrari, i32, env, i64, i32) +DEF_HELPER_3(nclipi, i32, env, i64, i32) +DEF_HELPER_3(nclipri, i32, env, i64, i32) +DEF_HELPER_3(nclipiu, i32, env, i64, i32) +DEF_HELPER_3(nclipriu, i32, env, i64, i32) +DEF_HELPER_3(nsrl, i32, env, i64, i32) +DEF_HELPER_3(nsra, i32, env, i64, i32) +DEF_HELPER_3(nsrar, i32, env, i64, i32) +DEF_HELPER_3(nclip, i32, env, i64, i32) +DEF_HELPER_3(nclipr, i32, env, i64, i32) +DEF_HELPER_3(nclipu, i32, env, i64, i32) +DEF_HELPER_3(nclipru, i32, env, i64, i32) + +DEF_HELPER_4(pmqwacc_h, i64, env, i32, i32, i64) +DEF_HELPER_4(pmqrwacc_h, i64, env, i32, i32, i64) +DEF_HELPER_4(mqwacc, i64, env, i32, i32, i64) +DEF_HELPER_4(mqrwacc, i64, env, i32, i32, i64) + +DEF_HELPER_3(pwmul_b, i64, env, i32, i32) +DEF_HELPER_3(pwmulsu_b, i64, env, i32, i32) +DEF_HELPER_3(pwmulu_b, i64, env, i32, i32) +DEF_HELPER_3(pwmul_h, i64, env, i32, i32) +DEF_HELPER_3(pwmulsu_h, i64, env, i32, i32) +DEF_HELPER_3(pwmulu_h, i64, env, i32, i32) + +DEF_HELPER_4(pwmacc_h, i64, env, i32, i32, i64) +DEF_HELPER_4(pwmaccsu_h, i64, env, i32, i32, i64) 
+DEF_HELPER_4(pwmaccu_h, i64, env, i32, i32, i64) + +DEF_HELPER_3(wmul, i64, env, i32, i32) +DEF_HELPER_3(wmulsu, i64, env, i32, i32) +DEF_HELPER_3(wmulu, i64, env, i32, i32) + +DEF_HELPER_4(wmacc, i64, env, i32, i32, i64) +DEF_HELPER_4(wmaccsu, i64, env, i32, i32, i64) +DEF_HELPER_4(wmaccu, i64, env, i32, i32, i64) + +DEF_HELPER_3(pm2wadd_h, i64, env, i32, i32) +DEF_HELPER_3(pm2waddsu_h, i64, env, i32, i32) +DEF_HELPER_3(pm2waddu_h, i64, env, i32, i32) +DEF_HELPER_3(pm2wadd_hx, i64, env, i32, i32) +DEF_HELPER_4(pm2wadda_h, i64, env, i32, i32, i64) +DEF_HELPER_4(pm2waddasu_h, i64, env, i32, i32, i64) +DEF_HELPER_4(pm2waddau_h, i64, env, i32, i32, i64) +DEF_HELPER_4(pm2wadda_hx, i64, env, i32, i32, i64) + +DEF_HELPER_3(pm2wsub_h, i64, env, i32, i32) +DEF_HELPER_3(pm2wsub_hx, i64, env, i32, i32) +DEF_HELPER_4(pm2wsuba_h, i64, env, i32, i32, i64) +DEF_HELPER_4(pm2wsuba_hx, i64, env, i32, i32, i64) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index b1bde37de4..7be0b9e5e6 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -23,6 +23,9 @@ %rd 7:5 %sh5 20:5 %sh6 20:6 +%rs2_p 21:4 +%rs1_p 16:4 +%rd_p 8:4 %sh7 20:7 %csr 20:12 @@ -69,6 +72,7 @@ &mop5 imm rd rs1 &mop3 imm rd rs1 rs2 &p_l imm rd +&p_ui imm rs1 rd # Formats 32: @r ....... ..... ..... ... ..... ....... &r %rs2 %rs1 %rd @@ -101,6 +105,11 @@ @r2_zimm11 . zimm:11 ..... ... ..... ....... %rs1 %rd @r2_zimm10 .. zimm:10 ..... ... ..... ....... %rs1 %rd @r2_s ....... ..... ..... ... ..... ....... %rs2 %rs1 +@r_p_1 ....... ..... ..... ... ..... ....... &r %rs2 %rs1 rd=%rd_p +@r_p_2 ....... ..... ..... ... ..... ....... &r rs2=%rs2_p rs1=%rs1_p rd=%rd_p +@r_p_3 ....... ..... ..... ... ..... ....... &r %rs2 rs1=%rs1_p rd=%rd_p +@r_p_4 ....... ..... ..... ... ..... ....... &r %rs2 rs1=%rs1_p %rd +@r2_p ....... ..... ..... ... ..... ....... &r2 rs1=%rs1_p rd=%rd_p @hfence_gvma ....... ..... ..... ... ..... ....... 
%rs2 %rs1 @hfence_vvma ....... ..... ..... ... ..... ....... %rs2 %rs1 @@ -122,6 +131,18 @@ @p_l2 ....... .......... ... ..... ....... &p_l imm=%imm_p_l2 %rd @p_l3 ....... .......... ... ..... ....... &p_l imm=%imm_p_l3 %rd @p_l4 ....... .......... ... ..... ....... &p_l imm=%imm_p_l4 %rd +@p_l1_p ........ ........ .... ..... ....... &p_l imm=%imm_p_l1 rd=%rd_p +@p_l2_p ........ ........ .... ..... ....... &p_l imm=%imm_p_l2 rd=%rd_p +@p_l3_p ....... .......... ... ..... ....... &p_l imm=%imm_p_l3 rd=%rd_p +@p_ui8_p ..... .... ... ..... ... ..... ....... &i imm=%imm_p_ui8 rs1=%rs1_p rd=%rd_p +@p_ui16_p ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui16 %rs1 rd=%rd_p +@p_ui16_p_2 ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui16 rs1=%rs1_p rd=%rd_p +@p_ui16_p_3 ..... .... ... .... .... ..... ....... &p_ui imm=%imm_p_ui16 rs1=%rs1_p %rd +@p_ui32_p ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui32 %rs1 rd=%rd_p +@p_ui32_p_2 ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui32 rs1=%rs1_p rd=%rd_p +@p_ui32_p_3 ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui32 rs1=%rs1_p %rd +@p_ui64_p ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui64 %rs1 rd=%rd_p +@p_ui64_p_2 ..... .... ... ..... ... ..... ....... &p_ui imm=%imm_p_ui64 rs1=%rs1_p %rd # Formats 64: @sh5 ....... ..... ..... ... ..... ....... &shift shamt=%sh5 %rs1 %rd @@ -1612,3 +1633,261 @@ pli_h 1011000 .......... 010 ..... 0011011 @p_l2 plui_h 1111000 .......... 010 ..... 0011011 @p_l3 pli_w 1011001 ..... ..... 010 ..... 0011011 @p_l2 plui_w 1111001 ..... ..... 010 ..... 0011011 @p_l4 + +# Packed SIMD - Double-Width Operations (RV32 only, register pairs) +# register-pair destination +pwadd_b 0000010 ..... ..... 010 .... 10011011 @r_p_1 +pwadda_b 0000110 ..... ..... 010 .... 10011011 @r_p_1 +pwaddu_b 0001010 ..... ..... 010 .... 
10011011 @r_p_1 +pwaddau_b 0001110 ..... ..... 010 .... 10011011 @r_p_1 +pwsub_b 0100010 ..... ..... 010 .... 10011011 @r_p_1 +pwsuba_b 0100110 ..... ..... 010 .... 10011011 @r_p_1 +pwsubu_b 0101010 ..... ..... 010 .... 10011011 @r_p_1 +pwsubau_b 0101110 ..... ..... 010 .... 10011011 @r_p_1 +pwslli_b 00000 001.... ..... 010 .... 00011011 @p_ui16_p +pwsll_bs 0000100 ..... ..... 010 .... 00011011 @r_p_1 +pwslai_b 01000 001.... ..... 010 .... 00011011 @p_ui16_p +pwsla_bs 0100100 ..... ..... 010 .... 00011011 @r_p_1 + +pwadd_h 0000000 ..... ..... 010 .... 10011011 @r_p_1 +pwadda_h 0000100 ..... ..... 010 .... 10011011 @r_p_1 +pwaddu_h 0001000 ..... ..... 010 .... 10011011 @r_p_1 +pwaddau_h 0001100 ..... ..... 010 .... 10011011 @r_p_1 +pwsub_h 0100000 ..... ..... 010 .... 10011011 @r_p_1 +pwsuba_h 0100100 ..... ..... 010 .... 10011011 @r_p_1 +pwsubu_h 0101000 ..... ..... 010 .... 10011011 @r_p_1 +pwsubau_h 0101100 ..... ..... 010 .... 10011011 @r_p_1 +pwslli_h 00000 01..... ..... 010 .... 00011011 @p_ui32_p +pwsll_hs 0000101 ..... ..... 010 .... 00011011 @r_p_1 +pwslai_h 01000 01..... ..... 010 .... 00011011 @p_ui32_p +pwsla_hs 0100101 ..... ..... 010 .... 00011011 @r_p_1 + +wadd 0000001 ..... ..... 010 .... 10011011 @r_p_1 +wadda 0000101 ..... ..... 010 .... 10011011 @r_p_1 +waddu 0001001 ..... ..... 010 .... 10011011 @r_p_1 +waddau 0001101 ..... ..... 010 .... 10011011 @r_p_1 +wsub 0100001 ..... ..... 010 .... 10011011 @r_p_1 +wsuba 0100101 ..... ..... 010 .... 10011011 @r_p_1 +wsubu 0101001 ..... ..... 010 .... 10011011 @r_p_1 +wsubau 0101101 ..... ..... 010 .... 10011011 @r_p_1 +wslli 00000 1...... ..... 010 .... 00011011 @p_ui64_p +wsll 0000111 ..... ..... 010 .... 00011011 @r_p_1 +wslai 01000 1...... ..... 010 .... 00011011 @p_ui64_p +wsla 0100111 ..... ..... 010 .... 00011011 @r_p_1 + +wzip8p 0111100 ..... ..... 010 .... 00011011 @r_p_1 +wzip16p 0111101 ..... ..... 010 .... 00011011 @r_p_1 + +#register-pair operands +pli_db 00110100 ........ 0010 .... 
00011011 @p_l1_p +padd_db 1000010 .... 0 .... 0110 .... 00011011 @r_p_2 +psub_db 1100010 .... 0 .... 0110 .... 00011011 @r_p_2 +psadd_db 1001010 .... 0 .... 0110 .... 00011011 @r_p_2 +psaddu_db 1011010 .... 0 .... 0110 .... 00011011 @r_p_2 +pssub_db 1101010 .... 0 .... 0110 .... 00011011 @r_p_2 +pssubu_db 1111010 .... 0 .... 0110 .... 00011011 @r_p_2 +paadd_db 1001110 .... 0 .... 0110 .... 00011011 @r_p_2 +paaddu_db 1011110 .... 0 .... 0110 .... 00011011 @r_p_2 +pasub_db 1101110 .... 0 .... 0110 .... 00011011 @r_p_2 +pasubu_db 1111110 .... 0 .... 0110 .... 00011011 @r_p_2 +pabd_db 1100110 .... 0 .... 0110 .... 00011011 @r_p_2 +pabdu_db 1110110 .... 0 .... 0110 .... 00011011 @r_p_2 +psabs_db 0110010 00111 .... 0110 .... 00011011 @r2_p +pli_dh 0011000 .......... 010 .... 00011011 @p_l2_p +plui_dh 0111000 .......... 010 .... 00011011 @p_l3_p +padd_dh 1000000 .... 0 .... 0110 .... 00011011 @r_p_2 +psub_dh 1100000 .... 0 .... 0110 .... 00011011 @r_p_2 +psadd_dh 1001000 .... 0 .... 0110 .... 00011011 @r_p_2 +psaddu_dh 1011000 .... 0 .... 0110 .... 00011011 @r_p_2 +pssub_dh 1101000 .... 0 .... 0110 .... 00011011 @r_p_2 +pssubu_dh 1111000 .... 0 .... 0110 .... 00011011 @r_p_2 +paadd_dh 1001100 .... 0 .... 0110 .... 00011011 @r_p_2 +paaddu_dh 1011100 .... 0 .... 0110 .... 00011011 @r_p_2 +pasub_dh 1101100 .... 0 .... 0110 .... 00011011 @r_p_2 +pasubu_dh 1111100 .... 0 .... 0110 .... 00011011 @r_p_2 +psh1add_dh 1010000 .... 1 .... 0110 .... 00011011 @r_p_2 +pssh1sadd_dh 1011000 .... 1 .... 0110 .... 00011011 @r_p_2 +pas_dhx 1000000 .... 1 .... 1110 .... 00011011 @r_p_2 +psa_dhx 1000010 .... 1 .... 1110 .... 00011011 @r_p_2 +psas_dhx 1001000 .... 1 .... 1110 .... 00011011 @r_p_2 +pssa_dhx 1001010 .... 1 .... 1110 .... 00011011 @r_p_2 +paas_dhx 1001100 .... 1 .... 1110 .... 00011011 @r_p_2 +pasa_dhx 1001110 .... 1 .... 1110 .... 00011011 @r_p_2 +pabd_dh 1100100 .... 0 .... 0110 .... 00011011 @r_p_2 +pabdu_dh 1110100 .... 0 .... 0110 .... 
00011011 @r_p_2 +psabs_dh 0110000 00111 .... 0110 .... 00011011 @r2_p +padd_dw 1000001 .... 0 .... 0110 .... 00011011 @r_p_2 +psub_dw 1100001 .... 0 .... 0110 .... 00011011 @r_p_2 +psadd_dw 1001001 .... 0 .... 0110 .... 00011011 @r_p_2 +psaddu_dw 1011001 .... 0 .... 0110 .... 00011011 @r_p_2 +pssub_dw 1101001 .... 0 .... 0110 .... 00011011 @r_p_2 +pssubu_dw 1111001 .... 0 .... 0110 .... 00011011 @r_p_2 +paadd_dw 1001101 .... 0 .... 0110 .... 00011011 @r_p_2 +paaddu_dw 1011101 .... 0 .... 0110 .... 00011011 @r_p_2 +pasub_dw 1101101 .... 0 .... 0110 .... 00011011 @r_p_2 +pasubu_dw 1111101 .... 0 .... 0110 .... 00011011 @r_p_2 +psh1add_dw 1010001 .... 1 .... 0110 .... 00011011 @r_p_2 +pssh1sadd_dw 1011001 .... 1 .... 0110 .... 00011011 @r_p_2 +addd_p 1000011 .... 0 .... 0110 .... 00011011 @r_p_2 +subd_p 1100011 .... 0 .... 0110 .... 00011011 @r_p_2 + +# register-pair first source only +predsum_dbs 0001110 ..... .... 0100 ..... 0011011 @r_p_4 +predsumu_dbs 0011110 ..... .... 0100 ..... 0011011 @r_p_4 +predsum_dhs 0001100 ..... .... 0100 ..... 0011011 @r_p_4 +predsumu_dhs 0011100 ..... .... 0100 ..... 0011011 @r_p_4 + +# register-pair operands +pslli_db 00000 0001... .... 0110 .... 00011011 @p_ui8_p +psrli_db 00000 0001... .... 1110 .... 00011011 @p_ui8_p +psrai_db 01000 0001... .... 1110 .... 00011011 @p_ui8_p +pmin_db 1110010 .... 1 .... 1110 .... 00011011 @r_p_2 +pminu_db 1110110 .... 1 .... 1110 .... 00011011 @r_p_2 +pmax_db 1111010 .... 1 .... 1110 .... 00011011 @r_p_2 +pmaxu_db 1111110 .... 1 .... 1110 .... 00011011 @r_p_2 +pmseq_db 1100010 .... 1 .... 1110 .... 00011011 @r_p_2 +pmslt_db 1101010 .... 1 .... 1110 .... 00011011 @r_p_2 +pmsltu_db 1101110 .... 1 .... 1110 .... 00011011 @r_p_2 +psext_dh_b 0110000 00100 .... 0110 .... 00011011 @r2_p +psati_dh 01100 001.... .... 1110 .... 00011011 @p_ui16_p_2 +pusati_dh 00100 001.... .... 1110 .... 00011011 @p_ui16_p_2 +pslli_dh 00000 001.... .... 0110 .... 00011011 @p_ui16_p_2 +psrli_dh 00000 001.... .... 1110 .... 
00011011 @p_ui16_p_2 +psrai_dh 01000 001.... .... 1110 .... 00011011 @p_ui16_p_2 +psslai_dh 01010 001.... .... 0110 .... 00011011 @p_ui16_p_2 +psrari_dh 01010 001.... .... 1110 .... 00011011 @p_ui16_p_2 +pmin_dh 1110000 .... 1 .... 1110 .... 00011011 @r_p_2 +pminu_dh 1110100 .... 1 .... 1110 .... 00011011 @r_p_2 +pmax_dh 1111000 .... 1 .... 1110 .... 00011011 @r_p_2 +pmaxu_dh 1111100 .... 1 .... 1110 .... 00011011 @r_p_2 +pmseq_dh 1100000 .... 1 .... 1110 .... 00011011 @r_p_2 +pmslt_dh 1101000 .... 1 .... 1110 .... 00011011 @r_p_2 +pmsltu_dh 1101100 .... 1 .... 1110 .... 00011011 @r_p_2 +psext_dw_b 0110001 00100 .... 0110 .... 00011011 @r2_p +psext_dw_h 0110001 00101 .... 0110 .... 00011011 @r2_p +psati_dw 01100 01..... .... 1110 .... 00011011 @p_ui32_p_2 +pusati_dw 00100 01..... .... 1110 .... 00011011 @p_ui32_p_2 +pslli_dw 00000 01..... .... 0110 .... 00011011 @p_ui32_p_2 +psrli_dw 00000 01..... .... 1110 .... 00011011 @p_ui32_p_2 +psrai_dw 01000 01..... .... 1110 .... 00011011 @p_ui32_p_2 +psslai_dw 01010 01..... .... 0110 .... 00011011 @p_ui32_p_2 +psrari_dw 01010 01..... .... 1110 .... 00011011 @p_ui32_p_2 +pmin_dw 1110001 .... 1 .... 1110 .... 00011011 @r_p_2 +pminu_dw 1110101 .... 1 .... 1110 .... 00011011 @r_p_2 +pmax_dw 1111001 .... 1 .... 1110 .... 00011011 @r_p_2 +pmaxu_dw 1111101 .... 1 .... 1110 .... 00011011 @r_p_2 +pmseq_dw 1100001 .... 1 .... 1110 .... 00011011 @r_p_2 +pmslt_dw 1101001 .... 1 .... 1110 .... 00011011 @r_p_2 +pmsltu_dw 1101101 .... 1 .... 1110 .... 00011011 @r_p_2 + +# register-pair first source and dest +padd_dbs 0001110 ..... .... 0110 .... 00011011 @r_p_3 +psll_dbs 0000110 ..... .... 0110 .... 00011011 @r_p_3 +psra_dbs 0100110 ..... .... 1110 .... 00011011 @r_p_3 +padd_dhs 0001100 ..... .... 0110 .... 00011011 @r_p_3 +psll_dhs 0000100 ..... .... 0110 .... 00011011 @r_p_3 +psrl_dhs 0000100 ..... .... 1110 .... 00011011 @r_p_3 +psra_dhs 0100100 ..... .... 1110 .... 00011011 @r_p_3 +pssha_dhs 0110100 ..... .... 0110 .... 
00011011 @r_p_3 +psshar_dhs 0111100 ..... .... 0110 .... 00011011 @r_p_3 +padd_dws 0001101 ..... .... 0110 .... 00011011 @r_p_3 +psll_dws 0000101 ..... .... 0110 .... 00011011 @r_p_3 +psrl_dws 0000101 ..... .... 1110 .... 00011011 @r_p_3 +psra_dws 0100101 ..... .... 1110 .... 00011011 @r_p_3 +pssha_dws 0110101 ..... .... 0110 .... 00011011 @r_p_3 +psshar_dws 0111101 ..... .... 0110 .... 00011011 @r_p_3 + +# register-pair operands +ppaire_db 1000000 .... 0 .... 1110 .... 00011011 @r_p_2 +ppaireo_db 1001000 .... 0 .... 1110 .... 00011011 @r_p_2 +ppairoe_db 1010000 .... 0 .... 1110 .... 00011011 @r_p_2 +ppairo_db 1011000 .... 0 .... 1110 .... 00011011 @r_p_2 +ppaire_dh 1000001 .... 0 .... 1110 .... 00011011 @r_p_2 +ppaireo_dh 1001001 .... 0 .... 1110 .... 00011011 @r_p_2 +ppairoe_dh 1010001 .... 0 .... 1110 .... 00011011 @r_p_2 +ppairo_dh 1011001 .... 0 .... 1110 .... 00011011 @r_p_2 + +#register-pair first source only +pnsrli_b 00000 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnsrai_b 01000 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnsrari_b 01010 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnclipi_b 01100 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnclipri_b 01110 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnclipiu_b 00100 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnclipriu_b 00110 001.... .... 1100 ..... 0011011 @p_ui16_p_3 +pnsrl_bs 00001 00 ..... .... 1100 ..... 0011011 @r_p_4 +pnsra_bs 01001 00 ..... .... 1100 ..... 0011011 @r_p_4 +pnsrar_bs 01011 00 ..... .... 1100 ..... 0011011 @r_p_4 +pnclip_bs 01101 00 ..... .... 1100 ..... 0011011 @r_p_4 +pnclipr_bs 01111 00 ..... .... 1100 ..... 0011011 @r_p_4 +pnclipu_bs 00101 00 ..... .... 1100 ..... 0011011 @r_p_4 +pnclipru_bs 00111 00 ..... .... 1100 ..... 0011011 @r_p_4 + +pnsrli_h 00000 01..... .... 1100 ..... 0011011 @p_ui32_p_3 +pnsrai_h 01000 01..... .... 1100 ..... 0011011 @p_ui32_p_3 +pnsrari_h 01010 01..... .... 1100 ..... 0011011 @p_ui32_p_3 +pnclipi_h 01100 01..... .... 1100 ..... 
0011011 @p_ui32_p_3 +pnclipri_h 01110 01..... .... 1100 ..... 0011011 @p_ui32_p_3 +pnclipiu_h 00100 01..... .... 1100 ..... 0011011 @p_ui32_p_3 +pnclipriu_h 00110 01..... .... 1100 ..... 0011011 @p_ui32_p_3 +pnsrl_hs 00001 01 ..... .... 1100 ..... 0011011 @r_p_4 +pnsra_hs 01001 01 ..... .... 1100 ..... 0011011 @r_p_4 +pnsrar_hs 01011 01 ..... .... 1100 ..... 0011011 @r_p_4 +pnclip_hs 01101 01 ..... .... 1100 ..... 0011011 @r_p_4 +pnclipr_hs 01111 01 ..... .... 1100 ..... 0011011 @r_p_4 +pnclipu_hs 00101 01 ..... .... 1100 ..... 0011011 @r_p_4 +pnclipru_hs 00111 01 ..... .... 1100 ..... 0011011 @r_p_4 + +nsrli 00000 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nsrai 01000 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nsrari 01010 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nclipi 01100 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nclipri 01110 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nclipiu 00100 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nclipriu 00110 1...... .... 1100 ..... 0011011 @p_ui64_p_2 +nsrl 00001 11 ..... .... 1100 ..... 0011011 @r_p_4 +nsra 01001 11 ..... .... 1100 ..... 0011011 @r_p_4 +nsrar 01011 11 ..... .... 1100 ..... 0011011 @r_p_4 +nclip 01101 11 ..... .... 1100 ..... 0011011 @r_p_4 +nclipr 01111 11 ..... .... 1100 ..... 0011011 @r_p_4 +nclipu 00101 11 ..... .... 1100 ..... 0011011 @r_p_4 +nclipru 00111 11 ..... .... 1100 ..... 0011011 @r_p_4 + +# register-pair multiply +pmqwacc_h 01111 00 ..... ..... 010 .... 10011011 @r_p_1 +pmqrwacc_h 01111 10 ..... ..... 010 .... 10011011 @r_p_1 +mqwacc 01111 01 ..... ..... 010 .... 10011011 @r_p_1 +mqrwacc 01111 11 ..... ..... 010 .... 10011011 @r_p_1 + +pwmul_b 00100 10 ..... ..... 010 .... 10011011 @r_p_1 +pwmulsu_b 01100 10 ..... ..... 010 .... 10011011 @r_p_1 +pwmulu_b 00110 10 ..... ..... 010 .... 10011011 @r_p_1 + +pwmul_h 00100 00 ..... ..... 010 .... 10011011 @r_p_1 +pwmulsu_h 01100 00 ..... ..... 010 .... 10011011 @r_p_1 +pwmulu_h 00110 00 ..... ..... 010 .... 
10011011 @r_p_1 +pwmacc_h 00101 00 ..... ..... 010 .... 10011011 @r_p_1 +pwmaccsu_h 01101 00 ..... ..... 010 .... 10011011 @r_p_1 +pwmaccu_h 00111 00 ..... ..... 010 .... 10011011 @r_p_1 + +wmul 00100 01 ..... ..... 010 .... 10011011 @r_p_1 +wmulsu 01100 01 ..... ..... 010 .... 10011011 @r_p_1 +wmulu 00110 01 ..... ..... 010 .... 10011011 @r_p_1 +wmacc 00101 01 ..... ..... 010 .... 10011011 @r_p_1 +wmaccsu 01101 01 ..... ..... 010 .... 10011011 @r_p_1 +wmaccu 00111 01 ..... ..... 010 .... 10011011 @r_p_1 + +pm2wadd_h 00000 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2waddsu_h 01100 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2waddu_h 00100 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2wadd_hx 00010 11 ..... ..... 010 .... 10011011 @r_p_1 + +pm2wadda_h 00001 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2waddasu_h 01101 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2waddau_h 00101 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2wadda_hx 00011 11 ..... ..... 010 .... 10011011 @r_p_1 + +pm2wsub_h 01000 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2wsub_hx 01010 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2wsuba_h 01001 11 ..... ..... 010 .... 10011011 @r_p_1 +pm2wsuba_hx 01011 11 ..... ..... 010 .... 10011011 @r_p_1 diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_trans/trans_rvp.c.inc index b82774e00f..ca459293a3 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -2,6 +2,38 @@ /* RISC-V translation routines for the P Standard Extensions. */ /* Copyright (c) 2026 ISRC ISCAS. 
*/ +/* Store the 64-bit value in src into the register pair dst and dst + 1 */ +static void set_pair_regs(DisasContext *ctx, int dst, TCGv_i64 src) +{ +#if defined(TARGET_RISCV32) + TCGv_i64 tl_64 = tcg_temp_new_i64(); + TCGv_i64 th_64 = tcg_temp_new_i64(); + TCGv_i32 tl_32 = tcg_temp_new_i32(); + TCGv_i32 th_32 = tcg_temp_new_i32(); + tcg_gen_extract_i64(tl_64, src, 0, 32); + tcg_gen_extract_i64(th_64, src, 32, 32); + tcg_gen_trunc_i64_tl(tl_32, tl_64); + tcg_gen_trunc_i64_tl(th_32, th_64); + gen_set_gpr(ctx, dst, tl_32); + gen_set_gpr(ctx, dst + 1, th_32); +#else + gen_set_gpr(ctx, dst, src); +#endif +} + +/* Concatenate the 32-bit values in registers src and src + 1 into dst */ +static void get_pair_regs(DisasContext *ctx, TCGv_i64 dst, int src) +{ +#if defined(TARGET_RISCV32) + TCGv t1 = get_gpr(ctx, src, EXT_NONE); + TCGv t2 = get_gpr(ctx, src + 1, EXT_NONE); + tcg_gen_concat_i32_i64(dst, t1, t2); +#else + TCGv t1 = get_gpr(ctx, src, EXT_NONE); + tcg_gen_mov_tl(dst, t1); +#endif +} + #define GEN_SIMD_TRANS(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ @@ -10,7 +42,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1, src2); \ - return true; \ + return true; \ } #if defined(TARGET_RISCV32) @@ -23,14 +55,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1, src2); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_32(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -39,7 +71,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_64BIT(ctx); \ - return true; \ + return true; \ } 
#else #define GEN_SIMD_TRANS_64(NAME) \ @@ -51,7 +83,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1, src2); \ - return true; \ + return true; \ } #endif @@ -65,7 +97,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv t = tcg_temp_new(); \ gen_helper_##NAME(t, tcg_env, src1, src2, dest); \ gen_set_gpr(ctx, a->rd, t); \ - return true; \ + return true; \ } #if defined(TARGET_RISCV32) @@ -80,14 +112,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv t = tcg_temp_new(); \ gen_helper_##NAME(t, tcg_env, src1, src2, dest); \ gen_set_gpr(ctx, a->rd, t); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_ACC_32(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -96,7 +128,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_64BIT(ctx); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_ACC_64(NAME) \ @@ -110,7 +142,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv t = tcg_temp_new(); \ gen_helper_##NAME(t, tcg_env, src1, src2, dest); \ gen_set_gpr(ctx, a->rd, t); \ - return true; \ + return true; \ } #endif @@ -122,7 +154,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1); \ gen_set_gpr(ctx, a->rd, dest); \ - return true; \ + return true; \ } #if defined(TARGET_RISCV32) @@ -130,7 +162,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_64BIT(ctx); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_R1_64(NAME) \ @@ -141,7 +173,7 @@ 
static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1); \ - return true; \ + return true; \ } #endif @@ -153,7 +185,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv imm = tcg_constant_tl(a->imm); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1, imm); \ - return true; \ + return true; \ } #if defined(TARGET_RISCV32) @@ -166,14 +198,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv imm = tcg_constant_tl(a->imm); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1, imm); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_IMM_32(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -182,7 +214,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_64BIT(ctx); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_IMM_64(NAME) \ @@ -194,7 +226,7 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv imm = tcg_constant_tl(a->imm); \ TCGv dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, src1, imm); \ - return true; \ + return true; \ } #endif @@ -209,14 +241,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv_i64 t = tcg_temp_new_i64(); \ gen_helper_##NAME(t, tcg_env, src1, src2); \ set_pair_regs(ctx, (a->rd) * 2, t); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_1(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -234,14 +266,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ TCGv dest_1 = dest_gpr(ctx, (a->rd) 
* 2 + 1); \ gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0); \ gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_2(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -257,14 +289,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE); \ gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2); \ gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_3(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -282,14 +314,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \ gen_helper_##HELPER(dest_0, tcg_env, src1_0, src2_0); \ gen_helper_##HELPER(dest_1, tcg_env, src1_1, src2_1); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_DW(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -307,14 +339,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \ gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0); \ gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_DW_IMM(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -332,14 +364,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \ gen_helper_##HELPER##_32(dest_0, tcg_env, src1_0, imm_0); \ 
gen_helper_##HELPER##_32(dest_1, tcg_env, src1_1, imm_1); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -356,14 +388,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ gen_helper_##HELPER(dest_1, tcg_env, src1_1); \ gen_set_gpr(ctx, (a->rd) * 2, dest_0); \ gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_5(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -378,14 +410,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv_i64 t = tcg_temp_new_i64(); \ gen_helper_##NAME(t, tcg_env, src1, imm); \ set_pair_regs(ctx, (a->rd) * 2, t); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_IMM(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -403,14 +435,14 @@ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ TCGv dest_1 = dest_gpr(ctx, (a->rd) * 2 + 1); \ gen_helper_##HELPER(dest_0, tcg_env, src1_0, imm_0); \ gen_helper_##HELPER(dest_1, tcg_env, src1_1, imm_1); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_IMM_2(INSN, HELPER) \ static bool trans_##INSN(DisasContext *ctx, arg_##INSN * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -430,14 +462,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ } \ gen_helper_##NAME(t, tcg_env, src1, src2, t); \ set_pair_regs(ctx, (a->rd) * 2, t); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_ACC_REG_PAIR_1(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - 
return true; \ + return true; \ } #endif @@ -461,14 +493,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ src1_h = get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); \ } \ gen_helper_##NAME(dest, tcg_env, src1_l, src1_h, src2); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_REG_PAIR_PREDSUM(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -487,14 +519,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv shamt = tcg_constant_tl(a->imm); \ TCGv_i32 dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, s1, shamt); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_PN_OP_IMM(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -513,14 +545,14 @@ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ TCGv_i32 rs2 = get_gpr(ctx, a->rs2, EXT_NONE); \ TCGv_i32 dest = dest_gpr(ctx, a->rd); \ gen_helper_##NAME(dest, tcg_env, s1, rs2); \ - return true; \ + return true; \ } #else #define GEN_SIMD_TRANS_PN_OP_REG(NAME) \ static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \ { \ REQUIRE_32BIT(ctx); \ - return true; \ + return true; \ } #endif @@ -907,6 +939,236 @@ GEN_SIMD_TRANS_ACC_64(pm4adda_h) GEN_SIMD_TRANS_ACC_64(pm4addasu_h) GEN_SIMD_TRANS_ACC_64(pm4addau_h) +/* Packed SIMD - Double-Width Operations (RV32 only, register pairs) */ +GEN_SIMD_TRANS_REG_PAIR_1(pwadd_b) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwadda_b) +GEN_SIMD_TRANS_REG_PAIR_1(pwaddu_b) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwaddau_b) +GEN_SIMD_TRANS_REG_PAIR_1(pwsub_b) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsuba_b) +GEN_SIMD_TRANS_REG_PAIR_1(pwsubu_b) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsubau_b) +GEN_SIMD_TRANS_REG_PAIR_IMM(pwslli_b) +GEN_SIMD_TRANS_REG_PAIR_1(pwsll_bs) +GEN_SIMD_TRANS_REG_PAIR_IMM(pwslai_b) 
+GEN_SIMD_TRANS_REG_PAIR_1(pwsla_bs) + +GEN_SIMD_TRANS_REG_PAIR_1(pwadd_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwadda_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwaddu_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwaddau_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwsub_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsuba_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwsubu_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwsubau_h) +GEN_SIMD_TRANS_REG_PAIR_IMM(pwslli_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwsll_hs) +GEN_SIMD_TRANS_REG_PAIR_IMM(pwslai_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwsla_hs) + +GEN_SIMD_TRANS_REG_PAIR_1(wadd) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(wadda) +GEN_SIMD_TRANS_REG_PAIR_1(waddu) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(waddau) +GEN_SIMD_TRANS_REG_PAIR_1(wsub) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(wsuba) +GEN_SIMD_TRANS_REG_PAIR_1(wsubu) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(wsubau) + +GEN_SIMD_TRANS_REG_PAIR_IMM(wslli) +GEN_SIMD_TRANS_REG_PAIR_1(wsll) +GEN_SIMD_TRANS_REG_PAIR_IMM(wslai) +GEN_SIMD_TRANS_REG_PAIR_1(wsla) + +GEN_SIMD_TRANS_REG_PAIR_2(padd_db, padd_b) +GEN_SIMD_TRANS_REG_PAIR_2(psub_db, psub_b) +GEN_SIMD_TRANS_REG_PAIR_2(psadd_db, psadd_b) +GEN_SIMD_TRANS_REG_PAIR_2(psaddu_db, psaddu_b) +GEN_SIMD_TRANS_REG_PAIR_2(pssub_db, pssub_b) +GEN_SIMD_TRANS_REG_PAIR_2(pssubu_db, pssubu_b) +GEN_SIMD_TRANS_REG_PAIR_2(paadd_db, paadd_b) +GEN_SIMD_TRANS_REG_PAIR_2(paaddu_db, paaddu_b) +GEN_SIMD_TRANS_REG_PAIR_2(pasub_db, pasub_b) +GEN_SIMD_TRANS_REG_PAIR_2(pasubu_db, pasubu_b) +GEN_SIMD_TRANS_REG_PAIR_2(pabd_db, pabd_b) +GEN_SIMD_TRANS_REG_PAIR_2(pabdu_db, pabdu_b) +GEN_SIMD_TRANS_REG_PAIR_5(psabs_db, psabs_b) +GEN_SIMD_TRANS_REG_PAIR_2(padd_dh, padd_h) +GEN_SIMD_TRANS_REG_PAIR_2(psub_dh, psub_h) +GEN_SIMD_TRANS_REG_PAIR_2(psadd_dh, psadd_h) +GEN_SIMD_TRANS_REG_PAIR_2(psaddu_dh, psaddu_h) +GEN_SIMD_TRANS_REG_PAIR_2(pssub_dh, pssub_h) +GEN_SIMD_TRANS_REG_PAIR_2(pssubu_dh, pssubu_h) +GEN_SIMD_TRANS_REG_PAIR_2(paadd_dh, paadd_h) +GEN_SIMD_TRANS_REG_PAIR_2(paaddu_dh, paaddu_h) +GEN_SIMD_TRANS_REG_PAIR_2(pasub_dh, pasub_h) +GEN_SIMD_TRANS_REG_PAIR_2(pasubu_dh, pasubu_h) 
+GEN_SIMD_TRANS_REG_PAIR_2(psh1add_dh, psh1add_h) +GEN_SIMD_TRANS_REG_PAIR_2(pssh1sadd_dh, pssh1sadd_h) +GEN_SIMD_TRANS_REG_PAIR_2(pas_dhx, pas_hx) +GEN_SIMD_TRANS_REG_PAIR_2(psa_dhx, psa_hx) +GEN_SIMD_TRANS_REG_PAIR_2(psas_dhx, psas_hx) +GEN_SIMD_TRANS_REG_PAIR_2(pssa_dhx, pssa_hx) +GEN_SIMD_TRANS_REG_PAIR_2(paas_dhx, paas_hx) +GEN_SIMD_TRANS_REG_PAIR_2(pasa_dhx, pasa_hx) +GEN_SIMD_TRANS_REG_PAIR_2(pabd_dh, pabd_h) +GEN_SIMD_TRANS_REG_PAIR_2(pabdu_dh, pabdu_h) +GEN_SIMD_TRANS_REG_PAIR_5(psabs_dh, psabs_h) +GEN_SIMD_TRANS_REG_PAIR_DW(psadd_dw, sadd) +GEN_SIMD_TRANS_REG_PAIR_DW(psaddu_dw, saddu) +GEN_SIMD_TRANS_REG_PAIR_DW(pssub_dw, ssub) +GEN_SIMD_TRANS_REG_PAIR_DW(pssubu_dw, ssubu) +GEN_SIMD_TRANS_REG_PAIR_DW(paadd_dw, aadd) +GEN_SIMD_TRANS_REG_PAIR_DW(paaddu_dw, aaddu) +GEN_SIMD_TRANS_REG_PAIR_DW(pasub_dw, asub) +GEN_SIMD_TRANS_REG_PAIR_DW(pasubu_dw, asubu) +GEN_SIMD_TRANS_REG_PAIR_DW(pssh1sadd_dw, ssh1sadd) + +GEN_SIMD_TRANS_REG_PAIR_IMM_2(pslli_db, pslli_b) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrli_db, psrli_b) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrai_db, psrai_b) +GEN_SIMD_TRANS_REG_PAIR_2(pmin_db, pmin_b) +GEN_SIMD_TRANS_REG_PAIR_2(pminu_db, pminu_b) +GEN_SIMD_TRANS_REG_PAIR_2(pmax_db, pmax_b) +GEN_SIMD_TRANS_REG_PAIR_2(pmaxu_db, pmaxu_b) +GEN_SIMD_TRANS_REG_PAIR_2(pmseq_db, pmseq_b) +GEN_SIMD_TRANS_REG_PAIR_2(pmslt_db, pmslt_b) +GEN_SIMD_TRANS_REG_PAIR_2(pmsltu_db, pmsltu_b) +GEN_SIMD_TRANS_REG_PAIR_5(psext_dh_b, psext_h_b) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psati_dh, psati_h) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(pusati_dh, pusati_h) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(pslli_dh, pslli_h) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrli_dh, psrli_h) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrai_dh, psrai_h) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psslai_dh, psslai_h) +GEN_SIMD_TRANS_REG_PAIR_IMM_2(psrari_dh, psrari_h) +GEN_SIMD_TRANS_REG_PAIR_2(pmin_dh, pmin_h) +GEN_SIMD_TRANS_REG_PAIR_2(pminu_dh, pminu_h) +GEN_SIMD_TRANS_REG_PAIR_2(pmax_dh, pmax_h) +GEN_SIMD_TRANS_REG_PAIR_2(pmaxu_dh, pmaxu_h) 
+GEN_SIMD_TRANS_REG_PAIR_2(pmseq_dh, pmseq_h) +GEN_SIMD_TRANS_REG_PAIR_2(pmslt_dh, pmslt_h) +GEN_SIMD_TRANS_REG_PAIR_2(pmsltu_dh, pmsltu_h) +GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(psati_dw, sati) +GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(pusati_dw, usati) +GEN_SIMD_TRANS_REG_PAIR_DW_IMM(psslai_dw, sslai) +GEN_SIMD_TRANS_REG_PAIR_DW_IMM_2(psrari_dw, srari) +GEN_SIMD_TRANS_REG_PAIR_DW(pmseq_dw, mseq) +GEN_SIMD_TRANS_REG_PAIR_DW(pmslt_dw, mslt) +GEN_SIMD_TRANS_REG_PAIR_DW(pmsltu_dw, msltu) + +GEN_SIMD_TRANS_REG_PAIR_3(padd_dbs, padd_bs) +GEN_SIMD_TRANS_REG_PAIR_3(psll_dbs, psll_bs) +GEN_SIMD_TRANS_REG_PAIR_3(psra_dbs, psra_bs) +GEN_SIMD_TRANS_REG_PAIR_3(padd_dhs, padd_hs) +GEN_SIMD_TRANS_REG_PAIR_3(psll_dhs, psll_hs) +GEN_SIMD_TRANS_REG_PAIR_3(psrl_dhs, psrl_hs) +GEN_SIMD_TRANS_REG_PAIR_3(psra_dhs, psra_hs) +GEN_SIMD_TRANS_REG_PAIR_3(pssha_dhs, pssha_hs) +GEN_SIMD_TRANS_REG_PAIR_3(psshar_dhs, psshar_hs) +GEN_SIMD_TRANS_REG_PAIR_DW(pssha_dws, ssha) +GEN_SIMD_TRANS_REG_PAIR_DW(psshar_dws, sshar) + +GEN_SIMD_TRANS_REG_PAIR_2(ppairo_db, ppairo_b) +GEN_SIMD_TRANS_REG_PAIR_2(ppairo_dh, ppairo_h) +GEN_SIMD_TRANS_REG_PAIR_2(ppaire_db, ppaire_b) +GEN_SIMD_TRANS_REG_PAIR_2(ppaireo_db, ppaireo_b) +GEN_SIMD_TRANS_REG_PAIR_2(ppaireo_dh, ppaireo_h) +GEN_SIMD_TRANS_REG_PAIR_2(ppairoe_dh, ppairoe_h) +GEN_SIMD_TRANS_REG_PAIR_2(ppairoe_db, ppairoe_b) + +GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsum_dbs) +GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsumu_dbs) +GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsum_dhs) +GEN_SIMD_TRANS_REG_PAIR_PREDSUM(predsumu_dhs) + +GEN_SIMD_TRANS_PN_OP_IMM(pnsrli_b) +GEN_SIMD_TRANS_PN_OP_IMM(pnsrai_b) +GEN_SIMD_TRANS_PN_OP_IMM(pnsrari_b) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipi_b) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipri_b) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipiu_b) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipriu_b) + +GEN_SIMD_TRANS_PN_OP_IMM(pnsrli_h) +GEN_SIMD_TRANS_PN_OP_IMM(pnsrai_h) +GEN_SIMD_TRANS_PN_OP_IMM(pnsrari_h) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipi_h) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipri_h) 
+GEN_SIMD_TRANS_PN_OP_IMM(pnclipiu_h) +GEN_SIMD_TRANS_PN_OP_IMM(pnclipriu_h) + +GEN_SIMD_TRANS_PN_OP_IMM(nsrli) +GEN_SIMD_TRANS_PN_OP_IMM(nsrai) +GEN_SIMD_TRANS_PN_OP_IMM(nsrari) +GEN_SIMD_TRANS_PN_OP_IMM(nclipi) +GEN_SIMD_TRANS_PN_OP_IMM(nclipri) +GEN_SIMD_TRANS_PN_OP_IMM(nclipiu) +GEN_SIMD_TRANS_PN_OP_IMM(nclipriu) + +GEN_SIMD_TRANS_PN_OP_REG(pnsrl_bs) +GEN_SIMD_TRANS_PN_OP_REG(pnsra_bs) +GEN_SIMD_TRANS_PN_OP_REG(pnsrar_bs) +GEN_SIMD_TRANS_PN_OP_REG(pnclip_bs) +GEN_SIMD_TRANS_PN_OP_REG(pnclipr_bs) +GEN_SIMD_TRANS_PN_OP_REG(pnclipu_bs) +GEN_SIMD_TRANS_PN_OP_REG(pnclipru_bs) + +GEN_SIMD_TRANS_PN_OP_REG(pnsrl_hs) +GEN_SIMD_TRANS_PN_OP_REG(pnsra_hs) +GEN_SIMD_TRANS_PN_OP_REG(pnsrar_hs) +GEN_SIMD_TRANS_PN_OP_REG(pnclip_hs) +GEN_SIMD_TRANS_PN_OP_REG(pnclipr_hs) +GEN_SIMD_TRANS_PN_OP_REG(pnclipu_hs) +GEN_SIMD_TRANS_PN_OP_REG(pnclipru_hs) + +GEN_SIMD_TRANS_PN_OP_REG(nsrl) +GEN_SIMD_TRANS_PN_OP_REG(nsra) +GEN_SIMD_TRANS_PN_OP_REG(nsrar) +GEN_SIMD_TRANS_PN_OP_REG(nclip) +GEN_SIMD_TRANS_PN_OP_REG(nclipr) +GEN_SIMD_TRANS_PN_OP_REG(nclipu) +GEN_SIMD_TRANS_PN_OP_REG(nclipru) + +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pmqwacc_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pmqrwacc_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(mqwacc) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(mqrwacc) + +GEN_SIMD_TRANS_REG_PAIR_1(pwmul_b) +GEN_SIMD_TRANS_REG_PAIR_1(pwmulsu_b) +GEN_SIMD_TRANS_REG_PAIR_1(pwmulu_b) + +GEN_SIMD_TRANS_REG_PAIR_1(pwmul_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwmulsu_h) +GEN_SIMD_TRANS_REG_PAIR_1(pwmulu_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwmacc_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwmaccsu_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pwmaccu_h) + +GEN_SIMD_TRANS_REG_PAIR_1(wmul) +GEN_SIMD_TRANS_REG_PAIR_1(wmulsu) +GEN_SIMD_TRANS_REG_PAIR_1(wmulu) + +GEN_SIMD_TRANS_ACC_REG_PAIR_1(wmacc) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(wmaccsu) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(wmaccu) + +GEN_SIMD_TRANS_REG_PAIR_1(pm2wadd_h) +GEN_SIMD_TRANS_REG_PAIR_1(pm2waddsu_h) +GEN_SIMD_TRANS_REG_PAIR_1(pm2waddu_h) +GEN_SIMD_TRANS_REG_PAIR_1(pm2wadd_hx) + 
+GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wadda_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2waddasu_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2waddau_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wadda_hx) + +GEN_SIMD_TRANS_REG_PAIR_1(pm2wsub_h) +GEN_SIMD_TRANS_REG_PAIR_1(pm2wsub_hx) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wsuba_h) +GEN_SIMD_TRANS_ACC_REG_PAIR_1(pm2wsuba_hx) + static bool trans_pli_b(DisasContext *ctx, arg_pli_b * a) { REQUIRE_EXT(ctx, RVP); @@ -973,3 +1235,439 @@ static bool trans_plui_w(DisasContext *ctx, arg_plui= _w * a) gen_set_gpri(ctx, a->rd, imm); return true; } + +static bool trans_pli_db(DisasContext *ctx, arg_pli_db * a) +{ + REQUIRE_EXT(ctx, RVP); + int i =3D 1; + target_long imm =3D a->imm; + while (i < TARGET_LONG_SIZE) { + imm =3D ((imm << 8) + a->imm); + i++; + } + gen_set_gpri(ctx, (a->rd) * 2, imm); + gen_set_gpri(ctx, (a->rd) * 2 + 1, imm); + return true; +} + +static bool trans_pli_dh(DisasContext *ctx, arg_pli_dh * a) +{ + REQUIRE_EXT(ctx, RVP); + int i =3D 1; + target_long imm =3D a->imm; + while (i < TARGET_LONG_SIZE / 2) { + imm =3D (imm << 16) + (a->imm & 0xFFFF); + i++; + } + gen_set_gpri(ctx, (a->rd) * 2, imm); + gen_set_gpri(ctx, (a->rd) * 2 + 1, imm); + return true; +} + +static bool trans_plui_dh(DisasContext *ctx, arg_plui_dh * a) +{ + REQUIRE_EXT(ctx, RVP); + int i =3D 1; + target_long imm =3D a->imm; + while (i < TARGET_LONG_SIZE / 2) { + imm =3D (imm << 16) + (a->imm & 0xFFFF); + i++; + } + gen_set_gpri(ctx, (a->rd) * 2, imm); + gen_set_gpri(ctx, (a->rd) * 2 + 1, imm); + return true; +} + +static bool trans_padd_dw(DisasContext *ctx, arg_padd_dw * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE); + TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE); + TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2); + TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); + TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE); + TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1); + tcg_gen_add_tl(dest_0, 
src1_0, src2_0); + tcg_gen_add_tl(dest_1, src1_1, src2_1); + gen_set_gpr(ctx, (a->rd) * 2, dest_0); + gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); + return true; +} + +static bool trans_psub_dw(DisasContext *ctx, arg_psub_dw * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE); + TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE); + TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2); + TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); + TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE); + TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1); + tcg_gen_sub_tl(dest_0, src1_0, src2_0); + tcg_gen_sub_tl(dest_1, src1_1, src2_1); + gen_set_gpr(ctx, (a->rd) * 2, dest_0); + gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); + return true; +} + +static bool trans_psh1add_dw(DisasContext *ctx, arg_psh1add_dw * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE); + TCGv src2_0 =3D get_gpr(ctx, (a->rs2) * 2, EXT_NONE); + TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2); + TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); + TCGv src2_1 =3D get_gpr(ctx, (a->rs2) * 2 + 1, EXT_NONE); + TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1); + gen_sh1add(dest_0, src1_0, src2_0); + gen_sh1add(dest_1, src1_1, src2_1); + gen_set_gpr(ctx, (a->rd) * 2, dest_0); + gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); + return true; +} + +/* Verify rd is not zero register for wzip8p and wzip16p. 
*/ +#if defined(TARGET_RISCV32) +static bool trans_wzip8p(DisasContext *ctx, arg_wzip8p * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv_i32 src1 =3D get_gpr(ctx, a->rs1, EXT_NONE); + TCGv_i32 src2 =3D get_gpr(ctx, a->rs2, EXT_NONE); + TCGv_i64 t =3D tcg_temp_new_i64(); + if (a->rd =3D=3D 0) { + return true; + } else { + get_pair_regs(ctx, t, (a->rd) * 2); + } + gen_helper_wzip8p(t, tcg_env, src1, src2); + set_pair_regs(ctx, (a->rd) * 2, t); + return true; +} +#else +static bool trans_wzip8p(DisasContext *ctx, arg_wzip8p * a) +{ + REQUIRE_32BIT(ctx); + return true; +} +#endif + +#if defined(TARGET_RISCV32) +static bool trans_wzip16p(DisasContext *ctx, arg_wzip16p * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv_i32 src1 =3D get_gpr(ctx, a->rs1, EXT_NONE); + TCGv_i32 src2 =3D get_gpr(ctx, a->rs2, EXT_NONE); + TCGv_i64 t =3D tcg_temp_new_i64(); + if (a->rd =3D=3D 0) { + return true; + } else { + get_pair_regs(ctx, t, (a->rd) * 2); + } + gen_helper_wzip16p(t, tcg_env, src1, src2); + set_pair_regs(ctx, (a->rd) * 2, t); + return true; +} +#else +static bool trans_wzip16p(DisasContext *ctx, arg_wzip16p * a) +{ + REQUIRE_32BIT(ctx); + return true; +} +#endif + +static bool trans_addd_p(DisasContext *ctx, arg_addd_p * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv_i64 src1 =3D tcg_temp_new_i64(); + TCGv_i64 src2 =3D tcg_temp_new_i64(); + TCGv_i64 dest =3D tcg_temp_new_i64(); + get_pair_regs(ctx, src1, (a->rs1) * 2); + get_pair_regs(ctx, src2, (a->rs2) * 2); + get_pair_regs(ctx, dest, (a->rd) * 2); + tcg_gen_add_i64(dest, src1, src2); + set_pair_regs(ctx, (a->rd) * 2, dest); + + return true; +} + +static bool trans_subd_p(DisasContext *ctx, arg_subd_p * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv_i64 src1 =3D tcg_temp_new_i64(); + TCGv_i64 src2 =3D tcg_temp_new_i64(); + TCGv_i64 dest =3D tcg_temp_new_i64(); + get_pair_regs(ctx, src1, (a->rs1) * 2); + get_pair_regs(ctx, src2, (a->rs2) * 2); + get_pair_regs(ctx, dest, 
(a->rd) * 2); + tcg_gen_sub_i64(dest, src1, src2); + set_pair_regs(ctx, (a->rd) * 2, dest); + + return true; +} + +static bool trans_psext_dw_b(DisasContext *ctx, arg_psext_dw_b * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2); + TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE); + TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1); + TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); + + tcg_gen_ext8s_tl(dest_0, src1_0); + gen_set_gpr(ctx, (a->rd) * 2, dest_0); + + tcg_gen_ext8s_tl(dest_1, src1_1); + gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); + + return true; +} + +static bool trans_psext_dw_h(DisasContext *ctx, arg_psext_dw_h * a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + TCGv dest_0 =3D dest_gpr(ctx, (a->rd) * 2); + TCGv src1_0 =3D get_gpr(ctx, (a->rs1) * 2, EXT_NONE); + TCGv dest_1 =3D dest_gpr(ctx, (a->rd) * 2 + 1); + TCGv src1_1 =3D get_gpr(ctx, (a->rs1) * 2 + 1, EXT_NONE); + + tcg_gen_ext16s_tl(dest_0, src1_0); + gen_set_gpr(ctx, (a->rd) * 2, dest_0); + + tcg_gen_ext16s_tl(dest_1, src1_1); + gen_set_gpr(ctx, (a->rd) * 2 + 1, dest_1); + + return true; +} + +static bool trans_pslli_dw(DisasContext *ctx, arg_pslli_dw *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_shift a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.shamt =3D a->imm; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.shamt =3D a->imm; + + gen_shift_imm_fn(ctx, &a0, EXT_NONE, tcg_gen_shli_tl, NULL); + gen_shift_imm_fn(ctx, &a1, EXT_NONE, tcg_gen_shli_tl, NULL); + + return true; +} + +static bool trans_psrli_dw(DisasContext *ctx, arg_psrli_dw *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_shift a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.shamt =3D a->imm; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.shamt =3D a->imm; + + gen_shift_imm_fn_per_ol(ctx, &a0, EXT_NONE, tcg_gen_shri_tl, + gen_srliw, NULL); + gen_shift_imm_fn_per_ol(ctx, 
&a1, EXT_NONE, tcg_gen_shri_tl, + gen_srliw, NULL); + + return true; +} + +static bool trans_psrai_dw(DisasContext *ctx, arg_psrai_dw *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_shift a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.shamt =3D a->imm; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.shamt =3D a->imm; + + gen_shift_imm_fn_per_ol(ctx, &a0, EXT_NONE, tcg_gen_sari_tl, + gen_sraiw, NULL); + gen_shift_imm_fn_per_ol(ctx, &a1, EXT_NONE, tcg_gen_sari_tl, + gen_sraiw, NULL); + + return true; +} + +static bool trans_pmin_dw(DisasContext *ctx, arg_pmin_dw *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + REQUIRE_ZBB(ctx); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D (a->rs2) * 2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D (a->rs2) * 2 + 1; + + gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_smin_tl, NULL); + gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_smin_tl, NULL); + + return true; +} + +static bool trans_pminu_dw(DisasContext *ctx, arg_pminu_dw *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + REQUIRE_ZBB(ctx); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D (a->rs2) * 2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D (a->rs2) * 2 + 1; + + gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_umin_tl, NULL); + gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_umin_tl, NULL); + + return true; +} + +static bool trans_pmax_dw(DisasContext *ctx, arg_pmax_dw *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + REQUIRE_ZBB(ctx); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D (a->rs2) * 2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D (a->rs2) * 2 + 1; + + gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_smax_tl, NULL); + gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_smax_tl, NULL); + + return true; +} + +static bool trans_pmaxu_dw(DisasContext *ctx, arg_pmaxu_dw *a) +{ + 
REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + REQUIRE_ZBB(ctx); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D (a->rs2) * 2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D (a->rs2) * 2 + 1; + + gen_arith(ctx, &a0, EXT_SIGN, tcg_gen_umax_tl, NULL); + gen_arith(ctx, &a1, EXT_SIGN, tcg_gen_umax_tl, NULL); + + return true; +} + +static bool trans_padd_dws(DisasContext *ctx, arg_padd_dws *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D a->rs2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D a->rs2; + + gen_arith(ctx, &a0, EXT_NONE, tcg_gen_add_tl, NULL); + gen_arith(ctx, &a1, EXT_NONE, tcg_gen_add_tl, NULL); + + return true; +} + +static bool trans_psll_dws(DisasContext *ctx, arg_psll_dws *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D a->rs2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D a->rs2; + + gen_shift(ctx, &a0, EXT_NONE, tcg_gen_shl_tl, NULL); + gen_shift(ctx, &a1, EXT_NONE, tcg_gen_shl_tl, NULL); + + return true; +} + +static bool trans_psrl_dws(DisasContext *ctx, arg_psrl_dws *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D a->rs2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D a->rs2; + + gen_shift(ctx, &a0, EXT_ZERO, tcg_gen_shr_tl, NULL); + gen_shift(ctx, &a1, EXT_ZERO, tcg_gen_shr_tl, NULL); + + return true; +} + +static bool trans_psra_dws(DisasContext *ctx, arg_psra_dws *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D a->rs2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D a->rs2; + + gen_shift(ctx, &a0, EXT_SIGN, tcg_gen_sar_tl, NULL); + gen_shift(ctx, 
&a1, EXT_SIGN, tcg_gen_sar_tl, NULL); + + return true; +} + +static bool trans_ppaire_dh(DisasContext *ctx, arg_ppaire_dh *a) +{ + REQUIRE_32BIT(ctx); + REQUIRE_EXT(ctx, RVP); + REQUIRE_ZBKB(ctx); + arg_r a0, a1; + a0.rd =3D (a->rd) * 2; + a0.rs1 =3D (a->rs1) * 2; + a0.rs2 =3D (a->rs2) * 2; + a1.rd =3D (a->rd) * 2 + 1; + a1.rs1 =3D (a->rs1) * 2 + 1; + a1.rs2 =3D (a->rs2) * 2 + 1; + + gen_arith(ctx, &a0, EXT_NONE, gen_pack, NULL); + gen_arith(ctx, &a1, EXT_NONE, gen_pack, NULL); + return true; +} diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index 5eede48581..4c91800128 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -7012,3 +7012,2071 @@ uint64_t HELPER(pm4addau_h)(CPURISCVState *env, ui= nt64_t rs1, uint64_t prod3 =3D (uint64_t)s1_h3 * (uint64_t)s2_h3; return d + prod0 + prod1 + prod2 + prod3; } + +/* Double-Width Operations (RV32 only, register pairs) */ + +/** + * PWADD.B - Packed widening byte to halfword addition (RV32) + * rd_pair =3D {rs1[31:24]+rs2[31:24], rs1[23:16]+rs2[23:16], + * rs1[15:8]+rs2[15:8], rs1[7:0]+rs2[7:0]} (sign-extended) + */ +uint64_t HELPER(pwadd_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF); + int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF); + int16_t res =3D e1 + e2; + rd |=3D ((uint64_t)(uint16_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWADDA.B - Packed widening byte to halfword addition with accumulate (R= V32) + * rd_pair +=3D {rs1[i] + rs2[i]} + */ +uint64_t HELPER(pwadda_b)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 4; i++) { + int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF); + int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF); + int16_t acc =3D (int16_t)((rd >> (i * 16)) & 0xFFFF); + int16_t res =3D acc + e1 + e2; + result |=3D ((uint64_t)(uint16_t)res) << (i * 16); + } + + 
return result; +} + +/** + * PWADDU.B - Packed widening byte to halfword unsigned addition (RV32) + */ +uint64_t HELPER(pwaddu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF); + uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF); + uint16_t res =3D e1 + e2; + rd |=3D ((uint64_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWADDAU.B - Packed widening byte to halfword unsigned addition + * with accumulate (RV32) + */ +uint64_t HELPER(pwaddau_b)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF); + uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF); + uint16_t acc =3D (uint16_t)((rd >> (i * 16)) & 0xFFFF); + uint16_t res =3D acc + e1 + e2; + result |=3D ((uint64_t)res) << (i * 16); + } + + return result; +} + +/** + * PWSUB.B - Packed widening byte to halfword subtraction (RV32) + */ +uint64_t HELPER(pwsub_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF); + int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF); + int16_t res =3D e1 - e2; + rd |=3D ((uint64_t)(uint16_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWSUBA.B - Packed widening byte to halfword subtraction + * with accumulate (RV32) + */ +uint64_t HELPER(pwsuba_b)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 4; i++) { + int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF); + int16_t e2 =3D (int8_t)((rs2 >> (i * 8)) & 0xFF); + int16_t acc =3D (int16_t)((rd >> (i * 16)) & 0xFFFF); + int16_t res =3D acc + (e1 - e2); + result |=3D ((uint64_t)(uint16_t)res) << (i * 16); + } + + return result; +} + +/** + * PWSUBU.B - Packed widening byte to halfword unsigned subtraction (RV32) + */ 
+uint64_t HELPER(pwsubu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF); + uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF); + uint16_t res =3D e1 - e2; + rd |=3D ((uint64_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWSUBAU.B - Packed widening byte to halfword unsigned subtraction + * with accumulate (RV32) + */ +uint64_t HELPER(pwsubau_b)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF); + uint16_t e2 =3D (uint8_t)((rs2 >> (i * 8)) & 0xFF); + uint16_t acc =3D (uint16_t)((rd >> (i * 16)) & 0xFFFF); + uint16_t res =3D acc + (e1 - e2); + result |=3D ((uint64_t)res) << (i * 16); + } + + return result; +} + +/** + * PWSLLI.B - Packed widening shift left immediate (byte to halfword) + */ +uint64_t HELPER(pwslli_b)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D imm & 0x0F; + + for (int i =3D 0; i < 4; i++) { + uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF); + uint16_t res =3D e1 << shamt; + rd |=3D ((uint64_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWSLL.BS - Packed widening shift left from register (byte to halfword) + */ +uint64_t HELPER(pwsll_bs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < 4; i++) { + uint16_t e1 =3D (uint8_t)((rs1 >> (i * 8)) & 0xFF); + uint16_t res =3D e1 << shamt; + rd |=3D ((uint64_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWSLAI.B - Packed widening signed shift left immediate (byte to halfwor= d) + */ +uint64_t HELPER(pwslai_b)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D imm & 0x0F; + + for (int i =3D 0; i < 4; i++) { + int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF); + int16_t res =3D e1 << 
shamt; + rd |=3D ((uint64_t)(uint16_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWSLA.BS - Packed widening signed shift left from register (byte to hal= fword) + */ +uint64_t HELPER(pwsla_bs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < 4; i++) { + int16_t e1 =3D (int8_t)((rs1 >> (i * 8)) & 0xFF); + int16_t res =3D e1 << shamt; + rd |=3D ((uint64_t)(uint16_t)res) << (i * 16); + } + + return rd; +} + +/** + * PWADD.H - Packed widening halfword to word addition (RV32) + */ +uint64_t HELPER(pwadd_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF); + int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF); + int32_t res =3D e1 + e2; + rd |=3D ((uint64_t)(uint32_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWADDA.H - Packed widening halfword to word addition with accumulate (R= V32) + */ +uint64_t HELPER(pwadda_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF); + int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF); + int32_t acc =3D (int32_t)((rd >> (i * 32)) & 0xFFFFFFFF); + int32_t res =3D acc + e1 + e2; + result |=3D ((uint64_t)(uint32_t)res) << (i * 32); + } + + return result; +} + +/** + * PWADDU.H - Packed widening halfword to word unsigned addition (RV32) + */ +uint64_t HELPER(pwaddu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF); + uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF); + uint32_t res =3D e1 + e2; + rd |=3D ((uint64_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWADDAU.H - Packed widening halfword to word unsigned addition + * with accumulate (RV32) + */ +uint64_t 
HELPER(pwaddau_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF); + uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF); + uint32_t acc =3D (uint32_t)((rd >> (i * 32)) & 0xFFFFFFFF); + uint32_t res =3D acc + e1 + e2; + result |=3D ((uint64_t)res) << (i * 32); + } + + return result; +} + +/** + * PWSUB.H - Packed widening halfword to word subtraction (RV32) + */ +uint64_t HELPER(pwsub_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF); + int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF); + int32_t res =3D e1 - e2; + rd |=3D ((uint64_t)(uint32_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWSUBA.H - Packed widening halfword to word subtraction + * with accumulate (RV32) + */ +uint64_t HELPER(pwsuba_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF); + int32_t e2 =3D (int16_t)((rs2 >> (i * 16)) & 0xFFFF); + int32_t acc =3D (int32_t)((rd >> (i * 32)) & 0xFFFFFFFF); + int32_t res =3D acc + (e1 - e2); + result |=3D ((uint64_t)(uint32_t)res) << (i * 32); + } + + return result; +} + +/** + * PWSUBU.H - Packed widening halfword to word unsigned subtraction (RV32) + */ +uint64_t HELPER(pwsubu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF); + uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF); + uint32_t res =3D e1 - e2; + rd |=3D ((uint64_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWSUBAU.H - Packed widening halfword to word unsigned subtraction + * with accumulate (RV32) + */ +uint64_t HELPER(pwsubau_h)(CPURISCVState *env, uint32_t rs1, + uint32_t 
rs2, uint64_t rd) +{ + uint64_t result =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF); + uint32_t e2 =3D (uint16_t)((rs2 >> (i * 16)) & 0xFFFF); + uint32_t acc =3D (uint32_t)((rd >> (i * 32)) & 0xFFFFFFFF); + uint32_t res =3D acc + (e1 - e2); + result |=3D ((uint64_t)res) << (i * 32); + } + + return result; +} + +/** + * PWSLLI.H - Packed widening shift left immediate (halfword to word) + */ +uint64_t HELPER(pwslli_h)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D imm & 0x1F; + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF); + uint32_t res =3D e1 << shamt; + rd |=3D ((uint64_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWSLL.HS - Packed widening shift left from register (halfword to word) + */ +uint64_t HELPER(pwsll_hs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint16_t)((rs1 >> (i * 16)) & 0xFFFF); + uint32_t res =3D e1 << shamt; + rd |=3D ((uint64_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWSLAI.H - Packed widening signed shift left immediate (halfword to wor= d) + */ +uint64_t HELPER(pwslai_h)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D imm & 0x1F; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF); + int32_t res =3D e1 << shamt; + rd |=3D ((uint64_t)(uint32_t)res) << (i * 32); + } + + return rd; +} + +/** + * PWSLA.HS - Packed widening signed shift left from register (halfword to= word) + */ +uint64_t HELPER(pwsla_hs)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + uint8_t shamt =3D rs2 & 0x1F; + + for (int i =3D 0; i < 2; i++) { + int32_t e1 =3D (int16_t)((rs1 >> (i * 16)) & 0xFFFF); + int32_t res =3D e1 << shamt; + rd |=3D ((uint64_t)(uint32_t)res) << (i * 32); + } + + 
return rd; +} + +/** + * WADD - Widening signed addition (RV32) + */ +uint64_t HELPER(wadd)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int64_t a =3D (int32_t)rs1; + int64_t b =3D (int32_t)rs2; + return (uint64_t)(a + b); +} + +/** + * WADDA - Widening signed addition with accumulate (RV32) + */ +uint64_t HELPER(wadda)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + int64_t a =3D (int32_t)rs1; + int64_t b =3D (int32_t)rs2; + int64_t acc =3D (int64_t)rd; + return (uint64_t)(acc + a + b); +} + +/** + * WADDU - Widening unsigned addition (RV32) + */ +uint64_t HELPER(waddu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t a =3D rs1; + uint64_t b =3D rs2; + return a + b; +} + +/** + * WADDAU - Widening unsigned addition with accumulate (RV32) + */ +uint64_t HELPER(waddau)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t acc =3D rd; + return acc + rs1 + rs2; +} + +/** + * WSUB - Widening signed subtraction (RV32) + */ +uint64_t HELPER(wsub)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int64_t a =3D (int32_t)rs1; + int64_t b =3D (int32_t)rs2; + return (uint64_t)(a - b); +} + +/** + * WSUBA - Widening signed subtraction with accumulate (RV32) + */ +uint64_t HELPER(wsuba)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + int64_t a =3D (int32_t)rs1; + int64_t b =3D (int32_t)rs2; + int64_t acc =3D (int64_t)rd; + return (uint64_t)(acc + a - b); +} + +/** + * WSUBU - Widening unsigned subtraction (RV32) + */ +uint64_t HELPER(wsubu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t a =3D rs1; + uint64_t b =3D rs2; + return a - b; +} + +/** + * WSUBAU - Widening unsigned subtraction with accumulate (RV32) + */ +uint64_t HELPER(wsubau)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t rd) +{ + uint64_t acc =3D rd; + return acc + rs1 - rs2; +} + +/** + * WSLLI - Widening logical shift left immediate (RV32) + */ +uint64_t HELPER(wslli)(CPURISCVState *env, uint32_t 
rs1, uint32_t imm) +{ + uint64_t a =3D rs1; + uint8_t shamt =3D imm & 0x3F; + return a << shamt; +} + +/** + * WSLL - Widening logical shift left from register (RV32) + */ +uint64_t HELPER(wsll)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t a =3D rs1; + uint8_t shamt =3D rs2 & 0x3F; + return a << shamt; +} + +/** + * WSLAI - Widening signed shift left immediate (RV32) + */ +uint64_t HELPER(wslai)(CPURISCVState *env, uint32_t rs1, uint32_t imm) +{ + int64_t a =3D (int32_t)rs1; + uint8_t shamt =3D imm & 0x3F; + return (uint64_t)(a << shamt); +} + +/** + * WSLA - Widening signed shift left from register (RV32) + */ +uint64_t HELPER(wsla)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int64_t a =3D (int32_t)rs1; + uint8_t shamt =3D rs2 & 0x3F; + return (uint64_t)(a << shamt); +} + +/** + * WZIP8P - Double-width interleave bytes (RV32) + */ +uint64_t HELPER(wzip8p)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint64_t b1 =3D (uint64_t)EXTRACT8(rs1, i) << 16 * i; + uint64_t b2 =3D (uint64_t)EXTRACT8(rs2, i) << (16 * i + 8); + rd =3D rd | b2 | b1; + } + + return rd; +} + +/** + * WZIP16P - Double-width interleave halfwords (RV32) + */ +uint64_t HELPER(wzip16p)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint64_t h1 =3D (uint64_t)EXTRACT16(rs1, i) << (32 * i); + uint64_t h2 =3D (uint64_t)EXTRACT16(rs2, i) << (32 * i + 16); + rd =3D rd | h2 | h1; + } + + return rd; +} + +/** + * PREDSUM.DBS - Double-width signed reduction sum of bytes (RV32) + */ +uint32_t HELPER(predsum_dbs)(CPURISCVState *env, uint32_t rs1_lo, + uint32_t rs1_hi, uint32_t rs2) +{ + int64_t sum =3D (int32_t)rs2; + int64_t s1 =3D ((int64_t)rs1_hi << 32) | rs1_lo; + + for (int i =3D 0; i < 8; i++) { + int8_t b =3D (int8_t)((s1 >> (i * 8)) & 0xFF); + sum +=3D b; + } + + return (uint32_t)sum; +} + +/** + * PREDSUMU.DBS - Double-width unsigned reduction sum of 
bytes (RV32) + */ +uint32_t HELPER(predsumu_dbs)(CPURISCVState *env, uint32_t rs1_lo, + uint32_t rs1_hi, uint32_t rs2) +{ + uint64_t sum =3D rs2; + uint64_t s1 =3D ((uint64_t)rs1_hi << 32) | rs1_lo; + + for (int i =3D 0; i < 8; i++) { + uint8_t b =3D (uint8_t)((s1 >> (i * 8)) & 0xFF); + sum +=3D b; + } + + return (uint32_t)sum; +} + +/** + * PREDSUM.DHS - Double-width signed reduction sum of halfwords (RV32) + */ +uint32_t HELPER(predsum_dhs)(CPURISCVState *env, uint32_t rs1_lo, + uint32_t rs1_hi, uint32_t rs2) +{ + int64_t sum =3D (int32_t)rs2; + int64_t s1 =3D ((int64_t)rs1_hi << 32) | rs1_lo; + + for (int i =3D 0; i < 4; i++) { + int16_t h =3D (int16_t)((s1 >> (i * 16)) & 0xFFFF); + sum +=3D h; + } + + return (uint32_t)sum; +} + +/** + * PREDSUMU.DHS - Double-width unsigned reduction sum of halfwords (RV32) + */ +uint32_t HELPER(predsumu_dhs)(CPURISCVState *env, uint32_t rs1_lo, + uint32_t rs1_hi, uint32_t rs2) +{ + uint64_t sum =3D rs2; + uint64_t s1 =3D ((uint64_t)rs1_hi << 32) | rs1_lo; + + for (int i =3D 0; i < 4; i++) { + uint16_t h =3D (uint16_t)((s1 >> (i * 16)) & 0xFFFF); + sum +=3D h; + } + + return (uint32_t)sum; +} + + +/* Narrowing Operations (RV32 only, register pair sources) */ + +/** + * PNSRLI.B - Narrowing logical shift right immediate (64-bit to 32-bit) + */ +uint32_t HELPER(pnsrli_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + uint8_t result =3D (s1_h >> (shamt & 0xF)) & 0xFF; + rd |=3D ((uint32_t)result) << (i * 8); + } + + return rd; +} + +/** + * PNSRL.BS - Narrowing logical shift right from register (64-bit to 32-bi= t) + */ +uint32_t HELPER(pnsrl_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + uint32_t s1_h_z32 =3D (uint32_t)s1_h; + uint8_t result =3D (s1_h_z32 >> (shamt & 0x1F)) & 0xFF; + rd |=3D 
((uint32_t)result) << (i * 8); + } + + return rd; +} + +/** + * PNSRAI.B - Narrowing arithmetic shift right immediate (64-bit to 32-bit) + */ +uint32_t HELPER(pnsrai_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h; + int32_t s1_h_s24 =3D (s1_h_s32 << 8) >> 8; + uint8_t result =3D s1_h_s24 >> (shamt & 0xF) & 0xFF; + rd |=3D ((uint32_t)result) << (i * 8); + } + + return rd; +} + +/** + * PNSRA.BS - Narrowing arithmetic shift right from register (64-bit to 32= -bit) + */ +uint32_t HELPER(pnsra_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h; + s1_h_s64 =3D (s1_h_s64 << 24) >> 24; + uint8_t result =3D s1_h_s64 >> (shamt & 0x1F) & 0xFF; + rd |=3D ((uint32_t)result) << (i * 8); + } + + return rd; +} + +/** + * PNSRARI.B - Narrowing arithmetic shift right with rounding + * immediate (64-bit to 32-bit) + */ +uint32_t HELPER(pnsrari_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h; + int32_t s1_h_s24 =3D (s1_h_s32 << 8) >> 8; + uint32_t shx_25bit =3D ((uint32_t)s1_h_s24 << 1); + uint32_t shx =3D (shx_25bit >> (shamt & 0xF)) & 0x1FF; + uint8_t result =3D ((shx + 1) >> 1) & 0xFF; + rd |=3D ((uint32_t)result) << (i * 8); + } + + return rd; +} + +/** + * PNSRAR.BS - Narrowing arithmetic shift right with rounding + * from register (64-bit to 32-bit) + */ +uint32_t HELPER(pnsrar_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h; + int64_t s1_h_s40 =3D (s1_h_s64 << 
24) >> 24; + uint64_t shx_41bit =3D ((uint64_t)s1_h_s40 << 1); + uint64_t shx =3D (shx_41bit >> (shamt & 0x1F)) & 0x1FF; + uint8_t result =3D ((shx + 1) >> 1) & 0xFF; + rd |=3D ((uint32_t)result) << (i * 8); + } + + return rd; +} + +/** + * PNCLIPI.B - Narrowing clip signed (64-bit to 32-bit) with immediate shi= ft + */ +uint32_t HELPER(pnclipi_b)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h; + int16_t shx =3D (int16_t)(s1_h_s32 >> (shamt & 0xF)); + uint8_t result =3D 0; + + if (shx < -128) { + sat =3D 1; + result =3D 0x80; /* -128 */ + } else if (shx > 127) { + sat =3D 1; + result =3D 0x7F; /* 127 */ + } else { + result =3D (uint8_t)shx; + } + rd |=3D ((uint32_t)result << (i * 8)); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPRI.B - Narrowing clip signed with rounding + * (64-bit to 32-bit) with immediate shift + */ +uint32_t HELPER(pnclipri_b)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + uint32_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int32_t s1_h_s32 =3D (int32_t)(int16_t)s1_h; + uint64_t shx_33bit =3D ((uint32_t)s1_h_s32 << 1); + uint32_t shx =3D (shx_33bit >> (shamt & 0xF)) & 0x1FFFF; + uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1); + int16_t round_shx_s =3D (int16_t)round_shx; + uint8_t result =3D 0; + + if (round_shx_s < -128) { + sat =3D 1; + result =3D 0x80; + } else if (round_shx_s > 127) { + sat =3D 1; + result =3D 0x7F; + } else { + result =3D (uint8_t)round_shx; + } + + rd |=3D ((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPIU.B - Narrowing clip unsigned (64-bit to 32-bit) with immediate = shift + */ +uint32_t HELPER(pnclipiu_b)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + uint32_t rd =3D 0; + 
int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + uint16_t shx =3D s1_h >> (shamt & 0xF); + uint8_t result =3D 0; + + if (shx > 0x00FF) { + sat =3D 1; + result =3D 0xFF; + } else { + result =3D (uint8_t)(shx & 0xFF); + } + rd |=3D ((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPRIU.B - Narrowing clip unsigned with rounding + * (64-bit to 32-bit) with immediate shift + */ +uint32_t HELPER(pnclipriu_b)(CPURISCVState *env, uint64_t s1, uint32_t sha= mt) +{ + uint32_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + uint32_t shx_17bit =3D ((uint32_t)s1_h << 1); + uint32_t shx =3D shx_17bit >> (shamt & 0xF); + uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1); + uint8_t result =3D 0; + + if (round_shx > 0x00FF) { + sat =3D 1; + result =3D 0xFF; + } else { + result =3D (uint8_t)(round_shx & 0xFF); + } + rd |=3D ((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIP.BS - Narrowing clip signed from register (64-bit to 32-bit) + */ +uint32_t HELPER(pnclip_bs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h; + int64_t s1_h_s48 =3D (s1_h_s64 << 16) >> 16; + int16_t shx =3D (int16_t)(s1_h_s48 >> (shamt & 0x1F)); + uint8_t result =3D 0; + + if (shx < -128) { + sat =3D 1; + result =3D 0x80; + } else if (shx > 127) { + sat =3D 1; + result =3D 0x7F; + } else { + result =3D (uint8_t)shx; + } + rd |=3D ((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPR.BS - Narrowing clip signed with rounding + * from register (64-bit to 32-bit) + */ +uint32_t HELPER(pnclipr_bs)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + uint32_t rd =3D 0; + 
int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + int64_t s1_h_s64 =3D (int64_t)(int16_t)s1_h; + int64_t s1_h_s48 =3D (s1_h_s64 << 16) >> 16; + uint64_t shx_49bit =3D ((uint64_t)s1_h_s48 << 1); + uint32_t shx =3D (shx_49bit >> (shamt & 0x1F)) & 0x1FFFF; + uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1); + int16_t round_shx_s =3D (int16_t)round_shx; + uint8_t result =3D 0; + + if (round_shx_s < -128) { + sat =3D 1; + result =3D 0x80; + } else if (round_shx_s > 127) { + sat =3D 1; + result =3D 0x7F; + } else { + result =3D (uint8_t)round_shx; + } + rd |=3D ((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPU.BS - Narrowing clip unsigned from register (64-bit to 32-bit) + */ +uint32_t HELPER(pnclipu_bs)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + uint32_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + uint32_t s1_h_z32 =3D (uint32_t)s1_h; + uint16_t shx =3D (s1_h_z32 >> (shamt & 0x1F)) & 0xFFFF; + uint8_t result =3D 0; + + if (shx > 0x00FF) { + sat =3D 1; + result =3D 0xFF; + } else { + result =3D (uint8_t)(shx & 0xFF); + } + rd |=3D ((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPRU.BS - Narrowing clip unsigned with rounding + * from register (64-bit to 32-bit) + */ +uint32_t HELPER(pnclipru_bs)(CPURISCVState *env, uint64_t s1, uint32_t sha= mt) +{ + uint32_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t s1_h =3D (s1 >> (i * 16)) & 0xFFFF; + uint32_t s1_h_z32 =3D (uint32_t)s1_h; + uint64_t shx_33bit =3D ((uint64_t)s1_h_z32 << 1); + uint32_t shx =3D (shx_33bit >> (shamt & 0x1F)) & 0x1FFFF; + uint16_t round_shx =3D (uint16_t)((shx + 1) >> 1); + uint8_t result =3D 0; + + if (round_shx > 0x00FF) { + sat =3D 1; + result =3D 0xFF; + } else { + result =3D (uint8_t)(round_shx & 0xFF); + } + rd |=3D 
((uint32_t)result) << (i * 8); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNSRLI.H - Narrowing logical shift right immediate + * (64-bit to 32-bit, word to halfword) + */ +uint32_t HELPER(pnsrli_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + uint32_t s1_low =3D (uint32_t)(s1 & 0xFFFFFFFF); + uint32_t s1_high =3D (uint32_t)((s1 >> 32) & 0xFFFFFFFF); + + uint16_t rd_low =3D (s1_low >> (shamt & 0x1F)) & 0xFFFF; + uint16_t rd_high =3D (s1_high >> (shamt & 0x1F)) & 0xFFFF; + + rd =3D ((uint32_t)rd_high << 16) | rd_low; + return rd; +} + +/** + * PNSRAI.H - Narrowing arithmetic shift right immediate + * (64-bit to 32-bit, word to halfword) + */ +uint32_t HELPER(pnsrai_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + uint32_t s1_low =3D (uint32_t)(s1 & 0xFFFFFFFF); + int64_t s1_low_s64 =3D (int64_t)(int32_t)s1_low; + int64_t s1_low_s48 =3D (s1_low_s64 << 16) >> 16; + + uint32_t s1_high =3D (uint32_t)((s1 >> 32) & 0xFFFFFFFF); + int64_t s1_high_s64 =3D (int64_t)(int32_t)s1_high; + int64_t s1_high_s48 =3D (s1_high_s64 << 16) >> 16; + + uint16_t rd_low =3D (s1_low_s48 >> (shamt & 0x1F)) & 0xFFFF; + uint16_t rd_high =3D (s1_high_s48 >> (shamt & 0x1F)) & 0xFFFF; + + rd =3D ((uint32_t)rd_high << 16) | rd_low; + return rd; +} + +/** + * PNSRARI.H - Narrowing arithmetic shift right with rounding + * immediate (64-bit to 32-bit, word to halfword) + */ +uint32_t HELPER(pnsrari_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t s1_w =3D (s1 >> (i * 32)) & 0xFFFFFFFF; + int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w; + int64_t s1_w_s48 =3D (s1_w_s64 << 16) >> 16; + uint64_t shx_49bit =3D ((uint64_t)s1_w_s48 << 1); + uint32_t shx =3D (shx_49bit >> (shamt & 0x1F)) & 0x1FFFF; + rd |=3D ((uint16_t)((shx + 1) >> 1)) << (i * 16); + } + + return rd; +} + +/** + * PNSRL.HS - Narrowing logical shift right from register + * (64-bit to 
32-bit, word to halfword) + */ +uint32_t HELPER(pnsrl_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + uint32_t s1_low =3D (uint32_t)(s1 & 0xFFFFFFFF); + uint32_t s1_high =3D (uint32_t)((s1 >> 32) & 0xFFFFFFFF); + + uint16_t rd_low =3D (s1_low >> (shamt & 0x1F)) & 0xFFFF; + uint16_t rd_high =3D (s1_high >> (shamt & 0x1F)) & 0xFFFF; + + rd =3D ((uint32_t)rd_high << 16) | rd_low; + return rd; +} + +/** + * PNSRA.HS - Narrowing arithmetic shift right from register + * (64-bit to 32-bit, word to halfword) + */ +uint32_t HELPER(pnsra_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + /* Signed words, so that >> below is an arithmetic shift */ + int32_t s1_low =3D (int32_t)(s1 & 0xFFFFFFFF); + int32_t s1_high =3D (int32_t)((s1 >> 32) & 0xFFFFFFFF); + + uint16_t rd_low =3D (s1_low >> (shamt & 0x1F)) & 0xFFFF; + uint16_t rd_high =3D (s1_high >> (shamt & 0x1F)) & 0xFFFF; + + rd =3D ((uint32_t)rd_high << 16) | rd_low; + return rd; +} + +/** + * PNSRAR.HS - Narrowing arithmetic shift right with rounding + * from register (64-bit to 32-bit, word to halfword) + */ +uint32_t HELPER(pnsrar_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t s1_w =3D (s1 >> (i * 32)) & 0xFFFFFFFF; + int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w; + int64_t s1_w_s48 =3D (s1_w_s64 << 16) >> 16; + uint64_t shx_49bit =3D ((uint64_t)s1_w_s48 << 1); + uint32_t shx =3D (shx_49bit >> (shamt & 0x1F)) & 0x1FFFF; + rd |=3D ((uint16_t)((shx + 1) >> 1)) << (i * 16); + } + + return rd; +} + +/** + * PNCLIP.HS - Narrowing signed clip from register shift (word to halfword) + * For each word: arithmetic right shift, clip to signed 16-bit + * shx =3D (int32_t)rs1[i] >> shamt + * result =3D sat16(shx) + */ +uint32_t HELPER(pnclip_hs)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint32_t rd =3D 0; + int sat =3D 0; + uint8_t shift =3D shamt & 0x1F; + + for (int i =3D 0; i < 2; i++) { + uint32_t s1_w =3D EXTRACT32(s1, i); + int64_t s1_w_s64
=3D (int64_t)(int32_t)s1_w; + int32_t shx =3D (int32_t)(s1_w_s64 >> shift); + uint16_t result; + + if (shx < -32768) { + sat =3D 1; + result =3D 0x8000; + } else if (shx > 32767) { + sat =3D 1; + result =3D 0x7FFF; + } else { + result =3D (uint16_t)shx; + } + + rd =3D INSERT16(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPR.HS - Narrowing signed clip with rounding + * from register (word to halfword) + * For each word: ((int32_t)rs1[i] << 1) >> shamt, round, clip to signed 1= 6-bit + * shx_65bit =3D ((int64_t)rs1[i] << 1) + * shx =3D (shx_65bit >> shamt) & mask + * round =3D (shx + 1) >> 1 + * result =3D sat16(round) + */ +uint32_t HELPER(pnclipr_hs)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + uint32_t rd =3D 0; + int sat =3D 0; + uint8_t shift =3D shamt & 0x1F; + + for (int i =3D 0; i < 2; i++) { + uint32_t s1_w =3D EXTRACT32(s1, i); + int64_t s1_w_s64 =3D (int64_t)(int32_t)s1_w; + __uint128_t shx_65bit =3D (__uint128_t)s1_w_s64 << 1; + uint64_t shx =3D (uint64_t)(shx_65bit >> shift) & 0x1FFFFFFFF; + int32_t round_shx =3D (int32_t)((shx + 1) >> 1); + uint16_t result; + + if (round_shx < -32768) { + sat =3D 1; + result =3D 0x8000; + } else if (round_shx > 32767) { + sat =3D 1; + result =3D 0x7FFF; + } else { + result =3D (uint16_t)round_shx; + } + + rd =3D INSERT16(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPI.H - Narrowing signed clip from immediate shift (word to halfwor= d) + * For each word: rs1[i] >> imm, clip to signed 16-bit + */ +uint32_t HELPER(pnclipi_h)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + return HELPER(pnclip_hs)(env, s1, shamt); +} + +/** + * PNCLIPRI.H - Narrowing signed clip with rounding + * from immediate shift (word to halfword) + * For each word: (rs1[i] << 1) >> imm, round, clip to signed 16-bit + */ +uint32_t HELPER(pnclipri_h)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + return HELPER(pnclipr_hs)(env, s1, 
shamt); +} + +/** + * PNCLIPU.HS - Narrowing unsigned clip from register shift (word to halfw= ord) + * For each word: shift right, clip to unsigned 16-bit + * shx =3D rs1[i] >> shamt + * result =3D (shx > 65535) ? 0xFFFF : shx + */ +uint32_t HELPER(pnclipu_hs)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + uint32_t rd =3D 0; + int sat =3D 0; + uint8_t shift =3D shamt & 0x1F; + + for (int i =3D 0; i < 2; i++) { + uint32_t s1_w =3D EXTRACT32(s1, i); + uint32_t shx =3D s1_w >> shift; + uint16_t result; + + if (shx > 65535) { + sat =3D 1; + result =3D 0xFFFF; + } else { + result =3D (uint16_t)shx; + } + + rd =3D INSERT16(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPRU.HS - Narrowing unsigned clip with rounding + * from register (word to halfword) + * For each word: (rs1[i] << 1) >> shamt, round, clip to unsigned 16-bit + * shx =3D ((rs1[i] << 1) >> shamt) + * round =3D (shx + 1) >> 1 + * result =3D (round > 65535) ? 0xFFFF : round + */ +uint32_t HELPER(pnclipru_hs)(CPURISCVState *env, uint64_t s1, uint32_t sha= mt) +{ + uint32_t rd =3D 0; + int sat =3D 0; + uint8_t shift =3D shamt & 0x1F; + + for (int i =3D 0; i < 2; i++) { + uint32_t s1_w =3D EXTRACT32(s1, i); + uint64_t shx_33bit =3D (uint64_t)s1_w << 1; + uint64_t shx =3D shx_33bit >> shift; + uint32_t round_shx =3D (uint32_t)((shx + 1) >> 1); + uint16_t result; + + if (round_shx > 65535) { + sat =3D 1; + result =3D 0xFFFF; + } else { + result =3D (uint16_t)round_shx; + } + + rd =3D INSERT16(rd, result, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPIU.H - Narrowing unsigned clip from immediate shift (word to half= word) + * For each word: rs1[i] >> imm, clip to unsigned 16-bit + */ +uint32_t HELPER(pnclipiu_h)(CPURISCVState *env, uint64_t s1, uint32_t sham= t) +{ + return HELPER(pnclipu_hs)(env, s1, shamt); +} + +/** + * PNCLIPRIU.H - Narrowing unsigned clip with rounding + * from immediate shift (word to halfword) + * For 
each word: (rs1[i] << 1) >> imm, round, clip to unsigned 16-bit + */ +uint32_t HELPER(pnclipriu_h)(CPURISCVState *env, uint64_t s1, uint32_t sha= mt) +{ + return HELPER(pnclipru_hs)(env, s1, shamt); +} + +/** + * NSRLI - Narrowing logical shift right immediate (64-bit to 32-bit) + */ +uint32_t HELPER(nsrli)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + return (s1 >> (shamt & 0x3F)) & 0xFFFFFFFF; +} + +/** + * NSRAI - Narrowing arithmetic shift right immediate (64-bit to 32-bit) + */ +uint32_t HELPER(nsrai)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + __int128_t s1_s96 =3D (s1_s128 << 32) >> 32; + return (uint32_t)(s1_s96 >> (shamt & 0x3F)) & 0xFFFFFFFF; +} + +/** + * NSRARI - Narrowing arithmetic shift right with rounding + * immediate (64-bit to 32-bit) + */ +uint32_t HELPER(nsrari)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + __int128_t s1_s96 =3D (s1_s128 << 32) >> 32; + __uint128_t shx_97bit =3D ((__uint128_t)s1_s96 << 1); + uint64_t shx =3D (uint64_t)(shx_97bit >> (shamt & 0x3F)) & 0x1FFFFFFFF; + return (uint32_t)((shx + 1) >> 1); +} + +/** + * NSRL - Narrowing logical shift right from register (64-bit to 32-bit) + */ +uint32_t HELPER(nsrl)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + return (s1 >> (shamt & 0x3F)) & 0xFFFFFFFF; +} + +/** + * NSRA - Narrowing arithmetic shift right from register (64-bit to 32-bit) + */ +uint32_t HELPER(nsra)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + __int128_t s1_s96 =3D (s1_s128 << 32) >> 32; + return (uint32_t)(s1_s96 >> (shamt & 0x3F)) & 0xFFFFFFFF; +} + +/** + * NSRAR - Narrowing arithmetic shift right with rounding + * from register (64-bit to 32-bit) + */ +uint32_t HELPER(nsrar)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + __int128_t s1_s96 =3D (s1_s128 << 32) >> 
32; + __uint128_t shx_97bit =3D ((__uint128_t)s1_s96 << 1); + uint64_t shx =3D (uint64_t)(shx_97bit >> (shamt & 0x3F)) & 0x1FFFFFFFF; + return (uint32_t)((shx + 1) >> 1); +} + +/** + * NCLIPI - Narrowing clip signed with immediate shift (64-bit to 32-bit) + */ +uint32_t HELPER(nclipi)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + int64_t shx =3D (int64_t)(s1_s128 >> (shamt & 0x3F)); + + if (shx < -2147483648LL) { + env->vxsat =3D 1; + return 0x80000000U; + } else if (shx > 2147483647LL) { + env->vxsat =3D 1; + return 0x7FFFFFFFU; + } else { + return (uint32_t)(shx & 0xFFFFFFFF); + } +} + +/** + * NCLIPRI - Narrowing clip signed with rounding and immediate + * shift (64-bit to 32-bit) + */ +uint32_t HELPER(nclipri)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + typedef struct { + __uint128_t low; + uint8_t high; + } Uint129; + + Uint129 left_shift_1(__int128_t s1_s128) + { + Uint129 result; + __uint128_t us1 =3D (__uint128_t)s1_s128; + result.low =3D us1 << 1; + result.high =3D (us1 >> 127) & 0x1; + return result; + } + + Uint129 right_shift(Uint129 val, uint32_t smt) + { + Uint129 result; + if (smt =3D=3D 0) { + return val; + } else if (smt >=3D 129) { + result.low =3D 0; + result.high =3D 0; + } else if (smt =3D=3D 128) { + result.low =3D val.high; + result.high =3D 0; + } else { + result.low =3D (val.low >> smt) | + ((__uint128_t)val.high << (128 - smt)); + result.high =3D (val.high >> smt); + } + return result; + } + + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + Uint129 shx_129bit =3D left_shift_1(s1_s128); + Uint129 shx =3D right_shift(shx_129bit, shamt & 0x3F); + int64_t round_shx =3D (int64_t)((shx.low + 1) >> 1); + + if (round_shx < -2147483648LL) { + env->vxsat =3D 1; + return 0x80000000U; + } else if (round_shx > 2147483647LL) { + env->vxsat =3D 1; + return 0x7FFFFFFFU; + } else { + return (uint32_t)round_shx; + } +} + +/** + * NCLIPIU - Narrowing clip unsigned with immediate shift 
(64-bit to 32-bi= t) + */ +uint32_t HELPER(nclipiu)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint64_t shx =3D s1 >> (shamt & 0x3F); + + if (shx > 4294967295ULL) { + env->vxsat =3D 1; + return 0xFFFFFFFFU; + } else { + return (uint32_t)(shx & 0xFFFFFFFF); + } +} + +/** + * NCLIPRIU - Narrowing clip unsigned with rounding and immediate + * shift (64-bit to 32-bit) + */ +uint32_t HELPER(nclipriu)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + /* Widen before shifting so that bit 63 of s1 is not lost */ + __uint128_t shx_65bit =3D ((__uint128_t)s1 << 1); + __uint128_t shx =3D shx_65bit >> (shamt & 0x3F); + uint64_t round_shx =3D (shx + 1) >> 1; + + if (round_shx > 4294967295ULL) { + env->vxsat =3D 1; + return 0xFFFFFFFFU; + } else { + return (uint32_t)(round_shx & 0xFFFFFFFF); + } +} + +/** + * NCLIP - Narrowing clip signed from register (64-bit to 32-bit) + */ +uint32_t HELPER(nclip)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + int64_t shx =3D (int64_t)(s1_s128 >> (shamt & 0x3F)); + + if (shx < -2147483648LL) { + env->vxsat =3D 1; + return 0x80000000U; + } else if (shx > 2147483647LL) { + env->vxsat =3D 1; + return 0x7FFFFFFFU; + } else { + return (uint32_t)(shx & 0xFFFFFFFF); + } +} + +/** + * NCLIPR - Narrowing clip signed with rounding from register (64-bit to 3= 2-bit) + */ +uint32_t HELPER(nclipr)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + typedef struct { + __uint128_t low; + uint8_t high; + } Uint129; + + Uint129 left_shift_1(__int128_t s1_s128) + { + Uint129 result; + __uint128_t us1 =3D (__uint128_t)s1_s128; + result.low =3D us1 << 1; + result.high =3D (us1 >> 127) & 0x1; + return result; + } + + Uint129 right_shift(Uint129 val, uint32_t smt) + { + Uint129 result; + if (smt =3D=3D 0) { + return val; + } else if (smt >=3D 129) { + result.low =3D 0; + result.high =3D 0; + } else if (smt =3D=3D 128) { + result.low =3D val.high; + result.high =3D 0; + } else { + result.low =3D (val.low >> smt) | + ((__uint128_t)val.high << (128 - smt)); +
result.high =3D (val.high >> smt); + } + return result; + } + + __int128_t s1_s128 =3D (__int128_t)((int64_t)s1); + Uint129 shx_129bit =3D left_shift_1(s1_s128); + Uint129 shx =3D right_shift(shx_129bit, shamt & 0x3F); + int64_t round_shx =3D (int64_t)((shx.low + 1) >> 1); + + if (round_shx < -2147483648LL) { + env->vxsat =3D 1; + return 0x80000000U; + } else if (round_shx > 2147483647LL) { + env->vxsat =3D 1; + return 0x7FFFFFFFU; + } else { + return (uint32_t)round_shx; + } +} + +/** + * NCLIPU - Narrowing clip unsigned from register (64-bit to 32-bit) + */ +uint32_t HELPER(nclipu)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + uint64_t shx =3D s1 >> (shamt & 0x3F); + + if (shx > 4294967295ULL) { + env->vxsat =3D 1; + return 0xFFFFFFFFU; + } else { + return (uint32_t)(shx & 0xFFFFFFFF); + } +} + +/** + * NCLIPRU - Narrowing clip unsigned with rounding + * from register (64-bit to 32-bit) + */ +uint32_t HELPER(nclipru)(CPURISCVState *env, uint64_t s1, uint32_t shamt) +{ + /* Widen before shifting so that bit 63 of s1 is not lost */ + __uint128_t shx_65bit =3D ((__uint128_t)s1 << 1); + __uint128_t shx =3D shx_65bit >> (shamt & 0x3F); + uint64_t round_shx =3D (shx + 1) >> 1; + + if (round_shx > 4294967295ULL) { + env->vxsat =3D 1; + return 0xFFFFFFFFU; + } else { + return (uint32_t)(round_shx & 0xFFFFFFFF); + } +} + +/* Multiplication with Even-Odd Register Pairs as Destination (RV32 only) = */ + +/** + * PMQWACC.H - Packed Q-format halfword to word multiply accumulate + */ +uint64_t HELPER(pmqwacc_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i * 2); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h * (int64_t)s2_h; + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PMQRWACC.H - Packed Q-format halfword to word multiply + * accumulate with rounding + */
+uint64_t HELPER(pmqrwacc_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i * 2); + int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i * 2); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int64_t prod =3D (int64_t)s1_h * (int64_t)s2_h + (1LL << 14); + uint32_t res =3D (uint32_t)(d_w + (int32_t)(prod >> 15)); + rd =3D INSERT32(rd, res, i); + } + return rd; +} + +/** + * PWMUL.B - Widening byte to halfword multiplication + */ +uint64_t HELPER(pwmul_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + int8_t s1_b =3D (int8_t)EXTRACT8(rs1, i); + int8_t s2_b =3D (int8_t)EXTRACT8(rs2, i); + int16_t prod =3D (int16_t)s1_b * (int16_t)s2_b; + rd |=3D ((uint64_t)(uint16_t)prod) << (i * 16); + } + return rd; +} + +/** + * PWMULSU.B - Widening signed x unsigned byte to halfword multiplication + */ +uint64_t HELPER(pwmulsu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + int8_t s1_b =3D (int8_t)EXTRACT8(rs1, i); + uint8_t s2_b =3D EXTRACT8(rs2, i); + int16_t prod =3D (int16_t)s1_b * (uint16_t)s2_b; + rd |=3D ((uint64_t)(uint16_t)prod) << (i * 16); + } + return rd; +} + +/** + * PWMULU.B - Widening unsigned byte to halfword multiplication + */ +uint64_t HELPER(pwmulu_b)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint8_t s1_b =3D EXTRACT8(rs1, i); + uint8_t s2_b =3D EXTRACT8(rs2, i); + uint16_t prod =3D (uint16_t)s1_b * (uint16_t)s2_b; + rd |=3D ((uint64_t)prod) << (i * 16); + } + return rd; +} + +/** + * PWMUL.H - Widening halfword to word multiplication + */ +uint64_t HELPER(pwmul_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i); + int16_t s2_h =3D 
(int16_t)EXTRACT16(rs2, i); + int32_t prod =3D (int32_t)s1_h * (int32_t)s2_h; + rd |=3D ((uint64_t)(uint32_t)prod) << (i * 32); + } + return rd; +} + +/** + * PWMULSU.H - Widening signed x unsigned halfword to word multiplication + */ +uint64_t HELPER(pwmulsu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i); + uint16_t s2_h =3D EXTRACT16(rs2, i); + int32_t prod =3D (int32_t)s1_h * (uint32_t)s2_h; + rd |=3D ((uint64_t)(uint32_t)prod) << (i * 32); + } + return rd; +} + +/** + * PWMULU.H - Widening unsigned halfword to word multiplication + */ +uint64_t HELPER(pwmulu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint16_t s1_h =3D EXTRACT16(rs1, i); + uint16_t s2_h =3D EXTRACT16(rs2, i); + uint32_t prod =3D (uint32_t)s1_h * (uint32_t)s2_h; + rd |=3D ((uint64_t)prod) << (i * 32); + } + return rd; +} + +/** + * PWMACC.H - Widening multiply accumulate (halfword to word) + */ +uint64_t HELPER(pwmacc_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i); + int16_t s2_h =3D (int16_t)EXTRACT16(rs2, i); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int32_t prod =3D (int32_t)s1_h * (int32_t)s2_h; + uint32_t res =3D (uint32_t)(d_w + prod); + rd |=3D ((uint64_t)res) << (i * 32); + } + return rd; +} + +/** + * PWMACCSU.H - Widening signed x unsigned multiply + * accumulate (halfword to word) + */ +uint64_t HELPER(pwmaccsu_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + int16_t s1_h =3D (int16_t)EXTRACT16(rs1, i); + uint16_t s2_h =3D EXTRACT16(rs2, i); + int32_t d_w =3D (int32_t)EXTRACT32(dest, i); + int32_t prod =3D (int32_t)s1_h * (uint32_t)s2_h; + uint32_t res =3D (uint32_t)(d_w + prod); + rd |=3D 
((uint64_t)res) << (i * 32); + } + return rd; +} + +/** + * PWMACCU.H - Widening unsigned multiply accumulate (halfword to word) + */ +uint64_t HELPER(pwmaccu_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + uint64_t rd =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint16_t s1_h =3D EXTRACT16(rs1, i); + uint16_t s2_h =3D EXTRACT16(rs2, i); + uint32_t d_w =3D EXTRACT32(dest, i); + uint32_t prod =3D (uint32_t)s1_h * (uint32_t)s2_h; + uint32_t res =3D d_w + prod; + rd |=3D ((uint64_t)res) << (i * 32); + } + return rd; +} + +/** + * MQWACC - Q-format word multiply accumulate + */ +uint64_t HELPER(mqwacc)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int64_t s1 =3D (int64_t)(int32_t)rs1; + int64_t s2 =3D (int64_t)(int32_t)rs2; + int64_t d =3D (int64_t)dest; + __int128_t prod =3D (__int128_t)s1 * (__int128_t)s2; + return (uint64_t)(d + (int64_t)(prod >> 31)); +} + +/** + * MQRWACC - Q-format word multiply accumulate with rounding + */ +uint64_t HELPER(mqrwacc)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int64_t s1 =3D (int64_t)(int32_t)rs1; + int64_t s2 =3D (int64_t)(int32_t)rs2; + int64_t d =3D (int64_t)dest; + __int128_t prod =3D (__int128_t)s1 * (__int128_t)s2 + (1LL << 30); + return (uint64_t)(d + (int64_t)(prod >> 31)); +} + +/** + * WMUL - Widening signed multiplication (32-bit to 64-bit, RV32) + */ +uint64_t HELPER(wmul)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + return (uint64_t)((int64_t)(int32_t)rs1 * (int64_t)(int32_t)rs2); +} + +/** + * WMULSU - Widening signed x unsigned multiplication (32-bit to 64-bit, R= V32) + */ +uint64_t HELPER(wmulsu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + return (uint64_t)((int64_t)(int32_t)rs1 * (uint64_t)rs2); +} + +/** + * WMULU - Widening unsigned multiplication (32-bit to 64-bit, RV32) + */ +uint64_t HELPER(wmulu)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + return (uint64_t)rs1 * (uint64_t)rs2; +} + +/** + * WMACC - Widening 
multiply accumulate signed (32-bit to 64-bit, RV32) + */ +uint64_t HELPER(wmacc)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + return (uint64_t)((int64_t)(int32_t)rs1 * + (int64_t)(int32_t)rs2 + (int64_t)dest); +} + +/** + * WMACCSU - Widening multiply accumulate signed x unsigned + * (32-bit to 64-bit, RV32) + */ +uint64_t HELPER(wmaccsu)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + return (uint64_t)((int64_t)(int32_t)rs1 * (uint64_t)rs2 + (int64_t)des= t); +} + +/** + * WMACCU - Widening multiply accumulate unsigned (32-bit to 64-bit, RV32) + */ +uint64_t HELPER(wmaccu)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + return (uint64_t)rs1 * (uint64_t)rs2 + (uint64_t)dest; +} + +/** + * PM2WADD.H - Add two widening products (halfword to doubleword) + */ +uint64_t HELPER(pm2wadd_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0; + int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1; + return (uint64_t)(prod0 + prod1); +} + +/** + * PM2WADDSU.H - Add two widening products + * (signed x unsigned, halfword to doubleword) + */ +uint64_t HELPER(pm2waddsu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs= 2) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + uint16_t s2_h0 =3D EXTRACT16(rs2, 0); + uint16_t s2_h1 =3D EXTRACT16(rs2, 1); + int64_t prod0 =3D (int64_t)s1_h0 * (uint64_t)s2_h0; + int64_t prod1 =3D (int64_t)s1_h1 * (uint64_t)s2_h1; + return (uint64_t)(prod0 + prod1); +} + +/** + * PM2WADDU.H - Add two widening products (unsigned, halfword to doublewor= d) + */ +uint64_t HELPER(pm2waddu_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + uint16_t s1_h0 =3D EXTRACT16(rs1, 0); + uint16_t s1_h1 =3D 
EXTRACT16(rs1, 1); + uint16_t s2_h0 =3D EXTRACT16(rs2, 0); + uint16_t s2_h1 =3D EXTRACT16(rs2, 1); + uint64_t prod0 =3D (uint64_t)s1_h0 * (uint64_t)s2_h0; + uint64_t prod1 =3D (uint64_t)s1_h1 * (uint64_t)s2_h1; + return prod0 + prod1; +} + +/** + * PM2WADDA.H - Add two widening products with accumulate + * (halfword to doubleword) + */ +uint64_t HELPER(pm2wadda_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int16_t s1_h0 =3D EXTRACT16(rs1, 0); + int16_t s1_h1 =3D EXTRACT16(rs1, 1); + int16_t s2_h0 =3D EXTRACT16(rs2, 0); + int16_t s2_h1 =3D EXTRACT16(rs2, 1); + int64_t d_h =3D (int64_t)dest; + int64_t mul_00 =3D (int64_t)s1_h0 * (int64_t)s2_h0; + int64_t mul_11 =3D (int64_t)s1_h1 * (int64_t)s2_h1; + return (uint64_t)(d_h + mul_00 + mul_11); +} + +/** + * PM2WADDASU.H - Add two widening products with accumulate + * (signed x unsigned, halfword to doubleword) + */ +uint64_t HELPER(pm2waddasu_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + uint16_t s2_h0 =3D (uint16_t)EXTRACT16(rs2, 0); + uint16_t s2_h1 =3D (uint16_t)EXTRACT16(rs2, 1); + int64_t d_h =3D (int64_t)dest; + int64_t mul_00 =3D (int64_t)s1_h0 * (uint64_t)s2_h0; + int64_t mul_11 =3D (int64_t)s1_h1 * (uint64_t)s2_h1; + return (uint64_t)(d_h + mul_00 + mul_11); +} + +/** + * PM2WADDAU.H - Add two widening products with accumulate + * (unsigned, halfword to doubleword) + */ +uint64_t HELPER(pm2waddau_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + uint16_t s1_h0 =3D (uint16_t)EXTRACT16(rs1, 0); + uint16_t s1_h1 =3D (uint16_t)EXTRACT16(rs1, 1); + uint16_t s2_h0 =3D (uint16_t)EXTRACT16(rs2, 0); + uint16_t s2_h1 =3D (uint16_t)EXTRACT16(rs2, 1); + uint64_t d_h =3D (uint64_t)dest; + uint64_t mul_00 =3D (uint64_t)s1_h0 * (uint64_t)s2_h0; + uint64_t mul_11 =3D (uint64_t)s1_h1 * (uint64_t)s2_h1; + return (uint64_t)(d_h + mul_00 + mul_11); +} + +/** + 
* PM2WADD.HX - Add two widening cross products (halfword to doubleword) + */ +uint64_t HELPER(pm2wadd_hx)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1; + int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0; + return (uint64_t)(prod01 + prod10); +} + +/** + * PM2WADDA.HX - Add two widening cross products with accumulate + * (halfword to doubleword) + */ +uint64_t HELPER(pm2wadda_hx)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t d =3D (int64_t)dest; + int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1; + int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0; + return (uint64_t)(d + prod01 + prod10); +} + +/** + * PM2WSUB.H - Subtract two widening products (halfword to doubleword) + */ +uint64_t HELPER(pm2wsub_h)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0; + int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1; + return (uint64_t)(prod0 - prod1); +} + +/** + * PM2WSUB.HX - Subtract two widening cross products (halfword to doublewo= rd) + */ +uint64_t HELPER(pm2wsub_hx)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1; + 
int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0; + return (uint64_t)(prod01 - prod10); +} + +/** + * PM2WSUBA.H - Subtract two widening products with accumulate + * (halfword to doubleword) + */ +uint64_t HELPER(pm2wsuba_h)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t d =3D (int64_t)dest; + int64_t prod0 =3D (int64_t)s1_h0 * (int64_t)s2_h0; + int64_t prod1 =3D (int64_t)s1_h1 * (int64_t)s2_h1; + return (uint64_t)(d + prod0 - prod1); +} + +/** + * PM2WSUBA.HX - Subtract two widening cross products with accumulate + * (halfword to doubleword) + */ +uint64_t HELPER(pm2wsuba_hx)(CPURISCVState *env, uint32_t rs1, + uint32_t rs2, uint64_t dest) +{ + int16_t s1_h0 =3D (int16_t)EXTRACT16(rs1, 0); + int16_t s1_h1 =3D (int16_t)EXTRACT16(rs1, 1); + int16_t s2_h0 =3D (int16_t)EXTRACT16(rs2, 0); + int16_t s2_h1 =3D (int16_t)EXTRACT16(rs2, 1); + int64_t d =3D (int64_t)dest; + int64_t prod01 =3D (int64_t)s1_h0 * (int64_t)s2_h1; + int64_t prod10 =3D (int64_t)s1_h1 * (int64_t)s2_h0; + return (uint64_t)(d + prod01 - prod10); +} --=20 2.34.1 From nobody Sat May 30 20:13:16 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists1p.gnu.org (lists1p.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1776422973305318.5674874282191; Fri, 17 Apr 2026 03:49:33 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists1p.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1wDgjq-0001XR-Ue; Fri, 17 Apr 2026 06:47:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists1p.gnu.org with esmtps 
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1wDgjo-0001Tu-Hj; Fri, 17 Apr 2026 06:47:52 -0400 Received: from smtp21.cstnet.cn ([159.226.251.21] helo=cstnet.cn) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1wDgjl-000846-F4; Fri, 17 Apr 2026 06:47:52 -0400 Received: from Huawei.localdomain (unknown [36.110.52.2]) by APP-01 (Coremail) with SMTP id qwCowAB3H2ulD+JpLDmSDQ--.804S16; Fri, 17 Apr 2026 18:47:25 +0800 (CST) From: Molly Chen To: palmer@dabbelt.com, alistair.francis@wdc.com, liwei1518@gmail.com, daniel.barboza@oss.qualcomm.com, zhiwei_liu@linux.alibaba.com, chao.liu.zevorn@gmail.com Cc: xiaoou@iscas.ac.cn, qemu-riscv@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH 14/14] target/riscv: rvp: update to v020, add SHL and PNCLIP[U]P.* instructions Date: Fri, 17 Apr 2026 18:46:51 +0800 Message-Id: <20260417104652.17857-15-xiaoou@iscas.ac.cn> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260417104652.17857-1-xiaoou@iscas.ac.cn> References: <20260417104652.17857-1-xiaoou@iscas.ac.cn> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: qwCowAB3H2ulD+JpLDmSDQ--.804S16 X-Coremail-Antispam: 1UD129KBjvAXoWfJF4UWrWxCFyfXr4UGFyUZFb_yoW8Gw1xZo WrKw45Ar1fGw13u34F9w4UXr1UZr92vw1kGr48Zr42qas7Wr12gFn8J3s5AF40qrWayrW7 XrZ3WryrtF1akr9rn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOb7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26ryj6F1UM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV WxJr1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE 3s1le2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I x0cI8IcVAFwI0_Jrv_JF1lYx0Ex4A2jsIE14v26r4j6F4UMcvjeVCFs4IE7xkEbVWUJVW8 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lc7CjxVAaw2AFwI0_Jw 
0_GFyl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AK xVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r1q6r43MIIYrx kI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v2 6r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F 4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJbIYCTnIWIevJa73UjIFyTuYvjfU5Tmh DUUUU X-Originating-IP: [36.110.52.2] X-CM-SenderInfo: 50ld003x6l2u1dvotugofq/ Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists1p.gnu.org; Received-SPF: pass client-ip=159.226.251.21; envelope-from=xiaoou@iscas.ac.cn; helo=cstnet.cn X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, HK_RANDOM_ENVFROM=0.998, HK_RANDOM_FROM=0.998, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: qemu development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1776422974548158500 Content-Type: text/plain; charset="utf-8" Signed-off-by: Molly Chen --- target/riscv/helper.h | 14 + target/riscv/insn32.decode | 22 ++ target/riscv/insn_trans/trans_rvp.c.inc | 18 ++ target/riscv/psimd_helper.c | 370 ++++++++++++++++++++++++ 4 files changed, 424 insertions(+) diff --git a/target/riscv/helper.h b/target/riscv/helper.h index 85d4fe1b67..a9dbe53dbf 100644 --- a/target/riscv/helper.h +++ b/target/riscv/helper.h @@ -1483,6 +1483,14 @@ DEF_HELPER_3(ssha, i32, env, i32, i32) DEF_HELPER_3(sshar, i32, env, i32, i32) DEF_HELPER_3(sha, i64, env, i64, i64) 
DEF_HELPER_3(shar, i64, env, i64, i64) +DEF_HELPER_3(psshl_hs, tl, env, tl, tl) +DEF_HELPER_3(psshlr_hs, tl, env, tl, tl) +DEF_HELPER_3(psshl_ws, i64, env, i64, i64) +DEF_HELPER_3(psshlr_ws, i64, env, i64, i64) +DEF_HELPER_3(sshl, i32, env, i32, i32) +DEF_HELPER_3(sshlr, i32, env, i32, i32) +DEF_HELPER_3(shl, i64, env, i64, i64) +DEF_HELPER_3(shlr, i64, env, i64, i64) =20 /* Packed SIMD - Exchange Operations */ DEF_HELPER_3(pas_hx, tl, env, tl, tl) @@ -1538,6 +1546,12 @@ DEF_HELPER_4(srx, tl, env, tl, tl, tl) DEF_HELPER_4(mvm, tl, env, tl, tl, tl) DEF_HELPER_4(mvmn, tl, env, tl, tl, tl) DEF_HELPER_4(merge, tl, env, tl, tl, tl) +DEF_HELPER_3(pnclipp_b, i64, env, i64, i64) +DEF_HELPER_3(pnclipup_b, i64, env, i64, i64) +DEF_HELPER_3(pnclipp_h, i64, env, i64, i64) +DEF_HELPER_3(pnclipup_h, i64, env, i64, i64) +DEF_HELPER_3(pnclipp_w, i64, env, i64, i64) +DEF_HELPER_3(pnclipup_w, i64, env, i64, i64) =20 /* Packed SIMD - Count Leading Operations */ DEF_HELPER_2(cls, tl, env, tl) diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode index 7be0b9e5e6..d7aebd55e2 100644 --- a/target/riscv/insn32.decode +++ b/target/riscv/insn32.decode @@ -1293,6 +1293,18 @@ psshar_hs 1111100 ..... ..... 010 ..... 0011011 @r } sha 1110111 ..... ..... 010 ..... 0011011 @r shar 1111111 ..... ..... 010 ..... 0011011 @r +psshl_hs 1010100 ..... ..... 010 ..... 0011011 @r +psshlr_hs 1011100 ..... ..... 010 ..... 0011011 @r +{ + psshl_ws 1010101 ..... ..... 010 ..... 0011011 @r + sshl 1010101 ..... ..... 010 ..... 0011011 @r +} +{ + psshlr_ws 1011101 ..... ..... 010 ..... 0011011 @r + sshlr 1011101 ..... ..... 010 ..... 0011011 @r +} +shl 1010111 ..... ..... 010 ..... 0011011 @r +shlr 1011111 ..... ..... 010 ..... 0011011 @r =20 # Packed SIMD - Exchange Operations pas_hx 1000000 ..... ..... 110 ..... 0111011 @r @@ -1346,6 +1358,12 @@ srx 1010111 ..... ..... 001 ..... 0111011 @r mvm 1010100 ..... ..... 001 ..... 0111011 @r mvmn 1010101 ..... ..... 001 ..... 
0111011 @r merge 1010110 ..... ..... 001 ..... 0111011 @r +pnclipp_b 1100000 ..... ..... 010 ..... 0111011 @r +pnclipup_b 1000000 ..... ..... 010 ..... 0111011 @r +pnclipp_h 1100001 ..... ..... 010 ..... 0111011 @r +pnclipup_h 1000001 ..... ..... 010 ..... 0111011 @r +pnclipp_w 1100011 ..... ..... 010 ..... 0111011 @r +pnclipup_w 1000011 ..... ..... 010 ..... 0111011 @r =20 # Packed SIMD - Count Leading Operations cls 01100 0000011 ..... 001 ..... 0010011 @r2 @@ -1790,12 +1808,16 @@ psrl_dhs 0000100 ..... .... 1110 .... 00011011 @= r_p_3 psra_dhs 0100100 ..... .... 1110 .... 00011011 @r_p_3 pssha_dhs 0110100 ..... .... 0110 .... 00011011 @r_p_3 psshar_dhs 0111100 ..... .... 0110 .... 00011011 @r_p_3 +psshl_dhs 0010100 ..... .... 0110 .... 00011011 @r_p_3 +psshlr_dhs 0011100 ..... .... 0110 .... 00011011 @r_p_3 padd_dws 0001101 ..... .... 0110 .... 00011011 @r_p_3 psll_dws 0000101 ..... .... 0110 .... 00011011 @r_p_3 psrl_dws 0000101 ..... .... 1110 .... 00011011 @r_p_3 psra_dws 0100101 ..... .... 1110 .... 00011011 @r_p_3 pssha_dws 0110101 ..... .... 0110 .... 00011011 @r_p_3 psshar_dws 0111101 ..... .... 0110 .... 00011011 @r_p_3 +psshl_dws 0010101 ..... .... 0110 .... 00011011 @r_p_3 +psshlr_dws 0011101 ..... .... 0110 .... 00011011 @r_p_3 =20 # register-pair operands ppaire_db 1000000 .... 0 .... 1110 .... 
00011011 @r_p_2 diff --git a/target/riscv/insn_trans/trans_rvp.c.inc b/target/riscv/insn_tr= ans/trans_rvp.c.inc index ca459293a3..b4142adcfb 100644 --- a/target/riscv/insn_trans/trans_rvp.c.inc +++ b/target/riscv/insn_trans/trans_rvp.c.inc @@ -686,6 +686,14 @@ GEN_SIMD_TRANS_32(ssha) GEN_SIMD_TRANS_32(sshar) GEN_SIMD_TRANS_64(sha) GEN_SIMD_TRANS_64(shar) +GEN_SIMD_TRANS(psshl_hs) +GEN_SIMD_TRANS(psshlr_hs) +GEN_SIMD_TRANS_64(psshl_ws) +GEN_SIMD_TRANS_64(psshlr_ws) +GEN_SIMD_TRANS_32(sshl) +GEN_SIMD_TRANS_32(sshlr) +GEN_SIMD_TRANS_64(shl) +GEN_SIMD_TRANS_64(shlr) =20 /* Packed SIMD - Exchange Operations */ GEN_SIMD_TRANS(pas_hx) @@ -739,6 +747,12 @@ GEN_SIMD_TRANS_ACC(srx) GEN_SIMD_TRANS_ACC(mvm) GEN_SIMD_TRANS_ACC(mvmn) GEN_SIMD_TRANS_ACC(merge) +GEN_SIMD_TRANS_64(pnclipp_b) +GEN_SIMD_TRANS_64(pnclipup_b) +GEN_SIMD_TRANS_64(pnclipp_h) +GEN_SIMD_TRANS_64(pnclipup_h) +GEN_SIMD_TRANS_64(pnclipp_w) +GEN_SIMD_TRANS_64(pnclipup_w) =20 /* Packed SIMD - Count Leading Operations */ GEN_SIMD_TRANS_R1(cls) @@ -1066,8 +1080,12 @@ GEN_SIMD_TRANS_REG_PAIR_3(psrl_dhs, psrl_hs) GEN_SIMD_TRANS_REG_PAIR_3(psra_dhs, psra_hs) GEN_SIMD_TRANS_REG_PAIR_3(pssha_dhs, pssha_hs) GEN_SIMD_TRANS_REG_PAIR_3(psshar_dhs, psshar_hs) +GEN_SIMD_TRANS_REG_PAIR_3(psshl_dhs, psshl_hs) +GEN_SIMD_TRANS_REG_PAIR_3(psshlr_dhs, psshlr_hs) GEN_SIMD_TRANS_REG_PAIR_DW(pssha_dws, ssha) GEN_SIMD_TRANS_REG_PAIR_DW(psshar_dws, sshar) +GEN_SIMD_TRANS_REG_PAIR_DW(psshl_dws, sshl) +GEN_SIMD_TRANS_REG_PAIR_DW(psshlr_dws, sshlr) =20 GEN_SIMD_TRANS_REG_PAIR_2(ppairo_db, ppairo_b) GEN_SIMD_TRANS_REG_PAIR_2(ppairo_dh, ppairo_h) diff --git a/target/riscv/psimd_helper.c b/target/riscv/psimd_helper.c index 4c91800128..96e016d90d 100644 --- a/target/riscv/psimd_helper.c +++ b/target/riscv/psimd_helper.c @@ -2704,6 +2704,242 @@ uint64_t HELPER(shar)(CPURISCVState *env, uint64_t = rs1, uint64_t rs2) } } =20 +/** + * PSSHL.HS - Packed 16-bit variable shift with unsigned saturation + * Positive shift left (saturating), negative 
shift right (logical) + */ +target_ulong HELPER(psshl_hs)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D (uint16_t)EXTRACT16(rs1, i); + uint16_t res; + + if (shamt >=3D 0) { + uint32_t shifted =3D (shamt >=3D 16) ? ((uint32_t)e1 << 16) + : ((uint32_t)e1 << shamt); + res =3D unsigned_saturate_h(shifted, &sat); + } else { + int right =3D -shamt; + if (right >=3D 16) { + res =3D 0; + } else { + res =3D e1 >> right; + } + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSHLR.HS - Packed 16-bit variable shift with rounding + * and unsigned saturation + * Positive shift left (saturating), negative shift right (logical, rounde= d) + */ +target_ulong HELPER(psshlr_hs)(CPURISCVState *env, target_ulong rs1, + target_ulong rs2) +{ + target_ulong rd =3D 0; + int elems =3D ELEMS_H(rd); + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < elems; i++) { + uint16_t e1 =3D (uint16_t)EXTRACT16(rs1, i); + uint16_t res; + + if (shamt >=3D 0) { + uint32_t shifted =3D (shamt >=3D 16) ? 
((uint32_t)e1 << 16) + : ((uint32_t)e1 << shamt); + res =3D unsigned_saturate_h(shifted, &sat); + } else { + int right =3D -shamt; + if (right > 16) { + res =3D 0; + } else { + uint32_t rounded =3D ((uint32_t)e1 >> (right - 1)) + 1; + res =3D (uint16_t)(rounded >> 1); + } + } + + rd =3D INSERT16(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSHL.WS - Packed 32-bit variable shift with unsigned saturation (RV64 = only) + * Positive shift left (saturating), negative shift right (logical) + */ +uint64_t HELPER(psshl_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint32_t)EXTRACT32(rs1, i); + uint32_t res; + + if (shamt >=3D 0) { + uint64_t shifted =3D (shamt >=3D 32) ? ((uint64_t)e1 << 32) + : ((uint64_t)e1 << shamt); + res =3D unsigned_saturate_w(shifted, &sat); + } else { + int right =3D -shamt; + if (right >=3D 32) { + res =3D 0; + } else { + res =3D e1 >> right; + } + } + + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PSSHLR.WS - Packed 32-bit variable shift with rounding + * and unsigned saturation (RV64 only) + * Positive shift left (saturating), negative shift right (logical, rounde= d) + */ +uint64_t HELPER(psshlr_ws)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + for (int i =3D 0; i < 2; i++) { + uint32_t e1 =3D (uint32_t)EXTRACT32(rs1, i); + uint32_t res; + + if (shamt >=3D 0) { + uint64_t shifted =3D (shamt >=3D 32) ? 
((uint64_t)e1 << 32) + : ((uint64_t)e1 << shamt); + res =3D unsigned_saturate_w(shifted, &sat); + } else { + int right =3D -shamt; + if (right > 32) { + res =3D 0; + } else { + uint64_t rounded =3D ((uint64_t)e1 >> (right - 1)) + 1; + res =3D (uint32_t)(rounded >> 1); + } + } + + rd =3D INSERT32(rd, res, i); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * SSHL - 32-bit scalar variable shift with unsigned saturation + */ +uint32_t HELPER(sshl)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + if (shamt < 0) { + int right =3D -shamt; + return (right >=3D 32) ? 0 : (rs1 >> right); + } else { + uint64_t shifted =3D (shamt >=3D 32) ? ((uint64_t)rs1 << 32) + : ((uint64_t)rs1 << shamt); + uint32_t res =3D unsigned_saturate_w(shifted, &sat); + if (sat) { + env->vxsat =3D 1; + } + return res; + } +} + +/** + * SSHLR - 32-bit scalar variable shift with rounding and unsigned saturat= ion + */ +uint32_t HELPER(sshlr)(CPURISCVState *env, uint32_t rs1, uint32_t rs2) +{ + int sat =3D 0; + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + if (shamt < 0) { + int right =3D -shamt; + if (right > 32) { + return 0; + } else { + uint64_t rounded =3D ((uint64_t)rs1 >> (right - 1)) + 1; + return rounded >> 1; + } + } else { + uint64_t shifted =3D (shamt >=3D 32) ? ((uint64_t)rs1 << 32) + : ((uint64_t)rs1 << shamt); + uint32_t res =3D unsigned_saturate_w(shifted, &sat); + if (sat) { + env->vxsat =3D 1; + } + return res; + } +} + +/** + * SHL - 64-bit scalar variable logical shift + */ +uint64_t HELPER(shl)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + if (shamt < 0) { + int right =3D -shamt; + return (right >=3D 64) ? 0 : (rs1 >> right); + } else { + return (shamt >=3D 64) ? 
0 : (rs1 << shamt); + } +} + +/** + * SHLR - 64-bit scalar variable logical shift with rounding + */ +uint64_t HELPER(shlr)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + int8_t shamt =3D (int8_t)(rs2 & 0xFF); + + if (shamt < 0) { + int right =3D -shamt; + if (right > 64) { + return 0; + } else { + uint64_t shifted =3D rs1 >> (right - 1); + return (shifted >> 1) + (shifted & 1); + } + } else { + return (shamt >=3D 64) ? 0 : (rs1 << shamt); + } +} + /* Exchange operations (AS/SA/AS/SA with X suffix) */ =20 /** @@ -3573,6 +3809,140 @@ target_ulong HELPER(merge)(CPURISCVState *env, targ= et_ulong rs1, return (~rd & rs1) | (rd & rs2); } =20 +/** + * PNCLIPP.B - Pack narrow clip signed halfwords to bytes (RV64 only) + */ +uint64_t HELPER(pnclipp_b)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + int16_t lo =3D (int16_t)EXTRACT16(rs1, i); + int16_t hi =3D (int16_t)EXTRACT16(rs2, i); + int8_t res_lo =3D signed_saturate_b(lo, &sat); + int8_t res_hi =3D signed_saturate_b(hi, &sat); + + rd =3D (uint64_t)INSERT8(rd, res_lo, i); + rd =3D (uint64_t)INSERT8(rd, res_hi, i + 4); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPUP.B - Pack narrow clip unsigned halfwords to bytes (RV64 only) + */ +uint64_t HELPER(pnclipup_b)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 4; i++) { + uint16_t lo =3D (uint16_t)EXTRACT16(rs1, i); + uint16_t hi =3D (uint16_t)EXTRACT16(rs2, i); + uint8_t res_lo =3D unsigned_saturate_b(lo, &sat); + uint8_t res_hi =3D unsigned_saturate_b(hi, &sat); + + rd =3D (uint64_t)INSERT8(rd, res_lo, i); + rd =3D (uint64_t)INSERT8(rd, res_hi, i + 4); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPP.H - Pack narrow clip signed words to halfwords (RV64 only) + */ +uint64_t HELPER(pnclipp_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D
0; + + for (int i =3D 0; i < 2; i++) { + int32_t lo =3D (int32_t)EXTRACT32(rs1, i); + int32_t hi =3D (int32_t)EXTRACT32(rs2, i); + int16_t res_lo =3D signed_saturate_h(lo, &sat); + int16_t res_hi =3D signed_saturate_h(hi, &sat); + + rd =3D (uint64_t)INSERT16(rd, res_lo, i); + rd =3D (uint64_t)INSERT16(rd, res_hi, i + 2); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPUP.H - Pack narrow clip unsigned words to halfwords (RV64 only) + */ +uint64_t HELPER(pnclipup_h)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + + for (int i =3D 0; i < 2; i++) { + uint32_t lo =3D (uint32_t)EXTRACT32(rs1, i); + uint32_t hi =3D (uint32_t)EXTRACT32(rs2, i); + uint16_t res_lo =3D unsigned_saturate_h(lo, &sat); + uint16_t res_hi =3D unsigned_saturate_h(hi, &sat); + + rd =3D (uint64_t)INSERT16(rd, res_lo, i); + rd =3D (uint64_t)INSERT16(rd, res_hi, i + 2); + } + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPP.W - Pack narrow clip signed doublewords to words (RV64 only) + */ +uint64_t HELPER(pnclipp_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + int32_t res_lo =3D signed_saturate_w((int64_t)rs1, &sat); + int32_t res_hi =3D signed_saturate_w((int64_t)rs2, &sat); + + rd =3D (uint64_t)(uint32_t)res_lo; + rd |=3D (uint64_t)(uint32_t)res_hi << 32; + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + +/** + * PNCLIPUP.W - Pack narrow clip unsigned doublewords to words (RV64 only) + */ +uint64_t HELPER(pnclipup_w)(CPURISCVState *env, uint64_t rs1, uint64_t rs2) +{ + uint64_t rd =3D 0; + int sat =3D 0; + uint32_t res_lo =3D unsigned_saturate_w(rs1, &sat); + uint32_t res_hi =3D unsigned_saturate_w(rs2, &sat); + + rd =3D (uint64_t)res_lo; + rd |=3D (uint64_t)res_hi << 32; + + if (sat) { + env->vxsat =3D 1; + } + return rd; +} + /* Count leading operations */ =20 /** --=20 2.34.1
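The two arithmetic building blocks used by the shift helpers in this patch (saturating left shift on a 16-bit lane, and logical right shift with rounding) can be sketched outside QEMU as below. This is only a minimal illustration, not code from the series: the names round_shr_u64 and sat_shl_u16 are hypothetical, the clamp to UINT16_MAX is an assumption about what unsigned_saturate_h does (its body is not shown in this patch), and "ties round up" is likewise assumed from the "+ 1 then >> 1" pattern in the helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): logical right shift of x
 * by n bits, 1 <= n <= 64, rounded to nearest with ties rounded up.
 */
static inline uint64_t round_shr_u64(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 64);
    uint64_t partial = x >> (n - 1);    /* keep one extra low bit */
    /* Same value as (partial + 1) >> 1, but the addition cannot wrap. */
    return (partial >> 1) + (partial & 1);
}

/*
 * Hypothetical helper (not part of the patch): left shift of a 16-bit
 * lane by n bits, clamped to UINT16_MAX. *sat is set to 1 when the
 * result had to be clamped and is left untouched otherwise.
 */
static inline uint16_t sat_shl_u16(uint16_t x, unsigned n, int *sat)
{
    /* Shifts of 16 or more all behave like a shift by 16. */
    uint32_t wide = (uint32_t)x << (n >= 16 ? 16 : n);

    if (wide > UINT16_MAX) {
        *sat = 1;
        return UINT16_MAX;
    }
    return (uint16_t)wide;
}
```

Splitting the rounding into "(partial >> 1) + (partial & 1)" matters only at full register width: for a 64-bit source shifted right by one, "partial + 1" can wrap to zero, while the split form still yields 2^63 for an all-ones input. For the 16-bit and 32-bit lanes the intermediate is already held in a wider type, so either form gives the same result.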